Comparison of the audiences of Habrahabr, Geektimes and Megamind

    Hello, Habr!
    A year ago I wrote an article about who subscribes to Habrahabr on the Vkontakte social network, and how. In the very first comments to that post, readers asked to see how Geektimes subscribers differ from Habr's own. Only a year has passed, and I, having overcome my laziness, am granting that wish.

    In fact, my slowness also had objective reasons: Megamind was launched in January, and it became obvious that the comparison should cover all three sites. For that, I had to wait at least six months after the final split of Habr.

    This article will not offer yet another round of statistics on which day of the week a Habr post gets the best rating and on which it collects the fewest comments; all of that was said long before me. Instead, under the cut we will try to understand how the audiences of the Habr-family public pages differ on various parameters (from gender to attitudes toward bad habits), and whether user behavior in VK is connected to behavior on the sites themselves.



    Instead of an introduction


    To begin with, let us turn to the subject area. What are these three sites that used to be one?
    Recalling the creators' explanations, and simplifying a great deal, the specifics of each site are as follows:
    • Habrahabr (hereafter HH): for IT professionals proper
    • Geektimes (GT): for geeks
    • Megamind (MM): for IT managers


    How, and by how much, do the audiences of these sites differ? Perhaps only TM employees can answer that question in detail. We, in turn, will look at how the audiences of the corresponding public pages in VK differ.

    Briefly, on the data collection methodology.
    Using the VK API, I collected data on all subscribers of the Habrahabr, Geektimes and Megamind public pages. The data was collected at the end of October. At about the same time, using a parser (there is, alas, no access to a Habr API), I downloaded all (or almost all) available articles from the three sites.
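
    The article gives no collection code, so what follows is only a minimal sketch of the paging loop such a collection needs, not the original script. It assumes each page of results arrives as a dict with a total `count` and a list of `items` (the shape VK's `groups.getMembers` method is understood to return) and that at most 1000 profiles come back per request. The `fetch_page` callable and the fake fetcher are hypothetical stand-ins, so the loop can be run offline.

```python
from typing import Callable, Dict, List

PAGE_SIZE = 1000  # assumed maximum number of profiles per request


def collect_members(fetch_page: Callable[[int, int], Dict],
                    page_size: int = PAGE_SIZE) -> List[Dict]:
    """Request pages one after another until the reported total is reached."""
    members: List[Dict] = []
    offset = 0
    while True:
        page = fetch_page(offset, page_size)
        items = page["items"]
        members.extend(items)
        offset += len(items)
        # Stop on an empty page or once every reported member has been seen.
        if not items or offset >= page["count"]:
            break
    return members


def make_fake_fetcher(total: int) -> Callable[[int, int], Dict]:
    """Hypothetical stand-in for a real API call, for offline runs."""
    profiles = [{"id": i} for i in range(total)]

    def fetch_page(offset: int, count: int) -> Dict:
        return {"count": total, "items": profiles[offset:offset + count]}

    return fetch_page


subscribers = collect_members(make_fake_fetcher(2500))
print(len(subscribers))
```

    In a real run, `fetch_page` would wrap an HTTP request to the API and would also need to respect its rate limits.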

    In some places I refer to the statistical significance or insignificance of differences. It was tested with the chi-square test; differences were treated as significant at p < 0.05 (the same threshold was used for correlation coefficients).
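
    As an illustration of the test, here is a minimal pure-Python sketch of a chi-square test of independence on a pages-by-gender table. The counts are invented for the example and are not the article's data. The p-value helper uses a closed form that is valid only for an even number of degrees of freedom, which covers the 3x2 case here (2 degrees of freedom).

```python
import math
from typing import List, Tuple


def chi_square_independence(table: List[List[float]]) -> Tuple[float, int]:
    """Return the chi-square statistic and degrees of freedom for a contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof


def p_value_even_dof(stat: float, dof: int) -> float:
    """Upper-tail chi-square probability; closed form for even dof only."""
    if dof <= 0 or dof % 2:
        raise ValueError("this closed form needs a positive even dof")
    half = stat / 2
    return math.exp(-half) * sum(half ** k / math.factorial(k)
                                 for k in range(dof // 2))


# Hypothetical counts: rows are public pages, columns are (men, women).
observed = [
    [700, 300],
    [800, 200],
    [650, 350],
]
stat, dof = chi_square_independence(observed)
p = p_value_even_dof(stat, dof)
print(f"chi2={stat:.2f}, dof={dof}, significant={p < 0.05}")
```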

    UPD: In addition, let me repeat a quote from the previous article:

    “I also draw attention to the fact that the sample under study is the audience of a public page on the Vkontakte social network. This means that the user data may change from time to time and may be incorrect or inaccurate. Therefore, when I say that ‘Habr’s readers are 146% 91-year-old men from the Isle of Man’, this is not the ultimate truth. It is just the information users entered in their profiles.” And conclusions drawn from data on Habr’s VK subscribers will, of course, not necessarily hold for all Habr users on the sites themselves.


    First, we need to understand how the audiences of the public pages overlap. For the solemnity of the moment, here is a Venn diagram drawn to scale:

    Overlap of public page audiences
                 Habrahabr    Geektimes    Megamind
    Habrahabr    517 553      -            -
    Geektimes    31 309       45 603       -
    Megamind     11 162       7 034        13 470

    Full intersection (users subscribed to all three public pages at once): 6 481

    The picture is quite logical. Since GT and MM are the "offspring" of Habr itself, they cannot yet compete with it either in overall audience size or even in the relative share of "unique" subscribers.
    By "unique" subscribers we mean users subscribed only to the given public page and to neither of the other two. In the figure they are shown as colored areas, while "non-unique" ones are gray.
    To bring out the differences between the audiences as clearly as possible, we will analyze only the "unique" subscribers, that is, we discard the gray areas in the figure. An example of why this is necessary is given below.
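
    The number of unique subscribers follows from the overlap table by inclusion-exclusion. The sketch below assumes that each pairwise figure in the table counts everyone subscribed to both pages, including those subscribed to all three; the variable names are illustrative.

```python
# Figures taken from the overlap table above.
sizes = {"Habrahabr": 517_553, "Geektimes": 45_603, "Megamind": 13_470}
pairwise = {
    frozenset({"Habrahabr", "Geektimes"}): 31_309,
    frozenset({"Habrahabr", "Megamind"}): 11_162,
    frozenset({"Geektimes", "Megamind"}): 7_034,
}
all_three = 6_481


def unique_subscribers(page: str) -> int:
    """Users subscribed to `page` and to neither of the other two."""
    overlaps = sum(n for pair, n in pairwise.items() if page in pair)
    # Users in all three pages were subtracted twice, so add them back once.
    return sizes[page] - overlaps + all_three


unique = {page: unique_subscribers(page) for page in sizes}
for page, count in unique.items():
    print(f"{page}: {count} unique ({count / sizes[page]:.1%} of the page)")
```

    Under that assumption, the share of unique subscribers is far higher for Habrahabr than for either daughter page, which matches the observation above.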
    So let's get started.

    Gender


    We will not be original and will start with the differences by gender:


    Interactive version (where possible, I will provide links to interactive charts, since they are clearer and easier on the eye).

    Megamind has the highest share of women among its subscribers, almost a third. Geektimes has the lowest (are representatives of the "weaker" sex rarer among geeks?), and Habr sits in the middle. Moreover, these differences are statistically significant.

    Notice how the distribution differs between unique and non-unique users: most GT and MM subscribers are also Habr subscribers, and most Habr subscribers are men. Because of this, the distribution of a trait (in this case gender) in the other audiences gets distorted as well. That is why we analyze only unique subscribers.

    Overall, nothing unexpected: there are traditionally more men among "techies". Megamind is perhaps the least "techie" project of the three, which explains its relatively high share of women.
    With gender settled, age is next.

    Age


    Let's look at the distribution of the relative number of subscribers by year of birth (values before 1975 hover around 0, so that part of the graph is dropped for clarity):


    Interactive version

    Habr and GT have fairly smooth curves. Megamind's line jitters more than the others, probably because of its relatively small number of subscribers. Even so, it is clear that Habr's peak falls at a more mature age than that of its "daughter" sites, albeit only by a couple of years. Such differences are probably quite logical, although I personally expected Megamind to have an older audience. But, as they say, my expectations are my problem.

    The differences between Habr and GT and between Habr and MM are statistically significant, while the difference between GT and MM is not (which can be seen from the figure anyway). The spike in the 2000-2001 range, seen mainly for Habr, is also curious; I found no explanation for it. Vkontakte's overall audience shows no strong spike for those birth years. So let's hope that young people are simply becoming more interested in IT. Or perhaps it is somehow related to "default" ages at registration in social networks.

    Geography


    This time (unlike the previous study) we restrict ourselves to Habr's "Big Four" countries: Russia, Ukraine, Belarus and Kazakhstan. We drop countries outside the former USSR because, even when the country in a profile is truthful (recall what Habr users sometimes put in the "country" field), the vast majority of users from such countries are emigrants from the post-Soviet space. That leaves the other countries of the former USSR. We leave them out as well, because they contribute no meaningful number of unique Megamind subscribers (and sometimes none at all).

    In the end, about 92% of subscribers are in the four countries above, so we will not lose much. Here is how the "normalized" number of subscribers breaks down across them:


    Interactive version

    If you remember, last year Belarus turned out to be the most avid Habr country. It still holds that position, but only for Habrahabr, while the daughter projects appeal above all to users from Russia. Kazakhstan rounds out the four, except in the case of Megamind, where it snatched third place from Ukraine in a hard fight. MM also shows the most even distribution overall.

    The sharpest drop in interest in the daughter public pages is among Ukrainian users. Either Ukraine is less interested in the topics of these resources, or over the past year users from that country have become less inclined to subscribe to VK public pages. Testing the first hypothesis is beyond the scope of our study, but the second is easy to refute: just look at the growth in Habrahabr subscribers over the past year (since the previous study) by country:


    Interactive version

    As we can see, all the Big Four countries grew at about the same rate, except Kazakhstan, which is the clear leader here.

    Universities


    There will be no university statistics this time, sorry. Here is why: as you remember, we look only at unique users, but splitting subscribers by university produces groups that are too small. So small that for GT (not to mention MM) there are often no unique users left at all. As a result, a university may appear in the list for Habr's subscribers but be missing from the list for GT, which would create the false impression that students and graduates of that university are not interested in Geektimes at all.

    A concrete example. There is a university, or rather a university faculty: FSF ITMO. Thirty people from it are subscribed to Habr and five to Geektimes, and everyone subscribed to GT is also subscribed to Habr. As a result, the number of unique GT subscribers is 0. What should be done with such a university? Ignore it? Include it in the statistics with a special note? Analyze it by non-unique users? In short, there are too many questions, and the value of the comparison is doubtful. So if anyone is interested in statistics for a particular university, get in touch and I will export them.

    Bad habits


    In their attitudes to smoking and alcohol, subscribers of all three pages are surprisingly alike, to the point of being uninteresting:


    Interactive version


    Interactive version

    True, you can see that Megamind subscribers are slightly more tolerant of bad habits. Apparently the work is more stressful. In reality, though, none of these differences are significant.

    Political Views


    But the differences in political views turned out to be significant:


    Interactive version

    Megamind subscribers turned out to be the least indifferent: the most liberal (but also the most conservative!). The least moderate are the "geeks", and the most moderate are Habr users.

    Relationship status


    Even more interesting are the differences in matters of the heart.
    Vkontakte offers several options for a user's relationship status. We will group them a little to make things clearer and more convenient:

    Relationship status table
    Status for analysis    Status from VK
    Has a partner          Has a partner
                           Married
                           Engaged
                           In love (yes, one can be in love unrequitedly, but let's not be pedantic)
    No partner             No partner
    Actively looking       Actively looking
    -                      It's complicated

    The "It's complicated" status is excluded: it is hard to interpret, and only 3.2% of subscribers chose it.
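
    The grouping above amounts to a simple lookup table. A minimal sketch, assuming profile statuses have already been converted to the text labels used in the table (the label strings and function name are illustrative, not VK's actual field values):

```python
from collections import Counter
from typing import Iterable

# Maps a VK status label to the group used in the analysis.
# None marks a status that is excluded from the analysis.
STATUS_GROUPS = {
    "Has a partner": "Has a partner",
    "Married": "Has a partner",
    "Engaged": "Has a partner",
    "In love": "Has a partner",
    "No partner": "No partner",
    "Actively looking": "Actively looking",
    "It's complicated": None,
}


def group_statuses(statuses: Iterable[str]) -> Counter:
    """Count profiles per analysis group, skipping excluded or unknown statuses."""
    counts: Counter = Counter()
    for status in statuses:
        group = STATUS_GROUPS.get(status)
        if group is not None:
            counts[group] += 1
    return counts


sample = ["Married", "In love", "It's complicated", "No partner", "Actively looking"]
print(group_statuses(sample))
```
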
    In addition, we divide respondents by gender. And we get an entertaining picture:


    Interactive version

    First, on all three public pages women are more successful than men at finding a partner (and the difference is statistically significant).

    Now let's look at the subscribers without a partner. Taken together, the "No partner" and "Actively looking" statuses give roughly the same result for all public pages. But Habr users are almost twice as "bold" as their colleagues in actively looking for a soul mate. Any comment on this would sound like a lame joke even if meant seriously, so I will leave it without comment. And Megamind's female subscribers, apparently, are doing fine even when single.

    The connection between VK and the sites (likes, ratings, and all that)


    The next step is to link user behavior in VK with behavior on the sites themselves. I should say right away that we will only consider data for 2015. First, because the final split into three separate sites happened at the beginning of this year. And second, I am not sure the creators of Habr would like to see a comparison of indicators such as the number of views, especially broken down by year.

    For posts in VK, we will consider three main numerical indicators:
    • Number of likes
    • Number of reposts
    • Number of comments


    Posts on the sites have slightly more indicators:
    • Rating
    • Views
    • Comments
    • Favorites

    Of course, besides the above, there are a number of factors that can affect how posts perform. Some were described in other articles on the topic (the day of publication, for example), and some require deeper analysis that is beyond the scope of this article, so we will not try to account for them. After all, our task is not to build a regression model; we just want to look at how the indicators relate to one another.

    But there is at least one more factor we must take into account, namely the publication date. Over time the number of subscribers can grow, and this in turn can affect the number of reposts and likes (more subscribers, more likes). In that case we could not simply compare a post made on January 1, 2015 with one made today; we would also have to account for how many likes posts collect nowadays.

    To begin with, let's determine how the number of subscribers changed during 2015. The good old web archive helps here: with it we can find the subscriber counts of each public page on several different dates. We plot these points on a graph:



    We see that, in relative terms, Megamind's audience grows the fastest (with Geektimes close behind) and Habr's the slowest. This is logical given the age of the public pages: young ones grow faster.

    But the main good news for us is that the change in the number of subscribers is described almost perfectly by a linear function. So we will not have to suffer much if we want to account for this factor: with the simplest regression we can predict the audience size of any of the public pages on any date in the study period.
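
    A minimal sketch of such a regression, using ordinary least squares on a handful of archived snapshots. The dates and subscriber counts are hypothetical placeholders, not the values read from the web archive.

```python
from datetime import date
from typing import List, Sequence, Tuple


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def predict_subscribers(snapshots: List[Tuple[date, int]], target: date) -> float:
    """Estimate the audience size on `target` from dated snapshots."""
    origin = snapshots[0][0]
    # Express every date as the number of days since the first snapshot.
    xs = [(day - origin).days for day, _ in snapshots]
    ys = [count for _, count in snapshots]
    slope, intercept = fit_line(xs, ys)
    return intercept + slope * (target - origin).days


# Hypothetical snapshots (date, subscriber count) for one public page.
snapshots = [
    (date(2015, 1, 15), 10_000),
    (date(2015, 4, 15), 11_800),
    (date(2015, 7, 14), 13_600),
    (date(2015, 10, 12), 15_400),
]
print(round(predict_subscribers(snapshots, date(2015, 3, 1))))
```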

    But does this factor need to be taken into account at all? It seems not:



    Likes are spread fairly evenly across the year. It turns out that however much a public page's audience grows, it does not become more generous with likes and reposts.

    By the way, note the "notches" at the bottom of the HH distribution. These are the very weekends that have come up so many times in reviews of Habr articles: apparently, because few articles are published, Habr users become more generous with ratings. To some extent this pattern has carried over to the social network, but only for Habr; as the graphs show, it does not apply to the other public pages. This is also confirmed by the correlation coefficients between "number of posts per day" and "average number of likes":
    • Habrahabr -0.455
    • Geektimes -0.237
    • Megamind -0.169
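
    For reference, a coefficient of this kind can be computed as a Pearson correlation between two daily series. The sketch below assumes Pearson's r was the measure used (the article does not say which one) and runs on invented numbers for a single week.

```python
import math
from typing import Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equally long series."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical week: fewer posts on the weekend, more likes per post.
posts_per_day = [12, 15, 14, 13, 16, 5, 4]
avg_likes = [40, 35, 38, 41, 33, 60, 66]
r = pearson(posts_per_day, avg_likes)
print(f"r = {r:.3f}")
```

    A negative value means that days with fewer posts tend to have more likes per post, which is the weekend effect described above.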


    Now that the most obvious dependencies are sorted out, let's see how things stand with the other indicators. To do this, we build a correlation matrix for each public page. Keep in mind that correlation indicates how strongly two variables are associated but, in general, does not establish cause and effect. For clarity, we display the matrices in the following form:



    As we can see, the situation is roughly the same for all public pages. The only serious differences are in how the "favorites" indicator relates to likes and reposts: for Habr the relationship is fairly clear, for the others it is much weaker.
    Also worth noting is the almost linear relationship between likes and reposts, although that was quite expected.

    Nothing depends on the day of the year (and, consequently, on the number of subscribers). But there is a rather strong correlation between an article's views and its rating or number of additions to favorites. This is quite logical: a bad article is unlikely to be viewed much, and a good article written for a small audience will not collect many upvotes.

    Likes and reposts in VK are only weakly related to the ratings given on the sites (although for Habr and GT they do correlate, not strongly, with the number of article views). This, in fact, is one of the main conclusions of the comparison. It turns out that the audience of the Habr public pages on Vkontakte and the audience on the sites do not agree much in how they evaluate posts.

    Interestingly, the number of comments on the sites and the number of comments in VK depend on each other only very weakly, although both serve the same purpose: discussing the article. This is further confirmation that users behave differently in VK and on the portals themselves.

    Instead of a conclusion


    One can argue for a long time about whether splitting Habr was justified and what it was done for, but now, after a little less than a year, differences are beginning to show between the audiences of the three sites (or at least of their public pages). To sum up, both Geektimes and Megamind are gradually starting to live their own lives and gain partly unique audiences, although these are so far incomparable in size with the audience of their "parent". How the split affected the life of Habr itself is another question, beyond the scope of this post.

    On this philosophical note, we will wrap up. Until we meet again, if that is meant to be. And remember: statistics is just the third kind of lie.

    PS I apologize for posting this in the VK API hub without providing any code (it is trivial). As far as I have seen, such articles do appear there from time to time, and I think the hub is quite a suitable place for a post about processing data extracted from VK.
