About Habrahabr, statistics and cakes

    Lyrical digression


    Hello!
    One dark winter evening I had nothing to do, so I killed time reading my beloved Habrahabr. In the comments, someone once again let slip the familiar phrase that Habr is "no longer a cake", that is, no longer as good as it used to be.

    Statistics, statistics and more statistics


    I wondered whether the quality of articles on Habr could somehow be measured numerically, and whether such a measure would show how it has changed over time, or whether all these comments are nothing more than grumbling that the grass used to be greener. It was evening and there was nothing to do, so I mustered my willpower and wrote a simple bot that slowly crawled almost 2,800 pages of Habr's main feed and collected statistics on its articles from Habr's launch until December 31, 2009.
    To grab your attention, here is the traditional picture, a graph of the number of articles per month:
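    The bot's code is not shown here. As a minimal sketch of the aggregation behind this graph, assuming the bot has already extracted each article's publication date, counting articles per month could look like this (the function name and sample dates are hypothetical):

```python
from collections import Counter
from datetime import date


def articles_per_month(dates: list[date]) -> dict[tuple[int, int], int]:
    """Count articles per (year, month) from their publication dates."""
    counts = Counter((d.year, d.month) for d in dates)
    # Sort chronologically so the result can be plotted in order
    return dict(sorted(counts.items()))


# Hypothetical publication dates, standing in for the scraped data
sample = [date(2006, 7, 15), date(2008, 8, 1), date(2008, 8, 20), date(2009, 12, 31)]
print(articles_per_month(sample))  # {(2006, 7): 1, (2008, 8): 2, (2009, 12): 1}
```

    Months without any articles simply do not appear in the result; they would need to be filled with zeros before plotting on a continuous time axis.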



    Topics


    The very first thing that came to mind was to check how the topical content of Habr had changed over its existence. As you know, blogs on Habrahabr are divided into categories, which can be found here. To begin with, I counted the number of articles in each category per year (the monthly statistics are too noisy, so I had to abandon them). Unfortunately, not every blog has a category; such blogs are marked "n / a".
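    A sketch of this per-year, per-category tally, assuming each article has been reduced to a (publication date, category or None) pair; the helper names and sample categories are hypothetical. The second helper turns one year's counts into the percentages a pie chart needs.

```python
from collections import Counter, defaultdict
from datetime import date
from typing import Optional

NO_CATEGORY = "n / a"  # label for articles whose blog has no category


def counts_by_year_and_category(
    articles: list[tuple[date, Optional[str]]],
) -> dict[int, Counter]:
    """Return {year: Counter({category: number_of_articles})}."""
    table = defaultdict(Counter)
    for published, category in articles:
        table[published.year][category or NO_CATEGORY] += 1
    return dict(table)


def shares(counter: Counter) -> dict[str, float]:
    """Convert one year's counts into percentages for a pie chart."""
    total = sum(counter.values())
    return {cat: 100.0 * n / total for cat, n in counter.items()}


# Hypothetical (publication date, category) pairs
sample = [
    (date(2008, 3, 1), "Programming"),
    (date(2008, 5, 2), None),
    (date(2009, 1, 9), "Programming"),
    (date(2009, 2, 3), "Hardware"),
]
table = counts_by_year_and_category(sample)
print(dict(table[2008]))    # {'Programming': 1, 'n / a': 1}
print(shares(table[2009]))  # {'Programming': 50.0, 'Hardware': 50.0}
```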


    The same data are easier to see as pie charts:




    The positive trend is obvious: there is less off-topic content on Habr and more on-topic content. The share of programming has grown a lot. Hardware, on the other hand, which some people believe there is more of now, has in fact not grown much, although perhaps the efforts of people like Bumburum have improved the quality of hardware articles.

    Ratings


    How has the quality of a typical front-page article, a "spherical article in a vacuum", changed over Habr's existence? The first thing that comes to mind is to calculate the average rating of such an article. The following graph shows this average by month:

    The peak we see in August 2008 is nothing more than the launch of SuperHabr and the introduction of invites.
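    The monthly average rating is a plain group-by-and-mean. A minimal sketch, assuming each article is reduced to a (publication date, rating) pair; `mean_by_month` and the sample values are hypothetical:

```python
from collections import defaultdict
from datetime import date


def mean_by_month(rows: list[tuple[date, int]]) -> dict[tuple[int, int], float]:
    """Average a per-article value (here, the rating) over each (year, month)."""
    groups = defaultdict(list)
    for published, value in rows:
        groups[(published.year, published.month)].append(value)
    # Arithmetic mean per month, months in chronological order
    return {month: sum(vals) / len(vals) for month, vals in sorted(groups.items())}


# Hypothetical (publication date, rating) pairs
ratings = [(date(2008, 7, 3), 10), (date(2008, 7, 9), 20), (date(2008, 8, 1), 45)]
print(mean_by_month(ratings))  # {(2008, 7): 15.0, (2008, 8): 45.0}
```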

    Comments


    Another interesting indicator is the average number of comments per article:

    Everything is predictable: unlike articles, comments can be posted by all registered users, so the introduction of invites put the growth of this indicator on hold. The average number of comments reflects the size of Habr's active audience well. Oh, and the peak on the left is the single article from July 2006, which people are still commenting on; after all, it is the very, very first one.
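    The July 2006 peak shows why a monthly average is best read together with the number of articles behind it: a month containing a single article reports that article's comment count as its average. A sketch, assuming (publication date, comment count) pairs; the function, the `min_articles` cut-off of 5 and the data are all hypothetical:

```python
from collections import defaultdict
from datetime import date


def comments_per_article(
    rows: list[tuple[date, int]], min_articles: int = 5
) -> dict[tuple[int, int], tuple[float, int, bool]]:
    """Return {(year, month): (mean comments, article count, enough data?)}.

    min_articles is an arbitrary cut-off for flagging months whose
    average rests on too few articles to be trusted.
    """
    groups = defaultdict(list)
    for published, comments in rows:
        groups[(published.year, published.month)].append(comments)
    return {
        month: (sum(c) / len(c), len(c), len(c) >= min_articles)
        for month, c in sorted(groups.items())
    }


# Hypothetical data: one article in July 2006, five in January 2009
rows = [(date(2006, 7, 20), 300)] + [(date(2009, 1, d), 40) for d in range(1, 6)]
for month, stats in comments_per_article(rows).items():
    print(month, stats)
# (2006, 7) (300.0, 1, False)
# (2009, 1) (40.0, 5, True)
```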

    Holy wars


    One of the most interesting questions I asked myself before starting this article was whether there really have been more controversial topics on Habr recently, ones that stir up a storm of emotions in readers and a desire to beat up their opponents. How can such a thing be measured? After much deliberation, I decided that, with some margin of error, it can be approximated by the ratio of an article's negative votes to its total number of votes. So I called an article "controversial" if its minuses make up more than a third of all its votes. The following graph shows controversial articles as a red line and all articles as a blue line:
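    The criterion boils down to a single inequality. A sketch, assuming the bot recorded plus and minus votes separately for each article (the function name is hypothetical); treating an article with no votes as not controversial is an assumption, since the text does not cover that case:

```python
def is_controversial(pluses: int, minuses: int) -> bool:
    """True if minuses make up more than a third of all votes.

    The check 3 * minuses > total is the integer form of
    minuses / total > 1/3, so there is no float rounding at the boundary.
    Articles with no votes at all are treated as not controversial.
    """
    total = pluses + minuses
    return total > 0 and 3 * minuses > total


print(is_controversial(20, 10))  # False: exactly one third is not "more than"
print(is_controversial(19, 10))  # True: 10 of 29 votes are minuses
print(is_controversial(0, 0))    # False: no votes, no verdict
```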

    This graph is hard to read, so let's calculate the share of controversial articles in the total:

    This is clearer: the share of controversial articles is growing, and it has now almost reached the maximum observed before invites were introduced (back then there were rumors of a botnet that downvoted objectionable articles and upvoted ones its creator liked). The introduction of invites and new rules slowed this process down, but not for long. This is probably the only alarm bell I noticed while analyzing the collected data.
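    Dividing by each month's total is what makes the trend visible, since the raw count of controversial articles also rises simply because more articles are published. A sketch with the same one-third criterion, assuming (publication date, pluses, minuses) triples; the function and data are hypothetical:

```python
from collections import defaultdict
from datetime import date


def controversial_share(
    rows: list[tuple[date, int, int]],
) -> dict[tuple[int, int], float]:
    """Fraction of each month's articles whose minuses exceed 1/3 of all votes."""
    total = defaultdict(int)
    contested = defaultdict(int)
    for published, pluses, minuses in rows:
        month = (published.year, published.month)
        total[month] += 1
        votes = pluses + minuses
        if votes > 0 and 3 * minuses > votes:
            contested[month] += 1
    # Normalizing by the month's total removes the effect of overall growth
    return {month: contested[month] / total[month] for month in sorted(total)}


# Hypothetical (publication date, pluses, minuses) triples
rows = [
    (date(2009, 11, 5), 19, 10),  # controversial
    (date(2009, 11, 6), 30, 2),   # not controversial
    (date(2009, 12, 1), 5, 5),    # controversial
]
print(controversial_share(rows))  # {(2009, 11): 0.5, (2009, 12): 1.0}
```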

    Conclusions


    It is quite obvious that Habr's life can be divided into two periods: in August 2008, with the introduction of a new engine and new rules, the project matured and stabilized. 2009 was the first year of the project's adult life, and it went well: the number and quality of articles grew, not to mention traffic.
    However, all is not well in the state of Denmark: something needs to be done about articles that get downvoted simply because they mention a technology some fan dislikes or, conversely, upvoted because they praise some fan's sacred cow. The idea of making IMHO blog articles visible only to that blog's subscribers has not paid off. However, the questions "who is to blame?" and "what is to be done?" go far beyond the scope of this article, so I will stop here. I will only note that Habr's new leadership will need to think seriously about this issue.

    Postscript


    If you have any ideas on how else to analyze the collected data, write to me; I will be glad to hear them.
