Writers and readers: an analysis of the comment structure of the LJ TOP-500, part 1

    Start


    I continue a series of research posts on the structural analysis of the Russian-language segment of LiveJournal. The first post was devoted to an analysis of the audiences of 10 top bloggers. While preparing it, I compiled a graph of Russian LJ links covering more than 2 million blogs and 58 million links between them. I will return to this graph in later installments (I have yet to make sense of it), but today is about something else: who comments on whom, and how often, in the liveliest corner of LiveJournal's fights and discussions, the journals of the TOP-500.

    Taking the state of the LiveJournal rating at the beginning of April and picking the top 500 positions from it, I started collecting data using the following procedure. For each blog on the list, the 25 most recent posts were requested (this is available through the standard LJ tools). From each post, a list of commenters (name, comment id, position of the comment in the tree) was extracted, provided, of course, that comments on the post were open to outsiders.

    The standard LiveJournal tools do not provide the commenter list. An attempt at a clever trick, scraping the RSS feed of the Yandex blog search, ran into very strange and somewhat illogical behavior of that feed (this is not a complaint, just a fact), so the information about the structure of comments had to be extracted from the journal pages themselves. But it turned out for the better :) By the way, just in case: the DDoS on LJ was not me :)

    As a result, after several days of collecting information (the initial version of the crawler was not buggy; it was LJ that was slow, since it was under yet another DDoS at the time), this was the initial data set:

    487 journals with at least one commented post;
    10,546 posts with at least one comment;
    809,563 comments (excluding anonymous ones), of which 115,326 (14.2%) are replies from journal owners;
    114,412 commenters, of which 3,884 (3.4%) logged in through external services (Twitter, Facebook, etc.)
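
    The headline counts above can be reproduced from a flat list of comment records. The sketch below is only an illustration: the `Comment` record, its field names, and the rule "the owner's user name equals the journal name" are assumptions, not details taken from the original crawler.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Comment:
    """One non-anonymous comment; all field names are illustrative."""
    journal: str              # journal (blog) the post belongs to
    post_id: int              # post identifier within the journal
    comment_id: int           # comment identifier within the post
    parent_id: Optional[int]  # None for a top-level comment
    author: str               # commenter's user name


def summarize(comments):
    """Return the headline counts for a list of Comment records."""
    journals = {c.journal for c in comments}
    posts = {(c.journal, c.post_id) for c in comments}
    commenters = {c.author for c in comments}
    # Assumption: the owner's user name equals the journal name.
    owner_replies = sum(1 for c in comments if c.author == c.journal)
    total = len(comments)
    return {
        "journals": len(journals),
        "posts": len(posts),
        "comments": total,
        "owner_replies": owner_replies,
        "owner_share_pct": round(100 * owner_replies / total, 1) if total else 0.0,
        "commenters": len(commenters),
    }
```

    Only journals and posts that have at least one comment appear in such a list, which matches the "at least one comment" wording of the counts.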

    Coming up:

    1. Statistics of various characteristics of the TOP-500 journals
    2. Some non-obvious but curious ratings
    3. A search for the answer to “how to become a popular blogger” using cluster and correlation analysis (this, however, will be in the second part of the study)

    1 Statistics of journals and posts


    The distributions of some statistical characteristics of the journals in the sample are presented below. Because the power-law distributions characteristic of social networks (of which the Pareto curve is a special case) show a “long tail” on histograms, that tail is collected into a final, widened interval. Along with the arithmetic mean, I also give the median as a more robust estimate of the typical value.
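
    As a minimal sketch of that presentation choice, the function below counts values in equal-width bins and lumps everything from `tail_start` upward into one last bin. The function name, its parameters, and the sample numbers are made up for illustration; the original bin boundaries are not given in the text.

```python
from statistics import mean, median


def histogram_with_tail(values, bin_width, tail_start):
    """Count non-negative values in equal-width bins below tail_start
    and lump the rest into one final bin.

    Returns a list of (label, count) pairs. Bins are half-open: [low, high).
    """
    if tail_start % bin_width:
        raise ValueError("tail_start must be a multiple of bin_width")
    n_bins = tail_start // bin_width
    counts = [0] * (n_bins + 1)          # last slot is the widened tail bin
    for v in values:
        index = min(int(v // bin_width), n_bins)
        counts[index] += 1
    labels = [f"{i * bin_width}-{(i + 1) * bin_width}" for i in range(n_bins)]
    labels.append(f"{tail_start}+")
    return list(zip(labels, counts))


# Made-up comment counts with two extreme posts in the tail.
counts_per_post = [3, 8, 12, 15, 26, 31, 40, 950, 4200]
print(histogram_with_tail(counts_per_post, bin_width=10, tail_start=50))
print("mean:", mean(counts_per_post), "median:", median(counts_per_post))
```

    With data like this, the two extreme posts pull the mean far above what a typical post gets, while the median stays with the bulk of the sample.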

    By the way, an interesting detail. The dependence of the number of friends on a blogger's position in the top is approximated almost perfectly by a power function, with R² = 0.9932. Similar approximations for the number of comments and commenters are much worse: R² = 0.2355 for comments and R² = 0.3074 for commenters.

    It would be interesting to look at these figures again after some time and over more posts. If they moved toward one, that would mean that blogs with heated comment discussions today are gradually moving to the “head” of the top by readership, i.e. a "trickling" of the consolidated rating.
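
    The text does not say how the power-function approximation and its R² were obtained. One common way, assumed here, is ordinary least squares on the logarithms of both variables; the R² reported by this sketch is then the R² of that log-log regression. The function name is illustrative.

```python
import math


def fit_power_law(xs, ys):
    """Fit y = a * x**b by least squares on (ln x, ln y); return (a, b, r2).

    All xs and ys must be positive, and neither may be constant.
    """
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    n = len(lx)
    mx = sum(lx) / n
    my = sum(ly) / n
    sxx = sum((u - mx) ** 2 for u in lx)
    sxy = sum((u - mx) * (v - my) for u, v in zip(lx, ly))
    syy = sum((v - my) ** 2 for v in ly)
    b = sxy / sxx                      # slope in log-log space = exponent
    a = math.exp(my - b * mx)          # intercept in log-log space = ln a
    r2 = (sxy * sxy) / (sxx * syy)     # R^2 of the log-log regression
    return a, b, r2
```

    Here `xs` would be positions in the rating (1, 2, 3, ...) and `ys` the number of friends, comments, or commenters at each position. Positions with a zero count have to be dropped before fitting, since the logarithm of zero is undefined.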

    1.1 Posts, comments, commenters

    The two histograms below give an idea of the distribution of two characteristics of posts (across all authors): the number of comments and the number of unique commenters.



    In the studied sample, only 198 posts drew between 500 and 1000 comments, and 69 drew more than 1000. A typical post, even by a top blogger, gets 26 comments (median).

    Of course, posts from the “top” of the top collect more comments; this is visible in how the median number of comments changes across different “cut-offs” of the rating. The larger the sample, the more the figure is diluted:

    Cut-off     Median comments per post
    TOP-10      211
    TOP-30      149
    TOP-100     70
    TOP-200     44
    TOP-500     26
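
    A sketch of how such cut-off medians could be computed is given below. It assumes the posts of the first k journals are pooled and a single median is taken over them; the text does not say whether the author pooled posts or averaged per-journal medians, and the function and argument names are illustrative.

```python
from statistics import median


def median_by_cutoff(ranked_journals, comments_per_post, cutoffs):
    """Median comments per post over the first k journals of the ranking.

    ranked_journals: journal names ordered from rank 1 downward.
    comments_per_post: dict mapping journal name -> list of per-post
        comment counts.
    Returns {k: median or None if the cut-off contains no posts}.
    """
    result = {}
    for k in cutoffs:
        pooled = [
            count
            for name in ranked_journals[:k]
            for count in comments_per_post.get(name, [])
        ]
        result[k] = median(pooled) if pooled else None
    return result
```

    Calling it with `cutoffs=[10, 30, 100, 200, 500]` on the collected data would give one row per cut-off, in the same shape as the table above.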

    The picture is the same for the number of unique commenters per post.



    A typical LiveJournal post has 16 “discussers”. Only 725 posts (6.85% of all) drew more than a hundred commenters; of these, 42 posts (0.4%) drew between 500 and 1000 commenters, and as many as 4 posts drew more than 1000 readers with something to say on the subject.

    1.2 Authors and their fans: analysis of the discussion audience

    It is quite likely (and I will try to establish this in the second part of the study) that interest in a journal depends significantly on the nature of user activity in the comments: the presence of a regular audience, the author's involvement in the discussion, and the presence of actual discussions rather than just comments like “fitterka” and "A lot".

    For example, a journal author's activity can be measured by the share of the author's replies in the total number of comments. The distribution of authors by this measure is shown in the histogram:



    A reply share of 50% means that the author answered every visitor comment. Accordingly, a share of 20% means that the author replied to every fourth comment (yes, the fourth, not the fifth: 20 replies out of 100 comments leave 80 visitor comments). The average value across all journals is approximately 16% replies, i.e. the average author answers roughly every fifth comment.
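
    The arithmetic behind "every fourth, not every fifth" is easy to check: if the owner wrote a share s of all comments, visitors wrote the remaining 1 - s, so there are (1 - s) / s visitor comments per owner reply. The helper below is just that formula; its name is made up for illustration.

```python
def visitor_comments_per_reply(owner_share):
    """Given the owner's share of all comments (strictly between 0 and 1),
    return how many visitor comments there are per owner reply."""
    if not 0 < owner_share < 1:
        raise ValueError("owner_share must be strictly between 0 and 1")
    return (1 - owner_share) / owner_share


for share in (0.5, 0.2, 0.16):
    print(f"{share:.0%} replies: one reply per "
          f"{visitor_comments_per_reply(share):.2f} visitor comments")
```

    For shares of 50%, 20% and 16% the ratio works out to 1, 4 and 5.25 visitor comments per reply respectively, which is where "roughly every fifth comment" comes from.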

    Commenters

    Journals can be ranked by the number of unique commenters, i.e. by the audience that not only reads but also takes part in discussing what is written.
    Number of commenters    Number of journals
    0 - 200                 206
    200 - 400               118
    400 - 600               65
    600 - 800               34
    800 - 1000              11
    more than 1000          53

    The average TOP-500 journal has about 260 commenters (over its last 25 posts, of course).

    To isolate the core of commenters, we make three additional (and very revealing) cuts and give the average values obtained for each:
    1. 61% of a blog's commenters left only one comment in the journal
    2. 29% left 2-4 comments
    3. and only 10% of commenters actively take part in the life of the blog, leaving 5 or more comments
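
    For a single journal, the three cuts above amount to counting comments per commenter and grouping the counts. A minimal sketch follows; the function name is illustrative, and whether the journal owner's own replies were excluded before counting is not stated in the text, so that filtering is left to the caller.

```python
from collections import Counter


def audience_segments(authors):
    """Split a journal's commenters by how many comments each one left.

    authors: iterable with one commenter name per comment.
    Returns the percentage of unique commenters with 1, 2-4, and 5+ comments.
    """
    per_author = Counter(authors)
    total = len(per_author)
    if total == 0:
        return {"1": 0.0, "2-4": 0.0, "5+": 0.0}
    once = sum(1 for n in per_author.values() if n == 1)
    few = sum(1 for n in per_author.values() if 2 <= n <= 4)
    core = total - once - few          # everyone with 5 or more comments
    return {
        "1": 100 * once / total,
        "2-4": 100 * few / total,
        "5+": 100 * core / total,
    }
```

    Averaging these per-journal percentages over all journals would give figures of the same kind as the 61% / 29% / 10% split.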

    Discussions

    The most interesting part, in my opinion, is determining how attractive a journal is for discussion. Many different metrics could help a lover of conversation find a journal, since comments form a tree, a tree is a graph, and a great deal can be computed on graphs.

    After some thought, I first considered the following indicator: the average number of comments per thread. It is a very clear indicator, but not an illustrative one, because the average will hover around two at best, or even slide toward one.

    Therefore, we take the number of threads in the journal with more than N comments. For simplicity, N is taken as half of the median of the maximum thread lengths. With that median equal to 22 comments, N = 11.

    Number of "heavy" threads    Number of journals
    0 - 10                       346
    10 - 20                      69
    20 - 30                      21
    30 - 40                      14
    40 - 50                      5
    50 - 100                     19
    more than 100                13

    The average journal has only 4 threads with more than 11 comments.
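
    The heavy-thread count can be sketched from the (comment id, parent id) pairs collected for each post. The code below assumes that a thread is a top-level comment together with all of its descendants; the text does not define "thread" precisely, and all function names are illustrative.

```python
from collections import Counter
from statistics import median


def thread_sizes(comments):
    """Return {root_comment_id: number of comments in that thread} for one post.

    comments: list of (comment_id, parent_id) pairs; parent_id is None
    for a top-level comment. A thread here means a top-level comment
    plus all of its descendants (an assumption about the definition).
    """
    parent = dict(comments)

    def root_of(cid):
        # Walk up the reply chain; a parent missing from the data
        # (e.g. a deleted comment) is treated as the root.
        while parent.get(cid) is not None:
            cid = parent[cid]
        return cid

    return dict(Counter(root_of(cid) for cid, _ in comments))


def heavy_thread_threshold(max_thread_len_per_journal):
    """N = half of the median of the journals' longest thread lengths."""
    return median(max_thread_len_per_journal) / 2


def count_heavy_threads(sizes, n):
    """Number of threads with more than n comments."""
    return sum(1 for size in sizes if size > n)
```

    The average thread length discussed earlier is simply the mean of the values returned by `thread_sizes`; because most threads consist of a single comment or a comment and one reply, that mean stays close to one or two, which is why the count of threads above N is the more telling figure.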

    2 Additional Ratings


    Next, I will give a few additional ratings (top three positions each) based on the comment-activity indicators discussed above.

    Number of comments (total)

    Journal          Number of comments
    nikitabesogon    42752
    alexsword        33057
    krispotupchik    15465

    Commenter audience (total)

    Journal      Commenters, total
    pesen_net    5989
    toster       4626
    mzadornov    4184

    Replies by the journal owner (total, share of the number of comments)

    Journal           Replies, total    Replies, % of the number of comments
    mcheburashkina    4799              40.5%
    alexsword         4221              12.8%
    kitya             3351              42.5%

    The core of the audience (total, percentage of the total number of commenters)

    Journal          Commenters    % of the total
    nikitabesogon    835           23.1%
    navalny          827           27.0%
    fritzmorgen      610           22.9%

    The core of the audience is the number of commenters who left 5 or more comments in the journal.

    Pause...


    This concludes the first part of the study. In the second part, I will try to put forward a couple of hypotheses, confirm or refute them, and also look for common features in such a motley crowd of bloggers :) See you in a week. Stay tuned, as they say!

    LiveJournal Cross-post: infist-xxi.livejournal.com/79250.html
