About a month ago I published an article about habracottas on Habr. A by-product of that article was a dump of user pages, and I wanted to extract some more information from it. Articles analyzing users, articles, comments, and karma appear on Habr regularly, but I did not find a single one that analyzes invites. So I built a graph of Habr invites and looked at some of its characteristics.

    Let me remind you that the pages were downloaded in January 2016, so nothing that happened afterwards (registration of new users, removal of old ones, changes in karma) is taken into account. After removing all read-only and deactivated users from the list of downloaded users, we get 79,870. As far as I know, this approximately matches the actual number of users (give or take a thousand). Then, to get a graph without holes, I had to add back 955 read-only and 382 deactivated users (those who invited someone or were invited, but were later removed from Habr or switched to read-only for one reason or another). As a result, we get a graph with 81,207 vertices.
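
    The "graph without holes" step can be sketched as follows. This is only an illustration under assumptions: the function name and the toy usernames are hypothetical, and the author's actual script may work differently. The idea is that every endpoint of an invite arc must be a vertex, even if that user is read-only or deactivated.

```python
def graph_vertices(active_users, invites):
    """Return the vertex set: active users plus every endpoint of an invite.

    `invites` is an iterable of (inviter, invitee) pairs.
    """
    vertices = set(active_users)
    for inviter, invitee in invites:
        vertices.add(inviter)
        vertices.add(invitee)
    return vertices


# Hypothetical toy data: carol and dave are not in the active list,
# but they take part in invites, so they must be added back.
active = {"alice", "bob"}
invites = [("alice", "carol"), ("dave", "bob")]
vertices = graph_vertices(active, invites)
added_back = vertices - active

# The counts quoted in the text are consistent with each other.
assert 79870 + 955 + 382 == 81207
```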

    It is worth noting that getting a list of Habr users is not easy. Most were obtained a couple of years ago, when lists of hub subscribers were still available. Those lists no longer exist, so the usernames for 2015 and 2016 were extracted from articles, comments, pages of already known users, subscriber lists, and lists of users from particular cities and countries. I also took frequently occurring username prefixes (of the form Alex*, admin*, Captain*, etc.) and made several thousand requests to the Habr search page. Finally, I added active users of Geektimes and Megamind, so if you are not on my list, you are well hidden.

    So, we have a directed graph with 81,207 vertices and 20,195 arcs. As you can see, only about 20 thousand users registered via invites from other users; the rest either registered before invites existed (more than 40 thousand) or were invited by the UFO.

    Let's call a weakly connected component of this directed graph a habraclan. It is worth noting that these components are, generally speaking, not trees, since one person can receive several invites. As a result, we have cycles (for example, @tangro invited @Milla, and @Milla invited @tangro), self-loops (for example, @aavezel invited himself), and vertices with several incoming arcs (the user @shara was invited 6 times, by @Deeman, @myagi, @homm, @Azya, @veveve, and @shifttstas). Still, these are exceptions, and in general the graph looks like a forest.
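
    To make the definition concrete, here is a minimal standard-library sketch of grouping users into weakly connected components with a union-find structure, and of spotting the oddities mentioned above. The function names are hypothetical, and this is not necessarily how the original script does it (networkx offers `weakly_connected_components` for the same purpose). The pair ("alice", "bob") is made up; the other pairs come from the examples in the text.

```python
from collections import defaultdict


def habraclans(invites):
    """Group users into weakly connected components of the invite graph.

    Arc direction is ignored when grouping, which is what "weakly
    connected" means. Users without any invite arc are not seen here;
    each of them would form a habraclan of size 1.
    """
    parent = {}

    def find(u):
        parent.setdefault(u, u)
        while parent[u] != u:
            parent[u] = parent[parent[u]]  # path halving
            u = parent[u]
        return u

    for inviter, invitee in invites:
        root_a, root_b = find(inviter), find(invitee)
        if root_a != root_b:
            parent[root_a] = root_b

    clans = defaultdict(set)
    for user in list(parent):
        clans[find(user)].add(user)
    return list(clans.values())


def oddities(invites):
    """Return self-loops and users who received more than one invite."""
    invites = list(invites)
    self_loops = [a for a, b in invites if a == b]
    in_degree = defaultdict(int)
    for _, invitee in invites:
        in_degree[invitee] += 1
    multi_invited = {u: d for u, d in in_degree.items() if d > 1}
    return self_loops, multi_invited


sample = [
    ("tangro", "Milla"), ("Milla", "tangro"),  # cycle of length 2
    ("aavezel", "aavezel"),                    # self-loop
    ("Deeman", "shara"), ("myagi", "shara"),   # several incoming arcs
    ("alice", "bob"),                          # hypothetical users
]
clans = habraclans(sample)
self_loops, multi_invited = oddities(sample)
```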

    Our graph contains 61,021 habraclans. The size distribution is as follows:

    Component size    Number of components
    more than 1000    1
    Let's look at the biggest components.
    No.  Size  Root vertices
    1    1027  @Davekeinz (sent 412 invites, more than anyone else on Habr; this component also contains @Mithgol, who sent 78 invites)
    2    584   @Mudhoney (sent 242 invites), @valemak
    3    316   @XaocCPS (sent 65 invites)
    4    272   @Alaunquirie (invited @BarsMonster, who invited 73 users), @kip
    5    189   @Deeman, @homm, @DorBer, @myagi, @Azya, @maovrn, @fil9, @yoihj
    6    106   @Rossomachin
    7    104   @Garyan
    8    97    @Kukutz (Yandex.Component)
    9    90    @Eosunknown
    10   85    @Cigulev, @tyr
    11   80    @Mdevils
    12   80    @Nuzgul
    13   77    @Ni404, @tronix286, @Rembish
    14   77    @Tigger
    15   76    @Gaidar
    16   70    @Auren
    17   69    @Saltommeister
    18   68    @Kalan
    19   68    @Alisadenisova
    20   67    @Horsev
    Below are pictures of these 20 graphs. Green circles are users with positive karma, red ones have negative karma, blue ones have zero karma, and gray ones are read-only or deactivated users. The area of a circle is proportional to the absolute value of the karma (when that value is greater than 1).

    Let's also look at the "heights" of the habraclans. If we discard the negligible number of graphs with cycles, dag_longest_path_length(G) gives the following result.

    Length of the longest chain    Number of components

    The longest chain is: @Garyan invited @Andrey_Rogovsky, who invited @DmitryGushin, who invited @Uncle_Sam, who invited @RootHell, who invited @alexey_qwe, who invited @Doom2, who invited @Odnoklassniki_ru, who finally invited @DarkDefender.

    The analysis confirms the expectation that most habraclans are small and have a small "height".
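
    For readers without networkx at hand, the "height" computation can be approximated with the standard library. The sketch below is a stand-in, not the article's code: it uses Kahn's topological ordering and returns the number of arcs on the longest chain. The function name is hypothetical, and whether the article's table counts arcs or users along the chain is not stated, so treat the convention here as an assumption.

```python
from collections import defaultdict, deque


def longest_chain_length(invites):
    """Number of arcs on the longest invite chain of an acyclic graph.

    Raises ValueError if the arcs contain a cycle (including a self-loop),
    mirroring the need to discard cyclic components first.
    """
    children = defaultdict(list)
    in_degree = defaultdict(int)
    users = set()
    for inviter, invitee in invites:
        children[inviter].append(invitee)
        in_degree[invitee] += 1
        users.update((inviter, invitee))

    # Kahn's algorithm: start from roots, visit users in topological order.
    queue = deque(u for u in users if in_degree[u] == 0)
    depth = {u: 0 for u in queue}
    processed = 0
    while queue:
        u = queue.popleft()
        processed += 1
        for v in children[u]:
            depth[v] = max(depth.get(v, 0), depth[u] + 1)
            in_degree[v] -= 1
            if in_degree[v] == 0:
                queue.append(v)
    if processed != len(users):
        raise ValueError("invite graph contains a cycle")
    return max(depth.values(), default=0)


# The longest chain named in the text: 9 users, hence 8 arcs.
chain = ["Garyan", "Andrey_Rogovsky", "DmitryGushin", "Uncle_Sam", "RootHell",
         "alexey_qwe", "Doom2", "Odnoklassniki_ru", "DarkDefender"]
chain_arcs = list(zip(chain, chain[1:]))
height = longest_chain_length(chain_arcs)
```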

    Now recall that users have karma. Component-wise summation shows that there are at least 450,323.4 units of positive karma on Habr. (By the way, 10,579 Habr users have karma of 10 or more, so in theory this article could collect 10,578 upvotes.)
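
    A rough sketch of the karma bookkeeping follows. The names and the toy karma values are hypothetical. It is also an assumption that a habraclan's total is the plain sum of its members' karma, negative values included; the article does not say which convention it uses.

```python
def positive_karma_total(karma):
    """Sum of all strictly positive karma values."""
    return sum(k for k in karma.values() if k > 0)


def clans_by_karma(clans, karma):
    """Rank habraclans by total karma, largest first.

    Assumption: the total includes negative karma; users missing from
    `karma` count as zero.
    """
    totals = [(sum(karma.get(u, 0.0) for u in clan), sorted(clan))
              for clan in clans]
    return sorted(totals, reverse=True)


# Hypothetical toy data.
karma = {"a": 10.5, "b": -3.0, "c": 2.0, "d": 0.0}
clans = [{"a", "b"}, {"c"}, {"d"}]

total_positive = positive_karma_total(karma)
ranking = clans_by_karma(clans, karma)
eligible_voters = sum(1 for k in karma.values() if k >= 10)
```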

    Let's see which habraclans have the largest karma reserves.
    No.  Total karma  Root vertices
    1    6184.4       @Mudhoney, @valemak
    2    5333.7       @Davekeinz
    3    4720.8       @XaocCPS
    4    3587.1       @Alaunquirie, @kip (@BarsMonster is here)
    5    2464.5       @Deeman, @homm, @DorBer, @myagi, @Azya, @maovrn, @fil9, @yoihj
    6    2390.1       @Horsev (@PapaBubaDiop and @Milfgard are here)
    7    1984.9       @Cigulev, @tyr (@Zelenyikot is here)
    8    1780.2       @Ni404, @tronix286, @Rembish
    9    1606.1       @Eosunknown
    10   1526.9       There is no root; it all starts with the @tangro - @Milla cycle
    11   1319.3       @Kit
    12   1304.1       @Ocelot
    13   1299.5       @Auren
    14   1104.5       @Kalan
    15   1009.1       @Rossomachin
    16   985.5        @Easy_john
    17   932.3        @Assuri
    18   871.7        @Sourcerer
    19   845.2        @LukaSafonov
    20   838.6        @Mdevils
    Below are pictures of the graphs that have not appeared above.

    Also, some users specify a country in the "From" field on their page. The top countries by number of users can be found on Habr itself, but I was more interested in invites where the inviter and the invitee are in different countries. Such invites characterize the "geographical" connectivity of the Habr community. At first I wanted to build a so-called chord diagram, but did not find an easy way to do that in Python, so I show the upper left corner of the corresponding matrix instead. (If someone tells me how to build the diagram, I'd be grateful.) The bluer a cell in the picture, the larger the logarithm of the number of invites from country 1 to country 2. The connections between Russia, Ukraine, Belarus, the USA, and Germany stand out.
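
    The matrix behind such a picture could be assembled roughly as below. This is a sketch under assumptions: the function names and sample users are hypothetical, the natural logarithm is a guess (the article does not name the base), and ordering countries by their total number of cross-country invites is just one way to obtain an "upper left corner". Drawing the heat map itself is left out.

```python
import math
from collections import Counter


def cross_country_counts(invites, country):
    """Count invites whose inviter and invitee list different countries.

    Users without a known country are skipped.
    """
    counts = Counter()
    for inviter, invitee in invites:
        src, dst = country.get(inviter), country.get(invitee)
        if src and dst and src != dst:
            counts[(src, dst)] += 1
    return counts


def log_matrix(counts):
    """Return (country order, matrix of log counts); empty cells are None."""
    totals = Counter()
    for (src, dst), n in counts.items():
        totals[src] += n
        totals[dst] += n
    order = [c for c, _ in totals.most_common()]
    matrix = [[math.log(counts[(row, col)]) if counts[(row, col)] else None
               for col in order]
              for row in order]
    return order, matrix


# Hypothetical toy data: u4 has no country on the profile page.
country = {"u1": "Russia", "u2": "Ukraine", "u3": "Russia"}
invites = [("u1", "u2"), ("u3", "u2"), ("u1", "u3"), ("u2", "u4"), ("u2", "u1")]

counts = cross_country_counts(invites, country)
order, matrix = log_matrix(counts)
```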

    Another piece of information that is unrelated to invites but easy to extract from user pages is the registration date and the date of the last visit. The following table shows how many users registered in a given year and how many of them have visited Habr since January 1, 2015 (otherwise, we consider the user no longer active on Habr).
    Same thing in chart form.

    We see that half of the users registered in 2007 and 2008, and that many old-timers are still active.
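
    The registration table described above amounts to a simple group-by. A minimal sketch, assuming each user is represented by a hypothetical (registration date, last visit date) pair and using the January 1, 2015 cutoff from the text:

```python
from collections import defaultdict
from datetime import date

ACTIVE_SINCE = date(2015, 1, 1)  # cutoff stated in the text


def activity_by_year(users):
    """Map registration year -> (registered, still active) counts.

    `users` is an iterable of (registered, last_seen) date pairs.
    """
    table = defaultdict(lambda: [0, 0])
    for registered, last_seen in users:
        row = table[registered.year]
        row[0] += 1
        if last_seen >= ACTIVE_SINCE:
            row[1] += 1
    return {year: tuple(row) for year, row in sorted(table.items())}


# Hypothetical toy data.
users = [
    (date(2007, 3, 1), date(2015, 6, 1)),    # old-timer, still active
    (date(2007, 9, 9), date(2010, 1, 1)),    # inactive
    (date(2012, 5, 5), date(2015, 1, 1)),    # active exactly on the cutoff
    (date(2012, 7, 7), date(2014, 12, 31)),  # inactive, one day short
]
table = activity_by_year(users)
```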

    That's all. A table with the source data and a script for drawing the graphs are available on GitHub. A raw data archive is available upon request.
