Habraclans
About a month ago I published an article about habracats on Habr. A by-product of that article was a dump of user pages, and I wanted to extract some more information from it. Articles analyzing Habr's users, articles, comments and karma appear regularly, but I did not find a single one that analyzed invites. So I built a graph of Habr invites and looked at some of its characteristics.
![](https://habrastorage.org/files/ed4/dff/582/ed4dff5825c243c3806ed4ac1b9372de.png)
Let me remind you that the pages were downloaded in January 2016, so anything that happened afterwards (new registrations, removed accounts, karma changes) is not taken into account. After removing all read-only and deactivated users from the downloaded list, 79,870 users remain. As far as I know, this roughly matches the actual number of users (give or take a thousand). Next, to get a graph without holes, I had to add back 955 read-only users and 382 deactivated ones (users who invited someone or were invited, but were later driven off Habr or switched to read-only for one reason or another). The result is a graph with 81,207 vertices.
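The vertex rule described above can be sketched in Python. Every active user is kept, and read-only or deactivated users are added only when they sit at one end of an invite. This is a hypothetical illustration: the `status` dictionary and the list of `(inviter, invitee)` pairs are assumed data layouts, not the author's format, and `build_invite_graph` is an illustrative name.

```python
from collections import Counter

# Hypothetical data layout (an assumption, not the author's format):
#   status  -- dict: username -> "active", "readonly" or "deactivated"
#   invites -- list of (inviter, invitee) username pairs


def build_invite_graph(status, invites):
    """Return (vertices, arcs, added) for the invite graph.

    Every active user becomes a vertex. Users at either end of an invite are
    added too, so no arc points outside the graph; `added` counts how many of
    those extra users have each non-active status.
    """
    active = {user for user, state in status.items() if state == "active"}
    endpoints = {user for arc in invites for user in arc}
    extra = endpoints - active
    added = Counter(status.get(user, "unknown") for user in extra)
    return active | endpoints, list(invites), added


# Arithmetic from the text: active users + read-only + deactivated
assert 79_870 + 955 + 382 == 81_207

status = {"ann": "active", "bob": "active", "cid": "readonly",
          "dan": "deactivated", "eve": "readonly"}
invites = [("ann", "cid"), ("dan", "bob")]
vertices, arcs, added = build_invite_graph(status, invites)
print(sorted(vertices))        # ['ann', 'bob', 'cid', 'dan']
print(sorted(added.items()))   # [('deactivated', 1), ('readonly', 1)]
```

The read-only user "eve" never appears in an invite, so she stays out of the graph. This mirrors why only 955 of the read-only accounts were added back.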
It is worth noting that getting a list of Habr users is not easy. Most of the names were collected a couple of years ago, when lists of hub subscribers were still available. Such lists no longer exist, so the usernames for 2015 and 2016 were extracted from articles, comments, pages of already known users, subscriber lists, and lists of users from given cities and countries. I also took frequently encountered username prefixes (such as Alex*, admin*, Captain*) and made several thousand requests to the Habr search page. Finally, I added active users from Geektimes and Megamozg, so if you are not on my list, you are well hidden.
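One simple way to pick such search prefixes is to count the most common opening characters among names that are already known. This is a hypothetical sketch: the fixed prefix length and the `frequent_prefixes` helper are assumptions, and the article does not say exactly how the prefixes were chosen.

```python
from collections import Counter


def frequent_prefixes(usernames, length=4, top=3):
    """Return the `top` most common lowercase prefixes of `length` characters.

    Names shorter than `length` are skipped. Each prefix gets a trailing "*"
    to mirror the Alex*/admin* notation. Sending the actual search requests
    is left out here.
    """
    counts = Counter(
        name[:length].lower() for name in usernames if len(name) >= length
    )
    return [prefix + "*" for prefix, _ in counts.most_common(top)]


known = ["Alex_1", "alexey", "AlexK", "admin42", "adminus", "Captain_X"]
print(frequent_prefixes(known, length=4, top=2))  # ['alex*', 'admi*']
```

A fixed length is a simplification. Real prefixes such as "Captain" are longer, so in practice one would likely try several lengths and review the candidates by hand.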
So we have a directed graph with 81,207 vertices and 20,195 arcs. In other words, only about 20 thousand users registered via invites from other users. The rest either registered before invites were introduced (more than 40 thousand) or were invited by UFOs.
Let's call a weakly connected component of this directed graph a habraclan. Generally speaking, these components are not trees, since one person can receive several invites. So we get cycles (for example, @tangro invited @Milla, and @Milla invited @tangro), self-loops (@aavezel invited himself), and vertices with several incoming arcs (@shara was invited 6 times, by @Deeman, @myagi, @homm, @Azya, @veveve and @shifttstas). These are exceptions, though; on the whole the graph looks like a forest.
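Habraclans and their "heights" (used later in the article) can be computed without any graph library. The sketch below uses union-find for the weakly connected components and Kahn's topological sort for the longest invite chain, counting users rather than arcs. It is an illustrative, hypothetical implementation. The article mentions networkx's `dag_longest_path_length`; there, `nx.weakly_connected_components` and `nx.dag_longest_path_length` play the same roles, with the latter counting arcs. Components containing a cycle or self-loop get no height.

```python
from collections import Counter, defaultdict, deque


def weak_components(vertices, arcs):
    """Split vertices into weakly connected components (arc direction ignored)."""
    parent = {v: v for v in vertices}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving keeps trees shallow
            v = parent[v]
        return v

    for u, v in arcs:
        root_u, root_v = find(u), find(v)
        if root_u != root_v:
            parent[root_u] = root_v
    groups = defaultdict(set)
    for v in vertices:
        groups[find(v)].add(v)
    return list(groups.values())


def chain_heights(vertices, arcs, components):
    """Users on the longest invite chain of each component (None if cyclic).

    Kahn's algorithm never dequeues a vertex on a cycle (or downstream of one),
    so a component whose vertices were not all processed contains a cycle.
    """
    successors = defaultdict(list)
    indegree = {v: 0 for v in vertices}
    for u, v in arcs:
        successors[u].append(v)
        indegree[v] += 1
    queue = deque(v for v in vertices if indegree[v] == 0)
    depth = {v: 1 for v in queue}  # a lone user is a chain of length 1
    processed = set()
    while queue:
        u = queue.popleft()
        processed.add(u)
        for v in successors[u]:
            depth[v] = max(depth.get(v, 1), depth[u] + 1)
            indegree[v] -= 1
            if indegree[v] == 0:
                queue.append(v)
    return [max(depth[v] for v in comp) if comp <= processed else None
            for comp in components]


def size_bucket(n):
    """Bucket labels used in the size-distribution table."""
    if n == 1:
        return "1"
    if n <= 10:
        return "2-10"
    if n <= 100:
        return "11-100"
    if n <= 1000:
        return "101-1000"
    return "more than 1000"


# Toy graph with the same quirks as the real one: a mutual pair, a
# self-invite, a user invited twice, and a user with no invites at all.
vertices = {"tangro", "Milla", "aavezel", "a", "b", "c", "d", "x"}
arcs = [("tangro", "Milla"), ("Milla", "tangro"), ("aavezel", "aavezel"),
        ("a", "b"), ("b", "c"), ("d", "c")]
components = weak_components(vertices, arcs)
heights = chain_heights(vertices, arcs, components)
print(Counter(size_bucket(len(c)) for c in components))
print(Counter(h for h in heights if h is not None))
```

A single topological pass over the whole graph avoids rescanning all arcs for each of the tens of thousands of components.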
Our graph contains 61,021 habraclans. Their size distribution is as follows:
Component size | Number of components |
---|---|
more than 1000 | 1 |
101-1000 | 6 |
11-100 | 436 |
2-10 | 3110 |
1 | 57468 |
Let's look at the biggest components.
No. | Size | Root vertices |
---|---|---|
1 | 1027 | @Davekeinz (sent 412 invites, more than anyone else on Habr; this component also contains @Mithgol, who sent 78 invites) |
2 | 584 | @Mudhoney (sent 242 invites), @valemak |
3 | 316 | @XaocCPS (sent 65 invites) |
4 | 272 | @Alaunquirie (invited @BarsMonster, who invited 73 users), @kip |
5 | 189 | @Deeman, @homm, @DorBer, @myagi, @Azya, @maovrn, @fil9, @yoihj |
6 | 106 | @Rossomachin |
7 | 104 | @Garyan |
8 | 97 | @Kukutz (Yandex.Component) |
9 | 90 | @Eosunknown |
10 | 85 | @Cigulev, @tyr |
11 | 80 | @Mdevils |
12 | 80 | @Nuzgul |
13 | 77 | @Ni404, @tronix286, @Rembish |
14 | 77 | @Tigger |
15 | 76 | @Gaidar |
16 | 70 | @Auren |
17 | 69 | @Saltommeister |
18 | 68 | @Kalan |
19 | 68 | @Alisadenisova |
20 | 67 | @Horsev |
Below are pictures of these 20 components. Green circles are users with positive karma, red ones negative, blue ones zero, and gray circles are read-only or deactivated users. The area of a circle is proportional to the absolute value of the user's karma (when it is greater than 1).
![](https://habrastorage.org/files/da4/f8a/8d2/da4f8a8d21ba4860b2f9c2e88a478a4b.png)
![](https://habrastorage.org/files/36c/e83/639/36ce836398d440868a7b05379289efa1.png)
![](https://habrastorage.org/files/96e/025/cc8/96e025cc83bd475c9632c8b9413ddebc.png)
![](https://habrastorage.org/files/ccf/6fe/47c/ccf6fe47c3ad4d4fbde8940aa0dcb637.png)
![](https://habrastorage.org/files/886/382/066/886382066512416688010861183483f8.png)
![](https://habrastorage.org/files/386/9be/cb8/3869becb8a7b457692329d5b9538b32d.png)
![](https://habrastorage.org/files/dd2/928/5bd/dd29285bd40c4a2a99121180d5a2e919.png)
![](https://habrastorage.org/files/8d9/25b/b3a/8d925bb3a3ee4243802acaa7fe5704ea.png)
![](https://habrastorage.org/files/265/762/dbc/265762dbccd1403b94cc223fd2bb9b56.png)
![](https://habrastorage.org/files/13f/5cf/627/13f5cf6279464f068994085f9502addf.png)
![](https://habrastorage.org/files/f71/1cb/19c/f711cb19c09f46979c040919fc28bfcb.png)
![](https://habrastorage.org/files/aee/30b/348/aee30b3481c54c8aa8d0589f474d4c91.png)
![](https://habrastorage.org/files/461/efa/81e/461efa81e1034657bb264f4a20f0bb0d.png)
![](https://habrastorage.org/files/c53/9d3/1d2/c539d31d27954c5b939a5d48e8f51170.png)
![](https://habrastorage.org/files/d26/09c/726/d2609c7262ab4988972f51cd2ffa4d55.png)
![](https://habrastorage.org/files/2f2/a3b/fd8/2f2a3bfd8302466ea210d72d45a5d0f1.png)
![](https://habrastorage.org/files/96e/5b9/c99/96e5b9c99c714170839c98b2bc8523b4.png)
![](https://habrastorage.org/files/181/b2c/f3c/181b2cf3c8d845eaa99c5586e6077e37.png)
![](https://habrastorage.org/files/ed2/29c/e86/ed229ce868174b1092f927521593794b.png)
![](https://habrastorage.org/files/c60/4a4/71a/c604a471a516459cba5a68dfc43f2bf2.png)
Let's also look at the "heights" of the habraclans. After discarding the negligible number of components that contain cycles, `dag_longest_path_length(G)` gives the following result.
Length of the longest chain (users) | Number of components |
---|---|
9 | 1 |
7 | 2 |
6 | 11 |
5 | 39 |
4 | 125 |
3 | 479 |
2 | 2888 |
1 | 57468 |
The longest chain is: @Garyan invited @Andrey_Rogovsky, who invited @DmitryGushin, who invited @Uncle_Sam, who invited @RootHell, who sent an invite to @alexey_qwe, who invited @Doom2, who invited @Odnoklassniki_ru, who finally invited @DarkDefender.
As expected, most habraclans are small and shallow.
Now recall that users have karma. Summing karma over all components shows that Habr holds at least 450,323.4 units of positive karma. (By the way, 10,579 Habr users have karma of 10 or more, so in theory this article could collect 10,578 upvotes.)
Let's see which habraclans hold the largest karma reserves.
No. | Total karma | Root vertices |
---|---|---|
1 | 6184.4 | @Mudhoney, @valemak |
2 | 5333.7 | @Davekeinz |
3 | 4720.8 | @XaocCPS |
4 | 3587.1 | @Alaunquirie, @kip (includes @BarsMonster) |
5 | 2464.5 | @Deeman, @homm, @DorBer, @myagi, @Azya, @maovrn, @fil9, @yoihj |
6 | 2390.1 | @Horsev (includes @PapaBubaDiop and @Milfgard) |
7 | 1984.9 | @Cigulev, @tyr (includes @Zelenyikot) |
8 | 1780.2 | @Ni404, @tronix286, @Rembish |
9 | 1606.1 | @Eosunknown |
10 | 1526.9 | No root: the component starts with the @tangro/@Milla cycle |
11 | 1319.3 | @Kit |
12 | 1304.1 | @Ocelot |
13 | 1299.5 | @Auren |
14 | 1104.5 | @Kalan |
15 | 1009.1 | @Rossomachin |
16 | 985.5 | @Easy_john |
17 | 932.3 | @Assuri |
18 | 871.7 | @Sourcerer |
19 | 845.2 | @LukaSafonov |
20 | 838.6 | @Mdevils |
Below are pictures of the components that did not appear in the previous list.
![](https://habrastorage.org/files/008/939/684/008939684b1e48179ca2edab223fd48f.png)
![](https://habrastorage.org/files/f70/dd0/fe0/f70dd0fe0d6a492a908c3ed2d5afdb16.png)
![](https://habrastorage.org/files/221/2c7/4ff/2212c74ff790497dbdf0d927d4c25f16.png)
![](https://habrastorage.org/files/fef/9b0/1bf/fef9b01bf7df48a5a7d21d0854bdc3a7.png)
![](https://habrastorage.org/files/7a8/19f/d0f/7a819fd0fa2e4cd181dbe59efc3035e3.png)
![](https://habrastorage.org/files/df8/818/b84/df8818b845a940cb93cea72f95d264f8.png)
![](https://habrastorage.org/files/0bb/279/0df/0bb2790df6614573ba15f8c5513357ec.png)
Some users also list a country in the "From" field of their profile page. The top countries by number of users can be found on Habr itself, but I was more interested in invites where the inviter and the invitee are from different countries. Such invites characterize the "geographical" connectivity of the Habr community.
At first I wanted to build a so-called chord diagram, but I did not find an easy way to do it in Python, so instead I show the upper-left corner of the corresponding matrix. (If someone tells me how to build such a diagram, I'd be grateful.) The bluer a cell, the larger the logarithm of the number of invites from country 1 to country 2. The connections between Russia, Ukraine, Belarus, the USA and Germany stand out.
![](https://habrastorage.org/files/2d2/4c6/3bd/2d24c63bd06a45978736e902a891ed68.png)
Another piece of information, unrelated to invites but easy to extract from user pages, is the registration date and the date of the last visit. The following table shows how many users registered in a given year and how many of them visited Habr on or after January 1, 2015 (users who did not are considered to have stopped being active).
Year | Registered | Active since January 1, 2015 |
---|---|---|
2006 | 3091 | 909 |
2007 | 19433 | 5511 |
2008 | 22031 | 6348 |
2009 | 6032 | 3094 |
2010 | 6826 | 3345 |
2011 | 9341 | 6355 |
2012 | 5841 | 4160 |
2013 | 4029 | 2819 |
2014 | 2684 | 2100 |
2015 | 1473 | 1473 |
Total | 80781 | 36114 |
The same data in chart form.
![](https://habrastorage.org/files/0f1/0d5/1e3/0f10d51e338e4cc3b3e5b9907b31fa14.png)
About half of all users registered in 2007 and 2008, and many old-timers are still active.
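The claims above can be checked directly against the registration table. In this small sketch, the `REGISTRATIONS` dictionary only restates the table's figures, and the `activity_summary` helper is a hypothetical name.

```python
# Figures restated from the registration table:
# year -> (registered that year, of those active since January 1, 2015)
REGISTRATIONS = {
    2006: (3091, 909),
    2007: (19433, 5511),
    2008: (22031, 6348),
    2009: (6032, 3094),
    2010: (6826, 3345),
    2011: (9341, 6355),
    2012: (5841, 4160),
    2013: (4029, 2819),
    2014: (2684, 2100),
    2015: (1473, 1473),
}


def activity_summary(table):
    """Return totals, the 2007-2008 share of registrations, and per-year activity rates."""
    registered = sum(reg for reg, _ in table.values())
    active = sum(act for _, act in table.values())
    share_2007_2008 = (table[2007][0] + table[2008][0]) / registered
    rates = {year: act / reg for year, (reg, act) in table.items()}
    return registered, active, share_2007_2008, rates


registered, active, share, rates = activity_summary(REGISTRATIONS)
print(registered, active)  # 80781 36114, matching the "Total" row
print(f"{share:.1%}")      # 51.3%
print({y: f"{r:.0%}" for y, r in rates.items() if y <= 2008})
# {2006: '29%', 2007: '28%', 2008: '29%'}
```

By this count about 51% of users registered in 2007 and 2008, and roughly 28-29% of the 2006-2008 cohorts were still active. Every 2015 registrant counts as active by definition, since they appeared on the site after January 1, 2015.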
That's all. A table with the source data and a script for drawing the graphs are available on GitHub. An archive with the raw data is available upon request.
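For readers who want to reproduce the component pictures, the color and size convention described for them can be written as a small helper. This is a hypothetical sketch, not the author's GitHub script. The status strings and the fixed base area used for users with |karma| of 1 or less are assumptions. The helper returns areas rather than radii because matplotlib's `scatter` takes marker sizes (`s`) in points squared.

```python
def node_style(karma, status, base_area=30.0):
    """Return (color, marker_area) for one user in a component picture.

    Gray marks read-only or deactivated accounts regardless of karma;
    otherwise green/red/blue encode positive/negative/zero karma.
    Area is proportional to |karma| when it exceeds 1, else base_area
    (the fallback size is an assumption).
    """
    if status in ("readonly", "deactivated"):
        color = "gray"
    elif karma > 0:
        color = "green"
    elif karma < 0:
        color = "red"
    else:
        color = "blue"
    magnitude = abs(karma)
    area = base_area * magnitude if magnitude > 1 else base_area
    return color, area


print(node_style(5, "active"))      # ('green', 150.0)
print(node_style(-0.5, "active"))   # ('red', 30.0)
print(node_style(0, "active"))      # ('blue', 30.0)
print(node_style(12, "readonly"))   # ('gray', 360.0)
```

The resulting colors and areas can be passed as the `c` and `s` lists of a scatter plot once vertex positions have been computed by some layout.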