About Habrahabr, statistics and ... tag clouds

    About a month ago I wrote an article in which I submitted several charts for the judgment of the esteemed Habr community. The charts tried to reflect the history of this site's development as fully as possible and to assess the quality of its content. I am not sure they achieved that goal, but on the whole the article was received rather warmly. Even then I promised myself that I would continue the topic if I could.

    And again about the thematic content

    Last time I tried to show the thematic content of Habrahabr with a graph and pie charts based on the number of articles belonging to each category of blogs. Honestly, I think I did it very badly.

    I thought there must surely be some way to show how the thematic content of a site changes over time. Unfortunately, googling turned up nothing: humanity has not come up with anything better than good old graphs. But wait! We live in the era of Web 2.0, and one of its main symbols is ... the ordinary tag cloud. You have all seen it many times; it is banal to the point of absurdity and has long since become tiresome, sitting on every second site, sometimes even in assorted 3D Flash versions. In fairness, though, the tag cloud became popular precisely because it reflects the thematic content of a site very well. So what if we draw a tag cloud and make it change dynamically over time? A sort of tag time machine. After that only details remained: think through how it should look, make it livelier, set aside one evening to implement the idea in WPF and another evening to render and encode the video. I called the result "Tag Tornado" or "Twister of Tags". You can see the hero of the occasion in the following video:
    * I recommend watching the video on the YouTube site in the "large" player at 480p resolution
    ** I was not able to pick music for the video, so I leave that to the viewer. Sit back, relax and enjoy

    How does it work?

    It works very simply: blog names revolve around a common center in a circle at a constant angular speed. For each successive moment (here, in two-hour steps) the "weight" of every blog is calculated. A blog's weight depends on the total rating of the articles published in it over a period of about two weeks, and the rating of each article is multiplied by a coefficient that depends on how far its publication time lies from the moment being considered. In short, the closer an article is to the given moment, the more its rating contributes to the weight. Once the weights of all blogs for a given moment are calculated, they are normalized to the interval [0; 1] by dividing by the maximum value. Blogs whose weight never exceeded 0.1 at any moment are discarded altogether.
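    The weighting described above can be sketched roughly as follows. This is only an illustration under assumptions, not the WPF code behind the video: the linear decay coefficient, the look-back-only two-week window, the clamping of negative weights to zero, and the names `decay`, `blog_weights` and `weight_series` are all hypothetical, since the text gives no exact formula.

```python
from collections import defaultdict
from datetime import datetime, timedelta

WINDOW = timedelta(days=14)  # "about two weeks" from the text


def decay(age, window=WINDOW):
    """Assumed linear coefficient: 1.0 for a brand-new article, 0.0 at the window edge."""
    return max(0.0, 1.0 - age / window)


def blog_weights(articles, moment):
    """articles: iterable of (blog, published, rating). Returns {blog: weight in [0, 1]}."""
    raw = defaultdict(float)
    for blog, published, rating in articles:
        age = moment - published
        # Assumption: only articles published within the window before `moment` count.
        if timedelta(0) <= age <= WINDOW:
            raw[blog] += rating * decay(age)
    top = max(raw.values(), default=0.0)
    if top <= 0:
        return {}
    # Normalize by the maximum; clamp negative sums to zero (an assumption).
    return {blog: max(0.0, w / top) for blog, w in raw.items()}


def weight_series(articles, start, end, step=timedelta(hours=2), threshold=0.1):
    """Weights at every two-hour step, without blogs that never exceed `threshold`."""
    moments = []
    t = start
    while t <= end:
        moments.append(t)
        t += step
    frames = [blog_weights(articles, m) for m in moments]
    peak = defaultdict(float)
    for frame in frames:
        for blog, w in frame.items():
            peak[blog] = max(peak[blog], w)
    keep = sorted(b for b, p in peak.items() if p > threshold)
    return moments, [{b: f.get(b, 0.0) for b in keep} for f in frames]
```

    With a different decay function (for example an exponential one) only `decay` would need to change; the normalization and the 0.1 cutoff stay the same.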
    Weight affects three parameters: the larger it is, the closer the tag is to the center, the larger its font size and the less transparent its text. Tags are sorted lexicographically. On top of this, several additional factors ensure that tags arrive on and leave the "scene" smoothly.
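    One possible way to turn a weight into a position on the screen is sketched below. The text only says that distance, font size and transparency depend on weight; the linear mappings, every constant, the function name `layout`, and the idea that the lexicographic order sets the angular order around the circle are assumptions made for illustration. The extra smoothing factors are not described in the text and are not modeled here.

```python
import math


def layout(weights, t_seconds, max_radius=300.0, min_font=8.0,
           max_font=48.0, omega=0.2):
    """Place each tag for one frame.

    weights: {name: weight in [0, 1]}; omega: constant angular speed in rad/s.
    All numeric defaults are hypothetical.
    """
    names = sorted(weights)  # lexicographic order around the circle (assumed)
    n = len(names)
    placed = {}
    for i, name in enumerate(names):
        w = weights[name]
        # Evenly spaced base angle plus rotation at constant angular speed.
        angle = 2 * math.pi * i / n + omega * t_seconds
        radius = max_radius * (1.0 - w)  # heavier tags sit closer to the center
        placed[name] = {
            "x": radius * math.cos(angle),
            "y": radius * math.sin(angle),
            "font_size": min_font + (max_font - min_font) * w,
            "opacity": w,  # heavier tags are less transparent
        }
    return placed
```

    Calling `layout` once per frame with the weights for that moment and an increasing `t_seconds` gives the rotating picture; a tag with weight 0 ends up on the outer circle and fully transparent.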

    Instead of a conclusion

    An attentive and pedantic reader will notice that what spins in the video is Habrahabr's blog names rather than its tags, but I already have an answer: there are too many tags and they are too varied, so without visual synonyms and semantic links between tags such a visualization would not be very useful, because too much important information would be lost. Visualizing the same categories as in the previous article makes no sense either, because there are too few of them.
    Also, tags sometimes overlap each other, but if you watch the motion rather than individual frames, readability is hardly a problem.
    And yes, about the name of the visualization: at first I wanted to call it a “whirlpool”, but in the end I decided that an “atmospheric” analogy would go better with a tag cloud.
    Thank you for your attention.
