Analysis of VKontakte: the book preferences of members of cultural communities
Interactive visualizations of all diagrams in the article are available at graphgrail.com/gg-client/vk_books.html
By 2014, traditional approaches to analyzing social processes had exhausted their potential for several reasons. The main one is that solutions built on these approaches cannot adapt to the changing conditions under which social patterns form: they lack dynamism and cannot process large volumes of incoming data in near real time. The most serious blow to classical analytics, however, has been the explosive growth of unstructured data.
In analyzing the social network, this work relies on the concept of “Big Data”: a set of approaches for working with large amounts of data that are difficult or even impossible to manage with conventional tools, because the data vary in structure and arrive at a high rate.
The specialized technology stack used here solves many of the listed problems. It combines the following technologies in a single interface:
- Graph theory as an innovative component of unstructured data processing technology 
- Natural Language Processing
- Information Retrieval Technologies (Data Mining)
This paper covers the collection and statistical analysis of user data from the VKontakte social network, using 13 different types of culturally oriented groups, events and communities: theaters, cinemas, museums, festivals, libraries, bikers, night clubs, music groups, philharmonic societies, cultural news, yoga, bars, art cafes, anticafes. In total, 899 communities in these categories were collected and processed, with a geographic restriction: only communities of the city of Rostov-on-Don were considered. Over 65,000 participants were collected from these communities. The information about each participant includes a wide range of personal and socially significant fields: gender, date of birth, education, political views, attitude to alcohol and smoking, marital status, interests, and list of favorite books.
One of the important criteria for involvement in cultural processes is reading literature. Members of cultural communities often indicate in their personal data those books or authors that they love. We set the task to analyze the book preferences of the participants in order to obtain relevant data on the cultural trends of modern society. Analyzing the social network, we obtain the following data:
- The overall picture of the book preferences of the most culturally engaged members of the social network,
- Detailed statistical breakdowns for various categories of groups, by gender, age and other data of participants,
- A quantitative analysis of the book preferences of community members, split into works and authors,
- A qualitative analysis of the participants' favorite books, with the possibility of subsequent comparison with the cultural needs and trends of the state and society.
The collected data make it possible, for example, to assess how closely the favorite books of the group members match the opinion of Russian book lovers who compiled a list of the top 100 books.
The rating is based on the votes of visitors to 100bestbooks.ru. Works of fiction of any length and any genre, written in any language at any time, take part in the voting. The voting system allows votes both for and against a work. Registration is not required to vote, and the number of votes is unlimited. At the moment, the list is as follows:
1. Mikhail Bulgakov - The Master and Margarita
2. Leo Tolstoy - War and Peace
3. Fyodor Dostoevsky - Crime and Punishment
4. Fyodor Dostoevsky - The Brothers Karamazov
5. Leo Tolstoy - Anna Karenina
6. Fyodor Dostoevsky - The Idiot
7. Nikolai Gogol - Dead Souls
8. Alexander Pushkin - Eugene Onegin
9. Mikhail Bulgakov - Heart of a Dog
10. Mikhail Lermontov - A Hero of Our Time
11. Anton Chekhov - Stories
12. Victor Hugo - Les Miserables
13. Ilya Ilf, Evgeny Petrov - The Twelve Chairs
14. Erich Maria Remarque - Three Comrades
15. Alexandre Dumas - The Count of Monte Cristo
16. Ivan Turgenev - Fathers and Sons
17. Fyodor Dostoevsky - Demons
18. Arthur Conan Doyle - The Adventures of Sherlock Holmes
19. Nikolai Gogol - Taras Bulba
20. Alexander Griboedov - Woe from Wit
Listing 1. - The top 20 of the 100 best books rating (see the full and current list at http://www.100bestbooks.ru/)
Because group members spell the names of their favorite books in many different ways, the rating was split into two lists: a list of authors and a list of the titles of the works themselves. This separation made it possible to obtain detailed breakdowns.
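The matching of free-text “favorite books” entries against the author list can be sketched roughly as follows. This is a minimal illustration, not the procedure actually used in the study: the alias table `AUTHOR_ALIASES`, the function names, and the whole-word matching rule are assumptions made for the example.

```python
import re
from collections import Counter
from typing import Iterable

# Hypothetical alias table: canonical author -> spellings seen in profiles.
# A real table would be much larger and would also cover inflected forms.
AUTHOR_ALIASES = {
    "Bulgakov": {"bulgakov", "булгаков"},
    "Dostoevsky": {"dostoevsky", "dostoyevsky", "достоевский"},
    "Tolstoy": {"tolstoy", "tolstoi", "толстой"},
}


def normalize(text: str) -> str:
    """Lowercase the text and replace punctuation with spaces."""
    return re.sub(r"[^\w\s]", " ", text.lower())


def count_authors(profiles: Iterable[str]) -> Counter:
    """Count how many profiles mention each canonical author at least once."""
    counts = Counter()
    for raw in profiles:
        words = set(normalize(raw).split())
        for author, aliases in AUTHOR_ALIASES.items():
            if words & aliases:  # any known spelling appears as a whole word
                counts[author] += 1
    return counts
```

The same approach applies to the list of titles, although multi-word titles would need phrase matching rather than the single-word matching shown here.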
Consider the age composition of all participants in cultural groups (see Fig. 1). The birth dates of the participants show two pronounced peaks: more than 8,000 participants were born between 1987 and 1989, and most active users of the groups considered are between 20 and 30 years old. These figures are consistent with the average age of the social network's users.
Fig. 1. - The age composition of all participants in cultural groups.
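A birth-year distribution like the one in Fig. 1 could be computed along these lines. This is a sketch under assumptions: it presumes birth dates arrive as strings of the form `D.M.YYYY`, with the year sometimes omitted (`D.M`) or the whole field missing, and the function names are illustrative.

```python
from collections import Counter
from typing import Iterable, Optional


def birth_year(bdate: Optional[str]) -> Optional[int]:
    """Return the year from a 'D.M.YYYY' string, or None if it is absent."""
    if not bdate:
        return None
    parts = bdate.split(".")
    if len(parts) == 3 and parts[2].isdigit():
        return int(parts[2])
    return None  # the year is hidden or the value is malformed


def year_histogram(bdates: Iterable[Optional[str]]) -> Counter:
    """Count participants per birth year, skipping profiles without a year."""
    years = (birth_year(b) for b in bdates)
    return Counter(y for y in years if y is not None)
```

Profiles without a usable year are skipped rather than guessed, so the histogram covers only the participants who disclose their year of birth.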
Moreover, the age distribution is practically independent of the subject of the groups. The exception is the “Cinemas” group: the typical age of the participants is still 20-30, but there is no clear peak, and the maxima in the histogram of birth years are spread relatively evenly over the period from 1985 to 1992.
An analysis of the book preferences of participants in cultural groups showed that M. Bulgakov and his novel “The Master and Margarita” are the absolute leaders by number of mentions. Dostoevsky, the Strugatsky brothers and Remarque are also at the top. It is worth noting that the list of favorite books spans various genres and includes both classics and books by contemporary authors. For example, among modern authors, V. Pelevin and P. Coelho (not represented in the 100bestbooks.ru list) lead, while mystical and esoteric authors are represented by C. Castaneda and R. Bach (see Fig. 2).
Fig. 2. - Which books are most often listed in the “favorite books” field by VKontakte users
Knowing the preferences of the cultural audience, we can compare them with the 100bestbooks.ru rating. Such a comparison shows which authors and works from the rating the participants read. Among authors from the rating, Dostoevsky and Tolstoy (in various spellings) are mentioned more often than Bulgakov. Overall, the participants' top ten is 90% identical to the top ten of the 100 best books rating (see Fig. 3).
Fig. 3. - Authors from the top 100 rating whose books are read by VKontakte users
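An overlap figure such as “90% identical” can be obtained by comparing the two top-ten lists as sets. The sketch below is one plausible way to compute it; the function name and the placeholder author labels are illustrative and do not reproduce the study's data.

```python
from typing import Sequence


def top_n_overlap(ranking_a: Sequence[str], ranking_b: Sequence[str], n: int = 10) -> float:
    """Share of the first n items of ranking_a that also occur in the first n of ranking_b.

    Order inside the top n is ignored; both rankings are assumed to hold
    at least n distinct, already normalized names.
    """
    top_a = set(ranking_a[:n])
    top_b = set(ranking_b[:n])
    return len(top_a & top_b) / n


# Placeholder labels, not real results: nine shared names and one that differs.
community_top = [f"author_{i}" for i in range(1, 11)]
rating_top = [f"author_{i}" for i in range(1, 10)] + ["author_x"]
overlap = top_n_overlap(community_top, rating_top)
```

This measure treats the two top tens as unordered sets, so it says nothing about whether the shared names hold the same positions in both lists.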
Notably, the “Bikers” group stands out from the general trend: first place there goes to the modern writer Sergey Lukyanenko (absent from the 100bestbooks.ru rating). In addition, the “Music Groups” group turned out to be the only one that did not express a positive attitude towards reading: first place in its histogram of favorite books is taken by “no”, second by “all” (an answer that is obviously not sincere), and the sixth most popular answer is “I don’t like reading”.
Members of the “Art cafes”, “Anticafes” and “Bars” groups show similar literary preferences, and the preferences of these groups do not resemble those of the “Night clubs” group.
Fig. 4. - Comparison of several groups by authors
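The article does not state how the similarity between groups was measured. One common option, shown here purely as an assumption, is cosine similarity between the per-group counts of author mentions.

```python
import math
from collections import Counter


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two author-count vectors, in the range 0..1."""
    shared = a.keys() & b.keys()
    dot = sum(a[k] * b[k] for k in shared)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty group has no defined direction
    return dot / (norm_a * norm_b)


# Illustrative counts only; these are not figures from the study.
bars = Counter({"Bulgakov": 30, "Remarque": 20, "Pelevin": 10})
anticafes = Counter({"Bulgakov": 25, "Remarque": 18, "Coelho": 5})
similarity = cosine_similarity(bars, anticafes)
```

Because cosine similarity depends on the proportions of mentions rather than their absolute number, a large and a small community can be compared directly.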
Let us now consider which works from the rating the audience mentions most often (see Fig. 5). An interesting observation is the success of the novel “One Hundred Years of Solitude” by G. García Márquez: ranked 45th in the rating, it takes second place in the participants' preferences, ahead even of F. Dostoevsky's “Crime and Punishment”.
Fig. 5. - Which works are read by VKontakte users
We can also compare different groups in pairs. In the diagram of authors from the top 100 rating read by VKontakte users, two groups of communities are compared: bikers and visitors to cultural events (see Fig. 6). An interesting observation: the communities are alike in their love of Pushkin, Bulgakov and Remarque, but they differ greatly in another respect: Dostoevsky, Tolstoy and Gogol are not popular among bikers.
Fig. 6. - Comparison of biker communities and cultural events
Another interesting comparison: how do the members of the bar and cinema groups differ in their preferences? The figure shows that “Crime and Punishment” is not among the favorite books of cinema visitors. At the same time, there are some similarities in foreign classics (“Three Comrades”, “Romeo and Juliet”).
Fig. 7. - Comparison of bar and cinema communities by work
We can also compare ages: the figure shows that, in general, the distributions of birth dates of visitors to theaters and night clubs are similar, with only a slight shift towards theaters among those born in 1980-1987. This is expected: at the age of 30-35, people are more interested in live theater performances and less attracted to the “special effects” of films.
Fig. 8. - Age distribution of participants in VKontakte cultural communities: theaters and night clubs
Consider the basic statistical breakdowns for theater communities (see Fig. 9).
Fig. 9. - Theater statistics
In addition to standard information, such as the expected predominance of women in theater communities, we also obtain data on relationships, habits (attitude to alcohol and smoking), books and the interests of participants. In particular, the gender composition of theater groups is extremely uneven: the proportion of women exceeds 70%. This can be explained by women's consistently high interest in theatrical productions. The statistics for cinema communities look different (see Fig. 10):
Fig. 10. - Cinema statistics
The ratio of men to women in these groups is approximately equal; the books they read can be evaluated as well.
Thus, analyzing data from social networks, and from VKontakte in particular, makes it possible to quickly obtain a large stream of data about the preferences and interests of community audiences. The greatest value, however, lies in acquiring data in real time, which makes it possible to track dynamics, analyze cultural trends, support the formation of state policy on the cultural development of society, quickly identify weaknesses in cultural and moral education, and conduct information warfare for “minds” and values. This, incidentally, is reflected in the new military doctrine of Russia.