Search for the most influential objects of a subset of a social network

In the modern world, relations between people, except for the social level, have occupied another one - digital. With the spread of virtual social networks, there is a tendency to have your own page with personal data, to search for friends by interests, create groups, etc. At one IT Talk meeting held by DataArt, I met a person who studied social network topologies. On this day, I completely decided on the topic of the master's thesis, which is presented by the title of the article. The fact is that the amount of information in social networks is constantly increasing, and most of this information is presented in raw form. In itself, it is of no interest. There was an idea to process such data and get results that could well serve a good cause.

This article discusses the search for the most influential objects. This information can be useful both for conducting various virtual marketing campaigns and for identifying users with suspiciously high activity.


We will designate a social network in the form of a graph whose nodes are people. If the objects are somehow interconnected (are friends, or correspond), then there is an arc between these objects.

Fig. 1

We give an intuitive concept of influence. Consider the following figure.

Fig. 2

Here the object in the center has 6 connections. But influence is not limited to the number of connections. We must take into account the degree of influence of the objects with which the target object is associated. Let's consider fig. 3.

Fig. 3

In the above figure, it can be seen that the object has 3 connections, but the objects with which it is associated have some influence in the network. It is necessary in this way to formalize the concept of influence in order to take into account both the number of connections and the influence of objects with which the target object is associated.


The impetus for research in this direction was this article [1]. I made corrections to the proposed algorithm, which in my opinion seem appropriate.
The graph will be represented as an adjacency matrix.

We introduce the concept of the iterated force of an object i of order k.

Note that the first-order iterated force of object i is the number of connections of this object with others. It does not yet take into account the influence of other objects. Starting from the second order, the influence of other objects is included in this amount.

Question : to what order is the vector of iterated forces taken?
Answer : either 2 or 3.

The fact is that the iterated force of an object i of order k expresses the degree of influence of object i, given that it can spread its influence no more than in radius k. So laid down in the sum itself. Remember when you asked your friend to ask someone to do something for you? This influence is in radius 2, and it is expressed by an iterated second-order force. If we want to take into account chains with another intermediate participant, then we need to consider the vector of iterated forces of the 3rd order. It seems impractical to consider iterated forces of greater order in view of the negligible probability of the emergence of such long chains in real life.
Thus, if we want to calculate the influence of objects on a large city scale and higher, then we count to the 3rd order. For smaller scales, it is advisable to use the second order.

Note that it is important to us not the numerical value of the iterated force, but how the forces for different objects relate to each other. Therefore, after calculating the vector of the iterated force of the next order, it is advisable to normalize this vector. As a norm, the maximum is taken according to the absolute values ​​of its elements.

We collect information

Information was collected from the social network Vkontakte. An application was written that downloads 2 sets of friends: up to the 2nd and up to the 3rd level. For clarity, consider Figure 4.
Fig. 4
Here friends of the 1st level are marked in red, green - of the second, yellow - of the third. Due to the fact that the VK API does not allow you to make more than 3 requests per second, you had to leave the machine turned on for the night to load the third level.

Fig. 5

Note that if you download 4 levels, then most likely we will download 99% of network users. There is a theory of six handshakes [2], which claims that between any two people on Earth there are no more than 5 levels of common acquaintances. Due to the fact that VK is distributed mainly in the CIS, this figure should be less.

Analyzing data

So, we have 2 data sets:
  • For 2 levels of friends - approximately 40 thousand users.
  • For 3 levels of friends - approximately 4.2 million users.

I myself do not sit in social networks, so in both cases the center of the sample was my friend, a student at the Voronezh State University.

Let's start by analyzing the first set.

Fig. 6

Here, the sampling center took first place in terms of influence. This can be explained by how the data was loaded. The fact is that the entire last level of friends contains only one connection. For these objects, the friends list is empty. Therefore, the influence of friends of the 1st level (which is largely based on friends of the 2nd level) is rated low. Because of this, we get not quite objective results. Nevertheless, interesting data can be observed: in the first 30 lines about half of the activists of the Voronezh State University are located.

Let's move on to the study of the second data set. Here, the center of the sampling by rating takes 3539 place.
It is understandable: here, influence is already being considered within the city.

Fig. 7

At the beginning of this table, the following entries are traced:
  • Voronezh Poster
  • Excursions
  • Voronezh
  • One hundred streams
  • Art Reality Vrn

As for the people in the first lines of this list, we can say that some of them are really famous people (mostly photographers) of our city. But the activity of some objects causes suspicion: too much degree of influence with a rather meager information about the person. These objects can be reviewed by network administrators for compliance with the information of reality.

PS If anyone has thoughts on how to improve this model, please unsubscribe in the comments.


  2. % B8_% D1% 80% D1% 83% D0% BA% D0% BE% D0% BF% D0% BE% D0% B6% D0% B0% D1% 82% D0% B8% D0% B9

Also popular now: