Analysis of VK friendships with Python

Recently, an article appeared on Habr about building a graph of friendships in VKontakte using Wolfram Mathematica. I liked the idea and, of course, wanted to build the same graph using Python and d3. Here is what came out of it.

Note: the article includes the parts of the code that perform the most important steps, but the project's code base will keep changing. Those interested can find the sources on GitHub.

We divide the task into parts:
  • Creation and authorization of the application.
  • Data retrieval.
  • Graph visualization.

What we need for this:
  • Python 3.4
  • requests
  • d3
  • Mozilla Firefox, since Chrome does not allow XMLHttpRequest to load local files (although nothing stops you from running python -m http.server 8000)

Creation and authorization of the application


To get access to the VKontakte API, we need to create a Standalone application; after that we can use the API methods we need, which are described later. When creating the application, select the Standalone application type. We will be asked to enter a confirmation code sent to our mobile phone, after which we are taken to the application management page. On the Settings tab we need the application ID, which is used to obtain the access_token.
Next, we need to authorize our application. This process consists of 3 stages.

User Authentication on VK

To do this, build the URL as shown below:


https://oauth.vk.com/authorize?client_id=APP_ID&scope=friends,offline&redirect_uri=https://oauth.vk.com/blank.html&display=page&v=5.21&response_type=token

Quoting vk.com/dev/auth_mobile:
APP_ID - the identifier of your application;
PERMISSIONS - the permissions requested by the application;
DISPLAY - the appearance of the authorization window; page, popup, and mobile are supported;
REDIRECT_URI - the address to which access_token will be passed;
API_VERSION - the version of the API you are using.

In our case, PERMISSIONS is friends,offline: access to friends and access to the API at any time from a third-party server (a non-expiring token). If the URL is formed correctly, we will be prompted to enter a username and password.
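The authorization URL can also be assembled programmatically. The following is a minimal sketch, not part of the original project: the function name build_auth_url and the application id 1234567 are hypothetical, while the parameter names and values are taken from the URL above. Note that urlencode percent-encodes the comma and the redirect address, which is equivalent to the hand-written form.

```python
from urllib.parse import urlencode


def build_auth_url(app_id, scope=("friends", "offline"), api_version="5.21"):
    """Assemble the VK authorization URL from the parameters described above."""
    params = {
        "client_id": app_id,                  # APP_ID
        "scope": ",".join(scope),             # PERMISSIONS
        "redirect_uri": "https://oauth.vk.com/blank.html",  # REDIRECT_URI
        "display": "page",                    # DISPLAY
        "v": api_version,                     # API_VERSION
        "response_type": "token",
    }
    return "https://oauth.vk.com/authorize?" + urlencode(params)


# 1234567 is a made-up application id used only for illustration.
auth_url = build_auth_url(1234567)
print(auth_url)
```

Opening the printed address in a browser should lead to the same login prompt as the manually built URL.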

Permission to access your data

Next, we allow the application to access the necessary information.

Getting access_token

After the application is authorized, the client is redirected to REDIRECT_URI. The information we need is contained in the fragment of the resulting URL:

https://oauth.vk.com/blank.html#access_token=ACCESS_TOKEN&expires_in=0&user_id=USER_ID
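Instead of copying the values by hand, the redirect address can be parsed. This is a small sketch under the assumption that the address has exactly the shape shown above; parse_token_url is a hypothetical helper name, and the token abc123 and user id 42 are invented sample values.

```python
from urllib.parse import urlsplit, parse_qs


def parse_token_url(redirected_url):
    """Extract access_token, expires_in and user_id from the URL fragment."""
    fragment = urlsplit(redirected_url).fragment  # the part after '#'
    fields = parse_qs(fragment)                   # each value is a list
    return {
        "access_token": fields["access_token"][0],
        "expires_in": int(fields["expires_in"][0]),
        "user_id": int(fields["user_id"][0]),
    }


# Sample address with invented values, for illustration only.
example = "https://oauth.vk.com/blank.html#access_token=abc123&expires_in=0&user_id=42"
token_info = parse_token_url(example)
print(token_info)
```

In the sample, expires_in is 0, as in the address above, where the offline permission was requested.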

We edit the settings.py file, inserting the received access_token and user_id. Now we can make requests to the VK API.
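The code excerpts in this article call a request_url helper that is not shown here. The sketch below is only a guess at what such a helper might look like, assuming the usual https://api.vk.com/method/ address layout; the names token, my_id and api_version are hypothetical stand-ins for the values kept in settings.py, and the real project may differ.

```python
# Hypothetical stand-ins for the values stored in settings.py.
token = "ACCESS_TOKEN"
my_id = "USER_ID"
api_version = "5.21"


def request_url(method_name, parameters, access_token=token, version=api_version):
    """Build the address of a VK API call: method, parameters, token, version."""
    return "https://api.vk.com/method/{}?{}&access_token={}&v={}".format(
        method_name, parameters, access_token, version)


print(request_url("users.get", "user_ids=%s&fields=photo" % my_id))
```

Keeping the token and API version in one place means each method only has to supply its own name and parameters.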

Data retrieval


First, let's go through the methods we will use.

Since we need at least some information about the user for whom the graph is built, users.get comes in handy. It accepts either a single id or several, a list of fields to return, and the grammatical case in which the first and last name should be declined. My base_info() method takes a list of ids and returns information about the users, including a photo.

def base_info(self, ids):
    """read https://vk.com/dev/users.get"""
    # Method of the VK client class; the module needs `import requests`.
    r = requests.get(self.request_url('users.get',
            'user_ids=%s&fields=photo' % ','.join(map(str, ids)))).json()
    if 'error' in r:
        raise VkException('Error message: %s. Error code: %s' % (
            r['error']['error_msg'], r['error']['error_code']))
    r = r['response']
    # Check that the id from settings.py has not been deactivated
    if 'deactivated' in r[0]:
        raise VkException("User deactivated")
    return r

This may matter to those who want to pass the ids from friends.getMutual into it, which produces a huge number of requests. More on this later.
Now we need information about the user's friends, and the friends.get method helps with that. Of all its parameters listed in the documentation, we use user_id, which is stored in our settings.py, and fields. The additional fields are the friends' ids, first names, last names, and photos. After all, I want the nodes to show thumbnails of their photos.

def friends(self, id):
    """
    read https://vk.com/dev/friends.get
    Takes a user id
    """
    r = requests.get(self.request_url('friends.get',
            'user_id=%s&fields=uid,first_name,last_name,photo' % id)).json()['response']
    #self.count_friends = r['count']
    # Map each friend's id to that friend's info
    return {item['id']: item for item in r['items']}

Now comes the fun part.
The friends.getMutual method returns the list of ids of mutual friends between two users. This is convenient because we only get ids, and we already have the more detailed information thanks to friends.get. But nobody forbids you from making an extra hundred or two requests with users.get. The diagrams are a little further down.
Now let's decide how to use friends.getMutual. If the user has N friends, we need N requests to get the list of mutual friends for each friend. In addition, we will need delays between requests to stay within the allowed number of requests per second.
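The delays mentioned above can be handled by a small throttle that enforces a minimum interval between calls. This is a generic sketch, not the project's code: the Throttle class is hypothetical, and the rate of 3 requests per second is only an illustrative value, so check the VK documentation for the actual limit.

```python
import time


class Throttle:
    """Make sure consecutive calls are at least 1/max_per_second apart."""

    def __init__(self, max_per_second, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = 1.0 / max_per_second
        self.clock = clock      # injectable for testing
        self.sleep = sleep      # injectable for testing
        self.last_call = None

    def wait(self):
        """Sleep just long enough to respect the minimum interval."""
        now = self.clock()
        if self.last_call is not None:
            remaining = self.min_interval - (now - self.last_call)
            if remaining > 0:
                self.sleep(remaining)
                now = self.clock()
        self.last_call = now


# Illustrative limit; the real value is an assumption to verify.
throttle = Throttle(max_per_second=3)
for friend_id in [1, 2, 3]:   # made-up ids
    throttle.wait()           # call before each API request
    print("would request mutual friends for", friend_id)
```

Calling wait() before every request keeps the loop simple: the first call returns immediately, and later calls pause only when requests come too quickly.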
Suppose the user we scan has 25 friends.
That is 52 requests in total, which is too many, so remember that users.get can accept a list of ids:
25 friends means 28 requests, but, as described above, we already have that information thanks to friends.get.

This is where execute comes in handy: it lets us run a sequence of methods in one request. It has a single parameter, code, which can contain up to 25 calls to API methods.
So, in the end, the code in VKScript will look something like this:

return {
"id": API.friends.getMutual({"source_uid":source, "target_uid":target}), // * 25
...
};

Some readers will probably suggest how to shorten this code without repeating API.friends.getMutual every time.
Now we just need to send the friends' ids in batches of 25. In our example, the diagram looks like this:

Alternatively, we could use a for loop to send each friend to friends.getMutual and then get more detailed information through users.get.
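The figures of 52 and 28 requests are not broken down in the text. One reading that reproduces them, offered as an assumption rather than the author's own accounting, is: one users.get call for the scanned id, one friends.get call, one friends.getMutual call per friend, and then users.get calls for the details. The third number is what the same reading gives when friends.getMutual is batched through execute; the function name request_counts is hypothetical.

```python
import math


def request_counts(n_friends, batch_size=25):
    """Count requests for three strategies under the assumed breakdown."""
    base = 2  # users.get for the scanned id + friends.get
    # One getMutual and one users.get per friend.
    naive = base + n_friends + n_friends
    # One getMutual per friend, a single users.get with a list of ids.
    batched_users = base + n_friends + 1
    # getMutual calls packed into execute, up to batch_size per request.
    with_execute = base + math.ceil(n_friends / batch_size)
    return naive, batched_users, with_execute


print(request_counts(25))  # (52, 28, 3)
```

Under this reading, batching through execute is what removes the per-friend requests entirely.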
Next, we build a human-readable structure in which, instead of a friend's id and the list of ids of your mutual friends, there is the information from friends.get. As a result, we get something like:

[({your friend}, [{mutual friend}, {another mutual friend}]), ({another friend of yours}, None)]

Each dictionary holds the id, first name, last name, and photo. Each list holds the dictionaries of mutual friends, or None is used if there are no mutual friends. Every friend and the corresponding list are paired in a tuple.

def common_friends(self):
    """
    read https://vk.com/dev/friends.getMutual and read https://vk.com/dev/execute
    Returns a list of tuples: a friend's info and the list of mutual friends' info
    """
    def parts(lst, n=25):
        """ split the list into chunks of 25 items each """
        return [lst[i:i + n] for i in range(0, len(lst), n)]
    result = []
    for chunk in parts(list(self.all_friends.keys())):
        # Build code (the parameter of execute)
        code = 'return {'
        for id in chunk:
            code = '%s%s' % (code, '"%s": API.friends.getMutual({"source_uid":%s, "target_uid":%s}),' % (id,
                            self.my_id, id))
        code = '%s%s' % (code, '};')
        response = requests.get(self.request_url('execute', 'code=%s' % code)).json()['response']
        for key, val in response.items():
            if int(key) in self.all_friends:
                # take the info from the full list we already have
                result.append((self.all_friends[int(key)],
                        [self.all_friends[int(i)] for i in val] if val else None))
    return result

So, if you want to see your list of friends and the mutual friends you share with them, run:

python main.py

Graph visualization


The choice fell on d3, namely on the Curved Links example. For it we generate JSON that looks something like this:

{ 
"nodes": [ 
        {"name":"Myriel","group":1, "photo": "path"}, 
        {"name":"Napoleon","group":1, "photo": "path"}, 
        {"name":"Mlle.Baptistine","group":1, "photo": "path"} 
        ], 
"links":[ 
        {"source":1,"target":0,"value":1}, 
        {"source":2,"target":0,"value":8} 
        ] 
}
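The conversion script itself is not shown in the article, so here is a rough sketch of how the list of (friend, mutual friends) tuples could be turned into this nodes/links layout. The function to_d3_graph and the sample people are hypothetical; the field names id, first_name, last_name and photo come from the friends.get request above, and links refer to nodes by their position in the nodes list, as in the JSON example.

```python
import json


def to_d3_graph(common_friends):
    """Convert [(friend, [mutual friends] or None), ...] into nodes and links."""
    nodes = []
    index_by_id = {}
    for friend, _mutual in common_friends:
        index_by_id[friend["id"]] = len(nodes)
        nodes.append({
            "name": "%s %s" % (friend["first_name"], friend["last_name"]),
            "group": 1,
            "photo": friend["photo"],
        })
    links = []
    seen_pairs = set()  # each friendship is reported from both sides
    for friend, mutual in common_friends:
        for other in mutual or []:
            pair = frozenset((friend["id"], other["id"]))
            if len(pair) < 2 or pair in seen_pairs or other["id"] not in index_by_id:
                continue
            seen_pairs.add(pair)
            links.append({"source": index_by_id[friend["id"]],
                          "target": index_by_id[other["id"]],
                          "value": 1})
    return {"nodes": nodes, "links": links}


# Invented sample data in the shape returned by common_friends().
anna = {"id": 1, "first_name": "Anna", "last_name": "A", "photo": "path"}
boris = {"id": 2, "first_name": "Boris", "last_name": "B", "photo": "path"}
vera = {"id": 3, "first_name": "Vera", "last_name": "V", "photo": "path"}
sample = [(anna, [boris]), (boris, [anna]), (vera, None)]

graph = to_d3_graph(sample)
print(json.dumps(graph, indent=2))
```

The seen_pairs set is a design choice of this sketch: without it every friendship would appear twice, once from each side.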

After a small modification of index.html, the friends' photos become the nodes.

If you want to immediately visualize the graph:

python 2d3.py

The miserables.json file appears in the web folder. Do not forget to open index.html in Mozilla Firefox, or run python -m http.server 8000 and open it in Chrome.

Visualization slows down when there are a lot of friends, so for the future I’m thinking about using WebGL.

This is what the friendship graph of one of my friends looks like. Connections are everything.

Of course, I was curious which version works faster.

The article that inspired me says:
On my 333 friends, it took 119 seconds.

At the time of writing, Himura had 321 friends on VKontakte. It took me 9 seconds (for the whole program, not just friends.getMutual).

Finally


All the necessary information about the methods used can be found in VKontakte's thorough documentation. However, I found a couple of errors: error code 15 was not described ('error_msg': 'Access denied: user deactivated', 'error_code': 15), although you can guess what it means, and the documentation for the friends.get method had uid instead of user_id. 2 days later:


As stated at the beginning, the project can be found on GitHub. I will be glad if someone else likes it and I get a lot of delicious pull requests...

UPD (05/27/2014):
At the suggestion of WTFRU7, I added the ability to use stored procedures. To do this, follow the link.
Create the getMutual stored procedure: copy the contents of execute_getMutual.js into the form and save. Do not forget to download the newer version. The final form of our diagram is as follows:

UPD (06/16/2014):
Getting a non-expiring token.
UPD (07/11/2014):
Explanation schemes have been added.
UPD (11/14/2014):
Continuation
