nil October 27, 2008 at 05:40

How the search for the path between users on LinkedIn works.

Recently, my colleagues and I discussed the social network LinkedIn and raised an interesting question. This site has the function of displaying the path from your account to some other account. The maximum number of steps is three, i.e. the way looks like this: you-> your friend-> friend of your friend-> the person you are looking for. The question is, how is this path searched? Is it searched every time you log in to someone else's account? Or have all the paths for all users been already calculated in advance? The first option leads to great temporal s m cost, the second - high costs for storage.

My colleagues and I were lucky, just in May this year at the JavaOne 2008 conference in San Francisco, LinkedIn representatives made two presentations about the architecture of their social network. In this blogAn excellent overview of the reports is given, but here is a translation of it. By the way, both posts contain links to the original presentations of LinkedIn representatives, if anyone is interested.

So it turned out that the entire user graph occupies 12 GB and is constantly held in RAM. But at the beginning of each session, a sample is created from a common graph personally for a given user, i.e. like a look at a social network from his point of view. This personal graph is stored in the cache throughout the session and takes approximately 2 MB for each user. It is updated only if the user makes any changes to it, for example, adds or removes one of his friends. In case changes come from other users, the graph remains unchanged. Obviously, the search for the path to any account occurs in this column.

Thus, LinkedIn implements a hybrid option, i.e. the search for the path occurs on the fly, but at the same time some part of the information (personal graph) is built in advance and stored in memory. An inquisitive reader may ask: “As in the case of a complete graph, as in the case of a personal graph, we will have to sort out all possible paths from the user to the person you are looking for. What then is the advantage of a personal graph? ” The thing is that graphs are represented either by arrays or linear lists. In both cases, in order to find an element, in the worst case scenario, iterate over all available elements. There are 22 million such items in the full graph (that's the number of registered users on LinkedIn). The size of the personal graph is obviously much smaller. For example, if the user Each of his friends and friends of their friends has 30 contacts, then the size of such a graph will be only 27931. In addition, in order to make sure that the path exists, you do not need to go through all possible paths, just check if the graph is present The account you are looking for. Indeed, if this graph was built as a “personal view of the network” with a maximum number of steps equal to three, then the presence of an account in it automatically means that the path to the user within three steps exists and it makes sense to look for it in general.

And finally, a small offtopic. I was surprised to find that the graph of the entire social network has 22 million vertices and 120 million edges. This means that each account has an average of just 5.45 friends. Apparently this is due to the fact that many people register, but then do not use their account.

Tags:

How the search for the path between users on LinkedIn works.

Also popular now: