langoner November 7, 2013 at 12:19

Reader problems. Is paradise somewhere nearby?

From the sandbox

Each of us has a set of sites we visit regularly for new, interesting information. I tried many times to start using an RSS aggregator and abandoned the idea every time. Having patiently sorted all the feeds into folders, I soon realized that I was not getting what I had expected. If I wanted to read about .NET, the articles came mixed with others unrelated to the topic. Sometimes I wanted to read about information technology in general, but alas, my category list had no such section. After a while, I forgot about the reader.

With the advent of social networks, hope arose that the full power of the social graph would help solve the problem of finding interesting material. But if you have no particular need to make “virtual friends”, you end up on the edge of this graph: information does not reach you, or reaches you very late. Standing far from the place where life is in full swing, you hear only echoes of loud phrases. You try to move closer, but then your feed fills up with a pile of unnecessary information, and the hope of getting what you want is lost. People need to share photos and chat with friends. They upload photos of dogs not to please their nearest and dearest, but to add another like to the piggy bank of their social capital. This creates the misleading impression that only a tiny percentage of people today strive to gain new experience and knowledge.

Consider an ordinary social graph. If node A publishes content that is interesting to node B, then node B receives it only if there is a chain of network nodes connecting A and B in which every node has "approved" the publication.
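The delivery rule described above can be sketched as a breadth-first search that only travels onward through nodes that approved the post. This is a minimal illustration, not a description of any real network: the function name `reaches`, the adjacency dictionary, and the `approved` set are all assumptions made for the example.

```python
from collections import deque


def reaches(graph, approved, source, target):
    """Return True if a post published by `source` can reach `target`.

    graph:    dict mapping each node to the set of nodes it is connected to
              (hypothetical representation of the social graph).
    approved: set of intermediate nodes that "approved" (re-shared) the post.
    """
    seen = {source}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbour in graph.get(node, ()):
            if neighbour == target:
                return True
            # The post only travels onward through nodes that approved it.
            if neighbour in approved and neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return False


# A - X - Y - B: the post reaches B only if both X and Y approve it.
graph = {"A": {"X"}, "X": {"A", "Y"}, "Y": {"X", "B"}, "B": {"Y"}}
print(reaches(graph, {"X", "Y"}, "A", "B"))
print(reaches(graph, {"X"}, "A", "B"))
```

A single node in the chain that does not approve the post is enough to cut B off, which is why a user on the edge of the graph sees so little.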

To get more information, you need more social connections. But when there are too many of them, the effort stops paying off. That is why some have turned their attention to interest graphs.

Hope for paradise

Across the web, the “model of arbitrary typed objects connected by arbitrary typed connections” has been discussed, along with attempts to understand what can be built on it. On Habr, the idea of the interest graph, a special case of this model, was discussed. The bottom line was that every interest (tag) is unique, so the entire relevant audience gathers around it. The next problem was how to organize everything related to a specific tag. The authors of these publications outlined the principles on which this interaction would be built under the proposed approach. Recently, many have taken steps towards this concept, including the major social networks. Watching the development of services that implement this idea with genuine interest, I, like many others, formed my own view on these issues.

General wishes for the development of such networks can be summed up in the following points:

In such networks there will never be drunk pictures, or, more precisely, there will be, but only those who want to see them will see them;
In such networks people will discuss useful things and solve pressing problems;
What you write will matter, not how many subscribers you have;
We, not our social ties, will decide what appears in our feed;
Any point of view has a right to exist. Any person can accept it or reject it, and technology should give a person the choice of which way to go;
There is no spam: a "self-moderated" community very quickly pushes spammers outside the relevant results.

The image of an “ideal social network” is a “window” into a multidimensional information space. This "window" is positioned so as to give a person the slice of information that meets his current needs. These needs can be loosely divided into two information flows: updates from areas with which the user keeps in constant contact, and updates from areas the user is interested in. New social ties form through local circles of communication. Interlocutors see each other's activity in the context of overlapping interests. Private data plays a smaller role. What matters is who you are, not how old your dog is or where you are at the moment.

Is paradise already here?

The steps the major social networks have taken in this direction are:

Hashtags;
Interest pages;
Groups.

In fairness, hashtags do implement similar functionality. But, to quote a comment on one of the articles: “if I want to read what they write about isomorphic-palliative dissonance, then what hashtag should I search on Twitter for?” Many people feel that tags are useful and powerful, but that something is missing. The main problem with tags is that one meaning can be written as many different tags (including in other languages) and, conversely, the same tag spelling can carry several meanings.

Groups deserve a separate mention. A group can be projected onto an interest: in effect it is the same interest with activity attached to it, a prototype of the group of “like-minded people”. And if you are not comfortable in the global interest group, you can "go deeper". But each group has a critical mass of participants. Once it is exceeded, the large group "breaks up" and / or smaller groups on the same topic form. Many people recall the Usenet and Fido conferences: “When there were few people, it was interesting; when the number of people grew, the best ones left.”

Another problem with groups is that they have hard boundaries. You are either a member of the group or not. A publication belongs to the group and does not leave its borders. Ideally, a publication should relate to each topic / interest / group with its own weight. Different things interest a person to different degrees; in addition, a person can have many interests, and it is tedious to browse a separate group for each one. Based on interests, a single feed can be generated for the user.
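One way to picture such a weighted, single feed is to score each publication by how strongly its topics overlap with the user's interests. The sketch below is only an illustration under assumptions: the weights, the post titles, and the function names `score` and `build_feed` are invented for the example, and a plain weighted sum stands in for whatever ranking a real service would use.

```python
def score(post_topics, user_interests):
    """Weighted overlap between a post's topics and a user's interests."""
    return sum(weight * user_interests.get(topic, 0.0)
               for topic, weight in post_topics.items())


def build_feed(posts, user_interests, limit=10):
    """Return post titles ordered by relevance, dropping irrelevant ones."""
    scored = [(score(topics, user_interests), title)
              for title, topics in posts.items()]
    relevant = [(s, title) for s, title in scored if s > 0]
    relevant.sort(key=lambda pair: pair[0], reverse=True)
    return [title for _, title in relevant[:limit]]


# Hypothetical data: each post relates to several topics with different weights.
posts = {
    "Async in C#": {".NET": 0.9, "concurrency": 0.4},
    "Dog photos": {"pets": 1.0},
    "GC tuning": {".NET": 0.5, "performance": 0.8},
}
user = {".NET": 1.0, "performance": 0.2}
print(build_feed(posts, user))
```

With these made-up weights, the post about pets never enters the feed, and the two .NET posts are ordered by how much of each is about what this particular user cares about.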

It is also worth noting that groups are connected and related to one another. This prompts some people to study the phenomenon and use it in their projects. Such connections fit well into the model of the graph of interests and like-minded people, in which groups of like-minded people exist within the same interest and are linked through “borderline” (shared) users. Groups only “approximate” such relationships, while interests with groups of like-minded people represent them more naturally. Another problem with groups is that their participants know nothing about other communities and do not want to hear about them. They perceive a new community as an encroachment on their territory, which impedes the exchange of valuable information.

From the above, we can conclude that groups do not live up to expectations here. When something serious is needed, it usually ends with harsh moderation and totalitarian measures by group administrators: bans on joining and on publishing. Interests, by contrast, create end-to-end relationships. An interest (marker, tag) naturally supports a refining search across otherwise unrelated hierarchies.

We will build a new paradise

We will not delve into the theory. In brief, imagine a graph in which the vertices are people and interests, and an edge records the fact that a user has shown attention to a topic (interest).

This approach is implemented in many services inspired by this model. A post in the system is tied to a certain set of topics and spreads through the connections between all “fans” of those interests, which provides the "delivery" of content across the network.
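The people-and-interests graph and the delivery of a post through it can be sketched in a few lines. This is a toy model under assumptions, not the design of any particular service: the class `InterestGraph` and its methods `follow` and `deliver` are hypothetical names chosen for the example.

```python
from collections import defaultdict


class InterestGraph:
    """Bipartite graph: users on one side, interests on the other."""

    def __init__(self):
        # interest -> set of users who have shown attention to it (the edges)
        self.followers = defaultdict(set)

    def follow(self, user, interest):
        self.followers[interest].add(user)

    def deliver(self, author, interests):
        """Return the users who receive a post tied to `interests`."""
        audience = set()
        for interest in interests:
            # .get avoids creating empty entries for unknown interests
            audience |= self.followers.get(interest, set())
        audience.discard(author)  # the author does not need their own post
        return audience


g = InterestGraph()
g.follow("alice", ".NET")
g.follow("bob", ".NET")
g.follow("carol", "photography")
print(g.deliver("alice", [".NET", "big data"]))
```

Unlike the social graph, no chain of approving intermediaries is needed: bob receives the post simply because he follows the topic, even though he has no connection to alice.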

In such a system, several important questions arise:

How to maintain a base of interests?
How to rank content under one interest?
Incorrect publication interests.
Incomplete publication interests.

Base of interests

You can leave the database at the mercy of users. As practice shows, this leads to chaos: the base fills up with duplicates and insignificant topics.
You can fix the catalog and have the service administrators extend it as needed. This brings the problem of populating it initially and makes adding new topics difficult, which in turn slows the development of the resource.
The best way out seems to be to take an existing catalog as a basis. Wikipedia immediately comes to mind. It is a well-structured knowledge base that suits the role of a directory of categories. Each of its articles is a heading to which a post can be attached.
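The idea of a fixed catalog also addresses the duplicate-tag problem: many spellings can be mapped onto one canonical topic, similar in spirit to how alternative titles can point to a single encyclopedia article. The sketch below assumes a small hand-written alias table; `CATALOG`, `ALIAS_TO_TOPIC`, and `canonical_topic` are hypothetical names, and nothing here calls a real Wikipedia API.

```python
# Hypothetical catalog: canonical topic -> known spellings of the same meaning.
CATALOG = {
    ".NET Framework": {".net", "dotnet", "dot net", ".net framework"},
    "Machine learning": {"ml", "machine learning", "машинное обучение"},
}

# Reverse index: every alias points at exactly one canonical topic.
ALIAS_TO_TOPIC = {
    alias: topic
    for topic, aliases in CATALOG.items()
    for alias in aliases
}


def canonical_topic(raw_tag):
    """Map a user-typed tag to a catalog topic, or None if it is unknown."""
    # Drop a leading '#', lowercase, and collapse repeated whitespace.
    key = " ".join(raw_tag.strip().lstrip("#").lower().split())
    return ALIAS_TO_TOPIC.get(key)


print(canonical_topic("#DotNet"))
print(canonical_topic("  Machine   Learning "))
print(canonical_topic("cats"))
```

Unknown tags return `None` rather than silently creating a new topic, which keeps the base free of duplicates at the cost of someone having to maintain the alias table.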

Ranking content with one interest

Content ranking is a rather complicated topic on which plenty of articles have been written. If we recall groups in social networks, the appearance of groups on similar or identical topics can be seen as an attempt to rank content within the network under one interest.

People may perceive the same topic differently, so the question of relevance is rather complicated. Different people want to see different content under the same interest. Suppose I choose the topic .NET Framework. Personally, I am not interested in entry-level articles on how to write Hello World. I would like to cut off such material and get results that are more or less interesting to me. Since there are many sources, and their content varies in value, I would like a tool that helps find the really interesting publications.

Deciding by majority (the "average temperature across the hospital") is a bad option, since each of us has different requirements for content quality and a different outlook on life. Recommendation algorithms and personalized results are therefore better suited here, and a good deal of material has been written about them. I will say right away that I am a supporter of this approach; I believe in mathematics and big data. On the other hand, there is skepticism about these technologies. In particular, the concept of the “filter bubble” has received considerable support.

Here is how Wikipedia describes the filter bubble. The concept, developed by Eli Pariser, is a phenomenon in which websites use algorithms to selectively guess what information a user would like to see, based on the user's location, past clicks, mouse movements, and search history. As a result, websites display only information that is consistent with the user's current views. This is similar to the phenomenon in which people and organizations seek out information that initially seems right to them but turns out to be useless or nearly useless, and avoid information that seems incorrect and inconsequential to them but turns out to be useful.

In his book The Filter Bubble, Pariser warns that a potential drawback of filtered search is that it “closes us off from new ideas, objects, and important information” and “gives the impression that our own narrow interests are all that exists and surrounds us.” This can harm both the individual and society as a whole. Freedom of choice is very important: a person finds information himself and determines its usefulness.

I really liked a comment on one of the articles about personalization: “So we would never have met our fiancées. They would live with hackers like us.” Nevertheless, US parole boards use a special computer algorithm that recommends what to do with a particular prisoner based on 50–100 factors. In my opinion, the problem is not personalization itself, but the lack of controls for this tool (other than turning it on or off).

It is reasonable to assume that in such a network the information of interest to a user will reach him faster, and that the user will be more likely to receive content that is high-quality in his own opinion.

Incorrect publication interests

Some users, trying to spread their publications as widely as possible, attach an excessive set of topics that have nothing to do with them. In doing so they destroy the very essence of the project. We need an effective tool to keep the service from turning into a garbage dump.
As Habr's practice shows, there have to be people who "collect" and organize information. Ideally, though, I would like to hand this task to algorithms.

Incomplete publication interests

The main task in implementing the “graph of interests” concept is to make sure that every piece of information is tied to all the interests it relates to, while each of these interests carries its own weight relative to the others. These weights would then deliver the information more accurately to those who are really interested in it. Automatic categorization of content can help here.
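As a rough illustration of automatic categorization with weights, the sketch below counts keyword hits per topic and normalizes them so the weights sum to one. It is deliberately naive and rests on assumptions: the keyword lists in `TOPIC_KEYWORDS` and the function `categorize` are invented for the example, and a real system would more likely use a trained text classifier.

```python
from collections import Counter

# Hypothetical keyword lists; a real catalog would be far larger.
TOPIC_KEYWORDS = {
    ".NET Framework": {"clr", "c#", "assembly", ".net"},
    "Databases": {"sql", "index", "query", "transaction"},
}


def categorize(text):
    """Return {topic: weight} for a text, with weights summing to 1."""
    # Lowercase, split on whitespace, and trim trailing punctuation only,
    # so that tokens such as ".net" and "c#" survive intact.
    tokens = Counter(word.rstrip(".,;:!?") for word in text.lower().split())
    hits = {
        topic: sum(tokens[keyword] for keyword in keywords)
        for topic, keywords in TOPIC_KEYWORDS.items()
    }
    total = sum(hits.values())
    if total == 0:
        return {}
    return {topic: count / total for topic, count in hits.items() if count}


text = "Tuning a SQL query from C# code: the query plan and the index."
print(categorize(text))
```

Here the text is tied to both topics, but mostly to databases, so it would be delivered first to readers who weight that interest highly. The same weights could also expose tag stuffing: a topic the author attached by hand that gets no weight from the text is a candidate for removal.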

Epilogue

Each of the above tasks deserves a separate discussion. They are being solved, with varying degrees of success, by various projects, including ones mentioned on Habr.
In the future, I would like to talk about my own solution and share thoughts on the difficulties of implementing each part of it, both technical and algorithmic.
Thank you for taking the time to read this publication.
