mefa April 27, 2009 at 01:27

Recommendation Systems: Introduction to Hybrid Systems

Recommendation systems:

- Advice from the machine

- Cold start

- Introduction to hybrid systems

- Artificial immune systems and the effect of idiotypes

Let's continue from where we stopped last time: we looked at several ways to solve the cold start problem. Now I suggest looking at other problems of recommendation systems (hereinafter simply SR) and thinking about how different types of SR can complement each other. I should say right away that I will not go into detail on how to solve any particular problem. The purpose of this article is only to help developers find their way around the varieties of SR and the problems associated with them.

To begin with, we need to extend the classification of SR. Przemyslaw Kazienko and Pawel Kolodziejski proposed dividing all SR into five types: statistical, demographic, associative, informational and collective. Let's start with the simplest.

Statistical SR (Statistical approach) are systems based on statistical data collected from users. Simply put, these are the familiar top lists: the most downloaded, the most read, the most popular, and so on. Unlike all the SR we examined earlier, this approach is not personalized and offers the same recommendations to every user.
Demographic SR (Demographic recommendation) compare the characteristics of an object with the characteristics of the user. These can be simple discrete properties (age, gender, place of residence, nationality) or more complex ones, for example the user's interest in a particular kind of object. The system can determine some of them on its own: the country and even the city in which the user lives can be estimated from the IP address, and the preferred language can be read from the browser settings using JavaScript. Other data the user can be asked to provide explicitly. Although the amount of information obtained about the user will be quite limited, it can help make important decisions, for example to whom to recommend a doll and to whom adult video.
Associative SR (Association rules) build recommendations from data on which objects are used together. A typical example is the analysis of user purchases. Someone who has bought a new phone will most likely need headphones, a case, a charger, an additional memory card and other accessories. Other SR can be powerless here, because these objects have no common parameters by which they could be compared; they are united only by the fact that they are used together.
Informational SR (Content based; I did not find a generally accepted translation, "content-based recommendation systems" would be more accurate, but it is too long) are systems that look for objects similar to those the user has already rated positively. Unlike associative systems, they can work with almost any data: from the simplest binary values (bought or not bought) to complex text descriptions. In the previous article we examined a couple of important methods used to implement systems of this type, including the popular vector space model. Looking ahead, I'll say that this is one of two systems that can solve the cold start problem.
Collective (collaborative) SR (Collaborative filtering) are the most common systems; they use the ratings of other users to predict the rating of a particular object. Although such systems are quite effective, their accuracy depends heavily on how much the sets of rated objects overlap between individual users. The larger and more frequent these overlaps, the more accurate the system will be. This limits the scope of the model: if each user has their own unique content that no one else has seen, collective SR will be powerless.
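
To make the difference between the two simplest types more tangible, here is a minimal Python sketch of a statistical top list and of an associative "used together" lookup. The baskets, the item names and the function names `most_popular` and `used_together` are all made up for illustration; real association-rule mining also weighs support and confidence, which this sketch skips.

```python
from collections import Counter

# Hypothetical purchase baskets; every item name here is an assumption.
baskets = [
    {"phone", "headphones", "case"},
    {"phone", "case", "memory card"},
    {"phone", "headphones"},
    {"laptop", "mouse"},
]


def ranked(counts, n):
    """Sort by count (descending), then by name so that ties are stable."""
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:n]


def most_popular(baskets, n=3):
    """Statistical approach: one top-n list, the same for every user."""
    counts = Counter(item for basket in baskets for item in basket)
    return [item for item, _ in ranked(counts, n)]


def used_together(baskets, item, n=3):
    """Associative approach: objects that most often share a basket with `item`."""
    counts = Counter()
    for basket in baskets:
        if item in basket:
            counts.update(basket - {item})
    return ranked(counts, n)


print(most_popular(baskets, 1))         # ['phone']
print(used_together(baskets, "phone"))  # [('case', 2), ('headphones', 2), ('memory card', 1)]
```

Note that `used_together` needs a selected object to start from, while `most_popular` needs nothing at all: neither of them knows anything about the user.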

The pros and cons are shown more clearly in a table that I found in the article by the already mentioned Przemyslaw Kazienko and Pawel Kolodziejski, "Personalized Integration of Recommendation Methods for E-commerce" (pdf), and to which I added one more column.

Method	Data source	Attachment to the user	Attachment to an object (context)	Solves the problem of a new object	Solves the problem of a new user	Solves the problem of intersecting objects	Takes into account quality aspects
Statistical	Ratings, views, downloads, etc.	-	-	-	+	+	+
Demographic	Object and user characteristics	+	-	+	+	+	-
Associative	Joint use of objects	-	+	-	+	- / +	-
Informational *	Object properties	+ / -	+ / -	+	+ / -	+	-
Collective	Ratings	+	-	-	-	-	+

* In this example, the authors assumed that the informational method is used solely to compare two objects; if the results of this comparison are subsequently applied to a specific user, other problems arise.

Attachment to the user means that the system needs to identify the user in order to give them advice. Obviously, all personalized SR suffer from this problem. It can be solved by simple authorization or other methods of user identification.
Attachment to an object refers to the context in which a particular SR can be applied. Some systems, for example associative and informational ones, require the user to select a specific object that the system can start from (to look for other objects similar or complementary to the selected one). Other systems can give recommendations in any context and on any topic.
Solving the problem of a new object is one of the variations of the cold start problem. Newly added objects can be recommended only by systems that work directly with an object's properties, which are usually set when it is created. For statistical and collective SR, it will take some time until a sufficient number of users have rated the object. For associative systems the problem can be even harder, since new usage patterns for the object have to be found.
Solving the problem of a new user is another variation of the cold start problem, only in this case we know nothing about the user. Obviously, this is not a problem for associative and statistical SR, since they do not depend on the user at all. For a demographic system it is not a problem as long as at least some data the system can use was specified when the user was created. A collective system will need time to collect the information it requires.
Solving the problem of intersecting objects is within the reach of almost all SR, except for collective and, partially, associative ones. As I already wrote, their effectiveness depends on how often the compared objects intersect. For a collective system, this means that the more people view a given item, the more accurately the system can form an opinion about it. For an associative system, it means that the more often an object occurs together with some narrow set of other objects, the easier it is for the system to identify patterns of its use. Note that if the object occurs together with a very large number of other objects, this will only confuse the system.
Takes into account quality aspects is the column responsible for whether the system takes the quality of an object into account. It was not in the original table, but in my opinion it is also worth considering. The fact is that most methods that look for similarities between objects ignore their quality. They can find two similar news items, but the fact that one of them is interesting does not mean that the second is too. In this case the system cannot be sure that what it recommends to the user is not complete trash. Only systems that take object ratings into account, that is, collective and statistical ones, can solve this problem. The ratings can be obtained both explicitly and through implicit observation.
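
The dependence of collective systems on intersecting objects can be shown with a small sketch. It assumes a hypothetical `ratings` dictionary and uses cosine similarity over the objects two users have both rated; this is only one of many possible similarity measures, chosen here for brevity, and the user names and rating values are invented.

```python
from math import sqrt

# Hypothetical ratings on a 1-5 scale: user -> {object: rating}.
ratings = {
    "ann":  {"a": 5, "b": 3, "c": 4},
    "bob":  {"a": 4, "b": 2, "d": 5},
    "cleo": {"x": 5, "y": 1},  # has rated only objects nobody else has seen
}


def similarity(u, v):
    """Cosine similarity over the objects both users rated.

    Returns None when the users have no rated objects in common,
    because then there is nothing to compare.
    """
    common = set(u) & set(v)
    if not common:
        return None
    dot = sum(u[i] * v[i] for i in common)
    norm_u = sqrt(sum(u[i] ** 2 for i in common))
    norm_v = sqrt(sum(v[i] ** 2 for i in common))
    return dot / (norm_u * norm_v)


def neighbours(user):
    """Other users whose ratings can be compared with those of `user`."""
    return sorted(
        other for other in ratings
        if other != user and similarity(ratings[user], ratings[other]) is not None
    )


print(neighbours("ann"))   # ['bob']
print(neighbours("cleo"))  # []
```

For "cleo" the list of neighbours is empty, so a collective system has no ratings from like-minded users to build a prediction on, no matter how good the prediction formula itself is.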

The table clearly shows that different systems can effectively complement each other. To choose a successful combination of systems, it is essential to consider what data they will work with and in what context they will be applied. If this is not done, all the enormous work of introducing a new system may bring no noticeable gain in accuracy.
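
As a rough illustration of how the table can be read, the sketch below encodes its four problem columns and lists the pairs of methods that together cover all of them. It rests on a strong simplifying assumption: a partial mark ("+ / -" or "- / +") is counted as unsolved. The names `SOLVES`, `PROBLEMS` and `covering_pairs` are invented for this example.

```python
from itertools import combinations

# Problems each method fully solves ("+" in the table).
# Assumption of this sketch: partial marks are treated as unsolved.
SOLVES = {
    "statistical":   {"new user", "intersection", "quality"},
    "demographic":   {"new object", "new user", "intersection"},
    "associative":   {"new user"},
    "informational": {"new object", "intersection"},
    "collective":    {"quality"},
}
PROBLEMS = {"new object", "new user", "intersection", "quality"}


def covering_pairs(solves=SOLVES, problems=PROBLEMS):
    """Pairs of methods whose combined strengths cover every problem."""
    return [
        (a, b)
        for a, b in combinations(solves, 2)
        if problems <= solves[a] | solves[b]
    ]


for pair in covering_pairs():
    print(pair)
# ('statistical', 'demographic')
# ('statistical', 'informational')
# ('demographic', 'collective')
```

Such a list is only a starting point. It ignores the two attachment columns, and whether a pair is actually usable still depends on the data available and on the context in which the recommendations are shown.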

P.S. The article turned out to be quite long and without concrete examples, but I hope it helped someone get a clearer picture of what the different recommendation systems are. I'm afraid that I will not be able to continue the series in the near future, but to make up for the time you have spent, here are a few links worth looking at for further study of this topic (unfortunately, they are all in English; I did not find anything at all in Runet):
- Robin van Meteren and Maarten van Someren: "Using Content-Based Filtering for Recommendation" (pdf);
- Przemyslaw Kazienko and Pawel Kolodziejski: "Personalized Integration of Recommendation Methods for E-commerce" (pdf);
- Michael J. Pazzani: "A Framework for Collaborative, Content-Based and Demographic Filtering" (pdf);
- Christopher D. Manning, Prabhakar Raghavan and Hinrich Schütze: "Introduction to Information Retrieval", for everything related to the classification of texts.
Almost all of the mathematical algorithms mentioned in them are described on Wikipedia.

Original on my blog
