Analysis of implicit user preferences. Scientific and technical seminar at Yandex

    Analysis of implicit user preferences, expressed in clicks and page views, is the most important factor in ranking documents in search results and in tasks such as displaying ads or recommending news. Click analysis algorithms are well studied. But can we learn more about a person's individual preferences by using more information about their behavior on a site? It turns out that the mouse trajectory reveals which fragments of the viewed document interest the user.

    This question was the subject of a study that I, Mikhail Ageev, conducted together with Dmitry Lagun and Eugene Agichtein at the Emory Intelligent Information Access Lab at Emory University.




    We studied data collection methods and algorithms for analyzing user behavior from mouse movements, as well as the possibility of applying these methods in practice. They can significantly improve the generation of snippets (summaries) of documents in search results. The paper describing these algorithms received a “Best Paper Shortlisted Nominee” diploma at the international ACM SIGIR conference in 2013. Later, I presented the results of this work at one of the scientific and technical seminars at Yandex. A summary of that talk follows.

    Snippets are an essential part of any search engine. They help users search for information, and the usability of the search engine depends on their quality. A good snippet should be readable and should show the parts of the document that match the user's query. Ideally, the snippet should contain a direct answer to the user's question or an indication that the answer is in the document.



    The general principle is that the text of the query is compared with the text of the document, and the most relevant sentences containing the query words or words from the query expansion are selected. The formula for scoring fragments takes into account matches with the query, as well as the density of the text, its location, and the structure of the document. However, for the highly relevant documents that appear at the top of search results, textual factors are often not enough. Query words can occur in the text many times, and textual information alone cannot tell us which fragments answer the user's question. Additional factors are therefore required.

    When a user views a page, attention is not distributed evenly. Most of it goes to the fragments that contain the information the user is looking for.

    We conducted experiments using eye-tracking equipment that follows the pupil with an accuracy of several tens of pixels. Here is an example heat map of the gaze trajectory of a user who was looking for an answer to the question of how many dead pixels an iPad 3 must have to be replaced under warranty. He enters the query [how many dead pixels ipad 3 replace], which leads him to an Apple Community Forums page with a similar question. The query words occur on the page many times, but the user focuses on the fragment that actually contains the answer, as the heat map shows.



    If we could track and analyze the eye movements of a large number of users, we could select ideal snippets for various queries from this data alone. The problem is that users do not have eye-tracking equipment, so we need to look for other ways to obtain the necessary information.

    When viewing web documents, users usually move the mouse and scroll pages. In their 2010 paper, Q. Guo and E. Agichtein note that the mouse trajectory makes it possible to predict eye movements to within 150 pixels with a recall of 70%.



    Below is a heat map of mouse movements recorded while viewing a document found by the query [worst drought in US]. Most of the activity is concentrated on the fragment containing information about the most severe droughts in the USA, and this is exactly the fragment from which an ideal snippet can be built.



    The idea of our study is that data on mouse movements can be collected using a JavaScript API that works in most browsers. Based on user behavior, we can predict which fragments contain information relevant to the query, and then use this data to improve the quality of snippets. To implement and test this idea, several problems have to be solved. First, we need to understand how to collect realistic and sufficiently large-scale data on user behavior beyond the search results page. Second, we need to learn how to identify the fragments that most interest the user from mouse movements. Users have different habits: some like to select the text they are reading or simply hover over it, while others open a document and read it from top to bottom, occasionally scrolling down. Users may also have different browsers and input devices. In addition, the volume of mouse movement data is two orders of magnitude larger than the volume of click data. Finally, behavioral factors have to be combined with traditional textual ones.

    How to collect data


    To collect data, we used an infrastructure that we developed in 2011. The main idea is to create a game similar to the Yandex Search Cup. The player is given a goal: within a limited time, use a search engine to find the answer to a question on the Internet. The player finds the answer and sends it to us along with the URL of the page where it was found. Participants are recruited through Amazon Mechanical Turk. Each game consists of 12 questions. Participation in a game, which takes approximately forty minutes, earns a guaranteed payment of $1. The best 25% of players receive one more dollar. This is a fairly cheap way to collect data, and it provides a wide variety of users from all over the world. Questions were taken from Wiki.answers.com, Yahoo! Answers and similar sites. The main condition was that these sites themselves did not contain ready-made answers. At the same time, the questions had to be not too simple, yet have a clear, short answer that can be found on the Internet. To filter out robots and unscrupulous participants, we had to implement several stages of result quality checks. First, there is a captcha at the entrance to the system. Second, the user has to answer one or two trivial questions. Third, the user must complete the task through our proxy server, so that we can verify that they really submitted queries to the search engine and visited the page with the answer.

    Using the standard Apache HTTP Server modules mod_proxy_html and mod_sed, we proxied all requests to the search services. A user came to our page and saw the familiar search engine interface, but all the links in it were replaced with ours. Clicking such a link took the user to the desired page, but with our JavaScript code for monitoring behavior already embedded in it.

    Logging raises a small problem: the position of the mouse is given as coordinates in the browser window, while the coordinates of the text in that window depend on the screen resolution, the browser version and its settings. We need an exact binding to the text itself. We therefore compute the coordinates of each word on the client and store this information on the server.

    The results of the experiments were the following data:



    From the point of view of statistics, the data is as follows:



    The code and the collected data are freely available at this link.

    Prediction of fragments of interest to users


    To build snippets, the text is divided into fragments of five words. For each fragment, six behavioral factors are computed:

    • Duration of the cursor over the fragment;
    • Duration of the cursor next to the fragment (± 100px);
    • The average speed of the mouse over the fragment;
    • The average speed of the mouse next to the fragment;
    • The time the fragment was displayed in the visible part of the viewing window (scrollbar);
    • The time the fragment was displayed in the middle of the viewport.
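
    The first two factors in the list above could be computed roughly as follows. This is only a minimal sketch, not the code used in the study: the `Box` type, the function names, the sample format `(t, x, y)` and the rule that each time interval is attributed to the earlier sample are all assumptions made for illustration. "Next to" is interpreted here as within 100 px of the fragment but not over it.

```python
from dataclasses import dataclass


@dataclass
class Box:
    """Bounding box of a fragment in page coordinates (hypothetical type)."""
    left: float
    top: float
    right: float
    bottom: float

    def contains(self, x, y, margin=0.0):
        """True if (x, y) lies inside the box enlarged by `margin` pixels."""
        return (self.left - margin <= x <= self.right + margin
                and self.top - margin <= y <= self.bottom + margin)


def split_into_fragments(words, size=5):
    """Split a list of words into consecutive fragments of `size` words."""
    return [words[i:i + size] for i in range(0, len(words), size)]


def cursor_time_features(samples, box, near_px=100.0):
    """Return (time_over, time_near) in the units of the timestamps.

    samples: list of (t, x, y) cursor samples sorted by time.
    Assumption of this sketch: the interval between two consecutive
    samples is attributed to the position of the earlier sample.
    """
    over = 0.0
    near = 0.0
    for (t0, x, y), (t1, _, _) in zip(samples, samples[1:]):
        dt = t1 - t0
        if box.contains(x, y):
            over += dt
        elif box.contains(x, y, margin=near_px):
            near += dt
    return over, near
```

    The speed and visibility factors would need the same kind of pass over the cursor and scroll logs; they are left out to keep the sketch short.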

    Using machine learning, these six factors are collapsed into a single number: the probability that the fragment is of interest. But first we need to build a training set. We do not know for certain what really interested the reader, what they read, or where they found the answer. But we can take fragments that overlap with the user's answer as positive examples, and all other fragments as negative examples. This training set is inaccurate and incomplete, but it is sufficient for training the algorithm and improving the quality of snippets.
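
    The labeling step described above might look like the sketch below. The overlap measure (the share of fragment words that also occur in the answer) and the threshold of 0.5 are assumptions chosen for illustration; the text only says that positive examples are fragments that intersect with the user's answer.

```python
def overlap(fragment, answer_words):
    """Share of the fragment's words that also occur in the answer."""
    words = [w.lower() for w in fragment]
    if not words:
        return 0.0
    return sum(w in answer_words for w in words) / len(words)


def label_fragments(fragments, answer, threshold=0.5):
    """Return 1 for fragments that overlap with the answer, 0 otherwise.

    fragments: list of word lists (for example, five-word fragments).
    answer: the answer text submitted by the user.
    threshold: hypothetical cut-off on the overlap measure.
    """
    answer_words = {w.lower() for w in answer.split()}
    return [1 if overlap(f, answer_words) >= threshold else 0
            for f in fragments]
```

    The resulting labels, together with the behavioral factors of each fragment, form the training set for whichever classifier is used.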

    The first experiment checks the adequacy of our model. We trained the algorithm for predicting fragment interest on one set of pages and applied it to another set. In the graph, the x-axis shows the predicted probability that a fragment is of interest, and the y-axis shows the average overlap of the fragment with the user's answer:



    We see that if the algorithm is fairly confident that a fragment is good, then this fragment has a large overlap with the user's answer.

    In the trained model, the most important factors were DispMiddleTime (the time during which a fragment was displayed in the middle of the viewport) and MouseOverTime (the time during which the mouse cursor was over the fragment).

    Improving snippets with behavior analysis


    So, we can determine which fragments interested the user. How can we use this to improve snippets? As a starting point, we implemented a modern snippet generation algorithm published by researchers from Yahoo! in 2008. For each sentence, a set of textual factors is computed, and a machine learning model is trained to predict the quality of the fragment for snippet generation from assessor ratings on a {0,1} scale. Several machine learning methods are then compared: SVM, ranking SVM, and GBDT. We added more factors and expanded the rating scale to {0,1,2,3,4,5}. To form a snippet, one to four of the best sentences are selected. Fragments are selected by a greedy algorithm that collects the fragments with the best total weight.
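
    A greedy selection of this kind can be sketched as follows. The limit of four sentences comes from the text; the character budget `max_chars`, the function name and the choice to return sentences in document order are assumptions of this sketch rather than details of the published algorithm.

```python
def select_snippet(sentences, scores, max_sentences=4, max_chars=300):
    """Greedily pick the highest-scoring sentences for a snippet.

    sentences: list of sentence strings in document order.
    scores: predicted quality of each sentence (same length as sentences).
    max_chars: hypothetical length budget for the whole snippet.
    """
    # Visit sentences from the best score to the worst.
    ranked = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    chosen = []
    used = 0
    for i in ranked:
        if len(chosen) == max_sentences:
            break
        length = len(sentences[i])
        # The top sentence is always kept so that the snippet is never empty;
        # later sentences are skipped if they do not fit into the budget.
        if chosen and used + length > max_chars:
            continue
        chosen.append(i)
        used += length
    # Present the selected sentences in their original document order.
    return [sentences[i] for i in sorted(chosen)]
```

    Returning the sentences in document order keeps the snippet readable even when the best sentence comes late in the page.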

    We use the following set of textual factors:

    • Exact match;
    • The number of query words and synonyms found (3 factors);
    • BM25-like (4 factors);
    • The distance between the words of the query (3 factors);
    • Sentence length;
    • Position in the document;
    • Readability: the number of punctuation marks, capitalized words, distinct words (9 factors).

    Now that we have the fragment weight in terms of textual relevance, we need to combine it with the fragment interest score computed from user behavior. We use a simple linear combination of the two, where the weight λ in the fragment quality formula is the weight of the behavior.
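
    One plausible form of such a combination is sketched below. The text only states that the combination is linear and that λ is the weight of the behavior; giving the textual score the weight 1 − λ, and assuming that both scores are on a comparable scale, are assumptions of this sketch.

```python
def combined_score(text_score, behavior_score, lam):
    """Linear combination of textual and behavioral fragment scores.

    lam: weight of the behavioral score, assumed to lie in [0, 1].
    With lam = 0 the result is the purely textual (baseline) score.
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must be between 0 and 1")
    return lam * behavior_score + (1.0 - lam) * text_score
```

    Under this form, λ = 0 reproduces the baseline ranking of fragments and λ = 1 ranks them by behavior alone.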



    We need to choose the right weight λ. There are two extremes: if λ is too small, behavior is effectively ignored and the snippets do not differ from the baseline; if λ is too large, there is a risk of losing snippet quality. To select λ, we ran an experiment with five values between zero and one: {0.1, 0.3, 0.5, 0.7, 0.9}. To compare the variants, we recruited assessors who compared snippets in pairs according to three criteria:

    • Representativeness: which of the snippets better reflects how well the document matches the query? The document must be read before answering this question.
    • Readability: which of the snippets is better written, easier to read?
    • Judgeability: which of the snippets better helps the user find the relevant answer and decide whether to click the link?

    The graphs below show the share of snippet pairs in which the behavioral algorithm improved quality, for the three criteria and five values of λ. For each value of λ the assessors provided a different number of judgments, and a different number of snippets differ in quality, so the confidence intervals differ somewhat from one λ to another. We see that for λ = 0.7 we get a statistically significant improvement in snippet quality on each of the criteria. Coverage for these snippets is also fairly high: 40% of the behavior-based snippets differ from the baseline.
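
    The text does not say how the confidence intervals were computed. As an illustration only, the sketch below uses the Wilson score interval for a proportion, a common choice for this kind of pairwise comparison; the function name and the 95% level (z = 1.96) are assumptions.

```python
import math


def wilson_interval(wins, total, z=1.96):
    """Wilson score interval for the share of pairs won by one system.

    wins: number of pairs in which the behavior-based snippet was preferred.
    total: number of pairs in which the assessors expressed a preference.
    z: normal quantile; 1.96 corresponds to a 95% interval.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    p = wins / total
    denom = 1.0 + z * z / total
    center = (p + z * z / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z * z / (4 * total * total)) / denom
    return center - half, center + half
```

    An improvement would then be called significant when the lower bound exceeds 0.5, that is, when the behavior-based snippet wins in reliably more than half of the pairs. The interval narrows as the number of judgments grows, which is why different values of λ get intervals of different width.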



    Key assumptions and limitations of the considered approach


    First, the experiments were conducted on informational queries, where the user is looking for the text of an answer in the documents. However, there are other types of user intent, for example commercial and navigational. For such queries, behavioral factors may add noise or may need to be taken into account in a different way. Second, the design of the experiment assumes that page views are grouped by information need. In our experiments, all users searched for the same thing for each query-document pair. We therefore aggregate data across all users by computing the average fragment weight over them. In the real world, users can issue the same query and view the same document for different purposes. So for each query we would need to group users by intent before we can apply these methods and aggregate behavior data. Third, to deploy this technology in a real system, a way to collect user behavior data is needed. Browser plugins, ad networks, and hit counters that collect data on user clicks already exist. Their functionality could be extended to collect data on mouse movements as well.
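
    The aggregation over users mentioned above amounts to averaging the predicted fragment weights per query-document pair. A minimal sketch, assuming a hypothetical log format of `(query, url, fragment_index, weight)` tuples with one tuple per fragment per page view:

```python
from collections import defaultdict


def aggregate_fragment_weights(views):
    """Average fragment weights over all users for each query-document pair.

    views: iterable of (query, url, fragment_index, weight) tuples.
    Returns a dict mapping (query, url, fragment_index) to the mean weight.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for query, url, fragment_index, weight in views:
        key = (query, url, fragment_index)
        sums[key] += weight
        counts[key] += 1
    return {key: sums[key] / counts[key] for key in sums}
```

    If users with different intents issue the same query, the key would have to include an intent group as well, which is exactly the grouping problem described in the paragraph above.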

    Among other applications of the method, the following can be noted:

    • An improved click model through prediction of P(Examine | Click = 0). If we track only clicks, we cannot say with certainty why the user did not click a link in the search results. They may have read the snippet and decided that the document is irrelevant, or they may simply not have seen the document. With mouse tracking this problem disappears, and we can significantly improve the prediction of document relevance.
    • User behavior on mobile devices.
    • Classification of mouse movements by intent. With a more complex model, one can learn to distinguish random mouse movements from intentional ones, when the user really uses the cursor as a reading aid. In addition, moments of inactivity can be taken into account as an additional signal of interest in a fragment.


    After the talk there was a Q&A session, which can be viewed in the video.
