How we learned to predict a user’s request and accelerated the load of search results

    Search hints (sjest) is not only a user service, but also a very powerful language model that stores billions of search queries, supports fuzzy search, personalization, and much more. We learned how to use the sujest in order to predict the final query of the user and load the search results before clicking the "Find" button.


    The introduction of this technology - the pre-renderer - required many interesting solutions in mobile development, the development of search runtime, logs, and metrics. And, of course, we needed a cool classifier that determines whether to load a search query in advance: this classifier must strike a balance between download acceleration, additional traffic and Search load. Today I will talk about how we managed to create such a classifier.




    1. Will the idea work?


    In research tasks, it is rarely clear in advance that a good solution exists. And in our case, we, too, initially did not know what data was needed in order to build a sufficiently good classifier. In such a situation, it is useful to start with a few very simple models that will allow you to evaluate the potential benefits of the development.


    The simplest idea turned out to be the following: we will load the search results at the first prompt from the search suggestion; when the prompt changes, we throw out the previous download and start downloading a new candidate. It turned out that such an algorithm works well and almost all requests can be preloaded, however, the load on search backends increases accordingly and user traffic increases accordingly. It is clear that such a solution cannot be implemented.


    The following idea was also quite simple: it is necessary to load probable search hints not in all cases, but only when we are confident enough that they are really needed. The simplest solution will be a classifier that works directly in runtime according to the data that the sajest already has.


    The first classifier was built using only ten factors. These factors depended on the probability distribution over the set of prompts (idea: the greater the “weight” of the first prompt, the more likely it will be entered) and the length of the input (idea: the less letters the user has to enter, the safer to preload). The beauty of this classifier was also that in order to build it, it was not necessary to release anything. The necessary factors for the candidate can be collected by making one http-request in a sajest daemon, and the targets are based on the simplest logs: the candidate is considered “good” if the user's total request completely matches it. It was possible to assemble such a pool, train several logistic regressions and build a scatter diagram in just a few hours.


    The metrics for the pre-renderer are not completely arranged as in the usual binary classification. Only two parameters are important, but this is not accuracy or completeness.


    Let be $ r $ - total number of requests, $ p $ - the total number of all prerenders, $ ep $- the total number of successful prerenders, i.e. those that ultimately match user input. Then two interesting characteristics are calculated as follows:


    $ overhead = \ frac {p + r - ep} {r} - 1 $


    $ efficiency = \ frac {ep} {r} $


    For example, if exactly one prerender per request is made, and half of the prerenders are successful, then the prerender's efficiency will be 50%, and this means that it was possible to speed up the loading of half of the requests. At the same time, for those requests in which the prerender worked successfully, additional traffic was not created; for those requests in which the prerender failed, one additional request had to be set; so the total number of requests is one and a half times larger than the original, “extra” requests 50% of the original number, therefore$ overhead = 0.5 $.


    In these coordinates, I drew the first scatter plot. He looked like this:



    These values ​​contained a fair amount of assumptions, but at least it was already clear that, most likely, a good classifier will work out: speeding up loading for several tens of percent of requests, increasing the load by several tens of percent is an interesting exchange.


    It was interesting to watch how the classifier works. Indeed, it turned out that the length of the request turned out to be a very strong factor: if the user has already almost entered the first hint, and it is at the same time quite probable, you can prefetch. So the prediction of the classifier increases sharply towards the end of the query.


    margin          prefix                 candidate
    -180.424        м                      майл
    -96.096         мо                     мос ру
    -67.425         мос                    мос ру
    -198.641        моск                   московское время
    -138.851        моско                  московское время
    -123.803        москов                 московское время
    -109.841        московс                московское время
    -96.805         московск               московское время
    -146.568        московска              московская олимпиада школьников
    -135.725        московская             московская олимпиада школьников
    -125.448        московская             московская олимпиада школьников
    -58.615         московская о           московская олимпиада школьников
    31.414          московская об          московская область
    -66.754         московская область     московская область карта
    1.716           московская область з   московская область запись к врачу

    A prerender will be useful even if it occurred at the moment of entering the very last letter of the request. The fact is that users still spend some time clicking on the “Find” button after entering the request. This time can also be saved.


    2. First implementation


    After some time, it was possible to assemble a fully working design: the application went for search hints into the demon of the sajest. He sent, among other things, information about whether to preload the first clue. The application, having received the appropriate flag, downloaded the results and, if the user input ultimately coincided with the candidate, rendered the results.


    At that moment, the classifier acquired new factors, and the model was no longer logistic regression, but quite a CatBoost . Initially, we rolled out fairly conservative thresholds for the classifier, but even they allowed us to instantly load almost 10% of search results. It was a very successful release, because we managed to significantly shift the minor quantiles of the search load speed, and users noticed this and began to statistically significantly return to the search: a significant acceleration of the search affected the frequency of users performing search sessions!


    3. Further improvements


    Although the implementation was successful, the solution was still extremely imperfect. A careful study of the operation logs showed that there are several problems.


    1. The classifier is unstable. For example, it may turn out that by the Yandex prefix it predicts the Yandex query, by the Yandex prefix it does not predict anything, and by the Yandex prefix it again predicts the Yandex query. Then our first straightforward implementation makes two requests, although it could well manage with one.


    2. The prerender does not know how to process word-for-word hints. Clicking on a word-of-word prompt leads to the appearance of an additional space in the request. For example, if a user entered Yandex, his first prompt would be Yandex; but if the user used the word-by-word hint, the input will already be the string “Yandex”, and the first prompt will be “Yandex cards”. This will lead to disastrous consequences: the already downloaded Yandex request will be thrown out, instead the Yandex card request will be loaded. After that, the user clicks on the “Find” button and ... will wait for the full delivery of the issuance at the request of Yandex.


    3. In some cases, candidates have no chance of becoming successful. A search for inaccurate coincidence works in the sujest, so the candidate can, for example, contain only one word from the ones entered by the user; or the user can make a typo, and then the first prompt will never match exactly with its input.



    Of course, leaving a prerender with such imperfections was a shame, even if it was useful. I was especially offended by the problem with word-for-word clues. I consider the introduction of word-of-word tips in Yandex mobile search to be one of my best implementations for all the time I worked in the company, but here the prerender does not know how to work with them! Shame, not otherwise.


    First of all, we fixed the problem of classifier instability. The solution was chosen very simple: even if the classifier returned a negative prediction, we do not stop loading the previous candidate. This helps to save additional requests, because when the same candidate returns next time, you will not need to download the corresponding issue again. At the same time, this allows you to load results faster, as the candidate is downloaded at the moment when the classifier first worked for him.


    Then came the turn of proverbial clues. A sagest server is a stateless daemon, so it’s hard to implement the logic associated with processing the previous candidate for the same user. To search for prompts simultaneously for a user request and for a user request without a trailing space means actually double the RPS by a sajest daemon, so this was not a good option either. As a result, we did this: the client passes in a special parameter the text of the candidate, which is loading right now; if this candidate, up to spaces, is similar to user input, we give it away, even if the candidate for the current input has changed.


    After this release, it finally became possible to enter queries using word-for-word prompts and enjoy prefetch! It's funny enough that before this release, only those users who finished entering their request using the keyboard, without a sjest, used the prefetch.


    Finally, we sorted out the third problem with the help of ML: added factors about the sources of hints, coincidence with user input; in addition, thanks to the first launch, we were able to collect more statistics and learn from monthly data.


    4. What is the result


    Each of these releases generated tens of percent increase in instant downloads. The best part is that we were able to improve the prerender performance by more than two times, practically without touching the part about machine learning, but only improving the physics of the process. This is an important lesson: often the quality of the classifier is not the bottleneck in the product, but its improvement is the most interesting task and therefore development is distracted by it.


    Unfortunately, instantly loaded SERPs are not a complete success; the loaded output still needs to be rendered , which does not happen instantly. So we still have to work to better convert instant data downloads to instant renderings of search results.


    Fortunately, the implemented implementations already allow us to talk about the pre-renderer as a fairly stable feature; we additionally tested the implementations described in clause 2: all of them together also lead to the fact that users themselves begin to make search sessions more often. This is another useful lesson: significant improvements in the speed of the service can statistically significantly affect its retention.


    In the video below you can see how the prerender is now working on my phone.



    Also popular now: