How data sharing affects the quality of recommendations

    Hello, Habr!

    When we connect a new client to the platform, we pay special attention to verifying the integration, and we keep monitoring its status afterwards. Why is this critical? Because data collection is the basis for generating quality recommendations.

    A recommender system rests on several important components: data collection, storage, processing, serving recommendations, and growth hacking. Add to that the hardware that supplies computing power for the algorithms, and the front-end layout work. That gives at least seven points on which the quality of recommendations depends, not to mention an expensive team of analysts. Whether it is an external service or an online store's in-house recommendation system, it has to cover all of these points and work well at every stage.

    Algorithms are often treated as a black box, and there is an opinion that open source software can quickly get you 50-70% of a recommendation system's effectiveness. But software is only one of the seven points; even if it delivers 50-70%, that is 50-70% of a single point. In other words, without the rest it amounts to roughly 7-10% of the system's overall efficiency.
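    The 7-10% figure follows from simple arithmetic if we assume, for illustration only, that all seven points contribute equally. The function name below is hypothetical.

```python
def overall_share(component_share: float, n_components: int = 7) -> float:
    """Share of total effectiveness covered by one component.

    Assumption: every component carries the same weight.
    """
    return component_share / n_components


low, high = overall_share(0.5), overall_share(0.7)
print(f"{low:.1%} to {high:.1%}")  # 7.1% to 10.0%
```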

    One way or another, we talk to almost every technology company in the Russian market and in the other countries where we operate, and we can confidently say that our approach to ensuring the quality of data exchange is one of the most thoroughly thought out in the world.

    Over the years, we have accumulated considerable expertise at every point. We often talk about the results our algorithms deliver and the successes the Growth Hacker team achieves for stores, but today we want to focus on the very first point: data collection.

    If problems arise at the collection stage, they directly affect every later stage of building recommendations and can destroy almost all the value the online store could have received. That is why we place such high demands on the quality and completeness of the data.

    In our practice, we have seen various cases where incorrect or incomplete data was transmitted because of peculiarities of internal processes, technical failures, or simple inattention, and this hurt the quality of the service. For example, a product or category ID changes several times a week, products are duplicated or, conversely, not all of them are transferred. Such factors strongly affect the accuracy of recommendations.

    We offer our clients not only our technology but also our expertise, including quality control of data exchange. So below we cover several important parameters that must be taken into account when transferring data to a recommendation system.

    Retail Rocket Integration

    Integration with the Retail Rocket system is done by installing tracking codes: scripts are placed on different pages of the site according to certain rules. Installing the codes themselves takes a couple of hours, but because of the high demands on data quality and completeness the process can take longer. For example, an online store may prepare its YML file to the standard requirements of other services, so the file lacks details that matter for building recommendations. Since data collection is so important for all later stages, we work through each integration point carefully.

    Immediately after the tracking codes are installed, our experts check that the scripts work correctly on all pages and that the transmitted data is complete and of good quality. Sometimes codes are missing from some pages, some events (additions to the basket, orders) are not tracked, or the feed does not include all products, their properties, and other important parameters. For our team this is a full process in which specialists, from account managers to technical support, monitor the installation of trackers and the data transfer at every stage.
    [Image: part of our integration kanban board that every client goes through]

    Integration Check Options

    During the integration, we check dozens of parameters. Here are the most important ones.

    The completeness of the product base

    Correct and complete product and category data, consistent with the products and categories on the site, is one of the most important parameters for building recommendations.
    The product catalog transmitted in the XML feed must exactly match the structure of the site menu. To verify this, the manager picks a number of random products from different categories and makes sure each one sits in the same nested categories as on the online store's website.
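    The spot check described above can be sketched in code. This is a minimal illustration, not the platform's actual tooling: it assumes a feed in the usual YML layout (category elements with an id and an optional parentId, offer elements with a categoryId child), and the sample feed and the category_path helper are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical two-level catalog with a single offer.
SAMPLE_FEED = """<yml_catalog><shop>
  <categories>
    <category id="1">Shoes</category>
    <category id="2" parentId="1">Sneakers</category>
  </categories>
  <offers>
    <offer id="sku-1" available="true"><categoryId>2</categoryId></offer>
  </offers>
</shop></yml_catalog>"""


def category_path(feed_xml: str, offer_id: str) -> list[str]:
    """Return category names from the root down to the offer's category."""
    root = ET.fromstring(feed_xml)
    # Map category id -> (name, parent id or None).
    cats = {
        c.get("id"): ((c.text or "").strip(), c.get("parentId"))
        for c in root.iter("category")
    }
    offer = next(o for o in root.iter("offer") if o.get("id") == offer_id)
    path, cat_id = [], offer.findtext("categoryId")
    while cat_id is not None:  # walk up through parentId links
        name, cat_id = cats[cat_id]
        path.append(name)
    return path[::-1]


print(category_path(SAMPLE_FEED, "sku-1"))  # ['Shoes', 'Sneakers']
```

    The resulting path can then be compared with the breadcrumb shown on the product page of the site.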

    Besides the structure, the number of products and categories in the file should match the number on the site. Special reports let us track whether every product published on the site is included in the YML file. The list of products missing from the feed is generated on a separate page, and the account manager can send it to the client right away.
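    A completeness report of this kind boils down to a set difference. The sketch below is only an illustration under assumptions: the site's product IDs are taken as already collected (for example, by a crawler), the feed follows the usual YML layout with offer elements carrying an id attribute, and both function names are hypothetical.

```python
import xml.etree.ElementTree as ET


def feed_offer_ids(feed_xml: str) -> set[str]:
    """Collect the id attribute of every offer element in the feed."""
    root = ET.fromstring(feed_xml)
    return {offer.get("id") for offer in root.iter("offer")}


def missing_from_feed(site_ids: set[str], feed_xml: str) -> list[str]:
    """Product IDs that exist on the site but are absent from the feed."""
    return sorted(site_ids - feed_offer_ids(feed_xml))


feed = (
    "<yml_catalog><shop><offers>"
    '<offer id="a1"/><offer id="a2"/>'
    "</offers></shop></yml_catalog>"
)
print(missing_from_feed({"a1", "a2", "a3"}, feed))  # ['a3']
```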

    It is also important not to remove an item from the feed when it goes out of stock. Contextual advertising, search links, and other sources can lead visitors to out-of-stock products. A user who lands on the page of an out-of-stock product already has a formed demand, i.e. is ready to buy, and can be shown the most similar alternatives. So it is important not only to keep unavailable products in the feed, but also to pass as many of their parameters as possible so that the recommendation algorithms can select alternative products.
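    One way to catch products that vanish from the feed instead of being marked unavailable is to compare two feed snapshots. This is a sketch under assumptions: each snapshot is represented as a hypothetical mapping from offer ID to its availability flag, and the function name is made up for the example.

```python
def dropped_offers(previous: dict[str, bool], current: dict[str, bool]) -> list[str]:
    """IDs present in the previous snapshot but missing from the current one."""
    return sorted(set(previous) - set(current))


yesterday = {"a1": True, "a2": True, "a3": False}
today = {"a1": True, "a3": False}
# "a2" disappeared; ideally it would still be listed with availability False.
print(dropped_offers(yesterday, today))  # ['a2']
```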

    Consideration of regional parameters

    For online stores that operate in several cities, it is important to transfer data for each region. Prices and availability may differ between regions, so this data should be taken into account and transmitted in the feed.
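    To make the idea concrete, here is one possible way to model per-region data. It is a sketch only: the RegionalOffer type, the region keys, and the prices are hypothetical and do not describe the platform's feed format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RegionalOffer:
    price: float
    available: bool


# Hypothetical catalog: product id -> region -> regional price and stock.
CATALOG = {
    "sku-1": {"moscow": RegionalOffer(4990.0, True), "kazan": RegionalOffer(4590.0, False)},
    "sku-2": {"moscow": RegionalOffer(1200.0, True)},
}


def available_in_region(catalog: dict, region: str, candidates: list[str]) -> list[str]:
    """Keep only candidates that are sold and in stock in the given region."""
    result = []
    for pid in candidates:
        offer = catalog.get(pid, {}).get(region)
        if offer is not None and offer.available:
            result.append(pid)
    return result


print(available_in_region(CATALOG, "moscow", ["sku-1", "sku-2"]))  # ['sku-1', 'sku-2']
print(available_in_region(CATALOG, "kazan", ["sku-1", "sku-2"]))   # []
```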

    This can also matter for marketing: in some cities customers care more about discounts, in others about bonus points. The regional parameter lets you tune marketing campaigns accordingly.

    Passing product properties

    Another important kind of data that must be transmitted for good recommendations is product properties, such as color, size, and brand.

    Two points are worth noting here. First, in some industries certain product properties are crucial for making the right recommendations. For a shoe store that property is size: recommending a very similar pair in size 40 instead of size 36 is unlikely to interest the buyer. To address this, in October 2017 we improved the personalization mechanisms for fashion stores so that recommendation blocks take the user's clothing or shoe size into account.
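    In its simplest form, taking size into account means filtering candidates by the sizes they are offered in. The snippet is a deliberately naive sketch, not the personalization mechanism itself; the record layout and the function name are assumptions.

```python
def filter_by_size(candidates: list[dict], user_size: str) -> list[dict]:
    """Keep only candidates that are offered in the user's size."""
    return [c for c in candidates if user_size in c["sizes"]]


shoes = [
    {"id": "boot-1", "sizes": {"36", "37"}},
    {"id": "boot-2", "sizes": {"40"}},
]
print([c["id"] for c in filter_by_size(shoes, "36")])  # ['boot-1']
```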

    Second, if a store treats each size and color as a separate product, other sizes or colors of the same product may show up in similar-product recommendations, because to the algorithm they look as similar as possible. For example, the very same ring in a different size is recommended as an alternative to a ring of a given size. In the Retail Rocket platform this is solved with group products.
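    The effect of grouping can be illustrated as follows. This is a sketch of the general idea, not a description of how the platform implements group products: it assumes each variant carries a hypothetical group_id field shared by all sizes and colors of one product.

```python
def drop_same_group(viewed: dict, candidates: list[dict]) -> list[dict]:
    """Remove variants of the viewed product and keep one candidate per group."""
    seen = {viewed["group_id"]}
    result = []
    for c in candidates:
        if c["group_id"] not in seen:
            seen.add(c["group_id"])
            result.append(c)
    return result


viewed = {"id": "ring-a-17", "group_id": "ring-a"}
candidates = [
    {"id": "ring-a-18", "group_id": "ring-a"},  # same ring, other size
    {"id": "ring-b-17", "group_id": "ring-b"},
    {"id": "ring-b-18", "group_id": "ring-b"},  # second variant of ring-b
    {"id": "ring-c-17", "group_id": "ring-c"},
]
print([c["id"] for c in drop_same_group(viewed, candidates)])
# ['ring-b-17', 'ring-c-17']
```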

    Change of product ID or its properties

    If a product's ID or other properties change too often, for example when the product keeps moving from category to category, the quality of recommendations can drop noticeably. If the product was in one category when related-product recommendations were calculated and has moved to another by the time recommendations are requested, the results may be irrelevant, and the algorithm will need some time to rebuild the relationships between products and categories.
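    Such moves are easy to detect by comparing two catalog snapshots. The sketch below assumes each snapshot is a plain mapping from product ID to category ID; the representation and the function name are hypothetical.

```python
def moved_products(before: dict[str, str], after: dict[str, str]) -> dict[str, tuple[str, str]]:
    """Map product id -> (old category, new category) for products that moved."""
    return {
        pid: (before[pid], after[pid])
        for pid in before.keys() & after.keys()  # products present in both snapshots
        if before[pid] != after[pid]
    }


monday = {"p1": "shoes", "p2": "bags", "p3": "hats"}
tuesday = {"p1": "shoes", "p2": "sale", "p4": "hats"}
print(moved_products(monday, tuesday))  # {'p2': ('bags', 'sale')}
```

    Counting how often the same product appears in such a report over a week gives a rough measure of how unstable the catalog is.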

    In addition, such changes can break business rules, for example rules that say not to recommend products from one category alongside products from another. When product or category IDs change, the business rules stop working and have to be rewritten before recommendations are built properly again.
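    A simple guard is to validate rules against the current set of category IDs after every feed update. This is a sketch under assumptions: a rule is represented as a hypothetical pair (source category, category that must not be recommended with it), which is not necessarily how the platform stores its rules.

```python
def broken_rules(rules: list[tuple[str, str]], category_ids: set[str]) -> list[tuple[str, str]]:
    """Return rules that reference a category ID missing from the current feed."""
    return [
        (source, excluded)
        for source, excluded in rules
        if source not in category_ids or excluded not in category_ids
    ]


rules = [("phones", "washing-machines"), ("kids", "alcohol")]
current_categories = {"phones", "washing-machines", "kids", "drinks-18"}  # "alcohol" was renamed
print(broken_rules(rules, current_categories))  # [('kids', 'alcohol')]
```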

    Analytics data reconciliation

    Another important indicator of whether the data is complete and transmitted correctly is how closely the numbers in our platform's sales report match those in the client's web analytics system (for example, Google Analytics). We compare the number of orders and the revenue; the difference should be minimal.

    This tells us whether all tracking codes work correctly and whether all data reaches the system accurately. If discrepancies appear, we start digging for the cause. For example, the tracking code may be missing from some page, which makes the numbers diverge. In other words, this is a kind of marker that immediately shows when something is wrong.
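    The comparison can be expressed as a relative difference per metric. The following is a minimal sketch: the 5% tolerance, the metric names, and the function names are assumptions chosen for the example, not thresholds the platform is known to use.

```python
def relative_diff(platform: float, analytics: float) -> float:
    """Absolute difference relative to the analytics value."""
    if analytics == 0:
        return 0.0 if platform == 0 else float("inf")
    return abs(platform - analytics) / analytics


def discrepancies(platform: dict, analytics: dict, tolerance: float = 0.05) -> dict:
    """Metrics whose relative difference exceeds the (assumed) tolerance."""
    result = {}
    for metric in ("orders", "revenue"):
        diff = relative_diff(platform[metric], analytics[metric])
        if diff > tolerance:
            result[metric] = diff
    return result


platform_report = {"orders": 98, "revenue": 80_000}
analytics_report = {"orders": 100, "revenue": 100_000}
print(discrepancies(platform_report, analytics_report))  # revenue is flagged, orders are not
```

    In this example the order counts are within tolerance but revenue differs by about 20%, which would be the signal to start looking for a missing tracking code.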

    Site and email synchronization

    This item is important for several reasons. First, this synchronization shows us how effectively trigger emails will work for a particular online store, for example whether an abandoned-browse email can be sent to a user whose email address the store has.

    We also monitor the performance of trigger emails daily. If the chart shows that fewer emails were sent today than the average for a given period, we check what happened.
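    Such a check can be automated with a simple comparison against the recent average. This is a sketch only: the 0.8 threshold and the function name are assumptions, and the history list must not be empty.

```python
from statistics import mean


def below_average(daily_sent: list[int], today: int, threshold: float = 0.8) -> bool:
    """True if today's count is under `threshold` times the recent daily average."""
    return today < threshold * mean(daily_sent)


history = [1000, 1100, 900]  # emails sent on previous days, average 1000
print(below_average(history, 700))  # True
print(below_average(history, 950))  # False
```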

    Second, to learn as much as possible about a user, it is important to link their profile on the site with their email. Our trackers capture addresses on every page where the user can agree to receive newsletters (registration, login, checkout, subscription form, etc.) so that we collect as many customer emails as possible. This increases coverage and the number of emails sent and, as a result, the number of orders.

    In addition, when the user opens an email and follows a link, we can track their behavior on the site and their viewing and purchase history, and so make more accurate recommendations both on the site and in email newsletters.

    Real-time integration status tracking

    The main value of a personalization platform lies in its algorithms, but for them to work at full capacity it is necessary, among other things, to constantly check that the integration is correct on every page of the site.

    We often write that one of our distinguishing features is that we do not stop at the installation stage but keep trying to improve all metrics. This applies not only to A/B testing of various algorithms, but also to continuously checking that our recommendations are correct.

    We have developed a special interface that lets our specialists monitor the integration status in real time and respond quickly to any problems.

    For each possible situation, a report is automatically generated with a description of the necessary actions, which can be sent to an online store representative.

    For example, we have developed a separate subsystem that downloads images from the online store's website, resizes them, and stores them on our own CDN. In the integration status we monitor whether some images have problems, see the percentage of such images, and can immediately send the client an automatically generated report with tips for fixing the situation.
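    The percentage of problem images is a straightforward ratio. The sketch below assumes a hypothetical mapping from product ID to whether its image was downloaded successfully; it does not reflect the subsystem's real data model.

```python
def broken_image_share(statuses: dict[str, bool]) -> float:
    """Share (0..1) of product images that failed to download."""
    if not statuses:
        return 0.0
    failed = sum(1 for ok in statuses.values() if not ok)
    return failed / len(statuses)


download_ok = {"p1": True, "p2": True, "p3": False, "p4": True}
print(f"{broken_image_share(download_ok):.0%}")  # 25%
```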

    If there are technical failures on the client side, storing images on our CDN helps keep recommendations displayed on the site and in emails.


    No matter how good an external service is, and no matter how powerful and smart its algorithms are, without effective data exchange the results can be lower than the online store expects. The integration itself may also take longer if tracking codes are not installed everywhere or the YML file lacks the necessary data.

    The quality of data exchange directly affects the performance of any external service, but not every service pays enough attention to checking such details. We strive not only to make our algorithms efficient and keep improving them, but also to monitor the integration with the client's website.
