Machine Learning at Dodo: how to launch a new direction if you are a developer

    Below the cut is the story of how machine learning got started at Dodo. Spoiler: I launched it. There are no hardcore technical details here; I will devote a separate article to them. Today is more about motivation and the support of colleagues.



    Preparation


    I came across machine learning three times before anything worthwhile came of it.

    Russian school


    I first came across machine learning at HSE, where I was getting a second degree in Big Data Systems at the time I joined Dodo. Having brushed against this hugely hyped topic only in passing, I did not understand why I had spent three years of my life on it, let alone how it could be useful to the company. I was not ready for this challenge back then.

    Czech voyage


    The second time was in Prague, at a closed Microsoft machine learning hackathon. Together with people from other companies, we worked on forecasting demand at Dodo during holidays and peak days. I came back with a ready-made model that predicts demand. It was after this hackathon that I started to think I could apply what I had learned in the company. It was not that simple.

    So you have a model in Jupyter. So what? How do you use it? Every attempt to explain its value to the business ran into a harsh reality: it is already obvious that there will be many orders on holidays and peak days. Mature pizzerias can predict sales from last year's data, and new ones had enough other problems to deal with. We postponed our attempts to develop machine learning. But the idea that we could do more with data was firmly stuck in my head and would not leave. Now I was ready for the challenge, but the company was not.

    The American dream


    The third encounter proved fateful. Our team got a difficult but interesting task: to develop a customized pizza module for the USA, where a customer can order a pizza with any set of ingredients and create their own recipe. Everything in the project had to be worked out, from changes in the database architecture to the client code on the site. We threw ourselves into the task and built a product that I consider a real victory. The most important review arrived in Slack from Alena, our CEO in the United States.



    We built the module, but I saw a scaling problem. What if the feature appears not in one or two pizzerias in the States but across a large chain? How would we manage such a product and plan stock? I decided that this case could prove the need to develop machine learning at Dodo. I felt that this time both the company and I were ready to launch a new direction.

    One on one with the machines


    In the background, I started analyzing the sales of the American customized pizza. Clustering algorithms showed that all the recipes created by users are based on six basic sets of ingredients plus a couple of random ones. Even a simple report based on this result would make it possible to forecast sales and plan inventory semi-manually. Thanks to the lack of bureaucracy and the company's ability to change course on the go, we were given the green light to pursue this direction.

    The technical director and I discussed more than once that I would need to leave my current team and start developing the new direction to show that we needed it. I had to dive into a new field at a fast pace. I understood that if it did not work out, there were two options: return to development in another Dodo team, or update my resume on HH and look for a new job. I wanted neither. I stayed in this state for about three months, until I latched onto the additional sales module.

    First project


    Another spoiler: it turned out that to launch ML you do not need to tackle something complicated. Obvious, isn't it? But it is very hard to see at the start of the journey.

    The module that suggests adding an extra product to the order was not directly owned by anyone, which meant I could do whatever I wanted with it. The cherry on the cake was the opportunity to increase sales with more personalized offers. Previously, the module worked simply: if a pizza was added to the order, the drinks category was shown in additional sales; if a pizza and a drink, then desserts; and so on.
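    The old rule chain can be sketched in a few lines of Python. This is only an illustration: the function name `suggest_category` and the category labels are hypothetical, and only the two rules named above are encoded.

```python
def suggest_category(cart_categories):
    """Return the category to show in additional sales, or None.

    Hypothetical sketch of the old static rules; only the two rules
    mentioned in the text are encoded here.
    """
    cart = set(cart_categories)
    if {"pizza", "drink"} <= cart:
        return "dessert"   # pizza and drink in the order -> offer desserts
    if "pizza" in cart:
        return "drink"     # pizza only -> offer drinks
    return None            # any further rules are not described in the text
```

    Every customer with the same order composition gets the same category, which is exactly the limitation that personalized offers were meant to remove.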

    The willingness of a huge number of people to help showed once again that I work in a company where support can come from absolutely anyone. I spent hours working on the data and the additional offers with a colleague from marketing. We managed to cluster all users by taste preferences and loyalty and, for each group, to make static offers based on the top products in the cluster.
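    The "top products in the cluster" step might look roughly like the sketch below. It assumes the cluster labels are already computed (the clustering itself is not shown); the function name, the cluster labels, and the toy data are all invented for illustration.

```python
from collections import Counter, defaultdict

def top_products_by_cluster(user_cluster, orders, n=3):
    """Return the n most frequently ordered products for each cluster.

    user_cluster: dict mapping user id -> cluster label (assumed precomputed)
    orders: iterable of (user id, product) pairs
    """
    counts = defaultdict(Counter)
    for user, product in orders:
        cluster = user_cluster.get(user)
        if cluster is not None:            # skip users without a cluster
            counts[cluster][product] += 1
    return {cluster: [product for product, _ in counter.most_common(n)]
            for cluster, counter in counts.items()}

# Toy data, invented for illustration only.
user_cluster = {"u1": "meat lovers", "u2": "meat lovers", "u3": "sweet tooth"}
orders = [("u1", "cola"), ("u2", "cola"), ("u1", "wings"),
          ("u3", "cheesecake"), ("u3", "cheesecake"), ("u3", "cola")]
offers = top_products_by_cluster(user_cluster, orders, n=2)
```

    The resulting lists are static: they change only when the clusters or the counts are recomputed.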

    Figures and proofs


    I hooked up logging for additional products and launched the new offers on a sample of 2 million users.

    A sample of users covers only a small part of sales. We needed to reach unauthenticated and new customers as well. I dug through plenty of articles and literature on collaborative filtering and various algorithms for making offers to users. The idea of recommendations based on the products in the basket won. Item-based recommendations and the cosine similarity measure formed the basis of a new model, simple but already working.

    In December, we launched the item-based recommendations module. The statistics showed that buyers may indeed be interested in completely different products, not just drinks. Perhaps it was after this that Dodo came to believe that data and machine learning would let it compete in the crowded markets of the future.
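    As a rough illustration of item-based recommendations with cosine similarity, here is a minimal Python sketch. It is not the production model: it assumes each past order is reduced to a set of product names (binary vectors), and it ranks candidates by summed similarity to the items already in the basket. All names and data are hypothetical.

```python
from collections import defaultdict
from itertools import combinations
from math import sqrt

def item_similarities(baskets):
    """Cosine similarity between items over binary basket vectors.

    For binary vectors: cos(i, j) = co_count(i, j) / sqrt(count(i) * count(j)).
    """
    item_count = defaultdict(int)
    pair_count = defaultdict(int)
    for basket in baskets:
        items = sorted(set(basket))
        for item in items:
            item_count[item] += 1
        for i, j in combinations(items, 2):
            pair_count[(i, j)] += 1
    sims = defaultdict(dict)
    for (i, j), co in pair_count.items():
        s = co / sqrt(item_count[i] * item_count[j])
        sims[i][j] = s
        sims[j][i] = s
    return sims

def recommend(basket, sims, n=2):
    """Rank items not yet in the basket by summed similarity to its items."""
    in_basket = set(basket)
    scores = defaultdict(float)
    for item in in_basket:
        for other, s in sims.get(item, {}).items():
            if other not in in_basket:
                scores[other] += s
    # Highest score first; ties are broken alphabetically for determinism.
    return sorted(scores, key=lambda x: (-scores[x], x))[:n]

# Toy order history, invented for illustration only.
history = [
    ["pepperoni", "cola"],
    ["pepperoni", "cola", "cheesecake"],
    ["pepperoni", "wings"],
    ["margherita", "juice"],
]
sims = item_similarities(history)
suggestions = recommend(["pepperoni"], sims)
```

    Because the scores depend only on what is in the basket, this approach needs no user history, which is what makes it usable for unauthenticated and new customers.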

    Some statistics.


    Top 10 best-selling products on site.


    Top 10 best-selling products in mobile app.


    Weekly sales growth.

    Technical trailer


    Below are some technical details on why the model is based on the cosine similarity measure. This is a preview of an article that will be published in a couple of months. If you don't like mathematics, feel free to jump to the last section.

    The source table below shows, for each user, the number of orders containing each product. We can measure how similar one user's purchases are to another's by calculating the distance between the user vectors.


    Customer sales table for products.

    The distance depends on the chosen metric. The Euclidean distance is sensitive to the magnitude of the vectors:

    $$d(a, b) = \sqrt{\sum_{i=1}^{n} (a_i - b_i)^2}$$

    where a and b are two different client vectors from the table and n is the number of products. Let's see how this distance behaves on an abstract example.

    Suppose we look at the history of three customers - a, b, and c. Let's build a matrix of their purchases.


    Having calculated the Euclidean distances between customers, we get the following values:

    d (a, b) = 16.22;
    d (b, c) = 13.38;
    d (a, c) = 13.64.

    These values indicate that clients b and c are closest to each other. But the source data shows the opposite. Customers a and b both order mostly Pepperoni and only occasionally other products, while client c prefers Supreme pizza. We can conclude that the magnitude of the vectors distorts the distances between customers. The cosine similarity measure takes into account only the angle between the vectors and ignores their magnitude:

    $$\operatorname{sim}(a, b) = \cos\theta = \frac{a \cdot b}{\|a\| \, \|b\|} = \frac{\sum_{i=1}^{n} a_i b_i}{\sqrt{\sum_{i=1}^{n} a_i^2} \, \sqrt{\sum_{i=1}^{n} b_i^2}}$$

    Calculating the similarity using this formula (values closer to 1 mean more similar), we get:
    sim(a, b) = 0.9183;
    sim(b, c) = 0.5848;
    sim(a, c) = 0.7947.

    The highest similarity is between clients a and b: they prefer the same set of products, regardless of the difference in the number of orders placed. This agrees with our expert judgment about whose preferences are closest.
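    The same effect can be reproduced with a short Python sketch. The order counts below are made up for illustration and are not the values from the article's table; the helper names are hypothetical too.

```python
from math import sqrt

def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))

def cosine_similarity(u, v):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = sqrt(sum(x * x for x in u))
    norm_v = sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v)

# Hypothetical order counts per product: [Pepperoni, Supreme, other].
# These are NOT the values from the article's table.
customers = {
    "a": [20, 2, 1],   # heavy Pepperoni buyer
    "b": [4, 1, 0],    # light Pepperoni buyer
    "c": [1, 5, 0],    # prefers Supreme
}
pairs = [("a", "b"), ("b", "c"), ("a", "c")]
distances = {p: euclidean(customers[p[0]], customers[p[1]]) for p in pairs}
similarities = {p: cosine_similarity(customers[p[0]], customers[p[1]]) for p in pairs}

closest_by_distance = min(distances, key=distances.get)      # smallest distance
closest_by_cosine = max(similarities, key=similarities.get)  # largest similarity
```

    With these made-up counts, the Euclidean distance pairs b with c simply because both order little, while cosine similarity pairs a with b because their orders have the same proportions.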

    This is a trailer, details in two months.

    Find your own


    We are now building a team of specialists in data storage, machine learning model development, and putting models into production. But most importantly, we now understand much better why we need all this. We are free to do really cool things, from an intelligent logistics and inventory planning system to fantastic ideas for automating pizzerias with computer vision.

    Believe in yourself and your strengths, even if the result is not yet visible on the horizon. I would like to end the article with someone else's thought - a quote from Max Weber's lecture to students of the University of Munich: “Yearning and waiting alone will get you nowhere; you need to act differently - you need to turn to your work and meet the ‘demand of the day’, both as a human being and professionally. And this demand will be simple and clear if everyone finds his own demon and obeys this demon, who weaves the thread of his life.” Find yours.
