“Data Science, like mathematics and physics, is another way to learn about the world around us.”

Published on November 26, 2018

Hello, Habr! We are continuing our series of interviews with Newprolab alumni, in which they talk about how they moved into working with big data. The stories are all different and will interest anyone who is thinking about changing career trajectory or wondering how new knowledge can help with current tasks. Meet Oleg Homyuk, Head of R&D at Lamoda.

Oleg talked about his career path and values, why he chose Lamoda over a company in the Valley, his current projects and team, his most and least successful projects, his attitude to data science, and much more.

- Oleg, what was your professional path to Head of R&D at Lamoda?

- It seems to me that any professional path is the result of several causes and sometimes of accidents. Among those causes a few are fundamental: the way a person thinks, their life values, and, more generally, how they understand success. This understanding of success is the very vector we use as a compass when choosing a professional path.

In this sense everything worked out quite simply for me: at school I clearly showed ability in the exact sciences, constantly took part in olympiads, and even took 3rd place at the regional mathematics olympiad in the 9th grade. I have always found it very interesting to solve puzzles and look for patterns, and I still love racking my brains over a problem.

I liked studying at university too: I graduated with honors from Bauman Moscow State Technical University, specializing in "Optoelectronic Instrument Engineering". We were taught to design equipment that is fairly complex in terms of physics and microelectronics: thermal imagers, digital cameras, telescopes, even sniper sights, homing systems and night vision devices. I must say it is an incredibly interesting profession, and our teaching staff was stellar. It is real engineering at the junction of several fields of knowledge. Sometimes I feel a little sorry that I never got to work in this field.

- Why didn't it happen?

- In my final years I became a little disappointed in what I was doing. It turned out that demand for the profession in the country was low, everything was very local, the best engineers mostly worked in institute laboratories, few factories were able to build what the engineers designed, the equipment was outdated, and so on. There were some successes, of course, but the scale was not what I had imagined at the start of my studies. On top of that, researchers' pay was low: you could earn more moonlighting as a private taxi driver. There were, of course, other ways to earn money, such as working not quite officially for Japanese companies, naturally without any intellectual property rights.

At some point friends invited me to work at a fairly large Internet provider in the Moscow region, and I agreed. I was quite ready to learn new things; a technical education gives you a lot of room in that sense.

There I picked up new technical skills, got acquainted with quality management, and came into contact with world practice in this area. There is a standard for quality management, in fact a whole series of ISO 9000 standards, which propose practices for organizing processes in an enterprise, taking as an axiom the link between the quality of the final product and how well the company manages its internal processes. The basic idea is that if you do everything within the framework of the standard, the quality of your products improves continuously, because you measure, think, plan, do, and measure again every process that can affect that quality. This cyclical activity of continuous improvement even has a name: the Deming cycle. The topic somehow captivated me: it is management, yet very mathematical.

In the end I worked there for about 2 years and did various things: I managed a small department, built processes, and communicated a lot with the quality department.

Next came Yandex. At some point I saw that they were hiring project managers in the search quality department. It was not so much the vacancy itself that hooked me as the test task: describe an existing problem of Yandex search and work out how to solve it. Well, the word "quality" probably tripped a trigger in my head. I worked on the task for 10 hours straight and ended up with several pages. As a result, they contacted me, invited me for an interview and made an offer, which I gladly accepted.

While I worked at Yandex, everything fell into place for me: I saw how big data, mathematics, algorithms, and a focus on the user and the user's needs work together as a single mechanism, making it possible to create breakthrough products on the one hand and to make money on the other. I think what I took from Yandex was this fully formed desire to build data-driven products and do machine learning. Since then I have been actively developing in this direction.

- That was 2011, big data was not yet a popular topic, and there were hardly any training programs. Where did you study, what did you read?

- There was certainly not enough content available, but we were all hungry for knowledge. Coursera already existed and, by the way, so did ShAD. I listened to Vorontsov’s lectures 15 times and understood nothing. Many people went through this; it was an interesting era.

Little by little I began to move away from information retrieval: I liked working with data and was drawn to the new area of machine learning, and in 2012 I left the company.

- And what after Yandex?

- After Yandex came “Consultant Plus”. This time I chose a direction related to data analysis much more consciously. User behavior data was only just beginning to be collected at scale, so I joined that effort and started doing projects.

In general, it was an interesting time. Now there are plenty of machine learning libraries available, xgboost for example, but back then we wrote our own gradient boosting on trees in C++. Today, of course, not every team can afford that, and there is no need: everything is already implemented. That's the story.

- Did you write it on your own, or did you already have a team?

- There was already a team, yes, and a talented one at that. In my second year at Consultant Plus we were joined by a talented student from VMK (the computer science faculty of Moscow State University), who wrote his own implementation of boosting in a couple of months and started training models.

By that time we were already aiming to form a whole team of data scientists; we felt that the data held many new opportunities. Then a very timely opportunity came up to hire two ShAD graduates, who probably knew a little more than I did, plus developers to build the data storage. Everyone tried things out and worked mainly on a Hadoop cluster, although by modern standards there was not very much data.

At our peak there were probably about 9 of us, and we solved good problems. For example, we looked for spikes in user interest in various topics, which helped authors choose more effectively the topics worth writing new material on.

After that, I worked at Ezhome, a startup in Palo Alto. I was recommended there, by the way, by Mitya Kataev, with whom I studied on the “Big Data Specialist” program. His acquaintance Kirill Klokov, who worked as development director at Ezhome, was looking for a data scientist for the team. The company's main idea is to create an Uber-like experience for home services; as a starting point they chose yard care, from lawn mowing to cleaning and planting plants and trees. So I started working there as a Data Scientist: I really wanted to try myself in a startup, and I wanted to do hands-on work. I periodically get this analytical itch, a wish to do something meaningful myself, even though for a long time I have mostly dealt with organizational processes. I used to hope that the itch would subside someday, but no, I still try to “sit on two chairs”, that is, to develop both as a manager and as a specialist.

- Even now?

- Even now. Although at the moment, of course, there is not enough time for everything: a large team, a lot of management tasks. I make up for it on weekends, and there are plenty of opportunities for that now, Kaggle for example. I still want to do things with my own hands, even though I have people on the team who are clearly the best in their fields. But in my opinion, to manage data analysis projects effectively, the manager must have hard skills too. I am constantly learning. Right now, for example, I have decided to take a programming specialization, just to refresh my memory.

- Coming back to Ezhome: why did they need a data scientist? What tasks did you have?

- That is a good question. At the very beginning I asked what result they expected from me. The answer was along the lines of: "we don't quite understand ourselves yet, let's try." But a good task was found quickly. At the time there was a bottleneck in acquiring new customers, because each new request was processed by a person who measured the lot from a satellite image and tried to work out how much maintenance of such a lot should cost. There was an expert linear model that produced this estimate. Naturally they wanted to improve the quality of the forecast, but it was no longer clear how to take a larger number of parameters into account by expert judgment alone. That is where machine learning came in handy. We began to predict the time the gardener would spend, using the parameters of the lot. The lot parameters were taken from open sources, and the target values (the “teacher” labels) from historical data.
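To make the idea concrete, here is a minimal sketch of predicting service time from lot parameters using historical records. Everything in it is an assumption made for illustration: the features (lawn area and number of trees), the numbers, and the choice of a k-nearest-neighbors regressor. The interview does not say which features or model the Ezhome team actually used.

```python
from math import dist


def predict_minutes(history, lot, k=3):
    """Estimate service time as the mean over the k most similar past lots.

    history: list of (features, minutes) pairs from past jobs (the labels).
    lot:     feature tuple of the new lot.
    Note: features are used unscaled here; a real model would normalize them.
    """
    nearest = sorted(history, key=lambda record: dist(record[0], lot))[:k]
    return sum(minutes for _, minutes in nearest) / len(nearest)


# Hypothetical historical data: (lawn area in m^2, number of trees) -> minutes
history = [
    ((100.0, 2.0), 40.0),
    ((120.0, 3.0), 50.0),
    ((300.0, 10.0), 130.0),
    ((320.0, 12.0), 150.0),
    ((110.0, 2.0), 45.0),
]

# A new request: the three closest past lots are the small ones.
estimate = predict_minutes(history, (105.0, 2.0), k=3)
print(f"estimated time: {estimate:.1f} minutes")
```

An estimate like this could then be converted into an individual price, for example by multiplying it by an hourly rate.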

In the end the project took off: data was available for most incoming requests, so individual prices could be generated on the fly. Classic automation: robots work, people rest. Then I was invited to come to the head office in the Valley for a while, about a month and a half.

Before that I had worked remotely; almost the whole team was remote: the USA, India, Greece, Poland, Russia. The team was very cool, and it was a pleasure to work with them. I managed to do a lot of cool tasks, and in the end I was offered the position of team lead of the analysts. We made some infrastructure improvements that let us multiply the number of projects we could handle. Then they proposed merging with another team, which was developing software for routing workers: 5 thousand clients, 150 gardeners, and the question of how to visit them all in an optimal way. It was very exciting, and I now think that tasks that are more about computer science than about data are also very interesting.

- Alongside Lamoda you were considering several other offers. Why did you choose Lamoda? What was critical for you?

- Yes, there were several offers. What hooked me at Lamoda? A clear strategy, understandable expectations of me, trust, and a realistic resource plan in financial terms. That is, the task was clearly outlined to me: “we are here now, we need to get there, we want to develop R&D, we are ready to invest X, and we expect such-and-such an economic effect”. That's it. No musings about spaceships roaming the universe or robots replacing everyone. Plus an honest story about how the company was doing. Everything was transparent and clear, and that is what won me over, because I had the full sense that I was joining a team of people who are truly result-oriented and understand what they want. In addition, I was given carte blanche to develop this area. For me it was a kind of personal challenge: I had never had the opportunity to assemble such a large team. Now there are 17 of us and we are still growing.

- This is not the first company where you have built an R&D department from scratch and assembled a team. What are the first 5 steps you take when you come to a company?

- Lamoda had an R&D department before me; over 7 years several teams and managers had come and gone. In addition, about half of the current team was assembled from inside the company. So it was not quite from scratch.

The first five steps in a new company? The algorithm, I think, is not specific to R&D; in principle it applies whenever you join a new company in any kind of management position.

First, you need to understand the company's current strategy: what goals the company has and which KPIs will be used to measure progress.

Second, describe exactly how you can influence these KPIs given your competence or role in the company; there should be some set of available tools and ideas. Describe the needs of the business and the target state, that is, where we want to end up overall, and then evaluate the available tools. Machine learning is only one of them, and it is not optimal for every task.

Third, you need to audit the current state: people, competencies, processes, data, products, infrastructure, and especially infrastructure.

Only at the 4th step, after the audit of the current state, does it become possible to describe the strategy for moving from the current state to the target state. This is actually a lot of work, including many consultations with stakeholders, and it should result in several possible development scenarios. In my practice it has been useful to prepare at least 3: conservative, realistic, and aggressive in terms of resource costs. After that everything is easier: once a strategy is chosen, we draw up a roadmap, refine the resource estimate and get down to work.

- What is data science for you?

- Data Science is my favorite tool. It is an extremely exciting field; like mathematics and physics, it is another way to explore the world around you. I first felt this clearly at Yandex, when we were analyzing search queries and came to understand what needs users have, how they satisfy them, and what is happening in the world in general. That is, you can look at the world through the small crack of the data you work with. This is interesting and, in my opinion, no different from other methods of cognition; it is just another “channel”, think of it as a 7th sense. The same thing happened at “Consultant Plus”: we looked at what problems users were solving when they searched for court decisions, that is, what exactly worried people and which disputes they needed to resolve in court. The data we analyze at Lamoda is no less exciting. Especially when you find out that blouses and skirts are bought in different colors rather than the same one. A curious observation to carry with you through life. You can learn a lot about the world around you through data. That is why I say it is my favorite tool. On the one hand it is a tool of cognition, and on the other an active tool with which you can create something new.

- If we take business, what role do you assign to data there?

- The most important thing here is not to succumb to hype. In business, data should certainly work for you. The results of data analysis should bring profit or reduce costs. If they do not, something has gone wrong somewhere. At the same time, a data-driven culture should not be taken literally: we can make decisions without relying on data, and that is normal. Moreover, in some cases it is the only way.

- Tell me, what projects are you working on at Lamoda? What is the most successful project your team has implemented?

- Probably the first thing worth mentioning is the A/B testing platform, essentially a service that splits users into groups and controls the switching on and off of experimental features. Why is this important to us? Because the field of machine learning itself cannot exist without constant testing of hypotheses and ideas. We cannot know in advance whether our users will like something more or less. Any new idea must be tested. Amazon cites an interesting statistic: they say that 70% of the ideas they test lose the test. This should be taken calmly, even if the rate is higher. It means that in order to release 5 successful projects per quarter, you need to run about 17. So a reliable platform for controlled experiments is the foundation without which it is simply impossible to move forward in product development. Given our ambitious plans, the system needed an upgrade. The first version was built before I arrived, and we have significantly updated it: you can now run more experiments at the same time, whereas previously there were some limitations in this respect.
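As an illustration of the two ideas in this answer, here is a small sketch. It assumes deterministic hash-based bucketing, which is a common way to split users into groups but is not necessarily how Lamoda's platform works. The function names, the experiment name and the group labels are all hypothetical.

```python
import hashlib
import math


def assign_group(user_id, experiment, groups=("control", "treatment")):
    """Deterministically map a user to a group for a given experiment.

    Hashing the experiment name together with the user id means the same
    user always lands in the same group within one experiment, while
    different experiments split users independently of each other.
    """
    key = f"{experiment}:{user_id}".encode("utf-8")
    digest = hashlib.sha256(key).hexdigest()
    return groups[int(digest, 16) % len(groups)]


def experiments_needed(target_wins, failure_rate):
    """How many experiments to run to expect `target_wins` successes."""
    # round() guards against floating-point noise before taking the ceiling
    return math.ceil(round(target_wins / (1.0 - failure_rate), 9))


group = assign_group("user-42", "new-catalog-ranking")  # hypothetical names
needed = experiments_needed(5, 0.7)  # 5 / 0.3 = 16.67, rounded up
print(group, needed)
```

With a 70% failure rate, five expected wins require 5 / 0.3, or roughly 17 experiments, which matches the figure quoted above.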

- And what other areas are there?

- Search, and here we differ from major players like Yandex and Google, because we can work very thoroughly with our subject area: compared to “universal search on the Internet”, it is quite narrow. It is impossible to build an ontology of everything and describe all the interrelations, but in a small, specific area you can build very good solutions that work. We are building our own linguistics for the search engine so that it can take into account implicit relationships between different entities. For example, some brands belong together, and if you are looking for an item from one brand, you can show an item that is effectively from the same brand, just under another name. Tommy Hilfiger and Tommy Jeans, for example, are in fact one brand. Or it should understand that a stiletto is formally also a heel, and that loafers are shoes.

Of course, one of the most prominent projects we are working on is the ranking of products in the catalog. This is the familiar sorting by popularity. We try to make sure that a user who comes to the site finds what they like as quickly as possible.

There are also projects on recommender systems, pricing optimization, personalization, and much more.
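The relationships described here can be sketched as simple query expansion over hand-built dictionaries. This is only an illustration under assumed data structures: the interview does not describe how Lamoda's search linguistics is implemented, and the dictionaries and function names below are hypothetical.

```python
# Hypothetical domain knowledge for illustration only.
BRAND_FAMILIES = [
    {"tommy hilfiger", "tommy jeans"},  # treated as one brand for search
]
HYPERNYMS = {
    "stiletto": "heel",   # a stiletto is also a heel
    "loafers": "shoes",   # loafers are shoes
}


def expand_brand(brand):
    """Return all brands that should match a query for `brand`."""
    brand = brand.lower()
    for family in BRAND_FAMILIES:
        if brand in family:
            return sorted(family)
    return [brand]


def expand_term(term):
    """Return the term followed by its broader categories, most specific first.

    Assumes HYPERNYMS contains no cycles.
    """
    term = term.lower()
    terms = [term]
    while term in HYPERNYMS:
        term = HYPERNYMS[term]
        terms.append(term)
    return terms


print(expand_brand("Tommy Jeans"))
print(expand_term("Loafers"))
```

In a narrow domain such dictionaries stay small enough to curate by hand, which is the advantage over universal web search that the answer points to.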

- Oleg, tell us about your most successful project.

- The most successful project right now is the rollout of a new ranking in the catalog. It has become a little smarter and has started to take more interesting data into account. For example, we have solved the problem of context for unisex products: a shoe may sell well in the context of the men's catalog but not in the women's. User behavior shows that these are really men's shoes, even though formally they are unisex. There are many nuances like this that we want to take into account. So we do not stop: we test new hypotheses, try to cooperate actively with the commercial department, and so on.

- How do you work on projects? How do you select them? How long does it take to get to production?

- We are still collecting statistics on this, but in general our work is structured as follows: although the organization is already quite large, there are more projects than people, so we put together a micro-team for each area. For example, I have a separate micro-team that deals with recommender systems. The same people may be involved in other projects, and that is normal. Almost everything is decided within the micro-team; there are regular meetings and brainstorms, planning sessions and retrospectives, as well as internal meetups and demos. You can't get anywhere without demos.

This year it takes 4-6 weeks for a project to go from idea to release. But clearly not all projects are like that. Some require much larger investments of resources, especially if you need to invest in architecture, do something completely new, or carry out a long and expensive integration with other systems. The maximum is about several months. If you need to improve something that is already working, it can be done quite quickly; building from scratch is a different story.

- You mentioned Amazon with its 70% of failed experiments. What is the percentage at Lamoda?

- I would rather call them unsuccessful than failed. We have those, of course. But we believe that any experiment has only two outcomes: either success or learning. We do not call unsuccessful experiments failures. A real failure is when we learn no lessons from a project that brought no economic benefit. If a new idea lost to the current one, or at least did not win, you need to figure out thoroughly why it happened, rethink the task and possibly do another iteration. You simply have to take some knowledge away from it.

- Can you talk about a project in your career that did not take off? About your biggest disappointment and the lesson you took away from it?

- Yes, there are even a few of them. For example, I really wanted to introduce machine learning into search ranking at one of the companies. We spent a lot of time on the project, and in the end it turned out that there were simply no resources to implement such a solution, so the project had to be closed. For me as a manager it was a very good lesson; it is a pity it was such an expensive one. The boundaries of what is feasible (what we can do, what resources we have) must be determined at the start, before a single line of code is written, otherwise you can end up in a similar situation. Moreover, the team did serious work, and in simulations on the test bench we even got good quality, but implementation required architectural changes in the application, and the company was not willing to make them for the sake of search alone.

- What does the team mean to you? In a year you have more than doubled your team, and you continue to grow. How do you select people, and what matters to you?

- I consider one of my main achievements over the year at this company to be the really great atmosphere in the team: it is friendly and based on mutual support and respect, and it is important to preserve it as the team expands. So, in addition to professional qualities, at the interview we try to understand, among other things, whether we will work well with the person or not. All successful candidates meet the team; this is important, and I listen to the team's opinion.

- Half of your team, like you, either went through our programs at Newprolab before joining Lamoda or were sent by you to study. Is this a coincidence, or did you select people from the alumni community, from those you studied with or met at our events?

- I would like to say that I selected them, of course, but I think these are coincidences, although coincidences are not accidental. Here I would like to quote Grisha Sapunov (a Newprolab instructor. - Ed.): correlation does not mean causation, that is, it does not guarantee a cause-and-effect relationship. Now from the lyrical to the substance. It seems to me that all Newprolab graduates share qualities that I, among others, find useful in a team. There is some third factor that affects both the attractiveness of the program to a student and the attractiveness of a candidate to me. For example, a hunger for information and a high level of intrinsic motivation. A three-month course with a load of 3 three-hour lectures a week plus 10 hours of independent work requires a certain temperament, and I really like the atmosphere you create on the courses, because it is quite similar to a normal workflow. People who withstand this load have a head start over those who are not prepared for such a regime; in general, there is a difference.

- Many would argue that getting a certificate from an online program is harder and perhaps requires even more motivation: nobody pushes you, there is often no one to ask, and you figure everything out yourself.

- But we are a team. Our goal is not for a person to withdraw into themselves for 4 months, as in a Coursera specialization, for example, and work alone; our task is to work as a team. On the program we helped each other, we had chats, we talked, and everyone shared their solutions. It is very similar to the way we work: each of us does the part they have taken responsibility for, but at the same time everyone consults with everyone else and communicates constantly. That is teamwork.

- You and Petya Yermakov teach on the “Big Data Specialist” program, and other members of your team also teach and speak at conferences. Why is this needed, and what does it give you specifically?

- For me personally, speaking is a way to communicate with the community and convey some of my thoughts to a wide audience. I think it is useful, because we all understand what we do a little differently. It is very useful both to show some individuality of your own and to find like-minded people. As for teaching, it is quite a new experience for me. What motivates me to do it? I see social responsibility in it: you learned to do something yourself, now teach someone else. It seems to me that this is exactly how it should be. Of course, it is also prompted by certain problems in the education system, because the people who teach are often not practicing experts in the field. So it seems necessary. And I need it too, because by sharing my knowledge I push the community and the industry to develop, even if only by a small amount. I will later work with these people, and there is a clear lack of expertise on the market right now, so we need to help people.

- You worked in an American startup for a year and a half, you lived in San Francisco. Why not continue to build a career there in the States? Why did you choose to stay here?

- It may sound rather strange now, but I do not make a big distinction between “there” and “here”. Geographic location is not that important to me, and I do not really understand people who say that you must go somewhere. When I went to the Valley, I expected some wow effect from the experience and level of the specialists who work there. I did not see it. Honestly, in Moscow you can assemble a team of engineers that is in no way inferior to a typical startup team in the Valley, and that is normal too. I follow the project, and Lamoda's was more interesting. If at some point in my career there is an offer involving a very interesting project in the States, I do not rule out taking part.

- Which specialized blogs and Telegram channels do you read in your spare time?

- I read the ODS Slack and articles on Habr, and I watch videos from all sorts of meetups and the Saturday machine learning trainings from Yandex. That is probably about it; it is all a bit haphazard, I simply do not have enough time, I work a lot, and I still have a personal life.

- At the very beginning you spoke about values, about personal and social success, about the way a person thinks. Can you tell us what is important to you, what your values are, and what personal and social success mean to you?

- In short, I see it this way: social significance is what I value. I think that if I had not gone into data science, I would have gone into medicine. And perhaps someday I will be able to combine these two gestalts.

- You are already closing one of them quite well, it seems to me.

- It seems so. But the analysis of medical data is a very attractive topic. Nothing is more valuable than human life. It seems to me that if someday I manage to take even a small step forward in this direction, it will be very cool. For now the large-scale applicability of machine learning in medicine is still in question: medical data is scattered and unstructured, there are no uniform standards, it is confidential, and there are many problems with it. In addition, you need to obtain a lot of accreditations to make any more or less good product. And I think it would be cool to prepare for the era when all of this is sorted out, to gain experience and skills, and maybe someday I will work with this too; the idea periodically pops into my head.

If we talk about personal success, satisfied users and profit for the business are probably the two measures of success that matter. Also, as part of personal success, I think you need to spend your own resources with maximum benefit. What I mean is this: wherever I go, I always try to take the role in which I can bring the most benefit, and I try to see bottlenecks and areas for growth. Ezhome is a good example: I came there as a data scientist because I was interested in doing hands-on work, and then I saw that I could be more useful on other tasks. There are people who are strongly focused on something specific from the start. In this respect I am a little more open to new things, if the common good requires it. I'm basically a fan of optimization, whatever that means.

That's probably why I love my job so much, it allows me to use my strengths to achieve goals that are valuable both to companies and to me personally.