“If you want to create something really cool, you have to dig deeper and know how your code works in the system, on the hardware.”

    Hello, Habr! It is interesting how many programmers and developers have discovered data science or data engineering and built a successful career in big data. Ilya Markin, a software engineer at Directual, is one such developer who moved into data engineering. We talked about his experience as a team lead, his favorite data engineering tool, conferences and interesting channels for Java developers, Directual from both the user's side and the technical side, computer games, and more.


    - Ilya, thanks for taking the time to meet. Congratulations on your relatively recent move to a new company and on the birth of your daughter; you have a lot on your plate right now. So, straight to the first question: what was so interesting about what you were offered at Directual that made you leave DCA?

    - Probably I should first tell you what I did at DCA. I joined DCA (Data-Centric Alliance) after completing the Big Data Specialist program. At that point I was actively interested in big data and realized that this was the area I wanted to develop in. After all, where there is a lot of data, there are plenty of interesting engineering problems to solve. The program helped me quickly immerse myself in the big data ecosystem; there I got the basic knowledge I needed of Hadoop, YARN, the MapReduce paradigm, HBase, Spark, Flink, and much more, and learned how it all behaves under high load.

    The guys from DCA invited me for an interview. DCA is a major player in the RTB market (Real-Time Bidding is an advertising technology that runs an auction between sellers and buyers of ad impressions in real time. The object of the online auction is the right to show an ad to a specific user. RTB is built on selecting the target visitor as precisely as possible - Ed.). DCA had high coverage of Runet users: about 600 million cookies, and a cookie is not the same as a user - one user can have many cookies across different browsers and devices. We received dozens of terabytes of data on web page visits per day. All of this was processed, and each cookie was assigned to a certain set of segments. That way we could identify, for example, cat lovers aged 20 to 25 living in Moscow, in order to offer them food for their beloved cat from a store near home. There are many such examples, some quite simple, some complex. Under the hood there was a lot of Java, Scala, and C++. I joined the company as a developer, and six months later I became a team lead.

    I left DCA at the end of spring; by then I was tired of the managerial load and began looking at technical positions. It turned out I could go a week without writing code. We met with the team, discussed interesting solutions, thought through the architecture, wrote up tasks. When I took something from the list, I sometimes did not have time to finish it, because there was so much team-lead work. Maybe the problem was me, and I simply could not manage my time properly.

    And yet I gained rewarding experience. First, working with the team and with the business: it is interesting to be at the junction of development and business, when you receive a request to implement some functionality, think it over, and weigh the options. Often you have to decide what will be more useful in a particular situation: hack something together quickly "on the knee," or spend two weeks, or even more, but deliver a stable, properly working solution.

    - And which solutions were chosen most often: the "on the knee" ones or the two-week ones?

    - Deep down, a developer is always a perfectionist; he can fiddle with an interesting task endlessly, redoing and optimizing it. Of course, you need to know when to stop. We chose solutions that were somewhere in the middle.

    Secondly, I finally found myself in a position where I could take part in decision-making and stay aware of what was happening in the company. I don't like to just sit and code in my corner; I want to know what is happening with the product, how it performs, and how users react.

    Thirdly, I started conducting interviews; I was "on the other side of the barricades," so to speak. Running the first interview was very nerve-wracking: I read the resume and thought, "Damn, now a star will walk in, and I don't even know half of what he's written. What will I even talk to him about?" But in the process of talking you sober up and understand why demand in the IT market exceeds supply. It is difficult to find a good specialist; most often he is already sitting somewhere he is happy with. Finding a ready-made specialist for your specific tasks and technologies, one who will not need any retraining, is almost unrealistic; you have to use connections and ask friends, acquaintances, and colleagues. Networking is very important here. For example, I brought into the company a friend I was sure of and with whom I had worked at my previous job. We also took on a recent university graduate.

    People often work with frameworks rather than with the underlying tools, and I think this is a real problem now. A candidate with two years of experience as a Hadoop big data developer comes in, you start asking how Hadoop works and what parts it consists of, and the person does not know. Hadoop provides certain interfaces that simplify working with it, and for a certain range of tasks that is enough. Often a person never goes beyond those interfaces: he writes code from this point to that point, and what happens to the packaged code after it is submitted to the system no longer concerns him. For many that is enough; they do not want to understand it more deeply. Conducting interviews is excellent experience, and not only in hiring: it also gives you confidence in yourself as a specialist, which is very useful.

    Why Directual? When I was a coordinator on the Data Engineer program, Artem Marinov and Vasya Safronov from Directual came to speak to us. Artyom, by the way, had interviewed me at DCA at one point (again, the benefits of networking), and now he invited me to talk. They needed a Scala developer, but they were ready to consider a Java developer who understands how the JVM works under the hood. And so here I am.

    - So what is so interesting about what you were offered to do at Directual? What attracted you?

    - Directual is an ambitious startup that delivers every project it announces, that is, it does what it promises. I was pleased to become part of the team and take an active part in everything being built. It was also important to me that the company pays for itself by working with clients rather than living on investors' money.

    I will talk a little about the project from both the user side and the back end.

    Directual's slogan is "Let people create!" That is the main idea: to enable anyone without the knowledge and experience to write code to program in our visual editor.

    How it works: in the browser, a user of our platform can "snap together cubes" (read: functional nodes of a process), that is, assemble a scenario that will process incoming data. The data can be anything at all. The processed output can also take different forms, from a PDF report to a notification sent to several administrators. Simply put, any business process can be programmed in minutes without being able to write code. The company works in two directions: boxed solutions for corporate clients and a cloud option for a wide range of users.
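    For readers who code, here is a rough illustration of the idea in Java (our own sketch, not Directual's actual engine or API; the node behavior and record shape are invented for the example): a scenario is just an ordered chain of functional nodes, each of which transforms an incoming record.

```java
import java.util.List;
import java.util.Map;
import java.util.function.UnaryOperator;

// Conceptual sketch only: a "scenario" as a chain of functional nodes ("cubes"),
// each transforming an incoming record modeled as a key-value map.
public class ScenarioSketch {

    interface Node extends UnaryOperator<Map<String, Object>> {}

    // Apply the scenario's nodes to the record in order.
    static Map<String, Object> run(List<Node> scenario, Map<String, Object> record) {
        Map<String, Object> current = record;
        for (Node node : scenario) {
            current = node.apply(current);
        }
        return current;
    }

    public static void main(String[] args) {
        List<Node> scenario = List.of(
            r -> { r.put("validated", true); return r; },            // a validation node
            r -> { r.put("segment", "cat-lovers-20-25"); return r; } // an enrichment node
        );
        Map<String, Object> out =
            run(scenario, new java.util.HashMap<>(Map.of("userId", 42)));
        System.out.println(out); // {userId=42, validated=true, segment=cat-lovers-20-25}
    }
}
```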

    To make it clearer how this works, I will give a few examples.
    Any online store has a number of functional stages ("cubes" in our case): from showing a product to the customer to adding it to the cart and arranging delivery to the end consumer. Using the platform, we can collect and analyze data: purchase frequency, purchase times, the user's path through the store, and so on. That lets us interact with customers more closely (for example, develop seasonal offers or individual discounts). However, this by no means makes our platform a constructor for building online stores!

    Directual copes well with automating logistics processes and the HR function of large companies, and with building any other technological solution, from a farm for growing greens to a smart home. On the platform, for example, you can create a Telegram bot in a few clicks; almost every employee who works on the system core has their own bot. One made a librarian's assistant, another a bot that helps them learn English words.

    We kind of "select" the work of some programmers, because now there is no need to contact them for help, prepare TK, check the work done. Now you just need to know how your business should work, you need to understand the processes themselves, we do the rest.

    - Listen, but software for a greens-growing farm, for example, has existed for a long time. What makes you different?

    - Yes, that's true, there are off-the-shelf solutions for greens-production farms. But in that case you are not developing the software yourself, you are buying a ready-made product. With our platform you can tailor the software to yourself, to your business and your tasks, and you do not need to hire developers.

    - And what exactly do you do there?

    - The company is divided into two parts: development of the core of our system, and the project office, which is essentially our zeroth customer, so to speak. I work on the system core.

    As I said, we want to give anyone the opportunity to work on our platform. For that we are building our cloud, and there are many challenges. Here is the difficulty: say there are 10 thousand users, each with several data-flow scenarios, and each flow has 10-20 branching blocks. Imagine the load on the hardware. And we need to be able to isolate everything cleanly so that one client's processes do not interfere with or slow down another's. If one client has a problem that we need to solve, it must not hurt another client's work.
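    One common way to get that kind of isolation (shown here only as an illustrative sketch; it is not a description of Directual's internals) is to give every tenant its own bounded worker pool, so that a backlog in one client's scenarios delays only that client.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical sketch of per-tenant isolation: each tenant gets its own
// fixed-size worker pool instead of sharing one global queue.
public class TenantIsolationSketch {

    private final Map<String, ExecutorService> pools = new ConcurrentHashMap<>();
    private final int threadsPerTenant;

    public TenantIsolationSketch(int threadsPerTenant) {
        this.threadsPerTenant = threadsPerTenant;
    }

    // A slow or failing step in one tenant's pool cannot starve the others.
    public void submit(String tenantId, Runnable scenarioStep) {
        pools.computeIfAbsent(tenantId,
                id -> Executors.newFixedThreadPool(threadsPerTenant))
             .submit(scenarioStep);
    }
}
```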

    Since the user does not need to think about how all of this works under the hood, he is also freed from choosing storage. We support different databases, both relational and NoSQL, and the system treats them all the same way. The client does not need to think about it: when an account is created, the system helps pick the best storage option for the tasks at hand.
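    Conceptually, that freedom of choice rests on some storage abstraction. Below is a minimal sketch of such an interface (an assumption for illustration only, not the platform's real API): scenario code talks to the interface, and a relational or NoSQL implementation is plugged in behind it.

```java
import java.util.Map;
import java.util.Optional;

// Illustrative abstraction: the same scenario code can run over either a
// relational or a NoSQL backend hidden behind this interface.
public interface RecordStore {

    // Persist a record (a flat key-value view of the object) under its id.
    void save(String structure, String id, Map<String, Object> fields);

    // Read it back, regardless of which database actually holds it.
    Optional<Map<String, Object>> find(String structure, String id);
}

// Concrete implementations (say, a JDBC-backed or a document-store-backed one)
// would be selected per account when it is created.
```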

    Our platform is a good example of a highly loaded distributed system, and my task is to write good code so that it all works flawlessly. As a result, I got what I wanted here: I work with the tools that interest me.


    - And how did you come to work with data in the first place?

    - At my first job I was mainly busy with similar tasks in a rather narrow segment (read: I parsed XML :)), and I quickly grew to dislike it. I started listening to podcasts and realized how big the world around me is, how many technologies there are that everyone is talking about: Hadoop, Big Data, Kafka. Then I realized that I needed to study, and found the Big Data Specialist program. As it turned out, I didn't lose out: the first module (MapReduce, Hadoop, machine learning, DMP systems - author's note) was very useful and I wanted to dig into it, but the second module, about recommender systems, I simply did not know where to apply and never touched. Then I went to DCA to work with what interested me. There a colleague told me that besides the data scientist there is also the data engineer in this field, and explained who that is and how he can be useful to a company.

    Right after that, you announced a pilot launch of the Data Engineer program, and of course I decided to go. I already knew some of the products covered in the program, but for me it was a good overview of the tools; it structured everything in my head, and I finally understood what a data engineer should be working with.

    - But most companies do not separate these two positions, these two professional profiles; they try to find universal specialists who will collect and prepare the data, build the model, and also bring it to production under high load. What do you think that is connected with, and how sensible is it?

    - On the Big Data Specialist program I really liked the talk by Pavel Klemenkov (he worked at Rambler&Co at the time); he talked about ML pipelines and mentioned programmer-mathematicians. He was speaking precisely about such universal specialists: they exist, there are few of them, and they are very expensive. That is why Rambler&Co tries to grow them in-house and look for strong people. Such experts really are hard to find.

    I believe that if you really have a lot of data and need scrupulous work with it (and not just predicting a person's gender and age or raising click probability, for example), then these should be two different people. The 20/80 rule applies here: a data scientist is 80% data science and 20% able to write something production-ready, while a data engineer is 80% software engineer and 20% familiar with what models exist, how to apply them, and what to compute, without going deep into the mathematics.

    - Tell us about the most important discovery for you in data science / data engineering. Maybe some tool or algorithm radically changed your approach to solving problems?

    - Probably the fact that, given enough data, you can extract a lot of information useful for future actions. Even if you sometimes don't know what this raw, anonymized data actually is, you can still do something with it: split it into groups, find some features, or simply use mathematical methods to uncover patterns. True, analysts could do this before as well, but the fact that it has become more accessible as hardware has grown more powerful - that's cool! The threshold for entering data science has dropped; you may not know much, yet you can already try to do something with the available tools.

    - What was your biggest screw-up at work? What lesson did you learn from it?

    - I will probably disappoint you: I haven't had one yet; maybe it is still ahead. I honestly thought and tried to remember, but there was nothing like that - very boring. It's like with sysadmins: if you haven't "dropped prod" or "wiped the database," you're not a real admin. So I'm probably not a real developer yet.

    - What data engineering tools do you use most often and why? What is your favorite tool?

    - I like Apache Kafka very much. A cool tool in terms of both the functionality it provides and its engineering. A peculiarity of Kafka is how closely its code works with the operating system it runs on, Linux (read: "it works fast and well"). That is, it uses various native Linux facilities that allow excellent performance even on weak hardware. I believe that in our field it should be this way: it is not enough just to know a programming language and a couple of frameworks for it. If you want to create something really cool, something that is pleasant to use not only for you but also for others, you need to dig deeper and know how your code works in the system, on the hardware.
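    For context, Kafka is known for leaning on the OS rather than re-implementing its work in the JVM, for example relying on the Linux page cache and zero-copy transfers for reads. A minimal producer sketch using the standard Java client is below (broker address, topic name, and payload are placeholders for illustration):

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

// Minimal Kafka producer example with the standard Java client.
public class KafkaProducerSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder broker
        props.put("key.serializer",
                  "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer",
                  "org.apache.kafka.common.serialization.StringSerializer");

        // try-with-resources closes the producer and flushes pending records.
        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // Sends are asynchronous and batched by the client.
            producer.send(new ProducerRecord<>("events", "user-42", "page_view"));
        }
    }
}
```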

    - What conferences do you attend? What specialized columns / blogs / channels do you read?

    - As I said, it all started with podcasts, namely "Debriefing," made by guys from the Java world.

    There is also https://radio-t.com - a cool Russian-language podcast on hi-tech and IT topics, one of the most popular (if I'm not mistaken) in our language.

    I follow the news from JUG.ru; the guys put on cool hardcore conferences and organize meetups. I try to go to the ones in Moscow, and sometimes St. Petersburg too. The top Java conference is JPoint in Moscow (known as Joker in St. Petersburg); I always go to JPoint or watch it online.

    I watch what Confluent is doing - the guys who make money on enterprise support for Kafka and are its main committers. They also develop convenient open-source tools around Apache Kafka, and I try to use their versions.

    The Netflix tech blog on Medium is a cool resource about the solutions of one of the largest platforms for delivering video content to users. Highload and distributed systems to your heart's content.

    Telegram channels: https://t.me/hadoopusers - a place where you can chat in our language about data engineering topics; https://t.me/jvmchat - Java people from all over, discussing Java's problems, their own problems, and more.

    - Maybe something else, just for the soul?

    - I grew up on computer games; at one time I played very actively, but now there is no time for that. At some point I thought: "Since I can't play games, what's stopping me from studying this field?" So if I suddenly get free time, I pick up some Java, C#, or C++ framework you can write games with and make something. It rarely gets to a finished product, but I enjoy it. That's why my podcast list also includes one about making games, "How Games Are Made" - a good professional podcast not about how to "code your super-mega-top game," but about the game production process: how a sound designer works, what a game designer does, the specifics of 2D/3D artists, their processes and tools, how a game is developed and how it is promoted. This spring I went to a game conference for the first time, and it was very cool. I didn't feel out of place, even though it turned out to be a completely different world, and I was glad to learn that the gaming world is also actively interested in big data. In conversations on those topics I felt very confident.

    Quiz:
    - Java or Python?

    - Java, of course.

    - Data Science or Data Engineering?
    - Data Engineering

    - Individual contributor or manager?
    - It depends, but for now, rather an individual contributor.

    - Family or career?
    - Some time ago it was career; now it is family.

    - Cook at home or go to a restaurant?
    - I like to eat well. I think I cook well, but that rarely happens. So probably a restaurant.
