Four paths from the Yandex Data Analysis School

    Yandex has been training data science specialists since 2007. Students value the School of Data Analysis (ShAD) for its up-to-date curricula and courses, but they do not always understand what awaits them after graduation. Working with data at Yandex or at another large company? But what kind of work exactly?



    Initially, the School had two departments: computer science and data analysis. In 2014, when big data came into vogue, a third specialization appeared: big data. This year, to give students a clearer view of their prospects from the start, we reformed the departments: training will now be organized into four professional tracks. Our primary goal is to tell students about the possible paths of development and help them understand which courses will help them reach their goal.

    The professional tracks were not chosen at random: these are the four paths that graduates most often take after completing ShAD (and some already during their studies). For each of the four paths, we found a graduate who chose it and talked with them to understand which courses turned out to be the most useful for their work and how they chose their profession.

    Data scientist (Nikita Popov, graduate of 2016):

    “Data scientist is what analysts of all stripes are now called. We at Yandex are used to thinking that a data scientist is a person who is fluent in machine learning and statistics and, most importantly, in practice can extract useful information from a huge amount of data.

    I am currently working on the Search metrics team. We assess the quality of our search, decide which direction to take, and determine which of the many experiments we run will really increase the user's “happiness”. I joined the team through an internship right after finishing ShAD. The School of Data Analysis gave me an excellent foundation: the courses on machine learning and probabilistic models are exactly what I use every working day.

    When I came to ShAD, I still did not understand what I wanted to do, and I applied mostly to keep my classmates company, but from the first seminars it became clear that ShAD is extremely interesting. It was there that I understood what I wanted to do. I think that every data scientist should be well versed in various machine learning methods, know their pros, cons and scope, and be able to find dependencies in the data and draw the right conclusions from them. Although I work as an analyst, I very often have to do development. Recently, I finished a service for which I built the frontend, the backend, and the algorithms themselves: a data scientist should be able to do everything.”

    Machine learning developer (Zhenya Zakharov, graduate of 2018):

    “Even at university, I most liked the tasks where mathematics plays a significant role but the result is something you can “touch with your hands”. My current work meets both conditions quite well: we implement various algorithms, modifying them along the way so that they work faster, higher, stronger on our data. One of the key metrics for us is performance. There is a lot of data, and the algorithm must predict quickly and train in a reasonable time.

    I did a lot of programming at university, but the ShAD courses stand out for algorithmically harder tasks and a greater emphasis on performance and clean code.

    ShAD gave me a good set of basic skills that I use every day: machine learning in its various guises, applied statistics, algorithms, and an idea of what production code should look like. The project from the big data course turned out to be very relevant: my teammates and I wrote gradient boosting, trying to match the speed of LightGBM. We did not catch up with it, but we still managed to achieve a comparable time.”

    Big Data Infrastructure Specialist (Vlad Bidzilha, graduate of 2017):

    “Since high school I have wanted to program professionally. I entered ShAD when I was in my third year of university. It opened up a marvelous new world of machine learning and data mining, of high-performance systems with a host of algorithms at the intersection of applied mathematics and programming.

    For several years, I worked at Yandex on the video search ranking quality team. ShAD's advanced C++ and Python courses helped me get up to speed quickly and move from writing academic programs at university to serious production code at the company.

    Recently, I have been working in the distributed computing technology service. We are developing the YT MapReduce system: habr.com/company/yandex/blog/311104. Here, the knowledge and skills acquired at ShAD were also extremely useful. The course on classical algorithms and data structures instilled an algorithmic culture and developed the ability to quickly write efficient, clean code with few bugs and a clear structure, and to understand complex algorithmic solutions. The course on algorithms for working with large amounts of data showed the difficulties that arise when processing a data set that does not fit in memory and the methods for dealing with them, gave an understanding of the basic patterns for building external-memory and streaming algorithms, and developed basic practical skills in writing them. The course on parallel and distributed computing introduced the basic constructs of multi-threaded and distributed programming.
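As an illustration of the streaming algorithms mentioned above, here is a minimal sketch of reservoir sampling (Algorithm R), which draws a uniform random sample from a stream too large to hold in memory. It is a textbook example chosen for illustration, not material from the ShAD course or from YT; the function name `reservoir_sample` and its parameters are assumptions.

```python
import random

def reservoir_sample(stream, k, rng=random):
    """Return k items chosen uniformly at random from an iterable
    of unknown length, using O(k) memory and a single pass."""
    reservoir = []
    for i, item in enumerate(stream):
        if i < k:
            # Fill the reservoir with the first k items.
            reservoir.append(item)
        else:
            # Item i replaces a random slot with probability k / (i + 1).
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                reservoir[j] = item
    return reservoir

# Usage: sample 5 values from a generator without materializing it.
sample = reservoir_sample((x * x for x in range(1_000_000)), 5)
```

The stream is read once and never stored, so memory use depends only on `k`, not on the size of the input.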

    In addition, thanks to ShAD I managed to study in depth the applied mathematics courses that often remain outside the classical university program: information theory and computational complexity, advanced discrete mathematics, statistical analysis, combinatorial and convex optimization. This knowledge bridges theoretical mathematics and the high-tech IT industry.”

    Specialist in data analysis in applied sciences (Nikita Kaseev, graduate of 2015):

    “I work on applying machine learning methods to problems of fundamental physics at CERN, as a PhD student at HSE and Sapienza University of Rome.

    I have been keen on physics since school: I was a prize-winner of the All-Russian Olympiad and went to FOPF at MIPT, largely for idealistic reasons: if you are not doing science, then what? But I was always drawn to computers too. My bachelor's thesis was devoted to computer modeling of non-ideal plasma, and it involved a lot of algorithms and C++.

    In my fourth year I entered ShAD, and a year later I was invited to join the newly formed group for international educational and research projects at Yandex. It has since become LAMBDA, a joint laboratory of Yandex and HSE. We not only do hands-on work but also teach machine learning to physicists, so in a sense I have taught at Oxford. At our summer school, but still ;)

    What from ShAD proved useful? Many things.

    • The algorithms course: a general programming culture and, surprisingly, algorithms. It was fun to speed up a physics simulator tenfold in two hours, simply by replacing a brute-force search with a kd-tree.
    • Machine learning and deep learning: our bread and butter, especially, surprisingly, the theoretical part. In high energy physics we have to deal with non-standard problems for which import xgboost is not enough.
    • Domain adaptation: how do you combine physical considerations and machine learning to build an algorithm that is trained on simulated data but applied to real data? What if the training set is dirty, but there are negative weights that clean it up? How do you measure how accurately a GAN reproduces a distribution?
    • Big data processing: I did have to use Hadoop.
    • A recent product course: we work in a collaboration of 1,000 people, and many of our results are not scientific discoveries in their pure form but tools built for other people. For example, the project I started with as an intern (a search index for the events registered by the detector) ended up not being in demand, unlike the monitoring system, which now tracks the quality of the data from the detector.
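To illustrate the kd-tree remark in the list above, here is a minimal pure-Python sketch of nearest-neighbor search with a kd-tree next to the brute-force scan it replaces. This is not the speaker's simulator code; the function names (`build_kdtree`, `nearest`, `nearest_brute_force`) and the random test points are assumptions made for the example.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def build_kdtree(points, depth=0):
    """Build a kd-tree as nested tuples (point, left, right)."""
    if not points:
        return None
    axis = depth % len(points[0])          # cycle through the coordinates
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2                 # median point splits the set
    return (points[mid],
            build_kdtree(points[:mid], depth + 1),
            build_kdtree(points[mid + 1:], depth + 1))

def nearest(node, target, depth=0, best=None):
    """Return the tree point closest to target."""
    if node is None:
        return best
    point, left, right = node
    if best is None or sq_dist(point, target) < sq_dist(best, target):
        best = point
    axis = depth % len(target)
    diff = target[axis] - point[axis]
    near, far = (left, right) if diff < 0 else (right, left)
    best = nearest(near, target, depth + 1, best)
    # Visit the far side only if the splitting plane is closer
    # than the best match found so far.
    if diff ** 2 < sq_dist(best, target):
        best = nearest(far, target, depth + 1, best)
    return best

def nearest_brute_force(points, target):
    """Baseline: scan every point."""
    return min(points, key=lambda p: sq_dist(p, target))

# Usage: both methods agree, but the tree prunes most of the points.
rng = random.Random(0)
pts = [(rng.random(), rng.random(), rng.random()) for _ in range(1000)]
tree = build_kdtree(pts)
query = (0.5, 0.5, 0.5)
assert sq_dist(nearest(tree, query), query) == sq_dist(nearest_brute_force(pts, query), query)
```

The brute-force scan costs O(n) per query, while a balanced kd-tree in low dimensions typically answers in roughly O(log n), which is where a speedup of the kind described can come from when a simulator runs many neighbor queries.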

    Anyway, if you are ever in Geneva, come visit, it is interesting here :)”
