How I studied data science

Published on September 25, 2018

    My name is Azat Bulyakkulov. I work as a risk analyst at the fintech company ID Finance. I started in analytics, building reports for the risk, marketing and finance departments. Because the company is relatively small, I had to interact with all departments, and as a result I gained diverse professional experience: I took part in calculating reserves for finance, compared performance in A/B tests, segmented clients for marketing, and so on. After less than a year on the job I joined the development of scorecards, and I realized that I wanted to understand data analysis and processing better.

    We used classic logistic regression to predict customer default. One of our data sources is financial transactions, which clients can choose to give us access to. Working with them required a creative approach, since a lot of useful information could be extracted from this well of data. As I learned later, this process is called feature engineering. It captivated me, and I became even more interested in data science.
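
    To give a rough idea of what feature engineering on transactions can look like, here is a minimal sketch in plain Python. It is not our production code: the transaction format, the category names and the chosen features (`spend_to_income`, `gambling_share`) are all hypothetical and only illustrate the idea of turning raw rows into one feature row per client, which a model such as logistic regression could then use.

```python
from collections import defaultdict

# Hypothetical toy transactions: (client_id, amount, category).
# Negative amounts are spending, positive amounts are income.
TRANSACTIONS = [
    ("c1", 1000.0, "salary"),
    ("c1", -200.0, "groceries"),
    ("c1", -300.0, "gambling"),
    ("c2", 500.0, "salary"),
    ("c2", -100.0, "groceries"),
]


def build_features(transactions):
    """Aggregate raw transactions into one feature row per client."""
    by_client = defaultdict(list)
    for client_id, amount, category in transactions:
        by_client[client_id].append((amount, category))

    features = {}
    for client_id, rows in by_client.items():
        income = sum(a for a, _ in rows if a > 0)
        spending = -sum(a for a, _ in rows if a < 0)
        gambling = -sum(a for a, c in rows if a < 0 and c == "gambling")
        features[client_id] = {
            "n_transactions": len(rows),
            "total_income": income,
            "total_spending": spending,
            # Ratios are often more informative than raw sums.
            "spend_to_income": spending / income if income else 0.0,
            "gambling_share": gambling / spending if spending else 0.0,
        }
    return features


FEATURES = build_features(TRANSACTIONS)
```

    Each resulting dictionary is one row of a feature table; in a real project the list of features would be much longer and driven by domain knowledge.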

    Working with other departments, I saw that the scope for applying ML is huge. My interest in DS was also fueled by the fact that we were developing in the heavyweight SAS. Its interface is not the most user-friendly and its functionality is limited. I wanted to work with a more flexible tool.

    I understood that studying data science on my own, for example on Coursera, requires strong willpower and self-discipline, which I do not have in sufficient measure. So I started looking not at online courses but at “live” courses with lectures, discussions and homework.

    I told my supervisor at work about the direction in which I wanted to develop. Management met me halfway and offered to pay for the courses and, later, to move me to the data science department within the holding.

    So I started choosing courses. Curiously, online courses dominate the DS education market. Even in Moscow there is not a large selection of serious courses that are not in the style of “we will teach you data science in 21 days”. I understood that high-quality training should last at least six months. I did not consider the Yandex School of Data Analysis (ShAD), since it requires total immersion and daily classes; working full-time, it would have been hard to absorb and process the material. Looking ahead, I will say that even on the course I chose I struggled to find time for studying, let alone any free time. In the end I settled on a six-month Data Scientist course at one of the popular schools: 5 months of intensive training plus a month to write the diploma project.

    About the course


    Tuition cost about 200,000 rubles. The schedule was heavy: classes 3 times a week, 3 hours each. About 2 out of every 3 classes came with homework. The program was classic and covered the basic methods of machine learning, recommender systems, image recognition, computer vision, natural language processing (NLP), and time series. There were also several hackathons and a diploma project for those who completed the minimum required amount of homework.

    Classes were held on Baumanskaya. 30 people were enrolled in the group, but 15-20 attended regularly. We studied on two weekday evenings and on Saturday from 10:00 to 13:00. Curiously, people from different fields, not necessarily related to IT, came to the course. Yes, there were frontend and backend developers, but half of the group worked in product, business or risk analytics. For almost all of them the course meant a change of profession. Some came because there is a certain hype around data science right now, some were bored with their current work, and others planned to use DS in their jobs. Almost everyone paid for tuition on their own, so the level of interest was quite high.

    My impressions


    It all started with the basics: programming in Python and data visualization. Then we broke into a gallop and began covering one machine learning method per class: decision trees, linear and logistic regression, random forests, and boosting. Personally, I think these classical methods need more time.

    What I liked


    • We studied almost all modern methods and approaches of machine learning.
    • There was a separate block on feature engineering, as many as 3 classes. This is useful material, but unfortunately the lecturer did not teach this part very well.
    • Part of the homework came from Kaggle competitions. After submitting your results, you could see your position in the ranking. That gave you motivation to improve your model and tune its parameters, rather than doing the homework just to get it over with.
    • There were in-depth courses on recommender systems, NLP and computer vision, each with 6-8 classes. In my opinion, these also had the best lecturers.
    • After the blocks on computer vision and time series there were 2 hackathons.

    These turned out to be a very useful exercise. The need to get an acceptable result in minimal time pushes your brain to work at full capacity. Plus, working in a team, you see how other people approach a problem.

    • My student account showed a ranking of students, where I could see my classmates' progress on the homework assignments. It was helpful, since during the breaks I would approach the “nerds” and ask how they had done this or that homework.
    • The advantage of "live" lectures: you can ask questions during the class.
    • In class, following the lecturer's instructions, we did small exercises in Python right away.
    • The student community: talking with classmates and exchanging views. It was interesting to hear from others about their motivation and the areas of ML that interested them.

    What I did not like


    • The overview of the main methods was too dense: just one class per method.
    • In general, I would have preferred 2 classes a week rather than 3. For me personally, studying was hard and ate up almost all my free time. Some of my classmates, to my envy, could study at work.
    • For unknown reasons, the NLP block was moved ahead of computer vision (CV). As a result, in the NLP block we had to use neural networks, which were covered in more detail only in the CV part.
    • Some lecturers had extremely poor teaching skills. In addition, they did not check homework on time.

    Image: the scope of data science has recently expanded greatly.

    Summary


    I went through 5 months of intensive training, during which I dove fairly deep into the world of ML. I learned to write data processing code in Python, visualize data, and build various models. I also generated text with neural networks and classified images.
    I think I got good experience to start with. My diploma mentor said that our knowledge is close to that of a middle data scientist, while our experience is at junior level. Well, we'll see in a couple of months, since I am moving to our company's data science department within two weeks.