Russian girls in Data Science

    As you know, there are significantly more men than women in IT, although women are often in no way inferior in knowledge and skills. In our experience this imbalance is even stronger in Data Science, although, again, women process data and build models no worse than men. We saw this confirmed in the final results of our latest “Big Data Specialist” cohort, where three girls made the top five (and there were only four girls in the group).

    We set out to find girls in different companies and industries who work with big data and manage teams. We collected more interesting material than fits into a single article, so expect a series of publications!

    We open the series with interviews with Anna Kryuchkova and Maria Anisimova, who talk about their work, their career paths and the future of girls in Data Science.



    - Tell us about the company you work for and your position in it. What data analysis tasks arise in the company? What are you responsible for?

    Anna: I work at MegaFon PJSC as an expert on segmented programs. In essence, this is a product manager role for big data models: I make them “come to life” and generate income.

    Maria: I work in the Big Data division of the Moscow Department of Information Technology. We develop smart analytics on an urban scale. I am the head of the data modeling department, but the main area in which I oversee project launches is Internet analytics. It is important to understand that in our products this has less to do with web analytics and optimizing the UI of city portals (for example, www.mos.ru), as many people think, and more to do with profiling Internet users. My responsibilities include supporting projects at every stage, from initiation and launch through to completion and handover to production, that is, the creation of the actual product. This support involves deciding which mathematical methods to use, collecting and analyzing the available data, and identifying the necessary technical infrastructure.

    - What was your educational and career path? How did you end up at the company?

    Anna: I started my career in consulting. I went through a rigorous selection process to join a consulting boutique with a small number of clients, whom we helped in several areas at once. Unusually for consulting, our work with clients did not end at the strategy presentation but only began there: we were responsible for implementing the changes. Since I had graduated with honors as a mathematician-economist, I immediately asked for analytical and marketing projects. It was very interesting to solve real problems and to see your ideas work and your forecasts come true. Work became life, and clients became practically family.

    However, I could only dream of working with “big data”. Medium-sized retail and manufacturing businesses teach you to work with “small data”: to enrich it and process it so that the result does not depend on chance or rare events. One of our clients was the company White Wind Digital. We worked with them for only a couple of years, but I became so involved in their tasks that I wanted to join them full time as head of the analytical department. There was not enough data about customers, and in every case it turned out that we had to learn to accumulate this data, analyze it, make individual offers and even engage customers emotionally with our brand. A loyalty program was the obvious answer. It is a rather expensive tool, but we found ways to bring it almost to payback.

    It was a very interesting experience: over two years we implemented a complex technical solution, built a system of communication with customers and began to measure the returns. And it became clear to me that I wanted more. More data. Bigger and bigger. Banks or telecom, I thought, and I ended up in telecom. Once again I was lucky to work with a super team on super ambitious tasks, though not by touching the data directly but as the so-called customer of analytics.

    Anna Kryuchkova

    Maria: I graduated from the undergraduate program of the Higher School of Economics in Economics, with a specialization in Statistics and Data Analysis. After graduation I immediately entered the master's program at the same university, this time in Management with a specialization in Project Management, where I “met” my current employer.

    The career path was quite simple: during my studies I worked in various organizations that were connected with statistics in one way or another, but those roles were still far from Data Science as such. This is partly due to the specifics of how statistical analysis is used in our country: a few years ago such solutions were applied only in very narrow areas, namely banking, insurance and strategic sales planning for commercial organizations. Apart from Rosstat and students, almost no one worked on socio-demographic statistics.

    - Which machine learning tasks come up most often in your work? What algorithms and models do you use to solve them?

    Anna: Obviously, in telecom these are primarily classification and segmentation tasks. The algorithms are handled by another part of the team, which did not stop us from brainstorming together and working out how they would be applied.

    Maria: Most often, tasks arise under the code names “profiling” and “forecasting”. The first mainly implies clustering, that is, segmenting users by their available attributes (factors), which often number not in the single digits but in the tens and hundreds. The second type involves constructing a vector of user behavior and then searching for “look-alike” users, in order to estimate whether unidentified users belong to a particular segment and to predict the next user action.
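    The look-alike search described above can be illustrated with a minimal sketch. It is not the department's actual method: the behavior vectors, the segment names and the helper functions `cosine_similarity` and `assign_segment` are hypothetical, and cosine similarity is just one common way to compare such vectors.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length behavior vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a user with no recorded activity matches nobody
    return dot / (norm_a * norm_b)

def assign_segment(unknown, labeled_users):
    """Return the segment of the most similar labeled user and the score."""
    best_segment, best_score = None, -1.0
    for vector, segment in labeled_users:
        score = cosine_similarity(unknown, vector)
        if score > best_score:
            best_segment, best_score = segment, score
    return best_segment, best_score

# Hypothetical data: page visits per category
# (news, transport, education, sports) for already profiled users.
labeled_users = [
    ([12, 1, 0, 7], "sports fan"),
    ([2, 9, 1, 0], "commuter"),
    ([1, 0, 11, 1], "student"),
]

# An unidentified user whose behavior vector we want to place in a segment.
segment, score = assign_segment([3, 14, 2, 0], labeled_users)
print(segment, round(score, 3))
```

    With tens or hundreds of attributes per user the same idea applies, although in practice the vectors would usually be normalized and the search done with a dedicated library rather than a plain loop.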

    Accordingly, the models used for all these tasks are standard ones: random forests, gradient tree boosting, logistic regression and ensembles of these algorithms for classification problems, and PCA (principal component analysis) and DBSCAN (for noisy data) for clustering problems. For text analytics tasks (for example, identifying thematic interests from the types of Internet content consumed), we use the naive Bayes classifier, VSM (the vector space model), the k-means method and maximum entropy classification.
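    As an illustration of the text analytics case, here is a minimal sketch of a multinomial naive Bayes classifier with add-one smoothing that guesses a thematic interest from a page title. The training titles, the topic labels and the function names `train_naive_bayes` and `predict_topic` are made up for the example and are not taken from the interview.

```python
import math
from collections import Counter, defaultdict

def train_naive_bayes(documents):
    """Count words per topic; documents is a list of (text, topic) pairs."""
    word_counts = defaultdict(Counter)
    topic_counts = Counter()
    vocabulary = set()
    for text, topic in documents:
        words = text.lower().split()
        word_counts[topic].update(words)
        topic_counts[topic] += 1
        vocabulary.update(words)
    return word_counts, topic_counts, vocabulary

def predict_topic(text, model):
    """Pick the topic with the highest log-posterior (add-one smoothing)."""
    word_counts, topic_counts, vocabulary = model
    total_docs = sum(topic_counts.values())
    best_topic, best_log_prob = None, -math.inf
    for topic in topic_counts:
        # Start from the log-prior of the topic.
        log_prob = math.log(topic_counts[topic] / total_docs)
        total_words = sum(word_counts[topic].values())
        for word in text.lower().split():
            if word not in vocabulary:
                continue  # ignore words never seen in training
            count = word_counts[topic][word]
            log_prob += math.log((count + 1) / (total_words + len(vocabulary)))
        if log_prob > best_log_prob:
            best_topic, best_log_prob = topic, log_prob
    return best_topic

# Hypothetical page titles labeled with a thematic interest.
training_pages = [
    ("metro schedule and bus routes", "transport"),
    ("parking rules and road closures", "transport"),
    ("school enrollment and exam dates", "education"),
    ("university courses and exam preparation", "education"),
]

model = train_naive_bayes(training_pages)
print(predict_topic("new bus routes near the metro", model))
```

    A real pipeline would add tokenization, stop-word removal and far more training data; the sketch only shows the counting and scoring at the core of the method.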

    As you can see, this set of models and algorithms is similar to that of any team engaged in analytics. But I believe that solving a big data problem is not only about building models: a large share of the work falls on collecting and preparing the data (Data Mining) and on interpreting the results, i.e. adapting them to business applications. Roughly speaking, it is not enough to reveal a pattern from the correlation matrices you have built; it is important to understand what to do next and how to use it in the product, and not just on beautiful slides with infographics.


    Maria Anisimova

    - Do you think that work in data analysis suits only people with a certain background, or can anyone learn data science with enough persistence?

    Anna: So far you can get a serious DS specialization at only a small number of educational institutions, so for the most part you have to retrain yourself. Of course, a background in mathematics makes it much easier to understand, but the most important thing here, as in any field, is 1% inspiration and 99% perspiration. With this formula everything is possible.

    Maria: There is an opinion that anyone can learn anything, as long as the desire is there. Besides, as I said earlier, analytics is not limited to building mathematical models: this work has many other important stages. A model cannot be built on nothing. Someone has to collect a proper amount of data, structure and normalize it, and the data must be sufficient for the specific business goal. And given the variety of industries in which analytics is now used, you can come from any education or profession. A teacher who analyzes the performance of the children in a class in order to develop the curriculum further is also a participant in the newfangled Data Science, even if on a small amount of data, 40-50 records in an Excel table (I exaggerate).

    - What advice would you give to beginners? What online and offline courses have you taken, and which ones can you recommend?

    Anna: I decided to dive into this area with courses from Newprolab. After completing them, and already roughly finding my way around the topic, I began to read many books. Here you cannot do without Sebastian Raschka, the rich catalog of O'Reilly books and the classic editions of Bishop and Murphy. And of course it is best to learn in practice, so I hope to get to machine learning competitions.

    Maria: To beginners in this field, regardless of age, whether you are a student or a person with more than 20 years of work experience who has decided to retrain, I advise first deciding in which area you would be interested in studying data. I mean choosing an industry (education, healthcare, finance) or, if you prefer to choose by function, the Internet, text analytics, or the analysis of photo and video materials. Start with basic online math courses on Coursera, then dig deeper into existing practical work (for example, you can read the same Habrahabr or follow the competitions on Kaggle). This way you will understand what interests you specifically, talk to people closely connected with this area, study the trends and begin to learn from practical experience. After that, if you are interested in working in this direction, look into employers that are developing data analysis, or by then an employer will find you. :)

    - Tell us, is there any special policy towards girls in the company? What advantages and disadvantages do you see in being a woman in IT, and in DS in particular? Are there any difficulties?

    Anna: Needless to say, you rarely meet a girl in Data Science or in IT, and female mathematicians are far from the majority. Perhaps the reasons for this are sociocultural. However, the rare girl who does make it to DS is very motivated in her work and so genuinely interested that she quickly earns the respect of her colleagues.

    Maria: There is no special policy in the company. For some reason it used to be that programming was mostly done by men. Now the world is changing, and the boundaries that divided professions by gender are being erased. At university we learned to analyze data in Excel and SPSS, but at work, when you encounter arrays of tens and hundreds of millions of records, you begin to think about the need to learn programming languages that will let you work with particular DBMSs. In my opinion this is not difficult, although it may be harder for girls to adapt to new solutions: girls are embarrassed to ask questions and to start learning something new. Men are more mobile and decisive in this regard. If you are a girl without hang-ups, brave and young, then everything will work out. :) But this applies not only to DS, it is like that everywhere.

    - Will the ratio of women to men in data science change in the future? How do you think women's attention can be drawn to the field of data analysis?

    Anna: Society has already begun to change: daughters are bought not only dolls but also construction sets and cars, and they are enrolled not only in dance classes but also in programming classes. Inquisitiveness, curiosity and the ability to notice patterns are all inherent in girls to the same extent as in men, so interest in an industry where all these qualities can be realized is inevitable for both sexes. Especially when you consider how much this interest is fueled by the media.

    Maria: For me the more accurate question is: “How do we attract people's attention in general to the field of data analysis, beyond stock-exchange and banking projects?” As for differences in gender structure, given that universities are now launching specialized educational programs and literally every commercial organization is creating data analytics units in its structure, the ratio of women to men in DS will even out in the future. In general, of course, the more beautiful girls there are in teams, especially in harsh IT, the more motivation the male half has to do great things and change the world. :)

    P.S. By the way, the coordinator of the upcoming 7th intake of the Big Data Specialist program will also be a girl, one of our graduates.
