What do data scientists actually do? Findings from 35 interviews

Original author: Hugo Bowne-Anderson
  • Translation
The author held a series of conversations with experts in data analysis and data processing and drew conclusions about where the data science profession is heading.

Data science has made it easier to solve a wide range of problems in the technology sector, including optimizing Google search results, LinkedIn recommendations, and Buzzfeed headlines. But working with data can also significantly affect many other sectors of the economy, from retail, telecommunications, and agriculture to health care, freight transport, and the penal system.
Nevertheless, the terms “data science”, “data analysis”, and “data scientist” are not clearly understood. In practice, they are used to describe a wide range of ways of working with data.
What do data scientists actually do? As the host of the DataFramed podcast, I have had the great opportunity to interview more than 30 data scientists from a wide range of industries and academic disciplines. Among other things, I always asked what exactly their work consisted of.
Data science is a truly vast field. My guests approached our conversations from all sorts of positions and points of view. They described a wide variety of work, including large-scale online experimentation frameworks for product development at Booking.com and Etsy, the methods Buzzfeed uses to solve the multi-armed bandit problem when optimizing headlines, and the impact machine learning has on business decisions at Airbnb. That last example came from Robert Chang, a data scientist at Airbnb. When he worked at Twitter, the company was focused on growth. Now at Airbnb, Chang is developing a large-scale machine learning model.
Approaches to applying data science can differ widely, and the choice of solution depends not only on the industry but also on the type of business and its goals.
Despite this diversity, a number of common themes emerged clearly across all the interviews.
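
The multi-armed bandit problem, which the introduction mentions in connection with headline optimization, can be illustrated with a minimal sketch. It assumes an epsilon-greedy strategy and made-up click-through rates; it does not describe Buzzfeed's actual system, and the names `epsilon_greedy` and `true_rates` are hypothetical.

```python
import random


def epsilon_greedy(true_rates, rounds=20000, epsilon=0.1, seed=0):
    """Simulate epsilon-greedy headline selection.

    true_rates: hypothetical click-through rate per headline. The
    algorithm never reads these directly; they only simulate clicks.
    Returns (shows, estimates): how often each headline was shown and
    its estimated click-through rate.
    """
    rng = random.Random(seed)
    n = len(true_rates)
    shows = [0] * n
    clicks = [0] * n

    def estimate(i):
        return clicks[i] / shows[i] if shows[i] else 0.0

    for _ in range(rounds):
        if rng.random() < epsilon:
            arm = rng.randrange(n)  # explore: show a random headline
        else:
            arm = max(range(n), key=estimate)  # exploit: show the current best
        shows[arm] += 1
        if rng.random() < true_rates[arm]:  # simulated reader click
            clicks[arm] += 1

    return shows, [estimate(i) for i in range(n)]


shows, estimates = epsilon_greedy([0.05, 0.10, 0.20])
best = max(range(len(estimates)), key=estimates.__getitem__)
print("times each headline was shown:", shows)
print("index of the best headline so far:", best)
```

Most traffic ends up on the headline that performs best, while a small share (`epsilon`) keeps testing the alternatives. Other strategies, such as Thompson sampling, make the same trade-off in different ways.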

What do data scientists do?

We know how data science works, at least in the tech industry. First, data scientists lay a solid foundation of collected data so that rigorous analysis is possible. Next, they use online experiments, among other methods, to make steady progress on the problem at hand. Finally, they build machine learning methods and specialized data products that help the business understand itself better and make better decisions. In other words, data science in tech comes down to building infrastructure, running experiments, and applying machine learning to support decision making and create data products.
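
The online experiments mentioned above are often simple A/B tests. As a rough sketch, assuming a two-proportion z-test and invented conversion counts (the function name and all numbers are hypothetical, not taken from any company in the interviews), the comparison of two variants might look like this:

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (z, two-sided p-value) for the difference in conversion rates."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate under the null hypothesis that both variants convert equally
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Standard normal CDF expressed through the error function
    cdf = 0.5 * (1 + math.erf(abs(z) / math.sqrt(2)))
    p_value = 2 * (1 - cdf)
    return z, p_value


# Invented data: variant A converts 200 of 5000 visitors, variant B 260 of 5000.
z, p_value = two_proportion_z_test(conv_a=200, n_a=5000, conv_b=260, n_b=5000)
print(f"z = {z:.2f}, p = {p_value:.4f}")
```

A small p-value suggests that the difference between variants is unlikely to be due to chance alone. Real experimentation platforms add many safeguards that this sketch leaves out.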

Great strides are also being made in industries outside tech.

In one episode, Ben Skrainka, a data scientist at Convoy, and I looked at how data science is being used to drive innovation in the North American freight industry. Sandy Griffith of Flatiron Health spoke about the important role data analysis plays in cancer research. With Drew Conway, I discussed his company, Alluvium, which “uses artificial intelligence and machine learning to identify useful patterns in the large-scale data streams generated by industrial operations.” Mike Tamir, now head of autonomous driving at Uber, spoke about his time at Takt, where he helped Fortune 500 companies adopt data science methods. Among other things, he shared his experience of building a recommendation system for Starbucks.

Data science is not only about the prospect of self-driving cars and artificial intelligence.

Many of my podcast guests were skeptical of the popular media's fetishization of general-purpose AI (for example, VentureBeat's article “An AI god will emerge by 2042 and write its own bible. Will you worship it?”) and of the hype around machine learning and deep learning. Of course, both are powerful approaches with important practical applications. But hype of this kind should always be treated with a healthy dose of skepticism. Almost all my guests noted that working data scientists earn their living by collecting and cleaning data, building dashboards and reports, and doing data visualization and statistical analysis. They also need to be able to communicate the essence of their results to key stakeholders and persuade decision makers.

The skill set required of data scientists is constantly changing and growing (and experience with deep learning is not the main requirement).

In a conversation with Jonathan Nolis, a leading Seattle data scientist who works with Fortune 500 companies, we discussed the following question: “Which of two skills is more important for a data scientist: the ability to use sophisticated deep learning models, or the ability to make good PowerPoint slides?” Nolis argued for the latter, on the grounds that explaining the results of an analysis clearly remains a key part of working with data.
Another recurring topic was how the core skills themselves are shifting. Demand for some of them may change in the foreseeable future. The rapid development of commercial and open source data analysis tools means that we are now seeing a massive shift toward automating many routine tasks, such as data cleaning and initial data preparation. Until now, it was common for 80% of valuable research time to be spent on simply finding, cleaning, and organizing data, with only 20% left for analysis. That situation is unlikely to last. Automation has now reached even machine learning and deep learning themselves. Randal Olson, lead data scientist at Life Epigenetics, discussed this in a podcast episode devoted entirely to the subject.
Across the interviews, the overwhelming majority of my guests said that the ability to build and use deep learning infrastructure is not the key skill. Instead, they pointed to the ability to learn on the fly and to explain complex analyses clearly to key stakeholders who are far removed from technical matters. Aspiring data scientists should therefore pay somewhat more attention to presenting their material well than to data processing techniques. New methods come and go, but critical thinking and quantitative professional skills will always be relevant.

Specialization is becoming more important.

Despite the lack of a clear career path and the lack of support for junior specialists, some areas of specialization are already emerging. Emily Robinson described the difference between Type A and Type B data scientists. According to her, Type A covers analysts whose work is close to traditional statistics, while Type B data scientists mainly build machine learning models.
Jonathan Nolis divides data science into three components. The first is business intelligence, which comes down to “taking data that the company has and getting it in front of the right people” in the form of dashboards, reports, and emails. The second is decision science, which is about “taking data and using it to help a company make a decision”. The third is machine learning, which is about “how can we take data science models and put them continuously into production.” Although many leading practitioners cover all three areas in their work, distinct career paths have already begun to form, as in the case of machine learning engineers.

Ethical and moral issues are a serious challenge.

As you might guess, data scientists face a considerable amount of uncertainty along the way. When I asked Hilary Mason in the first episode of the podcast whether the professional community faced any other challenges, she replied: “Do you really think that the lack of moral guidelines, standard practices, and consistent terminology isn't enough of a challenge at this stage?”
All three points really matter, and the first two concern virtually every guest of the DataFramed podcast. What role will ethics play when algorithms developed by data scientists dictate how we interact with the world around us?

As Omoju Miller, senior machine learning data scientist at GitHub, said in an interview:
“We need a shared understanding of basic ethical values, we need training for practitioners, and we need something like a Hippocratic Oath. And we need real licenses, so that a specialist who crosses ethical lines can be penalized or barred from practice. We must make it clear that we, as an industry, are against such behavior. And, of course, we need ways to help correct both those who have committed serious violations and those who broke the rules out of ignorance because they never received the necessary training.”

A hot topic is the serious, harmful, and unethical consequences that data science can have, as happened with the COMPAS recidivism risk score, which “was used to predict and identify future criminals” and, according to ProPublica, turned out to be “biased against black Americans.”
A consensus is gradually forming that ethical standards should originate within the data science community itself, with support from legislators, grassroots movements, and other stakeholders. In particular, there is growing emphasis on the interpretability of models, as opposed to black-box solutions. That is, we need models that can explain why they made a particular prediction. Deep learning handles many tasks well but is notorious for its lack of interpretability. Dedicated researchers, developers, and data scientists are making progress here through projects such as LIME, which aims to explain what machine learning models are doing.
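
To make the idea of a model that can explain its own prediction concrete, here is a minimal sketch using a linear score, where each feature's contribution can be read off directly. It is not LIME and does not use its API; the function name, weights, and feature names are all invented for illustration.

```python
def explain_linear_prediction(weights, bias, features):
    """Return (prediction, contributions) for a linear scoring model.

    Each contribution is weight * feature value, so the contributions
    plus the bias add up to the prediction.
    """
    contributions = {
        name: weights[name] * value for name, value in features.items()
    }
    prediction = bias + sum(contributions.values())
    return prediction, contributions


# Hypothetical weights and inputs, chosen only for illustration.
weights = {"age": -0.02, "prior_visits": 0.5, "days_since_signup": 0.01}
features = {"age": 30, "prior_visits": 4, "days_since_signup": 100}

prediction, contributions = explain_linear_prediction(weights, 0.1, features)

# List features from most to least influential for this one prediction.
for name, value in sorted(contributions.items(), key=lambda kv: -abs(kv[1])):
    print(f"{name}: {value:+.2f}")
print(f"prediction: {prediction:.2f}")
```

A black-box model offers no such direct breakdown, which is why tools that approximate one locally are an active area of work.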
The large-scale data revolution across industries and society has only just begun. It is not yet clear whether data scientist will remain the most attractive job of the 21st century, whether the profession will become more specialized, or whether it will simply turn into a set of skills that researchers are expected to have. As Hilary Mason said: “Will data science exist in 10 years? I remember a world in which it did not, and I would not be surprised if this profession meets the same fate as the profession of webmaster.”

