MikhailSA May 3, 2019 at 17:11

Why Data Science Teams Need Universal, Not Specialists

Transfer

HIROSHI WATANABE / GETTY IMAGES

In the book Wealth of Nations, Adam Smith shows how the division of labor is becoming a major source of productivity gains. An example is the assembly line of a factory for the production of pins: "One worker pulls the wire, the other straightens it, the third cuts, the fourth sharpens the end, the fifth grinds the other end to fit the head." Thanks to specialization focused on certain functions, each employee becomes a highly qualified specialist in his narrow task, which leads to an increase in the efficiency of the process. The output per worker increases many times, and the plant becomes more efficient in the production of pins.

This division of labor by functionality is so rooted in our minds even today that we quickly organized our teams accordingly. Data Science is no exception. Complex algorithmic business opportunities require many labor functions, so companies usually create teams of specialists: researchers, data analysis engineers, machine learning engineers, scientists involved in cause and effect relationships, and so on. The work of specialists is coordinated by the product manager with the transfer of functions in a way that resembles a pin factory: “one person receives the data, the other models it, the third performs it, the fourth measures” and so on,

Alas, we should not optimize our Data Science teams to improve performance. However, you do this when you understand what you are producing: pins or something else, and simply strive to increase efficiency. The purpose of assembly lines is to complete the task. We know exactly what we want - these are pins (as in Smith's example), but you can mention any product or service in which the requirements fully describe all aspects of the product and its behavior. The role of employees is to fulfill these requirements as efficiently as possible.

But the goal of Data Science is not to complete tasks. Rather, the goal is to explore and develop strong new business opportunities. Algorithmic products and services, such as recommendation systems, customer interactions, style preferences, sizing, clothing design, logistics optimization, seasonal trend detection, and more cannot be developed in advance. They must be studied. There are no drawings to reproduce, these are new features with their inherent uncertainty. Coefficients, models, model types, hyperparameters, all necessary elements should be studied using experiments, trial and error, and also repetition. With pins, training and design are done in advance, until they are manufactured. With Data Science, you learn in the process, not before it.

At a pin factory, when training comes first, we do not expect and do not want workers to improvise on any property of the product, in addition to increase production efficiency. The specialization of tasks makes sense, because it leads to the efficiency of processes and the coordination of production (without making changes to the final product).

But when the product is still developing and the goal is training, specialization interferes with our goals in the following cases:

1. This increases the cost of coordination.

That is, those costs that accrue during the time spent communicating, discussing, justifying and prioritizing the work that needs to be done. These costs scale superlinearly with the number of people involved. (As J. Richard Hackman taught us, the number of relations r grows similarly to the function of the number of members n in accordance with this equation: r = (n ^ 2-n) / 2. And each ratio reveals a certain amount of the cost ratio). When data analysis specialists are organized by function, at each stage, with each change, each transfer of service, etc. Many specialists are required, which increases coordination costs. For example, statisticians who want to experiment with new features will have to coordinate with data processing engineers, that complement data sets every time they want to try something new. In the same way, each new trained model means that the model developer will need someone with whom to coordinate their actions for putting it into operation. Coordination costs act as payment for the iteration, which makes them more difficult and expensive and more likely to force the study to be abandoned. This may interfere with learning. which makes them more difficult and costly and more likely to force them to abandon the study. This may interfere with learning. which makes them more difficult and costly and more likely to force them to abandon the study. This may interfere with learning.

2. This complicates the waiting time.

Even more frightening than the cost of coordination is the time lost between shifts. While coordination costs are usually measured in hours: the time it takes to conduct meetings, discussions, project reviews - waiting times are usually measured in days, weeks, or even months! The schedules of functional specialists are difficult to align, since each specialist should be distributed across several projects. A one-hour meeting to discuss changes can take several weeks to streamline the workflow. And after agreeing on the changes, it is necessary to plan the actual work itself in the context of many other projects that take up working hours of specialists. Work related to code fixing or research that takes just a few hours or days to complete, It can take a lot longer before resources become available. Until then, iteration and learning are paused.

3. It narrows the context.

The division of labor can artificially limit learning by rewarding people for remaining in their specialization. For example, a research scientist who must remain within the scope of his functionality will focus his energy on experiments with various types of algorithms: regression, neural networks, random forest, and so on. Of course, a good choice of algorithm can lead to gradual improvements, but, as a rule, much more can be learned from other activities, such as the integration of new data sources. Similarly, it will help develop a model that uses every bit of explanatory power inherent in the data. However, its strength may lie in changing the objective function or relaxing certain restrictions. It is difficult to see or do when her work is limited.

Let us name the signs that appear when data science teams work like pin factories (for example, in simple status updates): “waiting for data pipeline changes” and “waiting for ML Eng resources”, which are common blockers. However, I believe that a more dangerous effect is what you don’t notice, because you cannot regret what you don’t know yet. The impeccable fulfillment of requirements and complacency achieved as a result of achieving process efficiency can obscure the truth that organizations are not familiar with the benefits of learning that they miss.

The solution to this problem, of course, is to get rid of the factory pin method. In order to stimulate learning and iteration, the roles of data science should be common, but with broad responsibilities that are independent of the technical function, that is, organize data specialists so that they are optimized for learning. This means that it is necessary to hire “full-stack specialists” - general specialists who can perform various functions: from concept to modeling, from implementation to measurement. It is important to note that I do not assume that when hiring full-stack specialists, the number of employees should decrease. Most likely, I’ll just assume that when they are organized differently, their incentives are better aligned with the benefits of training and effectiveness. For example, you have a team of three people with three business qualities. At the factory for the production of pins, each specialist will devote a third of the time to each professional task, since no one else can do his job. In a full stack, every universal employee is fully dedicated to the entire business process, job scaling and training.

With fewer people supporting the production cycle, coordination is reduced. The wagon smoothly moves between functions, expanding the data pipeline, to add more data, trying new functions in models, deploying new versions in production for causal measurements, and repeating steps as quickly as new ideas come. Of course, the station wagon performs different functions sequentially, and not in parallel. After all, this is just one person. However, the task usually takes only a small part of the time required to access another specialized resource. So, the iteration time is reduced.

Our station wagon may not be as skilled as a specialist in a specific job function, but we do not strive for functional excellence or small incremental improvements. Rather, we strive to study and discover new professional challenges with a gradual impact. With a holistic context for a complete solution, he sees opportunities that a narrow specialist will miss. He has more ideas and more opportunities. He also fails. However, the cost of failure is low, and the benefits of learning are high. This asymmetry promotes fast iteration and rewards learning.

It is important to note that this is the scale of autonomy and the variety of skills provided to scientists working with full stacks, largely depends on the reliability of the data platform on which you can work. A well-designed data platform abstracts data scientists from the complexities of containerization, distributed processing, automatic transfer to another resource, and other advanced computer concepts. In addition to abstraction, a reliable data platform can provide unhindered connectivity to the experimental infrastructure, automate monitoring and reporting systems, and automatically scale and visualize algorithmic results and debugging. These components are designed and built by data platform engineers, i.e. They are not transferred from the Data Science Specialist to the data platform development team. It is the Data Science Specialist who is responsible for all the code used to launch the platform.

I was also once interested in the functional division of labor using process efficiency, but by trial and error (there is no better way to learn), I found that typical roles contribute better to learning and innovation and provide the right indicators: discovering and building a much larger number of business opportunities than specialized approach. (A more effective way to learn about this approach to organization than the trial and error method I went through is to read Amy Edmondson's book, Team Interaction: How Organizations Learn, Create Innovation, and Compete in the Knowledge Economy.)

There are some important assumptions that can make this organization approach more or less reliable in some companies. The iteration process reduces the cost of trial and error. If the cost of the error is high, you may want to reduce it (but, this is not recommended for medical applications or production). In addition, if you are dealing with petabytes or exabytes of data, specialization in data design may be required. Similarly, if maintaining online business opportunities and their accessibility is more important than improving them, functional excellence can outperform learning. Finally, the full-stack model is based on the opinions of people who know it. They are not unicorns; they can be found or prepared by yourself. However, they are in high demand, and to attract and retain them, the company will require competitive financial compensation, sustainable corporate values and interesting work. Make sure your corporate culture can provide these conditions.

Even with all this said, I believe that the full-stack model provides the best conditions for starting. Start with them, and then consciously move towards the functional division of labor only when it is absolutely necessary.

There are other disadvantages of functional specialization. This can lead to a loss of responsibility and passivity on the part of workers. Smith himself criticizes the division of labor, suggesting that it leads to a dulling of talent, i.e. workers become clueless and withdrawn, as their roles are limited to a few repetitive tasks. While specialization can ensure process efficiency, it is less likely to inspire workers.

In turn, universal roles provide everything that stimulates job satisfaction: autonomy, skill and determination. Autonomy is that they are not dependent on anything in achieving success. Mastery lies in strong competitive advantages. And determination is the ability to influence the business they create. If we manage to get people to get carried away with their work and have a big impact on the company, then everything else will fall into place.

Tags:

Why Data Science Teams Need Universal, Not Specialists

Also popular now: