How to build a Data Science department and not screw it up
Data science is no longer just for large companies: small companies and even startups are adopting it. Yet top managers often do not understand what it takes to apply it successfully. Many expect a single data scientist to solve all of the company's problems within a month, and artificial intelligence to start working flawlessly in every department at the click of a button. Unfortunately, this is not the case. My name is Ivan Serov, and in this post I will explain where to start when creating a DS department and what difficulties to expect.
Management expectations
One of the most important steps in creating a department is to set expectations and KPIs from the start. As with any other innovation, DS has to go through a full cycle that begins with operating losses. At best, the costs of architecture and specialists are recouped in six months; more often it takes one to three years, depending on the size of the company. You must be prepared for this and not give up after a couple of failures. Top managers often close the department after a year because it has not had time to turn a profit, and trust in DS is lost as a result. A department can succeed only if the right expectations and goals (preferably SMART ones) are set.
Start small
The best way to start is with a so-called proof-of-concept project: one that is short and not very complicated, but can bring real value to the business, for example a 2% increase in revenue from a recommender system. Do not try to build an ensemble of five custom neural networks and work on it all year. Even in text classification projects you can start with simple approaches (such as bag of words) and already get a boost. This pilot project then becomes a starting point for further development and shows management that the money is being spent on useful things and that DS is worth developing, which in turn buys time to work on more complex things. If the company lacks the necessary competences, it makes sense to hire an external team of DS consultants for the pilot project. They can help bring your ideas to life with fairly good quality.
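To show how small such a starting point can be, here is a minimal sketch of a bag-of-words text classifier in plain Python. It pairs word counts with a multinomial Naive Bayes model, which is my choice for illustration and not something the approach above prescribes. The `BagOfWordsNB` class and the support-ticket data are hypothetical.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


class BagOfWordsNB:
    """Multinomial Naive Bayes over bag-of-words counts with add-one smoothing."""

    def fit(self, texts, labels):
        self.word_counts = defaultdict(Counter)  # label -> word -> count
        self.doc_counts = Counter(labels)        # label -> number of documents
        self.vocab = set()
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, text):
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.doc_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)  # log prior of the class
            for token in tokenize(text):
                if token in self.vocab:  # ignore words never seen in training
                    score += math.log((counts[token] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical toy data: support tickets routed to two queues.
train_texts = [
    "my payment failed and the card was charged twice",
    "refund the charge on my card please",
    "the parcel has not arrived yet",
    "courier lost my parcel and delivery is late",
]
train_labels = ["billing", "billing", "delivery", "delivery"]

model = BagOfWordsNB().fit(train_texts, train_labels)
prediction = model.predict("where is my parcel, the delivery is late")
print(prediction)
```

A baseline like this takes an afternoon to build, and it gives you a number to beat before anyone invests in more complex models.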
Collect data
Everything here is simple and difficult at the same time: ideally, the company should use all the data it has. For example, if you are an online retailer, you have, at a minimum, data on sales of specific products, customer behavior on the site, and marketing newsletters. This alone is enough to build many models, for example a personalized mailing system.
In practice, collecting all of the company's data in one database is often a big problem because of disparate sources, poor coordination between departments, or even the absence of BI specialists in the company. Organizations that keep all their data in Excel should first move it into a (SQL) database, and only then think about DS.
All available data should be stored in a form that is convenient for analysts and data scientists to retrieve (most often this means SQL). Agree in advance with the BI department on the format in which you want to receive the data, process it, and use it in production.
If you have little data of your own, you can buy more from third-party companies. For example, you can buy data from a telecom operator, link it to your own records by phone number, and thereby enrich them. In every such case, though, you need to calculate whether the purchase actually pays off.
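As a rough illustration of that first step, the sketch below loads a spreadsheet into a SQL database using only Python's standard library. It assumes the Excel sheet has been exported to CSV. The `sales` table, its columns, and the sample rows are hypothetical, and SQLite stands in for whatever database the company actually uses.

```python
import csv
import io
import sqlite3

# Hypothetical export of a sales spreadsheet, saved from Excel as CSV.
SALES_CSV = """order_id,customer_id,product,amount
1,101,kettle,25.50
2,102,toaster,40.00
3,101,blender,60.25
"""


def load_sales(conn, csv_file):
    """Create the sales table if needed and load every CSV row into it."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS sales (
               order_id INTEGER PRIMARY KEY,
               customer_id INTEGER NOT NULL,
               product TEXT NOT NULL,
               amount REAL NOT NULL
           )"""
    )
    rows = [
        (int(r["order_id"]), int(r["customer_id"]), r["product"], float(r["amount"]))
        for r in csv.DictReader(csv_file)
    ]
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?, ?)", rows)
    conn.commit()
    return len(rows)


conn = sqlite3.connect(":memory:")  # use a file path for a persistent database
loaded = load_sales(conn, io.StringIO(SALES_CSV))

# Once the data is in SQL, analysts can query it directly.
revenue_by_customer = conn.execute(
    "SELECT customer_id, SUM(amount) FROM sales "
    "GROUP BY customer_id ORDER BY customer_id"
).fetchall()
print(loaded, revenue_by_customer)
```

Declaring column types up front is deliberate: type mismatches in the spreadsheet surface at load time, not later inside a model.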
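Linking by phone number could look roughly like the sketch below. Phone numbers rarely arrive in one format, so they are normalized before the join. Comparing on the last 10 digits is a simplifying assumption, and the field names and records are hypothetical. The match rate it returns is one input for judging whether the purchased data is worth its price.

```python
import re


def normalize_phone(raw):
    """Keep digits only and compare on the last 10 (a simplifying assumption)."""
    digits = re.sub(r"\D", "", raw)
    return digits[-10:] if len(digits) >= 10 else None


def enrich(customers, external):
    """Left-join external attributes onto customers by normalized phone number."""
    lookup = {}
    for row in external:
        key = normalize_phone(row["phone"])
        if key:
            lookup[key] = {k: v for k, v in row.items() if k != "phone"}
    enriched_rows, matched = [], 0
    for customer in customers:
        extra = lookup.get(normalize_phone(customer["phone"]))
        if extra is not None:
            matched += 1
        enriched_rows.append({**customer, **(extra or {})})
    rate = matched / len(customers) if customers else 0.0
    return enriched_rows, rate


# Hypothetical records: our own customers and a purchased external dataset.
customers = [
    {"id": 1, "phone": "+7 (912) 345-67-89"},
    {"id": 2, "phone": "8 900 111 22 33"},
    {"id": 3, "phone": "n/a"},
]
external = [
    {"phone": "79123456789", "segment": "high_usage"},
    {"phone": "+7-900-555-00-00", "segment": "low_usage"},
]

enriched, match_rate = enrich(customers, external)
print(enriched[0], round(match_rate, 2))
```

If only a small share of your customers can be matched, the enrichment will barely affect any model, so it makes sense to measure the match rate on a sample before buying the full dataset.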
Find analysts
It is important that the company already has an analytics department by the time the DS department is created. Analysts are the people who will help data scientists find data, explain what it means, show how to correctly build the necessary variables, and much more. Analytics is the company's first step toward data-driven decision making (that is, when all decisions in the company are based on data rather than on the wishes of management). Analysts help extract value from data even without models, and their reports help management make the right decisions. Later on, analysts will also monitor the status of all DS models and prepare reports on the results.
Pick a team
Many articles have already been written on this topic, so I will only try to summarize what has already been said. A good DS team most often consists of:
- Project Manager - manages the project, is responsible for the entire business part;
- Data Scientist - builds models;
- Data Engineer - collects data and prepares production pipelines;
- Developer - implements the DS solution.
These roles are flexible and can vary depending on your needs. For example, a team may also include a business analyst, there may be several data scientists, or the data engineer and the developer may be the same person. There are many possible team setups, and you should choose based on your needs, or try several options and keep the best one.
Beyond the standard team, creating a department from scratch requires not only good specialists from the list above, but also an evangelist who will explain to everyone what DS is and how it can benefit other departments: a Chief AI Officer / Chief Data Officer / Chief Digital Officer (choose the title yourself). It is also worth stressing that if you hire a single data scientist and load that person with the work of an analyst, an architect, and a developer as well, you should not expect quick results. Worse, it can demotivate that person and cost the company a department that could have been successful.
If the company is large and has many opportunities to work with Big Data, it also needs a Data Architect, who will set up the architecture and multi-stream data collection and deploy Hadoop or Spark (systems for processing large volumes of data) for the company's data scientists to use.
Do not forget about internal communications and training
After the pilot project, the team needs to be actively developed. The company should organize at least two types of training:
- For data scientists - workshops on various topics, weekly meetings, hackathons, master classes. It is also worth buying online courses for the team (for example, on Coursera) and perhaps even including them in the KPIs. This helps keep the team up to date in a rapidly growing field and improves internal communication;
- For project managers and top managers - this can also be workshops, for example analyses of business cases or of companies' AI strategies, or an introduction to machine learning and deep learning (what can and cannot be done, the fundamentals of the technology). This helps management form realistic expectations of DS.
Most likely, even before the DS department is created, the company already has people interested in it: developers who have taken some DS courses, or people from the business side who want to become DS project managers. They should be brought into the department and helped to grow. For example, by training a developer in machine learning methods you can get a good, motivated specialist who already knows how the company works internally and costs less than the average data scientist on the market, who would also need time to figure it all out.
External communication is important
This point is often forgotten, but it is no less important than the others. The machine learning job market suffers from a serious shortage of specialists (things have started to improve in recent years, but still). Every good data scientist knows their worth and gets to choose which company to work for, so offering a high salary is no longer enough: you need to attract people with interesting projects. To do this, build your external communications competently: work with the media, opinion leaders, and the community, talk about the projects you have implemented, write articles for specialized publications, speak at conferences, and perhaps sponsor industry events such as hackathons. This is only a small part of what can be done to attract talent to the company.
That's all. In conclusion, I will just note that I deliberately did not cover the difficulties that arise in the day-to-day work of a Data Science department, and only described what is needed to create one. If you have something to add, you are welcome to do so in the comments.