Machine Learning vs. analytical approach
Some time ago, we found the old materials we used to teach the first cohorts of our machine learning courses at the School of Data and compared them with the current ones. We were surprised at how much we had added and changed over 5 years of teaching. Once we understood why we had done this and how the approach to solving Data Science problems had actually changed, we decided to write this post.
We started by teaching the basic methods and algorithms of machine learning: how to put them into practice, how to tune parameters, how to clean and prepare data, how to measure quality. We believed (and still believe) that the training of a full-fledged Data Scientist should cover not only classical machine learning methods, but also graph analysis (social network analysis, SNA), text analysis, neural networks, and Big Data.
The result was an expert across a wide field of Data Science, able to apply an extensive arsenal of methods in practice. We hired the same kind of specialists ourselves: first at the company where we worked and led the relevant areas, and then in our own business building products based on machine learning, Data Studio.
But later we realized that this is not only insufficient for delivering Data Science projects successfully; it is not even the main thing.
The approach in the early days of Data Science practice, and, to be honest, the approach many analysts still take, goes like this: give me the data, I will clean it, build a feature vector, split it into training and test samples, run several ML algorithms, and here is the result.
Does this approach have a right to exist?
Yes, it does, but only where the subject area is already well studied and there is solid accumulated experience in applying analytics. Examples? Bank scoring, customer churn at telecom operators, cross-selling (Next Best Offer) in retail, banking, and telecom, forecasting the effectiveness of promotions in retail, forecasting balances. The list goes on.
Now let's imagine other areas, such as forecasting arrival time in multimodal transportation (ship, train, truck). What features will you use? Type of cargo, cargo weight, the presence of certain sorting hubs? And what if you think about it more carefully? Maybe a few simpler and more obvious features, even without any machine learning model, will already give you decent accuracy?
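One way to take that last question seriously is to measure a trivial baseline before training any model. The sketch below predicts transit time as the historical median for the route and reports the mean absolute error on held-out shipments. The route names, durations, and the helper names `fit_route_medians` and `baseline_mae` are invented for illustration.

```python
from collections import defaultdict
from statistics import median


def fit_route_medians(history):
    """Map each route to the median of its observed transit times (days)."""
    by_route = defaultdict(list)
    for route, days in history:
        by_route[route].append(days)
    return {route: median(values) for route, values in by_route.items()}


def baseline_mae(medians, fallback, shipments):
    """Mean absolute error of the per-route median; unseen routes use the fallback."""
    errors = [abs(medians.get(route, fallback) - days) for route, days in shipments]
    return sum(errors) / len(errors)


# Hypothetical (route, transit days) records, not real project data.
history = [("A-B", 10), ("A-B", 12), ("A-B", 11), ("C-D", 20), ("C-D", 24)]
holdout = [("A-B", 12), ("C-D", 21), ("E-F", 15)]

medians = fit_route_medians(history)
fallback = median(days for _, days in history)  # global median for unseen routes
mae = baseline_mae(medians, fallback, holdout)
print(f"Baseline MAE: {mae:.2f} days")
```

Any model built later has to beat this number by enough to justify its cost; if it does not, the simple rule may be all the business needs.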
Or suppose you need to predict how sensitive large customers are to price changes for certain products. How do you determine elasticity? What exactly will you predict?
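To make "what exactly will you predict?" concrete: one common choice of target is constant price elasticity, the slope of log(quantity) against log(price). What follows is a minimal sketch, assuming a single product, no confounders such as seasonality or competitor prices, and synthetic data generated with a known elasticity of -1.5. The function name `price_elasticity` is illustrative.

```python
import math


def price_elasticity(prices, quantities):
    """Least-squares slope of log(quantity) on log(price)."""
    if len(prices) != len(quantities) or len(prices) < 2:
        raise ValueError("need at least two matching price/quantity pairs")
    xs = [math.log(p) for p in prices]
    ys = [math.log(q) for q in quantities]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("prices must vary to estimate elasticity")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


# Synthetic demand curve q = 1000 * p ** -1.5 (an assumption, not real data).
prices = [10.0, 12.0, 15.0, 20.0]
quantities = [1000 * p ** -1.5 for p in prices]
elasticity = price_elasticity(prices, quantities)
print(f"Estimated elasticity: {elasticity:.2f}")
```

On real customer data the hard part is not this formula but deciding which price changes and which purchase volumes are comparable at all, which is a subject-area question rather than a modeling one.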
And is it even worth building a model if the production process is going to change later anyway?
It turns out that you need to be able to work in new areas of application for analytics, because the well-studied areas already have so many existing solutions that they are a "red ocean".
What does it take to enter new areas with analytics?
You need to be able to understand deeply the subject area of a particular process, for which descriptions are often not available. You need to understand what kind of data is needed at all and what the business actually makes money on. You need to ask whether analytics is needed here at all, whether predictive algorithms are needed, whether the business process itself should change, and whether there are operational levers (what is the point of predicting an equipment shutdown if there is still no way to avoid it?).
To summarize, the following things are required:
- Analytical approach, ability to formulate and test hypotheses
- Understanding the principles and features of the business and individual processes
- Understanding the economics of the process
- Understanding the technology
- Ability to link data to business processes
And if you step away from machine learning for a moment, which field is best at this? Right: management consulting. And where is this taught using the so-called case method (many examples from different business situations)? Right again: in MBA (Master of Business Administration) programs.
So it turns out that the ideal Data Scientist is an MBA graduate with consulting experience who has completed machine learning courses.
This is, of course, an exaggeration, but it is true that the contractors with the most mature processes and standards have built a culture of analytical thinking into how they select and train staff. We follow the same approach in our Data Studio. And, logically, we built the same approach into our training at the School of Data.
You may object that all of the above applies mostly to consulting, where you never know in advance which subject area the next project will come from. What about large companies, where the area is more or less fixed?
In companies we see the same specifics described above: the analyst and the whole team need to understand the business and take responsibility for the final result.
For this reason, large companies are now specializing their Data Science divisions and moving the analytics function from a single centralized division for the whole company into the business functions, that is, closer to the business. With this specialization, an analyst's ability to understand a new business quickly and offer realistic solutions, rather than just models, is a competitive advantage.
What exactly has changed in our curriculum? First of all, we taught with practical cases before as well; what has changed is the structure and nature of the cases. Previously, our cases looked like Kaggle tasks: here is the task, here is the target variable, here is the quality metric, here is the data.
Now the task sounds different: here is the problem in the client's terms, and here is a description of the client's process. Formulate the analytics task, propose a quality metric, assess whether analytics is appropriate at all, calculate the economic effect, suggest methods, and formulate a request for the data you need. After that, everything is as usual: clean the data, build a model, and so on. We give such cases from completely different areas; fortunately, having our own consulting practice greatly expands the range of tasks available, since we have solved them ourselves.
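As an illustration of the "calculate the economic effect" step, here is a rough sketch of a back-of-the-envelope estimate for a model that flags an unwanted event (for example, an equipment shutdown) in advance. Every field of the hypothetical `Scenario` class and every number below is an assumption made up for the example; a real case would take them from the client's process.

```python
from dataclasses import dataclass


@dataclass
class Scenario:
    events_per_year: float   # how often the unwanted event happens
    loss_per_event: float    # money lost when it is not prevented
    recall: float            # share of events the model flags in advance
    precision: float         # share of alerts that are real events
    action_cost: float       # cost of reacting to one alert
    prevention_rate: float   # share of flagged events the lever actually prevents


def annual_effect(s: Scenario) -> float:
    """Expected yearly savings minus the cost of acting on every alert."""
    if not 0 < s.precision <= 1:
        raise ValueError("precision must be in (0, 1]")
    caught = s.events_per_year * s.recall
    alerts = caught / s.precision  # true and false alerts together
    savings = caught * s.prevention_rate * s.loss_per_event
    return savings - alerts * s.action_cost


# Same hypothetical model, with and without a working operational lever.
with_lever = Scenario(40, 50_000, 0.6, 0.5, 5_000, 0.8)
no_lever = Scenario(40, 50_000, 0.6, 0.5, 5_000, 0.0)
effect_with_lever = annual_effect(with_lever)
effect_no_lever = annual_effect(no_lever)
print(f"With lever: {effect_with_lever:,.0f}; without: {effect_no_lever:,.0f}")
```

With `prevention_rate` set to zero the same model only costs money, which is the earlier point about predicting a shutdown that nobody can avoid.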
But the discipline of the analytical approach is more than case practice. We also teach the standard frameworks (basic analysis patterns) used in consulting. And we added to the training the analytical product development process that we follow in class, from business analysis to presenting the results to the customer and planning the deployment of a production solution, including the stages, roles, key decision points, and points of interaction with the customer.
We give presentations special attention: too often we have seen a gap between what analysts think and how the customer's employees perceive it.
In general, we believe that the goal of training a Data Scientist is not to prepare a specialist for existing areas (there are already many courses for that, and it has largely become a commodity), but to prepare an expert researcher for work in new areas that digitalization is only just reaching.
And, as usual: a new course at our School of Data starts on September 16th. We accept orders for new projects at Data Studio all the time, and we are always hiring (see the section on open vacancies).
P.S. We have updated our site a bit to make it more convenient, so do not be surprised by the new look.