The most sought-after skills in data science

Original author: Jeff Hale
Data scientists are expected to know a lot: machine learning, programming, statistics, mathematics, data visualization, communication, and deep learning. Each of these areas covers dozens of languages, frameworks, and technologies one could study. So how should data professionals budget their learning time to be most valuable to employers?

I combed through job sites to find out which skills employers want most right now. I looked at both the broad disciplines of data science and, in a separate analysis, specific languages and tools. My sources were LinkedIn, Indeed, SimplyHired, Monster, and AngelList, as of October 10, 2018. The graph below shows how many data science jobs each of these sites listed.



I read through many job descriptions and surveys to work out which skills are mentioned most often. Terms like “management” were left out of the analysis, because job listings use them in too many different contexts.

Searches were run for the United States using the pattern “data science” “keyword”. To cut down the number of results, I counted only exact matches. Even so, this method ensured that all results were relevant to data science and that the same criteria applied to every query.
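As a rough illustration of the query pattern (not the code used in the study), each search can be thought of as two quoted phrases side by side. The function name `build_query` and the keyword list are made up for this sketch; the only thing taken from the text is the pattern of a base phrase plus a keyword, each matched exactly.

```python
def build_query(base_term: str, keyword: str) -> str:
    """Wrap both phrases in double quotes, the usual way to ask a
    job site for exact-phrase matches (assumed, not site-specific)."""
    return f'"{base_term}" "{keyword}"'


# Hypothetical keyword list, for illustration only.
keywords = ["machine learning", "statistics", "Python"]

# One query per keyword, always paired with the same base phrase so
# that every search is filtered by the same criterion.
queries = [build_query("data science", kw) for kw in keywords]

for query in queries:
    print(query)
```

Keeping the base phrase fixed is what makes the counts for different keywords comparable with each other.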

AngelList reports not the number of data science vacancies but the number of companies offering them. I excluded this site from both analyses, because its search apparently works on an OR basis and offers no way to switch to AND. AngelList works fine for a query like “data scientist” “TensorFlow”, where a match on the second term implies a match on the first. But a query like “data scientist” “react.js” returns many vacancies that have nothing to do with data science.
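The difference between the two matching modes can be sketched with a toy example. The three postings below are invented, and the `search` function is a simplified stand-in, not how AngelList or any other site actually implements search.

```python
# Invented postings, for illustration only.
postings = {
    1: "data scientist with TensorFlow experience",
    2: "front-end developer, react.js",
    3: "data scientist, react.js dashboards",
}


def matches(text: str, terms: list[str], mode: str) -> bool:
    """AND requires every term to appear; OR requires at least one."""
    hits = [term.lower() in text.lower() for term in terms]
    return all(hits) if mode == "AND" else any(hits)


def search(terms: list[str], mode: str) -> list[int]:
    """Return the ids of postings that match under the given mode."""
    return sorted(pid for pid, text in postings.items()
                  if matches(text, terms, mode))


# With OR, the react.js query also returns posting 2, which has
# nothing to do with data science.
print(search(["data scientist", "react.js"], "AND"))    # [3]
print(search(["data scientist", "react.js"], "OR"))     # [1, 2, 3]

# TensorFlow only occurs in a data science posting here, so even OR
# returns nothing unrelated.
print(search(["data scientist", "TensorFlow"], "OR"))   # [1, 3]
```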

Glassdoor also had to be excluded. The site claimed to have 26,263 data science vacancies but displayed at most 900. Besides, it seems highly doubtful to me that it had collected more than three times as many vacancies as any other major site.

For the final stage of the analysis, I kept the keywords that returned many results on LinkedIn: more than 400 for broad skills and more than 200 for specific technologies. There were certainly some duplicate listings. I recorded the results of this stage in a Google document.
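The cutoff rule can be written down in a few lines. This is only a sketch: the thresholds of 400 and 200 come from the text, while the term names, categories, and counts are placeholders rather than real LinkedIn figures.

```python
BROAD_MIN = 400  # broad skills need more than 400 results
TECH_MIN = 200   # specific technologies need more than 200 results

# Placeholder data: (term, category) -> number of LinkedIn results.
linkedin_counts = {
    ("broad_skill_a", "broad"): 5000,
    ("broad_skill_b", "broad"): 150,
    ("tool_a", "tech"): 6000,
    ("tool_b", "tech"): 40,
}


def keep(category: str, count: int) -> bool:
    """Apply the cutoff for the term's category (strictly greater)."""
    threshold = BROAD_MIN if category == "broad" else TECH_MIN
    return count > threshold


selected = [term for (term, category), count in linkedin_counts.items()
            if keep(category, count)]
print(selected)  # ['broad_skill_a', 'tool_a']
```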

Then I downloaded the .csv files, loaded them into JupyterLab, calculated the prevalence of each keyword as a percentage, and averaged the values across the sites. I then compared the language results with Glassdoor's study of data science job listings from the first half of 2017. Combined with the KDnuggets usage survey, the comparison suggests that some skills are gaining popularity while others are gradually losing ground. But more on that later.
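The percentage-and-average step might look roughly like the sketch below. It is not the code from the Kaggle Kernel. It assumes a hypothetical CSV layout with one column per site and a `TOTAL` row holding each site's overall number of data science listings, and it assumes that the percentage is taken against that total. All names and numbers are invented.

```python
import csv
import io
from statistics import mean

# Invented CSV content standing in for the downloaded files.
CSV_TEXT = """keyword,site_a,site_b
TOTAL,1000,500
skill_x,600,250
skill_y,100,100
"""

rows = list(csv.DictReader(io.StringIO(CSV_TEXT)))
sites = [name for name in rows[0] if name != "keyword"]

# Assumption: the first data row holds each site's total listings.
totals = {site: int(rows[0][site]) for site in sites}


def average_share(row: dict) -> float:
    """Share of listings mentioning the keyword on each site, in
    percent, averaged over the sites with equal weight."""
    shares = [100 * int(row[site]) / totals[site] for site in sites]
    return mean(shares)


result = {row["keyword"]: average_share(row) for row in rows[1:]}
print(result)  # {'skill_x': 55.0, 'skill_y': 15.0}
```

Averaging the per-site percentages, rather than pooling the raw counts, keeps the largest site from dominating the result.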

My Kaggle Kernel has interactive graphs and additional analysis. I used Plotly for the visualizations. Getting Plotly and JupyterLab to work together takes some tinkering, at least at the time of writing; instructions are at the end of my Kaggle Kernel and in the Plotly documentation.

Broad skills


Here is a graph of the general skills that employers most often want to see in candidates.



The results show that analytics and machine learning remain the core of a data scientist's work. The main purpose of the profession is to draw useful conclusions from data. Machine learning is about building systems that can predict outcomes, so it is in great demand.

Data science requires knowledge of statistics and the ability to write code, which is no surprise. Statistics, mathematics, and software engineering are also subjects taught as university degrees, which may contribute to how often they appear in listings.

Interestingly, almost half of the listings mention communication: data scientists need to be able to convey their findings to others and work in a team.

AI and deep learning are mentioned less often than some of the other terms. However, both overlap heavily with machine learning, and deep learning is a subset of it. Deep learning is increasingly used for tasks that were previously handled by other machine learning algorithms. For example, the best-performing algorithms for natural language processing problems are now deep learning algorithms. I expect deep learning to become ever more popular, and machine learning to gradually be perceived as a synonym for it.

Which specific software tools do employers want data scientists to master? We turn to that question in the next section.

Technological skills


Below are the 20 specific languages, libraries, and tools that employers most often want data scientists to have experience with.



Let's walk through the leaders quickly.



Python is the most requested technology. Many have noted how popular this open source language is among programmers. It is a convenient choice for beginners, with many learning resources available. The vast majority of new data tools are compatible with it. All of this makes Python the primary language for data scientists.



R trails Python by a small margin. It was once the primary language for data scientists. I was surprised to see how much demand for it remains. The language has its roots in statistics and is accordingly very popular among statisticians.

Almost every listing requires one of these two languages, Python or R.



SQL is also in high demand. The abbreviation stands for Structured Query Language, and it is the main tool for interacting with relational databases. SQL is often overlooked in the data science community, but it is a skill you should be fluent in if you plan to enter the job market.




Next come Hadoop and Spark, both open source Apache tools for working with big data. Far fewer tutorials and Medium articles have been written about them. I assume that far fewer applicants know them than know Python or R. If you can work with Hadoop and Spark, or have the chance to learn them, this could give you a good edge over your competitors.




Next up are Java and SAS. I was surprised to see these two languages rank so high. Both are backed by large companies, and both have at least some free learning materials. However, neither Java nor SAS attracts much interest in the data science community.



Next in the ranking is Tableau. It is an analytics platform and visualization tool that is powerful and easy to use, and its popularity is growing steadily. Tableau has a free public version, but you will have to pay if you want to keep your data private. If you are completely new to Tableau, a short course is worthwhile, for example Tableau 10 A-Z on Udemy. I am not paid to advertise it; I took the course myself and found it very useful.

The chart below shows an extended list of popular languages, frameworks, and other tools for working with data.



Historical comparison


The Glassdoor team published a study of the ten most in-demand skills for data scientists from January to July 2017. The graph below compares their term frequencies with the averages I calculated for LinkedIn, Indeed, SimplyHired, and Monster.



Overall, the results are similar. Both my analysis and Glassdoor's agree that Python, R, and SQL are most in demand. The top nine skills are also the same in both, although the exact order differs.

Judging by the results, demand for R, Hadoop, Java, SAS, and MATLAB has dropped since the first half of 2017, while Tableau has become more popular. This is what one would expect from the KDnuggets developer survey results, which clearly show R, Hadoop, Java, and SAS declining for several years and Tableau steadily rising.

Recommendations


Based on these findings, here are some recommendations for data scientists who are already on the market or are just preparing to start a career, and who want to make themselves more competitive.

  • Show that you know how to analyze data, and spare no effort to master machine learning properly
  • Pay attention to communication skills. I recommend the book “Made to Stick”, which explains how to give your ideas more weight. Also practice with the Hemingway Editor app to learn to express your thoughts more clearly.
  • Learn a deep learning framework. Deep learning is gradually becoming an integral part of learning machine learning. In another article I compare various frameworks by how useful, interesting, and popular they are - you can find it here.
  • If you are torn between Python and R, choose Python. If you already know Python like the back of your hand, consider learning R. It will definitely make you a more attractive candidate.

When an employer is looking for someone who works with Python, they will likely expect candidates to know the main data science libraries: numpy, pandas, scikit-learn, and matplotlib. If you want to learn this toolset, I recommend the following resources:

  • DataCamp and DataQuest - both are SaaS products offering inexpensive online data science courses in which you learn by writing code. Both cover a wide range of tools.
  • Data School offers a range of different resources, including a good series of YouTube videos that explain the basic concepts of data science.
  • “Python for Data Analysis” by McKinney. This book is by the author of the pandas library; it focuses mainly on pandas, but it also covers the basics of Python, numpy, and scikit-learn for data science.
  • “Introduction to Machine Learning with Python: A Guide for Data Scientists” by Müller and Guido. Müller is a maintainer of scikit-learn. A great book for anyone learning machine learning in general and this library in particular.

If you want to break into deep learning, I suggest starting with Keras or FastAI and then moving on to TensorFlow or PyTorch. Chollet's “Deep Learning with Python” is a great resource for learning Keras.

Beyond these recommendations, I think it is worth focusing on what interests you, although of course many other considerations can go into deciding how to spend your learning time.

If you are looking for a data scientist job on online portals, I suggest starting with LinkedIn: it consistently returned the most results. Keywords also matter a great deal when searching for jobs or posting a resume. For example, on every site considered, the query “data science” returned three times more results than the query “data scientist”. On the other hand, if you are interested strictly in data scientist positions, you are better off searching for that term.

Whichever site you choose, I recommend building an online portfolio that demonstrates your skills in as many in-demand areas as possible. Ideally, your LinkedIn profile should contain evidence of the skills you claim.

I may present the rest of the findings in other articles. If you want to see the code or the interactive charts, head over to my Kaggle Kernel.
