HeadHunter Job Analysis
One day I wondered: what if I analyzed job vacancies and compiled a few rankings from them? I wanted to find out who pays the most, who is most in demand, and much more.
I used the well-known HeadHunter as the data source and collected and processed the vacancies for May of this year. Only one month is covered, because the API does not return anything older.
Data collection
The HeadHunter API has excellent documentation, which is kept in a repository. Requests go to the https://api.hh.ru/ domain with the User-Agent header set, preferably in the form app_name/app_version (contact_email). Other User-Agent values sometimes work, but if the server dislikes something, it returns an error.
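The recommended User-Agent format can also be assembled in Python. The sketch below is only an illustration: the helper name `build_request` and the application identity are hypothetical, and the request is built but never sent.

```python
import urllib.request

API_ROOT = "https://api.hh.ru"


def build_request(path, app_name, app_version, contact_email):
    """Build (but do not send) a request carrying a descriptive User-Agent."""
    user_agent = f"{app_name}/{app_version} ({contact_email})"
    return urllib.request.Request(
        f"{API_ROOT}{path}",
        headers={"User-Agent": user_agent},
    )


# Hypothetical application identity, for illustration only.
request = build_request("/vacancies", "my-app", "1.0", "me@example.com")
# urllib stores header names capitalized, hence "User-agent" here.
print(request.get_header("User-agent"))
```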
The collection logic is very simple, so I implemented it in bash using cURL and jq. Still, I want to share a few nuances.
Pagination
The GET /vacancies endpoint searches for vacancies by various parameters.
curl -A 'irenica (https://irenica.com/)' 'https://api.hh.ru/vacancies'
The search results are split into pages whose size is set by the per_page parameter (20 by default, 100 at most). You can select a specific page with the page parameter (numbering starts from 0).
The pages field of the service information returned with the vacancies holds the total number of result pages.
With this, you can easily iterate over all pages:
declare -i i=0
while true; do
    declare url="https://api.hh.ru/vacancies?per_page=100&page=$i"
    declare page="$(curl -A 'irenica (https://irenica.com/)' "$url")"
    # process $page
    ((i++))
    declare -i totalCount=$(echo "$page" | jq '.pages')
    if ((i >= totalCount)); then
        break
    fi
done
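For comparison, the same paging loop can be sketched in Python. This is not the code used for the article: `iter_pages` and `fake_fetch` are hypothetical names, and the fetch function is passed in so the loop can be tried without touching the network.

```python
def iter_pages(fetch_page, per_page=100):
    """Yield every page of search results.

    fetch_page(page, per_page) must return the decoded JSON of one
    GET /vacancies response; it is injected so the loop can be tried
    without network access.
    """
    page = 0
    while True:
        result = fetch_page(page, per_page)
        yield result
        page += 1
        # "pages" is the total number of pages reported by the API.
        if page >= result["pages"]:
            break


# Stand-in for the real API: three pages with one fake vacancy each.
def fake_fetch(page, per_page):
    return {"pages": 3, "items": [{"id": str(page)}]}


ids = [item["id"] for result in iter_pages(fake_fetch) for item in result["items"]]
print(ids)
```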
Full job details
However, the search results contain only part of each vacancy's data. To get everything, make a separate request to the GET /vacancies/vacancy_id endpoint.
The partial vacancy data sits in the items field of the search results. First, collect the vacancy IDs from it:
declare vacanciesIds="$(echo "$page" | jq -r '.items[].id')"
Then we will request complete information about the relevant vacancies separately:
for vacancyId in $vacanciesIds; do
    declare url="https://api.hh.ru/vacancies/$vacancyId"
    declare vacancy="$(curl -A 'irenica (https://irenica.com/)' "$url")"
    # process $vacancy
done
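A rough Python equivalent of this two-step collection might look as follows. The function names are hypothetical, the sample page is made up, and the demonstration uses a stand-in fetcher, so nothing is requested from the API when the snippet runs.

```python
import json
import urllib.request

USER_AGENT = "irenica (https://irenica.com/)"  # same value as in the curl examples


def fetch_vacancy(vacancy_id):
    """Request the full record of one vacancy (performs a real HTTP call)."""
    request = urllib.request.Request(
        f"https://api.hh.ru/vacancies/{vacancy_id}",
        headers={"User-Agent": USER_AGENT},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def collect_full_vacancies(page, fetch=fetch_vacancy):
    """Return the full record for every vacancy listed on one search page."""
    return [fetch(item["id"]) for item in page["items"]]


# Offline demonstration with a stand-in fetcher and a made-up page.
fake_page = {"items": [{"id": "101"}, {"id": "102"}]}
full = collect_full_vacancies(
    fake_page, fetch=lambda vacancy_id: {"id": vacancy_id, "description": "..."}
)
print([vacancy["id"] for vacancy in full])
```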
Search limit bypass
The HeadHunter API has one peculiarity: no matter how many vacancies are found, at most 2000 are returned. The actual number found is still reported in the found field of the search results, so you can tell for certain whether you received all the requested data or lost some.
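The check itself is one subtraction. A minimal sketch, assuming the search response has already been decoded into a dict (the helper name `missing_count` is hypothetical):

```python
RESULT_LIMIT = 2000  # maximum number of vacancies one search returns, per the text above


def missing_count(search_result, limit=RESULT_LIMIT):
    """How many found vacancies cannot be reached by paging through this search."""
    return max(0, search_result["found"] - limit)


print(missing_count({"found": 2500}))  # some results are out of reach
print(missing_count({"found": 120}))   # everything fits
```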
To get around this limitation, I came up with the following. When searching, you can specify the period in which the vacancies of interest were published, through the date_from and date_to parameters, which take a date in ISO 8601 format. You can pick a small interval and walk through all the results piece by piece: the shorter the interval, the fewer vacancies were published within it.
Note that only vacancies published within the last month are returned, so there is no point in setting a wider range.
To iterate over time intervals, it is easiest to represent them as Unix time:
declare -i startTime=$(date -d '-1 month' +%s)
declare -i endTime=$(date -d now +%s)
while ((startTime <= endTime)); do
    declare -i intervalEnd=$((startTime + 60*60))
    declare startTimeIso="$(date -d @$startTime +%FT%T)"
    declare intervalEndIso="$(date -d @$intervalEnd +%FT%T)"
    # ...
    declare url="https://api.hh.ru/vacancies?per_page=100&page=$i&date_from=$startTimeIso&date_to=$intervalEndIso"
    # ...
    startTime=$intervalEnd
done
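The same one-hour stepping can be sketched in Python with datetime instead of Unix time. The function name `hourly_windows` and the sample dates are assumptions for illustration; each yielded pair would go into date_from and date_to.

```python
from datetime import datetime, timedelta


def hourly_windows(start, end, step=timedelta(hours=1)):
    """Yield (date_from, date_to) ISO 8601 strings covering start through end."""
    current = start
    while current <= end:
        window_end = current + step
        yield (
            current.isoformat(timespec="seconds"),
            window_end.isoformat(timespec="seconds"),
        )
        current = window_end


# Made-up two-hour range, for illustration only.
windows = list(hourly_windows(datetime(2019, 5, 1, 0, 0), datetime(2019, 5, 1, 2, 0)))
for date_from, date_to in windows:
    print(date_from, date_to)
```

As in the bash loop, the condition is inclusive, so the last window starts exactly at the end time.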
Payroll processing
To compile the statistics, I had to group vacancies by certain attributes. Doing this in bash would have been awkward, so I used Python.
The logic is nothing special: accumulate the data in an associative array, sort it, and write it out as CSV. Again, though, there are a few nuances.
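The accumulate, sort, and export steps could be sketched like this. It is a simplified illustration, not the code from the repository: the function names are hypothetical, the records are made up, and the column names are borrowed from the tables in this article.

```python
import csv
import io
from collections import defaultdict


def summarize(records):
    """Group (name, salary) pairs; return rows sorted by average salary, highest first."""
    groups = defaultdict(list)
    for name, salary in records:
        groups[name].append(salary)
    rows = [
        (name, round(sum(values) / len(values)), min(values), max(values), len(values))
        for name, values in groups.items()
    ]
    return sorted(rows, key=lambda row: row[1], reverse=True)


def to_csv(rows):
    """Serialize the summary rows to CSV text."""
    output = io.StringIO()
    writer = csv.writer(output, lineterminator="\n")
    writer.writerow(
        ["Name", "Salary, average", "Salary, minimum", "Salary, maximum", "Number"]
    )
    writer.writerows(rows)
    return output.getvalue()


# Made-up records, for illustration only.
sample = [
    ("Sales", 30000),
    ("Production", 40000),
    ("Sales", 50000),
    ("Production", 60000),
    ("Security", 20000),
]
rows = summarize(sample)
print(to_csv(rows))
```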
Salary range
Note that the salary is given as two numbers, a minimum and a maximum, and either of them may be absent.
Since the analysis needed a single number, I decided to use the lower bound and fall back to the upper one only when the lower bound is absent.
salary = None
if vacancy['salary']:
    # Take the upper bound first, then let the lower bound override it.
    if vacancy['salary']['to']:
        salary = vacancy['salary']['to']
    if vacancy['salary']['from']:
        salary = vacancy['salary']['from']
Exchange rates
The salary in a vacancy can be given in different currencies, each with its own exchange rate. The HeadHunter API has a GET /dictionaries endpoint that contains all the necessary predefined values. Exchange rates are in the currency field. For convenience, it is better to put them into an associative array keyed by the alphabetic currency code:
import requests

# Map each currency code to its exchange rate.
currencies = {}
dictionaries = requests.get('https://api.hh.ru/dictionaries').json()
for currency in dictionaries['currency']:
    currencies[currency['code']] = currency['rate']
Now, during processing, it will be easy to convert all salaries into one currency:
salary /= currencies[vacancy['salary']['currency']]
Accounting for personal income tax (NDFL)
In some vacancies the salary is stated before personal income tax, in others after. Which one applies is indicated by the gross field: it is true in the first case and false in the second.
I decided to convert all salaries to the after-tax value:
if vacancy['salary']['gross']:
    salary -= salary * 0.13
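Put together, the three steps (picking one bound, converting the currency, subtracting the tax) can be sketched as a single function. The name `normalize_salary` and the exchange rates below are made up for illustration; real rates would come from GET /dictionaries.

```python
NDFL_RATE = 0.13  # personal income tax rate used in the snippet above


def normalize_salary(salary, currencies):
    """Return one after-tax number in the base currency, or None if no salary is given."""
    if not salary:
        return None
    # Prefer the lower bound; fall back to the upper one.
    amount = salary.get("from") or salary.get("to")
    if not amount:
        return None
    amount /= currencies[salary["currency"]]
    if salary.get("gross"):
        amount -= amount * NDFL_RATE
    return amount


# Made-up rates, for illustration only.
sample_rates = {"RUR": 1.0, "USD": 0.0125}

before_tax = {"from": 100000, "to": 150000, "currency": "RUR", "gross": True}
upper_only = {"from": None, "to": 1500, "currency": "USD", "gross": False}

print(round(normalize_salary(before_tax, sample_rates)))
print(round(normalize_salary(upper_only, sample_rates)))
print(normalize_salary(None, sample_rates))
```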
Results analysis
Now is the time to show the numbers.
Remote work
Many of those reading this post would probably like to work remotely. But as we can see, working from home is not valued very highly in our country yet: salaries are much lower and there are significantly fewer vacancies, so applicants have less to choose from.
This is rather strange, because in many professions and many companies the nature of the tasks makes a person's presence in the office completely unnecessary. But this is an eternal argument.
Name | Salary, average | Salary, minimum | Salary, maximum | Number |
---|---|---|---|---|
Domestic staff | 112536 | 10977 | 130000 | 19 |
Information technology, Internet, telecom | 55225 | 1000 | 300000 | 2828 |
Top management | 47687 | 9474 | 100000 | 23 |
Extraction of raw materials | 46579 | 20000 | 90898 | 80 |
Installation and Service | 45439 | 11874 | 69600 | 9 |
Public service, non-profit organizations | 44911 | 20000 | 90000 | 19 |
Working staff | 44218 | 9499 | 67860 | 37 |
Production | 42388 | 2372 | 100000 | 236 |
Construction, real estate | 39896 | 70 | 110000 | 329 |
Transport, logistics | 37662 | 9490 | 100000 | 223 |
Applicants with disabilities
However, there is an even smaller category of vacancies: those for people with disabilities. This is completely illogical. Employers who do not want remote workers are one thing, but among those who are ready for them, why do so few think about people with disabilities? If you do not care that a person is three time zones away, what difference does it make whether they are able to walk, for example?
Many of you probably know people with disabilities. So do I, and I wondered how hard it is for them to find a job and what they can count on.
Name | Salary, average | Salary, minimum | Salary, maximum | Number |
---|---|---|---|---|
Public service, non-profit organizations | 69675 | 8700 | 90000 | 8 |
Top management | 48705 | 30000 | 82425 | 15 |
Information technology, Internet, telecom | 45321 | 4350 | 200000 | 1050 |
Science, education | 45056 | 3158 | 90000 | 376 |
Purchases | 43591 | 15000 | 80000 | 9 |
Construction, real estate | 42148 | 22 | 250000 | 210 |
Production | 40969 | 10000 | 130500 | 189 |
Accounting, management accounting, finance companies | 36387 | 2610 | 113100 | 125 |
Lawyers | 34308 | 2610 | 160000 | 131 |
Security | 33414 | 22 | 90000 | 178 |
Students
We all start somewhere, namely with a job search and no experience at all. I decided to assess the situation with positions open to such candidates.
The number of vacancies is encouraging for anyone hoping to find work quickly. I do not know how realistic it is to get the maximum salary, but the average figures are enough to get by on.
Name | Salary, average | Salary, minimum | Salary, maximum | Number |
---|---|---|---|---|
Counseling | 62601 | 1500 | 221850 | 2504 |
Construction, real estate | 55855 | 20 | 949989 | 6455 |
Top management | 50826 | 11310 | 400000 | 111 |
Extraction of raw materials | 38192 | 8000 | 100000 | 328 |
Security | 34617 | 3954 | 100000 | 5844 |
Medicine, Pharma | 34475 | 450 | 200000 | 11776 |
Transport, logistics | 33600 | 500 | 150000 | 8000 |
Science, education | 31426 | 1100 | 124510 | 1660 |
Sales | 30444 | 1 | 350000 | 52566 |
Installation and Service | 30360 | 8264 | 80000 | 381 |
Overall top
And now the most interesting part: who pays the most? Here I sorted all the vacancies found, without any filters.
Of course, it is top management. Who would have doubted it.
A curious fact: if you compare the average salary across all the tables, you can see that it does not differ that much.
Name | Salary, average | Salary, minimum | Salary, maximum | Number |
---|---|---|---|---|
Top management | 78789 | 150 | 2000000 | 2408 |
Extraction of raw materials | 61699 | 8000 | 180000 | 2302 |
Counseling | 59797 | 1500 | 500000 | 3762 |
Information technology, Internet, telecom | 52777 | 26 | 684804 | 25900 |
Construction, real estate | 48587 | 20 | 949989 | 33229 |
Production | 42007 | 1 | 261000 | 27269 |
Working staff | 41203 | 25 | 200000 | 43079 |
Car business | 38555 | 20 | 824254 | 9269 |
Installation and Service | 38412 | 25 | 180000 | 2390 |
Purchases | 37846 | 50 | 261000 | 2658 |
Cleaner
And here is the easiest path: why study for 5 years if you can just clean an office? Below is the result of filtering the top vacancies by the query "cleaning *".
What if you get a job at several offices and come in for a couple of hours of cleaning each evening? You could live quite comfortably that way. Let's call it a life hack.
Name | Salary, average | Salary, minimum | Salary, maximum | Number |
---|---|---|---|---|
Top management | 63000 | 40000 | 87000 | 8 |
Marketing, Advertising, PR | 50000 | 50000 | 50000 | 6 |
Extraction of raw materials | 45000 | 45000 | 45000 | 3 |
HR management, training | 33246 | 7908 | 87000 | 58 |
Accounting, management accounting, finance companies | 32000 | 30000 | 35000 | 10 |
Security | 31507 | 20000 | 70000 | 6 |
Sales | 29696 | 4737 | 55000 | 159 |
Construction, real estate | 29024 | 413 | 80000 | 73 |
Transport, logistics | 24987 | 10990 | 45000 | 26 |
Car business | 24465 | 7124 | 45000 | 61 |
Top by city
Finally, I decided to check the number of open positions by city. The first places are not surprising, but further down there are some curious and even unexpected entries.
Name | Number |
---|---|
Moscow | 31137 |
St. Petersburg | 11745 |
Minsk | 7608 |
Almaty | 4386 |
Kiev | 3398 |
Yekaterinburg | 3182 |
Novosibirsk | 3097 |
Kazan | 3066 |
Ufa | 2980 |
Nizhny Novgorod | 2876 |
Repository
All code from the article, with improvements and instructions, is available in the repository.