Normalizing education in resumes on hh.ru



    Education is one of the most important and at the same time most undervalued resume fields. Employers pay attention to it primarily when a young specialist is looking for work, and it is often education that tips the decision in favor of one candidate. Sometimes companies look for a specialist with a very specific education, right down to the faculty of a particular university.

    Applicants, for their part, are not very willing to fill in education in a resume. An abbreviation in the education field is still one of the better cases; often you come across just “technical” or “named after Lenin”.

    Until recently, “education” on hh.ru was a free text field. This made it impossible to search for candidates fully by this criterion, made education information in a resume hard to scan visually, and prevented us from building statistics useful to the market. So it was time to help users by creating a directory of universities and normalizing this field.

    This article describes how we solved this problem for 11 million resumes and how users reacted.


    The overall goal was twofold: first, that new users select a university from our directory when creating a resume, and second, that existing users update their resumes in the same way.
    The database of educational institutions was kindly provided by colleagues from Odnoklassniki. While building our directory we reworked it substantially, but the foundation was already laid, which greatly sped up our work at the start.

    Step 1. Suggestions while filling out the form


    First of all, we added drop-down suggestions to the resume creation form, with the correct, full names of universities from our directory. After a month and a half, we saw that only 45% of new users chose the university we suggested, while the rest preferred to keep their own version, even when it matched the suggestion exactly! As a result, we had 200 thousand resumes with normalized education, but this figure had to grow by at least an order of magnitude.



    Step 2. Mapping


    New resumes are good, but for the project to make sense and bring benefit right away, we had to normalize the existing base, which at that time was about 10 million resumes. So we decided to map (match) the “education” that users had already entered in free form to the new directory of universities. Keep in mind that users describe education in a resume, to put it mildly, very approximately (just the word “higher” is also a very common option).

    For mapping, we used a classic measure of similarity between two texts: cosine similarity. Each text is treated as a vector in the space of terms (the words it consists of). The more often a word occurs in the text, the larger the vector's coordinate along the corresponding axis. The similarity of two texts is simply the cosine of the angle between their vectors in the term space.
    Applying this algorithm head-on gave unimpressive results, so we had to make some adjustments.

    1. The coordinates of the vector corresponding to a text take only the values {0, 1}: several identical words in the name of an educational institution are rare.
    2. The term space had to be made anisotropic: coordinates along different axes contribute differently to the norm of this space.
    Frequently used words (for example, “state”, “technical”) may be omitted or present in the user's spelling of the institution, so they should have less impact on the similarity of the texts. Conversely, words such as “(named after) Kuibyshev” are more important and make a match more likely. Thus, when computing similarity, the words that make up the texts are divided into several groups that differ in how much they matter for finding a match.
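
    The two adjustments above can be sketched roughly as follows. This is a minimal illustration, not the production code: the word list, the weight values, the threshold, and the function names are all assumptions made for the example.

```python
import math
import re

# Hypothetical word group and weights: common words matter less when matching.
LOW_WEIGHT_WORDS = {"state", "technical", "university", "institute"}
LOW_WEIGHT = 0.2
DEFAULT_WEIGHT = 1.0


def tokenize(text):
    """Return the set of distinct lower-cased words, i.e. {0, 1} coordinates."""
    return set(re.findall(r"\w+", text.lower()))


def weight(term):
    """Per-axis weight that makes the term space anisotropic."""
    return LOW_WEIGHT if term in LOW_WEIGHT_WORDS else DEFAULT_WEIGHT


def weighted_cosine(a, b):
    """Cosine between two binary term vectors under a weighted norm."""
    terms_a, terms_b = tokenize(a), tokenize(b)
    if not terms_a or not terms_b:
        return 0.0
    dot = sum(weight(t) for t in terms_a & terms_b)
    norm_a = math.sqrt(sum(weight(t) for t in terms_a))
    norm_b = math.sqrt(sum(weight(t) for t in terms_b))
    return dot / (norm_a * norm_b)


def best_match(user_text, directory, threshold=0.6):
    """Return the most similar directory name, or None if nothing is close enough."""
    if not directory:
        return None
    score, name = max((weighted_cosine(user_text, n), n) for n in directory)
    return name if score >= threshold else None
```

    With these illustrative weights, “Kuibyshev University” scores about 0.93 against “Kuibyshev State University”, compared with about 0.82 for the unweighted binary cosine, because the missing word “state” carries little weight. An entry such as “higher” shares no terms with any directory name and stays unmapped.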

    Universities: legacy names
    Renamed universities are another problem we had to solve. For example, what was once called a “Pedagogical Institute” is now called a “Pedagogical University”. Therefore, mapping takes an institution's former names into account. In the 90s many cities also changed their names, so “Kalinin Pedagogical Institute” should map to “Tver Pedagogical University”. This matters all the more because employers today mostly know only the modern name of an institution.
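
    One way to handle renames before matching is to generate variants of the user's text with former names replaced by current ones. The sketch below is an assumption about how this could look; the rename tables and the function name are hypothetical, and only the Kalinin/Tver and institute/university pairs come from the examples above.

```python
# Hypothetical rename tables; the real directory data is not shown in the article.
CITY_RENAMES = {"kalinin": "tver"}
TYPE_RENAMES = {"institute": "university"}


def name_variants(text):
    """Return the original spelling plus variants with former names replaced."""
    words = tuple(text.lower().split())
    variants = {words}
    for table in (CITY_RENAMES, TYPE_RENAMES):
        # Apply each table to every variant collected so far, keeping the originals.
        variants |= {tuple(table.get(w, w) for w in v) for v in variants}
    return {" ".join(v) for v in variants}
```

    Each variant can then be compared with the directory, so that “Kalinin Pedagogical Institute” also gets a chance to match as “tver pedagogical university”. Keeping the original spelling among the variants avoids breaking names where the old word is still correct.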

    Matching abbreviations
    Matching abbreviations was a separate task. First, different educational institutions had the same abbreviation at different times: for example, Samara State University is the former KSU (Kuibyshev), while Kursk State University is the current KSU.

    Second, educational institutions in different countries often share an abbreviation as well: for example, BSU is both Bryansk State University named after I.G. Petrovsky and Belarusian State University. To resolve such conflicts, we had to take into account the cities where the institutions are located, their population, and the resume owners' countries of residence. The numerous heuristics we used also helped a great deal with mapping.
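
    A conflict like the BSU one could be resolved along these lines. This is only a sketch under assumptions: the priority order (owner's city, then country, then city population), the data structure, and the population numbers are illustrative placeholders, not the actual heuristics or real figures.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Institution:
    name: str
    city: str
    country: str
    city_population: int  # illustrative placeholder, not a real figure


# Hypothetical lookup table from abbreviation to candidate institutions.
CANDIDATES = {
    "BSU": [
        Institution("Bryansk State University named after I.G. Petrovsky",
                    "Bryansk", "Russia", 400_000),
        Institution("Belarusian State University", "Minsk", "Belarus", 2_000_000),
    ],
}


def resolve_abbreviation(abbr, owner_country=None, owner_city=None):
    """Pick the most plausible institution for an ambiguous abbreviation."""
    candidates = CANDIDATES.get(abbr.upper(), [])
    if not candidates:
        return None

    def score(inst):
        # Tuples compare left to right: city match, then country, then city size.
        return (inst.city == owner_city,
                inst.country == owner_country,
                inst.city_population)

    return max(candidates, key=score)
```

    For a resume owner living in Russia this picks the Bryansk university, and for one living in Belarus the Minsk one. When nothing is known about the owner, the sketch falls back to the institution in the larger city.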

    The result of mapping
    As a result, we were able to map slightly more than half of all higher education entries in our resumes: 6,989,453 of 12,510,682. After testing and review, we decided it was time to open the results to users and study their reaction.

    Step 3. Check the university in the resume


    We cannot silently change the name of the institution for the user. Few people would like the system changing their resume on its own, and the directory still contained inaccuracies. So we added a notification, “specify the name of the educational institution in your resume”, to the page with the user's responses to vacancies. The result: fewer than 10% of the users who saw it clicked the link, so this approach did not achieve the goal. The users were probably sure that their “education” was fine and there was nothing to check.



    However, thanks to this notification we saw, first, typical errors, and second, a strange pattern: even when we had mapped everything correctly, users still restored their own version, perhaps simply because it was more familiar to them. This was worth keeping in mind for the future.

    Overall, during the two weeks the notification was shown, we received another 150 thousand resumes with correct education. In total, over the 2.5 months the directory of universities had existed, we had 450 thousand mapped resumes, or about 5% of the entire base. Once again the result was unimpressive, so we kept drawing conclusions and thinking over the next steps.

    Step 4. How to reach passive users


    With suggestions and notifications, we had only covered active users who come to the site. To reach applicants who are not looking for work right now, we decided to send an email to part of the database of registered applicants. In the letter, we wrote that we had made some changes to the education in their resume, and that these changes need to be confirmed but can also be rejected.


    The logic of the letter was as follows:
    • if the user does not react to the letter, the education in the resume remains untouched;
    • if the user confirms that we changed the name correctly, the education in the resume is updated to the current version from our directory;
    • if the user rejects the option we proposed, they are taken to the resume editor, where they can restore the original version.
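
    The three outcomes above can be written down as a small decision function. This is a sketch of the described logic only; the names are hypothetical, and it assumes that a rejection leaves the stored value unchanged until the user edits the resume.

```python
from enum import Enum


class Reaction(Enum):
    NO_RESPONSE = "no_response"
    CONFIRMED = "confirmed"
    REJECTED = "rejected"


def apply_reaction(original, proposed, reaction):
    """Return (education value to store, whether to open the resume editor)."""
    if reaction is Reaction.CONFIRMED:
        # The user agreed: store the name from the directory.
        return proposed, False
    if reaction is Reaction.REJECTED:
        # The user disagreed: keep their text and send them to the editor.
        return original, True
    # No reaction to the letter: the resume stays untouched.
    return original, False
```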




    We exported all the cases where users rejected our version and used them to check the directory once again, making the necessary corrections.

    It should be noted that the wording about us changing the resume was not very successful, so to another part of the database we sent letters that described the new directory of universities and suggested that users update the name of the university themselves.



    A week after the mailing, we had 1,000,052 resumes in our database with education filled in from the directory: a substantial part, but not all. So we continued the mailings with a proposal to update the university, explaining why this is necessary and what it gives applicants. In support of university normalization, we also launched the “Battle of universities” project to encourage users to update their resumes, thereby supporting their university in an impromptu battle. This project, of course, does not claim to be an objective rating of universities, but it has made (and continues to make) a certain contribution to the normalization of education.



    Just a few days ago, we added English versions of university names (for resumes in English). They are not yet available for every university, but we will increase their number.

    As a result, today 23% of the resumes in our database have normalized education, which is about 3.3 million. We plan to reach 30% by the end of the year.

    If you have not yet updated the education in your resume, now is the time to do it.

    If your university is still not in the directory, write to us about it and we will add it.

    Step 5. Search by university: the first release


    Because almost a quarter of all resumes now have normalized education, and this share is constantly growing, we have released the first stage of search by university. A recruiter can now find the graduates of a particular educational institution simply by clicking on it in any resume, and search filters quickly narrow the selection to the desired city, profession, experience, language skills, desired type of employment, and so on. It is now easier for employers who know exactly what they want, or who are just picky (as you prefer), to find the right candidates. But this is only the beginning.



    Normalization of education is only part of the normalization project, which also includes the normalization of positions, skills, employers and professional areas.

    If you have ideas or questions about this project, they are always welcome in the comments.
