HighLoad Cup #2: the championship for backend developers is back
Are you ready for new loads? We invite all amateurs and professionals to HighLoad Cup #2, the championship in designing and administering high-load services!
The competition started last year. Back then we realized that HighLoad Cup was exactly the championship that a number of Mail.Ru Group projects had been missing. The first, pilot competition drew 449 participants. There was a lot of code and a lot of sweat from both the organizers and the participants (8,789 different solutions). There were some rough edges in the technical implementation, but the main thing is that everyone enjoyed it! The organizers spent many nights in the data center and a few weekends in the office. We are ready to do it again! At the end of the article you will find useful materials from us and from the participants that will help you understand the mechanics and find some best-practice solutions.
This time we have tried to prepare a slightly more difficult challenge for you. We have also widened the audience: English-speaking users can now take part in the competition. Join the Russian-speaking community in Telegram, where you will pick up plenty of insights about the competition :)
So, welcome aboard!
Conceptually, nothing has changed in the competition compared to last year.
Participants are given the task of creating a small web service that works with data of a certain structure and implements an API for that data. The Docker container with the implemented service is uploaded to our servers, where we launch it and start firing HTTP requests at it.
Solutions are submitted with a locally installed Docker client to a dedicated repository (each participant has their own). The submitted service is then checked automatically by the CodeHub-CodeRunner system, developed by employees of the Mail.Ru Group Technopark Laboratory.
We then run the container on a test machine with an Intel Core i7 processor. The solution is allocated 4 cores at 2.4 GHz, 2 GB of RAM and 10 GB of disk space. In short, we launch the “tank” with the phantom engine, which fires requests in several streams with a linearly growing load profile. Before the shelling begins, the solution has a few minutes (the exact amount depends on the task) to process the data from the JSON file it receives. Handling this data correctly is a necessary condition for victory. There are only two rounds of shelling: a short one and a long one.
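A “linearly growing load profile” means the request rate the tank aims for rises in a straight line over the run. The helper below only illustrates that idea: `target_rps` and its start rate, end rate and duration are hypothetical and are not the championship's actual tank settings.

```python
def target_rps(t: float, start_rps: float, end_rps: float, duration: float) -> float:
    """Requests per second the load generator aims for at time t (seconds).

    Hypothetical helper: the rate grows linearly from start_rps at t = 0
    to end_rps at t = duration and is held constant outside that interval.
    """
    if duration <= 0:
        raise ValueError("duration must be positive")
    fraction = min(max(t / duration, 0.0), 1.0)  # clamp progress to [0, 1]
    return start_rps + (end_rps - start_rps) * fraction


# Example: a made-up ramp from 1 to 10,000 RPS over 600 seconds.
for t in (0, 150, 300, 600):
    print(t, target_rps(t, 1, 10_000, 600))
```

The practical consequence for participants is that a service which copes at the start of the run may still fall over near the end, when the rate peaks.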
Based on the results of these attacks, we count the correct and incorrect answers, measure RPS and response time, and build a rating table based on a specific metric. The author of the fastest and most fault-tolerant service wins.
The shelling produces logs and metrics, which are then shown to participants as graphs on the solution page. The following are tracked separately:
- basic metrics;
- answer correctness;
- response time;
- the number of responses per second.
The rating of a solution is calculated as follows: we take the total time of all correct answers the API managed to give during the shelling and add a penalty for every wrong answer and for every request that got no answer (the penalty is always equal to the full request timeout). The participant with the lowest total time ranks higher on the leaderboard and has a chance to win the championship.
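The scoring rule above can be written down as a small function. This is a minimal sketch, assuming a hypothetical `Shot` record per request (whether the answer was correct and how long it took, or `None` if no answer arrived); it is not the organizers' actual scoring code.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Shot:
    """One request fired during shelling (hypothetical record type)."""
    correct: bool             # did the response match the expected answer?
    elapsed: Optional[float]  # response time in seconds; None if no answer arrived


def solution_score(shots: List[Shot], timeout: float) -> float:
    """Total time: actual time for correct answers, the full timeout otherwise."""
    total = 0.0
    for shot in shots:
        if shot.correct and shot.elapsed is not None:
            total += shot.elapsed
        else:
            total += timeout  # wrong answer or no answer: penalty equals the timeout
    return total


def leaderboard(totals: Dict[str, float]) -> List[str]:
    """Participants ordered from best (lowest total time) to worst."""
    return sorted(totals, key=totals.get)


shots = [Shot(True, 0.5), Shot(True, 0.25), Shot(False, 0.1), Shot(False, None)]
print(solution_score(shots, timeout=2.0))
```

Because a wrong answer costs the same as a timeout, a fast but incorrect response is never cheaper than a slow correct one, so correctness comes before raw speed.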
Our team spent a long time deciding what task to give this year. We wanted something that would level the playing field for most participants (so that it is not only hand-rolled C/C++ solutions that win).
The wording is as follows:
In an alternate reality, humanity has decided to create and launch a global system for finding one's “second half”. Its aim is to reduce the number of single people in the world and help build strong families.
Both the test data and the “combat” data for the various attacks contain records of a single entity: Account. It describes everything known about a user: name, contacts, interests, and likes for other users. The data is guaranteed to conform to the types and restrictions below. All of it was generated and invented by us according to certain rules.
Each Account (profile) contains the following personal data:
- id - a unique external user id. It is assigned by the testing system and then used to verify server responses. Type - 32-bit integer.
- email - the user's email address. Type - unicode string of up to 100 characters. Guaranteed to be unique.
- fname and sname - the first and last names, respectively. Type - unicode strings of up to 50 characters. Both fields are optional and may be absent from a given record.
- phone - mobile phone number. Type - unicode string of up to 16 characters. The field is optional, but the values that are present are guaranteed to be unique. Filled in fairly rarely.
- sex - a unicode string: “m” means male, “f” means female.
- birth - date of birth, recorded as the number of seconds since the beginning of the UNIX epoch in UTC (in other words, a timestamp). Bounded below by 01/01/1950 and above by 01/01/2005.
- country - country of residence. Type - unicode string of up to 50 characters. The field is optional.
- city - city of residence. Type - unicode string of up to 50 characters. The field is optional and rarely filled in. Each city belongs to a particular country.
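The constraints on the personal fields can be summarized as a validation function. The data is guaranteed to satisfy them, so this is mainly a compact reference; the function name `personal_field_errors`, the signed reading of “32-bit integer” and the inclusive date bounds are assumptions. Uniqueness of email and phone spans all accounts and is not checked here.

```python
from datetime import datetime, timezone
from typing import Dict, List


def utc_ts(year: int, month: int = 1, day: int = 1) -> int:
    """UNIX timestamp (seconds, UTC) of midnight on the given date."""
    return int(datetime(year, month, day, tzinfo=timezone.utc).timestamp())


BIRTH_MIN, BIRTH_MAX = utc_ts(1950), utc_ts(2005)  # bounds assumed inclusive
OPTIONAL_STR_LIMITS = {"fname": 50, "sname": 50, "phone": 16, "country": 50, "city": 50}


def personal_field_errors(acc: Dict) -> List[str]:
    """Names of personal fields in one account that violate the stated limits."""
    errors = []
    acc_id = acc.get("id")
    if not isinstance(acc_id, int) or not -2**31 <= acc_id < 2**31:  # signed 32-bit (assumed)
        errors.append("id")
    email = acc.get("email")
    if not isinstance(email, str) or len(email) > 100:
        errors.append("email")
    for name, limit in OPTIONAL_STR_LIMITS.items():
        value = acc.get(name)  # optional: absence is fine, a present value must fit
        if value is not None and (not isinstance(value, str) or len(value) > limit):
            errors.append(name)
    if acc.get("sex") not in ("m", "f"):
        errors.append("sex")
    birth = acc.get("birth")
    if not isinstance(birth, int) or not BIRTH_MIN <= birth <= BIRTH_MAX:
        errors.append("birth")
    return errors


print(personal_field_errors({"id": 1, "email": "a@example.com", "sex": "x", "birth": utc_ts(2010)}))
```

Note that birth dates before 1970 are negative timestamps, so any storage type for birth has to be signed.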
Each Account also has fields specific to the “second half” search system:
- joined - date of registration in the system. Type - timestamp, bounded below by 01/01/2011 and above by 01/01/2018.
- status - the user's current status in the system. Type - one string out of the following: "free", "busy", "it's complicated". Do not mind the odd wording :)
- interests - the user's everyday interests. Type - array of unicode strings, possibly empty. Each string is at most 100 characters long.
- premium - the start and end of the premium period in the system (when a user really wanted to find a “soul mate” and paid for the service). In JSON this field is a nested object with the fields start and finish, holding timestamps with a lower bound of 01/01/2018.
- likes - an array of the user's known likes, possibly empty. The likes are in no particular order, and each one is an object with the following fields:
- id - the identifier of the account the user likes. That account can always be found in the source data by its id. Note that the data may contain several likes with the same id.
- ts - the timestamp at which the like was recorded in the system.
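One way to hold an Account in memory is a set of dataclasses mirroring the JSON fields above. This is a sketch under assumptions: the class names and `from_json` are hypothetical, each JSON record is assumed to contain only the fields listed, and premium is treated as optional.

```python
import json
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class Like:
    id: int  # id of the liked account; the same id may occur more than once
    ts: int  # timestamp when the like was recorded


@dataclass
class Premium:
    start: int   # timestamps, both no earlier than 01/01/2018
    finish: int


@dataclass
class Account:
    id: int
    email: str
    sex: str
    birth: int
    joined: int
    status: str
    fname: Optional[str] = None
    sname: Optional[str] = None
    phone: Optional[str] = None
    country: Optional[str] = None
    city: Optional[str] = None
    interests: List[str] = field(default_factory=list)
    premium: Optional[Premium] = None  # treated as optional (assumption)
    likes: List[Like] = field(default_factory=list)

    @classmethod
    def from_json(cls, raw: Dict[str, Any]) -> "Account":
        """Build an Account from one decoded JSON object (hypothetical helper)."""
        data = dict(raw)  # copy so the caller's dict is left untouched
        if data.get("premium") is not None:
            data["premium"] = Premium(**data["premium"])
        data["likes"] = [Like(**like) for like in data.get("likes", [])]
        return cls(**data)


raw = json.loads("""{"id": 1, "email": "a@example.com", "sex": "m", "birth": 631152000,
  "joined": 1300000000, "status": "free", "interests": ["chess"],
  "premium": {"start": 1520000000, "finish": 1530000000},
  "likes": [{"id": 2, "ts": 1400000000}, {"id": 2, "ts": 1450000000}]}""")
account = Account.from_json(raw)
print(account.premium, len(account.likes))
```

Keeping duplicate likes as separate `Like` entries preserves every timestamp; code that only needs the set of liked ids can deduplicate later.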
You need to implement the following API:
- Getting a list of users: /accounts/filter/
This API method is meant for searching users by known or desired field values. For example, someone might want to see all people of a certain age and sex living in a particular city.
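A naive filter for the example above (people of a given sex, living in a given city, born in a given year as a proxy for age) might look like this. It is a hypothetical full scan over account dicts; the parameter names are illustrative rather than the official query parameters, and this announcement does not specify the order of results.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, Iterable, List, Optional

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def year_of(ts: int) -> int:
    """Calendar year (UTC) of a timestamp; timedelta also handles negative values."""
    return (EPOCH + timedelta(seconds=ts)).year


def filter_accounts(accounts: Iterable[Dict], sex: Optional[str] = None,
                    city: Optional[str] = None,
                    birth_year: Optional[int] = None) -> List[int]:
    """Ids of accounts matching every criterion that is not None (hypothetical)."""
    result = []
    for acc in accounts:
        if sex is not None and acc.get("sex") != sex:
            continue
        if city is not None and acc.get("city") != city:  # missing city never matches
            continue
        if birth_year is not None and year_of(acc["birth"]) != birth_year:
            continue
        result.append(acc["id"])
    return result


people = [
    {"id": 1, "sex": "f", "city": "Moscow", "birth": 631152000},
    {"id": 2, "sex": "m", "city": "Moscow", "birth": 631152000},
    {"id": 3, "sex": "f", "birth": 0},
]
print(filter_accounts(people, sex="f", city="Moscow", birth_year=1990))
```

A full scan costs O(n) per request; under shelling, a competitive solution would more likely precompute per-field indexes and intersect them.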
- Grouping users: /accounts/group/
This API method is meant for building reports on how the system works. The fields to group by are passed, comma-separated, in the keys GET parameter. There are fewer of them than in the user filtering request: only five fields can be used for grouping - sex, status, interests, country, city.
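A straightforward way to compute the counts behind a group report is to build one key tuple per account from the fields named in keys and count them. This is a sketch under assumptions: how the official API treats interests (an array) and missing optional fields is not described here, so below an account contributes once per interest, and a missing field is grouped under `None`.

```python
from collections import Counter
from itertools import product
from typing import Dict, Iterable

GROUP_FIELDS = {"sex", "status", "interests", "country", "city"}


def group_counts(accounts: Iterable[Dict], keys_param: str) -> Counter:
    """Count accounts per combination of the comma-separated grouping fields."""
    keys = keys_param.split(",")
    unknown = set(keys) - GROUP_FIELDS
    if unknown:
        raise ValueError(f"cannot group by: {sorted(unknown)}")
    counts: Counter = Counter()
    for acc in accounts:
        # One list of values per key: interests may give several values (or none),
        # every other field gives exactly one value (None if the field is missing).
        per_key = [acc.get("interests", []) if key == "interests" else [acc.get(key)]
                   for key in keys]
        for combo in product(*per_key):
            counts[combo] += 1
    return counts


people = [
    {"sex": "m", "city": "A", "interests": ["x", "y"]},
    {"sex": "f", "interests": ["x"]},
    {"sex": "m", "city": "A"},
]
print(group_counts(people, "sex,interests"))
```

With this convention, an account with an empty interests array drops out of any report grouped by interests, because `product` over an empty list yields nothing.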
- Compatibility recommendations: /accounts/<id>/recommend/
This request searches for the “second half” of a given user. The request passes the user's id, and the service looks for the accounts that suit that user best by status, age and interests. The solution should check compatibility only with the opposite sex (we have nothing against sexual minorities and condemn discrimination, it just turned out this way :)). If a country or city is passed in the country or city GET parameter, respectively, search only among people living in that location.
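A minimal sketch of the recommendation search, assuming accounts are stored as dicts keyed by id. The filtering steps (opposite sex, optional country and city) follow the description above; the ranking key (status preference, then number of shared interests, then age difference, then id), the `STATUS_RANK` order, the English status strings and the `limit` default are all hypothetical. The official rules define the actual ordering.

```python
from typing import Dict, List, Optional

# Hypothetical status preference (lower is better); not taken from the official rules.
STATUS_RANK = {"free": 0, "it's complicated": 1, "busy": 2}


def recommend(accounts: Dict[int, Dict], user_id: int, limit: int = 20,
              country: Optional[str] = None, city: Optional[str] = None) -> List[int]:
    """Ids of up to `limit` opposite-sex candidates, best match first."""
    me = accounts[user_id]
    my_interests = set(me.get("interests", []))
    ranked = []
    for acc in accounts.values():
        if acc["sex"] == me["sex"]:
            continue  # only the opposite sex is considered (this also skips the user)
        if country is not None and acc.get("country") != country:
            continue
        if city is not None and acc.get("city") != city:
            continue
        shared = len(my_interests & set(acc.get("interests", [])))
        key = (
            STATUS_RANK.get(acc["status"], len(STATUS_RANK)),  # preferred status first
            -shared,                                           # then more shared interests
            abs(acc["birth"] - me["birth"]),                   # then closer age
            acc["id"],                                         # deterministic tie-break
        )
        ranked.append((key, acc["id"]))
    ranked.sort()
    return [acc_id for _, acc_id in ranked[:limit]]


people = {
    1: {"id": 1, "sex": "m", "status": "free", "birth": 0, "interests": ["a", "b"]},
    2: {"id": 2, "sex": "f", "status": "busy", "birth": 0, "interests": ["a", "b"]},
    3: {"id": 3, "sex": "f", "status": "free", "birth": 1000, "interests": ["a"]},
    4: {"id": 4, "sex": "f", "status": "free", "birth": 10, "interests": ["a"]},
    5: {"id": 5, "sex": "m", "status": "free", "birth": 0, "interests": ["a", "b"]},
}
print(recommend(people, 1))
```

Encoding the ranking as a tuple key keeps the priority order explicit and easy to change once the exact rules are known.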
- Suggestions based on similar likes: /accounts/<id>/suggest/
This type of request is similar to the previous one in that it is also about finding the “second half”. The request likewise passes the id of the user we are looking for a match for, and the limit GET parameter is used. The difference lies in the logic: we look for people of the same sex with similar likes and suggest those whom they themselves recently liked. If a country or city GET parameter is passed in the request, look for “similar likes” only in that location.
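The suggest logic can be sketched in two steps: find same-sex users whose likes overlap with the user's, then offer the accounts those users liked most recently. The similarity measure (number of commonly liked accounts), the tie-breaking by id, the `limit` default and the exclusion of accounts the user already likes are assumptions made for illustration; the official rules specify the exact formula.

```python
from typing import Dict, List, Optional, Set


def suggest(accounts: Dict[int, Dict], user_id: int, limit: int = 20,
            country: Optional[str] = None, city: Optional[str] = None) -> List[int]:
    """Ids of up to `limit` accounts recently liked by same-sex users with similar likes."""
    if limit <= 0:
        return []
    me = accounts[user_id]
    my_likes = {like["id"] for like in me.get("likes", [])}

    # Step 1: same-sex users (optionally in the given location) sharing at least one like.
    similar = []
    for acc in accounts.values():
        if acc["id"] == user_id or acc["sex"] != me["sex"]:
            continue
        if country is not None and acc.get("country") != country:
            continue
        if city is not None and acc.get("city") != city:
            continue
        common = len(my_likes & {like["id"] for like in acc.get("likes", [])})
        if common:
            similar.append((common, acc))
    # Hypothetical similarity: more common likes first, ties broken by smaller id.
    similar.sort(key=lambda pair: (-pair[0], pair[1]["id"]))

    # Step 2: walk each similar user's likes from newest to oldest.
    result: List[int] = []
    seen: Set[int] = set(my_likes) | {user_id}  # never suggest the user or their own likes
    for _, acc in similar:
        for like in sorted(acc.get("likes", []), key=lambda item: item["ts"], reverse=True):
            if like["id"] not in seen:
                seen.add(like["id"])
                result.append(like["id"])
                if len(result) == limit:
                    return result
    return result


people = {
    1: {"id": 1, "sex": "m", "likes": [{"id": 10, "ts": 1}]},
    2: {"id": 2, "sex": "m", "likes": [{"id": 10, "ts": 5}, {"id": 11, "ts": 7}, {"id": 12, "ts": 3}]},
    3: {"id": 3, "sex": "f", "likes": [{"id": 10, "ts": 9}, {"id": 13, "ts": 9}]},
    4: {"id": 4, "sex": "m", "likes": [{"id": 20, "ts": 9}]},
}
print(suggest(people, 1))
```

The `seen` set also absorbs duplicate likes with the same id, which the data description explicitly allows.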
We cannot cover everything in one article. The detailed rules will be published on launch day (today) on the Championship website and in the GitHub repository, but you already know what awaits you.
Yes, we know the holidays are coming (happy holidays!), so the championship will run for a long time :)
- Beta testing (results are not counted): starts on December 13 at 19:00, ends on December 21 at 19:00.
- Qualifying round: from December 21, 19:00 to January 31, 19:00.
- Final round: until February 5th.
During beta testing, the terms and conditions of the task may change (due to bugs or for other reasons).
Qualifying round: the rules do not change.
The final round runs fully automatically: each finalist (the N participants who passed the qualifying round, N being at least 50) chooses one solution, which is shelled in several waves. The final result is the best result across all the waves.
First place - a brand new MacBook Air.
Second and third place - Apple iPad.
Fourth, fifth and sixth places - Samsung Gear S3.
A prize winner may request a different gift of equivalent value. All participants who reach the final will receive branded championship T-shirts.
Once you join our Telegram chat, you are unlikely to leave. We are waiting for you, good luck!
This article does not cover the system upgrades. We have done a lot of work to eliminate infrastructure bugs, reviewed all the issues participants raised on GitHub, already implemented some of the improvements and put the rest on the TODO list for next year. I want to express my deep gratitude to Maxim @xammi- Kislenko, Ilya @liofz Lebedev, Yevgeny @gunicorn Ivanov, Irina @aithelle Lukyanova, Vasily @vasidmi Dmitriev and the whole team that helped make the competition happen, as well as the entire championship community. Thank you!
Useful reading on the results of HighLoad Cup 2017