Analysis of the final tasks of Yandex.Algorithm 2014

    On August 1, the Yandex office, which recently opened in Berlin , hosted the finals of our programming championship. And his winner again became known to everyone who is interested in sports programming, Gennady Korotkevich .

    Tasks for the Algorithm were prepared by an international team. It included programmers from Russia, Belarus, Poland and the USA. These are the specialists of Moscow State University named after M.V. Lomonosov, Carnegie Mellon University, Yandex and Google employees. In Yandex, the tasks were the developers of the Minsk and Kiev office, and then tested them on their colleagues. One of the compilers last year was himself a finalist of the Algorithm. Especially for Habrahabr, we sorted out all the problems with the authors. By the way, although the competition is over, you can try yourself invirtual contest . Many finalists claimed to win. Among them were winners and prize winners of ACM ICPC and TopCoder Open, developers of Google and Facebook. In the final round, the winners of the 2013 Algorithm Evgeny Kapun and Shi Bisun, ACM ICPC champion Mikhail Kever , and one of the most titled sports programmers in the world Peter Mitrichev fought . This year, Makoto Soejimo , the compiler of tasks for Algorithm 2013 and the administrator of TopCoder Open, also decided to compete for the prize . The fight for first place flared up between Gennady Korotkevich and Hosaka Kazuhiro





    from the University of Tokyo. The best result - four tasks at 66 minutes of the penalty time - showed Korotkevich, confirming the title of champion. Kazuhiro solved the same number of problems, but scored more than a penalty time (90 minutes) and took second place. Third place went to Wang Qinshi from Tsinghua University: he solved four problems with a 125-minute penalty.

    Task A. 1024 Stack Edition


    Time limit: 1 second
    Memory limit: 512 mebibytes

    Recently, the girl Ksyusha bought the application "1024 Stack Edition" in the online store - a step-by-step arcade. As the name implies, the game takes place around a stack filled with powers of two. According to the rules of the game, at each turn the player can perform one of three actions on the stack:

    • Add a random number to the end of the stack.
    • Remove the last number from the stack.
    • If the last two numbers are equal, then replace them with their sum.

    If a player chooses to add a random number to the stack, then with a chance p , the number 1 will be added to the stack and with a chance of 1- p, the number 2 will be added. If the player chooses to remove the last number from the stack, he pays a fine of 1 coin. Please note that all other operations, except deleting the last item from the stack, do not result in a fine. In order to win, the player needs to bring the stack to such a state that there is only one element in it - 1024. Moreover, the less the penalty he pays, the better.

    Once Ksyusha left her phone unattended, and upon returning found that someone was playing the “1024 Stack Edition”. Ksyusha is not interested in knowing who it was and what his motives were. She is interested to know what is the mathematical expectation of a fine, which she personally will have to pay in the best game if she plays the game. Your task, as you might have guessed, is to calculate this number.

    Analysis of Problem A

    First, consider the case when N = 0 . Let f (i) be the expected minimum number of coins that would have to be spent to collect the number 2 i . We give formulas for calculating f (i) in the case if p <100. If, however, p = 100, then these same formulas are adapted in a trivial way.

    1. f (0) = 1.0 / p - 1;
    2. f (1) = min ( f (0) • 2; 1 / (1 - p ) - 1, 1 - p);
    3. f (i) = 2 • f ( i - 1), i> 1.

    In the case of N = 0, the answer to the problem is f (10).
    Now suppose that N > 0. Denote by g (i, j) the expected minimum number of coins that we will need to spend in order to get only one number 2 j from the last (top) i elements of the stack . Counting g (i, *) for a fixed i will occur in two stages (in S (i) denote the binary logarithm of i -th from the top stack element): 1. Calculate the g (i, S (i)) = 1 + min (g (i - 1; *)), i> 1 . This case corresponds to the fact that from everything above element i we get a certain number, after which we delete it.



    We also calculate g (i, S (i) + 1) = g (i - 1, S (i)) . This case corresponds to the fact that from all the elements located above i , we collect the number 2 S (i) , and then combine it with the current element, which is also equal to 2 S (i) . For all j other than S (i) and S (i) + 1 , put g (i, j) = + inf.

    2. At the first stage, we “figured out” the number S (i) . Before proceeding to g (i + 1, *) and the number S (i + 1) , we can perform some actions with i-м элементом стека, превратив его в любой другой элемент. Чтобы это учесть в динамике, пересчитаем g(i, *) следующим образом:

    g(i, 0) = min(g(i, 0), min(g(i, *)) + 1.0/p) — мы либо используем уже найденное значение g(i, 0), либо берем любое число, удаляем его и ставим туда 1 = 20.
    g(i,j) = min(g(i,j),min(g(i,*)) + 1 + f(j),g(i,j − 1) + f(j − 1)), j > 0. Здесь мы либо удаляем стоявшее на месте i число и затем за f(j) монет ставим туда число 2j, либо ставим за g(i, j − 1) шагов число 2j−1, собираем за f(j − 1) шагов еще 2j − 1 and combine them.

    The answer to the problem is g (N, 10) . It should be noted that g (*, 11) should also be considered in intermediate calculations. The complexity of the solution is O (NR) , where R is the maximum power of two that is present in the input (in this case, R ≤ 10 ).



    Problem B. Line task


    Time limit: 4 seconds
    Memory limit: 512 mebibytes

    Searches - what could be better? A field for entering a question, a “find” button - and the entire Internet is built into an ordered list of answers. Millions of search queries are saved every day in order to respond to them faster and better in the future, but this is not enough!

    It is important to help the user quickly enter the search query itself. For example, you can prompt popular or interesting to the user query options, ordering them according to preferences. But what if we don’t know anything about who is asking the question? You are tasked with developing an algorithm that will offer two clues in lexicographic order. Well, almost lexicographical.

    It is known that users can make typos when entering a request. We say that the line A is almost less than string Bed and , if a line A can be replaced by no more than one letter to an arbitrary small letters of the alphabet so that line A was the lexicographically less than string Bed and .

    You are given N hint words. Count the number of word pairs, the first of which is almost less than the second. The number of pairs must be calculated taking into account the order: if two words are mutually almost smaller, then consider both pairs.

    Parsing Problem B

    Consider a couple of lines s and w . String salmost less than the string w if and only if the minimum lexicographically string s min that can be obtained from s by replacing one letter is strictly less than w .

    However, as is easy to see, s min is obtained from the string s by replacing the first character not equal to 'a' with the character 'a'. If the string s consists only of the characters 'a', then s = s min . Consider the row s i and find C i - the number of rows s j (possibly i = j ) such that s min i <s j. Then we need to add C i - 1 to the answer if s min i ≠ s i , and C i otherwise.

    To find the values ​​of C i you can use two methods:
    • sort the rows s i and find C i with a binary search;
    • add all the lines to the trie , and find C i by going down it.


    Problem C. Cube


    Time limit: 2 seconds
    Memory limit: 512 mebibytes A

    cube is given whose faces are numbered with various integers from 1 to 6. Hexamine is also given, all cells of which are numbered with various integers from 1 to 6.

    Determine whether it is possible to make a cube that can be rotated so that the numbering of the faces coincides with the numbering of the faces of the given cube, if you can only bend hexamino along the borders of the cells, but not cut it.

    Parse task C

    The main thing to note in this task is that hexamino can be bent in different directions. In this case, two cubes are obtained, differing from each other only by the mutual permutation of one pair of opposite digits (since a 180 degree rotation permutes two pairs of opposite digits, then the permutation of all three pairs is equivalent to the permutation of one pair).

    Further, there are several solution methods. One of them consists of the following steps:

    1. Set the cube with the bottom side on the corresponding hexamino cell.
    2. For each of the four turns of the cube around its axis, we begin to roll it along hexamino until it is possible to roll the cube so that the bottom face hits the cell with the corresponding number.
    3. If for any of the turns all 6 cells are visited, then the answer is “Yes”.
    4. If not, then swap the numbers on the right and left sides and repeat the procedure from step 2.
    5. If in this case all 6 cells are not visited during any of the turns, the answer is “No”.

    It is also possible to calculate all 11 scans and then check the hexamino match (turning it to the standard position), and if it matches, match the pairs of numbers used to number the opposite sides of the cube with the same pairs in the scan.



    Task D. UFO


    Time limit: 1 second
    Memory limit: 256 mebibytes

    At night, a UFO fell into a lake located near the Vasiny House. A UFO was a frame of an N- dimensional rectangular parallelepiped assembled from titanium ribs, all the ribs having integer lengths. At the time of the crash, the joints between the ribs collapsed, in contrast to the ribs themselves, which were at the bottom and on the shore of the lake.

    In the morning, Vasya came to the shore of the lake and found K titanium rods. Assuming that these rods are the edges of the UFO skeleton and that perhaps some of the ribs are still under water, determine the smallest possible dimension of the space from which the UFO arrived.

    Solution to Problem D

    This is the easiest recruitment task. Obviously, the dimension of the parallelepiped is not less than the number of different lengths of edges. We also note that in a K- dimensional parallelepiped 2 K-1 edges of the same type, that is, if there are X identical edges, they will use [(X - 1) / 2 k − 1 ] + 1 measurement.

    Therefore, in ascending order, we sort through all the dimensions until the total number of measurements used calculated by the above formula is less than or equal to the current dimension. Since at K = 21 the edges of one type will be 2 20 > 10 6 , the search will be small.

    Task E. Incomplete matching


    Time limit: 1 second
    Memory limit: 512 mebibytes

    Artyomka is very fond of matching. A matching in an undirected graph is a set of pairwise non-adjacent edges. A matching is called incomplete if it is impossible to add one more edge to the matching without deleting any of the edges already included.

    An undirected graph is called a bipartite if its vertices can be divided into two sets so that any edge of the graph connects vertices from different sets.

    Artyomka is given a bipartite graph. Its task is to find the number of incomplete matching in it modulo 1 000 000 007 (10 9 + 7). Your task is to help him.

    Solution to Problem E

    To solve the problem, we note that the number of vertices in the left lobe is small.

    We will consider the following dynamics, adding one vertex from the right lobe: f (i, j) - the number of ways to arrange the edges when we examined the first i vertices of the right lobe, and the vertices of the left lobe are defined by the ternary mask j as follows: 0 - vertex could not be connected by an edge to any vertex from the right lobe, 1 - the vertex could be connected by an edge to any vertex from the right lobe, but we did not put this edge, 2 - the vertex is connected by an edge to one of the vertices of the right lobe.

    Thus, considering the next vertex from the right lobe, we can recalculate the mask for the vertices of the left lobe. Obviously, when calculating the answer, you need to consider masks without 1.

    Thus, the answer is the sum f (n 2 , ValidMask) , for all ValidMask ternary masks that do not contain units in the ternary record.



    Task F. The Music World


    Time limit: 1 second
    Memory limit: 512 mebibytes

    In the updated Yandex.Music, musical recommendations are even better than before! They take into account each track that the user listens to from the very beginning of using the service, and depending on the ones they have already listened to, increase the pool of recommendations. The player includes the user a random one of the recommended tracks, and so the whole day.

    But Yandex servers are so fast that they can make recommendations not only for the track that the user is currently listening to, but for all options for the following tracks that are randomly selected.

    The user selects the first track, and then the algorithm automatically builds recommendations for N songs ahead. From each song i-th generation can get no more than Ki (1 ≤ Ki ≤ 5) new songs in the ( i + 1) -th generation, and for the i- th generation the probability pi , j of the fact that based on the song of the i-th generation will be predicted j new songs for every j from 0 to Ki .

    Since the experimental study was conducted on the indie genre, all the recommended tracks are different. In order to measure the variety of collecting statistics on the similarity of songs, similarity is measured for each pair of tracks. To help Yandex developers estimate the memory costs for this measurement, calculate the mathematical expectation of the number of pairs of tracks in the recommendation pool in the ( N + 1) -th generation.

    Solution of Problem F

    Let Y i denote a random variable equal to the number of songs in the i- th generation. The proposed solution to the problem is based on the concept of a generating function. If you are not familiar with it, you can find it, for example, in the Wikipedia article ru.wikipedia.org/wiki/Property_function_of_sequences .

    Consider the generating functions of the random variables specified in the input , as
    well as the generating functions of the random variables .

    We have an explicit form of all ψ i , as well as an explicit form of φ 1 (x) = x . We learn to express φ i +1 in terms of ψ i and φ i.

    If we knew for sure that Y i = k , then we could say that φ i + 1 (x) = ψ i (x) k . But since Y i is a random variable, the desired generating function is the sum of the generating functions corresponding to different values, with weights equal to the probabilities of these values, that is .

    Since the maximum value of the desired quantity is very large, it is not possible to calculate the explicit form of the generating function for Y n + 1 . However, we do not need him. It is easy to see that the value to be calculated


    We learn to recalculate the value, the first and second derivatives of the function φ i + 1 (x) in (1) through the previous functions. I recall that φ i (1) = ψ i (1) = 1. As we know, φ i + 1 (x) = φ ii (x)) . Then

    With the substitution x = 1, these formulas simplify and take the form


    It is also easy to understand that calculations can be carried out immediately modulo 10 9 + 7, replacing the original probabilities with the corresponding modulo values.

    Also popular now: