The AlphaGo Zero AI honed its go skills without human intervention

    DeepMind, a subsidiary of Alphabet, continues to work on improving artificial intelligence. It was DeepMind's specialists who created the world champion in the game of go, the AlphaGo platform. AlphaGo managed to beat several of the world's top players, after which it became clear that a human would never again defeat the machine.

    DeepMind recently announced an even more powerful computer go system that plays better than every previous version of AlphaGo. The newcomer is called AlphaGo Zero. This platform learned to play go entirely on its own, without training on games played by humans.

    AlphaGo Zero’s “knowledge base” contains the rules of go and nothing more. Nevertheless, the program improves very quickly by playing against itself. The developers say Zero mastered the basics of the game in just a few hours. After three days of self-play training, AlphaGo Zero defeated AlphaGo Lee, the version that beat Lee Sedol 4:1 in 2016.

    After 21 days, the system was already playing at the level of AlphaGo Master, the version of the platform that this year defeated the world's best players in a series of 60 online games against top professionals, and beat world champion Ke Jie in all three games of their match.

    After 40 days of training against itself, Zero handled all of its predecessors without much difficulty; it beat the system that had won against Lee Sedol by a score of 100:0. As it learned, the system built a “tree” of possible moves, evaluating the consequences of each one.
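The “tree of possible moves” mentioned above is built, in AlphaGo's case, by Monte Carlo tree search. The sketch below is a generic MCTS implementation, not DeepMind's code, demonstrated on a deliberately tiny stand-in game (take 1 or 2 stones from a pile; whoever takes the last stone wins); the game, function names, and constants are illustrative choices of mine.

```python
import math
import random

# Toy game: a pile of n stones; each player removes 1 or 2 stones,
# and whoever takes the last stone wins.
def legal_moves(n):
    return [m for m in (1, 2) if m <= n]

class Node:
    def __init__(self, n_stones, parent=None):
        self.n_stones = n_stones
        self.parent = parent
        self.children = {}   # move -> Node
        self.visits = 0
        self.wins = 0.0      # from the perspective of the player who moved INTO this node

def ucb1(child, parent_visits, c=1.4):
    # Unvisited children are tried first; otherwise balance win rate and exploration.
    if child.visits == 0:
        return float("inf")
    return child.wins / child.visits + c * math.sqrt(math.log(parent_visits) / child.visits)

def rollout(n):
    """Random playout; returns 1 if the player to move at n wins, else 0."""
    turn = 0
    while True:
        n -= random.choice(legal_moves(n))
        if n == 0:
            return 1 if turn == 0 else 0
        turn ^= 1

def best_move(n_stones, iterations=2000):
    root = Node(n_stones)
    for _ in range(iterations):
        node = root
        # 1. Selection: walk down fully expanded nodes by UCB1.
        while node.n_stones > 0 and len(node.children) == len(legal_moves(node.n_stones)):
            node = max(node.children.values(), key=lambda ch: ucb1(ch, node.visits))
        # 2. Expansion: add one untried child.
        if node.n_stones > 0:
            untried = [m for m in legal_moves(node.n_stones) if m not in node.children]
            move = random.choice(untried)
            child = Node(node.n_stones - move, parent=node)
            node.children[move] = child
            node = child
        # 3. Simulation: estimate the value for the player to move at `node`.
        value = 0.0 if node.n_stones == 0 else rollout(node.n_stones)
        # 4. Backpropagation: flip the perspective at every level.
        while node is not None:
            node.visits += 1
            node.wins += 1.0 - value   # reward for the player who moved into `node`
            value = 1.0 - value
            node = node.parent
    # Recommend the most-visited move.
    return max(root.children.items(), key=lambda kv: kv[1].visits)[0]
```

In this toy game, positions that are multiples of 3 are lost for the player to move, so the search learns to leave the opponent on 3: from 4 stones it takes 1, from 5 it takes 2.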

    The developers gave the new system only basic information about the rules of the game. Its database contained no records of champions' games. The system learned everything by itself, playing against a copy of itself millions of times and spending about 0.4 seconds per move. If a human wanted to play through the same number of games, it would take several thousand years. After each new game, the weights of the neural network and other components were updated. Interestingly, AlphaGo Zero uses a single neural network rather than the two separate networks of previous versions.

    The creators of the system argue that there is no reason to fear the power of AI in this case. The specialists who built it say its style of play resembles that of some human masters, but only at the very beginning of a game. Around the middle of the game, experts usually stop seeing any particular strategy; the system appears to act randomly. In fact it does not: every move is carefully planned and aimed at winning.

    Google first talked about AlphaGo in 2015. That system operated using two neural networks: the first calculated the probability of making particular moves, and the second evaluated positions on the board during the game. Initially the system was trained on games played by humans. In addition to the neural networks, AlphaGo used Monte Carlo tree search, a technique often found in strong game-playing programs: the machine selects the best move by sampling and analyzing many possible continuations. Over time, the developers added new features using reinforcement learning, in which the system trains without a prepared set of example games.
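The way the two networks plug into the tree search can be sketched with the PUCT selection rule described in the AlphaGo papers: at each tree node, the value estimate for a move is combined with an exploration bonus driven by the policy network's prior for that move. The constant `c_puct` and the data layout below are illustrative choices of mine, not DeepMind's values:

```python
import math

def puct_score(q_value, prior, parent_visits, child_visits, c_puct=1.5):
    """Value estimate plus an exploration bonus that is large for moves the
    policy network likes and decays as the move accumulates visits."""
    exploration = c_puct * prior * math.sqrt(parent_visits) / (1 + child_visits)
    return q_value + exploration

def select_move(children, c_puct=1.5):
    """children maps move -> (q_value, prior, visit_count);
    returns the move maximizing the PUCT score."""
    parent_visits = max(1, sum(v for _, _, v in children.values()))
    return max(
        children,
        key=lambda m: puct_score(children[m][0], children[m][1],
                                 parent_visits, children[m][2], c_puct),
    )
```

Early in the search the prior term dominates, so the networks steer exploration toward plausible moves; as visit counts grow, the measured value takes over.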

    Seven-time European champion Alexander Dinerstein (3-dan professional, 7-dan EGF) shared his opinion on the new system with us.

    The machine learned go entirely on its own. To master the game, previous versions of AlphaGo first worked through a database of human games and only then played against copies of themselves to hone their skills. The AlphaGo Zero version played only against itself and learned everything by itself, yet it beat even AlphaGo Master, which played against Ke Jie in May. Do you agree that, in presenting AlphaGo Zero, the researchers no longer even mention a match against a human and offer only another computer system as a reference for comparison?

    It seemed to me that Zero plays a more human-like go; its moves became easier to understand, and its games contain less of what we call tenuki, when the program abruptly changes plans and essentially ignores the opponent's last move. Among the minuses: the program still repeats the same patterns in the openings, which makes the games less entertaining. Go in these games even resembles chess with its long, memorized openings. In games between humans, by contrast, a previously unseen position often arises after the first 5-10 moves, and such games are much more interesting to analyze.

    I expected handicap games to be shown; after all, there were claims that a fresh version of AlphaGo could give four handicap stones to the version that played against Fan Hui (the European champion). Alas, those games are still kept secret.

    Nothing has been heard about new matches, and no interested professionals are anywhere in sight. Apparently they understand that in an even game they have no chance, while playing with a handicap is a blow to their self-esteem.

    In their paper, the developers describe how AlphaGo Zero gradually invented several joseki (standard opening sequences), including one combination that has occurred in professional play. The researchers also note that the algorithm exhibits some properties characteristic of human play: capturing territory, greed, zones of influence. Do you think it is correct to call a computer go system a weak form of artificial intelligence?

    On what is new in the openings: as with the previous versions, AlphaGo Lee and AlphaGo Master, we encounter moves that people used to consider bad. I have been teaching go for 15 years and remember scolding my students for exactly such moves. Now all go professionals are trying to copy them, even the proud Japanese, who rarely adopted Chinese and Korean novelties. Everyone agrees that Alpha's ideas are powerful; no one even tries to refute them.

    How has AlphaGo changed the philosophy of go? Have new strategies already appeared? How might the completely “inhuman” AlphaGo Zero change the go world?

    AlphaGo's ideas have shaken up the openings, and that is a good thing. People will continue to be interested in professional games and follow professional news. For now there are no programs of professional strength on sale, let alone freely available. We are waiting for the Japanese DeepZenGo 7 in November this year. It will play at the strength of top professionals (and there is evidence for this, since it is being actively tested on go servers). That is where the first problems will begin: we will find ourselves in the shoes of chess players, with their eternal suspicions of unfair play, and tournaments on go servers will suffer. But this is inevitable, although no one imagined it would happen so quickly.

    Has the go community accepted the fact that from now on it is humans who will receive a head start in matches: that the computer will be giving handicap stones not to another algorithm but to a flesh-and-blood player?

    The question of handicaps is very difficult. The program's play shows that it is stronger than the best human masters, but by how much? Lee Sedol, for example, is sure he would not lose a match at two handicap stones. It would be interesting to hold a match with a floating handicap, in the format Go Seigen used in the middle of the last century. But which of the top pros would agree to that? Pros have played other pros at a two-stone handicap before; recall, for example, Cho Hunhyun's match against five contenders for Korean titles in the 1980s. As far as I remember, that was the last match of its kind. But what if here you need not two stones, but three or four? Can you imagine Kasparov playing a match against a machine at rook odds? I cannot!

    A curious question: one of the AlphaGo programmers had previously worked on Giraffe, a self-learning chess program that reached master strength in 72 hours. He surely gained a great deal of experience working on the go program. I wonder whether he could write a new chess program by analogy with Alpha, or does the neural-network approach not work in chess? I am very interested in the answer to that question.
