Why self-learning AI has problems in the real world
The latest self-learning artificial intelligence systems can study a game from scratch and become world-class champions. Until recently, machines capable of beating champions began by studying human experience. To defeat Garry Kasparov in 1997, IBM engineers loaded Deep Blue with knowledge accumulated over centuries of humanity's fascination with chess. In 2016, AlphaGo, an AI built by Google's DeepMind, beat champion Lee Sedol at the ancient board game Go after studying millions of positions from tens of thousands of games played by people. But now AI developers are rethinking how human knowledge gets built into “electronic brains”. The current trend: don't bother.
In October 2017, the DeepMind team published details of a new Go system, AlphaGo Zero. It never studied games played by people. Instead, it learned the rules and started playing against itself. Its first moves were completely random. After each game, the system analyzed what had led to victory or defeat. After some time, AlphaGo Zero took on the upgraded AlphaGo that had beaten Lee Sedol, and won 100 games to 0.
Lee Sedol, 18-time world champion in the game of Go, during the match with AlphaGo in 2016.
Then the researchers created a system that became the strongest player in the AlphaGo family: AlphaZero. In a paper published in December, the DeepMind developers reported that AlphaZero, also learning from scratch, surpassed AlphaGo Zero. In other words, it beat the bot that beat the bot that beat the best Go player in the world. And when it was given the rules of chess, and of shogi, the Japanese variant of the game, AlphaZero quickly learned to beat the strongest algorithms in those games too. Experts were surprised by its aggressive, unusual style of play. As the Danish grandmaster Peter Heine Nielsen noted: “I always wondered what it would be like if a superior species landed on Earth and showed us how they play chess. Now I know.”
Last year also saw otherworldly self-learning bots emerge in fields as diverse as no-limit poker and Dota 2.
Clearly, the companies investing in these and similar systems have far more ambitious plans than dominating gaming championships. Researchers hope to use similar methods to solve real problems, such as creating superconductors that work at room temperature, or understanding the origami that folds proteins into potent drug molecules. And, of course, many practitioners hope to build a general-purpose AI: a vague but exciting goal, implying a machine that can think like a person and tackle many different tasks.
But despite the large investment of people and hardware in such systems, it is not clear how far they can move beyond the world of games.
Ideal goals for an imperfect world
Many games, including chess and Go, have one thing in common: the players can always see the entire position on the board. Each player has “perfect information” about the state of the game at all times. However hard the game, you only need to think ahead from the current position. Real life is usually not like that. Imagine asking a computer to make a medical diagnosis or conduct a business negotiation. Noam Brown, a graduate student in computer science at Carnegie Mellon University, says: “Most real-world strategic interactions involve hidden information. I feel that many members of the AI community ignore this.”
Brown specializes in poker-playing algorithms, and poker poses a different challenge: you cannot see your opponents' cards. But here too, machines that learn to play by themselves are already reaching superhuman heights. In January 2017, a program called Libratus, created by Brown and Tuomas Sandholm, beat four professional players one-on-one at no-limit Texas Hold'em. By the end of the 20-day tournament, the bot had finished $1.7 million ahead of its rivals.
The multiplayer strategy game StarCraft II is an even more daunting game of incomplete information. Here AI has not yet reached the summit. It is held back by the enormous number of moves in a game, often measured in the thousands, and by the speed at which they must be made. Every player, human or machine, has to consider a practically unlimited variety of further developments with each click.
So far, AI cannot compete on equal terms with the best players. But developers are aiming for it. In August 2017, DeepMind partnered with Blizzard Entertainment, the company that created StarCraft II, to build tools intended to help AI researchers.
For all the difficulty of its gameplay, StarCraft II comes down to a simple task: destroy your enemies. The same can be said of chess, Go, poker, Dota 2 and almost any other game. And games can be won.
From the algorithm's point of view, a task must have an “objective function”, a goal to pursue. This was not too hard when AlphaZero played chess. A loss counted as -1, a draw as 0, and a win as +1. AlphaZero's objective function was to score as many points as possible. The objective function of a poker bot is just as simple: win a lot of money.
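The scoring scheme described for chess can be written down in a few lines. This is only an illustrative sketch: the names `game_score` and `objective` are hypothetical and are not taken from AlphaZero itself.

```python
# Hypothetical names; a sketch of the -1 / 0 / +1 scoring described in the text.
OUTCOME_POINTS = {"loss": -1, "draw": 0, "win": 1}

def game_score(outcome):
    """Points awarded for one finished game."""
    return OUTCOME_POINTS[outcome]

def objective(outcomes):
    """Total points over a series of games: the quantity the player tries to maximize."""
    return sum(game_score(o) for o in outcomes)

series = ["win", "draw", "loss", "win"]
print(objective(series))
```

The point of the sketch is how little has to be specified: once the outcome of a game is known, the goal is a single number.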
An algorithm learns a complex behavior: walking on unfamiliar terrain.
Real life is not so clear-cut. A self-driving car, for example, needs a more nuanced objective function, something like the careful wording you would use when explaining a wish to a genie. For example: deliver passengers quickly to the correct destination, obeying all the rules and properly weighing human lives in dangerous and uncertain situations. Pedro Domingos, a computer scientist at the University of Washington, says: “Among other things, the difference between a great machine learning researcher and an ordinary one lies in how the objective function is formulated.”
Remember Tay, the Twitter chatbot Microsoft launched on March 23, 2016. Its goal was to engage people, and it achieved that goal. But it soon turned out that the best way to maximize engagement was to spew all sorts of insults. The bot was shut down in less than a day.
Your own worst enemy
Some things stay the same. The methods used by today's dominant game-playing bots rely on strategies devised decades ago. They are a blast from the past, only backed by modern computing power.
These strategies are usually based on reinforcement learning, a hands-off technique. Instead of micromanaging an algorithm with detailed instructions, engineers let the machine explore an environment and reach its goals by trial and error. In 2013, before AlphaGo and its descendants, the DeepMind team achieved an important result: using reinforcement learning, it taught a bot to play seven Atari 2600 games, three of them at expert level.
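To make “trial and error” concrete, here is a minimal sketch of tabular Q-learning, one classic reinforcement learning method. It is not the Atari system described above. The five-cell corridor, the reward of 1.0 at the goal and the constants are all assumptions chosen for illustration, and the agent explores with purely random moves to keep the code short.

```python
import random

N_STATES = 5           # corridor cells 0..4; cell 4 is the goal (toy assumption)
ACTIONS = (-1, +1)     # step left or step right
GAMMA, ALPHA = 0.9, 0.5

def step(state, action):
    """Move inside the corridor; reward 1.0 only when the goal is reached."""
    nxt = min(max(state + action, 0), N_STATES - 1)
    done = nxt == N_STATES - 1
    return nxt, (1.0 if done else 0.0), done

def train(episodes=500, seed=0):
    """Learn action values purely from experience, with no instructions given."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            a = rng.randrange(2)  # trial: try a random action
            nxt, reward, done = step(state, ACTIONS[a])
            # error: nudge the estimate toward what was actually observed
            target = reward if done else reward + GAMMA * max(q[nxt])
            q[state][a] += ALPHA * (target - q[state][a])
            state = nxt
    return q

def greedy_policy(q):
    """Best known action in every non-goal cell."""
    return [ACTIONS[max((0, 1), key=lambda a: q[s][a])] for s in range(N_STATES - 1)]

print(greedy_policy(train()))
```

Nobody tells the agent that the goal lies to the right; the preference for stepping right emerges from the rewards it happens to collect.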
Not stopping there, on February 5 the DeepMind team released IMPALA, an AI system that can play 57 Atari 2600 games as well as 30 three-dimensional levels built by DeepMind. In these levels, the player moves through various environments and rooms, solving tasks such as opening doors and picking mushrooms. Moreover, IMPALA transferred what it learned between tasks: each session it played improved its results in the next.
But within the broader category of reinforcement learning, board games and multiplayer games allow for an even more specialized approach. Exploration can take the form of self-play, in which an algorithm gains experience by competing against a copy of itself.
This idea, too, is decades old. In the 1950s, IBM engineer Arthur Samuel created a checkers program that learned in part from games played between an alpha copy and a beta copy. And in the 1990s, Gerald Tesauro, also from IBM, created a backgammon program that pitted the algorithm against itself. The bot reached the level of a human expert, developing unorthodox but effective strategies.
In self-play, the algorithm meets an equally matched rival in every game. Changes in strategy therefore lead to different outcomes, because the copy responds to them immediately. Ilya Sutskever, research director at OpenAI, says: “Every time you learn something new, discover the slightest piece of information about the game and the environment, your opponent instantly uses it against you.” In August 2017, OpenAI released a Dota 2 bot that controlled the character Shadow Fiend (something like a demon necromancer) and defeated the world's best players in one-on-one battles. Another of the company's projects has two algorithms controlling sumo wrestlers that learn fighting techniques from each other. In this kind of training it is impossible to stagnate; you have to keep improving.
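The self-play loop can be sketched on a toy game. This is not the method used by any of the systems named here. The game (players alternately take 1 to 3 stones from a pile of 10, and whoever takes the last stone wins), the function names and the constants are all assumptions chosen for brevity. Both seats read from and write to the same value table, so every improvement is immediately available to the opponent.

```python
import random

def legal_moves(pile):
    return [m for m in (1, 2, 3) if m <= pile]

def play_game(q, rng, start=10, epsilon=0.2):
    """One game in which both seats choose moves from the same value table q."""
    pile, player, history = start, 0, []
    while pile > 0:
        moves = legal_moves(pile)
        if rng.random() < epsilon:
            move = rng.choice(moves)  # occasional exploration
        else:
            move = max(moves, key=lambda m: q.get((pile, m), 0.0))
        history.append((player, pile, move))
        pile -= move
        player = 1 - player
    winner = history[-1][0]           # whoever took the last stone
    return history, winner

def train(games=2000, seed=0):
    """Average the final result (+1 win, -1 loss) of every move that was played."""
    rng = random.Random(seed)
    q, visits = {}, {}
    for _ in range(games):
        history, winner = play_game(q, rng)
        for player, pile, move in history:
            reward = 1.0 if player == winner else -1.0
            key = (pile, move)
            visits[key] = visits.get(key, 0) + 1
            old = q.get(key, 0.0)
            q[key] = old + (reward - old) / visits[key]
    return q

q = train()
print(sorted((key, round(value, 2)) for key, value in q.items() if key[0] <= 4))
```

Because the opponent is always the current version of the same table, a move that used to work stops working as soon as the table learns the reply to it.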
The Dota 2 bot created at OpenAI learned several difficult strategies on its own.
But the old idea of self-play is only one ingredient in the dominance of today's bots, which also need some way to turn their playing experience into understanding. In chess, Go and video games like Dota 2, the number of possible combinations is astronomical. Even after spending many lifetimes battling its own shadow in virtual arenas, a machine cannot work through every possible scenario, build a lookup table of actions, and consult it when it finds itself in the same situation again.
To stay afloat in this sea of possibilities, you need to generalize and grasp the essence. IBM's Deep Blue managed this with built-in chess formulas. Armed with the ability to evaluate board positions it had never seen before, the computer adjusted its moves and strategies to increase its chances of winning. But new techniques that have appeared in recent years make it possible to do without the formulas.
Deep neural networks are becoming more and more popular. They consist of layers of artificial “neurons”, stacked like pancakes. When the neurons in one layer fire, they send signals to the next layer, which sends signals to the one after it, and so on. By adjusting the connections between layers, such networks achieve fantastic results, transforming input data into some related output even when the relationship seems abstract. You can give a neural network a phrase in English, and it will translate it into Turkish. Or you can give it photos from an animal shelter, and it will pick out the pictures of cats. Or you can show a deep neural network a game board, and it will estimate its probability of winning. But first, of course, the neural network has to be trained on labeled data.
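The layer-by-layer flow of signals can be sketched in plain Python. This is a toy forward pass, not a trained model. The weights below are hand-picked placeholders, and `win_probability` is a hypothetical name. In a real system, training on labeled data or self-play games would adjust these weights.

```python
import math

def relu(x):
    return max(0.0, x)

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def dense(inputs, weights, biases, activation):
    """One layer: each neuron sums its weighted inputs, adds a bias, then fires."""
    return [
        activation(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]

def win_probability(features, layers):
    """Pass the signal through every layer; the last one squashes it into (0, 1)."""
    signal = features
    for weights, biases in layers[:-1]:
        signal = dense(signal, weights, biases, relu)
    weights, biases = layers[-1]
    return dense(signal, weights, biases, sigmoid)[0]

# Placeholder weights: a hidden layer of two neurons, then one output neuron.
layers = [
    ([[1.0, -1.0], [0.5, 0.5]], [0.0, 0.0]),
    ([[2.0, -1.0]], [0.0]),
]
print(win_probability([1.0, 0.0], layers))
```

Changing any single weight changes the final number, which is what “adjusting the connections between layers” amounts to in practice.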
Self-play and deep neural networks complement each other well. Self-play generates a stream of game data, giving deep networks a theoretically unlimited source of training material. In turn, deep networks offer a way to absorb the experience and patterns that emerge during self-play.
But there is a catch. For self-play systems to generate useful data, they need a realistic place to play.
All these games, and all these achievements, took place in environments where the world can be simulated with some degree of fidelity. In other areas, impressive results are not so easy to come by.
Self-driving cars, for example, struggle in bad weather and have trouble with cyclists on the road. They may also misjudge an unusual but real situation, such as a bird flying straight at the car's camera. Or take a less exotic application of AI, a robotic arm. First it must be taught the basics of physical action, so that the arm at least knows how to go about learning. But it still does not know what different surfaces and objects feel like, so for tasks like unscrewing a bottle cap or performing a surgical procedure, the machine needs hands-on practice.
Yoshua Bengio, a deep learning specialist at the University of Montreal, says: “In situations that are difficult to simulate, the self-play learning model is not very useful. There is a huge difference between a truly perfect model of the environment and a learned one, especially if the environment is complex.”
Life after games
It is hard to say exactly when AI's dominance in games began. You could pick Kasparov's loss, or Lee Sedol's defeat. Many start counting from 2011, when Ken Jennings, champion of the TV quiz show Jeopardy!, lost a two-day contest to IBM's Watson. The machine was able to understand phrasing and wordplay. Its developers gave Watson an ability to process text much like our own. The computer could take a clue phrased in English, scan relevant documents at great speed, extract pieces of information and choose the best answer.
Yet in the years since, “ordinary” real-life tasks have continued to resist AI. A report published in September 2017 described serious difficulties in researching and developing personalized cancer treatments within the Watson for Oncology project. It is much easier for a computer to understand the questions on Jeopardy! than to grasp the point of a medical article.
Still, a number of real-world tasks are as narrowly specialized as games. The DeepMind team is rumored to be working on adapting AlphaZero for biomedical research into protein folding. To do that, the developers will have to understand how the amino acids that make up a protein fold into small three-dimensional structures whose function depends on their shape. This is as hard as a game of chess: chemists know principles that let them work out some scenarios, but the number of possible three-dimensional configurations is so vast that exploring them all “by hand” is simply unrealistic. But what if protein folding were turned into a game? In fact, that has already been done. Since 2008, hundreds of thousands of players have tried the online game Foldit, which awards points for the stability and feasibility of the protein structures they create. A machine could train itself in the same way, for example through reinforcement learning, trying to beat the best scores of human players.
Reinforcement learning and self-play could also help train dialogue systems. Robots would then be able to talk to people after first learning to talk to themselves. And as specialized AI hardware becomes more powerful and more widely available, engineers will have an incentive to recast more and more real-world tasks as games. In the future, this will probably only increase the importance of self-play and of other approaches that demand enormous computing power.
But if our main goal is to build a machine that can do as much as a person can, and one that teaches itself, then board game champions like AlphaZero have possible paths of development ahead of them. We need to be aware of the gulf between real thinking and the creative exploration of ideas on the one hand, and what we see in AI today on the other. That shining image of artificial intelligence exists, for the most part, in the minds of great researchers.
Many scientists, aware of the level of hype, offer their own caveats. There is no need to overestimate the importance of game-playing bots for the development of AI in general. People, after all, are not very good at games. On the other hand, very simple, specialized tools can reach great heights in certain tasks.