m1rko January 28, 2019 at 12:55

Was AlphaStar given superhuman speed as a patch for a flaw in imitation learning?

Translation

Probably everyone has already heard that AlphaStar, an AI from Google Deepmind, crushed professional players in the real-time strategy game Starcraft 2. This is an unprecedented result in artificial intelligence research. Still, I want to offer some constructive criticism of this achievement.

I will try to make a convincing case for the following:

AlphaStar played with superhuman speed and accuracy.
Deepmind claims to have prevented the AI from performing actions that are physically impossible for humans. The developers did not succeed in this and probably know it.
The reason AlphaStar plays at superhuman speed is most likely its inability to unlearn the spam clicking it picked up from human games. I suspect the developers wanted to make the program more human-like but could not. It will take a while to get to this thesis, but it is the main reason I wrote this article, so please be patient.

First of all, I want to make clear that I am a layman. I have followed AI development and the Starcraft 2 scene for many years, but I do not claim to be an expert. If you notice any errors, please point them out. I'm just a fan, and all of this is incredibly exciting to me. There is a lot of speculation in this article, and I admit that I cannot definitively prove its main claims. With all those caveats, if you read the article and disagree with me, please argue constructively. I would genuinely like to be proven wrong.

After all, AlphaStar is an amazing achievement. In my opinion it is Deepmind's greatest achievement to date, and I look forward to seeing the program improve further. Thank you for your patience. Let's go.

AlphaStar's superhuman speed

David Silver, co-lead of the AlphaStar team: “AlphaStar cannot react faster and cannot make more clicks than a human player.”

Here, the lead AI designer makes an important announcement (from 1:39)

In 2018, Serral dominated the Starcraft 2 scene. He is the reigning world champion and won seven of the nine major tournaments he entered, one of the most striking examples of single-player dominance in the history of Starcraft 2. He is very fast, perhaps the fastest player in the world.

First-person view (from 13:00):

Take a look at his APM in the upper left corner. APM is short for actions per minute; in effect, the number reflects how quickly the player clicks the mouse and presses keys. Serral cannot sustain more than 500 APM for long. There is one spike to 800 APM, but only for a split second and most likely the result of spam clicks, which I will discuss shortly.

So the fastest player in the world can sustain an impressive 500 APM, whereas AlphaStar had bursts of 1500+. These inhuman bursts above 1000 APM sometimes lasted five seconds and were full of meaningful actions. 1,500 actions per minute is 25 actions per second, which is physically impossible for a human. Keep in mind, too, that five seconds is a long time in Starcraft, especially at the very start of a big battle. If superhuman speed gives the AI an advantage in the first five seconds, it will easily win the battle thanks to the snowball effect. Here is the start of an AlphaStar battle in the third game against MaNa (from 59:30):

AlphaStar holds 1000+ APM for five seconds. Another engagement in the fourth game, with sky-high bursts of 1500+ APM (from 2:11:32):

One commentator points out that the average APM is still acceptable. But these bursts are clearly far beyond human ability.
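To make the APM figures above easier to compare, here is a minimal sketch that converts them to actions per second and to the average gap between actions. The function names are my own and only illustrate the arithmetic; the APM values are the ones quoted in the text.

```python
def actions_per_second(apm: float) -> float:
    """Convert actions per minute to actions per second."""
    return apm / 60.0


def ms_between_actions(apm: float) -> float:
    """Average gap between consecutive actions, in milliseconds."""
    return 60_000.0 / apm


def actions_in_burst(apm: float, seconds: float) -> float:
    """Number of actions issued when a given APM is held for `seconds`."""
    return apm * seconds / 60.0


# APM figures quoted in the article.
figures = [
    ("Serral, sustained", 500),
    ("Serral, brief spike", 800),
    ("AlphaStar, burst", 1500),
]

for label, apm in figures:
    print(
        f"{label}: {apm} APM = {actions_per_second(apm):.1f} actions/s, "
        f"one action every {ms_between_actions(apm):.0f} ms"
    )
```

At 1500 APM there is one action every 40 milliseconds, and holding even 1000 APM for five seconds amounts to about 83 actions.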

Spam Clicks, APM, and Robot Surgical Accuracy

Most players are prone to spam clicks: pointless clicks that have no effect. For example, a player moves his army and for some reason clicks several times on the destination. What does that achieve? Nothing. The army will not move any faster; one click was enough. Then why does he do it? There are two reasons:

Spam clicking is a natural side effect of trying to click as fast as possible.
It helps keep the fingers warm.

Remember Serral? His real strength is not speed but accuracy. He has not only a very high APM but also an amazingly high effective APM (total actions per minute, excluding spam clicks). From now on I will abbreviate effective APM as EPM. It is important to remember that EPM counts only meaningful actions.
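The difference between APM and EPM can be sketched in a few lines. This is only an illustration under my own assumptions: the `Action` record, the `is_spam` rule (a repeat of the same command on the same target within one second) and the sample data are all hypothetical, and real replay tools may define effective actions differently.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass(frozen=True)
class Action:
    time_s: float             # seconds since the start of the sample
    command: str              # e.g. "move" or "attack"
    target: Tuple[int, int]   # hypothetical map coordinates


def is_spam(prev: Optional[Action], cur: Action, repeat_window_s: float = 1.0) -> bool:
    """Assumed heuristic: an exact repeat of the previous command counts as spam."""
    return (
        prev is not None
        and cur.command == prev.command
        and cur.target == prev.target
        and cur.time_s - prev.time_s <= repeat_window_s
    )


def apm_and_epm(actions: Sequence[Action], duration_s: float) -> Tuple[float, float]:
    """Return (APM, EPM) for a time-ordered list of actions."""
    effective = 0
    prev = None
    for action in actions:
        if not is_spam(prev, action):
            effective += 1
        prev = action
    return len(actions) * 60.0 / duration_s, effective * 60.0 / duration_s


# Six seconds of made-up play: a move order clicked four times,
# one attack order, and another move order clicked twice.
sample = [
    Action(0.0, "move", (10, 10)),
    Action(0.1, "move", (10, 10)),
    Action(0.2, "move", (10, 10)),
    Action(0.3, "move", (10, 10)),
    Action(1.0, "attack", (20, 5)),
    Action(2.0, "move", (30, 30)),
    Action(2.1, "move", (30, 30)),
]

apm, epm = apm_and_epm(sample, duration_s=6.0)
print(f"APM = {apm:.0f}, EPM = {epm:.0f}")
```

In this made-up sample only three of the seven actions change anything, so the EPM is less than half of the APM.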

Take a look at how a former professional lost his mind on Twitter when he learned Serral's EPM:

Serral in his WCS Leipzig replays consistently has 300+ EPM. 344 EPM in a game vs major's bio. The 3 other semi-finalists are around 200 EPM. Top Koreans I've looked are between 200-240. Serral is 50% faster than his opponents on average. Scary!
- Jos de Kroon (@Retjah) February 1, 2018

An EPM of 344 is almost unreal. It is so high that I still find it hard to believe. The distinction between APM and EPM matters for AlphaStar too. If the AI can play without spam clicks, does that mean its peak EPM at times equals its peak APM? That makes bursts of 1000+ even more inhuman. Once we take into account that AlphaStar plays with perfect precision, its mechanical capabilities seem completely absurd. It always clicks exactly where it intends to. Humans misclick, while AlphaStar, at the right moments, works four times faster than the fastest player in the world, with accuracy a human can only dream of.

Almost everyone in the community agrees that AlphaStar performed sequences that no human can replicate. It was faster and more accurate than is physically possible for a person. The fastest professional in the world is several times slower, and the accuracy cannot even be compared.

David Silver’s claim that AlphaStar can only perform actions that a person can reproduce is simply not true.

Do everything right or just turn on the speed?

Oriol Vinyals, Lead Architect, AlphaStar: “It is important to master games that are recognized as ‘fundamental challenges’ for AI. We are trying to create intelligent systems that acquire our amazing capabilities, so it is very important that they learn in as human-like a way as possible. However cool it may sound, pushing for maximum performance in the game, for example with very high APM, does not really help us measure the capabilities and progress of our agents, and it makes the benchmark useless.”

Why does Deepmind want to constrain the agent to play like a human? Why not let it run wild without any restrictions? The reason is that in Starcraft 2, mechanical superpowers break the game. In this video, a bot attacks a group of tanks with a handful of zerglings using perfect micro. Normally zerglings can do almost nothing against tanks, but with robotic micro they become far deadlier and destroy the tanks with minimal losses. With unit control that good, an AI does not need to learn strategy. After all, Deepmind is not interested in creating an AI that merely beats Starcraft professionals; they want to use this project as a stepping stone for general AI research. It is very sad that one of the project leads claims the agent is limited to human abilities when it clearly exceeds them and wins its games precisely through superhuman execution.

AlphaStar is better than humans at unit control, a factor that was not accounted for when the developers carefully balanced the game. This inhuman control can overshadow any strategic thinking the AI has learned. It can even make strategic thinking entirely unnecessary. This is not the same as being stuck at a local maximum: if the game is played with inhuman speed and accuracy, then abusing perfect unit control is likely to be the best, most effective and most reliable way to win, sad as that may sound.

Here is what one of the pros said about AlphaStar's strengths and weaknesses after losing to it 1-5:

MaNa: “I would say that its best quality is unit control. AlphaStar won every game in which we had roughly the same number of units. Its worst aspect, judging from the few games we played, was its stubborn refusal to upgrade. It was so convinced it would win with basic units that it upgraded almost nothing, and it paid for that in the exhibition match [the last game against MaNa, which the AI lost; translator's note]. There were not many decisive decision-making moments, so I would say that mechanics were the reason for its victory.”

Starcraft fans are almost unanimous that AlphaStar won almost entirely because of its superhuman speed, reaction time and accuracy. The pros who played against it seem to agree. A Deepmind employee played against AlphaStar before it faced the professionals, and he would most likely agree with this assessment too. David Silver and Oriol Vinyals keep repeating the mantra that AlphaStar can do only what a human can, but we have already seen that this is simply not so.

AlphaStar does not seem to be “doing it right,” as David says (from 1:38):

Something is clearly wrong here.

Why did Deepmind allow AlphaStar superhuman speed?

Finally, let's get to the main point. Thank you for reading this far. But first, a summary.

We know what APM, EPM and spam clicks are.
We have a rough idea of the upper limits of human ability.
AlphaStar's play directly contradicts the developers' claims about its limits.
The Starcraft 2 community agrees that AlphaStar won through inhuman unit control and did not even need excellent strategic thinking.
Deepmind's goal is not to build a fast bot, so it should not have played like that.
It is very unlikely that nobody on the AlphaStar team realized that a human cannot reproduce bursts of 1500+ APM. Their Starcraft specialist must know more about the game than I do. They work closely with Blizzard, which owns the StarCraft intellectual property. It is in their interest (see the previous point, as well as the statements by Silver and Vinyals) to make the bot behave as much like a human as possible.

Given all these points, why did Deepmind allow the AI to so plainly exceed the limits of the human body?

This is pure speculation on my part, and I do not claim to know the exact story. But I suspect that the following happened:

At the very beginning of the project, Deepmind agreed on hard limits. At that point, AlphaStar was not allowed the superhuman APM bursts we saw in the demo. If I were designing the system, I would set restrictions like these:

Maximum average APM over the whole game.
Maximum APM during a short burst. I think 4-6 clicks per second would be sensible. Remember Serral and his EPM of 344, head and shoulders above the competition? That is less than six clicks per second. Against MaNa, the program produced 25 clicks per second for sustained periods. That is much faster than even the fastest human spam clicking, so it is unlikely that the initial restrictions allowed it.
Minimum time between clicks. Even if the burst rate is capped, the bot could still squeeze several clicks into an instant within the allowed window, which a human cannot do.
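The three restrictions listed above could be checked against a log of click times roughly as follows. This is a sketch of my own, not anything Deepmind has described: the function name is invented, and the default values for the average cap (300 APM) and the minimum gap (50 ms) are placeholders I picked for illustration. Only the burst cap of 6 clicks per second comes from the range suggested in the list.

```python
from typing import List, Sequence


def check_limits(
    timestamps_ms: Sequence[int],
    game_length_ms: int,
    max_avg_apm: float = 300.0,   # placeholder value, not from the article
    max_per_second: int = 6,      # upper end of the 4-6 clicks/s suggested above
    min_gap_ms: int = 50,         # placeholder value, not from the article
) -> List[str]:
    """Return the names of the limits that a sequence of clicks violates."""
    ts = sorted(timestamps_ms)
    violated = []

    # 1. Maximum average APM over the whole game.
    avg_apm = len(ts) * 60_000.0 / game_length_ms
    if avg_apm > max_avg_apm:
        violated.append("average")

    # 2. Maximum burst: most clicks found in any one-second sliding window.
    start = 0
    worst = 0
    for end, t in enumerate(ts):
        while t - ts[start] >= 1000:
            start += 1
        worst = max(worst, end - start + 1)
    if worst > max_per_second:
        violated.append("burst")

    # 3. Minimum time between two consecutive clicks.
    if any(b - a < min_gap_ms for a, b in zip(ts, ts[1:])):
        violated.append("gap")

    return violated


ten_minutes_ms = 10 * 60 * 1000

# A human-like stretch: 5 clicks per second for 10 seconds.
human_like = [i * 200 for i in range(50)]

# An AlphaStar-like burst: 25 clicks in one second, one every 40 ms.
machine_burst = [i * 40 for i in range(25)]

print(check_limits(human_like, ten_minutes_ms))
print(check_limits(machine_burst, ten_minutes_ms))
```

The one-second burst passes the average check easily, which is why an average cap alone is not enough and the burst and gap checks are needed as well.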

Some suggest adding random noise to click accuracy, but I suspect that would slow down learning too much.

So, the limits are set. What next? Deepmind then ran imitation learning on thousands of replays of high-ranking amateur games. At this stage the agent simply tries to imitate what humans do, and it picks up spam clicking. This is very likely, because humans spam click all the time. It is just about the most repetitive pattern in human play, so it must be very deeply ingrained in the agent's behavior.

AlphaStar's peak APM bursts are initially close to the set limits. But most of AlphaStar's clicks turn out to be spam, so it does not have enough APM left for a proper fight. And without experimenting in fights, there is no learning. Here is what one of the developers said in yesterday's AMA; I think he may have let something slip here:

Oriol Vinyals, Lead Architect, AlphaStar: “Teaching AI to play with low APM is quite interesting. In the early days, our agents trained with very low APMs and were generally not capable of micromanagement.”

To speed up learning, the developers raise the APM limits and allow short bursts. Here are the APM limits that applied to AlphaStar in the demo matches:

Oriol Vinyals: “In particular, we set a limit of 600 APM over 5-second intervals, 400 APM over 15 seconds, 320 over 30 seconds and 300 over 60 seconds. If the agent issues more actions in these intervals, we discard/ignore them. These values are taken from human statistics.”

If you are not very familiar with Starcraft, these limits look reasonable, but they permit the superhuman APM bursts discussed earlier, as well as superhuman accuracy.
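To see why the quoted limits still permit such bursts, here is a minimal sketch of a limiter built from the four figures Vinyals gives. How Deepmind actually counts actions is not stated in the quote, so the sliding-window interpretation, the class name and the method name are assumptions of mine.

```python
from collections import deque
from typing import Deque, List, Tuple

# (window length in seconds, APM cap) as quoted by Vinyals.
QUOTED_LIMITS: List[Tuple[int, int]] = [(5, 600), (15, 400), (30, 320), (60, 300)]


class WindowedApmLimiter:
    """Drops any action that would exceed one of the per-window caps.

    Assumption: each cap applies to a sliding window that ends at the
    current action; the real mechanism may differ.
    """

    def __init__(self, limits: List[Tuple[int, int]] = QUOTED_LIMITS) -> None:
        # Convert each APM cap to "max actions per window", e.g. 600 APM over 5 s = 50.
        self.caps = [(sec * 1000, apm * sec // 60) for sec, apm in limits]
        self.longest_ms = max(window for window, _ in self.caps)
        self.accepted: Deque[int] = deque()

    def try_action(self, t_ms: int) -> bool:
        """Return True if an action at time t_ms is accepted, False if dropped."""
        # Forget actions older than the longest window.
        while self.accepted and t_ms - self.accepted[0] >= self.longest_ms:
            self.accepted.popleft()
        for window_ms, cap in self.caps:
            recent = sum(1 for t in self.accepted if t_ms - t < window_ms)
            if recent >= cap:
                return False
        self.accepted.append(t_ms)
        return True


# An agent that tries to act every 40 ms (25 actions/s, i.e. 1500 APM) for 5 seconds.
limiter = WindowedApmLimiter()
accepted_times = [t for t in range(0, 5000, 40) if limiter.try_action(t)]

burst_seconds = (accepted_times[-1] + 40) / 1000.0
print(f"{len(accepted_times)} actions accepted within the first {burst_seconds:.0f} s")
```

Under these assumptions the limiter lets the first 50 actions through in two seconds, which is 25 actions per second, before it starts dropping anything. The 5-second cap bounds the total but not how tightly the actions are packed.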

There is a limit to how fast a human can spam click. Usually these are move or attack commands issued by clicking on the map. Try it: see how fast you can click a mouse button. The agent learned spam clicking from human players and will not spam click faster than a human. So any additional APM, at superhuman speed, is "free" for experimentation.

This free APM is used to experiment in battles. Such engagements occur often during training. AlphaStar starts to learn new behavior that leads to better results, and the share of spam in its clicks falls.

If the agent learned useful behavior, why did Deepmind not go back to the original, stricter and more human-like APM limits? Surely they realized the AI was showing superhuman abilities. The Starcraft community has almost unanimously recognized AlphaStar's inhuman micromanagement. The pros said in the AMA that AlphaStar's main strength is unit control and its main weakness is strategic thinking. The Deepmind developers must have reached the same conclusion. The reason is probably that the agent could not get rid of spam clicks. Most of the time it acts precisely, but it still regularly lapses into spam clicking. This is evident in the first game against MaNa, when AlphaStar moves up the ramp (from 39:30):

Look closely at the blue circles that highlight the units.

The agent spam clicked move commands at 800 APM. It never fully unlearned this human bad habit, even though these actions are completely useless and eat into its APM budget. The flaw is especially dangerous during big battles. The APM limit was probably raised to paper over the problem and let the agent function normally at such moments.

What is so important about this?

I suspect the agent could not get rid of the spam clicking it learned from humans during imitation learning. Deepmind had to tinker with the APM limit to make experimentation and further progress possible. But this had an unpleasant side effect, superhuman play: in essence, the agent breaks the rules by executing strategies that were originally meant to be off limits.

This matters because beating professionals this way directly contradicts the mission Deepmind has repeatedly stated. It is why this chart leaves a sour taste of hypocrisy:

This is a picture Deepmind posted on its blog.

The chart looks designed to mislead people unfamiliar with Starcraft 2. It presents AlphaStar's APM as supposedly acceptable. Take a look at MaNa's APM and compare it with AlphaStar's. Although MaNa's average is higher, AlphaStar's tail extends far beyond human capability. Note that MaNa's APM peaks at about 750, while AlphaStar's peaks at over 1500. Now keep in mind that more than half of a human's APM consists of spam clicks, while AlphaStar's is EPM, made up of perfectly accurate clicks.

Now take a look at TLO's APM. The tail reaches 2000. Think about that for a second. How is this possible? It is made possible by a trick called rapid fire. TLO is not clicking super fast. He simply holds down a key, and the game registers it as 2000 APM. The only thing rapid fire lets you do is spam at an insane rate. That's all. TLO just uses it for some reason. But the effect is that AlphaStar's superhuman APM bursts are masked, and the numbers look plausible to people who are not familiar with Starcraft.

Deepmind's blog post makes no attempt to explain TLO's absurd numbers. If they do not explain TLO's inflated figures, those figures should not be in the chart. Period.

Statistics like these come dangerously close to lying. Deepmind should hold itself to a higher standard.
