rodiohabr January 19, 2018 at 09:43

Reinforcement Learning Experiment Platforms and More


Researchers' dream of creating universal artificial intelligence has led to the emergence of many services where you can try a new algorithm on completely different tasks and evaluate how universal it is: which tasks it copes with and which it finds difficult.

This article gives a brief overview of twelve such services.

ALE: Arcade Learning Environment

→ Introductory article
→ Platform repository

A platform for developing and evaluating machine learning algorithms. It provides an interface to hundreds of Atari 2600 games, each of which is unique and was designed to be interesting to people. The variety of games lets researchers work toward truly universal algorithms and compare their results with each other.

For an algorithm operating in ALE, the world looks quite simple. Observations are two-dimensional arrays of 7-bit pixels (160 by 210 pixels). The possible actions are the 18 signals that the console's joystick can, in principle, generate. How the reward is computed can vary from game to game, but as a rule it is the difference in score between the current and previous frames.

In standard mode the Atari emulator generates 60 frames per second, but on modern hardware it can run much faster. In particular, a figure of about 6000 frames per second is reported.
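To make the numbers above concrete, here is a minimal sketch of the usual reward convention (score difference between consecutive frames) and of the speed-up implied by 6000 frames per second. The function names and the sample scores are invented for illustration and are not part of the ALE API.

```python
# Hypothetical sketch: none of these names come from the ALE API.
FRAME_WIDTH, FRAME_HEIGHT = 160, 210   # observation size from the text
NUM_ACTIONS = 18                       # joystick signals from the text
MAX_PIXEL = 2 ** 7 - 1                 # 7-bit pixels hold values 0..127


def rewards_from_scores(scores):
    """Turn a per-frame score sequence into per-frame rewards (score deltas)."""
    return [curr - prev for prev, curr in zip(scores, scores[1:])]


def speedup(emulated_fps, native_fps=60):
    """How many times faster than real time the emulator runs."""
    return emulated_fps / native_fps


scores = [0, 0, 10, 10, 30]            # made-up scores for five frames
print(rewards_from_scores(scores))     # [0, 10, 0, 20]
print(speedup(6000))                   # 100.0
```

So at the quoted rate an agent would see roughly a hundred times more frames per second of wall-clock time than a human player does.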

MAgent

→ Introductory article
→ Repository

A simulation environment focused on experiments that involve hundreds to millions of agents. Unlike other environments that claim multi-agent support but are in practice limited to dozens of agents, MAgent scales well and can support up to a million agents on one GPU.

All these efforts are aimed not only at training a single agent to behave optimally, but also at exploring the social phenomena that arise among a large number of intelligent agents. These may be issues related to self-organization, communication between agents, leadership, altruism and much more.

MAgent gives researchers the flexibility to customize their environments. The demo version contains three preconfigured experimental environments: pursuit (predators must band together in a pack to chase herbivores effectively), gathering resources in a competitive environment, and a battle between two armies (agents must master encirclement, “guerrilla warfare” and similar techniques).

Malmo

→ Introductory article

A platform for basic research in machine learning based on the popular game Minecraft. Minecraft is a 3D game in which a dynamic world of the required complexity can easily be created. The platform provides an API for controlling an agent, creating tasks for it, and running experiments.

Interesting and challenging.

VizDoom

→ Project site

An environment for experimenting with computer vision and reinforcement learning, based on the popular 3D game Doom. You can create your own scenarios and maps, use multiplayer mode, use a mode in which the learning agent observes a human player's actions, and so on. The environment is fast (up to 7000 FPS per thread) and runs on both Linux and Windows.

It provides an easy-to-use API for C++, Python, and Java. The API is optimized for use in reinforcement learning algorithms. The learning algorithm receives the image from the screen buffer as its observation, and a depth map can be passed as well.

The project website has a tutorial, video demos, examples, and detailed documentation.

ELF: An Extensive, Lightweight and Flexible Research Platform for Real-time Strategy Games

→ Introductory article
→ Repository

A platform for basic research on reinforcement learning algorithms. It can host games built in C/C++ (such as ALE). In addition, the developers built a simplified real-time strategy (RTS) game on top of ELF that runs at up to 4000 FPS per core on a laptop. This performance lets you train algorithms faster than in environments based on conventional RTS games, which are not optimized to run faster than real time. There are also Tower Defense and Capture the Flag game variants.

Those interested may also want to watch the presentation by Yuandong Tian of Facebook Research at ICML 2017.

Mazebase

→ Introductory article
→ Repository

Unlike systems that use games originally created to entertain people, this work focuses on games designed specifically for testing reinforcement learning algorithms. You can modify the games created on the platform or create new ones.

Out of the box, the system contains a dozen simple 2D games built on a grid world. When creating the world, the developers were inspired by the classic Puddle World, but they added their own ideas and made the map regenerate every time a new training cycle starts. Thus, the agent is trained each time on a world that it has not seen before.
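The idea of regenerating the map for every training cycle can be sketched as follows. This is an illustrative toy, not Mazebase code: the function name, the cell symbols and the grid size are all assumptions, and the sketch does not check that the goal is reachable.

```python
import random


def generate_map(width, height, num_walls, rng):
    """Return a fresh grid as a list of strings.

    '.' is a free cell, '#' a wall, 'A' the agent and 'G' the goal.
    Hypothetical layout; reachability of the goal is not guaranteed.
    """
    cells = [(x, y) for y in range(height) for x in range(width)]
    chosen = rng.sample(cells, num_walls + 2)   # distinct cells, no overlap
    agent, goal, walls = chosen[0], chosen[1], set(chosen[2:])
    grid = []
    for y in range(height):
        row = ""
        for x in range(width):
            if (x, y) == agent:
                row += "A"
            elif (x, y) == goal:
                row += "G"
            elif (x, y) in walls:
                row += "#"
            else:
                row += "."
        grid.append(row)
    return grid


rng = random.Random(0)
for episode in range(3):
    # A new map is drawn at the start of every episode.
    print(f"episode {episode}")
    print("\n".join(generate_map(5, 4, 4, rng)))
```

Because every episode starts from a freshly sampled layout, an agent cannot succeed by memorizing one particular map.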

OpenAI Gym / Universe

→ Introductory article on Gym
→ Universe

Gym is a set of tools for researching reinforcement learning algorithms. It includes an ever-expanding collection of test environments for experimentation. The project website lets you share your results and compare them with those of other participants.

Universe lets you turn almost any program into a test environment without access to its internal variables or source code. The program is placed in a Docker container, and interaction with it takes place through emulated keystrokes and mouse events. More than 1000 environments are available (mainly various games) in which an AI agent can perform actions and receive observations. Of this thousand, several hundred also provide a "reward" for the action performed. Such environments also include scripts that “click through” the program's start menu and go straight to the content of the game or application.

Perhaps Gym is the best choice for beginners.
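For readers who have not seen it, the interaction loop that Gym popularized looks roughly like the sketch below: `reset()` returns a first observation and `step(action)` returns an observation, a reward, a done flag and an info dictionary. The environment here is a made-up toy written only to mimic that shape, so the sketch runs without Gym installed. `CoinFlipEnv` and `run_episode` are hypothetical names.

```python
import random


class CoinFlipEnv:
    """Hypothetical toy environment that mimics the classic reset/step shape.

    The observation is the current coin face (0 or 1); the agent earns a
    reward of 1.0 when its action equals that face.
    """

    def __init__(self, episode_length=10, seed=None):
        self.episode_length = episode_length
        self.rng = random.Random(seed)
        self.steps = 0
        self.coin = 0

    def reset(self):
        self.steps = 0
        self.coin = self.rng.randint(0, 1)
        return self.coin

    def step(self, action):
        reward = 1.0 if action == self.coin else 0.0
        self.steps += 1
        self.coin = self.rng.randint(0, 1)          # next observation
        done = self.steps >= self.episode_length
        return self.coin, reward, done, {}


def run_episode(env, policy):
    """Run one episode and return the accumulated reward."""
    observation = env.reset()
    total, done = 0.0, False
    while not done:
        action = policy(observation)
        observation, reward, done, _info = env.step(action)
        total += reward
    return total


# A policy that simply copies the observation is always right.
print(run_episode(CoinFlipEnv(seed=0), policy=lambda obs: obs))  # 10.0
```

The appeal of this shape is that the same `run_episode` loop works unchanged for any environment exposing the two methods.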

TensorFlow Agents

→ Introductory article
→ Repository

The developers call TensorFlow Agents an infrastructure paradigm. Its main focus is on speeding up the training and testing of algorithms by running a large number of simulation environments in parallel and processing data in batches on the GPU and CPU. This widens the “bottleneck” inherent in most other platforms and shortens the algorithm debugging cycle. The environments themselves are applications that support the OpenAI Gym interface, and, as mentioned above, there are plenty of those to choose from.
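The batching idea can be illustrated with a small wrapper that steps several environments with one call and returns their results as lists. This is only a sketch of the interface: it steps the environments one after another in a single process, whereas a real implementation would run them in parallel, and neither `BatchEnv` nor `CounterEnv` is taken from the TensorFlow Agents code.

```python
class CounterEnv:
    """Hypothetical toy environment: the state is a running sum of actions."""

    def __init__(self, limit):
        self.limit = limit
        self.total = 0

    def reset(self):
        self.total = 0
        return self.total

    def step(self, action):
        self.total += action
        done = self.total >= self.limit
        reward = 1.0 if done else 0.0
        return self.total, reward, done


class BatchEnv:
    """Steps several environments in lockstep and returns batched results."""

    def __init__(self, envs):
        self.envs = envs

    def reset(self):
        return [env.reset() for env in self.envs]

    def step(self, actions):
        # One action per environment; results are regrouped column-wise.
        results = [env.step(a) for env, a in zip(self.envs, actions)]
        observations, rewards, dones = (list(col) for col in zip(*results))
        return observations, rewards, dones


batch = BatchEnv([CounterEnv(limit=3) for _ in range(4)])
print(batch.reset())             # [0, 0, 0, 0]
print(batch.step([1, 2, 3, 0]))
# ([1, 2, 3, 0], [0.0, 0.0, 1.0, 0.0], [False, False, True, False])
```

With observations arriving as one batch, the policy network can be evaluated once per step for all environments instead of once per environment.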

Unity ML Agents

→ Repository

You can now create simulation environments for machine learning in the Unity Editor; they run on the Unity Engine. Under the proposed paradigm, you define and write code for three objects: Academy, Brain, and Agent.

Academy - general environment settings, its internal logic. In addition, Academy is the parent object for the remaining entities of the model.

Brain - an object that describes the decision-making logic. There are several options: an interface to TensorFlow (through an open socket and the Python API, or through TensorFlowSharp), hand-written heuristic logic, or keyboard and mouse input that lets a human operator control the agent directly.

Agent - an object that holds its own set of states and observations and carries out its own sequence of actions within the simulation environment. It is the "body" of the simulated object.
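The division of responsibilities between the three objects can be sketched in a few lines. Note the assumptions: the real objects are Unity components written in C#, and the Python classes, method names and the one-dimensional "world" below are invented purely to show who owns what.

```python
class HeuristicBrain:
    """Decision logic: a hand-written rule instead of a trained model."""

    def decide(self, observation):
        # Move one step toward the target at position 0.
        if observation > 0:
            return -1
        if observation < 0:
            return 1
        return 0


class Agent:
    """The simulated 'body': holds its own state and applies actions."""

    def __init__(self, position, brain):
        self.position = position
        self.brain = brain

    def observe(self):
        return self.position

    def act(self):
        self.position += self.brain.decide(self.observe())


class Academy:
    """Owns the agents and advances the whole simulation."""

    def __init__(self, agents):
        self.agents = agents

    def step(self):
        for agent in self.agents:
            agent.act()


brain = HeuristicBrain()                      # one brain shared by both agents
academy = Academy([Agent(3, brain), Agent(-2, brain)])
for _ in range(3):
    academy.step()
print([agent.position for agent in academy.agents])  # [0, 0]
```

Swapping `HeuristicBrain` for a class that queries a trained model or reads keyboard input would leave `Agent` and `Academy` unchanged, which is the point of separating the three roles.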

There are also built-in tools for monitoring the internal state of agents, the ability to use several cameras as observations (which matters, for example, when you need to combine data from several sources, as in autonomous cars), and much more.

DeepMind pycolab

→ Introductory article
→ Repository

In essence, this is a game engine for developing simple games with ASCII graphics. Because such games are simple and lightweight, they let you debug reinforcement learning algorithms even on relatively weak hardware.

The ready-made examples already include “Space Invaders”, “Labyrinth”, an analogue of “Supaplex” and a few more small games.
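To give a feel for how little is needed for a game with ASCII graphics, here is a self-contained toy in which the level is a list of strings and the player moves unless a wall is in the way. It does not use the pycolab API; the level layout, symbols and function names are assumptions made for the example.

```python
# Hypothetical ASCII level: '#' wall, 'P' player start, 'G' goal.
LEVEL = [
    "#####",
    "#P  #",
    "# # #",
    "#  G#",
    "#####",
]

MOVES = {"up": (0, -1), "down": (0, 1), "left": (-1, 0), "right": (1, 0)}


def find(level, char):
    """Return the (x, y) position of the first occurrence of char."""
    for y, row in enumerate(level):
        x = row.find(char)
        if x != -1:
            return x, y
    raise ValueError(f"{char!r} not found in level")


def move(level, position, direction):
    """Return the new position, or the old one if a wall blocks the move."""
    dx, dy = MOVES[direction]
    x, y = position[0] + dx, position[1] + dy
    return position if level[y][x] == "#" else (x, y)


position = find(LEVEL, "P")
goal = find(LEVEL, "G")
for direction in ["down", "down", "right", "right"]:
    position = move(LEVEL, position, direction)
print(position == goal)  # True
```

The whole state fits in a handful of characters, which is why such games are cheap enough to run on modest hardware.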

SC2LE (StarCraft II Learning Environment)

→ Introductory article
→ Repository

An environment for training agents to play StarCraft II. StarCraft II is a hard machine learning problem that many of the best minds are working on right now: hundreds of units, incomplete information about the map because of the "fog of war", a huge variety of development strategies, and rewards delayed by thousands of steps. It looks like StarCraft II will be the next big milestone in machine learning's victories over humans, after the win at Go.

The environment provides open-source Python tools for interacting with the game engine. In addition to the standard game maps, the developers made several mini-games of their own for debugging individual elements of gameplay, such as collecting resources and fighting battles.

Replays of games by professional players and test results of classical machine learning algorithms applied to this task are also available for those interested.

Coach

→ Project site
→ Repository

A modular Python environment for debugging and training reinforcement learning algorithms. It lets you assemble simulation agents "from pieces" and use the full power of multiprocessor systems when evaluating algorithms and training models.

It contains built-in state-of-the-art implementations of many machine learning algorithms and can be a good starting point for those who want to see how various algorithms behave without going deep into the details of their implementation.

Coach is able to collect statistics about the learning process and supports advanced visualization techniques to help debug learning models.

Instead of a conclusion

If I missed something, please write in the comments.

If you take a vacation from February 26 to March 7, you can rest for 17 days in a row. You should now have more ideas for what to do with that time.
