OpenAI Universe: an open platform for training strong AI


    A set of reinforcement learning tasks for training strong AI on the universal OpenAI platform

    Founded by Elon Musk and colleagues, the nonprofit organization OpenAI, which aims to create safe (that is, publicly available and open) artificial intelligence, has taken another step toward its goals. OpenAI has introduced Universe, middleware for training strong AI. In theory, training can draw on all of humanity's information accessible through the Internet: games, websites and other applications.

    With only nine lines of code, thousands of environments become available for training your AI.

    With the Universe software platform, an intelligent agent uses the computer the same way a person does: it looks at the pixels of the screen and interacts through a keyboard and mouse (virtual ones).


    The AI perceives the world through the interface of VNC, a remote desktop access program.

    The plan is to train an intelligent agent on the full set of tasks. The Universe platform opens up to AI any task that a person can solve at a computer.

    OpenAI Gym Environments


    The launch of the universal platform continues OpenAI's planned steps toward creating an open, universal AI for the whole world. In April of this year, the organization released a public beta of the OpenAI Gym toolkit for developing and comparing reinforcement learning algorithms. OpenAI Gym consists of a large number of environments (from a humanoid robot simulator to Atari games). There is also a site for comparing and reproducing results.

    OpenAI Gym is compatible with algorithms written in any framework, including TensorFlow and Theano. For now, environments are written in Python, but the developers plan to make it possible to implement them in any programming language.

    OpenAI believes that reinforcement learning is an important machine learning approach that will greatly improve AI. With this method, the system under training (the agent) learns by interacting with an environment. Unlike traditional supervised learning, feedback arrives as reinforcement signals in response to the AI's decisions. Some of the resulting reinforcement rules are formed dynamically and are hard for a person to interpret, because they arise from the simultaneous activity of artificial neurons.
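
    The agent and environment interaction described above can be illustrated with a minimal, self-contained sketch. It does not use OpenAI Gym itself: CorridorEnv is a hypothetical toy environment that only imitates a reset/step interface, and tabular Q-learning is chosen here purely as a simple example of learning from reward signals, not as the method OpenAI uses.

```python
import random


class CorridorEnv:
    """Hypothetical toy environment with a Gym-style reset/step interface.

    The agent starts in cell 0 and must reach the last cell of a corridor.
    """

    def __init__(self, length=5):
        self.length = length
        self.position = 0

    def reset(self):
        self.position = 0
        return self.position

    def step(self, action):
        # action 0 moves left, action 1 moves right (clamped to the corridor)
        move = 1 if action == 1 else -1
        self.position = min(max(self.position + move, 0), self.length - 1)
        done = self.position == self.length - 1
        reward = 1.0 if done else -0.01  # the reinforcement signal
        return self.position, reward, done, {}


def train(env, episodes=200, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning: learn action values purely from reward signals."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(env.length)]
    for _ in range(episodes):
        state, done = env.reset(), False
        while not done:
            # epsilon-greedy: mostly exploit, sometimes explore; ties are random
            if rng.random() < epsilon or q[state][0] == q[state][1]:
                action = rng.randrange(2)
            else:
                action = 0 if q[state][0] > q[state][1] else 1
            next_state, reward, done, _ = env.step(action)
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q


q_table = train(CorridorEnv())
# Greedy policy for every non-terminal cell (1 means "move right")
policy = [0 if left > right else 1 for left, right in q_table[:-1]]
print(policy)
```

    No one tells the agent that "move right" is correct; it infers this only from the rewards returned by step().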


    Reinforcement signal recognized by OCR at 60 fps: video

    OpenAI Universe Software


    Universe, introduced today, is middleware that is fully compatible with the OpenAI Gym toolkit and runtime environment. It is intended to dramatically increase the number of environments available for training AI.

    Previously, the largest catalog of reinforcement learning environments included only 55 Atari games (the Arcade Learning Environment). The Universe platform is expected to carry games from many other developers, including Valve, EA and Microsoft.

    From the very beginning, thousands of games (Flash games, the multiplayer snake game Slither, StarCraft, GTA V and others), various browser-based tasks (such as filling out forms) and applications (such as fold.it puzzles) are available through the Universe middleware. Almost any game can be freely launched using the Python library universe, which is openly published on GitHub.

    import gym
    import universe # register Universe environments into Gym
    env = gym.make('flashgames.DuskDrive-v0') # any Universe environment ID here
    env.configure(remotes=1) # automatically creates a local Docker container
    observation_n = env.reset()
    while True:
      # agent which presses the Up arrow 60 times per second
      action_n = [[('KeyEvent', 'ArrowUp', True)] for _ in observation_n]
      observation_n, reward_n, done_n, info = env.step(action_n)
      env.render()

    The above code launches an artificial intelligence agent to play the game Dusk Drive.

    Dusk Drive game
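
    The `_n` suffix in the loop above suggests that every value is a list with one entry per running environment instance. The sketch below illustrates that convention without installing Universe: FakeVectorEnv and its reward rule are hypothetical stand-ins, invented here only so the loop can run to completion and report a total reward per instance.

```python
class FakeVectorEnv:
    """Hypothetical stand-in for a Universe environment with n instances."""

    def __init__(self, n=2, episode_length=3):
        self.n = n
        self.episode_length = episode_length
        self.steps = 0

    def reset(self):
        self.steps = 0
        # one observation per instance
        return [{"frame": 0} for _ in range(self.n)]

    def step(self, action_n):
        assert len(action_n) == self.n  # one list of events per instance
        self.steps += 1
        observation_n = [{"frame": self.steps} for _ in range(self.n)]
        # made-up reward: 1.0 whenever an instance presses ArrowUp
        reward_n = [
            1.0 if ("KeyEvent", "ArrowUp", True) in events else 0.0
            for events in action_n
        ]
        done_n = [self.steps >= self.episode_length] * self.n
        return observation_n, reward_n, done_n, {}


def run_episode(env):
    """Same loop shape as the example, but stops once every instance is done."""
    totals = [0.0] * env.n
    observation_n = env.reset()
    while True:
        action_n = [[("KeyEvent", "ArrowUp", True)] for _ in observation_n]
        observation_n, reward_n, done_n, info = env.step(action_n)
        totals = [total + reward for total, reward in zip(totals, reward_n)]
        if all(done_n):
            return totals


print(run_episode(FakeVectorEnv(n=2, episode_length=3)))
```

    A learning agent would replace the fixed ArrowUp action with a decision based on each observation and on the rewards received so far.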

    "Our ultimate goal is to develop a single intelligent agent who is able to flexibly apply the experience gained in the Universe to solve new problems and quickly gain new experience, which will be an important step on the way to a strong AI," the OpenAI statement said. .

    Universe environments run inside Docker containers. As already mentioned, they communicate with an intelligent agent through a visual interface, that is, through a "screen", "keyboard" and "mouse", just as they would with a person. The interface is implemented using VNC, a remote desktop access program.
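
    To make the "keyboard" and "mouse" side of this interface more concrete, here is a small sketch of how an agent's actions could be expressed as lists of input events. The KeyEvent tuple follows the form used in the example loop; the PointerEvent tuple layout (x, y, button mask) and the helper names are assumptions made for illustration, loosely modeled on how VNC describes pointer input.

```python
def key_event(key, down):
    """Keyboard event in the tuple form used by the example loop."""
    return ("KeyEvent", key, down)


def pointer_event(x, y, buttonmask=0):
    """Mouse event (assumed layout): position plus a bitmask of held buttons."""
    return ("PointerEvent", x, y, buttonmask)


def tap_key(key):
    """Press and release a key as two consecutive events."""
    return [key_event(key, True), key_event(key, False)]


def click(x, y):
    """Left click: button bit 0 set, then cleared, at the same position."""
    return [pointer_event(x, y, 1), pointer_event(x, y, 0)]


# One action for one environment instance: tap ArrowUp, then click at (100, 200)
action = tap_key("ArrowUp") + click(100, 200)
print(action)
```

    Because every task is reduced to the same small vocabulary of events, the same agent code can in principle drive a game, a browser form or a desktop application.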

    The idea is that continuously improving the AI's skills by accumulating experience on many small tasks will help it master each new task faster and faster by applying existing knowledge. The Universe platform and its set of environments could become for intelligent agents the same kind of standard benchmark for reinforcement learning that the ImageNet image database is for training neural network classifiers in supervised learning.

    Reinforcement learning can indeed be very effective. For example, a Universe agent was trained for about six days to play the multiplayer web game Slither. After six days, the AI scores an average of 1,000 points per game session, with a best result of 1,400 points. For comparison, an OpenAI employee with five hours of playing experience scores an average of 1,400 points, with a best result of 7,050.
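
    To put these figures side by side, the short calculation below expresses the agent's scores as a fraction of the human player's. It uses only the four numbers quoted above; the variable names are illustrative.

```python
# Scores quoted in the article for the Slither experiment
agent = {"average": 1000, "best": 1400}
human = {"average": 1400, "best": 7050}

# Agent score as a fraction of the human score, per metric
ratios = {key: agent[key] / human[key] for key in agent}

for key, ratio in ratios.items():
    print(f"{key}: agent reaches {ratio:.0%} of the human score")
```

    The agent comes fairly close to the human on average score, while its best session is still far below the human's best.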

    Currently, the following games and applications from OpenAI partners are available to agents via the Universe middleware: Portal, Fable Anniversary, World of Goo, RimWorld, Slime Rancher, Shovel Knight, SpaceChem, Wing Commander III, Command & Conquer: Red Alert 2, Syndicate, Magic Carpet, Mirror's Edge, Sid Meier's Alpha Centauri and Wolfram Mathematica. The list will grow.
