
Machine Learning Agents at Unity

This article on machine learning agents in Unity was written by Michael Lanham, a technical innovator, active Unity developer, consultant, manager, and author of many Unity games, graphics projects, and books.
Unity has added support for machine learning, and reinforcement learning in particular, in the form of a deep reinforcement learning (DRL) SDK for game and simulation developers. Fortunately, the Unity team, led by Danny Lange, has implemented a reliable, modern DRL engine capable of impressive results. At the core of the engine is the proximal policy optimization (PPO) model, which is considerably more complex than basic reinforcement learning algorithms and may differ in some respects from implementations you have seen elsewhere.
In this article, I will introduce you to the tools and SDK for creating DRL agents in games and simulations. Although this toolkit is new and powerful, it is easy to use, and it ships with helper tools that let you learn machine learning concepts as you go. To follow the tutorial, you need the Unity engine installed.
Install ML-Agents
In this section, I will briefly cover the steps required to install the ML-Agents SDK. The SDK is still in beta and may change from version to version. Follow these steps:
- Install Git on your computer; it runs from the command line. Git is a very popular source-control system, and there are plenty of resources online about installing and using it on every platform. After installing Git, make sure it works by cloning any repository.
- Open a command prompt or regular shell. Windows users can open an Anaconda Prompt.
- Go to the working folder where you want to place the code and enter the following command (Windows users can use C:\ML-Agents):
git clone https://github.com/Unity-Technologies/ml-agents
- This clones the ml-agents repository to your computer and creates a new folder with the same name. You can also append the version number to the folder name. Unity, like almost everything in the world of artificial intelligence, changes constantly, at least for now, which means new changes appear all the time. At the time of writing, we clone the repository into the ml-agents.6 folder:
git clone https://github.com/Unity-Technologies/ml-agents ml-agents.6
- Create a new virtual environment for ml-agents with Python 3.6, like this:
# Windows
conda create -n ml-agents python=3.6
# Mac: use the documentation for your preferred environment
- Activate the environment in Anaconda:
activate ml-agents
- Install TensorFlow. In Anaconda, this can be done with the following command:
pip install tensorflow==1.7.1
- Install Python packages. In Anaconda, enter the following:
cd ML-Agents # from the root folder
cd ml-agents # or cd ml-agents.6, for example
pip install -e . # or pip3 install -e .
- This installs all the packages required by the Agents SDK and may take several minutes. Keep this window open; you will need it again shortly.
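Before moving on, you can quickly confirm that the key packages import correctly. This is a minimal sketch; the top-level package name mlagents is an assumption matching the ml-agents releases of this era, so adjust it if your version differs.

# Quick sanity check of the install (run inside the ml-agents virtual environment).
import tensorflow as tf
import mlagents  # assumed package name installed by "pip install -e ."

print("TensorFlow version:", tf.__version__)    # expect 1.7.1 for this tutorial
print("ML-Agents package found at:", mlagents.__file__)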
We have now installed and configured the ML-Agents Python SDK. In the next section, we will learn how to set up and train one of the many environments provided by Unity.
Agent Training
Now we can get down to business and explore examples that use deep reinforcement learning (DRL). Fortunately, the toolkit includes several examples that demonstrate the power of the engine. Open Unity or Unity Hub and follow these steps:
- Click on the Open project button at the top of the Project dialog box.
- Locate and open the UnitySDK project folder, as shown in the screenshot:
Opening the UnitySDK project
- Wait for the project to load, and then open the Project window at the bottom of the editor. If a window opens asking you to update the project, select Yes or Continue. Currently, all agent code is backward compatible.
- Locate and open the GridWorld scene as shown in the screenshot:
Opening the GridWorld example scene
- Select the GridAcademy object in the Hierarchy window.
- Go to the Inspector window and, next to the Brains field, click the icon to open the brain selection dialog:
- Select the GridWorldPlayer brain. This is a player brain, meaning the player (you) controls the game.
- Click the Play button at the top of the editor and watch the environment. Since the game is now set up for player control, you can use the WASD keys to move the cube. The goal is to move the blue cube to the green + symbol while avoiding the red X; a toy sketch of the reward structure behind this kind of task follows below.
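Before handing the cube over to a DRL agent, it helps to see what such an agent actually optimizes. The following is a toy Python sketch of a grid-world step function; the reward values are illustrative assumptions, not the exact numbers Unity's GridWorld example uses, but the shape is typical: a positive reward for reaching the goal, a negative reward for hitting an obstacle, and a small per-step penalty that discourages wandering.

# Toy grid-world step logic (illustrative only; not Unity's actual GridWorld code).
GOAL, OBSTACLE = "+", "x"

def step(grid, agent_pos, move):
    """Apply a move, then return (new_pos, reward, done)."""
    new_pos = (agent_pos[0] + move[0], agent_pos[1] + move[1])
    if new_pos not in grid:          # stay in place if the move leaves the grid
        new_pos = agent_pos
    cell = grid[new_pos]
    if cell == GOAL:
        return new_pos, 1.0, True    # reached the green +
    if cell == OBSTACLE:
        return new_pos, -1.0, True   # hit the red X
    return new_pos, -0.01, False     # small step penalty keeps episodes short

# Example: a 3x3 grid with the goal in one corner and an obstacle in the middle.
grid = {(r, c): " " for r in range(3) for c in range(3)}
grid[(2, 2)] = GOAL
grid[(1, 1)] = OBSTACLE
print(step(grid, (0, 0), (1, 0)))    # -> ((1, 0), -0.01, False)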
Play for a while to get comfortable with the game. Note that the game runs only for a limited amount of time and is not turn-based. In the next section, we will learn how to run this example with a DRL agent.
What's in the brain?
One of the remarkable aspects of the ML-Agents platform is the ability to switch quickly and easily from player control to AI/agent control. For this, Unity uses the concept of a brain. A brain can be controlled either by the player (a player brain) or by an agent (a learning brain). The best part is that you can build the game and test it as a player, and then hand it over to an RL agent. Thanks to this, almost any game you write can be made AI-controlled with a little effort.
The process of setting up and starting RL agent training in Unity is quite simple. Unity uses an external Python process to build the model for the learning brain; a short sketch of what that Python-side connection looks like appears at the end of these steps. Using Python makes a lot of sense, since several deep learning (DL) libraries are already built around it. To train the agent in GridWorld, complete the following steps:
- Select GridAcademy again and select the GridWorldLearning brain in the Brains field instead of GridWorldPlayer:
Switching to the GridWorldLearning brain
- Check the Control box on the right. This simple setting tells the brain that it can be controlled externally; it must be enabled for training.
- Select the trueAgent object in the Hierarchy window, and then in the Inspector window change the Brain property in the Grid Agent component to the GridWorldLearning brain:
Setting the GridWorldLearning brain for the agent
- In this example, both the Academy and the Agent must use the same GridWorldLearning brain. Switch to the Anaconda or Python window and change to the ML-Agents/ml-agents folder.
- Run the following command in an Anaconda or Python window using the ml-agents virtual environment:
mlagents-learn config/trainer_config.yaml --run-id=firstRun --train
- This launches the trainer with Unity's PPO model and the example agent using the specified configuration. At a certain point, the command-prompt window will ask you to start the Unity editor with the environment loaded.
- Click Play in the Unity editor to launch the GridWorld environment. Soon afterward, you should see the agent training and output appearing in the Python script window:
Running GridWorld in training mode
- Note that the mlagents-learn script is the Python code that builds the RL model to run the agent. As you can see from the script's output, there are several parameters (hyperparameters) that can be configured; a sketch of typical values appears after these steps.
- Let the agent train for a few thousand iterations and notice how quickly it learns. The internal model used here, PPO, has proven to be a very effective learning model for many different tasks, and it is very well suited to game development. With sufficiently powerful hardware, the agent can learn this task almost perfectly in under an hour.
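For reference, the hyperparameters that mlagents-learn reads from config/trainer_config.yaml look roughly like the Python dictionary below. The keys and values are illustrative assumptions rather than the shipped GridWorldLearning configuration, and they vary between ML-Agents versions, so open the YAML file to see what your copy actually uses.

# Illustrative PPO hyperparameters, in the spirit of trainer_config.yaml
# (assumed values; check config/trainer_config.yaml for your version's real ones).
ppo_config = {
    "batch_size": 32,        # samples per gradient update
    "buffer_size": 256,      # experiences collected before each update
    "learning_rate": 3.0e-4, # optimizer step size (decays over training)
    "beta": 5.0e-3,          # strength of the entropy bonus (exploration)
    "epsilon": 0.2,          # PPO clipping range for the policy ratio
    "gamma": 0.99,           # discount factor for future rewards
    "num_epoch": 3,          # passes over the buffer per update
    "max_steps": 5.0e5,      # total training steps before the run stops
}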
Let the agent keep training, and explore other ways to track its learning progress, as presented in the next section.
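As mentioned above, training is driven by an external Python process that connects to the editor. Purely for illustration, here is roughly what that connection looks like through the Python API; the import path and calls below are assumptions based on the 0.6-era ML-Agents releases and change between versions, so mlagents-learn remains the recommended way to train.

# Hedged sketch of the external Python connection (assumed 0.6-era API).
from mlagents.envs import UnityEnvironment

# file_name=None connects to a Unity editor waiting in Play mode,
# instead of launching a built player executable.
env = UnityEnvironment(file_name=None)
print("Connected brains:", env.brain_names)   # e.g. ['GridWorldLearning']
env.reset(train_mode=True)                    # start an episode in training mode
env.close()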
Monitoring Learning with TensorBoard
Training an agent with an RL model, or any DL model, is often a daunting task that requires attention to detail. Fortunately, TensorFlow ships with a set of graphing tools called TensorBoard that you can use to monitor training. Follow these steps to start TensorBoard:
- Open an Anaconda or Python window and activate the ml-agents virtual environment. Do not close the window in which training is running; we need it to keep going.
- Go to the ML-Agents/ml-agents folder and run the following command:
tensorboard --logdir=summaries
- This launches TensorBoard with its own built-in web server. You can load the page using the URL shown after running the previous command.
- Enter the TensorBoard URL shown in the window, or type localhost:6006 or machinename:6006 in your browser. After about an hour, you should see something like this:
The TensorBoard chart window
- The previous screenshot shows the graphs, each of which displays a separate aspect of training. To understand how the agent is training, you need to understand each of these graphs, so we will analyze the output of each section:
- Environment: this section shows how the agent performs in the environment overall. Below is a more detailed view of these charts with the preferred trends:

A detailed view of the Environment section graphs
- Cumulative Reward: this is the total reward the agent is maximizing. It should generally increase, although it can dip for various reasons. It is best to keep rewards in the range of -1 to 1; if the graph shows rewards outside this range, that also needs to be fixed.
- Episode Length: if this value decreases, it is usually a good sign. After all, the shorter the episodes, the more training iterations fit in the same time. Keep in mind, though, that episode length can increase when the task requires it, so the picture may differ.
- Lesson: this chart shows which lesson the agent is on; it is intended for curriculum learning.
- Losses: this section shows graphs of the calculated losses, or costs, for the policy and the value function (a simplified sketch of how these quantities are computed appears after the steps below). Here is a screenshot of this section with arrows pointing to the preferred trends:
Losses and the preferred trends
- Policy Loss: this chart measures how much the policy changes over time. The policy is the element that decides on actions, and in general this graph should trend downward, showing that the policy is making better and better decisions.
- Value Loss: this is the mean loss of the value function. In essence, it measures how well the agent predicts the value of its next state. Initially this value should increase, and after the reward stabilizes, it should decrease.
- Policy: PPO uses the concept of a policy, rather than a model, to evaluate the quality of actions. The screenshot below shows the policy graphs and the preferred trends:
Policy graphs and preferred trends
- Entropy: this graph shows how much the agent is exploring. This value should decrease, because as the agent learns more about the environment, it needs to explore less.
- Learning Rate: in this case, this value should decrease gradually and roughly linearly.
- Value Estimate: this is the mean value estimate across all states the agent visits. It should grow, reflecting the agent's increasing knowledge, and then stabilize.
- Leave the agent running until training completes, and do not close TensorBoard.
- Return to the Anaconda/Python window that was training the brain and run this command:
mlagents-learn config/trainer_config.yaml --run-id=secondRun --train
- You will again be asked to click Play in the editor; do so. Let the agent begin training and run several sessions. While it works, watch the TensorBoard window and notice how secondRun appears on the charts. You can let this agent run to completion, but you can also stop it whenever you wish.
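As mentioned under Losses above, here is a simplified NumPy sketch of the two quantities behind the Policy Loss and Value Loss charts: PPO's clipped policy objective and the mean squared error of the value function. It is a textbook-style illustration of the standard PPO formulation, not Unity's actual trainer code.

import numpy as np

def ppo_losses(new_probs, old_probs, advantages, returns, values, epsilon=0.2):
    """Simplified PPO losses for one batch (illustrative, not ML-Agents' code)."""
    ratio = new_probs / old_probs                       # policy change per sample
    clipped = np.clip(ratio, 1 - epsilon, 1 + epsilon)  # keep updates small
    policy_loss = -np.mean(np.minimum(ratio * advantages, clipped * advantages))
    value_loss = np.mean((returns - values) ** 2)       # how well values predict returns
    return policy_loss, value_loss

# Tiny example batch.
p_loss, v_loss = ppo_losses(
    new_probs=np.array([0.30, 0.55, 0.20]),
    old_probs=np.array([0.25, 0.50, 0.25]),
    advantages=np.array([1.0, 0.5, -0.5]),
    returns=np.array([1.0, 0.8, 0.1]),
    values=np.array([0.7, 0.6, 0.3]),
)
print("policy loss:", p_loss, "value loss:", v_loss)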
In previous versions of ML-Agents, you first had to build a Unity executable to serve as the learning environment and then run it; the external Python brain worked against that build. This approach made it very difficult to debug problems in the code or in the game. The new technique eliminates all of these difficulties.
Now that we have seen how easy it is to set up and train an agent, we will move on to the next section, where we learn how to run an agent without the external Python brain, directly inside Unity.
Running the Agent
Training in Python is great, but you can't use Python directly in a real game. Ideally, we would like to build a TensorFlow graph and use it inside Unity. Fortunately, the TensorFlowSharp library was created to let .NET consume TensorFlow graphs. This allows us to build offline TFModels and later inject them into the game. Unfortunately, we can only use trained models this way, not train them, at least not yet.
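Under the hood, the trained model ends up as a frozen TensorFlow graph that TensorFlowSharp can load. The sketch below shows how a TensorFlow 1.x graph is typically frozen into a single binary protobuf; the output node name and file paths are assumptions for illustration, and this is not the exact export code ML-Agents runs.

import tensorflow as tf  # TensorFlow 1.x API, matching the 1.7.1 install above

def freeze_graph(session, output_node, out_dir, out_name):
    """Fold variables into constants and write a single frozen graph file."""
    frozen = tf.graph_util.convert_variables_to_constants(
        session, session.graph_def, [output_node])
    # as_text=False writes the binary protobuf that Unity imports as a .bytes asset.
    tf.train.write_graph(frozen, out_dir, out_name, as_text=False)

# Hypothetical usage -- "action" is an assumed output node name:
# with tf.Session() as sess:
#     ...build and train the model...
#     freeze_graph(sess, "action", "models/firstRun-0", "GridWorldLearning.bytes")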
Let's see how this works using the graph we just trained for the GridWorld environment, employing it as an internal brain in Unity. Follow the steps in the next section to set up and use an internal brain:
- Download the TFSharp plugin from here
- From the editor menu, select Assets | Import Package | Custom Package ...
- Find the asset package you just downloaded and use the import dialogs to load the plugin into the project.
- From the menu, select Edit | Project Settings. The Settings window opens (introduced in version 2018.3).
- Find the Scripting Define Symbols field in the Player settings, change the text to ENABLE_TENSORFLOW, and also enable Allow Unsafe Code, as shown in the screenshot:
Setting the ENABLE_TENSORFLOW flag
- Find the GridWorldAcademy object in the Hierarchy window and make sure that it uses Brains | GridWorldLearning. Disable the Control option in the Brains section of the Grid Academy script.
- Find the GridWorldLearning brain in the Assets/Examples/GridWorld/Brains folder and make sure that the Model parameter in the Inspector window is set, as shown in the screenshot:
Setting the Model for the brain
- GridWorldLearning should already be set as the model. In this example, we use the TFModel that ships with the GridWorld example.
- Click Play to start the editor and watch how the agent controls the cube.
We are now running the environment with Unity's pre-trained model. In the next section, we will learn how to use the brain we trained in the previous section.
Loading a Trained Brain
All the Unity examples come with pre-trained brains that you can use to explore them. Of course, we also want to be able to load our own TF graphs into Unity and run them. To load a trained graph, follow these steps:
- Go to the ML-Agents/ml-agents/models/firstRun-0 folder. Inside this folder is the GridWorldLearning.bytes file. Drag this file into the Project/Assets/ML-Agents/Examples/GridWorld/TFModels folder inside the Unity editor:
Dragging the bytes graph into Unity
- This imports the graph into the Unity project as an asset and renames it to GridWorldLearning 1. The editor does this because the default model already has the same name.
- Find GridWorldLearning in the brain folder, select it in the Inspector window, and drag the new GridWorldLearning 1 model into the Model field under Brain Parameters:
Loading the brain into the Graph Model field
- At this stage, we do not need to change any other parameters, but pay attention to how the brain is configured. For now, the default settings will do.
- Click Play in the Unity editor and see how the agent successfully moves around the game.
- How well the agent plays depends on how long it was trained. If you let it train to completion, it should perform like the fully trained Unity agent.