Reinforcement learning of deep neural networks with tensorflow.js: tricks

Training deep neural networks from scratch is not an easy task.

It takes a lot of data and time to learn, but a few tricks can speed the process up; I will describe them below the cut.

A demonstration of solving a simple maze using these tricks. Network training time: 1 hour 06 minutes. The recording is sped up 8 times.

Each task needs its own set of tricks to speed up training; I will share a few that helped me train the network much faster.

For the theory, I recommend the sim0nsays channel.
Here I will describe my own modest successes in training neural networks.

Formulation of the problem

Approximate the target function by minimizing a quadratic loss function via backpropagation through a deep neural network.

I had to choose a training strategy for the neural network:
reward only for successfully completing the task, or reward the agent as it gets closer to completing it.

I chose the second method, for two reasons:

  • The probability that the network will ever reach the finish on its own is vanishingly small, so it would be doomed to receive mostly negative reinforcement. That would drive the weights of all the neurons down and leave the network incapable of further training.
  • Deep neural networks are powerful. I do not rule out that the first method would have succeeded given enormous computing power and plenty of training time. I took the path of least cost and developed tricks instead.

Neural network architecture

Architecture is developed experimentally, based on the architect's experience and a bit of luck.

Architecture for solving the problem:

  • 3 input neurons: the agent's coordinates and the value of the cell it stands on (all normalized to the range 0 to 1).
  • 2 hidden layers of 256 and 128 neurons (layer dimensions shrink toward the network's output).
  • 1 dropout layer that randomly drops neurons, for more robust training.
  • 4 output neurons: the probabilities of choosing each direction for the next step.
  • Neuron activation function: sigmoid. Optimizer: adam.

The sigmoid produces 4 output probabilities in the range 0 to 1; taking the maximum gives the direction of the next step: [jumpTop, jumpRight, jumpBottom, jumpLeft].
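The output step can be sketched in plain JavaScript (the direction names come from the article; the helper functions themselves are illustrative, not taken from the repository):

```javascript
// Map the network's 4 raw outputs to a move by taking the argmax
// of the sigmoid probabilities.
const DIRECTIONS = ['jumpTop', 'jumpRight', 'jumpBottom', 'jumpLeft'];

function sigmoid(x) {
  return 1 / (1 + Math.exp(-x));
}

// Pick the direction with the highest predicted probability.
function chooseDirection(logits) {
  const probs = logits.map(sigmoid);
  let best = 0;
  for (let i = 1; i < probs.length; i++) {
    if (probs[i] > probs[best]) best = i;
  }
  return DIRECTIONS[best];
}

console.log(chooseDirection([-1.2, 0.3, 2.1, 0.0])); // → "jumpBottom"
```

Since the sigmoid is monotonic, the argmax of the probabilities equals the argmax of the raw outputs; the sigmoid only matters for interpreting them as values in 0..1.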

Architecture development

Overfitting occurs with overly complex models.

The network memorizes the training data, and on new data it has not yet seen it performs poorly, because it never needed to look for generalizations: it had enough capacity to memorize everything.

Underfitting occurs with insufficiently complex models: the network had too little training data to find generalizations.

Conclusion: the more layers and neurons a network has, the more training data it needs.

Playing field

Rules of the game

0: entering this cell destroys the agent.
1..44: cells whose values increase with each step.
The further the agent advances, the more reward it receives.
45: the finish. No training happens at the finish; training only starts once all agents are destroyed. Reaching the finish is the exception: the already trained network is simply used for the next prediction run from the start of the maze.
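The cell rules above can be summarized in a small sketch (the constants match the rules; the function name and return values are my own illustration):

```javascript
// Outcome of an agent stepping onto a cell, per the maze rules:
// cell 0 destroys the agent, cell 45 is the finish, anything else continues.
const DEATH_CELL = 0;
const FINISH_CELL = 45;

function stepOutcome(cellValue) {
  if (cellValue === DEATH_CELL) return 'destroyed';
  if (cellValue === FINISH_CELL) return 'finished';
  return 'alive';
}
```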

Description of parameters

The agent has "antennae" pointing in four directions; they scout the environment and, together with the agent's coordinates and the value of the cell it stands on, form its state description.

This description is the input for predicting the agent's next direction of movement. In other words, the agent scans ahead, and over time the network learns to move toward increasing cell values and to stay within the bounds of permitted movement.

The neural network's goal: to collect more reward.
The training goal: to reward correct actions; the closer the agent gets to solving the task, the higher the network's reward.
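One way to express "the closer to the finish, the higher the reward" is to normalize the cell value, just like the network inputs. This is a minimal sketch of the idea; the exact reward formula used in the project is an assumption on my part:

```javascript
// Shaped reward: grows linearly with the cell value, so an agent
// near the finish (cell 45) earns close to the maximum reward of 1.
const MAX_CELL = 45;

function reward(cellValue) {
  return cellValue / MAX_CELL; // 0 at the start, 1 at the finish
}
```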


The first attempts to train without any tricks took several hours, and the result was far from complete. With the techniques below, the result was achieved in just one hour and six minutes!

Agent looping

During training, the network started making moves back and forth: the "exploitation" problem. Both moves give a positive reward, which stopped the exploration of the maze and kept the network stuck in a local minimum.

The first attempted solution was to limit the agent's number of moves, but this was not optimal, since the agent wasted a lot of time looping before self-destructing. The better solution was to destroy the agent if it moved to a cell with a lower value than the one it stood on, in effect banning movement in the opposite direction.
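The anti-looping rule reduces to a one-line check (function and parameter names are illustrative):

```javascript
// An agent survives a move only if the destination cell's value is
// strictly greater than its current cell's value. This kills loops
// (back-and-forth moves revisit a lower-valued cell) and also covers
// the death cell 0, which is lower than any regular cell.
function survivesMove(currentCellValue, nextCellValue) {
  return nextCellValue > currentCellValue;
}
```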

Exploration or exploitation

A simple trick was used to explore the paths around the agent's current position: at each step, 5 agents become "volunteer" explorers whose moves are chosen randomly rather than by the neural network's prediction.

This raises the likelihood that one of the five advances further than the others and helps train the network on better results.
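The explorer trick can be sketched as follows; which 5 agents explore and how the network's prediction is wired in are assumptions, only the idea (5 random movers per step) comes from the text:

```javascript
// Per step: the first NUM_EXPLORERS agents move randomly (exploration),
// the rest follow the network's prediction (exploitation).
const NUM_EXPLORERS = 5;

function chooseMoves(agents, predictMove) {
  return agents.map((agent, i) =>
    i < NUM_EXPLORERS
      ? Math.floor(Math.random() * 4) // random direction 0..3
      : predictMove(agent)            // network's predicted direction
  );
}
```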

Genetic algorithm

Each epoch, 500 agents take part on the playing field. Predictions for all agents are performed asynchronously, for all of them at once, and the computation is delegated to the GPU. This uses the computer's computing power more efficiently and shortens the time needed to run the network's prediction for all 500 agents simultaneously.

Prediction runs faster than training, so the network gets more chances to advance further through the maze in less time and with a better result.
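Batched prediction means stacking all agents' normalized states into one matrix and making a single model call instead of 500. The state field names below are assumptions; `tf.tensor2d` and `model.predict` are the standard tensorflow.js API for this pattern:

```javascript
// Stack the agents' normalized states into one [N, 3] batch,
// so a single forward pass serves every agent at once.
function buildBatch(agents) {
  // Each row: [x, y, cellValue], already normalized to [0, 1].
  return agents.map(a => [a.x, a.y, a.cell]);
}

// With tensorflow.js, the batch would be used roughly as:
//   const out = model.predict(tf.tensor2d(buildBatch(agents))); // shape [N, 4]
```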

Training on the best of the generation

Throughout the epoch, the progress of each of the 500 agents through the maze is recorded. When the last agent is destroyed, the 5 best of the 500 are selected: those that advanced farthest through the maze.

The neural network is then trained on the epoch's best results.

This reduces memory usage, since we neither store nor train on agents that do not move the network forward.
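Selecting the generation's elite is a sort-and-slice; the `maxCell` field name is my own illustration for "the farthest cell an agent reached":

```javascript
// Keep the n agents that advanced farthest; only these contribute
// training examples for the next epoch.
function selectBest(agents, n = 5) {
  return [...agents]                        // copy: don't mutate the epoch's log
    .sort((a, b) => b.maxCell - a.maxCell)  // farthest first
    .slice(0, n);
}
```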


Without being a specialist in this field, I managed to achieve some success in training a neural network, and so can you: go for it!

Strive to learn faster than computers; for now, we are still better at it.


Repository with the code.
Run the training in a browser.
TensorFlow.js documentation, where you can also find additional learning resources.


  • Deep learning. Immersion in the world of neural networks
    S. Nikolenko, A. Kadurin, E. Arkhangelskaya
  • Machine Learning with TensorFlow
    N. Shukla
  • Self-learning systems
    S. I. Nikolenko, A. L. Tulupyev
  • Reinforcement Learning
    R. S. Sutton, A. G. Barto
  • Self-Organizing Maps
    T. Kohonen

Thank you for your attention!
