MIT homework: writing a neural network for maneuvering in traffic



    DeepTraffic is an interactive game that anyone can play. Students at the Massachusetts Institute of Technology (MIT) taking the deep learning course on self-driving cars must achieve a good result in it to get credit for the assignment.

    Participants are invited to design an AI agent, that is, to design and train a neural network that outperforms its competitors in dense traffic. The agent is given one car (the red one) to control. It must learn to maneuver through the traffic as efficiently as possible.

    Under the rules of the game, a safety system is built into the car from the start, so it cannot crash or leave the road. The player's task is only to control acceleration, braking, and lane changes. The agent should do this as efficiently as possible while the safety system keeps it from hitting other cars.

    The game starts you off with basic agent code, which can be modified directly in the game window and run immediately, that is, used to train the neural network.

    Base code
    //

    To the left of the code on the page is a live simulation of the road the agent is driving on, reflecting the current state of the neural network. It also shows some basic information, such as the car's current speed and the number of cars it has overtaken.

    Training and evaluation are measured in frames, so that computer performance or animation speed does not affect the result.

    The different Road Overlay modes help you understand how the neural network works and learns. In Full Map mode, the entire road is shown as a grid of cells, and in Learning Input mode you can see which cells are fed to the neural network as input when it decides on a maneuver.



    The size of the “control zone” at the input of the neural network is determined by the first three variables below; trainIterations sets the number of training iterations:

    lanesSide = 1;
    patchesAhead = 10;
    patchesBehind = 0;
    trainIterations = 10000;

    The larger the zone, the more information about the surrounding traffic the neural network receives. But flooding the network with unnecessary data adds noise and makes it harder for it to learn truly effective maneuvers, that is, to pick up on the right incentives. To process a larger area, you should probably increase the number of training iterations (trainIterations).
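    To get a feel for how quickly the input grows, here is a rough sketch. It assumes the control zone is a rectangle covering the car's own lane plus lanesSide lanes on each side, with patchesAhead + patchesBehind cells per lane. The function name controlZoneSize is hypothetical and not part of the game code, and the game may feed the network additional inputs beyond these cells.

```javascript
// Hypothetical helper: number of grid cells in the control zone,
// ASSUMING a rectangle of (2 * lanesSide + 1) lanes by
// (patchesAhead + patchesBehind) cells per lane.
function controlZoneSize(lanesSide, patchesAhead, patchesBehind) {
    var lanes = 2 * lanesSide + 1;                  // own lane plus both sides
    var cellsPerLane = patchesAhead + patchesBehind;
    return lanes * cellsPerLane;
}

// Default settings from the article: 3 lanes, 10 cells ahead, 0 behind.
var defaultCells = controlZoneSize(1, 10, 0);

// A wider zone: 5 lanes, 10 cells ahead, 3 behind.
var widerCells = controlZoneSize(2, 10, 3);

console.log(defaultCells, widerCells);
```

    Under this assumption the wider zone has more than twice as many cells as the default one, which is one reason a larger zone may need more training iterations.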

    Switching to Safety System mode shows how our car's basic algorithm works. If grid cells turn red, the car is not allowed to move in that direction. When it comes up behind a car ahead, the agent slows down.

    The car is controlled by the function learn, which takes the agent's current state (the state argument) and the reward for the previous step (lastReward, the average speed in mph) and returns one of the following values:

    var noAction = 0;
    var accelerateAction = 1;
    var decelerateAction = 2;
    var goLeftAction = 3;
    var goRightAction = 4;

    That is: take no action (0, keep the current lane and speed), accelerate (1), slow down (2), change lanes to the left (3), or change lanes to the right (4).
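    To illustrate the contract of learn (state and previous reward in, action code out), here is a deliberately naive hand-written policy. It is not the game's base code and does not use a neural network: makeNaiveLearn and its lane-switching rule are hypothetical, and state is ignored because its exact format is not described here.

```javascript
// Action codes as listed in the article.
var noAction = 0;
var accelerateAction = 1;
var decelerateAction = 2;
var goLeftAction = 3;
var goRightAction = 4;

// Hypothetical factory returning a function with the same signature as learn.
// The rule is an illustration only: accelerate by default, and if the reward
// (average speed) dropped since the last step, try a lane change instead,
// alternating between left and right.
function makeNaiveLearn() {
    var previousReward = null;
    var nextLaneChange = goLeftAction;

    return function learn(state, lastReward) {
        var action = accelerateAction;
        if (previousReward !== null && lastReward < previousReward) {
            action = nextLaneChange;
            nextLaneChange = (nextLaneChange === goLeftAction)
                ? goRightAction
                : goLeftAction;
        }
        previousReward = lastReward;
        return action; // always one of the five action codes
    };
}

// Usage: state is passed as an empty array because this policy ignores it.
var naiveLearn = makeNaiveLearn();
var firstAction = naiveLearn([], 50);
```

    A trained agent would replace the hand-written rule with the network's choice of action, but the inputs and the returned value keep the same shape.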

    Below the code block there is some service information about the state of the neural network, along with buttons to start training the network and to start the tests.

    The result of the test run is the average speed the agent achieves on the track (in mph). You can compare your result with those of other programmers. Keep in mind, though, that the "test run" gives only an approximate speed, with a small element of randomness. During the test, the neural network is put through ten 30-minute runs, the average speed is computed for each run, and the result is taken as the median of those ten speeds. If you submit a neural network to the competition, the organizers run their own test to determine the true speed of the self-driving car.
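    The scoring described above (average speed per run, then the median across runs) can be sketched as follows. The function names and the sample speeds are made up for illustration; this is not the organizers' evaluation code.

```javascript
// Arithmetic mean of a non-empty array of numbers.
function mean(values) {
    var sum = 0;
    for (var i = 0; i < values.length; i++) {
        sum += values[i];
    }
    return sum / values.length;
}

// Median of a non-empty array of numbers (average of the two middle
// values when the length is even).
function median(values) {
    var sorted = values.slice().sort(function (a, b) { return a - b; });
    var mid = Math.floor(sorted.length / 2);
    return sorted.length % 2 === 0
        ? (sorted[mid - 1] + sorted[mid]) / 2
        : sorted[mid];
}

// Hypothetical scoring helper: each run is an array of speed samples (mph).
function scoreRuns(runs) {
    return median(runs.map(mean));
}

// Made-up example with three short runs instead of ten 30-minute ones.
var exampleScore = scoreRuns([[60, 70], [50, 50], [80, 80]]);
console.log(exampleScore);
```

    Taking the median rather than the mean of the per-run speeds makes the score less sensitive to a single unusually lucky or unlucky run.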


    The result with 5 lanes, 10 cells ahead, 3 behind, 20,000 iterations, and 12 neurons.

    Apparently, in addition to the basic parameters, you also need to change the number of neurons in the hidden layer in this code fragment:

    layer_defs.push({
        type: 'fc',
        num_neurons: 1,
        activation: 'relu'
    });
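    To show where this fragment might sit, here is a sketch that builds the whole layer list as plain data. Only the 'fc' layer comes from the article; the 'input' and 'regression' layers follow ConvNetJS-style layer definitions and are assumptions about the surrounding code, as are the helper name buildLayerDefs and the example sizes.

```javascript
// Hypothetical helper that assembles a layer list around the hidden layer.
function buildLayerDefs(numInputs, hiddenNeurons, numActions) {
    var layer_defs = [];

    // ASSUMED input layer: one value per network input.
    layer_defs.push({
        type: 'input',
        out_sx: 1,
        out_sy: 1,
        out_depth: numInputs
    });

    // Hidden fully connected layer, same shape as the article's fragment,
    // with the neuron count made configurable.
    layer_defs.push({
        type: 'fc',
        num_neurons: hiddenNeurons,
        activation: 'relu'
    });

    // ASSUMED output layer: one predicted value per possible action.
    layer_defs.push({
        type: 'regression',
        num_neurons: numActions
    });

    return layer_defs;
}

// Example with illustrative sizes: 65 inputs, 12 hidden neurons, 5 actions.
var exampleDefs = buildLayerDefs(65, 12, 5);
```

    Keeping the hidden layer size as a parameter makes it easy to try several values, such as the 12 neurons mentioned above, without editing the layer definitions by hand.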

    So far, the best results in the competition belong to the course instructor, Lex Fridman (74.45 mph), and Habr user Anton Pechenko, parilo (74.41 mph). Perhaps parilo will explain in the comments which settings he used. I wonder whether he changed the code in any way, or limited himself to tuning the four basic parameters and the number of neurons in the hidden layer.

    Ideas for more advanced optimization of the neural network can be found in the comments in the neural network code on GitHub.

    Students of 6.S094: Deep Learning for Self-Driving Cars must reach at least 65 mph in the game for instructor Lex Fridman to give them credit for this assignment.
