Growing artificial intelligence on the example of a simple game

Recovery Mode

In this article, I will share the experience of cultivating the simplest artificial intelligence (AI) using a genetic algorithm, and also tell you about the minimum set of commands necessary for the formation of any behavior.

The result of the work was that the AI, not knowing the rules, independently mastered the game of tic-tac-toe and found the weaknesses of the bots that played against it. But I started with an even simpler task.

Command set

It all started with the preparation of a set of commands that could have an AI. High-level languages contain hundreds of different operators. To highlight the required minimum, I decided to turn to Assembly language. However, it turned out that it also contains many commands.

I needed the AI to read and output data, work with memory, perform calculations and logical operations, make transitions and loops. I came across the Brainfuck language, which contains only 8 commands and can perform any calculations (i.e., is Turing-complete). In principle, it is suitable for genetic programming, but I went further.

I wondered: what is the minimum number of commands needed to implement any algorithm? As it turned out - one!

The URISC processor contains only one command: subtract and skip the next instruction, if the deductible was more decreasing. This is enough to build any algorithm.

Oleg Mazonka went even further, he developed the BitBitJump team and proved that it is Turing-complete. The command contains three addresses, copies one bit from the first to the second memory address and transfers control to the third address.

Borrowing Oleg's ideas, to simplify the work, I developed the SumIfJump team. The command contains four operands: A, B, C, D and does the following: adds the data from the cell to address A to the cell at address B, if the value is greater than the specified value *, then goes to address C, otherwise goes to address D.

Note

* In this case, 128 were used - half of the genome length.

When operand A accesses memory cell N0, data is entered, and when cell N1, it is output.

Below is the code SumIfJump on FreePascal (free analog Delphi).

procedureRunProg(s: TData);var
  a, b, c, d: TData;
begin
  Inc(NStep);
  if NStep > MaxStep thenbegin
    ProgResult := 'MaxStep';
    Exit;
  end;
  a := s;
  b := s + 1;
  c := s + 2;
  d := s + 3;
  a := Prog[a];
  b := Prog[b];
  c := Prog[c];
  d := Prog[d];
  if a = 0thenbegin
    ProgResult := 'Input';
    Exit;
  end;
  if a = 1thenbegin
    ProgResult := 'Output';
    Exit;
  end;
  Prog[b] := Prog[b] + Prog[a];
  if Prog[b] < ProgLength div2then
    RunProg(c)
  else
    RunProg(d);
end;

SumIfJump implements self-modifying code. It can perform any algorithms that are available in the usual programming language. The code changes easily and withstands any manipulations.

Simple task

So, our AI has only one team. While tic-tac-toe is a very difficult game for him, and I started with a simpler one.

The bot issues a random number, and the AI must read the data and provide an answer. If the number is greater than the average (from a range of random numbers), the AI should produce a number less than the average and vice versa.

The genome of our AI consists of 256 cells with values from 0 to 255. Each value is a memory, a code, and an address. The number of code execution steps is limited to 256. The operands are read one after the other.

Initially, the genome is formed by a set of random numbers, so the AI does not know what he needs to play. Moreover, he does not know that it is necessary to consistently enter and output data, responding to the bot.

Population and selection

The first population consists of 256 AI, which begin to play with the bot. If the AI performs the correct actions, for example, requested input data, and then derived something, then the AI gets points. The more correct actions, the more points.

16 AI, who scored the most points, give 15 descendants and continue to participate in the game. A descendant is a mutant. Mutation occurs by replacing the copy of the parent of one random cell with a random value.

If in the first population no AI has scored points, the next population is formed. And so on until any of the AI starts to perform the right actions and give the “right” offspring.

Evolution

N	Milestones
one	AI does nothing. The program takes 256 steps and ends.
2	Began to request data.
3	Began to request data and display something. The sequence of requests and responses is chaotic.
four	Data input and output occurs sequentially, sometimes errors occur. In half the cases, the AI gives the correct answer.
five	Regularly gives the correct answers, but sometimes errors occur.
6	Gave the correct answer in 30 thousand iterations. Selection is closed.

Between significant events passed thousands of generations. The program was launched in multiple threads on a Core i7. The calculations took about 15 minutes.

Interesting moments

When the AI "leader" made a random mistake and did not score enough points, the population began to degrade, because the offspring was formed from "minor" parents.
It so happened that in the flow with outsiders, who trampled on the spot, there was a successful mutation, providing an explosive growth of points scored. After that, this stream became the leader.
Sometimes for a long time there were no successful mutations, and even 500 thousand generations were not enough to complete the selection.

Conclusion

In conclusion, I did the same with tic-tac-toe. The size of the genome used the same as in the first case. The number of steps was increased to 1024, and the population size to 64 (for faster calculation). The calculation took a little longer. Everything happened in approximately the same scenario.

At first, the AI played against the randomizer. I called the bot, which goes randomly. Pretty soon, the AI began to beat him, filling in any line. Next, I complicated the task by adding a little mind to the randomizer: hold the line, if possible, or defend. However, in this case, the AI found the bot's weaknesses and began to beat him. Perhaps the story of this is a topic for a separate article.

The son asked to write a program so that the AIs would play among themselves, and not with the bot. There were ideas to do the same for playing checkers or go, however, for this I did not have enough time.

The only method that I used to get new individuals is mutation. You can also use crossover and inversion. Perhaps these methods will accelerate the desired result.

At the end, an idea was born: to give AI the ability to manage all the processes on a PC and fight for computer resources. Connect a PC to the Internet, and use a pool of old Bitcoin farms as computing power ...

As said, conducting a similar experiment, blogger Mikhail Tsarkov :

Maybe they will take over the world, what if?

Links

Tags: