
How does a computer play chess?

Hikaru Nakamura, who recently challenged a computer.
Computers surpassed humans at chess long ago; today even the strongest grandmasters cannot beat an engine running on an old laptop. Chess engines are now used to analyze games, search for novelties, and play correspondence chess.
If you are curious how chess engines work, read on.
Introduction
I was once sure that chess programs (also called engines, but more on that later) simply store a huge number of played games, look up the current position among them, and make the right move. I believe I read this in some book.
This is, of course, a very naive idea. An entirely new position can arise in a game by the tenth move. Although chess has fewer positions than Go, after just 3 moves (a move is one turn by White and one by Black; a half-move is a turn by one side only) the game tree already contains almost 120 million nodes. Moreover, enthusiasts have been computing the size of the tree after 14 half-moves from the initial position for over a year, and so far they are about a third of the way through.
I also thought that chess programs, despite having beaten the world champion long ago, were still within reach of the best humans. This is also not true.
In a recent human-machine mini-match, Hikaru Nakamura, one of the strongest chess players in the world, played against Komodo, one of the (two) strongest chess engines in the world. The engine ran on a 24-core Xeon. Since humans can no longer compete with computers on equal terms, the grandmaster received a handicap in each of the 4 games:
- In the first game, a pawn and a move: the computer played Black without the f7 pawn.
- In the second, just a pawn: the computer played White without the f2 pawn.
- In the third, the exchange (the difference between a rook and a minor piece, worth roughly 2 pawns): the computer played White without the a1 rook; the human played without the b8 knight, with the a8 rook moved to its square.
- In the fourth, four moves: the human played White and, instead of a single first move, made 4 moves in a row without crossing the middle of the board.
There was some debate about the handicap: for example, the absence of the f-pawn slightly weakens the king, but after castling it gives the rook an open file. The absence of a central pawn would probably confer a bigger advantage. Four free moves give a good positional advantage, but in a closed opening like the Old Indian Defense this advantage is not hard to neutralize.
In addition, the games were played at a time control of 45'+15", that is, 45 minutes per game plus 15 seconds of increment per move. Shorter controls usually give the computer a further edge, while longer ones slightly raise a human's chances: even in a fraction of a second the computer can discard obviously losing moves, while, due to the exponential growth of the search tree, each further refinement of the analysis takes ever longer.
Even with the handicap, the human lost the match 1.5-2.5, drawing the first 3 games and losing the fourth. Meanwhile, a weaker grandmaster had earlier won quite confidently with a handicap of 2 pawns. So the advantage of the best engines over the best humans currently lies somewhere between 1 and 2 pawns of handicap. Of course, this estimate is very rough; a precise one would require several thousand games between humans and engines, and hardly anyone will play them. Note that the Elo ratings often quoted for engines are not comparable to human ratings.
What is a chess engine?
For a human to play chess against a computer, a GUI is needed in addition to the actual search for the best move. Fortunately, universal protocols exist (two, in fact: Winboard and UCI, though most engines use UCI) for communication between the GUI and the chess program itself (the engine). Programmers can thus focus on the chess-playing algorithm without thinking about the interface. The flip side is that writing a GUI is far more boring than writing an engine, so free GUIs lag noticeably behind paid ones. Not so with engines, where the free Stockfish confidently contests the top rating spot with the commercial Komodo.
How do they still play?
So, how does a modern chess engine work?
Board representation
The foundation of any engine is the representation of the chessboard. The computer must first be "taught" all the rules of chess and given a way to store a chess position. Without this it can neither evaluate a position nor make moves.
There are two main ways to store a board representation: piece-centric or square-centric. In the first case we store, for each piece, its location on the board; in the second, conversely, we store for each square what stands on it. Each approach has its pros and cons, but at the moment all the top engines use the same representation: bitboards.
Bitboards
Fortunately, a chessboard has exactly 64 squares. So if we use one bit per square, we can store an entire board layer in a single 64-bit integer.
In one variable we store all the white pieces, in another all the black ones, and in 6 more each piece type separately (an alternative is 12 bitboards, one per color and piece type).
What is the advantage of this option?
The first advantage is memory. As we will see later, the board representation is copied many times during search, eating up RAM accordingly. Bitboards are among the most compact chessboard representations.
Second, speed. Many computations, such as generating possible moves, reduce to a few bitwise operations. Thanks to this, using the POPCNT instruction, for example, speeds up modern engines by about 15%. Over the lifetime of bitboards, many algorithms and optimizations have been invented for them, such as "magic" bitboards.
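As a rough illustration (not taken from any particular engine; the square numbering and names are my own assumptions), here is a pawn bitboard and a one-square pawn push:

```python
# Bitboard sketch: squares a1..h8 map to bits 0..63 of a 64-bit integer.
# The layout (a1 = bit 0) and names are illustrative, not a real engine's.

FULL = 0xFFFFFFFFFFFFFFFF  # mask keeping results within 64 bits

def square(file, rank):
    """file: 0=a..7=h, rank: 0=first..7=eighth."""
    return rank * 8 + file

# All white pawns on their starting rank:
white_pawns = 0
for f in range(8):
    white_pawns |= 1 << square(f, 1)

# A single shift advances every pawn one rank at once:
single_push = (white_pawns << 8) & FULL

# Counting pieces is a population count (the hardware POPCNT instruction):
piece_count = bin(white_pawns).count("1")
print(piece_count)  # 8
```

The point is that one machine instruction operates on all 64 squares of a layer at once.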
Search
Minimax
At the heart of most chess engines lies the minimax search algorithm or its modification, negamax. In short: we descend the tree, evaluate the leaves, and then climb back up, each time choosing the optimal move for the player to move, minimizing the score for one side (Black) and maximizing it for the other (White). Hence the name. Back at the root, we obtain the sequence of moves that is optimal for both players. The difference between minimax and negamax is that in the first we alternately choose the moves with the maximum and the minimum score, while in the second we instead flip the sign of all scores and always choose the maximum (hence that name too). More details here and here.
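A minimal sketch of the negamax formulation on a toy tree (the tree data is made up; leaves hold evaluations from White's point of view):

```python
# Negamax sketch on a toy tree: inner nodes are lists of children,
# leaves are integer evaluations from White's point of view.

def negamax(node, color):
    """color: +1 if White is to move, -1 if Black."""
    if isinstance(node, int):
        return color * node          # leaf: static evaluation for side to move
    return max(-negamax(child, -color) for child in node)

# Depth 2: White picks a branch, Black then picks the reply worst for White.
tree = [[3, 5], [-2, 9]]
print(negamax(tree, +1))  # 3: the branch whose worst reply still leaves +3
```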
Alpha-beta

Alpha-beta pruning improves on minimax by discarding branches that cannot affect the final choice. For pruning, we maintain two bounds, alpha and beta. If during the analysis a move scores above beta, the current node is pruned; if it scores above alpha, alpha is updated.
Nodes in alpha-beta fall into 3 categories:
- PV-nodes - nodes whose score fell inside the window (between alpha and beta). The root and the leftmost node are always of this type.
- Cut-nodes (or fail-high nodes) - nodes where a beta cutoff occurred.
- All-nodes (or fail-low nodes) - nodes in which no move's score exceeded alpha.
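A compact negamax-style alpha-beta sketch over the same kind of toy tree (toy data again, not real engine code):

```python
# Alpha-beta sketch in negamax form: prune once alpha >= beta.
INF = float("inf")

def alphabeta(node, alpha, beta, color):
    if isinstance(node, int):
        return color * node
    best = -INF
    for child in node:
        best = max(best, -alphabeta(child, -beta, -alpha, -color))
        alpha = max(alpha, best)
        if alpha >= beta:   # beta cutoff: remaining children are skipped,
            break           # making this node a Cut-node
    return best

tree = [[3, 5], [-2, 9]]
print(alphabeta(tree, -INF, INF, +1))  # 3, same answer as plain minimax,
                                       # but the leaf 9 is never visited
```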
Move ordering
When using alpha-beta, the order of moves becomes important. If we can try the best move first, the remaining moves are analyzed much faster thanks to beta cutoffs.
Besides using the hash table and the best move from the previous iteration, there are several techniques for ordering moves.
For captures, for example, the simple MVV-LVA heuristic (Most Valuable Victim - Least Valuable Aggressor) can be used: sort all captures in descending order of the victim's value and, within that, in ascending order of the attacker's value. Obviously, taking a queen with a pawn is usually more profitable than the other way around.
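The ordering rule fits in one sort key (piece values here are the usual rough conventions, chosen for illustration):

```python
# MVV-LVA ordering sketch: victims descending, attackers ascending.
VALUE = {"P": 1, "N": 3, "B": 3, "R": 5, "Q": 9}

# (attacker, victim) pairs:
captures = [("N", "Q"), ("R", "P"), ("P", "Q"), ("P", "R")]
captures.sort(key=lambda m: (-VALUE[m[1]], VALUE[m[0]]))
print(captures)
# [('P', 'Q'), ('N', 'Q'), ('P', 'R'), ('R', 'P')]: pawn-takes-queen first
```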
For "quiet" moves the killer-move heuristic is used: moves that previously caused a beta cutoff are tried right after the hash move and the captures.
Hash tables, or transposition tables
Despite the tree's enormous size, many of its nodes are identical, since the same position can be reached by different move orders. To avoid analyzing the same position twice, the computer stores analysis results in a table and checks each time whether a ready result for the current position already exists. Such a table typically stores the position's hash, its score, the best move, and the age of the entry. The age is needed to replace stale entries when the table fills up.
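A toy version of such a table (the entry layout and the replacement rule are illustrative, not copied from a real engine):

```python
# Toy transposition table: a fixed-size array indexed by hash modulo size.
from dataclasses import dataclass

@dataclass
class TTEntry:
    key: int        # full position hash, to detect index collisions
    depth: int      # depth at which the score was obtained
    score: int
    best_move: str
    age: int        # search number, for replacing stale entries

TABLE_SIZE = 1 << 16
table = [None] * TABLE_SIZE

def tt_store(key, depth, score, best_move, age):
    idx = key % TABLE_SIZE
    old = table[idx]
    # replace empty slots, entries from older searches, or shallower entries
    if old is None or old.age < age or old.depth <= depth:
        table[idx] = TTEntry(key, depth, score, best_move, age)

def tt_probe(key, min_depth):
    entry = table[key % TABLE_SIZE]
    if entry is not None and entry.key == key and entry.depth >= min_depth:
        return entry
    return None

tt_store(key=0xABCDEF, depth=8, score=35, best_move="e2e4", age=1)
print(tt_probe(0xABCDEF, 6).score)   # 35: stored deep enough to reuse
print(tt_probe(0xABCDEF, 10))        # None: we would need a deeper analysis
```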
Iterative deepening
As noted, if we cannot analyze the whole tree, minimax needs an evaluation function: having reached a certain depth, we stop the search, evaluate the position, and start climbing back up the tree. But this approach requires a depth fixed in advance and provides no useful intermediate results.
Iterative deepening solves both problems. First we analyze to depth 1, then to depth 2, and so on, each time going a little deeper than before, until the analysis is stopped. To shrink the search tree, the result of the previous iteration is usually used to narrow the search window on the current one and cut off clearly bad moves. This technique is called the "aspiration window" and is used universally.
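The loop can be sketched as follows; `search()` here is a stub whose score simply grows with depth, just to exercise the fail/re-search logic (a real engine would call alpha-beta):

```python
# Iterative deepening with an aspiration window around the previous score.
INF = 10**9

def search(depth, alpha, beta):
    score = depth * 30            # stub standing in for a real alpha-beta call
    if score <= alpha:
        return alpha              # fail-low
    if score >= beta:
        return beta               # fail-high
    return score

def iterative_deepening(max_depth, window=25):
    score = 0
    for depth in range(1, max_depth + 1):
        alpha, beta = score - window, score + window   # aspiration window
        s = search(depth, alpha, beta)
        if s <= alpha or s >= beta:                    # fell outside:
            s = search(depth, -INF, INF)               # re-search, full window
        score = s                                      # seed the next window
    return score

print(iterative_deepening(3))   # 90
```

The narrow window makes each iteration cheaper; the occasional full-window re-search is the price paid when the score moves outside it.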
Quiescence Search
This method combats the "horizon effect". Simply stopping the search at a fixed depth can be very dangerous. Imagine we stopped in the middle of a queen exchange: White has just taken the black queen, and on the next move Black will recapture. But right now White is a whole queen up on the board, and a static evaluation would be fundamentally wrong.
Therefore, before doing a static evaluation, we examine all the captures (and sometimes checks too) and descend the tree until we reach a position in which no captures or checks are available. Naturally, if every capture worsens the score, we simply return the evaluation of the current position.
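A toy sketch: a "position" here is just a pair of (static evaluation for the side to move, positions reachable by captures); the data is made up, there is no real move generation:

```python
# Quiescence sketch: instead of returning the static score at the horizon,
# keep resolving captures until the position is quiet.

def quiescence(pos, alpha, beta):
    static, capture_replies = pos
    if static >= beta:                 # "stand pat": already good enough
        return beta
    alpha = max(alpha, static)
    for child in capture_replies:
        score = -quiescence(child, -beta, -alpha)
        if score >= beta:
            return beta
        alpha = max(alpha, score)
    return alpha

# The horizon cut us off right after White took the queen; Black to move is
# statically a queen down (-9) but has a recapture leading to equality:
mid_exchange = (-9, [(0, [])])
print(quiescence(mid_exchange, -100, 100))   # 0, not the misleading -9
```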
Selective search
The idea of selective search is to spend more time on "interesting" moves and less on uninteresting ones. For this there are extensions, which increase the search depth in certain positions, and reductions, which decrease it.
The depth is extended for captures and checks, when a move is forced or much better than the alternatives, or when there is a passed pawn.
Pruning and reductions
Pruning and reductions are where things get much more interesting: they can shrink the tree dramatically.
Briefly, the main pruning techniques:
- Delta pruning - check whether a capture can possibly raise the current alpha. Add the value of the captured piece, plus a small margin, to the node's evaluation and see whether the result exceeds alpha. For example, if White is down a rook, grabbing a pawn is unlikely to help, but capturing a bishop might.
- Futility pruning - the same, but for non-captures. If the current evaluation is so far below alpha that no positional gain can compensate, the node is pruned. Typically used at shallow depths (1-2).
- History pruning - for each move we count how often it has triggered a cutoff, regardless of position. Moves with a poor value of this heuristic are pruned. Usually applied from a certain depth onward and never to PV-nodes. Sometimes combined with the previous method.
- Multi-cut - if among the first M (say, 6) child nodes at least C (say, 3) are Cut-nodes, the whole node is pruned.
- Null-move pruning - if even after a null move (simply handing the turn to the opponent) the score is still above beta, the node is pruned. Put simply: if the opponent's position is so bad that not even two moves in a row would save it, there is no point in examining this position further.
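The delta-pruning test from the first bullet can be sketched directly (the values and margin are illustrative centipawn numbers, not from a real engine):

```python
# Delta pruning sketch: skip a capture in quiescence if even the victim's
# value plus a safety margin cannot raise the score back to alpha.
PIECE_VALUE = {"P": 100, "N": 300, "B": 300, "R": 500, "Q": 900}
DELTA_MARGIN = 200   # allowance for positional compensation, in centipawns

def can_delta_prune(static_eval, victim, alpha):
    return static_eval + PIECE_VALUE[victim] + DELTA_MARGIN < alpha

# Down a rook (-500 cp) while alpha demands rough equality (0):
print(can_delta_prune(-500, "P", 0))   # True: winning a pawn cannot save us
print(can_delta_prune(-500, "B", 0))   # False: winning a bishop just might
```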
Reductions are used when we are not sure enough that a move is bad to prune it outright, so we merely reduce its search depth. For example, razoring is a reduction applied when the static evaluation of the current position is below alpha.
Thanks to high-quality move ordering and pruning, modern engines achieve a branching factor below 2. The downside is that they sometimes overlook non-standard sacrifices and combinations.
NegaScout and PVS
Two very similar techniques exploiting the fact that once we have found the PV-node (assuming our moves are reasonably well ordered), it will most likely not change; that is, all remaining nodes will return scores below alpha. So instead of searching with the window from alpha to beta, we search with a window from alpha to alpha+1, which speeds up the search. Of course, if some node then produces a beta cutoff, it must be re-searched with the normal window.
The two methods differ only in wording: they were developed at about the same time but independently, and so are known under different names.
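The idea, sketched on the same toy tree used earlier (made-up leaf values, no real move generation):

```python
# PVS/NegaScout sketch: full window for the first move, null window
# (alpha, alpha+1) for the rest, with a re-search on a surprise fail-high.

def pvs(node, alpha, beta, color):
    if isinstance(node, int):
        return color * node
    for i, child in enumerate(node):
        if i == 0:
            score = -pvs(child, -beta, -alpha, -color)      # full window
        else:
            score = -pvs(child, -alpha - 1, -alpha, -color) # scout search
            if alpha < score < beta:                        # surprisingly good:
                score = -pvs(child, -beta, -score, -color)  # re-search properly
        alpha = max(alpha, score)
        if alpha >= beta:
            break
    return alpha

tree = [[3, 5], [-2, 9]]
print(pvs(tree, -100, 100, +1))   # 3: same answer, cheaper null-window scouts
```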
Parallel search
Parallelizing alpha-beta is a big topic of its own. I will cover it briefly; those interested should look at Parallel Alpha-Beta Search on Shared Memory Multiprocessors. The difficulty is that in a parallel search many Cut-nodes get analyzed before another thread finds the refutation (sets beta), whereas in a sequential search with good move ordering many of those nodes would have been pruned.
Lazy SMP
A very simple algorithm: we just launch all threads on the same search simultaneously, communicating through the shared hash table. Lazy SMP proved surprisingly effective, so much so that the top-rated Stockfish switched to it from YBWC. Some believe, though, that the improvement came from a poor YBWC implementation and overly aggressive pruning rather than from any inherent advantage of Lazy SMP.
Young Brothers Wait Concept (YBWC)
The first node (the elder brother) is analyzed fully, after which a parallel analysis of the remaining nodes (the younger brothers) begins. The idea is the same: the first move will either significantly raise alpha or even allow all the other nodes to be cut off.
Dynamic Tree Splitting (DTS)
A fast and complex algorithm. A word about speed: search speed is measured by ttd (time to depth), the time the search takes to reach a given depth. This metric lets you compare different versions of an engine, or the same engine running on different numbers of cores (though Komodo, for example, widens the tree when more cores are available). Engines also report their speed in nps (nodes per second). That metric is far more popular, but it does not even allow an engine to be compared with itself: Lazy SMP, having no synchronization, scales nps almost linearly, but because of the large amount of redundant work its ttd is less impressive, while for DTS nps and ttd scale almost identically.
To be honest, I never fully figured this algorithm out; despite its high efficiency it is used in literally a couple of engines. If you are really curious, follow the link above.
Evaluation
So, we have reached the required depth, run the quiescence search, and finally need to evaluate the position statically.
The computer evaluates positions in pawns: +1.0 means White has an advantage worth one pawn, -0.5 means Black is better by half a pawn. Checkmate is scored as 300 pawns, and a position with a known mate in x moves as (300 - 0.01x) pawns, so +299.85 means White mates in 15 moves. Internally, the program usually works with integer scores in centipawns (1/100 of a pawn).
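The encoding can be sketched in centipawns (the constant and the formatting function are my own illustration of the scheme just described):

```python
# Mate scores folded into the centipawn scale: mate = 300 pawns = 30000 cp,
# and each move to mate subtracts one centipawn (0.01 pawns).
MATE = 30000

def mate_in(moves):
    return MATE - moves                 # internal score for "White mates in N"

def format_score(cp):
    if cp >= MATE - 1000:
        return f"+M{MATE - cp}"         # forced mate for White
    if cp <= -MATE + 1000:
        return f"-M{MATE + cp}"         # forced mate for Black
    return f"{cp / 100:+.2f}"           # ordinary score, shown in pawns

print(format_score(mate_in(15)))   # +M15  (the +299.85 from the text)
print(format_score(-50))           # -0.50
```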
What parameters does the computer take into account when evaluating a position?
Material and mobility
The simplest factors. The queen is worth 9-12 pawns, the rook 5-6, the knight and bishop 2.5-4, and the pawn, naturally, one pawn. Material alone is a decent heuristic for evaluating a position, and any positional advantage usually converts into a material one in the end.
Mobility is computed simply: the number of possible moves in the current position. The more there are, the more mobile that side's army.
Piece-square tables
A knight in the corner of the board is usually bad, pawns grow more valuable as they approach the enemy's back rank, and so on. For each piece, a table of bonuses and penalties is built depending on its square.
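For example, a knight table might look like this (the centipawn values are illustrative, not from any particular engine):

```python
# Piece-square table sketch for a knight: corners penalized, central squares
# rewarded. Index 0 = a1 ... 63 = h8; values in centipawns.
KNIGHT_PST = [
    -50, -40, -30, -30, -30, -30, -40, -50,
    -40, -20,   0,   5,   5,   0, -20, -40,
    -30,   0,  10,  15,  15,  10,   0, -30,
    -30,   5,  15,  20,  20,  15,   5, -30,
    -30,   5,  15,  20,  20,  15,   5, -30,
    -30,   0,  10,  15,  15,  10,   0, -30,
    -40, -20,   0,   5,   5,   0, -20, -40,
    -50, -40, -30, -30, -30, -30, -40, -50,
]

A1, D4 = 0, 3 * 8 + 3
print(KNIGHT_PST[A1])   # -50: a knight on the rim is dim
print(KNIGHT_PST[D4])   # +20: a centralized knight shines
```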
Pawn structure
- Doubled pawns - two pawns on the same file. They are often hard to defend with other pawns and are considered a weakness.
- Backward pawns - pawns whose neighbors are ahead of them. Such pawns cannot be defended by other pawns, and so they are considered a weakness.
- Passed pawns - pawns that can reach the last rank with no enemy pawns in the way. A serious threat to the opponent, especially in the endgame.
- Isolated pawns - pawns with no neighbors at all. Such pawns can never be defended by other pawns, and so they are considered a serious weakness.
Game stages
All the parameters above affect the evaluation differently depending on the stage of the game. In the opening a passed pawn means little, while in the endgame the king should be brought to the center of the board rather than hidden behind pawns.
Therefore, many engines keep separate evaluations for the opening and the endgame. They estimate the stage of the game from the material remaining on the board and blend the two scores accordingly: the closer the game is to its end, the less the opening score matters and the more the endgame score does.
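This blending is often called tapered evaluation; a sketch with illustrative phase weights (not any specific engine's numbers):

```python
# Tapered evaluation sketch: blend middlegame and endgame scores by a phase
# computed from the remaining non-pawn material.
PHASE_WEIGHT = {"N": 1, "B": 1, "R": 2, "Q": 4}
MAX_PHASE = 24   # both sides still have 2N + 2B + 2R + 1Q

def tapered_eval(mg_score, eg_score, pieces_on_board):
    phase = min(sum(PHASE_WEIGHT[p] for p in pieces_on_board), MAX_PHASE)
    return (mg_score * phase + eg_score * (MAX_PHASE - phase)) // MAX_PHASE

full = ["N", "N", "B", "B", "R", "R", "Q"] * 2
print(tapered_eval(50, -20, full))    # 50: opening, pure middlegame score
print(tapered_eval(50, -20, ["R"]))   # -15: deep endgame, mostly endgame score
```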
Other
Besides these basic factors, engines may add others to the evaluation: king safety, trapped pieces, pawn islands, control of the center, and so on.
Accurate evaluation or deeper search?
The traditional dispute: which is more effective, evaluating the position accurately or searching deeper? Experience shows that overly "heavy" evaluation functions are ineffective. On the other hand, a more detailed evaluation accounting for more factors usually leads to a more "beautiful" and "aggressive" game.
Opening books and endgame tablebases
Opening books
At the dawn of computer chess, programs played the opening very poorly: the opening often requires strategic decisions that will shape the entire game. Humans, on the other hand, had a well-developed opening theory, analyzed repeatedly and played from memory. So a similar "memory" was created for computers. Starting from the initial position, a tree of moves was built and each move was evaluated. During the game, the engine simply picked one of the "good" moves with a certain probability.
Opening books have grown since then, and many openings have been computer-analyzed all the way to the endgame. There is less need for them now, since strong engines have learned to play the opening themselves, but they do leave the main lines fairly quickly.
Endgame tablebases
Back to the introduction: remember the idea of storing many positions in memory and picking the right one? Here it is. For a small number of pieces (up to 7), all possible positions have been computed. From such positions the computer plays perfectly, winning in the minimum number of moves. The downsides are the size of the tables and the time needed to generate them. Building these tablebases has also advanced the study of endgames.
Table generation
We generate all possible positions (accounting for symmetry) with a given set of pieces. Among them we find and mark all positions in which the side to move is checkmated. On the next pass we mark every position from which a marked position can be reached: in these, mate is delivered in 1 move. Repeating this, we find all positions with mate in 2, 3, 4, and so on. Every unmarked position is a draw.
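The passes just described amount to retrograde analysis. A sketch on an abstract game graph (toy positions and moves, not real chess move generation):

```python
# Retrograde analysis sketch: start from positions where the side to move is
# mated and propagate win/loss labels backwards; anything unlabelled is a draw.

def retrograde(moves, mated):
    """moves: pos -> list of successor positions; mated: set of mated positions.
    Returns (win, loss): distance to mate in half-moves for the side to move."""
    loss = {p: 0 for p in mated}     # mated right now: lost in 0
    win = {}
    changed = True
    while changed:
        changed = False
        for p, succs in moves.items():
            if p in win or p in loss:
                continue
            if any(s in loss for s in succs):              # some move mates
                win[p] = 1 + min(loss[s] for s in succs if s in loss)
                changed = True
            elif succs and all(s in win for s in succs):   # every move loses
                loss[p] = 1 + max(win[s] for s in succs)
                changed = True
    return win, loss

# Toy graph: from A the only move leads to B; from B the only move mates (C).
moves = {"A": ["B"], "B": ["C"], "C": []}
win, loss = retrograde(moves, mated={"C"})
print(win, loss)   # B wins in 1 half-move; A is lost in 2 half-moves
```

Note the alternation: a position is won if some move reaches a lost one, and lost only if every move reaches a won one; this is what distinguishes the real procedure from a plain backward walk.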
Nalimov Tables
The first endgame tablebases, published back in 1998. For each position they store the game result and the number of moves to mate with perfect play. All six-piece endings together occupy 1.2 terabytes.
Lomonosov Tables
In 2012, all seven-piece endings (except 6 versus 1) were computed on the Lomonosov supercomputer at Moscow State University. These tablebases are available only for a fee and are the only complete seven-piece endgame tables in existence.
Syzygy
The de facto standard. These tablebases are far more compact than the Nalimov ones. They consist of two parts: WDL (Win/Draw/Loss) and DTZ (Distance To Zeroing). WDL tables are meant for use during the search: once a tree node is found in the table, we know the exact result of the game from that position. DTZ tables are meant for use at the root: they store the number of moves until the move counter is zeroed (by a pawn move or a capture). Thus WDL tables suffice for play, while DTZ tables are useful for endgame analysis. Syzygy is much smaller: 68 gigabytes for the six-piece WDL tables and 83 for DTZ. There are no seven-piece Syzygy tables, as generating them would require about a terabyte of RAM.
Usage
Endgame tablebases are used mainly for analysis; the playing-strength gain for engines is small, about 20-30 Elo points. Nevertheless, since the search depth of modern engines can be very large, probes into the tablebases from the search tree can happen even in the opening.
Other interesting topics
Giraffe, or neural networks play chess
Some of you may have heard of a chess engine based on neural networks that reached IM level (which, as we saw in the introduction, is not that impressive for an engine). It was written and published on Bitbucket by Matthew Lai, who unfortunately stopped working on it after joining Google DeepMind.
Tuning Parameters
Adding a new feature to an engine is not hard, but how do you verify that it actually made the engine stronger? The simplest option is to play a number of games between the old and new versions and see who wins. But when the improvement is small, as it usually is once all the major features are in, several thousand games are needed, otherwise the result is not statistically reliable.
Stockfish
Many people work on this engine, and each of their ideas must be tested. At the engine's current strength each improvement adds just a couple of rating points, but together they yield a steady growth of several dozen points per year.
Their solution is typical for open source: volunteers donate their computing power to play hundreds of thousands of games.
CLOP
A program that optimizes parameters via regression on the results of the engine's games against itself with different parameter values. Its drawback is a very limited problem size: optimizing a hundred parameters (a perfectly normal number for an engine) is infeasible, at least in reasonable time.
Texel's tuning
It solves the previous method's problem. We take a large number of positions (the author suggested 9 million positions from 64,000 games; I took 8 million from almost 200,000), and for each we record the game result (1 for a White win, 0.5 for a draw, 0 for a loss). Then we minimize the error: the sum of the squared differences between the result and a sigmoid of the evaluation. The method is effective and popular, but it does not work on every engine.
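The error being minimized can be sketched directly (K is a scaling constant fitted separately; the sample data here is hypothetical):

```python
# Texel tuning error sketch: squared difference between game results and a
# sigmoid of the static evaluation, averaged over the training positions.

def win_probability(cp, K=1.13):
    """Map a centipawn score to an expected game result in [0, 1]."""
    return 1.0 / (1.0 + 10 ** (-K * cp / 400))

def texel_error(samples):
    """samples: list of (static_eval_cp, result), result in {1.0, 0.5, 0.0}."""
    return sum((r - win_probability(cp)) ** 2 for cp, r in samples) / len(samples)

# Hypothetical samples; a tuner tweaks eval weights to shrink this number:
samples = [(150, 1.0), (0, 0.5), (-220, 0.0), (30, 0.5)]
print(round(texel_error(samples), 4))
```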
Stockfish tuning
Another technique from the leader. Take a parameter currently equal to x and play a match (tens of thousands of games) between the engine with the parameter set to x - sigma and with it set to x + sigma. If the version with the larger value wins, nudge the parameter up a little, otherwise down, and repeat.
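A sketch of that loop; `play_match` is a stub with a hidden optimum standing in for tens of thousands of real engine games, and all the numbers are illustrative:

```python
# Tuning-loop sketch: compare x - sigma vs x + sigma and nudge x toward
# whichever value wins the (simulated) match.
import random

HIDDEN_OPTIMUM = 137   # pretend this parameter value plays best

def play_match(a, b):
    """Stub: the value nearer the optimum tends to win, with some noise."""
    noise = random.gauss(0, 5)
    return 1 if abs(a - HIDDEN_OPTIMUM) + noise < abs(b - HIDDEN_OPTIMUM) else -1

def tune(x, sigma=10, step=2, iterations=300):
    for _ in range(iterations):
        x += step * play_match(x + sigma, x - sigma)   # move toward the winner
    return x

random.seed(42)
print(tune(100))   # drifts toward the hidden optimum of 137
```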
Engine competitions
Of all the engine competitions, I would single out TCEC. It stands out for its powerful hardware, careful selection of openings, and long time control. In the latest final, 100 games were played on 2x Intel Xeon E5-2690v3 with 256 gigabytes of RAM at a 180'+30" control. Under such conditions the draw rate was enormous, and only 11 games were decisive.
Conclusion
So, in this long article I have briefly described how chess engines are built. Many details went unmentioned: some I simply did not know about, some I forgot to include. If you have questions, ask them in the comments. In addition, I recommend two resources you have probably already noticed if you opened the links scattered throughout the article:
- Chess programming wiki
- Talkchess Forum
- And one more, not mentioned earlier - the programmer's corner