Solving the traveling salesman problem by recursive exhaustive search

We state the problem.
Given N nodes located on the plane. The input node (In) and the output node (Out) are specified. You must find the shortest path that spans all nodes, starting at the input node, ending at the output node, and passing through each node only once.

There are opinions that the salesman problem can be formulated in two other ways:
1. It is necessary to detect the shortest Hamiltonian cycle.
2. It is necessary to detect the shortest path starting at a given node.

However, both of these formulations, upon closer examination, turn out to be special cases of the original formulation.
Formulation 1 implies that any node can be an input node, and one of the closest to it can be an output node. Which requires a complete search of all the nearest nodes to an arbitrarily selected node.
Formulation 2 implies that the input node is specified, and the output node can be any. Which requires a complete enumeration of all output nodes with the subsequent selection of the shortest path from all shortest paths.
Therefore, we will focus on the initial formulation, and we will solve the problem in a general way.

It is generally recognized that the salesman problem in the general case is guaranteed to be solved optimally only by exhaustive search of all options.
In the algorithmic implementation of full enumeration, two conditions can be used that reduce the enumeration time and cut off deliberately non-optimal paths, namely:
1. When constructing the next version of the path, count its length and if this length exceeds the already found local minimum of length - skip this option and all those that it generates.
2. If the next selected edge intersects one of the previously constructed edges, skip this option and all those that it generates.

The last condition actually prohibits cycles, because if we imagine ourselves in the place of a real salesman, then the intersection of edges (despite the fact that only reuse of nodes is explicitly prohibited in the task) will actually mean the return of the salesman to the place where he was already, which contradicts the definition of the optimal (shortest) the way.

An optimal solution to the desktop traveling salesman problem is possible. However, with the increase in the number of nodes, the time spent on exhaustive search grows exponentially. Thus, on a machine with a 2.67 GHz quad-core processor, 10 nodes are calculated on average in 5 milliseconds, 20 nodes in 15 minutes, and it will take more than 6 trillion years to calculate the optimal path for 60 nodes ...

To get around this obstacle, people simply refuse to get the optimal solution for large N, and instead of the traveling salesman problem, they solve some other, very similar, similar problem. This method of substituting a task is called a “heuristic solution”, although heuristics in each case is nothing more than a randomly chosen analogy. For example, annealing is a solution to the problem by simulating the cooling of crystallized substances, a greedy algorithm is an imitation of the behavior of a greedy person, there are methods built by analogy with the behavior of ants, an inflated balloon, etc.

The analogy method is bad in that exhaustive search, as the optimal solution method, is completely discarded and completely replaced by some other method. I believe that it is impossible to act so scornfully with exhaustive search, because this is the only method that is guaranteed to give the correct solution to the problem.

We will try to solve the traveling salesman problem only by exhaustive search, replacing it with something else only where it is really necessary for reasons of time saving.

We will not look for an analogy at random, we will try to solve the traveling salesman problem as impartially as possible, without invoking extraneous analogies. Let's try to solve the general problem in a general way. And for this we’ll only call for help one analogy - the analogy with intelligence in general. How does human intelligence solve complex problems (including the traveling salesman problem)? It splits the original task into a tree of simpler sub-tasks, solves them in the reverse order, and gets the result. We will do the same!

It is quite obvious that to simplify the traveling salesman problem in order to solve it by exhaustive search in an acceptable time, there is only one way - to reduce the number of nodes being sorted. And this can also be done in only one way - to combine nodes into groups - over-nodes, over-tasks (sub-tasks).

To set the salesman problem recursively, let's try to look at it recursively.
The main element in this task is a node that includes one edge and one edge comes out:
image

Without invoking extraneous analogies, we immediately notice an analogy with the statement of the general problem itself, in which there are two “extreme” nodes: Вх , which includes the traveling salesman at the start, and Vykh , from which the salesman comes out at the finish. We can say that the entire set of N nodes in the problem is one large node into which the salesman enters and leaves along the edge:
image

Thus, a general view of the recursive algorithm emerges .
1. Set the nodes N.
2. Set the maximum allowable time for solving the problem Tmax.
3. Conduct a benchmark on this machine and determine the dependence of the solution time on the number of nodes with a complete enumeration of the T (N) options.
4. Based on the given number of nodes N, Tmax and the dependence for T, determine how many groups (over-nodes) the set of N nodes should be divided. Or set this number of nodes manually (this is exactly what we will do).
5. Break the set of N nodes into M groups.
6. Complete the search for M groups and find the shortest way to connect these groups.
7. After step 6, we know the entry and exit for each group, so we recursively pose the traveling salesman problem for each group, that is, go to step 4, where N is equal to the number of nodes in each group.
8. After exiting recursion, we obtain a suboptimal path for this machine and this Tmax (or a given number of nodes in the group).

Point 4 is needed so that we do not degrade the algorithm beyond what is necessary. If the machine can conduct a full search for a given number of nodes, then we must allow it.

In step 5, it is obviously best to prefer M as much as possible. When M tends to N, we get an increasingly accurate solution. The less we go from exhaustive search, the more accurate our decision. You can choose M from the table made up of the simplest crude assumption that at each step of the recursion we will split the sub-task into groups with the same number of nodes in the group.

image

As can be seen from the table, even with 18 nodes in one group, we can process more than 10 billion nodes with a recursion depth of up to 8.

If we want to improve the route found in this way, then we have the following options:
- choose a more efficient machine
- increase the allowable time Tmax
- increase the number of groups
- increase the nesting of recursion. That is, at step 6, divide the problem into a significantly (depending on N) larger number of groups, but it is not optimal, but suboptimal, to sort them. For example, to break a task of 1000 nodes into 500 groups, but to search not the shortest way to connect them by exhaustive search (this is too long), but to break groups into sub-groups, for example, 250 groups and search for a sub optimal path among them. And so on, until we go down to the level where the machine can do a full search.

How to group nodes into groups?
In order not to attract abstract analogies and not to use foggy heuristics, we will try to find the answer in the problem itself. It is perfectly clear that initially in the task we have one group for which Bx and Ox are given.

In order to split this task into at least two, we need to distinguish among all the nodes an intermediate node, which will become an intermediate input / output.
Since we use the Euclidean distance as a sign of optimality in the problem, in order to detect a new intermediate input / output, we need to introduce a certain distance limit from each node to the existing input and output. Since we do not know what this limit is, we will mentally take it so large that it turns out that all nodes belong only to the initial group. Now, mentally, we will gradually reduce this limit. In this case, one of the nodes that previously belonged to the original group should stand out from it. Regardless of what we take as the limit, this node will be that node that is located at the farthest distance from the initial entrance and exit. Therefore, instead of inventing a limit formula, we can immediately take the farthest node. After we found this intermediate in / out,

If we need to divide the task into an arbitrary number of groups, then we perform this partition recursively. First, we find the intermediate input / output is mutually distant with respect to the initial input and output. Then we find the second intermediate input / output mutually distant to the first three, and so on.

Obviously, with the number of groups equal to N, the intermediate inputs / outputs of these groups will coincide with the starting nodes. Thus, we found the only correct division of all nodes into groups.

You can experiment with grouping here (the source codes for Object Pascal are included).
To create a salesman task, you need to enter the number of nodes and press Enter. Or load the task from a file.
By moving the slider you can observe the principle of splitting into groups. The “Link groups” button connects the current groups along the optimal route by exhaustive search. Button “Link nodes” connects all nodes along the optimal route by exhaustive search. Therefore, you should not press these buttons with a large (> 18) number of nodes or groups.

Here you can download the program and source codes for solving the traveling salesman problem using the algorithm described above.
Controls:
The “Load” button loads a task from a file with the extension * .txt
The txt file format is as follows: in each line, 1 space indicates the x and y coordinates for each node in the task. The first line shows the coordinates of the input node. The last line contains the coordinates of the output node. The remaining nodes between them in random order. The coordinates are in the range 0-1 with an accuracy of 6 decimal places. The separator of the integer and fractional parts is a comma.

The Generate button generates a random traveling salesman task, with the number of nodes indicated in the Nodes field.

The “Save” button saves the coordinates of the nodes of this task.

The Solve button solves the traveling salesman problem, recursively breaking it into a constant number of groups indicated in the Groups field. If the specified number of groups is greater than or equal to the number of nodes, then, naturally, a complete enumeration of nodes will be performed. Therefore, it is not recommended to indicate the number of groups greater than 18.

In this program, the breakdown into groups is not optimal, therefore intersections occur. But this drawback is fundamentally removable.

To improve the algorithm, the Nodes parameter should be divided into two independent parameters:
Nmax - the maximum number of nodes for which full enumeration is performed without splitting into groups.
Gmax- the number of groups for splitting nodes, the number of which exceeded Nmax. And this number can be set in different ways. You can divide the number of nodes by a constant, you can subtract 1 from the current number of nodes, you can divide the number of nodes in the proportion of the "golden section", you can take a fixed percentage. In this regard, I am conducting research.
In this program, for simplicity and clarity, Gmax = Nmax = Groups.

The program calculates one thousand nodes at Groups = 13 in less than 5 seconds. For a complete search, the result is very good.

Also popular now: