The algorithm for finding the smallest common ancestor in a tree

At leisure, an interesting idea came to me, which I developed into an algorithm for finding the least common ancestor (LCA) of two vertices in a tree. Until this idea came up, I did not know other algorithms for searching for LCAs. After checking the correctness of the work, I hastened to study other algorithms to solve this problem, but I did not find any similar ones. Now I’m hurrying to share it with the community.

Introduction

A tree is an undirected connected graph of N vertices and N-1 edges. From any vertex to any other, there is exactly one simple path.
The root of the tree will be called such a vertex, from which the direction of movement along the tree is given when it is traversed.
The smallest common ancestor of the two vertices u and v will be called the vertex p that lies on the path from the root to the vertex v , and to the vertex u , and also as far as possible from it.

Input data

Information about the tree arrives at the input: N is the number of vertices, N-1 is a pair of vertices that are connected by an edge, and M is the number of requests. Then the program receives requests: two vertices u and v , for which it is required to find their least common ancestor.

Idea of ​​the algorithm

We will store for each vertex the distance to the root and the previous vertex (ancestor) from which we came to it. Next, we will climb from the vertices u and v to the root of the tree. At each step, we will choose the vertex u or v that is the most distant from the root and then consider its ancestor instead, until the paths formed from the initial u and v lead to one vertex - their least common ancestor. With different tree options, such a path can consist of N steps, which, with a large number of vertices and queries, will work too slowly. This implementation requires O (N) time to complete each request.
Now we will improve the algorithm. For each vertex, we will store the distance to the root of the dist tree , the number of children of its children , as well as the ancestor (the choice of which will be determined below) from which we arrived last and the number of the vertex to which the edge goes from the ancestor on the way to this vertex turn .
We will declare the necessary variables, for convenience this will be done in global memory.
const int SIZE = 100050;
vector v[SIZE]; // список ребер для каждой вершины
bool vis[SIZE]; // пометки о посещении вершин при обходе
int kids[SIZE]; // кол-во потомков у каждой вершины
int dist[SIZE]; // расстояние от вершины до корня
int last[SIZE]; // номер предыдущей вершины
int turn[SIZE];

Now, for each vertex, using the recursive function (call k_go (0) ), we calculate the number of descendants of it:
int k_go(int s) {
    int res = 0;
    vis[s] = true;
    for(int i = 0, maxi = v[s].size(); i < maxi; i++) {
        if(!vis[v[s][i]]) 
			res += k_go(v[s][i]) + 1;
    }
    kids[s] = res;
    return res;
}

And now we’ll finish the preparatory part of the algorithm and fill in the necessary information for each vertex. We call the function l_go (0, 0, 0, 1) :
void l_go(int s, int d, int l, int r) {
    vis[s] = true;
    dist[s] = d;
    last[s] = l;
    turn[s] = r;
    int maxval = 0;
    int idx = -1;
    for(int i = 0, maxi = v[s].size(); i < maxi; i++) {
        if(!vis[v[s][i]]) {
            int k = kids[v[s][i]];
            if(k > maxval) idx = v[s][i], maxval = k;
        }
    }
    for(int i = 0, maxi = v[s].size(); i < maxi; i++) {
        if(!vis[v[s][i]]) {
            if(idx == v[s][i])
                l_go(v[s][i], d + 1, l, r);
            else
                l_go(v[s][i], d + 1, s, v[s][i]);
        }
    }
}


Now I’ll explain how the last part of the code works, since finding the number of descendants at the vertex is a fairly typical task.
For each vertex we look at its descendants and from them we select a vertex with the maximum number of descendants. For her, all information about the ancestor will be inherited from the current one, and only the distance to the root will change. For all other descendants of this vertex, the current vertex will now be the ancestor of last , and the descendant itself will turn turn .
Now I claim that if we take some vertex a in the tree and until we get to the root from it, the number of transitions to the last ancestor will not exceed the binary logarithm of the number of vertices in the tree.
Proof: Suppose we are at some vertex v that has one descendant. Then it is obvious that the descendant exiting from it does not change the last link to the ancestor. In the case of two or more vertices, we get one descendant (which of all descendants of the current vertex has the largest number of descendants) whose link is inherited from the current and all the others, from whom it is updated. We denote the total number of descendants of the current node N . Then from the descendant of this vertex, in which we update the link to the ancestor, there will be no more than N / 2 descendants (otherwise this number will be the maximum, but then the update of the link is not required). Thus, on each of the vertices, which is the ancestor of lastfor someone, no more than half the peaks remain, from all descendants coming from it. Total length of this path is less than the binary logarithm of N .

Now let's move on to the main function of the algorithm and explain why it works.
So, we have two vertices u and v , for which we need to find out the answer to the request. We call the function p = lca (u, v) :
int lca(int a, int b) {
	if(turn[a] == turn[b]) {
		if(dist[a] < dist[b]) return a;
		else return b;
    }
    if(dist[last[a]] > dist[last[b]]) return lca(last[a], b);
    return lca(last[b], a);
}


The function recursively rises in the root direction. For each subsequent vertices a and b, we first check the vertex at which the last rotation was made. If it coincides, this means that either vertex a lies on the path to vertex b from the root, or vice versa. Depending on which vertex is closer to the root, that is the desired smallest common ancestor.
Otherwise, you need a vertex a or bbring to the root. We will approximate on the basis of which one in the case of a transition to its ancestor will be further from the root (one vertex will constantly try to catch up with the second peak on the way to the root). Thus, sooner or later, the vertices will come either to one vertex, which will immediately be their smallest common ancestor, or one of the vertices will lie on the path from the root to another vertex, which is described in the first step.

Evaluation of the algorithm

To implement the algorithm, O (N) memory is required ( 6N total memory is consumed ), O (N) preliminary calculation, and O (M * log (N)) time to answer all requests.
The algorithm allows you to respond to requests on a predefined tree and respond to requests as they arrive for O (log (N)) for each.

The effectiveness of other algorithms for solving the LCA search problem

There are several algorithms for solving this problem, each of which is difficult to write, time for preliminary calculation, response to a request and the size of the required memory.
1) Response to the request for O (log N), preliminary calculation for O (N). The complexity of the implementation is the need to use the data structure "tree of segments" or "sqrt-decomposition".
2) The binary lift method . Response to the request for O (log N), preliminary calculation for O (log N), used memory (N * log (N)). A fairly simple implementation with a slightly better runtime than my algorithm, but due to the additional memory used.
3) Farah-Colton and Bender Algorithm. This is the most optimal algorithm, it allows you to respond to a request for O (1), but also requires a lot of additional memory and is difficult to implement.
As well as an algorithm that allows you to respond to a request for O (1), but for this you need to know the information about all requests in advance.
4) Tarjan algorithm

Conclusion

Thus, the presented algorithm in the implementation complexity is slightly inferior to the slightly faster binary lift algorithm, but it requires less memory and time for preliminary calculation. I would like to hear the opinion of those who are familiar with the algorithms for solving this problem in order to assess the appropriateness of its use and uniqueness. Thanks to everyone who took the time to understand this article.

UPD: Knowledgeable people told me that my algorithm is just one of the applications of Heavy-Light decomposition. Before that, I was not familiar with this decomposition, but in any case, it’s nice to come up with something myself, even if it’s existing.

Also popular now: