Tree walk in multiple threads
Hello everyone, I want to share with the public a certain amount of information, which seemed to me difficult to find on the Internet.
What is a tree, see Wikipedia .
Fig. 1 Example of a tree.
So, why do you even need to traverse the tree in multiple threads? In my case, it was the need to search for files on disk. It is clear that physically the disk works in one stream, but nevertheless, multi-threaded file search can give acceleration, in the case when a single stream searches for a file in a multi-level hierarchy, and the desired file lies in an adjacent folder with one level. Also proof of the relevance of using multithreaded search is its implementation in many successful commercial products. I do not exclude that other variants of the application of the algorithm are possible, write in the comments.
To begin with, I propose to consider animation:
What is happening here? The whole essence of the algorithm is that:
In fact, in general, the algorithm looks like this (one color - one thread):
I give an example of a code that might come in handy for someone, I searched for it for a long time, but did not find it:
As you can see, multithreading in Java can be implemented quite simply through BlockingQueue, a data structure that provides access of several threads to stored data.
In general, that’s all, in short, write comments about what other methods or methods of tree traversal exist, I would like to hear the opinion of the guru on this issue. Write questions, wishes.
Link to GitHub . There is also a JavaFX search engine nearby, if someone wants to test, I will only be glad.
What is a tree, see Wikipedia .
Fig. 1 Example of a tree.
So, why do you even need to traverse the tree in multiple threads? In my case, it was the need to search for files on disk. It is clear that physically the disk works in one stream, but nevertheless, multi-threaded file search can give acceleration, in the case when a single stream searches for a file in a multi-level hierarchy, and the desired file lies in an adjacent folder with one level. Also proof of the relevance of using multithreaded search is its implementation in many successful commercial products. I do not exclude that other variants of the application of the algorithm are possible, write in the comments.
To begin with, I propose to consider animation:
What is happening here? The whole essence of the algorithm is that:
- The first stream bypasses the tree starting from the root to the entire depth along the “extreme left” path, ie always moves to the left when moving inward, or in other words always selects the last child node.
- In parallel, the first thread collects all the missing child nodes and sends them to the queue (or to the stack) where another thread takes them, the queue or stack must implement a multi-threaded approach, and the algorithm repeats, substituting the just taken node in place of the root.
In fact, in general, the algorithm looks like this (one color - one thread):
In fact, in general, the algorithm looks like this (one color - one thread):
Java software implementation
I give an example of a code that might come in handy for someone, I searched for it for a long time, but did not find it:
import java.io.File;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.Executor;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingDeque;
public class MultiThreadTreeWalker extends Thread {
private static BlockingQueue nodesToReview = new LinkedBlockingDeque<>();
private File f;
public MultiThreadTreeWalker(File f) {
this.f = f;
}
public MultiThreadTreeWalker() {
}
public static void main(String[] args) {
Executor ex = Executors.newFixedThreadPool(2);
MultiThreadTreeWalker mw = new MultiThreadTreeWalker(new File("C:\\"));
mw.run();
for (int i = 0;i<2;i++) {
ex.execute(new MultiThreadTreeWalker());
}
}
@Override
public void run() {
if (f != null) { //только для первого потока
reviewFileSystem(f);
} else {
try {
while (true) {
f = nodesToReview.take();
reviewFileSystem(f);
}
} catch (InterruptedException e) {
e.printStackTrace();
}
}
}
void reviewFileSystem(File f) {
if (f == null) {
return;
}
if (f.isFile()) {
//завершение обхода (+дополнительная логика)
System.out.println("Файл " + f.getName() + " найден потоком " + Thread.currentThread());
return;
}
File[] files = f.listFiles();
if (files.length != 0) {
for (int i = 0; i < files.length - 1; i++) {
nodesToReview.add(files[i]); //добавление файлов всех кроме последнего
}
//последний дочерний узел используется для перехода дальше
File last = files[files.length - 1];
reviewFileSystem(last);
}
}
}
Conclusion
As you can see, multithreading in Java can be implemented quite simply through BlockingQueue, a data structure that provides access of several threads to stored data.
In general, that’s all, in short, write comments about what other methods or methods of tree traversal exist, I would like to hear the opinion of the guru on this issue. Write questions, wishes.
Link to GitHub . There is also a JavaFX search engine nearby, if someone wants to test, I will only be glad.