Simple PHP multithreading implementation
Multithreading in PHP is absent "out of the box", so a great many options were invented for its implementation, including the extensions pthreads , AzaThread (CThread), and even some of the PHP developers' own developments.
The main disadvantage for me was too many “bells and whistles” for these solutions - there is not always a need for the exchange of information between threads and the parent process or for saving resources. There should always be the ability to quickly and cost-effectively solve the problem.
I want to make a reservation in advance that great secrets do not open in this post - it is more likely for beginners in the language, and I decided to publish it only because I had run into a problem myself and, having not found a ready-made solution, I made a kind of multithreading emulation myself.
So, the task is to process a large amount of data that came into our script. My task was to process a JSON array of textual information, digesting which the script had to collect from it an equally large commit for PostgreSQL.
First of all, we collect the data in the parent file:
index.php
The array size fluctuated around 400mb (later it was reduced to ~ 50mb), and all the information was textual. It is not difficult to estimate the speed with which it was all digested, and given that the script was run on cron every 15 minutes, and the computing power was so-so - the performance suffered very much.
After receiving the data, you can estimate their volume and, if necessary, calculate the necessary number of threads for each CPU core, or you can simply decide that there will be 4 threads and calculate the number of rows for each thread:
index.php
It is worth mentioning right away - such a “head-on” calculation will not give an exact result by the number of elements for each stream. It is needed more likely to simplify the calculations.
And now the very essence - we create tasks for each thread and run it. We will do this “forehead” - creating a task for the second file - thread.php. It will act as a "stream", receiving a range of elements for processing and running independently of the main script:
index.php
The passthru () function is used to run console commands, but the script will wait for each of them to complete. To do this, we wrap the start command in a set of statements that will start the process and immediately return nothing, starting the process and the parent process will not stop waiting for each child to execute:
What exactly is going on here, unfortunately, I can’t say for sure - a set of parameters was suggested to me by my familiar Linuxoid. If you can decipher this magic in the comments, I will be grateful and will supplement the post.
File thread.php:
In this, quite simple, way, you can implement multithreading emulation in PHP.
If we reduce the whole example to dry output, then I think it would sound like this: the parent thread through the command line launches child processes, telling them what information to process.
When I say “emulation”, I mean that with this implementation method there is no way to exchange information between threads or between parent and child threads. It is suitable if it is known in advance that such features are not needed.
The main disadvantage for me was too many “bells and whistles” for these solutions - there is not always a need for the exchange of information between threads and the parent process or for saving resources. There should always be the ability to quickly and cost-effectively solve the problem.
I want to make a reservation in advance that great secrets do not open in this post - it is more likely for beginners in the language, and I decided to publish it only because I had run into a problem myself and, having not found a ready-made solution, I made a kind of multithreading emulation myself.
So, the task is to process a large amount of data that came into our script. My task was to process a JSON array of textual information, digesting which the script had to collect from it an equally large commit for PostgreSQL.
First of all, we collect the data in the parent file:
index.php
// bigdata.json - файл с входными данными. Это может быть что угодно - файл, таблица в СуБД и т.д.
$big_json = file_get_contents('bigdata.json');
$items = json_decode($big_json, true);
// хоть в php и есть сборщик мусора, но лучше подчистить неиспользуемые, большие, хвосты
unset($big_json);
// ...
The array size fluctuated around 400mb (later it was reduced to ~ 50mb), and all the information was textual. It is not difficult to estimate the speed with which it was all digested, and given that the script was run on cron every 15 minutes, and the computing power was so-so - the performance suffered very much.
After receiving the data, you can estimate their volume and, if necessary, calculate the necessary number of threads for each CPU core, or you can simply decide that there will be 4 threads and calculate the number of rows for each thread:
index.php
// ...
$threads = 4;
$strs_per_thread = ceil(count($items) / $threads);
// для запуска в ручном режиме - немного информации
echo "Items: ".count($items)."\n";
echo "Items per thread: ".$strs_per_thread."\n";
echo "Threads: ".$threads."\n";
// ...
It is worth mentioning right away - such a “head-on” calculation will not give an exact result by the number of elements for each stream. It is needed more likely to simplify the calculations.
And now the very essence - we create tasks for each thread and run it. We will do this “forehead” - creating a task for the second file - thread.php. It will act as a "stream", receiving a range of elements for processing and running independently of the main script:
index.php
// ...
for($i = 0; $i < $threads; $i++){
if($i == 0) {
passthru("(php -f thread.php 0 ".$strs_per_thread." & ) >> /dev/null 2>&1");
}
if($i == $threads-1) {
passthru("(php -f thread.php ".($strs_per_thread * $i)." ".count($items)." & ) >> /dev/null 2>&1");
}
if(($i !== 0)&&($i !== $threads-1)) {
$start = $strs_per_thread * $i + 1;
$end = $start -1 + $strs_per_thread;
passthru("(php -f thread.php ".$start." ".$end." & ) >> /dev/null 2>&1");
}
}
// ...
The passthru () function is used to run console commands, but the script will wait for each of them to complete. To do this, we wrap the start command in a set of statements that will start the process and immediately return nothing, starting the process and the parent process will not stop waiting for each child to execute:
# вся магия, как это часто бывает, в самом Linux-е
(php -f thread.php start stop & ) >> /dev/null 2>&1
What exactly is going on here, unfortunately, I can’t say for sure - a set of parameters was suggested to me by my familiar Linuxoid. If you can decipher this magic in the comments, I will be grateful and will supplement the post.
File thread.php:
$start = $argv[1];
$stop = $argv[2];
for ($i = $start; $i <= $stop; $i++) {
// какие-то действия с каждым элементом массива или строки из СуБД
}
In this, quite simple, way, you can implement multithreading emulation in PHP.
If we reduce the whole example to dry output, then I think it would sound like this: the parent thread through the command line launches child processes, telling them what information to process.
When I say “emulation”, I mean that with this implementation method there is no way to exchange information between threads or between parent and child threads. It is suitable if it is known in advance that such features are not needed.