Multithreaded computing in PHP: pthreads

    I recently tried pthreads and was pleasantly surprised: it is an extension that gives PHP the ability to work with multiple real threads. No emulation, no magic, no fakes; everything is real.



    The problem I consider is this: there is a pool of tasks that need to be completed as quickly as possible. PHP has other tools for solving this problem; they are not covered here, because this article is about pthreads.


    It is worth noting that the author of the extension, Joe Watkins, warns in his articles that multithreading is never easy and that you need to be prepared for that.


    If that does not scare you, read on.


    What is pthreads


    Pthreads is an object-oriented API that provides a convenient way to organize multithreaded computing in PHP. The API includes all the tools needed to create multithreaded applications. PHP applications can create, read, write, execute, and synchronize threads using objects of the Thread, Worker, and Threaded classes.


    What's inside pthreads


    The hierarchy of the main classes just mentioned is shown in the diagram: Worker extends Thread, which in turn extends Threaded.


    Threaded is the foundation of pthreads: it makes it possible to run code in parallel and provides methods for synchronization along with other useful methods.


    Thread. You create a thread by inheriting from Thread and implementing the run() method. The run() method starts executing, in a separate thread, at the moment the start() method is called. A thread can be started only from the context that created it, and it can be joined only in that same context.


    Worker. Has persistent state, which in most cases is used by different threads. It is available while the object is in scope or until shutdown() is called explicitly.
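
    To make the Thread lifecycle described above concrete, here is a minimal sketch. It assumes a ZTS build of PHP with the pthreads extension loaded; SquareThread and its properties are hypothetical names invented for this illustration, and the guard only keeps the script from failing where the extension is missing.

```php
<?php
// Sketch only: needs a ZTS build of PHP with the pthreads extension.
// SquareThread is a hypothetical class name used for illustration.
if (class_exists('Thread')) {
    class SquareThread extends Thread
    {
        private $input;
        public $result;

        public function __construct($input)
        {
            $this->input = $input;
        }

        // Executed in a separate thread once start() has been called.
        public function run()
        {
            $this->result = $this->input * $this->input;
        }
    }

    $threads = [];
    foreach ([2, 3, 4] as $n) {
        $thread = new SquareThread($n);
        $thread->start();      // run() begins executing in a new thread
        $threads[] = $thread;
    }
    foreach ($threads as $thread) {
        $thread->join();       // wait for the thread in the context that created it
        echo $thread->result, PHP_EOL;
    }
} else {
    echo 'pthreads is not available in this PHP build', PHP_EOL;
}
```

    Both start() and join() are called from the script that created the threads, which matches the restriction described above.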


    In addition to these classes, there is also the Pool class. A Pool is a pool (container) of Workers that can be used to distribute Threaded objects among those Workers. Pool is the easiest and most efficient way to organize multiple threads.


    We will not dwell on the theory; let's try all of this on an example right away.


    Example


    Many different problems can be solved with several threads. I was interested in one specific and, it seems to me, very typical problem. Let me restate it: there is a pool of tasks, and they need to be completed quickly.


    So let's get started. First, create a data provider, MyDataProvider (a Threaded). There will be a single instance, shared by all threads.


    /**
     * Data provider for the threads
     */
    class MyDataProvider extends Threaded
    {
        /**
         * @var int How many items there are in our imaginary database
         */
        private $total = 2000000;
        /**
         * @var int How many items have been processed
         */
        private $processed = 0;
        /**
         * Move on to the next item and return it
         *
         * @return mixed
         */
        public function getNext()
        {
            if ($this->processed === $this->total) {
                return null;
            }
            $this->processed++;
            return $this->processed;
        }
    }

    Each thread will have its own MyWorker (a Worker), which stores a reference to the provider.


    /**
     * MyWorker is used here to share the provider between MyWork instances.
     */
    class MyWorker extends Worker
    {
        /**
         * @var MyDataProvider
         */
        private $provider;
        /**
         * @param MyDataProvider $provider
         */
        public function __construct(MyDataProvider $provider)
        {
            $this->provider = $provider;
        }
        /**
         * Called when submitted to the Pool.
         */
        public function run()
        {
            // Nothing to do here in this example
        }
        /**
         * Returns the provider
         *
         * @return MyDataProvider
         */
        public function getProvider()
        {
            return $this->provider;
        }
    }

    The actual processing of each task in the pool (let it be some resource-intensive operation), which is the bottleneck that made us turn to multithreading in the first place, will live in MyWork (a Threaded).


    /**
     * MyWork is a task that can be executed in parallel
     */
    class MyWork extends Threaded
    {
        public function run()
        {
            do {
                $value = null;
                $provider = $this->worker->getProvider();
                // Synchronize data retrieval
                $provider->synchronized(function($provider) use (&$value) {
                    $value = $provider->getNext();
                }, $provider);
                if ($value === null) {
                    continue;
                }
                // Some resource-intensive operation
                $count = 100;
                for ($j = 1; $j <= $count; $j++) {
                    sqrt($j+$value) + sin($value/$j) + cos($value);
                }
            }
            while ($value !== null);
        }
    }

    Note that data is fetched from the provider inside synchronized(). Otherwise, part of the data could be processed more than once, or part of it could be skipped.
    Now let's make it all work with Pool.


    require_once 'MyWorker.php';
    require_once 'MyWork.php';
    require_once 'MyDataProvider.php';
    $threads = 8;
    // Create the provider. This service could, for example, read some data
    // from a file or from a database
    $provider = new MyDataProvider();
    // Create the pool of workers
    $pool = new Pool($threads, 'MyWorker', [$provider]);
    $start = microtime(true);
    // In our case the threads are balanced,
    // so it makes sense to create as many tasks as there are workers in our pool.
    $workers = $threads;
    for ($i = 0; $i < $workers; $i++) {
        $pool->submit(new MyWork());
    }
    $pool->shutdown();
    printf("Done in %.2f seconds" . PHP_EOL, microtime(true) - $start);

    In my opinion it turned out rather elegant. I have put this example on GitHub.


    That's all! Well, almost. In fact, there is something that may disappoint the inquisitive reader: none of this works on standard PHP compiled with default options. To enjoy multithreading, your PHP must have ZTS (Zend Thread Safety) enabled.
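
    Before going further, it can help to check what kind of PHP binary you are actually running. The following is a minimal sketch that relies only on the built-in PHP_ZTS constant and extension_loaded(); the variable names are illustrative.

```php
<?php
// PHP_ZTS is a built-in constant: 1 for a thread-safe (ZTS) build, 0 otherwise.
$isZts = (bool) PHP_ZTS;
// extension_loaded() reports whether the pthreads extension is active.
$hasPthreads = extension_loaded('pthreads');

printf("ZTS build:       %s" . PHP_EOL, $isZts ? 'yes' : 'no');
printf("pthreads loaded: %s" . PHP_EOL, $hasPthreads ? 'yes' : 'no');

if (!$isZts) {
    echo 'This PHP binary cannot run real threads.' . PHP_EOL;
}
```

    Run it with each PHP binary installed on the machine, since they may be configured differently.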


    PHP setup


    The documentation says that PHP must be compiled with the --enable-maintainer-zts option. I did not try compiling it myself; instead I found a package for Debian and installed it.


    sudo add-apt-repository ppa:ondrej/php-zts
    sudo apt update
    sudo apt-get install php7.0-zts php7.0-zts-dev

    As a result, I still have the same PHP as before, launched from the console in the usual way with the php command; the web server uses it as well. In addition, there is now a second PHP that can be run from the console as php7.0-zts.


    After that, you can install the pthreads extension.


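    git clone https://github.com/krakjoe/pthreads.git
    cd pthreads
    # Use the phpize that belongs to the ZTS build; its exact name depends on the package
    phpize
    ./configure
    make -j8
    sudo make install
    echo "extension=pthreads.so" > pthreads.ini
    sudo cp pthreads.ini /etc/php/7.0-zts/cli/conf.d/pthreads.ini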

    Now that really is everything. Well... almost. Imagine that you have written multithreaded code, but PHP on your colleague's machine is not configured accordingly. Awkward, isn't it? But there is a way out.


    pthreads-polyfill


    Here again, thanks go to Joe Watkins, this time for the pthreads-polyfill package. The idea is this: the package contains the same classes as the pthreads extension, so your code can run even if the extension is not installed. The code will simply be executed in a single thread.
    To make this work, you just require the package through Composer and think no more about it. The package checks whether the extension is installed. If it is, the polyfill does nothing further. Otherwise, stub classes are loaded so that the code works in at least one thread.
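
    The idea behind such a fallback can be sketched as follows. This is not code from pthreads-polyfill: SequentialPool and PrintTask are hypothetical classes invented for this illustration. They only mimic the submit() and shutdown() calls used earlier in the article and run every task immediately in the calling thread.

```php
<?php
// Illustration only: hypothetical stand-in, not the polyfill's real source.
class SequentialPool
{
    private $completed = 0;

    // Runs the task right away instead of handing it to a worker thread.
    public function submit($task)
    {
        $task->run();
        $this->completed++;
    }

    // Nothing to wait for: every submitted task has already finished.
    public function shutdown()
    {
    }

    public function completed()
    {
        return $this->completed;
    }
}

// Hypothetical task exposing the same run() entry point as a Threaded object.
class PrintTask
{
    private $n;

    public function __construct($n)
    {
        $this->n = $n;
    }

    public function run()
    {
        echo "task {$this->n} done", PHP_EOL;
    }
}

// The kind of check a fallback layer would make before defining its classes.
echo extension_loaded('pthreads')
    ? 'pthreads is loaded: a fallback would step aside'
    : 'pthreads is missing: stand-in classes would be used', PHP_EOL;

$pool = new SequentialPool();
foreach ([1, 2, 3] as $n) {
    $pool->submit(new PrintTask($n));
}
$pool->shutdown();
```

    The calling code stays the same in both modes; only the degree of parallelism changes.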


    Check


    Let's now check whether the processing really happens in several threads and evaluate the benefit of this approach.
    I will change the value of $threads from the example above and see what happens.


    Information about the processor on which the tests were run


    $ lscpu
    CPU(s):                8
    Thread(s) per core:    2
    Core(s) per socket:    4
    Model name:            Intel(R) Core(TM) i7-4700HQ CPU @ 2.40GHz
    

    Let's look at the CPU core load charts. Everything is as expected.


    [Figure: CPU core load, $threads = 1]

    [Figure: CPU core load, $threads = 2]

    [Figure: CPU core load, $threads = 4]

    [Figure: CPU core load, $threads = 8]


    And now the most important part, the reason for all of this. Let's compare execution times.


    $threads  Note                       Execution time, seconds
    PHP without ZTS
    1         no pthreads, no polyfill   265.05
    1         polyfill                   298.26
    PHP with ZTS
    1         no pthreads, no polyfill   37.65
    1                                    68.58
    2                                    26.18
    3                                    16.87
    4                                    12.96
    5                                    12.57
    6                                    12.07
    7                                    11.78
    8                                    11.62

    The first two rows show that using the polyfill cost about 13% of performance in this example, compared with plain sequential code on bare PHP "without anything".


    Next, PHP with ZTS. Ignore the large difference in run time compared with PHP without ZTS (37.65 versus 265.05 seconds); I did not try to bring the PHP settings to a common baseline. For example, the non-ZTS PHP has XDebug enabled.


    As you can see, with 2 threads the program runs about 1.5 times faster than the sequential code, and with 4 threads about 3 times faster.
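
    These ratios can be reproduced directly from the measurements. The sketch below hard-codes the "PHP with ZTS" times from the table and divides the sequential baseline (37.65 seconds, no pthreads and no polyfill) by each threaded run; the variable names are illustrative.

```php
<?php
// Execution times in seconds for PHP with ZTS, copied from the table above.
$baseline = 37.65; // sequential run: no pthreads, no polyfill
$times = [
    1 => 68.58, 2 => 26.18, 3 => 16.87, 4 => 12.96,
    5 => 12.57, 6 => 12.07, 7 => 11.78, 8 => 11.62,
];

$speedups = [];
foreach ($times as $threads => $seconds) {
    // Speedup relative to the sequential baseline.
    $speedups[$threads] = $baseline / $seconds;
    printf("%d thread(s): %.2fx" . PHP_EOL, $threads, $speedups[$threads]);
}
```

    This gives roughly 1.44x for 2 threads, 2.91x for 4 threads and 3.24x for 8 threads. A single pthreads thread comes out slower than the baseline (below 1x), which suggests that the threading machinery itself has a noticeable cost in this example.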


    You may notice that although the processor exposes 8 logical CPUs, the execution time changed little beyond 4 threads. This appears to be because my processor has only 4 physical cores. For clarity, I plotted the table as a chart.



    Summary


    In PHP, quite elegant multithreading is possible using the pthreads extension. This gives a tangible performance boost.

