Resizing images at 180 per second

    Programmers at a major online auction shared how they solved the problem of batch-resizing photos from 1.5 MB down to 3 KB (after a redesign, it turned out that the old preview images did not fit the new page templates). The task is not as trivial as it seems: the site hosts more than 135 million images of various products.

    As a joke, they worked out how long the job would take by hand in Photoshop. At 40 seconds per photo, it comes to about 170 years of continuous work. Then they considered sending the batch to the EC2 cloud and estimated what that would come to. Having looked at the resulting figure, the programmers decided to look for another way.
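
    The arithmetic behind the manual estimate is easy to check. A minimal sketch in Python (used here only for illustration, since the team's own scripts were in Perl), assuming nonstop work and a 365-day year:

```python
# Back-of-the-envelope estimate of the manual effort, using the figures from the text.
TOTAL_IMAGES = 135_000_000
SECONDS_PER_IMAGE = 40
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # assumption: nonstop work, leap years ignored


def manual_years(images: int = TOTAL_IMAGES, seconds_each: int = SECONDS_PER_IMAGE) -> float:
    """Years of continuous work needed to process every image by hand."""
    return images * seconds_each / SECONDS_PER_YEAR


print(f"{manual_years():.0f} years")
```

    The result lands a little above 170 years, in line with the rounded figure quoted above.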

    In the end, they processed all 135 million photos in just 9 days on four 16-core servers, at an average speed of 180 images per second.

    They used three tools.
    1. GraphicsMagick, a fork of ImageMagick that delivers better performance and supports multiprocessing. Its flexible command-line options make it possible to fine-tune performance.

    2. Perl. Where would we be without it? It is not a mandatory tool, though: the team did not use the GraphicsMagick-Perl library and wrote all the commands by hand, so the same could be done in any other language.

    3. The Ganglia monitoring system, used to graph the process and show at a glance which stage is the bottleneck slowing the work down: image lookup, file copying, resizing, comparison with the original, or copying the results back.
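
    The idea of watching each stage separately can be illustrated without Ganglia. The sketch below is a hand-rolled stand-in, not the team's setup: it accumulates wall-clock time per stage so the slowest one stands out. The stage names and the `timed` and `slowest_stage` helpers are hypothetical, and the sleeps stand in for real work.

```python
import time
from collections import defaultdict
from contextlib import contextmanager

# Accumulated wall-clock seconds per pipeline stage (stage names are illustrative).
stage_seconds = defaultdict(float)


@contextmanager
def timed(stage: str):
    """Add the time spent inside the block to the running total for `stage`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        stage_seconds[stage] += time.perf_counter() - start


def slowest_stage() -> str:
    """Return the stage that has consumed the most time so far."""
    return max(stage_seconds, key=stage_seconds.get)


# Usage: wrap each step of the pipeline; the sleeps are placeholders for real work.
with timed("find"):
    time.sleep(0.05)
with timed("resize"):
    time.sleep(0.01)
print(slowest_stage())
```

    In a real pipeline these totals would be exported to a monitoring system rather than printed.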

    To configure GraphicsMagick, we first generated a test page with 200 images at varying degrees of compression and presented it to management, who chose the pictures of acceptable quality. It is very important that management sees no information on the page about each image's compression parameters, file size, and so on (even the file names must not hint at it); only then will their decision be completely impartial.
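
    One way to keep such a review blind is to publish the variants under opaque names and keep the mapping private. This is a minimal sketch of that idea, not the team's actual procedure; the `blind_names` helper and the example file names are hypothetical.

```python
import secrets


def blind_names(variants: list[str]) -> dict[str, str]:
    """Map each real file name to an opaque name that reveals nothing about it."""
    mapping: dict[str, str] = {}
    used: set[str] = set()
    for real_name in variants:
        opaque = secrets.token_hex(4) + ".jpg"
        while opaque in used:  # collisions are very unlikely, but names must be unique
            opaque = secrets.token_hex(4) + ".jpg"
        used.add(opaque)
        mapping[real_name] = opaque
    return mapping


# Hypothetical variant names that encode the settings under test.
variants = ["photo_q60.jpg", "photo_q75.jpg", "photo_q90.jpg"]
key = blind_names(variants)        # keep this mapping away from the reviewers
page_order = sorted(key.values())  # ordering by random name also hides the original order
```

    After the review, the private mapping translates the chosen pictures back into the settings that produced them.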

    After that, all the filters in the GraphicsMagick kit were benchmarked against each other to determine which one gives even slightly better performance.
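
    A comparison like this boils down to timing the same resize once per filter. The harness below is a sketch under assumptions: the `time_filters` and `run_gm` helpers, the file paths and the choice of three filter names are illustrative, and the source does not list which filters were compared or how they scored.

```python
import subprocess
import time
from typing import Callable, Iterable


def time_filters(filters: Iterable[str], run: Callable[[str], None],
                 repeats: int = 3) -> dict[str, float]:
    """Return the best wall-clock time in seconds per filter over `repeats` runs."""
    results: dict[str, float] = {}
    for name in filters:
        best = float("inf")
        for _ in range(repeats):
            start = time.perf_counter()
            run(name)
            best = min(best, time.perf_counter() - start)
        results[name] = best
    return results


def run_gm(filter_name: str) -> None:
    """Resize one sample image with the given filter (paths are hypothetical)."""
    subprocess.run(
        ["gm", "convert", "sample.jpg", "-filter", filter_name,
         "-resize", "170x135", "out.jpg"],
        check=True,
    )


# With GraphicsMagick installed and a sample image in place:
# print(time_filters(["Box", "Mitchell", "Lanczos"], run_gm))
```

    Taking the best of several runs reduces noise from disk caching and other processes on the machine.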

    The resulting images had to be 170 x 135 pixels. Testing showed that resizing an already reduced image gives better quality and higher speed than resizing the full-size original directly. The author of GraphicsMagick confirmed this and suggested using the thumbnail request feature, which the JPEG format itself supports.
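
    In GraphicsMagick, a size hint given before the input file lets the JPEG decoder scale the image down while reading it. Whether this is exactly the option the author recommended is an assumption; the sketch below builds such a command as an argument list, with a hypothetical `thumbnail_command` helper and hypothetical paths.

```python
def thumbnail_command(src: str, dst: str, width: int = 170, height: int = 135) -> list[str]:
    """Build a `gm convert` argument list that hints the target size to the JPEG decoder."""
    geometry = f"{width}x{height}"
    return [
        "gm", "convert",
        "-size", geometry,    # hint before the input: the decoder may downscale while reading
        src,
        "-resize", geometry,  # final resize; fits inside the box, keeping the aspect ratio
        "+profile", "*",      # strip embedded profiles to keep the file small
        dst,
    ]


# With GraphicsMagick installed, the command could be run like this:
# import subprocess
# subprocess.run(thumbnail_command("original.jpg", "thumb.jpg"), check=True)
print(" ".join(thumbnail_command("original.jpg", "thumb.jpg")))
```

    Building the command as a list rather than a shell string avoids quoting problems with unusual file names.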

    After that, the program was tested on real servers, and it turned out that the bottleneck was not the CPU at all but the file system (NFS seek time). In fact, the CPU was only 1% loaded. The script had to be rewritten to start parallel processes for locating the photos, which raised throughput significantly, to 15 images per second. That result was still not good enough, because at that speed the whole job would take 104 days.
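
    The point of parallel lookups is that slow filesystem requests overlap instead of queuing one after another. The original script used separate processes in Perl; the sketch below swaps in a Python thread pool to show the same idea, and the `find_existing` helper and worker count are assumptions.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def find_existing(paths: list[str], workers: int = 32) -> list[str]:
    """Check many paths concurrently so that slow filesystem lookups overlap."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        found = pool.map(os.path.isfile, paths)  # results come back in input order
        return [path for path, ok in zip(paths, found) if ok]
```

    Threads are enough here because the work is waiting on I/O rather than computing; the CPU-heavy resizing is a separate step.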

    We decided to use a 16-core Nehalem server, but it turned out that GraphicsMagick splits each task into 16 parts and then reassembles them. In other words, it does redundant work for each small task, and as a result the resizing speed dropped to 10 images per second. Changing the settings corrected the situation.
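
    The text does not say which setting was changed. One plausible way to cap the threads used by each GraphicsMagick process is the standard OpenMP variable `OMP_NUM_THREADS`; treat that choice, and the `gm_environment` helper, as assumptions in this sketch.

```python
import os
from typing import Mapping, Optional


def gm_environment(threads: int, base: Optional[Mapping[str, str]] = None) -> dict[str, str]:
    """Copy the environment and cap the OpenMP threads available to a child process."""
    env = dict(os.environ if base is None else base)
    env["OMP_NUM_THREADS"] = str(threads)  # assumption: the gm build uses OpenMP
    return env


# The result would be passed as `env=` when launching gm, for example:
# subprocess.run(["gm", "convert", ...], env=gm_environment(2), check=True)
print(gm_environment(2, base={})["OMP_NUM_THREADS"])
```

    Setting the limit per child process leaves the parent script free to decide how many children run at once.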

    After that, we ran another important test to determine the optimal ratio of Perl child processes (children) to GraphicsMagick threads (threads). The results are shown in the table.

    It turned out that the highest performance is achieved with two processor cores per task. In theory, that gives 140 images per second on one server, or an estimated 11 days for the whole job. That was what we needed.
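
    Both the search space and the time estimates are simple to reproduce. The sketch below lists the ways to split 16 cores evenly between child processes and threads, and converts a throughput into days for 135 million images. That the test covered exactly these even splits is an assumption; the `splits` and `days_needed` helpers are hypothetical.

```python
CORES = 16
TOTAL_IMAGES = 135_000_000
SECONDS_PER_DAY = 86_400


def splits(cores: int = CORES) -> list[tuple[int, int]]:
    """All (children, threads_per_child) pairs that use every core exactly once."""
    return [(cores // t, t) for t in range(1, cores + 1) if cores % t == 0]


def days_needed(images_per_second: float, total: int = TOTAL_IMAGES) -> float:
    """Days required to process `total` images at a steady rate."""
    return total / images_per_second / SECONDS_PER_DAY


print(splits())                     # [(16, 1), (8, 2), (4, 4), (2, 8), (1, 16)]
print(round(days_needed(140), 1))   # 11.2
print(round(days_needed(15), 1))    # 104.2
```

    Two cores per task corresponds to the `(8, 2)` entry: eight children with two threads each.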

    After that, the process was launched on four 16-core Nehalem servers. In practice the speed was not as high, because NFS again slowed everything down, but together the four servers steadily delivered about 180 images per second.
