Looking for a fast, universal library for working with image files, with help from Google Benchmark



    Nowadays, when neural networks surf Big Data and artificial intelligence ponders whether it pays to take its salary in Bitcoin, the task that landed on my desk looked like an anachronism: find the fastest open-source cross-platform library for loading, saving and transcoding image files. In fact, the task is more relevant than ever. Computer vision and machine learning need to load gigabytes of pictures and sometimes save intermediate data as images, so doing this as fast as possible is highly desirable. In this article we will find such a library and, more importantly, get acquainted with a very useful tool that greatly simplifies this and many similar tasks: Google Benchmark.

    So, the exact statement of the problem: the application loads (that is, decodes into memory) jpeg and tiff files with 24-bit and 8-bit color depth, as well as 32-bit bmp files. Image sizes range from tiny (32x32 pixels) to large, with a resolution of 15K. The images are modified along the way and must then be saved to disk in the same formats. All of this should be done by an open-source cross-platform library with maximum performance on modern Intel processors that support AVX2 vector instructions. Support for the DirectX DXT1 compressed texture format is also desirable. The performance baseline is Windows Imaging Component (WIC), the standard framework for working with images in Windows; in other words, we need a library that is as fast as WIC or faster.

    But the most important requirement is that the solution is needed right now, or better yet, yesterday.

    Meet the libraries for working with bmp, tiff and jpeg


    The solution begins with an obvious and simple, though not very quick, step: a thorough search of GitHub, Stack Overflow and the rest of Google for suitable candidates. There turned out to be only a few:

    • FreeImage. A wrapper over the well-known LibJPEG, LibPNG and LibTIFF libraries. DXT1 support is available through a plugin. One drawback is that the API sets the jpeg save quality too coarsely: 100, 75, 50 or 25%. Changing this would mean digging into the code and editing it. The project is alive and developing: the latest version, 3.18.0, was released on July 31, 2018. Building under Windows is trivial; all components are built automatically.
    • Cimg. A header-only C++ wrapper over the venerable ImageMagick package. The package has to be built and installed separately, and it can also be used directly, bypassing Cimg. It offers a lot of image-processing features: filters, transformations, morphological operations and so on. Supports HDR, does not support DXT1.
    • DevIL (Developer's Image Library). A very simple library with an OpenGL-style C interface. It wraps LibJPEG, LibPNG and LibTIFF but also has extensive built-in functionality and supports many additional image formats, including DXT1. It is built with CMake. Most dependencies, including LibJPEG, LibPNG and LibTIFF, are not bundled with DevIL and must be downloaded and built separately. The latest update to DevIL's build system dates from January 2017, and the one before that from 2014, so if problems with the library arise, getting them fixed may be difficult.
    • OpenImageIO. Positioned as a tool for developers of professional imaging software. Through plug-ins it supports numerous exotic image formats and even video. Building for Windows requires precompiled Boost and Qt 4. No ready-made build is available for testing.
    • Boost GIL (Generic Image Library). It is Boost, and that says it all. Well, almost all: this library also contains a wrapper over LibJPEG, LibPNG and LibTIFF.
    • SDL_image 2.0. Used with the SDL library and, you will laugh, it also contains a wrapper over LibJPEG, LibPNG and LibTIFF.

    All the libraries found were compiled under Windows with the Visual Studio compiler at its maximum optimization level and with the /arch:AVX2 switch.

    The same applies to the LibJPEG, LibPNG and LibTIFF libraries, which, to save time, were taken from a recent OpenCV package.

    Meet Google Benchmark


    The next step is just as obvious: create a benchmark to compare the performance of the libraries found. Google Benchmark, a microbenchmarking library widely known in narrow circles, makes this quick and easy.
    Google Benchmark can accurately measure the performance of pieces of code that you place in the body of a C++ loop:

    #include <benchmark/benchmark.h>

    static void BM_foo1(benchmark::State& state) {
        // This part is not measured
        Init_your_code();
        for (auto _ : state) {
            // This part is measured
            your_code_to_benchmark();
        }
    }

    Functions written this way are registered as benchmarks:

    // Register the function above as a benchmark
    BENCHMARK(BM_foo1);
    

    And run them:

    BENCHMARK_MAIN();

    Google Benchmark then produces a report in the chosen format (console output, JSON or CSV), selected with the --benchmark_format command-line flag.

    The report contains information about the system it ran on (processor, cache configuration), the total wall-clock time of each measured function, and the CPU time it consumed. These two times generally differ: the first, for example, includes time spent waiting on reads and writes, while the second, for multi-threaded benchmarks, is the sum of the running time on all cores.
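
    The difference between the two columns can be reproduced without Google Benchmark at all. The following is only a sketch using the standard library: `measure` and `simulated_io_wait` are names made up for this illustration, and the sleep merely stands in for a blocking read or write.

```cpp
#include <cassert>
#include <chrono>
#include <ctime>
#include <thread>

// Wall-clock and processor time spent in one call, both in seconds.
struct Timing {
    double wall_s;
    double cpu_s;
};

// Times `work` once with a monotonic wall clock and with std::clock.
// Assumption: std::clock reports processor time, as it does on POSIX
// systems; the MSVC runtime reports elapsed time instead.
template <typename Work>
Timing measure(Work&& work) {
    const auto wall_start = std::chrono::steady_clock::now();
    const std::clock_t cpu_start = std::clock();
    assert(cpu_start != static_cast<std::clock_t>(-1));  // clock must be available
    work();
    const std::clock_t cpu_end = std::clock();
    const auto wall_end = std::chrono::steady_clock::now();
    return Timing{
        std::chrono::duration<double>(wall_end - wall_start).count(),
        static_cast<double>(cpu_end - cpu_start) / CLOCKS_PER_SEC};
}

// Hypothetical stand-in for a blocking disk access: the thread only waits.
inline void simulated_io_wait() {
    std::this_thread::sleep_for(std::chrono::milliseconds(100));
}
```

    Calling `measure(simulated_io_wait)` should give a wall time of about 0.1 s and a CPU time close to zero, because a waiting thread consumes almost no processor time. Rows where Time is much larger than CPU, such as the Cimg rows in the reports in this article, show the same pattern: the measured process spent most of the time waiting rather than computing.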

    The last value Google Benchmark reports is the number of iterations of the function needed for a statistically sound, accurate measurement of its running time. The framework chooses it automatically, based on preliminary measurements.

    What does “accurate measurement” of running time mean? One could write a dissertation on the topic, but here it is enough to say that:

    • by default, time is measured in processor clock ticks, so that is the theoretical order of accuracy. Results are reported in nanoseconds by default;
    • the results in all the tests I have seen are very stable from run to run;
    • according to my sources, Google Benchmark is used, and its results are fully trusted, by the developers of on-board computers at one of the world's largest automotive companies. So we trust it too.

    One point deserves attention: Google Benchmark does not flush the cache between benchmark iterations. If you need that, you have to take care of it yourself.
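
    One common way to do this by hand is to touch a buffer larger than the last-level cache before each measured call. The sketch below is an illustration only: `evict_cache` is a made-up helper, the 32 MiB default assumes an L3 cache of about 8 MB like the one on the test machine, and the 64-byte cache line is likewise an assumption. Eviction is likely but not guaranteed.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Writes to one byte per assumed 64-byte cache line of a scratch buffer
// that is larger than the last-level cache, so that data cached earlier
// is likely to be evicted. Returns a checksum so the compiler cannot
// drop the loop as dead code.
inline std::size_t evict_cache(std::size_t bytes = 32u * 1024u * 1024u) {
    assert(bytes > 0);
    static std::vector<unsigned char> scratch;  // reused between calls
    if (scratch.size() != bytes) {
        scratch.assign(bytes, 0);
    }
    std::size_t checksum = 0;
    for (std::size_t i = 0; i < scratch.size(); i += 64) {
        scratch[i] = static_cast<unsigned char>(scratch[i] + 1);
        checksum += scratch[i];
    }
    return checksum;
}
```

    Inside a benchmark loop the call would go between `state.PauseTiming()` and `state.ResumeTiming()` so that the flush itself is not measured. Pausing and resuming has a cost of its own, which matters little for benchmarks that run for milliseconds, as the ones here do.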

    But Google Benchmark can do many other things:

    • compute the asymptotic complexity of an algorithm (big O);
    • work correctly with multi-threaded benchmarks, measuring their duration in “real time” (wall clock) rather than in processor ticks;
    • use your own “manual” timing function, which is useful, for example, when measuring work on a GPU;
    • automatically generate benchmarks with different sets of arguments for a given measured function body;
    • show the mean, median and standard deviation over multiple benchmark runs;
    • set your own counters and labels, which appear in the Google Benchmark report.

    Google Benchmark is downloaded from its repository on GitHub and built for the target platform with CMake (which can generate Visual Studio projects on Windows). The resulting library is linked into your project (on Windows you must also link against shlwapi), the benchmark.h header is included in your code, and then everything works as described above.

    If it does not work, the only place besides the site already mentioned where you can get at least some information and help on Google Benchmark is the product's dedicated forum.

    In our case everything worked without problems. After talking with the customers, we settled on four benchmarks, each consisting of loading a file and saving it under a different name:

    • 8-bit jpeg file with a resolution of 15k
    • 24-bit jpeg file with 15k resolution
    • 24-bit tiff file with 15k resolution
    • 32-bit bmp file with 32x32 resolution

    Meet the results


    The original plan was for all the libraries found, that is FreeImage, Cimg, DevIL, OpenImageIO, Boost GIL and SDL_image 2.0, to be compared against Windows Imaging Component (WIC). But the last three, which depend on such “monsters” as Boost and SDL, were held in reserve in case the required library was not found among the first three. Fortunately, it was found. Though not immediately.

    Below is a report generated by Google benchmark, which shows that:

    • FreeImage loses to WIC by a crushing margin in every test, so it is out of consideration.
    • Cimg loses to WIC outright everywhere except loading tiff, where it is slightly (less than 5%) faster. Alas, it has to be struck off as well. The same goes for using the ImageMagick package directly.

    That leaves DevIL. It shows excellent results for bmp and tiff (3 and 2.8 times faster than WIC, respectively) and for black-and-white jpeg (1.75 times faster than WIC), but lags slightly on regular 24-bit jpeg, where it is about 3% slower than WIC.

    08/15/18 11:15:44
    Running c:\WIC\WIC_test\Release\WIC_test.exe
    Run on (8 X 4008 MHz CPU s)
    CPU Caches:
      L1 Data 32K (x4)
      L1 Instruction 32K (x4)
      L2 Unified 262K (x4)
      L3 Unified 8388K (x1)
    ------------------------------------------------------
    Benchmark                 Time        CPU   Iterations
    ------------------------------------------------------
    BM_WIC8jpeg              72 ms      70 ms           11
    BM_cimg8jpeg            562 ms      52 ms           10
    BM_FreeImage8jpeg       147 ms     144 ms            5
    BM_devIL8jpeg            41 ms      41 ms           17
    BM_WIC24jpeg            266 ms     260 ms            3
    BM_cimg24jpeg           656 ms     128 ms            6
    BM_FreeImage24jpeg      594 ms     594 ms            1
    BM_devIL24jpeg          276 ms     276 ms            3
    BM_WIC24tiff            844 ms     844 ms            1
    BM_cimg24tiff           808 ms     131 ms            5
    BM_FreeImage24tiff      953 ms     938 ms            1
    BM_devIL24tiff          305 ms     305 ms            2
    BM_WIC32                  3 ms       3 ms          236
    BM_cimg32                71 ms       7 ms           90
    BM_FreeImage32            6 ms       5 ms          112
    BM_devIL32                1 ms       1 ms          747

    Of course, DevIL could have been rejected at this stage as well, but here another library enters the scene: Libjpeg-turbo.

    Its entrance deserves applause: Libjpeg-turbo is a cross-platform library that fully implements the libjpeg API and adds functionality of its own (for example, working with 32-bit buffers). On the x86 architecture it makes heavy use of vector instructions (SSE2, AVX2) and, according to its authors, is 2 to 6 times faster than libjpeg (!).

    So the next step is to build DevIL with Libjpeg-turbo instead of libjpeg. Libjpeg-turbo builds without problems in Visual Studio using CMake, and it starts working as part of DevIL almost immediately: the only change needed is to the single #define in a DevIL header that sets the libjpeg version.

    As a result, the Google benchmark report looks like this:

    ------------------------------------------------------
    Benchmark                 Time        CPU   Iterations
    ------------------------------------------------------
    BM_WIC8jpeg              72 ms      68 ms            9
    BM_cimg8jpeg            565 ms      39 ms           10
    BM_FreeImage8jpeg       148 ms     141 ms            5
    BM_devIL8jpeg            31 ms      31 ms           24
    BM_WIC24jpeg            269 ms     266 ms            2
    BM_cimg24jpeg           675 ms     131 ms            5
    BM_FreeImage24jpeg      604 ms     594 ms            1
    BM_devIL24jpeg          149 ms     150 ms            5
    BM_WIC24tiff            833 ms     828 ms            1
    BM_cimg24tiff           785 ms     138 ms            5
    BM_FreeImage24tiff      943 ms     938 ms            1
    BM_devIL24tiff          318 ms     320 ms            2
    BM_WIC32                  4 ms       3 ms          236
    BM_cimg32                74 ms       8 ms           56
    BM_FreeImage32            6 ms       5 ms          100
    BM_devIL32                1 ms       1 ms          747

    Of course, the jpeg improvement over libjpeg does not even reach a factor of two here, but that is to be expected: the speed advantage applies only to jpeg encoding and decoding, while the test also includes the overhead of reading and writing files.

    But it is clear that, on average, DevIL is faster than WIC by a factor of 2.3 for 8-bit jpeg, 1.8 for 24-bit jpeg, 2.7 for 24-bit tiff and 3.5 for 32-bit bmp.
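
    These factors can be checked against the Time column of the two reports. The sketch below assumes, since the text does not say so explicitly, that the jpeg figures come from the second report (with Libjpeg-turbo), while the tiff and bmp figures average the ratios from both reports. The helper names are invented for this example.

```cpp
#include <cassert>
#include <cmath>

// Wall-clock times in milliseconds, copied from the Time column of a report.
struct Pair {
    double wic_ms;
    double devil_ms;
};

// How many times faster DevIL is than WIC in a single run.
inline double speedup(const Pair& p) {
    assert(p.devil_ms > 0.0);
    return p.wic_ms / p.devil_ms;
}

// Mean of the speedups observed in two separate runs.
inline double mean_speedup(const Pair& first, const Pair& second) {
    return (speedup(first) + speedup(second)) / 2.0;
}

// Rounds to one decimal place, the precision used in the text.
inline double round1(double x) {
    return std::round(x * 10.0) / 10.0;
}
```

    For example, `round1(speedup(Pair{72.0, 31.0}))` gives 2.3 for 8-bit jpeg, and `round1(mean_speedup(Pair{844.0, 305.0}, Pair{833.0, 318.0}))` gives 2.7 for tiff. With times of only a few milliseconds, as in the bmp test, the rounding in the report dominates, so that factor is the least precise of the four.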

    Problem solved. The solution took exactly three pre-holiday summer working days. Of course, with a little more time a library with even more impressive results might have turned up, and with a lot more time I might have written the library I was looking for myself.

    But even what we have is impressive. So if you are looking for a fast, easy-to-use cross-platform library for working with image files, take a look at DevIL, and if you need quick and accurate comparative measurements of code, Google Benchmark is at your service.
