eao197 August 17, 2018 at 12:08

We make Shrimp even more useful: add image transcoding to other formats

Since the beginning of 2017, our small team has been developing the RESTinio OpenSource library for embedding an HTTP server in C ++ applications. To our great surprise, from time to time we get questions from the category “And why might you need an embedded HTTP server in C ++?” Unfortunately, to answer simple questions is the most difficult. Sometimes the best answer is sample code.

A couple of months ago we started a small demo project, Shrimp , which clearly demonstrates a typical scenario, under which our library is “sharpened”. The demo project is a simple Web service that receives requests for scaling images stored on the server and which returns a picture of the size the user needs.

This demo project is good in that, firstly, it requires integration with a long time ago written and working correctly code in C or C ++ (in this case, ImageMagick). Therefore, it should be clear why it makes sense to embed the HTTP server in a C ++ application.

And secondly, in this case, asynchronous request processing is required so that the HTTP server does not block while the image is being scaled (and this can take hundreds of milliseconds or even seconds). And we started the development of RESTinio precisely because we could not find a sane C ++ embedded server focused specifically on asynchronous request processing.

We built the work on Shrimp iteratively: it was first done and describedthe simplest version that only scaled pictures. Then we fixed a number of shortcomings of the first version and described this in the second article . Finally, we got around to expand the functionality of Shrimp once again: the conversion of images from one format to another was added. About how this was done and will be discussed in this article.

Target-format support

So, in the next version of Shrimp, we added the ability to give a scaled picture in a different format. So, if you issue a Shrimp request of the form:

curl "http://localhost:8080/my_picture.jpg?op=resize&max=1920"

then Shrimp will render the image in the same JPG format as the original image.

But if you add the target-format parameter to the URL, then Shrimp converts the image to the specified target format. For instance:

curl "http://localhost:8080/my_picture.jpg?op=resize&max=1920&target-format=webp"

In this case, Shrimp will render the image in webp format.

Updated Shrimp supports five image formats: jpg, png, gif, webp and heic (also known as HEIF). You can experiment with different formats on a special web page :

(on this page there is no way to select the heic format, because ordinary desktop browsers do not support this format by default).

In order to support the target-format in Shrimp, it was required to slightly modify the Shrimp code (which we ourselves were surprised, because there really were few changes). But on the other hand, I had to play with ImageMagick's assembly, which we were even more surprised with, as Earlier, we had to deal with this kitchen, by a fortunate coincidence. But let's talk about everything in order.

ImageMagick must understand different formats

ImageMagick uses external libraries for encoding / decoding images: libjpeg, libpng, libgif, etc. These libraries must be installed on the system before ImageMagick is configured and built.

The same thing should happen in order for ImageMagick to support webp and heic formats: first you need to build and install libwebp and libheif, then configure and install ImageMagick. And if everything is simple with libwebp, then around libheif I had to dance with a tambourine. Although after some time, after everything had finally gathered and worked, it was already not clear: why did you have to resort to a tambourine, everything seems to be trivial? ;)

In general, if someone wants to make friends with heic and ImageMagick, you will have to install:

It is in this order (you may have to install nasm so that x265 works at maximum speed). Then, when issuing the ./configure command, ImageMagick will be able to find everything it needs to support .heic files.

Support for target-format in the query string of incoming requests

After we made ImageMagick friends with the webp and heic formats, it's time to modify the Shrimp code. First of all, we need to learn how to recognize the target-format argument in incoming HTTP requests.

From the point of view of RESTinio, this is not a problem at all. Well, another argument appeared in the query string, so what? But from the point of view of Shrimp, the situation turned out to be somewhat more complicated, so the code of the function that was responsible for parsing the HTTP request became more complicated.

The fact is that before it was necessary to distinguish only two situations:

came a request of the form "/filename.ext" without any other parameters. So you just need to give the file "filename.ext" as it is;
A request came in of the form "/filename.ext?op=resize & ...". In this case, you need to scale the image from the file "filename.ext".

But after adding target-format, we need to distinguish between four situations:

came a request of the form "/filename.ext" without any other parameters. So you just need to give the file "filename.ext" as it is, without scaling and without transcoding to another format;
came a request of the form "/filename.ext?target-format=fmt" without any other parameters. It means to take an image from the file “filename.ext” and transcode it into the format “fmt” while preserving the original sizes;
a request came of the form "/filename.ext?op=resize & ..." but without target-format. In this case, you need to scale the image from the file “filename.ext” and give it in the original format;
A request came of the form "/filename.ext?op=resize&...&target-format=fmt". In this case, you need to perform scaling, and then transcode the result to the “fmt” format.

As a result, the function for determining the query parameters took the following form :

void
add_transform_op_handler(
   const app_params_t & app_params,
   http_req_router_t & router,
   so_5::mbox_t req_handler_mbox )
{
   router.http_get(
      R"(/:path(.*)\.:ext(.{3,4}))",
         restinio::path2regex::options_t{}.strict( true ),
         [req_handler_mbox, &app_params]( auto req, auto params )
         {
            if( has_illegal_path_components( req->header().path() ) )
            {
               // Задан недопустимый путь к файлу.
               return do_400_response( std::move( req ) );
            }
            // Разбираем параметры запроса.
            const auto qp = restinio::parse_query( req->header().query() );
            const auto target_format = qp.get_param( "target-format"sv );
            // Пытаемся определить в каком формате отдать итоговое
            // изображение. Если задан target-format, то именно этот
            // аргумент определяет формат. Если же target-format не
            // задан, то используется исходный формат, заданный
            // расширением файла с изображением.
            const auto image_format = try_detect_target_image_format(
                  params[ "ext" ],
                  target_format );
            if( !image_format )
            {
               // Не смогли разобраться с форматом. Запрос не обслуживаем.
               return do_400_response( std::move( req ) );
            }
            if( !qp.size() )
            {
               // Нет никаких дополнительных аргументов, отдаем картинку как есть.
               return serve_as_regular_file(
                     app_params.m_storage.m_root_dir,
                     std::move( req ),
                     *image_format );
            }
            const auto operation = qp.get_param( "op"sv );
            if( operation && "resize"sv != *operation )
            {
               // Задана операция над изображением, но эта операция не resize.
               return do_400_response( std::move( req ) );
            }
            if( !operation && !target_format )
            {
               // В запросе должно быть либо op=resize,
               // либо же target-format=something.
               return do_400_response( std::move( req ) );
            }
            handle_resize_op_request(
                  req_handler_mbox,
                  *image_format,
                  qp,
                  std::move( req ) );
            return restinio::request_accepted();
   } );
}

In the previous version of Shrimp, where you did not need to transcode the image, working with the request parameters looked a little easier .

Request queue and image cache tailored to target-format

The next point in the implementation of target-format support was the work on the queue of waiting requests and a cache of ready-made images in the a_transform_manager agent. We talked about these things in more detail in the previous article , but let’s slightly remind you of what it was about.

When a request for image conversion arrives, it may turn out that the finished image with such parameters is already in the cache. In this case, you do not need to do anything, just send the image from the cache in response. If the picture needs to be transformed, then it may turn out that there are no free workers at the moment and you need to wait until it appears. To do this, request information must be queued. But at the same time, it is necessary to check the uniqueness of the requests - if we have three identical requests waiting for processing (i.e., we need to convert the same image in the same way), then we should only process the image once and give the result of the processing to these three requests. Those. In the waiting queue, identical requests must be grouped.

Earlier in Shrimp, we used a simple composite key to search the image cache and the wait queue: a combination of the original file name + image resizing options . Now, two new factors had to be taken into account:

firstly, the target image format (i.e., the original image can be in jpg, and the resulting image can be in png);
secondly, the fact that scaling the picture may not be necessary. This happens in a situation where the client orders only the conversion of the image from one format to another, but with the original size of the image preserved.

I must say that here we went along the simplest path, without trying to somehow optimize anything. For example, one could try to make two caches: one would store images in the original format, but scaled to the desired size, and in the second, the scaled images converted to the target format.

Why would such double caching be needed? The fact is that when transforming pictures, the two most expensive operations in time are resizing and serializing the picture to the target format. Therefore, if we received a request to scale the example.jpg image to a size of 1920 in width and transform it into webp format, then we could store two images in our memory: example_1920px_width.jpg and example_1920px_width.webp. We would give an example_1920px_width.webp picture when we received a second request. But the example_1920px_width.jpg picture could be used when receiving requests for scaling example.jpg to a size of 1920 in width and transforming it into heic format. We could skip the resizing operation and only do the format conversion (i.e. the finished image example_1920px_width.

Another potential opportunity: when a request comes to transcode an image to another format without resizing, you can determine the actual size of the image and use this size inside the composite key. For example, let example.jpg have a size of 3000x2000 pixels. If we next receive a request for scaling example.jpg to 2000px in height, then we can immediately determine that we already have a picture in this size.

In theory, all these considerations deserve attention. But from a practical point of view, it is not clear how high the likelihood of such a development of events is. Those. how often will we receive a request for scaling example.jpg to 1920px with conversion to webp, and then a request for the same scaling of the same image, but with conversion to png? Having no real statistics is hard to say. Therefore, we decided not to complicate our lives in our demo project, but to go first along the simplest path. With the expectation that if someone needs more advanced caching schemes, then this can be added later, starting from real, not fictitious, scenarios for using Shrimp.

As a result, in the updated version of Shrimp, we slightly expanded the key, adding to it also such a parameter as the target format:

class resize_request_key_t
{
   std::string m_path;
   image_format_t m_format;
   resize_params_t m_params;
public:
   resize_request_key_t(
      std::string path,
      image_format_t format,
      resize_params_t params )
      :  m_path{ std::move(path) }
      ,  m_format{ format }
      ,  m_params{ params }
   {}
   [[nodiscard]] bool
   operator<(const resize_request_key_t & o ) const noexcept
   {
      return std::tie( m_path, m_format, m_params )
            < std::tie( o.m_path, o.m_format, o.m_params );
   }
   [[nodiscard]] const std::string &
   path() const noexcept
   {
      return m_path;
   }
   [[nodiscard]] image_format_t
   format() const noexcept
   {
      return m_format;
   }
   [[nodiscard]] resize_params_t
   params() const noexcept
   {
      return m_params;
   }
};

Those. request for resizing example.jpg up to 1920px with conversion to png differs from the same resizing, but with conversion to webp or heic.

But the main focus is hiding in the new implementation of the resize_params_t class , which determines the new sizes of the scaled image. Previously, this class supported three options: only the width was set, only the height was set, or the long side was set (the height or width is determined by the current image size). Accordingly, the resize_params_t :: value () method always returned some real value (what value was determined by the resize_params_t :: mode () method ).

But in the new Shrimp, another mode was added - keep_original, which means that scaling is not performed and the picture is rendered in its original size. To support this mode, resize_params_t had to make some changes. Firstly, now the resize_params_t :: make () method determines whether the keep_original mode is used (it is believed that this mode is used if none of the width, height and max parameters in the query string of the incoming request are specified). This allowed us not to rewrite the handle_resize_op_request () function , which pushes the request for scaling the picture to be executed.

Secondly, the resize_params_t :: value () method can now be called not always, but only when the scaling mode differs from keep_original.

But the most important thing is that resize_params_t :: operator <() continued to work as it intended.

Thanks to all these changes in the a_transform_manager, both the scaled image cache and the queue of waiting requests have remained the same. But now, information about various queries is stored in these data structures. So, the key {"example.jpg", "jpg", keep_original} will differ both from the key {"example.jpg", "png", keep_original}, and from the key {"example.jpg", "jpg", width = 1920px}.

It turned out that having spoiled a bit with the definition of such simple data structures as resize_params_t and resize_params_key_t, we avoided altering more complex structures such as the cache of resulting images and the queue of waiting requests.

Support for target-format in a_transformer

Well, the final step in supporting target-format is to expand the logic of the a_transformer agent so that the picture, possibly already scaled, is then converted to the target format.

It turned out to be the easiest to do this, all that was needed was to expand the code of the a_transform_t :: handle_resize_request () method :

[[nodiscard]]
a_transform_manager_t::resize_result_t::result_t
a_transformer_t::handle_resize_request(
   const transform::resize_request_key_t & key )
{
   try
   {
      m_logger->trace( "transformation started; request_key={}", key );
      auto image = load_image( key.path() );
      const auto resize_duration = measure_duration( [&]{
            // Реальный ресайзинг требуется только если режим
            // масштабирования отличается от keep_original.
            if( transform::resize_params_t::mode_t::keep_original !=
                  key.params().mode() )
            {
               transform::resize(
                     key.params(),
                     total_pixel_count,
                     image );
            }
         } );
      m_logger->debug( "resize finished; request_key={}, time={}ms",
            key,
            std::chrono::duration_cast(
                  resize_duration).count() );
      image.magick( magick_from_image_format( key.format() ) );
      datasizable_blob_shared_ptr_t blob;
      const auto serialize_duration = measure_duration( [&] {
               blob = make_blob( image );
            } );
      m_logger->debug( "serialization finished; request_key={}, time={}ms",
            key,
            std::chrono::duration_cast(
                  serialize_duration).count() );
      return a_transform_manager_t::successful_resize_t{
            std::move(blob),
            std::chrono::duration_cast(
                  resize_duration),
            std::chrono::duration_cast(
                  serialize_duration) };
   }
   catch( const std::exception & x )
   {
      return a_transform_manager_t::failed_resize_t{ x.what() };
   }
}

Compared with the previous version, there are two fundamental additions.

Firstly, calling the truly magic image.magick () method after resizing. This method tells ImageMagick the resulting image format. At the same time, the representation of the image in the memory does not change - ImageMagick continues to store it as it suits it. But then the value set by the magick () method will be taken into account during the subsequent call to Image :: write ().

Secondly, the updated version records the time it takes to serialize the image to the specified format. The new version of Shrimp now separately fixes the time spent on scaling, and the time spent on converting to the target format.

The rest of the agent a_transformer_t has not undergone any changes.

ImageMagick Parallelization

By default, ImageMagic is built with OpenMP support. Those. it is possible to parallelize operations on images that ImageMagick performs. You can control the number of workflows that ImageMagick uses in this case using the environment variable MAGICK_THREAD_LIMIT.

For example, on my test machine with the value MAGICK_THREAD_LIMIT = 1 (i.e., without real parallelization), I get the following results:

curl "http://localhost:8080/DSC08084.jpg?op=resize&max=2400" -v > /dev/null
> GET /DSC08084.jpg?op=resize&max=2400 HTTP/1.1
> Host: localhost:8080
> User-Agent: curl/7.47.0
> Accept: */*
> 
< HTTP/1.1 200 OK
< Connection: keep-alive
< Content-Length: 2043917
< Server: Shrimp draft server
< Date: Wed, 15 Aug 2018 11:51:24 GMT
< Last-Modified: Wed, 15 Aug 2018 11:51:24 GMT
< Access-Control-Allow-Origin: *
< Access-Control-Expose-Headers: Shrimp-Processing-Time, Shrimp-Resize-Time, Shrimp-Encoding-Time, Shrimp-Image-Src
< Content-Type: image/jpeg
< Shrimp-Image-Src: transform
< Shrimp-Processing-Time: 1323
< Shrimp-Resize-Time: 1086.72
< Shrimp-Encoding-Time: 236.276

The time spent on resizing is indicated in the Shrimp-Resize-Time header. In this case, it is 1086.72ms.

But if you set MAGICK_THREAD_LIMIT = 3 on the same machine and run Shrimp, then we get different values:

curl "http://localhost:8080/DSC08084.jpg?op=resize&max=2400" -v > /dev/null
> GET /DSC08084.jpg?op=resize&max=2400 HTTP/1.1
> Host: localhost:8080
> User-Agent: curl/7.47.0
> Accept: */*
> 
< HTTP/1.1 200 OK
< Connection: keep-alive
< Content-Length: 2043917
< Server: Shrimp draft server
< Date: Wed, 15 Aug 2018 11:53:49 GMT
< Last-Modified: Wed, 15 Aug 2018 11:53:49 GMT
< Access-Control-Allow-Origin: *
< Access-Control-Expose-Headers: Shrimp-Processing-Time, Shrimp-Resize-Time, Shrimp-Encoding-Time, Shrimp-Image-Src
< Content-Type: image/jpeg
< Shrimp-Image-Src: transform
< Shrimp-Processing-Time: 779.901
< Shrimp-Resize-Time: 558.246
< Shrimp-Encoding-Time: 221.655

Those. resizing time was reduced to 558.25ms.

Accordingly, since ImageMagick provides the ability to parallelize computations, you can use this opportunity. But at the same time, it is desirable to be able to control how many working threads Shrimp takes for itself. In previous versions of Shrimp, it was not possible to influence how many workflows Shrimp creates. And in the updated version of Shrimp, this can be done. Or through environment variables, for example:

SHRIMP_IO_THREADS=1 \
SHRIMP_WORKER_THREADS=3 \
MAGICK_THREAD_LIMIT=4 \
shrimp.app -p 8080 -i ...

Or through command line arguments, for example:

MAGICK_THREAD_LIMIT=4 \
shrimp.app -p 8080 -i ... --io-threads 1 --worker-threads 4

Values specified through the command line have a higher priority.

It should be emphasized that MAGICK_THREAD_LIMIT affects only those operations that ImageMagick performs itself. For example, resizing is done by ImageMagick. But the conversion from one format to another ImageMagick delegates to external libraries. And how operations in these external libraries are parallelized is a separate issue that we did not understand.

Conclusion

Perhaps, in this version of Shrimp, we brought our demo project to an acceptable state. Those who want to see and experiment can find the source texts of Shrimp on BitBucket or GitHub . You can also find the Dockerfile there to build Shrimp for your experiments.

In general, we have achieved our goals that we set ourselves by starting this demo project. A number of ideas appeared for the further development of both RESTinio and SObjectizer, and some of them have already found their embodiment. Therefore, whether Shrimp will develop somewhere further completely depends on questions and wishes. If there are, then Shrimp can expand. If not, then Shrimp will remain a demo project and a training ground for experimenting with new versions of RESTinio and SObjectizer.

In conclusion, I would like to express a special thanks to aensidhe for their help and advice, without which our dances with a tambourine would be much longer and sad.

Tags: