
A primer on image resizing methods
Why an image scaled with bicubic interpolation doesn't look the way it does in Photoshop. Why one program resizes quickly and another slowly, even though the result is the same. Which resize method is best for upscaling and which for downscaling. What filters do and how they differ.
Originally this was meant as an introduction to another article, but it dragged on and became a separate piece.

This man is sitting among the daisies to draw your attention to the article.
For a clear comparison, I will use two images of the same resolution, 1920×1280 (the first, the second), resized to 330×220, 1067×667, and 4800×3200. Under the illustrations I will note how many milliseconds the resize took at each resolution. The numbers are given only to convey the complexity of the algorithms, so the specific hardware and software they were obtained on is not that important.
Nearest neighbor
This is the most primitive and fastest method. For each pixel of the final image, the one pixel of the source image closest to its position (taking the scaling into account) is selected. This method produces a pixelated image when upscaling and a very grainy image when downscaling.

In general, the quality and performance of any downscaling method can be estimated by the ratio of the number of pixels involved in forming the final image to the number of pixels in the source image. The larger this ratio, the more likely the algorithm is to be better and slower. A ratio of one means that every pixel of the source image has contributed to the final one; for advanced methods it can even exceed one. If, for example, we reduce an image with the nearest-neighbor method by 3× on each side, this ratio is 1/9; that is, most of the source pixels are not taken into account at all.
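As an illustration, here is a minimal numpy sketch of this mapping (the function name and the use of numpy are mine; real implementations do the same thing in optimized native code):

```python
import numpy as np

def resize_nearest(src, new_w, new_h):
    """For every output pixel, pick the closest source pixel."""
    h, w = src.shape[:2]
    # Map the centers of the output pixels back into source coordinates
    xs = np.clip(((np.arange(new_w) + 0.5) * w / new_w).astype(int), 0, w - 1)
    ys = np.clip(((np.arange(new_h) + 0.5) * h / new_h).astype(int), 0, h - 1)
    return src[ys[:, None], xs[None, :]]
```

With a 3× reduction, these index arrays simply skip two out of every three source rows and columns, which is exactly the 1/9 ratio mentioned above.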






1920×1280 → 330×220 = 0.12 ms
1920×1280 → 1067×667 = 1.86 ms
1920×1280 → 4800×3200 = 22.5 ms
Theoretically, the speed depends only on the size of the final image. In practice, when downscaling, processor cache misses also contribute: the smaller the scale, the less of the data loaded into each cache line is actually used.
This method is rarely used for downscaling on purpose, because it gives very poor quality, although it can be useful for upscaling. Thanks to its speed and ease of implementation, it is available in every library and application that works with graphics.
Affine transformations
Affine transformations are a common method for distorting images. They allow rotating, scaling, and reflecting an image in a single operation. Therefore, in many applications and libraries that implement affine transformations, the image resizing function is just a wrapper that computes the coefficients for the transformation.
The principle of operation: for each point of the final image, a fixed set of points of the source is taken and interpolated according to their relative position and the selected filter. The number of points depends on the filter: for bilinear interpolation 2×2 source pixels are taken, for bicubic 4×4. This method gives a smooth image when upscaling, but when downscaling the result is very similar to nearest neighbor. Judge for yourself: theoretically, with a bicubic filter and a 3× reduction, the ratio of processed pixels to source pixels is 4²/3² ≈ 1.78. In practice the result is much worse: in existing implementations, the filter window and the interpolation function are not scaled along with the image, and the pixels closer to the edge of the window are taken with negative coefficients (in accordance with the function), i.e. they make no useful contribution to the final image. As a result, an image reduced with the bicubic filter differs from one reduced with the bilinear filter only in being even sharper. For a bilinear filter and a 3× reduction, the ratio of processed pixels to source pixels is 2²/3² ≈ 0.44, which is essentially no different from nearest neighbor. In fact, affine transformations cannot be used for reduction by more than 2×, and even at a 2× reduction they produce noticeable staircase artifacts along lines.
In theory, there should be implementations of affine transformations that scale the filter window and the filter itself according to the requested distortion, but I have not seen any in popular open-source libraries.
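For example, in Pillow a resize can indeed be written as a thin wrapper over the affine transform, which is roughly what such libraries do internally (the file name is hypothetical):

```python
from PIL import Image

im = Image.open('daisies.jpg')  # hypothetical 1920x1280 source
w, h = im.size
new_size = (1067, 667)
# Inverse mapping: output (x, y) samples the source at
# (a*x + b*y + c, d*x + e*y + f); for a resize it is a pure scale.
coeffs = (w / new_size[0], 0, 0,
          0, h / new_size[1], 0)
out = im.transform(new_size, Image.AFFINE, coeffs, Image.BICUBIC)
```

Note that the bicubic filter here keeps its fixed 4×4 window whatever the scale factor, which is exactly the problem described above.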






1920×1280 → 330×220 = 6.13 ms
1920×1280 → 1067×667 = 17.7 ms
1920×1280 → 4800×3200 = 869 ms
The running time is noticeably longer than nearest neighbor's and depends on the size of the final image and the window size of the selected filter. It is almost unaffected by cache misses, because the source pixels are used at least two at a time.
My humble opinion is that using this method for arbitrary image reduction is simply a bug, because the result is very poor and similar to nearest neighbor, while the method needs considerably more resources. Nevertheless, it is widely used in programs and libraries. The most surprising thing is that this method is used in all browsers for the canvas drawImage() method (a good example), although simpler methods are used when an image is simply displayed in an <img> element (except IE, which uses affine transformations in both cases). Besides that, this method is used in OpenCV, in the current version of the Python library Pillow (I hope to write about that separately), and in Paint.NET.

In addition, this is the method video cards use to render three-dimensional scenes. The difference is that video cards prepare a set of reduced versions (mip levels) of each texture in advance, and for the final rendering a level is selected such that the texture reduction is no more than 2×. In addition, to eliminate a sharp jump when the mip level changes (when a textured object approaches or recedes), linear interpolation between adjacent mip levels is used (this is trilinear filtering). Thus, to draw each pixel of a three-dimensional object, interpolation across 2³ pixels is needed. This gives an acceptable result for a fast-moving picture in time linear in the final resolution.
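A sketch of that level-selection logic, assuming scale is the texture reduction factor (the names are mine, not an actual graphics API):

```python
import math

def trilinear_levels(scale):
    """Pick the two mip levels to blend and the blend weight."""
    level = max(0.0, math.log2(scale))  # mip level 0 is the full-size texture
    lower = int(level)                  # reduced at most 2x relative to here
    return lower, lower + 1, level - lower
```

At scale = 3 this gives levels 1 and 2 with a blend weight of about 0.58; each screen pixel then interpolates 2×2 texels on each of the two levels, 2³ in total.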
Supersampling
It is with this method that the very mip levels mentioned above are created, and, greatly simplified, it is how full-screen anti-aliasing in games works. Its essence is to partition the source image by the grid of final pixels and sum all the source pixels falling within each final pixel in proportion to the area covered by that final pixel. When this method is used for upscaling, each pixel of the final image corresponds to exactly one pixel of the source, so the result for upscaling is the same as nearest neighbor.


Two variants of this method can be distinguished: with pixel borders rounded to the nearest integer number of pixels, and without. In the first case the algorithm becomes unsuitable for reductions of less than 3×, because one final pixel may get one source pixel while its neighbor gets four (2×2), which leads to local disproportion. At the same time, the rounding variant can certainly be used when the size of the source image is a multiple of the size of the final one, or when the reduction factor is large enough (the 330×220 versions are nearly indistinguishable). The ratio of processed pixels to source pixels with rounded borders is always exactly one.
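In the special case where the source size is an exact multiple of the final size, the rounded variant degenerates into plain block averaging; a numpy sketch (the function name is mine, and src is assumed to be an (h, w, channels) array):

```python
import numpy as np

def box_downscale(src, factor):
    """Average every factor-by-factor block of pixels (integer factor)."""
    h, w = src.shape[:2]
    assert h % factor == 0 and w % factor == 0
    blocks = src.reshape(h // factor, factor, w // factor, factor, -1)
    return blocks.mean(axis=(1, 3)).astype(src.dtype)
```

Every source pixel is counted exactly once, which matches the ratio of one stated above.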






1920×1280 → 330×220 = 7 ms
1920×1280 → 1067×667 = 15 ms
1920×1280 → 4800×3200 = 22.5 ms
The variant without rounding gives excellent quality at any reduction scale, while upscaling produces a strange effect in which most of each source pixel looks uniform in the final image but transitions are visible at the edges. The ratio of processed pixels to source pixels without rounded borders can range from one to four, because each source pixel contributes to either one final pixel, two neighboring ones, or four neighboring ones.






1920×1280 → 330×220 = 19 ms
1920×1280 → 1067×667 = 45 ms
1920×1280 → 4800×3200 = 112 ms
The performance of this method for downscaling is lower than that of affine transformations, because all source pixels participate in computing the final image. The version with rounding to the nearest borders is usually several times faster. It is also possible to create separate versions for fixed scale factors (for example, a 2× reduction), which will be faster still.
This method is used by the gdImageCopyResampled() function of the GD library, which ships with PHP; it is also found in OpenCV (the INTER_AREA flag), Intel IPP, and AMD Framewave. Libjpeg works in roughly the same way when it opens images at a several-fold reduced size. The latter allows many applications to open JPEG images pre-reduced several times without much overhead (in practice libjpeg even opens reduced images slightly faster than full-size ones), and then use other methods to resize to the exact dimensions. For example, if you want to resize a 1920×1280 JPEG to 330×220, you can open the original at 480×320 and then reduce it to the desired 330×220.
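In Pillow this two-stage trick looks roughly like this; draft() asks the JPEG decoder for the nearest cheap power-of-two reduction (the file names are hypothetical):

```python
from PIL import Image

im = Image.open('photo_1920x1280.jpg')       # hypothetical input
im.draft('RGB', (480, 320))                  # decoder yields ~480x320 almost for free
im = im.resize((330, 220), Image.ANTIALIAS)  # exact size with a quality filter
im.save('thumb.jpg')
```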
Convolution
This method is similar to affine transformations in that it uses filters, but the window is not fixed: it is proportional to the scale. For example, if the filter window size is 6 and the image is reduced 2.5×, then (2.5 × 6)² = 225 pixels participate in forming each pixel of the final image, far more than with supersampling (9 to 16). Fortunately, convolutions can be computed in two passes, first along one axis, then along the other, so the algorithmic complexity per pixel is not 225 but only (2.5 × 6) × 2 = 30. The contribution of each source pixel to the final one is determined precisely by the filter. The ratio of processed pixels to source pixels is determined entirely by the filter window size and equals its square: for a bilinear filter this ratio is 4, for bicubic 16, for Lanczos 36.
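Here is a minimal sketch of one pass of such a two-pass convolution, assuming a filter function and its support (half the window size) like the ones listed at the end of the article; scaling the window together with the scale factor is the key difference from affine transformations:

```python
import numpy as np

def convolve_pass(src, new_size, filter_fn, support):
    """Resample axis 0 of an (h, w, channels) float array to new_size rows."""
    old_size = src.shape[0]
    scale = old_size / new_size
    fscale = max(scale, 1.0)             # widen the filter when downscaling
    out = np.empty((new_size,) + src.shape[1:], dtype=np.float64)
    for i in range(new_size):
        center = (i + 0.5) * scale       # center of output pixel i, in source coords
        lo = max(int(center - support * fscale + 0.5), 0)
        hi = min(int(center + support * fscale + 0.5), old_size)
        weights = filter_fn((np.arange(lo, hi) + 0.5 - center) / fscale)
        weights /= weights.sum()
        out[i] = np.tensordot(weights, src[lo:hi], axes=1)
    return out
```

A full resize is one such pass over the rows, a transpose, a second pass, and a transpose back, which is where the (2.5 × 6) × 2 = 30 operations per pixel come from.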






1920×1280 → 330×220 = 76 ms
1920×1280 → 1067×667 = 160 ms
1920×1280 → 4800×3200 = 1540 ms
The speed of this method depends on all parameters: the size of the source image, the size of the final image, the size of the filter window.
This method is implemented in ImageMagick, GIMP, in the current version of Pillow with the ANTIALIAS flag.
One of the advantages of this method is that the filters can be defined by a separate function that is not tied to the method's implementation. The filter function itself can be fairly complex without much loss of performance, because the coefficients for all pixels in one column and for all pixels in one row are computed only once. That is, the filter function itself is called only (m + n) × w times, where m and n are the dimensions of the final image and w is the filter window size. And you can churn out a great many such functions, given a mathematical justification; in ImageMagick, for example, there are 15. Here is what the most popular ones look like:
Bilinear filter (bilinear or triangle in ImageMagick)
Bicubic filter (bicubic or catrom in ImageMagick)
Lanczos filter (Lanczos)
It is noteworthy that some filters have zones of negative coefficients (for example, the bicubic filter or the Lanczos filter). This is needed to give transitions in the final image the sharpness they had in the original.
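For reference, a sketch of these three filter functions (Catmull-Rom coefficients for the bicubic, as in ImageMagick's catrom); the negative lobes of the last two are exactly those zones:

```python
import numpy as np

def bilinear(x):   # support 1, never negative
    x = np.abs(x)
    return np.where(x < 1, 1 - x, 0)

def bicubic(x):    # Catmull-Rom, support 2, negative for 1 < |x| < 2
    x = np.abs(x)
    return np.where(x < 1, 1.5 * x**3 - 2.5 * x**2 + 1,
                    np.where(x < 2, -0.5 * x**3 + 2.5 * x**2 - 4 * x + 2, 0))

def lanczos(x, a=3):  # support a = 3, oscillates around zero
    return np.where(np.abs(x) < a, np.sinc(x) * np.sinc(x / a), 0)
```

These plug directly into the convolution sketch above with support 1, 2, and 3, i.e. window sizes 2, 4, and 6, matching the ratios 4, 16, and 36 given earlier.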