Digital image stabilization from stationary cameras: a correlation approach
I decided to write this article after reading “Massively-parallel image stabilization”, which describes an algorithm for stabilizing images from PTZ cameras. The fact is that some time ago I implemented an algorithm for stabilizing images from stationary cameras, which is used in the MagicBox IP video server and some other products of Sinesis, the company where I currently work. The algorithm turned out to be quite successful in terms of speed. In particular, it very efficiently finds the displacement of the current frame relative to the background. This efficiency made it possible to reuse its main elements (with some modifications, of course) for object tracking, as well as for checking object immobility.
The stabilization algorithm includes the following basic elements: detecting the offset of the current frame, compensating for this offset, and periodically updating the background against which stabilization is performed. Below I will describe each of them in detail.
Fig. 1 Image stabilization is sometimes very useful.
Current frame offset detection
The correlation approach to determining the offset can be briefly described as follows:
1) The central part of the background image is taken. The margin width is determined by the maximum offset we want to be able to detect. The central part should not be too small, otherwise the correlation function (see below) will not have enough data for stable operation.
2) In the current frame, a part of the same size is selected, shifted relative to the center of the picture.
3) For each offset, a metric is calculated that describes the correlation between the central part of the background and the current frame. For example, the sum of squared differences over all points of the two images, or the sum of absolute differences, can be used.
4) The offset at which the correlation is maximal (i.e., the sum of squared differences or the sum of absolute differences is minimal) is the desired offset.
Fig. 2 Offset of the current frame relative to the background.
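The exhaustive search described above can be sketched as follows (a minimal illustration, not the article's optimized implementation; all names and the grayscale row-major layout are my own assumptions):

```cpp
#include <cstdint>
#include <cstdlib>
#include <vector>

// The "correlation" metric here is the sum of absolute differences (SAD):
// the lower the SAD, the higher the correlation.
struct Offset { int dx, dy; };

// SAD between the central part of the background and the same-sized part of
// the current frame shifted by (dx, dy). Images are grayscale, row-major.
static uint64_t Sad(const std::vector<uint8_t>& bkg, const std::vector<uint8_t>& cur,
                    int width, int height, int margin, int dx, int dy)
{
    uint64_t sum = 0;
    for (int y = margin; y < height - margin; ++y)
        for (int x = margin; x < width - margin; ++x)
            sum += std::abs(int(bkg[y * width + x]) - int(cur[(y + dy) * width + (x + dx)]));
    return sum;
}

// Try every offset in [-margin, margin] and keep the one with minimal SAD.
Offset FindOffset(const std::vector<uint8_t>& bkg, const std::vector<uint8_t>& cur,
                  int width, int height, int margin)
{
    Offset best{0, 0};
    uint64_t bestSad = UINT64_MAX;
    for (int dy = -margin; dy <= margin; ++dy)
        for (int dx = -margin; dx <= margin; ++dx)
        {
            uint64_t sad = Sad(bkg, cur, width, height, margin, dx, dy);
            if (sad < bestSad)
            {
                bestSad = sad;
                best = {dx, dy};
            }
        }
    return best;
}
```

This is exactly the naive variant whose poor complexity is discussed next.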
Naturally, if this approach is applied naively, the speed of the algorithm will be catastrophically low, even though the correlation functions themselves can be computed very quickly. This is not surprising, since we would need to try every possible displacement of the images relative to each other (the complexity of the algorithm can be estimated as O(n^2), where n is the number of image points).
The first optimization is to replace the exhaustive search with gradient descent: first, the correlation is calculated in a 3x3 region around zero offset, then the offset with maximum correlation is selected, and the process is repeated until a local maximum is found. This method is much faster, but in the worst case of large offsets it has complexity O(n^1.5), which is still unacceptable.
Fig. 3 Search for the maximum of the correlation function. Gradient descent.
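The hill-climbing search can be sketched like this (my own minimal illustration; the metric is passed in as a callable so it plugs into any SAD implementation):

```cpp
#include <cstdint>
#include <functional>

// Start at offset (0, 0), evaluate the 3x3 neighborhood of the current
// offset, step to the neighbor with the best correlation (lowest SAD),
// and stop when the center is already the best.
struct Offset { int dx, dy; };

Offset ClimbToMinimum(const std::function<uint64_t(int, int)>& sad, int maxOffset)
{
    Offset cur{0, 0};
    for (;;)
    {
        Offset best = cur;
        uint64_t bestSad = sad(cur.dx, cur.dy);
        for (int dy = -1; dy <= 1; ++dy)
            for (int dx = -1; dx <= 1; ++dx)
            {
                int nx = cur.dx + dx, ny = cur.dy + dy;
                if (nx < -maxOffset || nx > maxOffset || ny < -maxOffset || ny > maxOffset)
                    continue;
                uint64_t s = sad(nx, ny);
                if (s < bestSad)
                {
                    bestSad = s;
                    best = {nx, ny};
                }
            }
        if (best.dx == cur.dx && best.dy == cur.dy)
            return cur; // local minimum of SAD = local maximum of correlation
        cur = best;
    }
}
```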
The way out of this situation is to use multiscale images (each scale level reduces the image size by half). Now we search for the local maximum of correlation at the coarsest scale, and then successively refine it at finer scales. Thus, the complexity of the algorithm drops to O(n), which is quite acceptable.
Fig. 4 Multiscale image.
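One level of the multiscale pyramid can be built by 2x2 averaging, a sketch of what SimdReduceGray2x2 does (even width and height assumed for brevity):

```cpp
#include <cstdint>
#include <vector>

// Halve an image in each dimension by averaging every 2x2 block.
std::vector<uint8_t> ReduceGray2x2(const std::vector<uint8_t>& src, int width, int height)
{
    std::vector<uint8_t> dst((width / 2) * (height / 2));
    for (int y = 0; y < height / 2; ++y)
        for (int x = 0; x < width / 2; ++x)
        {
            int sum = src[(2 * y) * width + 2 * x] + src[(2 * y) * width + 2 * x + 1]
                    + src[(2 * y + 1) * width + 2 * x] + src[(2 * y + 1) * width + 2 * x + 1];
            dst[y * (width / 2) + x] = uint8_t((sum + 2) / 4); // rounded average
        }
    return dst;
}
```

During coarse-to-fine refinement, the offset found at a coarser level is doubled and used as the starting point for the local search at the next finer level.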
If camera shake is compensated only with pixel accuracy, the stabilized image will still twitch very noticeably. Fortunately, this can be fixed. If we carefully analyze the neighborhood of the correlation function near its maximum (see Fig. 3), we can see that its values are not symmetrical about the maximum, which indicates that the true maximum does not lie exactly at the point (3, 2), but somewhere between it and the point (1, 4). If we approximate the behavior of the correlation function near the maximum by the paraboloid z = A*x^2 + B*x*y + C*y^2 + D*x + E*y + F, the task of refining the coordinates of the maximum reduces to choosing the paraboloid parameters that minimize its deviation from the actual values at the known points. Experience suggests that the accuracy of the refinement obtained this way is on the order of 0.1-0.2 pixel. When jitter is compensated with such accuracy, the stabilized image hardly twitches at all.
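For illustration, here is a simplified separable variant of this refinement (my own sketch, not the article's full least-squares paraboloid fit): each axis is refined independently by fitting a 1D parabola through three metric values around the integer minimum.

```cpp
// cm, c0, cp are the correlation metric values (e.g. SAD) at offsets
// -1, 0, +1 along one axis around the integer minimum. The vertex of the
// parabola through these three points gives the fractional correction
// in the range (-0.5, 0.5).
double ParabolaVertex(double cm, double c0, double cp)
{
    double denom = cm - 2.0 * c0 + cp;
    if (denom == 0.0)
        return 0.0; // degenerate (flat) neighborhood: keep the integer position
    return 0.5 * (cm - cp) / denom;
}
```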
Compensation for the integer part of the shift is performed as follows: the current image is shifted by the found offset with the opposite sign, and the empty areas near the edges are filled with the background. The subpixel part of the shift is compensated by bilinear interpolation, although slight blurring of the stabilized image is possible. If this is critical, bicubic interpolation can be used instead.
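A minimal sketch of subpixel shift compensation by bilinear interpolation (an illustration of the idea behind SimdShiftBilinear; for brevity, out-of-range samples here are clamped to the nearest edge rather than filled from the background):

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <vector>

// Shift a grayscale image by a fractional (shiftX, shiftY), sampling the
// source with bilinear interpolation.
std::vector<uint8_t> ShiftBilinear(const std::vector<uint8_t>& src, int width, int height,
                                   double shiftX, double shiftY)
{
    auto at = [&](int x, int y) -> double {
        x = std::min(std::max(x, 0), width - 1);
        y = std::min(std::max(y, 0), height - 1);
        return src[y * width + x];
    };
    std::vector<uint8_t> dst(src.size());
    for (int y = 0; y < height; ++y)
        for (int x = 0; x < width; ++x)
        {
            double sx = x + shiftX, sy = y + shiftY; // sample position in src
            int x0 = int(std::floor(sx)), y0 = int(std::floor(sy));
            double fx = sx - x0, fy = sy - y0;
            double v = at(x0, y0) * (1 - fx) * (1 - fy) + at(x0 + 1, y0) * fx * (1 - fy)
                     + at(x0, y0 + 1) * (1 - fx) * fy + at(x0 + 1, y0 + 1) * fx * fy;
            dst[y * width + x] = uint8_t(v + 0.5); // round to nearest
        }
    return dst;
}
```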
As the background, you can simply use one of the previous frames. However, the stabilization quality improves markedly if an image averaged over many frames is used as the background. It is advisable to update the background periodically to compensate for possible changes in scene brightness. When updating the background, you need to make sure it is sufficiently contrasting and heterogeneous; otherwise the correlation function will not have a distinct maximum, which will greatly reduce the accuracy of the stabilizer. It is also highly undesirable for moving objects to be present in the background.
Work in tandem with a motion detector
If the stabilizer is paired with a motion detector, the background updating process is greatly simplified. A motion detector usually already maintains a background averaged over many frames, relative to which it detects motion. The same background can be used by the stabilizer. In turn, the stabilized image reduces the number of false positives of the motion detector. You can also exploit the fact that the motion detector produces a mask of areas where motion is present. The mask obtained on the previous frame can be used when calculating the correlation function to exclude areas with motion, which also has a positive effect on the stabilizer.
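The masked metric can be sketched like this (an illustration of the idea behind SimdAbsDifferenceSumMasked; the exact mask convention of the library may differ):

```cpp
#include <cstdint>
#include <cstdlib>
#include <vector>

// SAD restricted by a motion mask: points where the detector saw motion on
// the previous frame (mask != 0) are excluded from the correlation metric.
uint64_t SadMasked(const std::vector<uint8_t>& a, const std::vector<uint8_t>& b,
                   const std::vector<uint8_t>& mask)
{
    uint64_t sum = 0;
    for (size_t i = 0; i < a.size(); ++i)
        if (mask[i] == 0)
            sum += std::abs(int(a[i]) - int(b[i]));
    return sum;
}
```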
Pros of the proposed approach:
1) High speed. In particular, stabilizing an image with a resolution of 1280x720 in BGRA32 format on a Core i7-4470 processor (using one core) takes about 1.5 milliseconds per frame.
2) Compensation for camera shake with subpixel accuracy.
Cons of the proposed approach:
1) Image stabilization in the current implementation is possible only for stationary cameras.
2) Only the spatial shift of the camera is detected and compensated, the rotation of the camera is not compensated.
3) The background must be sufficiently sharp and heterogeneous, otherwise the correlation function will have nothing to latch onto. Hence, stabilization will not work well in the dark or in fog.
4) The background must be motionless. Stabilization against a background of rolling waves, for example, is also impossible.
To begin with, note that a grayscale image is sufficient for determining the shift: color information has practically no effect on accuracy, but it naturally slows down the calculations.
When implementing the stabilizer, it is desirable to use optimized image-processing functions. I used the Simd library for this purpose. In particular, it provides:
1) SimdAbsDifferenceSum and SimdAbsDifferenceSumMasked - for calculating the correlation function.
2) SimdReduceGray2x2, SimdReduceGray3x3, SimdReduceGray4x4 and SimdReduceGray5x5 - for building multiscale images.
3) SimdBgrToGray - to get a gray image.
4) SimdShiftBilinear - to compensate for the shift.
View the result of the algorithm