
How does automatic document selection on an image work in ABBYY FineScanner?

What is ABBYY FineScanner
ABBYY FineScanner is a program for iOS devices that can photograph documents and process images so that the resulting electronic copies (essentially scans) are convenient for working - reading, printing or storing / sending in a readable form. We wrote about the release of the first version here .
Photos of documents received on mobile devices have different distortions compared to images obtained from a conventional scanner. Such distortions include: digital noise, geometric distortions caused by the rotation of the document or the presence of perspective, unevenness in illumination, defocusing, blur. Next, we describe an algorithm that automatically eliminates geometric distortions of a document in an image.
The whole process can be divided into several main stages:
1) Reducing the original image
2) Selecting the most informative channel
3) Preprocessing the image, selecting contours
4) Detecting the borders and determining the corners of the document
5) Testing the hypotheses
6) Clarification of the coordinates of the corners of the document
Consider each of the steps in more detail.
1. Reducing the original image
The purpose of this step is to obtain a greatly reduced copy of the image such that max (w, h) is of the order of 200 pixels, where w is the width and h is the height of the resulting image. Immediately by so many times it is impossible to reduce the original image (having a size of the order of 2-3 thousand pixels in width and height)! Even if we use bilinear or bicubic interpolation algorithms , we get an image with a strong “beating” effect on the contours, also called aliasing (see Fig. 1, left). Therefore, we reduce the image by several iterations, each time changing the size by 2 times. As a result, we get a more “smooth” image (Fig. 1, right). One could use Gaussian smoothingand immediately reduce it by the required number of times, but, firstly, Gaussian smoothing is a resource-intensive procedure for an image of several megapixels, and secondly, bilinear interpolation is implemented by built-in OpenGL ES tools, we only need to consistently change the texture size.


Figure 1. The image obtained by reducing the original to a size of 150 pixels in different ways. On the left you can see the effect of “beating” on the contours, on the right - a more “smooth” result of a consistent decrease of 2 times.
2. The choice of the most informative channel
First of all, we note that the document from the surrounding background can differ not only in brightness, but also in color. Therefore, no matter how much we would like to reduce the search space, we can’t just get rid of the color channels in the image. Let's see which one is most suitable for further analysis. To do this, calculate the histograms of 4 channels: R, G, B and brightness. Next, from the histograms, we calculate the average values and the variance of the channels. If the maximum dispersion of one of the color channels is greater than the brightness channel by a certain coefficient K> 1 (algorithm parameter), then we will use it later, otherwise we will use the brightness channel.








Figure 2. From top to bottom: images of channels R, G, B and brightness and their corresponding histograms.
3. Image preprocessing, contouring
The channel of the reduced image obtained in the previous step will have to work a little. We still get in the way of some elements inside the document (lines of text, headings, separators remain visible even at this scale), so we will medianly filter the image, with radii R1 = 3 and R2 = 5 pixels, 3 iterations for each radius. Save the resulting image (see Fig. 2). The first one is used to highlight borders using the famous Canny edge detection algorithm., and the second will be used to test the hypotheses obtained on the sides. The first stage of the Canny edge detection algorithm (image filtering) can be skipped, since we have already done this taking into account the properties of our signal. The values of the “upper” and “lower” threshold for the Canny edge detection algorithm should be configured, as well as the other parameters of our algorithm, but more on that later.





Figure 3. From left to right: the original image of one of the channels, after 1, 2, 3 iterations of median filtering with a radius of R = 3, the result of the selection of contours by the Canny edge detection algorithm.
4. Border detection and definition of document corners
So, at the previous stage, we got a contour representation of our image, we can only find the borders of our document on it. For this use the other well-known algorithm - Hough transform ( Hough transform ). Strictly speaking, for each side you can use your “own Hough subspace, with the most suitable parameterization, which can be calculated much faster than the full Hough space. Using these subspaces, we find lines with maximum response for each side of the document. The intersection points of these lines define the corners of the document. Go to the next step.




Figure 4. From left to right: the contour representation of the image, the Hough subspace, the result of detecting borders on the image.
5. Testing the hypotheses
We would like to test the hypotheses obtained for the boundaries of the document, because the algorithm, due to its simplicity, quite often makes mistakes in difficult situations. The most serious errors can be avoided by transferring (when testing hypotheses) the algorithm from automatic to semi-automatic, or even manual, when the user is prompted to correct the result, or select the document on his own.
To test the hypothesis obtained on the contour, we use an image obtained as a result of median filtering with a radius of R2 = 5 pixels (see stage 3). For each of the boundaries we will sort out the positions separated by + -1, + - 2, ..., + - 10 pixels, and calculate the energy of the contour (the integral along the contour from the gradient) in each of the positions. We select the position with the maximum energy of the circuit, according to the energy value, we decide whether the boundary is suitable for an automatic algorithm, for a semi-automatic one, or should it be discarded and switch to manual mode. Thus, any of the boundaries of the document may affect the transfer of the algorithm to another mode.
The geometry check for the obtained boundaries includes several rules:
1) The ratio of the lengths of all opposite boundaries should be, for example, 0.5 2) The angles on the common side should not differ by more than 12 - 15 degrees.
3) The area of the resulting quadrangle should be at least 15 - 25% of the image.
4) The shift of the document relative to the center of the image is not more than 5 - 10% of the width or height of the image.
6. Refinement of the coordinates of the corners of the document
The solution obtained in this way is approximate in the geometric sense, the coordinates of the corners of the document can be tried to clarify. Around each corner of the document, we define a square region in which we will search for a point for which the median filter value differs from the signal (the image channel selected by us, see step 2) by an amount exceeding a certain threshold T = 12. By the sign of differences, the angle can also be defined as “normal” or “inverted”, i.e. belonging to a document that is lighter or darker than its surroundings. In the end, we must decide on the entire document as a whole.
Of course, the specific values of the method parameters may be different, their adjustment should be carried out on a large database of images containing various types of documents shot under different conditions.
After the coordinates of all 4 corners of the document are determined, you can perform a perspective transformation that corrects geometric distortions in the image. In this case, you can automatically determine the proportions of the original document using only the coordinates of the corners. However, consideration of this problem is beyond the scope of this article.
Ivan logicview Zagainov,
Department of Technology Development