Search for periodic protection elements of the Passport of the Russian Federation using the Fourier transform: part two

    Many documents contain security features such as holograms, watermarks, guilloche, etc. In the process of scanning such documents, a problem arises - security elements interfere with recognition systems (OCR). When developing Smart PassportReader, we conducted a study aimed at finding and eliminating such protective elements from document images.



    In our previous articleon this topic, we talked about the first half of the solution to the search problem - detection, i.e. determining the presence of periodic elements in the image. Today we will tell you how to find the immediate position of the periodic elements in the image, provided that the detection was successful: we are sure that the elements are present in the image. The second part is highly dependent on the first, so it is strongly recommended that you first read the first, if you have not already done so.

    As last time, the Fourier transform will be used for this.

    Image model


    As last time, many arguments will be shown for the one-dimensional case, which then can easily be transferred to the two-dimensional one.

    Recall that I (x)- the original image of the length N, consisting of two others: the background image h (x)and the image g (x)containing the periodic template. The image g (x), in turn, we presented as a convolution of one instance of the template f (x)with the Dirac crest c (x):

    I (x) = h (x) + g (x) = h (x) + f (x) \ ast c (x)

    Since we know the structure of the periodic template in advance (in the case of a Russian passport, the size and period of the holograms), to restore the position of the elements, it is enough to know only the horizontal and vertical horizontal shift of the entire template, which is the same as the shift of one of the “eagles”.

    Discrete Fourier Transform from a Shifted Periodic Pattern


    The shift of the pattern signal is g (x)equivalent to the shift of the impulse function c (x), which leads to a changed shape \ mathcal {F} c (x). The spectrum of the Fourier transform of the shifted signal changes only in phase, leaving the amplitude unchanged, thereby allowing the detection of periodic patterns regardless of their initial shift. As an important consequence, all useful phase information is concentrated in the same DFT positions (frequencies) as the amplitude peaks of the pulse signal.

    Suppose that c (x)shifted to the right by S, then:

    \ mathcal {F} _k c (x - S) = \ mathcal {F} _k c (x) \ cdot e ^ {i \ Phi_k},

    \ Phi_k = \ Phi \ cdot k, \ Phi = - \ frac {2 \ pi} {N} S

    Since the phase angle \ mathcal {F} _k c (x)for the unbiased c (x)is zero, for the biased c (xS)it becomes equal \ Phi_k. To obtain a phase shift, \ mathcal {F} _k g (x)we need to add each term \ Phi_kto the corresponding phase angle in \ mathcal {F} _k f (x)for each k. Note that any phase angle is always taken modulo 2 \ piand limited by the interval [- \ pi; \ pi).

    The phase shift in 2 \ piin the frequency domain represents a shift Nin the time domain, but c (x)has a period T = \ frac {N} {M}, so we can avoid redundancy by considering only {s = S \ text {mod} T}, thus ensuring 0 \ le s <T. For convenience, let \ phiis a phase shift in which 2 \ picorresponds to a time shift bys, and also let \ phi_mis the phase angle at the m'th peak amplitude:

    \ phi_m = \ phi \ cdot m, \ phi = - \ frac {2 \ pi} {T} s = \ Phi \ cdot M

    The figure shows an example of the DFT phase of a shifted pulse signal with values {N = 256}, period, T = 32and shift s = 7. For example, \ phi_0 = 0and \ phi_6 = (- \ frac {2 \ pi} {32} \ cdot 7) \ cdot 6 \ approx -1,963.

    And here is the amplitude and phase of the DFT for a chessboard-like Gaussian grid:

    - DFT ->

    For a two-dimensional image, the phase shift \ phinow consists of two components (\ phi_x, \ phi_y). We introduce the coordinate grid for the DFT of the chess-like system of peaks shown in the second and third figures above, such that the central peak has coordinates (0, 0), peaks at the top right (20), (0, 2)and so on; the nearest diagonal peak has coordinates (eleven). In general, a peak is (i, j)present if and only if it (i + j)is even.

    Applying again the theorem on the DFT of the shifted signal, we can expand the shift equation and show that the phase angle \ phi_ {i, j}at the (i, j)peak for a two-dimensional pulse signal is:

    \ phi_ {i, j} = i \ cdot \ phi_x + j \ cdot \ phi_y

    As for the one-dimensional case, to obtain a shifted phase, \ mathcal {F} _ {i, j} g (x)it is necessary to add \ phi_ {i, j}to the corresponding phase \ mathcal {F} _ {i, j} f (x).

    Periodic pattern search


    To determine the exact location of a periodic pattern, it suffices to evaluate its phase shift \ phi = (\ phi_x, \ phi_y), since its pixel periodic structure is known in advance. The pixel shift s = (s_x, s_y)is subsequently easy to recover from the phase shift information.

    Cutting, scaling and image preprocessing


    In the first article, we described the various stages of preprocessing the original image of a passport. First, an area containing a 2x2 grid (similar to a chessboard) and containing 2 patterns horizontally and vertically is cut out of the image. This can be done (with an error of less than 5%), since the physical dimensions of the document, the sizes and the period of the periodic patterns are fixed and known in advance. The position of the region can be established by combining the physical coordinates of the document with its coordinates in the image.

    Next, significant compression and smoothing of the image is performed to suppress the background and level the differences between the periodic patterns. The figure below shows the image of the cut out region of the Russian passport, the results of its pre-processing and the amplitude of its DFT.

    - processing ->- DFT ->

    As expected, the DFT amplitude has noticeable peaks with a distribution similar to a chessboard.

    Spectrum pre-processing


    According to the image model, the original image consists of three independent signals: a background image h (x)and a single sample of a periodic pattern f (x), which is convoluted with a shifted pulse signal c (x)to obtain a periodic pattern g (x):

    I (x) = h (x) + f (x) * c (x)

    After performing the DFT on I (x), we have:

    \ label {eq: immodel_dft_base} \ mathcal {F} I (x) = \ mathcal {F} h (x) + \ mathcal {F} f (x) \ cdot \ mathcal {F} c (x)

    To obtain the phase shift \ phistored in c (x), from \ mathcal {F} I (x), which we calculated for a given image, we need to gradually suppress the remaining components of the equation.

    Background Spectrum Suppression


    If the background h (x)after the preprocessing is quite homogeneous on the processed documents, it is possible to simply calculate the averaged spectrum \ mathcal {F} h (x)on the images of documents that do not have the desired periodic pattern, followed by its subtraction from \ mathcal {F} I (x).

    However, the background image h (x)in our model is not actually a background in terms of the structure of the Russian passport: it contains personal data, which by definition differ on the processed set.

    We have already mentioned the fact that DFT positions (frequencies) without peaks do not contain useful information regarding periodic patterns, since they represent the background. Assuming that \ mathcal {F} h (x)is smooth, we can interpolate its contribution to each peak\ mathcal {F} I (x, y)based on values ​​in the positions adjacent to the peak. The average value \ overline {\ mathcal {F} I (x ', y')}over (x, y)the nearest neighbors of the peak \ {\ mathcal {F} I (x ', y') \}in the 3x3 window is an acceptable estimate \ mathcal {F} h (x, y)for subtracting from \ mathcal {F} I (x, y)to clear the background:

    \ mathcal {F} I (x, y) \ leftarrow \ mathcal {F} I (x, y) - \ overline {\ mathcal {F} I (x ', y')}

    The following figure contains a compressed image and the image obtained as a result of the inverse DFT after subtracting the interpolated spectrum of the background and zeroing the DFT at all non-peak frequencies.

    - DFT, subtraction of the background spectrum, the inverse of DFT ---->

    As expected, only the periodic pattern remained on a simple monotonous background and the instances of the periodic pattern became more similar to each other. Note that the background is not black, since the DFT on (0, 0), which contains the value averaged over the image, was not reset to zero.

    Spectrum pattern suppression


    After removing the background spectrum, we have left \ mathcal {F} f (x) \ cdot \ mathcal {F} c (x). When multiplying two complex numbers, their phases add up. Then, to achieve the phase angle \ mathcal {F} c (i, j)at the peak (i, j), the corresponding phase angle of the spectrum of a single instance of the periodic pattern \ mathcal {F} f (i, j)must be subtracted.

    The problem is that the spectrum of the periodic pattern is usually unknown. One of the possible solutions is the experimental calculation of the phase of the presented pattern sample and its subsequent subtraction. In our experiments, we used a different path, assuming that the phase contribution \ mathcal {F} f (i, j)is constant, which means that the resulting shift\ phione more constant shift is added. This shear constant can be estimated experimentally as a systematic error on the test data set.

    Phase Shift Recovery


    At the end of spectrum preprocessing, we have the phase angle of the pulse signal c (x), which supposedly contains all the necessary information about the phase shift {\ phi = (\ phi_x, \ phi_y) ^ T}.
    We can build a system of equations (each equation corresponds to a (i, j)peak) modulo 2 \ pi:

    \ begin {cases} \ dots \\ i \ cdot \ phi_x + j \ cdot \ phi_y = \ phi_ {i, j} \ pmod {2 \ pi} \\ \ dots \\ \ end {cases}

    Or, in matrix form:

    A \ phi = b \ pmod {2 \ pi}

    On the left side of the system, there are “ideal” phase values ​​corresponding to the peaks, and on the right, the actual values ​​of the DFT phase for a given image at the peak positions.

    This system is overridden because it has nequations (according to the number of peaks considered) and only 2 variables: \ phi_xand \ phi_y. Overdetermined systems of equations can be approximately solved by the least squares method (LSM) by finding a solution that minimizes the sum of squared residuals. However, this system is a modulo system, 2 \ pitherefore, it cannot directly be applied to the least-squares method, but we can calculate the initial solution and correct it with a sequential solution of the system.

    Calculation of the initial solution


    Let's look at two equations for peaks under numbers (20)and (0, 2):

    \ begin {cases} 2 \ cdot \ phi_x + 0 \ cdot \ phi_y = \ phi_ {2, 0} \ pmod {2 \ pi} \\ 0 \ cdot \ phi_x + 2 \ cdot \ phi_y = \ phi_ {0, 2} \ pmod {2 \ pi} \ end {cases}

    This system has 4 allowable solutions: (\ frac {\ phi_ {2, 0}} {2}, \ frac {\ phi_ {0, 2}} {2}), (\ frac {\ phi_ {2, 0}} {2} + \ pi, \ frac {\ phi_ {0, 2}} {2}), (\ frac {\ phi_ {2, 0}} {2}, \ frac {\ phi_ {0, 2}} {2} + \ pi)and (\ frac {\ phi_ {2, 0}} {2} + \ pi, \ frac {\ phi_ {0, 2}} {2} + \ pi). Due to the chessboard-like structure of the peak patterns in our case, and also, since it 2 \ piis a shift by a single period, the first and fourth solutions are identical, as are the second and third. Two potential solutions remain: \ phi ^ * _ 1 = (\ frac {\ phi_ {2, 0}} {2}, \ frac {\ phi_ {0, 2}} {2}) ^ Tand \ phi ^ * _ 2 = (\ frac {\ phi_ {2, 0}} {2} + \ pi, \ frac {\ phi_ {0, 2}} {2}) ^ T.

    To choose the right solution, we can add the two remaining equations for the peaks to the system (eleven)and (eleven), again, making the system overridden. Then, the residual value is \ delta_kcalculated for the kith solution \ phi ^ * _ k, taking into account all the n = 4equations:

    \ delta_k = \ frac {1} {n} \ sum_ {i, j} {\ left (\ begin {pmatrix} i & j \ end {pmatrix} \ cdot \ phi ^ * _ k - \ phi_ {i, j} \ right) ^ 2} = \ frac {1} {n} | A \ phi ^ * _ k-b | ^ 2

    Note that intermediate calculations involving phase angles (calculation of differences) are performed in [- \ pi;  \ pi).

    Finally, the solution \ phi ^ * _ kwith the least value \ delta_kis selected as the quality \ phi ^ *, since the other solution will have a much larger residual value. Of great importance \ delta_kfor \ phi ^ * _ kmay mean false peak detection.

    Solution adjustment


    The initial solution found above could be good if the remaining spectrum contained only information about c (x)and nothing more. Unfortunately, it is distorted by spectral components from the background and from samples of periodic patterns that were not completely removed during pre-processing.

    At this stage, the selected solution \ phi ^ *is still correct, but can be clarified using phase information at other peaks, and not just the first two. Subtract A \ phi ^ *(using n = 4equations) from both sides of the system of equations, thus shifting the space of solutions in \ phi ^ *and obtain the following overdetermined system of equations for the correcting solution \ theta:

    \label{eq:correction_phase}
  A\theta = b - A\phi^*

    Assuming that in the first approximation of the solution we made no more mistakes than on \pi, we can not take the system modulo 2\pi, because after the shift of the system, each equation will be inside 2\pi, which means that we can approximately solve the system using the least squares method. After a solution is \theta^*found, it should be added to the previously found initial solution \phi^*as a correction value.

    So far, we have considered the equations for the first 4 peaks (i, j)having a (|i| + |j|)sum of 2 (and i \ge 0due to the symmetry of the DFT), but there are other peaks with sums of 4, 6, and so on. The reason why they have not been added before is because when you increase(|i| + |j|)the number of suitable system solutions increases, and the closer they become modulo to each other 2\pi.

    To use the phase shift information contained in the remaining peaks, we can repeat the above process of adjusting the initial solution, replacing the \phi^*adjusted solution obtained for the sum equal to 4 and using all the equations having the sum equal to 6 and so on until the required accuracy is achieved. Assuming that with an increase in iteration, the error decreases, each equation will be within 2\pi, which allows the use of OLS to solve the system.

    Recovering the location of periodic patterns


    Now that we have a phase shift \phi = (\phi_x, \phi_y), we can easily convert it back to a pixel shift s = (s_x, s_y). Knowing the period and size of the patterns, we can restore the position of each lattice pattern. The size and position of the cut region that we used during the processing is also known, so we can get the position of the patterns on the original image of the document.

    The following figure, which you already saw at the very beginning of the article, is an example of finding a holographic periodic template on a Russian passport.



    A frame with a circle inside indicates the result found after solving the system of equations. Other frames indicate the extrapolated positions of the remaining patterns.

    Experimental results


    We used a data set of 484 images of scanned Russian passports with a holographic filling. For each image in the set, we prepared the true value of the pixel shift of its periodic pattern. The table below contains the experimental results for this data set with the averaged value of the bias residual as a measure of accuracy.
    The error is set as a percentage of the side of the unit pattern. The pre-treatment consisted of compression to size 56x58 and a morphological closure operation; when solving a system for obtaining a phase shift, n=4peak equations were considered .
    Background suppression Adjustment with OLS x error y error Common mistake
    Is absent Is absent 2.13 6.40 6.75
    Is present Is absent 2.09 6.04 6.40
    Is absent Is present 2.19 5.05 5.50
    Is present Is present 2.20 4.77 5.25

    The following figure shows the distribution xand yerrors for the worst (left) and best (right) method. Black cells represent an error by x, and white cells represent an error y.



    In these histograms, the Y axis indicates how many percent of the images (out of 484) have an error close to the corresponding value along the X axis (as a percentage of the side size of the periodic pattern). It can be seen that the yerror distribution for the best method is less scattered than for the worst.

    Conclusion


    We proposed a method for searching for periodic patterns in document images that uses a discrete Fourier transform and shows the average search error in 6\%(from the side of the periodic element) on a set of 484 images.
    We will talk about the details of our application of this method in one of the following articles.

    Also popular now: