Search for periodic protection elements of the Passport of the Russian Federation using the Fourier transform: part two
Many documents contain security features such as holograms, watermarks, guilloche, etc. In the process of scanning such documents, a problem arises - security elements interfere with recognition systems (OCR). When developing Smart PassportReader, we conducted a study aimed at finding and eliminating such protective elements from document images.
In our previous articleon this topic, we talked about the first half of the solution to the search problem - detection, i.e. determining the presence of periodic elements in the image. Today we will tell you how to find the immediate position of the periodic elements in the image, provided that the detection was successful: we are sure that the elements are present in the image. The second part is highly dependent on the first, so it is strongly recommended that you first read the first, if you have not already done so.
As last time, the Fourier transform will be used for this.
As last time, many arguments will be shown for the one-dimensional case, which then can easily be transferred to the two-dimensional one.
Recall that - the original image of the length , consisting of two others: the background image and the image containing the periodic template. The image , in turn, we presented as a convolution of one instance of the template with the Dirac crest :
Since we know the structure of the periodic template in advance (in the case of a Russian passport, the size and period of the holograms), to restore the position of the elements, it is enough to know only the horizontal and vertical horizontal shift of the entire template, which is the same as the shift of one of the “eagles”.
The shift of the pattern signal is equivalent to the shift of the impulse function , which leads to a changed shape . The spectrum of the Fourier transform of the shifted signal changes only in phase, leaving the amplitude unchanged, thereby allowing the detection of periodic patterns regardless of their initial shift. As an important consequence, all useful phase information is concentrated in the same DFT positions (frequencies) as the amplitude peaks of the pulse signal.
Suppose that shifted to the right by , then:
Since the phase angle for the unbiased is zero, for the biased it becomes equal . To obtain a phase shift, we need to add each term to the corresponding phase angle in for each . Note that any phase angle is always taken modulo and limited by the interval .
The phase shift in in the frequency domain represents a shift in the time domain, but has a period , so we can avoid redundancy by considering only , thus ensuring . For convenience, let is a phase shift in which corresponds to a time shift by, and also let is the phase angle at the 'th peak amplitude:
The figure shows an example of the DFT phase of a shifted pulse signal with values , period, and shift . For example, and .
And here is the amplitude and phase of the DFT for a chessboard-like Gaussian grid:
- DFT ->
For a two-dimensional image, the phase shift now consists of two components . We introduce the coordinate grid for the DFT of the chess-like system of peaks shown in the second and third figures above, such that the central peak has coordinates , peaks at the top right , and so on; the nearest diagonal peak has coordinates . In general, a peak is present if and only if it is even.
Applying again the theorem on the DFT of the shifted signal, we can expand the shift equation and show that the phase angle at the peak for a two-dimensional pulse signal is:
As for the one-dimensional case, to obtain a shifted phase, it is necessary to add to the corresponding phase .
To determine the exact location of a periodic pattern, it suffices to evaluate its phase shift , since its pixel periodic structure is known in advance. The pixel shift is subsequently easy to recover from the phase shift information.
In the first article, we described the various stages of preprocessing the original image of a passport. First, an area containing a 2x2 grid (similar to a chessboard) and containing 2 patterns horizontally and vertically is cut out of the image. This can be done (with an error of less than 5%), since the physical dimensions of the document, the sizes and the period of the periodic patterns are fixed and known in advance. The position of the region can be established by combining the physical coordinates of the document with its coordinates in the image.
Next, significant compression and smoothing of the image is performed to suppress the background and level the differences between the periodic patterns. The figure below shows the image of the cut out region of the Russian passport, the results of its pre-processing and the amplitude of its DFT.
- processing ->- DFT ->
As expected, the DFT amplitude has noticeable peaks with a distribution similar to a chessboard.
According to the image model, the original image consists of three independent signals: a background image and a single sample of a periodic pattern , which is convoluted with a shifted pulse signal to obtain a periodic pattern :
After performing the DFT on , we have:
To obtain the phase shift stored in , from , which we calculated for a given image, we need to gradually suppress the remaining components of the equation.
If the background after the preprocessing is quite homogeneous on the processed documents, it is possible to simply calculate the averaged spectrum on the images of documents that do not have the desired periodic pattern, followed by its subtraction from .
However, the background image in our model is not actually a background in terms of the structure of the Russian passport: it contains personal data, which by definition differ on the processed set.
We have already mentioned the fact that DFT positions (frequencies) without peaks do not contain useful information regarding periodic patterns, since they represent the background. Assuming that is smooth, we can interpolate its contribution to each peakbased on values in the positions adjacent to the peak. The average value over the nearest neighbors of the peak in the 3x3 window is an acceptable estimate for subtracting from to clear the background:
The following figure contains a compressed image and the image obtained as a result of the inverse DFT after subtracting the interpolated spectrum of the background and zeroing the DFT at all non-peak frequencies.
- DFT, subtraction of the background spectrum, the inverse of DFT ---->
As expected, only the periodic pattern remained on a simple monotonous background and the instances of the periodic pattern became more similar to each other. Note that the background is not black, since the DFT on , which contains the value averaged over the image, was not reset to zero.
After removing the background spectrum, we have left . When multiplying two complex numbers, their phases add up. Then, to achieve the phase angle at the peak , the corresponding phase angle of the spectrum of a single instance of the periodic pattern must be subtracted.
The problem is that the spectrum of the periodic pattern is usually unknown. One of the possible solutions is the experimental calculation of the phase of the presented pattern sample and its subsequent subtraction. In our experiments, we used a different path, assuming that the phase contribution is constant, which means that the resulting shiftone more constant shift is added. This shear constant can be estimated experimentally as a systematic error on the test data set.
At the end of spectrum preprocessing, we have the phase angle of the pulse signal , which supposedly contains all the necessary information about the phase shift .
We can build a system of equations (each equation corresponds to a peak) modulo :
Or, in matrix form:
On the left side of the system, there are “ideal” phase values corresponding to the peaks, and on the right, the actual values of the DFT phase for a given image at the peak positions.
This system is overridden because it has equations (according to the number of peaks considered) and only 2 variables: and . Overdetermined systems of equations can be approximately solved by the least squares method (LSM) by finding a solution that minimizes the sum of squared residuals. However, this system is a modulo system, therefore, it cannot directly be applied to the least-squares method, but we can calculate the initial solution and correct it with a sequential solution of the system.
Let's look at two equations for peaks under numbers and :
This system has 4 allowable solutions: , , and . Due to the chessboard-like structure of the peak patterns in our case, and also, since it is a shift by a single period, the first and fourth solutions are identical, as are the second and third. Two potential solutions remain: and .
To choose the right solution, we can add the two remaining equations for the peaks to the system and , again, making the system overridden. Then, the residual value is calculated for the ith solution , taking into account all the equations:
Note that intermediate calculations involving phase angles (calculation of differences) are performed in .
Finally, the solution with the least value is selected as the quality , since the other solution will have a much larger residual value. Of great importance for may mean false peak detection.
The initial solution found above could be good if the remaining spectrum contained only information about and nothing more. Unfortunately, it is distorted by spectral components from the background and from samples of periodic patterns that were not completely removed during pre-processing.
At this stage, the selected solution is still correct, but can be clarified using phase information at other peaks, and not just the first two. Subtract (using equations) from both sides of the system of equations, thus shifting the space of solutions in and obtain the following overdetermined system of equations for the correcting solution :
Assuming that in the first approximation of the solution we made no more mistakes than on , we can not take the system modulo , because after the shift of the system, each equation will be inside , which means that we can approximately solve the system using the least squares method. After a solution is found, it should be added to the previously found initial solution as a correction value.
So far, we have considered the equations for the first 4 peaks having a sum of 2 (and due to the symmetry of the DFT), but there are other peaks with sums of 4, 6, and so on. The reason why they have not been added before is because when you increasethe number of suitable system solutions increases, and the closer they become modulo to each other .
To use the phase shift information contained in the remaining peaks, we can repeat the above process of adjusting the initial solution, replacing the adjusted solution obtained for the sum equal to 4 and using all the equations having the sum equal to 6 and so on until the required accuracy is achieved. Assuming that with an increase in iteration, the error decreases, each equation will be within , which allows the use of OLS to solve the system.
Now that we have a phase shift , we can easily convert it back to a pixel shift . Knowing the period and size of the patterns, we can restore the position of each lattice pattern. The size and position of the cut region that we used during the processing is also known, so we can get the position of the patterns on the original image of the document.
The following figure, which you already saw at the very beginning of the article, is an example of finding a holographic periodic template on a Russian passport.
A frame with a circle inside indicates the result found after solving the system of equations. Other frames indicate the extrapolated positions of the remaining patterns.
We used a data set of 484 images of scanned Russian passports with a holographic filling. For each image in the set, we prepared the true value of the pixel shift of its periodic pattern. The table below contains the experimental results for this data set with the averaged value of the bias residual as a measure of accuracy.
The error is set as a percentage of the side of the unit pattern. The pre-treatment consisted of compression to size 56x58 and a morphological closure operation; when solving a system for obtaining a phase shift, peak equations were considered .
The following figure shows the distribution and errors for the worst (left) and best (right) method. Black cells represent an error by , and white cells represent an error .
In these histograms, the Y axis indicates how many percent of the images (out of 484) have an error close to the corresponding value along the X axis (as a percentage of the side size of the periodic pattern). It can be seen that the error distribution for the best method is less scattered than for the worst.
We proposed a method for searching for periodic patterns in document images that uses a discrete Fourier transform and shows the average search error in (from the side of the periodic element) on a set of 484 images.
We will talk about the details of our application of this method in one of the following articles.
In our previous articleon this topic, we talked about the first half of the solution to the search problem - detection, i.e. determining the presence of periodic elements in the image. Today we will tell you how to find the immediate position of the periodic elements in the image, provided that the detection was successful: we are sure that the elements are present in the image. The second part is highly dependent on the first, so it is strongly recommended that you first read the first, if you have not already done so.
As last time, the Fourier transform will be used for this.
Image model
As last time, many arguments will be shown for the one-dimensional case, which then can easily be transferred to the two-dimensional one.
Recall that - the original image of the length , consisting of two others: the background image and the image containing the periodic template. The image , in turn, we presented as a convolution of one instance of the template with the Dirac crest :
Since we know the structure of the periodic template in advance (in the case of a Russian passport, the size and period of the holograms), to restore the position of the elements, it is enough to know only the horizontal and vertical horizontal shift of the entire template, which is the same as the shift of one of the “eagles”.
Discrete Fourier Transform from a Shifted Periodic Pattern
The shift of the pattern signal is equivalent to the shift of the impulse function , which leads to a changed shape . The spectrum of the Fourier transform of the shifted signal changes only in phase, leaving the amplitude unchanged, thereby allowing the detection of periodic patterns regardless of their initial shift. As an important consequence, all useful phase information is concentrated in the same DFT positions (frequencies) as the amplitude peaks of the pulse signal.
Suppose that shifted to the right by , then:
Since the phase angle for the unbiased is zero, for the biased it becomes equal . To obtain a phase shift, we need to add each term to the corresponding phase angle in for each . Note that any phase angle is always taken modulo and limited by the interval .
The phase shift in in the frequency domain represents a shift in the time domain, but has a period , so we can avoid redundancy by considering only , thus ensuring . For convenience, let is a phase shift in which corresponds to a time shift by, and also let is the phase angle at the 'th peak amplitude:
The figure shows an example of the DFT phase of a shifted pulse signal with values , period, and shift . For example, and .
And here is the amplitude and phase of the DFT for a chessboard-like Gaussian grid:
- DFT ->
For a two-dimensional image, the phase shift now consists of two components . We introduce the coordinate grid for the DFT of the chess-like system of peaks shown in the second and third figures above, such that the central peak has coordinates , peaks at the top right , and so on; the nearest diagonal peak has coordinates . In general, a peak is present if and only if it is even.
Applying again the theorem on the DFT of the shifted signal, we can expand the shift equation and show that the phase angle at the peak for a two-dimensional pulse signal is:
As for the one-dimensional case, to obtain a shifted phase, it is necessary to add to the corresponding phase .
Periodic pattern search
To determine the exact location of a periodic pattern, it suffices to evaluate its phase shift , since its pixel periodic structure is known in advance. The pixel shift is subsequently easy to recover from the phase shift information.
Cutting, scaling and image preprocessing
In the first article, we described the various stages of preprocessing the original image of a passport. First, an area containing a 2x2 grid (similar to a chessboard) and containing 2 patterns horizontally and vertically is cut out of the image. This can be done (with an error of less than 5%), since the physical dimensions of the document, the sizes and the period of the periodic patterns are fixed and known in advance. The position of the region can be established by combining the physical coordinates of the document with its coordinates in the image.
Next, significant compression and smoothing of the image is performed to suppress the background and level the differences between the periodic patterns. The figure below shows the image of the cut out region of the Russian passport, the results of its pre-processing and the amplitude of its DFT.
- processing ->- DFT ->
As expected, the DFT amplitude has noticeable peaks with a distribution similar to a chessboard.
Spectrum pre-processing
According to the image model, the original image consists of three independent signals: a background image and a single sample of a periodic pattern , which is convoluted with a shifted pulse signal to obtain a periodic pattern :
After performing the DFT on , we have:
To obtain the phase shift stored in , from , which we calculated for a given image, we need to gradually suppress the remaining components of the equation.
Background Spectrum Suppression
If the background after the preprocessing is quite homogeneous on the processed documents, it is possible to simply calculate the averaged spectrum on the images of documents that do not have the desired periodic pattern, followed by its subtraction from .
However, the background image in our model is not actually a background in terms of the structure of the Russian passport: it contains personal data, which by definition differ on the processed set.
We have already mentioned the fact that DFT positions (frequencies) without peaks do not contain useful information regarding periodic patterns, since they represent the background. Assuming that is smooth, we can interpolate its contribution to each peakbased on values in the positions adjacent to the peak. The average value over the nearest neighbors of the peak in the 3x3 window is an acceptable estimate for subtracting from to clear the background:
The following figure contains a compressed image and the image obtained as a result of the inverse DFT after subtracting the interpolated spectrum of the background and zeroing the DFT at all non-peak frequencies.
- DFT, subtraction of the background spectrum, the inverse of DFT ---->
As expected, only the periodic pattern remained on a simple monotonous background and the instances of the periodic pattern became more similar to each other. Note that the background is not black, since the DFT on , which contains the value averaged over the image, was not reset to zero.
Spectrum pattern suppression
After removing the background spectrum, we have left . When multiplying two complex numbers, their phases add up. Then, to achieve the phase angle at the peak , the corresponding phase angle of the spectrum of a single instance of the periodic pattern must be subtracted.
The problem is that the spectrum of the periodic pattern is usually unknown. One of the possible solutions is the experimental calculation of the phase of the presented pattern sample and its subsequent subtraction. In our experiments, we used a different path, assuming that the phase contribution is constant, which means that the resulting shiftone more constant shift is added. This shear constant can be estimated experimentally as a systematic error on the test data set.
Phase Shift Recovery
At the end of spectrum preprocessing, we have the phase angle of the pulse signal , which supposedly contains all the necessary information about the phase shift .
We can build a system of equations (each equation corresponds to a peak) modulo :
Or, in matrix form:
On the left side of the system, there are “ideal” phase values corresponding to the peaks, and on the right, the actual values of the DFT phase for a given image at the peak positions.
This system is overridden because it has equations (according to the number of peaks considered) and only 2 variables: and . Overdetermined systems of equations can be approximately solved by the least squares method (LSM) by finding a solution that minimizes the sum of squared residuals. However, this system is a modulo system, therefore, it cannot directly be applied to the least-squares method, but we can calculate the initial solution and correct it with a sequential solution of the system.
Calculation of the initial solution
Let's look at two equations for peaks under numbers and :
This system has 4 allowable solutions: , , and . Due to the chessboard-like structure of the peak patterns in our case, and also, since it is a shift by a single period, the first and fourth solutions are identical, as are the second and third. Two potential solutions remain: and .
To choose the right solution, we can add the two remaining equations for the peaks to the system and , again, making the system overridden. Then, the residual value is calculated for the ith solution , taking into account all the equations:
Note that intermediate calculations involving phase angles (calculation of differences) are performed in .
Finally, the solution with the least value is selected as the quality , since the other solution will have a much larger residual value. Of great importance for may mean false peak detection.
Solution adjustment
The initial solution found above could be good if the remaining spectrum contained only information about and nothing more. Unfortunately, it is distorted by spectral components from the background and from samples of periodic patterns that were not completely removed during pre-processing.
At this stage, the selected solution is still correct, but can be clarified using phase information at other peaks, and not just the first two. Subtract (using equations) from both sides of the system of equations, thus shifting the space of solutions in and obtain the following overdetermined system of equations for the correcting solution :
Assuming that in the first approximation of the solution we made no more mistakes than on , we can not take the system modulo , because after the shift of the system, each equation will be inside , which means that we can approximately solve the system using the least squares method. After a solution is found, it should be added to the previously found initial solution as a correction value.
So far, we have considered the equations for the first 4 peaks having a sum of 2 (and due to the symmetry of the DFT), but there are other peaks with sums of 4, 6, and so on. The reason why they have not been added before is because when you increasethe number of suitable system solutions increases, and the closer they become modulo to each other .
To use the phase shift information contained in the remaining peaks, we can repeat the above process of adjusting the initial solution, replacing the adjusted solution obtained for the sum equal to 4 and using all the equations having the sum equal to 6 and so on until the required accuracy is achieved. Assuming that with an increase in iteration, the error decreases, each equation will be within , which allows the use of OLS to solve the system.
Recovering the location of periodic patterns
Now that we have a phase shift , we can easily convert it back to a pixel shift . Knowing the period and size of the patterns, we can restore the position of each lattice pattern. The size and position of the cut region that we used during the processing is also known, so we can get the position of the patterns on the original image of the document.
The following figure, which you already saw at the very beginning of the article, is an example of finding a holographic periodic template on a Russian passport.
A frame with a circle inside indicates the result found after solving the system of equations. Other frames indicate the extrapolated positions of the remaining patterns.
Experimental results
We used a data set of 484 images of scanned Russian passports with a holographic filling. For each image in the set, we prepared the true value of the pixel shift of its periodic pattern. The table below contains the experimental results for this data set with the averaged value of the bias residual as a measure of accuracy.
The error is set as a percentage of the side of the unit pattern. The pre-treatment consisted of compression to size 56x58 and a morphological closure operation; when solving a system for obtaining a phase shift, peak equations were considered .
Background suppression | Adjustment with OLS | x error | y error | Common mistake |
---|---|---|---|---|
Is absent | Is absent | 2.13 | 6.40 | 6.75 |
Is present | Is absent | 2.09 | 6.04 | 6.40 |
Is absent | Is present | 2.19 | 5.05 | 5.50 |
Is present | Is present | 2.20 | 4.77 | 5.25 |
The following figure shows the distribution and errors for the worst (left) and best (right) method. Black cells represent an error by , and white cells represent an error .
In these histograms, the Y axis indicates how many percent of the images (out of 484) have an error close to the corresponding value along the X axis (as a percentage of the side size of the periodic pattern). It can be seen that the error distribution for the best method is less scattered than for the worst.
Conclusion
We proposed a method for searching for periodic patterns in document images that uses a discrete Fourier transform and shows the average search error in (from the side of the periodic element) on a set of 484 images.
We will talk about the details of our application of this method in one of the following articles.