
Fourier calculations for image comparison
The traditional “entry-level” technique, comparing the current image with the standard, is based on the consideration of images as two-dimensional brightness functions (discrete two-dimensional intensity matrices). In this case, either the distance between the images or a measure of their proximity is measured.
As a rule, to calculate the distances between images, a formula is used, which is the sum of the modules or squares of the intensity differences:
If, in addition to a simple comparison of two images, it is necessary to solve the problem of detecting the position of a fragment of one image in another, then the classical “entry-level” method, which consists in enumerating all the coordinates and calculating the distance using the above formula, usually fails in practical use due to the required large number computing.
One of the methods that can significantly reduce the number of calculations is the use of Fourier transforms and discrete Fourier transforms to calculate the measure of coincidence of two images at different displacements between them. In this case, calculations occur simultaneously for various combinations of image shifts relative to each other.
The presence of a large number of libraries that implement Fourier transforms (in all possible versions of the fast versions) makes the implementation of image comparison algorithms not a very difficult task for programming.
For example, find:


By definition, correlation two functions F and G is called the quantity:
This value is well known from the course of mathematics and geometry, devoted to linear spaces, where it is called the scalar product. We will use the formula as a measure between images:
or
This value is obtained from the scalar product operation of vectors (considering images as vectors in multidimensional space). And even more - the same formula represents the standard statistical formula of the criterion for the hypothesis of the coincidence of two probability distributions.
Note:
When calculating the correlation between fragments of images, if one image is smaller than the other, we will only divide by the value of the norms of the intersecting parts.
By definition, the convolution of two functions F and G is the function FxG:
Let G '(t) = G (-t) and F' (t) = F (-t), then the equality is obvious:
It is also obvious that FxG '(t) is equal to the correlation obtained by shifting one vector relative to another by step t (this can easily be verified by explicitly substituting the values in the correlation formula).
The Fourier transform (ℱ) is an operation that maps one function of a real variable to another function, also a real variable. This new function describes the coefficients ("amplitudes") in the decomposition of the original function into elementary components - harmonic oscillations with different frequencies.
The Fourier transform of the function f of a real variable is integral and is given by the following formula:

Different sources can give definitions that differ from the one given above by choosing the coefficient in front of the integral, as well as the “-" sign in the exponent. But all the properties will be the same, although the appearance of some formulas may change.
In addition, there are various generalizations of this concept.
The Fourier transform of the functions defined on the space ℝ ^ n is determined by the formula:

The inverse transformation in this case is given by the formula:

As before, in different sources the definition of the multidimensional Fourier transform can differ in the choice of a constant in front of the integral.
Discrete Fourier transform (in the English literature DFT, Discrete Fourier Transform) is one of the Fourier transforms widely used in digital signal processing algorithms (its modifications are used in audio compression to MP3, image compression in JPEG, etc.), as well as in others areas related to the analysis of frequencies in a discrete (for example, digitized analog) signal. The discrete Fourier transform requires a discrete function as an input. Such functions are often created by discretization (sampling values from continuous functions). Discrete Fourier transforms help solve partial differential equations and perform operations such as convolutions. Discrete Fourier transforms are also actively used in statistics, in the analysis of time series. There are multidimensional discrete Fourier transforms.
Direct Transform:

Inverse Transform: The

Discrete Fourier Transform is a linear transform that translates a vector of time samples into a vector of spectral samples of the same length. Thus, the transformation can be implemented as multiplying a symmetric square matrix by a vector:

One of the remarkable properties of Fourier transforms is the ability to quickly calculate the correlation of two functions defined, either on a real argument (using the classical formula) or on a finite ring (when using discrete transforms).
And although similar properties are inherent in many linear transformations, for practical use, to calculate the convolution operation, according to our definition, the formula is used
Where
It is quite easy to verify the correctness of the equality by explicitly substituting the Fourier transform formulas and reducing the resulting formulas
Let be(t) is equal to the correlation obtained as a result of the shift of one vector relative to the other by step t
Then, as already shown,
If implementations of the Fourier transform algorithm through complex numbers are used, then such transformations have one more remarkable property:
Where CONJUGATE (FFT (G)) is a matrix composed of conjugate elements of the matrix FFT (G)
Thus, we obtain
When using the formula for estimating the distance between images when shifting (i, j) relative to each other
we get that
Where
When solving the problem of finding one sample, additional normalization of the sample is redundant, and the calculation of the norm for the common part can be replaced by the sum of the brightness of pixels in this common part or by the sum of the squares of brightness in this common part
When using the formula to estimate the distance between images at shift (i, j) relative to each other
we get that
Where
Note:
When using the discrete Fourier transform, the matrix M also contains elements from cyclic shift of images between themselves. Therefore, if it is not necessary to analyze the cyclic shift of frames, then the search for the maximum element in the matrix M must be limited to the region (0,0) - (N1-M1, N2-M2).
The implemented algorithms are part of the open source library FFTTools. Internet Address: github.com/dprotopopov/FFTTools
Software Used

As a rule, to calculate the distances between images, a formula is used, which is the sum of the modules or squares of the intensity differences:
d (X, Y) = SUM (X [i, j] - Y [i, j]) ^ 2
If, in addition to a simple comparison of two images, it is necessary to solve the problem of detecting the position of a fragment of one image in another, then the classical “entry-level” method, which consists in enumerating all the coordinates and calculating the distance using the above formula, usually fails in practical use due to the required large number computing.
One of the methods that can significantly reduce the number of calculations is the use of Fourier transforms and discrete Fourier transforms to calculate the measure of coincidence of two images at different displacements between them. In this case, calculations occur simultaneously for various combinations of image shifts relative to each other.
The presence of a large number of libraries that implement Fourier transforms (in all possible versions of the fast versions) makes the implementation of image comparison algorithms not a very difficult task for programming.
Formulation of the problem
- Let two images X and Y be given - an image and a sample, of sizes (N1, N2) and (M1, M2), respectively, and Ni> Mi
- It is required to find the coordinates of sample Y in the full image X and calculate the estimated value - a measure of proximity.
For example, find:
sample

in the image

Correlation as a measure between images
By definition, correlation
= SUM F (i) * G (i)
This value is well known from the course of mathematics and geometry, devoted to linear spaces, where it is called the scalar product. We will use the formula as a measure between images:
m (X, Y) = SUM (X [i, j] * Y [i, j]) / (SQRT (SUM X [i, j] ^ 2) * SQRT (SUM Y [i, j] ^ 2) )
or
m (X, Y) =/ (SQRT ( ) * SQRT ( )))
This value is obtained from the scalar product operation of vectors (considering images as vectors in multidimensional space). And even more - the same formula represents the standard statistical formula of the criterion for the hypothesis of the coincidence of two probability distributions.
Note:
When calculating the correlation between fragments of images, if one image is smaller than the other, we will only divide by the value of the norms of the intersecting parts.
Convolution of two functions
By definition, the convolution of two functions F and G is the function FxG:
FхG (t) = SUM F (i) * G (j) | i + j = t
Let G '(t) = G (-t) and F' (t) = F (-t), then the equality is obvious:
- FхF '(0) = SUM F (i) ^ 2 - the scalar product of the vector F by itself
- GхG '(0) = SUM G (j) ^ 2– the scalar product of the vector G by itself
- FхG '(0) = SUM F (i) * G (i) - the scalar product of two vectors F and G
It is also obvious that FxG '(t) is equal to the correlation obtained by shifting one vector relative to another by step t (this can easily be verified by explicitly substituting the values in the correlation formula).
Fourier Transform
The Fourier transform (ℱ) is an operation that maps one function of a real variable to another function, also a real variable. This new function describes the coefficients ("amplitudes") in the decomposition of the original function into elementary components - harmonic oscillations with different frequencies.
The Fourier transform of the function f of a real variable is integral and is given by the following formula:

Different sources can give definitions that differ from the one given above by choosing the coefficient in front of the integral, as well as the “-" sign in the exponent. But all the properties will be the same, although the appearance of some formulas may change.
In addition, there are various generalizations of this concept.
Multidimensional Fourier Transform
The Fourier transform of the functions defined on the space ℝ ^ n is determined by the formula:

The inverse transformation in this case is given by the formula:

As before, in different sources the definition of the multidimensional Fourier transform can differ in the choice of a constant in front of the integral.
Discrete Fourier Transform
Discrete Fourier transform (in the English literature DFT, Discrete Fourier Transform) is one of the Fourier transforms widely used in digital signal processing algorithms (its modifications are used in audio compression to MP3, image compression in JPEG, etc.), as well as in others areas related to the analysis of frequencies in a discrete (for example, digitized analog) signal. The discrete Fourier transform requires a discrete function as an input. Such functions are often created by discretization (sampling values from continuous functions). Discrete Fourier transforms help solve partial differential equations and perform operations such as convolutions. Discrete Fourier transforms are also actively used in statistics, in the analysis of time series. There are multidimensional discrete Fourier transforms.
Discrete Transform Formulas
Direct Transform:

Inverse Transform: The

Discrete Fourier Transform is a linear transform that translates a vector of time samples into a vector of spectral samples of the same length. Thus, the transformation can be implemented as multiplying a symmetric square matrix by a vector:

Fourier transforms for convolution calculation
One of the remarkable properties of Fourier transforms is the ability to quickly calculate the correlation of two functions defined, either on a real argument (using the classical formula) or on a finite ring (when using discrete transforms).
And although similar properties are inherent in many linear transformations, for practical use, to calculate the convolution operation, according to our definition, the formula is used
FxG = BFT (FFT (F) * FFT (G))
Where
- FFT - direct Fourier transform operation
- BFT - Inverse Fourier Transform Operation
It is quite easy to verify the correctness of the equality by explicitly substituting the Fourier transform formulas and reducing the resulting formulas
Fourier transforms for calculating correlation
Let be
Then, as already shown,
(t) = FxG '(t) = BFT (FFT (F) * FFT (G'))
If implementations of the Fourier transform algorithm through complex numbers are used, then such transformations have one more remarkable property:
FFT (G ') = CONJUGATE (FFT (G))
Where CONJUGATE (FFT (G)) is a matrix composed of conjugate elements of the matrix FFT (G)
Thus, we obtain
(t) = BFT (FFT (F) * CONJUGATE (FFT (G)))
Fourier transforms to solve the problem
When using the formula for estimating the distance between images when shifting (i, j) relative to each other
m (X, Y) (i, j) =(i, j) / (| X | (i, j)) * | Y | (i, j)),
we get that
= XxY '= BFT (FFT (X) * CONJUGATE (FFT (Y))) - | X | ^ 2 =
= XxX'xE '= BFT (FFT (X) * CONJUGATE (FFT (X)) * CONJUGATE (FFT (E))) = BFT (SQUAREMAGNITUDE (FFT (X)) * CONJUGATE (FFT (E))) - | Y | ^ 2 =
= YxY'xE '= BFT (FFT (Y) * CONJUGATE (FFT (Y)) * CONJUGATE (FFT (E)))
Where
(i, j) is the scalar product of two images obtained by shifting (i, j) relative to each other images X and Y - E is an image of size equal to the minimum sizes of X and Y, and filled with unit values (that is, a “frame” in which X and Y are compared)
- | X | (i, j) is the norm of the general part of the image X with a shift (i, j)
- | Y | (i, j) is the norm of the common part of the image Y when shifting (i, j)
- FFT - operation of direct two-dimensional discrete Fourier transform
- BFT - operation of the inverse two-dimensional discrete Fourier transform
- CONJUGATE - operation of calculating a matrix of conjugate elements
- SQUAREMAGNITUDE– operation of calculating the matrix of squares of element amplitudes
Simplification of formulas to solve the problem
When solving the problem of finding one sample, additional normalization of the sample is redundant, and the calculation of the norm for the common part can be replaced by the sum of the brightness of pixels in this common part or by the sum of the squares of brightness in this common part
When using the formula to estimate the distance between images at shift (i, j) relative to each other
m (X, Y) (i, j) =(i, j) / | X | ^ 2 (i, j),
we get that
= BFT (FFT (X) * CONJUGATE (FFT (Y))) = BFT (SQUAREMAGNITUDE (FFT (X)) * CONJUGATE (FFT (E)))
Where
(i, j) is the scalar product of two images obtained by shifting (i, j) relative to each other images X and Y - E is an image of size equal to the minimum sizes of X and Y, and filled with unit values (that is, a “frame” in which X and Y are compared)
(i, j) is the norm (the sum of the brightness of pixels) of the total part of the image X when shifting (i, j) - FFT - operation of direct two-dimensional discrete Fourier transform
- BFT - operation of the inverse two-dimensional discrete Fourier transform
- CONJUGATE - operation of calculating a matrix of conjugate elements
- SQUAREMAGNITUDE– operation of calculating the matrix of squares of element amplitudes
Fragment search algorithm in full image
- Let two images X and Y be given - an image and a sample, of sizes (N1, N2) and (M1, M2), respectively, and Ni> Mi
- It is required to find the coordinates of sample Y in the full image X and calculate the estimated value - a measure of proximity.
- Extend image Y to size (N1, N2), padding it with zeros
- Form an image E from units of size (M1, M2) and expand to size (N1, N2), adding zeros to it
- Calculate
= BFT (FFT (X) * CONJUGATE (FFT (Y))) - Calculate
= BFT (SQUAREMAGNITUDE (FFT (X)) * CONJUGATE (FFT (E))) - Calculate M [i, j] = (f +
[i, j]) / (f + [i, j]) - In the matrix M, find the element with the maximum value - the coordinates of this element are the desired position of the sample in the full image, and the value is equal to the estimate of the measure of comparison.
Note:
When using the discrete Fourier transform, the matrix M also contains elements from cyclic shift of images between themselves. Therefore, if it is not necessary to analyze the cyclic shift of frames, then the search for the maximum element in the matrix M must be limited to the region (0,0) - (N1-M1, N2-M2).
Implementation examples
The implemented algorithms are part of the open source library FFTTools. Internet Address: github.com/dprotopopov/FFTTools
Software Used
- Microsoft Visual Studio 2013 C # - environment and programming language
- EmguCV / OpenCV - C ++ library of structures and algorithms for image processing
- FFTWSharp / FFTW - C ++ library implementing fast discrete Fourier transform algorithms
///
/// Catch pattern bitmap with the Fastest Fourier Transform
///
/// Matrix of values
private Matrix Catch(Image image)
{
const double f = 1.0;
int length = image.Data.Length;
int n0 = image.Data.GetLength(0);
int n1 = image.Data.GetLength(1);
int n2 = image.Data.GetLength(2);
Debug.Assert(n2 == 1);
// Allocate FFTW structures
var input = new fftw_complexarray(length);
var output = new fftw_complexarray(length);
fftw_plan forward = fftw_plan.dft_3d(n0, n1, n2, input, output,
fftw_direction.Forward,
fftw_flags.Estimate);
fftw_plan backward = fftw_plan.dft_3d(n0, n1, n2, input, output,
fftw_direction.Backward,
fftw_flags.Estimate);
var matrix = new Matrix(n0, n1);
double[,,] patternData = _patternImage.Data;
double[,,] imageData = image.Data;
double[,] data = matrix.Data;
var doubles = new double[length];
// Calculate Divisor
Copy(patternData, data);
Buffer.BlockCopy(data, 0, doubles, 0, length*sizeof (double));
input.SetData(doubles.Select(x => new Complex(x, 0)).ToArray());
forward.Execute();
Complex[] complex = output.GetData_Complex();
Buffer.BlockCopy(imageData, 0, doubles, 0, length*sizeof (double));
input.SetData(doubles.Select(x => new Complex(x, 0)).ToArray());
forward.Execute();
input.SetData(output.GetData_Complex().Zip(complex, (x, y) => x*Complex.Conjugate(y)).ToArray());
backward.Execute();
IEnumerable doubles1 = output.GetData_Complex().Select(x => x.Magnitude);
if (_fastMode)
{
// Fast Result
Buffer.BlockCopy(doubles1.ToArray(), 0, data, 0, length*sizeof (double));
return matrix;
}
// Calculate Divider (aka Power)
input.SetData(doubles.Select(x => new Complex(x*x, 0)).ToArray());
forward.Execute();
complex = output.GetData_Complex();
CopyAndReplace(_patternImage.Data, data);
Buffer.BlockCopy(data, 0, doubles, 0, length*sizeof (double));
input.SetData(doubles.Select(x => new Complex(x, 0)).ToArray());
forward.Execute();
input.SetData(complex.Zip(output.GetData_Complex(), (x, y) => x*Complex.Conjugate(y)).ToArray());
backward.Execute();
IEnumerable doubles2 = output.GetData_Complex().Select(x => x.Magnitude);
// Result
Buffer.BlockCopy(doubles1.Zip(doubles2, (x, y) => (f + x*x)/(f + y)).ToArray(), 0, data, 0,
length*sizeof (double));
return matrix;
}
///
/// Copy 3D array to 2D array (sizes can be different)
/// Flip copied data
/// Reduce last dimension
///
/// Input array
/// Output array
private static void Copy(double[,,] input, double[,] output)
{
int n0 = output.GetLength(0);
int n1 = output.GetLength(1);
int m0 = Math.Min(n0, input.GetLength(0));
int m1 = Math.Min(n1, input.GetLength(1));
int m2 = input.GetLength(2);
for (int i = 0; i < m0; i++)
for (int j = 0; j < m1; j++)
output[i, j] = input[i, j, 0];
for (int k = 1; k < m2; k++)
for (int i = 0; i < m0; i++)
for (int j = 0; j < m1; j++)
output[i, j] += input[i, j, k];
}
///
/// Copy 3D array to 2D array (sizes can be different)
/// Replace items copied by value
/// Flip copied data
/// Reduce last dimension
///
/// Input array
/// Output array
/// Value to replace copied data
private static void CopyAndReplace(double[,,] input, double[,] output, double value = 1.0)
{
int n0 = output.GetLength(0);
int n1 = output.GetLength(1);
int m0 = Math.Min(n0, input.GetLength(0));
int m1 = Math.Min(n1, input.GetLength(1));
int m2 = input.GetLength(2);
for (int i = 0; i < m0; i++)
for (int j = 0; j < m1; j++)
output[i, j] = value;
}
///
/// Find a maximum element in the matrix
///
/// Matrix of values
/// Index of maximum element
/// Index of maximum element
/// Value of maximum element
public void Max(Matrix matrix, out int x, out int y, out double value)
{
double[,] data = matrix.Data;
int n0 = data.GetLength(0);
int n1 = data.GetLength(1);
value = data[0, 0];
x = y = 0;
for (int i = 0; i < n0; i++)
{
for (int j = 0; j < n1; j++)
{
if (data[i, j] < value) continue;
value = data[i, j];
x = j;
y = i;
}
}
}
///
/// Catch pattern bitmap with the Fastest Fourier Transform
///
/// Array of values
public Matrix Catch(Bitmap bitmap)
{
using (var image = new Image(bitmap))
return Catch(image);
}
Caught that biting

Literature
- A.L. Dmitriev. Optical methods of information processing. Tutorial. SPb. SPYUGUITMO 2005.46 p.
- A.A. Akayev, S.A. Mayorov “Optical methods of information processing” M.: 1988
- J. Goodman "Introduction to Fourier Optics" M .: World 1970