Fourier transform. The fast and the furious

Often, when developing algorithms, we run into a limit of computational complexity that seems impossible to overcome. The Fourier transform has complexity $O(n^2)$, while the fast version, proposed around 1805 by Gauss [1] (and reinvented in 1965 by James Cooley and John Tukey), has complexity $O(n\log n)$. In this article I want to show that, under certain conditions that occur in real problems, you can obtain the transform results in linear time $O(n)$, or even reach constant complexity $O(1)$.


When I was faced with the task of writing a program for analyzing the transfer functions of sound systems in real time, I, like everyone else, first turned to the fast Fourier transform. Everything was fine, but with a large time window the CPU load became indecently high, and something had to be done about it. I decided to pause, study the transform again, and look for ways around the problem. Let us return to the original transform of Joseph Fourier [2]:

$f(x) = \sum\limits_{k=-\infty}^{+\infty} c_k e^{2\pi i k x / T}, \qquad c_k = \frac{1}{T}\int\limits_0^T f(x)\, e^{-2\pi i k x / T}\, dx$



Let's look carefully at what is happening here. Each output value in the frequency domain, $c_k$, is a sum over all input signal values $f(x)$ multiplied by $e^{-2\pi i k x / T}$. To compute it, we have to walk through all the input data for each output value, i.e. perform those same $n^2$ operations.
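To make the $n^2$ operation count concrete, here is a minimal sketch of the direct computation in Python (an illustration of the formula above, not the code from the actual program; the function name is mine):

```python
import cmath

def naive_dft(samples):
    """Direct transform: every output value c_k touches every input sample, hence O(n^2)."""
    n = len(samples)
    coeffs = []
    for k in range(n):                 # one output value c_k per frequency index k
        c_k = 0j
        for x in range(n):             # ... computed from all n input samples
            c_k += samples[x] * cmath.exp(-2j * cmath.pi * k * x / n)
        coeffs.append(c_k / n)         # the 1/T normalization from the formula above
    return coeffs
```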

Getting rid of n


Let me remind you that the original task was to analyze audio data in real time. For this, a time window (essentially a buffer) of size N is filled with data at the sampling rate $f_d$. With period T, the contents of the time window are transformed into the frequency domain. In real numbers, N ranges from $2^{14}$ (16 384) to $2^{16}$ (65 536) samples (the values are inherited from the FFT, where the window size must be a power of two). The period is T = 80 ms (12.5 transforms per second), which is frequent enough to see changes conveniently without overloading the CPU and GPU. The sampling rate $f_d$ is the standard 48 kHz. Let's calculate how much of the data in the time window changes between transforms. During T, the buffer receives $48\,000 \cdot 0.08 = 3840$ samples. Thus only 5% to 23% of the data in the window is updated. In the worst case 95% (and in the best case about 77%, which is still a lot!) of the samples get fed into the transform again and again, even though they were already processed in previous iterations.

At this point an attentive reader will raise a hand and say: "Wait, what about the coefficient $e^{-2\pi i k x / T}$? With each new transform the same data ends up at new positions in the series and therefore gets different coefficients." Full marks for attentiveness; this is where we recall an important detail of the transform that is often forgotten. When studying the values of the function $f(t)$ on the interval from 0 to T, the function is considered periodic, which allows us to safely shift it left or right in time. As a result, we have every right not to append a new value at the end and delete the old one from the beginning, but instead to replace the data in the buffer cyclically.

For clarity, here is how the buffer changes between transforms:
t = 0:   f(0)    f(1)    f(2)    f(3)    f(4)   f(5)   f(6)   f(7)   f(8)   f(9)
t = 1:   f(10)   f(1)    f(2)    f(3)    f(4)   f(5)   f(6)   f(7)   f(8)   f(9)
t = 2:   f(10)   f(11)   f(2)    f(3)    f(4)   f(5)   f(6)   f(7)   f(8)   f(9)
t = 3:   f(10)   f(11)   f(12)   f(3)    f(4)   f(5)   f(6)   f(7)   f(8)   f(9)
t = 4:   f(10)   f(11)   f(12)   f(13)   f(4)   f(5)   f(6)   f(7)   f(8)   f(9)
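In code, this cyclic replacement is just a ring buffer: a write index walks around the buffer and overwrites the oldest sample in place. A minimal sketch (class and names are mine, for illustration):

```python
class RingBuffer:
    """Time window with cyclic replacement: a new sample overwrites the oldest one in place."""
    def __init__(self, size):
        self.data = [0.0] * size
        self.pos = 0                          # index of the sample to be replaced next

    def push(self, sample):
        replaced_at, old = self.pos, self.data[self.pos]
        self.data[self.pos] = sample          # the new value takes over the old value's position in the series
        self.pos = (self.pos + 1) % len(self.data)
        return replaced_at, old               # needed below to compute the delta of the transform
```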

We can write down how the transform changes in time from $t_1$ to $t_2$:

$F_t = F_{t-1} + \Delta F, \qquad \Delta c_k = \frac{1}{T}\int\limits_{t_1}^{t_2} \left(f_t(x) - f_{t-1}(x)\right) e^{-2\pi i k x / T}\, dx$


The value $F_{t-1}$ is the result of the previous transform, and the cost of computing $\Delta F$ does not depend on the size of the time window and is therefore constant. As a result, the complexity of the transform becomes $O(n)$*, because all we have to do is walk through the frequency window once and apply the changes from the samples that changed during T. Separately, I want to point out that the coefficients $e^{-2\pi i k x / T}$ can be computed in advance, which gives an additional performance gain, so the loop contains only two operations: a subtraction of real numbers and a multiplication of a real number by a complex one; in practice both operations are simple and cheap.

For completeness, it only remains to specify the initial state, and that is simple:

$F_0 = 0$ (all coefficients $c_k$ start at zero)


* — of course, the total complexity of the full transform remains $O(n^2)$, but it is performed gradually, over n iterations, as the buffer is being updated. $O(n)$ is the complexity of updating the data, and that is exactly what we need (with the FFT, the complexity of each transform is $O(n\log n)$).
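A minimal sketch of this delta update in Python might look as follows (names are mine; the coefficients are precomputed once, and each replaced sample costs one pass over the frequency bins, i.e. the $O(n)$ update described above):

```python
import cmath

class SlidingDFT:
    """Keeps the Fourier coefficients of a time window up to date as samples are replaced."""
    def __init__(self, n):
        self.n = n
        self.coeffs = [0j] * n                                # F_0 = 0: all c_k start at zero
        # e^{-2*pi*i*k*x/n} takes only n distinct values, so precompute them once
        self.w = [cmath.exp(-2j * cmath.pi * m / n) for m in range(n)]

    def replace(self, x, old, new):
        """The sample at buffer position x changed from `old` to `new`: add the delta to every c_k."""
        d = (new - old) / self.n                              # one real subtraction ...
        for k in range(self.n):
            self.coeffs[k] += d * self.w[(k * x) % self.n]    # ... and one complex multiplication per bin
```

Feeding it the (position, old value) pair returned by the ring buffer above keeps the spectrum current after every incoming sample.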

What if we dig deeper? Or: getting rid of the second n


I should make a reservation right away: the further steps are applicable only if you do not plan to apply the inverse transform to the result (for example, to correct the signal or to obtain an impulse response). First of all, let me remind you that the result of transforming a real signal is a symmetric array of data, which immediately halves the number of values to compute.

Now let's analyze the resulting data set, taking the conditions of the problem into account. We have a set of complex numbers, each describing the amplitude and phase of the oscillation at a certain frequency. The frequency is given by $f[j] = j\frac{f_d}{N}$ for $j < \frac{N}{2}$. Let's estimate the step of the frequency window on our data: $\Delta f = \frac{f_d}{N}$. For $N = 2^{14}$ this is 2.93 Hz (and for $2^{16}$ it is 0.73 Hz). Thus, in the range from 1 kHz to 2 kHz we get 341 results. Try to estimate for yourself how many values fall into the range from 8 kHz to 16 kHz for N = 65 536. A lot, right? A lot! Do we need that much data? In the task of displaying the frequency response of sound systems, of course not. On the other hand, for analysis in the low-frequency region a small step is very useful. And do not forget that further processing still lies ahead, which has to convert these volumes ($\frac{N}{2}$ values) into a human-readable form (averaging, splines or smoothing) and display them on the screen. At high frequencies, even with a 4K screen and the graph displayed full-screen with a logarithmic frequency axis, the step will quickly become much smaller than one pixel.

Experimentally you can find that 48 points per octave is quite enough, and to make the data a little smoother and better averaged I suggest settling on 96. In the audio range from 20 Hz to 20 kHz there are only 10 octaves: 20, 40, 80, 160, 320, 640, 1280, 2560, 5120, 10240, 20480 Hz, each of which can be divided into the chosen number of sub-bands (remember that the division must be geometric, not arithmetic). It is therefore more than enough to perform the transform for only about 960 frequencies, which is 16 to 65 times fewer than in the original version.
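A sketch of building such a geometric (logarithmic) frequency grid, 96 points per octave from 20 Hz to 20 kHz, which gives roughly 960 analysis frequencies (the function name and defaults are mine, for illustration):

```python
import math

def octave_grid(f_lo=20.0, f_hi=20_000.0, points_per_octave=96):
    """Geometrically spaced analysis frequencies: a constant ratio between neighbours."""
    octaves = math.log2(f_hi / f_lo)                     # about 10 octaves for 20 Hz .. 20 kHz
    count = int(round(octaves * points_per_octave))      # about 960 frequencies in total
    step = 2 ** (1.0 / points_per_octave)                # geometric step, not an arithmetic one
    return [f_lo * step ** j for j in range(count + 1)]
```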

Thus, by combining both approaches, we obtain constant complexity $O(1)$ for the data-update algorithm.
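Putting both ideas together in one sketch: for every incoming sample, the update touches only the selected analysis frequencies, so its cost does not depend on the window size N at all (again, names are mine; this ignores the windowing caveats discussed below):

```python
import cmath

class SelectiveSlidingDFT:
    """Sliding spectrum update at a fixed, small set of analysis frequencies."""
    def __init__(self, n, sample_rate, freqs):
        self.n = n
        self.data = [0.0] * n                 # the time window with cyclic replacement
        self.pos = 0
        self.coeffs = [0j] * len(freqs)
        # per-frequency rotation of the coefficient when moving one buffer position forward
        self.step = [cmath.exp(-2j * cmath.pi * f / sample_rate) for f in freqs]
        self.phase = [1 + 0j] * len(freqs)    # e^{-2*pi*i*f*pos/f_d} at the current position

    def push(self, sample):
        old, self.data[self.pos] = self.data[self.pos], sample
        d = (sample - old) / self.n
        for j in range(len(self.coeffs)):     # ~960 iterations, independent of the window size n
            self.coeffs[j] += d * self.phase[j]
            self.phase[j] *= self.step[j]     # advance the phase factor to the next position
        self.pos = (self.pos + 1) % self.n
        if self.pos == 0:
            self.phase = [1 + 0j] * len(self.coeffs)   # at pos = 0 the phase is exactly 1 again
```

Used together with `octave_grid()` above, each new sample costs on the order of a thousand multiply-adds regardless of whether the window holds $2^{14}$ or $2^{20}$ samples.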

Honey squared and a fly in the ointment


Now we can safely say that we got from complexity $O(n^2)$ to complexity $O(1)$ using two simple life hacks:

  • after analyzing the task, we noticed that the data arrives gradually and that the period of a complete update of the time window is much longer than the period between transforms, so we switched to computing the difference (delta) of the Fourier transform;
  • we moved away from the arithmetic step in the frequency window and limited ourselves to a given set of frequencies, which drastically reduces the number of computations.

But, of course, life would really be a fairy tale if it were not for one thing. Using these two approaches unloaded the CPU so much that, even with $N = 2^{20}$, it was hard to tell it was computing the Fourier transform and drawing the results on screen at all. But the punishment was not long in coming: when your signals are not truly periodic (and periodicity is required to get correct transform results) and a suitable window size cannot be found, you have to use window functions, which no longer lets you take full advantage of the first step. Practice has shown that the use of window functions is critical when studying signals with frequencies below $0.1 f_d$. At higher frequencies the number of periods that fit into the time window is already significant, which reduces the distortion caused by the discontinuity of the first kind (between f(0) and f(N-1)) in the original function.
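For reference, the reason a time-domain window breaks the first step can be seen from a single formula: the window weight is tied to a sample's age within the window, not to its fixed position in the buffer. A sketch with the common Hann window (just one possible choice):

```python
import math

def hann(pos, n):
    """Hann window weight for the sample at position `pos` (0 = oldest) in a window of n samples."""
    return 0.5 * (1 - math.cos(2 * math.pi * pos / (n - 1)))

# When the window slides by one sample, every stored sample moves from position pos to pos - 1,
# its weight changes, and its previously accumulated contribution to the spectrum becomes stale,
# so the cheap delta update no longer applies.
```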

I also abandoned the second step and returned to the FFT, since the gain in this particular problem turned out to be small.

Finally


The first approach can be used if your data has a pronounced periodic nature and needs to be analyzed over time with a large time window which, I remind you, does not have to be a power of 2 and can be any natural number.
The second approach is applicable (even with window functions taken into account) if only a certain small set of frequencies needs to be analyzed in the data.

Alas, for me in this problem it remained just a bit of mathematical entertainment, but I hope it inspires you to study other algorithms over the holidays from the point of view of how the input data changes over time :)

Literature


  1. Cooley–Tukey FFT algorithm
  2. E. A. Vlasova. Ryady (Series). Bauman MSTU Publishing House, Moscow, 2002.

The image is taken from the manga by Michio Shibuya, "Entertaining Mathematics: Fourier Analysis".
