# Statistical verification of randomness of binary sequences by NIST methods

Anyone who, one way or another, has come across cryptography knows that random number generators cannot do this. One of the possible uses of such generators, for example, is key generation. But not everyone thinks at the same time, but how “good” this or that generator is. And even if I thought about it, I was faced with the fact that in the world there is no single “official” set of criteria that would evaluate how much these random numbers are applicable specifically for this area of ​​cryptography. If the sequence of random numbers is predictable, then even the strongest encryption algorithm in which this sequence will be used turns out to be vulnerable - for example, the space of possible keys that need to be “sorted out” by the attacker to obtain some information is sharply reduced, with the help of which he can “hack” the whole system. Fortunately, different organizations are still trying to put things in order here, in particular, the American Institute for Standards NIST has developed a set of tests to assess the randomness of a sequence of numbers. They will be discussed in this article. But first, a little theory (I’ll try to present it is not tedious).

## Random binary sequences

Firstly, by generating random numbers is meant to obtain a sequence of binary characters 0 and 1, and not bytes, no matter how programmers would like. An ideal generator of this kind is tossing an “ideal” coin (a flat coin with the same probability of falling out of each side), which would be tossed as many times as needed, but the problem is that nothing ideal exists, and the performance of such an generator would leave better (one coin grew = one bit). Nevertheless, all the tests described below evaluate how much the random number generator under study is “similar” or “not similar” to an imaginary ideal coin (not by the speed of obtaining “random” characters, but by their “quality”).

Secondly, all random number generators are divided into 2 types — truly random — physical random number generators / sensors (DSPH / FDSCH) and pseudo-random - software random number generators / generators (MAP). The first ones take some random endless process as input, and the output gives an endless (depending on the observation time) sequence 0 and 1. The second are a determinate function specified by the developer, which is initialized by the so-called grain, after which it also gives the sequence 0 and 1 at the output. Knowing this grain, one can predict the whole sequence. A good MAP is one for which it is impossible to predict subsequent values, having the entire history of previous values, without grain. This property is called direct unpredictability. There is also the opposite unpredictability - the inability to calculate the grain,

It would seem that the easiest way is to take a truly random / physical DSP and not think about any predictability. However, there are problems:

• A random phenomenon / process, which is taken as a basis, may not be able to produce numbers at the right speed. If you recall the last time you generated a pair of 2048bit keys, then do not flatter yourself. Does this happen very rarely? Then imagine yourself a server accepting hundreds of requests for SSL connections per second (SSL handshake involves generating a pair of random numbers).
• In appearance, random phenomena may not be as random as it would seem. For example, electromagnetic noise may be a superposition of several more or less uniform periodic signals.

Each of the tests offered by NIST receives a finite sequence of input. Next, statistics are calculated that characterize a certain property of a given sequence — it can be either a single value or a set of values. After that, these statistics are compared with the reference statistics, which will give a perfectly random sequence. Reference statistics are derived mathematically, many theorems and scientific papers are devoted to this. At the end of the article, all references to sources where the necessary formulas are displayed will be given.

## Zero and alternative hypotheses

The basis of the tests is the concept of the null hypothesis . I’ll try to explain what it is. Suppose we have gathered some statistical information. For example, let it be the number of people with lung cancer in a group of 1000 people. And let it be known that some people from this group are smokers, while others are not, and it is known which ones specifically. The next task is: to understand whether there is a relationship between smoking and disease. The null hypothesis is the assumption that there is no relationship between the two facts. In our example, this is the assumption that smoking does not cause lung cancer. There is also an alternative hypothesis that refutes the null hypothesis: i.e. there is a relationship between phenomena (smoking causeslungs' cancer). If we pass to the terms of random numbers, then the assumption is made for the null hypothesis that the sequence is truly random (the signs of which appear equally and independently from each other). Therefore, if the null hypothesis is true, then our generator produces sufficiently “good” random numbers.

How is a hypothesis tested? On the one hand, we have statistics calculated on the basis of actually collected data (i.e., according to the measured sequence). On the other hand, there is reference statistics obtained by mathematical methods (theoretically calculated), which wouldhad a truly random sequence. Obviously, the collected statistics cannot be compared with the reference - no matter how good our generator is, it is still not perfect. Therefore, they introduce a certain error, for example 5%. It means that if, for example, the collected statistics deviate from the reference by more than 5%, it is concluded that the null hypothesis is not true with great reliability .

Since we are dealing with hypotheses, there are 4 options for the development of events:
1. The conclusion is made that the sequence is random, and this is the right conclusion.
2. It is concluded that the sequence is not random, although it was actually random. Such errors are called errors of the first kind.
3. The sequence is recognized as random, although in fact it is not. Such errors are called errors of the second kind.
4. Sequence rightly rejected

The probability of an error of the first kind is called the level of statistical significance and is denoted as α. Those. α is the probability of rejecting a “good” random sequence. This value is determined by the application. In cryptography, it is customary to take α from 0.001 to 0.01.

In each test, the so-called P-value : this is the probability that the experimental generator will produce a sequence no worse than a hypothetical true one. If P value = 1, then our sequence is perfectly random, and if it = 0, then the sequence is completely predictable. Subsequently, the P-value is compared with α, and if it is greater than α, then the null hypothesis is accepted and the sequence is recognized as random. Otherwise, it is rejected.

In tests, α = 0.01 is taken. It follows that:
• If the P-value is ≥ 0.01, then the sequence is considered random with a confidence level of 99%
• If the P-value is <0.01, then the sequence is rejected with a confidence level of 99%

So, we pass directly to the tests.

## Frequency Bit Test

Obviously, the more random the sequence, the closer this ratio is to 1. This test evaluates how close this ratio is to 1.

We take each “1” for +1, and each “0” for -1 and consider the sum over the entire sequence. This can be written as follows:
S n = X 1 + X 2 + ... + X n , where X i = 2x i - 1.
By the way, they say that the distribution of the number of “successes” in a series of experiments, where success or failure is possible in each experiment with a given probability, has a binomial distribution.

Take the following sequence: 1011010101

Then S = 1 + (-1) + 1 + 1 + (-1) + 1 + (-1) + 1 + (-1) + 1 = 2

Calculate the statistics:

Calculate the P-value through an additional error function :

Additional function errors (complementary error function) is defined as follows:

We see that the result is> 0.01, which means our sequence passed the test. It is recommended to test sequences of at least 100 bits in length.

## Frequency Block Test

This test is done on the basis of the previous one, only now the proportions “1” / “0” for each block are analyzed by the Chi-square method. It is clear that this ratio should be approximately equal to 1.

For example, let the sequence 0110011010 be given. Divide it into blocks of 3 bits (the “ownerless” 0 at the end is discarded):
011 001 101

We calculate the proportions π i for each block: π 1 = 2 / 3, π 2 = 1/3, π 3 = 1/3. Next, we calculate the statistics by the Chi-square method with N degrees of freedom (here N is the number of blocks):

Calculate the P-value through a special function Q :

Q is the so-called. incomplete upper gamma function , defined as:

In this case, the function Г is the standard gamma function:

The sequence is considered random if the P-value is> 0.01. It is recommended to analyze sequences of at least 100 bits in length, and the relationships M> = 20, M> 0.01n, and N <100 must also be satisfied.

## Test for identical consecutive bits

The test searches for all sequences of identical bits, and then analyzes how much the number and sizes of these sequences correspond to the number and sizes of a truly random sequence. The point is that if the change from 0 to 1 (and vice versa) is too rare, then such a sequence "does not pull" on a random one.

Let the sequence 1001101011 be given. First, we calculate the fraction of units in the total mass:

Next, the condition is checked:

If it is not satisfied, then the whole test is considered unsuccessful and that’s all. In our case, 0.63246> 0.1, which means we go further.

We calculate the total number of alternating sign V:

where if , or otherwise.

We calculate the P-value through the error function:

If the result> = 0.01 (as in our example), then the sequence is considered random.

## Test for the longest sequence of units in a block

The initial sequence of n bits is divided into N blocks, each of M bits, after which the longest sequence of units is searched in each block, and then it is estimated how close the indicator is to the same indicator for a truly random sequence. Obviously, a similar test for zeros is not required, since if the units are well distributed , then the zeros will also be well distributed .

What is the length of the block? NIST recommends several reference values ​​for how to break into blocks:
Total length nBlock length, M
1288
6272128
750,00010,000

Let the following sequence be given:
11001100 00010101 01101100 01001100 11100000 00000010
01001101 01010001 00010011 11010110 10000000 11010111
11001100 11100110 11011000 10110010

We divide it into blocks of 8 bits (M = 8), after which we calculate the maximum sequence of units for each block:
BlockUnit Length
110011002
000101011
011011002
010011002
111000003
000000101
010011012
010100011
000100112
110101102
10,000,0001
110101113
110011002
111001103
110110002
101100102

Next, we consider statistics for different lengths based on the following table:
v iM = 8M = 128M = 10000
v 0≤1& le4& le10
v 125eleven
v 23612
v 3≥47thirteen
v 4814
v 5≥9fifteen
v 6≥16

How to use this table: we have M = 8, so we look only at one corresponding column. We consider v i :
v 0 = { number of blocks with max. length ≤ 1 } = 4
v 1 = { number of blocks with max. length = 2 } = 9
v 2 = { number of blocks with max. length = 3 } = 3
v 3 = { number of blocks with max. length ≥ 4 } = 0

Calculate the Chi-square:

Where the values ​​of K and R are taken based on such a table:
MKR
8316
128549
10,000675

The theoretical probabilities π i are given by constants. For example, for K = 3 and M = 8, it is recommended to take π 0 = 0.2148, π 1 = 0.3672, π 2 = 0.2305, π 3 = 0.1875. (Values ​​for other K and M are given in [2]).

Next, we calculate the P-value:

If it is> 0.01, as in our example, then the sequence is considered rather random.

## Binary Matrix Rank Test

The test analyzes matrices that are composed of the original sequence, namely, it calculates the ranks of disjoint submatrices constructed from the original binary sequence. The test is based on the research of Kovalenko [6], where the scientist investigated random matrices consisting of 0 and 1. He showed that it is possible to predict the probabilities that the matrix M x Q will have rank R, where R = 0,1,2, ... min (M, Q). These probabilities are equal:

NIST recommends taking M = Q = 32, and also so that the length of the sequence is n = M ^ 2 * N. But for example, we take M = Q = 3. Next, we need the probabilities P M , P M-1 and P M-2 . With a small fraction of error, the formula can be simplified, and then these probabilities are equal:

So, let the sequence 01011001001010101101 be given. We “decompose” it into matrices - enough for 2 matrices:

Determine the rank of the matrices: it turns out R 1 = 2, R 2 = 3. For the test you need 3 numbers:
• F M = { number of matrices with rank M } = { number of matrices with rank 3 } = 1
• F M-1 = 1 (similar)
• N - F M - F M-1 = 2 - 1 - 1 = 0
Calculate the Chi-square:

Calculate the P-value:

If the result is> 0.01, then the sequence is considered random. NIST recommends that the total sequence length be> = 38MQ.

## Spectral test

The experimental sequence is considered as a discrete signal for which spectral decomposition is done to identify frequency peaks. Obviously, such peaks will indicate the presence of periodic components, which is not a gut. In short, the test reveals peaks that exceed the 95% barrier, and then checks to see if the proportion of these peaks exceeds 5%.

As you might guess, we will use the discrete Fourier transform to represent the sequence as the sum of the periodic components. It looks like this:

Here x k is the initial sequence in which one corresponds to +1, and -1 to zero, X j- the obtained values ​​of complex amplitudes (complex means that they contain both the real value of the amplitude and the phase).

You may ask, where is the frequency here? The answer is that the exponent can be expressed in terms of trigonometric functions:

For our test, it is not the phases that are interesting, but the absolute values ​​of the amplitudes. And if we calculate these absolute values, then it turns out that they are symmetric (this is a well-known fact when moving from complex values ​​to real ones), so for further consideration we take only half of these values ​​(from 0 to n / 2) - the rest do not carry additional information.

We will show all this with an example. Let the sequence 1001010011 be given.
Then x = {1, -1, -1, 1, -1, 1, -1, -1, 1, 1}.

Here is how Fourier decomposition can be done, for example, in the GNU Octave program:
``````octave:1> x = [1, -1, -1, 1, -1, 1, -1, -1, 1, 1]
x =
1  -1  -1   1  -1   1  -1  -1   1   1
octave:2> abs(fft(x))
ans =
0.0000  2.0000  4.4721  2.0000  4.4721  2.0000  4.4721  2.0000  4.4721  2.0000 ``````
We see that symmetry is observed. Therefore, five values ​​are enough for us: 0, 2, 4.4721, 2, 4.4721.

Next, we calculate the boundary value according to the formula.

It means that if the sequence is truly random, then 95% of the peaks should not exceed this boundary.

We calculate the limit number of peaks, which should be less than T:

Next, we look at the result of the decomposition and see that all our 4 peaks are less than the boundary value. Next, evaluate this difference:

Calculate the P-value:

It turned out> 0.01, so the random hypothesis is accepted. And yes, it is recommended to take at least 1000 bits for the test.

## Test for non-overlapping patterns

The experimental sequence is divided into blocks of the same length. For example:
`1010010010 1110010110`

In each block we will look for some pattern, for example, “001”. The word disjoint means that if the pattern is inside the sequence, the next comparison will not capture any bits of the pattern found. As a result of the search, for each ith block, the number W i equal to the number of cases found will be found .

So, for our blocks W 1 = 2 and W 2 = 1:
101 001 001 0
111 001 0110

We calculate the mathematical expectation and variance, as if our sequence were truly random. The following are formulas. Here N = 2 (number of blocks), M = 10 (block length), m = 3 (sample length).

Calculate the Chi-square:

Calculate the final P-value through an incomplete gamma function:

We see that the P-value is> 0.1, which means that the sequence is quite random.

We rated only one template. In fact, you need to check all the combinations of patterns, and besides for the different lengths of these patterns. How much of one and the other is needed is determined based on specific requirements, but usually m take 9 or 10. To get meaningful results, you should take N <100 and M> 0.01 * n.

## Test for intersecting intersecting patterns

This test differs from the previous one in that when the template is found, the search window is not shifted by the length of the template, but only by 1 bit. In order not to clutter up the article, we will not give an example of calculation by this method. It is completely similar.

## Universal Mauer Test

The test evaluates how far apart patterns are within the sequence. The meaning of the test is to understand how compressible the sequence is (of course, meaning lossless compression). The more compressible the sequence, the less random it is. The algorithm of this test is very cumbersome for the Habr format, so we omit it.

## Linear difficulty test

The test is based on the assumption that the experimental sequence was obtained through a linear feedback shift register (or LFSR, Linear feedback shift register). This is a well-known method of obtaining an infinite sequence: here each next bit is obtained as a certain function of the bits “sitting” in the register. The disadvantage of LFSR is that it always has a finite period, i.e. the sequence will necessarily be repeated sooner or later. The longer the LFSR, the better the random sequence.

The initial sequence is divided into equal blocks of length M. Further, for each block, using its Berlekamp – Massey algorithm [10], its linear complexity (L i ) is found, i.e. LFSR length. Then for all found L ichi-square distribution with 6 degrees of freedom is estimated. We show an example.

Let the block 1101011110001 (M = 13) be given, for which the Berlekamp-Massey algorithm gave L = 4. Make sure that this is so. Indeed, it is easy to guess that for this block, each next bit is obtained as the sum (modulo 2) of the 1st and 2nd bits (numbering from 1):
x 5 = x 1 + x 2 = 1 + 1 = 0
x 6 = x 2 + x 3 = 1 + 0 = 1
x 7 = x 3 + x 4 = 1 + 0 = 1
, etc.

We calculate the expectation using the formula

For each block, calculate the value of T i:

Next, based on the set T, we compute the set v 0 , ..., v 6 in this way:
• if T i <= -2.5, then v 0 ++
• if -2.5 <T i <= -1.5, then v 1 ++
• if -1.5 <T i <= -0.5, then v 2 ++
• if -0.5 <T i <= 0.5, then v 3 ++
• if 0.5 <T i <= 1.5, then v 4 ++
• if 1.5 <T i <= 2.5, then v 5 ++
• if T i > 2.5, then v 6 ++

We have 7 possible outcomes, which means we calculate the Chi-square with the number of degrees of freedom 7 - 1 = 6: The

probabilities π i in the test are hard-set and equal, respectively: 0.010417, 0.03125, 0.125, 0.5, 0.25, 0.0625, 0.020833. (π i for a larger number of degrees of freedom can be calculated by the formulas given in [2]).

Calculate the P-value:

If the result is> 0.01, then the sequence is considered random. For real tests, it is recommended to take n> = 10 ^ 6 and M in the range from 500 to 5000.

## Subsequence test

The frequency of finding all kinds of sequences of length “m” bits inside the original sequence is analyzed. Moreover, each sample is searched independently, i.e. it’s possible to “overlap” one sample found on another. Obviously, the number of all kinds of samples will be 2 m . If the sequence is sufficiently large and random, then the probabilities of finding each of these samples are the same. (By the way, if m = 1, then this test "degenerates" into the test for the ratio "0" or "1" already described previously).

The test is based on [8] and [11]. There are 2 indicators described (∇ψ 2 m and ∇ 2 ψ 2 m), which characterize how much the frequency of appearance of the samples corresponds to the same frequencies for a truly random sequence. We show the algorithm by example.

Let a sequence 0011011101 with length n = 10 and m = 3 be given.

First, 3 new sequences are formed, each of which is obtained by adding m-1 first bits of the sequence to its end. It turns out:
• For m = 3: 0011011101 00 (added 2 bits to the end)
• For m-1 = 2: 0011011101 0 (added 1 bit to the end)
• For m-2 = 1: 0011011101 (source sequence)
Next, we find the frequencies of occurrence of all blocks of length m, m-1 and m-2, respectively:
• v 000 = 0, v 001 = 1, v 010 = 1, v 011 = 2, v 100 = 1, v 101 = 2, v 110 = 2, v 111 = 0
• v 00 = 1, v 01 = 3, v 10 = 3, v 11 = 3
• v 0 = 4, v 1 = 6
We calculate the necessary statistics by the formulas:

Substitute:

Then:

Total values:

So, both P-values ​​are> 0.01, which means that the sequence is recognized as random.

## Approximate entropy

The Approximate Entropy method has initially proven itself in medicine, especially in cardiology. In general, according to the classical definition, entropy is a measure of chaos: the higher it is, the more unpredictable phenomena. For better or worse, it depends on the context. For random sequences used in cryptography, it is important to have high entropy - this means that it will be difficult to predict subsequent random bits based on what we already have. But, for example, if for a random value we take the heart rate measured with a given period, the situation is different: there are many studies (for example, [12]) that prove that the lower the heart rate variability, the less likely the heart attacks and other unpleasant phenomena . Obviously, a person’s heart cannot beat at a constant frequency. However, some die from heart attacks, and others not. Therefore, the method of approximate entropy allows us to estimate how random the appearance of the phenomenonreally random .

Specifically, the test calculates the frequency of occurrence of all kinds of samples of a given length (m), and then similar frequencies, but already for samples of length m + 1. Then, the frequency distribution is compared with the Chi-square reference distribution. As in the previous test, the samples may overlap.

We show an example. Let the sequence 0100110101 be given (length n = 10), and take m = 3.

To begin with, supplement the sequence with the first m-1 bits. It will turn out 0100110101 01.

We calculate the occurrence of each of 8 various blocks. It turns out:
k 000 = 0, k 001 = 1, k 010 = 3, k 011 = 1, k 100 = 1, k 101 = 3, k110 = 1, k 111 = 0.

We calculate the corresponding frequencies according to the formula C i m = k i / n:
C 000 3 = 0, C 001 3 = 0.1, C 010 3 = 0.3, C 011 3 = 0.1, C 100 3 = 0.1, C 101 3 = 0.3, C 110 3 = 0.1, C 111 3 = 0.

Similarly, we consider the frequency of occurrence of subunits of length m + 1 = 4. There are already 2 4 = 16:
C 0011 4 = C 0100 4= С 0110 4 = С 1001 4 = С 1101 4 = 0.1, С 0101 4 = 0.2, С 1010 4 = 0.3. The remaining frequencies = 0.

Calculate φ 3 and φ 4 (note that here is the natural logarithm):

Calculate the Chi-square:

P-value:

The resulting value> 0.01, which means that the sequence is recognized as random.

## Cumulative Amount Test

We take each zero bit of the original sequence as -1, and each single bit as +1, after which we calculate the sum. Intuitively, the more random the sequence, the faster this amount will tend to zero. On the other hand, imagine that we are given a sequence consisting of 100 zeros and 100 units in a row: 00000 ... 001111 ... 11. Here the sum will be equal to 0, but it is obvious that to call such a sequence random "the hand will not rise." Therefore, a deeper criterion is needed. And this criterion is partial amounts. We will gradually consider the amounts starting from the first element:
S 1 = x 1
S 2 = x 1 + x 2 S 3= x 1 + x 2 + x 3 ...
S n = x 1 + x 2 + x 3 + ... + x n

Next is the number z = the maximum among these sums.

Finally, the P-value is considered according to the following formula (see [9] for its derivation):

Where:

Here Φ is the distribution function of the standard normal random variable. We remind you that the standard normal distribution is the well-known Gaussian distribution (in the form of a bell), which has a mathematical expectation of 0 and a variance of 1. It looks like this:

If the resulting P-value is> 0.01, then the sequence is recognized as random.

By the way, this test has 2 modes: the first one we just examined, and in the second, the amounts are calculated starting from the last element.

## Random deviation test

This test is similar to the previous one: similarly, partial sums of a normalized sequence (i.e., consisting of -1 and 1) are considered. Let the sequence 0110110101 be given and let S (i) be a partial sum from 1 to the ith element. We draw these points on the graph, after adding “0” to the beginning and end of the sequence S (i) - this is necessary for the integrity of further calculations:

Note the points where the graph intersects the horizontal axis - these points will divide the sequence into so-called cycles . Here we have 3 cycles: {0, -1, 0}, {0, 1, 0} and {0, 1, 2, 1, 2, 1, 2, 0}. Further, it is said that each of these cycles sequentially takes on different states.. For example, the first cycle 2 times takes the state “0” and 1 time the state “-1”. For this test, states from -4 to 4 are of interest. We list all occurrences in these states in the following table:
State (x)Cycle number 1Cycle number 2Cycle number 3
-4000
-3000
-2000
-1100
1013
2003
3000
4000

Based on this table, we form another table: in it the number of cycles that take a given state will go horizontally :
State (x)Never1 time2 times3 times4 timesFive times
-4300000
-3300000
-2300000
-1210000
1110100
2200100
3300000
4300000

Then, for each of the eight states, the Chi-square of statistics is calculated by the formula

Where v k (x) are the values ​​in the table for this state, J is the number of cycles (we have 3), π k (x) are the probabilities that the state “x "Occurs k times in a truly random distribution (they are known).

For example, for x = 1 it turns out:

π For the remaining x, see [2].

We calculate the P-value:

If it is> 0.01, then a conclusion is made about randomness. As a result, you need to calculate 8 P-values. Some may be more than 0.01, some less. In this case, the final decision on the sequence is made on the basis of other tests.

## Variation of the test for arbitrary deviations

Almost similar to the previous test, but a wider set of conditions is taken: -9, -8, -7, -6, -5, -4, -3, -2, -1, +1, +2, +3, + 4, +5, +6, +7, +8, +9. But the main difference is that here the P-value is calculated not through the gamma function (igamc) and Chi-square, but through the error function (erfc). For exact formulas, the reader can refer to the source document.

Below is a list of sources that you can see if you want to delve into the topic:

1. csrc.nist.gov/groups/ST/toolkit/rng/stats_tests.html
2. csrc.nist.gov/groups/ST/toolkit/rng/documents/SP800-22rev1a.pdf
3. Central Limit Theorem
4. Anant P. Godbole and Stavros G. Papastavridis, (ed), Runs and patterns in probability: Selected papers. Dordrecht: Kluwer Academic, 1994
5. Pal Revesz, Random Walk in Random and Non-Random Environments. Singapore: World Scientific, 1990
6. I. N. Kovalenko, Probability Theory and Its Applications, 1972
7. O. Chrysaphinou, S. Papastavridis, “A Limit Theorem on the Number of Overlapping Appearances of a Pattern in a Sequence of Independent Trials.” Probability Theory and Related Fields, Vol. 79, 1988
8. IJ Good, “The serial test for sampling numbers and other tests for randomness,” Cambridge, 1953
9. A. Rukhin, “Approximate entropy for testing randomness,” Journal of Applied Probability, 2000
10. Burlekamp Algorithm - Massey
11. DE Knuth, The Art of Computer Programming. Vol. 2 & 3, 1998
12. www.ncbi.nlm.nih.gov/pubmed/8466069