# Why unit tests do not work in scientific applications

In this article I want to share my experience in developing scientific applications, and to explain why Test-Driven Development and unit tests are not a panacea, as is commonly believed in recent times, at least from the point of view of finding software errors. Why so?

The article will consist of several sections:

I must say right away that I do not consider the benefits of TDD as a way to develop an architecture, but solely from the point of view of testing the program.

During the week of aeronautics on the hub, the main article was Digital Aircraft [1], and an interesting commentary on it [2] that 20 years ago there were no unit tests and generally normal development methods, so the triple-triple scheme was common redundant architecture ”, in particular, writing the same program with three different commands in different languages in order to eliminate software errors.

Nevertheless, according to eyewitnesses ([3], by the way, this is another excellent book by the excellent scientist Richard Feynman, who gained a second life on the hub), if not Test Driven Development (TDD further), then manual testing was a very important stage of development.

That is why I had a desire to analyze the approach to the development of scientific applications and share my thoughts on this subject, in particular, to tell that TDD is not a silver bullet at all for finding errors, as you might think at first.

In the past 20 years, an unusual method has been developed for modeling hydrodynamics: the Lattice-Boltzmann Method (hereinafter - LBM) [4], [5]. I participated in the implementation of this algorithm in C ++ for the Max Planck Institute in Germany.

To avoid unavoidable questions: yes, there are already commercial and open packages using this method [6]; but it was originally planned to use a modification of the method, which was supposed to give a gain in performance and memory consumption; Nevertheless, we have consistently refused it (first for mass transfer, then for heat transfer). By that time, a lot of code had been written, and we decided to continue our own implementation.

Hydrodynamics is known to be described by the Navier-Stokes equation. When modeling it, you can use, for example, the Finite Element Method [7]. Unfortunately, it has drawbacks - the relative complexity of parallelization, the inability to simulate multiphase flows, the difficulties of modeling flows in porous media. These drawbacks are devoid of LBM.

For rarefied gases, the Boltzmann equation [8] is valid, which describes how the particle velocity distribution density at each point in space changes with time. Macroscopically, this equation is equivalent to the Navier-Stokes equation (that is, after the transition to macroscopic quantities - density and velocity).

Now watch your hands: despite the fact that the Boltzmann equation is not applicable for dense liquids, if we learn to model it, we can also model the Navier-Stokes equation for these liquids. That is, we thereby replace the underlying level of abstractions (the microscopic equation for dense liquids) with a physically incorrect implementation (the microscopic equation for rarefied gases), but so that the upper level (the macroscopic Navier-Stokes equation) is described correctly.

The next stage is the discretization of the Boltzmann equation: in time, in spatial coordinates (we obtain spatial nodes for modeling) and in the possible directions of particles in each spatial node. Directions are selected in a special way, and always point to some neighboring nodes.

Thus, these are the features that are critical for further examples: 1. complex formulas are used in the algorithm 2. the algorithm is non-local, that is, in order to calculate the state of the system in this node, you need to use the information from the previous step about neighboring nodes.

Finally, let's move on to why unit tests cannot fully cover a complex scientific application.

The main idea of the section is that complex algorithms interfere with testing ... yourself.

For complex algorithms and methods, it is hard to come up with simple tests.

This, in general, is obvious. It is often difficult to choose simple input parameters of a formula so that a simple and clear number is obtained at the output. That is, you, of course, separately on a piece of paper or in Matlab, you can calculate what parameters need to be input, in order to get one at the output of the tested method; but both the input parameters and the unit of output will be more likely magic numbers. An example is the Maxwell distribution [9]. In the LBM algorithm, it is used in a slightly modified form.

Even if you manage to come up with a simple test for a complex formula or algorithm, this test will most likely not find most of the errors.

Indeed, sometimes you can come up with a simple and easy to read test: for example, if you test how they wrote rotation matrices in your 3D engine (I think that many residents of the Habrir dabbled in this), then rotation at zero angles along all axes should return a single matrix. However, often these tests do not detect most errors. For example, if instead of the sine of the rotation angle (in the elements of the rotation matrix) you use the tangent, then such a simple test will pass.

Another example: you need to implement boundary conditions for the LBM algorithm. These conditions should complement the state of the node when the latter is not completely surrounded by “liquid” neighbors (for example, it is located near the wall). You can come up with a simple test - if all the neighbors are in equilibrium states (that is, the velocities are distributed according to the Maxwell law), then after iteration the boundary node will also be in equilibrium. However, it turned out that this test misses most of the errors ...

One can argue - in business applications, tests are also not called to cover all possible input parameters. However, the difference between scientific applications is that they usually operate with sets of different power [10]. Business applications mainly work with finite or countable sets (of finite power or Alef-zero power: Boolean variables, strings, integers), and scientific ones with uncountable sets (real or complex numbers).

Due to the fact that typical sets are uncountable, it is difficult to immediately detect an error.

The fact is that if you confuse "either" and "and" in a method that uses only Boolean variables, then you will immediately get something wrong at its output. And if you confuse the plus with the minus in the formula that works with small values of the input parameters, then you may not be able to find the error after one method call with this formula. You will need to call this method several hundred thousand times to detect a discrepancy. It is also worth remembering that in a test that works with real numbers, you need to think about how to avoid false positives.

Example: in one of the important cycles of the LBM algorithm, continue was confused with break. However, all tests passed. An error was discovered only after manually checking the algorithm on a simple system - the Poiseuille flow [11], when after 100,000 iterations, or 5 minutes (!) Of modeling the 32x64 node system (starting from the stationary state), the speed distribution turned out to be somewhat asymmetric (the maximum relative error is of the order of 1e -10).

For many methods, unit tests cannot be created.

Complex physical modeling algorithms are rarely local. That is, each spatial modeling node can interact with neighboring nodes, and you can’t avoid this in any way, and this makes unit testing impossible.

For example, to test a typical step of the LBM algorithm or typical boundary conditions in two-dimensional space, a system of at least 3x3 nodes is necessary. They need to specify densities, speeds, temperatures, make preliminary calculations, set the modeling context (such as various providers of physical properties and models), etc. As a result, the modularity of the test is gradually hidden in the haze of integration.

Another (not so characteristic) example: if you need to test the collision algorithm in the physical engine of the game, then you must specify at least two bodies that will collide, their coordinates, speeds, geometry, etc., which again turns the unit test into an integration one.

If the algorithm is nontrivial, then one public method of the class that implements this algorithm can hide 10-15 methods and 300-400 lines of complex code [12].

Even so: 300-400 lines of harsh, merciless, non-parsable code. In this class there can be 5 variables of type a, b, gamma, method calls of the LAPACK [13], BLAS libraries, random number generation, and who knows what else. And you may wish with all your heart to name the variable “a” somehow normal, because you know that when developing a corporate application you would be scolded for such a name for a long time, but you cannot, because it is called that way in the original article, and because it does not make any explicit sense. The best you can do is provide a link to the article at the beginning of the class.

And even if you come up with a simple unit test, with simple input parameters for this single public method, and with a simple output value, and it stops passing at one fine moment, then you will not be able to find an error in this class even by selfless debug or by the ultimatum method attentive reading.

After encountering such a situation, the best solution was to rewrite this algorithm on MATLAB and compare the calculations line by line.

I hope that I was able to give reasonably strong arguments and convince the Habr readers that often writing the same program in different languages by different teams is a wonderful and (in case of vital systems) inevitable addition for unit testing, TDD, Interface Driven Development, Agile and other modern techniques.

Thanks for reading!

[1] Digital airplanes

[2] Commentary on the article Digital airplanes

[3] What do you care about what others think of you , chapter “The long-suffering application”

[4] Lattice Boltzmann method

[5] Books on LBM . Some are available via google books

[6] Parallel Lattice Boltzmann Solver

[7] Finite element method

[8] Boltzmann equation

[9] Maxwell-Boltzmann distribution

[10] Cardinality

[11] Poiseuille flow

[12] Pattern Facade

[13] LAPACK

PS I guess that for most readers of the site the algorithm is somewhat exotic, however, I think this is the best illustration. After all, where without scientific algorithms in the article about them the most.

PPS I do not provide links to scientific articles, because all of them are available only by subscription.

PPS Thanks to SychevIgor for invite!

After reading the comments, I think that it is necessary to get rid of the phobia of magic numbers and write for all modules (such as BoltzmannDistributionProvider) tabular tests for comparison with calculations from Matlaba or calculations of the working state of the application. And also add similar system tests (for example, calculate the dynamics of a 5x5 node system once, write to a file, and compare the results of the run every time with this file). Thank you all for your advice!

The article will consist of several sections:

- "Introduction", in which I will explain why I decided to write it
- "Illustrative algorithm." A scientific problem and an algorithm will be briefly described here, which will illustrate further
- “Unit Test Disadvantages” Arguments will be given here to explain the disadvantages of unit tests for scientific applications.
- Conclusion

I must say right away that I do not consider the benefits of TDD as a way to develop an architecture, but solely from the point of view of testing the program.

#### Introduction

During the week of aeronautics on the hub, the main article was Digital Aircraft [1], and an interesting commentary on it [2] that 20 years ago there were no unit tests and generally normal development methods, so the triple-triple scheme was common redundant architecture ”, in particular, writing the same program with three different commands in different languages in order to eliminate software errors.

Nevertheless, according to eyewitnesses ([3], by the way, this is another excellent book by the excellent scientist Richard Feynman, who gained a second life on the hub), if not Test Driven Development (TDD further), then manual testing was a very important stage of development.

That is why I had a desire to analyze the approach to the development of scientific applications and share my thoughts on this subject, in particular, to tell that TDD is not a silver bullet at all for finding errors, as you might think at first.

#### Illustration algorithm

In the past 20 years, an unusual method has been developed for modeling hydrodynamics: the Lattice-Boltzmann Method (hereinafter - LBM) [4], [5]. I participated in the implementation of this algorithm in C ++ for the Max Planck Institute in Germany.

To avoid unavoidable questions: yes, there are already commercial and open packages using this method [6]; but it was originally planned to use a modification of the method, which was supposed to give a gain in performance and memory consumption; Nevertheless, we have consistently refused it (first for mass transfer, then for heat transfer). By that time, a lot of code had been written, and we decided to continue our own implementation.

Hydrodynamics is known to be described by the Navier-Stokes equation. When modeling it, you can use, for example, the Finite Element Method [7]. Unfortunately, it has drawbacks - the relative complexity of parallelization, the inability to simulate multiphase flows, the difficulties of modeling flows in porous media. These drawbacks are devoid of LBM.

For rarefied gases, the Boltzmann equation [8] is valid, which describes how the particle velocity distribution density at each point in space changes with time. Macroscopically, this equation is equivalent to the Navier-Stokes equation (that is, after the transition to macroscopic quantities - density and velocity).

Now watch your hands: despite the fact that the Boltzmann equation is not applicable for dense liquids, if we learn to model it, we can also model the Navier-Stokes equation for these liquids. That is, we thereby replace the underlying level of abstractions (the microscopic equation for dense liquids) with a physically incorrect implementation (the microscopic equation for rarefied gases), but so that the upper level (the macroscopic Navier-Stokes equation) is described correctly.

The next stage is the discretization of the Boltzmann equation: in time, in spatial coordinates (we obtain spatial nodes for modeling) and in the possible directions of particles in each spatial node. Directions are selected in a special way, and always point to some neighboring nodes.

Thus, these are the features that are critical for further examples: 1. complex formulas are used in the algorithm 2. the algorithm is non-local, that is, in order to calculate the state of the system in this node, you need to use the information from the previous step about neighboring nodes.

#### Unit Test Disadvantages

Finally, let's move on to why unit tests cannot fully cover a complex scientific application.

The main idea of the section is that complex algorithms interfere with testing ... yourself.

##### Lack of simple tests

For complex algorithms and methods, it is hard to come up with simple tests.

This, in general, is obvious. It is often difficult to choose simple input parameters of a formula so that a simple and clear number is obtained at the output. That is, you, of course, separately on a piece of paper or in Matlab, you can calculate what parameters need to be input, in order to get one at the output of the tested method; but both the input parameters and the unit of output will be more likely magic numbers. An example is the Maxwell distribution [9]. In the LBM algorithm, it is used in a slightly modified form.

##### Small coverage area

Even if you manage to come up with a simple test for a complex formula or algorithm, this test will most likely not find most of the errors.

Indeed, sometimes you can come up with a simple and easy to read test: for example, if you test how they wrote rotation matrices in your 3D engine (I think that many residents of the Habrir dabbled in this), then rotation at zero angles along all axes should return a single matrix. However, often these tests do not detect most errors. For example, if instead of the sine of the rotation angle (in the elements of the rotation matrix) you use the tangent, then such a simple test will pass.

Another example: you need to implement boundary conditions for the LBM algorithm. These conditions should complement the state of the node when the latter is not completely surrounded by “liquid” neighbors (for example, it is located near the wall). You can come up with a simple test - if all the neighbors are in equilibrium states (that is, the velocities are distributed according to the Maxwell law), then after iteration the boundary node will also be in equilibrium. However, it turned out that this test misses most of the errors ...

One can argue - in business applications, tests are also not called to cover all possible input parameters. However, the difference between scientific applications is that they usually operate with sets of different power [10]. Business applications mainly work with finite or countable sets (of finite power or Alef-zero power: Boolean variables, strings, integers), and scientific ones with uncountable sets (real or complex numbers).

##### Low error rate

Due to the fact that typical sets are uncountable, it is difficult to immediately detect an error.

The fact is that if you confuse "either" and "and" in a method that uses only Boolean variables, then you will immediately get something wrong at its output. And if you confuse the plus with the minus in the formula that works with small values of the input parameters, then you may not be able to find the error after one method call with this formula. You will need to call this method several hundred thousand times to detect a discrepancy. It is also worth remembering that in a test that works with real numbers, you need to think about how to avoid false positives.

Example: in one of the important cycles of the LBM algorithm, continue was confused with break. However, all tests passed. An error was discovered only after manually checking the algorithm on a simple system - the Poiseuille flow [11], when after 100,000 iterations, or 5 minutes (!) Of modeling the 32x64 node system (starting from the stationary state), the speed distribution turned out to be somewhat asymmetric (the maximum relative error is of the order of 1e -10).

##### Lack of modularity

For many methods, unit tests cannot be created.

Complex physical modeling algorithms are rarely local. That is, each spatial modeling node can interact with neighboring nodes, and you can’t avoid this in any way, and this makes unit testing impossible.

For example, to test a typical step of the LBM algorithm or typical boundary conditions in two-dimensional space, a system of at least 3x3 nodes is necessary. They need to specify densities, speeds, temperatures, make preliminary calculations, set the modeling context (such as various providers of physical properties and models), etc. As a result, the modularity of the test is gradually hidden in the haze of integration.

Another (not so characteristic) example: if you need to test the collision algorithm in the physical engine of the game, then you must specify at least two bodies that will collide, their coordinates, speeds, geometry, etc., which again turns the unit test into an integration one.

##### Large facades

If the algorithm is nontrivial, then one public method of the class that implements this algorithm can hide 10-15 methods and 300-400 lines of complex code [12].

Even so: 300-400 lines of harsh, merciless, non-parsable code. In this class there can be 5 variables of type a, b, gamma, method calls of the LAPACK [13], BLAS libraries, random number generation, and who knows what else. And you may wish with all your heart to name the variable “a” somehow normal, because you know that when developing a corporate application you would be scolded for such a name for a long time, but you cannot, because it is called that way in the original article, and because it does not make any explicit sense. The best you can do is provide a link to the article at the beginning of the class.

And even if you come up with a simple unit test, with simple input parameters for this single public method, and with a simple output value, and it stops passing at one fine moment, then you will not be able to find an error in this class even by selfless debug or by the ultimatum method attentive reading.

After encountering such a situation, the best solution was to rewrite this algorithm on MATLAB and compare the calculations line by line.

#### Conclusion

I hope that I was able to give reasonably strong arguments and convince the Habr readers that often writing the same program in different languages by different teams is a wonderful and (in case of vital systems) inevitable addition for unit testing, TDD, Interface Driven Development, Agile and other modern techniques.

Thanks for reading!

#### References

[1] Digital airplanes

[2] Commentary on the article Digital airplanes

[3] What do you care about what others think of you , chapter “The long-suffering application”

[4] Lattice Boltzmann method

[5] Books on LBM . Some are available via google books

[6] Parallel Lattice Boltzmann Solver

[7] Finite element method

[8] Boltzmann equation

[9] Maxwell-Boltzmann distribution

[10] Cardinality

[11] Poiseuille flow

[12] Pattern Facade

[13] LAPACK

PS I guess that for most readers of the site the algorithm is somewhat exotic, however, I think this is the best illustration. After all, where without scientific algorithms in the article about them the most.

PPS I do not provide links to scientific articles, because all of them are available only by subscription.

PPS Thanks to SychevIgor for invite!

#### UPD

After reading the comments, I think that it is necessary to get rid of the phobia of magic numbers and write for all modules (such as BoltzmannDistributionProvider) tabular tests for comparison with calculations from Matlaba or calculations of the working state of the application. And also add similar system tests (for example, calculate the dynamics of a 5x5 node system once, write to a file, and compare the results of the run every time with this file). Thank you all for your advice!