Maverik January 23, 2015 at 13:57

Haskell approximate number comparison

Probably everyone knows that in calculations with limited accuracy, two mathematically equivalent expressions may not be equal to each other. For example, the following obvious mathematical equality when computing in Haskell unexpectedly turns out to be false:

ghci> 3 * sqrt(24 ^ 2 + 16 ^ 2) == sqrt(72 ^ 2 + 48 ^ 2)
False

The reason for this violation is that the expressions in this equality are calculated only approximately:

ghci> 3 * sqrt(24 ^ 2 + 16 ^ 2)
86.53323061113574
ghci> sqrt(72 ^ 2 + 48 ^ 2)
86.53323061113575
ghci> sqrt(72 ^ 2 + 48 ^ 2) - 3 * sqrt(24 ^ 2 + 16 ^ 2)
1.4210854715202004e-14

The difference here is only in the last (fourteenth!) Decimal place, but this is enough to make the comparison false.

Although this problem is well known, programmers pay little attention to it. Firstly, it is believed that comparisons of this kind arise only in a narrow area of numerical methods, and secondly, that violation of equality occurs extremely rarely. As it turned out, both are not entirely true. The given case arose when I needed to implement a function for calculating the length of a vector with integer coordinates. At the same time, for the unit testing, the tools of the QuickCheck package are used, which quite quickly found the case of violation of the scaling invariant for the length of the vector. I note that this is far from the only invariant whose violation was discovered during testing.

The question arises: what is the easiest way to describe the verification of the approximate equality of two numbers obtained as a result of calculations with limited accuracy? To solve this problem in Haskell, it is enough to define another comparison operator (say ~ =), which is used in the same way as the usual equality operator. I propose to consider the implementation of such an operator, which can be issued in the form of a fairly simple Circa module .

The first thing that comes to mind for an approximate comparison of two numbers is to calculate the absolute value of the difference of the compared numbers, and check if it exceeds a certain threshold:

ghci> abs(sqrt(72 ^ 2 + 48 ^ 2) - 3 * sqrt(24 ^ 2 + 16 ^ 2)) < 1e-12
True

Such a solution, of course, is quite functional. But he has two serious flaws. First of all, when reading this record it is not at all obvious that we check the equality (even if approximate) of the two numbers. In addition, we had to use the “magic” number 1e-12 to indicate the expected inaccuracy of the comparison. Of course, with a small number of such comparisons with these inconveniences, one could come to terms. But when the number of invariants is measured in tens, I would like to get a simpler and more obvious way to write them. The code in which the two-place operator of approximate comparison is used in the same way as the usual equal sign looks much better, for example, like this:

sqrt(72 ^ 2 + 48 ^ 2) ~= 3 * sqrt(24 ^ 2 + 16 ^ 2)

Unfortunately, the standard Haskell syntax does not provide such a statement. However, the language itself provides a wonderful opportunity to introduce a new operator on its own as if it were part of the standard syntax.

First, we define the circaEq function, which will reduce the approximate comparison record

circaEq :: (Fractional t, Ord t) => t -> t -> t -> Bool
circaEq t x y = abs (x - y) < t

Now our example is getting a little shorter, but not much clearer:

ghci> circaEq 1e-12 (sqrt(72 ^ 2 + 48 ^ 2)) (3 * sqrt(24 ^ 2 + 16 ^ 2))
True

There is still a lot of unnecessary information: in order to separate the arguments, I had to put them in brackets, and most importantly - as before, it is necessary to indicate the accuracy of the comparison. To get rid of the extra parameter, we will use the trick that was used in the Data.Fixed module to represent numbers with fixed accuracy. We define a series of double functions, each of which will compare numbers with a predetermined error. It turns out that in practice it is enough to define only seven such functions for the most typical error values:

picoEq :: (Fractional t, Ord t) => t -> t -> Bool
picoEq = circaEq 1e-12
infix 4 `picoEq`
nanoEq :: (Fractional t, Ord t) => t -> t -> Bool
nanoEq = circaEq 1e-9
infix 4 `nanoEq`
microEq :: (Fractional t, Ord t) => t -> t -> Bool
microEq = circaEq 1e-6
infix 4 `microEq`
milliEq :: (Fractional t, Ord t) => t -> t -> Bool
milliEq = circaEq 1e-3
infix 4 `milliEq`
centiEq :: (Fractional t, Ord t) => t -> t -> Bool
centiEq = circaEq 1e-2
infix 4 `centiEq`
deciEq :: (Fractional t, Ord t) => t -> t -> Bool
deciEq = circaEq 1e-1
infix 4 `deciEq`
uniEq :: (Fractional t, Ord t) => t -> t -> Bool
uniEq = circaEq 1
infix 4 `uniEq`

Any of these functions can be used as a two-place comparison operator, for example:

ghci> sqrt(72 ^ 2 + 48 ^ 2) `picoEq` 3 * sqrt(24 ^ 2 + 16 ^ 2)
True

All that remains is to add a little sugar:

(~=) :: (Fractional t, Ord t) => t -> t -> Bool
(~=) = picoEq
infix 4 ~=

and we get what we wanted:

ghci> sqrt(72 ^ 2 + 48 ^ 2) ~= 3 * sqrt(24 ^ 2 + 16 ^ 2)
True

We will answer another important question: why did we need several different functions for an approximate comparison? Is comparison with pico accuracy not enough? It turns out really not enough. A suitable counterexample is found using the same QuickCheck package:

ghci> sqrt(5588 ^ 2 + 8184 ^ 2) ~= 44 * sqrt(127 ^ 2 + 186 ^ 2)
False
ghci> sqrt(5588 ^ 2 + 8184 ^ 2) `nanoEq` 44 * sqrt(127 ^ 2 + 186 ^ 2)
True

Obviously, the required level of accuracy is highly dependent on the scale of the numbers you need to work with. That is why the Circa module exports not only the operator of approximate comparison, but also a set of functions for which it can become a synonym. If the application does not accept the use of pico accuracy, it can import the necessary function and define the approximate comparison operator accordingly. The same can be done if someone likes a different notation for the approximate comparison operator.

Tags:

arithmetic

Haskell approximate number comparison

Also popular now: