# Fast Fixed Point Math for Java Financial Applications

It is no secret that financial information (invoices, transactions, and other bookkeeping) is not very friendly with floating point numbers, and many articles recommend using a fixed point (fixed point arithmetic). In Java, this format is represented, in fact, only by the BigDecimal class, which cannot always be used for performance reasons. You have to look for alternatives. This article describes a self-written Java library for performing arithmetic operations on fixed-precision numbers. The library was created to work in high-performance financial applications and allows you to work with an accuracy of 9 decimal places while maintaining acceptable performance. A link to the source code and benchmarks is provided at the end of the article.

# Floating point arithmetic

Modern computers can perform arithmetic operations only with limited accuracy. These are discrete devices that may not work with all possible numbers, but only with some of their countable subsets. The most common format for working with real numbers in the computer's memory is a floating (binary) point - floating (binary) point, when numbers are stored as M * 2 ^ E, where M and E are integer mantissa and the order of the numbers. But some numbers, such as 0.1, cannot be accurately represented in this format. Therefore, in the course of complex calculations, some error inevitably accumulates. That is, the result of the machine calculation, say 0.1 + 0.1 + 0.1, does not coincide with the mathematically correct 0.3. Given the above, when programming complex arithmetic, you can follow several strategies:

Strategy 1 - ignore. Do not pay attention to the error, consider all operations ideally mathematical, and hope that the accuracy available will suffice for acceptable results. The most common option.

Strategy 2 - scrupulously calculate. Formulas for calculating machine errors have been known for more than a decade. They allow us to estimate from above the relative error of any arithmetic operation. Probably, and it is necessary to do for serious numerical modeling. The problem is that it is very time consuming. In fact, every + - * / character in the code must be accompanied by an error calculation. It is necessary to take into account all the dependencies between the calculations and repeat the procedure every time the code changes.

Strategy 3 - use a decimal point (floating decimal point) instead of a binary one. That is, store numbers in the form of M * 10 ^ E. This does not solve the problems with an error (the mantissa is still rounded to a finite number of significant digits), but at least all the “simple” numbers for a person (like 1.1) are now accurately represented in the memory. Payback will be performance. Any normalization of numbers (that is, an equivalent reduction of the mantissa and an increase in order) requires division by degree 10, which is not very fast, unlike division by degree 2. And there is a lot to normalize — with each addition or subtraction with different orders.

Strategy 4 - use a fixed point (fixed decimal point). Simplify strategy 3 when we fix order E. In this case, normalization is not needed for addition / subtraction. In addition, all calculations will have the same absolute error. This strategy is devoted to the article.

# Fixed point arithmetic

In contrast to physics, where relative error is important, in finance, an absolute is needed. If, after conducting a complex financial transaction, a customer is billed at \$ 1000000.23 while he expects \$ 1000000.18, then there may be some difficulties. An explanation of the “yes, why do you need an accuracy of 8 significant digits ??” may not roll. And it's not a loss of 5 cents (to make a mistake on the contrary, “in favor” of the client is not much better), but in the inconsistencies of accounting. Therefore, the rules of computation and rounding are clearly stipulated between the parties, and artifacts from the use of double and float variables sometimes complicate life.

In Java, there is a standard class for fixed point arithmetic - BigDecimal. There are two problems with him: he is slow (because of his versatility) and he is not mutable. Non-confusion means that any operation allocates an object on the heap. The selection and release in terms of the object takes a little time, but intensive calculations in the hot code create a decent load on the GC, which is unacceptable in some cases. You can hope for escape analysis and scalarization, but they are very unstable in the sense that even a minor change in the code or in the JIT (such as lazy loading of the new interface implementation) can turn the entire inline structure upside down, and the method that worked normally a minute ago, suddenly begin to allocate memory madly.
UPD due to questions in the comments: Main reason the rejection of BigDecimal and BigInteger is not at all the poor performance of computations, but the inconsistency and selection of objects.

The described library is the result of the fact that I was tired of rewriting non-allocating memory fixed arithmetic from scratch for each new employer, and I decided to write my own library for later insourcing.

Immediately show an example of use, before moving on to the details of implementation:

``````publicclassSample{
privatefinal Decimal margin;
privatefinal Quantity cumQuantity = new Quantity();
privatefinal Quantity contraQuantity = new Quantity();
privatefinal Quantity cumContraQuantity = new Quantity();
privatefinal Price priceWithMargin = new Price();
privatefinal Price avgPrice = new Price();
publicSample(int marginBp){
// 1 + margin / 10000this.margin = Decimal.create(marginBp).divRD(10000L).add(1);
}
public Price calculateAvgPrice(Quantity[] quantities, Price[] prices){
cumQuantity.set(0);
contraQuantity.set(0);
// avg = sum(q * p * margin) / sum(q)for (int i = 0; i < quantities.length; i++) {
priceWithMargin.set(prices[i]).mulRD(margin);
contraQuantity.set(quantities[i]).mulRD(priceWithMargin);
}
return avgPrice.quotientRD(cumContraQuantity, cumQuantity);
}
publicstaticvoidmain(String[] args)throws ParseException {
Price p1 = Price.create("1.5");
Price p2 = Price.create(1.6);
Quantity q1 = Quantity.create("100");
Quantity q2 = Quantity.create(200);
// apply 0.05% margin to the prices
Sample sample = new Sample(5);
System.out.println(sample.calculateAvgPrice(new Quantity[]{q1, q2}, new Price[]{p1, p2}));
}
}``````

# Idea implementation

So, we need a mutable wrapper for an integer primitive, more precisely, a long, which will give us nearly 19 significant digits (enough for the whole and for the fractional part). In long, we mean N decimal places. For example, when N = 2, the number 2.56 is stored as 256 (binary 100000000). Negative numbers are stored in the standard, in the additional code:

-2.56
-256

(Hereinafter, “mathematical” numbers and calculations are indicated in italics , and their internal representation is bold )

It also seemed to me useful to enter NaN as a separate value, which is returned in case of arithmetic errors (instead of elimination or garbage). NaN is represented internally as Long.MIN_VALUE , “propagates” (propagated) through all operations and allows you to determine the sign inversion for all the remaining numbers.

Let us try to estimate the algorithms of arithmetic operations for the case when N = 2.

Addition and subtraction do not require any extra gestures, just use the values ​​as is:

1.20 + 2.30 = 3.50
120 + 230 = 350

Multiplication and division require additional normalization, that is, multiplication / division by 10 ^ N (by 100 in our example)

1.20 * 2.00 = 2.40
120 * 200/100 = 240

1.20 / 2.00 = 0.60
100 * 120/200 = 60

Additional division is not the fastest operation. But in this case, this division by a constant, because we fixed N = 2 and 10 ^ N = 100 in advance. The division by a constant, especially the “beautiful” (type 10), is intensively optimized in the CPU and much faster than the division by a random number. We do a lot of divisions by 10 each time we convert any number to a string (for example, in logs), and CPU manufacturers know about this ( for more information, see "Division by a constant" for optimization ).

To consolidate the understanding of what we are doing, I will give one more operation: a unary inversion of a number, that is, 1 / x. This is a special case of division, you just need to submit 1.00 in our format and do not forget to normalize:

1.00 / 2.00 = 0.50
100 * 100/200 = 50

Well, so far everything is pretty simple, let's try to get into the details.

# Rounding

Let's try to draw another number:

1.00 / 3.00 = 0.33
100 * 100/300 = 33

An honest mathematical result lies between 0.33 and 0.34, but we cannot accurately represent it. Which way to round? Usually rounded to 0, and this is the fastest way (supported by hardware). But, returning to the real financial problems, this is not always the case. Usually, when processing transactions with a client, rounding is “in favor of the client”. That is, the price is rounded up if the customer sells, and down if the customer buys. But other options may be required, for example, arithmetic rounding to the nearest number with subtypes (half-up, half-down, half-even) to minimize accounting inconsistencies. Or rounding to ± infinity for negative prices (for some financial instruments). Java BigDecimal already contains a list of standard rounding modes, and the described library supports them all. UNNECESSARY mode returns NaN,

In rounding up mode, our calculation should yield:

1.00 / 3.00 = 0.34
100 * 100/300 + 1 = 34

How do I know if I need to add one? You need the remainder of the division of 10000% 300 = 100. Which is as slow as the division itself. Fortunately, if you write in succession in the code "a / b; a% b", then JIT will figure out that 2 divisions are not necessary, just one assembler command div returning 2 numbers (quotient and remainder).

Other rounding options are a bit more complicated, but they can also be calculated based on the remainder and the divisor.

In the API, I deliberately made mentioning rounding wherever it occurs, either as a parameter or as a suffix R ound D own in methods where it defaults to zero.

# Overflow

We come to the most difficult part. Remember once again our multiplication:

1.20 * 2.00 = 2.40
120 * 200/100 = 240

Now imagine that we are in the 1980s and we have 16-bit processors. That is, only short is available to us with a maximum value of 65535. The first multiplication will overflow and will be equal to 240000 & 0xFFFF = 44392 (this is if without a sign, with a sign it will also be negative), which will break the result.

Will not work. We have 2 normal (values ​​that fit into our range) arguments, and the same normal expected result, but we fill up halfway. Exactly the same situation is possible with 64-bit long, just the numbers need more.

In the 1980s, we would need a multiplication, giving a 32-bit result. Today we need multiplication with a 128-bit result. The most annoying thing is that both multiplications are available in assemblers 8086 and x86-64, respectively, but we cannot use them from Java! JNI, even in the case of a hack with fast JavaCritical, gives the overhead in tens of nanoseconds, introduces problems with deployment and compatibility, freezes the GC for the duration of the call. In addition, we would somehow have to return a 128-bit result from the native method, and writing by reference to the array (to memory) is an additional delay.

In general, I had to write manual multiplication and division. Column. I needed 2 auxiliary operations:

1. A (64) * B (64) = T (128); T (128) / N (32) = Q (64), R (32) - as part of the fixed point multiplication A * B
2. N (32) * A (64) = T (96); T (96) / B (64) = Q (64), R (64) - as part of the fixed point of the division A / B
(data size in bits is shown in brackets, T is a temporary variable that should not overflow)

Both operations return a quotient and a remainder (one - as the result of the method, the second - in the field of the object). They too can overflow, but only at the last step, when it is inevitable. Here is an example (from the 1980s):

500.00 / 0.50 = 1000.00
100 * 50000/50 = 100000 - overflow!

Dividing the column a la Knut is not the easiest algorithm. Plus, it should all be relatively fast. Therefore, the code of both operations is hundreds of lines of fairly harsh bitmaking, it will take a long time for me to remember again what exactly is happening there. I pulled them into a separate class and commented in detail as I could.

The multiplication algorithm is not limited to calling operation 1, but the remaining code is not so complicated and simply adds support for negative numbers, rounding, and NaN.

Usually (with the exception of special cases), both operations contain 4 multiplications and 2 divisions. Operation 1 is significantly faster than 2, since in it these divisions are by a constant.

By the way, if anyone noticed, N (32) is our 10 ^ N for normalization. It is 32-bit, which means that N can be a maximum of 9. In the real applications I saw, 2, 4, or 8 decimal places were used. More than 9 I have not met, so that should be enough. If you make 10 ^ N 64-bit, the code becomes more complicated (and slows down) even more.

# Several different accuracy

Sometimes it is necessary to perform an operation on arguments with a different number of decimal places. At a minimum, enter transactions involving the usual long.

For example:

2.0000 (N = 4) + 3.00 (N = 2) = 5.0000 (N = 4)
20,000 + 300 * 100 = 50,000

3.00 (N = 2) + 2.0000 (N = 4) = 5.00 (N = 2)
300 + 20000/100 = 500

In this case, additional normalization of one of the arguments is required. Note that mathematically both operations are equivalent, but because of the other accuracy of the result, they are calculated differently. It is also worth noting that the second operation generally requires rounding.

The number of decimal places is NOT stored in the object. Instead, a separate subclass is assumed for each accuracy. Class names can be business oriented, for example Price (N = 8), Quantity (N = 2). And they can be generalized: Decimal1, Decimal2, Decimal3, ... The greater the accuracy, the smaller the range of stored values, the minimum range has Decimal9: ± 9223372036. It is assumed that one or two classes will be enough to cover the necessary functionality, in which case the abstract getScale method will most likely be virtualized and inline. Subclasses (instead of an additional field) allow you to strictly typify the accuracy of the arguments and the result, as well as to signal possible rounding at the compilation stage.

The library allows for operations in which a maximum of 2 (but not 3) of different accuracy are involved. That is, either the accuracy of the two arguments must match, or the accuracy of one of the arguments and the result. Again, support for 3 different precisions would slow the code down and complicate the API. As arguments, you can pass the usual long, for which the accuracy is assumed to be N = 0.

2.0000 / 3.0 = 0.6667 - ok (2 different accuracy)
2/3 = 0.6667 - ok (long arguments, decimal result)
2 / 3.0 = 0.6667 - impossible! (3 different accuracy)

Obviously, the calculations of increased bitness carried out by the library are slower than the hardware supported. However, the overhead projector is not so large (see benchmarks below).

Based on this, the library is usually used in places where the loss of absolute accuracy is critical. For example, calculating accurate financial statistics, accounting for current financial indicators (trading positions, PnL, executing orders). In the case of a network exchange of financial information between systems, it is also more convenient to use formats with a decimal point (instead of binary).

Complex mathematical algorithms (modeling, statistics, forecasting) are usually simpler to perform as standard in double, since their result is not absolutely accurate in any case.

# Code and benchmarks

Code

BenchmarkModeCntScoreErrorUnits
DecimalBenchmark.controlavgt20010.072± 0.074ns / op
DecimalBenchmark.multiplyNativeavgt20010.625± 0.142ns / op
DecimalBenchmark.multiplyMyDecimalavgt20035.840± 0.121ns / op
DecimalBenchmark.multiplyBigDecimalavgt200126.098± 0.408ns / op
DecimalBenchmark.quotientNativeavgt20070.728± 0.230ns / op
DecimalBenchmark.quotientMyDecimalavgt200138.581± 7.102ns / op
DecimalBenchmark.quotientBigDecimalavgt200179.650± 0.849ns / op

In general, multiplication is 4 times faster than BigDecimal, division - in 1.5. The division rate depends strongly on the arguments, hence the spread of values.