Arithmetic operations on floating point numbers

Every Habr reader is connected to IT in one way or another. Whether you are a programmer or work with hardware, networks, and so on, we all share a set of common concepts.

Once, in my second year at university, I ran into one of those things that, in my opinion, each of us should know, or at least have heard about in an article like this one. It is the standard for representing floating-point numbers (other sources use a slightly different term for them). The name I was given for it: the IEEE-754 standard.

I am sure every IT specialist has dealt with floating-point numbers at least once, but the first time around this seemed like complete nonsense to me. And no wonder: the course in which we studied the standard was called “Computer Architecture”, and the teacher was, and still is, a living legend. But that is off-topic.

So what exactly is this IEEE-754 standard? I will say right away that at the university we were given it in electronic form in Russian, but I could not find that version on the Internet, even after reaching the 30th page of Google results. There was a write-up in English whose author noted that it was written at 4:36 AM. I even found a site claiming that if Satan decided to take over the Earth slowly, he would create this standard. But it was created by people just like you and me.

The standard itself describes binary arithmetic operations on numbers in floating-point format. It also describes the exceptional situations that arise during such operations, how numbers are stored in this format, and much more. Naturally, after reading it, and with great difficulty at that, I did not understand anything! After all, I knew nothing about the floating-point format. Roughly speaking, it is a way of storing a number together with its fractional part; you only need to know the precision.

For this course we had to do an RGR (a calculation-and-graphics assignment), and for some reason I realized then that it was worth devoting more time to it than to anything else, which turned out to be right. This was probably the turning point in my studies. I sat up at night over this standard and over my specific task: “Division of two numbers in double-precision floating-point format, with replacing runs of consecutive ones by zeros and rounding to nearest even”. At the time it was incomprehensible. And the IEEE-754 standard always went hand in hand with this assignment. In fact, it contained everything, absolutely everything, that I needed.

Well, now more about the IEEE-754 standard itself. It consists of several chapters that I would like to describe in more detail.
Everything, as always, begins with the introduction, which describes the history of the standard's creation. Things turned out to be much more complicated than I had imagined. Programs were becoming more and more complex, and digital computers were aging and had to be replaced with new architectures. This is why, in the late 1970s, the IEEE (Institute of Electrical and Electronics Engineers, USA) set up a committee that considered many proposals. The result of its work was the IEEE 754 standard “Binary Floating-Point Arithmetic” (1985), which became international. Its foundations were developed by William Kahan, a professor of mathematics at the University of California, Berkeley.
In the following years, further standards were developed on the basis of IEEE 754-1985:

- IEEE 854-1987, covering decimal arithmetic as well as binary;

- IEC 60559-1989, “Binary floating-point arithmetic for microprocessor systems” (IEC is the International Electrotechnical Commission).

The IEEE 754 standard does not mandate but recommends the use of the set of formats it specifies, along with its methods of encoding data, rounding results, and much more. Choosing a format became far simpler for designers of general-purpose digital computers, and from then on companies began producing general-purpose computers whose floating-point arithmetic followed the standard's recommendations. The programmers' job also became somewhat simpler: there is no need to study the peculiarities of binary floating-point arithmetic on different computers; it is enough to master the standard.
Keep in mind, though, that standards are conservative but not eternal. Nevertheless, colleagues, all of us use this standard.

The standard supports several formats: single precision (32 bits), double precision (64 bits), and double-extended precision. Other formats are also provided to reduce rounding errors and the like. The standard describes exceptional cases: NaN, infinity, division by zero, and so on. Sound familiar? Rounding plays a very important role in floating-point arithmetic, and it is described in the standard as well.
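To make the formats and special values a little more tangible, here is a minimal Python sketch. It assumes a platform where Python's `float` is an IEEE-754 double (true for the usual CPython builds); the variable names are only illustrative.

```python
import math
import struct

# Storage size, in bits, of the two basic binary formats.
single_bits = struct.calcsize("<f") * 8   # 32
double_bits = struct.calcsize("<d") * 8   # 64

# NaN is the only value that compares unequal to itself.
nan_differs_from_itself = math.nan != math.nan

# Multiplying past the largest finite double overflows to infinity.
overflowed = 1e308 * 10

# Infinity minus infinity has no meaningful value, so the result is NaN.
inf_minus_inf = math.inf - math.inf

# Python itself raises an exception for float division by zero instead of
# returning the infinity that IEEE-754 default handling would produce.
try:
    1.0 / 0.0
    div_by_zero_raised = False
except ZeroDivisionError:
    div_by_zero_raised = True
```

Note that the last case is a language-level choice by Python, not something the standard requires.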

And finally, the main section: performing operations on numbers in floating-point format. It describes all arithmetic operations, from comparison to division, along with all the nuances of performing them. This section cannot be summed up in a nutshell. I can only say that it is a real jungle, and my task was to understand how it all works.
I will briefly describe my algorithm for floating-point division. Once we had received operands A and B, we had to check them for every possible exceptional case: division by zero, NaN, and infinity. The table below shows the types of numbers that the format supports:

image
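The table is not reproduced here, so the sketch below assumes the usual five categories of a double-precision value (zero, denormalized, normalized, infinity, NaN) and shows how the pre-division checks could look in Python. The function names `classify` and `special_case_quotient` are hypothetical, chosen for this illustration only.

```python
import math
import struct

def classify(x: float) -> str:
    """Name the IEEE-754 category of a double from its raw bit fields."""
    bits = struct.unpack("<Q", struct.pack("<d", x))[0]
    exponent = (bits >> 52) & 0x7FF          # 11-bit biased exponent
    fraction = bits & ((1 << 52) - 1)        # 52-bit fraction field
    if exponent == 0:
        return "zero" if fraction == 0 else "denormalized"
    if exponent == 0x7FF:
        return "infinity" if fraction == 0 else "NaN"
    return "normalized"

def special_case_quotient(a: float, b: float):
    """Return a / b when an operand is special, or None for ordinary operands."""
    ka, kb = classify(a), classify(b)
    if "NaN" in (ka, kb):
        return math.nan
    if (ka, kb) in (("zero", "zero"), ("infinity", "infinity")):
        return math.nan                       # invalid operation
    # The sign of the quotient is the product of the operand signs.
    sign = math.copysign(1.0, a) * math.copysign(1.0, b)
    if ka == "infinity" or kb == "zero":
        return math.copysign(math.inf, sign)  # inf / x, or x / 0
    if ka == "zero" or kb == "infinity":
        return math.copysign(0.0, sign)       # 0 / x, or x / inf
    return None                               # continue with ordinary division
```

Only when `special_case_quotient` returns `None` does the real work on exponents and mantissas need to start.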

If the operands really were valid numbers in IEEE-754 format, the second stage of the operation began: converting the exponents (the “orders”). It is no secret that floating-point numbers look something like this:

image

This is a single-precision representation of a number.
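Since the picture is missing, here is a small sketch of the same layout in code: a single-precision number has 1 sign bit, 8 exponent bits, and 23 fraction (mantissa) bits. The helper name `decompose_single` is an assumption made for this example.

```python
import struct

def decompose_single(x: float):
    """Split a value, stored as IEEE-754 single precision, into its three fields."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    sign = bits >> 31                 # 1 bit
    exponent = (bits >> 23) & 0xFF    # 8 bits, biased by 127
    fraction = bits & 0x7FFFFF        # 23 bits, without the hidden leading 1
    return sign, exponent, fraction

sign, exponent, fraction = decompose_single(1.0)
# For 1.0 the fields are sign 0, biased exponent 127, fraction 0.
```

The value of a normalized number is then (-1)^sign × 1.fraction × 2^(exponent − 127).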
The exponent (the “order”) of a number is, as I understand it, roughly its order of magnitude. Surely there is a rigorous definition, but it would only confuse things further. In my algorithm, numbers with different exponents could not be divided right away: first the exponents had to be brought to a common form by shifting. For that, the exponents had to be checked against their minimum and maximum values. And when an exponent is shifted, the mantissa shifts as well. If the exponents are equal, you need to check the mantissas: whether they have gone out of range, whether they are filled with zeros, and so on. After a series of such checks, you can move on to the most important part: finally dividing the mantissas. Here everything is simple, as in all binary arithmetic. I divided the dividend by the divisor, wrote the remainder into a register, and added. There are several ways to divide, too: with and without restoring the remainder. And that is not all! At the end, you have to round the result according to the required rule and determine the sign of the quotient.
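As a rough illustration of the mantissa stage, here is a textbook-style sketch in Python, not a reconstruction of the exact procedure from the assignment. It handles the exponents by subtracting them (rather than shifting them to a common form), divides the mantissas bit by bit with restoring division, rounds to nearest even, and takes the sign as the XOR of the operand signs. It only handles normalized operands and results, and all function names are hypothetical.

```python
import struct

BIAS = 1023  # exponent bias of the double-precision format

def unpack(x: float):
    """Return (sign, biased exponent, fraction) of a double."""
    bits = struct.unpack("<Q", struct.pack("<d", x))[0]
    return bits >> 63, (bits >> 52) & 0x7FF, bits & ((1 << 52) - 1)

def restoring_divide(n: int, d: int, frac_bits: int):
    """Bit-serial restoring division; requires n < 2 * d.

    Returns floor(n * 2**frac_bits / d) and a flag telling whether
    a nonzero remainder was left over (the "sticky" information).
    """
    q, r = 0, n
    for _ in range(frac_bits + 1):
        q <<= 1
        r -= d              # trial subtraction
        if r < 0:
            r += d          # restore the remainder, quotient bit stays 0
        else:
            q |= 1          # subtraction succeeded, quotient bit is 1
        r <<= 1
    return q, r != 0

def fp_divide(a: float, b: float) -> float:
    sa, ea, fa = unpack(a)
    sb, eb, fb = unpack(b)
    if not (0 < ea < 0x7FF and 0 < eb < 0x7FF):
        raise ValueError("sketch handles normalized operands only")
    sign = sa ^ sb                      # sign of the quotient
    ma = fa | (1 << 52)                 # restore the hidden leading 1
    mb = fb | (1 << 52)
    exp = ea - eb + BIAS                # subtract exponents, re-apply the bias
    if ma < mb:                         # keep the quotient in [1, 2)
        ma <<= 1
        exp -= 1
    q, sticky = restoring_divide(ma, mb, 53)   # 53 significand bits + guard bit
    guard, sig = q & 1, q >> 1
    if guard and (sticky or (sig & 1)): # round to nearest, ties to even
        sig += 1
        if sig == 1 << 53:              # rounding carried into a new bit
            sig >>= 1
            exp += 1
    if not 0 < exp < 0x7FF:
        raise ValueError("overflow or underflow is not handled in this sketch")
    bits = (sign << 63) | (exp << 52) | (sig & ((1 << 52) - 1))
    return struct.unpack("<d", struct.pack("<Q", bits))[0]
```

For ordinary operands the result should match the hardware quotient, for example `fp_divide(1.0, 3.0) == 1.0 / 3.0`. The step of replacing runs of ones with zeros from the assignment is deliberately left out, because the text does not describe it in enough detail.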

All this is just a description in words; it sounds scary, but in practice it looks much better. I frankly fell in love with this standard back then: it gave me not only deeper knowledge of digital computers and binary arithmetic, but also the pleasure of having managed it, the pleasure of realizing that I know something very important.
That is all from me. The topic is really very interesting and fascinating. If you are interested, I will gladly send you the IEEE-754 standard and answer your questions.

Thanks.
