gnomeby February 23, 2013 at 15:14

Debunking the x32 ABI Myths

Translation

Some of you have probably heard of a freebie called the x32 ABI.

The x32 ABI in brief

In short, it lets you take full advantage of the 64-bit architecture while keeping 32-bit pointers. An application may therefore consume less memory, although it cannot address more than 4 GB.

For example, suppose your code defines an array of integers and fills it with values. How much memory does that take? Very roughly, something like this:
32 bits: Pointer + Element count + N integers = N + 2 32-bit numbers
64 bits: Pointer + Element count + N integers = N + 2 64-bit numbers = 2N + 4 32-bit numbers
Here the engineers had an idea: what if we used 32-bit pointers on a 64-bit architecture? The x86-64 architecture has a CISC instruction set and allows this. In that case, the array above takes 2N + 3 32-bit words instead of 2N + 4. The saving is, of course, insignificant here, but in modern code structures often contain ten or more pointers, and short pointers can potentially save up to 50% of memory (in the ideal case).
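The arithmetic above fits in a tiny C sketch. It follows the same crude model as the text (one pointer, one element count and N integers, with counts and integers taken as 64-bit on a 64-bit target) and measures everything in 32-bit words; the function names and the model itself are illustrative assumptions, not anything defined by the ABI.

```c
#include <assert.h>
#include <stddef.h>

/* Crude model from the text: an array is one pointer, one element count
 * and n integers. Sizes are in 32-bit words (1 = 32 bits, 2 = 64 bits). */
size_t array_words(size_t n, size_t pointer_words, size_t int_words)
{
    assert(pointer_words > 0 && int_words > 0);
    /* pointer + element count + n integers */
    return pointer_words + int_words + n * int_words;
}

/* The three cases discussed above, under the same crude model. */
size_t words_x86(size_t n)    { return array_words(n, 1, 1); } /* N + 2  */
size_t words_x86_64(size_t n) { return array_words(n, 2, 2); } /* 2N + 4 */
size_t words_x32(size_t n)    { return array_words(n, 1, 2); } /* 2N + 3 */
```

For N = 10 this gives 12, 24 and 23 words: in this model x32 saves exactly one word, the shortened pointer.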

For those who need more accurate calculations:
* How large are arrays (and values) in PHP? (Hint: VERY BIG)
* How much memory do objects consume in PHP and is it worth using the 64-bit version?

But, as it turned out, there are no freebies.

Translation of the article “Debunking x32 myths”

My previous article on the x32 ABI drew many comments. Some of them are interesting, while others come from people who simply don’t understand what they are writing about. I got the impression that a kind of cargo cult has formed around this topic. People think: “They must be doing this for a reason, so I can use it too,” without the technical background needed to evaluate the actual benefit.

So, in the same spirit in which I went through ccache almost four years ago (wow, has my blog really been around that long?), I will try to debunk the myths and misconceptions about the x32 ABI.

x32 ABI code is faster

Not quite. So far we only have a few benchmark results, published by the people who created this ABI. Of course, you would expect those who took the time to set up such a system to find it interesting and faster, but frankly I have doubts about the results, for reasons that will become clear over the next few paragraphs.

It is also worth noting that, although the overall measurements are faster, the difference is not dramatic. Even Intel's presentation shows large differences only when compared with plain x86, which is already known to perform worse than x86-64.

Moreover, these results come from synthetic benchmarks rather than from real-world use of the system, and, as you surely know, such results can tell a pack of lies.

x32 ABI code is more compact

The claim: the new ABI generates less code, so more instructions fit in the processor cache and the files are smaller. This is simply untrue. The generated code is, by and large, the same as for x86-64, because the instruction set does not change; only the so-called data model changes, which means the size of long (and related types) and the size of pointers change (and so does the size of the available address space).

It is true in theory that smaller data structures fit better in the data cache (though not in the instruction cache, mind you; translator's note: internally, a CISC processor converts all short instructions into long ones anyway), but is that the right approach? In my experience, if your code thrashes the cache, it is better to focus on laying out your data so that it fits the cache well. You can use the dev-util/dwarves utilities by Arnaldo (acme): pahole, for example, shows how your data structures are laid out in memory.
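To illustrate the kind of padding pahole reports, here is a minimal C sketch with two hypothetical structures that hold the same fields in a different order. On x86-64 the first is 24 bytes and the second 16; the exact numbers depend on the ABI (under x32 they shrink to 12 and 8), which is why the code prints them rather than hard-coding them.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical node: a pointer sandwiched between two chars, so the
 * compiler inserts padding before the pointer and after the last char. */
struct padded {
    char tag;
    void *next;
    char flag;
};

/* Same fields, largest alignment first: the two chars share the tail. */
struct reordered {
    void *next;
    char tag;
    char flag;
};

static_assert(offsetof(struct reordered, next) == 0,
              "the first member always starts at offset 0");

void print_layouts(void)
{
    printf("padded:    %zu bytes, next at offset %zu\n",
           sizeof(struct padded), offsetof(struct padded, next));
    printf("reordered: %zu bytes, next at offset %zu\n",
           sizeof(struct reordered), offsetof(struct reordered, next));
}
```

Running pahole on an object built with debug information should show the holes in struct padded that the reordering removes.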

Also remember that, for compatibility, the system calls stay the same as on x86-64, so all the kernel code and kernel data structures you use are the same as on x86-64. This means that many structures do not change size in the new ABI (translator's note: application binary interface).

Finally, if you look at the presentation again, you can see on slide 24 that x32 ABI code may be longer than the original x86 code. It would be nice if they had also included the x86-64 code for comparison (since I am not fluent in VCISC; translator's note: meaning the group of 64-bit CISC instructions), but I think it is the same code.

For fun, let's compare the sizes of libc.so.6. Here is the output of the rbelf-size utility from my Ruby Elf suite:

        exec         data       rodata        relro          bss     overhead    allocated   filename
     1239436         7456       341974        13056        17784        94924      1714630   /lib/libc.so.6
     1259721         4560       316187         6896        12884        87782      1688030   x32/libc.so.6

The executable code is actually larger in the x32 version. The big change is, of course, in the data sections (data, rodata, relro and bss), since pointers are now shorter. Frankly, it even worries me: “How can a C library have so many pointers in its own structures?”; but that question is off topic. Even with shorter pointers, the difference is not that big: overall you save something like 30 KiB, which is unlikely to change the process's memory map in any meaningful way.
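The differences can be checked directly from the rbelf-size output above. The C sketch below simply copies those figures (in bytes) into two arrays; the enum and function names are made up for the example.

```c
#include <assert.h>

/* Columns of the rbelf-size output above, in bytes (allocated omitted:
 * it is the sum of the other six columns). */
enum { EXEC, DATA, RODATA, RELRO, BSS, OVERHEAD, NCOLS };

static const long libc_x86_64[NCOLS] = {1239436, 7456, 341974, 13056, 17784, 94924};
static const long libc_x32[NCOLS]    = {1259721, 4560, 316187,  6896, 12884, 87782};

/* Bytes saved by the x32 build over columns first..last inclusive;
 * a negative result means the x32 build is larger. */
long bytes_saved(int first, int last)
{
    assert(0 <= first && first <= last && last < NCOLS);
    long saved = 0;
    for (int col = first; col <= last; col++)
        saved += libc_x86_64[col] - libc_x32[col];
    return saved;
}
```

With these figures the data-related sections (data through bss) shrink by 39743 bytes, the executable code grows by 20285 bytes, and the total, which matches the difference in the allocated column, is 26600 bytes (about 26 KiB).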

Data reduction is useful

Well, yes, this is the main point. Of course the data structures are smaller with x32; that is what it was made for, after all. But the real question is: “Does it matter that much?” I don't think so. Even in the C library example above, where the difference is noticeable, it amounts to only about 20% of the occupied space. And this is the C library! A library you would expect to have much smaller interfaces than the code you write.

If you add up all the libraries on a system, perhaps you can save a couple of megabytes of data, but you also have to weigh that against all the porting problems I am about to discuss. Yes, it is true that C++ and most virtual-machine languages will benefit more, especially when copying objects, thanks to the shorter pointers, but so far that is quite a stretch. Especially since most of your data buffers have to be aligned to at least 8 bytes (64 bits) to use the newer instructions, and you already align them to 16 bytes (128 bits) to use some SIMD instruction sets.
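For reference, this is roughly what the 16-byte alignment mentioned above looks like in C11. It is a minimal sketch: the helper name, the use of float and the 16-byte constant (matching 128-bit SSE-style vectors) are illustrative choices, not taken from the original text.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define SIMD_ALIGN 16 /* 128-bit vectors */

/* Allocate room for n floats at a 16-byte-aligned address; release with
 * free(). C11 aligned_alloc expects the size to be a multiple of the
 * alignment, so the byte count is rounded up. */
float *alloc_simd_floats(size_t n)
{
    assert(n <= (SIZE_MAX - (SIMD_ALIGN - 1)) / sizeof(float)); /* no overflow */
    size_t bytes = n * sizeof(float);
    bytes = (bytes + (SIMD_ALIGN - 1)) & ~(size_t)(SIMD_ALIGN - 1);
    if (bytes == 0)
        bytes = SIMD_ALIGN; /* avoid a zero-size request */
    return aligned_alloc(SIMD_ALIGN, bytes);
}
```

Such buffers are padded to the vector width whatever the pointer size, so shorter pointers do not make them any smaller.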

And to those who think x32 will save disk space: remember that you cannot have a “pure” x32 system; what you get is a mix of three ABIs: x86, x86-64 and x32.

It is not applicable to applications using more than 4 GB of memory.

Yes, of course, that is probably true. But seriously, is pointer size really what worries you? If you want to make sure an application does not use more than a certain amount of memory, use system limits! They are certainly far less heavyweight than creating a whole new ABI.
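As an example of such a system limit, the POSIX getrlimit()/setrlimit() calls can cap a process's address space with RLIMIT_AS. The sketch below is one way to do it; the helper name is hypothetical, and it only touches the soft limit, since an unprivileged process cannot raise its hard limit back once lowered.

```c
#include <assert.h>
#include <sys/resource.h>

/* Set the soft address-space limit (RLIMIT_AS) to max_bytes, clamped to
 * the hard limit. Returns 0 on success, -1 on failure with errno set by
 * getrlimit()/setrlimit(). */
int limit_address_space(rlim_t max_bytes)
{
    struct rlimit lim;
    assert(max_bytes > 0);
    if (getrlimit(RLIMIT_AS, &lim) != 0)
        return -1;
    if (lim.rlim_max != RLIM_INFINITY && max_bytes > lim.rlim_max)
        max_bytes = lim.rlim_max; /* soft limit may not exceed hard limit */
    lim.rlim_cur = max_bytes;
    return setrlimit(RLIMIT_AS, &lim);
}
```

For example, limit_address_space((rlim_t)4 << 30) caps the process at 4 GiB of address space; past that point mmap() and malloc() fail instead of growing further. From a shell, ulimit -v does much the same (it takes a value in KiB).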

Interestingly, for applications that use less than 4 GB of memory but run in a full 64-bit address space, there are two different, opposite approaches:

* ASLR (Address Space Layout Randomization), which can load an application's various objects anywhere across a wide range of addresses (translator's note: that is, scatter them around the address space);
* and Prelink, which makes sure that every object in the system is always loaded at the same address, which is exactly the opposite of what ASLR does.
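A quick way to watch ASLR at work is to print a few addresses and run the program twice. This is a minimal sketch; the struct and function names are made up for the example. On Linux with randomization enabled (/proc/sys/kernel/randomize_va_space set to a non-zero value) the printed values typically change between runs, and on x86-64 Linux the stack usually sits far above the 4 GB mark even though the program itself uses only a few kilobytes.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

struct addr_sample {
    uintptr_t stack; /* address of a local variable */
    uintptr_t heap;  /* address returned by malloc(), 0 if it failed */
};

/* Record where the stack and the heap currently live. */
void take_addr_sample(struct addr_sample *out)
{
    int local = 0;
    void *block = malloc(16);
    assert(out != NULL);
    out->stack = (uintptr_t)(void *)&local;
    out->heap = (uintptr_t)block;
    free(block);
}

void print_addresses(void)
{
    struct addr_sample s;
    take_addr_sample(&s);
    printf("stack: %#" PRIxPTR "\nheap:  %#" PRIxPTR "\n", s.stack, s.heap);
}
```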

Applications use long, but they do not need a 64-bit address space

(Translator's note: the author means the 64-bit long.)
And, according to some people, the solution is, of course, to create a new ABI for this.

I won't deny that many people still use long in their applications without thinking about why. Perhaps they only need small ranges of numbers, yet they use large types such as long because they learned programming on systems where long was a synonym for int, or even on systems where long is 32-bit and int is 16-bit (hello, MS-DOS!).

The solution to this problem is simple: use the standard types provided by stdint.h, such as uint32_t and int16_t. That way you always get the data size you expect. This also works on more systems than you might expect, and works with FFI and other techniques.
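A minimal sketch of the difference: the hypothetical record below uses only exact-width types from stdint.h, so its layout is the same under x86, x86-64 and x32, while the width of long depends on the data model.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Hypothetical record: every field has an exact width, so the layout does
 * not change between x86 (ILP32), x86-64 (LP64) and x32. */
struct sample {
    uint32_t id;
    int16_t delta;
    uint16_t flags;
};

static_assert(sizeof(uint32_t) * CHAR_BIT == 32, "uint32_t is exactly 32 bits");
static_assert(sizeof(int16_t) * CHAR_BIT == 16, "int16_t is exactly 16 bits");

/* By contrast, the width of long depends on the data model:
 * 32 bits on x86 and x32, 64 bits on x86-64. */
int long_bits(void)
{
    return (int)(sizeof(long) * CHAR_BIT);
}
```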

There is not much inline assembly anyway

Several people told me this after my previous post, in which I complained that the new ABI would cost us most of the existing inline assembly. The statement may be true, but in reality there is more of it than you think. Even if you set aside all multimedia software, there is still cryptographic code that makes heavy use of SIMD through inline assembly (rather than through compiler optimizations).

There is also a problem with inline assembly in things like Ruby: Ruby 1.9 does not compile on x32. With Ruby 1.8 the situation is more interesting, because it compiles but segfaults at startup. Does that remind you of anything?

In addition, the C library itself contains a lot of assembly. And the only reason you don’t have to port much of it is simple: HJ Lu, who maintains most of it, is one of the authors of the new ABI, so that code is already ported.

x32 ABI will be compatible with x86, if not now, then in the future

Well, I didn’t mention this before, but it is one of the misconceptions I noticed before being stoned. Fortunately, the presentation helps here: slide 22 makes it clear that the new ABI is not compatible. Among other things, you may notice that the ABI at least fixes some real mistakes of x86, including the use of 32-bit types for off_t and the like. Again, I touched on this topic briefly two years ago.
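On x86 with glibc, the traditional workaround for the 32-bit off_t is to request large-file support before including any system header. A minimal sketch, assuming glibc's _FILE_OFFSET_BITS convention (on x86-64 and x32 off_t is already 64-bit, so the define changes nothing there):

```c
/* Must appear before any system header. On 32-bit glibc targets such as
 * x86 it switches off_t to 64 bits; x86-64 and x32 already use 64 bits. */
#define _FILE_OFFSET_BITS 64

#include <assert.h>
#include <limits.h>
#include <sys/types.h>

static_assert(sizeof(off_t) * CHAR_BIT == 64,
              "off_t must be 64-bit to handle files larger than 2 GiB");

int off_t_bits(void)
{
    return (int)(sizeof(off_t) * CHAR_BIT);
}
```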

This is the future of 64-bit processors

No. Let's turn to the slides again, in particular slide 10: this is clearly aimed at proprietary systems rather than at replacing x86-64 in general! So, how do you feel now?

Porting will be trivial: you just need to change a few lines of inline assembly and the size of pointers

This is not the case. Porting involves a number of other issues, and inline assembly is just the tip of the iceberg. Breaking the assumption that pointers on x86-64 are 64-bit is a big task in itself, though not as big as one might think at first glance (the same goes for Windows) when compared with fixing FFI-style C bindings. Remember when I said this was not an easy answer?
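One concrete form of that assumption is code that checks the architecture macro and concludes that pointers are 8 bytes. A minimal sketch of telling the cases apart, assuming the GCC/Clang predefined macros (other compilers may differ; the helper names are hypothetical): both compilers define __x86_64__ for x32 builds too, and x32 additionally defines __ILP32__.

```c
#include <assert.h>
#include <limits.h>

/* __x86_64__ alone does NOT imply 8-byte pointers: x32 builds define it
 * as well, plus __ILP32__. */
#if defined(__x86_64__) && defined(__ILP32__)
#  define BUILD_IS_X32 1
static_assert(sizeof(void *) == 4, "x32 uses 32-bit pointers");
#else
#  define BUILD_IS_X32 0
#endif

int is_x32(void) { return BUILD_IS_X32; }

/* Safer than guessing from the architecture: ask the compiler directly. */
int pointer_bits(void) { return (int)(sizeof(void *) * CHAR_BIT); }
```

Code that needs an integer wide enough for a pointer should use uintptr_t from stdint.h rather than a type picked per architecture.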

The processor executes 32-bit instructions faster than 64-bit ones

Interestingly, the only processor for which Intel's presentation claims better performance with 32-bit instructions is Atom. I quote: “The latency of 64-bit IMUL operations on Atom is twice that of 32-bit ones.”

So what exactly is IMUL? It is signed integer multiplication. Do you multiply pointers? That makes no sense. Besides, pointers are not signed. And you are telling me you are worried about a platform (Atom) that suffers high latency when people use 64-bit data where 32-bit would suffice? And your solution is to create a new ABI in which 64-bit types are harder to use, instead of simply fixing whatever in the program causes these problems?
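Fixing the program often just means keeping arithmetic in 32-bit types when the values fit. A minimal sketch, assuming a hash loop is the hot spot (the function is hypothetical; it uses the 32-bit FNV-1a constants but mixes whole 32-bit words rather than bytes). Whether a 32-bit multiply is actually emitted is up to the compiler, though GCC and Clang on x86-64 typically use the 32-bit form for uint32_t operands.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* FNV-style hash over 32-bit words. All arithmetic stays in uint32_t,
 * which wraps modulo 2^32, so no 64-bit multiply is required. */
uint32_t hash_words32(const uint32_t *words, size_t n)
{
    assert(words != NULL || n == 0);
    uint32_t h = 2166136261u;       /* FNV offset basis */
    for (size_t i = 0; i < n; i++) {
        h ^= words[i];
        h *= 16777619u;             /* FNV prime */
    }
    return h;
}
```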

I should probably stop here, as this last comment about Atom and IMUL will please many people who only superficially understand the new ABI.

UPD: I just tried to build PHP on my virtual machine running the Gentoo x32 ABI RC. Like Ruby, it does not compile.
