Optimizing the code, or overtaking Ognelis in speed

    I read the post about the new super-optimizations in Ognelis and thought about it for a while.
    It is not clear to me why this kind of work gets celebrated with fireworks and a Snow Maiden. Let's take a closer look at what has actually been done.

    * Function Inlining: Removing the overhead of function calls by simply replacing them with their resulting native code.
    Almost all compilers can do function inlining. It is a very simple and, one could say, free way to speed up a program. It has certain limitations:
    1) Excessive inlining leads to code bloat. This is hardly a problem nowadays, since memory is plentiful, but the compiler does impose an upper limit on the size of an inlined function.
    2) The compiler normally cannot inline functions from another translation unit, from system libraries, or from dynamic libraries.
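    As a minimal sketch of the idea (the function and names here are made up for illustration), this is what inlining does at the source level:

    ```c
    #include <stdio.h>

    /* A tiny helper: a prime candidate for inlining.
       The `static inline` hint encourages the compiler to replace
       each call site with the function body itself. */
    static inline int square(int x) {
        return x * x;
    }

    int main(void) {
        int sum = 0;
        for (int i = 1; i <= 4; i++) {
            /* After inlining, the generated code contains no call
               instruction: the body `i * i` is substituted here. */
            sum += square(i);
        }
        printf("%d\n", sum); /* 1 + 4 + 9 + 16 = 30 */
        return 0;
    }
    ```

    At -O2 and above most compilers will do this on their own even without the `inline` keyword, as long as the function body is visible in the same translation unit.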

    In addition, any decent (modern) compiler can work with so-called "intrinsics": functions it recognizes by name and for which it has ready-made code. These are usually mathematical functions such as sin(). So that very sine will not be compiled as a call to sin(), but will be replaced with an inline instruction sequence, i.e. inlined automatically.
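    A hedged illustration: in code like the following, an optimizing compiler typically never emits an actual call to the sin() library routine; it recognizes the standard name as an intrinsic and substitutes its own instruction sequence (and for a constant argument it may even compute the result at compile time):

    ```c
    #include <math.h>
    #include <stdio.h>

    int main(void) {
        double x = 1.0;
        /* sin() is a standard math function the compiler knows as
           an intrinsic: instead of a library call it can emit an
           inline instruction sequence, and with a constant argument
           it can fold the whole expression at compile time. */
        double y = sin(x);
        printf("%f\n", y);
        return 0;
    }
    ```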

    * Type Inference: Removing checks surrounding common operators (like "+") when the types contained within a variable are already known. This means that the engine will have already pre-determined, for example, that two strings need to be concatenated when it sees the "+" operator.

    Well, this is usually called skipping RTTI: type checks are thrown out where they are not needed... Perhaps for a JIT compiler this is super cool, but ordinary compilers have been able to do this for a long time. Naturally, all responsibility for the types, or rather for their possible mismatch, rests with the programmer :)
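    To make the idea concrete, here is a hypothetical C sketch of what a dynamic "+" looks like with and without the type check (the tagged-value layout is invented purely for illustration; real JS engines are far more involved):

    ```c
    #include <assert.h>
    #include <stddef.h>

    /* A toy tagged value, as a dynamic-language runtime might use. */
    typedef enum { TYPE_INT, TYPE_STR } Tag;

    typedef struct {
        Tag tag;
        int i;         /* valid when tag == TYPE_INT */
        const char *s; /* valid when tag == TYPE_STR */
    } Value;

    /* Generic "+": must inspect the tags on every single call. */
    int add_generic(Value a, Value b) {
        if (a.tag == TYPE_INT && b.tag == TYPE_INT)
            return a.i + b.i;
        /* ... string concatenation, coercions, errors ... */
        return 0;
    }

    /* Specialized "+": the engine has already proved both operands
       are ints, so the checks are gone. This is the kind of code a
       JIT emits after type inference. */
    int add_int(int a, int b) {
        return a + b;
    }

    int main(void) {
        Value x = { TYPE_INT, 2, NULL };
        Value y = { TYPE_INT, 3, NULL };
        assert(add_generic(x, y) == 5);
        assert(add_int(2, 3) == 5);
        return 0;
    }
    ```

    The point is simply that the specialized version is a plain machine add, while the generic one pays for branching on the tags every time.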

    * Looping: The overhead of looping has been greatly reduced. It is one of the most common sources of overhead in JavaScript applications (frequent repetition of a task), and the constant re-checking of bounds and re-dispatch of the inner code is made negligible.
    Nine times out of ten this means plain loop unrolling was applied. Unrolling duplicates the loop body N times; with an unroll factor of 4, this loop:

    for (int i = 0; i < maxI; i++) {
        a[i] = b[i] + c[i];
    }

    turns into this one (well, plus an additional check that maxI is a multiple of 4 :) ):

    for (int i = 0; i < maxI; i += 4) {
        a[i]   = b[i]   + c[i];
        a[i+1] = b[i+1] + c[i+1];
        a[i+2] = b[i+2] + c[i+2];
        a[i+3] = b[i+3] + c[i+3];
    }



    Now we take our program, Intel C Compiler (or Intel Fortran Compiler, for those who prefer it), and build the project with the following flags:
    -O3 (yes, aggressive optimization);
    -axT (enable vectorization, i.e. SSEx code paths plus generic code generation as a fallback);
    -ip (inter-procedural optimization, including partial inlining; for enthusiasts there is also -ipo, inter-module optimization, which is not portable!);
    -ansi-alias -fno-alias (improves the vectorizability :) of loops)

    On top of that we get: automatic unrolling of loops by a factor of at least 4, plus inlining and intrinsics for all mathematical functions.
    And Ognelis will never catch up ;)
    Yes, in the general case there is a difference between an ahead-of-time and a JIT compiler, but it is not so big as to pass off long-known techniques as discoveries (along the lines of: "Oh! We invented inlining!")

    PS. This is my first post, and the first pancake is always lumpy, so please go easy on me.
