Optimizing the code, or overtaking Ognelis in speed

    I read the post about the new super-optimizations in Ognelis and thought about it for a while.
    It is not clear to me why this kind of work gets celebrated with fireworks and a Snow Maiden. Let's take a closer look at what has actually been done.

    * Function Inlining: Removing the overhead of function calls by simply replacing them with their resulting native code.
    Almost all compilers can do function inlining. It is a very simple and, one could say, free way to speed up a program. It has certain limitations:
    1) Excessive inlining leads to code bloat. This is hardly a problem nowadays, since memory is plentiful, but the compiler does impose an upper limit on the size of an inlined function.
    2) The compiler normally cannot inline functions from another translation unit, from system libraries, or from dynamic libraries.
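    As a minimal sketch of the idea (the function and names here are made up for illustration), this is what inlining does at the source level:

    ```c
    #include <stdio.h>

    /* A tiny helper: a prime candidate for inlining.
       The `static inline` hint encourages the compiler to replace
       each call site with the function body itself. */
    static inline int square(int x) {
        return x * x;
    }

    int main(void) {
        int sum = 0;
        for (int i = 1; i <= 4; i++) {
            /* After inlining, the generated code contains no call
               instruction: the body `i * i` is substituted here. */
            sum += square(i);
        }
        printf("%d\n", sum); /* 1 + 4 + 9 + 16 = 30 */
        return 0;
    }
    ```

    At -O2 and above most compilers will do this on their own even without the `inline` keyword, as long as the function body is visible in the same translation unit.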

    In addition, any decent (modern) compiler can work with so-called "intrinsics": functions it recognizes by name and for which it has ready-made code. These are usually mathematical functions such as sin(). So that very sine will not be compiled as a call to sin(), but will be replaced with an inline instruction sequence, i.e. inlined automatically.
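    A hedged illustration: in code like the following, an optimizing compiler typically never emits an actual call to the sin() library routine; it recognizes the standard name as an intrinsic and substitutes its own instruction sequence (and for a constant argument it may even compute the result at compile time):

    ```c
    #include <math.h>
    #include <stdio.h>

    int main(void) {
        double x = 1.0;
        /* sin() is a standard math function the compiler knows as
           an intrinsic: instead of a library call it can emit an
           inline instruction sequence, and with a constant argument
           it can fold the whole expression at compile time. */
        double y = sin(x);
        printf("%f\n", y);
        return 0;
    }
    ```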

    * Type Inference: Removing checks surrounding common operators (like "+") when the types contained within a variable are already known. This means that the engine will have already pre-determined, for example, that two strings need to be concatenated when it sees the "+" operator.

    Well, this is usually called skipping RTTI: type checks are thrown out where they are not needed... Perhaps for a JIT compiler this is super cool, but ordinary compilers have been able to do this for a long time. Naturally, all responsibility for the types, or rather for their possible mismatch, rests with the programmer :)
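    To make the idea concrete, here is a hypothetical C sketch of what a dynamic "+" looks like with and without the type check (the tagged-value layout is invented purely for illustration; real JS engines are far more involved):

    ```c
    #include <assert.h>
    #include <stddef.h>

    /* A toy tagged value, as a dynamic-language runtime might use. */
    typedef enum { TYPE_INT, TYPE_STR } Tag;

    typedef struct {
        Tag tag;
        int i;         /* valid when tag == TYPE_INT */
        const char *s; /* valid when tag == TYPE_STR */
    } Value;

    /* Generic "+": must inspect the tags on every single call. */
    int add_generic(Value a, Value b) {
        if (a.tag == TYPE_INT && b.tag == TYPE_INT)
            return a.i + b.i;
        /* ... string concatenation, coercions, errors ... */
        return 0;
    }

    /* Specialized "+": the engine has already proved both operands
       are ints, so the checks are gone. This is the kind of code a
       JIT emits after type inference. */
    int add_int(int a, int b) {
        return a + b;
    }

    int main(void) {
        Value x = { TYPE_INT, 2, NULL };
        Value y = { TYPE_INT, 3, NULL };
        assert(add_generic(x, y) == 5);
        assert(add_int(2, 3) == 5);
        return 0;
    }
    ```

    The point is simply that the specialized version is a plain machine add, while the generic one pays for branching on the tags every time.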

    * Looping: The overhead of looping has been greatly reduced. It is one of the most common sources of overhead in JavaScript applications (frequent repetition of a task), and the constant re-checking of bounds and re-dispatch of the inner code is made negligible.
    Nine times out of ten this means plain loop unrolling was applied. Unrolling duplicates the loop body N times; with an unroll factor of 4, this loop:

    for (int i = 0; i < maxI; i++) {
        a[i] = b[i] + c[i];
    }

    turns into this one (well, plus an additional check that maxI is a multiple of 4 :) ):

    for (int i = 0; i < maxI; i += 4) {
        a[i]   = b[i]   + c[i];
        a[i+1] = b[i+1] + c[i+1];
        a[i+2] = b[i+2] + c[i+2];
        a[i+3] = b[i+3] + c[i+3];
    }



    Now we take our program, Intel C Compiler (or Intel Fortran Compiler, for those who prefer it), and build the project with the following flags:
    -O3 (yes, aggressive optimization);
    -axT (enable vectorization, i.e. SSEx code paths plus generic code generation as a fallback);
    -ip (inter-procedural optimization, including partial inlining; for enthusiasts there is also -ipo, inter-module optimization, which is not portable!);
    -ansi-alias -fno-alias (improves the vectorizability :) of loops)

    On top of that we get: automatic unrolling of loops by a factor of at least 4, plus inlining and intrinsics for all mathematical functions.
    And Ognelis will never catch up ;)
    Yes, in the general case there is a difference between an ahead-of-time and a JIT compiler, but it is not so big as to pass off long-known techniques as discoveries (along the lines of: "Oh! We invented inlining!")

    PS. This is my first post, and the first pancake is always lumpy, so please go easy on me.
