A few details about templates, or a compile-time fork bomb

    Recently I became interested in the instantiation of C++ templates. The term used on the Internet is code bloat: for C++ this means an uncontrolled growth of the code generated by the compiler. The code grows because, in overload resolution, instantiating a new function template has higher priority than converting the arguments to an already-instantiated type. That is, for template<class T> T foo(T a); the calls with int and char arguments produce two different functions. A single function can be obtained either by rejecting templates or by using an explicit type conversion.
    But let's turn the problem inside out and try to get the largest possible executable file from the minimum number of lines of code.
    The result was not very impressive: only 53 MB from 60 lines of code, and even then only with one of the three compilers tested, at the cost of several hours of compilation. The maximum size/lines ratio was 2.3 MB per line, for a 14 MB binary.
    How and why it happened is under the cut.

    Resources

    One laptop with 4 GB of memory,
    an Intel® Core™ i3-2330M CPU @ 2.20GHz,
    Linux kernel 3.7.3-101.fc17.x86_64,
    and a disabled swap partition.
    Swap had to be disabled for the same reason a fork bomb appears in the title: given a large enough task, the compiler ate all the memory and began swapping heavily to disk, which hung the machine hard and for good.

    Compiler Versions:
    • g++ (GCC) 4.7.2 20120921 (Red Hat 4.7.2-2)
    • Intel® C++ Intel® 64 Compiler XE for applications running on Intel® 64, Version 13.1.1.163 Build 20130313
    • clang version 3.3 (trunk 179304)


    Long arrays

    The easiest way is to build a compile-time array (a linear recursion) and string nop functions along it. Like this:
    template<int N> inline void nop() {nop<N-1>(); asm("nop");}
    template<> inline void nop<0>() {asm("nop");}
    int main(int argc, char **argv) {
            nop<LVL>();
            return 0;
    }
    

    As a result, the main() function should be filled with empty, useless nop operations.

    The length of the nop sequence is theoretically limited by the maximum recursion depth. The g++ man page states that the required maximum depth is 17, raised to 1024 in C++11.
    Quote from the man page
    -ftemplate-depth=n
    Set the maximum instantiation depth for template classes to n. A limit on the template instantiation depth is needed
    to detect endless recursions during template class instantiation. ANSI/ISO C++ conforming programs must not rely on a
    maximum depth greater than 17 (changed to 1024 in C++11). The default value is 900, as the compiler can run out of
    stack space before hitting 1024 in some situations.
    I did not find specific numbers in the standards, only the clause saying that the maximum recursion depth is implementation-defined:
    4.7.1.14 for C++03
    4.7.1.15 for C++11
    This was confirmed experimentally: for a large enough LVL the compilers crashed. Surprisingly, clang++ crashed at 2^13, unlike g++ and icpc, which reached 2^17.

    The assembly was carried out by the commands:
     clang++  -DLVL=$(( 2**$n)) -o list$n ./list.cc -ftemplate-depth=3000000 -O$x
         g++  -DLVL=$(( 2**$n)) -o list$n ./list.cc -ftemplate-depth=3000000 -O$x
        icpc  -DLVL=$(( 2**$n)) -o list$n ./list.cc -O$x
    
    inside the following build script:
    #!/bin/bash
    MIN=0
    MAX=19
    for i in 2;
    do
            mkdir -p attempt$i
            cd attempt$i
            for CXX in clang++ g++ icpc
            do
                    mkdir -p $CXX
                    cd $CXX
                    for O in O0 O1 O2 O3 Os 
                    do
                            mkdir -p $O
                            cd $O
                            (while :;do free -m |grep Mem|awk '{print $3}' >> ./memory;sleep 1;done)&
                            TIME=$!
                            for i in $(seq $MIN $MAX)
                            do
     #                             CMD="$CXX -$O ../../../fbomb.cc -DLVL=$i -o fbomb$i" #-save-temps=obj "
                                    CMD="$CXX -$O ../../../list.cc -DLVL=$(( 2 ** $i)) -o list$i  -ftemplate-depth=3000000" #-save-temps=obj "
                                    echo $CMD
                                    $CMD    
                                    sleep 5
                                    sleep 5
                            done
                            kill -9 $TIME
                            cd ..
                    done
                    cd ..
            done
            cd ..
    done
    

    The binaries were built sequentially, for each compiler and each optimization level, and it took a long time. The build results are shown in the graphs. There are four columns of images:
    1. Used memory versus time. These graphs are for reference only, since the X server and a browser were running during compilation, but the main trends are visible.
    2. The size of the binary file. The maximum turned out to be 14 MB.
    3. Symbols. The inline keyword is only a recommendation to the compiler, so for large N the nops are grouped into ordinary functions. They are counted as follows:
      nm --demangle ./list$i|grep nop|wc -l
    4. The number of honest nops, counted from the disassembly:
      objdump -d ./list$i|grep 'nop$'|wc -l

    [Images: comparison of different optimization levels for the same compiler, and of different compilers at different optimization levels — memory consumption at compile time, executable file size, number of symbols in the executable, and number of nop operations.]
    The maximum file size was 14 MB, obtained with icpc at 2^17; g++ gave 12 MB. Both at -O0. At -O0 no inline substitution is performed, so the number of nop symbols matches the number of nop operations.

    Tall trees

    For a linear data structure, the size of the generated file is bounded by the maximum recursion depth (by default 256 for clang++, 900 for g++). To get around this we can try to build a tree: with a tree we never exceed the maximum recursion depth. The theoretical maximum depth of a compile-time tree here is 8*sizeof(long) - 1 == 63, and 2^64 bytes would overflow any disk. The practical limit is much lower.
    The source code takes 19 lines and looks like this:
    #ifndef LVL
    #       define LVL 3
    #endif
    long x = 0;
    template<long N = LVL, long I = 0>
    struct foo{
            inline static double bar(double m)
                    {x++;return foo<N-1,2*I>::bar(m) + foo<N-1,2*I+1>::bar(m) + I;};
    };
    template<long I>
    struct foo<0,I>{
            inline static double bar(double m) {x++; return m;} 
    };
    #include <iostream>
    int main(int argc, char **argv){
            double ret = foo<>::bar(argc);
            std::cout << x << " " << ret << std::endl;
            return int(ret);
    }
    

    The result is a tree, each node of which has its own type. Each type is characterized by a pair of numbers:
    • N - level number;
    • I is the node number at the level.

    Nodes within one level are numbered consecutively starting from 0.

    I did not bother with nop here and used addition instead. The global long x was used to check that the build is correct: it ends up counting all the nodes, 2^(LVL+1) - 1 of them, which is printed on exit.

    The assembly was carried out by the commands:
     clang++  -DLVL=$n -o fbomb$n ./fbomb.cc  -O$x
         g++  -DLVL=$n -o fbomb$n ./fbomb.cc  -O$x
        icpc  -DLVL=$n -o fbomb$n ./fbomb.cc -O$x
    
    in the same script as above.
    The maximum LVL was 18 for clang++; for g++ and icpc it was 16, regardless of whether --std=c++11 was specified. Beyond that, the compilers ran out of memory.

    [Images for the tree: memory consumption at compile time, executable file size, and number of symbols in the executable, per optimization level and across compilers.]

    Maximum file size:
    • 43 MB for icpc -O0 -DLVL=17;
    • 42 MB for clang++ -O0 -DLVL=17;
    • 22 MB for g++ -O0 -DLVL=16.


    Explicit instantiation

    43 MB is not that small, but can the file be made even larger with the given amount of RAM? It turned out to be possible, though with only one of the three compilers: icpc. For this we need extern templates and explicit instantiation.
    We modify the source slightly so that all template parameters are spelled out explicitly, and split it into three files: the template definition, the main function, and the explicit instantiation of subtrees:
    fbomb.hh
    extern long x;
    template<long N, long I>
    struct foo{
            inline static double bar(double m)
                    {x++;return foo<N-1,2*I>::bar(m) + foo<N-1,2*I+1>::bar(m) + I;};
    };
    template<long I>
    struct foo<0,I>{
            inline static double bar(double m) {x++; return m;} 
    };
    

    main.cc
    
    #include "fbomb.hh"
    //for i in $(seq 0 31);do echo "extern template struct foo<L,$i>;";done
    #define L   (LVL-5)
    extern template struct foo<L,0>;
    extern template struct foo<L,1>;
    extern template struct foo<L,2>;
    extern template struct foo<L,3>;
    ...
    extern template struct foo<L,30>;
    extern template struct foo<L,31>;
    #include <iostream>
    long x = 0;
    int main(int argc, char **argv){
            double ret = foo<LVL,0>::bar(argc);
            std::cout << x << " " << ret << std::endl;
            return int(ret);
    }
    

    part.cc
    #include "fbomb.hh"
    template class foo<_L,_I>;
    

    It turned out that g++ -c main.cc -DLVL=21 crashes from lack of memory just as with full instantiation, regardless of the standard version. The same goes for clang++. icpc compiles main.cc in under a second; however, compiling the subtrees took more than 4 hours:
    for i in $(seq 0 31);do
        echo -n "$i:";date;
        icpc -O2 -c ./part.cc -o part21_16_$i.o -DLVL=21 -D_L=16 -D_I=$i;sleep 10;
    done
    
    Memory consumption during subtree compilation

    Linking took less than a minute. The result is a 53 MB file, built with -O2. -O0 would give a larger size, but I did not rebuild it several more times given the compile time and the meaninglessness of the result.

    The largest size/lines ratio, 2.3 MB per line, was obtained for the array from the first part (icpc -O0 list.cc).
    The metric is silly, of course, but amusing. 2.3 is the maximum I achieved; I will be glad to hear if someone gets a larger ratio.

    Good luck to all of us.

    Upd: I did not run strip, but I should have. I expected it to save a couple of kilobytes, but it turned out to be a sizable percentage of the size. After strip, the maximum size dropped to 37 MB (from 53 MB) and to 8.6 MB (from 14 MB). The ratio accordingly drops to 1.43 MB per line.
