A few details about templates, or a compile-time fork bomb
I recently became interested in template instantiation. The term "code bloat" is widely used on the Internet; for C++ it usually means an uncontrolled growth of the code generated by the compiler. The code grows because instantiating a new function takes priority over converting the arguments to a more convenient type. That is, template<typename T> T foo(T a); used with int and with char gives two different functions. A single function can be obtained either by giving up templates or by using an explicit type conversion.
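To make the duplication concrete, here is a tiny sketch of my own (it is not from the original post): built without optimization, the object file ends up containing two independent instantiations of foo, one for int and one for char.
// A minimal sketch of per-type instantiation (my own example, not the author's code).
// foo<int> and foo<char> become two separate functions in the binary,
// which is exactly the duplication referred to as code bloat.
template<typename T> T foo(T a) { return a; }
int main() {
foo(42);  // instantiates int foo<int>(int)
foo('a'); // instantiates char foo<char>(char)
return 0;
}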
But let us turn the problem inside out and try to get an executable of the maximum possible size from the minimum number of lines of code.
The result is not particularly impressive: I only got 53 MB out of 60 lines of code, and even that with only one of the three compilers tested and at the cost of several hours of compilation. The best size-to-lines ratio was 2.3 MB per line, for a 14 MB binary.
How and why it happened - under the cut.
Resources
One laptop with 4 GB of memory and an Intel® Core(TM) i3-2330M CPU @ 2.20 GHz,
Linux 3.7.3-101.fc17.x86_64,
and a disabled swap partition.
Swap had to be disabled for the same reason a fork bomb appears in the post title: with a large enough task the compiler ate all the memory and started swapping heavily to disk, which froze the machine completely and permanently.
Compiler Versions:
- g++ (GCC) 4.7.2 20120921 (Red Hat 4.7.2-2)
- Intel® C++ Intel® 64 Compiler XE for applications running on Intel® 64, Version 13.1.1.163 Build 20130313
- clang version 3.3 (trunk 179304)
Long arrays
The easiest way is to build a compile-time array and string functions along it, like this:
template<int N> inline void nop(){nop<N-1>();asm("nop"); }
template<> inline void nop<0>() {asm("nop");}
int main(int argc, char ** argv) {
nop<LVL>();
return 0;
}
As a result, the main() function should end up filled with useless nop operations.
The length of the nop sequence is theoretically limited only by the maximum recursion depth (nop<LVL>() ultimately emits LVL + 1 nop instructions). The g++ man page states that the guaranteed maximum depths are 17, and 1024 for C++11.
I did not find specific numbers in the standards; the only thing I found is that the maximum recursion depth is implementation-defined.
Quote from the man page:
-ftemplate-depth=n
Set the maximum instantiation depth for template classes to n. A limit on the template instantiation depth is needed
to detect endless recursions during template class instantiation. ANSI/ISO C++ conforming programs must not rely on a
maximum depth greater than 17 (changed to 1024 in C++11). The default value is 900, as the compiler can run out of
stack space before hitting 1024 in some situations.
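To see this limit in action, here is a tiny probe of my own (it is not from the post): with the default -ftemplate-depth it is rejected, and raising the limit, as the build commands below do, lets it compile.
// My own probe of the instantiation depth limit (hypothetical example).
// With g++'s default -ftemplate-depth=900 compilation stops with
// "template instantiation depth exceeds maximum"; adding e.g. -ftemplate-depth=3000 makes it build.
template<int N> struct deep { static const int value = deep<N - 1>::value + 1; };
template<> struct deep<0> { static const int value = 0; };
int main() { return deep<2000>::value == 2000 ? 0 : 1; }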
The relevant clauses are 14.7.1/14 in C++03 and 14.7.1/15 in C++11.
This was confirmed experimentally: for a large enough LVL the compilers crashed. Surprisingly, clang++ gave up at 2^13, while g++ and icpc reached 2^17.
The build was performed with the following commands:
clang++ -DLVL=$(( 2**$n)) -o list$n ./list.cc -ftemplate-depth=3000000 -O$x
g++ -DLVL=$(( 2**$n)) -o list$n ./list.cc -ftemplate-depth=3000000 -O$x
icpc -DLVL=$(( 2**$n)) -o list$n ./list.cc -O$x
within the build script below.
#!/bin/bash
MIN=0
MAX=19
for i in 2;
do
mkdir -p attempt$i
cd attempt$i
for CXX in clang++ g++ icpc
do
mkdir -p $CXX
cd $CXX
for O in O0 O1 O2 O3 Os
do
mkdir -p $O
cd $O
(while :;do free -m |grep Mem|awk '{print $3}' >> ./memory;sleep 1;done)&
TIME=$!
for i in $(seq $MIN $MAX)
do
# CMD="$CXX -$O ../../../fbomb.cc -DLVL=$i -o fbomb$i" #-save-temps=obj "
CMD="$CXX -$O ../../../list.cc -DLVL=$(( 2 ** $i)) -o list$i -ftemplate-depth=3000000" #-save-temps=obj "
echo $CMD
$CMD
sleep 5
sleep 5
done
kill -9 $TIME
cd ..
done
cd ..
done
cd ..
done
Binaries were built one after another, and it took a long time: every compiler, at every optimization level. The build results are shown in the graphs, in four columns of images:
- Memory usage over time. These graphs are for reference only, since the X server and a browser were running during compilation, but the main trends are visible.
- The size of the binary. The maximum turned out to be 14 MB.
- Symbols. The inline keyword is only a recommendation to the compiler, so for large N the nops get grouped into ordinary functions. They are counted as follows:
nm --demangle ./list$i|grep nop|wc -l
- The number of actual nop instructions, counted from the disassembly:
objdump -d ./list$i|grep 'nop$'|wc -l
It looks like the images only scale correctly in Chrome; for other browsers a screenshot and an archive with all the images were provided.
The maximum file size, 14 MB, was obtained with icpc at 2^17; for g++ it was 12 MB, both at O0. At optimization level O0 no inlining is performed, so the number of nop symbols matches the number of nop operations.
Comparison of different optimization levels for the same compiler
[Image grid: memory consumption at compile time | executable file size | number of symbols in the executable | number of nop operations]
Comparison of different compilers at different optimization levels
[Image grid: memory consumption at compile time | executable file size | number of symbols in the executable | number of nop operations]
Tall trees
For a linear structure, the size of the generated file is limited, at the very least, by the default maximum recursion depth (256 for clang++, 900 for g++). To get around this, we can try building a tree. The theoretical maximum depth of a compile-time tree is 8 * sizeof(long) - 1 == 63, and 2^64 bytes would overflow any disk; the practical limit is much lower. With a tree we stay within the maximum recursion depth.
The source code takes 19 lines and looks like this:
#ifndef LVL
# define LVL 3
#endif
long x = 0;
template<long N = LVL, long I = 0>
struct foo{
inline static double bar(double m)
{x++;return foo<N-1, 2*I>::bar(m) + foo<N-1, 2*I+1>::bar(m) + I;};
};
template<long I>
struct foo<0,I>{
inline static double bar(double m) {x++; return m;}
};
#include <iostream>
int main(int argc, char **argv){
double ret = foo<>::bar(argc);
std::cout << x << " " << ret << std::endl;
return int(ret);
}
The result is a tree in which every node has its own type. Each type is characterized by a pair of numbers:
- N, the level number;
- I, the node number within that level.
Within a single level the nodes are numbered from 0 to 2^(LVL-N) - 1.
I did not bother with nop here and used addition instead. The global long x was used to check that the build is correct. As a result, 2^(LVL+1) is returned.
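To see why the binary grows exponentially with LVL, here is a small sketch of my own (it assumes the children of foo<N,I> are foo<N-1,2*I> and foo<N-1,2*I+1>, as in the listing above): every (N, I) pair in the call tree is distinct, so the compiler has to emit a separate bar() for each node.
// My own sketch, not from the original post: count the nodes of the
// instantiation tree at compile time. Every node foo<N,I> is a distinct type,
// so nodes<LVL>::value is also the number of distinct bar() instantiations generated.
template<long N> struct nodes { static const long value = 2 * nodes<N - 1>::value + 1; };
template<> struct nodes<0> { static const long value = 1; };
// nodes<3>::value == 15 == 2^(3+1) - 1; an LVL = 3 build of the tree program
// above should print the same number in its x counter.
int main() { return nodes<3>::value == 15 ? 0 : 1; }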
The build was performed with the commands:
clang++ -DLVL=$n -o fbomb$n ./fbomb.cc -O$x
g++ -DLVL=$n -o fbomb$n ./fbomb.cc -O$x
icpc -DLVL=$n -o fbomb$n ./fbomb.cc -O$x
within the same script as above. The maximum LVL was 18 for clang++; for g++ and icpc it was 16, regardless of whether --std=c++11 was specified. Beyond that the compilers ran out of memory.
It looks like the images only scale correctly in Chrome; for other browsers a screenshot and an archive with all the images were provided.
Pictures for the tree
[Image grids: memory consumption at compile time | executable file size | number of symbols in the executable]
Maximum file size:
- 43 MB for icpc -O0 -DLVL=17;
- 42 MB for clang++ -O0 -DLVL=17;
- 22 MB for g++ -O0 -DLVL=16.
Explicit instantiation
43 MB is not bad, but can the file be made even larger with the given amount of RAM? It turned out to be possible, though only with one of the three compilers: icpc. To do this we need extern templates and explicit instantiation. We modify the source code slightly so that all template parameters are specified explicitly, and split it into three files: the template definition, the main function, and the explicit instantiation of subtrees:
fbomb.hh
extern long x;
template<long N, long I>
struct foo{
inline static double bar(double m)
{x++;return foo<N-1, 2*I>::bar(m) + foo<N-1, 2*I+1>::bar(m) + I;};
};
template<long I>
struct foo<0,I>{
inline static double bar(double m) {x++; return m;}
};
main.cc
#include "fbomb.hh"
//for i in $(seq 0 13);do echo "extern template struct foo;";done
#define L (LVL-5)
extern template struct foo;
extern template struct foo;
extern template struct foo;
extern template struct foo;
...
extern template struct foo;
extern template struct foo;
#include
long x = 0;
int main(int argc, char **argv){
double ret = foo::bar(argc);
std::cout << x << " " << ret << std::endl;
return int(ret);
}
part.cc
#include "fbomb.hh"
template class foo<_L,_I>;
It turned out that g++ -c main.cc -DLVL=21 crashes due to lack of memory, just as with full instantiation, regardless of the standard version; the same goes for clang++. icpc compiles main.cc in under a second. Compiling the subtrees, however, took more than 4 hours:
for i in $(seq 0 31);do
echo -n "$i:";date;
icpc -O2 -c ./part.cc -o part21_16_$i.o -DLVL=21 -D_L=16 -D_I=$i;sleep 10;
done
Memory consumption during subtree compilation
Linking took less than a minute. The result is a 53 MB file. It was built with -O2; -O0 would have given a larger size, but I did not rebuild everything several times over because of how long it takes.
The largest size-to-line-count ratio, 2.3 MB per line, was obtained for the array from the first part (icpc -O0 list.cc: 6 lines producing a 14 MB binary).
It is a silly metric, certainly, but a fun one. 2.3 is the best I managed; I will be glad to hear if someone gets a bigger ratio.
Good luck to all of us.
Upd: I did not run strip, but I should have. I thought it would only save a couple of kilobytes, but it turned out to be a sizeable percentage of the total. After strip, the maximum sizes dropped to 37 MB (from 53) and 8.6 MB (from 14). The ratio accordingly becomes 1.43 MB per line.