
Which is faster: while (true) or for (;;)?
In source code by different authors I have seen different ways of writing an infinite loop. The two I ran into most often are:
while (true) {
...
}
and
for (;;) {
...
}
Since everyone defends “their” infinite loop as if it were family, I decided to find out which form produces the better code.
I wrote two source files:
while.c:
#include <stdio.h>

int main (int argc, char* argv[])
{
    while(1){
        printf("1\n");
    }
}
for.c:
#include <stdio.h>

int main (int argc, char* argv[])
{
    for(;;){
        printf("1\n");
    }
}
Compiled them:
$ gcc -O3 while.c -o while.o3
$ gcc -O2 while.c -o while.o2
$ gcc -O1 while.c -o while.o1
$ gcc -O3 for.c -o for.o3
$ gcc -O2 for.c -o for.o2
$ gcc -O1 for.c -o for.o1
And disassembled them. If you are too lazy to read assembly listings, you can scroll down the page. The listings themselves:
$ objdump -d ./while.o3
...
0000000000400430 <main>:
400430: 48 83 ec 08 sub $0x8,%rsp
400434: 0f 1f 40 00 nopl 0x0(%rax)
400438: bf d4 05 40 00 mov $0x4005d4,%edi
40043d: e8 be ff ff ff callq 400400
400442: eb f4 jmp 400438
...
$ objdump -d ./while.o2
...
0000000000400430 <main>:
400430: 48 83 ec 08 sub $0x8,%rsp
400434: 0f 1f 40 00 nopl 0x0(%rax)
400438: bf d4 05 40 00 mov $0x4005d4,%edi
40043d: e8 be ff ff ff callq 400400
400442: eb f4 jmp 400438
...
$ objdump -d ./while.o1
...
000000000040051c <main>:
40051c: 48 83 ec 08 sub $0x8,%rsp
400520: bf d4 05 40 00 mov $0x4005d4,%edi
400525: e8 d6 fe ff ff callq 400400
40052a: eb f4 jmp 400520
...
$ objdump -d ./for.o1
...
000000000040051c <main>:
40051c: 48 83 ec 08 sub $0x8,%rsp
400520: bf d4 05 40 00 mov $0x4005d4,%edi
400525: e8 d6 fe ff ff callq 400400
40052a: eb f4 jmp 400520
...
$ objdump -d ./for.o2
...
0000000000400430 <main>:
400430: 48 83 ec 08 sub $0x8,%rsp
400434: 0f 1f 40 00 nopl 0x0(%rax)
400438: bf d4 05 40 00 mov $0x4005d4,%edi
40043d: e8 be ff ff ff callq 400400
400442: eb f4 jmp 400438
...
$ objdump -d ./for.o3
0000000000400430 <main>:
400430: 48 83 ec 08 sub $0x8,%rsp
400434: 0f 1f 40 00 nopl 0x0(%rax)
400438: bf d4 05 40 00 mov $0x4005d4,%edi
40043d: e8 be ff ff ff callq 400400
400442: eb f4 jmp 400438
Breaking it down in simple terms
The optimization level did not change how the while (true) loop is implemented: it always consists of three instructions, mov, callq and jmp. The same holds for the for loop: it is also always mov, callq and jmp. The mov, callq and jmp instructions do not differ between the two variants, and the instruction lengths in bytes are the same in all 6 cases.
There is only one small difference between -O1 and -O2 / -O3: at -O1 the jmp goes to main + 4, while at -O2 / -O3 it goes to main + 8, because of the 4-byte nopl placed before the loop. Given that this is a static address (as seen from the asm code), it makes no difference to performance either... Although... what if the memory pages are different? As far as I know, jumps between different memory pages on x86 (and amd64) require additional effort!
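Let's find out. objdump prints addresses in hexadecimal, so with 4096-byte (0x1000) pages:
0x400438 / 0x1000 = 0x400 (offset 0x438)
0x400520 / 0x1000 = 0x400 (offset 0x520)
Phew. The memory page is the same: page 0x400 (1024 in decimal) of the process's virtual address space. In each binary the jmp instruction itself (0x400442 or 0x40052a) sits on the same page as its target, and that is all we needed.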
Conclusion
while (true) and for (;;) perform identically, both compared with each other and at every -Ox optimization level. So if someone asks which of them is faster, feel free to answer “for (;;)”: its 8 characters are faster to type than the 12 characters of “while (true)”.
For those who doubt that the result is the same without any -Ox flag:
$ gcc while.c -o while.noO
$ objdump -d while.noO
...
40052b: bf e4 05 40 00 mov $0x4005e4,%edi
400530: e8 cb fe ff ff callq 400400
400535: eb f4 jmp 40052b
...
$ gcc for.c -o for.noO
$ objdump -d for.noO
...
40052b: bf e4 05 40 00 mov $0x4005e4,%edi
400530: e8 cb fe ff ff callq 400400
400535: eb f4 jmp 40052b
...
P.S. Of course, all of this was checked only with the compiler “gcc version 4.7.2 (Debian 4.7.2-5)”.