Hello from the libc-free world! (Part 1)
- Translation
As an exercise, I want to write a program in C that is simple enough for me to disassemble it and explain all of the code to myself.
Sounds easy, right?
The reader is expected to have experience compiling programs and working in Linux. A little ability to read assembly code is also useful.
So, here is our simplest hello world:
    jesstess@kid-charlemagne:~/c$ cat hello.c
    #include <stdio.h>
    int main()
    {
        printf("Hello World\n");
        return 0;
    }
Compile it and count the bytes:
    jesstess@kid-charlemagne:~/c$ gcc -o hello hello.c
    jesstess@kid-charlemagne:~/c$ wc -c hello
    10931 hello
Whoa! Where do these 11 kilobytes come from? Running objdump -t hello shows 79 entries in the symbol table, most of which come from the standard library. So we will not use it. And we will not use printf either, which lets us get rid of the include:
    jesstess@kid-charlemagne:~/c$ cat hello.c
    int main()
    {
        char *str = "Hello World";
        return 0;
    }
Recompile and recount the bytes:
    jesstess@kid-charlemagne:~/c$ gcc -o hello hello.c
    jesstess@kid-charlemagne:~/c$ wc -c hello
    10892 hello
Almost nothing has changed? Ha!
The problem is that gcc still uses the startup files when linking. Proof? Compile with the -nostdlib option, which, according to the documentation, tells gcc not to use the system libraries and startup files when linking, so that only the files explicitly passed to the linker are used:
    jesstess@kid-charlemagne:~/c$ gcc -nostdlib -o hello hello.c
    /usr/bin/ld: warning: cannot find entry symbol _start; defaulting to 00000000004000e8
It is just a warning, so let's try it anyway:
    jesstess@kid-charlemagne:~/c$ wc -c hello
    1329 hello
Looks good! We have brought the size down to something much saner (by a whole order of magnitude!) ...
    jesstess@kid-charlemagne:~/c$ ./hello
    Segmentation fault
... and paid for it with a segfault. Damn.
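The address in that linker warning is worth a second look: it is what ld writes into the entry-point field of the ELF header when it cannot find _start. As a minimal sketch (not part of the original walkthrough), the C function below reads that field from a 64-bit ELF file. The name elf_entry_point is made up for illustration, and the code assumes a Linux system where <elf.h> is available. It uses libc for file I/O, so it is a tool for inspecting binaries rather than something to link into our libc-free program.

```c
#include <elf.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: return the entry-point address (e_entry) stored in
 * the header of a 64-bit ELF file, or 0 if the file cannot be read or is
 * not a 64-bit ELF file. */
unsigned long elf_entry_point(const char *path)
{
    Elf64_Ehdr header;
    FILE *f = fopen(path, "rb");
    if (f == NULL)
        return 0;

    size_t got = fread(&header, 1, sizeof header, f);
    fclose(f);
    if (got != sizeof header)
        return 0;

    /* Check the magic bytes "\177ELF" and the 64-bit class marker. */
    if (memcmp(header.e_ident, ELFMAG, SELFMAG) != 0)
        return 0;
    if (header.e_ident[EI_CLASS] != ELFCLASS64)
        return 0;

    return (unsigned long)header.e_entry;
}
```

Calling elf_entry_point("hello") on the -nostdlib binary above should give back the address from the warning (0x4000e8 in this session). readelf -h reports the same field as "Entry point address".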
For fun, let's get our program to run before we dig into the assembly.
What does the _start symbol, which seems to be needed to run the program, actually do? Where is it usually defined when libc is used? By default, from the linker's point of view, it is _start, rather than main, that is the real entry point of the program. Normally _start is defined in the relocatable ELF object crt1.o. We can verify this by linking our hello world against crt1.o and noticing that _start is now found (although other problems appear in its place, because other libc startup symbols are not defined):
    # compile the source without linking
    jesstess@kid-charlemagne:~/c$ gcc -Os -c hello.c
    # now try to link
    jesstess@kid-charlemagne:~/c$ ld /usr/lib/crt1.o -o hello hello.o
    /usr/lib/crt1.o: In function `_start':
    /build/buildd/glibc-2.9/csu/../sysdeps/x86_64/elf/start.S:106: undefined reference to `__libc_csu_fini'
    /build/buildd/glibc-2.9/csu/../sysdeps/x86_64/elf/start.S:107: undefined reference to `__libc_csu_init'
    /build/buildd/glibc-2.9/csu/../sysdeps/x86_64/elf/start.S:113: undefined reference to `__libc_start_main'
This check tells us that on this machine _start lives in the libc source file sysdeps/x86_64/elf/start.S. This delightfully commented file exports the _start symbol, sets up the stack and some registers, and calls __libc_start_main. If you look at the very bottom of csu/libc-start.c, you can see the call to our program's main:
    /* Nothing special, just call the function */
    result = main (argc, argv, __environ MAIN_AUXVEC_PARAM);
... and off we go.
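To make the shape of that hand-off concrete, here is a toy model in C of what the startup path boils down to: do some initialization, call main, and hand its return value on. This is a simplified sketch, not glibc's code. The names model_start, fake_libc_init and sample_main are hypothetical, and real startup code passes the result to exit instead of returning it.

```c
#include <stddef.h>

/* Signature of the entry function the model hands control to. */
typedef int (*main_fn)(int, char **, char **);

/* Stand-in for the libc initialization work; here it only counts calls. */
static int init_calls = 0;
static void fake_libc_init(void)
{
    init_calls++;
}

/* Toy model of the startup sequence: initialize, call the entry function,
 * and return its result. Real libc would call exit(result) at this point
 * rather than return to its caller. */
int model_start(main_fn entry, int argc, char **argv, char **envp)
{
    fake_libc_init();
    int result = entry(argc, argv, envp);
    return result;
}

/* Example entry function: reports how many arguments it was given. */
int sample_main(int argc, char **argv, char **envp)
{
    (void)argv;
    (void)envp;
    return argc;
}
```

For example, model_start(sample_main, 3, NULL, NULL) runs the fake initialization once and then returns whatever sample_main returned. The step that matters for the rest of this article is the one the model skips: somebody has to end the process after main comes back.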
So that is what _start is for. For convenience, let's summarize what happens between _start and the call to main: a bunch of things get initialized for libc, and then main is called. Since we do not need libc, we export our own _start symbol, whose only job is to call main, and link against it:
    jesstess@kid-charlemagne:~/c$ cat stubstart.S
    .globl _start
    _start:
        call main
Compile and run hello world with the assembly stub for _start:
    jesstess@kid-charlemagne:~/c$ gcc -nostdlib stubstart.S -o hello hello.c
    jesstess@kid-charlemagne:~/c$ ./hello
    Segmentation fault
Hooray, the compile problems are gone. But the segfault is still here. Why? Let's compile with debugging information and take a look in gdb. We set a breakpoint on main and step through the program until it segfaults:
    jesstess@kid-charlemagne:~/c$ gcc -g -nostdlib stubstart.S -o hello hello.c
    jesstess@kid-charlemagne:~/c$ gdb hello
    GNU gdb 6.8-debian
    Copyright (C) 2008 Free Software Foundation, Inc.
    License GPLv3+: GNU GPL version 3 or later
    This is free software: you are free to change and redistribute it.
    There is NO WARRANTY, to the extent permitted by law. Type "show copying"
    and "show warranty" for details.
    This GDB was configured as "x86_64-linux-gnu"...
    (gdb) break main
    Breakpoint 1 at 0x4000f4: file hello.c, line 3.
    (gdb) run
    Starting program: /home/jesstess/c/hello
    Breakpoint 1, main () at hello.c:5
    5     char *str = "Hello World";
    (gdb) step
    6     return 0;
    (gdb) step
    7   }
    (gdb) step
    0x00000000004000ed in _start ()
    (gdb) step
    Single stepping until exit from function _start,
    which has no line number information.
    main () at helloint.c:4
    4   {
    (gdb) step
    Breakpoint 1, main () at helloint.c:5
    5     char *str = "Hello World";
    (gdb) step
    6     return 0;
    (gdb) step
    7   }
    (gdb) step
    Program received signal SIGSEGV, Segmentation fault.
    0x0000000000000001 in ?? ()
    (gdb)
What? main ran twice? ... Time to look at the assembly:
    jesstess@kid-charlemagne:~/c$ objdump -d hello
    hello:     file format elf64-x86-64
    Disassembly of section .text:
    00000000004000e8 <_start>:
      4000e8:  e8 03 00 00 00        callq  4000f0 <main>
      4000ed:  90                    nop
      4000ee:  90                    nop
      4000ef:  90                    nop
    00000000004000f0 <main>:
      4000f0:  55                    push   %rbp
      4000f1:  48 89 e5              mov    %rsp,%rbp
      4000f4:  48 c7 45 f8 03 01 40  movq   $0x400103,-0x8(%rbp)
      4000fb:  00
      4000fc:  b8 00 00 00 00        mov    $0x0,%eax
      400101:  c9                    leaveq
      400102:  c3                    retq
Heh! We will leave a detailed analysis of the assembly for later and just note the following: after returning from the callq to main, we execute a few nops and fall straight through into main again. Since this re-entry into main happened without a return instruction pointer being pushed onto the stack (as is done in the normal preparation for a function call), the second retq tries to pop a bogus return instruction pointer off the stack and the program crashes.
We need a way to exit. Literally. After returning from the callq, we put 1, the system call number for SYS_exit, into %eax, and since we want to report a clean exit, we put 0, the only argument to SYS_exit, into %ebx. Then we enter the kernel with the interrupt int $0x80:
    jesstess@kid-charlemagne:~/c$ cat stubstart.S
    .globl _start
    _start:
        call main
        movl $1, %eax
        xorl %ebx, %ebx
        int $0x80
    jesstess@kid-charlemagne:~/c$ gcc -nostdlib stubstart.S -o hello hello.c
    jesstess@kid-charlemagne:~/c$ ./hello
    jesstess@kid-charlemagne:~/c$
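For comparison, the same "exit via a system call" idea can be sketched in C. This version deliberately cheats by using libc's syscall() wrapper, so it only illustrates the convention (a call number plus one status argument) and is not a libc-free build. The helpers raw_exit and child_exit_status are hypothetical names. Note also that SYS_exit resolves to whatever number the build's native ABI uses, which need not be the 1 that the stub loads for int $0x80.

```c
#define _GNU_SOURCE
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: end the calling process with the given status by
 * invoking the exit system call through libc's generic syscall() wrapper. */
static void raw_exit(int status)
{
    syscall(SYS_exit, status);
}

/* Hypothetical helper: fork a child that calls raw_exit(status) and return
 * the exit status the parent observes, or -1 if something went wrong. */
int child_exit_status(int status)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        raw_exit(status);
        _exit(127); /* only reached if the system call failed */
    }

    int wstatus = 0;
    if (waitpid(pid, &wstatus, 0) < 0)
        return -1;

    return WIFEXITED(wstatus) ? WEXITSTATUS(wstatus) : -1;
}
```

The child is forked only so that the calling process survives to look at the result. In the stub above there is no parent to worry about: the process simply ends, and the shell sees status 0.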
Hurrah! The program compiles, runs, and even exits normally when stepped through in gdb.
Hello from the libc-free world!
Stay tuned: in the second part we will analyze the assembly in detail, see what happens when the program gets more complex, and look a bit more at linking, calling conventions, and the structure of ELF binaries on the x86 architecture.