Hello from a libc-free world! (Part 1)

Original author: Jessica McKellar
As an exercise, I want to write a Hello World program in C that is simple enough for me to disassemble it and explain all of the resulting assembly to myself.

Sounds easy, right?

The reader is expected to be comfortable compiling programs and working in Linux. Some ability to read assembly is also helpful.

So, here is our simplest Hello World:

jesstess@kid-charlemagne:~/c$ cat hello.c
#include <stdio.h>
int main()
{
	printf("Hello World\n");
	return 0;
}

Compile it and count the bytes:

jesstess@kid-charlemagne:~/c$ gcc -o hello hello.c
jesstess@kid-charlemagne:~/c$ wc -c hello
10931 hello

Yikes! Where do these 11 kilobytes come from? objdump -t hello shows 79 entries in the symbol table, most of which come from the standard library.

So we won't use it. And we won't use printf either, so that we can drop the include:

jesstess@kid-charlemagne:~/c$ cat hello.c
int main()
{
	char *str = "Hello World";
	return 0;
}

We recompile and recount the number of characters:

jesstess@kid-charlemagne:~/c$ gcc -o hello hello.c
jesstess@kid-charlemagne:~/c$ wc -c hello
10892 hello

What? That barely changed anything!

The problem is that gcc still uses its startup files when linking. Proof? We compile with the -nostdlib option, after which (according to the documentation) gcc will not "use the standard system libraries and startup files when linking. Only the files you specify will be passed to the linker."

jesstess@kid-charlemagne:~/c$ gcc -nostdlib -o hello hello.c
/usr/bin/ld: warning: cannot find entry symbol _start; defaulting to 00000000004000e8

It's only a warning, so let's check the result anyway:

jesstess@kid-charlemagne:~/c$ wc -c hello
1329 hello

Looks good! We got the size down to something much more reasonable (a whole order of magnitude smaller!)...

jesstess@kid-charlemagne:~/c$ ./hello
Segmentation fault

...and paid for it with a segfault. Darn.

For fun, let's get our program to actually run before we start digging into the assembly.

What does the _start symbol, which seems to be required for the program to run, actually do? Where is it usually defined when we use libc?

From the linker's point of view, it is _start, not main, that is by default the real entry point of the program. _start is normally defined in the relocatable ELF file crt1.o. We can verify this by linking our Hello World against crt1.o and noting that _start is now found (although other problems appear in its place, because other libc startup symbols are not defined):

# compile the source without linking
jesstess@kid-charlemagne:~/c$ gcc -Os -c hello.c
# now try to link
jesstess@kid-charlemagne:~/c$ ld /usr/lib/crt1.o -o hello hello.o
/usr/lib/crt1.o: In function `_start':
/build/buildd/glibc-2.9/csu/../sysdeps/x86_64/elf/start.S:106: undefined reference to `__libc_csu_fini'
/build/buildd/glibc-2.9/csu/../sysdeps/x86_64/elf/start.S:107: undefined reference to `__libc_csu_init'
/build/buildd/glibc-2.9/csu/../sysdeps/x86_64/elf/start.S:113: undefined reference to `__libc_start_main'

This check conveniently tells us where _start lives in the libc source on this machine: sysdeps/x86_64/elf/start.S. This delightfully well-commented file exports the _start symbol, sets up the stack and a few registers, and calls __libc_start_main. If we look at the very bottom of csu/libc-start.c, we can see the call to our program's main:

/* Nothing fancy, just call the function.  */
result = main (argc, argv, __environ MAIN_AUXVEC_PARAM);

... and off we go.
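The entry point the linker settled on is recorded in the ELF header, so we can also read it back without a disassembler. Here is a minimal sketch of that, assuming a 64-bit ELF file and the Linux <elf.h> header. The helper name read_elf_entry is made up for this illustration and is not part of the original walkthrough.

```c
#include <elf.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not from the article): reads the ELF header of the
 * file at `path` and stores its entry-point address in *entry.
 * Assumes a 64-bit ELF file, as on the article's x86_64 machine.
 * Returns 0 on success, -1 on any error. */
int read_elf_entry(const char *path, uint64_t *entry)
{
    Elf64_Ehdr header;
    FILE *f = fopen(path, "rb");
    if (f == NULL)
        return -1;

    size_t got = fread(&header, 1, sizeof header, f);
    fclose(f);
    if (got != sizeof header)
        return -1;

    /* Every ELF file starts with the magic bytes 0x7f 'E' 'L' 'F'. */
    if (memcmp(header.e_ident, ELFMAG, SELFMAG) != 0)
        return -1;
    /* This sketch only handles the 64-bit header layout. */
    if (header.e_ident[EI_CLASS] != ELFCLASS64)
        return -1;

    /* e_entry is the address where execution begins, normally _start. */
    *entry = header.e_entry;
    return 0;
}
```

Called on the hello binary built with -nostdlib, this should report the address that ld mentioned in its warning. readelf -h hello prints the same field as "Entry point address".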

So that's what _start is for. To summarize what happens between _start and the call to main: a bunch of things get initialized for libc, and then main is called. Since we don't need libc, we export our own _start symbol, whose only job is to call main, and link against that:

jesstess@kid-charlemagne:~/c$ cat stubstart.S
.globl _start
_start:
	call main

We compile and run Hello World with our assembly stub for _start:

jesstess@kid-charlemagne:~/c$ gcc -nostdlib stubstart.S -o hello hello.c
jesstess@kid-charlemagne:~/c$ ./hello
Segmentation fault

Hooray, the compilation problems are gone. But the segfault is still here. Why? Let's compile with debugging information and take a look in gdb. We set a breakpoint on main and step through the program until it segfaults:

jesstess@kid-charlemagne:~/c$ gcc -g -nostdlib stubstart.S -o hello hello.c
jesstess@kid-charlemagne:~/c$ gdb hello
GNU gdb 6.8-debian
Copyright (C) 2008 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu"...
(gdb) break main
Breakpoint 1 at 0x4000f4: file hello.c, line 3.
(gdb) run
Starting program: /home/jesstess/c/hello
Breakpoint 1, main () at hello.c:5
5		char *str = "Hello World";
(gdb) step
6		return 0;
(gdb) step
7	}
(gdb) step
0x00000000004000ed in _start ()
(gdb) step
Single stepping until exit from function _start,
which has no line number information.
main () at helloint.c:4
4	{
(gdb) step
Breakpoint 1, main () at helloint.c:5
5		char *str = "Hello World";
(gdb) step
6		return 0;
(gdb) step
7	}
(gdb) step
Program received signal SIGSEGV, Segmentation fault.
0x0000000000000001 in ?? ()
(gdb)

What? main ran twice?... Time to look at the assembly:

jesstess@kid-charlemagne:~/c$ objdump -d hello
hello:     file format elf64-x86-64
Disassembly of section .text:
00000000004000e8 <_start>:
  4000e8:	e8 03 00 00 00       	callq  4000f0 <main>
  4000ed:	90                   	nop
  4000ee:	90                   	nop
  4000ef:	90                   	nop
00000000004000f0 <main>:
  4000f0:	55                   	push   %rbp
  4000f1:	48 89 e5             	mov    %rsp,%rbp
  4000f4:	48 c7 45 f8 03 01 40 	movq   $0x400103,-0x8(%rbp)
  4000fb:	00
  4000fc:	b8 00 00 00 00       	mov    $0x0,%eax
  400101:	c9                   	leaveq
  400102:	c3                   	retq

Interesting! We'll save a detailed analysis of the assembly for later, but briefly: after returning from the callq to main, we execute a few nops and then fall straight back into main. Since this second entry into main happened without a return instruction pointer being pushed onto the stack (which is part of the standard preparation for a function call), the second retq tries to pop a bogus return address off the stack and the program crashes. We need a way to exit.

Literally. After returning from the call, we move 1, the system call number for SYS_exit, into %eax. Since we want to report a clean exit, we put 0, the only argument to SYS_exit, into %ebx. Then we enter the kernel with the interrupt int $0x80.

jesstess@kid-charlemagne:~/c$ cat stubstart.S
.globl _start
_start:
	call main
	movl $1, %eax
	xorl %ebx, %ebx
	int $0x80
jesstess@kid-charlemagne:~/c$ gcc -nostdlib stubstart.S -o hello hello.c
jesstess@kid-charlemagne:~/c$ ./hello
jesstess@kid-charlemagne:~/c$

Hurrah! The program compiles, runs, and even exits normally when stepped through in gdb.
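To see the same exit request from the C side, here is a rough sketch that makes the SYS_exit call in a child process and reports the status the parent observes. It deliberately cheats by using libc's syscall() wrapper and fork(), which is exactly what the stub avoids, so treat it as an illustration only. The helper name status_after_raw_exit is made up for this example. Note also that <sys/syscall.h> supplies the SYS_exit number for the native ABI, which on x86_64 is not the 1 that the int $0x80 interface uses.

```c
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper (not from the article): forks a child that leaves via
 * a raw SYS_exit request, then returns the exit status the parent observed.
 * Returns -1 if fork/wait fails or the child did not exit normally. */
int status_after_raw_exit(int code)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        /* Child: the same request the stub makes, system call SYS_exit
         * with the exit code as its only argument. */
        syscall(SYS_exit, code);
        _exit(127); /* not reached; guards against the call failing */
    }

    /* Parent: wait for the child and decode how it terminated. */
    int status = 0;
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Passing 0 mirrors what the stub does with xorl %ebx, %ebx. Any other small value shows up unchanged in the parent, the same way it would in the shell's $? after running ./hello.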

Hello from a libc-free world!

Stay tuned: in Part 2 we will go through the assembly in detail, see what happens when the program gets more complex, and look a little further at linking, calling conventions, and the structure of an ELF binary on the x86 architecture.
