Creating an ELF file with debugging information (DWARF) by hand (for ARM microcontrollers)


Recently I became interested in microcontrollers: first AVR, then ARM. There are two main options for programming microcontrollers: assembly and C. However, I am a fan of the Forth programming language, so I started porting it to these microcontrollers. There are ready-made solutions, of course, but none of them had what I wanted: debugging with gdb. So I set out to fill this gap (so far only for ARM). I had an stm32vldiscovery board with a 32-bit ARM Cortex-M3 processor, 128 kB of flash and 8 kB of RAM, so that is where I started.
I wrote the Forth cross-compiler in Forth, of course, and since the language is considered exotic, none of its code appears in this article. I will limit myself to fairly detailed recommendations. There is almost no documentation or examples on this subject online; I found some parameters by trial and error and others by analyzing the output files of the gcc compiler. In addition, I used only the bare minimum of debugging information and did not touch, for example, relocations and many other things. The topic is very extensive and, I confess, I have figured out only about 30 percent of it, which turned out to be enough for my purposes.

Anyone interested in this project can download the code here.

ELF Overview

Standard development tools compile your program into an ELF (Executable and Linkable Format) file, optionally with debugging information. The format specification can be found here. In addition, each architecture has its own specifics, for example those for ARM. Let's briefly review the format.
The ELF executable file consists of the following parts:
1. File header (ELF Header)

Contains general information about the file and its main characteristics.
2. Program header (Program Header Table)

This table maps file sections to memory segments; it tells the loader which memory area to write each section to.
3. Sections

Sections contain all the information in the file (program, data, debugging information, etc.)
Each section has a type, a name, and other parameters. The ".text" section usually holds the code, ".symtab" holds the program's symbol table (names of files, procedures, and variables), ".strtab" holds a string table, sections prefixed with ".debug_" hold debugging information, and so on. In addition, the file must have an empty section with index 0.

4. Section Header Table

This is a table containing an array of section headers.
The format is discussed in more detail in the Creating ELF section.

DWARF Overview

DWARF is a standardized format for debugging information. The standard can be downloaded from the official website. There is also a wonderful brief overview of the format: Introduction to the DWARF Debugging Format (Michael J. Eager).
Why is debugging information needed? It allows you to:
  • set breakpoints not at a physical address, but at a line number in a source file or at a function name
  • display and change the values of global and local variables, as well as function parameters
  • display the call stack (backtrace)
  • step through the program by lines of source code rather than by single assembler instructions

This information is stored as a tree. Each node of the tree has a parent, can have children, and is called a DIE (Debugging Information Entry). Each node has a tag (its type) and a list of attributes (properties) that describe the node. Attributes can contain almost anything, for example data or links to other nodes. In addition, some information is stored outside the tree.
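To make the tree structure concrete, here is a minimal sketch of an in-memory model of such nodes. The Die struct and the die_add_child and die_child_count helpers are hypothetical names invented for this illustration; real DIEs are built by libdwarf, as described later. Only the numeric tag values come from the DWARF standard.

```c
#include <assert.h>
#include <stddef.h>

/* Tag values as defined by the DWARF standard. */
enum {
    DW_TAG_pointer_type = 0x0f,
    DW_TAG_compile_unit = 0x11,
    DW_TAG_base_type    = 0x24,
    DW_TAG_subprogram   = 0x2e
};

/* Hypothetical in-memory node, for illustration only. */
typedef struct Die {
    unsigned          tag;          /* node type, DW_TAG_xxx */
    const char       *name;         /* stands in for DW_AT_name, NULL if unnamed */
    struct Die       *parent;
    struct Die       *first_child;
    struct Die       *next_sibling;
    const struct Die *type;         /* stands in for DW_AT_type, a link to another node */
} Die;

/* Attach child as the last child of parent. */
void die_add_child(Die *parent, Die *child)
{
    assert(parent != NULL && child != NULL);
    child->parent = parent;
    child->next_sibling = NULL;
    if (parent->first_child == NULL) {
        parent->first_child = child;
        return;
    }
    Die *last = parent->first_child;
    while (last->next_sibling != NULL)
        last = last->next_sibling;
    last->next_sibling = child;
}

/* Count the direct children of a node. */
size_t die_child_count(const Die *node)
{
    size_t n = 0;
    for (const Die *c = node->first_child; c != NULL; c = c->next_sibling)
        n++;
    return n;
}
```

A compilation unit node with an "int" base type and a pointer type whose type field points at "int" would be built with two die_add_child calls on the same parent.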
Nodes are divided into two main types: nodes that describe data, and nodes that describe code.
Nodes describing data:

  1. Data types:
    • Basic data types (node with tag DW_TAG_base_type), for example the int type in C.
    • Compound data types (pointers, etc.)
    • Arrays
    • Structures, classes, unions, interfaces

  2. Data Objects:
    • constants
    • function parameters
    • variables
    • etc.

Each data object has a DW_AT_location attribute, which indicates how the address at which the data is located is calculated. For example, a variable can have a fixed address, be in a register or on a stack, be a member of a class or object. This address can be calculated in a rather complicated way, so the standard provides for the so-called Location Expressions, which can contain a sequence of operators of a special internal stack machine.
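As an illustration of what a location expression looks like at the byte level, the sketch below encodes the two simplest cases: a variable at a fixed address and a variable at an offset from the frame base. The helper names are hypothetical; the opcode values DW_OP_addr and DW_OP_fbreg and the signed LEB128 encoding are taken from the DWARF standard. A 32-bit little-endian target is assumed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Opcode values as defined by the DWARF standard. */
enum { DW_OP_addr = 0x03, DW_OP_fbreg = 0x91 };

/* Write a signed LEB128 number to out; returns the number of bytes written. */
size_t put_sleb128(uint8_t *out, int32_t value)
{
    size_t n = 0;
    assert(out != NULL);
    for (;;) {
        uint8_t byte = (uint8_t)(value & 0x7f);
        value >>= 7;  /* assumes an arithmetic shift for negative values */
        int done = (value == 0 && !(byte & 0x40)) ||
                   (value == -1 && (byte & 0x40));
        out[n++] = done ? byte : (uint8_t)(byte | 0x80);
        if (done)
            return n;
    }
}

/* Variable at a fixed address: DW_OP_addr followed by the 4-byte address. */
size_t loc_fixed_address(uint8_t *out, uint32_t addr)
{
    assert(out != NULL);
    out[0] = DW_OP_addr;
    for (int i = 0; i < 4; i++)
        out[1 + i] = (uint8_t)(addr >> (8 * i));  /* little-endian */
    return 5;
}

/* Variable on the stack: DW_OP_fbreg followed by a signed offset. */
size_t loc_frame_offset(uint8_t *out, int32_t offset)
{
    assert(out != NULL);
    out[0] = DW_OP_fbreg;
    return 1 + put_sleb128(out + 1, offset);
}
```

More complicated locations chain several such operators, which the debugger evaluates on its internal stack machine.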

Nodes describing the code:

  1. Procedures (functions) - nodes with the tag DW_TAG_subprogram. Descendant nodes can contain descriptions of variables - function parameters and local function variables.
  2. Compilation Unit. Contains information about the program and is the parent of all other nodes.

The information described above is in the sections .debug_info and .debug_abbrev.

Other information:

  • Information about line numbers (section ".debug_line")
  • Macro Information (section ".debug_macinfo")
  • Call Frame Information (".debug_frame" section)

Creating ELF

We will create ELF files using the libelf library from the elfutils package. There is a good article online about using libelf, LibELF by Example (unfortunately, it covers file creation only briefly), as well as the documentation.
Creating a file consists of several steps:
  1. Initializing libelf
  2. Creating the file header (ELF Header)
  3. Creating the program header (Program Header Table)
  4. Creating sections
  5. Writing the file

Consider the steps in more detail.

Libelf initialization

First, call elf_version(EV_CURRENT) and check the result. If it equals EV_NONE, an error has occurred and no further calls can be made. Then create the output file on disk, obtain its file descriptor, and pass it to the elf_begin function:
Elf * elf_begin( 
    int fd, 
    Elf_Cmd cmd, 
    Elf *elf) 

  • fd - file descriptor of the newly opened file
  • cmd - mode (ELF_C_READ for reading, ELF_C_WRITE for writing, or ELF_C_RDWR for reading and writing); it must match the mode in which the file was opened (ELF_C_WRITE in our case)
  • elf - needed only when working with archive files (.a); in our case pass 0

The function returns a pointer to the created descriptor, which is used in all other libelf functions; 0 is returned on error.

Creating the header

The new file header is created by the elf32_newehdr function: 
Elf32_Ehdr * elf32_newehdr( 
    Elf *elf); 

  • elf - handle returned by elf_begin

Returns 0 on error, or a pointer to a structure - the header of an ELF file:
#define EI_NIDENT 16 
typedef struct { 
    unsigned char e_ident[EI_NIDENT]; 
    Elf32_Half e_type; 
    Elf32_Half e_machine; 
    Elf32_Word e_version; 
    Elf32_Addr e_entry; 
    Elf32_Off e_phoff; 
    Elf32_Off e_shoff; 
    Elf32_Word e_flags; 
    Elf32_Half e_ehsize; 
    Elf32_Half e_phentsize; 
    Elf32_Half e_phnum; 
    Elf32_Half e_shentsize; 
    Elf32_Half e_shnum; 
    Elf32_Half e_shstrndx; 
} Elf32_Ehdr; 

Some of its fields are already filled in with standard values; others we must fill in ourselves:
  • e_ident - array of identification bytes, with the following indices:
    • EI_MAG0, EI_MAG1, EI_MAG2, EI_MAG3 - these 4 bytes must contain the characters 0x7f, 'E', 'L', 'F'; the elf32_newehdr function has already done this for us
    • EI_DATA - data encoding used in the file: ELFDATA2LSB or ELFDATA2MSB. We need ELFDATA2LSB, set like this: e_ident[EI_DATA] = ELFDATA2LSB
    • EI_VERSION - file header version, already set for us
    • EI_PAD - do not touch
  • e_type - file type: ET_NONE (no type), ET_REL (relocatable file), ET_EXEC (executable file), ET_DYN (shared object file), etc. We need to set the file type to ET_EXEC
  • e_machine - the architecture required by this file, for example EM_386 for Intel; for ARM we need to write EM_ARM (40) here, see ELF for the ARM Architecture
  • e_version - file version, must be set to EV_CURRENT
  • e_entry - address of the entry point; we do not need it
  • e_phoff - offset of the program header table in the file, e_shoff - offset of the section header table; do not fill in
  • e_flags - processor-specific flags; for our architecture (Cortex-M3) set it to 0x05000000 (ABI version 5)
  • e_ehsize, e_phentsize, e_phnum, e_shentsize, e_shnum - do not touch
  • e_shstrndx - index of the section that holds the string table with section names. Since we have no sections yet, we will set this index later
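The field assignments above can be sketched as follows. To keep the snippet compilable without libelf, it uses a local stand-in struct (Ehdr32) with the same layout as Elf32_Ehdr and defines the needed constants itself; in real code the pointer would come from elf32_newehdr and the constants from the libelf headers. The function name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Values from the ELF specification and "ELF for the ARM Architecture". */
enum { EI_DATA = 5, EI_NIDENT = 16 };
enum { ELFDATA2LSB = 1, EV_CURRENT = 1, ET_EXEC = 2, EM_ARM = 40 };

/* Local stand-in with the same layout as Elf32_Ehdr (assumption: no libelf here). */
typedef struct {
    unsigned char e_ident[EI_NIDENT];
    uint16_t e_type;
    uint16_t e_machine;
    uint32_t e_version;
    uint32_t e_entry;
    uint32_t e_phoff;
    uint32_t e_shoff;
    uint32_t e_flags;
    uint16_t e_ehsize;
    uint16_t e_phentsize;
    uint16_t e_phnum;
    uint16_t e_shentsize;
    uint16_t e_shnum;
    uint16_t e_shstrndx;
} Ehdr32;

/* Fill only the fields that the text says we must set ourselves. */
void fill_ehdr_cortex_m3(Ehdr32 *ehdr)
{
    assert(ehdr != NULL);
    ehdr->e_ident[EI_DATA] = ELFDATA2LSB;  /* little-endian data */
    ehdr->e_type    = ET_EXEC;             /* executable file */
    ehdr->e_machine = EM_ARM;              /* 40 */
    ehdr->e_version = EV_CURRENT;
    ehdr->e_flags   = 0x05000000u;         /* ABI version 5 */
    /* e_shstrndx is set later, once the .shstrtab section exists. */
}
```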

Creating the program header

As already mentioned, the Program Header Table maps file sections to memory segments and tells the loader where to write each section. The header is created with the elf32_newphdr function:
Elf32_Phdr * elf32_newphdr( 
    Elf *elf, 
    size_t count); 

  • elf - our descriptor
  • count - number of table entries to create. Since we will have only one section (with the program code), count is 1.

Returns 0 on error or a pointer to the program header.
Each entry in the program header table is described by this structure:
typedef struct { 
    Elf32_Word p_type; 
    Elf32_Off p_offset; 
    Elf32_Addr p_vaddr; 
    Elf32_Addr p_paddr; 
    Elf32_Word p_filesz; 
    Elf32_Word p_memsz; 
    Elf32_Word p_flags; 
    Elf32_Word p_align; 
} Elf32_Phdr;

  • p_type - segment type; here we must specify PT_LOAD, a loadable segment
  • p_offset - offset in the file at which the data of the section to be loaded into memory begins. For us this is the .text section, which is located immediately after the file header and the program header, so the offset is the sum of the lengths of these headers. The length of any type can be obtained with the elf32_fsize function:
    size_t elf32_fsize(Elf_Type type, size_t count, unsigned int version);
    type - one of the ELF_T_xxx constants (we need the sizes of ELF_T_EHDR and ELF_T_PHDR); count - number of elements of that type; version - must be set to EV_CURRENT
  • p_vaddr, p_paddr - virtual and physical address at which the contents of the section will be loaded. Since we have no virtual addresses, we set the virtual address equal to the physical one; in the simplest case that is 0, because that is where our program will be loaded.
  • p_filesz, p_memsz - size of the section in the file and in memory. They are equal for us, but since the section with the program code does not exist yet, we will set them later
  • p_flags - permissions for the loaded memory segment. Can be PF_R (read), PF_W (write), PF_X (execute), or a combination of them. Set p_flags to PF_R + PF_X
  • p_align - segment alignment; we use 4
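A minimal sketch of filling this entry is shown below. It uses a local stand-in struct (Phdr32) instead of Elf32_Phdr and hard-codes the ELF32 header sizes (52 and 32 bytes) in place of the elf32_fsize calls, so it compiles without libelf; these substitutions and the function name are assumptions made for the illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Values from the ELF specification. */
enum { PT_LOAD = 1 };
enum { PF_X = 1, PF_W = 2, PF_R = 4 };

/* Sizes of the ELF32 file header and of one program header entry;
   with libelf these would come from elf32_fsize. */
enum { ELF32_EHDR_SIZE = 52, ELF32_PHDR_SIZE = 32 };

/* Local stand-in with the same layout as Elf32_Phdr. */
typedef struct {
    uint32_t p_type;
    uint32_t p_offset;
    uint32_t p_vaddr;
    uint32_t p_paddr;
    uint32_t p_filesz;
    uint32_t p_memsz;
    uint32_t p_flags;
    uint32_t p_align;
} Phdr32;

/* Describe a single loadable segment that holds the .text section. */
void fill_text_phdr(Phdr32 *ph, uint32_t load_addr, uint32_t code_size)
{
    assert(ph != NULL);
    ph->p_type   = PT_LOAD;
    /* .text follows the file header and the one-entry program header table. */
    ph->p_offset = ELF32_EHDR_SIZE + 1 * ELF32_PHDR_SIZE;
    ph->p_vaddr  = load_addr;   /* no virtual memory: same as physical */
    ph->p_paddr  = load_addr;
    ph->p_filesz = code_size;   /* known only after the code is generated */
    ph->p_memsz  = code_size;
    ph->p_flags  = PF_R | PF_X;
    ph->p_align  = 4;
}
```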

Creating sections

After creating the headers, you can start creating sections. An empty section is created using the elf_newscn function:
Elf_Scn * elf_newscn( 
    Elf *elf);

  • elf - handle returned earlier by elf_begin

The function returns a pointer to a section or 0 on error.
After creating a section, you need to fill out the section header and create a section data descriptor.
We can get a pointer to the section header using the elf32_getshdr function:
Elf32_Shdr * elf32_getshdr( 
    Elf_Scn *scn);

  • scn is the pointer to the section that we got from the elf_newscn function.

The section header looks like this:
typedef struct { 
    Elf32_Word		sh_name; 
    Elf32_Word		sh_type; 
    Elf32_Word		sh_flags; 
    Elf32_Addr		sh_addr; 
    Elf32_Off		sh_offset; 
    Elf32_Word		sh_size; 
    Elf32_Word		sh_link; 
    Elf32_Word		sh_info; 
    Elf32_Word		sh_addralign; 
    Elf32_Word		sh_entsize; 
} Elf32_Shdr;

  • sh_name - section name: an offset into the section header string table (the .shstrtab section), see "String tables" below
  • sh_type - content type of the section: SHT_PROGBITS for the section with program code, SHT_STRTAB for sections with a string table, SHT_SYMTAB for the symbol table
  • sh_flags - section flags, which can be combined; we need only three of them:
    • SHF_ALLOC - the section will be loaded into memory
    • SHF_EXECINSTR - the section contains executable code
    • SHF_STRINGS - the section contains a string table

    Accordingly, for the .text section with the program, set the flags SHF_ALLOC + SHF_EXECINSTR
  • sh_addr - address at which the section will be loaded into memory
  • sh_offset - offset of the section in the file; do not touch, the library sets it for us
  • sh_size - section size; do not touch
  • sh_link - index of an associated section; it is needed to link a section with its string table (see below)
  • sh_info - additional information depending on the section type; set to 0
  • sh_addralign - address alignment; do not touch
  • sh_entsize - if the section consists of several elements of the same length, the length of one such element; do not touch
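As a sketch of how these fields are combined, the snippet below fills headers for a code section and for a symbol table. Shdr32 is a local stand-in for Elf32_Shdr so that the example compiles without libelf; in real code the pointer comes from elf32_getshdr. The function names and parameters are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Values from the ELF specification. */
enum { SHT_PROGBITS = 1, SHT_SYMTAB = 2, SHT_STRTAB = 3 };
enum { SHF_ALLOC = 0x2, SHF_EXECINSTR = 0x4, SHF_STRINGS = 0x20 };

/* Local stand-in with the same layout as Elf32_Shdr. */
typedef struct {
    uint32_t sh_name;
    uint32_t sh_type;
    uint32_t sh_flags;
    uint32_t sh_addr;
    uint32_t sh_offset;
    uint32_t sh_size;
    uint32_t sh_link;
    uint32_t sh_info;
    uint32_t sh_addralign;
    uint32_t sh_entsize;
} Shdr32;

/* Header of the .text section: program code that is loaded and executed. */
void fill_text_shdr(Shdr32 *sh, uint32_t name_offset, uint32_t load_addr)
{
    assert(sh != NULL);
    sh->sh_name  = name_offset;              /* offset of ".text" in .shstrtab */
    sh->sh_type  = SHT_PROGBITS;
    sh->sh_flags = SHF_ALLOC | SHF_EXECINSTR;
    sh->sh_addr  = load_addr;
    sh->sh_info  = 0;
    /* sh_offset, sh_size, sh_addralign, sh_entsize are left untouched. */
}

/* Header of the .symtab section, linked to its string table via sh_link. */
void fill_symtab_shdr(Shdr32 *sh, uint32_t name_offset, uint32_t strtab_index)
{
    assert(sh != NULL);
    sh->sh_name = name_offset;               /* offset of ".symtab" in .shstrtab */
    sh->sh_type = SHT_SYMTAB;
    sh->sh_link = strtab_index;              /* index of the .strtab section */
    sh->sh_info = 0;
}
```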

After filling out the header, you need to create a section data descriptor with the elf_newdata function:
Elf_Data * elf_newdata( 
    Elf_Scn *scn);

  • scn - the pointer to the new section that we have just obtained.

The function returns 0 on error, or a pointer to the Elf_Data structure that will need to be filled:
typedef struct { 
    void*		d_buf; 
    Elf_Type		d_type; 
    size_t		d_size; 
    off_t		d_off; 
    size_t		d_align; 
    unsigned		d_version; 
} Elf_Data;

  • d_buf - pointer to the data to be written to the section
  • d_type - data type, ELF_T_BYTE is suitable for us everywhere
  • d_size - data size
  • d_off - offset in section, set to 0
  • d_align - alignment, can be set to 1 - without alignment
  • d_version - version, must be set to EV_CURRENT

Special sections

For our purposes, we will need to create the minimum necessary set of sections:
  • .text - section with program code
  • .symtab - the file's symbol table
  • .strtab - a string table with the names of the symbols from the .symtab section, since the latter stores not the names themselves but their indices
  • .shstrtab - a string table with the section names

All sections are created as described in the previous section, but each special section has its own specifics.

Section .text

This section contains executable code, so set sh_type to SHT_PROGBITS, sh_flags to SHF_EXECINSTR + SHF_ALLOC, and sh_addr to the address at which this code will be loaded.

Section .symtab

The section describes all the symbols (functions) of the program and the files in which they are defined. It consists of 16-byte entries of this form:
typedef struct { 
    Elf32_Word		st_name; 
    Elf32_Addr		st_value; 
    Elf32_Word		st_size; 
    unsigned char	st_info; 
    unsigned char	st_other; 
    Elf32_Half		st_shndx; 
} Elf32_Sym;

  • st_name - symbol name (index into the .strtab string table)
  • st_value - value (entry address for a function, or 0 for a file). Since the Cortex-M3 uses the Thumb-2 instruction set, this address must be odd (real address + 1)
  • st_size - length of the function code (0 for a file)
  • st_info - symbol type and binding (scope). There is a macro for computing the value of this field:
    #define ELF32_ST_INFO(b,t) (((b)<<4)+((t)&0xf))

    where b is the binding and t is the symbol type.
    The binding can be STB_LOCAL (the symbol is not visible from other object files) or STB_GLOBAL (visible). For simplicity we use STB_GLOBAL.
    Symbol type: STT_FUNC for a function, STT_FILE for a file
  • st_other - set to 0
  • st_shndx - index of the section in which the symbol is defined (the index of the .text section), or SHN_ABS for a file.
    The index of a section can be obtained from its scn descriptor with elf_ndxscn:
    size_t elf_ndxscn( 
        Elf_Scn *scn);
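The rules above can be sketched as two small constructors, one for a function symbol and one for a file symbol. Sym32 is a local stand-in for Elf32_Sym and the function names are hypothetical; the constants carry their values from the ELF specification. Following the text, STB_GLOBAL is used for both kinds of symbol.

```c
#include <assert.h>
#include <stdint.h>

/* Values from the ELF specification. */
enum { STB_LOCAL = 0, STB_GLOBAL = 1 };
enum { STT_FUNC = 2, STT_FILE = 4 };
enum { SHN_ABS = 0xfff1 };

#define ELF32_ST_INFO(b,t) (((b)<<4)+((t)&0xf))

/* Local stand-in with the same layout as Elf32_Sym (16 bytes). */
typedef struct {
    uint32_t      st_name;
    uint32_t      st_value;
    uint32_t      st_size;
    unsigned char st_info;
    unsigned char st_other;
    uint16_t      st_shndx;
} Sym32;

/* Symbol for a Thumb function located in the .text section. */
Sym32 make_func_symbol(uint32_t name_index, uint32_t addr,
                       uint32_t size, uint16_t text_index)
{
    Sym32 s;
    assert((addr & 1u) == 0);            /* real address is halfword-aligned */
    s.st_name  = name_index;             /* offset of the name in .strtab */
    s.st_value = addr | 1u;              /* Thumb: real address + 1 */
    s.st_size  = size;
    s.st_info  = (unsigned char)ELF32_ST_INFO(STB_GLOBAL, STT_FUNC);
    s.st_other = 0;
    s.st_shndx = text_index;             /* e.g. the result of elf_ndxscn */
    return s;
}

/* Symbol for a source file. */
Sym32 make_file_symbol(uint32_t name_index)
{
    Sym32 s;
    s.st_name  = name_index;
    s.st_value = 0;
    s.st_size  = 0;
    s.st_info  = (unsigned char)ELF32_ST_INFO(STB_GLOBAL, STT_FILE);
    s.st_other = 0;
    s.st_shndx = SHN_ABS;
    return s;
}
```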

The data for this section can be accumulated in an array while processing the source text; a pointer to that array is then stored in the section data descriptor (d_buf).
This section is created in the usual way, except that sh_type must be set to SHT_SYMTAB and the index of the .strtab section must be written to the sh_link field, which links the two sections.

Section .strtab

This section contains the names of all symbols from the .symtab section. It is created as a regular section, but sh_type must be set to SHT_STRTAB and sh_flags to SHF_STRINGS, which makes this section a string table.
The data for this section can be accumulated in an array while processing the source text; a pointer to that array is then stored in the section data descriptor (d_buf).

Section .shstrtab

This section is a string table that contains the names of all sections of the file, including its own name. It is created in the same way as the .strtab section. Once it is created, write its index to the e_shstrndx field of the file header.

String tables

String tables contain consecutive strings, each terminated by a zero byte; the first byte of the table must also be 0. The index of a string is simply its offset in bytes from the beginning of the table, so the first string 'name' has index 1 and the next string 'var' has index 6.
Index  0   1  2  3  4  5   6  7  8  9
Byte   \0  n  a  m  e  \0  v  a  r  \0
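A string table of this form is easy to build incrementally. The sketch below uses a fixed-size buffer to stay short; the StrTab type and the function names are hypothetical and not part of libelf.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical string table builder with a fixed capacity. */
typedef struct {
    char   data[256];
    size_t size;        /* number of bytes used so far */
} StrTab;

/* The table always starts with a single zero byte. */
void strtab_init(StrTab *t)
{
    assert(t != NULL);
    t->data[0] = '\0';
    t->size = 1;
}

/* Append a string and return its index (byte offset from the start).
   Returns 0, the index of the empty string, if the string does not fit. */
size_t strtab_add(StrTab *t, const char *s)
{
    assert(t != NULL && s != NULL);
    size_t len = strlen(s) + 1;           /* include the terminating zero */
    if (len > sizeof t->data - t->size)
        return 0;
    size_t index = t->size;
    memcpy(t->data + index, s, len);
    t->size += len;
    return index;
}
```

The returned index is what goes into st_name (for .strtab) or sh_name (for .shstrtab), and data together with size is what the section data descriptor would point to.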

Writing the file

The headers and sections are now complete; it remains to write them to the file and shut down libelf. Writing is done by the elf_update function:
off_t elf_update( 
    Elf *elf, 
    Elf_Cmd cmd);

  • elf - descriptor
  • cmd - command; must be ELF_C_WRITE for writing.

The function returns -1 on error. The error text can be obtained by calling elf_errmsg(-1), which returns a pointer to the error string.
We finish working with the library by calling elf_end with our descriptor. All that remains is to close the previously opened file.
However, the file we have created contains no debugging information yet; we will add it in the next section.

Creating DWARF

We will create debugging information using the libdwarf library, which comes with a PDF file of documentation (libdwarf2p.1.pdf - A Producer Library Interface to DWARF).
Creating debugging information consists of the following steps:
  1. Initializing libdwarf producer
  2. Creating Nodes (DIE - Debugging Information Entry)
  3. Creating Node Attributes
  4. Creating a Compilation Unit
  5. Creating a Common Information Entry
  6. Creating data types
  7. Creating procedures (functions)
  8. Creating Variables and Constants
  9. Creating sections with debug information
  10. Finishing work with the library

Consider the steps in more detail.

Initializing libdwarf producer

We will create debugging information during compilation, at the same time as we create the symbols in the .symtab section, so the library must be initialized after libelf is initialized and the ELF header and program header are created, but before the sections are created.
For initialization we will use the dwarf_producer_init_c function. The library has several other initialization functions (dwarf_producer_init, dwarf_producer_init_b), which differ in some details described in the documentation. In principle, any of them can be used.

Dwarf_P_Debug dwarf_producer_init_c( 
    Dwarf_Unsigned flags, 
    Dwarf_Callback_Func_c func, 
    Dwarf_Handler errhand, 
    Dwarf_Ptr errarg, 
    void * user_data, 
    Dwarf_Error *error)

  • flags - a bitwise OR of several constants that define certain parameters, for example the bit width of the information, the byte order (little-endian, big-endian), and the relocation format; of these we definitely need DW_DLC_WRITE and DW_DLC_SYMBOLIC_RELOCATIONS
  • func - a callback function that will be called when ELF sections with debugging information are created. For details, see the section "Creating sections with debug information" below
  • errhand - pointer to a function that will be called when errors occur. You can pass 0
  • errarg - data that will be passed to the errhand function; can be 0
  • user_data - data that will be passed to the func function; can be 0
  • error - returned error code

The function returns a Dwarf_P_Debug, the descriptor used in all subsequent functions, or -1 on error, in which case error holds an error code (the text of the error message can be obtained by passing this code to the dwarf_errmsg function).

Creating Nodes (DIE - Debugging Information Entry)

As described above, debugging information forms a tree structure. In order to create a node of this tree, you need:
  • create it with the dwarf_new_die function
  • add attributes to it (each type of attribute is added by its function, which will be described later)

A node is created using the dwarf_new_die function:
Dwarf_P_Die dwarf_new_die( 
    Dwarf_P_Debug dbg, 
    Dwarf_Tag new_tag, 
    Dwarf_P_Die parent, 
    Dwarf_P_Die child, 
    Dwarf_P_Die left_sibling, 
    Dwarf_P_Die right_sibling, 
    Dwarf_Error *error)

  • dbg - Dwarf_P_Debug descriptor obtained during library initialization
  • new_tag - tag (type) of the node: a DW_TAG_xxxx constant, which can be found in the file libdwarf.h
  • parent, child, left_sibling, right_sibling - the node's parent, child, left sibling, and right sibling, respectively. It is not necessary to specify all of them; one is enough, with 0 for the rest. If all of them are 0, the node will be either a root or isolated
  • error - will contain an error code if an error occurs

The function returns DW_DLV_BADADDR on error, or the Dwarf_P_Die node handle on success.

Creating Node Attributes

There is a whole family of dwarf_add_AT_xxxx functions for creating node attributes. It is sometimes hard to tell which function creates the attribute you need, so I had to dig into the library source code several times. Some of the functions are described here, others below in the corresponding sections. They all take an ownerdie parameter, the handle of the node to which the attribute is added, and return an error code in the error parameter.
The dwarf_add_AT_name function adds the name attribute (DW_AT_name) to the node. Most nodes should have a name (for example, procedures, variables, constants), some may not have a name (for example, Compilation Unit)
Dwarf_P_Attribute dwarf_add_AT_name( 
    Dwarf_P_Die ownerdie, 
    char *name, 
    Dwarf_Error *error)

  • name - the attribute value itself (the node name)

Returns DW_DLV_BADADDR on error, or the attribute descriptor on success.
The functions dwarf_add_AT_signed_const and dwarf_add_AT_unsigned_const add the specified attribute and its signed (unsigned) value to the node. Signed and unsigned attributes are used to set constant values, sizes, line numbers, etc. Function signature:
Dwarf_P_Attribute dwarf_add_AT_(un)signed_const( 
    Dwarf_P_Debug dbg, 
    Dwarf_P_Die ownerdie, 
    Dwarf_Half attr, 
    Dwarf_Signed value, 
    Dwarf_Error *error)

  • dbg - Dwarf_P_Debug descriptor received during library initialization
  • attr - the attribute whose value is set, is the constant DW_AT_xxxx, which can be found in the file libdwarf.h
  • value - attribute value

Both return DW_DLV_BADADDR on error, or the attribute descriptor on success.

Creating a Compilation Unit

Every tree needs a root; ours is the compilation unit, which holds information about the program (for example, the name of the main file, the programming language used, the name of the compiler, the case sensitivity of symbols (variables, functions), the main function of the program, the starting address, etc.). In principle, no attributes are required. As an example, let's add information about the main file and the compiler.

Main File Information

To store information about the main file, the “name” attribute (DW_AT_name) is used; use the dwarf_add_AT_name function, as shown in the “Creating Node Attributes” section.

Compiler Information

We use the dwarf_add_AT_producer function:
Dwarf_P_Attribute dwarf_add_AT_producer( 
    Dwarf_P_Die ownerdie, 
    char *producer_string, 
    Dwarf_Error *error)

  • producer_string - a string with information text

Returns DW_DLV_BADADDR in case of error or attribute descriptor on success.

Creating a Common Information Entry

Usually, when a function (subroutine) is called, its parameters and return address are pushed onto the stack (although each compiler may do this in its own way); together these are called the call frame. The debugger needs to know the frame layout in order to determine the function's return address correctly and to build a backtrace: the chain of function calls that led to the current function, along with the parameters of those functions. The processor registers saved on the stack are usually described as well. The code that reserves stack space and saves the processor registers is called the function prologue; the code that restores the registers and the stack is the epilogue.
This information depends heavily on the compiler. For example, the prologue and epilogue need not be at the very beginning and end of a function; sometimes a frame is used, sometimes not; processor registers can be saved in other registers, etc.
So the debugger needs to know how the processor registers change their values and where they are saved on entry to a procedure. This is called Call Frame Information. For each address in the program (that contains code), it gives the address of the frame in memory (the Canonical Frame Address, CFA) and information about the processor registers; for example, it can state that:
  • the register's value is not preserved in the procedure
  • the register does not change its value in the procedure
  • the register is saved on the stack at CFA + n
  • the register is saved in another register
  • the register is saved in memory at an address that may be computed in a rather non-obvious way
  • etc.

Since this information must be given for every address in the code, it is very bulky and is stored in compressed form in the .debug_frame section. Because it changes little from one address to the next, only the changes are encoded, as DW_CFA_xxxx instructions. Each instruction describes one change, for example:
  • DW_CFA_set_loc - sets the current address in the program
  • DW_CFA_advance_loc - advances the current address by a given number of bytes
  • DW_CFA_def_cfa - defines the CFA as the value of a processor register plus a numeric offset
  • DW_CFA_def_cfa_register - changes the processor register from which the CFA is computed
  • DW_CFA_def_cfa_expression - specifies an expression for computing the CFA
  • DW_CFA_same_value - indicates that the register is unchanged
  • DW_CFA_register - indicates that the register is saved in another register
  • etc.
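To show what such instructions look like as bytes, here is a sketch that encodes the frame changes for a hypothetical prologue consisting of the single 2-byte Thumb instruction push {r4, lr}. This prologue is an assumption for illustration, not the code produced by the compiler described in this article. The opcodes DW_CFA_advance_loc, DW_CFA_def_cfa_offset and DW_CFA_offset and the unsigned LEB128 encoding come from the DWARF standard; the helper names are invented. A code alignment factor of 2 and a data alignment factor of -4 are assumed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Opcode values as defined by the DWARF standard. */
enum {
    DW_CFA_def_cfa_offset = 0x0e,
    DW_CFA_advance_loc    = 0x40,  /* low 6 bits: address delta */
    DW_CFA_offset         = 0x80   /* low 6 bits: register number */
};

/* Write an unsigned LEB128 number; returns the number of bytes written. */
size_t put_uleb128(uint8_t *out, uint32_t value)
{
    size_t n = 0;
    do {
        uint8_t byte = (uint8_t)(value & 0x7f);
        value >>= 7;
        if (value != 0)
            byte |= 0x80;          /* more bytes follow */
        out[n++] = byte;
    } while (value != 0);
    return n;
}

/* Advance the current address by `bytes`, expressed in code_align units. */
size_t cfa_advance_loc(uint8_t *out, unsigned bytes, unsigned code_align)
{
    unsigned delta = bytes / code_align;
    assert(bytes % code_align == 0 && delta < 0x40);
    out[0] = (uint8_t)(DW_CFA_advance_loc | delta);
    return 1;
}

/* Keep the CFA register, change only its offset. */
size_t cfa_def_cfa_offset(uint8_t *out, uint32_t offset)
{
    out[0] = DW_CFA_def_cfa_offset;
    return 1 + put_uleb128(out + 1, offset);
}

/* Register `reg` is saved at CFA + byte_offset (a multiple of data_align). */
size_t cfa_offset(uint8_t *out, unsigned reg, int byte_offset, int data_align)
{
    assert(reg < 0x40 && byte_offset % data_align == 0);
    assert(byte_offset / data_align >= 0);
    out[0] = (uint8_t)(DW_CFA_offset | reg);
    return 1 + put_uleb128(out + 1, (uint32_t)(byte_offset / data_align));
}
```

For the assumed prologue the sequence would be: advance by 2 bytes, set the CFA offset to 8, r4 saved at CFA-8, r14 saved at CFA-4.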

The elements of the .debug_frame section are records of two types: Common Information Entry (CIE) and Frame Description Entry (FDE). A CIE contains information common to many FDE records; roughly speaking, it describes a certain kind of procedure. An FDE describes one specific procedure. On entry to a procedure, the debugger first executes the instructions from the CIE and then those from the FDE.
My compiler creates procedures in which the CFA is in the sp (r13) register. We create one CIE for all procedures, using the dwarf_add_frame_cie function:
Dwarf_Unsigned dwarf_add_frame_cie( 
    Dwarf_P_Debug dbg, 
    char *augmenter, 
    Dwarf_Small code_align, 
    Dwarf_Small data_align, 
    Dwarf_Small ret_addr_reg, 
    Dwarf_Ptr init_bytes, 
    Dwarf_Unsigned init_bytes_len, 
    Dwarf_Error *error);

  • augmenter - a UTF-8 encoded string whose presence indicates that the CIE or FDE carries additional platform-specific information. We pass an empty string
  • code_align - code alignment in bytes (2 for us)
  • data_align - data alignment in the frame (set to -4, which means that every parameter occupies 4 bytes on the stack and that the stack grows downward in memory)
  • ret_addr_reg - the register that holds the return address of the procedure (14 for us)
  • init_bytes - an array of DW_CFA_xxxx instructions. Unfortunately, there is no convenient way to generate this array. You can build it by hand or look it up in an ELF file generated by a C compiler, which is what I did. In my case it contains 3 bytes: 0x0C, 0x0D, 0, which decode as DW_CFA_def_cfa: r13 ofs 0 (the CFA is in register r13, the offset is 0)
  • init_bytes_len - length of the init_bytes array

The function returns DW_DLV_NOCOUNT on error, or a CIE handle to be used when creating the FDE for each procedure, which we will cover later in the section “Creating an FDE Procedure”.
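The three init_bytes mentioned above can also be produced by a small helper rather than copied from compiler output. The sketch below is hypothetical (the function name is invented) and handles only register numbers and offsets below 128, which fit in a single unsigned LEB128 byte; the opcode value of DW_CFA_def_cfa is from the DWARF standard.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Opcode value as defined by the DWARF standard. */
enum { DW_CFA_def_cfa = 0x0c };

/* Encode "CFA = register reg + offset".
   Both operands are unsigned LEB128; values below 0x80 take one byte,
   larger values would need a multi-byte encoding, not handled here. */
size_t cie_init_def_cfa(uint8_t *out, uint32_t reg, uint32_t offset)
{
    assert(out != NULL);
    assert(reg < 0x80 && offset < 0x80);
    out[0] = DW_CFA_def_cfa;
    out[1] = (uint8_t)reg;
    out[2] = (uint8_t)offset;
    return 3;
}
```

Calling it with register 13 and offset 0 yields the bytes 0x0C, 0x0D, 0x00; the buffer and the returned length would be passed as init_bytes and init_bytes_len.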

Creating data types

Before creating procedures and variables, you must first create the nodes for their data types. There are many data types, but all of them are built on base types (elementary types such as int, double, etc.); other types are constructed from the base ones.
A base type is a node with the tag DW_TAG_base_type. It must have these attributes:
  • "Name" (DW_AT_name)
  • "Encoding" (DW_AT_encoding) - the kind of data this base type describes (for example, DW_ATE_boolean - boolean, DW_ATE_float - floating point, DW_ATE_signed - signed integer, DW_ATE_unsigned - unsigned integer, etc.)
  • "Size" (DW_AT_byte_size - size in bytes, or DW_AT_bit_size - size in bits)

A node may also contain other optional attributes.
For example, to create a 32-bit signed integer base type "int", we create a node with the tag DW_TAG_base_type and set the attributes DW_AT_name to "int", DW_AT_encoding to DW_ATE_signed, and DW_AT_byte_size to 4.
After creating the base types, you can create types derived from them. Such nodes must contain the attribute DW_AT_type, a link to their base type. For example, a pointer to int is a node with the tag DW_TAG_pointer_type whose DW_AT_type attribute refers to the previously created "int" type.
An attribute with a link to another node is created by the dwarf_add_AT_reference function:
Dwarf_P_Attribute dwarf_add_AT_reference( 
    Dwarf_P_Debug dbg, 
    Dwarf_P_Die ownerdie, 
    Dwarf_Half attr, 
    Dwarf_P_Die otherdie, 
    Dwarf_Error *error)

  • attr is an attribute, in this case DW_AT_type
  • otherdie - the handle of the type node referenced

Creating Procedures

To create procedures, I need to explain one more kind of debugging information: Line Number Information. It maps each machine instruction to a specific line of source code and makes line-by-line debugging of a program possible. This information is stored in the .debug_line section. If space were not a concern, it would be stored as a matrix, one row per instruction, with the following columns:
  • source file name
  • line number in this file
  • column number in the file
  • whether the instruction begins a statement or a block of statements
  • etc.

Such a matrix would be very large, so it has to be compressed. First, duplicate rows are removed; second, not the rows themselves but only the changes between them are stored. These changes look like commands for a state machine, and the information itself is treated as a program that this machine "executes". The commands of this program look like this, for example: DW_LNS_advance_pc - advance the program counter by a given amount, DW_LNS_set_file - set the file in which the procedure is defined, DW_LNS_const_add_pc - advance the program counter by a fixed number of bytes, etc.
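The two compression ideas can be illustrated with a conceptual sketch. This is not the actual DWARF line number program encoding: the LineRow and LineDelta types and the compress_rows function are hypothetical, and only address and line columns are modeled. The starting state (address 0, line 1) matches the initial state of the DWARF line number state machine.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One row of the uncompressed matrix: one machine instruction. */
typedef struct {
    uint32_t address;
    uint32_t line;
} LineRow;

/* One stored change: how far to advance the address and the line. */
typedef struct {
    uint32_t addr_advance;
    int32_t  line_advance;
} LineDelta;

/* Drop rows that repeat the line of the previous row and store the rest
   as differences from the previously kept row.
   Returns the number of deltas written to out (at most count). */
size_t compress_rows(const LineRow *rows, size_t count, LineDelta *out)
{
    size_t n = 0;
    uint32_t prev_addr = 0;   /* state machine starts at address 0 */
    uint32_t prev_line = 1;   /* and at line 1 */
    assert(rows != NULL && out != NULL);
    for (size_t i = 0; i < count; i++) {
        if (i > 0 && rows[i].line == rows[i - 1].line)
            continue;         /* same source line: nothing new to record */
        out[n].addr_advance = rows[i].address - prev_addr;
        out[n].line_advance = (int32_t)rows[i].line - (int32_t)prev_line;
        prev_addr = rows[i].address;
        prev_line = rows[i].line;
        n++;
    }
    return n;
}
```

Five instruction rows that cover three source lines shrink to three small deltas, which is the effect the real encoding achieves with its opcodes.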
Creating this information at such a low level is difficult, so the libdwarf library provides several functions that make the task easier.
Storing the file name for every instruction is expensive, so instead of the name, its index in a special table is stored. To create a file index, use the dwarf_add_file_decl function:
Dwarf_Unsigned dwarf_add_file_decl( 
    Dwarf_P_Debug dbg, 
    char *name, 
    Dwarf_Unsigned dir_idx, 
    Dwarf_Unsigned time_mod, 
    Dwarf_Unsigned length, 
    Dwarf_Error *error)

  • name - file name
  • dir_idx - index of the directory where the file is located. The index can be obtained with the dwarf_add_directory_decl function. If full paths are used, you can pass 0 as the directory index and not use dwarf_add_directory_decl at all
  • time_mod - file modification time, optional (can be 0)
  • length - file size, also optional (can be 0)

The function returns the file index, or DW_DLV_NOCOUNT on error.
Line number information is created with three functions, dwarf_add_line_entry_b, dwarf_lne_set_address and dwarf_lne_end_sequence, which are covered below.
Creating the debugging information for a procedure takes several steps:
  • creating a procedure symbol in the .symtab section
  • creating a procedure node with attributes
  • creating the procedure's FDE
  • creating procedure parameters
  • creating line number information

Creating a procedure symbol

A procedure symbol is created as described above in the ".symtab Section" section. In that table, procedure symbols are interleaved with the symbols of the files that contain the source code of these procedures. First we create a file symbol, then the procedure symbol. The file then becomes the current one, and if the next procedure is in the same file, the file symbol does not need to be created again.
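The ordering rule can be sketched in plain C. This is only an illustration of the order in which symbols are emitted; the Proc and SymEntry types and the plan_symbols function are hypothetical and do not touch libelf.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef enum { SYM_FILE, SYM_FUNC } SymKind;

/* A planned .symtab entry (hypothetical type). */
typedef struct {
    SymKind     kind;
    const char *name;
} SymEntry;

/* A procedure and the source file it was compiled from. */
typedef struct {
    const char *file;
    const char *name;
} Proc;

/* Plan the symbol order: a file symbol is emitted only when the file
   changes. out must have room for 2 * n entries. */
size_t plan_symbols(const Proc *procs, size_t n, SymEntry *out)
{
    const char *current_file = NULL;
    size_t count = 0;
    assert(procs != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (current_file == NULL || strcmp(current_file, procs[i].file) != 0) {
            out[count].kind = SYM_FILE;
            out[count].name = procs[i].file;
            count++;
            current_file = procs[i].file; /* this file is now the current one */
        }
        out[count].kind = SYM_FUNC;
        out[count].name = procs[i].name;
        count++;
    }
    return count;
}
```

For two procedures from one file followed by a procedure from another file, the plan contains five entries: file, procedure, procedure, file, procedure.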

Creating a procedure node with attributes

First we create a node with the dwarf_new_die function (see the “Creating Nodes” section), specifying DW_TAG_subprogram as the tag and, as the parent, either the Compilation Unit (for a global procedure) or the corresponding DIE (for a local one). Next, create the attributes:
  • procedure name (dwarf_add_AT_name function, see “Creating Node Attributes”)
  • line number in the file where the procedure code begins (attribute DW_AT_decl_line), dwarf_add_AT_unsigned_const function (see “Creating Node Attributes”)
  • file name index (attribute DW_AT_decl_file), dwarf_add_AT_unsigned_const function (see “Creating Node Attributes”)
  • starting address of the procedure (attribute DW_AT_low_pc), dwarf_add_AT_targ_address_b function, see below
  • final address of the procedure (attribute DW_AT_high_pc), dwarf_add_AT_targ_address_b function, see below
  • the type of the result returned by the procedure (attribute DW_AT_type, a reference to a previously created type, see “Creating data types”). If the procedure returns nothing, this attribute is not needed

The DW_AT_low_pc and DW_AT_high_pc attributes must be created with the dwarf_add_AT_targ_address_b function specially designed for this:
Dwarf_P_Attribute dwarf_add_AT_targ_address_b( 
    Dwarf_P_Debug dbg, 
    Dwarf_P_Die ownerdie, 
    Dwarf_Half attr, 
    Dwarf_Unsigned pc_value, 
    Dwarf_Unsigned sym_index, 
    Dwarf_Error *error)

  • attr - attribute (DW_AT_low_pc or DW_AT_high_pc)
  • pc_value - address value
  • sym_index - index of the procedure symbol in the .symtab table. Optional, you can pass 0

The function will return DW_DLV_BADADDR on error.

Creating the Procedure FDE

As mentioned in the section “Creating a Common Information Entry”, a frame descriptor must be created for each procedure. This takes several steps:
  • creating a new FDE (see Creating a Common Information Entry)
  • attaching the created FDE to the general list
  • adding instructions to the created FDE

You can create a new FDE using the dwarf_new_fde function:
Dwarf_P_Fde dwarf_new_fde( 
    Dwarf_P_Debug dbg, 
    Dwarf_Error *error)

The function returns the descriptor of the new FDE, or DW_DLV_BADADDR on error.
The new FDE is attached to the list with dwarf_add_frame_fde:
Dwarf_Unsigned dwarf_add_frame_fde( 
    Dwarf_P_Debug dbg, 
    Dwarf_P_Fde fde, 
    Dwarf_P_Die die, 
    Dwarf_Unsigned cie, 
    Dwarf_Addr virt_addr, 
    Dwarf_Unsigned code_len, 
    Dwarf_Unsigned sym_idx, 
    Dwarf_Error* error)

  • fde - the descriptor just received
  • die - the procedure's DIE (see Creating a procedure node with attributes)
  • cie - CIE descriptor (see Creating a Common Information Entry)
  • virt_addr - the starting address of our procedure
  • code_len - procedure length in bytes
  • sym_idx - symbol index (optional, you can specify 0)

The function will return DW_DLV_NOCOUNT on error.
After all this, you can add the DW_CFA_xxxx instructions to our FDE. This is done with the dwarf_add_fde_inst and dwarf_fde_cfa_offset functions. The first one adds the given instruction to the list:
Dwarf_P_Fde dwarf_add_fde_inst( 
    Dwarf_P_Fde fde, 
    Dwarf_Small op, 
    Dwarf_Unsigned val1, 
    Dwarf_Unsigned val2, 
    Dwarf_Error *error)

  • fde - descriptor of the created FDE
  • op - instruction code (DW_CFA_xxxx)
  • val1, val2 - instruction parameters (different for each instruction, see Standard, Section 6.4.2 Call Frame Instructions)

The dwarf_fde_cfa_offset function adds the DW_CFA_offset instruction:
Dwarf_P_Fde dwarf_fde_cfa_offset( 
    Dwarf_P_Fde fde, 
    Dwarf_Unsigned reg, 
    Dwarf_Signed offset, 
    Dwarf_Error *error)

  • fde - descriptor of the created FDE
  • reg - the register that is saved in the frame
  • offset - its offset in the frame (not in bytes but in frame elements, see Creating a Common Information Entry, data_align)

For example, suppose the compiler creates a procedure whose prologue saves the lr (r14) register in the stack frame. First, add the DW_CFA_advance_loc instruction with the first parameter equal to 1, which advances the pc register by 2 bytes (see Creating a Common Information Entry, code_align). Then add DW_CFA_def_cfa_offset with parameter 4 (this sets the CFA offset to 4 bytes) and call the dwarf_fde_cfa_offset function with reg = 14 and offset = 1, which records that register r14 is saved in the frame at an offset of -4 bytes from the CFA.
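To make the factored values in this example concrete, here is a minimal sketch in C that encodes the same three instructions by hand. It does not use libdwarf (the library produces the actual bytes for you); the opcode values are taken from the DWARF standard, while the function names are hypothetical. It assumes code_align = 2 and data_align = -4, as in the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Opcode values from the DWARF standard (Call Frame Instructions). */
enum {
    CFA_ADVANCE_LOC    = 0x40, /* DW_CFA_advance_loc, delta in the low 6 bits */
    CFA_OFFSET         = 0x80, /* DW_CFA_offset, register in the low 6 bits */
    CFA_DEF_CFA_OFFSET = 0x0e  /* DW_CFA_def_cfa_offset, ULEB128 operand */
};

/* Append an unsigned LEB128 number, return the number of bytes written. */
size_t put_uleb128(uint8_t *out, uint32_t value)
{
    size_t n = 0;
    do {
        uint8_t byte = (uint8_t)(value & 0x7f);
        value >>= 7;
        if (value != 0)
            byte |= 0x80; /* more bytes follow */
        out[n++] = byte;
    } while (value != 0);
    return n;
}

/* Encode "advance pc, set CFA offset, register saved" for a prologue
   that pushes one register. advance and saved_at are factored values,
   that is, units of code_align and data_align. */
size_t encode_push_prologue(uint8_t *out, unsigned advance,
                            uint32_t cfa_offset, unsigned reg,
                            uint32_t saved_at)
{
    size_t n = 0;
    assert(advance < 64 && reg < 64); /* both must fit into 6 bits */
    out[n++] = (uint8_t)(CFA_ADVANCE_LOC | advance);
    out[n++] = CFA_DEF_CFA_OFFSET;
    n += put_uleb128(out + n, cfa_offset);
    out[n++] = (uint8_t)(CFA_OFFSET | reg);
    n += put_uleb128(out + n, saved_at);
    return n;
}

/* Convert a factored register offset into bytes relative to the CFA. */
int32_t saved_offset_bytes(uint32_t factored, int32_t data_align)
{
    return (int32_t)factored * data_align;
}
```

Calling encode_push_prologue(buf, 1, 4, 14, 1) describes the prologue from the example, and saved_offset_bytes(1, -4) gives the -4 byte offset of r14 from the CFA.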

Creating Procedure Parameters

Creating procedure parameters is similar to creating ordinary variables, see “Creating Variables and Constants”.

Creating line number information

This information is created as follows:
  • at the beginning of the procedure, start the instruction block with the dwarf_lne_set_address function
  • for each line of code (or machine instruction), add information about the source code (dwarf_add_line_entry_b)
  • at the end of the procedure, close the instruction block with the dwarf_lne_end_sequence function

The dwarf_lne_set_address function sets the address where the instruction block begins:
Dwarf_Unsigned dwarf_lne_set_address( 
    Dwarf_P_Debug dbg, 
    Dwarf_Addr offs, 
    Dwarf_Unsigned symidx, 
    Dwarf_Error *error)

  • offs - address of the procedure (address of the first machine instruction)
  • symidx - symbol index (optional, you can specify 0)

Returns 0 (success) or DW_DLV_NOCOUNT (error).
The dwarf_add_line_entry_b function adds source line information to the .debug_line section. I call this function for each machine instruction:
Dwarf_Unsigned dwarf_add_line_entry_b( 
    Dwarf_P_Debug dbg, 
    Dwarf_Unsigned file_index, 
    Dwarf_Addr code_offset, 
    Dwarf_Unsigned lineno, 
    Dwarf_Signed column_number, 
    Dwarf_Bool is_source_stmt_begin, 
    Dwarf_Bool is_basic_block_begin, 
    Dwarf_Bool is_epilogue_begin, 
    Dwarf_Bool is_prologue_end, 
    Dwarf_Unsigned isa, 
    Dwarf_Unsigned discriminator, 
    Dwarf_Error *error)

  • file_index - index of the source code file obtained earlier by the dwarf_add_file_decl function (see "Creating Procedures")
  • code_offset - address of the current machine instruction
  • lineno - line number in the source code file
  • column_number - column number in the source code file
  • is_source_stmt_begin - 1 if the current instruction is the first one in the code for line lineno (I always use 1)
  • is_basic_block_begin - 1 if the current instruction is the first one in a block of statements (I always use 0)
  • is_epilogue_begin - 1 if the current instruction is the first one in the procedure's epilogue (optional, I always use 0)
  • is_prologue_end - 1 if the current instruction is the last one in the procedure's prologue (required!)
  • isa - instruction set architecture. Be sure to specify DW_ISA_ARM_thumb for ARM Cortex M3!
  • discriminator - a single source position (file, line, column) may correspond to different machine instructions. In that case, different discriminators must be set for each such group of instructions. If there are no such cases, it should be 0

The function returns 0 (success) or DW_DLV_NOCOUNT (error).
Finally, the dwarf_lne_end_sequence function closes the instruction block of the procedure:
Dwarf_Unsigned dwarf_lne_end_sequence( 
    Dwarf_P_Debug dbg, 
    Dwarf_Addr address, 
    Dwarf_Error *error)

  • address - address of the current machine instruction

Returns 0 (success) or DW_DLV_NOCOUNT (error).
This completes the creation of the procedure.

Creating Variables and Constants

In general, variables are fairly simple. They have a name, a location in memory (or a processor register) where their data lives, and the type of that data. If the variable is global, its parent must be the Compilation Unit; if it is local, the parent is the corresponding node (this matters especially for procedure parameters, whose parent must be their procedure). You can also specify the file, line and column where the variable is declared.
In the simplest case the value of a variable is located at a fixed address, but many variables are created dynamically on the stack or in a register when a procedure is entered, and computing the address of the value can sometimes be quite nontrivial. The standard provides a mechanism for describing where the value of a variable is located: address expressions (location expressions). An address expression is a sequence of instructions (DW_OP_xxxx constants) for a Forth-like stack machine; in effect, it is a separate language with branches, procedures and arithmetic operations. We will not cover this language in full; only a few instructions are of interest here:
  • DW_OP_addr - indicates the address of the variable
  • DW_OP_fbreg - indicates the offset of the variable from the base register (usually the stack pointer)
  • DW_OP_reg0 ... DW_OP_reg31 - indicates that the variable is stored in the corresponding register
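
To show how small these expressions are, here is a minimal sketch in C that encodes each of the three forms by hand for a 32-bit little-endian target such as Cortex-M3. libdwarf builds these bytes for you; the opcode values come from the DWARF standard, while the function names (loc_addr32, loc_fbreg, loc_reg) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Opcode values from the DWARF standard. */
enum {
    OP_ADDR  = 0x03, /* DW_OP_addr: followed by a target address */
    OP_REG0  = 0x50, /* DW_OP_reg0; DW_OP_regN is OP_REG0 + N */
    OP_FBREG = 0x91  /* DW_OP_fbreg: followed by a signed LEB128 offset */
};

/* Append a signed LEB128 number, return the number of bytes written. */
size_t put_sleb128(uint8_t *out, int32_t value)
{
    size_t n = 0;
    for (;;) {
        uint8_t byte = (uint8_t)((uint32_t)value & 0x7f);
        /* arithmetic shift right by 7, written portably */
        value = (value < 0) ? ~(~value >> 7) : (value >> 7);
        int sign = (byte & 0x40) != 0;
        if ((value == 0 && !sign) || (value == -1 && sign)) {
            out[n++] = byte; /* last byte: continuation bit clear */
            return n;
        }
        out[n++] = (uint8_t)(byte | 0x80);
    }
}

/* Variable at a fixed 32-bit address (little-endian target). */
size_t loc_addr32(uint8_t *out, uint32_t address)
{
    out[0] = OP_ADDR;
    for (int i = 0; i < 4; i++)
        out[1 + i] = (uint8_t)(address >> (8 * i));
    return 5;
}

/* Variable at a signed offset from the frame base. */
size_t loc_fbreg(uint8_t *out, int32_t offset)
{
    out[0] = OP_FBREG;
    return 1 + put_sleb128(out + 1, offset);
}

/* Variable held in register number reg (0..31). */
size_t loc_reg(uint8_t *out, unsigned reg)
{
    assert(reg < 32);
    out[0] = (uint8_t)(OP_REG0 + reg);
    return 1;
}
```

A local variable 8 bytes below the frame base therefore takes just two bytes, and a variable kept in a register takes one.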

To create an address expression, first create an empty expression (dwarf_new_expr), add instructions to it (dwarf_add_expr_addr, dwarf_add_expr_gen, etc.) and then attach it to the node as the value of the DW_AT_location attribute (dwarf_add_AT_location_expr).
The function that creates an empty address expression returns its handle, or 0 on error:
Dwarf_P_Expr dwarf_new_expr( 
    Dwarf_P_Debug dbg, 
    Dwarf_Error *error)

To add instructions to an expression, you need to use the dwarf_add_expr_gen function:
Dwarf_Unsigned dwarf_add_expr_gen( 
    Dwarf_P_Expr expr, 
    Dwarf_Small opcode, 
    Dwarf_Unsigned val1, 
    Dwarf_Unsigned val2, 
    Dwarf_Error *error)

  • expr - descriptor of the address expression into which the instruction is added
  • opcode - operation code, constant DW_OP_xxxx
  • val1, val2 - instruction parameters (see Standard)

The function returns DW_DLV_NOCOUNT on error.
To explicitly set the address of the variable, the dwarf_add_expr_addr function should be used instead of the previous one:
Dwarf_Unsigned dwarf_add_expr_addr( 
    Dwarf_P_Expr expr, 
    Dwarf_Unsigned address, 
    Dwarf_Signed sym_index, 
    Dwarf_Error *error)

  • expr - descriptor of the address expression into which the instruction is added
  • address - address of the variable
  • sym_index - the index of the symbol in the .symtab table. Optional, you can pass 0

The function also returns DW_DLV_NOCOUNT on error.
Finally, you can add the created address expression to the node using the dwarf_add_AT_location_expr function:
Dwarf_P_Attribute dwarf_add_AT_location_expr( 
    Dwarf_P_Debug dbg, 
    Dwarf_P_Die ownerdie, 
    Dwarf_Half attr, 
    Dwarf_P_Expr loc_expr, 
    Dwarf_Error *error)

  • ownerdie - the node to which the expression is added
  • attr - attribute (in our case DW_AT_location)
  • loc_expr - handle to previously created address expression

The function returns an attribute descriptor, or DW_DLV_BADADDR on error.
Variables, procedure parameters and constants are regular nodes with the tags DW_TAG_variable, DW_TAG_formal_parameter and DW_TAG_constant, respectively. They need the following attributes:
  • variable / constant name (dwarf_add_AT_name function, see “Creating Node Attributes”)
  • line number in the file where the variable is declared (DW_AT_decl_line attribute), dwarf_add_AT_unsigned_const function (see “Creating Node Attributes”)
  • file name index (attribute DW_AT_decl_file), dwarf_add_AT_unsigned_const function (see "Creating Node Attributes")
  • variable / constant data type (attribute DW_AT_type - link to a previously created type, see "Creating data types")
  • address expression (see above) - needed for a variable or procedure parameter
  • or value - for a constant (attribute DW_AT_const_value, see "Creating Node Attributes")

Creating sections with debug information

After all the nodes of the debugging information tree have been created, you can start building the elf sections that hold it. This happens in two stages:
  • first call the dwarf_transform_to_disk_form function, which calls the callback we wrote for creating elf sections, once for each section
  • then, for each section, the dwarf_get_section_bytes function returns the data that must be written to that section

dwarf_transform_to_disk_form ( 
    Dwarf_P_Debug dbg, 
    Dwarf_Error* error)

translates the debugging information that we created into binary form but writes nothing to disk. It returns the number of created elf sections, or DW_DLV_NOCOUNT on error. For each section it calls the callback function that we passed to dwarf_producer_init_c when initializing the library. We have to write this function ourselves. Its specification is as follows:
typedef int (*Dwarf_Callback_Func_c)( 
    char* name, 
    int size, 
    Dwarf_Unsigned type, 
    Dwarf_Unsigned flags, 
    Dwarf_Unsigned link, 
    Dwarf_Unsigned info, 
    Dwarf_Unsigned* sect_name_index, 
    void * user_data, 
    int* error)

  • name - the name of the elf section to be created
  • size - section size
  • type - section type
  • flags - section flags
  • link - section link field (sh_link)
  • info - section information field (sh_info)
  • sect_name_index - output parameter: the index of the section symbol used for relocations (optional)
  • user_data - the same value that we passed to the library initialization function
  • error - an error code can be returned here

In this function, we must:
  • create a new section (elf_newscn function, see Creating sections)
  • get its section header (elf32_getshdr function, ibid.)
  • fill it in correctly (ibid.). This is simple, since the section header fields correspond to the parameters of our function. The missing fields sh_addr, sh_offset and sh_entsize are set to 0, and sh_addralign to 1
  • return the index of the created section (elf_ndxscn function, see “Section .symtab”), or -1 on error (setting the error code in error)
  • in our case, skip the ".rel" sections by returning 0 from the function
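
The header-filling step can be sketched as follows. This is only an illustration under several assumptions: it uses the Elf32_Shdr type from the system header <elf.h> (as found on Linux), the helper names fill_debug_shdr and is_rel_section are hypothetical, and name_index stands for the offset of the section name in the section name string table, which must be obtained separately. Copying size into sh_size is also an assumption; the calls to elf_newscn, elf32_getshdr and elf_ndxscn are left out.

```c
#include <assert.h>
#include <elf.h>
#include <stdint.h>
#include <string.h>

/* Fill a section header from the values passed to the callback.
   name_index is the offset of the section name in the string table. */
void fill_debug_shdr(Elf32_Shdr *shdr, uint32_t name_index, uint32_t size,
                     uint32_t type, uint32_t flags,
                     uint32_t link, uint32_t info)
{
    assert(shdr != NULL);
    memset(shdr, 0, sizeof *shdr); /* sh_addr, sh_offset, sh_entsize = 0 */
    shdr->sh_name = name_index;
    shdr->sh_type = type;
    shdr->sh_flags = flags;
    shdr->sh_size = size;
    shdr->sh_link = link;
    shdr->sh_info = info;
    shdr->sh_addralign = 1; /* debug sections have no alignment needs */
}

/* Relocation sections are skipped: the callback returns 0 for them. */
int is_rel_section(const char *name)
{
    return strncmp(name, ".rel", 4) == 0;
}
```

Inside the callback, is_rel_section(name) would be checked first, and fill_debug_shdr would be applied to the header obtained for the newly created section.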

When it finishes, the dwarf_transform_to_disk_form function returns the number of sections created. We then loop over the sections, starting from 0, and perform these steps for each one:
  • create data for writing to the section with the dwarf_get_section_bytes function:
    Dwarf_Ptr dwarf_get_section_bytes( 
        Dwarf_P_Debug dbg, 
        Dwarf_Signed dwarf_section, 
        Dwarf_Signed *elf_section_index, 
        Dwarf_Unsigned *length, 
        Dwarf_Error* error)

    • dwarf_section - section number. Must be in the range 0..n-1, where n is the number returned by the dwarf_transform_to_disk_form function
    • elf_section_index - Returns the index of the section in which to write data
    • length - the length of this data
    • error - not used

    The function returns a pointer to the data, or 0 when there are no more sections to create
  • create a data descriptor for the current section (elf_newdata function, see Creating sections) and fill it (see ibid.) by setting:
    • d_buf - pointer to the data received by us from the previous function
    • d_size - the size of this data (ibid.)

Finishing work with the library

After the sections are formed, you can finish working with libdwarf by calling the dwarf_producer_finish function:
Dwarf_Unsigned dwarf_producer_finish( 
    Dwarf_P_Debug dbg, 
    Dwarf_Error* error)

The function returns DW_DLV_NOCOUNT on error.
Note that nothing is written to disk at this stage. The file must be written using the functions from the “Creating ELF - File Recording” section.


That's all.
I repeat, creating debugging information is a very extensive topic, and I left many things untouched, lifting the curtain only slightly. Those who wish can go deeper indefinitely.
If you have questions, I will try to answer them.



