kahi4 March 13, 2016 at 21:07

C++ exception handling under the hood. Part 3

Translation

Continuing the translation of a series of articles on exception handling in C++.

Part 1
Part 2

C++ exceptions under the hood: finding the right landing pad

This is the 15th chapter of our long story. We have already learned a lot about how exceptions work, and we even have a working personality function of our own, with a small amount of reflection, that determines where the catch block is (the landing pad, in exception-handling terms). In the last chapter we wrote a personality function that can handle exceptions, but it always picks the first landing pad (i.e. the first catch block). Let's improve it by adding the ability to choose the right landing pad in a function with several try/catch blocks.

Following the TDD (test-driven development) approach, we can first build a test for our ABI. We will extend our program, throw.cpp, with a few try/catch blocks:

#include <stdio.h>
#include "throw.h"
struct Fake_Exception {};
void raise() {
    throw Exception();
}
void try_but_dont_catch() {
    try {
        printf("Running a try which will never throw.\n");
    } catch(Fake_Exception&) {
        printf("Exception caught... with the wrong catch!\n");
    }
    try {
        raise();
    } catch(Fake_Exception&) {
        printf("Caught a Fake_Exception!\n");
    }
    printf("try_but_dont_catch handled the exception\n");
}
void catchit() {
    try {
        try_but_dont_catch();
    } catch(Fake_Exception&) {
        printf("Caught a Fake_Exception!\n");
    } catch(Exception&) {
        printf("Caught an Exception!\n");
    }
    printf("catchit handled the exception\n");
}
extern "C" {
    void seppuku() {
        catchit();
    }
}

Before running the test, try to think about what will happen. Focus on the try_but_dont_catch function: the first try/catch block never throws, and the second one throws without catching the exception. Since our ABI is still a bit dumb, the first catch block will handle the exception thrown from the second block. But what happens after the first catch finishes? Execution continues from where the first try/catch ends, which is right before the second try/catch block, which throws again, the first handler processes it again, and so on. An endless loop! We have built a very complicated while (true) again!

We will use our knowledge of the start/length fields in the LSDA call site table to select the landing pad correctly. To do this, we need to know what the IP was when the exception was thrown, and we can find that out with an _Unwind_ function we already know: _Unwind_GetIP. To understand what _Unwind_GetIP returns, let's look at an example:

void f1() {}
void f2() { throw 1; }
void f3() {}
void foo() {
L1:
    try{ f1(); } catch(...) {}
L2:
    try{ f2(); } catch(...) {}
L3:
    try{ f3(); } catch(...) {}
}

In this case, our personality function will be called for the catch block around f2, and the stack will look something like this:

+------------------------------+
|   IP: f2  stack frame: f2    |
+------------------------------+
|   IP: L3  stack frame: foo   |
+------------------------------+

Note that the IP will be set to L3, although the exception was thrown from L2. This is because the IP points to the next instruction that would have been executed. It also means that we must subtract one to get an IP inside the call that threw; otherwise the result of _Unwind_GetIP may fall outside the range covered by the landing pad. Back to our personality function:

_Unwind_Reason_Code __gxx_personality_v0 (
                             int version,
                             _Unwind_Action actions,
                             uint64_t exceptionClass,
                             _Unwind_Exception* unwind_exception,
                             _Unwind_Context* context)
{
    if (actions & _UA_SEARCH_PHASE)
    {
        printf("Personality function, lookup phase\n");
        return _URC_HANDLER_FOUND;
    } else if (actions & _UA_CLEANUP_PHASE) {
        printf("Personality function, cleanup\n");
        // Calculate where the IP was pointing
        // right before the exception was thrown
        uintptr_t throw_ip = _Unwind_GetIP(context) - 1;
        // Pointer to the raw LSDA
        LSDA_ptr lsda = (uint8_t*)_Unwind_GetLanguageSpecificData(context);
        // Read the LSDA header
        LSDA_Header header(&lsda);
        // Read the LSDA call site (CS) header
        LSDA_CS_Header cs_header(&lsda);
        // Calculate the end of the LSDA CS table
        const LSDA_ptr lsda_cs_table_end = lsda + cs_header.length;
        // Loop over all entries of the CS table
        while (lsda < lsda_cs_table_end)
        {
            LSDA_CS cs(&lsda);
            // If there is no LP we can't handle the exception here; move on
            if (not cs.lp) continue;
            uintptr_t func_start = _Unwind_GetRegionStart(context);
            // Calculate the range of valid IPs for this LP.
            // If the LP can handle this exception, then
            // the IP for this frame must be within this range
            uintptr_t try_start = func_start + cs.start;
            uintptr_t try_end = func_start + cs.start + cs.len;
            // Check whether this is the correct LP for the current try block
            if (throw_ip < try_start) continue;
            if (throw_ip > try_end) continue;
            // We found a landing pad for this exception; resume execution there
            int r0 = __builtin_eh_return_data_regno(0);
            int r1 = __builtin_eh_return_data_regno(1);
            _Unwind_SetGR(context, r0, (uintptr_t)(unwind_exception));
            // Note that the exception type is still hardcoded here;
            // we will fix that later
            _Unwind_SetGR(context, r1, (uintptr_t)(1));
            _Unwind_SetIP(context, func_start + cs.lp);
            break;
        }
        return _URC_INSTALL_CONTEXT;
    } else {
        printf("Personality function, error\n");
        return _URC_FATAL_PHASE1_ERROR;
    }
}

As usual, the current example code is here.

Run it again and voilà! No more endless loops! A simple change allowed us to choose the right landing pad. Next, we will teach our personality function to choose the correct stack frame instead of the first one.
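The range check at the heart of this change can be tried in isolation. The following is only a minimal sketch, not the article's code: CallSite and find_landing_pad are hypothetical names, and the offsets in the usage note are made up. It is meant to show why one is subtracted from the IP reported for the frame before it is compared with each call site range.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical, simplified call site record; offsets are relative to the
// start of the function, as in the LSDA call site table.
struct CallSite {
    std::uintptr_t start;  // offset of the first instruction covered
    std::uintptr_t len;    // length of the covered region
    std::uintptr_t lp;     // landing pad offset, 0 means "no landing pad"
};

// Returns the landing pad address of the call site that covers the throwing
// call, or 0 if no call site covers it. return_ip is what the unwinder
// reports for the frame: the address of the instruction after the call.
std::uintptr_t find_landing_pad(const std::vector<CallSite>& table,
                                std::uintptr_t func_start,
                                std::uintptr_t return_ip) {
    assert(return_ip > func_start);
    const std::uintptr_t throw_ip = return_ip - 1;  // step back into the call
    for (const CallSite& cs : table) {
        if (cs.lp == 0) continue;  // nothing to run for this region
        const std::uintptr_t try_start = func_start + cs.start;
        const std::uintptr_t try_end = try_start + cs.len;
        // This sketch treats the region as half-open, [try_start, try_end);
        // the article's code uses an inclusive upper bound instead.
        if (throw_ip < try_start || throw_ip >= try_end) continue;
        return func_start + cs.lp;
    }
    return 0;
}
```

With two adjacent regions {0x10, 0x05, 0x40} and {0x15, 0x05, 0x50} in a function starting at 0x1000, a return address of 0x101A lies just past the second region, yet find_landing_pad still resolves it to 0x1050 because the comparison uses 0x1019.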

C++ exceptions under the hood: finding the right catch block in the landing pad

We have already written a personality function that can handle functions with more than one landing pad. Now we will try to work out which particular block can handle a given exception, in other words, which catch block to call.

Of course, figuring out which block can handle the exception is not an easy task. Did you really expect anything else? The main problems right now are:

- First and foremost: where and how can we find the exception types a catch block accepts?
- Even if we can find the catch type, how do we handle catch (...)?
- For a landing pad with several catch blocks, how can we find all the possible catch types?

Take a look at an example:

 struct Base {};
 struct Child : public Base {};
 void foo() { throw Child(); }
 void bar()
 {
    try { foo(); }
    catch(const Base&){ ... }
 }

We must check not only whether the current landing pad accepts the type of the thrown exception, but also all of its parent types!

Let's make our task a little easier: we will work with landing pads that have only one catch block, and we will assume there is no inheritance. Still, how do we find the types a landing pad accepts?

It turns out this lives in a part of .gcc_except_table that we have not analyzed yet: the action table. Let's disassemble throw.cpp and see what comes right after the call site table for our try_but_dont_catch function:

.LLSDACSE1:
    .byte   0x1
    .byte   0
    .align 4
    .long   _ZTI14Fake_Exception
.LLSDATT1:

It does not look like much information, but there is a promising pointer to something that carries the name of our exception. Let's look at the definition of _ZTI14Fake_Exception:

_ZTI14Fake_Exception:
    .long   _ZTVN10__cxxabiv117__class_type_infoE+8
    .long   _ZTS14Fake_Exception
    .weak   _ZTS9Exception
    .section    .rodata._ZTS9Exception,"aG",@progbits,_ZTS9Exception,comdat
    .type   _ZTS9Exception, @object
    .size   _ZTS9Exception, 11

We found something very interesting! Can you recognize it? This is the std::type_info for the Fake_Exception struct!

Now we know that there is a way to get a pointer to a kind of reflection data for our exception. Can we find it programmatically? Let's see in the next chapter.
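The same name can be observed from ordinary C++ code. This is a small sketch that assumes GCC or Clang on a platform using the Itanium C++ ABI, where type_info::name() returns the string behind the _ZTS symbol; mangled_name_of is a hypothetical helper, and other compilers may return a different string.

```cpp
#include <cassert>
#include <string>
#include <typeinfo>

struct Fake_Exception {};

// Hypothetical helper: returns the implementation-defined name stored in the
// std::type_info object for T (the _ZTS string on Itanium ABI platforms).
template <typename T>
std::string mangled_name_of() {
    return typeid(T).name();
}
```

Under that assumption, assert(mangled_name_of<Fake_Exception>() == "14Fake_Exception") holds: the length of the identifier followed by the identifier itself, exactly the suffix of the _ZTI14Fake_Exception symbol in the listing.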

C++ exceptions under the hood: exception type reflection and reading .gcc_except_table

We now know that when an exception is thrown we can get a lot of information about it by reading the language-specific data area, .gcc_except_table; reading it is what our personality function must do to determine the correct landing pad.

We left our ABI implementation aside and dove into the assembly for .gcc_except_table to understand how to find the types of exceptions a landing pad can handle. We found that part of the table contains a list of types with the information we need. We will read this information in the cleanup phase, but first let's recall the definition of our LSDA header:

struct LSDA_Header {
    uint8_t start_encoding;
    uint8_t type_encoding;
    // Offset from the end of the header to the types table
    uint8_t type_table_offset;
};

The last field is new to us: it gives the offset of the types table. Recall also the definition of each call site entry:

struct LSDA_CS {
    // Offset into the function from which we can handle an exception
    uint8_t start;
    // Length of the block that can be handled
    uint8_t len;
    // Landing pad
    uint8_t lp;
    // Offset into the action table + 1 (0 means "no action")
    uint8_t action;
};

Look at the last field, "action". It is an offset into the action table, which means we can find the action for a specific call site (CS). The trick is that for landing pads that contain catch blocks, the action holds an index into the types table; combined with the types table offset from the header, this lets us reach the type itself. Enough talk, let's look at the code:

// Pointer to the beginning of the raw LSDA
LSDA_ptr lsda = (uint8_t*)_Unwind_GetLanguageSpecificData(context);
// Read the LSDA header
LSDA_Header header(&lsda);
const LSDA_ptr types_table_start = lsda + header.type_table_offset;
// Read the LSDA CS header
LSDA_CS_Header cs_header(&lsda);
// Calculate the end of the LSDA CS table
const LSDA_ptr lsda_cs_table_end = lsda + cs_header.length;
// Get the start of the action table
const LSDA_ptr action_tbl_start = lsda_cs_table_end;
// First call site
LSDA_CS cs(&lsda);
// cs.action is the offset + 1; that way cs.action == 0
// means there is no associated entry in the action table
const size_t action_offset = cs.action - 1;
const LSDA_ptr action = action_tbl_start + action_offset;
// For a landing pad with a catch block, the action table
// holds an index into the list of types
int type_index = action[0];
// types_table_start actually points to the end of the table, so
// we need to negate type_index. This yields the pointer to the
// std::type_info specified in our catch block
const void* catch_type_info = types_table_start[ -1 * type_index ];
const std::type_info *catch_ti = (const std::type_info *) catch_type_info;
// If everything went well, this should print something like Fake_Exception
printf("%s\n", catch_ti->name());

This code looks complicated because of several successive indirections before we reach the type_info structure, but in practice it does nothing complicated: it just reads the .gcc_except_table that we found when disassembling.
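The backwards indexing is the least obvious step, so here is a sketch of it on its own. It assumes a made-up, two-entry table held in an ordinary array of type_info pointers rather than real LSDA bytes; type_table, types_table_start and type_for_index are hypothetical names used only for illustration.

```cpp
#include <cassert>
#include <typeinfo>

struct Exception {};
struct Fake_Exception {};

// Hypothetical stand-in for the types table. The entries sit in memory in
// listing order, and the "start" pointer really points just past the last one.
const std::type_info* const type_table[] = {&typeid(Exception),
                                            &typeid(Fake_Exception)};
const std::type_info* const* const types_table_start = type_table + 2;

// Type indices read from the action table are 1-based and count backwards
// from the end of the table, hence the negated index.
const std::type_info* type_for_index(int type_index) {
    assert(type_index >= 1 && type_index <= 2);
    return types_table_start[-1 * type_index];
}
```

In this layout index 1 resolves to the entry closest to the end of the table (Fake_Exception) and index 2 to the one before it (Exception).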

Printing the exception type is a big step in the right direction. Our personality function is also getting a bit cluttered. Most of the complexity of reading the LSDA can be swept under the rug by moving it into a separate function, which should not cost much.

Next, we will learn to compare the type of exception a catch handles with the type that was thrown.

C++ exceptions under the hood: getting the right stack frame

Our latest personality function knows where the information about whether an exception can be handled is stored (though it only works for a single catch block per try/catch block, and without inheritance). To make that useful, we first need to learn to check whether the exception is of a type we can handle.

Of course, we first need to know the type of the exception. To do this, we need to record it when __cxa_throw is called:

void __cxa_throw(void* thrown_exception,
                 std::type_info *tinfo,
                 void (*dest)(void*))
{
    __cxa_exception *header = ((__cxa_exception *) thrown_exception - 1);
    // We need to save the type in the exception header that _Unwind_ will
    // receive, otherwise we can't get it back during unwinding
    header->exceptionType = tinfo;
    _Unwind_RaiseException(&header->unwindHeader);
}

And now we can read the exception type in our personality function and simply compare the types (name() returns a C string; within this single program, comparing the two name pointers is enough):

// Get the type of exception this catch can handle
const void* catch_type_info = lsda.types_table_start[ -1 * type_index ];
const std::type_info *catch_ti = (const std::type_info *) catch_type_info;
// Get the type of the exception being thrown
__cxa_exception* exception_header = (__cxa_exception*)(unwind_exception+1) - 1;
std::type_info *org_ex_type = exception_header->exceptionType;
printf("%s thrown, catch handles %s\n",
            org_ex_type->name(),
            catch_ti->name());
// Check whether the type the catch handles matches
// the type being thrown
if (org_ex_type->name() != catch_ti->name())
    continue;
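Comparing the pointers returned by name() relies on each type name being stored exactly once in the program. A slightly more robust sketch, which is not what the article's repository does, compares the type_info objects themselves with the standard operator==; catch_matches is a hypothetical helper, and like the article's code it handles neither base classes nor catch (...).

```cpp
#include <cassert>
#include <typeinfo>

struct Exception {};
struct Fake_Exception {};

// Hypothetical helper: exact type match only, no inheritance, no catch (...).
bool catch_matches(const std::type_info* thrown, const std::type_info* handler) {
    assert(thrown != nullptr && handler != nullptr);
    return *thrown == *handler;  // standard type_info equality
}
```

With it, the check in the personality function would read "if (!catch_matches(org_ex_type, catch_ti)) continue;".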

Take a look at the latest changes in the git repository.

Hmm, of course we have a problem; can you spot it yourself? The exception is processed in two phases, and if we say in the first phase that we will handle it, we cannot say in the second phase that we don't want to handle it after all. I don't know whether _Unwind_ handles this case; there is no documentation about it, and most likely it is undefined behavior, so simply claiming that we handle everything is not enough.

Although we taught the personality function to find out which landing pad can handle the exception, we have been lying to _Unwind_ about which exceptions we can handle: in version 9 of our ABI we say that we handle them all. The truth is that we don't know whether we can handle it. This is easy to fix; we can do something like this:

_Unwind_Reason_Code __gxx_personality_v0 (...)
{
    printf("Personality function, searching for handler\n");
    // ...
    foreach (call site entry in lsda)
    {
        if (call site entry.not_good()) continue;
        // We found a landing pad for this exception; resume execution.
        // If we are in the search phase, tell _Unwind_ that we can handle it
        if (actions & _UA_SEARCH_PHASE) return _URC_HANDLER_FOUND;
        // If we are not in the search phase, we are in _UA_CLEANUP_PHASE
        /* set up everything that is needed */
        return _URC_INSTALL_CONTEXT;
    }
    return _URC_CONTINUE_UNWIND;
}

What do we get if we run this personality function? A crash! Who would have doubted it. Remember our throwing function? Here is the function that should catch our exception:

void catchit() {
    try {
        try_but_dont_catch();
    } catch(Fake_Exception&) {
        printf("Caught a Fake_Exception!\n");
    } catch(Exception&) {
        printf("Caught an Exception!\n");
    }
    printf("catchit handled the exception\n");
}

Unfortunately, our personality function only checks the first type that the landing pad can handle. If we remove the Fake_Exception catch block and try again, everything finally works correctly! Our personality function can select the correct catch block in the correct frame, provided the try/catch block has a single catch block.

In the next chapter, we will improve it again!

C++ exceptions under the hood: choosing the right catch from the landing pad

The 19th chapter on C++ exceptions: we wrote a personality function that can read the LSDA and choose the right landing pad and the right stack frame to handle the exception, but it still struggles to find the correct catch branch. For the final version of a working personality function, we must check the exception types across the whole action table in .gcc_except_table.

Remember the action table? Let's look at it again, but now with several catch blocks:

# Call site table
.LLSDACSB2:
    # Call site 1
    .uleb128 ip_range_start
    .uleb128 ip_range_len
    .uleb128 landing_pad_ip
    .uleb128 (action_offset+1) => 0x3
    # Rest of call site table
# Action table start
.LLSDACSE2:
    # Action 1
    .byte   0x2
    .byte   0
    # Action 2
    .byte   0x1
    .byte   0x7d
    .align 4
    .long   _ZTI9Exception
    .long   _ZTI14Fake_Exception
.LLSDATT2:
# Types table start

If we are going to read all the exception types supported by the landing pad in this example (this is the LSDA for the catchit function, by the way), we need to do something like this:

- Get the action offset from the call site table (don't forget that we read offset + 1, and 0 means no action).
- Go to action 2 using that offset and get type index 1. The types table is indexed in reverse order (i.e. we have a pointer to its end and must access it using -1 * index).
- Go to types_table[-1] to get the type_info for Fake_Exception.
- Fake_Exception is not the exception that was thrown, so we read the offset to the next action (0x7d).
- Reading 0x7d as sleb128 gives -3, which is three bytes back from the position where we read the offset.
- Read the type with index 2.
- Get the type_info for Exception, which this time matches the thrown type, so we can install the landing pad!
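The walk above can be reproduced on the four action bytes from the listing. This is a sketch under simplifying assumptions, not the code from the repository: the types table is modeled as a vector of type names in memory order instead of type_info pointers, and read_sleb128 and find_matching_type are hypothetical names. The next-action offset is decoded as a signed LEB128 value, which is how 0x7d becomes -3.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Decode one signed LEB128 value starting at *pos and advance *pos past it.
std::int64_t read_sleb128(const std::vector<std::uint8_t>& buf, std::size_t* pos) {
    std::uint64_t result = 0;
    unsigned shift = 0;
    std::uint8_t byte;
    do {
        assert(*pos < buf.size() && shift < 64);
        byte = buf[(*pos)++];
        result |= static_cast<std::uint64_t>(byte & 0x7f) << shift;
        shift += 7;
    } while (byte & 0x80);
    // Sign-extend when the last byte has its sign bit (0x40) set.
    if (shift < 64 && (byte & 0x40)) result |= ~static_cast<std::uint64_t>(0) << shift;
    return static_cast<std::int64_t>(result);
}

// Follows the action chain and returns the 1-based type index of the first
// action whose type equals `thrown`, or 0 if the chain ends without a match.
// `types` holds the table in memory order: index i refers to types[size - i].
std::int64_t find_matching_type(const std::vector<std::uint8_t>& actions,
                                const std::vector<std::string>& types,
                                std::size_t action_offset_plus_one,
                                const std::string& thrown) {
    if (action_offset_plus_one == 0) return 0;  // no action for this call site
    std::size_t pos = action_offset_plus_one - 1;
    while (true) {
        const std::int64_t type_index = read_sleb128(actions, &pos);
        const std::size_t next_field = pos;  // the offset is relative to this byte
        const std::int64_t next_offset = read_sleb128(actions, &pos);
        if (type_index > 0) {
            assert(static_cast<std::size_t>(type_index) <= types.size());
            if (types[types.size() - static_cast<std::size_t>(type_index)] == thrown)
                return type_index;
        }
        if (next_offset == 0) return 0;  // end of the chain
        pos = static_cast<std::size_t>(static_cast<std::int64_t>(next_field) + next_offset);
    }
}
```

For the catchit table, the action bytes are {0x02, 0x00, 0x01, 0x7d} and the types in memory order are Exception, then Fake_Exception. Starting from the call site value 0x3, a thrown Exception first misses type index 1 (Fake_Exception), follows the -3 offset back to the first action, and matches type index 2.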

It looks complicated, since once again there is a lot of indirection, but you can see the final code in the repository. Behind the link you will also find a bonus: a personality function that can read the types table and determine which catch block we need (if the type is null, the block can handle any exception). There is a funny side effect: we can only handle exceptions thrown from C++ programs.

Finally, we know how exceptions are thrown, how the stack is unwound, and how a personality function selects the correct stack frame and the right catch block inside the landing pad, but we still have a problem: running destructors. Next, we will change our personality function to support RAII.

C++ exceptions under the hood: running destructors while unwinding

Version 11 of our mini-ABI is capable of almost all the basics of exception handling, but it still cannot run destructors. This is a very important part if we want to write exception-safe code. We know that the information about the required destructors is stored in .gcc_except_table, so we need to look at the assembly a little more.

# Call site table
.LLSDACSB2:
    # Call site 1
    .uleb128 ip_range_start
    .uleb128 ip_range_len
    .uleb128 landing_pad_ip
    .uleb128 (action_offset+1) => 0x3
    # Rest of call site table
# Action table start
.LLSDACSE2:
    # Action 1
    .byte   0
    .byte   0
    # Action 2
    .byte   0x1
    .byte   0x7d
    .align 4
    .long   _ZTI14Fake_Exception
.LLSDATT2:
# Types table start

In a regular landing pad, when the action has a type index greater than 0, we can use it as an index into the types table to find the necessary catch block. Otherwise, when the index is 0, we need to run cleanup code. Even if the landing pad cannot handle the exception, it can still perform cleanup during unwinding. Of course, the landing pad must call _Unwind_Resume after the cleanup is done so that unwinding continues.
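The effect of such cleanup-only landing pads can be observed with a regular compiler and runtime. The sketch below uses the standard C++ runtime rather than the mini-ABI built in this series; Guard, events, thrower, middle and run are hypothetical names. It records the order in which destructors and the handler run.

```cpp
#include <cassert>
#include <string>
#include <vector>

std::vector<std::string> events;  // records the order in which things happen

// Each Guard logs its own destruction.
struct Guard {
    const char* name;
    explicit Guard(const char* n) : name(n) {}
    ~Guard() { events.push_back(std::string("~") + name); }
};

void thrower() {
    Guard g("thrower");
    throw 1;
}

void middle() {
    Guard g("middle");  // no catch here, so this frame only needs cleanup
    thrower();
}

// Returns true if the exception was caught after the cleanups ran.
bool run() {
    events.clear();
    try {
        middle();
    } catch (int) {
        events.push_back("caught");
        return true;
    }
    return false;
}
```

After run(), events holds "~thrower", "~middle", "caught": both frames between the throw and the handler get their cleanup executed before control reaches the catch block.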

I uploaded the new and latest version of the code to my github repository, but I have bad news: remember how we cheated by pretending that uleb128 == char? When we started adding code for destructors, the offsets in .gcc_except_table became large (by "large" I mean greater than 127) and our trick no longer works.

For the next version, we should rewrite our LSDA reader so that it decodes uleb128 values correctly.
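A decoder for unsigned LEB128 is short. The following is a minimal sketch of what such a reader could look like, not the code from the repository; read_uleb128 is a hypothetical name, and the function advances the read pointer in the same spirit as the LSDA_Header and LSDA_CS constructors used earlier.

```cpp
#include <cassert>
#include <cstdint>

// Decode one unsigned LEB128 value at *p and advance *p past it.
// Each byte carries 7 payload bits, least significant group first;
// a set high bit (0x80) means another byte follows.
std::uint64_t read_uleb128(const std::uint8_t** p) {
    std::uint64_t result = 0;
    unsigned shift = 0;
    std::uint8_t byte;
    do {
        assert(shift < 64);  // more than 10 bytes would overflow 64 bits
        byte = *(*p)++;
        result |= static_cast<std::uint64_t>(byte & 0x7f) << shift;
        shift += 7;
    } while (byte & 0x80);
    return result;
}
```

Values up to 127 still occupy a single byte, which is why treating uleb128 as char worked until the offsets grew; 128 is already encoded as the two bytes 0x80 0x01.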

Despite this, we have achieved our goal! We wrote a mini-ABI capable of correctly handling exceptions without the help of libcxxabi!

Of course, there is still work to do, for example handling foreign (non-C++) exceptions and compatibility between compilers and linkers. Maybe some time later...

C++ exceptions under the hood: summary

After 20 chapters on low-level exception handling, it's time to take stock! What did we learn about how exceptions are thrown and how they are caught?

Leaving aside the scary details of reading .gcc_except_table, which is probably the largest part of this series, we can conclude:

- The C++ compiler actually does very little of the work related to exception handling; most of the magic happens in libstdc++.
- Here are a few things the compiler does:
  - It generates CFI information for stack unwinding.
  - It creates something called .gcc_except_table with information about landing pads (try/catch blocks). This is a kind of reflection.
  - When we write throw, the compiler translates it into a pair of calls into libstdc++ that allocate the exception and then start unwinding.
- When an exception is thrown at runtime, __cxa_throw delegates stack unwinding to libstdc++.
- During unwinding, a special function provided by libstdc++ (called the personality function or personality routine) is called; it checks each function on the stack to see whether it can handle the exception.
- If no matching catch block is found, std::terminate is called.
- If one is found, unwinding starts again from the top of the stack.
- During this second pass, cleanup is performed.
- The personality function checks .gcc_except_table for the current function. If the table contains a cleanup action, the personality function "jumps" into the current stack frame to run the cleanup for that function.
- As soon as the unwinder reaches a stack frame (i.e. a function) that can handle the exception, it jumps to the appropriate catch block.
- After the catch block has executed, the memory occupied by the exception is freed.
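The two passes summarized above can be condensed into a toy model. This sketch does not touch the real unwinder: Frame and unwind are hypothetical names, and each frame is reduced to two flags standing in for what the personality function would read from .gcc_except_table.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Toy model of a stack frame as seen by the personality function.
struct Frame {
    std::string name;
    bool has_cleanup;  // the frame has destructors to run
    bool can_catch;    // the frame has a matching catch block
};

// Simulates both passes over the stack (index 0 is the throwing frame) and
// returns, in order, the steps a real unwinder would take.
std::vector<std::string> unwind(const std::vector<Frame>& stack) {
    assert(!stack.empty());
    std::vector<std::string> steps;
    // Phase 1 (search): look for a handler without modifying anything.
    std::size_t handler = stack.size();
    for (std::size_t i = 0; i < stack.size(); ++i) {
        if (stack[i].can_catch) { handler = i; break; }
    }
    if (handler == stack.size()) {
        steps.push_back("terminate");  // nobody can handle the exception
        return steps;
    }
    // Phase 2 (cleanup): start again from the top, run cleanups on the way,
    // and stop at the frame found in phase 1.
    for (std::size_t i = 0; i < handler; ++i) {
        if (stack[i].has_cleanup) steps.push_back("cleanup " + stack[i].name);
    }
    steps.push_back("catch in " + stack[handler].name);
    return steps;
}
```

For a stack of raise, try_but_dont_catch (cleanup only) and catchit (matching catch), the model yields "cleanup try_but_dont_catch" followed by "catch in catchit"; with no matching frame at all it yields "terminate".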

Having studied in detail how exceptions are handled, we can now say why it is so difficult to write exception-safe code.

At a cursory glance exceptions may seem nice and simple, but once you dig a little deeper you run into a pile of difficulties: the program literally begins to inspect itself (reflection), which is not typical for C++ applications.

Even in high-level languages, when an exception is thrown we cannot rely on our understanding of the normal execution of code: usually it runs linearly, with small branches in the form of if and switch statements. With exceptions everything is different: code starts executing in an unobvious order, unwinds the stack, interrupts functions, and stops following the usual rules. The instruction pointer is changed at each landing pad, the stack is unwound outside our control, and in general a lot of magic happens under the hood.

In the end, exceptions are complex because they break our understanding of the natural execution of a program. This does not mean that we are strictly forbidden to use them, only that we always need to be careful when working with them!
