What is so hard about handling C ++ exceptions?

    image
    Exceptions and related stack promotion are one of the nicest techniques in C ++. Exception handling is intuitively consistent with the block structure of the program. Externally, exception handling seems very logical and natural.

    Accurate use of stack objects allows you to create very efficient and safe code, where, unlike systems with garbage collection, the locality of links is preserved, which makes it possible to reduce the number of calls to the system for memory, reduce its fragmentation, and use the memory cache more efficiently.

    However, in C ++, exceptions are traditionally considered literally as exceptions related to error recovery. It is difficult to say whether this is the reason or the consequence that the implementation of exception handling by compilers is extremely expensive. Let's try to figure out why.

    Continued here .
    How are things.

    There are two approaches to implementing exception handling:
    • The first can be described with the words “let the loser pay”. The compiler tries to minimize costs in cases where exceptions do not arise. Ideally, the program does not carry any additional load, all the necessary information is located away from the code in some convenient form for promotion.
    • The second strategy is “little by little and always pay.” In other words, in the process, the program incurs certain costs for maintaining the information necessary for the correct promotion of the stack. Moreover, in case of an exception, the promotion of the stack is cheaper.

    A few examples:
    • GCC / SJLJ . SJLJ is short for setjmp / longjmp. Rather, it refers to the first approach, but thanks to the control transfer scheme, it also has a fixed fee for each try . In essence, this implementation has incorporated all the worst that is inherent in both approaches. Until the fourth version, this was the main exception handling option.
      As the name implies, the transfer of control is carried out through a call to longjmp, each try generates a call to setjmp. The corresponding buffer is allocated on the stack for each try block.
      At the beginning of each function, a prolog is created that registers the current frame in the context stack. Similarly, an epilogue is created that removes the current context from the top of the context stack. A helper code is created next to each function to clear resources.
      More precisely, the compiler logs each return from the function, potentially capable of ending with an exception, to the tree as a key; the value is a pointer to the cleaning code, which must be taken in this place to clear the context of the function. The linker collects pieces of these trees for each link module into a single project tree (very rough). This miracle is called LSDA (language specific data area) and is located in the ".gcc_except_table" section.
      If an exception occurs, based on the type_info of the thrown exception, a block is searched that can handle this exception. Starting from the current context and up to the handler context (using navigation on call frames), the addresses of the code are extracted and executed, which (depending on the scope of local variables) must be executed in this place. Then control is transferred.

      There is a prejudice that this method is very expensive. Already due to the fact that setjmp is called for every try-block, which is not cheap. In fact, you need to completely maintain the state of the processor, where there may be dozens of registers. Whereas at the time of the exception, the contents of most of these registers are already useless. In reality, the compiler acts very rationally. He deploys setjmp, and, saves only useful registers (he already has this information). The author doubts that the costs of setjmp are so high.

      But what really catches your eye is the voluminous auxiliary code, especially in non-trivial cases. The compiler, like YACC, paints all the states of the stack machine. And, although, the optimizer, if possible, cleans up redundancy and trivial code, what remains is more than enough.
    • GCC / DW2 . This is just an example of the first approach to handling exceptions. DW2 means DWARF2 (now 3 already) - a format for storing auxiliary, including debugging information in an executable file. After all, debugging information is also needed so that at any time it is possible to find out the value of any variable, including in frames of previous (upper) calls. Therefore, in the process of code generation, the compiler defers information about what it allocates on the stack, in which registers it places the variables, when it stores them ... Actually, this format is not identical to DWARF, although it is very close to it. Standard for the fourth version of GCC.

      Conceptually, information on how to get into the higher frame of the call is stored at each address of the program code. In practice, due to the volume of this information, it is compressed, in fact, it is calculated using the interpretation of the byte code. This bytecode is executed when an exception occurs. All this is located in the sections ".eh_frame" and ".eh_frame_hdr".
      Yes, among other things, the DWARF interpreter is an excellent backdoor, with the help of which, by changing the bytecode, you can catch the exception and send it for processing wherever you like.
      GCC / DW2 uses almost the same LSDA section as GCC / SJLJ.

      As we can see, the costs associated with stack promotion (in the absence of exceptions) are practically absent. However, the cost of raising an exception is high. In addition, one cannot fail to note the strong integration of the architecture-dependent part of the compiler and its rather high-level layers.
    • MS VC ++ . This compiler implements a second processing strategy.
      • For each function that can potentially throw exceptions, the compiler creates as a stack variable a structure from a pointer to a previous similar structure, addresses of the handler function, and auxiliary data. The address of this structure is entered in FS: [0], which is the top of the stack of these structures. The FS register in Win32 is used as the Thread Information Block ( TIB ) (GS in Win64). It also creates a handler function (with its own data set) and an epilogue that restores FS: [0] if successful.
      • The compiler creates a table of structures - by element for each try-block in the function. Each try-block has a start and end index in this table (a nested block has a nested interval) corresponding to a certain state, and the compiler itself monitors the relevance of this index. In this way, the compiler implements a stack of try blocks.
      • For each try-block, a catch-block table is started. For each type of exception, a type_info table is created for all base classes in the hierarchy of this type of exception.
      • An unwind table is created for each function, each element of which contains a pointer to a function that frees some resource and the number of the previous element. A table may have several chains, depending on the scope of objects with destructors. At the time of exception, by the index of the current state, which was mentioned above, you can find the necessary chain and call all the necessary destructors.
      • For version x64, auxiliary stack structures, if possible, were transferred to .pdata; probably, MS considers the first strategy to be more promising.
      • When an exception is thrown, the main work is carried out by the operating system, where control gets through SYSENTER.
      This method has the same drawbacks as SJLJ - extensive helper code and low portability.
    • The process of raising an exception and choosing the appropriate catch block everywhere looks approximately the same:
      • When an exception is thrown, its descriptor is created, which contains a copy of the object, its type_info, a pointer to the destructor
      • Rising along the stack of try blocks and clearing all registered stacked objects (the navigation on this stack is different everywhere, but the essence is the same), we look through the lists of catch blocks and look for a suitable one.
      • If a suitable catch block is found, the exception object becomes a local variable; we call this block. If the catch block takes an exception by value, rather than a reference, a copy of it will be created.
      • If there was no exception re-call, kill the object - exception
      • “Mistress to the note”:
        some_exception exc("oioi");
        throw exc;
        spawns an extra copy constructor / destructor
        throw *new some_exception("oioi");
        gives a memory leak
        catch(some_exception exc) ...
        again an extra call to the constructor and destructor
        catch(const some_exception *exc) ...
        an exception will fly by unless you drop the pointer
        throw some_exception("oioi");
        ...
        catch (const some_exception &exc)....
        
        minimum cost
    Details can be seen here , here and here .

    And what if ...
    And, it would seem, of all things, then - to call in the right order destructors, whose bodies already exist. How did it happen that a simple, in general, task has such viscous, heavy, and, moreover, independently developed solutions? It is difficult to say, so historically.
    Let's try to outline the solution, trying to leave it simple and, if possible, architecturally independent.
    • First of all, we choose a strategy - this will be the second option.
    • Control transfer - setjmp / longjmp
    • We create a structure, all of whose descendants have the ability to self-register for a possible promotion.
      structunw_item_t {unw_item_t ();
          virtual ~unw_item_t ();
          voidunreg();
          unw_item_t  *prev_;  
      };
      
    • And one more, the scope of which is a try-block
      structjmp_buf_splice {    
          jmp_buf_splice ();
          ~jmp_buf_splice ();    
          jmp_buf         buf_;    
          jmp_buf_splice *prev_;    
          unw_item_t      objs_;  
      };
      
    • For simplicity, we will throw only const char * exceptions with
      externintthrow_slice(constchar *str);
      
    • Several macros to simulate a try block
      // начало блока#define TRY_BLOCK { \
        jmp_buf_splice __sl; \
        const char *__exc = (const char *)setjmp (__sl.buf_); \
        if (NULL == __exc) {
      ...
      // что-то вроде catch(…) т.к. мы бросаем только const char*#define CATCH_BLOCK_FIN  \
        } else { 
      ...
      // конец блока#define FIN_BLOCK  \
          } \
        }
      ...
      // бросаем исключение #define THROW_IN_BLOCK(exc)  \
        throw_slice (exc); 
      ...
      // перебрасываем исключение наверх, __exc определено в TRY_BLOCK#define RETHROW_IN_BLOCK  \
        throw_slice (__exc); 
    • Now show the bodies of members of the jmp_buf_splice class:
      static jmp_buf_splice *root_slice_ = NULL;  
      jmp_buf_splice::jmp_buf_splice ()
      {
        objs_ = NULL;
        prev_ = root_slice_;
        root_slice_ = this;
      }
      jmp_buf_splice::~jmp_buf_splice ()
      {
        root_slice_ = prev_;
      }
      
      Here is an option for single-threaded implementation. If there are several threads, instead of root_slice_ we will have to use TLS, similar to, for example, how GCC does it.
    • The time has come for members of the unw_item_t class:
      unw_item_t::unw_item_t ()
      {
        if (NULL != root_slice_) 
        {
            prev_ = root_slice_->objs_;
            root_slice_->objs_ = this;
        }
      }
      unw_item_t::~unw_item_t ()
      {
        unreg();
      }
      unw_item_t::unreg ()
      {
        if (NULL != root_slice_ && 
          (prev_ != reinterpret_cast<unw_item_t *>(~0))) 
        {
            root_slice_->objs_ = prev_;
            prev_ = reinterpret_cast<unw_item_t *>(~0);
        }
      }
      
    • Now consider the process of raising an exception and promoting the stack:
      staticintpop_slice(){
        jmp_buf_splice *sl = root_slice_;
        assert (NULL != sl);
        root_slice_ = sl->prev_;
        return0;
      }
      intthrow_slice(constchar *str, bool popstate){
        if (NULL == str)
          return-1;
        jmp_buf_splice *sl = root_slice_;
        unw_item_t *obj = root_slice_->objs_;
        while (NULL != obj)
          {
            unw_item_t *tmp = obj;
            obj = obj->prev_;
            tmp->~unw_item_t ();
          }
        if (popstate)
          pop_slice ();
        longjmp (sl->buf_, int(str));	
        return0;
      }
      
    • Service class - analogue of std :: auto_ptr:
      template<typename cl>
        classdeleter_t :publicunw_item_t {
        public:
          deleter_t (cl *obj){ptr_ = obj;};
          virtual ~deleter_t () {delete ptr_;};
        private:
          cl *ptr_;
          deleter_t ();
          deleter_t (constdeleter_t &);
          deleter_t &operator= (constdeleter_t &);
        };
      
    • Service class - array:
      template<typename cl>
        classvec_deleter_t :publicunw_item_t {
        public:
          vec_deleter_t (cl *obj){ptr_ = obj;};
          virtual ~ vec_deleter_t () {delete [] ptr_;};
        private:
          cl *ptr_;
          vec_deleter_t ();
          vec_deleter_t (constvec_deleter_t &);
          vec_deleter_t &operator= (constvec_deleter_t &);
        };
      
    • Examples.
      Test class
      class _A {public:
      _A():val_(++cnt_){printf ("A::A(%d)\n",val_);}
      	_A(int i):val_(i){printf ("A::A(%d)\n",val_);}
      	virtual ~_A(){printf ("A::~A(%d)\n",val_);}
      staticint cnt_;
      };
      int _A::cnt_ = 0;
      classA :publicunw_item_t, _A {};
    • Example 1
      A a(1);
        TRY_BLOCK {
      	A b(2);
      	THROW_IN_BLOCK("error\n");
            std::cerr << "notreached\n";
        }
        CATCH_BLOCK_FIN {
            std::cerr << __exc;
        }
        FIN_BLOCK;

      A :: A (1)
      A :: A (2)
      A :: ~ A (2)
      error
      A :: ~ A (1)
    • Example 2
      A a(1);
        TRY_BLOCK {
      	A b(2);
      	TRY_BLOCK {
      	  A c(3);
      	  THROW_IN_BLOCK("error\n");
      	  std::cerr << "notreached\n";
      	}
      CATCH_BLOCK_FIN {
      	  std::cerr << "." << __exc;
      	  RETHROW_IN_BLOCK;
      	}
      	FIN_BLOCK;
            std::cerr << "notreached\n";
          }
        CATCH_BLOCK_FIN {
            std::cerr << ".." << __exc;
          }
        FIN_BLOCK;
      

      A :: A (1)
      A :: A (2)
      A :: A (3)
      A :: ~ A (3)
      .error
      A :: ~ A (2)
      ..error
      A :: ~ A (1)
    • Example 3
        TRY_BLOCK {
          vec_deleter_t<_A> da(new _A[3]);
          TRY_BLOCK {
      	THROW_IN_BLOCK("error\n");
      	std::cerr << "notreached\n";
          }
          CATCH_BLOCK_FIN {
            std::cerr << "." << __exc;
      	RETHROW_IN_BLOCK;
          }
          FIN_BLOCK;
          std::cerr << "notreached\n";
        }
        CATCH_BLOCK_FIN {
            std::cerr << ".." << __exc;
          }
        FIN_BLOCK;
      

      A::A(1)
      A::A(2)
      A::A(3)
      .error
      A::~A(3)
      A::~A(2)
      A::~A(1)
      ..error

    Limitations
    This solution has a lot of disadvantages:
    • You cannot throw exceptions in the destructor. The unw_item_t destructor has not yet deleted the link to this instance, as a result, the destructor will be called again.
    • Creating an object of the class inherited from unw_item_t using the new operator is very dangerous. Even if one takes care of the memory itself, such a pointer may fall into a foreign context or even into a foreign stream, an object that it is looking at may suddenly cause a destructor, which will result in a metabolic catastrophe.
    • A class inherited from unw_item_t cannot be aggregated as a member of another class, otherwise its destructor will be called twice.
    • The described method cannot be integrated with hardware exceptions.
    • Limitations on exception types. Above, we used only a string pointer. If you pass primitive types as an exception, then there can be only one option. If we use an object pointer as an exception, then we can use RTTI. You can suggest something like
      #define CATCH_BLOCK_TYPED(t)  \
        } elseif (NULL != dynamic_cast<t>(__exc)) {
      And this will give us the opportunity to use exceptions of different types. But then it is impossible to throw exceptions of primitive types.
    • The user must delete the thrown exception object.

    But still.
    Despite the limitations described, the described method has inherent advantages:
    • Simplicity. A few dozen lines of code - and everything works.
    • Transparency concept.
    • Easy portability. No architecture dependency.
    Is it possible to eliminate the disadvantages of this method, while retaining its advantages? Yes and no. Using only C ++ tools, this is not possible.

    What is the author driving at?
    In order of technical delirium, we’ll think about how to modify the compiler in order to correctly implement the above scheme?
    What was missing in the above solution? Knowledge of how the object was generated.
    For example, if an object is built on memory allocated from a common heap and can migrate between threads, it should by no means be registered on a stream-dependent stack. You should not register an object aggregated to another object anywhere.
    And with an object of the same type, but on the stack memory, this must be done. Of course, it is possible to pass a pointer to this stack object to another thread, but it is difficult to imagine in what situation this could be useful.
    So:
    • For stack objects of type T, the compiler actually creates a wrapper class of type
      template<classT>
      class __st_wrapper :publicunw_item_t  {
      public:
          virtual ~__st_wrapper() 
          {
            unreg();
            ((T*)data_)->T::~T();
          };
      private:
         char data_[sizeof(T)];
      };
      
      as well as the challenge of the desired designer T .
    • The static member of the jmp_buf_splice :: root_slice_ class is implemented either through TLS or through the corresponding register, if any
    • The programmer still sees only an object of type T located in data_
    • For stacked objects without virtual destructors, this one appears in the wrapper
    • It is now possible to throw exceptions in destructors since Before calling the destructor itself, we deregistered.
    • We do not support hardware exceptions (kernel exceptions), therefore, at the time the exception is thrown, the compiler knows which registers must be “landed” and must do so
    • For regular destruction of stack objects, the compiler creates calls to __st_wrapper 's destructors
    • The mechanism for choosing the appropriate catch block is left as is. Those. we still need auxiliary table information with descriptors of these blocks outside the code.
    • We will transfer control using the analogue of setjmp. It is proposed to implement an intermediate (with respect to the two described above) control transfer option. Setjmp has a significant drawback - the buffer size is quite large, while its small part is actually used.
      DWARF bytecode execution, on the other hand, seems very wasteful.
      Therefore, instead of the setjmp buffer, we will store a list of registers requiring recovery and shifts relative to the stack pointer, where the actual values ​​are. In the case of a calculated value, the value is stored directly in the register. To do this, additional memory is allocated on the stack and a shift is given to it. In fact, a temporary variable is started.
      Before raising an exception, the compiler unloads all relevant data from the registers, in which case it can be restored without loss.

      Nevertheless, it is worth noting that the use of the try block is a conscious act, there is nothing wrong with the fact that this entails certain costs. IMHO these (moderate) costs are even useful since stimulate a responsible attitude to language tools.
    • Exception catching when calling the operator new and new [] is left as is. Those. we protect each iteration with an internal try block and destroy everything created in previous iterations if an exception occurs, which is then overexcited. And, of course, we give back the memory allocated for the object [s.]
    • There is no need to do anything to implement an array of stack objects. But you can save some memory by implementing a special stack object - a vector similar to that used when calling the new [] operator .


    By the way.
    • An object can find out that it is a stack. To do this, its this must be within the stack segment of the current thread.
    • So you can remove the object from the hook? I can’t imagine why this might be needed, but such an opportunity exists.
    • If you can take it off, then you can plant it. Allocate memory on the stack via alloca , force the constructor to be called, and connect it to the stack promotion mechanism.
    • For architectures with separate data and control stacks, exception handling can be implemented very efficiently using the control stack instead of a list.


    PS: Special thanks to Alexander Artyushin for the informative discussion.

    Also popular now: