Virtual Method Table and Safety Precautions

    As a short warm-up before the article, I would like the reader to ask himself the following question: does a photographer need to know how a camera works to take high-quality pictures? Well, does he at least need to know what an “aperture” is? “Signal-to-noise ratio”? “Depth of field”? Practice shows that even people who know all these fancy words can produce pictures that are not much better than ones shot with a mobile phone through a 0.3-megapixel pinhole. And vice versa: truly good pictures can be taken purely through experience and inspiration, with complete ignorance of the equipment (although this is more of an exception to the rule, but still). However, hardly anyone will argue with me that professionals who want to squeeze everything out of their equipment (and not just the number of megapixels per square millimeter of the sensor) absolutely need this knowledge; otherwise they can hardly be called professionals. And this is true not only for digital photography, but for almost any other field.

    This is true for programming, and doubly so for programming in C++. This article describes an important language concept, the virtual table pointer, which is present in almost every non-trivial class, and how it can be accidentally damaged. This, in turn, can lead to errors that are very hard to debug. First, let me remind you what it is all about, and then I will share my thoughts on what can break there and how.

    Unfortunately, this article contains a lot of low-level discussion, but there is no other way to illustrate the problem. I should also note that the article was written mostly with the Visual C++ compiler in mind, building a 64-bit program; the results may differ with other compilers and on other architectures.

    Virtual table pointer


    Theory says that the vptr (a pointer to the virtual method table, or virtual table pointer) is present in every class that has at least one virtual method. Let's take a closer look at what kind of beast this is. To do so, we will write a simple demo program in C++.
    #include <cstdlib>   // system
    #include <cstring>   // memset
    #include <iomanip>   // setw, setfill
    #include <iostream>
    #include <new>       // placement new
    using namespace std;
    int nop() {
      static int nop_x; return ++nop_x; // Don't delete me, compiler!
    };
    class A
    {
    public:
      unsigned long long content_A;
      A(void) : content_A(0xAAAAAAAAAAAAAAAAull)
          { cout << "++ A has been constructed" << endl;};
      ~A(void) 
          { cout << "-- A has been destructed" << endl;};
      void function(void) { nop(); };
    };
    // Prints 32 bytes of memory as four rows of eight hex bytes
    void PrintMemory(const unsigned char memory[],
                     const char label[] = "contents")
    {
      cout << "Memory " << label << ": " << endl;
      for (size_t i = 0; i < 4; i++) 
      {
        for (size_t j = 0; j < 8; j++)
          cout << setw(2) << setfill('0') << uppercase << hex
               << static_cast<unsigned>(memory[i * 8 + j]) << " ";
        cout << endl;
      }
    }
    int main()
    {
      unsigned char memory[32];                          // raw buffer on the stack
      memset(memory, 0x11, 32 * sizeof(unsigned char));  // fill it with "garbage"
      PrintMemory(memory, "before placement new");
      new (memory) A;                                    // construct A in the buffer
      PrintMemory(memory, "after placement new");
      reinterpret_cast<A *>(memory)->~A();               // explicit destructor call
      system("pause");                                   // Windows-only: wait for a key
      return 0;
    };

    Despite the relatively large amount of code, its logic should be fairly obvious: 32 bytes are allocated on the stack and filled with 0x11 (we treat this as “garbage” in memory). Then a fairly trivial object of class A is created on top of these 32 bytes using the placement new operator. The memory contents are printed before and after construction, after which the program destroys the object and exits. Below is the output of this program (Microsoft Visual Studio 2012, x64).
    Memory before placement new:
    11 11 11 11 11 11 11 11
    11 11 11 11 11 11 11 11
    11 11 11 11 11 11 11 11
    11 11 11 11 11 11 11 11
    ++ A has been constructed
    Memory after placement new:
    AA AA AA AA AA AA AA AA
    11 11 11 11 11 11 11 11
    11 11 11 11 11 11 11 11
    11 11 11 11 11 11 11 11
    -- A has been destructed
    Press any key to continue . . .

    It is easy to see that the object occupies 8 bytes in memory, exactly the size of its only member, unsigned long long content_A.

    Let's complicate the program a bit by adding the virtual keyword to the declaration of void function(void):
    virtual void function(void) {nop();};

    Program output (from here on, only part of the output is shown, without the “Memory before placement new” block and “Press any key ...”):
    ++ A has been constructed
    Memory after placement new:
    F8 D1 C4 3F 01 00 00 00
    AA AA AA AA AA AA AA AA
    11 11 11 11 11 11 11 11
    11 11 11 11 11 11 11 11
    -- A has been destructed

    Again, it is easy to see that the object now takes 16 bytes. The first eight bytes are occupied by the pointer to the virtual method table. In this particular run the pointer was 0x000000013FC4D1F8 (both the pointer and content_A appear “reversed” in memory because Intel64 uses little-endian byte order; for content_A you just can't tell, since all its bytes are identical).

    The virtual method table is a special, automatically generated structure in memory that lists pointers to virtual methods. When the function() method is called somewhere in the code through a pointer to class A, instead of calling A::function() directly, the program calls the function found in the virtual method table at the corresponding offset; this is how polymorphism is implemented. The virtual method table itself is shown below (obtained by compiling with the /FAs switch; note also the somewhat strange function name in the assembler code: it has gone through name mangling):
    CONST SEGMENT
    ??_7A@@6B@ DQ  FLAT:??_R4A@@6B@   ; A::'vftable'
     DQ FLAT:?function@A@@UEAAXXZ
    CONST ENDS


    __declspec(novtable)


    Sometimes the virtual method table is not needed at all. Suppose we never instantiate class A, and if we do, then only on weekends and holidays, carefully making sure that not a single virtual function is called. This situation is common with abstract classes: an abstract class cannot be instantiated at all. Indeed, if function() were declared pure virtual in class A, the virtual method table would look like this:
    CONST SEGMENT
    ??_7A@@6B@ DQ FLAT:??_R4A@@6B@ ; A::'vftable'
     DQ FLAT:_purecall
    CONST ENDS

    Obviously, calling such a function amounts to shooting yourself in the foot: the default _purecall handler reports a “pure virtual function call” error and terminates the program.

    The question is: if a class is never instantiated, why initialize its virtual table pointer at all? To keep the compiler from generating this extra code, you can give it the __declspec(novtable) hint (careful: this is Microsoft-specific!). Let's rewrite our example class with a virtual function using the __declspec(novtable) attribute:
    class __declspec(novtable) A { .... }

    The output of the program will be as follows:
    ++ A has been constructed
    Memory after placement new:
    11 11 11 11 11 11 11 11
    AA AA AA AA AA AA AA AA
    11 11 11 11 11 11 11 11
    11 11 11 11 11 11 11 11
    -- A has been destructed

    First of all, notice that the size of the object has not changed: it still takes 16 bytes. Adding the __declspec(novtable) attribute made only two differences: first, the place where the address of the virtual method table used to be now holds uninitialized memory; second, the assembler code no longer contains a virtual method table for class A. But the virtual table pointer is still there and still “weighs” eight bytes! This must be remembered, because ...

    Inheritance


    Let's rewrite our example to implement the simplest inheritance from an abstract class that has a virtual table pointer.
    class __declspec(novtable) A // I am never instantiated
    {
    public:
      unsigned long long content_A;
      A(void) : content_A(0xAAAAAAAAAAAAAAAAull)
          { cout << "++ A has been constructed" << endl;};
      ~A(void) 
          { cout << "-- A has been destructed" << endl;};
      virtual void function(void) = 0;
    };
    class B : public A // I am always instantiated instead of A
    {
    public:
      unsigned long long content_B;
      B(void) : content_B(0xBBBBBBBBBBBBBBBBull)
          { cout << "++ B has been constructed" << endl;};
      ~B(void) 
          { cout << "-- B has been destructed" << endl;};
      virtual void function(void) { nop(); };
    };

    We also change the main program so that class B is created (and destroyed) instead of class A:
    ....
    new (memory) B;
    PrintMemory(memory, "after placement new");
    reinterpret_cast<B *>(memory)->~B();
    ....

    The output of the program will be as follows:
    ++ A has been constructed
    ++ B has been constructed
    Memory after placement new:
    D8 CA 2C 3F 01 00 00 00
    AA AA AA AA AA AA AA AA
    BB BB BB BB BB BB BB BB
    11 11 11 11 11 11 11 11
    -- B has been destructed
    -- A has been destructed

    Let's figure out what happened. The constructor B::B() was called. Before executing its own body, this constructor calls the base class constructor A::A(). Normally A::A() would first initialize the virtual table pointer, but because of the __declspec(novtable) attribute it did not. Then the constructor sets content_A to 0xAAAAAAAAAAAAAAAAull (the second field in memory) and returns control to B::B().

    Since class B does not have the __declspec(novtable) attribute, its constructor sets the virtual table pointer (the first field in memory) to the virtual method table of class B, then sets content_B to 0xBBBBBBBBBBBBBBBBull (the third field in memory) and returns control to the main program. The memory contents show that the object of class B was constructed correctly, and the logic shows that an operation unnecessary in this context was skipped. In case you are confused: the unnecessary operation is the initialization of the virtual table pointer in the base class constructor.

    It might seem that only one operation was skipped, so why bother getting rid of it? But if a program has thousands and thousands of classes derived from the same abstract class, removing one auto-generated instruction can seriously affect performance. And it will. Don't believe it?

    The memset() function


    The main purpose of the memset() function is to fill a region of memory with a constant value (most often zeros). In C, it could be used to quickly initialize all the fields of a structure. And what is the difference between a C++ class and a C structure in memory if the class has no virtual table pointer? Essentially none: data is data. To initialize really simple classes (in C++11 terminology, standard-layout types) you can use memset(). In theory, memset() can be applied to any class, but what will the consequences be? A careless memset() can ruin the virtual table pointer in one fell swoop. This raises a question: can you still do it if the class is declared as __declspec(novtable)?

    Answer: you can, but only with care.

    Let's rewrite the classes as follows: add a wipe() method that fills the entire contents of class A with 0xAA:
    class __declspec(novtable) A // I am never instantiated
    {
    public:
      unsigned long long content_A;
      A(void)
        {
          cout << "++ A has been constructed" << endl;
          wipe();
        };
        // { cout << "++ A has been constructed" << endl; };
      ~A(void) 
        { cout << "-- A has been destructed" << endl;};
      virtual void function(void) = 0;
      void wipe(void)
      {
        memset(this, 0xAA, sizeof(*this));
        cout << "++ A has been wiped" << endl;
      };
    };
    class B : public A // I am always instantiated instead of A
    {
    public:
      unsigned long long content_B;
      B(void) : content_B(0xBBBBBBBBBBBBBBBBull)
          { cout << "++ B has been constructed" << endl;};
          // {
          //   cout << "++ B has been constructed" << endl;
          //   A::wipe();
          // };
      ~B(void) 
          { cout << "-- B has been destructed" << endl;};
      virtual void function(void) {nop();};
    };

    In this case, the program output is just what you would expect:
    ++ A has been constructed
    ++ A has been wiped
    ++ B has been constructed
    Memory after placement new:
    E8 CA E8 3F 01 00 00 00
    AA AA AA AA AA AA AA AA
    BB BB BB BB BB BB BB BB
    11 11 11 11 11 11 11 11
    -- B has been destructed
    -- A has been destructed

    So far, everything is working well.

    However, move the wipe() call slightly (comment out the constructor bodies and uncomment the lines that follow them), and it immediately becomes clear that something has gone wrong. The first call to the virtual function function() will fail at run time because the virtual table pointer is corrupted:
    ++ A has been constructed
    ++ B has been constructed
    ++ A has been wiped
    Memory after placement new:
    AA AA AA AA AA AA AA AA
    AA AA AA AA AA AA AA AA
    BB BB BB BB BB BB BB BB
    11 11 11 11 11 11 11 11
    -- B has been destructed
    -- A has been destructed

    Why did this happen? The wipe() function was called after the class B constructor had initialized the virtual table pointer, so the pointer was overwritten. In other words, you should not wipe an object that has a virtual table pointer, even if its class is declared with __declspec(novtable). Wiping the whole object is appropriate only in the constructor of a class that is never instantiated, and even then it must be done with great care.

    The memcpy() function


    With memcpy(), the picture is exactly the same. Again, in theory it can be used to copy standard-layout types. However, judging by practice, some programmers like to use it whether it is appropriate or not. For types that are not standard-layout, using memcpy() is like walking a tightrope over Niagara Falls: one mistake can be fatal, and making that mistake is ridiculously easy. For example:
    class __declspec(novtable) A
    {
      ....
      A(const A &source) { memcpy(this, &source, sizeof(*this)); }
      virtual void foo() { }
      ....
    };
    class B : public A { .... };

    A copy constructor can write whatever its digital heart desires into the virtual table pointer of an abstract class: the correct value will end up there anyway, because the constructor of the derived class runs afterwards and stores its own virtual table pointer. But in the assignment operator you can no longer use memcpy():
    class __declspec(novtable) A
    {
      ....
      A &operator =(const A &source)
      {
        memcpy(this, &source, sizeof(*this)); 
        return *this;
      }
      virtual void foo() { }
      ....
    };
    class B : public A { .... };

    Now remember how used we are to thinking that the assignment operator and the copy constructor are practically the same thing. No, things are not that bad: in practice the assignment operator may even work correctly, not because it is correct, but because the stars happened to align. This code copies the virtual table pointer from another object, and there is no telling what that will lead to. For example, if source is actually an object of a different class derived from A, the target object silently takes on that other class's virtual method table.

    PVS-Studio


    This article grew out of a detailed study of the mysterious __declspec(novtable) and of when you can and cannot use the memset() and memcpy() functions in high-level code. From time to time, developers tell us that the PVS-Studio analyzer issues too many warnings about the virtual table pointer. They believe that if a class is declared with __declspec(novtable), it has neither a virtual method table nor a virtual table pointer. We began looking into this issue carefully and realized that it is not that simple.

    This must be remembered. If you use __declspec(novtable) when declaring a class, it does not mean that the class has no virtual table pointer! Whether that pointer is initialized or not is a completely different question.

    We will make the analyzer stop warning about memset()/memcpy() calls, but only when they are used in the constructors of base classes declared with __declspec(novtable).

    Conclusion


    Unfortunately, a lot of inheritance-related material did not make it into this article (for example, multiple inheritance was not covered at all). However, I hope this information makes it clear that “it's not that simple” and that you should think three times before applying low-level functions to high-level objects. And is it worth it at all?
