Place and rule! Using placement new to optimize C++ code

Published on May 18, 2015


  • Tutorial


Creating object after object, we often pay no attention to such a "trifle" as dynamic memory allocation. Along with copying and serialization, allocating heap memory through new gradually eats away the speed advantage of C++. The more intensively we use the cherished new, the harder life becomes for the application: memory runs out, gets fragmented, and strives to leak in every possible way. The convenient but implicitly performance-hungry STL containers (vector, string, deque, map) have already suffered this fate. It is especially disappointing to lose speed on allocating small objects in large quantities. But there is a way to place such objects' memory on the stack while still hiding the implementation details in a dedicated data class. Placement new, an unsurpassed way to optimize an application, will help us here.

In the last lesson we did amazing things: we worked with C++ objects as containers holding values of a type computed at runtime and filled dynamically. We actively used a Copy-on-Write wrapper over std::shared_ptr, which referred to the real data type when the object was filled in. It was also implied that we would allocate memory dynamically for any data initialization, calling new every time we needed new data of an arbitrary type.

This approach has its advantages. Data can be shared between multiple objects, delaying copying. You can, in principle, not know anything in advance about the data type. However, this method also has a number of drawbacks, due to which Copy-on-Write is used, as a rule, for objects that are potentially quite large.

The first flaw shows up immediately. Mass dynamic memory allocation seriously slows down program execution, especially mass implicit allocation via new. Yes, I am aware of both std::string and std::vector, which often, without asking the programmer, begin to reallocate memory, calling one new after another (we will get to how std::vector moves its data as well). A good C++ developer always knows about these amusing features of the standard containers and understands how to avoid needless expenses on allocating new memory segments. Pure C has always been good precisely because all work with memory was transparent; in C++ you constantly need to keep in mind a whole series of cases of implicit memory handling.

The second drawback is a consequence of the first. Frequent allocation of small memory segments in large quantities leads to terrible memory fragmentation and to the inability to allocate even a fairly small block in one contiguous piece, for example to initialize that same std::vector or std::string. As a result we get bad_alloc for no apparent reason: there is far more free memory than needed, but in highly fragmented memory a contiguous block of even modest size cannot be carved out.

Thus, for small objects comparable in size to int64_t, which fit easily on the stack, another data-handling technique can and should be used. Such objects can be passed by value and copied as many times as you like, without postponing the copy until the first change, since copying one or two registers is trivial.

At the same time we should not abandon the practice of hiding data details in the implementation. But we do have to sacrifice something: we will need to know the exact size of the data in bytes in advance. It is required so that, alongside the usual data pointer, the class can hold a buffer for placing the object's data. Now for the details.

First class


Outwardly almost nothing changes. It is the same class that provides the object API. The class holds a reference to data whose class is forward-declared and will be defined in the implementation details. Because of this, a class field cannot be declared as an object of that type; however, you can refer to the data type with a plain pointer and set aside a buffer in advance for storing the object's data inside the object itself. If the object is created, say, on the stack, then all of its data will be stored on the stack as part of the object. Let's look at an example so that everything falls into place:

class object
{
public:
    ...
protected:
    // Forward declaration of the data class
    class data;
    // The number of bytes for the data, known in advance
    static const size_t max_data_size = N;
private:
    // Pointer to the data
    data* m_data;
    // The buffer where the data will actually be stored
    char m_buffer[max_data_size];
};

In this piece of code we continue the ideology of hiding data in the implementation: all we know about the data class is its name and the presence of a pointer to it. However, we now have the chance to stay out of the heap entirely. In C++ terms the class still stores data in the form of its fields. In reality the data will be placed in the m_buffer buffer, whose memory was already set aside when the object was created. It remains only to explain how exactly to place the data into a byte buffer.

Placement new


As a rule, few people remember such a useful property of the new operator as the ability to specify a ready-made memory area for placing the created object. All we need to do is write new (m_buffer) to construct an object of any type in the prepared buffer. It sounds simple, but remember that we pay a high price: the maximum buffer size must be specified in advance. Worse, that size ends up in the header file and is explicitly involved in the API declaration.

But we win in speed. If, by allocating data on the heap for every initialization, we risk falling behind Java, then by placing all the data on the stack we get pure C speed, unattainable for almost any high-level language except C++. At the same time the level of abstraction stays extremely high: we build the API on ordinary C++ objects while hiding the implementation details. The only limitation is the size we fixed; we can no longer freely change the set of fields in the data class in the implementation, we always have to keep the size in mind. Moreover, we need to check that the size of the data described in the implementation fits the limit specified in the header file, simply because the library binary may diverge from the version of the header files, for example when they come from different sources. Let's look at how such a check should look:

object::object()
    : m_data(new(m_buffer) object::data)
{
    static_assert(sizeof(object::data) <= max_data_size, "...");
}

Here static_assert really is evaluated at compile time, so m_data will be initialized only if the m_buffer buffer has enough room for object::data. Likewise, in a descendant of object, for example flower, the data must also not exceed the same limit, since we store the data in the base-class buffer.

flower::flower(std::string const& name)
    : object(new(get_buffer()) flower::data(name))
{
    static_assert(sizeof(flower::data) <= max_data_size, "..." );
}

Obviously, this requires a protected get_buffer() method that returns the address of m_buffer in the base class, as well as a protected object constructor taking object::data*. As in the previous installment, we derive the descendants' data classes from the base data class, so flower::data* is compatible with object::data*. For safety it is worth adding to the base constructor taking object::data* a check that the address of the pre-allocated buffer is what was passed in:

object::object(object::data* data_ptr)
{
    if (static_cast<void*>(data_ptr) != static_cast<void*>(m_buffer))
        throw some_exception(...);
    m_data = data_ptr;
}

As a result, as before, we are able to emulate dynamic typing by working with ordinary class objects. For example, like this:

object rose = flower("rose");

Large Data Objects


It remains to figure out what to do with objects whose data size exceeds the stated maximum. Here everything is actually quite simple. It is enough that the size of copy_on_write<data::impl> fits into the limit; this is essentially a wrapper over std::shared_ptr<data::impl>, where impl is a data implementation class of arbitrary size. Since the size of std::shared_ptr<data::impl> does not depend on the size of data::impl objects themselves, we get a universal way to store data, switching from storage by value to storage by reference.

class huge
{
public:
    ...
protected:
    class data;
};
class huge::data
{
public:
    ...
protected:
    class impl;
private:
    copy_on_write<impl> m_impl;
};

However, let's step away from the problem of a single API for dynamically typed objects and consider another example of optimization via placement new.

copy_on_write::flashback


For those who skipped the previous installment: the copy_on_write class is a template class for storing data with copy optimization. Emulating a pointer, this class has a tricky operator-> overloaded for the const and non-const cases. When objects are copied, they refer to the same data, avoiding an expensive copy. However, as soon as we call a non-const method of the data class, one that can potentially modify the data, we detach a private copy of the data for the current object. A simplified implementation looks something like this:

template <typename impl_type>
class copy_on_write
{
public:
    copy_on_write(impl_type* pimpl)
        : m_pimpl(pimpl) {
    }
    impl_type* operator -> () {
        if (!m_pimpl.unique())
            m_pimpl.reset(new impl_type(*m_pimpl));
        return m_pimpl.get();
    }
    impl_type const* operator -> () const {
        return m_pimpl.get();
    }
private:
    std::shared_ptr<impl_type> m_pimpl;
};

Thus, when choosing the maximum data size for the built-in buffer, it is worth accounting for the size of the class that contains copy_on_write as a field.

Data Fields


The most powerful way to optimize with placement new is for the fields of records selected by an SQL query. A query returns a dataset of widely varying types, from integers and reals to strings and arrays. Although the data itself arrives dynamically and the field types reported by the database have to be handled with dynamic-typing emulation, all records share the same set of field types, from which the total data size of each record can be computed. This lets us allocate memory for a record's fields only once, computing the size from the types of the fields that make up each record in the selection. One could even allocate memory once for all records in a single block; however, a fetched record set usually then undergoes all kinds of operations, including filtering and sorting, so it makes sense to describe the records themselves as Copy-on-Write objects for the convenience of those later operations. What is unjustifiably expensive is allocating heap memory for every single field.

This is how our record class will look if we simplify the declaration and use copy_on_write directly on the data class:

class record
{
public:
    record(std::vector<field::type> const& types);
    ...
protected:
    class data;
private:
    copy_on_write<data> m_data;
};
class record::data
{
public:
    data(std::vector<field::type> const& types);
    ...
private:
    std::vector<char> m_buffer;
    std::vector<field*> m_fields;
};

Here, to simplify the explanation, a vector of field types std::vector<field::type>, an array of enum values, is introduced. In reality this array should be typed from the arguments via boost::fusion or, using Boost.Preprocessor, assembled into an array of generic objects of type object from arguments of any type. What matters to us now is the mechanism of a single heap allocation per record.

record::data::data(std::vector<field::type> const& types)
    : m_buffer(field::calc_size(types)),
      m_fields(types.size())
{
    size_t offset = 0;
    for (size_t i = 0; i < types.size(); ++i)
    {
        // Construct each field inside the single pre-allocated buffer
        m_fields[i] = new(m_buffer.data() + offset) field(types[i]);
        offset += field::size(types[i]);
    }
}

where field::size computes the data size from the passed field::type, and field::calc_size computes the total size needed for the record's entire set of types, passed as std::vector<field::type>.

The field class is implemented much like the object type, essentially a container of dynamic content. Most types (int64_t, bool, double) are scalars and are stored by value. The std::string type can also be stored by value; however, keep in mind that the string's character data will almost certainly live on the heap and be allocated dynamically. If you want to support a varchar of fixed length, you will most likely need your own copy_on_write-like type holding a fixed-length character array.

The different field types are analogous to the different object types inherited from the object class. One could even avoid enum and bind directly to types, but as a rule parsing an SQL query result means deserializing a packet of bytes whose field types are all known in advance, so enum imposes no restrictions here. Besides, metaprogramming is not for the faint of heart, and we will not touch MPL and Boost.Fusion here.

It remains to cover the last important aspect of using placement new: the pool of objects of a single type in C++.

Pool of objects of the same type


As before, we are optimizing dynamic memory allocation. What is an object pool? It is an array of blanks, pre-allocated in one big batch, for initializing objects of a certain type. In a sense, the record above was a pool for field objects. You have probably also met object pools if you have worked with high-level languages (C#, Python, Java): they allocate new objects inside prepared memory segments, where essentially everything is of type object. Once a pool object becomes unneeded, in other words nothing references it anymore, it is either de-initialized immediately or awaits its sad fate in the next pass of the Garbage Collector, a special mechanism for removing ownerless goods. Generally speaking, de-initialization of objects is a pool's weak point; in return we get fast allocation of objects that are usually either already initialized or prepared for initialization. If, on top of our object type, we were to build a full-fledged object pool with de-initialization by reference count and a Garbage Collector, we would end up with Java or Python; if that is what you need, perhaps you should not reinvent the wheel and should simply take a ready-made language with garbage collection. However, if optimization requires pre-allocating a large memory segment for objects of one type, and the task really does involve mass initialization of a large number of objects with a common base class, then an object pool will save us from a mass of dynamic memory allocations.

To make this concrete, we need a clear applied example. How about an actual SQL query selection with a pool for its records? This will optimize away a mass of memory allocations when constructing the selection's record objects.

class selection
{
public:
    selection(std::vector<field::type> const& types,
              size_t row_count);
    ...
protected:
    class data;
private:
    copy_on_write<data> m_data;
};
class selection::data
{
public:
    data(std::vector<field::type> const& types,
         size_t row_count);
    ...
private:
    std::vector<field::type> m_types;
    std::vector<char> m_buffer;
    std::vector<record> m_rows;
};
selection::data::data(std::vector<field::type> const& types,
                      size_t row_count)
    : m_types(types)
{
    if (!row_count) return;
    m_rows.reserve(row_count);
    size_t row_data_size = field::calc_size(types);
    m_buffer.resize(row_count * row_data_size);
    char* offset = m_buffer.data();
    for (size_t i = 0; i < row_count; ++i)
    {
        m_rows.push_back(record::inplace(offset, types));
        offset += row_data_size;
    }
}

Here record::inplace essentially creates the record's data not on the heap but at the given address.

record record::inplace(void* address,
                       std::vector<field::type> const& types)
{
    return record(new(address) record::data(types));
}

We need a record constructor with such initialization and a special destructor; more on that below. This way of initializing record makes it impossible to use the earlier variant, that is, a class containing only a copy_on_write field. We can no longer, calmly relying on dynamic heap allocation of the data, shuffle records around however we like. On the other hand, we get a crazy performance boost on large datasets. However, placement new has a catch you must be aware of.

Explicit destructor call


WARNING


If you have a habit of not reading to the end, or of reading diagonally, that is a big mistake here. By skipping this important section, you can spawn memory leaks, and in large quantities.

There is one more "but" when using placement new: you have to call the destructor yourself, manually, because delete will not do it for you. Therefore a class containing data placed into pre-prepared memory must, in its own destructor, explicitly call the destructor of the class constructed in that memory. That is, the destructor object::~object must explicitly call the object::data::~data destructor, and the record::data::~data destructor will in turn have to call a series of field::~field destructors, one per field. To show clearly how this should happen, I will describe the object class in more detail.

class object
{
public:
    object();
    virtual ~object();
    ...
protected:
    class data;
    char* get_buffer();
    object(data* derived_data);
    static const size_t max_data_size = N;
private:
    data* m_data;
    char m_buffer[max_data_size];
};
object::object()
    : m_data(new(m_buffer) data)
{
    static_assert(sizeof(data) <= max_data_size, "...");
}
object::~object()
{
    m_data->~data();
}

Since the destructor of the data class is declared virtual, de-initialization will succeed no matter which descendant of object::data is actually used.

We also need to define the copy constructor and copy assignment operator, as well as their move counterparts, because unlike the copy_on_write case, where the auto-generated versions sufficed, here each object points at its own data area with a plain pointer. So let's correct the default behavior:

object::object(object const& another)
    : m_data(another.clone_data_at(m_buffer))
{
}
object& object::operator = (object const& another)
{
    if (this != &another)
    {
        destruct_data(); // the old data must be destroyed here
        m_data = another.clone_data_at(m_buffer);
    }
    return *this;
}
object::data* object::clone_data_at(void* address) const
{
    return m_data->clone_at(address);
}
// This method must be overridden
// for every derived data type
object::data* object::data::clone_at(void* address) const
{
    return new(address) data(*this);
}
void object::destruct_data()
{
    m_data->~data();
}

Here our new destruct_data() method is also wanted in the destructor object::~object. Since it is needed there anyway, it has earned its place. For the move constructor and move assignment operator the behavior is similar:

object::object(object&& another)
    : m_data(another.move_data_to(m_buffer))
{
}
object& object::operator = (object&& another)
{
    if (this != &another)
    {
        destruct_data(); // the old data must be destroyed here
        m_data = another.move_data_to(m_buffer);
    }
    return *this;
}
object::data* object::move_data_to(void* address)
{
    return m_data->move_to(address);
}
// This method must be overridden
// for every derived data type
object::data* object::data::move_to(void* address)
{
    return new(address) data(std::move(*this));
}
object::~object()
{
    destruct_data();
}

With that, the danger of memory leaks is eliminated, and the users of your API can build on it with peace of mind.

Placement new vs. heap new


As you have noticed, classes using placement new are considerably harder to implement. Every aspect of a class built on the technique of placing objects into prepared memory should be thoroughly tested. By contrast, the complexity of an ordinary new of any class usually comes down to wrapping it in a smart pointer. So what is the benefit, if even emulating dynamic typing is complicated by having to specify the maximum data type size explicitly?

The gain in speed. The power of C++ compared to the more convenient C#, Java and Python lies in execution speed. Here we reach top speed, because we never go to the heap for new objects. Nor do we slow the application down later, since we avoid memory fragmentation. Fragmented memory is like cheese: it is full of holes, and in total those holes could hold an orange, but in practice the orange fits into none of them, each is too small. In the same way std::vector and std::string, which require a contiguous memory segment, can one day get std::bad_alloc when reallocating their elements.

Placement new in the standard library


Remember, at the beginning of the article I promised to talk about placement new in std::vector? Well then: all element constructors in std::vector are invoked in prepared memory, and destructors are actively invoked on elements too. For vectors of simple POD types like int or char this does not matter, but if we create a std::vector of a custom type with a non-trivial, heavy default constructor and an equally heavy copy constructor, we are in for a lot of trouble if we do not know how std::vector works with its data.

So what happens when we ask a vector to resize? First the vector checks whether it has already reserved the required number of bytes (a vector always allocates its buffer with a margin); if not, it allocates a new buffer. All existing elements are transferred into the new buffer by the move constructor, via placement new at the corresponding address. Once all elements are in place, the vector constructs the required number of elements at the end of the array, creating each one with placement new and the default constructor. It works the same in the opposite direction: shrinking the element count calls destructors "manually" for the removed elements.

Unlike std::vector, the std::string container does not bother with placement new, simply because it always stores char, which needs neither constructors nor destructors. But a number of standard library containers, deque, list, map and other class templates for storing arbitrary data, actively use placement new in their implementations.

Do not think of placement new as something akin to a hack; it is a full-fledged language feature that allows you to initialize an object with a constructor in the specified memory. The operation resembles the old C language trick where an allocated block of bytes was declared a pointer to some type (usually a structure) and all further work with that memory block went through the API of that type.

What is the result?


Of course, the ability to use placement new where necessary, and only where it is genuinely needed, effectively and justifiably, does not come at once. Some resist to the last, citing the evils of premature optimization; others, on the contrary, will rush right after reading this article to embed new (m_buffer) everywhere, needed or not. In time, both arrive at the golden mean.

The essence of the method is simple: if you can and should place a class object into prepared memory, it is relatively easy to do, as long as you remember a couple of simple rules:

  • the memory must stay alive for as long as the object lives in it; if the memory is reclaimed or overwritten, the object ends up referring to a corrupted memory segment;
  • the destructor of an object constructed by placement new must be called manually; it is sad, but delete must not be used on such a pointer, so nothing will call the destructor for you.

Everything else is limited only by the accuracy and boundless imagination of the developer. That is you.


First published in Hacker Magazine # 190.
Posted by Vladimir Qualab Kerimov, Lead C++ Developer at Parallels

