std::shared_ptr and a custom allocator

Who among us doesn't love refactoring? I think each of us has, more than once, discovered something new while refactoring old code, or recalled something important but long forgotten. Recently, having refreshed my knowledge of how std::shared_ptr works with a custom allocator, I decided not to forget these details again. Everything I managed to refresh is collected in this article.


One of the projects required performance optimization. Profiling pointed to a large number of calls to the new/delete operators and the corresponding malloc/free calls. These not only cause expensive locking in a multi-threaded environment by themselves, but can also trigger heavy functions such as malloc_consolidate at the most unexpected moment. The large number of dynamic memory operations was caused by intensive use of std::shared_ptr smart pointers.


There were not many classes whose objects were created this way, and I did not really want to rewrite the application. So I decided to explore the object pool pattern: keep using shared_ptr, but rework the memory allocation mechanism to get rid of the intensive acquisition and release of dynamic memory.


Replacing the standard malloc implementation with alternatives (tcmalloc, jemalloc) was not considered. In my experience, such a replacement did not fundamentally affect performance, while the change would affect the entire program, with possible side effects.


Later the idea evolved into a custom memory pool plus a special allocator. In my case, the advantage of a memory pool over an object pool is transparency for the calling code. With the allocator, objects are placed into already allocated memory (using placement new) with the corresponding constructor call, and are destroyed by explicit destructor calls. So none of the extra steps typical of an object pool are needed: neither initializing an object when it is taken from the pool, nor resetting it to its initial state before it is returned.
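The following is a minimal sketch of the mechanism just described, independent of the project's code. An object is constructed with placement new in memory that already exists and is destroyed with an explicit destructor call. The same memory is then reused for another object. The `widget` type and the `reuse_storage` function are hypothetical names used only for illustration.

```cpp
#include <new>

// Hypothetical type that tracks how many instances are alive.
struct widget {
    static int live;
    explicit widget(int v) : value(v) { ++live; }
    ~widget() { --live; }
    int value;
};
int widget::live = 0;

// Constructs objects in memory that already exists, mimicking what a pool-backed
// allocator does: no heap allocation happens for the widgets themselves.
int reuse_storage(int first, int second) {
    alignas(widget) unsigned char storage[sizeof(widget)];          // "pre-allocated" memory
    widget* w = ::new (static_cast<void*>(storage)) widget(first);  // placement new: only runs the constructor
    w->~widget();                                                   // explicit destructor; storage is not freed
    w = ::new (static_cast<void*>(storage)) widget(second);         // the same memory is reused
    int result = w->value;
    w->~widget();
    return result;
}
```

After `reuse_storage(1, 2)` returns, it yields 2 and `widget::live` is back to 0: every constructor call was matched by an explicit destructor call.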


Below I go through the interesting details of memory handling with shared_ptr that I worked out for myself. To avoid overloading the text, the code is simplified and resembles the real project only in the most general terms. The focus is not on the allocator implementation itself, but on how std::shared_ptr works with a custom allocator.


The existing code created pointers with std::make_shared:


auto ptr = std::make_shared<foo_struct>();

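As is well known, this way of creating a pointer avoids some potential memory leaks. For example, before C++17, a call such as f(std::shared_ptr<foo_struct>(new foo_struct), g()) could evaluate new foo_struct, then g(), and leak the object if g() threw. Such leaks can occur when a pointer is created in the plain, old-fashioned way (although this form is justified in some cases, for example when you need to pass a deleter):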


auto ptr = std::shared_ptr<foo_struct>(new foo_struct);
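When a deleter must be supplied, you need the constructor form, because std::make_shared has no deleter parameter. Below is a small sketch with a hypothetical `counting_deleter` standing in for real cleanup logic. `foo_struct` here is a simplified stand-in for the article's struct. The deleter is stored alongside the counts and runs exactly once, when the last owner goes away.

```cpp
#include <memory>

struct foo_struct { int value = 1; };   // simplified stand-in for the article's struct

// Hypothetical deleter: counts its invocations, then frees the object as usual.
struct counting_deleter {
    int* calls;
    void operator()(foo_struct* p) const {
        ++*calls;
        delete p;
    }
};

// Returns how many times the deleter ran after all owners were released.
int deleter_calls_after_reset() {
    int calls = 0;
    std::shared_ptr<foo_struct> ptr(new foo_struct, counting_deleter{&calls});
    std::shared_ptr<foo_struct> copy = ptr;   // same control block, same stored deleter
    ptr.reset();                              // one owner left: deleter not called yet
    copy.reset();                             // last owner gone: deleter runs once
    return calls;
}
```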

The key to how std::shared_ptr handles memory is the way its control block is created. As we know, the control block is the special structure that makes the pointer smart, and memory honestly has to be allocated for it too.
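Among other things, the control block holds the strong and weak reference counts shared by all copies of a pointer. The sketch below uses only the public API (`use_count`, `std::weak_ptr`) to illustrate this. All copies share one count, and the control block outlives the object while a weak_ptr still refers to it. `observe_control_block` and `control_block_observation` are hypothetical names.

```cpp
#include <memory>

struct foo_struct { int value = 1; };   // simplified stand-in for the article's struct

struct control_block_observation {
    long owners_while_shared;   // use_count() with two shared_ptr copies alive
    bool expired_after_reset;   // weak_ptr state once both owners are gone
};

control_block_observation observe_control_block() {
    auto a = std::make_shared<foo_struct>();
    auto b = a;                       // copy: shares a's control block, strong count becomes 2
    std::weak_ptr<foo_struct> w = a;  // the weak count lives in the same control block
    long owners = a.use_count();
    a.reset();
    b.reset();                        // strong count 0: foo_struct is destroyed,
                                      // but the control block survives while w refers to it
    return {owners, w.expired()};
}
```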


Full control over memory usage with std::shared_ptr is provided by std::allocate_shared, which accepts your own allocator:


auto ptr = std::allocate_shared<foo_struct>(allocator);

If you override the global new and delete operators, you can see how much memory is allocated for the structure from the example:



#include <cstdint>
#include <iostream>

struct foo_struct
{
    foo_struct()
    {
        std::cout << "foo_struct()" << std::endl;
    }
    ~foo_struct()
    {
        std::cout << "~foo_struct()" << std::endl;
    }
    std::uint64_t value1 = 1;
    std::uint64_t value2 = 2;
    std::uint64_t value3 = 3;
    std::uint64_t value4 = 4;
};
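Here is a sketch of such an override. Instead of printing sizes and addresses, it only counts allocations; the `g_allocations` counter and the helper functions are simplifications of mine. The standard only recommends that make_shared perform a single allocation. The expected numbers (two for the constructor form, one for make_shared, one for allocate_shared with std::allocator) therefore describe mainstream implementations such as libstdc++, not a guarantee.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <memory>
#include <new>

// Number of global operator new calls since the last reset (hypothetical instrumentation).
static int g_allocations = 0;

// Replacement global allocation functions: count, then delegate to malloc/free.
void* operator new(std::size_t size) {
    ++g_allocations;
    if (void* p = std::malloc(size ? size : 1)) return p;
    throw std::bad_alloc();
}
void operator delete(void* p) noexcept { std::free(p); }
void operator delete(void* p, std::size_t) noexcept { std::free(p); }

struct foo_struct {   // same layout as in the article, without the logging
    std::uint64_t value1 = 1, value2 = 2, value3 = 3, value4 = 4;
};

// Counts allocations made while one pointer is created and destroyed.
template <class Factory>
int count_allocations(Factory make) {
    g_allocations = 0;
    {
        auto ptr = make();
        (void)ptr;
    }
    return g_allocations;
}

int allocations_construct_shared() {   // object and control block allocated separately
    return count_allocations([] { return std::shared_ptr<foo_struct>(new foo_struct); });
}
int allocations_make_shared() {        // object embedded in the control block
    return count_allocations([] { return std::make_shared<foo_struct>(); });
}
int allocations_allocate_shared() {    // same, but memory comes from the given allocator
    return count_allocations([] {
        return std::allocate_shared<foo_struct>(std::allocator<foo_struct>());
    });
}
```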

Take, for example, the simplest allocator:


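#include <cstddef>
#include <new>

template <class T>
struct custom_allocator {
    typedef T value_type;
    custom_allocator() noexcept {}
    // Converting constructor: builds an allocator for T from one for U (used by rebinding).
    template <class U> custom_allocator(const custom_allocator<U>&) noexcept {}
    T* allocate(std::size_t n) {
        return static_cast<T*>(::operator new(n * sizeof(T)));
    }
    void deallocate(T* p, std::size_t) noexcept {
        ::operator delete(p);
    }
};

// Stateless allocator: all instances are interchangeable.
template <class T, class U>
bool operator==(const custom_allocator<T>&, const custom_allocator<U>&) noexcept { return true; }
template <class T, class U>
bool operator!=(const custom_allocator<T>&, const custom_allocator<U>&) noexcept { return false; }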

The output for the three ways of creating a pointer:
---- Construct shared ----
operator new: size = 32 p = 0x1742030
foo_struct()
operator new: size = 24 p = 0x1742060
~foo_struct()
operator delete: p = 0x1742030
operator delete: p = 0x1742060
---- Construct shared ----

---- Make shared ----
operator new: size = 48 p = 0x1742080
foo_struct()
~foo_struct()
operator delete: p = 0x1742080
---- Make shared ----

---- Allocate shared ----
operator new: size = 48 p = 0x1742080
foo_struct()
~foo_struct()
operator delete: p = 0x1742080
---- Allocate shared ----

An important feature of both std::make_shared and a custom allocator with shared_ptr is something that seems insignificant at first glance: memory for both the object itself and the control block is allocated at once, in a single allocator call. Books mention this often, but it does not really stick until you run into it in practice.


If you overlook this, the system's behavior when creating the pointer looks rather strange. We intend the allocator to allocate memory for the specific object the pointer will point to. In reality, though, the allocation request is larger than the object, and the type of the allocator actually used does not match the one we passed in.


You can verify this by adding a little debug output to the allocator:
---- Allocate shared ----
Allocating: std::_Sp_counted_ptr_inplace<foo_struct, custom_allocator<foo_struct>, (__gnu_cxx::_Lock_policy)2>
operator new: size = 48 p = 0x1742080
foo_struct()
~foo_struct()
Deallocating: std::_Sp_counted_ptr_inplace<foo_struct, custom_allocator<foo_struct>, (__gnu_cxx::_Lock_policy)2>
operator delete: p = 0x1742080
---- Allocate shared ----

The memory is allocated not for a foo_struct object, or, more precisely, not only for foo_struct.
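This can be confirmed without reading implementation-specific type names. The debug output is reduced to three facts recorded inside allocate(): how often it was called, how many bytes were requested, and which type T it was instantiated for. The sketch below uses hypothetical names (`logging_allocator`, `allocation_log`). It checks only that a single call requests more than sizeof(foo_struct) bytes for a type other than foo_struct, since the exact control block type and size differ between standard libraries.

```cpp
#include <cstddef>
#include <cstdint>
#include <memory>
#include <new>
#include <typeinfo>

struct foo_struct {
    std::uint64_t value1 = 1, value2 = 2, value3 = 3, value4 = 4;
};

// What the allocator observed during one allocate_shared call (hypothetical helper).
struct allocation_log {
    int calls = 0;                          // number of allocate() calls
    std::size_t bytes = 0;                  // total bytes requested
    const std::type_info* type = nullptr;   // T of the last allocate() call
};

template <class T>
struct logging_allocator {
    using value_type = T;
    allocation_log* log;

    explicit logging_allocator(allocation_log* l) noexcept : log(l) {}
    // Converting constructor used when the allocator is rebound; keeps the same log.
    template <class U>
    logging_allocator(const logging_allocator<U>& other) noexcept : log(other.log) {}

    T* allocate(std::size_t n) {
        ++log->calls;
        log->bytes += n * sizeof(T);
        log->type = &typeid(T);   // the rebound type, not necessarily foo_struct
        return static_cast<T*>(::operator new(n * sizeof(T)));
    }
    void deallocate(T* p, std::size_t) noexcept { ::operator delete(p); }
};

template <class T, class U>
bool operator==(const logging_allocator<T>& a, const logging_allocator<U>& b) noexcept {
    return a.log == b.log;
}
template <class T, class U>
bool operator!=(const logging_allocator<T>& a, const logging_allocator<U>& b) noexcept {
    return !(a == b);
}

allocation_log inspect_allocate_shared() {
    allocation_log log;
    {
        auto ptr = std::allocate_shared<foo_struct>(logging_allocator<foo_struct>(&log));
        (void)ptr;
    }
    return log;
}
```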


Everything falls into place once we recall the std::shared_ptr control block. If you add a bit more debug output to the allocator's converting copy constructor, you can see the type of the object being created.


The output:
---- Allocate shared ----
sizeof control_block_type: 48
sizeof foo_struct: 32
custom_allocator<T>::custom_allocator(const custom_allocator<U>&): 
    T: std::_Sp_counted_ptr_inplace<foo_struct, custom_allocator<foo_struct>, (__gnu_cxx::_Lock_policy)2>
    U: foo_struct
Allocating: std::_Sp_counted_ptr_inplace<foo_struct, custom_allocator<foo_struct>, (__gnu_cxx::_Lock_policy)2>
operator new: size = 48 p = 0x1742080
foo_struct()
~foo_struct()
custom_allocator<T>::custom_allocator(const custom_allocator<U>&): 
    T: std::_Sp_counted_ptr_inplace<foo_struct, custom_allocator<foo_struct>, (__gnu_cxx::_Lock_policy)2>
    U: foo_struct
Deallocating: std::_Sp_counted_ptr_inplace<foo_struct, custom_allocator<foo_struct>, (__gnu_cxx::_Lock_policy)2>
operator delete: p = 0x1742080
---- Allocate shared ----

This is allocator rebinding at work: obtaining an allocator for one type from an allocator for another type. The same "trick" is used not only in std::shared_ptr but also in other standard library classes such as std::list or std::map, where the object actually stored differs from the user's type. The required variant is constructed from the original allocator so that it allocates the right amount of memory.
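Rebinding is exposed through std::allocator_traits. `rebind_alloc<U>` names the allocator type for U, and the converting constructor creates an instance of it from the original allocator. The sketch below first checks the type mapping for std::allocator at compile time. It then uses a hypothetical `recording_allocator` to show that std::list requests node-sized blocks rather than sizeof(int). The node layout is implementation-specific, so only a lower bound of one int plus two links is assumed.

```cpp
#include <cstddef>
#include <list>
#include <memory>
#include <new>
#include <type_traits>

// Compile-time view of rebinding: std::allocator<int> rebound to double.
using int_alloc = std::allocator<int>;
using double_alloc = std::allocator_traits<int_alloc>::rebind_alloc<double>;
static_assert(std::is_same<double_alloc, std::allocator<double>>::value,
              "rebind_alloc<double> of allocator<int> is allocator<double>");

// Hypothetical allocator that remembers the largest single request, in bytes.
template <class T>
struct recording_allocator {
    using value_type = T;
    std::size_t* largest;

    explicit recording_allocator(std::size_t* l) noexcept : largest(l) {}
    // Converting constructor: used to build the rebound allocator from the original one.
    template <class U>
    recording_allocator(const recording_allocator<U>& other) noexcept : largest(other.largest) {}

    T* allocate(std::size_t n) {
        std::size_t bytes = n * sizeof(T);
        if (bytes > *largest) *largest = bytes;
        return static_cast<T*>(::operator new(bytes));
    }
    void deallocate(T* p, std::size_t) noexcept { ::operator delete(p); }
};

template <class T, class U>
bool operator==(const recording_allocator<T>& a, const recording_allocator<U>& b) noexcept {
    return a.largest == b.largest;
}
template <class T, class U>
bool operator!=(const recording_allocator<T>& a, const recording_allocator<U>& b) noexcept {
    return !(a == b);
}

// Inserts one int into a list and reports the largest block the list asked for.
std::size_t list_node_request() {
    std::size_t largest = 0;
    recording_allocator<int> alloc(&largest);
    {
        std::list<int, recording_allocator<int>> values(alloc);
        values.push_back(42);
    }
    return largest;
}
```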


So, with a custom allocator, memory is allocated for both the control block and the object itself, all in a single call. This must be taken into account when designing the allocator, especially if it hands out memory pre-allocated in fixed-length blocks. The problem here is to correctly determine the size of the block that the allocator will actually be asked for.
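Here is a deliberately simplified sketch of that situation, not the project's allocator. It is a single-threaded pool of fixed-size blocks with a fixed capacity. The 128-byte block size is a hypothetical "deliberately large" value. If the rebound type does not fit into one block, the allocator refuses with std::bad_alloc instead of silently overrunning the block. `fixed_block_pool`, `pool_allocator` and `pool_usage` are illustrative names.

```cpp
#include <cstddef>
#include <cstdint>
#include <memory>
#include <new>
#include <utility>
#include <vector>

// Hypothetical pool of fixed-size blocks: single-threaded, fixed capacity, no growth.
class fixed_block_pool {
public:
    static constexpr std::size_t block_size = 128;   // assumption: a "deliberately large" size

    explicit fixed_block_pool(std::size_t count) : blocks_(count) {
        free_.reserve(count);                         // so release() never reallocates
        for (block& b : blocks_) free_.push_back(&b);
    }
    void* acquire() {
        if (free_.empty()) throw std::bad_alloc();
        block* b = free_.back();
        free_.pop_back();
        return b;
    }
    void release(void* p) noexcept { free_.push_back(static_cast<block*>(p)); }
    std::size_t available() const noexcept { return free_.size(); }

private:
    struct alignas(std::max_align_t) block { unsigned char bytes[block_size]; };
    std::vector<block> blocks_;
    std::vector<block*> free_;
};

template <class T>
struct pool_allocator {
    using value_type = T;
    fixed_block_pool* pool;

    explicit pool_allocator(fixed_block_pool* p) noexcept : pool(p) {}
    template <class U>
    pool_allocator(const pool_allocator<U>& other) noexcept : pool(other.pool) {}

    T* allocate(std::size_t n) {
        // The rebound T (for allocate_shared: control block plus object) must fit one block.
        if (n != 1 || sizeof(T) > fixed_block_pool::block_size ||
            alignof(T) > alignof(std::max_align_t))
            throw std::bad_alloc();
        return static_cast<T*>(pool->acquire());
    }
    void deallocate(T* p, std::size_t) noexcept { pool->release(p); }
};

template <class T, class U>
bool operator==(const pool_allocator<T>& a, const pool_allocator<U>& b) noexcept {
    return a.pool == b.pool;
}
template <class T, class U>
bool operator!=(const pool_allocator<T>& a, const pool_allocator<U>& b) noexcept {
    return !(a == b);
}

struct foo_struct {
    std::uint64_t value1 = 1, value2 = 2, value3 = 3, value4 = 4;
};

// Free blocks while one pointer is alive, and after it has been released.
std::pair<std::size_t, std::size_t> pool_usage() {
    fixed_block_pool pool(4);
    std::size_t during = 0;
    {
        auto ptr = std::allocate_shared<foo_struct>(pool_allocator<foo_struct>(&pool));
        during = pool.available();
    }
    return {during, pool.available()};
}
```

The weak point is exactly the one described above. The 128 bytes are a guess that must exceed the control block size on every toolchain in use.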


Determining the size of a memory block

So far I have not found anything better than either a deliberately large value or a completely non-portable approach:


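// libstdc++ internals: not portable across standard libraries or versions.
using control_block_type = std::_Sp_counted_ptr_inplace<foo_struct, custom_allocator<foo_struct>, (__gnu_cxx::_Lock_policy)2>;
static constexpr std::size_t block_size = sizeof(control_block_type);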

By the way, the size of the control block varies between compiler versions.
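One possible direction, offered as my own suggestion and not as the author's solution, relies on the fact that inside the rebound allocator T already is the control block type. That makes sizeof(T) and alignof(T) compile-time constants there, so a pool can be chosen per block size inside allocate() without ever naming _Sp_counted_ptr_inplace. Below is a minimal, single-threaded sketch; `typed_pool`, `pool_for` and `per_type_allocator` are hypothetical names.

```cpp
#include <cstddef>
#include <cstdint>
#include <memory>
#include <new>

// Hypothetical per-size free list. Blocks come from the heap on demand and, once
// released, are kept for reuse; they go back to the heap only at program exit.
template <std::size_t Size, std::size_t Align>
class typed_pool {
    struct node { node* next; };
    static_assert(Size >= sizeof(node) && Align >= alignof(node), "block too small for a free-list link");
    node* head_ = nullptr;

public:
    static typed_pool& instance() {
        static typed_pool pool;   // one pool per (Size, Align) pair; not thread-safe
        return pool;
    }
    void* acquire() {
        if (!head_) return ::operator new(Size, std::align_val_t{Align});
        node* n = head_;
        head_ = n->next;
        return n;
    }
    void release(void* p) noexcept { head_ = ::new (p) node{head_}; }
    ~typed_pool() {
        while (head_) {
            node* next = head_->next;
            ::operator delete(head_, std::align_val_t{Align});
            head_ = next;
        }
    }
};

// Pool chosen from the rebound type's own size and alignment (at least pointer-sized).
template <class T>
using pool_for = typed_pool<(sizeof(T) < sizeof(void*) ? sizeof(void*) : sizeof(T)),
                            (alignof(T) < alignof(void*) ? alignof(void*) : alignof(T))>;

template <class T>
struct per_type_allocator {
    using value_type = T;
    per_type_allocator() noexcept = default;
    template <class U>
    per_type_allocator(const per_type_allocator<U>&) noexcept {}

    T* allocate(std::size_t n) {
        if (n == 1) return static_cast<T*>(pool_for<T>::instance().acquire());
        return static_cast<T*>(::operator new(n * sizeof(T), std::align_val_t{alignof(T)}));
    }
    void deallocate(T* p, std::size_t n) noexcept {
        if (n == 1) pool_for<T>::instance().release(p);
        else ::operator delete(p, std::align_val_t{alignof(T)});
    }
};

template <class T, class U>
bool operator==(const per_type_allocator<T>&, const per_type_allocator<U>&) noexcept { return true; }
template <class T, class U>
bool operator!=(const per_type_allocator<T>&, const per_type_allocator<U>&) noexcept { return false; }

struct foo_struct {
    std::uint64_t value1 = 1, value2 = 2, value3 = 3, value4 = 4;
};

// Creates and destroys one pointer, then creates another; returns true if the second
// one landed in the block released by the first (i.e. the pool reused it).
bool second_pointer_reuses_block() {
    std::uintptr_t first = 0;
    {
        auto p = std::allocate_shared<foo_struct>(per_type_allocator<foo_struct>());
        first = reinterpret_cast<std::uintptr_t>(p.get());
    }
    auto q = std::allocate_shared<foo_struct>(per_type_allocator<foo_struct>());
    return reinterpret_cast<std::uintptr_t>(q.get()) == first;
}
```

Because the pool is selected from the rebound type, this keeps working when the control block size changes between compiler versions. The trade-off is one pool per distinct block size instead of a single pool sized up front, and the sketch requires C++17 for the aligned operator new.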


I would be grateful for any hints on how to solve this problem more elegantly.


In conclusion, I'd like to repeat that an important benefit of the alternative allocator was the ability to optimize without serious changes to the existing code or to the interface for working with objects. And of course, don't forget to periodically refresh your memory of the subtle aspects of your programming language!


The source code for the example is on GitHub.


Thank you for your attention!

