tangro September 25, 2014 at 14:05

folly :: fbvector - improved std :: vector from Facebook

Original author: Helios Alonso Cabanillas

Transfer
Tutorial

Folly is an open C ++ library developed by Facebook and used by it in internal projects. In order to optimize the memory and processor resources, the library includes its own implementations of some standard containers and algorithms. One of them is folly :: fbvector - a replacement for the standard vector (std :: vector). The implementation from Facebook is fully compatible with the original std :: vector interface, changes are always non-negative, almost always measurable, often significantly, and sometimes even tremendously affect performance and / or memory consumption. Just include the header file folly / FBVector.h and replace std :: vector with folly :: fbvector to use it in your code.

Example

folly::fbvector numbers({0, 1, 2, 3});
numbers.reserve(10);
for (int i = 4; i < 10; i++) {
  numbers.push_back(i * 2);
}
assert(numbers[6] == 12);

Motivation

std :: vector is a well-established abstraction, which many use for dynamically allocated arrays in C ++. It is also the most famous and most used container. The big surprise is that its standard implementation leaves a lot of opportunities to improve the efficiency of using the vector. This document explains how the implementation of folly :: fbvector improves some aspects of std :: vector. You can use the tests from folly / test / FBVectorTest.cpp to compare the performance of std :: vector and folly :: fbvector.

Memory management

It is known that std :: vector grows exponentially (with a constant growth coefficient). An important nuance in the correct choice of this constant. Too small a number will lead to frequent sales, too large a number will eat up more memory than is really necessary. Stepanov’s initial implementation used constant 2 (that is, every time a push_back call required a vector extension, its capacity doubled).

Over time, some compilers reduced this constant to 1.5, but gcc persistently uses 2. In fact, it can be mathematically proved that the constant 2 is strictly the worst of all possible values, because it never allows you to reuse the memory previously allocated by the vector.

To understand why this is so, imagine a vector of size n bytes, already completely filled, into which they are trying to add another element. This leads, according to the implementation of std :: vector, to the following steps:

Allocated memory size of 2 * n bytes. Most likely, it will be allocated in the address space FOR the current vector (it may be directly behind, maybe after some period).
The old vector is copied to the beginning of the new.
A new element is added to the new vector.
The old vector is deleted.

When it comes to increasing this new vector, it will be increased to 4 * n, then to 8 * n, etc. With each new allocation of memory, it will be required more than for all previous vectors combined, since at each step k we need (2 ^ k) * n bytes of memory, and this is always more than the sum (2 ^ (k - 1)) * n + (2 ^ (k - 2)) * n ... + ((2 ^ 0)) * n

i.e. the memory manager can never reuse the memory previously allocated for the vector and will always be forced to allocate “new” memory. It is unlikely that this is really what we want. So, any number less than 2 guarantees that at some step of increasing the capacity of the vector we can not allocate memory in the new address space, but reuse the memory already allocated to this vector.

The graph shows that if the capacity increase constant is 1.5 (blue line), the memory can be reused after the 4th step of increasing the vector, at 1.45 (red line) after the third, and at 1.3 (black line) after the second.

Of course, the graph above makes a few simplifications about how memory allocators work in the real world, but the fact that the selected gcc constant 2 is theoretically the worst case does not change. fbvector uses the constant 1.5. And it does not hit performance on small vector sizes, due to the way fbvector interacts with jemalloc.

Interaction with jemalloc

Almost all modern memory allocators allocate memory with certain quanta, chosen so as to minimize the overhead of managing memory and at the same time give good performance on small sizes of allocated blocks. For example, an allocator can choose blocks like this: 32, 64, 128, ... 4096, then in proportion to the page size up to 1 MB, then increments of 512 KB and so on.

As discussed above, std :: vector also requires quantization of memory. The size of the next quantum is determined based on the current capacity of the vector and the growth constant. Thus, at almost every moment in time, we have a certain amount of memory that is not currently in use at the end of the vector, and immediately after it there is a certain amount of unused memory at the end of the memory block allocated by the allocator. The conclusion itself suggests itself that the union of a vector allocator and a shared memory allocator can optimize the consumption of RAM - because a vector can ask the allocator for an “ideal” size block, eliminating unused memory in the block allocated by the allocator. Well, in general, the general strategy of allocating memory by vector and allocator can be sharpened better if you know the exact implementation of both.

Some memory allocators do not support in-place allocation (although most modern ones do support it). This is the result of the not too successful design of the realloc () function, which decides whether to reuse the previously allocated memory or allocate a new one, copy the data there and free the old one. This lack of control over memory allocation affects both the new operator in C ++ and the behavior of std: allocator. This is a serious blow to performance, because in-place deployment, being very cheap, leads to a much less aggressive strategy for increasing memory consumption.

Object allocation

One of the important issues of placing C ++ objects in memory is that by default they are considered non-moving. An object of type is considered to be relocatable, which can be copied to another memory location by bitwise copying and at the same time in a new place the object will completely retain its validity and integrity. For example, int32 is movable, because 4 bytes fully describe the state of a 32-bit signed number, if you copy these 4 bytes to another memory location, this memory block can be completely unambiguously interpreted as the same number.

The C ++ assumption that objects are non-movable harms everyone and does so only for the sake of a few controversial architectural issues. Moving an object involves allocating memory for a new object, calling its copy constructor, destroying the old object. This is not very pleasant and generally against common sense. Imagine a theoretical conversation between Captain Picard and the Skeptical Alien:

Skeptical Alien : So, this is your teleport - how does it work?
Picard : He takes a man, dematerializes him in one place and materializes in another.
Skeptical Alien : Hmm ... is it safe?
Picard: Yes, the truth in the first models was minor flaws. At first, a person was simply cloned, created in a new place. After that, the teleport operator had to manually shoot the original. Ask O'Brien, he worked as an intern just in this position. The implementation was not very elegant.

( Translator's note : apparently, this was written before the introduction of move constructors in the latest C ++ standards).

In fact, only part of the objects are truly immovable:

Those that have pointers to external objects
Those pointed to by pointers from other objects

Objects of the first type can always be redone with minimal cost. Objects of the second type should not be created by the new operator and deleted by the delete operator - they should be controlled by smart pointers and there will be no problems.

Movable objects are very interesting for std :: vector, since knowledge of how to move objects inside a vector significantly affects the efficiency of the vector deployment of these objects. Instead of the Picardov movement cycle described above, objects can be moved with simple memcpy or memmove. For example, working with types like vector or vector> can be much faster.

In order to support the safe movement of objects, fbvector uses the trait folly :: IsRelocatable defined in "folly / Traits.h". By default, folly :: IsRelocatable :: value conservatively returns false. If you know for sure that your Widget type is relocatable, you can add the following immediately after declaring this type:

namespace folly {
  struct IsRelocatable : boost::true_type {};
}

If you do not, fbvector will not compile with BOOST_STATIC_ASSERT.

Additional restrictions

Some improvements are also possible for "simple" types (more precisely, for those that have a trivial assignment operation) or for types that have a constructor that does not throw exceptions (nothrow default constructor). Fortunately, these traits are already present in the C ++ standard (well, or in Boost). Summing up, to work with fbvector, the Widget type must satisfy the condition:

BOOST_STATIC_ASSERT( IsRelocatable::value && (boost::has_trivial_assign::value || boost::has_nothrow_constructor::value));

These types are actually very close - it’s quite difficult to construct a class that will satisfy one part of the condition and violate the other (unless you really try very hard). fbvector uses these simple conditions to minimize copy operations for basic vector operations (push_back, insert, resize).

To simplify type checking for compatibility with fbvector in Traits.h there is a macro FOLLY_ASSUME_FBVECTOR_COMPATIBLE.

Miscellaneous

The fbvector implementation is focused on memory efficiency. In the future, the implementation of copying may be improved because memcpy is not an intrinsic function in gcc and does not work perfectly for large data blocks. Improvements in interaction with jemalloc are also possible.

Good luck using fbvector.

Tags: