Improving the vector container for working with large amounts of data
While reading the article about a "lite" implementation of the vector container, I was reminded of my own experience of dealing with vector, and I would like to share it.
Yes, with an array of 10,000 elements an extra pointer per element is a tangible overhead, but the real memory problem with vector shows up in a slightly different way.
The problem lies in how memory is allocated for new elements.
Not long ago I worked on a project that had to process large amounts of data (about 20-30 thousand elements per array, ~24 bytes of static data per element), and vector was very convenient for this. Let me note right away that performance was not a bottleneck in that project.
Usually nobody thinks about what happens when push_back() is called. Neither did we, until a memory limit was imposed on our application. That is where the rake struck its first blow: the memory usage graph resembled a cardiogram. At seemingly random moments, very significant peaks appeared on it for a very short time, and those peaks went over the limit. On top of that, we eventually noticed that the application suffered from spontaneous short freezes. The time lost was insignificant, but it remained unclear why calls to the same method sometimes took 1-2 milliseconds and sometimes 1-2 seconds.
The investigation led us to push_back() itself.
In the course of it I learned (from a more savvy colleague) that a vector reserves memory with a geometric growth strategy (which is what keeps the number of reallocations logarithmic), namely:
- if there is no reserved capacity left when an element is added, capacity for size * K elements is reserved, where size is the current vector size and K is a growth factor that depends on the implementation and is usually 2 (for more details see Alena Sagalaeva, for example).
Thus, we obtain the following:
Suppose the vector already holds 1024 elements. When push_back() is called, its capacity will grow to 2048 (with K = 2; for simplicity we will assume this is always the case), although only 1025 elements are actually stored there. The excess reserved memory can of course be released afterwards, for example with the swap trick, but it seems more correct not to allocate the excess in the first place, i.e. to use reserve.
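For illustration, here is a small snippet (my own, not code from the project in question) that shows the capacity jump, the swap trick and reserve; the exact capacities printed depend on the growth factor of your standard library:

    #include <cstdio>
    #include <vector>

    int main() {
        std::vector<int> v(1024);
        std::printf("size=%zu capacity=%zu\n", v.size(), v.capacity());   // 1024 / 1024

        v.push_back(0);  // no spare capacity left, so a reallocation happens here
        std::printf("size=%zu capacity=%zu\n", v.size(), v.capacity());   // 1025 / ~2048

        // The swap trick: copy into a right-sized temporary and swap buffers with it.
        std::vector<int>(v).swap(v);
        std::printf("after swap trick: capacity=%zu\n", v.capacity());    // ~1025

        // Or avoid the excess up front by reserving exactly as much as is needed.
        std::vector<int> w;
        w.reserve(1025);
        for (int i = 0; i < 1025; ++i) w.push_back(i);
        std::printf("with reserve: capacity=%zu\n", w.capacity());        // 1025
        return 0;
    }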
No sooner said than done. We added reserve, rebuilt, started testing... and found ourselves back at the beginning: the cardiogram stubbornly persisted, and so did the "freezes".
In the end, after long deliberation about the meaning of life, the inner workings of vector and the memory allocation system, justice triumphed.
Let's go back to our vector, which already has 1024 elements.
For clarity, suppose that the size of an element is 16 bytes.
Then the amount of memory occupied by such a vector (excluding the overhead of the vector itself and the dynamic part of the elements) will be 16 kilobytes.
Since a vector always occupies a contiguous block of memory, when you try to add the unfortunate 1025th element to it, the following happens:
- a new block of memory for size * K elements is allocated, i.e. 32 kilobytes
- the contents of the vector are copied into this new block
- the old block is freed
- our 1025th element is physically added to the vector
So the problem is that at some moment the memory consumption equals size + size * K, i.e. both the old block and the new one at once: 48 kilobytes in our case.
With small arrays this does not matter, but imagine the same operation on an array of several tens of thousands of elements. Using reserve does not save the situation, since the picture is almost the same: the old block plus a new one slightly larger than the old, copying, freeing; 32 kilobytes in our case.
This also turned out to be the reason for the "freezes": allocating a large block of memory and then copying all the data into it takes considerable effort.
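An easy way to see this transient doubling for yourself (a sketch written for this post, not code from the project) is to plug a minimal logging allocator into the vector: during reserve the new block is allocated before the old one is released, so for a moment both blocks exist at once:

    #include <cstdio>
    #include <cstdlib>
    #include <vector>

    // Minimal logging allocator: prints every allocation and deallocation
    // so that the reallocation pattern of the vector becomes visible.
    template <class T>
    struct LoggingAlloc {
        using value_type = T;
        LoggingAlloc() = default;
        template <class U> LoggingAlloc(const LoggingAlloc<U>&) {}

        T* allocate(std::size_t n) {
            std::printf("allocate   %zu bytes\n", n * sizeof(T));
            return static_cast<T*>(std::malloc(n * sizeof(T)));
        }
        void deallocate(T* p, std::size_t n) {
            std::printf("deallocate %zu bytes\n", n * sizeof(T));
            std::free(p);
        }
    };
    template <class T, class U>
    bool operator==(const LoggingAlloc<T>&, const LoggingAlloc<U>&) { return true; }
    template <class T, class U>
    bool operator!=(const LoggingAlloc<T>&, const LoggingAlloc<U>&) { return false; }

    int main() {
        std::vector<int, LoggingAlloc<int>> v(1024);
        v.reserve(1025);  // "allocate" for the new block is printed before
                          // "deallocate" for the old one
        return 0;
    }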
One could talk at length about optimizing the elements themselves and about various spherical horses in a vacuum, but in our case the solution was to implement MultyVector (as we called it): a class that mimics the parts of the vector interface we needed, but stores its data in a vector of vectors, with internal logic for translating indices and for growing and shrinking dynamically.
I think it would be superfluous to describe all the internals of this simple wrapper or to quote its code.
I will just say that in our case the normal size of an internal vector was 1000 elements, and insertion/removal at arbitrary positions was used so rarely that it could be neglected, without fear that one of the internal vectors would again grow to an incredible size.
Thus, working with one large block of memory was replaced by working with a set of smaller ones (several vectors of 1000 elements each).
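For readers who still want a picture of the approach, here is a minimal sketch of such a wrapper. It is my own reconstruction under the assumptions above (a fixed chunk size of 1000, push_back-dominated usage, no insertion in the middle); the real MultyVector was never published and supported more of the vector interface:

    #include <cstddef>
    #include <vector>

    // Sketch of the idea: data is stored in chunks of CHUNK elements,
    // so growth never reallocates and copies one huge contiguous block.
    template <class T, std::size_t CHUNK = 1000>
    class MultyVector {
    public:
        void push_back(const T& value) {
            if (chunks_.empty() || chunks_.back().size() == CHUNK) {
                chunks_.emplace_back();         // start a new small block
                chunks_.back().reserve(CHUNK);  // each block is bounded by CHUNK elements
            }
            chunks_.back().push_back(value);
        }

        // Index translation: chunk number plus offset inside the chunk
        // (valid as long as all chunks except the last one are full).
        T& operator[](std::size_t i)             { return chunks_[i / CHUNK][i % CHUNK]; }
        const T& operator[](std::size_t i) const { return chunks_[i / CHUNK][i % CHUNK]; }

        std::size_t size() const {
            return chunks_.empty()
                       ? 0
                       : (chunks_.size() - 1) * CHUNK + chunks_.back().size();
        }

    private:
        std::vector<std::vector<T>> chunks_;    // a vector of small vectors
    };

With such a layout the transient peak during growth is at most one chunk (about 16 kilobytes for 1000 elements of 16 bytes each) plus the small outer vector of chunk handles, instead of the entire data set.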
The approach does not claim to be ideal, but it coped with the task 100%.
UPD: Frankly, I do not quite understand the reason for the flame war in the comments, so:
1. The specifics of the project did not allow using anything third-party. Everything essential was provided by the framework, and among the suitable containers there were only a vector and a list. The vector was more convenient, since we only found out that we would be working with such volumes of data towards the end of the project. For the same reason, we did not think about memory at first.
2. As for the spherical horses in a vacuum and the ideal solutions proposed in the comments, I have already written about that above.