A little bit about why using STL is not optimal

Published on October 31, 2011


This short article shows how easily you can kill application performance with careless use of the STL. It is impossible to cover the whole library in one post, so only one component is considered: the std::string container. Using it as an example, I will show the price of initializing a std::string and, as a consequence, what non-optimal use of the library can lead to. Everything said below is especially acute in gamedev.


So, as an example, let's take a primitive function that computes the hash value of a string. I will give two implementations: one in C style, the other in STL style using std::string. The source code of the test case is available via the link in the original post.

  #include <cstring>
  #include <string>

  typedef unsigned int uint32; // assumed typedef; not shown in the original listing

  uint32 hash_c(const char* str) {
    uint32 h = 0;
    const int len = strlen(str);
    for (int i = 0; i < len; ++i)
      h = 31*h + str[i];
    return h;
  }

  uint32 hash_stl(const std::string& str) {
    uint32 h = 0;
    for (std::string::const_iterator it = str.begin(); it != str.end(); ++it)
      h = 31*h + *it;
    return h;
  }


Note that in practice the function will be called with plain C strings (const char*), so in the second case a temporary std::string object is constructed at the call site. Again, the point of the test is to show the initialization cost of an STL string.
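
To make the difference concrete, here is what happens at a typical call site (a small illustration built on the functions above; the variable names are mine):

  uint32 h;
  const char* str = "some text";  // the data lives in a plain C string

  h = hash_c(str);    // operates on the buffer directly, no copy
  h = hash_stl(str);  // a temporary std::string is constructed: a copy and, possibly, a heap allocation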

Now let's measure the speed of both functions. For reliable results, each measurement runs a loop of 256*1024 iterations. For each function, performance is measured on strings of different lengths: 8, 16 and 32 bytes (why these sizes matter is explained below).

  #define ITERATION_COUNT 256*1024

  // ...
  for (int i = 0; i < ITERATION_COUNT; ++i)
      h1 = hash_c(str);

  // ...
  for (int i = 0; i < ITERATION_COUNT; ++i)
      h2 = hash_stl(str);
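
The timer itself is not shown in the listing; below is a minimal sketch of how these loops could be timed using the hash functions from the first listing and clock() (the original test may use a different, higher-resolution timer such as QueryPerformanceCounter, and the test string here is just a placeholder):

  #include <cstdio>
  #include <ctime>

  #define ITERATION_COUNT 256*1024

  int main() {
    const char* str = "12345678";  // 8-byte case; the 16- and 32-byte cases are measured the same way
    uint32 h1 = 0, h2 = 0;

    clock_t start = clock();
    for (int i = 0; i < ITERATION_COUNT; ++i)
      h1 = hash_c(str);
    double ms_c = 1000.0 * (clock() - start) / CLOCKS_PER_SEC;

    start = clock();
    for (int i = 0; i < ITERATION_COUNT; ++i)
      h2 = hash_stl(str);
    double ms_stl = 1000.0 * (clock() - start) / CLOCKS_PER_SEC;

    printf("const char*: %.2f msec, hash: %u\n", ms_c, h1);
    printf("std::string: %.2f msec, hash: %u, %.2fx\n", ms_stl, h2, ms_stl / ms_c);
    return 0;
  }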


Note: the example was built with the Visual Studio 2010 compiler from the command line; the compiler flags are shown alongside the results.

  cl /EHsc /O2 main.cpp

  *************************************
  string length: 8 bytes
  const char*: 6.20 msec, hash: 312017024
  std::string: 12.62 msec, hash: 312017024, 2.04x

  total allocs: 0

  *************************************
  string length: 16 bytes
  const char*: 11.78 msec, hash: 2657714432
  std::string: 131.21 msec, hash: 2657714432, 11.14x

  total allocs: 262144

  *************************************
  string length: 32 bytes
  const char*: 23.20 msec, hash: 3820028416
  std::string: 144.64 msec, hash: 3820028416, 6.24x

  total allocs: 262144


This is where the fun begins. The results show that the smallest slowdown is about 2x and the largest is about 11x. Moreover, for strings of 16 bytes and longer, additional heap allocations appear. On 32-byte strings the allocation cost is partly hidden behind the hash computation itself, so the relative slowdown is smaller, almost half of what we see on 16-byte strings. So where do the allocations come from?

std::string has an internal buffer for storing the string, whose size depends on the STL implementation. In the STL that ships with MS Visual Studio 2010, for example, this buffer is 16 bytes, which can be seen in the library headers (the _BUF_SIZE constant). When a std::string is initialized, the string length is checked: if the string fits into the internal buffer, it is stored there; otherwise memory of the required size is allocated on the heap and the string is copied into it. So every initialization of a std::string copies the data at the very least, and also performs an extra heap allocation whenever the string does not fit into the internal buffer. That is why, in the release build, we see performance drop by up to 11x, coupled with allocations that lead to memory fragmentation. The latter is a serious problem in the world of consoles, where the amount of RAM is strictly fixed and there is no virtual memory.
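
The output above also prints a "total allocs" counter. The exact counting code is not shown here; a minimal sketch of one way to obtain such a counter, by replacing the global operator new, and to see which string lengths spill out of the internal buffer, could look like this (the sample strings and names are mine):

  #include <cstddef>
  #include <cstdio>
  #include <cstdlib>
  #include <new>
  #include <string>

  static unsigned g_alloc_count = 0;

  // Replacing the global operator new lets us count every heap allocation.
  void* operator new(std::size_t size) {
    ++g_alloc_count;
    void* p = malloc(size);
    if (!p) throw std::bad_alloc();
    return p;
  }

  void operator delete(void* p) throw() {
    free(p);
  }

  int main() {
    const char* samples[] = {
      "12345678",                            // 8 bytes:  fits into the internal buffer
      "1234567890123456",                    // 16 bytes: may no longer fit (implementation dependent)
      "12345678901234567890123456789012"     // 32 bytes: goes to the heap
    };
    for (int i = 0; i < 3; ++i) {
      g_alloc_count = 0;
      std::string s(samples[i]);             // initialization: copy + possible heap allocation
      printf("length %u: allocs = %u\n", (unsigned)s.length(), g_alloc_count);
    }
    return 0;
  }

Note that in the release results above, "total allocs" for the 16- and 32-byte strings is 262144, i.e. exactly 256*1024, which matches one heap allocation per temporary std::string, per loop iteration.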

Now it is worth showing the results for a debug build as well.

  cl /EHsc /MDd /RTC1 /ZI main.cpp

  *************************************
  string length: 8 bytes
  const char*: 24.74 msec, hash: 312017024
  std::string: 4260.18 msec, hash: 312017024, 172.23x

  total allocs: 262144

  *************************************
  string length: 16 bytes
  const char*: 34.87 msec, hash: 2657714432
  std::string: 7697.69 msec, hash: 2657714432, 220.76x

  total allocs: 524288

  *************************************
  string length: 32 bytes
  const char*: 58.38 msec, hash: 3820028416
  std::string: 14169.49 msec, hash: 3820028416, 242.70x

  total allocs: 524288


Great: an epic performance drop of up to 240x, which is to be expected. Of course, in critical situations you can switch the Debug Information Format from /ZI (edit & continue) to the faster /Zi, and also disable the run-time checks for stack corruption and uninitialized variables (/RTC1), gaining some performance back, but then the debug build largely loses its point.

The moral: STL is without doubt a convenient and powerful tool, but you need to understand how it works under the hood and, based on that, carefully choose the places where using it will not lead to sad consequences. This applies not only to std::string but to the other components as well. So before reaching for STL, think twice about heap allocations, memory fragmentation and data locality.

Good luck with your optimizations!