Tip 23. Consider replacing associative containers with sorted vectors.

    “Even if the guaranteed logarithmic search time suits you, standard associative containers are not always the best choice. Oddly enough, standard associative containers for performance are often inferior to the banal container vector ”- C. Meyers“ Effective use of STL ”.
    Many may be interested in the practical side of this advice, how really a sorted vector can actually be faster than associative containers. I was also interested in this issue and I decided to conduct a small test and draw a couple of graphs so that everything fell into place.

    A bit of theory, for starters (those who know what vector and map do not need to read). Map in STL is an associative container for storing key-value pairs. The data in it is stored in sorted form, although the C ++ standard does not specify the implementation, most often the map is implemented as a red-black tree, in which each element stores pointers to two children. Vector in STL is just a container of elements that ensures that the data stored in it is not fragmented. An effective search in it can be organized by sorting and further binary search. As in the first case, the complexity of the search is O (ln (n)). The difference in speed will vary due to better caching in the second case (data is located locally, and less memory is needed for storage).

    To compare the speed of insertion and search was writtena test program that performs measurements of two types - recording random values ​​(random (3)) in containers, while measuring time (using gettimeofday), and searching for random values ​​in the container for a certain time (10 s). Thus, the time taken to insert data into the container and the number of searches in the container over the time interval are measured.

    Toolkit:
    Hardware: Dual-processor (Intel Xeon E5420) server without swap file.
    Software: Ubuntu server with kernel 2.6.28, gcc 4.3.3.
    Compiling a project with keys: -O3 -felide-constructors -fno-exceptions -fno-rtti -funroll-loops -ffast-math -fomit-frame-pointer -march = core2
    Each test was run several times (N = 10), while the minimum, maximum and average values ​​of a certain parameter were recorded. During the tests, the difference in the obtained values ​​was minimized (reducing the load on the server, installing cpu affinity).

    Data received when inserting items into containers



    The first graph shows the dependence of the insertion and sorting time for containers depending on the number of elements. Both scales are logarithmic, so for clarity, another curve was constructed that shows the sum of the insertion and sorting times for the vector — it is shown with a dashed line. Horizontal black lines show the error for each experiment - the maximum and minimum values. As you can see, for 1 million records, the time for inserting and sorting a vector is approximately 10 times less (0.11 s) than for map (1.1 s).

    Data obtained when searching for items in containers



    The second graph shows the dependence of the number of searches per unit time (10 seconds) on the number of elements in the container. You can see that the search speed in the vector is greater, as expected above, with 1 million entries the number of searches in the vector is approximately 3.9 times more than in the map.

    It can be seen from these tests that the performance of vector can be significantly (3.9 times) higher than map, while a single filling of vector and its sorting is also faster (10 times) than in map. Thus, sorted vectors should be used in situations of rare data changes and frequent searches, for example, for data loaded during program startup and rarely changing in the future. And as always, one must not forget that premature optimization is evil.

    Also popular now: