(Static) Selection of optimal containers in C++ programs
Hello! Today I would like to talk about static analysis again, and again about C++. Only this time, unlike PVS-Studio, we will not be looking for bugs in our programs (although such tools look for more than just bugs), but for places that are written suboptimally. One of those places is the choice of container for the data in a program. If that sounds interesting, read on!
Problem
At CoreHard 2018 Autumn (a very good conference, do come) I gave a talk about how poorly C++ compilers optimize at the moment. One of my complaints was that compilers cannot optimize the use of containers in our programs. Let's look at a few code examples.
void foo() {
    std::vector<int> v(42);
}
It would seem that in such a simple case the compiler should be able to optimize this function and simply throw away the std::vector variable, since starting with C++14 compilers are allowed to elide dynamic memory allocations — but it does not. The reason is that at the moment only one C++ compiler implements this allocation-elision optimization: Clang. All the others simply cannot do it yet. And even Clang handles only a limited number of cases.
In this case we could replace std::vector with std::array, provided the vector's size is not too large, since otherwise we might not have enough stack space. Such a replacement removes a rather expensive heap allocation, and as a bonus, with std::array the compiler can already throw the object out of the function entirely!
Speaking of performance optimization, consider the following example:
void foo() {
    std::vector<int> v;
    for(int i = 0; i < 42; ++i) {
        v.insert(v.begin(), 42);
    }
    for(const auto val : v) {
        std::cout << val << ' ';
    }
}
Here we see an operation that is extremely inefficient for std::vector: insertion at the beginning of the container. Every C++ programmer knows this is bad, since it shifts all the elements every time, which means a lot of copying/moving. It would be much better here to use std::list, which does not care where the insertion happens, or std::deque (although in this particular case it is obvious that insert is not needed at all — but this is just an example, nothing more :)
Let's look at another code example:
void foo() {
    std::list<int> v;
    for(int i = 0; i < 3; ++i) {
        v.push_front(i);
    }
    for(const auto val : v) {
        std::cout << val << ' ';
    }
}
Here we notice that std::list (yes, I know few people use it) can be painlessly replaced with std::forward_list. We lose absolutely nothing, but we save memory. Naturally, the compiler does not perform this optimization today.
A similar trick can be done in the following example:
void foo() {
    std::deque<int> v;
    for(int i = 0; i < 3; ++i) {
        v.push_back(i);
    }
    for(int i = 0; i < 3; ++i) {
        std::cout << v.back() << ' ';
        v.pop_back();
    }
}
Here we can see that what we really need is not std::deque but std::stack. This can hardly be called an optimization, since std::stack is an adapter and by default uses std::deque internally (unless the user specifies otherwise). It is rather a semantic optimization, i.e. making the code easier to understand. From my point of view, that matters too. If you ask, "Might such a replacement also give a performance gain?", I will answer: "Maybe. Check the implementation details of your standard library."
Well, I think that is enough examples; you can easily come up with many more yourself.
Tools used
To implement the static analyzer, I used the Clang Static Analyzer (CSA) and Clang Tidy, which are part of the LLVM project. I chose these tools because I consider them the most promising among open-source static analysis tools. Besides, Clang provides one of the highest-quality C++ parsers, which other static analyzers cannot boast (unless, of course, they use libclang).
Both CSA and Clang Tidy are static analyzers and both live in LLVM, so what is the difference? Clang Tidy is designed for writing simple checks that basically consist of finding some pattern in the abstract syntax tree, emitting a warning, and possibly automatically replacing the matched code with something else. You can learn more about Clang Tidy here.
CSA is intended for more "serious" and resource-intensive checks (both in terms of implementation effort and in terms of execution time and memory). For example, a symbolic execution engine is available there.
I decided to implement the check in CSA, since it does not look trivial to me and will only grow more complex over time. And I decided to run it through Clang Tidy, since that analyzer integrates with many IDEs.
How we will (try to) solve the problem
To begin with, it is worth introducing a few fairly strong restrictions, mainly because this is only a prototype so far:
- Analysis only at the function level. This means no analysis across functions or across translation units. The cross-function restriction was imposed to simplify the implementation and can be lifted relatively easily in the future by running the analysis over the whole translation unit rather than over each function separately. The cross-translation-unit restriction comes from current limitations in the CSA, which should be fixed soon (commits are already landing upstream);
- Support for only a limited set of containers. This is relatively easy to fix in the future by adding rules for new containers;
- Only the abstract syntax tree is used for analysis, since that is the simplest kind of analysis for a prototype. For more accurate results you could try at least symbolic execution, but that approach has its own drawbacks. You can read more about the methods here.
Now the prototype implements the following simple algorithm:
- First, we find in the abstract syntax tree the nodes responsible for declaring variables of the container types we support.
- Then we find the operations applied to these containers, classify them, and store this information in a temporary cache.
- After reaching the end of the function, we analyze the collected statistics and, based on predefined rules, issue a recommendation about container usage.
The classification of container operations at the moment is as follows (will be expanded in the future):
- Adding an item to the beginning of the container.
- Adding an item to the middle of the container.
- Adding an item to the end of the container.
- Removing an item from the beginning of the container.
- Removing an item from the middle of the container.
- Removing an item from the end of the container.
The classification is currently incomplete, and even within this list it does not always work correctly. For example, the analyzer classifies an insert operation as insertion into the middle even when it is performed at the beginning, which is simply wrong.
Fighting false positives
False positives are the main headache of any static analysis. If there are too many of them, the useful messages drown in the noise. So here we have to act very conservatively and emit warnings only when we are really confident in the diagnostic and can say that something is definitely off at a particular place in the code.
If we are talking about compiler optimization, things are even stricter: a correct optimization cannot change the observable behavior of the program as defined by the C++ Standard (otherwise such an optimizer is worthless). And an optimization should not introduce a pessimization either :) So there you have to be far more careful in your decisions.
In this analyzer, that struggle resulted in a simple policy: if the analyzer sees an operation it does not currently support, it turns off the analysis for that container.
Disadvantages and possible solutions
This approach has several problems.
The first is that the analyzer currently treats all code branches as equally probable. More precisely, it does not even know that different execution branches exist.
This translates into problems with analysis for something like this code:
void foo(int* ptr, std::vector<int>& v) {
    if(ptr == nullptr) {
        v.insert(v.begin(), 42);
    } else {
        v.push_back(84);
    }
}
Most likely these branches are not equally probable in real application code: in the real world a pointer usually points to something valid, not to nullptr. LLVM itself has static heuristics for this. For example, it accounts for the above case of comparing a pointer with nullptr, for equality comparisons between two floating-point variables, and for some other interesting cases. But these look more like crutches; from my point of view, the real solution to this problem is to add dynamic analysis or instrumentation.
The second problem is the lack of support for custom containers. We live in the C++ world, where people love to reinvent the wheel (let's leave the discussion of why this is not always a bad thing outside the scope of this article) — including containers. Examples include LLVM itself, LibreOffice, and many others. This raises the question: how do we analyze containers that do not come from the standard library? After all, we would like to cover as many containers as possible.
There are several ways to solve this problem.
The first is for users to annotate their containers in some way (a special kind of comment, C++ attributes, or something else). The problem with this approach is that we first need to understand what to annotate at all — what information is needed for a high-quality analysis. Another problem is that it requires modifying the containers' code, which is not always possible.
The second approach is to give users a mechanism for writing their own rules. At the moment the rules are hard-coded into the analyzer's source, so if users want to add their own, they have to download the analyzer's sources, build them, figure out how to write checks, write them, rebuild, and so on. Instead, we could let users express their checks in some DSL, where they describe only the rules for their containers and the analyzer handles all the routine. I consider this approach more promising than the previous one.
Also, automatic container replacement is not supported, since this functionality does not exist in the CSA (though it does in Clang Tidy). In difficult cases an automatic fix is far from trivial anyway, so the analyzer works in a semi-manual mode.
Possible applications
I see several applications for this type of analysis:
- As a static analyzer. Everything is simple here: one more static analysis check that you run however you like (by hand, automatically in the IDE during development, on CI, etc.) and that may hint that somewhere you could have chosen a better container.
- As a compiler optimization. In some cases we can guarantee that replacing a container will definitely not hurt performance. For example, replacing a std::vector of a small size known at compile time with std::array, or replacing std::list with std::forward_list when we do not need bidirectional traversal and never ask the list for its size. The compiler could replace containers with more optimal ones without our involvement, as it already does for a great many other things.
- As a dynamic analyzer. This direction seems the most promising to me for this type of analysis. With knowledge of the program's execution profile we can obtain crucial information, such as the probability of each code branch, which is needed for a more accurate assessment. With such an analysis you can already start thinking about integration with PGO...
It is also worth noting that this approach is, of course, applicable not only to C++ programs. I would really like to see this kind of static analysis/optimization in compilers for other languages as well. For example, SAP's static analyzer for ABAP can already carry out a basic static optimality analysis, which is good news. If you know of similar projects for other programming languages, write in the comments and I will add them to the article!
Work in similar directions
For the C++ world I have not found such analyzers anywhere. For ABAP there is the analyzer I mentioned above, which can find inefficient operations for some of the standard containers, but as far as I know only a very simple static analysis is implemented there.
A much more interesting piece of work is Chameleon, a dynamic analyzer for Java, and it is done very cleverly. Its authors tweaked the JVM a little to collect various statistics on container usage at run time and, depending on the current load profile, automatically select and substitute containers on the fly. Unfortunately, the sources are closed and there is no way to get them (I tried).
I also recommend looking at the various works (there are many) on SETL, in which the authors often raised the question of automatic container selection.
References
- Current implementation on github
- C++ Russia 2017: Yuri Efimochev, clang-tidy: a journey inside the C++ Abstract Syntax Tree
- Chameleon: Adaptive Selection of Collections
- Clang static analyzer guide
- A Russian-language Telegram chat about compiler development. If you are interested, come in — it is very interesting there. Just be careful with off-topic flooding: it gets punished immediately :)
Instead of a conclusion, I want to stress that this is only a prototype so far, with plenty of "holes" in the implementation. With this article I just want to share the idea of such an analysis and popularize it. And if someone gets interested in the topic and wants to join the project — I will only be glad! Besides, you can always build this analyzer yourself and try it on your own test examples.
If you have something to add to the material, have run into a similar problem, or simply have information that might be useful on this topic, please do not hesitate to share it in the comments.
Thanks for your attention!
Would you like to see this kind of optimization in compilers, provided that it never pessimizes the code and can be disabled with a separate compiler flag?
- Yes — 74.4% (64 votes)
- No — 25.5% (22 votes)
- I will write my answer in the comments — 0% (0 votes)