
PVS-Studio team broadens horizons by performing custom development

As you know, our main activity is developing the PVS-Studio code analyzer. We have been doing it for a long time and, it seems to us, doing it successfully, but recently an unusual thought occurred to us: we do not use our own tools in the same mode as our customers do. Of course, we check the PVS-Studio code with PVS-Studio, but frankly, the PVS-Studio project is not that big, and working with its code differs in style and character from working with, say, the Chromium or LLVM code.
We wanted to put ourselves in our clients' shoes to understand how our tool is used in long-term projects. After all, the one-off checks of projects that we run regularly and write many articles about represent exactly the usage style we actively argue against: it is wrong to run the analyzer on a project once, fix a few errors, and repeat this a year later. The analyzer should be used regularly, every day, while the code is being written.
So what is all this leading to? Our theoretical desire to try ourselves in other projects coincided with practical proposals that gradually began to come our way. Last year we decided to set aside a team in our company that would do - oh, horror! - custom development, that is, take part in third-party projects as programmers. We were interested in long-term and fairly large projects: at least 2-3 developers and at least 6 months of development. We had two goals:
- try an alternative type of business (custom development in addition to product development);
- see for ourselves how PVS-Studio is used in long-term projects.
Both tasks were accomplished. But this article is not about the business of custom development; it is about our experience. Not the organizational experience - there are plenty of articles about that - but the experience of working with the code of other people's projects. That is what we want to talk about.
Interesting points
Third-party projects teach us a lot, and I think we will devote more than one article to this topic. Here we will start with a few interesting observations: three prominent features of the code that tend to show up in large and old projects. We are sure that publications about other observations will follow over time.
We have a lot of articles devoted to porting code from 32-bit to 64-bit platforms. Many errors are related to this porting and begin to manifest themselves in 64-bit programs. We call them 64-bit errors.
But besides the transition to 64-bit systems, other, less noticeable changes have been happening to code as compilers and libraries evolved and the projects themselves matured. The consequences of these changes are clearly visible in code with a long development history. We suggest discussing them; it should be interesting and useful, and someone might want to review their old code for similar issues.
We noticed the error patterns described below thanks to the PVS-Studio analyzer. Many of these errors are hidden: the code almost works, thanks to a fortunate coincidence. But every such mistake is a small time bomb that can go off at any moment.
Note. To avoid NDA restrictions, we changed the names and edited the code. In fact, this is not the code that was in the program at all. However, "everything is based on real events."
Change in the behavior of the 'new' operator
Once upon a time, the 'new' operator returned NULL in case of a memory allocation error. Then compilers began to support the modern C++ standard and throw std::bad_alloc exceptions in such situations. You can force the 'new' operator to return NULL, but that is not the point right now.
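For reference, here is a minimal sketch (not code from any real project) contrasting the two behaviors; it only assumes the standard <new> header, which declares std::bad_alloc and std::nothrow:
const int N = 100;
try {
  int *p = new int[N];               // modern behavior: throws std::bad_alloc on failure
  ....
  delete [] p;
} catch (const std::bad_alloc &) {
  ....                               // handle the out-of-memory situation
}
int *q = new (std::nothrow) int[N];  // the old NULL-returning behavior, on explicit request
if (q == nullptr) {
  ....                               // handle the out-of-memory situation
}
delete [] q;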
Apparently, the programmer community took note of this change only superficially. We registered the fact and began to write new code with the new behavior in mind. Of course, there are programmers who still do not know that the 'new' operator throws an exception, but we are talking about normal, competent developers here. Those who refuse to learn anything and keep writing the way they did 20 years ago are of no interest to us.
However, even those who know that the 'new' operator changed its behavior did not review the existing code. Some simply did not think of it; others wanted to, had no time, and then forgot about it. As a result, incorrect handlers for out-of-memory situations continue to live in a huge number of programs.
Some code fragments are harmless:
int *x = new int[N];
if (!x) throw MyMemoryException(); // 1
int *y = new int[M];
if (!y) return false; // 2
In the first case, a std::bad_alloc exception will be thrown instead of MyMemoryException. Most likely, these exceptions are handled in the same way, so there is no problem.
In the second case, the check will not work. The function will not return 'false'; instead, an exception will be raised and handled somewhere later. This is a mistake: the behavior of the program has changed. In practice, though, it almost never leads to problems; the program simply reacts a bit differently to a lack of memory.
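If returning 'false' is still the desired reaction, the second fragment can be rewritten to catch the exception instead. A possible variant (a sketch, not the project's actual fix):
int *y = nullptr;
try {
  y = new int[M];
} catch (const std::bad_alloc &) {
  return false; // 2 - the old reaction to a failed allocation is preserved
}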
It is more important to warn about cases where the behavior of the program changes not just slightly, but significantly. There are a lot of such situations in large old projects.
Here are a couple of examples where specific actions must be performed when there is not enough memory:
image->SetBuffer(new unsigned char[size]);
if (!image->GetBuffer())
{
CloseImage(image);
return NULL;
}
FormatAttribute *pAttrib = new FormatAttribute(sName, value, false);
if (pAttrib)
{
m_attributes.Insert(pAttrib);
}
else
{
TDocument* pDoc = m_state.GetDocument();
if (pDoc)
pDoc->SetErrorState(OUT_OF_MEMORY_ERROR, true);
}
Such code is much more dangerous. For example, the user may lose the contents of a document when memory runs low, even though it was quite possible to give them a chance to save the document to a file.
Recommendation. Find all the 'new' operators in your program. Check whether the code around them plans to take any action if the pointer turns out to be null. Rewrite such places.
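For instance, the image fragment shown above keeps its original error-handling path if the allocation is made explicitly non-throwing. A possible rewrite (a sketch; SetBuffer, GetBuffer and CloseImage are the same illustrative names used above):
image->SetBuffer(new (std::nothrow) unsigned char[size]);
if (!image->GetBuffer())
{
  CloseImage(image);
  return NULL;
}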
The PVS-Studio analyzer helps detect such meaningless checks with the V668 diagnostic.
Replacing char* with std::string
With the transition from C to C++ and the growing popularity of the standard library, string classes such as std::string became widely used in programs. This is quite understandable: it is easier and safer to work with a full-fledged string class than with a char* pointer.
The string class began to be used not only in new code, but also to rework some old fragments. This is where trouble can arise: relax your attention for a moment, and the code becomes dangerous or outright incorrect.
But let's start not with the scary stuff, but with the funny. Sometimes you come across inefficient loops like this one:
for (i = 0; i < strlen(str.c_str()); ++i)
Obviously, the variable 'str' used to be an ordinary pointer. Calling the strlen() function on every iteration of the loop is bad practice and extremely inefficient on long strings. But after the pointer was turned into std::string, the code started to look even sillier.
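After the type change, the same loop could have been written much more simply. A possible variant (assuming 'str' is std::string and its length does not change inside the loop):
for (size_t i = 0; i < str.length(); ++i)   // length() is constant-time for std::string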
It is immediately clear that the type change was made thoughtlessly. That can lead to similarly inefficient code, or to outright errors. Let's talk about the errors:
wstring databaseName;
....
wprintf("Error: cannot open database %s", databaseName); // 1
CString s;
....
delete [] s; // 2
In the first case, the pointer 'wchar_t *' was turned into 'wstring'. The trouble shows up when the database cannot be opened and a message needs to be printed. The code compiles, but garbage gets printed on the screen, or the program simply crashes. The reason: someone forgot to add a call to c_str(). The correct option:
wprintf("Error: cannot open database %s", databaseName.c_str());
The second case is simply epic. Surprisingly enough, such code compiles successfully. The very popular CString string class is used here, and CString can implicitly convert itself to a pointer. That is exactly what happens, and the result is a double deletion of the buffer.
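What makes the fragment compile is CString's implicit conversion operator to a character pointer. Roughly speaking, the compiler treats the statement like this (a simplified sketch):
CString s;
....
const TCHAR *p = s;   // implicit conversion via operator LPCTSTR
delete [] p;          // delete[] on a buffer owned by CString; the destructor will release it again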
Recommendation. If you are replacing pointers with a string class, do it carefully. Do not do bulk replacements without looking at each case: the fact that the code compiles does not mean that it works. It is better to leave pointer-based code alone if there is no real need to edit it. Code that works correctly with pointers is better than code that works incorrectly with classes.
The PVS-Studio analyzer helps detect some of the errors caused by replacing pointers with classes. The V510 and V515 diagnostics, among others, can help with this. But you should not rely on analyzers entirely: you can come across such creative code that neither the analyzer nor an experienced programmer will spot the error.
Replacing char with wchar_t
The project grows, and at some point there is a desire to make the application interface multilingual. So a mass replacement of char with wchar_t is carried out, the errors reported by the compiler are fixed, and a "ready" Unicode version of the application appears.
In practice, after such a replacement, the application turns into a sieve. The errors it introduces can keep surfacing for years and are hard to reproduce.
How can that be? Very simply: many pieces of code are not ready for the size of a character to change. The code compiles without errors or warnings, but it works only at "50%". You will see what we mean in a moment.
Note. Keep in mind that we are not trying to scare you with bad code written by students. We are talking about the harsh realities of a programmer's life: in large and old projects such errors inevitably appear, and there are hundreds of them. Seriously, hundreds.
Examples:
wchar_t tmpBuffer[500];
memset(tmpBuffer, 0, 500); // 1
TCHAR *charArray = new TCHAR[MAX_MSG_LENGTH];
memset(charArray, 0, MAX_MSG_LENGTH*sizeof(char)); // 2
LPCTSTR sSource = ...;
LPTSTR sDestination = (LPTSTR) malloc (_tcslen(sSource) + 12); // 3
wchar_t *name = ...;
fprintf(fp, "%i : %s", i, name); // 4
In the first case, the programmer did not expect the character size to ever change, so only half of the buffer gets cleared. That is where the words about 50% come from.
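The first fragment is easy to fix so that it stays correct whatever the character type is; for a local array, the size of the array itself is enough (a sketch):
wchar_t tmpBuffer[500];
memset(tmpBuffer, 0, sizeof(tmpBuffer));   // 500 * sizeof(wchar_t) bytes, whatever size wchar_t has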
In the second case, the programmer did anticipate that the character size might change. However, the "* sizeof(char)" hint did not help during the mass type replacement. The code should have been written differently from the start. The correct option:
memset(charArray, 0, MAX_MSG_LENGTH * sizeof(charArray[0]));
The third example. The types are fine here, and the _tcslen() function is used to calculate the string length. At first glance everything is wonderful, but once the characters became 'wchar_t', the program again started working at 50%.
Half as much memory is allocated as necessary. In practice, the program will work fine until the message length exceeds half of the maximum possible length. An unpleasant error that can linger in the code for a long time.
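A possible fix for the third fragment is to make the element size take part in the calculation (a sketch that keeps the original's extra 12 characters of slack):
LPTSTR sDestination = (LPTSTR) malloc((_tcslen(sSource) + 12) * sizeof(TCHAR));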
The fourth example. Here printf(), sprintf(), and so on were fixed, but fprintf() was overlooked. As a result, garbage is written to the file. The code should have looked something like this:
fwprintf(fp, L"%i : %s", i, name);
Or so, if it is an ordinary ASCII file:
fprintf(fp, "%i : %S", i, name);
Note. It just occurred to me that I have been talking about 50% from the Windows point of view. On Linux, the wchar_t type takes 4 bytes, not 2, so in the Linux world the program will work only at 25% :).
Recommendation. If you have already done a mass replacement of char with wchar_t, we do not know what to advise: it may have produced a lot of interesting errors, and carefully reviewing every place where wchar_t is used is unrealistic. A static code analyzer will partially help: it will reveal some of the errors, and the errors shown above are found by the PVS-Studio analyzer. The consequences of replacing char with wchar_t are very diverse and are detected by various diagnostics; listing them all makes little sense, but V512, V576 and V635 are a few examples.
If you have not done the replacement yet but are planning to, approach it with all seriousness. Replacing the types automatically and fixing the compilation errors might take, say, 2 days. Making the same replacements manually while reviewing the code will take an order of magnitude longer, say 2 weeks. But that is better than spending the next 2 years catching newly introduced errors such as:
- incorrect formatting of strings;
- going beyond the boundaries of allocated memory buffers;
- working with half-cleared buffers;
- working with half-copied buffers;
- etc.
Conclusion
We enjoyed our experience of taking part in third-party custom development: it allowed us to look at the world with different eyes. We will continue to take part in third-party projects as developers, so if anyone is interested in discussing this topic with us, write to us. If you are afraid that a revealing article about you (about the errors found) will appear later, write anyway - may the NDA help us all!