Big brother helps you
Once again I have become convinced that programmers write programs quite carelessly, and that these programs work not on their own merits but thanks to lucky circumstances and the care of compiler developers at Microsoft or Intel. Yes, it is they who look after us and, at the right moment, prop up our lopsided programs with crutches.
Pray, pray for compilers and their developers. They put so much effort into making our programs work despite their many shortcomings and even outright errors. Their work is difficult and goes unseen. They are the noble knights of coding and guardian angels for all of us.
I knew that Microsoft has a department that works to ensure maximum compatibility of new operating system versions with old applications. It maintains a set of more than 10,000 of the most popular old programs that must keep working in new versions of Windows. Thanks to these efforts, I was recently able to play Heroes of Might and Magic II (a 1996 game) under 64-bit Windows Vista without any problems. I think the game will run just as well on Windows 7. Alexey Pakhunov has written interesting notes on compatibility [1, 2, 3], which I highly recommend.
But apparently there are also departments whose job is to help our terrible C/C++ code work, work, and keep on working. Let me tell this story from the very beginning.
I am involved in developing PVS-Studio, a tool for analyzing application source code. Easy, comrades, easy: this is not an advertisement. This time it really is a charitable cause, because we have started building a free general-purpose static analyzer. Even an alpha version is still far off, but the work is slowly moving forward, and someday I will write a post about this analyzer on Habrahabr. I mention it because we have begun collecting the most interesting typical errors and learning how to diagnose them.
Many errors are related to the use of the ellipsis in programs. Some theoretical background:
There are functions for which the number and types of all valid parameters cannot be specified in the declaration. In that case the list of formal parameters ends with an ellipsis (...), which means "and possibly some more arguments." For example: int printf(const char* ...);
One unpleasant but easily diagnosed error of this kind is passing an object of class type, instead of a pointer to a string, to a function with a variable number of arguments. Here is an example of this error:
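To make the mechanism concrete, here is a minimal sketch of a function that reads its ellipsis arguments with the standard <cstdarg> macros. The name sum_ints and its count parameter are illustrative assumptions, not something from the projects discussed here. Note that the callee has to state the type of every argument itself; nothing checks that the caller agrees.

```cpp
#include <cassert>
#include <cstdarg>

// Hypothetical helper (illustration only): adds up `count` int arguments
// passed through the ellipsis. The function has no way to verify that the
// caller really supplied `count` values of type int.
int sum_ints(int count, ...) {
    va_list args;
    va_start(args, count);           // start reading after the last named parameter
    int total = 0;
    for (int i = 0; i < count; ++i) {
        total += va_arg(args, int);  // the callee asserts the type on its own
    }
    va_end(args);
    return total;
}
```

A call such as sum_ints(3, 1, 2, 3) adds the three values; passing a double or a class object instead of an int would compile just as quietly, and that is exactly the weakness described here.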
wchar_t buf[100];
std::wstring ws(L"12345");
swprintf(buf, L"%s", ws);
Such code will produce garbage in the buffer or crash the program. In a real program the code will of course be more tangled, so please do not post comments saying that GCC, unlike Visual C++, checks the arguments and issues a warning. Format strings can come from resources or from other functions, and then nothing can be checked. Here, though, the diagnosis is simple: a class object is passed to a string-formatting function, which causes the error.
The correct version of the code should look like this:
wchar_t buf[100];
std::wstring ws(L"12345");
swprintf(buf, L"%s", ws.c_str());
It is precisely because anything at all can be passed to a function with a variable number of arguments that almost every book on C++ programming advises against using such functions. Safer mechanisms, such as boost::format, are suggested instead. But recommendations are only recommendations; there is a great deal of code that uses the various printf, sprintf, and CString::Format functions, and we will be living with it for a very long time. That is why we implemented a diagnostic rule that detects such dangerous constructs.
Let us work out in theory why the swprintf call that passes ws directly is incorrect. It turns out to be incorrect twice over.
- The argument does not match the specified format. Since we specify "%s", we must pass a pointer to a string. In theory, however, we could write our own sprintf function that knows it has been passed an object of the std::wstring class and prints it correctly. But this is also impossible, because of the second reason.
- Only POD types may be passed as arguments for the ellipsis "...", and std::wstring is not a POD type.
Theoretical background on POD types:
POD stands for "Plain Old Data", that is, simple C-style data. POD types include:
- all built-in arithmetic types (including wchar_t and bool);
- types declared with the enum keyword;
- pointers;
- POD structures (struct or class) and POD unions (union) that satisfy the following requirements:
  - do not contain user-defined constructors, a destructor, or a copy assignment operator;
  - do not have base classes;
  - do not contain virtual functions;
  - do not contain protected or private non-static data members;
  - do not contain non-static data members of non-POD types (or arrays of such types), or references.
Accordingly, the std::wstring class is not a POD type, since it has constructors, a base class, and so on.
Moreover, passing an object of a non-POD type as an ellipsis argument results in undefined behavior. So, at least in theory, there is no correct way to pass an object of type std::wstring as an ellipsis argument.
The same problem should arise with the Format function of the CString class. An incorrect version of the code:
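The classification above can be checked at compile time. The sketch below is only an approximation: it uses the C++11 traits std::is_trivial and std::is_standard_layout, whose combined definition is close to, but not identical with, the older rules listed above (for example, C++11 allows private members as long as all members share the same access level). The helper is_pod_like and the struct Point are hypothetical names introduced for illustration.

```cpp
#include <cassert>
#include <string>
#include <type_traits>

// Hypothetical helper: true when T is both trivial and standard-layout,
// which is how C++11 and later describe a POD type.
template <typename T>
constexpr bool is_pod_like() {
    return std::is_trivial<T>::value && std::is_standard_layout<T>::value;
}

// A plain C-style aggregate, introduced here only as an example.
struct Point {
    int x;
    int y;
};

static_assert(is_pod_like<int>(), "arithmetic types are POD");
static_assert(is_pod_like<const wchar_t*>(), "pointers are POD");
static_assert(is_pod_like<Point>(), "a plain struct of ints is POD");
static_assert(!is_pod_like<std::wstring>(), "std::wstring is not POD");
```

If any of these assumptions were wrong, the translation unit would simply fail to compile, which makes this a cheap guard to place next to code that passes values through an ellipsis.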
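For comparison, here is a minimal sketch of the kind of type-safe alternative that the books have in mind when they recommend mechanisms like boost::format. It is not boost::format itself and has no format string at all; it only shows that with variadic templates every argument keeps its real type, so a std::wstring can be passed directly. The names append_all and concat are assumptions made for this example.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Base case of the recursion: nothing left to append.
inline void append_all(std::wostringstream&) {}

// Appends the first argument with its real static type, then recurses
// on the remaining ones. No ellipsis and no format specifiers involved.
template <typename First, typename... Rest>
void append_all(std::wostringstream& out, const First& first, const Rest&... rest) {
    out << first;
    append_all(out, rest...);
}

// Hypothetical helper: concatenates any streamable arguments into a wstring.
template <typename... Args>
std::wstring concat(const Args&... args) {
    std::wostringstream out;
    append_all(out, args...);
    return out.str();
}
```

With this helper, concat(L"Value: ", ws) accepts the std::wstring object itself, and an argument that cannot be streamed is rejected at compile time instead of corrupting the output at run time.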
CString s;
CString arg(L"OK");
s.Format(L"Test CString: %s\n", arg);
The correct version of the code:
s.Format(L"Test CString: %s\n", arg.GetString());
Or, as suggested in MSDN [4], you can obtain a pointer to the string with the explicit cast operator LPCTSTR implemented in the CString class. An example of correct code from MSDN:
CString kindOfFruit = "bananas";
int howmany = 25;
printf("You have %d %s\n", howmany, (LPCTSTR)kindOfFruit);
So everything seems clear and transparent, and it is equally clear how to write the rule: we will detect mistakes made when calling functions with a variable number of arguments.
So we did. And here the result shocked me. It turns out that most developers never think about these problems at all and calmly write code like this:
class CRuleDesc
{
  CString GetProtocol();
  CString GetSrcIp();
  CString GetDestIp();
  CString GetSrcPort();
  CString GetIpDesc(CString strIp);
...

CString CRuleDesc::GetRuleDesc()
{
  CString strDesc;
  strDesc.Format(
    _T("%s all network traffic from <br>%s "
       "on %s<br>to %s on %s <br>for the %s"),
    GetAction(), GetSrcIp(), GetSrcPort(),
    GetDestIp(), GetDestPort(), GetProtocol());
  return strDesc;
}
//---------------

CString strText;
CString _strProcName(L"");
...
strText.Format(_T("%s"), _strProcName);

//---------------

CString m_strDriverDosName;
CString m_strDriverName;
...
m_strDriverDosName.Format(
  _T("\\\\.\\%s"), m_strDriverName);

//---------------

CString __stdcall GetResString(UINT dwStringID);
...
_stprintf(acBuf, _T("%s"),
  GetResString(IDS_SV_SERVERINFO));

//---------------

// I think it is clear that
// the examples could go on and on.
And some do think about it, and then forget. That is why the following code looks so touching:
CString sAddr;
CString m_sName;
CString sTo = GetNick(hContact);
sAddr.Format(_T("\\\\%s\\mailslot\\%s"), sTo, (LPCTSTR)m_sName);
There were so many examples like this in the projects we test PVS-Studio on that it was hard to understand how this could be. And yet it all works perfectly well, as I confirmed by writing a test program and trying various ways of using CString.
What is going on? Apparently, the compiler developers grew tired of the endless questions about why carelessly written programs that use CString do not work, and of the accusations of "compiler bugs in string handling." So they quietly performed a sacred rite of exorcism and cast the evil out of CString. They made the impossible possible: the CString class is implemented in a special, cunning way so that it can be passed to functions such as printf and Format.
It is done quite ingeniously, and anyone who is interested can read the source code of the CStringT class and look through the extensive discussion "Pass CString to printf?" [5]. I will not go into the details and will note only one important point. A special implementation of CString is not enough on its own, because in theory passing a non-POD type leads to undefined behavior. So the developers of Visual C++, and Intel C++ along with them, made sure that this undefined behavior always produces the correct result. :) After all, a correctly working program is just one of the possible outcomes of undefined behavior. :)
And now I am starting to wonder about some strange features of the compiler's behavior when building 64-bit programs. I suspect that the compiler developers deliberately make a program's behavior practical (workable) rather than theoretical in those simple cases where they recognize a certain pattern. The clearest example is a loop pattern. An example of incorrect code:
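One commonly described part of the trick is that the string object contains nothing but a single pointer to its character data, so the bytes of the object are the same bytes printf would expect for a pointer. The sketch below illustrates this idea with a hypothetical class TinyString; it is not the real CStringT and it leaves out reference counting, allocation, and everything else. It copies the object's bytes into a pointer instead of passing the object through "...", because the latter is exactly the behavior the standard does not guarantee.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical stand-in for a CString-like class (NOT the real CStringT).
class TinyString {
public:
    explicit TinyString(const char* s) : data_(s) {}
    ~TinyString() {}  // a user-provided destructor already makes the type non-POD
    const char* c_str() const { return data_; }
private:
    const char* data_;  // the only non-static data member
};

// The whole object is exactly one pointer wide.
static_assert(sizeof(TinyString) == sizeof(const char*),
              "TinyString is expected to occupy exactly one pointer");

// Reinterprets the object's bytes as a pointer, which is roughly what a
// printf-like function does when it reads "%s" from the argument area.
const char* pointer_seen_by_printf(const TinyString& s) {
    const char* p = nullptr;
    std::memcpy(&p, &s, sizeof p);
    return p;
}
```

For a TinyString built from "OK", the pointer recovered this way refers to the same characters as c_str(). This explains why such code tends to work on these compilers, not why it would be correct.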
size_t n = BigValue;
for (unsigned i = 0; i < n; i++) { ... }
Theoretically, if n > UINT_MAX, an infinite loop should occur. However, it does not occur in the Release version, because a 64-bit register is used for the variable "i". Of course, if the code is more complicated the infinite loop will occur, but at least in some cases the program gets lucky. I wrote more about this in the article "A 64-bit horse that can count" [6].
I used to think that such unexpectedly fortunate behavior was due solely to the way Release builds are optimized. Now I am not so sure. Perhaps it is a deliberate attempt to make an unworkable program work, at least sometimes. Of course, I do not know whether the cause is optimization or Big Brother's care, but it is a good excuse to philosophize. :) And those who do know are unlikely to tell. :)
I am sure there are other cases where the compiler lends a hand to crippled programs. If I come across anything else interesting, I will be sure to tell you.
I wish you bug-free code!
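The theoretical behavior is easy to reproduce on a small scale. In the sketch below an 8-bit counter stands in for unsigned, and a limit above 255 stands in for n > UINT_MAX; the counter wraps back to zero before it can reach the limit. The function name narrow_loop_terminates and the max_steps cap are assumptions added so that the experiment itself always finishes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Simulates `for (uint8_t i = 0; i < n; ++i)` for at most max_steps
// iterations and reports whether the loop condition ever becomes false.
bool narrow_loop_terminates(std::size_t n, std::size_t max_steps) {
    std::uint8_t i = 0;
    for (std::size_t step = 0; step < max_steps; ++step) {
        if (!(i < n)) {
            return true;   // the loop would exit normally
        }
        ++i;               // wraps from 255 back to 0
    }
    return false;          // still looping after max_steps: effectively infinite
}
```

With n equal to 200 the loop ends after 200 steps, while with n equal to 300 the counter never gets past 255 and the condition stays true forever. The lucky Release-build case corresponds to the compiler quietly keeping the counter in a wider register, so that the wraparound never happens.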
References
1. Alexey Pakhunov's blog. Backward compatibility is serious. http://www.viva64.com/go.php?url=390
2. Alexey Pakhunov's blog. AppCompat. http://www.viva64.com/go.php?url=391
3. Alexey Pakhunov's blog. Is Windows 3.x Alive? http://www.viva64.com/go.php?url=392
4. MSDN. CString Operations Relating to C-Style Strings. Topic: Using CString Objects with Variable Argument Functions. http://www.viva64.com/go.php?url=393
5. Discussion on eggheadcafe.com. Pass CString to printf? http://www.viva64.com/go.php?url=394
6. Andrey Karpov. A 64-bit horse that can count. http://www.viva64.com/art-1-1-1064884779.html