A collection of examples of 64-bit errors in real programs - part 1
I dedicate this article to the Habr user f0b0s, who constantly follows our activity and accompanies it with subtle humor that keeps us in good shape.
Readers of our articles on developing 64-bit applications often reproach us for not substantiating the problems we describe, namely, for not giving examples of errors from real applications.
I decided to collect examples of various types of errors that we found in real programs ourselves, read about on the Internet, or learned about from PVS-Studio users. So I present to you an article that collects 30 examples of 64-bit errors in C and C++.
Our company, OOO "Program Verification Systems", develops Viva64, a specialized static analyzer that detects 64-bit errors in C/C++ application code. In the course of this work, our collection of 64-bit defect examples keeps growing, and we decided to gather the ones we find most interesting in this article. Some examples are taken directly from the code of real applications; others were composed synthetically from real code, because the original fragments are too long and spread out.
The article only demonstrates various types of 64-bit errors and does not describe methods for detecting and preventing them. Methods for diagnosing and fixing defects in 64-bit programs are described in detail in the resources listed in the Introduction.
The program declares two objects of types STRUCT_1 and STRUCT_2, which must be cleared (all fields set to zero) before first use. When writing the initialization, the programmer copied a similar line and replaced "&Abcd" with "&Qwer" in it, but forgot to replace "sizeof(Abcd)" with "sizeof(Qwer)". By a lucky coincidence, the STRUCT_1 and STRUCT_2 structures had the same size on a 32-bit system, so the code worked correctly for a long time.
When the code was ported to a 64-bit system, the size of the Abcd object grew (its pointer field became 8 bytes instead of 4), and the second memset call caused a buffer overflow (see Figure 1).
Figure 1 - Schematic explanation of an example of a buffer overflow
Such an error can be difficult to detect, because the corrupted data may only be used much later.
The code is bad, but it is real. Its task is to find the end of a line, marked by the character 0x0A. The code will not work with lines longer than INT_MAX characters, since the variable length is of type int. However, we are interested in a different error, so let us assume that the program works with a small buffer and that using int here is acceptable.
The problem is that on a 64-bit system the buffer and curr_pos pointers can lie outside the first 4 gigabytes of the address space. In this case, explicitly casting the pointers to the 32-bit UINT type discards their significant high-order bits, and the algorithm can break (see Figure 2).
Figure 2 - Incorrect calculations when searching for the terminating character
What makes the error unpleasant is that the code can work correctly for a long time, as long as the buffer is allocated in the lower four gigabytes of the address space. The fix is to remove the completely unnecessary explicit type conversions:
Programs with a long history often contain sections of code wrapped in #ifdef - #else - #endif constructs. When such programs are ported to a new architecture, incorrectly written conditions can cause the wrong code fragments to be compiled, not the ones the developers intended in the past (see Figure 3). Example:
Figure 3 - Two options are too few
Relying on the #else branch is dangerous in such situations. It is better to handle each case explicitly (see Figure 4) and to put a compilation error in the #else branch:
Figure 4 - All possible compilation paths are checked
In old programs, especially those written in C, code that stores a pointer in a variable of type int is not rare. Sometimes, however, this is done not intentionally but through inattention. Consider an example of the confusion that arises between the type int and a pointer to int:
In this example, the variable XX is used as a buffer to hold the pointer. This code will work correctly on those 32-bit systems where the size of the pointer matches the size of the int type. On a 64-bit system, this code is incorrect and the call
will corrupt 4 bytes of memory next to the variable XX (see Figure 5).
Figure 5 - Memory corruption next to variable XX
The above code was written either by a novice or in a hurry. Moreover, the explicit type conversions show that the compiler resisted to the last, hinting to the developer that a pointer and an int are different entities. However, brute force won. The fix is elementary: choose the right type for the variable XX, and the explicit casts become unnecessary:
A number of API functions, although kept for compatibility, are dangerous when developing 64-bit applications. A classic example is the SetWindowLong and GetWindowLong functions. Programs often contain code similar to the following:
The programmer who once wrote this code is not to blame. When the code was developed, some 5-10 years ago, the programmer, drawing on his experience and on MSDN, wrote code that was completely correct from the point of view of a 32-bit Windows system. The prototypes of these functions are as follows:
Explicitly casting the pointer to LONG is also justified, since a pointer and a LONG have the same size on Win32. But, I think, it is clear that once the program is recompiled as a 64-bit version, these type casts can cause the application to crash or malfunction: on Win64 a pointer is 64 bits, while LONG remains 32 bits.
What makes the error unpleasant is that it shows up irregularly or even extremely rarely. Whether it occurs depends on the memory area in which the object pointed to by "this" is created. If the object is created in the lower 4 gigabytes of the address space, the 64-bit program can work correctly. The error can appear unexpectedly after a long time, when, due to the memory allocation pattern, objects start being created beyond the first four gigabytes.
On a 64-bit system, the SetWindowLong/GetWindowLong functions may be used only if the program really stores values of types such as LONG, int, bool, and the like. If you need to work with pointers, use the extended versions of these functions: SetWindowLongPtr/GetWindowLongPtr. It is probably advisable to use the new functions in any case, so as not to provoke new errors in the future.
Examples with the SetWindowLong and GetWindowLong functions are classic and appear in almost every article devoted to developing 64-bit applications. However, the problem is not limited to these functions. Also pay attention to SetClassLong, GetClassLong, GetFileSize, EnumProcessModules, and GlobalMemoryStatus (see Figure 6).
Figure 6 - Table with the names of some obsolete and modern functions
Implicit conversions of size_t to unsigned and similar conversions are readily diagnosed by compiler warnings. However, in large programs such warnings are easily overlooked. Consider an example similar to real code, where the warning was ignored because it seemed that nothing bad could happen when working with short strings.
The above function searches an array of strings for the text "ABC" and returns true if at least one string contains the sequence "ABC". In the 64-bit version of the code, this function always returns true (for any non-empty array).
On a 64-bit system, the constant "string::npos" has the value 0xFFFFFFFFFFFFFFFF of type size_t. When this value is placed in the unsigned variable "n", it is truncated to 0xFFFFFFFF. As a result, the condition "n != string::npos" is always true: for the comparison, n is converted back to size_t as 0x00000000FFFFFFFF, which is not equal to 0xFFFFFFFFFFFFFFFF (see Figure 7).
Figure 7 - Schematic explanation of the value truncation error
The fix is elementary; just heed the compiler warnings:
Despite their age, programs and parts of programs written in C are more alive than ever. Their code is much more prone to 64-bit errors because of the less strict type-checking rules of the C language.
In C, a function can be called without being declared first (C99 removed implicit declarations, but many compilers still accept such code with only a warning). Let us analyze an interesting 64-bit error related to this. To begin, consider the correct version of the code, which allocates and uses three arrays of one gigabyte each:
This code correctly allocates the memory, writes to the first element of each array in turn, and frees the memory. It works correctly on a 64-bit system.
Now delete or comment out the line "#include <stdlib.h>". The code still compiles, but the program crashes when it runs. If the header file "stdlib.h" is not included, the C compiler assumes that the malloc function returns int. The first two memory allocations will most likely succeed. On the third call, malloc returns the address of an array beyond the first 2 gigabytes. Since the compiler assumes that the function's result is an int, it misinterprets the result and stores an invalid pointer value in the Pointers array.
Consider the assembly code generated by the Visual C++ compiler for the 64-bit Debug build. First, the correct code, generated when the malloc function is declared (the file "stdlib.h" is included):
Now consider a variant of incorrect code when there is no declaration of the malloc function:
Note the presence of the CDQE (Convert Doubleword to Quadword) instruction. The compiler assumed that the result is in the eax register and sign-extended it to a 64-bit value before writing it to the Pointers array. Consequently, the high-order bits of the rax register are lost. Even if the address of the allocated memory lies within the first four gigabytes, the result is still incorrect when the most significant bit of the eax register is 1. For example, the address 0x81000000 turns into 0xFFFFFFFF81000000.
Large, old software systems that have been developed for decades are full of atavisms and of code written in whatever paradigms and styles were popular over the years. In such systems, one can observe the evolution of programming languages: the oldest parts are written in C style, while the newest contain complex templates in the style of Alexandrescu.
Figure 8 - Dinosaur excavations
Some atavisms are related to 64-bit code, or rather, they prevent modern 64-bit code from working properly. Consider an example:
First, the function contains a check on the permissible size of the allocated memory, which is strange for a 64-bit system. Second, the diagnostic message it issues is incorrect: if we ask to allocate memory for 4,400,000,000 elements, the explicit cast to unsigned produces a strange message saying that memory cannot be allocated for only 105,032,704 elements (4,400,000,000 modulo 2^32).
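The wrong number in that message can be reproduced with a minimal sketch, assuming a 64-bit size_t. The helper names legacy_count_text and fixed_count_text are hypothetical: the first mimics the (unsigned) cast from malloc_zone_calloc, the second prints the full value with the standard %zu conversion.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>

// Hypothetical helper: renders an element count the way the legacy
// diagnostic does, by casting it to unsigned first (truncates to 32 bits).
std::string legacy_count_text(std::size_t num_items) {
  char buf[32];
  std::snprintf(buf, sizeof buf, "%u", static_cast<unsigned>(num_items));
  return buf;
}

// Hypothetical helper: the 64-bit-safe version; %zu matches size_t exactly.
std::string fixed_count_text(std::size_t num_items) {
  char buf[32];
  std::snprintf(buf, sizeof buf, "%zu", num_items);
  return buf;
}
```

On a 64-bit target, legacy_count_text(4400000000) yields "105032704", while fixed_count_text reports the requested count unchanged.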
A fine example of 64-bit errors is the use of wrong argument types in virtual function declarations. Usually this is not anyone's sloppiness but simply an "accident": nobody is to blame, yet the error is there. Consider the following situation.
Since time immemorial, the MFC library has had a CWinApp class with a WinHelp function:
To show your own help in a user application, you had to override this function:
And everything was fine until 64-bit systems appeared. MFC developers had to change the interface of the WinHelp function (and some other functions) as follows:
In 32-bit mode, the types DWORD_PTR and DWORD coincided, but in 64-bit mode they do not. Naturally, the developers of user applications must also change the type to DWORD_PTR, but first they have to learn that this is necessary. Otherwise, the 64-bit program contains an error: the WinHelp function of the user class no longer overrides the base one and is never called (see Figure 9).
Figure 9 - Error related to virtual functions
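The silent loss of the override can be reproduced without MFC. In this sketch, DWORD, DWORD_PTR, and UINT are stand-ins built from standard types (assumption: DWORD stays 32 bits while DWORD_PTR is pointer-sized, as on Win64), and AppBase/MyApp are hypothetical classes playing the roles of CWinApp and the user's application class.

```cpp
#include <cstdint>
#include <string>

// Stand-ins for the Windows typedefs.
using DWORD = std::uint32_t;
using DWORD_PTR = std::uintptr_t;
using UINT = unsigned int;

// Plays the role of CWinApp after the MFC interface change.
struct AppBase {
  virtual ~AppBase() = default;
  virtual std::string WinHelp(DWORD_PTR /*dwData*/, UINT /*nCmd*/) {
    return "default help";
  }
};

// User code written against the old signature. Where DWORD and DWORD_PTR
// are different types, this declares a new function that hides the base
// one instead of overriding it. Marking it `override` would turn the
// silent bug into a compile-time error.
struct MyApp : AppBase {
  virtual std::string WinHelp(DWORD /*dwData*/, UINT /*nCmd*/) {
    return "custom help";
  }
};
```

Calling WinHelp through an AppBase reference on a 64-bit target reaches the base implementation, which is exactly the "custom help is never shown" symptom.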
Magic numbers in the body of a program are bad style and a source of errors. Examples of magic numbers are 1024 and 768, which hard-code the screen resolution. In this article, we are interested in the magic numbers that can cause problems in a 64-bit application. The numbers most often dangerous for 64-bit programs are listed in the table in Figure 10.
Figure 10 - Magic numbers that are dangerous for 64-bit programs
Let us demonstrate an example of working with the CreateFileMapping function found in one of the CAD systems:
The number 0xFFFFFFFF is used instead of the correct reserved constant INVALID_HANDLE_VALUE. This is wrong in a Win64 program, where the constant INVALID_HANDLE_VALUE has the value 0xFFFFFFFFFFFFFFFF. The correct way to call the function would be:
Note. Some believe that the value 0xFFFFFFFF turns into 0xFFFFFFFFFFFFFFFF when extended to a pointer-sized type. This is not true. According to the C/C++ rules, the literal 0xFFFFFFFF has type "unsigned int", since it cannot be represented by type "int". Accordingly, when extended to a 64-bit type, the value 0xFFFFFFFFu turns into 0x00000000FFFFFFFFu. But if you write (size_t)(-1), you get the expected 0xFFFFFFFFFFFFFFFF. Here the "int" value -1 is, in effect, first sign-extended to "ptrdiff_t" and then converted to "size_t".
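The difference between the two conversions can be checked at compile time. This sketch uses std::uint64_t as a stand-in for the 64-bit size_t/pointer width so that it does not depend on the build platform, and it assumes a 32-bit int (so that the literal 0xFFFFFFFF is unsigned int); no Windows headers are involved.

```cpp
#include <cstdint>

// The magic number widened to 64 bits: the literal 0xFFFFFFFF has type
// unsigned int, so it is zero-extended.
constexpr std::uint64_t widened_magic = 0xFFFFFFFF;

// -1 converted to a 64-bit unsigned type wraps modulo 2^64, giving all ones,
// which is the bit pattern INVALID_HANDLE_VALUE has in a 64-bit program.
constexpr std::uint64_t widened_minus_one = static_cast<std::uint64_t>(-1);

static_assert(widened_magic == 0x00000000FFFFFFFFull, "zero-extended");
static_assert(widened_minus_one == 0xFFFFFFFFFFFFFFFFull, "all bits set");
```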
Another common mistake is to use magic numbers to set the size of an object. Consider an example of allocating and zeroing a buffer:
In this case, on a 64-bit system, more memory is allocated than is then filled with zeros (see Figure 11). The error lies in assuming that size_t is always four bytes.
Figure 11 - Filling only part of the array
The correct version:
Similar errors occur when calculating the size of memory to allocate or when serializing data.
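The effect of the magic number 4 can be measured with a small, hypothetical test helper (nonzero_after_clear is our name, not from the original code). It pre-fills an array of size_t with non-zero bytes, clears a given number of bytes the way the buggy or fixed memset call would, and counts the elements that were left dirty.

```cpp
#include <cstddef>
#include <cstring>
#include <vector>

// Fills n size_t elements with 0xFF bytes, clears the first bytes_to_clear
// bytes, and returns how many elements are still non-zero.
// bytes_to_clear must not exceed n * sizeof(std::size_t).
std::size_t nonzero_after_clear(std::size_t n, std::size_t bytes_to_clear) {
  std::vector<std::size_t> buf(n);
  std::memset(buf.data(), 0xFF, n * sizeof(std::size_t));
  std::memset(buf.data(), 0, bytes_to_clear);
  std::size_t count = 0;
  for (std::size_t v : buf)
    if (v != 0) ++count;
  return count;
}
```

Calling it with n * 4 models the magic-number version, which on a 64-bit target leaves the second half of the array untouched; n * sizeof(std::size_t) models the correct one.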
In many cases, a 64-bit program consumes more memory, including stack memory. Allocating more heap memory is not dangerous, since a 64-bit program has many times more heap available than a 32-bit one. But increased stack usage can lead to an unexpected stack overflow.
Stack usage differs between operating systems and compilers. We will look at how the stack is used in Win64 applications built with the Visual C++ compiler.
When the calling conventions for Win64 were designed, it was decided to put an end to the variety of ways of calling functions. Win32 had a number of calling conventions: stdcall, cdecl, fastcall, thiscall, and so on. Win64 has only one "native" calling convention, and compiler modifiers such as __cdecl are ignored.
The x86-64 calling convention is similar to the fastcall convention in x86. In the x64 convention, the first four integer arguments (from left to right) are passed in 64-bit registers selected specifically for this purpose:
RCX: 1st integer argument
RDX: 2nd integer argument
R8: 3rd integer argument
R9: 4th integer argument
The remaining integer arguments are passed through the stack. The this pointer is considered an integer argument, so it is always placed in the RCX register. If floating-point values are passed, then the first four of them are transferred in the XMM0-XMM3 registers, and the subsequent ones through the stack.
Although arguments can be passed in registers, the compiler still reserves space for them on the stack by decreasing the value of the RSP register (the stack pointer). At a minimum, each function must reserve 32 bytes on the stack (four 64-bit values corresponding to the registers RCX, RDX, R8, and R9). This space makes it easy to save the contents of the argument registers on the stack. The called function is not required to spill its register parameters to the stack, but the reserved space allows it to do so if necessary. If more than four integer parameters are passed, additional stack space is reserved for them.
This feature substantially increases stack consumption. Even if the called function has no parameters, 32 bytes are still "bitten off" the stack and then not used at all. The reason for such a wasteful mechanism is uniformity and easier debugging.
One more point: the RSP stack pointer must be aligned on a 16-byte boundary before each function call. Thus, the total stack used when calling a function without parameters in 64-bit code is 48 bytes: 8 (return address) + 8 (alignment) + 32 (reserve for arguments).
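The 48-byte figure can be derived rather than memorized. This is a simplified model of the rules described above (the constant and function names are ours, not part of any API): the return address plus the 32-byte reserve gives 40 bytes, and rounding up to the 16-byte alignment adds the remaining 8.

```cpp
#include <cstddef>

constexpr std::size_t kReturnAddress = 8;    // pushed by the CALL instruction
constexpr std::size_t kShadowSpace = 32;     // reserve for RCX, RDX, R8, R9
constexpr std::size_t kStackAlignment = 16;  // RSP alignment required at a call

// Rounds value up to the next multiple of alignment.
constexpr std::size_t round_up(std::size_t value, std::size_t alignment) {
  return (value + alignment - 1) / alignment * alignment;
}

// 8 + 32 = 40 bytes, rounded up to the next multiple of 16.
constexpr std::size_t kMinCallCost =
    round_up(kReturnAddress + kShadowSpace, kStackAlignment);
constexpr std::size_t kPadding = kMinCallCost - kReturnAddress - kShadowSpace;

static_assert(kMinCallCost == 48, "8 (return address) + 8 (alignment) + 32");
static_assert(kPadding == 8, "alignment padding");
```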
Is everything that bad? No. Keep in mind that the larger number of registers available to the 64-bit compiler lets it generate more efficient code and avoid reserving stack memory for some local variables. Thus, in some cases the 64-bit version of a function uses less stack than the 32-bit version. This issue, with various examples, is discussed in more detail in the article "Reasons why 64-bit programs require more stack memory".
It is impossible to predict whether a 64-bit program will consume more stack or less. Since a Win64 program may use 2-3 times more stack memory, you should play it safe and change the project setting that controls the reserved stack size. In the project settings, find the Stack Reserve Size parameter (the /STACK:reserve switch) and triple the reserved stack size. The default size is 1 megabyte.
Although using functions with a variable number of arguments, such as printf and scanf, is considered bad style in C++, they are still widely used. These functions create many problems when porting applications to other systems, including 64-bit ones. Consider an example:
char buf[9];  // enough for a 32-bit pointer printed as 8 hex digits
sprintf(buf, "%p", &x);
The author of the code did not take into account that the pointer size might someday exceed 32 bits. As a result, on a 64-bit architecture this code causes a buffer overflow (see Figure 12): with Visual C++, "%p" prints 8 hex digits in 32-bit code but 16 in 64-bit code, which do not fit into 9 bytes together with the terminating null. This error can be attributed to the magic number '9', but in a real application buffer overflows can occur without any magic numbers.
Figure 12 - Buffer overflow when working with the sprintf function
There are various ways to fix this code. The most reliable is to refactor it to get rid of the dangerous functions. For example, you can replace printf with cout, and sprintf with boost::format or std::stringstream.
Note. This recommendation is often criticized by Linux developers, who argue that gcc checks whether the format string matches the actual arguments passed, for example, to printf, and that using printf is therefore safe. However, they forget that a format string can be passed from another part of the program or loaded from resources. In other words, in a real program the format string rarely appears explicitly in the code, so the compiler cannot check it. And a developer using Visual Studio 2005/2008/2010 gets no warning on code such as void *p = 0; printf("%x", p); even with the /W4 and /Wall switches.
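Two ways of formatting a pointer without a fixed-size buffer are sketched below; both function names are hypothetical. The first follows the stream-based advice above; the second keeps the printf family but asks snprintf for the required length first (a null buffer with size 0 is allowed), so the buffer always fits whatever "%p" produces on the current platform.

```cpp
#include <cstddef>
#include <cstdio>
#include <sstream>
#include <string>

// C++ streams: the string grows as needed, so the pointer width does not matter.
std::string pointer_text_stream(const void* p) {
  std::ostringstream out;
  out << p;
  return out.str();
}

// printf-style formatting with a buffer sized from snprintf's own answer.
std::string pointer_text_printf(const void* p) {
  void* vp = const_cast<void*>(p);                // %p expects a void*
  int n = std::snprintf(nullptr, 0, "%p", vp);    // length without the '\0'
  if (n < 0) return std::string();
  std::string s(static_cast<std::size_t>(n) + 1, '\0');
  std::snprintf(&s[0], s.size(), "%p", vp);
  s.resize(static_cast<std::size_t>(n));          // drop the terminating '\0'
  return s;
}
```

The exact text produced for "%p" is implementation-defined, which is one more reason not to size buffers by hand.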
Programs often contain incorrect format strings passed to printf and similar functions. As a result, wrong values are displayed; although this does not make the program terminate abnormally, it is, of course, an error:
In other cases, an error in the format string is critical. Consider an example based on the implementation of the UNDO/REDO subsystem in one of the programs:
The "%X" format is not intended for pointers, so this code is incorrect on 64-bit systems. On 32-bit systems it works, although it is not pretty.
We have not encountered such an error ourselves. It is probably rare, but quite real.
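A pointer can be written to a text record and read back portably through std::uintptr_t and the standard PRIxPTR/SCNxPTR format macros, which always match the integer's width, unlike "%X". This is a sketch under our own assumptions: the function pointer_roundtrip and its text format are hypothetical and are not the program's actual UNDO/REDO code.

```cpp
#include <cinttypes>
#include <cstdint>
#include <cstdio>

// Writes p as hex text and parses it back into *out.
bool pointer_roundtrip(void* p, void** out) {
  char buf[2 * sizeof(std::uintptr_t) + 1];  // max hex digits plus '\0'
  std::snprintf(buf, sizeof buf, "%" PRIxPTR,
                reinterpret_cast<std::uintptr_t>(p));
  std::uintptr_t value = 0;
  if (std::sscanf(buf, "%" SCNxPTR, &value) != 1)
    return false;
  *out = reinterpret_cast<void*>(value);
  return true;
}
```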
The double type is 64 bits in size and conforms to the IEEE-754 standard on both 32-bit and 64-bit systems. Some programmers use double to store and process integer values:
This code can still be justified on a 32-bit system, since double has 52 significant bits (53 counting the implicit leading bit) and can store any 32-bit integer without loss. But when a 64-bit integer is stored in a double, the exact value may be lost (see Figure 13).
Figure 13 - The number of significant bits in the types size_t and double
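The loss of precision is easy to demonstrate. The hypothetical function survives_double checks whether an integer comes back unchanged after a trip through double: every 32-bit value does, but 2^53 + 1 already rounds to 2^53.

```cpp
#include <cstdint>

// Returns true if v survives a round trip through double unchanged.
bool survives_double(std::uint64_t v) {
  double d = static_cast<double>(v);
  // Values near UINT64_MAX round up to 2^64, which does not fit back into
  // uint64_t; converting it would be undefined behavior, so reject it first.
  if (d >= 18446744073709551616.0)
    return false;
  return static_cast<std::uint64_t>(d) == v;
}
```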
Introduction
The article only demonstrates various types of 64-bit errors and does not describe methods for detecting and preventing them. Methods for diagnosing and fixing defects in 64-bit programs are described in detail in the following resources:
- Course on the development of 64-bit applications in C/C++ [1];
- What are size_t and ptrdiff_t [2];
- 20 traps of porting C++ code to a 64-bit platform [3];
- Tutorial on PVS-Studio [4];
- A 64-bit horse that can count [5].
Example 1. Buffer overflow
struct STRUCT_1 {
  int *a;
};
struct STRUCT_2 {
  int x;
};
...
STRUCT_1 Abcd;
STRUCT_2 Qwer;
memset(&Abcd, 0, sizeof(Abcd));
memset(&Qwer, 0, sizeof(Abcd));  // copy-paste error: should be sizeof(Qwer)
The program declares two objects of types STRUCT_1 and STRUCT_2, which must be cleared (all fields set to zero) before first use. When writing the initialization, the programmer copied a similar line and replaced "&Abcd" with "&Qwer" in it, but forgot to replace "sizeof(Abcd)" with "sizeof(Qwer)". By a lucky coincidence, the STRUCT_1 and STRUCT_2 structures had the same size on a 32-bit system, so the code worked correctly for a long time.
When the code was ported to a 64-bit system, the size of the Abcd object grew (its pointer field became 8 bytes instead of 4), and the second memset call caused a buffer overflow (see Figure 1).
Figure 1 - Schematic explanation of an example of a buffer overflow
Such an error can be difficult to detect, because the corrupted data may only be used much later.
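One way to make this copy-paste error impossible is to take the size from the very object being cleared. The helper zero_object below is a hypothetical sketch, not part of the original code; in C++, value-initialization (STRUCT_2 Qwer{};) avoids memset altogether.

```cpp
#include <cstring>
#include <type_traits>

struct STRUCT_1 { int* a; };
struct STRUCT_2 { int x; };

// Clears obj using its own size, so one object can never be paired with
// another object's size.
template <typename T>
void zero_object(T& obj) {
  static_assert(std::is_trivially_copyable<T>::value,
                "memset is only appropriate for trivially copyable types");
  std::memset(&obj, 0, sizeof(obj));
}
```

On a typical 64-bit target, sizeof(STRUCT_1) is 8 and sizeof(STRUCT_2) is 4, so clearing Qwer with sizeof(Abcd) writes 4 bytes past its end.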
Example 2. Extra casts
char *buffer;
char *curr_pos;
int length;
...
while ((*(curr_pos++) != 0x0a) &&
       ((UINT)curr_pos - (UINT)buffer < (UINT)length));
The code is bad, but it is real. Its task is to find the end of a line, marked by the character 0x0A. The code will not work with lines longer than INT_MAX characters, since the variable length is of type int. However, we are interested in a different error, so let us assume that the program works with a small buffer and that using int here is acceptable.
The problem is that on a 64-bit system the buffer and curr_pos pointers can lie outside the first 4 gigabytes of the address space. In this case, explicitly casting the pointers to the 32-bit UINT type discards their significant high-order bits, and the algorithm can break (see Figure 2).
Figure 2 - Incorrect calculations when searching for the terminating character
What makes the error unpleasant is that the code can work correctly for a long time, as long as the buffer is allocated in the lower four gigabytes of the address space. The fix is to remove the completely unnecessary explicit type conversions:
while (curr_pos - buffer < length && *curr_pos != '\n')  // '\n' is 0x0A
  curr_pos++;
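For a self-contained check, the corrected loop can be wrapped in a hypothetical function find_line_end. As an assumption beyond the original code, it takes the length as std::ptrdiff_t (the type of a pointer difference) instead of int, which also lifts the INT_MAX limitation mentioned above.

```cpp
#include <cstddef>

// Returns the offset of the first '\n' (0x0A) in buffer[0, length), or
// length if there is none. Pointer subtraction yields std::ptrdiff_t, so no
// casts to a 32-bit type are involved.
std::ptrdiff_t find_line_end(const char* buffer, std::ptrdiff_t length) {
  const char* curr_pos = buffer;
  while (curr_pos - buffer < length && *curr_pos != '\n')
    ++curr_pos;
  return curr_pos - buffer;
}
```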
Example 3. Incorrect #ifdef
Programs with a long history often contain sections of code wrapped in #ifdef - #else - #endif constructs. When such programs are ported to a new architecture, incorrectly written conditions can cause the wrong code fragments to be compiled, not the ones the developers intended in the past (see Figure 3). Example:
#ifdef _WIN32 // Win32 code
  cout << "This is Win32" << endl;
#else         // Win16 code
  cout << "This is Win16" << endl;
#endif

// Alternative incorrect option:
#ifdef _WIN16 // Win16 code
  cout << "This is Win16" << endl;
#else         // Win32 code
  cout << "This is Win32" << endl;
#endif
Figure 3 - Two options are too few
Relying on the #else branch is dangerous in such situations. It is better to handle each case explicitly (see Figure 4) and to put a compilation error in the #else branch:
#if defined _M_X64    // Win64 code (Intel 64)
  cout << "This is Win64" << endl;
#elif defined _WIN32  // Win32 code
  cout << "This is Win32" << endl;
#elif defined _WIN16  // Win16 code
  cout << "This is Win16" << endl;
#else
  static_assert(false, "Unknown platform");
#endif
Figure 4 - All possible compilation paths are checked
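The same "every branch explicit" rule can be combined with a compile-time cross-check against the real pointer size. This is a sketch: kPointerBits is a hypothetical name, and the macros tested are common compiler-predefined ones, not an exhaustive list. _WIN64 is tested before _WIN32 because 64-bit Windows compilers define both.

```cpp
#if defined(_WIN64) || defined(__x86_64__) || defined(__aarch64__)
constexpr int kPointerBits = 64;
#elif defined(_WIN32) || defined(__i386__) || defined(__arm__)
constexpr int kPointerBits = 32;
#else
#error "Unknown platform: add an explicit branch for it"
#endif

// Catches a branch chosen by mistake (an ILP32 ABI such as x32 defines
// __x86_64__ but uses 32-bit pointers, and would be stopped here).
static_assert(static_cast<int>(sizeof(void*) * 8) == kPointerBits,
              "platform branch disagrees with the pointer size");
```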
Example 4. Confusion with int and int *
In old programs, especially those written in C, code that stores a pointer in a variable of type int is not rare. Sometimes, however, this is done not intentionally but through inattention. Consider an example of the confusion that arises between the type int and a pointer to int:
int GlobalInt = 1;

void GetValue(int **x)
{
  *x = &GlobalInt;
}

void SetValue(int *x)
{
  GlobalInt = *x;
}

...
int XX;
GetValue((int **) &XX);
SetValue((int *) XX);
In this example, the variable XX is used as a buffer to hold the pointer. This code will work correctly on those 32-bit systems where the size of the pointer matches the size of the int type. On a 64-bit system, this code is incorrect and the call
GetValue((int **) &XX);
will corrupt 4 bytes of memory next to the variable XX (see Figure 5).
Figure 5 - Memory corruption next to variable XX
The above code was written either by a novice or in a hurry. Moreover, the explicit type conversions show that the compiler resisted to the last, hinting to the developer that a pointer and an int are different entities. However, brute force won. The fix is elementary: choose the right type for the variable XX, and the explicit casts become unnecessary:
int *XX;
GetValue(&XX);
SetValue(XX);
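If a pointer genuinely has to live in an integer (for example, in a legacy "user data" field), the integer type must be pointer-sized. A minimal sketch with std::uintptr_t follows; the helper names pointer_to_integer and integer_to_pointer are hypothetical.

```cpp
#include <cstdint>

int GlobalInt = 1;

// std::uintptr_t is wide enough for a pointer on both 32-bit and 64-bit
// targets, whereas int is only 32 bits on both.
std::uintptr_t pointer_to_integer(int* p) {
  return reinterpret_cast<std::uintptr_t>(p);
}

int* integer_to_pointer(std::uintptr_t v) {
  return reinterpret_cast<int*>(v);
}

static_assert(sizeof(std::uintptr_t) >= sizeof(int*),
              "uintptr_t can hold a pointer");
```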
Example 5. Using deprecated functions
A number of API functions, although kept for compatibility, are dangerous when developing 64-bit applications. A classic example is the SetWindowLong and GetWindowLong functions. Programs often contain code similar to the following:
SetWindowLong(window, 0, (LONG)this);
...
Win32Window *this_window = (Win32Window *)GetWindowLong(window, 0);
The programmer who once wrote this code is not to blame. When the code was developed, some 5-10 years ago, the programmer, drawing on his experience and on MSDN, wrote code that was completely correct from the point of view of a 32-bit Windows system. The prototypes of these functions are as follows:
LONG WINAPI SetWindowLong(HWND hWnd, int nIndex, LONG dwNewLong);
LONG WINAPI GetWindowLong(HWND hWnd, int nIndex);
Explicitly casting the pointer to LONG is also justified, since a pointer and a LONG have the same size on Win32. But, I think, it is clear that once the program is recompiled as a 64-bit version, these type casts can cause the application to crash or malfunction: on Win64 a pointer is 64 bits, while LONG remains 32 bits.
What makes the error unpleasant is that it shows up irregularly or even extremely rarely. Whether it occurs depends on the memory area in which the object pointed to by "this" is created. If the object is created in the lower 4 gigabytes of the address space, the 64-bit program can work correctly. The error can appear unexpectedly after a long time, when, due to the memory allocation pattern, objects start being created beyond the first four gigabytes.
On a 64-bit system, the SetWindowLong/GetWindowLong functions may be used only if the program really stores values of types such as LONG, int, bool, and the like. If you need to work with pointers, use the extended versions of these functions: SetWindowLongPtr/GetWindowLongPtr. It is probably advisable to use the new functions in any case, so as not to provoke new errors in the future.
Examples with the SetWindowLong and GetWindowLong functions are classic and appear in almost every article devoted to developing 64-bit applications. However, the problem is not limited to these functions. Also pay attention to SetClassLong, GetClassLong, GetFileSize, EnumProcessModules, and GlobalMemoryStatus (see Figure 6).
Figure 6 - Table with the names of some obsolete and modern functions
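What survives a round trip through a LONG slot versus a LONG_PTR slot can be modeled portably, without Windows headers. In this sketch LONG and LONG_PTR are stand-in aliases (assumption: LONG is 32 bits and LONG_PTR is pointer-sized, as on Win64), and through_long/through_long_ptr are hypothetical functions that work on address values rather than calling the real API.

```cpp
#include <cstdint>

using LONG = std::int32_t;
using LONG_PTR = std::intptr_t;

// Models SetWindowLong/GetWindowLong: only the low 32 bits are stored, and
// they are sign-extended on the way back to a pointer-sized value.
std::uintptr_t through_long(std::uintptr_t address) {
  LONG stored = static_cast<LONG>(static_cast<LONG_PTR>(address));
  return static_cast<std::uintptr_t>(static_cast<LONG_PTR>(stored));
}

// Models SetWindowLongPtr/GetWindowLongPtr: the whole value is kept.
std::uintptr_t through_long_ptr(std::uintptr_t address) {
  LONG_PTR stored = static_cast<LONG_PTR>(address);
  return static_cast<std::uintptr_t>(stored);
}
```

A low address such as 0x00400000 survives both paths, which is why the bug can stay hidden; an address above 4 gigabytes survives only the LONG_PTR path.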
Example 6. Value truncation through implicit type conversion
Implicit conversions of size_t to unsigned and similar conversions are readily diagnosed by compiler warnings. However, in large programs such warnings are easily overlooked. Consider an example similar to real code, where the warning was ignored because it seemed that nothing bad could happen when working with short strings.
bool Find(const ArrayOfStrings &arrStr)
{
  ArrayOfStrings::const_iterator it;
  for (it = arrStr.begin(); it != arrStr.end(); ++it)
  {
    unsigned n = it->find("ABC"); // Truncation
    if (n != string::npos)
      return true;
  }
  return false;
}
The above function searches an array of strings for the text "ABC" and returns true if at least one string contains the sequence "ABC". In the 64-bit version of the code, this function always returns true (for any non-empty array).
On a 64-bit system, the constant "string::npos" has the value 0xFFFFFFFFFFFFFFFF of type size_t. When this value is placed in the unsigned variable "n", it is truncated to 0xFFFFFFFF. As a result, the condition "n != string::npos" is always true: for the comparison, n is converted back to size_t as 0x00000000FFFFFFFF, which is not equal to 0xFFFFFFFFFFFFFFFF (see Figure 7).
Figure 7 - Schematic explanation of the value truncation error
The fix is elementary; just heed the compiler warnings:
for (auto it = arrStr.begin(); it != arrStr.end(); ++it)
{
  auto n = it->find("ABC");
  if (n != string::npos)
    return true;
}
return false;
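Both versions can be compared side by side. In this sketch ArrayOfStrings is assumed to be std::vector<std::string> (its real definition is not shown in the example), and FindTruncating is a hypothetical name for the buggy variant; the cast there is explicit only to silence the warning the original code ignored.

```cpp
#include <string>
#include <vector>

using ArrayOfStrings = std::vector<std::string>;  // assumed definition

// Reproduces the truncating version.
bool FindTruncating(const ArrayOfStrings& arrStr) {
  for (const std::string& s : arrStr) {
    unsigned n = static_cast<unsigned>(s.find("ABC"));
    if (n != std::string::npos)  // n is widened back to size_t here
      return true;
  }
  return false;
}

// Fixed version: the result keeps its own type, std::string::size_type.
bool Find(const ArrayOfStrings& arrStr) {
  for (const std::string& s : arrStr)
    if (s.find("ABC") != std::string::npos)
      return true;
  return false;
}
```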
Example 7. Undeclared functions in C
Despite their age, programs and parts of programs written in C are more alive than ever. Their code is much more prone to 64-bit errors because of the less strict type-checking rules of the C language.
In C, a function can be called without being declared first (C99 removed implicit declarations, but many compilers still accept such code with only a warning). Let us analyze an interesting 64-bit error related to this. To begin, consider the correct version of the code, which allocates and uses three arrays of one gigabyte each:
#include <stdlib.h>

void test()
{
  const size_t Gbyte = 1024 * 1024 * 1024;
  size_t i;
  char *Pointers[3];

  // Allocate
  for (i = 0; i != 3; ++i)
    Pointers[i] = (char *)malloc(Gbyte);

  // Use
  for (i = 0; i != 3; ++i)
    Pointers[i][0] = 1;

  // Free
  for (i = 0; i != 3; ++i)
    free(Pointers[i]);
}
This code correctly allocates the memory, writes to the first element of each array in turn, and frees the memory. It works correctly on a 64-bit system.
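Now delete or comment out the line "#include <stdlib.h>". The code still compiles, but the program crashes when it runs. If the header file "stdlib.h" is not included, the C compiler assumes that the malloc function returns int. The first two memory allocations will most likely succeed. On the third call, malloc returns the address of an array beyond the first 2 gigabytes. Since the compiler assumes that the function's result is an int, it misinterprets the result and stores an invalid pointer value in the Pointers array.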
Consider the assembly code generated by the Visual C++ compiler for the 64-bit Debug build. First, the correct code, generated when the malloc function is declared (the file "stdlib.h" is included):
Pointers[i] = (char *)malloc(Gbyte);
mov rcx, qword ptr [Gbyte]
call qword ptr [__imp_malloc (14000A518h)]
mov rcx, qword ptr [i]
mov qword ptr Pointers[rcx*8], rax
Now consider the incorrect code, which is generated when the malloc function is not declared:
Pointers[i] = (char *)malloc(Gbyte);
mov rcx, qword ptr [Gbyte]
call malloc (1400011A6h)
cdqe
mov rcx, qword ptr [i]
mov qword ptr Pointers[rcx*8], rax
Note the presence of the CDQE (Convert Doubleword to Quadword) instruction. The compiler assumed that the result is an int in the eax register and sign-extended it to a 64-bit value before writing it to the Pointers array. As a result, the high bits of the rax register are lost. Even if the address of the allocated memory lies within the first four gigabytes, the result is still wrong whenever the most significant bit of eax is 1. For example, the address 0x81000000 turns into 0xFFFFFFFF81000000.
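The effect of CDQE can be modeled in plain C++. The helper cdqe below is hypothetical and is not compiler output. It is a sketch of the instruction's semantics (keep the low 32 bits, then sign-extend them to 64 bits), which makes the address arithmetic in the paragraph above easy to check.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of CDQE: only EAX (the low 32 bits of RAX) survives,
// and it is sign-extended back to 64 bits.
std::uint64_t cdqe(std::uint64_t rax) {
  std::uint32_t eax = static_cast<std::uint32_t>(rax);  // high 32 bits are lost here
  if (eax & 0x80000000u)                                // sign bit of EAX is set
    return 0xFFFFFFFF00000000ull | eax;                 // fill the high half with ones
  return eax;                                           // otherwise zero-extend
}
```

For instance, cdqe(0x81000000) yields 0xFFFFFFFF81000000. An address above 4 GB, such as 0x140000000, loses its high bits and becomes 0x40000000.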
Example 8. The remains of dinosaurs in large and old programs
Large old software systems that have been developed for decades are full of atavisms and of code written in whatever paradigms and styles were popular over the years. In such systems you can trace the evolution of programming languages: the oldest parts are written in C style, while the newest ones contain complex Alexandrescu-style templates.
Figure 8 - Dinosaur excavations
Some atavisms are related to 64 bits. More precisely, they get in the way of modern 64-bit code. Consider an example:
// beyond this, assume a programming error
#define MAX_ALLOCATION 0xc0000000

void *malloc_zone_calloc(malloc_zone_t *zone, size_t num_items, size_t size)
{
  void *ptr;
  ...
  if (((unsigned)num_items >= MAX_ALLOCATION) ||
      ((unsigned)size >= MAX_ALLOCATION) ||
      ((long long)size * num_items >= (long long)MAX_ALLOCATION))
  {
    fprintf(stderr,
            "*** malloc_zone_calloc[%d]: arguments too large: %d,%d\n",
            getpid(), (unsigned)num_items, (unsigned)size);
    return NULL;
  }
  ptr = zone->calloc(zone, num_items, size);
  ...
  return ptr;
}
First, the function checks the permissible size of the allocated memory, which is strange for a 64-bit system. Second, the diagnostic message will be wrong. If we request memory for 4,400,000,000 elements, the explicit cast to unsigned produces a strange message about failing to allocate memory for only 105,032,704 elements.
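The number in the diagnostic message is the requested count reduced modulo 2^32: 4,400,000,000 - 4,294,967,296 = 105,032,704. Below is a sketch of a 64-bit-clean variant of the check, assuming a caller-supplied limit. The names calloc_args_ok and report_too_large are hypothetical and are not part of the original malloc zone code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdio>

// Hypothetical 64-bit-clean argument check: no casts to 32-bit types,
// and the product num_items * size is compared without overflowing size_t.
bool calloc_args_ok(std::size_t num_items, std::size_t size, std::size_t limit) {
  if (size != 0 && num_items > limit / size)
    return false;  // num_items * size would exceed the limit
  return true;
}

// Reports the real, untruncated values: %zu is the size_t conversion.
void report_too_large(std::size_t num_items, std::size_t size) {
  std::fprintf(stderr, "*** arguments too large: %zu, %zu\n", num_items, size);
}
```

Dividing the limit by size, instead of multiplying num_items by size, avoids overflow even for values near SIZE_MAX. The %zu conversion prints the values without truncation.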
Example 9. Virtual functions
One of the classic examples of 64-bit errors is the use of wrong argument types in virtual function declarations. Usually this is not anyone's sloppiness but simply an “accident”: nobody is to blame, yet there is an error. Consider the following situation.
Since time immemorial, the MFC library has had a CWinApp class with a WinHelp function:
class CWinApp {
  ...
  virtual void WinHelp(DWORD dwData, UINT nCmd);
};
To show your own help in a user application, you had to override this function:
class CSampleApp : public CWinApp {
  ...
  virtual void WinHelp(DWORD dwData, UINT nCmd);
};
And everything was fine until 64-bit systems appeared. MFC developers had to change the interface of the WinHelp function (and some other functions) as follows:
class CWinApp {
  ...
  virtual void WinHelp(DWORD_PTR dwData, UINT nCmd);
};
In 32-bit mode the types DWORD_PTR and DWORD coincide, but in 64-bit mode they do not. Naturally, application developers must also change the type to DWORD_PTR, but first they have to find out about the change. As a result, the 64-bit program contains an error: the WinHelp function of the user class is never called (see Figure 9).
Figure 9 - Error related to virtual functions
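The following sketch reproduces the effect without MFC. The typedefs are stand-ins and rest on an assumption: DWORD is a 32-bit unsigned type and DWORD_PTR is a pointer-sized one. BaseApp, SampleApp and showHelp are hypothetical names that play the roles of CWinApp, the user's class and the framework's call.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-ins for the Windows typedefs.
using DWORD = std::uint32_t;       // always 32 bits
using DWORD_PTR = std::uintptr_t;  // pointer-sized: 64 bits on a 64-bit target
using UINT = unsigned int;

struct BaseApp {  // plays the role of the updated CWinApp
  virtual ~BaseApp() = default;
  virtual int WinHelp(DWORD_PTR /*dwData*/, UINT /*nCmd*/) { return 0; }  // default help
};

struct SampleApp : BaseApp {  // user class written against the old interface
  // On a 64-bit target this signature differs from the base one, so it does
  // NOT override BaseApp::WinHelp; it only hides it. Adding `override` here
  // would turn the silent error into a compile-time error.
  virtual int WinHelp(DWORD /*dwData*/, UINT /*nCmd*/) { return 1; }  // custom help
};

// The framework calls the function through a base-class reference.
int showHelp(BaseApp& app) { return app.WinHelp(0, 0); }
```

On a 64-bit build, showHelp returns 0, which means the base-class version ran. Since C++11, marking the derived function with override makes the compiler reject such a mismatch instead of accepting it silently.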
Example 10. Magic numbers as parameters
Magic numbers in the body of a program are bad style and a source of errors. Examples are 1024 and 768, which hard-code the screen resolution. In this article we are interested in magic numbers that can cause problems in a 64-bit application. The most common numbers that are dangerous for 64-bit programs are listed in the table in Figure 10.
Figure 10 - Magic numbers that are dangerous for 64-bit programs
Let us demonstrate an example of working with the CreateFileMapping function found in one of the CAD systems:
HANDLE hFileMapping = CreateFileMapping(
  (HANDLE) 0xFFFFFFFF,
  NULL,
  PAGE_READWRITE,
  dwMaximumSizeHigh,
  dwMaximumSizeLow,
  name);
The number 0xFFFFFFFF is used instead of the reserved constant INVALID_HANDLE_VALUE. This is wrong in a Win64 program, where INVALID_HANDLE_VALUE equals 0xFFFFFFFFFFFFFFFF. The correct call would be:
HANDLE hFileMapping = CreateFileMapping(
  INVALID_HANDLE_VALUE,
  NULL,
  PAGE_READWRITE,
  dwMaximumSizeHigh,
  dwMaximumSizeLow,
  name);
Note. Some believe that the value 0xFFFFFFFF turns into 0xFFFFFFFFFFFFFFFF when extended to a pointer. This is not true. According to C/C++ rules, the value 0xFFFFFFFF has type “unsigned int”, because it cannot be represented by type “int”. Accordingly, when it is extended to a 64-bit type, 0xFFFFFFFFu turns into 0x00000000FFFFFFFFu. But if you write (size_t)(-1), you get the expected 0xFFFFFFFFFFFFFFFF. Here the “int” value -1 is converted to the 64-bit unsigned type. In effect, it is first sign-extended to “ptrdiff_t” and then converted to “size_t”.
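The conversion rules in the note can be checked with a few compile-time assertions. This sketch uses std::uint64_t instead of size_t so that the results do not depend on the build's pointer width. It assumes a 32-bit int, which is the case on mainstream 32-bit and 64-bit compilers.

```cpp
#include <cassert>
#include <cstdint>

// 0xFFFFFFFF does not fit in a 32-bit int, so its type is unsigned int.
// Widening an unsigned value pads it with zeros.
constexpr std::uint64_t kFromHexLiteral = 0xFFFFFFFF;

// -1 is an int; converting it to a 64-bit unsigned type yields all ones.
constexpr std::uint64_t kFromMinusOne = static_cast<std::uint64_t>(-1);

static_assert(kFromHexLiteral == 0x00000000FFFFFFFFull, "zero-extended");
static_assert(kFromMinusOne == 0xFFFFFFFFFFFFFFFFull, "all bits set");
```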
Example 11. Magic constants denoting size
Another common mistake is to use magic numbers to set the size of an object. Consider an example of allocating and zeroing a buffer:
size_t count = 500;
size_t *values = new size_t[count];
// Only part of the buffer will be filled
memset(values, 0, count * 4);
In this case, on a 64-bit system more memory is allocated than is then filled with zeros (see Figure 11). The error lies in assuming that size_t is always four bytes.
Figure 11 - Filling only part of the array
The correct version:
size_t count = 500;
size_t *values = new size_t[count];
memset(values, 0, count * sizeof(values[0]));
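Here is a short sketch of two ways to zero the buffer without hard-coding the element size. The helper names zero_values and make_zeroed are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Zeroes `count` elements. The byte count is derived from the element
// type instead of the magic 4.
void zero_values(std::size_t* values, std::size_t count) {
  std::memset(values, 0, count * sizeof(values[0]));
}

// Alternative that avoids manual byte arithmetic altogether:
// std::vector value-initializes (zeroes) its elements.
std::vector<std::size_t> make_zeroed(std::size_t count) {
  return std::vector<std::size_t>(count);
}
```

The std::vector variant is usually preferable in C++, because it also releases the memory automatically.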
Similar errors can occur when calculating the size of allocated memory or when serializing data.
Example 12. Stack Overflow
In many cases, a 64-bit program consumes more heap and stack memory. Allocating more heap memory is not dangerous, because a 64-bit program has many times more of it available than a 32-bit one. But increased stack usage can lead to an unexpected stack overflow.
The way the stack is used differs between operating systems and compilers. We will consider how the stack is used in Win64 applications built with the Visual C++ compiler.
When the calling conventions for Win64 systems were designed, it was decided to put an end to the variety of ways to call functions. Win32 had a number of calling conventions: stdcall, cdecl, fastcall, thiscall, and so on. Win64 has only one “native” calling convention, and compiler modifiers such as __cdecl are ignored.
The x86-64 calling convention is similar to the fastcall convention in x86. In the x64 convention, the first four integer arguments (from left to right) are passed in 64-bit registers selected specifically for this purpose:
RCX: 1st integer argument
RDX: 2nd integer argument
R8: 3rd integer argument
R9: 4th integer argument
The remaining integer arguments are passed through the stack. The this pointer is considered an integer argument, so it is always placed in the RCX register. If floating-point values are passed, then the first four of them are transferred in the XMM0-XMM3 registers, and the subsequent ones through the stack.
Although arguments can be passed in registers, the compiler still reserves space for them on the stack by decreasing the value of the RSP register (the stack pointer). Each function must reserve at least 32 bytes on the stack: four 64-bit values corresponding to the registers RCX, RDX, R8 and R9. This space makes it easy to save the contents of the argument registers on the stack. The called function is not required to store the register parameters on the stack, but the reserved space allows it to do so if necessary. If more than four integer parameters are passed, the corresponding additional space is reserved on the stack.
This feature substantially increases stack consumption. Even if a function has no parameters, 32 bytes are still "bitten off" the stack and are then not used at all. This wasteful mechanism exists for the sake of uniformity and easier debugging.
One more point: the RSP stack pointer must be aligned on a 16-byte boundary before each function call. Thus, the total stack used when calling a function without parameters in 64-bit code is 48 bytes: 8 (return address) + 8 (alignment) + 32 (reserved space for arguments).
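The 48-byte figure can be reproduced with a small compile-time calculation. This is only a model of the accounting in the paragraph above for a call to a parameterless function. It does not describe the exact code that any particular compiler emits, and align_up and the constant names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// Rounds n up to the next multiple of a (a must be a power of two).
constexpr std::size_t align_up(std::size_t n, std::size_t a) {
  return (n + a - 1) & ~(a - 1);
}

// Return address plus the 32-byte register home area, padded so that the
// total stays a multiple of 16 and RSP remains 16-byte aligned.
constexpr std::size_t kReturnAddress = 8;
constexpr std::size_t kHomeArea = 4 * 8;  // RCX, RDX, R8, R9
constexpr std::size_t kCallCost = align_up(kReturnAddress + kHomeArea, 16);

static_assert(kCallCost == 48, "8 + 32 = 40, padded to 48");
```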
Is it really that bad? No. Remember that the larger number of registers available to the 64-bit compiler lets it generate more efficient code and avoid reserving stack memory for some local variables. Thus, in some cases, the 64-bit version of a function uses less stack than the 32-bit version. This issue is discussed in more detail, with various examples, in the article "Reasons why 64-bit programs require more stack memory".
It is impossible to predict whether a 64-bit program will consume more or less stack. Because a Win64 program can use 2-3 times more stack memory, it is wise to play it safe and change the project setting that controls the size of the reserved stack. In the project settings, select the Stack Reserve Size parameter (the /STACK:reserve switch) and triple the reserved stack size. The default size is 1 megabyte.
Example 13. A function with a variable number of arguments and buffer overflow
Although functions with a variable number of arguments, such as printf and scanf, are considered bad style in C++, they are still widely used. They cause many problems when applications are ported to other systems, including 64-bit ones. Consider an example:
int x;
char buf[9];
sprintf(buf, "%p", &x);
The author of the code did not take into account that a pointer might one day be larger than 32 bits. With Visual C++, %p prints a 32-bit pointer as 8 hexadecimal digits, which fits exactly into 9 bytes together with the terminating null. A 64-bit pointer needs 16 digits. As a result, on a 64-bit architecture this code causes a buffer overflow (see Figure 12). This error can be blamed on the magic number '9', but in a real application buffer overflows can occur without any magic numbers.
Figure 12 - Buffer overflow when working with the sprintf function
There are various ways to fix this code. The most effective one is to refactor the code and get rid of dangerous functions. For example, you can replace printf with cout, and sprintf with boost::format or std::stringstream.
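Here is a minimal sketch of the std::stringstream approach, plus a bounded fallback for code that must keep printf-style formatting. The helper names are hypothetical. The fallback's buffer size assumes that %p prints at most a "0x" prefix and two hex digits per byte. That output format is implementation-defined, but if the assumption fails, snprintf truncates the output rather than overflowing the buffer.

```cpp
#include <cassert>
#include <cstdio>
#include <sstream>
#include <string>

// Type-safe formatting: the stream manages its own buffer, so a wider
// pointer cannot overflow anything.
std::string pointer_to_string(const void* p) {
  std::ostringstream out;
  out << p;
  return out.str();
}

// Bounded printf-style fallback: snprintf never writes past the buffer.
// Size: "0x" + 2 hex digits per byte + terminating '\0' (an assumption).
std::string pointer_to_string_c(const void* p) {
  char buf[2 + 2 * sizeof(void*) + 1];
  std::snprintf(buf, sizeof buf, "%p", const_cast<void*>(p));
  return buf;
}
```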
Note. Linux developers often criticize this recommendation. They argue that gcc checks whether the format string matches the actual arguments passed to, for example, printf, so using printf is safe. However, they forget that the format string may be passed from another part of the program or loaded from resources. In other words, in a real program the format string rarely appears explicitly in the code, so the compiler cannot check it. And a developer using Visual Studio 2005/2008/2010 will not get a warning on code such as void *p = 0; printf("%x", p); even with the /W4 and /Wall switches.
Example 14. A function with a variable number of arguments and an invalid format
Programs often contain incorrect format strings passed to printf and similar functions. As a result, wrong values are printed. This does not crash the program, but it is of course an error:
const char *invalidFormat = "%u";
size_t value = SIZE_MAX;
// Wrong value will be printed
printf(invalidFormat, value);
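Here is a sketch of the matching conversion for size_t, %zu, which is defined in C99 and C++11. format_size is a hypothetical helper. Older Microsoft C runtimes that lack %zu use the Microsoft-specific %Iu instead.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <string>

// Formats a size_t with the conversion that matches its type.
std::string format_size(std::size_t value) {
  char buf[32];  // enough for the 20 decimal digits of a 64-bit value + '\0'
  std::snprintf(buf, sizeof buf, "%zu", value);
  return buf;
}
```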
In other cases, an error in the format string will be critical. Consider an example based on the implementation of the UNDO / REDO subsystem in one of the programs:
// Here the pointers were stored as a string
int *p1, *p2;
....
char str[128];
sprintf(str, "%X %X", p1, p2);

// In another function, this string
// is processed as follows:
void foo(char *str)
{
  int *p1, *p2;
  sscanf(str, "%X %X", &p1, &p2);
  // Result: incorrect values of pointers p1 and p2.
  ...
}
The "%X" format is not intended for pointers, so this code is incorrect on 64-bit systems. "%X" expects an unsigned int, so in practice only 32 bits of each pointer are printed, and sscanf writes only 32 bits into each 64-bit pointer variable. On 32-bit systems the code works, although it is not pretty.
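One way to store pointers as text is the %p conversion, which is intended for pointers. The C standard guarantees that a pointer printed with %p and read back with %p during the same program run compares equal to the original. The helpers save_pointers and load_pointers are hypothetical names used only in this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>

// Writes two pointers as text; returns false if the buffer was too small.
bool save_pointers(char* str, std::size_t size, int* p1, int* p2) {
  int n = std::snprintf(str, size, "%p %p",
                        static_cast<void*>(p1), static_cast<void*>(p2));
  return n >= 0 && static_cast<std::size_t>(n) < size;
}

// Reads the two pointers back; %p expects a void** argument.
bool load_pointers(const char* str, int** p1, int** p2) {
  void* a = nullptr;
  void* b = nullptr;
  if (std::sscanf(str, "%p %p", &a, &b) != 2)
    return false;
  *p1 = static_cast<int*>(a);
  *p2 = static_cast<int*>(b);
  return true;
}
```

The stored values are meaningful only within the same program run.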
Example 15. Storage of integer values in double
We have not encountered this error ourselves. It is probably rare, but quite real.
The double type is 64 bits in size and conforms to the IEEE-754 standard on both 32-bit and 64-bit systems. Some programmers use the double type to store and work with integer values:
size_t a = size_t(-1);
double b = a;
--a;
--b;
size_t c = b;
// x86: a == c
// x64: a != c (strictly speaking, b equals 2^64 here,
// and converting it back to size_t is undefined behavior)
On a 32-bit system this example can still be justified. The double type has a 52-bit mantissa (53 bits of precision, counting the implicit leading bit), so it can store any 32-bit integer value without loss. But when a 64-bit integer is stored in a double, the exact value may be lost (see Figure 13).
Figure 13 - The number of significant bits in the types size_t and double
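Here is a sketch that makes the precision limit concrete. A double carries 53 bits of precision, so every 32-bit integer survives a round trip through double, but 2^53 + 1 already does not. survives_double is a hypothetical helper. It rejects values that round up to 2^64 before converting back, because converting such a double to uint64_t is undefined behavior.

```cpp
#include <cassert>
#include <cstdint>

// Returns true if v survives a round trip through double unchanged.
bool survives_double(std::uint64_t v) {
  double d = static_cast<double>(v);   // rounds to nearest representable value
  if (d >= 18446744073709551616.0)     // 2^64, exactly representable
    return false;                      // out of range for uint64_t
  return static_cast<std::uint64_t>(d) == v;
}
```

For example, (1ull << 53) + 1 rounds to 2^53, and UINT64_MAX rounds up to 2^64. Both therefore fail the round trip.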
The second part of the article.