
Problems testing 64-bit applications

This article addresses a number of issues related to software testing: the difficulties a developer of resource-intensive applications may face and the ways to overcome them. In this article the terms "resource-intensive" and "64-bit" application are treated as synonyms.
The size of the average computer program grows steadily year by year. Programs become more complicated and intricate, process more and more data, and acquire ever more functional and attractive graphical interfaces. A program of a few kilobytes with the simplest editing capabilities used to be considered a full-fledged text editor; today word processors occupy tens and hundreds of megabytes while providing correspondingly greater functionality. Naturally, the requirements on hardware performance grow at the same pace.
The next step in increasing computing power was the transition to 64-bit microprocessor systems. This step cannot be called revolutionary, but it significantly expands the capabilities of computer systems. First of all, 64-bit systems made it possible to overcome the 4 GB barrier, which had already begun to constrain many software developers; this primarily concerns developers of numerical simulation packages, 3D editors, DBMSs, and games. A large amount of RAM greatly expands the capabilities of applications, allowing them to store large amounts of data and access it directly, without loading it from external storage. One should also not forget the higher performance of 64-bit versions of programs, due to the larger number of registers, extended floating-point arithmetic, and the ability to work with 64-bit numbers.
The increasing complexity of software solutions makes the task of maintaining and testing them correspondingly harder. The impossibility of testing large software systems manually gave impetus to the development of test automation and code quality control systems. There are various approaches to ensuring the required quality of software, and we will briefly recall them.
The oldest, most proven, and most reliable approach to finding defects is joint code review [1]. This technique is based on reading the code together while following a number of rules and recommendations, well described, for example, in Steve McConnell's book "Code Complete" [2]. Unfortunately, this practice does not scale to large-scale verification of modern software systems because of their sheer volume. Although this method gives the best results, it is not always used under modern software development life cycles, where development time and time to market matter. Therefore, code review most often comes down to infrequent meetings whose purpose is to teach new and less experienced employees to write quality code rather than to verify the correctness of a set of modules. This is a very good way to improve programmers' skills, but it cannot be considered a full-fledged means of quality control for the program under development.
Static code analysis tools [3] come to the aid of developers who are aware of the need to review code regularly but do not have enough time for it. Their main task is to reduce the amount of code that requires human attention and thereby reduce review time. Static code analyzers form a fairly large class of programs, implemented for various programming languages and offering a diverse set of functions, from the simplest checks of code alignment to complex analysis of potentially dangerous places. The systematic use of static analyzers can significantly improve code quality and help find many errors. The static-analysis approach has many adherents, and many interesting works are devoted to it (for example, [4], [5]). The advantage of this approach is that it does not depend on the complexity and size of the software solution being developed.
In addition, a noteworthy way to improve the quality of software products is selective testing. This technique is based on a well-known and intuitive idea: test only those parts of the product that were directly affected by the changes. The main problem with selective testing is obtaining a reliable list of all the parts of the product affected by the changes. The selective testing technique, supported, for example, by the Testing Relief product, solves this problem.
The white box method [6]. By white-box testing we mean executing as many different branches of the code as possible using a debugger or other means; the greater the code coverage achieved, the more complete the testing. White-box testing is also sometimes understood as simply debugging an application to find a known bug. Full white-box testing of all program code has long been impossible because of the sheer volume of modern code. Nowadays white-box testing is convenient at the stage when an error has been found and its cause needs to be understood. White-box testing has opponents who deny the usefulness of real-time debugging. Their main argument is that the ability to observe a program's execution while simultaneously changing its state encourages an unacceptable programming style based on trial-and-error code fixes. We will not go into these disputes, but we note that white-box testing is in any case a very expensive way to improve the quality of large and complex software systems.
The black box method has proven itself much better [7]. Unit testing [8] can also be attributed to it. The main idea of the method is to write a set of tests for individual modules and functions that check all the main modes of their operation. Some sources classify unit testing as a white-box method because it is based on knowledge of the program's internals. The author holds the position that the functions and modules under test should be treated as black boxes, since unit tests should not take the internal structure of a function into account. This is justified by the methodology in which tests are developed before the functions themselves are written, which helps to improve control over their functionality from the specification's point of view.
A lot of literature is devoted to unit testing, for example [9, 10]. Unit testing has proven itself in the development of both simple and complex projects. One of its advantages is that the correctness of changes made to the program can be checked easily right during development. Developers try to keep the full test run within a few minutes, which allows whoever changed the code to notice and fix an error immediately. If running all the tests quickly is impossible, the long tests are usually separated out and run, for example, overnight. This also helps to detect errors quickly, at least by the next morning.
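As a minimal illustration of the idea (a hedged sketch, not tied to any particular framework; the Find function is a hypothetical function under test), such a test might look like this:

#include <cassert>
#include <cstddef>

// Hypothetical function under test: returns the index of Value in Array,
// or Size if the value is not present.
size_t Find(const int *Array, size_t Size, int Value)
{
  for (size_t i = 0; i != Size; ++i)
    if (Array[i] == Value)
      return i;
  return Size;
}

// A unit test treating Find as a black box: only inputs and outputs matter.
void TestFind()
{
  const int Data[] = { 3, 1, 4, 1, 5 };
  const size_t N = sizeof(Data) / sizeof(Data[0]);
  assert(Find(Data, N, 4) == 2);   // existing element
  assert(Find(Data, N, 9) == N);   // missing element
  assert(Find(Data, 0, 3) == 0);   // empty array
}

Such a test runs in a fraction of a second, which is why developers keep test data small; we will return to why this becomes a problem on 64-bit systems below.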
Manual testing. This is perhaps the final stage of any development, but it should not be regarded as a good and reliable technique. Manual testing must exist, since it is impossible to detect all errors automatically or by reviewing the code, but one should not rely on it too much. If a program is of poor quality and has a large number of internal defects, testing and fixing it can take a very long time, and even then the proper quality cannot be ensured. The only way to get a quality program is quality code. Therefore, we will not consider manual testing a full-fledged methodology for developing large projects.
So what is left that deserves the most attention when developing large software systems? Static analysis and unit testing. These approaches can significantly improve the quality and reliability of program code, and they should be given the most attention, although, of course, the others should not be forgotten.
Now let us turn to the question of testing 64-bit programs, since the application of the methods we have chosen encounters several unpleasant difficulties. Let's start with static code analyzers.
Strange as it may seem, despite all their enormous capabilities, long development history, and practical use, static analyzers turned out to be poorly prepared for finding errors in 64-bit programs. Let us consider the situation using the analysis of C++ code as an example, since this is where static analyzers have found the widest application. Many static analyzers support a number of rules related to detecting code that behaves incorrectly when ported to 64-bit systems, but they implement them in scattered and very incomplete ways. This became especially evident after the start of mass development of applications for the 64-bit version of Windows in Microsoft Visual C++ 2005.
This can be explained by the fact that most of the checks are based on fairly old material on the problems of porting programs to 64-bit systems from the point of view of the C language. As a result, a number of constructs that appeared in the C++ language received no attention from the point of view of portability control and were not reflected in the analyzers. A number of other changes are not taken into account either, such as the significantly increased amount of RAM and the use of different data models in different compilers (LP64, LLP64, ILP64 [11]).
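To make the difference between the data models tangible, here is a small sketch; the sizes given in the comments are assumptions for LLP64 on 64-bit Windows and LP64 on 64-bit Linux, and you can verify them on your own platform by running the program:

#include <cstdio>
#include <cstddef>

int main()
{
  // LLP64 (Win64):          int = 4, long = 4, pointer = 8, size_t = 8
  // LP64 (64-bit Unix/Linux): int = 4, long = 8, pointer = 8, size_t = 8
  std::printf("int    : %u bytes\n", (unsigned)sizeof(int));
  std::printf("long   : %u bytes\n", (unsigned)sizeof(long));
  std::printf("void * : %u bytes\n", (unsigned)sizeof(void *));
  std::printf("size_t : %u bytes\n", (unsigned)sizeof(size_t));
  return 0;
}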
For clarity, consider a couple of examples:
double *DoubleArray;
unsigned Index = 0;
while (...)
  DoubleArray[Index++] = ...;
Even such powerful analyzers as Parasoft C++test and Gimpel Software PC-Lint will not give you a warning on this code. No wonder: the code raises no suspicion in the average developer, who is used to using variables of type int or unsigned as indices. Unfortunately, on a 64-bit system this code becomes inoperable if the DoubleArray array contains more than UINT_MAX (about four billion) elements: the Index variable will overflow and the result of the program will be incorrect. The correct option is to use the size_t type when programming for Windows x64 (the LLP64 data model) or size_t / unsigned long when programming for Linux (the LP64 data model).
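A corrected sketch of the same loop, with the index type chosen as recommended above (a hedged illustration; FillArray is just a name chosen for this example):

#include <cstddef>

// size_t is 64 bits wide on both LLP64 (Win64) and LP64 (Linux) systems,
// so the index can address more than four billion elements without overflow.
void FillArray(double *DoubleArray, size_t Count)
{
  for (size_t Index = 0; Index != Count; ++Index)
    DoubleArray[Index] = 0.0;
}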
The reason static analyzers cannot diagnose the original code probably lies in the fact that, when the issues of porting to 64-bit systems were first studied, hardly anyone imagined arrays of more than four billion elements. Four billion elements of type double amount to 4 * 8 = 32 gigabytes of memory for a single array. That is an enormous amount, especially considering that this was 1993-1995, the period when most of the publications and discussions devoted to the use of 64-bit systems appeared.
As a result, no one paid attention to the possibly incorrect indexing with the int type, and porting issues have rarely been studied since. Practically no static analyzer will warn about the code from the first example. Perhaps the only exception is the Viva64 analyzer, which is part of PVS-Studio. It was developed to close the gaps in the diagnosis of 64-bit C/C++ code left by other analyzers and is based on newly conducted research. But it has a significant drawback: it is not a general-purpose analyzer. It specializes only in the analysis of errors that occur when porting code to 64-bit Windows systems and should therefore be used only in combination with other analyzers to ensure proper code quality.
Consider another example:
char *p;
long g = (long)p;
This simple example lets you check which data models your static analyzer understands. The problem is that most of them are designed only for the LP64 data model. This, again, is due to the history of 64-bit systems: it was the LP64 data model that gained the greatest popularity at the early stages of 64-bit development and is now widespread in the Unix world. In this data model the long type is 8 bytes in size, so the code is completely correct. But 64-bit Windows systems use the LLP64 data model, where long remains 4 bytes, and therefore the code above is incorrect. On Windows you should use, for example, the LONG_PTR or ptrdiff_t type.
Fortunately, this code will be diagnosed as dangerous both by the Microsoft Visual C++ 2005 compiler and by the Viva64 analyzer. But you should always keep such pitfalls in mind when using static analyzers.
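A sketch of a portable variant of the example above, assuming the goal is simply to keep a pointer value in an integer variable:

#include <cstddef>

char *p = 0;
// ptrdiff_t (or LONG_PTR / intptr_t on Windows) is wide enough to hold a
// pointer value on both LP64 and LLP64 systems, unlike long on Win64.
ptrdiff_t g = (ptrdiff_t)p;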
An interesting situation has emerged. The issue of porting programs to 64-bit systems was discussed in detail, various verification methods and rules were implemented in static analyzers, and then interest in the topic faded. Many years have passed and a lot has changed, but the rules by which the analysis is carried out have remained unchanged. It is difficult to explain why. Perhaps the developers simply do not notice the changes, believing that the question of testing and checking 64-bit applications was settled long ago. Do not fall into that trap. Be careful: what was relevant ten years ago may not be relevant now, while many new things have appeared. When using static analysis tools, make sure they are compatible with the 64-bit data model you use. If an analyzer does not meet the necessary requirements, do not be lazy about finding another one or filling the gap with the narrowly specialized Viva64 analyzer. The effort spent on this will more than pay off through increased program reliability and reduced debugging and testing time.
Now let's talk about unit tests. On 64-bit systems they also give us a number of unpleasant moments. In an effort to reduce test execution time, developers try to keep the amount of computation and the volume of processed data small while writing them. For example, when developing a test for a function that searches for an element in an array, it does not matter much whether the function processes 100 elements or 10,000,000. A hundred elements is enough, and compared to processing 10,000,000 elements the test runs much faster. But if you want full-fledged tests of this function on a 64-bit system, you will need to process more than four billion elements! Does it seem that if the function works on 100 elements it will work on billions? No. If you do not believe it, try the following example on a 64-bit system:
#include <iostream>
#include <cstdlib>
#include <tchar.h>

bool FooFind(char *Array, char Value, size_t Size)
{
  // On Win64 Size exceeds UINT_MAX, so the unsigned counter never reaches it.
  for (unsigned i = 0; i != Size; ++i)
    if (i % 5 == 0 && Array[i] == Value)
      return true;
  return false;
}

#ifdef _WIN64
  const size_t BufSize = 5368709120ui64;
#else
  const size_t BufSize = 5242880;
#endif

int _tmain(int, _TCHAR *[])
{
  char *Array = (char *)calloc(BufSize, sizeof(char));
  if (Array == NULL)
    std::cout << "Error allocate memory" << std::endl;
  if (FooFind(Array, 33, BufSize))
    std::cout << "Find" << std::endl;
  free(Array);
  return 0;
}
As the example shows, if your program starts processing larger amounts of data on a 64-bit system, you should not rely on old sets of unit tests. They must be expanded to cover the processing of large amounts of data.
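A hedged sketch of how such an extended test might look, guarded so that the expensive large-data case runs only in a 64-bit build (FooFind is the function from the example above; the test body itself is an illustration, not a prescription):

#include <cassert>
#include <cstdlib>

bool FooFind(char *Array, char Value, size_t Size);   // the function from the example above

void TestFooFindLargeData()
{
#ifdef _WIN64
  // More than UINT_MAX elements: exactly the case that exposes the
  // unsigned-index problem on a 64-bit system.
  const size_t BufSize = 5368709120ui64;              // 5 GB of char elements
  char *Array = (char *)calloc(BufSize, sizeof(char));
  if (Array == NULL)
    return;                                           // not enough memory: skip the test
  Array[BufSize - 5] = 33;                            // index divisible by 5, so FooFind must see it
  // With the unsigned-index bug the loop never terminates; with a
  // corrected size_t index the assertion passes.
  assert(FooFind(Array, 33, BufSize));
  free(Array);
#endif
}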
Unfortunately, writing new tests is not enough. Here we face the problem of the execution speed of a modified test suite that covers the processing of large amounts of data. The first consequence is that such tests cannot be added to the set of tests a developer runs during development. Adding them to the nightly tests can also be difficult: the total execution time of all tests can grow by an order of magnitude or two, or even more, and the test run can exceed even 24 hours. Keep this in mind and approach the revision of tests for the 64-bit version of your program with all seriousness.
A way out of this situation is to split all the tests into several groups run in parallel on several computers. Multiprocessor systems can also be used. Of course, this complicates the testing system somewhat and requires additional hardware, but it is the most correct and, in the end, the simplest step towards solving the problem of building a unit testing system.
Naturally, you will need an automated testing system that allows you to organize running the tests on several machines. An example is the AutomatedQA TestComplete automated testing system for Windows. With its help you can perform distributed testing of applications on several workstations and synchronize and collect the results [12].
Modern tools such as Intel Parallel Studio can also greatly simplify life. Since in resource-intensive applications part of the code may be written by a scientist who is not an expert in programming technique, it is better to look for errors in such code with a tool than without one.
In conclusion, I would like to return once more to white-box testing, which we considered unacceptable for large systems. It should be added that when debugging 64-bit applications that process large data arrays, this method becomes completely inapplicable: debugging such applications may take much longer or be impracticable on the developers' machines. Therefore, it is worth considering in advance the possibility of using logging systems or other techniques, such as remote debugging, to debug applications.
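For example, even a very small logging helper of the following kind (a sketch only, not a recommendation of any particular library; the file name and function name are made up for illustration) is often enough to trace the behavior of a long run that cannot realistically be stepped through in a debugger:

#include <fstream>
#include <sstream>
#include <string>

// Append one message to a log file; a real system would add timestamps,
// severity levels, and buffering.
void LogMessage(const std::string &Message)
{
  std::ofstream Log("debug.log", std::ios::app);
  Log << Message << std::endl;
}

// Usage inside a long-running computation:
//   std::ostringstream Os;
//   Os << "processed " << Index << " of " << Count << " elements";
//   LogMessage(Os.str());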
Summing up, I want to say that you should not rely on any single methodology. A quality application can only be developed using several of the testing approaches considered.
Summarizing the problems of developing 64-bit systems, I want to remind you of the key points once again:
- Be prepared for surprises when developing and testing 64-bit applications.
- Be prepared that debugging 64-bit applications using the white box method may become impossible or extremely difficult if large amounts of data are processed.
- Take a close look at the capabilities of your static analyzer. And if it does not satisfy all the necessary requirements, do not be too lazy to find another or use an additional static analyzer.
- Do not trust the old sets of unit tests. Be sure to review them and add new tests that take into account the features of 64-bit systems.
- Remember that unit test suites will slow down significantly, and take care in advance of acquiring new machines to run them.
- Use a test automation system that supports distributed launch of applications, such as TestComplete, to provide quick verification of applications.
- The best result can be achieved using a combination of different techniques.
It would be very interesting to read in the comments the opinions of people involved in the development of resource-intensive applications.
Bibliographic list
- Wikipedia, " Code review ".
- Steve McConnell, "Code Complete, 2nd Edition" Microsoft Press, Paperback, 2nd edition, Published June 2004, 914 pages, ISBN: 0-7356-1967-0.
- Wikipedia, " Static code analysis ".
- Scott Meyers, Martin Klaus " A First Look at C ++ Program Analyzers. ", 1997.
- Walter W. Schilling, Jr. and Mansoor Alam. " Integrate Static Analysis Into a Software Development Process ", 01, 2006.
- Wikipedia, " White box testing ".
- Wikipedia, " Black box testing ".
- Wikipedia, " Unit testing ".
- Justin Gehtland, " 10 Ways to Make Your Code More Testable ", July, 2004.
- Paul Hamill, “Unit Test Frameworks”, November 2004, 212 pages, ISBN 10: 0-596-00689-6
- Andrew Josey, " Data Size Neutrality and 64-bit Support ."
- AutomatedQA, " TestComplete - Distributed Testing Support ".