How to port a project of 9 million lines of code to a 64-bit platform?

    64 bits?  Oh interesting!
    Recently, our team completed the migration of a fairly large project (9 million lines of code, 300 MB of source) to a 64-bit platform. The project took a year and a half. Although the NDA prevents us from naming the project, we hope that our experience will be useful to other developers.



    About the authors

    Many people know us as the authors of the PVS-Studio static code analyzer. That is indeed our main activity. In addition, however, we take part in third-party projects as a team of experts. We call this "selling expertise." We recently published a report on our work on the Unreal Engine 4 code. Today it is time for the next progress report in this series.

    “So things must be going very badly for PVS-Studio!” a reader who follows our activities may exclaim. We have to disappoint anyone hungry for a sensation. Taking part in such projects is very important for our team, but for an entirely different reason: it lets us use our own code analyzer in real life, not only while developing it. Using the analyzer in commercial projects that tens or even hundreds of programmers work on gives the PVS-Studio team invaluable experience. We see how our tool is used, what difficulties arise, and what needs to be changed or simply improved in our product.

    Therefore, we plan to keep taking part in such projects as part of selling our expertise. Write to us if you have a suitable project. In the meantime, we are pleased to present a report on migrating code to a 64-bit platform.

    Introduction, or what is the problem? Project scope and team size

    At first glance, everything about migrating code to the x64 platform is already clear. We wrote the time-tested article “A collection of examples of 64-bit errors in real programs” in 2010 and our course "Lessons for developing 64-bit C/C++ applications" in 2012. It would seem enough to read them, do as they say, and everything will be fine. So why did the customer need to bring in a third-party organization (us), and why did even we spend a year and a half on the project? After all, since PVS-Studio includes analysis of 64-bit problems, we should understand the topic. Of course we do, and that was the main reason the customer contacted us. But why did the customer think of turning to anyone for a 64-bit migration in the first place?

    Let's first describe the project and the customer. Since the NDA prohibits us from naming them, we give only quantitative characteristics. The project we worked on is about 20 years old. Several dozen developers currently work on it every day. The clients are large companies, and sales are sporadic, since the product is very niche and highly specialized.

    The most important thing is the size. 9 million lines of code, 300 MB of source code, and a thousand projects in the solution (.sln) is a LOT. The platform is Windows only. Even for such a project, a 64-bit migration seems straightforward. To port it to x64, you just need to:
    • stop development for several months;
    • quickly replace data types with 64-bit ones;
    • check that everything works correctly after the replacement;
    • return to development.
    Why is the first item “stop development completely”? Because a 64-bit migration inevitably requires replacing some data types with 64-bit ones. If you create a separate branch in a project of this size and make all the necessary changes there, merging the code back will be impossible. Do not forget the volume of the project and the dozens of programmers who write new code every day.

    Due to business constraints, the customer could not stop the development process. Their customers constantly need new releases, bug fixes, accessibility features, etc. Stopping development under such conditions means stopping the business. Therefore, the customer began to look for a team that could complete the migration without stopping development. We became that team, since our competence in 64-bit development is confirmed by the PVS-Studio code analyzer and our articles on the topic.

    We completed the migration in a year and a half. On our side, two people worked on the project for the first six months, and then four more people worked for another year. Why so long? During the first six months, the two people set up the infrastructure, got to know the project, and tested specific migration methods. Six months later, when the task had become more concrete, more people joined the project, and four people completed the migration in a year.

    How to transfer a project to a 64-bit system?

    Porting a project to a 64-bit platform consists, by and large, of the following two steps:
    1. Creating a 64-bit configuration, obtaining 64-bit versions of third-party libraries, and building the project.
    2. Correcting the code that causes errors in the 64-bit version. This step comes down almost entirely to replacing 32-bit types with memsize types in the program code.
    Recall that by memsize types we mean types whose size depends on the platform: they are 4 bytes on a 32-bit system and 8 bytes on a 64-bit system.
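    To make the notion of a memsize type concrete, here is a minimal sketch that checks whether a type is as wide as a pointer. The helper names IsPointerSized and PointerBits are invented for this illustration, and the sketch assumes a mainstream flat-memory platform with 8-bit bytes, where size_t and ptrdiff_t have the same width as a pointer.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper (not from the project): true when T is as wide
// as a pointer on the platform the code is compiled for.
template <typename T>
constexpr bool IsPointerSized() {
    return sizeof(T) == sizeof(void*);
}

// Pointer width in bits: 32 in a 32-bit build, 64 in a 64-bit build
// (assuming 8-bit bytes).
constexpr std::size_t PointerBits() {
    return sizeof(void*) * 8;
}
```

    On such platforms IsPointerSized<std::size_t>() and IsPointerSized<std::ptrdiff_t>() hold in both 32-bit and 64-bit builds. A fixed-width type such as std::int32_t, and long in 64-bit Windows builds, stays at 4 bytes, which is why such types are candidates for replacement.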

    Porting a large and actively developed project must not interfere with ongoing development, so we took the following measures. First, we made all our edits in a separate branch so as not to break the main build. When the next set of changes was ready and tested, we merged it into the main branch. Second, we did not replace 32-bit types with memsize types directly. Instead, we introduced our own types and replaced the 32-bit types with those. We did this to avoid potential problems, such as a call resolving to a different implementation of an overloaded function, and to be able to roll our changes back quickly. The types were introduced approximately like this:
        #include <cstddef>  // ptrdiff_t, size_t

        #if defined(_M_IX86)        // 32-bit x86 build
            typedef long MyLong;
            typedef unsigned long MyULong;
        #elif defined(_M_X64)       // 64-bit x64 build
            typedef ptrdiff_t MyLong;
            typedef size_t MyULong;
        #else
            #error "Unsupported build platform"
        #endif

    We want to emphasize again that we changed the types not to size_t / ptrdiff_t and the like, but to our own data types. This gave us great flexibility and made it easy to tell the places that were already ported from those where so far "no man has gone before."
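    The overload problem mentioned above can be illustrated with a small sketch. The overload set Describe and the wrapper DescribeValue are hypothetical and are not taken from the project. The point is only that the overload is chosen by the static type of the argument, so replacing long with ptrdiff_t may silently route a call to a different function in a 64-bit Visual C++ build, where ptrdiff_t is a 64-bit type distinct from long.

```cpp
#include <cstddef>
#include <string>

// Hypothetical overload set, similar to what an old code base may contain.
inline std::string Describe(int)       { return "int"; }
inline std::string Describe(long)      { return "long"; }
inline std::string Describe(long long) { return "long long"; }

// The overload is selected at compile time from the static type T.
template <typename T>
std::string DescribeValue(T value) {
    return Describe(value);
}
```

    DescribeValue(0L) always picks the long overload, while DescribeValue(std::ptrdiff_t{0}) picks whichever built-in type ptrdiff_t aliases on the target platform. A project-specific alias such as MyLong makes the affected call sites easy to find and review.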

    Possible approaches to migration: their pros and cons, and where we went wrong

    Our first idea for porting the project was as follows: first replace all 32-bit types with memsize types, except where 32-bit types had to stay (for example, in structures that represent data formats and in the functions that process such structures), and then bring the project back into working condition. We chose this plan in order to eliminate as many 64-bit errors as possible in a single pass and then fix all remaining warnings from the compiler and PVS-Studio. Although this method works for small projects, it did not work in our case. First, the type substitutions took too much time and led to a great many changes. Second, no matter how careful we tried to be, we still changed structures with data formats by mistake. As a result

    So, the first plan involved the following sequence of actions.
    1. Creating a 64-bit configuration.
    2. Compiling.
    3. Replacing most 32-bit types with 64-bit ones (or rather, memsize types).
    4. Linking with third-party libraries.
    5. Launching the application.
    6. Fixing the remaining compiler warnings.
    7. Fixing the remaining 64-bit errors detected by the PVS-Studio analyzer.
    This plan turned out to be a failure. We completed the first five steps, and then all of our changes to the source code had to be rolled back. We wasted several months of work.
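    The data-format pitfall from the first plan can be sketched as follows. Both structures and their field names are invented for this illustration. The first keeps fixed-width fields, as a structure describing a stored format should. The second shows what an over-eager replacement with memsize types does to the layout.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical on-disk record header with fixed-width fields.
struct RecordHeader {
    std::uint32_t magic;
    std::uint32_t payloadSize;
    std::int32_t  offsetToNext;
};

// The same header after a careless "replace everything with memsize types" pass.
// Its size and field offsets now depend on the target platform.
struct BrokenRecordHeader {
    std::uint32_t  magic;
    std::size_t    payloadSize;
    std::ptrdiff_t offsetToNext;
};

// A compile-time guard like this catches an accidental change of the format.
static_assert(sizeof(RecordHeader) == 12,
              "RecordHeader layout must not depend on the platform");
```

    In a typical 64-bit build BrokenRecordHeader is larger than RecordHeader, so data written by the 32-bit version no longer matches what the 64-bit version reads. A static_assert on the size of every format structure is one inexpensive way to notice such a mistake at build time.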

    This time we decided to get a working 64-bit version of the application as quickly as possible and only then fix the obvious 64-bit errors. The new plan excluded mass type substitutions and involved fixing only explicit 64-bit errors:
    1. Creating a 64-bit configuration.
    2. Compiling.
    3. Linking with third-party libraries.
    4. Launching the application.
    5. Fixing compiler warnings.
    6. Fixing the highest-priority 64-bit errors detected by the PVS-Studio analyzer.
    This time we got the first working version of the application much faster, partly because the third-party libraries were already built and the interface templates loaded correctly. We must say that the application mostly worked quite stably, which surprised us. We found only a few crashes during the first round of testing.

    Next, we had to fix the compiler warnings and the 64-bit warnings of PVS-Studio in order to eliminate both the crashes we had found and potential ones. Since the total number of 64-bit PVS-Studio warnings ran into the thousands, we decided to fix only the most basic ones: implicit conversions of memsize types to 32-bit types (V103, V107, V110), conversions of pointers to 32-bit types and vice versa (V204, V205), suspicious conversion chains (V220, V221), type mismatches in the parameters of virtual functions (V301), and the replacement of obsolete functions with new versions (V303). All these diagnostics are described in the documentation.

    In other words, the goal of this stage was to fix all 64-bit PVS-Studio messages of the first level only (level 1). These are the most important diagnostics. To get a 64-bit application running, all level 1 64-bit errors (64 L1) must be fixed.
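    As an illustration of the first category of warnings (a memsize value ending up in a 32-bit type), here is a minimal sketch. The function names are hypothetical, and std::uint64_t stands in for a 64-bit size_t so that the example behaves the same on any platform.

```cpp
#include <cstdint>

// Hypothetical helper modelling what happens when a 64-bit size
// is stored in a 32-bit variable: the upper 32 bits are discarded.
inline std::uint32_t StoreIn32Bits(std::uint64_t size) {
    return static_cast<std::uint32_t>(size);
}

// A check that can guard places where a 32-bit type must be kept.
inline bool FitsIn32Bits(std::uint64_t size) {
    return size <= UINT32_MAX;
}
```

    For a size of 5,000,000,000 bytes, StoreIn32Bits returns 705,032,704 (the value modulo 2^32), and FitsIn32Bits reports false. In real code the conversion is usually implicit, which is why an analyzer or a compiler warning is needed to find it.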

    Most of these edits came down to replacing 32-bit types with memsize types, as in the first approach. But this time the replacements were selective and iterative. Changing the parameter types of a function pulled along changes to the types of its local variables and return value, which in turn led to changes in the parameter types of other functions, and so on until the process converged.

    A drawback of this approach compared to the first one is that we fixed only the main 64-bit errors. For example, we did not change the types of loop counters. In the vast majority of cases this was unnecessary and causes no errors. But there may be places where it had to be done, and with our approach we missed them and will not find them. In other words, something else may have to be fixed over time.
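    Why a narrow loop counter is usually harmless but occasionally wrong can be shown with a scaled-down model. In the hypothetical sketch below, an 8-bit counter against a 16-bit bound plays the role of a 32-bit counter against a 64-bit size, and a step limit keeps the demonstration finite.

```cpp
#include <cstdint>

// Scaled-down model of a 32-bit counter iterating up to a 64-bit bound.
// Returns true if the counter reaches the bound, false if it keeps
// wrapping around without ever reaching it.
inline bool NarrowCounterReaches(std::uint16_t bound) {
    std::uint8_t i = 0;                        // the "too narrow" counter
    for (int steps = 0; steps < 1000; ++steps) {   // safety limit for the demo
        if (i == bound) {
            return true;
        }
        ++i;                                   // wraps from 255 back to 0
    }
    return false;
}
```

    NarrowCounterReaches(200) returns true because the bound fits in the counter. NarrowCounterReaches(300) returns false because the counter wraps before reaching the bound. The same happens with an unsigned 32-bit counter only when a container holds more than about 4 billion elements, which is why such loops rarely cause problems in practice.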

    When porting the application, we also needed 64-bit versions of the third-party libraries. For open source libraries, we tried to build them from the same sources the 32-bit versions had been built from. We wanted to preserve any changes that might have been made to the code of the third-party libraries, and we also needed to build them, where possible, in the same configuration as the 32-bit version. For example, some libraries had been compiled with the option not to treat wchar_t as a built-in type, or with Unicode support disabled. In such cases we had to tinker with the build parameters for a while before we understood why our project would not link against them. Some libraries did not provide a 64-bit build at all. Then we had to either port them ourselves or download a newer version that could be built for a 64-bit platform. For commercial libraries, we either requested a 64-bit version or looked for a replacement if the library was no longer supported, as was the case with xaudio.

    We also had to get rid of all inline assembly, since the 64-bit version of the Visual C++ compiler does not support inline assembler. We either used intrinsic functions where that was possible or rewrote the code in C++. In some cases this did not degrade performance. For example, where the 32-bit assembler code had used 64-bit MMX registers, in the 64-bit version all general-purpose registers are 64-bit anyway.
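    A replacement of an inline assembly fragment might look like the sketch below. The source does not show which fragments were rewritten, so the byte-swap operation and the name SwapBytes32 are invented for illustration. The sketch uses the _byteswap_ulong intrinsic under Visual C++ and plain C++ shifts elsewhere.

```cpp
#include <cstdint>
#if defined(_MSC_VER)
#include <stdlib.h>   // _byteswap_ulong
#endif

// Hypothetical replacement for a 32-bit inline assembly fragment such as:
//     __asm { mov eax, value
//             bswap eax
//             mov result, eax }
inline std::uint32_t SwapBytes32(std::uint32_t value) {
#if defined(_MSC_VER)
    // Intrinsic path: available in both 32-bit and 64-bit builds.
    return _byteswap_ulong(value);
#else
    // Portable C++ path: reverse the four bytes with shifts and masks.
    return (value >> 24) |
           ((value >> 8) & 0x0000FF00u) |
           ((value << 8) & 0x00FF0000u) |
           (value << 24);
#endif
}
```

    For example, SwapBytes32(0x11223344) yields 0x44332211. Modern compilers typically recognize the shift-and-mask form and emit a single instruction for it, so rewriting in C++ does not necessarily cost performance.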

    How long does it take to fix 64-bit errors in such a project

    At the start of a large project, it is hard to say how long porting will take. In the first stage, we spent a significant amount of time building third-party libraries and setting up the environment for the daily build of the 64-bit version and for running the tests. Once the first portion of the projects had been ported, we could estimate our speed from the amount of code ported over a given period.

    Examples of 64-bit problems we encountered

    The most common error when porting to a 64-bit platform was the explicit conversion of pointers to 32-bit types such as DWORD. The fix in such cases was to use a memsize type instead. An example of the problematic code:
    MMRESULT m_tmScroll = timeSetEvent(
      GetScrollDelay(), TIMERRESOLUTION, TimerProc, 
      (DWORD)this, TIME_CALLBACK_FUNCTION);
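    The effect of such a cast can be shown in a platform-independent sketch. The function names are hypothetical. std::uint32_t plays the role of DWORD, and std::uintptr_t plays the role of a pointer-sized type such as DWORD_PTR.

```cpp
#include <cstdint>

// Faulty pattern: the pointer is squeezed through a 32-bit integer,
// so the upper half of a 64-bit address is lost.
inline void* RoundTripThrough32Bits(void* p) {
    std::uint32_t narrow =
        static_cast<std::uint32_t>(reinterpret_cast<std::uintptr_t>(p));
    return reinterpret_cast<void*>(static_cast<std::uintptr_t>(narrow));
}

// Fixed pattern: the pointer is stored in a pointer-sized integer
// and comes back unchanged.
inline void* RoundTripThroughPointerSized(void* p) {
    std::uintptr_t wide = reinterpret_cast<std::uintptr_t>(p);
    return reinterpret_cast<void*>(wide);
}
```

    In a 32-bit build both functions return the original pointer, which is why this code worked for years. In a 64-bit build the first one corrupts any address above 4 GB, and the corruption shows up only when the callback later dereferences the value.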

    Errors also showed up where the parameters of virtual functions had changed in a base class. For example, the parameter of CWnd::OnTimer(UINT_PTR nIDEvent) changed from UINT to UINT_PTR with the arrival of 64-bit Windows, so we had to make the same replacement in all derived classes in our project. An example of code that needed this fix:
    class CConversionDlg : public CDialog {
    ...
    public:
      afx_msg void OnTimer(UINT nIDEvent);
    ...
    };
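    Why a stale signature is dangerous can be seen in a small model that does not depend on MFC. All class names below are invented. std::uintptr_t stands in for UINT_PTR and unsigned for UINT. Where the two types differ, the old signature no longer overrides the base function and silently declares a new one instead.

```cpp
#include <cstdint>
#include <string>
#include <type_traits>

// Hypothetical stand-in for a framework base class such as CWnd.
struct WindowBase {
    virtual ~WindowBase() = default;
    virtual std::string OnTimer(std::uintptr_t) { return "base"; }
};

// Stale 32-bit signature: overrides the base function only where
// uintptr_t and unsigned are the same type.
struct OldDialog : WindowBase {
    virtual std::string OnTimer(unsigned) { return "old dialog"; }
};

// Fixed signature; 'override' makes the compiler verify the match.
struct FixedDialog : WindowBase {
    std::string OnTimer(std::uintptr_t) override { return "fixed dialog"; }
};

// True only on platforms where the old signature still matches.
constexpr bool kOldSignatureStillOverrides =
    std::is_same<std::uintptr_t, unsigned>::value;

// The framework always calls through the base class interface.
inline std::string FireTimer(WindowBase& window) {
    return window.OnTimer(1);
}
```

    Where the types differ, FireTimer on an OldDialog returns "base": the handler in the derived class is never called, and the compiler may stay silent about it. Marking handlers with override turns this silent change of behavior into a compilation error.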

    Some WinAPI functions, such as CreateFileMapping and MapViewOfFile, support working with large amounts of data. CreateFileMapping, for example, takes the size as a high-order and a low-order DWORD. We adapted the code accordingly:

    Before:
    sharedMemory_ = ::CreateFileMapping(
      INVALID_HANDLE_VALUE, // specify shared memory file
      pSecurityAttributes,  //NULL, // security attributes
      PAGE_READWRITE,       // sharing
      NULL,                 // high-order DWORD of the file size
      sharedMemorySize,     // low-order DWORD of the file size
      sharedMemoryName_.c_str());

    After:
    #if defined(_M_IX86)
      DWORD sharedMemorySizeHigh = 0;
      DWORD sharedMemorySizeLow = sharedMemorySize;
    #elif defined(_M_X64)
      ULARGE_INTEGER converter;
      converter.QuadPart = sharedMemorySize;
      DWORD sharedMemorySizeHigh = converter.HighPart;
      DWORD sharedMemorySizeLow = converter.LowPart;
    #else
      #error "Unsupported build platform"
    #endif
      sharedMemory_ = ::CreateFileMapping(
        INVALID_HANDLE_VALUE,   // specify shared memory file
        pSecurityAttributes,  //NULL, // security attributes
        PAGE_READWRITE,       // sharing
        sharedMemorySizeHigh, // high-order DWORD of the file size
        sharedMemorySizeLow,  // low-order DWORD of the file size
        sharedMemoryName_.c_str());
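    The splitting of a size into two 32-bit halves can also be written without platform-specific branches. The sketch below is an alternative illustration and not the code used in the project. SizeParts and SplitSize are hypothetical names, and std::uint32_t stands in for DWORD.

```cpp
#include <cstdint>

// Hypothetical pair of values to pass as the high-order and
// low-order size arguments.
struct SizeParts {
    std::uint32_t high;
    std::uint32_t low;
};

// Widen the size to 64 bits first, then split it. In a 32-bit build
// the high part is simply always zero.
inline SizeParts SplitSize(std::uint64_t size) {
    SizeParts parts;
    parts.high = static_cast<std::uint32_t>(size >> 32);
    parts.low  = static_cast<std::uint32_t>(size & 0xFFFFFFFFu);
    return parts;
}
```

    For a size of 0x123456789 bytes this gives a high part of 1 and a low part of 0x23456789. Whether to prefer a single code path like this or the explicit #if branches shown above is a matter of style. The explicit branches make it obvious which builds were considered.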

    The project also used functions that are considered obsolete in 64-bit code and should be replaced with their newer counterparts. For example, GetWindowLong / SetWindowLong should be replaced with GetWindowLongPtr / SetWindowLongPtr.

    PVS-Studio finds all the above examples and other types of 64-bit errors.

    The role of the PVS-Studio static analyzer during 64-bit migration

    The compiler finds some of the potential errors of a 64-bit migration and issues the corresponding warnings. PVS-Studio copes with this task better, since the tool was designed from the start to find all such errors. For details on which 64-bit errors PVS-Studio finds that the Visual Studio compiler and static analyzer miss, see the article “64-bit code in 2015: what's new in diagnosing possible problems?”.

    We would like to draw attention to another important point. By running the static analyzer regularly, we could constantly see old 64-bit errors disappearing and, sometimes, new ones being added. After all, dozens of programmers are constantly editing the code, and sometimes they make a mistake and introduce a 64-bit error into a project that has already been adapted to x64. Without static analysis, it would have been impossible to say how many errors had been fixed, how many had been introduced, and what stage we were at. Thanks to PVS-Studio, we built graphs that gave us a picture of our progress. But that is a topic for a separate article.

    Conclusion

    For the 64-bit migration of your project to go as smoothly as possible, the sequence of steps should be as follows:
    1. Study the theory (for example, our articles).
    2. Find 64-bit versions of all libraries used in the project.
    3. Get a 64-bit version that compiles and links as quickly as possible.
    4. Fix all first-level 64-bit messages of the PVS-Studio analyzer (64 L1).

    What to read about 64-bit migration?


    1. A collection of examples of 64-bit errors in real programs.
    2. Lessons for developing 64-bit C/C++ applications.
    3. C++11 and 64-bit errors.
    4. 64-bit code in 2015: what's new in diagnosing possible problems?



    If you want to share this article with an English-speaking audience, please use the translation link: How to Port a 9 Million Code Line Project to 64 bits?
