How we migrated 10 million lines of C++ code to the C++14 standard (and later to C++17)

Some time ago, in the fall of 2016, while we were developing the next version of the 1C:Enterprise technology platform, the development team raised the question of supporting the new C++14 standard in our code. We expected the move to the new standard to let us write many things more elegantly, simply, and reliably, and to simplify support and maintenance of the code. The migration would have been nothing extraordinary, were it not for the scale of the code base and some specific features of our code.

For those who do not know, 1C:Enterprise is an environment for rapid development of cross-platform business applications, plus a runtime that executes them on different operating systems and DBMSs. In general terms, the product includes:

Cluster of application servers, running on Windows and Linux
Client that works with the server over HTTP(S) or its own binary protocol, running on Windows, Linux, and macOS
Web client (written in JavaScript), running in the Chrome, Internet Explorer, Microsoft Edge, Firefox, and Safari browsers
Development environment (Configurator), running on Windows, Linux, and macOS
Application server administration tools, running on Windows, Linux, and macOS
Mobile client that connects to the server over HTTP(S), running on mobile devices with Android, iOS, and Windows
Mobile platform, a framework for creating offline mobile applications with synchronization support, running on Android, iOS, and Windows
1C:Enterprise Development Tools development environment, written in Java
Interaction System server

We try to write as much shared code as possible for the different operating systems: about 99% of the server code base is common, and about 95% of the client code base. The 1C:Enterprise technology platform is written primarily in C++; the approximate characteristics of the code are:

10 million lines of C ++ code
14 thousand files
60 thousand classes
half a million methods.

All of this had to be moved to C++14. Today we will tell you how we did it and what we ran into along the way.

Disclaimer

Everything written below about slow or fast operation and about high (or not so high) memory consumption of standard class implementations in various libraries means one thing: it is true FOR US. The standard implementations may well be the best fit for your tasks. We started from our own tasks: we took data typical for our customers, ran typical scenarios on it, looked at speed, memory consumption, and so on, and analyzed whether we and our customers were satisfied with the results. Then we acted accordingly.

What we had

Initially, we wrote the code for the 1C:Enterprise 8 platform in Microsoft Visual Studio. The project began in the early 2000s, and we had a Windows-only version. Naturally, the code has been actively developed since then, and many mechanisms have been completely rewritten. But the code was written to the 1998 standard, so, for example, consecutive closing angle brackets had to be separated by a space for compilation to succeed, like this:

vector<vector<int> > IntV;

In 2006, with the release of platform version 8.1, we began supporting Linux and switched to a third-party standard library, STLPort. One reason for the switch was wide strings. Throughout our code we use std::wstring, which is based on wchar_t. On Windows wchar_t is 2 bytes, while on Linux it is 4 bytes by default. This made our binary protocols between client and server incompatible, as well as various persistent data. A gcc option can make wchar_t 2 bytes at compile time, but then you can forget about using the compiler's standard library, because it relies on glibc, which in turn is compiled for a 4-byte wchar_t. Other reasons were a better implementation of the standard classes, support for hash tables, and even emulation of move semantics inside containers, which we used actively. One more reason, last but not least, was string performance. We had our own string class because, given the nature of our software, string operations are used very widely and their performance is critical for us.

Our string is based on the string optimization ideas that Andrei Alexandrescu described in the early 2000s. Later, when Alexandrescu worked at Facebook, a string built on similar principles was adopted in Facebook's engine at his suggestion (see the folly library).

Our string used two main optimization techniques:

For short values, an internal buffer inside the string object itself is used, so no additional memory allocation is required.
For all other values, the Copy-On-Write mechanism is used: the string value is stored in one place, and a reference counter tracks it on assignment and modification.
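The two techniques can be illustrated with a minimal sketch. This is not the platform's actual string class: the name SketchString, the 15-character threshold, and the use of std::shared_ptr as the reference counter are all assumptions made for illustration, and the sketch ignores thread safety.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <memory>
#include <string>

// Hypothetical sketch: short values live inside the object itself, long
// values are shared between copies and cloned only on the first write.
class SketchString {
public:
    static constexpr std::size_t kInlineCapacity = 15;  // assumed threshold

    SketchString(const char* s) {
        const std::size_t n = std::strlen(s);
        if (n <= kInlineCapacity) {
            std::memcpy(small_, s, n + 1);  // short value: no heap allocation
        } else {
            shared_ = std::make_shared<std::string>(s, n);  // long value: shared storage
        }
    }

    // The implicitly generated copy constructor copies the inline buffer and
    // the shared_ptr, so copying a long string only bumps a reference count.

    bool is_inline() const { return !shared_; }
    std::size_t size() const { return shared_ ? shared_->size() : std::strlen(small_); }
    const char* c_str() const { return shared_ ? shared_->c_str() : small_; }

    // True when both objects point at the same long-string storage.
    bool shares_storage_with(const SketchString& other) const {
        return shared_ && shared_ == other.shared_;
    }

    // Mutating access: detach from the other owners before writing.
    void set_char(std::size_t i, char c) {
        assert(i < size());
        if (!shared_) {
            small_[i] = c;
            return;
        }
        if (shared_.use_count() > 1) {
            shared_ = std::make_shared<std::string>(*shared_);  // copy on write
        }
        (*shared_)[i] = c;
    }

private:
    char small_[kInlineCapacity + 1] = {};
    std::shared_ptr<std::string> shared_;
};
```

Copying a long SketchString is cheap until one of the copies calls set_char, at which point only that copy pays for a private buffer. A production implementation would also have to decide how the reference counter behaves under concurrent access, which this sketch leaves out.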

To speed up compilation of the platform, we removed the stream implementation (which we did not use) from our version of STLPort; this sped up compilation by about 20%. Later we had to start using Boost to a limited extent. Boost makes heavy use of stream, in particular in its service APIs (for example, for logging), so we had to modify Boost to remove its use of stream. This, in turn, made it harder for us to move to new versions of Boost.

Third way

For the move to the C++14 standard, we considered the following options:

Bring our modified STLPort up to the C++14 standard. This option is very hard: STLPort support was discontinued in 2010, so we would have had to maintain all of its code ourselves.
Switch to another STL implementation that is compatible with C++14. It is highly desirable for that implementation to work on both Windows and Linux.
When compiling for each OS, use the standard library that ships with the corresponding compiler.

The first option was rejected immediately because of too much work.

We considered the second option for a while and looked at libc++ as a candidate, but at the time it did not work on Windows. Porting libc++ to Windows would have required a lot of work, for example writing everything related to threads, thread synchronization, and atomics, because libc++ relied on the POSIX API in those areas.

And we chose the third way.

Transition

So, we had to replace STLPort with the standard libraries of the corresponding compilers (Visual Studio 2015 for Windows, gcc 7 for Linux, clang 8 for macOS).

Fortunately, our code mostly followed our guidelines and avoided clever tricks, so the migration to the new libraries went relatively smoothly: scripts replaced the names of types, classes, namespaces, and includes in the source files. The migration touched 10,000 source files (out of 14,000). wchar_t was replaced with char16_t; we decided to stop using wchar_t because char16_t is 2 bytes on all of our operating systems and does not break code compatibility between Windows and Linux.
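To show why a fixed 2-byte character type matters for a binary protocol, here is a small sketch. The little-endian wire format and the function names encode_utf16le and decode_utf16le are assumptions for illustration, not the platform's actual protocol.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// On Windows, Linux, and macOS char16_t occupies 2 bytes, unlike wchar_t,
// whose size differs between Windows (2 bytes) and Linux (4 bytes by default).
static_assert(sizeof(char16_t) == 2, "this sketch assumes a 2-byte char16_t");

// Hypothetical wire format: each code unit is written as two bytes,
// low byte first, so the payload size is the same on every platform.
std::vector<std::uint8_t> encode_utf16le(const std::u16string& s) {
    std::vector<std::uint8_t> out;
    out.reserve(s.size() * 2);
    for (char16_t ch : s) {
        out.push_back(static_cast<std::uint8_t>(ch & 0xFF));
        out.push_back(static_cast<std::uint8_t>(ch >> 8));
    }
    return out;
}

std::u16string decode_utf16le(const std::vector<std::uint8_t>& bytes) {
    assert(bytes.size() % 2 == 0);  // a valid payload holds whole code units
    std::u16string s;
    s.reserve(bytes.size() / 2);
    for (std::size_t i = 0; i + 1 < bytes.size(); i += 2) {
        s.push_back(static_cast<char16_t>(bytes[i] | (bytes[i + 1] << 8)));
    }
    return s;
}
```

With std::u16string, a two-character string always serializes to four bytes, whichever operating system produced it.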

It was not without a few adventures. For example, in STLPort an iterator could be implicitly cast to a pointer to an element, and our code relied on this in some places. The new libraries no longer allowed it, so these places had to be found and rewritten by hand.
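The kind of manual rewrite involved might look like the following sketch. The functions sum_raw, sum_all, and sum_from are hypothetical and are not taken from the platform code; they only show two portable ways to obtain a pointer where old code passed an iterator.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical low-level function that expects a raw pointer and a length.
int sum_raw(const int* data, std::size_t count) {
    int total = 0;
    for (std::size_t i = 0; i < count; ++i) {
        total += data[i];
    }
    return total;
}

// Old code could pass v.begin() directly where a pointer was expected.
// Portable replacement for the whole vector: data() returns the pointer.
int sum_all(const std::vector<int>& v) {
    return sum_raw(v.data(), v.size());
}

// Portable replacement for a position inside the vector: take the address
// of the referenced element, but never dereference end().
int sum_from(const std::vector<int>& v, std::vector<int>::const_iterator first) {
    assert(first >= v.begin() && first <= v.end());
    if (first == v.end()) {
        return 0;
    }
    return sum_raw(&*first, static_cast<std::size_t>(v.end() - first));
}
```

The end() check matters: STLPort-style code that converted an end iterator to a pointer had no dereference, while the &*it form does, so each rewritten call site needs to be checked for empty ranges.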

So, the code migration was complete and the code compiled for all operating systems. It was time to test.

Tests after the transition showed a drop in performance (up to 20-30% in some places) and an increase in memory consumption (up to 10-15%) compared with the old version of the code. This was due, in particular, to the suboptimal performance of the standard strings. So once again we had to use our own string, slightly modified.

The tests also revealed an interesting feature of the container implementations in the built-in libraries: an empty std::map or std::set (one with no elements) allocates memory. And because of how our code is implemented, quite a lot of empty containers of these types are created in some places. The standard containers allocate only a little memory, for a single root element, but for us this turned out to be critical: in a number of scenarios performance dropped noticeably and memory consumption grew (compared with STLPort). So in our code we replaced these two container types from the built-in libraries with their Boost implementations, which did not have this feature, and that solved the slowdown and the increased memory consumption.
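One way to check this behavior on a particular toolchain is to count allocations with a custom allocator. The sketch below is an assumption about how such a measurement could be done, not the method we used; CountingAllocator, allocation_count, and the helper functions are hypothetical names, and the number reported for an empty map depends on the standard library and its version.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <memory>
#include <utility>

// Global counter, for illustration only (not thread-safe).
inline std::size_t& allocation_count() {
    static std::size_t count = 0;
    return count;
}

// Minimal allocator that counts every allocation request and otherwise
// forwards to std::allocator.
template <class T>
struct CountingAllocator {
    using value_type = T;

    CountingAllocator() = default;
    template <class U>
    CountingAllocator(const CountingAllocator<U>&) noexcept {}

    T* allocate(std::size_t n) {
        ++allocation_count();
        return std::allocator<T>().allocate(n);
    }
    void deallocate(T* p, std::size_t n) noexcept {
        std::allocator<T>().deallocate(p, n);
    }
};

template <class T, class U>
bool operator==(const CountingAllocator<T>&, const CountingAllocator<U>&) noexcept { return true; }
template <class T, class U>
bool operator!=(const CountingAllocator<T>&, const CountingAllocator<U>&) noexcept { return false; }

using CountedMap =
    std::map<int, int, std::less<int>, CountingAllocator<std::pair<const int, int>>>;

// Number of allocations made just by constructing an empty map
// (implementation-dependent).
std::size_t allocations_for_empty_map() {
    const std::size_t before = allocation_count();
    CountedMap m;
    return allocation_count() - before;
}

// Number of allocations made by inserting one element into an existing map.
std::size_t allocations_for_one_insert() {
    CountedMap m;
    const std::size_t before = allocation_count();
    m.emplace(1, 1);
    assert(m.size() == 1);
    return allocation_count() - before;
}
```

Calling allocations_for_empty_map() from a test program on each target compiler shows whether default-constructed maps cost anything there; the same allocator can be plugged into std::set in the same way.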

As often happens after large-scale changes in large projects, the first iteration of the code did not work without problems, and here the debug iterator support in the Windows implementation, in particular, was very useful to us. Step by step we moved forward, and by the spring of 2017 (version 8.3.11 of 1C:Enterprise) the migration was complete.

Results

The transition to the C++14 standard took us about 6 months. For most of that time a single (but very highly qualified) developer worked on the project, and at the final stage representatives of the teams responsible for specific areas joined in: the UI, the server cluster, the development and administration tools, and so on.

The transition greatly simplified our later migration to newer versions of the standard. Version 8.3.14 of 1C:Enterprise (in development, with release scheduled for early next year) has already been moved to the C++17 standard.

After the migration, developers have more options. Previously we had our own modified version of the STL and a single namespace, std. Now we have the standard classes from the compilers' built-in libraries in the std namespace, our optimized strings and containers in the stdx namespace, and the latest version of Boost in the boost namespace. Developers use whichever classes best suit the problem at hand.

Native move constructors, which we implemented for a number of classes, also help in development. If a class has a move constructor and objects of that class are stored in a container, the STL optimizes the copying of elements inside the container (for example, when the container grows and has to change its capacity and reallocate memory).
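The effect can be observed with a small sketch. The Document class and the Counters struct are hypothetical; the point is that std::vector relocates elements with the move constructor during reallocation when that constructor is declared noexcept, and otherwise falls back to copying if a copy constructor is available.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Counts how many times elements were copied or moved.
struct Counters {
    int copies = 0;
    int moves = 0;
};

// Hypothetical class with an expensive-to-copy member.
class Document {
public:
    Document(std::string text, Counters* counters)
        : text_(std::move(text)), counters_(counters) {}

    Document(const Document& other)
        : text_(other.text_), counters_(other.counters_) {
        ++counters_->copies;
    }

    // noexcept lets std::vector use this constructor when it reallocates.
    Document(Document&& other) noexcept
        : text_(std::move(other.text_)), counters_(other.counters_) {
        ++counters_->moves;
    }

private:
    std::string text_;
    Counters* counters_;
};

// Fills a vector without reserving capacity, so it reallocates as it grows.
Counters grow_vector(int count) {
    Counters counters;
    std::vector<Document> docs;
    for (int i = 0; i < count; ++i) {
        docs.emplace_back("document " + std::to_string(i), &counters);
    }
    assert(docs.size() == static_cast<std::size_t>(count));
    return counters;
}
```

After grow_vector(100), the copies counter stays at zero and all relocations go through the move constructor. Removing noexcept from the move constructor would make the vector copy the elements instead, to preserve its exception-safety guarantee.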

A fly in the ointment

Perhaps the most unpleasant (but not critical) consequence of the migration is that our obj files grew: the full build output with all intermediate files now takes 60-70 GB. This behavior comes from modern standard libraries, which have become less careful about the size of the generated service files. It does not affect the compiled application, but it causes a number of inconveniences during development; in particular, it increases compilation time. The free disk space requirements for build servers and developer machines are also growing. Our developers work on several versions of the platform in parallel, and hundreds of gigabytes of intermediate files sometimes get in their way. The problem is unpleasant but not critical, so we have postponed solving it for now. One of the solutions we are considering is the unity build technique (used, in particular, by Google in developing the Chrome browser).
