Badoo switched to PHP7 and saved $1M

    We did it! Several hundred of our application servers have been migrated to PHP7 and are running great. As far as we know, this is only the second migration to PHP7 on this scale (after Etsy's). Along the way we found some very unpleasant bugs in the PHP7 bytecode cache, but they have been fixed. And now the good news for the whole PHP community: PHP7 really is ready for production. It is stable, consumes significantly less memory and gives a very good performance boost. Below we describe in detail how we switched to PHP7, what difficulties we ran into, how we dealt with them and what results we got. But let's start with a short introduction.

    One of the most common misconceptions is that the bottleneck of a web project is the database. A well-designed system is balanced. As the incoming load grows, every part of the system takes the strain, and once thresholds are exceeded everything starts to slow down: the processors and the network as well as the database disks. In this reality, the processing power of the application cluster is almost the most important characteristic. In many projects this cluster consists of hundreds or even thousands of servers, so tuning its CPU load more than pays for itself (a million dollars in our case).

    PHP web applications consume as much CPU as applications in any high-level dynamic language, which is to say a lot. For years, though, PHP developers had a particular sorrow (and a reason for heavy “trolling” from other communities): PHP had no “honest” JIT, not even a translator into compiled languages such as C and C++. Because the community failed to provide such solutions within the main project, an unpleasant trend emerged: large players began building their own. This is how HHVM appeared at Facebook and KPHP at VKontakte, and there were probably other home-grown “crafts” as well.

    Fortunately, the first step towards a “mature” PHP was taken in 2015 with the release of PHP7. PHP7 did not get a JIT, but the impact of the changes to the engine is hard to overstate. In many tasks PHP7 is now on par with HHVM even without a JIT (see, for example, the benchmarks on the LiteSpeed blog and those from the PHP7 developers' presentation). The new PHP7 architecture also makes it easier to add a JIT later.

    Badoo's platform developers had been following these developments closely for the past few years and even ran a pilot project with HHVM. They decided to wait for PHP7, however, because they considered it more promising. And we recently launched Badoo on PHP7! It was an epic project, if only because of its size: we have more than 3 million lines of PHP code and 60,000 tests. Read on to learn how we dealt with all of this and saved a million dollars. Along the way we also created a new framework for testing PHP applications, which is already open source and works in a way similar to Go! AOP.

    Experiments with HHVM


    Before switching to PHP7, we spent a while looking for other ways to optimize our backend. Naturally, the first thing we decided to try was HHVM.

    After a couple of weeks of research we got very respectable results: once the JIT had warmed up on our framework, speed and CPU utilization improved by hundreds of percent.

    However, HHVM had unpleasant drawbacks:

    • difficult and slow deployment. The JIT cache has to be warmed up on every deploy. While it warms up, the machine must not receive production traffic because everything runs quite slowly, and warming up with parallel requests is not recommended either. In short, warming up a large cluster is not a quick operation. On top of that, you need to learn how to roll out in batches across a cluster of several hundred machines. The result is a non-trivial architecture and a deployment procedure whose duration is unpredictable. We want deployment to be as simple and fast as possible: an important part of our development culture is shipping two planned releases a day and being able to push hotfixes to production quickly;
    • inconvenient testing. For unit testing we made heavy use of the runkit extension, which is not available in HHVM. We will say more about it later. In short, it is an extension that lets you change the behavior of variables, classes, methods and functions on the fly in almost any way, and it does this through a very “hardcore” integration with PHP's internals. The HHVM core only remotely resembles the PHP core, so those internals are completely different there. Implementing runkit on top of HHVM ourselves would have been a hell of a job. Because our tests depended so heavily on the extension, the alternative was to rewrite tens of thousands of tests to make sure HHVM worked correctly with our code. That seemed impractical to us. To be honest, this problem would have arisen with any of the options, and when switching to PHP7 we still had to redo a lot;
    • compatibility. First, HHVM is not fully compatible with PHP 5.5 (see github.com/facebook/hhvm/blob/master/hphp/doc/inconsistencies and github.com/facebook/hhvm/issues?labels=php5+incompatibility&state=open). Second, it is incompatible with the extensions we had already written, and we have dozens of them. Both incompatibilities stem from an obvious structural flaw of the project: HHVM is developed by a department inside Facebook, not by the community. In such cases it is much easier for a company to change internal rules and standards without regard for the community and the tons of code already written. It is easier to redo everything for its own needs and solve problems with its own resources. So to work successfully at the same scale, you need comparable resources both for the initial adoption and for ongoing support. This is risky and potentially expensive, and we did not want to take that risk;
    • prospects. Although Facebook is a big company with great programmers, we strongly doubted that its HHVM team could outpace the PHP community. We thought that as soon as something comparable appeared in PHP itself, all the home-grown projects would slowly but surely start to die.
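
    A rolling deployment of the kind described in the first point can be sketched as follows. This is a hypothetical illustration in Python, not Badoo's or HHVM's tooling. The callbacks disable_traffic, warm_up and enable_traffic are placeholders for whatever a real deploy system would call. The sketch only shows why the procedure is slow: each batch has to be taken out of rotation and warmed up host by host before the next batch starts.

```python
from typing import Callable, Iterator, List


def batches(hosts: List[str], batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of `hosts`, at most `batch_size` hosts each."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(hosts), batch_size):
        yield hosts[start:start + batch_size]


def rolling_deploy(
    hosts: List[str],
    batch_size: int,
    disable_traffic: Callable[[str], None],  # hypothetical callback
    warm_up: Callable[[str], bool],          # hypothetical callback, True on success
    enable_traffic: Callable[[str], None],   # hypothetical callback
) -> List[str]:
    """Deploy batch by batch and return the hosts whose warm-up failed.

    Failed hosts stay out of rotation, so a cold JIT never serves production traffic.
    """
    failed: List[str] = []
    for batch in batches(hosts, batch_size):
        # Take the whole batch out of rotation before warming anything up.
        for host in batch:
            disable_traffic(host)
        # Warm up one host at a time, since parallel warm-up requests are discouraged.
        for host in batch:
            if warm_up(host):
                enable_traffic(host)
            else:
                failed.append(host)
    return failed
```

    Total deploy time grows roughly with the number of batches multiplied by the warm-up time. That is why this approach becomes slow on a cluster of several hundred machines.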

    And we began to wait for PHP7.

    Moving to a new version of the interpreter is an important and complex process, so we prepared for it by drawing up a clear migration plan. It consisted of three preparation stages:

    • changing the infrastructure for building and deploying PHP, and adapting many of the extensions we had written;
    • changing the testing infrastructure and environment;
    • changing the PHP code of our applications.

    We describe each stage in more detail below.

    Fixes in the core and extensions


    We maintain our own actively developed PHP fork with our own modifications. We started moving Badoo to PHP7 before its official release, so we had to rebase our tree onto the PHP7 upstream regularly and smoothly in order to pick up every release candidate. All the patches and customizations we use in our daily work (see the "Patches" section of our tech site, tech.badoo.com/open-source) also had to be portable between versions and work correctly.

    We automated fetching and building all the dependencies, extensions and the PHP tree for both 5.5 and 7.0. This not only simplified the work but also laid a good foundation for the future: when version 7.1 comes out, we will be ready.

    The extensions also made us sweat. We maintain about 40 extensions, and more than half of them are third-party open-source extensions with our own improvements.

    To make the transition as fast as possible, we ran two efforts in parallel. The first was to rewrite the extensions most critical to us ourselves: the Blitz template engine, the APCu shared-memory data cache, Pinba statistics collection and a number of custom extensions for working with internal services (about 20 extensions in total).
    The second was to actively get rid of extensions used in non-critical parts of the infrastructure. We easily dropped 11 extensions, which is a lot!

    And, of course, we started working closely with the maintainers of the main open-source extensions we use to make them compatible with PHP7 (special thanks to Derick Rethans, who develops Xdebug).

    Next, we go into the technical details of porting extensions to PHP7.

    In version 7, the PHP developers changed many internal APIs, which meant editing a lot of extension code. The most important changes were:
    • zval * → zval. Previously, a zval structure was always allocated separately for every new variable. Now zvals are embedded directly in their container, for example on the stack;
    • char * → zend_string. PHP7 makes aggressive use of string interning and caching in the core. As a result, the new core has switched almost everywhere from plain C strings to the zend_string structure, which stores the string together with its length;
    • changes to the array API. Array keys are now zend_strings. In the implementation, the doubly linked list of separately allocated elements was replaced with a plain array allocated as one block instead of many small ones.
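
    To make the array change more concrete, here is a simplified Python model of a PHP7-style ordered hash table. It is an illustration, not PHP's C code, and it makes three simplifications: keys are Python strings hashed with Python's hash(), there is no resizing, and deletion just leaves a tombstone. What it does mirror is the layout described above. All buckets live in one contiguous list in insertion order, and a separate slot table maps a hash to the index of the first bucket in its collision chain.

```python
from dataclasses import dataclass
from typing import Any, Iterator, List, Optional, Tuple

INVALID = -1  # "no bucket" marker for empty slots and chain ends


@dataclass
class Bucket:
    key: str
    value: Any
    next: int               # index of the next bucket in the same collision chain
    deleted: bool = False   # tombstone left behind by deletion


class OrderedHash:
    def __init__(self, size: int = 8) -> None:
        if size < 1 or size & (size - 1):
            raise ValueError("size must be a power of two")
        self.mask = size - 1                      # slot = hash & mask
        self.slots: List[int] = [INVALID] * size  # slot -> index of first bucket
        self.buckets: List[Bucket] = []           # one contiguous block, insertion order

    def _find(self, key: str) -> Optional[int]:
        idx = self.slots[hash(key) & self.mask]
        while idx != INVALID:
            bucket = self.buckets[idx]
            if not bucket.deleted and bucket.key == key:
                return idx
            idx = bucket.next
        return None

    def __setitem__(self, key: str, value: Any) -> None:
        idx = self._find(key)
        if idx is not None:
            self.buckets[idx].value = value       # update in place, order unchanged
            return
        slot = hash(key) & self.mask
        # Append the new bucket and make it the head of its collision chain.
        self.buckets.append(Bucket(key, value, self.slots[slot]))
        self.slots[slot] = len(self.buckets) - 1

    def __getitem__(self, key: str) -> Any:
        idx = self._find(key)
        if idx is None:
            raise KeyError(key)
        return self.buckets[idx].value

    def __delitem__(self, key: str) -> None:
        idx = self._find(key)
        if idx is None:
            raise KeyError(key)
        # Leave a hole; the bucket keeps its `next` so the chain stays intact.
        self.buckets[idx].deleted = True

    def items(self) -> Iterator[Tuple[str, Any]]:
        # Iteration is a plain walk over the bucket array that skips holes.
        for bucket in self.buckets:
            if not bucket.deleted:
                yield bucket.key, bucket.value
```

    Because iteration simply walks the bucket list, insertion order comes for free, and no per-element allocation or linked-list pointers are needed. The real implementation also compacts holes and doubles the table when it fills up. This sketch omits both, so its chains just grow longer.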

    All this drastically reduced the number of small memory allocations and, as a result, sped up the PHP core by tens of percent.

    All these changes meant that every extension had to be heavily edited, if not rewritten. For built-in extensions we could rely on their authors, but our own extensions could only be fixed by us, and there were a lot of edits. Because of the changes in the internal APIs, some parts of the code were easier to rewrite from scratch.

    Unfortunately, the new structures that rely on garbage collection speed up code execution but make the engine itself more complex and its problems harder to find. One such problem was in OPcache. When the cache was reset, the bytecode of a cached file was destroyed while another process could still be using it, and that process crashed. Outwardly it looked like this: strings (zend_string) holding the names of functions or constants suddenly got corrupted, with garbage appearing in their place.

    We use a large number of extensions of our own, and many of them work heavily with strings, so our first suspicion was that they were misusing strings. We wrote many tests and ran many experiments, all to no avail. In the end we had to turn to Dmitry Stogov, a lead developer of the PHP core, for help.

    The first thing he asked was whether the cache had been reset. We checked, and it had indeed been reset in every case. It became clear that the problem was not on our side after all but in OPcache. We quickly reproduced it, and Dmitry fixed it within a couple of days. Without this fix, which shipped in PHP 7.0.4, PHP7 could not be used reliably in production!

    Test Infrastructure Change


    Testing is a special source of pride at Badoo. We deploy PHP code to production twice a day, with 20-50 tasks in each release (we use feature branches in Git and automated builds with tight JIRA integration). With such a schedule and volume of tasks, there is no way to manage without automated tests.

    Today we have about 60,000 unit tests with roughly 50% coverage. On average they run in 2-3 minutes in the cloud (we have written about this on Habr before). Besides unit tests, we use higher-level automated tests: integration and system tests, Selenium tests for web pages and Calabash tests for mobile apps. Together they let us quickly judge the quality of each specific version of the code and make the right decisions.

    Switching to a new version of the interpreter is a radical change with a huge number of potential problems, so it is essential that all the tests work. To explain what we did, how and why, we need to take a short detour into the history of testing at our company.

    People who start thinking about testing their product often discover while experimenting (and some only during rollout) that their code is not ready for it. A developer has to keep in mind that code must be testable. The architecture should let unit tests replace calls to external dependencies and the objects they use, so that the code under test is isolated from external conditions. Admittedly, this requirement makes life harder, and many programmers refuse on principle to write code this way. The imposed restrictions fight an unequal battle with other qualities of “good code” and usually lose. Faced with the amount of existing code that was not written by these rules, would-be testers often simply postpone testing until better times. Others settle for small steps and cover only what is easy to cover, and as a result the tests do not always deliver what was expected.

    Our company is no exception. We, too, started introducing tests long after the project began. By then a great deal of code had been written; it worked fine in production and brought in good money. Rewriting all of it just to cover it with tests in the recommended way would have taken too long and cost too much.

    Fortunately, a great tool already existed that solved most of the problems with untestable code: runkit. It is a PHP extension that lets you change, delete and add methods, classes and functions while a script is running. It can do more, but we did not use its other features. The tool was developed and maintained for several years (2005-2008) by Sara Golemon, who now works at Facebook, including on HHVM. Since 2008 the project has been maintained by our compatriot Dmitry Zenovich (who headed the testing departments at Begun and Mail.Ru). We also contributed a little to the project.

    Runkit itself is a very dangerous extension. It lets you change constants, functions and classes while the script that uses them is running. In effect, it is a tool for rebuilding your plane in mid-flight. Runkit reaches deep into PHP's internals on the fly, and a single mistake or flaw in runkit makes the plane explode beautifully in mid-air. PHP crashes, or you spend many hours hunting for memory leaks and doing other low-level debugging. Still, it was an indispensable tool for us. The only way to introduce testing into the project without a serious rewrite was to change code on the fly, simply replacing it with whatever we needed.

    When we switched to PHP7, runkit turned out to be a big problem: it did not support the new version. One option was to sponsor the development of a PHP7-compatible version, but that did not seem the most reliable path in the long run. In parallel, we looked at several other options.

    One promising solution was to switch from runkit to uopz. It is another PHP extension with similar functionality, first released in April 2014. Colleagues from Mamba suggested it to us and praised it above all for its speed. The project is maintained by Joe Watkins of First Beat Media (UK), and it looked more alive and promising than runkit. Unfortunately, we could not move all our tests to uopz. Some tests hit fatal errors and others segfaulted. We filed several reports, but alas, there has been no progress (for details see, for example, this bug on GitHub). Rewriting the tests in this situation would have been very expensive, with no guarantee that other problems would not surface.

    In the end we arrived at a solution that was obvious for us. We had to rewrite a lot of code anyway, and we still depended on external projects like runkit or uopz whose problems were very expensive or impossible for us to solve on our own. So why not rewrite the code in a way that removed such dependencies as far as possible? That way we would never have such problems again, even if we wanted to switch to HHVM or a similar product. And so we ended up with our own framework.

    The system is called SoftMocks. The word "soft" emphasizes that it is implemented in pure PHP rather than as an extension. It is an open-source project, freely available as a pluggable library. SoftMocks is not tied to the specifics of the PHP core implementation. Like Go! AOP, it works by rewriting code on the fly.

    Our test code mainly relies on the following:
    1. Replacing the implementation of a class method.
    2. Replacing the result of a function.
    3. Changing the value of a global constant or class constant.
    4. Adding a method to a class.

    runkit handles all of these perfectly. Code rewriting can achieve them too, though with some caveats.

    A full description of how SoftMocks works deserves a separate article, which we will write in the near future. For now, here is a brief outline of how the system works:

    • user code is included through a rewriting wrapper function. After that, all include statements are automatically and recursively replaced with wrappers;
    • a check for a substitution is added inside the definition of each user method, and if a substitution exists, its code is executed instead. Direct function calls are replaced with calls through a wrapper, which makes it possible to intercept both built-in and user-defined functions;
    • references to constants in the code are likewise dynamically replaced with calls to a wrapper;
    • SoftMocks uses Nikita Popov's PHP-Parser. This library is not very fast (parsing is about 15 times slower than token_get_all). However, it provides a convenient interface for traversing the syntax tree and a convenient API for working with syntactic constructs of arbitrary complexity.
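
    SoftMocks itself rewrites PHP source with PHP-Parser. The same idea, transposed to Python and its standard ast module, fits in a few dozen lines. This toy covers only two of the points above: routing every direct function call through a wrapper, and routing constant reads through a wrapper. It does not inject checks into method bodies or rewrite includes. The names MOCKS, CONST_MOCKS, _call, _const and load are invented for this sketch. As a further simplification, any ALL_CAPS name that is read is treated as a constant.

```python
import ast
from typing import Any, Callable, Dict

MOCKS: Dict[str, Callable[..., Any]] = {}  # function name -> replacement
CONST_MOCKS: Dict[str, Any] = {}           # constant name -> replacement value


def _call(name: str, fn: Callable[..., Any], /, *args: Any, **kwargs: Any) -> Any:
    """Every rewritten call lands here, so any function can be swapped out."""
    return MOCKS.get(name, fn)(*args, **kwargs)


def _const(name: str, value: Any) -> Any:
    """Every rewritten constant read lands here."""
    return CONST_MOCKS.get(name, value)


class Rewriter(ast.NodeTransformer):
    def visit_Call(self, node: ast.Call) -> ast.AST:
        self.generic_visit(node)  # rewrite nested calls and arguments first
        if isinstance(node.func, ast.Name):
            # f(a, b)  ->  _call("f", f, a, b)
            node.args = [ast.Constant(node.func.id), node.func] + node.args
            node.func = ast.Name("_call", ast.Load())
        return node

    def visit_Name(self, node: ast.Name) -> ast.AST:
        # Simplification: any ALL_CAPS name that is read counts as a constant.
        if isinstance(node.ctx, ast.Load) and node.id.isupper():
            # X  ->  _const("X", X)
            return ast.Call(ast.Name("_const", ast.Load()), [ast.Constant(node.id), node], [])
        return node


def load(source: str) -> Dict[str, Any]:
    """Parse and rewrite `source`, execute it, and return its global namespace."""
    tree = ast.fix_missing_locations(Rewriter().visit(ast.parse(source)))
    namespace: Dict[str, Any] = {"_call": _call, "_const": _const}
    exec(compile(tree, "<rewritten>", "exec"), namespace)
    return namespace
```

    Because the wrapper receives the original function object along with its name, built-ins such as len are intercepted exactly like user functions. This is the same reason a rewriting approach can stub both kinds of function without a C extension. The price is an extra dictionary lookup on every call.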

    Back to our task: the move to PHP7. After we started using SoftMocks in the project, about 1,000 tests were left that had to be fixed by hand. That is a good result, considering that we started with 60,000 tests. The tests run no slower than they did with runkit, so SoftMocks causes no serious loss of performance. In fairness, uopz should still be much faster.

    Utilities and Application Code


    Along with many innovations, PHP7 brought some backward incompatibilities. We started by reading the official migration guide. It quickly became clear that if we did not fix the existing code, we risked fatal errors in production. We also risked behavior changes that would not show up in the logs but would break the application logic.

    Badoo consists of several PHP code repositories, and the largest of them contains more than 2 million lines of code. We have implemented a lot in PHP, from the business logic of the website and the mobile app backend to testing and deployment utilities. The situation was further complicated by the fact that Badoo has a ten-year history, and PHP4 legacy code was unfortunately still present. So reviewing everything by eye was not an option. Nor could we simply put the code into production as is and watch what breaks, because that would put the business logic at risk for far too large a share of users. Therefore we started looking for a way to find incompatible code automatically.

    At first we tried the IDEs most popular among our developers. Unfortunately, at the time they either did not support PHP7 syntax and features at all or reported suspiciously few problems, apparently missing dangerous spots in the code. After a little research we decided to try the php7mar utility. It is a simple static code analyzer written in PHP. It is very easy to use, works quite fast and writes its results to a text file; it requires PHP7 to run. Of course, it is no panacea: there are both false positives and misses of especially “tricky” spots in the code. But it helped us find about 90% of the problems, which greatly sped up and simplified preparing the code to run under PHP7.

    The most common and potentially dangerous problems for us were:
    • the changed behavior of func_get_arg() and func_get_args(). In PHP 5 these functions returned the argument values as they were passed to the function. In PHP 7 they return the values at the moment func_get_args() is called. So if a function changes an argument's value before calling func_get_args(), it may behave differently than under PHP 5. This is a case where the logs stay empty while the application's business logic may break;
    • indirect access to variables, properties and methods of objects. Again, the danger is that behavior can change “silently”. The documentation describes the differences in sufficient detail;
    • the use of reserved class names. In PHP7, bool, int, float, string, null, true and false can no longer be used as class names. Yes, we really did have a Null class. Fortunately, this case is easier to handle, because it causes an error;
    • we found many potentially problematic foreach constructs that iterate by reference. However, almost all of them behaved the same way in PHP 5 and PHP 7, because we had always tried not to modify the array being iterated inside foreach and not to rely on its internal pointer.

    The remaining incompatibilities were either extremely rare (such as the 'e' modifier in regular expressions) or fixed by a simple replacement. For example, constructors should now be named __construct(), and PHP4-style constructors named after the class are deprecated.
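
    To give an idea of what such a checker does, here is a deliberately crude scanner for three of the cases above: reserved class names, func_get_arg()/func_get_args() calls that deserve a manual look, and the removed /e modifier of preg_replace(). It is a hypothetical Python sketch built on regular expressions, not php7mar (which is written in PHP). Because it works line by line, it will both miss cases and raise false alarms.

```python
import re
from typing import List, Tuple

RESERVED = ("bool", "int", "float", "string", "null", "true", "false")

CHECKS = [
    # `class Null {}` is an error in PHP7; class names are case-insensitive.
    ("reserved class name",
     re.compile(r"\bclass\s+(?:%s)\b" % "|".join(RESERVED), re.IGNORECASE)),
    # func_get_arg()/func_get_args() now return the *current* argument values.
    ("func_get_arg(s) semantics changed",
     re.compile(r"\bfunc_get_args?\s*\(", re.IGNORECASE)),
    # preg_replace('/.../e', ...): quoted pattern whose modifiers include 'e'.
    ("preg_replace /e modifier",
     re.compile(r"""\bpreg_replace\s*\(\s*(['"])(.)[^'"]*?\2[a-zA-Z]*e[a-zA-Z]*\1""")),
]


def scan(source: str) -> List[Tuple[int, str]]:
    """Return (line number, problem) pairs; line numbers start at 1."""
    findings: List[Tuple[int, str]] = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for label, pattern in CHECKS:
            if pattern.search(line):
                findings.append((lineno, label))
    return findings
```
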
    Before we started fixing the code, however, we realized that while some developers were making the changes needed for compatibility, others would keep writing code that was incompatible with PHP7. To solve this, we added a pre-receive hook to every Git repository that ran php7 -l on the changed files, i.e. checked them against PHP7 syntax. This does not guarantee full protection against incompatibilities, but it does eliminate a number of problems; in other cases the developers simply had to be a little more careful. We also started running the full test suite on PHP7 regularly and comparing the results with the PHP5 runs. At the same time, developers were not allowed to use any new PHP7 features, so we kept the old pre-receive hook with php5 -l. This let us reach a point where the code was compatible with both PHP 7 and PHP 5. Why does that matter? Besides problems in the PHP code, an upgrade can surface problems in PHP7 itself and its extensions, and, as mentioned above, that is exactly what happened to us. Unfortunately, not all of these problems could be reproduced in a test environment; some of them appeared only under significant load in production.
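
    A hook in the spirit of the one described above might look like the sketch below. The article does not say how Badoo's hooks were written, so everything here is an assumption. The hook is written in Python, and the interpreters are invoked as php5 and php7, matching the php5 -l and php7 -l commands mentioned above. For a newly created branch, it simply lints every file in the pushed tree. It relies only on standard Git behavior: a pre-receive hook reads one "<old-sha> <new-sha> <ref>" line per updated ref from stdin, and a non-zero exit status rejects the push.

```python
#!/usr/bin/env python3
import subprocess
import sys
import tempfile
from typing import Iterable, List, Tuple

ZERO = "0" * 40                 # Git's all-zero SHA-1 meaning "no commit"
LINTERS = ["php5", "php7"]      # assumed to be on PATH; both must accept the file


def parse_updates(lines: Iterable[str]) -> List[Tuple[str, str, str]]:
    """Parse '<old-sha> <new-sha> <ref>' lines, skipping ref deletions."""
    updates: List[Tuple[str, str, str]] = []
    for line in lines:
        parts = line.split()
        if len(parts) == 3 and parts[1] != ZERO:
            updates.append((parts[0], parts[1], parts[2]))
    return updates


def php_files(paths: Iterable[str]) -> List[str]:
    return [p for p in paths if p.endswith(".php")]


def changed_paths(old: str, new: str) -> List[str]:
    if old == ZERO:
        # New branch. Simplification: lint every file in the pushed tree.
        cmd = ["git", "ls-tree", "-r", "--name-only", new]
    else:
        # Added, copied, modified or renamed files only; deletions need no lint.
        cmd = ["git", "diff", "--name-only", "--diff-filter=ACMR", old, new]
    out = subprocess.run(cmd, capture_output=True, text=True, check=True).stdout
    return out.splitlines()


def lint(new: str, path: str) -> List[str]:
    """Run `-l` with every interpreter in LINTERS on the pushed version of path."""
    blob = subprocess.run(["git", "show", f"{new}:{path}"],
                          capture_output=True, check=True).stdout
    errors: List[str] = []
    with tempfile.NamedTemporaryFile(suffix=".php") as tmp:
        tmp.write(blob)
        tmp.flush()
        for php in LINTERS:
            res = subprocess.run([php, "-l", tmp.name], capture_output=True, text=True)
            if res.returncode != 0:
                errors.append(f"{path} ({php}): {(res.stdout + res.stderr).strip()}")
    return errors


def main() -> int:
    errors: List[str] = []
    for old, new, _ref in parse_updates(sys.stdin):
        for path in php_files(changed_paths(old, new)):
            errors.extend(lint(new, path))
    for err in errors:
        print(err, file=sys.stderr)
    return 1 if errors else 0  # non-zero rejects the whole push


if __name__ == "__main__":
    sys.exit(main())
```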

    "Launch" and the results


    Obviously, we needed a simple and quick way to switch the PHP version on any number of servers. To do this, we replaced all paths to the CLI interpreter in the code with /local/php, which in turn was a symlink to either /local/php5 or /local/php7. To change the PHP version on a server, we then only had to change the link (the operation is atomic, which matters for CLI scripts), stop php5-fpm and start php7-fpm. We could have configured two php-fpm upstreams in nginx and run php5-fpm and php7-fpm on different ports, but we rejected that option because it complicated the nginx configuration.
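
    The atomic switch mentioned above relies on a standard POSIX technique. Deleting the old symlink and creating a new one leaves a short window in which the path does not exist. Creating the new link under a temporary name and rename()-ing it over the old one replaces it in a single step instead. The helper below is an illustrative Python sketch of that technique. The function name is ours, and restarting php-fpm remains a separate step.

```python
import os
import tempfile


def switch_symlink(link: str, target: str) -> None:
    """Atomically point `link` at `target` (POSIX rename semantics)."""
    tmp = f"{link}.tmp-{os.getpid()}"
    if os.path.lexists(tmp):       # leftover from an interrupted run
        os.remove(tmp)
    os.symlink(target, tmp)        # build the new link next to the old one
    os.replace(tmp, link)          # rename() swaps it in; readers never see a gap


if __name__ == "__main__":
    # Demonstration in a scratch directory instead of the real /local paths.
    with tempfile.TemporaryDirectory() as d:
        for version in ("php5", "php7"):
            os.mkdir(os.path.join(d, version))
        link = os.path.join(d, "php")
        switch_symlink(link, os.path.join(d, "php5"))
        switch_symlink(link, os.path.join(d, "php7"))
        print(os.readlink(link))   # .../php7
```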

    Once all of the above was done, we could run the Selenium tests in the preproduction environment, which uncovered a number of problems we had not noticed before. They concerned both PHP code and extensions. In the PHP code, for example, we had to drop the obsolete global variable $HTTP_RAW_POST_DATA in favor of file_get_contents("php://input"). In the extensions, we hit various kinds of segmentation faults.

    After fixing the problems found at this early stage and finishing the rewrite of the unit tests (along the way we also found a few bugs in the interpreter itself), we finally started quarantining production. By “quarantine” we mean running a new PHP version on a limited number of servers. We started with one server in each large cluster (the web backend, the mobile app backend and the cloud) and gradually increased the number as long as there were no errors. The first large cluster to move fully to PHP7 was the cloud, because it does not need php-fpm. The clusters running fpm had to wait until we had found the OPcache problem and Dmitry Stogov had fixed it. After that, we moved the fpm clusters as well.

    Now for the results. In short, they are more than impressive. Below are graphs of response time, rusage, memory consumption and CPU usage for the largest of our clusters (263 servers), the mobile app backend in the Prague data center:

    [Graph: distribution of response times]

    [Graph: rusage (CPU time)]

    [Graph: memory usage]

    [Graph: CPU load (%) across the entire cluster]

    So CPU time was cut in half, and overall response time improved by about 40%. The improvement is smaller because part of the time spent handling a request goes to communicating with databases and daemons, and, as expected, moving to PHP7 does nothing to speed up that part. The effect is somewhat amplified because the total load on the cluster dropped below 50%, which has to do with how Hyper-Threading works. Roughly speaking, once the load rises above 50%, the HT cores start doing work, and they are not as “useful” as physical cores, but that is a topic for another article.
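
    The relation between halved CPU time and roughly 40% faster responses can be checked with a back-of-the-envelope model in the spirit of Amdahl's law. The model is our simplification, not the article's. It splits response time into a PHP CPU share and a waiting share (databases, daemons), and only the CPU share speeds up. Under that model, a 2x CPU speedup shortens response time by 40% only if about 80% of the response time was PHP CPU work. Because of the Hyper-Threading effect described above, the real picture is less clean.

```python
def new_response_ratio(cpu_fraction: float, speedup: float) -> float:
    """New response time as a fraction of the old one.

    Only the CPU share (cpu_fraction) gets faster, by `speedup`; the rest
    (waiting on databases and daemons) is unchanged.
    """
    return (1 - cpu_fraction) + cpu_fraction / speedup


def implied_cpu_fraction(reduction: float, speedup: float) -> float:
    """Invert the model: which CPU share produces this response-time reduction?

    1 - new_response_ratio(f, s) = f * (1 - 1/s)  =>  f = reduction / (1 - 1/s)
    """
    return reduction / (1 - 1 / speedup)


if __name__ == "__main__":
    f = implied_cpu_fraction(0.40, 2.0)
    print(f"implied CPU share of response time: {f:.2f}")
    print(f"check: response time reduction = {1 - new_response_ratio(f, 2.0):.2f}")
```
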
    Memory consumption dropped by about 8 times, even though memory has never been a bottleneck for us! And finally, we saved on hardware. We can now handle a much higher load on the same number of servers, which effectively reduces the cost of buying and maintaining it. The results on the other clusters are similar, except that the gain in the cloud is somewhat more modest (about 40% CPU) because OPcache is not used there.

    How much money did we save? Let's do the math. Our application server cluster consists of more than 600 servers. Halving CPU usage saves us about 300 servers. Taking into account the purchase price of such hardware (about $4,000 per server) and depreciation, that comes to about a million dollars in savings, plus about a hundred thousand dollars a year on hosting! And that does not count the cloud, whose performance also improved. We think that is an excellent result!
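
    The core of the estimate is simple arithmetic. Spelling it out with the article's approximate figures (600 servers, CPU usage halved, about $4,000 per server) gives the gross purchase cost of the freed machines. The sketch does not model depreciation or hosting costs, which the article also takes into account.

```python
def servers_freed(cluster_size: int, cpu_reduction: float) -> int:
    """Servers no longer needed, assuming CPU is the limiting resource."""
    return round(cluster_size * cpu_reduction)


def purchase_savings(cluster_size: int, cpu_reduction: float, unit_price: float) -> float:
    """Purchase price of the freed servers (no depreciation or hosting)."""
    return servers_freed(cluster_size, cpu_reduction) * unit_price


if __name__ == "__main__":
    print(servers_freed(600, 0.5))            # 300
    print(purchase_savings(600, 0.5, 4000))   # 1200000
```

    The gross figure of $1.2M at purchase price is the same order of magnitude as the article's "about a million dollars".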

    Have you already switched to PHP7? We'd be glad to hear your opinions and questions in the comments.
