Graal: how to use the new JVM JIT compiler in real life
At JBreak 2018, the main Siberian Java conference, held in Novosibirsk, Christian Thalinger from Twitter shared his practical experience with Graal. Our company (Peter-Service) sent our entire working group to the conference, and we all went to hear this talk together. That is understandable, given that Graal is still considered a bold and potentially risky experiment (although it will very likely make it into JDK 10). It was very interesting to learn how this new technology behaves in production, and not just anywhere, but in engineering at that scale.
Christian Thalinger has been working on Java virtual machines for more than a dozen years, and his core expertise is JIT compilers. It was Christian who introduced Graal at Twitter and initiated its current (according to Chris, quite active) use in the company's production environment. According to Thalinger, this saves the company a decent amount of money by cutting hardware costs.
In this interview with the JBreak organizers, Christian clearly explains the basics: what Graal is and how to handle it. The talk in Novosibirsk was more practical: its main goal was to show the audience how to start working with Graal simply and painlessly, and why it is worth trying.
First, a little theory. What is a JIT (just-in-time) compiler? Running a Java program takes several steps: the source code is first compiled into instructions for the JVM (bytecode), and then that bytecode is executed by the JVM, which at this point acts as an interpreter. The JIT compiler exists to speed up Java applications: it optimizes the bytecode being executed by translating it into low-level machine instructions while the program is running.
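The effect can be observed with a small experiment. The sketch below is illustrative only and does not come from the talk; the names HotLoop and sumOfSquares are made up. It simply calls one small method often enough for the JVM to consider it hot.

```java
// Illustrative sketch: a method that is called often enough to become "hot".
// HotLoop and sumOfSquares are hypothetical names, not taken from the talk.
final class HotLoop {

    // Small pure method; after enough calls the JIT is expected to compile it.
    static long sumOfSquares(int n) {
        long sum = 0;
        for (int i = 1; i <= n; i++) {
            sum += (long) i * i;
        }
        return sum;
    }

    public static void main(String[] args) {
        long total = 0;
        // Repeated calls give the JVM a reason to treat the method as hot.
        for (int round = 0; round < 20_000; round++) {
            total += sumOfSquares(1_000);
        }
        System.out.println("total = " + total);
    }
}
```

Started with the standard HotSpot option -XX:+PrintCompilation (for example, java -XX:+PrintCompilation HotLoop), the JVM logs methods as they are compiled, and a method called this often would normally appear in that log.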
HotSpot/OpenJDK uses two levels of JIT compilation, both implemented in C++: C1 and C2 (also known as the client and server compilers). By default they work together: C1 first performs fast but shallow optimization, and then the “hottest” methods are additionally optimized by C2.
In Java 9, JEP 243 added JVMCI (Java Virtual Machine Compiler Interface), a mechanism for plugging a compiler written in Java into the JVM as a dynamic compiler. This is the mechanism Graal relies on. It is worth noting that Graal was already available in Java 9 as part of JEP 295, ahead-of-time (AOT) compilation in the JVM. However, although the AOT machinery uses Graal as its compiler, that JEP states that Graal code is initially integrated into the JDK only for the Linux/x64 platform.
Thus, to try Graal, you need a JDK that includes AOT and JVMCI. And if you need to run on macOS or Windows, you will have to wait for the release of Java 10 (the corresponding ticket, JDK-8172670, has its fix version set to 10).
Here Christian pointed out that the Graal version in current JDK distributions is, to put it mildly, outdated (about a year old, give or take). But Java 9 modularity comes to the rescue: thanks to it, we can build the latest version from the Graal sources and plug it into the JVM with the --upgrade-module-path option. Since Graal development started long before the module system, it is built with a special tool, mx, which to some extent mirrors the Java module system. The tool runs on Python 2.7; all the links can be found in the Graal repository on GitHub.
That is, we first download and install mx, then download Graal and build it into a module with mx, and that module then replaces the original one in the JDK.
At first glance these manipulations may seem complicated and time-consuming, but in reality it is not as scary as it looks. And in principle, being able to replace the Graal version without waiting for a JDK patch release, or even a new JDK, strikes me personally as more than convenient. At least, Christian showed how he built all of this live on machines in the cloud. Admittedly, the Truffle build failed because some additional dependencies had to be installed on the machine. But Graal itself built correctly and was then used in that form (from which we conclude that Truffle can be ignored entirely: Graal does not depend on it).
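One way to see which modules the running JVM has actually resolved is to query the boot layer through the Java 9 module API. This is a minimal sketch and was not shown in the talk. It assumes the Graal compiler module is named jdk.internal.vm.compiler, and the class name GraalModuleCheck is made up. Whether that module is resolved at all depends on the JDK build and on the launch flags, so a "not in the boot layer" answer is not necessarily an error.

```java
import java.lang.module.ModuleDescriptor;
import java.util.Optional;

// Hypothetical helper: looks a module up in the boot layer of the running JVM.
final class GraalModuleCheck {

    // Returns the descriptor of the named module if the boot layer contains it.
    static Optional<ModuleDescriptor> describe(String moduleName) {
        return ModuleLayer.boot()
                .findModule(moduleName)
                .map(Module::getDescriptor);
    }

    public static void main(String[] args) {
        // Assumption: this is the module name used for Graal inside the JDK.
        String name = args.length > 0 ? args[0] : "jdk.internal.vm.compiler";
        Optional<ModuleDescriptor> descriptor = describe(name);
        if (descriptor.isPresent()) {
            // Not every module records a version, hence the fallback text.
            String version = descriptor.get().rawVersion().orElse("(no version recorded)");
            System.out.println(name + " found, version: " + version);
        } else {
            System.out.println(name + " is not in the boot layer");
        }
    }
}
```

Running it once on the stock JDK and once with --upgrade-module-path pointing at the freshly built module would be one way to compare what each JVM reports.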
Next: for the JVM to start using Graal, you need to set three additional flags:
-XX:+UnlockExperimentalVMOptions -XX:+UseJVMCICompiler -XX:+EnableJVMCI
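A quick sanity check that the flags really reached the JVM can be done from inside the application. The sketch below is not from the talk, and the class name JvmciFlagCheck is made up. It assumes the intended setting is JVMCI enabled, that is, -XX:+EnableJVMCI. It only inspects the arguments the JVM was started with and does not prove that Graal is actually compiling anything.

```java
import java.lang.management.ManagementFactory;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: checks the JVM's startup arguments for the Graal flags.
final class JvmciFlagCheck {

    // The three flags from the text, assuming JVMCI is meant to be enabled.
    static final List<String> REQUIRED = Arrays.asList(
            "-XX:+UnlockExperimentalVMOptions",
            "-XX:+EnableJVMCI",
            "-XX:+UseJVMCICompiler");

    // True only if every required flag appears verbatim among the arguments.
    static boolean hasAllFlags(List<String> jvmArgs) {
        return jvmArgs.containsAll(REQUIRED);
    }

    public static void main(String[] args) {
        // Arguments the running JVM was launched with, as reported by the runtime bean.
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        System.out.println("JVM arguments: " + jvmArgs);
        System.out.println("All three Graal flags present: " + hasAllFlags(jvmArgs));
    }
}
```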
Since Graal is essentially an ordinary Java application, it also has to be compiled and warmed up itself (so-called bootstrapping). In "on-demand" mode this happens in parallel with application startup, and Graal's own code is optimized by C1.
It is also possible to run initialization explicitly before the application starts, and in that scenario you can even tell Graal to optimize itself. However, this usually takes much longer and brings no significant benefit. Graal takes a little longer to initialize than C1/C2 and uses spare CPU capacity more actively, because it has to compile more classes. But the difference is small and is practically lost in the general noise of application startup.
In addition, since Graal is written in Java, it uses the heap during initialization (C1/C2 also use memory, but allocate it through malloc). Most of the memory is consumed at application startup. Both Graal and C1/C2 use free cores for compilation. Graal's memory consumption can be tracked by enabling GC logging (at the moment the heap Graal uses for initialization is not isolated from the application's main heap).
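Besides GC logging (on Java 9 the unified logging option -Xlog:gc turns it on), the same numbers can be read from inside the process through the standard java.lang.management beans. This is a minimal sketch and was not part of the talk; the class name HeapSnapshot is made up. Because Graal shares the application heap, the figures cover the application and the compiler together and cannot separate the two.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

// Hypothetical helper: reads heap usage and GC counts of the running JVM.
final class HeapSnapshot {

    // Sum of collections reported by all garbage collectors so far.
    static long totalGcCount() {
        long count = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long c = gc.getCollectionCount(); // -1 when the collector does not report it
            if (c > 0) {
                count += c;
            }
        }
        return count;
    }

    // Bytes of heap currently in use, application and compiler combined.
    static long usedHeapBytes() {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        return heap.getUsed();
    }

    public static void main(String[] args) {
        System.out.println("used heap (bytes): " + usedHeapBytes());
        System.out.println("GC count so far:   " + totalGcCount());
    }
}
```

Calling these two methods right after startup, once with C1/C2 and once with Graal, would give a rough first comparison of the startup memory cost.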
Well, now that we know how to set it all up, it is time to understand why. What are the benefits of using Graal?
Christian answered this question with a practical example. He ran a couple of benchmarks from a project written in Scala: one was CPU-intensive, and the other worked more heavily with memory. On the CPU benchmark, Graal was noticeably slower, by about a second on average, because of its longer startup (the benchmark itself takes about 5 seconds). But on the second benchmark Graal showed quite a good result: ~20 seconds against ~28 with C1/C2. And this is despite the fact that, as Christian noted, Graal does not do as well with Scala as it could (because of the dynamic structure of the bytecode Scala generates). So we can hope that with a pure Java application things would be even better.
In addition, the GC logs showed that with Graal the application performs far fewer garbage collections (roughly half as many). This is due to more effective escape analysis, which reduces the number of objects allocated on the heap.
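To make the escape analysis point concrete, here is the kind of code it targets. The sketch is illustrative and was not one of Christian's benchmarks; EscapeDemo, Point and distanceSquared are made-up names. The two Point objects never leave the method, so a compiler with good escape analysis may avoid allocating them on the heap at all. Whether a given JIT actually does so depends on inlining and other decisions, and is not guaranteed.

```java
// Illustrative sketch: short-lived objects that never escape the method.
// All names here are hypothetical and not taken from the talk.
final class EscapeDemo {

    // Tiny immutable value holder.
    static final class Point {
        final int x;
        final int y;

        Point(int x, int y) {
            this.x = x;
            this.y = y;
        }
    }

    // Both points are created, read and dropped inside this method,
    // which makes them candidates for allocation removal by the JIT.
    static long distanceSquared(int x1, int y1, int x2, int y2) {
        Point a = new Point(x1, y1);
        Point b = new Point(x2, y2);
        long dx = (long) a.x - b.x;
        long dy = (long) a.y - b.y;
        return dx * dx + dy * dy;
    }

    public static void main(String[] args) {
        long total = 0;
        // Many calls: without escape analysis this allocates two objects per call.
        for (int i = 0; i < 10_000_000; i++) {
            total += distanceSquared(i, 0, 0, i);
        }
        System.out.println("total = " + total);
    }
}
```

Running this with GC logging enabled under C1/C2 and under Graal is one way to compare how many collections each configuration triggers on such code.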
Summing up my impressions, the talk struck me as quite comprehensive, and it carried no advertising message along the lines of “everyone switch to Graal right now”. Clearly there is no magic pill, and everything depends on the real application: Christian himself admits that the specific numbers depend on the specific benchmarks. Anyone who decides to try Graal will in any case have to proceed by trial and error, run things and (probably) find bugs (and, better still, fix them and submit pull requests to the Graal repo).
But overall, with the current trend towards microservices and stateless applications and, as a result, towards more active (and proper) use of the young generation, Graal looks very good.
So if a project can be migrated to Java 9 without much pain (or is being written on it from scratch), I would definitely try Graal. I was also pleased that the talk focused on Graal specifically as a JIT compiler, because on the whole that is the role in which an ordinary Java developer needs it (that is, without Truffle and the rest of GraalVM, which Oracle has recently assembled into a development framework and runtime for various JVM-based languages). It would be interesting to test the memory costs and see how noticeable the difference between the standard C1/C2 and Graal is. On the other hand, despite the fact that these days a fairly generous amount of memory is allocated to an application, and most of it is consumed at startup (which today usually means the initialization and start of the container that then launches the application itself), these numbers, apparently,
Here you can download the presentation from the report.
In fact, I got so interested in the idea that I plan to repeat all the steps Christian showed, but run Java benchmark suites directly (for example, DaCapo and SPECjvm2008; I am not that experienced in Java benchmarking, so I would be grateful if someone suggested more suitable options in the comments or by private message). And, closer to my day-to-day work, I will try to sketch a simple web application (for example, Spring Boot + Jetty + PostgreSQL), put it under load and compare the numbers. I promise to share the results with the community.