OpenJDK: Project Panama

    Two years ago, a new project code-named “Panama” was created in OpenJDK. Its announced main area of research is a new interface for working with platform-dependent libraries and with data outside the Java heap (off-heap). But the goals of the project are broader: to study the mechanisms of interaction between the JVM and “external” (non-Java) APIs.

    Vladimir Ivanov (iwanowww) is a lead engineer at Oracle and works on the HotSpot Java Virtual Machine development team. He specializes in JIT compilation and support for alternative languages on the Java platform. Vladimir joined Sun Microsystems (acquired by Oracle in 2010) in 2005 and has since taken part in a large number of Java-related projects (HotSpot JVM, RTSJ, JavaFX).

    JNI 2.0?

    - Most of the Panama project is about working with native libraries from Java code. How can this be done today?

    - It has always been possible to work with native code in Java. Native methods were present in the very first version of Java, and the standard JNI interface appeared in version 1.1. But time passes, the platform evolves, requirements change, and looking at JNI now, it is clear that working with native libraries can be made more convenient and efficient.

    JNI has a number of drawbacks in both ease of use and speed. To integrate a library into an application, you not only have to write a wrapper for it in C/C++ but also provide builds for all supported platforms. This does not fit well with the build process of modern Java applications and can be a significant barrier to adoption. Also, because JNI is Java-centric, each call through it carries a certain overhead, which becomes especially noticeable when small methods are called intensively. The Panama project is, among other things, an attempt to create a new version of JNI, a “JNI 2.0” that is more convenient and faster. There is already a JEP for it: JEP 191: “Foreign Function Interface”.

    - There is an opinion that JNI was designed to be so complex that it would be unpleasant to use. What do you think about that?

    - That belongs to the category of “urban legends.” Although the opinion is, on the whole, wrong, there is some truth in it: investing in improving JNI was not a priority. It was believed that writing everything in Java was more efficient and convenient. The existing interface covered more than 90% of user needs, so there was no need to develop it further. And third-party libraries can greatly simplify working with JNI. Just look at JNR, which lets you work with native libraries without writing a single line of C/C++ code.

    - JNI is already 20 years old. Why did the Panama project appear, and why is it being developed only now?

    - I would say that we have matured enough for this project. Over the years, the strengths and weaknesses of JNI have become apparent, and the Java platform has come a long way in its development. It became clear that it is not always advisable to develop applications entirely in Java; in addition, working with data outside the Java heap has become much more popular. JNI and NIO no longer satisfy all needs, and users have to resort to sun.misc.Unsafe. The Panama project aims to solve a number of the problems they face.

    As John Rose (JVM architect at Oracle), who oversees the project, put it: any useful library should be easily accessible as part of the Java ecosystem, whether it is written in Java or not.

    For example, there is the linear algebra package LAPACK, originally written in Fortran. A lot of resources have been invested in optimizing it, and rewriting it in Java is unlikely to gain anything. It is much more productive to simply reuse it, as C/C++ programmers do, for example.

    In general, Project Sumatra can be considered the first attempt to “look outside”; its goal was to study the prospects of using GPUs to run Java programs. In theory, everything sounds very attractive: run the program on a device where a GPU is available, and the JVM will automatically start using it. But in practice things turned out to be less rosy, and the project failed to create an effective mechanism for executing Java bytecode on modern GPUs. There are several Java libraries (Aparapi and Rootbeer) for working with the GPU from Java, but they offer a fairly low-level approach similar to OpenCL/CUDA.

    Panama offers a different perspective on using the GPU: there is no need to execute Java bytecode on the GPU; it is enough to work with libraries that know what to do with the GPU. Some BLAS implementations and the MAGMA linear algebra package, for example, have this capability.

    - What tasks do programmers solve with JNI today?

    - The ecosystem of Java libraries is rich, but not everything is written in Java. I already mentioned linear algebra packages and LAPACK. The only way to use them in a Java program is JNI. Another example is 3D graphics: how do you work with OpenGL from Java? There is no standard Java API; there are platform implementations with the necessary functionality, but a way to integrate with them from Java is required. The answer is again JNI.

    - And what successful projects are currently using JNI?

    - In general, any more or less popular non-Java library has a Java wrapper, and of course all of this is implemented using JNI. In computer vision, for example, this is the OpenCV library. In 3D graphics, these are Java Binding for the OpenGL API and the Lightweight Java Game Library.

    As for linear algebra packages, netlib-java provides access to platform implementations of BLAS/LAPACK. Incidentally, it is included in recent versions of Apache Spark.

    Among Java projects, I would mention JRuby, which does not use JNI directly but relies on JNR to work with platform-specific APIs.

    Off-heap access and data manipulation

    - In addition to the interface for calling native libraries, the Panama project includes support for native data structures. Do you see it as separate functionality or as a feature needed for calling native libraries?

    - Both. The main problem when working with native code from Java is data exchange. The virtual machine has complete freedom in choosing the representation of Java objects, and this format often does not match what native libraries expect. You have to either copy the data back and forth or try to work with a single copy.

    JNI offers an API for accessing the Java heap from native code, and to work with off-heap data you need to write code in a JNI wrapper. This turns out to be very expensive, both in performance and in the amount of code required.

    Panama is working on a new format, the Layout Definition Language (LDL), which lets you describe fairly complex data structures in a compact and flexible way. LDL descriptions can be extracted automatically from C/C++ headers, and Java code for working with the data is generated from the description on the fly. The JVM can also use this information, for example, to find pointers to Java objects during GC. Native code will then be able to work with this data directly, without any additional adaptation.

    Combined with pointers and explicit memory management, this fully covers the part of the sun.misc.Unsafe functionality that is used for off-heap solutions.
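
    To make the problem concrete, here is a minimal sketch of what describing a native layout by hand looks like today with an NIO direct buffer. The struct, the class name OffHeapPoint, and the offsets are hypothetical and chosen only for illustration; this is not the Panama API, just the kind of manual bookkeeping that a layout description is meant to replace.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical C struct used for illustration only:
//   struct point { int32_t x; int32_t y; double weight; };
// The offsets below are hand-written assumptions about its layout.
final class OffHeapPoint {
    static final int X_OFFSET = 0;       // int32_t x
    static final int Y_OFFSET = 4;       // int32_t y
    static final int WEIGHT_OFFSET = 8;  // double weight, 8-byte aligned
    static final int SIZE = 16;

    private final ByteBuffer memory;

    OffHeapPoint() {
        // allocateDirect places the storage outside the Java heap.
        memory = ByteBuffer.allocateDirect(SIZE).order(ByteOrder.nativeOrder());
    }

    void set(int x, int y, double weight) {
        memory.putInt(X_OFFSET, x);
        memory.putInt(Y_OFFSET, y);
        memory.putDouble(WEIGHT_OFFSET, weight);
    }

    int x() { return memory.getInt(X_OFFSET); }
    int y() { return memory.getInt(Y_OFFSET); }
    double weight() { return memory.getDouble(WEIGHT_OFFSET); }
    boolean isOffHeap() { return memory.isDirect(); }

    public static void main(String[] args) {
        OffHeapPoint p = new OffHeapPoint();
        p.set(3, 4, 0.5);
        System.out.println(p.x() + ", " + p.y() + ", " + p.weight());
    }
}
```

    Every offset and size here has to be kept in sync with the C header by hand, which is exactly the part that extracting the layout from the header would automate.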

    But that is not all. With proper support on the JVM side, LDL can also be used to describe the structure of Java objects.

    First of all, this will make it possible to control the alignment and padding of fields. In “hot” code, false sharing and unaligned memory access seriously affect execution speed. There is an @Contended annotation inside the JDK, but for users the only way to avoid false sharing is to manually surround the problem field with other fields, in the hope that the JVM will keep their order.
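
    The manual workaround mentioned here might look like the sketch below. The class and field names are hypothetical, and the padding size assumes 64-byte cache lines; since the JVM is free to reorder or drop unused fields, this layout is a hope rather than a guarantee.

```java
// Manual padding sketch. The padding fields are never read; they exist only
// to keep "value" away from neighbouring data, assuming 64-byte cache lines
// and assuming the JVM keeps the declared field order (it does not have to).
final class PaddedCounter {
    long p1, p2, p3, p4, p5, p6, p7;   // padding before the hot field
    volatile long value;               // the contended field
    long q1, q2, q3, q4, q5, q6, q7;   // padding after the hot field

    // Two threads, each incrementing its own counter. Without padding the
    // two "value" fields could share a cache line and slow each other down.
    static long[] runTwoWriters(int iterations) {
        PaddedCounter a = new PaddedCounter();
        PaddedCounter b = new PaddedCounter();
        Thread t1 = new Thread(() -> { for (int i = 0; i < iterations; i++) a.value++; });
        Thread t2 = new Thread(() -> { for (int i = 0; i < iterations; i++) b.value++; });
        t1.start();
        t2.start();
        try {
            t1.join();
            t2.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return new long[] { a.value, b.value };
    }

    public static void main(String[] args) {
        long[] result = runTwoWriters(1_000_000);
        System.out.println(result[0] + " " + result[1]);
    }
}
```

    Each counter has a single writer, so the results are always exact; padding affects only how fast the loops run, not what they compute.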

    But, most importantly, it will open the way to a number of exotic structures, such as fused strings (a header and a character array as a single object) or tagged arrays (each element of the array is either a primitive value or a pointer to an object).

    This part of the project has something in common with Valhalla and value types: both aim at compact Java structures with fast access to data, both random and sequential.

    - Which features of the project do you think will be most in demand among users?

    - Panama has a number of independent research areas. The first is working with native code and off-heap data. This is where the new JNI replacement API comes in. According to my estimates, this part of the project should be the most popular.

    Among the other directions, I would mention the API for batch processing of data (Vector API). Modern processors have vector extensions (SSE and AVX on x86, NEON on ARM) containing instructions for batch processing of data (SIMD instructions). At the moment, the JVM can automatically vectorize code during dynamic compilation, but this does not cover all interesting cases. Work is underway on a specialized API that makes it possible to describe batch operations on data explicitly.
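
    For illustration, the sketch below shows the kind of loop that automatic vectorization typically targets: a simple counted loop over arrays with no dependencies between iterations. The class and method names are hypothetical, and whether a given JVM actually emits SIMD instructions for it depends on the JIT compiler and the hardware. The explicit Vector API discussed here is not shown.

```java
import java.util.Arrays;

final class SaxpyLoop {
    // Computes y[i] = a * x[i] + y[i] for every element.
    // Each iteration is independent of the others, which is what makes
    // the loop a candidate for automatic vectorization by the JIT.
    static void saxpy(float a, float[] x, float[] y) {
        if (x.length != y.length) {
            throw new IllegalArgumentException("arrays must have the same length");
        }
        for (int i = 0; i < y.length; i++) {
            y[i] = a * x[i] + y[i];
        }
    }

    public static void main(String[] args) {
        float[] x = { 1f, 2f, 3f, 4f };
        float[] y = { 10f, 20f, 30f, 40f };
        saxpy(2f, x, y);
        System.out.println(Arrays.toString(y)); // [12.0, 24.0, 36.0, 48.0]
    }
}
```

    Loops with cross-iteration dependencies or complex control flow are among the cases where relying on the compiler becomes uncertain, which is the motivation for an explicit API.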

    Another area is updating Java arrays, also known as Arrays 2.0. Arrays have been in Java from the very beginning and in some respects are seriously outdated (for example, the limit of 2^31 - 1 elements, which is about 2 GB for a byte array). There is a need for more efficient and flexible mechanisms for describing and working with them.

    - Compared with other changes in the JVM and Java, how important is the Panama project at the moment?

    - Work on Panama is actively ongoing but is still in the research phase. We have yet to determine what to integrate into Java and when.

    Right now, the key project for JDK 9 is the modularization of the platform (Project Jigsaw).

    In the context of Panama, VarHandles look very interesting (JEP 193: “Variable Handles”). They work on fields, on arrays, and on off-heap data, and provide a number of exotic read/write modes that cannot be described in terms of the standard Java memory model. Such support is necessary for the efficient implementation of non-blocking synchronization, and the java.util.concurrent package has already completely migrated from sun.misc.Unsafe to VarHandle in JDK 9. The new primitives proposed in Panama should fit well with the VarHandle access paradigm, unifying access to on-heap and off-heap data.
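
    As a rough illustration of that unified access, the sketch below uses the java.lang.invoke.VarHandle API from JDK 9 to apply the same kinds of operations to an object field and to an int stored in a direct (off-heap) ByteBuffer. The class name, field name, and values are hypothetical; only the VarHandle and MethodHandles calls are the standard API.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

final class VarHandleDemo {
    int counter; // on-heap field accessed through a VarHandle

    static final VarHandle COUNTER;
    // Views a ByteBuffer as a sequence of ints addressed by byte offset.
    static final VarHandle INT_VIEW =
            MethodHandles.byteBufferViewVarHandle(int[].class, ByteOrder.nativeOrder());

    static {
        try {
            COUNTER = MethodHandles.lookup()
                    .findVarHandle(VarHandleDemo.class, "counter", int.class);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    // Atomic compare-and-set on an ordinary field, then an acquire read.
    static int updateField() {
        VarHandleDemo demo = new VarHandleDemo();
        COUNTER.compareAndSet(demo, 0, 42);
        return (int) COUNTER.getAcquire(demo);
    }

    // The same style of access on off-heap memory at byte offset 0.
    static int updateOffHeap() {
        ByteBuffer buffer = ByteBuffer.allocateDirect(16);
        INT_VIEW.setRelease(buffer, 0, 7);
        INT_VIEW.compareAndSet(buffer, 0, 7, 8);
        return (int) INT_VIEW.getAcquire(buffer, 0);
    }

    public static void main(String[] args) {
        System.out.println(updateField());   // 42
        System.out.println(updateOffHeap()); // 8
    }
}
```

    The access modes (compareAndSet, getAcquire, setRelease) are the same in both methods; only the coordinates passed to the handle differ.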

    What's next? The future of JDK

    - And in future versions?

    - Project Valhalla is still in active development. The Panama project is less talked about but, in my opinion, is no less important for the Java platform in the long run.

    Talk about FFI and working with off-heap data has been going on for quite some time, but recently there has been keen interest in the Vector API. At this year's JVM Language Summit there was a funny moment: during the Panama discussion, colleagues from Facebook were very interested in when the Vector API would appear in Java, and said that they had really needed it 3 years ago. It is a pity that they kept silent for so long. They should have raised the topic right away, since they come to JVMLS every year. Back then, support for explicit vectorization did not attract much interest.

    - Do you expect that some projects written in Java will be rewritten using the new API?

    - Of course, JNI will remain, but for people who use JNI now, the new API will be much more attractive. Judging by our experience, with the new interface there should no longer be any need for JNI.

    We are actively experimenting with the current prototype and are pleased with the results: Clang is used to extract information from C/C++ headers, and the whole binding is now generated by the new Panama toolkit. It is simple, convenient, and saves a lot of time during migration.

    - That is, the Panama project will be in demand inside the Java platform?

    - Of course, native code is also called extensively inside the platform, so once a more convenient and efficient mechanism is available, we will gradually switch to it in the JDK. But Panama is not positioned as an internal project. Its goal is to create a new mechanism for working with native code and off-heap data for the Java platform, and this implies a new public API.

    - That is, JNI will remain in the language as a legacy framework?

    - Nobody is going to get rid of JNI yet. Backward compatibility is critical to Java, and support will continue. It is possible that in the future JNI will be marked for removal (deprecated), but at the moment there are no such plans.

    I would also like to note that JNI will most likely benefit from the work on Panama. Efficient work with native code requires serious support on the JVM side, so reimplementing JNI on top of the new JVM primitives could provide a significant performance boost.

    - And what else will be available in JDK 9?

    - Among the lesser-known projects, I would single out the JVMCI interface (JEP 243: “Java-Level JVM Compiler Interface”). With it, you can plug a third-party dynamic compiler into the JVM, and the first user of this API is Graal, developed by Oracle Labs. It is a new JIT compiler written entirely in Java.

    - That is, it will be possible to replace the standard JIT compiler with Graal?

    - Yes, Graal can be used either as a “last-tier” compiler that generates optimized code (replacing the C2 server compiler in HotSpot) or as the only JIT compiler in the virtual machine. It is available now, but a special JVM build is needed.

    In general, experiments with “Java on Java” implementations have been going on for a long time. There was the Maxine VM project, a virtual machine written entirely (!) in Java. The greatest successes were achieved in the area of dynamic compilers. After all, Graal began as an attempt to rewrite HotSpot's client compiler in Java. In Maxine, it was even called C1X at first. Finally, the time has come to bring these developments into the platform.

    As a JVM engineer, I find the trend toward rewriting the virtual machine in Java very appealing. On the one hand, we get an implementation on a modern platform that has convenient tooling, high-level language constructs, and an extensive standard library with excellent support for multi-threaded programming. It is also important that we fully control this platform, so we can extend it in the direction we need and solve the problems that arise ourselves. On the other hand, we are adding to Java the means to solve low-level tasks. And the Panama project will play one of the key roles in this.

    If you like the “guts” of the JVM as much as we do, then in addition to Vladimir’s report “Native code, Off-heap data and Java”, we recommend that you watch the following Joker 2016 reports:
