Are native methods expensive? A secret JNI extension


    Why do Java programmers resort to native methods? Sometimes to use a third-party native library (a DLL, for example). In other cases to speed up a critical algorithm with optimized C or assembly code: processing streaming media, compression, encryption and so on.

    But calling a native method is not free. At times the JNI overhead even outweighs the performance gain, because every call involves:
    1. creating a stack frame;
    2. shuffling the arguments according to the ABI;
    3. wrapping object references in JNI handles (jobject);
    4. passing the additional arguments JNIEnv* and jclass;
    5. acquiring and releasing the monitor if the method is synchronized;
    6. "lazy" linking of the native function;
    7. tracing method entry and exit;
    8. switching the thread state from in_Java to in_native and back;
    9. checking whether a safepoint is pending;
    10. handling possible exceptions.

    But native methods are often simple: they do not throw exceptions, do not create new objects in the heap, do not walk the stack, do not work with handles and are not synchronized. Can the unnecessary steps be skipped for them?

    Yes, and today I will talk about an undocumented feature of the HotSpot JVM for faster calls of simple JNI methods. Surprisingly, although this optimization has been available since the first releases of Java 7, nobody has written about it anywhere.

    JNI as we know it


    As an example, consider a simple native method that takes a byte[] and returns the sum of its elements. There are several ways to work with an array in JNI:
    • GetByteArrayRegion - copies the elements of the Java array to the specified location in native memory;
      GetByteArrayRegion example
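      JNIEXPORT jint JNICALL
      Java_bench_Natives_arrayRegionImpl(JNIEnv* env, jclass cls, jbyteArray array) {
          // Fixed 1 MB buffer: enough for the largest benchmark input, but it is
          // shared between threads and a longer array would overflow it.
          static jbyte buf[1048576];
          jint length = (*env)->GetArrayLength(env, array);
          (*env)->GetByteArrayRegion(env, array, 0, length, buf);
          return sum(buf, length);
      }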
      

    • GetByteArrayElements - the same thing, except that the JVM itself allocates the memory area into which the elements are copied. When you have finished working with the array, you need to call ReleaseByteArrayElements.
      GetByteArrayElements example
      JNIEXPORT jint JNICALL
      Java_bench_Natives_arrayElementsImpl(JNIEnv* env, jclass cls, jbyteArray array) {
          jboolean isCopy;
          jint length = (*env)->GetArrayLength(env, array);
          jbyte* buf = (*env)->GetByteArrayElements(env, array, &isCopy);
          if (buf == NULL) {
              return 0;  // the JVM could not provide the elements
          }
          jint result = sum(buf, length);
          // JNI_ABORT: the data was only read, so nothing is copied back.
          (*env)->ReleaseByteArrayElements(env, array, buf, JNI_ABORT);
          return result;
      }
      

    • Why, you may ask, make a copy of the array at all? Because native code cannot work directly with objects in the Java heap: the garbage collector may move them while the JNI method is running. There is, however, the function GetPrimitiveArrayCritical, which returns the direct address of the array in the heap, but it prevents the GC from running until ReleasePrimitiveArrayCritical is called.
      GetPrimitiveArrayCritical example
      JNIEXPORT jint JNICALL
      Java_bench_Natives_arrayElementsCriticalImpl(JNIEnv* env, jclass cls, jbyteArray array) {
          jboolean isCopy;
          jint length = (*env)->GetArrayLength(env, array);
          jbyte* buf = (jbyte*) (*env)->GetPrimitiveArrayCritical(env, array, &isCopy);
          if (buf == NULL) {
              return 0;  // the JVM could not provide the elements
          }
          // No other JNI calls are allowed until the matching release below.
          jint result = sum(buf, length);
          (*env)->ReleasePrimitiveArrayCritical(env, array, buf, JNI_ABORT);
          return result;
      }
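
    The three JNI examples above all call a helper named sum that the article does not show. A minimal sketch of what it might look like follows. The two typedefs are stand-ins so that the fragment compiles without the JDK headers; a real library would take jint and jbyte from jni.h instead.

```c
#include <stddef.h>

/* Stand-ins for the jni.h types, only so this sketch compiles on its own. */
typedef int jint;
typedef signed char jbyte;

/* Assumed shape of the helper used by the examples: adds up `length`
   signed bytes starting at `buf` and returns the total as a jint. */
static jint sum(const jbyte* buf, jint length) {
    jint result = 0;
    for (jint i = 0; i < length; i++) {
        result += buf[i];
    }
    return result;
}
```

    Keeping the actual work in a plain C function like this makes it easy to call the same code from every variant of the native entry point, so the benchmark compares only the calling conventions.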
      


    Critical native


    And here is our secret tool. It looks like a regular JNI function, but with the prefix JavaCritical_ instead of Java_. The JNIEnv* and jclass arguments are absent, and instead of jbyteArray two arguments are passed: jint length, the length of the array, and jbyte* data, a "raw" pointer to the array elements. Thus a Critical Native method does not need to call the expensive JNI functions GetArrayLength and GetByteArrayElements; it can work with the array right away. The GC is held off for the duration of the method.

    JNIEXPORT jint JNICALL
    JavaCritical_bench_Natives_javaCriticalImpl(jint length, jbyte* buf) {
        return sum(buf, length);
    }
    

    As you can see, there is nothing superfluous in the implementation.
    But for a method to qualify as a Critical Native, it must satisfy strict restrictions:
    • the method must be static and not synchronized;
    • only primitive types and arrays of primitives are supported as arguments;
    • a Critical Native cannot call JNI functions and therefore cannot allocate Java objects or throw exceptions;
    • and, most importantly, the method must complete quickly, because it blocks the GC while it runs.
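
    To illustrate how the argument rule plays out with more than one array, here is a sketch for a hypothetical method static native int dotImpl(byte[] a, byte[] b) in the same bench.Natives class. The method name is invented for this example. The sketch assumes that each array argument expands into its own (length, pointer) pair in declaration order, and that a null array arrives as length 0 with a NULL pointer; both assumptions are worth checking against the HotSpot version you target. The typedefs are stand-ins for jni.h, and the JNIEXPORT/JNICALL decorations are left out so the fragment compiles on its own.

```c
#include <stddef.h>

/* Stand-ins for the jni.h types, only so this sketch compiles on its own. */
typedef int jint;
typedef signed char jbyte;

/* Hypothetical Java declaration:
       static native int dotImpl(byte[] a, byte[] b);
   Assumed critical signature: one (length, pointer) pair per array. */
jint JavaCritical_bench_Natives_dotImpl(jint a_len, jbyte* a,
                                        jint b_len, jbyte* b) {
    /* No JNIEnv is available here, so an exception cannot be thrown;
       bad input is simply mapped to a result of 0. */
    if (a == NULL || b == NULL) {
        return 0;
    }
    jint n = a_len < b_len ? a_len : b_len;  /* use the common prefix */
    jint result = 0;
    for (jint i = 0; i < n; i++) {
        result += a[i] * b[i];
    }
    return result;
}
```

    The function touches nothing but its arguments and runs in time proportional to the array length, which is what the restrictions above require.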

    Critical Natives were conceived as a private HotSpot API for the JDK, to speed up calls to cryptographic functions implemented in native code. The most detailed description available is the comments on the corresponding issue in the bug tracker. An important detail: JavaCritical_ functions are called only from hot (compiled) code, so in addition to the JavaCritical_ implementation the method must also have a "fallback" traditional JNI implementation. That is actually a good thing, since it keeps the library compatible with other JVMs.

    How much is that in grams?


    Let's measure the savings on arrays of different lengths: 16 bytes, 256 bytes, 4 KB, 64 KB and 1 MB. Using JMH, naturally.
    Benchmark
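    package bench;

    // JMH annotations; newer JMH releases use @Benchmark in place of @GenerateMicroBenchmark
    import org.openjdk.jmh.annotations.*;

    @State(Scope.Benchmark)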
    public class Natives {
        @Param({"16", "256", "4096", "65536", "1048576"})
        int length;
        byte[] array;
        @Setup
        public void setup() {
            array = new byte[length];
        }
        @GenerateMicroBenchmark
        public int arrayRegion() {
            return arrayRegionImpl(array);
        }
        @GenerateMicroBenchmark
        public int arrayElements() {
            return arrayElementsImpl(array);
        }
        @GenerateMicroBenchmark
        public int arrayElementsCritical() {
            return arrayElementsCriticalImpl(array);
        }
        @GenerateMicroBenchmark
        public int javaCritical() {
            return javaCriticalImpl(array);
        }
        static native int arrayRegionImpl(byte[] array);
        static native int arrayElementsImpl(byte[] array);
        static native int arrayElementsCriticalImpl(byte[] array);
        static native int javaCriticalImpl(byte[] array);
        static {
            System.loadLibrary("natives");
        }
    }
    
    Results
    Java(TM) SE Runtime Environment (build 1.7.0_51-b13)
    Java HotSpot(TM) 64-Bit Server VM (build 24.51-b03, mixed mode)
    Benchmark                         (length)   Mode   Samples         Mean   Mean error    Units
    b.Natives.arrayElements                 16  thrpt         5     7001,853       66,532   ops/ms
    b.Natives.arrayElements                256  thrpt         5     4151,384       89,509   ops/ms
    b.Natives.arrayElements               4096  thrpt         5      571,006        5,534   ops/ms
    b.Natives.arrayElements              65536  thrpt         5       37,745        2,814   ops/ms
    b.Natives.arrayElements            1048576  thrpt         5        1,462        0,017   ops/ms
    b.Natives.arrayElementsCritical         16  thrpt         5    14467,389       70,073   ops/ms
    b.Natives.arrayElementsCritical        256  thrpt         5     6088,534      218,885   ops/ms
    b.Natives.arrayElementsCritical       4096  thrpt         5      677,528       12,340   ops/ms
    b.Natives.arrayElementsCritical      65536  thrpt         5       44,484        0,914   ops/ms
    b.Natives.arrayElementsCritical    1048576  thrpt         5        2,788        0,020   ops/ms
    b.Natives.arrayRegion                   16  thrpt         5    19057,185      268,072   ops/ms
    b.Natives.arrayRegion                  256  thrpt         5     6722,180       46,057   ops/ms
    b.Natives.arrayRegion                 4096  thrpt         5      612,198        5,555   ops/ms
    b.Natives.arrayRegion                65536  thrpt         5       37,488        0,981   ops/ms
    b.Natives.arrayRegion              1048576  thrpt         5        2,054        0,071   ops/ms
    b.Natives.javaCritical                  16  thrpt         5    60779,676      234,483   ops/ms
    b.Natives.javaCritical                 256  thrpt         5     9531,828       67,106   ops/ms
    b.Natives.javaCritical                4096  thrpt         5      707,566       13,330   ops/ms
    b.Natives.javaCritical               65536  thrpt         5       44,653        0,927   ops/ms
    b.Natives.javaCritical             1048576  thrpt         5        2,793        0,047   ops/ms
    


    It turns out that for small arrays the cost of a JNI call is several times greater than the running time of the method itself! For arrays of a few hundred bytes the overhead is comparable to the useful work. And for multi-kilobyte arrays the calling method hardly matters: practically all the time is spent on the actual processing.
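
    The throughput figures are easier to compare when converted to time per call. The helpers below are only a sketch of that arithmetic; the function names are made up for this illustration.

```c
/* JMH reports throughput in operations per millisecond.
   One millisecond is 1e6 nanoseconds, so time per call = 1e6 / throughput. */
double ns_per_call(double ops_per_ms) {
    return 1e6 / ops_per_ms;
}

/* How many times faster the variant with throughput `fast` is
   than the variant with throughput `slow`. */
double speedup(double fast_ops_per_ms, double slow_ops_per_ms) {
    return fast_ops_per_ms / slow_ops_per_ms;
}
```

    Applied to the 16-byte rows of the results, arrayElements at 7001.853 ops/ms comes to roughly 143 ns per call, arrayElementsCritical to about 69 ns, arrayRegion to about 52 ns and javaCritical to about 16 ns, a speedup of about 8.7 times between the slowest and the fastest variant. These times include the summation itself and the benchmark harness, so they are an upper bound on the call overhead rather than an exact measure of it.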

    Conclusions


    Critical Natives is a private JNI extension in HotSpot, available since JDK 7. By implementing a JNI-like function according to certain rules, you can significantly reduce the overhead of calling a native method and of processing Java arrays in native code. However, this approach is not suitable for long-running functions, because the GC cannot start while a Critical Native is executing.
