
Are native methods expensive? A secret JNI extension

Why do Java programmers resort to native methods? Sometimes to use a third-party DLL. In other cases to speed up a critical algorithm with optimized C or assembly code: for example, for streaming media processing, compression, or encryption.
But calling a native method is not free. Sometimes the JNI overhead even outweighs the performance gain, because every call involves:
- creating a stack frame;
- shuffling arguments according to the ABI;
- wrapping references in JNI handles (jobject);
- passing the additional arguments JNIEnv* and jclass;
- acquiring and releasing the monitor, if the method is synchronized;
- "lazy" linking of the native function;
- tracing method entry and exit;
- switching the thread from the in_Java state to in_native and back;
- checking whether a safepoint is needed;
- handling possible exceptions.
But native methods are often simple: they don't throw exceptions, don't create new objects on the heap, don't walk the stack, don't work with handles, and aren't synchronized. Can they skip the unnecessary work?
Yes, and today I will talk about an undocumented HotSpot JVM feature for fast calls of simple JNI methods. Surprisingly, although this optimization has been around since the first versions of Java 7, nobody seems to have written about it.
JNI as we know it
As an example, consider a simple native method that takes a byte[] and returns the sum of its elements. There are several ways to work with an array in JNI:
- GetByteArrayRegion copies the elements of the Java array to the specified location in native memory.

GetByteArrayRegion example

    JNIEXPORT jint JNICALL
    Java_bench_Natives_arrayRegionImpl(JNIEnv* env, jclass cls, jbyteArray array) {
        /* fixed buffer, sized for the largest array in the benchmark (1 MB) */
        static jbyte buf[1048576];
        jint length = (*env)->GetArrayLength(env, array);
        (*env)->GetByteArrayRegion(env, array, 0, length, buf);
        return sum(buf, length);
    }

- GetByteArrayElements does the same thing, except that the JVM itself allocates the memory the elements are copied to. When you are done with the array, you must call ReleaseByteArrayElements.

GetByteArrayElements example

    JNIEXPORT jint JNICALL
    Java_bench_Natives_arrayElementsImpl(JNIEnv* env, jclass cls, jbyteArray array) {
        jboolean isCopy;
        jint length = (*env)->GetArrayLength(env, array);
        jbyte* buf = (*env)->GetByteArrayElements(env, array, &isCopy);
        jint result = sum(buf, length);
        /* the array was only read, so JNI_ABORT frees the buffer without copying it back */
        (*env)->ReleaseByteArrayElements(env, array, buf, JNI_ABORT);
        return result;
    }

Why copy the array at all, you ask? Because native code cannot work directly with objects in the Java heap: the garbage collector may move them while the JNI method is running. There is, however, the function GetPrimitiveArrayCritical, which returns the direct address of the array in the heap, but in exchange it prevents the GC from running until ReleasePrimitiveArrayCritical is called.

GetPrimitiveArrayCritical example

    JNIEXPORT jint JNICALL
    Java_bench_Natives_arrayElementsCriticalImpl(JNIEnv* env, jclass cls, jbyteArray array) {
        jboolean isCopy;
        jint length = (*env)->GetArrayLength(env, array);
        jbyte* buf = (jbyte*) (*env)->GetPrimitiveArrayCritical(env, array, &isCopy);
        jint result = sum(buf, length);
        (*env)->ReleasePrimitiveArrayCritical(env, array, buf, JNI_ABORT);
        return result;
    }
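The examples above call a sum(buf, length) helper that the article does not show. A minimal sketch of what it might look like, assuming it simply adds up the bytes. Here int8_t and int32_t stand in for jbyte and jint (a signed 8-bit and a signed 32-bit integer) so that the snippet compiles without jni.h; in the real library the JNI types would be used instead.

```c
#include <stdint.h>

/* Hypothetical helper assumed by the JNI examples:
   returns the sum of `length` signed bytes starting at `buf`. */
int32_t sum(const int8_t *buf, int32_t length) {
    int32_t result = 0;
    for (int32_t i = 0; i < length; i++) {
        result += buf[i];   /* each byte is sign-extended before the addition */
    }
    return result;
}
```

With at most 1 MB of input, as in the benchmark, the result cannot overflow a 32-bit integer.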
Critical native
And here is our secret tool. It looks like a regular JNI function, but with the prefix JavaCritical_ instead of Java_. It takes neither JNIEnv* nor jclass, and instead of a jbyteArray it receives two arguments: jint length, the length of the array, and jbyte* data, a raw pointer to the array elements. So a Critical Native method doesn't need to call the expensive JNI functions GetArrayLength and GetByteArrayElements; it can work with the array right away. The GC is held off while the method runs.

    JNIEXPORT jint JNICALL
    JavaCritical_bench_Natives_javaCriticalImpl(jint length, jbyte* buf) {
        return sum(buf, length);
    }

As you can see, there is nothing superfluous in the implementation.
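To make the naming rule concrete, here is a small sketch that builds the symbol name for a given class and method. The helper names native_symbol and append_mangled are hypothetical, and the sketch assumes that the JavaCritical_ prefix is followed by the same mangled class and method names as the regular Java_ prefix. It covers only the two most common JNI escapes ('.' becomes '_', and '_' becomes "_1") and ignores overloaded methods.

```c
#include <stddef.h>
#include <string.h>

/* Appends `s` to the string in `out`, replacing '.' with '_' and '_' with "_1".
   Returns 0 on success, -1 if the result would not fit in `out_size` bytes. */
static int append_mangled(char *out, size_t out_size, const char *s) {
    size_t pos = strlen(out);
    for (; *s != '\0'; s++) {
        size_t need = (*s == '_') ? 2 : 1;
        if (pos + need + 1 > out_size) return -1;   /* +1 for the terminator */
        if (*s == '.') {
            out[pos++] = '_';
        } else if (*s == '_') {
            out[pos++] = '_';
            out[pos++] = '1';
        } else {
            out[pos++] = *s;
        }
        out[pos] = '\0';
    }
    return 0;
}

/* Builds prefix + mangled class name + '_' + mangled method name,
   e.g. prefix "JavaCritical_" or "Java_". Returns 0 on success, -1 on overflow. */
int native_symbol(const char *prefix, const char *class_name,
                  const char *method, char *out, size_t out_size) {
    size_t n = strlen(prefix);
    if (n + 1 > out_size) return -1;
    memcpy(out, prefix, n + 1);                     /* the prefix is copied verbatim */
    if (append_mangled(out, out_size, class_name) != 0) return -1;
    if (append_mangled(out, out_size, ".") != 0) return -1;   /* separator, becomes '_' */
    return append_mangled(out, out_size, method);
}
```

For the class bench.Natives and the method javaCriticalImpl, the two prefixes give exactly the two function names used in this article.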
But to qualify as a Critical Native, a method must meet strict restrictions:
- it must be static and not synchronized;
- only primitive types and arrays of primitives are supported as arguments;
- a Critical Native cannot call JNI functions and therefore cannot allocate Java objects or throw exceptions;
- and, most importantly, it must finish quickly, because it blocks the GC while it runs.
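The argument rule can be sketched as a tiny translator from a JVM method descriptor to the parameter list of the corresponding Critical Native. This is only an illustration under the rules stated in this article: primitives are passed as they are, each primitive array becomes a jint length plus a pointer, and anything else is rejected. The function names critical_params, jni_c_type, and append are hypothetical and are not part of HotSpot or JNI.

```c
#include <stddef.h>
#include <string.h>

/* Maps a primitive type code from a method descriptor to its JNI C type, or NULL. */
static const char *jni_c_type(char code) {
    switch (code) {
        case 'Z': return "jboolean";
        case 'B': return "jbyte";
        case 'C': return "jchar";
        case 'S': return "jshort";
        case 'I': return "jint";
        case 'J': return "jlong";
        case 'F': return "jfloat";
        case 'D': return "jdouble";
        default:  return NULL;   /* objects, nested arrays, malformed input */
    }
}

/* Appends `s` to `out` if it fits; returns 0 on success, -1 otherwise. */
static int append(char *out, size_t out_size, const char *s) {
    size_t used = strlen(out), need = strlen(s);
    if (used + need + 1 > out_size) return -1;
    memcpy(out + used, s, need + 1);
    return 0;
}

/* Writes the C parameter list for a descriptor such as "([BI)I" into `out`.
   Returns 0 on success, -1 for unsupported or malformed descriptors or overflow. */
int critical_params(const char *descriptor, char *out, size_t out_size) {
    if (out_size == 0 || descriptor[0] != '(') return -1;
    out[0] = '\0';
    int first = 1;
    const char *p = descriptor + 1;
    for (; *p != '\0' && *p != ')'; p++) {
        int is_array = (*p == '[');
        if (is_array) p++;                      /* step onto the element type */
        const char *type = jni_c_type(*p);
        if (type == NULL) return -1;            /* not a primitive or primitive array */
        if (!first && append(out, out_size, ", ") != 0) return -1;
        first = 0;
        if (is_array && append(out, out_size, "jint, ") != 0) return -1;
        if (append(out, out_size, type) != 0) return -1;
        if (is_array && append(out, out_size, "*") != 0) return -1;
    }
    return (*p == ')') ? 0 : -1;
}
```

For the benchmark method in this article, whose descriptor is ([B)I, the sketch produces "jint, jbyte*", which matches the JavaCritical_ signature shown earlier. A descriptor with a String parameter is rejected.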
Critical Natives were conceived as a private HotSpot API for the JDK, to speed up calls to cryptographic functions implemented in native code. The most detailed description available is the comments on the corresponding issue in the bug tracker. One important caveat: JavaCritical_ functions are called only from hot (compiled) code, so in addition to the JavaCritical_ implementation the method must also have a regular "fallback" JNI implementation. For compatibility with other JVMs, though, that is all the better.
How much, exactly?
Let's measure the savings on arrays of different lengths: 16 bytes, 256 bytes, 4 KB, 64 KB, and 1 MB. Using JMH, naturally.
Benchmark
    package bench;

    import org.openjdk.jmh.annotations.GenerateMicroBenchmark;
    import org.openjdk.jmh.annotations.Param;
    import org.openjdk.jmh.annotations.Scope;
    import org.openjdk.jmh.annotations.Setup;
    import org.openjdk.jmh.annotations.State;

    // @GenerateMicroBenchmark comes from early JMH releases; later versions renamed it to @Benchmark.
    @State(Scope.Benchmark)
    public class Natives {

        @Param({"16", "256", "4096", "65536", "1048576"})
        int length;

        byte[] array;

        @Setup
        public void setup() {
            array = new byte[length];
        }

        @GenerateMicroBenchmark
        public int arrayRegion() {
            return arrayRegionImpl(array);
        }

        @GenerateMicroBenchmark
        public int arrayElements() {
            return arrayElementsImpl(array);
        }

        @GenerateMicroBenchmark
        public int arrayElementsCritical() {
            return arrayElementsCriticalImpl(array);
        }

        @GenerateMicroBenchmark
        public int javaCritical() {
            return javaCriticalImpl(array);
        }

        static native int arrayRegionImpl(byte[] array);
        static native int arrayElementsImpl(byte[] array);
        static native int arrayElementsCriticalImpl(byte[] array);
        static native int javaCriticalImpl(byte[] array);

        static {
            System.loadLibrary("natives");
        }
    }
Results
Java(TM) SE Runtime Environment (build 1.7.0_51-b13)
Java HotSpot(TM) 64-Bit Server VM (build 24.51-b03, mixed mode)
Benchmark                        (length)   Mode  Samples        Mean  Mean error  Units
b.Natives.arrayElements                16  thrpt        5    7001,853      66,532  ops/ms
b.Natives.arrayElements               256  thrpt        5    4151,384      89,509  ops/ms
b.Natives.arrayElements              4096  thrpt        5     571,006       5,534  ops/ms
b.Natives.arrayElements             65536  thrpt        5      37,745       2,814  ops/ms
b.Natives.arrayElements           1048576  thrpt        5       1,462       0,017  ops/ms
b.Natives.arrayElementsCritical        16  thrpt        5   14467,389      70,073  ops/ms
b.Natives.arrayElementsCritical       256  thrpt        5    6088,534     218,885  ops/ms
b.Natives.arrayElementsCritical      4096  thrpt        5     677,528      12,340  ops/ms
b.Natives.arrayElementsCritical     65536  thrpt        5      44,484       0,914  ops/ms
b.Natives.arrayElementsCritical   1048576  thrpt        5       2,788       0,020  ops/ms
b.Natives.arrayRegion                  16  thrpt        5   19057,185     268,072  ops/ms
b.Natives.arrayRegion                 256  thrpt        5    6722,180      46,057  ops/ms
b.Natives.arrayRegion                4096  thrpt        5     612,198       5,555  ops/ms
b.Natives.arrayRegion               65536  thrpt        5      37,488       0,981  ops/ms
b.Natives.arrayRegion             1048576  thrpt        5       2,054       0,071  ops/ms
b.Natives.javaCritical                 16  thrpt        5   60779,676     234,483  ops/ms
b.Natives.javaCritical                256  thrpt        5    9531,828      67,106  ops/ms
b.Natives.javaCritical               4096  thrpt        5     707,566      13,330  ops/ms
b.Natives.javaCritical              65536  thrpt        5      44,653       0,927  ops/ms
b.Natives.javaCritical            1048576  thrpt        5       2,793       0,047  ops/ms

It turns out that for small arrays the cost of a JNI call is several times greater than the running time of the method itself! For arrays of a few hundred bytes, the overhead is comparable to the useful work. And for multi-kilobyte arrays the way the method is called matters much less: almost all the time is spent on the actual processing.
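To read the table in more familiar terms, throughput can be converted into time per call. The sketch below is only illustrative arithmetic on the numbers reported above (the table uses a decimal comma, so 60779,676 means 60779.676 ops/ms); the function names ns_per_call and speedup are made up for this example.

```c
/* Converts a JMH throughput figure (operations per millisecond)
   into the average duration of one call, in nanoseconds. */
double ns_per_call(double ops_per_ms) {
    return 1e6 / ops_per_ms;   /* 1 ms = 1,000,000 ns */
}

/* How many times faster variant A is than variant B, given both throughputs. */
double speedup(double ops_a, double ops_b) {
    return ops_a / ops_b;
}
```

For 16-byte arrays, ns_per_call(60779.676) gives about 16.5 ns per javaCritical call against about 52.5 ns for arrayRegion, a speedup of roughly 3.2. At 1 MB, speedup(2.793, 2.788) is about 1.0: javaCritical and arrayElementsCritical are indistinguishable.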
Conclusions
Critical Natives are a private JNI extension in HotSpot, introduced in JDK 7. By implementing a JNI-like function according to certain rules, you can significantly reduce the overhead of calling a native method and of processing Java arrays in native code. However, this approach is not suitable for long-running functions, since the GC cannot start while a Critical Native is running.