March 13, 2014 at 18:55
Android SDK vs NDK: performance comparison of similar code sections
To improve my application's performance on Android, I began gradually rewriting critical sections of code from Java (SDK) to C++ (NDK). The result was comparable to what I got a couple of decades ago by adding assembler inserts to Turbo Pascal code.
I am not going to describe how to work with the Android NDK here; I do not have enough experience for that. Those who are interested should start with this link.
The purpose of this short article is to share a few figures that I obtained empirically by comparing the execution time of certain functions written in Java and then rewritten in C++. Perhaps these figures will motivate someone to study the issue more deeply.
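For readers who want to take similar measurements, here is a minimal sketch of a timing helper. It is an assumption about method, not the harness used for the figures in this article; `TimingSketch` and `averageMillis` are hypothetical names, and the loop body is only placeholder work.

```java
final class TimingSketch {
    /** Runs the task `runs` times and returns the mean wall-clock time in milliseconds. */
    static double averageMillis(Runnable task, int runs) {
        long totalNanos = 0;
        for (int r = 0; r < runs; r++) {
            long start = System.nanoTime();
            task.run();
            totalNanos += System.nanoTime() - start;
        }
        return totalNanos / 1e6 / runs;
    }

    public static void main(String[] args) {
        // Stand-in for a roughly 3-megapixel single-channel image (assumed size).
        final int[] pixels = new int[3000000];
        double ms = averageMillis(new Runnable() {
            @Override
            public void run() {
                for (int i = 0; i < pixels.length; i++) {
                    pixels[i] = (pixels[i] + 1) & 0xFF; // placeholder per-pixel work
                }
            }
        }, 3);
        System.out.println("average ms: " + ms);
    }
}
```

Averaging over several runs smooths out one-off effects such as garbage collection pauses, which is why each table below lists three measurements per device.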
Since my application deals with photo processing, the bottlenecks were loops that iterate over the pixels of an image and perform some operation on each one. I tested on real devices: a Nexus One and a Nexus 7 (2012). The experimental results (in ms) are summarized in the tables below:
Layer overlay (Luminosity mode, color image)
| Nexus One, SDK | Nexus One, NDK | Nexus 7, SDK | Nexus 7, NDK |
|---|---|---|---|
| 2563 | 120 | 4850 | 90 | 
| 2122 | 100 | 4520 | 190 | 
| 2162 | 110 | 4330 | 100 | 
On average, the NDK version is 21 times faster on the Nexus One and 36 times faster on the Nexus 7.
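The averages quoted for this table can be reproduced as the ratio of total SDK time to total NDK time. That is an assumption about how they were computed, but it matches the rounded figures. `SpeedupSketch` and `speedup` are hypothetical names.

```java
final class SpeedupSketch {
    /** Ratio of total SDK time to total NDK time over the same set of runs. */
    static double speedup(int[] sdkMs, int[] ndkMs) {
        long sdkTotal = 0;
        long ndkTotal = 0;
        for (int t : sdkMs) sdkTotal += t;
        for (int t : ndkMs) ndkTotal += t;
        return (double) sdkTotal / ndkTotal;
    }

    public static void main(String[] args) {
        // Timings in ms, copied from the Luminosity table above.
        int[] nexusOneSdk = {2563, 2122, 2162};
        int[] nexusOneNdk = {120, 100, 110};
        int[] nexus7Sdk = {4850, 4520, 4330};
        int[] nexus7Ndk = {90, 190, 100};
        System.out.println("Nexus One: " + speedup(nexusOneSdk, nexusOneNdk));
        System.out.println("Nexus 7:   " + speedup(nexus7Sdk, nexus7Ndk));
    }
}
```

The two ratios come out at roughly 20.7 and 36.1, which round to the 21 and 36 given in the text.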
Layer Overlay (Color Dodge Mode, Monochrome)
| Nexus One, SDK | Nexus One, NDK | Nexus 7, SDK | Nexus 7, NDK |
|---|---|---|---|
| 2673 | 30 | 5720 | 80 | 
| 2572 | 20 | 6230 | 70 | 
| 2573 | 20 | 6110 | 70 | 
On average, the NDK version is 112 times faster on the Nexus One and 82 times faster on the Nexus 7.
Overlay transparency gradient
| Nexus One, SDK | Nexus One, NDK | Nexus 7, SDK | Nexus 7, NDK |
|---|---|---|---|
| 1301 | 321 | 3010 | 470 | 
| 1221 | 330 | 2670 | 620 | 
| 1211 | 300 | 2770 | 610 | 
On average, the NDK version is 4 times faster on the Nexus One and 5 times faster on the Nexus 7.
As you can see, the results differ by one or even two orders of magnitude. I deliberately give the figures as absolute values so that you can see the real speedup from using the NDK. The relatively modest results of the last test are explained by the fact that the overlay was computed with standard OpenCV library functions, which are already well optimized. This test therefore shows the realistic speedup of the application as a whole.
A few words about the OpenCV library. As I expected, the Java part of the library is an ordinary wrapper over the native (NDK) code. Nevertheless, I ran the same kind of experiments on rather heavy, long-running algorithms, such as detecting feature points in images and the GrabCut method. The difference in speed between Java and the NDK was at most 10%, which can be attributed to measurement error, since I could not capture exactly the same images each time.
Update. It is rather unpleasant to admit your own mistakes, but there is nothing else to do.
So, here is the sample code I used to evaluate the performance of the Java implementation of the OpenCV library:
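// The loop headers, pixel reads, and condition were lost from this listing;
// they are reconstructed here, and the second matrix name `top` is an assumption.
for (int i = 0; i < mat.rows(); i++) {
	for (int j = 0; j < mat.cols(); j++) {
		double[] matPix = mat.get(i, j); // one call into the library per pixel
		double[] topPix = top.get(i, j);
		if (matPix[0] + topPix[0] > 255) {
			matPix[0] = 255.;
		} else {
			matPix[0] = (255. * matPix[0]) / (256. - topPix[0]);
		}
		mat.put(i, j, matPix);
	}
}
We walk through two matrices of the same size pixel by pixel and compute each resulting pixel from the values of the corresponding pixels in the two matrices.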
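The blend rule from the listing above can be isolated from OpenCV and tried on plain arrays. This is only a sketch: `ColorDodgeSketch`, `dodge`, and `dodgeInPlace` are hypothetical names, and the saturation test `base + top > 255` is an assumption about the condition in the original code.

```java
import java.util.Arrays;

final class ColorDodgeSketch {
    /** Blends one base value with one top-layer value, both expected in 0..255. */
    static int dodge(int base, int top) {
        if (base + top > 255) { // assumed saturation test
            return 255;
        }
        // top <= 255 here, so the divisor is at least 1.
        return (255 * base) / (256 - top);
    }

    /** Applies dodge() element-wise to two equally sized single-channel images, in place. */
    static void dodgeInPlace(int[] base, int[] top) {
        if (base.length != top.length) {
            throw new IllegalArgumentException("images must have the same size");
        }
        for (int i = 0; i < base.length; i++) {
            base[i] = dodge(base[i], top[i]);
        }
    }

    public static void main(String[] args) {
        int[] base = {0, 100, 200, 255};
        int[] top = {0, 100, 100, 0};
        dodgeInPlace(base, top);
        System.out.println(Arrays.toString(base)); // prints [0, 163, 255, 254]
    }
}
```

Keeping the arithmetic in a pure function like this makes it easy to check the formula separately from the cost of reading and writing the matrices.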
Thanks to suggestions in the comments to this article, the code was optimized as follows (the images are monochrome):
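int size = mat.cols();
byte[] matPix = new byte[size];
byte[] topPix = new byte[size];
// The loop headers, row reads, and condition were lost from this listing;
// they are reconstructed here, and the second matrix name `top` is an assumption.
for (int i = 0; i < mat.rows(); i++) {
    mat.get(i, 0, matPix); // read a whole row in one call
    top.get(i, 0, topPix);
    for (int j = 0; j < size; j++) {
        int mp = matPix[j] & 0xFF; // Java bytes are signed; mask to 0..255
        int tp = topPix[j] & 0xFF;
        if (mp + tp > 255) {
            mp = 255;
        } else {
            mp = (255 * mp) / (256 - tp);
        }
        matPix[j] = (byte) mp;
    }
    mat.put(i, 0, matPix);
}
For testing, I again used real Nexus One and Nexus 7 devices, but this time fed 3-megapixel pictures to both, because I wanted to compare the two devices with each other. The results (averages, in ms) are summarized in the table: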
| | Nexus One, SDK | Nexus One, NDK | Nexus 7, SDK | Nexus 7, NDK |
|---|---|---|---|---|
| No optimization | 35404 | 245 | 22755 | 160 | 
| With optimization | 340 | 205 | 210 | 120 | 
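One plausible reading of the gap between the two SDK rows is the number of calls into the OpenCV wrapper. The sketch below counts them, assuming each pixel (or each row) costs two reads and one write, and assuming a 2048 x 1536 image as a stand-in for a 3-megapixel picture. Both are assumptions, not measurements, and `CallCountSketch` is a hypothetical name.

```java
final class CallCountSketch {
    /** Wrapper calls when every pixel is read from both matrices and written back. */
    static long perPixelCalls(int rows, int cols) {
        return 3L * rows * cols;
    }

    /** Wrapper calls when every row is read from both matrices and written back. */
    static long perRowCalls(int rows) {
        return 3L * rows;
    }

    public static void main(String[] args) {
        int rows = 1536; // assumed image height
        int cols = 2048; // assumed image width
        System.out.println(perPixelCalls(rows, cols)); // prints 9437184
        System.out.println(perRowCalls(rows));         // prints 4608
    }
}
```

Under these assumptions the row-wise version makes 2048 times fewer calls, one factor of the image width, while the arithmetic per pixel stays the same.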
Draw your own conclusions. The C++ code was optimized on the same principle as the Java code. I do not include it here because it is the same as the code above.