The battle for sound speed on Android x86

Original author: Rajeev Puran
At the heart of the "pyramid of needs" of anyone who expects Android applications to work with sound is the speed of the system's response to user actions. Suppose a program starts quickly and shows a beautiful picture of a piano keyboard. Not bad for a start, but if a noticeable interval separates the moment a key is touched from the beginning of the sound (however wonderful that sound is), the user will close the program and never return to it.

Let's talk about the specifics of low-latency audio playback on Android devices based on Intel Atom (Bay Trail) processors. The approach described here can also be used on other Intel platforms. We look at Android 4.4.4; a similar study for the Android M platform is still in progress.

Preliminary information


High-latency audio playback is one of Android's problems, and it hits audio applications especially hard. Long intervals between a user's action and the beginning of the sound hurt music-creation programs, games, DJ software, and karaoke applications. If an application responds to user actions by playing sounds with delays the user finds too long, it seriously spoils the experience.

Throughout this study we will use the concept of round-trip latency (RTL). In our case, this is the time between the moment a user or the system performs an action that requires an audio signal to be generated and played, and the moment the sound actually starts.

Users run into audio playback latency on Android when, for example, they touch an object that should produce a sound, but the sound does not play right away. On most ARM and x86 devices, the round-trip latency is between 300 and 600 ms, mainly in applications that use the standard Android audio output path, which is described in the Design For Reduced Latency document.

Users do not accept this. Acceptable round-trip latency should be well under 100 ms, in most cases under 20 ms, and ideally, for professional use, under 10 ms. Another thing to keep in mind is that in Android audio applications the total latency consists of three components: touch latency, audio processing latency, and buffer queuing latency.

Here we focus on reducing audio processing latency rather than on all three components; still, improving even one of these factors reduces the overall latency.

The audio subsystem in Android


Like other Android mechanisms, the audio subsystem can be represented as a stack of several layers.


The Android audio subsystem.

Here you can find out more about the above scheme.

Note that the Hardware Abstraction Layer (HAL) of the Android audio subsystem serves as the link between the high-level audio APIs in android.media and the underlying audio drivers and hardware.

OpenSL ES


Using the OpenSL ES API is the most reliable way to efficiently process an audio signal that should be played in response to user or application actions. Latency cannot be avoided entirely even with OpenSL ES, but it is the API that the Android documentation recommends.

The reason for this recommendation is that OpenSL uses a buffer queueing mechanism for audio data, which improves efficiency within the Android media framework. All of this is implemented in Android native code, which can deliver higher performance, since such code is not subject to the overheads characteristic of Java or the Dalvik virtual machine.

We believe that using the OpenSL ES mechanisms is a step forward in the development of audio applications for Android. In addition, the Android Native Development Kit documentation states that the OpenSL implementation is expected to improve as new Android releases come out.

Here we look at using the OpenSL ES API through the NDK. To get started, here are the three levels of code that form the basis for developing audio applications for Android with OpenSL:

  • The top level is the Android SDK application development environment, which is based on Java.
  • A level lower is the Android NDK, which lets developers write C or C++ code that an application can call through the Java Native Interface (JNI) mechanism (a minimal bridge is sketched after this list).
  • The lowest level is the OpenSL ES API, which has been supported on Android since version 2.3 and is built into the NDK.
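
To make the relationship between these levels concrete, here is a minimal, hypothetical JNI bridge. The class name com.example.audio.Engine, the library name audioengine, and the function are our illustration, not code from the original article:

#include <jni.h>

// C side of a hypothetical JNI bridge. The Java side would declare:
//   public class Engine {
//       static { System.loadLibrary("audioengine"); }
//       public static native void startAudio();
//   }
// in the package com.example.audio.
JNIEXPORT void JNICALL
Java_com_example_audio_Engine_startAudio(JNIEnv *env, jclass clazz)
{
    (void) env;
    (void) clazz;
    // start the native OpenSL ES audio engine here
}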

Like several other APIs, OpenSL works through a callback mechanism. In OpenSL, the callback can only be used to notify the application that a new buffer can be enqueued (for playing or recording sound). In other APIs, callback functions also carry pointers to audio buffers that the application can fill with data or read data from. In OpenSL, however, the API can by design be used so that the callbacks act purely as a signaling mechanism, with all computation performed in the thread responsible for audio processing; that thread enqueues the data buffers after receiving the signals.
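
As an illustration of this signaling pattern, here is a hedged sketch (the semaphore and names are our assumptions, not code from the article): the callback does no audio work itself, it merely wakes the processing thread, which performs the computation and enqueues the next buffer.

#include <SLES/OpenSLES.h>
#include <SLES/OpenSLES_Android.h>
#include <semaphore.h>

static sem_t g_buffer_ready;  // hypothetical: signaled once per consumed buffer

// Callback used purely as a signal: no audio computation happens here.
// Registered via (*bufferQueue)->RegisterCallback(bufferQueue, signalingCallback, NULL);
static void signalingCallback(SLAndroidSimpleBufferQueueItf bq, void *context)
{
    (void) bq;
    (void) context;
    sem_post(&g_buffer_ready);  // wake the audio processing thread
}

// In the audio processing thread: wait for the signal, compute, then enqueue:
//   sem_wait(&g_buffer_ready);
//   ... fill the next buffer ...
//   (*bufferQueue)->Enqueue(bufferQueue, buffer, bytes);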

Google recommends using the SCHED_FIFO scheduling policy when working with OpenSL, combined with the ring buffer technique for passing audio data between threads.

The SCHED_FIFO scheduling policy


Since Android is based on Linux, the Linux CFS scheduler is at work here. CFS can allocate CPU resources unpredictably: for example, it may hand control to a thread it considers higher priority, taking the CPU away from a thread it finds less deserving. If this happens to a thread busy with audio processing, it can break buffer timing, resulting in long delays that are hard to predict.

The main solution to this problem is not to use CFS for audio threads: instead of the SCHED_NORMAL scheduling policy (also called SCHED_OTHER), which CFS implements, apply the SCHED_FIFO policy.
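
Below is a minimal sketch of requesting SCHED_FIFO for the current thread with standard pthread calls. Note that on stock Android an unprivileged application is normally not allowed to do this directly (the platform grants elevated scheduling to its own audio threads); the sketch only illustrates the policy itself.

#include <pthread.h>
#include <sched.h>
#include <stdio.h>

// Try to switch the calling thread from SCHED_NORMAL (CFS) to SCHED_FIFO.
// Returns 0 on success, an errno-style code otherwise.
static int request_sched_fifo(int priority)
{
    struct sched_param param;
    param.sched_priority = priority;  // valid FIFO priorities are 1..99 on Linux
    int err = pthread_setschedparam(pthread_self(), SCHED_FIFO, &param);
    if (err != 0)
        fprintf(stderr, "SCHED_FIFO not granted (error %d)\n", err);
    return err;
}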

Scheduling latency


Scheduling latency is the time between the moment a thread becomes ready to run and the moment the context switch completes, that is, when the thread actually begins executing on the processor. The smaller this latency, the better; if it exceeds two milliseconds, audio problems are guaranteed. Long scheduling delays usually occur during processor mode transitions: core start or stop, switching between a secure core and a normal core, power-mode switches, or frequency and voltage adjustments.
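
To make the notion measurable, here is a self-contained sketch (our own illustration, not from the original article) that approximates wake-up latency: one thread timestamps just before posting a semaphore, and the woken thread timestamps as soon as it runs.

#include <pthread.h>
#include <semaphore.h>
#include <stdio.h>
#include <time.h>

static sem_t g_wake;
static struct timespec g_posted;

static void *worker(void *arg)
{
    (void) arg;
    struct timespec woke;
    sem_wait(&g_wake);                       // sleep until signaled
    clock_gettime(CLOCK_MONOTONIC, &woke);   // timestamp on wake-up
    double ms = (woke.tv_sec - g_posted.tv_sec) * 1e3 +
                (woke.tv_nsec - g_posted.tv_nsec) / 1e6;
    printf("wake-up latency: %.3f ms\n", ms);
    return NULL;
}

int main(void)
{
    pthread_t t;
    sem_init(&g_wake, 0, 0);
    pthread_create(&t, NULL, worker, NULL);
    clock_gettime(CLOCK_MONOTONIC, &g_posted);  // timestamp just before signal
    sem_post(&g_wake);
    pthread_join(&t, NULL);
    return 0;
}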

With these considerations in mind, let's look at a scheme for implementing audio processing on Android.

Ring buffer interface


The first thing we need for a proper setup is a ring buffer interface usable from the code. It requires four functions:

  1. A function to create a ring buffer.
  2. A function to write to the buffer.
  3. A function to read from the buffer.
  4. A function to destroy the buffer.

Here is the sample code:

circular_buffer* create_circular_buffer(int bytes);
int read_circular_buffer_bytes(circular_buffer *p, char *out, int bytes);
int write_circular_buffer_bytes(circular_buffer *p, const char *in, int bytes);
void free_circular_buffer (circular_buffer *p);

The desired behavior is this: a read returns the requested number of bytes, up to the amount of data already written into the buffer, and a write stores data up to the remaining free space. Both functions return the number of bytes actually read or written, anywhere from zero to the number requested in the call.

The consumer thread (the I/O callback in the case of playback, or the audio processing thread in the case of recording) reads data from the ring buffer and then operates on the audio it has read. Meanwhile, asynchronously, the producer thread fills the ring buffer with data, stopping only when the buffer is full. If the ring buffer is sized appropriately, the two threads cooperate smoothly without getting in each other's way.
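
The article does not show the implementation behind this interface, so here is a minimal single-producer/single-consumer sketch under assumed semantics. The struct layout is our guess; a production version would need atomic or lock-protected indices for safe concurrent use.

#include <stdlib.h>

// Assumed layout: read/write positions advance modulo the size, and one byte
// is kept free so that a full buffer can be told apart from an empty one.
typedef struct circular_buffer {
    char *buffer;
    int  wp;    // write position
    int  rp;    // read position
    int  size;  // total capacity in bytes
} circular_buffer;

circular_buffer* create_circular_buffer(int bytes)
{
    circular_buffer *p = calloc(1, sizeof(circular_buffer));
    if (p == NULL) return NULL;
    p->buffer = calloc(bytes, 1);
    p->size = bytes;
    return p;
}

static int bytes_available(circular_buffer *p)  // bytes currently readable
{
    return (p->wp - p->rp + p->size) % p->size;
}

int read_circular_buffer_bytes(circular_buffer *p, char *out, int bytes)
{
    int avail = bytes_available(p);
    if (bytes > avail) bytes = avail;   // read at most what has been written
    for (int i = 0; i < bytes; i++) {
        out[i] = p->buffer[p->rp];
        p->rp = (p->rp + 1) % p->size;
    }
    return bytes;
}

int write_circular_buffer_bytes(circular_buffer *p, const char *in, int bytes)
{
    int space = p->size - 1 - bytes_available(p);
    if (bytes > space) bytes = space;   // write only into the free space
    for (int i = 0; i < bytes; i++) {
        p->buffer[p->wp] = in[i];
        p->wp = (p->wp + 1) % p->size;
    }
    return bytes;
}

void free_circular_buffer(circular_buffer *p)
{
    if (p == NULL) return;
    free(p->buffer);
    free(p);
}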

Audio input/output


Using the interface described above, the audio input/output functions can be written around the OpenSL callbacks. Here is an example of a function that handles the input stream:

// callback handler, called each time recording into the buffer has completed
void bqRecorderCallback(SLAndroidSimpleBufferQueueItf bq, void *context)
{
  OPENSL_STREAM *p = (OPENSL_STREAM *) context;
  int bytes = p->inBufSamples*sizeof(short);
  write_circular_buffer_bytes(p->inrb, (char *) p->recBuffer, bytes);
  (*p->recorderBufferQueue)->Enqueue(p->recorderBufferQueue, p->recBuffer, bytes);
}

// gets a buffer with data of the given size from the device
int android_AudioIn(OPENSL_STREAM *p, float *buffer, int size){
  int i, bytes = size*sizeof(short);
  if(p == NULL || p->inBufSamples == 0) return 0;
  bytes = read_circular_buffer_bytes(p->inrb, (char *) p->inputBuffer, bytes);
  size = bytes/sizeof(short);
  for(i = 0; i < size; i++){
    buffer[i] = (float) p->inputBuffer[i]*CONVMYFLT;
  }
  if(p->outchannels == 0) p->time += (double) size/(p->sr*p->inchannels);
  return size;
}

The callback function bqRecorderCallback, which is called each time a new full buffer (recBuffer) is ready, writes all of its data to the ring buffer and then enqueues the buffer again. The processing function android_AudioIn tries to read the requested number of samples into inputBuffer and copies them to the output, converting to floating-point format. It returns the number of samples copied.

Here is an example of a function that outputs sound.

// sends a buffer of the given size to the device
int android_AudioOut(OPENSL_STREAM *p, float *buffer, int size){
  int i, bytes = size*sizeof(short);
  if(p == NULL || p->outBufSamples == 0) return 0;
  for(i = 0; i < size; i++){
    p->outputBuffer[i] = (short) (buffer[i]*CONV16BIT);
  }
  bytes = write_circular_buffer_bytes(p->outrb, (char *) p->outputBuffer, bytes);
  p->time += (double) size/(p->sr*p->outchannels);
  return bytes/sizeof(short);
}

// callback handler, called each time playback of the data from the buffer has completed
void bqPlayerCallback(SLAndroidSimpleBufferQueueItf bq, void *context)
{
  OPENSL_STREAM *p = (OPENSL_STREAM *) context;
  int bytes = p->outBufSamples*sizeof(short);
  read_circular_buffer_bytes(p->outrb, (char *) p->playBuffer, bytes);
  (*p->bqPlayerBufferQueue)->Enqueue(p->bqPlayerBufferQueue, p->playBuffer, bytes);
}

The output function android_AudioOut takes a given amount of data in floating-point format, converts it to integers, writes the whole outputBuffer into the ring buffer, and reports the number of samples written. The OpenSL callback bqPlayerCallback reads all the samples back and enqueues them.

For all of this to work correctly, the number of samples read from the input must be passed along to the output together with the buffer. Here is a loop that turns input into output.

while(on) {
  samps = android_AudioIn(p, inbuffer, VECSAMPS_MONO);
  for(i = 0, j = 0; i < samps; i++, j += 2)
    outbuffer[j] = outbuffer[j+1] = inbuffer[i];
  android_AudioOut(p, outbuffer, samps*2);
}

In this fragment, the for loop walks over the samples that were read and copies each of them to the output channels. It converts a mono input signal to stereo output, which is why the same input sample is copied to two consecutive positions of the output buffer. For the callback mechanism to start, we need to enqueue one buffer for recording and one more for playback before the audio starts; this guarantees that the callbacks fire whenever the buffers need to be swapped, as sketched below.
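
A hedged sketch of that priming step, using the fields from the code above. The p->recorderRecord (SLRecordItf) and p->bqPlayerPlay (SLPlayItf) fields are assumed members of OPENSL_STREAM not shown in the article's excerpts; the state-setting calls are the standard OpenSL ES record and play interfaces.

// Prime the queues so the callbacks start firing: one buffer for recording,
// one for playback, then switch both objects into their running states.
(*p->recorderBufferQueue)->Enqueue(p->recorderBufferQueue, p->recBuffer,
                                   p->inBufSamples * sizeof(short));
(*p->bqPlayerBufferQueue)->Enqueue(p->bqPlayerBufferQueue, p->playBuffer,
                                   p->outBufSamples * sizeof(short));
(*p->recorderRecord)->SetRecordState(p->recorderRecord, SL_RECORDSTATE_RECORDING);
(*p->bqPlayerPlay)->SetPlayState(p->bqPlayerPlay, SL_PLAYSTATE_PLAYING);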

We have just looked at a simple example of implementing an audio input/output stream with OpenSL. Each implementation will be unique and will require modifications to the HAL and the ALSA driver to squeeze everything possible out of OpenSL.

Refining the Android audio subsystem on the x86 platform


No OpenSL implementation guarantees that every device can reach the desired latency (down to 40 ms) when passing the audio signal to Android's "fast mixer." However, with modifications to the media server, the HAL, and the ALSA driver, various devices can, with varying degrees of success, show good low-latency audio results. In the course of a study of what it takes to increase audio responsiveness on Android, we implemented such a solution on the Dell Venue 8 7460 tablet.

The experiments produced a hybrid media processing system. In it, the thread processing the input data is controlled by a dedicated fast server that handles the original audio signal. The signal is then handed to Android's media server, which uses the "fast mixer" thread. The servers processing input and output both use OpenSL with the SCHED_FIFO scheduling policy.


Implementation of fast audio processing; drawing provided by Eric Serre.

The modifications made it possible to achieve an entirely acceptable RTL of 45 milliseconds. This implementation relies on the Intel Atom SoC and on the specifics of the device used in the experiment. The test was conducted on the Intel Software Development Platform and is available through the Intel Partner Software Development Program.

The OpenSL implementation together with the SCHED_FIFO scheduling policy demonstrates efficient real-time audio processing. Note that this implementation is not available on all devices: it was created for the aforementioned tablet, taking its software and hardware features into account.

To find out how the audio processing technique presented here behaves on other devices, the corresponding tests need to be run. Once such tests are done, we can provide the results to partner developers.

Conclusions


We discussed how to use OpenSL to create a callback function and a buffer queue in an application that processes audio on Android, and described the efforts Intel has made to achieve low-latency audio using a modified media framework.

To implement such a system yourself, follow Google's recommendations and keep in mind the specifics of building applications for fast audio processing described in this article. The results we obtained show that reducing audio processing latency on Android is an entirely achievable task, but the battle for the speed of sound on the Android x86 platform continues.
