Adblock for radio

Original author: Tomek Rękawek
The author of this article, Polish programmer Tomek Rękawek, works on the Apache Software Foundation's Jackrabbit Oak project for Adobe. The article was published on the author’s personal blog on February 24, 2016.

Polish Radio-3 (the so-called “Troika”) is famous for good music and intelligent presenters. On the other hand, it suffers from loud and annoying ad breaks, which usually advertise electronics or medicines. I listen to Troika almost constantly at work and at home, so I wondered: how can I remove the ads? It seems I have managed to find a solution.

Digital signal processing


My goal is to create an application that mutes the ads. A commercial break starts and ends with a jingle, so the program must recognize these particular sounds and mute the audio between them.

I know that this area of mathematics and computer science is called digital signal processing (DSP), but to me DSP always seemed like magic. Well, this was a great opportunity to learn something new. I spent a day or two trying to figure out which mechanism to use for analyzing the audio stream, and in the end I found what I needed: cross-correlation.

Octave


Most references point to the MATLAB implementation. MATLAB is an expensive application that simplifies complex mathematical operations, including DSP. Fortunately, there is a free alternative called Octave. Running a cross-correlation on two audio files in Octave turns out to be easy. You just need to run the following commands:

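pkg load signal
% keep only the first channel of each file (newer Octave releases provide audioread in place of wavread)
jingle = wavread('jingle.wav')(:,1);
audio = wavread('audio.wav')(:,1);
% cross-correlate the two signals and plot the result
[R, lag] = xcorr(jingle, audio);
plot(R);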

This produces the following graph:



A peak is clearly visible, marking the position of jingle.wav within audio.wav. What surprised me was the simplicity of the method: xcorr() does all the work, and the rest of the code only reads the files and displays the result.
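
To make the idea behind the peak more concrete, here is a minimal time-domain sketch in Java. It is not how Octave's xcorr() works internally (that uses the FFT), and the class and method names are hypothetical. It slides the jingle over the audio, computes a dot product at every offset, and reports the offset with the largest value.

```java
/** Naive O(N*M) cross-correlation; an illustrative sketch, names are hypothetical. */
final class NaiveXcorr {

    /** Dot product of the jingle with the audio window starting at each offset. */
    static double[] correlate(short[] audio, short[] jingle) {
        int offsets = audio.length - jingle.length + 1;
        if (offsets <= 0) {
            throw new IllegalArgumentException("audio must be at least as long as jingle");
        }
        double[] r = new double[offsets];
        for (int offset = 0; offset < offsets; offset++) {
            double sum = 0;
            for (int i = 0; i < jingle.length; i++) {
                sum += (double) audio[offset + i] * jingle[i];
            }
            r[offset] = sum;
        }
        return r;
    }

    /** Index of the largest correlation value, i.e. the most likely jingle position. */
    static int peakOffset(double[] r) {
        int best = 0;
        for (int i = 1; i < r.length; i++) {
            if (r[i] > r[best]) {
                best = i;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        short[] jingle = {100, -200, 300, -400};
        // The jingle is embedded in the audio starting at index 3
        short[] audio = {0, 5, -3, 100, -200, 300, -400, 2, 0};
        System.out.println(peakOffset(correlate(audio, jingle))); // prints 3
    }
}
```

This direct approach is far too slow for long recordings, which is why real implementations compute the same quantity through the FFT.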

I wanted to implement the same algorithm in Java, so that I would have a tool that:

  1. reads an audio stream from standard input (for example, from ffmpeg),
  2. analyzes it in search of jingles,
  3. prints the same stream to stdout and/or mutes it.

Using stdin and stdout makes it possible to chain the analyzer with other applications that fetch the audio stream and play back the result.
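
The stdin-to-stdout design can be sketched roughly as follows. This is an assumption about the general shape of such a filter, not the author's code: the MutingPipe class and its muted flag are hypothetical, and muting is done by writing zero bytes, which is silence for signed PCM.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.Arrays;
import java.util.function.BooleanSupplier;

/** Pass-through filter for raw PCM; a sketch with hypothetical names. */
final class MutingPipe {

    /**
     * Copies everything from in to out. While muted reports true, the data is
     * replaced with zero bytes so the stream keeps its timing but goes silent.
     */
    static void pipe(InputStream in, OutputStream out, BooleanSupplier muted) throws IOException {
        byte[] chunk = new byte[4096];
        int n;
        while ((n = in.read(chunk)) != -1) {
            if (muted.getAsBoolean()) {
                Arrays.fill(chunk, 0, n, (byte) 0);
            }
            out.write(chunk, 0, n);
        }
        out.flush();
    }
}
```

A command-line tool would call something like MutingPipe.pipe(System.in, System.out, flag), where the flag is switched on by the start jingle and off by the end jingle.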

Reading sound files


First of all, the Java program must read the jingle (saved as a .wav file) into an array. The file also contains additional information, such as headers and metadata, but we only need the sound. A suitable format is raw PCM, which is just a list of numbers representing the sound samples. ffmpeg can convert WAV to PCM:

ffmpeg -i input.wav -f s16le -acodec pcm_s16le output.raw

Here each sample is stored as a 16-bit number in little-endian byte order. In Java such a number is a short, and the ByteBuffer class can convert the input bytes into short values:

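// requires: import java.nio.ByteBuffer; import java.nio.ByteOrder;
ByteBuffer buf = ByteBuffer.allocate(4);
buf.order(ByteOrder.LITTLE_ENDIAN);
buf.put(bytes);                      // bytes holds one 4-byte stereo frame
buf.flip();                          // switch the buffer from writing to reading
short leftChannel = buf.getShort();  // stereo stream: left sample comes first
short rightChannel = buf.getShort();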
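
For a whole file or a larger chunk of the stream, the same conversion can be done in one step. The sketch below is an illustration with hypothetical names (PcmDecoder, decode, leftChannel); it assumes interleaved stereo s16le data as produced by the ffmpeg command above and keeps only the first channel, as the Octave code does.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Arrays;

/** Decodes raw s16le PCM bytes; a sketch with hypothetical names. */
final class PcmDecoder {

    /** Converts little-endian byte pairs into samples; a trailing odd byte is ignored. */
    static short[] decode(byte[] pcm) {
        short[] samples = new short[pcm.length / 2];
        ByteBuffer.wrap(pcm)
                .order(ByteOrder.LITTLE_ENDIAN)
                .asShortBuffer()
                .get(samples);
        return samples;
    }

    /** Picks every other sample from interleaved stereo data (left, right, left, ...). */
    static short[] leftChannel(short[] interleaved) {
        short[] left = new short[(interleaved.length + 1) / 2];
        for (int i = 0; i < left.length; i++) {
            left[i] = interleaved[2 * i];
        }
        return left;
    }

    public static void main(String[] args) {
        // Two stereo frames: (1, -1) and (-32768, 32767)
        byte[] pcm = {0x01, 0x00, (byte) 0xFF, (byte) 0xFF, 0x00, (byte) 0x80, (byte) 0xFF, 0x7F};
        short[] samples = decode(pcm);
        System.out.println(Arrays.toString(samples));
        System.out.println(Arrays.toString(leftChannel(samples)));
    }
}
```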

Xcorr reverse engineering


To implement xcorr() in Java, I studied the Octave source code. Without changing the final result, I was able to replace the xcorr() call with the following lines, which then had to be rewritten in Java:

N    = length(audio);
M    = 2 ^ nextpow2(2 * N - 1);
pre  = fft(postpad(prepad(jingle(:), length(jingle) + N - 1), M));
post = fft(postpad(audio(:), M));
cor  = ifft(pre .* conj(post));
R    = real(cor(1:2 * N));

It looks scary, but most of these functions are trivial array operations. The cross-correlation itself relies on applying the fast Fourier transform (FFT) to the sound samples.
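
As an illustration of how these lines might translate to Java, here is a self-contained sketch. It is not the author's code: the author used JTransforms for the FFT, while this example swaps in a small hand-written radix-2 FFT so that it runs without dependencies. The class and method names are hypothetical, the jingle is assumed to be no longer than the audio, and the result keeps the 2N - 1 lags, so index k corresponds to lag k - (N - 1).

```java
import java.util.Arrays;

/** FFT-based cross-correlation following the Octave lines; a sketch, names are hypothetical. */
final class FftXcorr {

    /** In-place iterative radix-2 FFT; the array length must be a power of two. */
    static void fft(double[] re, double[] im, boolean inverse) {
        int n = re.length;
        // Reorder the input by bit-reversed index
        for (int i = 1, j = 0; i < n; i++) {
            int bit = n >> 1;
            for (; (j & bit) != 0; bit >>= 1) {
                j ^= bit;
            }
            j ^= bit;
            if (i < j) {
                double t = re[i]; re[i] = re[j]; re[j] = t;
                t = im[i]; im[i] = im[j]; im[j] = t;
            }
        }
        // Butterfly passes over blocks of growing length
        for (int len = 2; len <= n; len <<= 1) {
            double ang = 2 * Math.PI / len * (inverse ? 1 : -1);
            for (int i = 0; i < n; i += len) {
                for (int k = 0; k < len / 2; k++) {
                    double wr = Math.cos(ang * k), wi = Math.sin(ang * k);
                    int a = i + k, b = i + k + len / 2;
                    double xr = re[b] * wr - im[b] * wi;
                    double xi = re[b] * wi + im[b] * wr;
                    re[b] = re[a] - xr; im[b] = im[a] - xi;
                    re[a] += xr; im[a] += xi;
                }
            }
        }
        if (inverse) {
            for (int i = 0; i < n; i++) { re[i] /= n; im[i] /= n; }
        }
    }

    /** Returns 2N - 1 correlation values; index k holds the value for lag k - (N - 1). */
    static double[] xcorr(double[] jingle, double[] audio) {
        int n = audio.length;
        int m = 1;
        while (m < 2 * n - 1) {
            m <<= 1;                                               // M = 2 ^ nextpow2(2 * N - 1)
        }
        double[] preRe = new double[m], preIm = new double[m];
        double[] postRe = new double[m], postIm = new double[m];
        System.arraycopy(jingle, 0, preRe, n - 1, jingle.length);  // prepad with N - 1 zeros
        System.arraycopy(audio, 0, postRe, 0, n);                  // postpad with zeros
        fft(preRe, preIm, false);
        fft(postRe, postIm, false);
        for (int i = 0; i < m; i++) {                              // pre .* conj(post)
            double real = preRe[i] * postRe[i] + preIm[i] * postIm[i];
            double imag = preIm[i] * postRe[i] - preRe[i] * postIm[i];
            preRe[i] = real;
            preIm[i] = imag;
        }
        fft(preRe, preIm, true);                                   // ifft, keep the real part
        return Arrays.copyOf(preRe, 2 * n - 1);
    }

    /** Index of the largest value in the array. */
    static int argMax(double[] values) {
        int best = 0;
        for (int i = 1; i < values.length; i++) {
            if (values[i] > values[best]) {
                best = i;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        double[] jingle = {100, -200, 300, -400};
        double[] audio = {0, 5, -3, 100, -200, 300, -400, 2, 0};   // jingle starts at index 3
        int k = argMax(xcorr(jingle, audio));
        // With this padding, a jingle at position p peaks at index N - 1 - p
        System.out.println("peak index " + k + ", position " + (audio.length - 1 - k));
    }
}
```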

Fast Fourier Transform


As someone with no DSP experience, I simply treat the FFT as a function that takes an array describing a sound sample and returns an array of complex numbers representing frequencies. This minimalist approach worked well: I ran the FFT implementation from the JTransforms package and got the same results as in Octave. I think this is partly cargo cult programming, but damn, it works!

Running xcorr on a stream


The algorithm above assumes that audio is an array in which we are looking for jingle. This does not quite suit a radio broadcast, where we have a continuous stream of sound. To run the analysis, I created a circular buffer slightly longer than the jingle to be recognized. The incoming stream fills the buffer, and as soon as it is full, the cross-correlation test is run. If nothing is found, the oldest part of the buffer is discarded and we wait for it to fill up again.

I experimented a bit with the buffer length and got the best results with a buffer 1.5 times the length of the jingle.
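
The buffering scheme can be sketched as below. This is an illustration under several assumptions, not the author's implementation: the names are hypothetical, a plain array that shifts its contents stands in for a true circular buffer, the detector is passed in as a predicate (a simple pattern search stands in for the cross-correlation test), and after a miss the buffer keeps the newest jingle-length samples so that a jingle straddling two windows is not lost.

```java
import java.util.function.Predicate;

/** Runs a detector over a stream in overlapping windows; a sketch with hypothetical names. */
final class SlidingDetector {
    private final short[] window;                  // 1.5 times the jingle length
    private final int keep;                        // samples retained after a miss
    private final Predicate<short[]> containsJingle;
    private int filled = 0;

    SlidingDetector(int jingleLength, Predicate<short[]> containsJingle) {
        if (jingleLength < 2) {
            throw new IllegalArgumentException("jingleLength must be at least 2");
        }
        this.window = new short[jingleLength * 3 / 2];
        this.keep = jingleLength;
        this.containsJingle = containsJingle;
    }

    /** Feeds one sample; returns true when a full window was tested and the jingle was found. */
    boolean accept(short sample) {
        window[filled++] = sample;
        if (filled < window.length) {
            return false;                          // keep filling
        }
        boolean hit = containsJingle.test(window);
        if (hit) {
            filled = 0;                            // assumption: start from scratch after a hit
        } else {
            int drop = window.length - keep;       // discard the oldest part
            System.arraycopy(window, drop, window, 0, keep);
            filled = keep;
        }
        return hit;
    }

    /** Stand-in detector for the demo: exact search for needle inside haystack. */
    static boolean contains(short[] haystack, short[] needle) {
        for (int start = 0; start + needle.length <= haystack.length; start++) {
            int i = 0;
            while (i < needle.length && haystack[start + i] == needle[i]) {
                i++;
            }
            if (i == needle.length) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        short[] pattern = {1, 2, 3, 4};
        SlidingDetector detector = new SlidingDetector(pattern.length, w -> contains(w, pattern));
        short[] stream = {0, 0, 0, 0, 1, 2, 3, 4};
        for (int i = 0; i < stream.length; i++) {
            if (detector.accept(stream[i])) {
                System.out.println("Got jingle after sample " + i);
            }
        }
    }
}
```

In the demo the pattern straddles the first window boundary, so it is only found on the second test, once the retained samples have been topped up with new ones.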

Putting it all together


Getting a stream in PCM is easy: the ffmpeg tool mentioned above can do it. The command below pipes the stream to the standard input of java, which then prints Got jingle 0 or Got jingle 1 whenever the corresponding pattern is found in the stream.

ffmpeg -loglevel -8 \
       -i http://stream3.polskieradio.pl:8904/\;stream \
       -f s16le -acodec pcm_s16le - \
  | java -jar target/analyzer-1.0.0-SNAPSHOT-jar-with-dependencies.jar \
    2 \
    src/test/resources/commercial-start-44.1k.raw 500 \
    src/test/resources/commercial-end-44.1k.raw 700

Standalone version


I also prepared a simple standalone version of the analyzer, which connects to the Troika stream by itself (without an external ffmpeg) and plays the result using javax.sound. Everything fits into one JAR file and includes a basic user interface with Start and Stop buttons. It can be downloaded here. If you don’t like running other people's JARs on your machine (which is absolutely correct), all the sources are on GitHub.



It seems that everything works as it should :)

Further work


The ultimate goal is to mute the ads at the level of a hardware amplifier receiving a “real” FM signal, rather than an Internet stream. This is covered in the next article.

Update (June 2018)


Discussion: Hacker News, Wykop, Reddit
