WavPlayer - we are not looking for easy ways, we are paving them

    As you know, telephony is about transmitting voice. Nobody needs the full 20 Hz to 20 kHz band for that; up to 3.5 kHz is enough for a clear, intelligible, recognizable voice. To be precise, the speech band used in telephony runs from 300 to 3400 Hz. When many calls are multiplexed into a shared channel, guard intervals are needed at the band edges, so each channel is allotted 4 kHz. Digitizing a 4 kHz band takes a sampling rate of 8 kHz. Now that communication channels have grown fatter, Skype and others that boast of "improved" quality use 16 kHz or even 32 kHz, a difference that is honestly inaudible in a normal conversation (though it becomes very noticeable when the channel quality degrades, but when did that ever worry the marketers?).

    So almost all sound files used in telephony are recorded at an 8 kHz sampling rate. To keep up with large call volumes, the compression methods are equally simple and tuned for a decent result on the one thing that matters: speech. These are plain digitization (PCM), simple companding and delta codecs (G.711, ADPCM), or trickier codecs (GSM 06.10). These formats are "native" to telephony systems such as Asterisk and FreeSWITCH (and probably others too). The system plays prompts to people from files in these formats, and it can write recordings in the same formats.

    However, the web keeps spreading across the planet, and people want to listen to call recordings, greetings, and so on in the browser, where MP3 has become the “native” format...

    As a result, for the rarely used “listen to the archive” function, the naive solution is to configure the server to transcode recordings from the telephony format to MP3.

    All well and good, but:
    • recordings in MP3 become either larger or worse in quality;
    • transcoding to MP3 loads the server CPU;
    • transcoding happens after the fact rather than on the fly (although there is a cure for that too);
    • the transcoded files are really only needed by the client.

    At the sight of this mess, the engineer's soul ached and demanded that things be done properly. Not “do it badly and then leave it as it was”, but properly and directly: the codecs used in telephony are designed to give good results extremely cheaply. So why run an expensive encode to MP3 on the server, followed by an expensive decode from MP3 on the client, just because that decoder happens to be there already? Let's simply write these trivial decoders on the client, and that's it!

    I was genuinely surprised that no such ready-made decoders existed. This is how WavPlayer was born: a Flash player for telephony files.

    What it can do:
    • three UI modes: a GUI with a seek bar, a GUI without one, or no UI at all
    • an API for controlling the player and rendering the entire interface on the JS side
    • codec support: PCM, G.711 u-law / A-law, GSM 06.10, IMA ADPCM
    • container support: AU, WAV, several common RAW variants

    Recently, users also added proxying to Flash's standard MP3 player, so that WavPlayer alone can play both native and transcoded archives. (Initially I did not do this, assuming it was the JS side's job to choose between one of the Flash MP3 players, HTML5, or WavPlayer.)

    Anyone who reads the descriptions of these codecs and formats will see that the player is dead simple. But if it really were that simple, it would have existed long ago... So I will briefly tell the story of how it was made.

    Originally Flash had exactly one way to play sound: playing MP3. That's all. Nothing more. Starting with version 10, flash.media.Sound gained the sampleData event, which lets you generate sound and play it as it is generated. But, as befits Flash, it does this only on its own terms: 44.1 kHz only, stereo only, 32-bit floating-point samples only.

    And we have 8 kHz / 16 kHz integers. If we just take the source data and hand it over as is, we get something barely intelligible and very high-pitched. The conclusion? We need to interpolate the samples we have, in other words, to resample.

    When resampling, it is important to understand that even for a simple doubling of the rate you cannot just insert the “average” between each pair of samples: the result will “whistle” badly at high frequencies, because instead of a smooth sinusoid we get a sawtooth. Correct resampling restores the original smooth signal (minimizing the second derivative) and re-digitizes it at the desired rate. That gives a properly smooth sound at the right sample rate.
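
    To make the idea concrete, here is a minimal sketch in Python (not the player's actual code) of resampling by Lanczos-windowed sinc interpolation, which is one common way to "restore the smooth signal and re-digitize it". The function names, the window size a=3, and the choice to treat samples outside the input as silence are assumptions made for illustration. The sketch covers upsampling only; downsampling would additionally need the kernel stretched so that it acts as a low-pass filter.

```python
import math

def lanczos_kernel(x, a=3):
    """Lanczos window: sinc(x) * sinc(x / a) for |x| < a, else 0."""
    if x == 0.0:
        return 1.0
    if abs(x) >= a:
        return 0.0
    px = math.pi * x
    return a * math.sin(px) * math.sin(px / a) / (px * px)

def resample(samples, src_rate, dst_rate, a=3):
    """Upsample a list of floats from src_rate to dst_rate."""
    n_out = len(samples) * dst_rate // src_rate
    out = []
    for k in range(n_out):
        t = k * src_rate / dst_rate      # output time in input-sample units
        base = math.floor(t)
        acc = 0.0
        # Sum the 2*a nearest input samples, weighted by the kernel.
        for i in range(base - a + 1, base + a + 1):
            if 0 <= i < len(samples):    # outside the input: silence
                acc += samples[i] * lanczos_kernel(t - i, a)
        out.append(acc)
    return out
```

    For example, resample(block, 8000, 44100) turns 80 input samples (10 ms of telephony audio) into 441 output samples, and output points that coincide with an input point reproduce that input sample.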

    I do know the theory, of course, but in practice I am very lazy, and the “play the recordings” task was pressing, so it had to be solved fast. I did not know Flash, and my work machine runs Linux. I looked at the size of the Flash compiler, about a hundred megabytes, and was so put off that I decided to find an alternative for writing Flash quickly and easily. A quick search turned up a great option: haXe, a simple C/Java-like language that compiles to several target platforms, including the one I needed, Flash. I went with it.

    So the first working prototype was thrown together in a hurry:

    There was a project called fogg in which Ogg files were decoded by hand. AudioSink was taken from it; it implements a push interface instead of pull: we write into a buffer, and when Flash wants the next chunk of data, AudioSink hands it over from that buffer. Not the most efficient or elegant implementation, but a ready-made one. For the resampler, a Lanczos resampler implementation (the highest quality, based on sinc functions) was taken straight from OpenJDK. The code is not the most efficient (it was later reimplemented in pure ActionScript, which made it almost 4 times faster), but it works (and I needed nothing more).
    The interface is simple: a triangle is drawn when the player is ready. On a click, play() is called and a square is drawn; the paused state is drawn as two vertical bars.
    For decoding G.711, the code was taken from SoX; the PCM code I wrote myself.
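
    Both decoders really are tiny. As a rough sketch in Python (the function names are mine, and this is not the code borrowed from SoX), here is G.711 mu-law expansion following the widely used reference algorithm, plus the scaling of a 16-bit PCM value into the float range that Flash expects:

```python
BIAS = 0x84  # 132, the offset the mu-law encoder adds before compressing

def ulaw_to_pcm16(byte):
    """Expand one G.711 mu-law byte to a signed linear sample."""
    u = ~byte & 0xFF              # mu-law bytes are stored bit-inverted
    exponent = (u >> 4) & 0x07    # 3-bit segment number
    mantissa = u & 0x0F           # 4-bit position inside the segment
    magnitude = (((mantissa << 3) + BIAS) << exponent) - BIAS
    return -magnitude if u & 0x80 else magnitude

def pcm16_to_float(value):
    """Scale a signed 16-bit integer sample to a float in [-1.0, 1.0)."""
    return value / 32768.0

def decode_ulaw(data):
    """Decode a bytes object of mu-law samples into floats."""
    return [pcm16_to_float(ulaw_to_pcm16(b)) for b in data]
```

    A-law works the same way with a different segment layout; it is left out here to keep the sketch short.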

    And, of course, a spoonful of OOP in this barrel of sloppy code: the File and Decoder interfaces, which let the main player abstract away from the specific variant. Admittedly, the interfaces grew out of necessity rather than from a plan, but when has it ever been otherwise? File works like this: the file's input data is read and pushed in through the push() method. As soon as all the headers have been read, the File creates a decoder for the appropriate format inside itself and starts feeding the audio data into it. From then on ready() returns true, all the other stream-metadata methods become valid, and you can read the audio stream with getSamples(), which returns samplesAvailable() samples.

    The decoder's job is simple too: it reports the size of one sample in bytes, so that the File can cut the data into suitably sized packets to feed it. The decoder then converts the buffered data, one sample at a time, into a signed float.
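
    The File/Decoder split described above might look roughly like this. It is a Python sketch, not the player's haXe code: the class names are invented, the method names are snake_case versions of push(), ready(), samplesAvailable() and getSamples(), and only one case is handled (an AU container holding 16-bit big-endian PCM, which is encoding 3 in the AU header).

```python
import struct

class Pcm16BeDecoder:
    """Decoder side: reports its sample size and converts one sample."""
    sample_size = 2  # bytes per sample

    def decode(self, chunk):
        (value,) = struct.unpack(">h", chunk)
        return value / 32768.0  # signed float in [-1.0, 1.0)

class AuFile:
    """File side: accepts pushed bytes, parses the header, feeds the decoder."""
    HEADER_SIZE = 24  # six big-endian 32-bit fields

    def __init__(self):
        self._buf = bytearray()
        self._decoder = None
        self._samples = []
        self.rate = None
        self.channels = None

    def push(self, data):
        self._buf += data
        if self._decoder is None and not self._parse_header():
            return  # header still incomplete
        size = self._decoder.sample_size
        whole = len(self._buf) - len(self._buf) % size
        for pos in range(0, whole, size):
            chunk = bytes(self._buf[pos:pos + size])
            self._samples.append(self._decoder.decode(chunk))
        del self._buf[:whole]  # keep any trailing partial sample

    def _parse_header(self):
        if len(self._buf) < self.HEADER_SIZE:
            return False
        magic, offset, _size, encoding, rate, channels = struct.unpack(
            ">4sIIIII", bytes(self._buf[:self.HEADER_SIZE]))
        if magic != b".snd" or encoding != 3:
            raise ValueError("this sketch handles only 16-bit PCM AU files")
        if len(self._buf) < offset:
            return False  # optional annotation not fully received yet
        del self._buf[:offset]
        self.rate, self.channels = rate, channels
        self._decoder = Pcm16BeDecoder()
        return True

    def ready(self):
        return self._decoder is not None

    def samples_available(self):
        return len(self._samples)

    def get_samples(self):
        out, self._samples = self._samples, []
        return out
```

    The caller just keeps calling push() with whatever bytes the network delivers; chunk boundaries do not have to line up with the header or with sample boundaries.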

    The main difficulty is feeding the resampler properly. Recall that the resampler works as a virtual double conversion: from the input data at the input sampling rate it restores a smooth signal, which is then re-digitized at the output rate. Restoring the signal at any point always needs signal around that point; therefore the resampler must first be fed silence of the required length to initialize it, and that silence must be discarded from the first output. Then we get correct resampling right from the start. In the same way, once our data ends, the resampler must be fed silence afterwards to flush out all the restored information.

    And in this way our little crew produces exactly as much data as needed, at 44 kHz, in the right format.

    Once the basic player worked, it got tidied up a little: first of all, support for more complex codecs, specifically GSM. It immediately became clear that not everything can be decoded sample by sample and that batch processing was needed, so the decoder interface was reworked to take an input array + offset and an output array + offset and to return how many samples were written to the output. For raw files, most of the code is universal, so it was moved to a separate common class, leaving only the minimum to override: the required parameters in the initializer. The GSM decoder itself was, as usual, taken from wherever it was found and simply converted quickly to the required syntax. Oddly enough, it all worked on the first try.
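
    The reworked decoder interface can be sketched as follows. This is Python, the names are hypothetical, and 16-bit little-endian PCM stands in for the codec to keep it short. The point is the shape of decode(): input array + offset, output array + offset, and a return value saying how many samples were written. A GSM 06.10 decoder would fit the same shape with 33-byte frames producing 160 samples each.

```python
import struct

class Pcm16LeBatchDecoder:
    """Trivial block codec: one 2-byte block yields one sample."""
    bytes_per_block = 2
    samples_per_block = 1

    def decode(self, src, src_off, dst, dst_off):
        """Decode one block from src[src_off:] into dst[dst_off:]."""
        (value,) = struct.unpack_from("<h", src, src_off)
        dst[dst_off] = value / 32768.0
        return 1  # number of samples written

def decode_all(decoder, src):
    """Cut src into whole blocks and run each through the decoder."""
    n_blocks = len(src) // decoder.bytes_per_block
    dst = [0.0] * (n_blocks * decoder.samples_per_block)
    written = 0
    for block in range(n_blocks):
        src_off = block * decoder.bytes_per_block
        written += decoder.decode(src, src_off, dst, written)
    return dst[:written]
```

    Because the caller only relies on bytes_per_block and on the returned count, the same loop works for codecs whose blocks expand into many samples.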

    At the same time, a player control interface for JS code was sketched out, and load, play, and pause events were exposed, letting you draw the player state in the browser any way you like. The resulting product started being rolled into production. Testing turned up some problems, especially in the dearly beloved IE, which seems to load the file in chunks of 8k or 4k... in short, a ton of events was generated, and I had to throttle them.

    Unfortunately, it quickly became clear that nobody wanted to build an interface in JS. So a GUI was quickly knocked together inside the player. The player began to generate internal events, and WavPlayerGui was created. Its Mini descendant stays as before: one big button. In addition, Full was created, which has the same button on the left and a progress bar on the right showing the length, the amount loaded, and the current position. That is, a few more rectangles whose sizes change in response to events.

    As soon as the progress bar appeared, it became clear that it should be clickable too. Listening to a recording only from the beginning is just silly when you need the 3rd minute of a 15-minute call... So seek() had to be implemented. It turned out to be the hardest task: since we cannot download the source file from an arbitrary position (we cannot count on Range support from the server, and it is not that easy to do in Flash), seek() had to be limited to the part already loaded. But even then we do not keep all the data converted to 44 kHz (sniff, memory is precious), so when repositioning is needed, the following happens:
    • check whether the seek() target falls within the range of 44 kHz data that is already prepared; if so, just seek within the prepared data;
    • if not, find the sample in the original stream from which playback should start;
    • re-initialize the resampler with silence;
    • reposition the input stream to that sample;
    • start playback.
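
    The decision at the top of that list can be sketched like this (Python; the function name, the return convention, and the loaded_src_samples parameter are assumptions for illustration, not the player's API):

```python
OUT_RATE = 44100  # Flash output rate in Hz

def plan_seek(target, ready_start, ready_end, src_rate, loaded_src_samples):
    """Decide how to serve a seek.

    target, ready_start and ready_end are positions counted in 44.1 kHz
    output samples; loaded_src_samples is how much of the source stream
    has been downloaded, counted in source samples.
    """
    if ready_start <= target < ready_end:
        # Already resampled: just move inside the prepared buffer.
        return ("reuse", target - ready_start)
    # Otherwise translate the position into the source stream.
    src_sample = target * src_rate // OUT_RATE
    if src_sample >= loaded_src_samples:
        return ("not_loaded", None)  # seek is limited to the loaded part
    # Caller re-primes the resampler with silence, repositions the
    # input stream to src_sample and starts playback from there.
    return ("restart", src_sample)
```

    For an 8 kHz source, a seek to the 10th second maps to source sample 80000, which is where the input stream would be repositioned.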


    Then came a few cosmetic changes from people who started using it in public, and then a new challenge: could IMA ADPCM support be added? The format is rather nasty from the standpoint of generality: the data turned out not to be stored channel by channel but interleaved in the same place, so the decoder also had to be told which channel to decode. I also had to generalize all the other codecs a bit, because for them the amount of output data is a fixed, simple function of the input, whereas here... well, it depends. A clear history is required, and decoding cannot start from an arbitrary place. Accordingly, seek() works like this:
    • check whether the seek() target falls within the range of prepared 44 kHz data; if so, just seek within the prepared data;
    • if not, find the sample in the original stream from which playback should start;
    • find the sample from which decoding can start;
    • re-initialize the resampler with silence;
    • reposition the input stream to the decoding position;
    • decode and discard samples up to the position from which playback should start;
    • start playback.
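
    The "where can decoding start" step can be sketched like this, assuming the WAV flavour of IMA ADPCM: fixed-size blocks, each beginning with a 4-byte header per channel that carries one full sample, followed by 4-bit codes. The article does not spell out this layout, so treat it, and the function names, as assumptions (Python):

```python
def samples_per_block(block_align, channels):
    """Samples per channel in one block: 1 from the header + 2 per data byte."""
    return (block_align - 4 * channels) * 2 // channels + 1

def plan_adpcm_seek(src_sample, block_align, channels):
    """Return (byte offset of the block to decode from, samples to discard)."""
    per_block = samples_per_block(block_align, channels)
    block = src_sample // per_block
    byte_offset = block * block_align       # decoding restarts at a block start
    discard = src_sample - block * per_block  # decoded, then thrown away
    return byte_offset, discard
```

    With 256-byte mono blocks (505 samples each), a seek to source sample 1000 restarts decoding at byte 256 of the audio data and throws away the first 495 decoded samples.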


    On the whole, oddly enough, this works too. And it is now available for everyone to use: it does exactly what is needed, exactly the way it should.
    For complete happiness, all that remains is to finally build that JS interface I had assumed our web developers would make, plus a simple, clear integration example that can be copy-pasted into a site, because this integration task most often lands on the shoulders of a system administrator rather than a programmer... So, to be continued.

    Project on Github | Online demo.
