merlin-vrn April 21, 2014 at 23:22

Internet radio with several hosts in different cities and live call-ins

From May 1 to May 4, 2014, Voronezh will host the annual all-Russian festival of Japanese animation for the fifteenth time. The festival has become a tradition for us, and visitors come from many Russian cities.

Participants and visitors have many questions and ideas about the festival. They already have plenty of ways to ask us and get answers, but the organizers came up with one more: set up an Internet radio station where listeners can call in live and ask a question on air.

This is complicated by the fact that the festival's organizing committee is itself geographically scattered. We live in different cities, including Kobe (Japan), Moscow, Rostov-on-Don, Waldkirch (Germany), Krasnodar and, of course, Voronezh, and meeting physically in one place takes a lot of time and money. (It is enough that everyone gathers at the festival itself.) On top of that, incoming calls have to be put on air, preferably for free. And the instructions for callers should be as simple and safe as possible, for example relying on software the audience already has on their computers.

The organizers already run the festival quite successfully over Skype voice conferences. The natural idea was to gather in a conference and somehow route it into a radio stream. To take calls, run a second Skype instance on the same computer under a different account and, at the right moment after answering, route the call to both the radio and the conference (and the conference back to the caller).

Everything below applies to Linux. I deliberately do not give exact package names, since they differ between distributions. I should also warn right away that I have not used Windows for many years and have no idea how to do the same thing there.

A note about the screenshots: they were not taken at the very last moment. They convey the idea, but not the exact settings that were used. Where they differ, trust the text, not the pictures.

Software

JACK (the recursive acronym "JACK Audio Connection Kit"), a synchronous sound server that adds no extra latency between its clients. It was controlled with qjackctl
Skype, two instances
Virtual sound cards created with the ALSA snd-aloop module
The alsa_in and alsa_out programs that come with JACK
Switching hub: jack_mixer
Gate and compressor for the microphone, plus the output limiter in front of the broadcast, from the Calf Studio Gear package
Icecast, the broadcast server
Source client feeding Icecast from JACK: Darkice, with its front end Darksnow
Player: Audacious

Switching and configuration

At the center of the system is the JACK server, which runs on my external sound card. A microphone and headphones are connected to this card. There are no particular subtleties here.

Skype

Skype supports ALSA and PulseAudio (may it drop dead) but does not support JACK. To get Skype into JACK you have to build a workaround using the ALSA virtual sound card module, snd-aloop.

Simply loading this module creates one virtual sound card named Loopback with two devices (pcm0 and pcm1), each with eight substreams (sub0..sub7). Sound played to the first substream of the first device (pcm0p/sub0) can be recorded from the first substream of the second device (pcm1c/sub0). The data format is fixed by whichever application opens the device first: if, say, you started capturing from pcm1c/sub0 as 44100 Hz signed 32-bit little-endian, then you must play to pcm0p/sub0 in exactly the same format. The module cannot convert anything; it just pretends to be a sound card and shuffles data back and forth.

Skype on Linux in ALSA mode is a rather picky application. It insists on opening the sound card as 16 bit, 48 kHz, mono for capture and 16 bit, 48 kHz, stereo for playback; if the card refuses to work in this mode, Skype either reports that it cannot initialize the sound device, or the sound (usually the captured one) is severely distorted.

We need two Skype instances. For convenience, each of them should work with its own card, which makes the routing easier to set up. Reading modinfo snd_aloop shows that the module accepts several very useful parameters, including the card index numbers (yes, you can create several cards), the names of the virtual cards and the number of substreams in each. So we load the module like this:
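The pairing rule described above can be written down as a tiny helper. This is only a sketch, assuming the usual ALSA hw:CARD,DEVICE,SUBSTREAM naming; the function name loopback_peer is made up for illustration and is not part of ALSA.

```shell
#!/bin/sh
# loopback_peer is a hypothetical helper: given one end of an snd-aloop
# pair as hw:CARD,DEVICE,SUBSTREAM, print the other end.
# Device 0 is paired with device 1; card and substream stay the same.
loopback_peer() {
    spec=${1#hw:}
    card=${spec%%,*}
    rest=${spec#*,}
    dev=${rest%%,*}
    sub=${rest#*,}
    # A bare hw:CARD,DEVICE means substream 0.
    [ "$sub" = "$rest" ] && sub=0
    case $dev in
        0) peer=1 ;;
        1) peer=0 ;;
        *) echo "snd-aloop only has devices 0 and 1: $1" >&2; return 1 ;;
    esac
    echo "hw:$card,$peer,$sub"
}

loopback_peer hw:Loopback,0,0   # prints hw:Loopback,1,0
loopback_peer hw:2,1            # prints hw:2,0,0
```

Whatever one program plays to the first name can be captured by another program from the printed name, provided both use the same sample format, rate and channel count.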

modprobe snd-aloop enable=1,1 index=2,3 id=Chat,Incoming pcm_substreams=1,1

This creates two virtual cards in the system, numbered 2 and 3 and named Chat and Incoming respectively, each with one substream per device. You can inspect these cards under /proc/asound/card2 and .../card3. (I chose these index numbers because my system already has a built-in card at index 0 and a USB card at index 1.)
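One way to look at those procfs entries from a script is sketched below. It assumes the standard ALSA procfs layout (an id file per card and an hw_params file per substream, which contains the single word "closed" while nobody has the stream open). The function names and the optional root-directory argument are invented for this example.

```shell
#!/bin/sh
# Hypothetical helpers for inspecting the virtual cards through procfs.
# The last argument (procfs root) exists only so the functions can be
# tried against a copy of the tree; it defaults to /proc/asound.

# Print the id (name) of card number $1, e.g. "Chat" for card 2.
card_id() {
    cat "${2:-/proc/asound}/card$1/id"
}

# Print "FORMAT RATE CHANNELS" for a stream such as pcm0p/sub0 of card $1,
# or fail if nobody has the stream open.
stream_format() {
    file="${3:-/proc/asound}/card$1/$2/hw_params"
    awk -F': ' '
        $1 == "format"   { fmt = $2 }
        $1 == "rate"     { split($2, r, " "); rate = r[1] }
        $1 == "channels" { ch = $2 }
        END { if (fmt == "") exit 1; print fmt, rate, ch }
    ' "$file"
}

# Example: which format was requested on the playback side of card 2?
# stream_format 2 pcm0p/sub0
```

Since the loopback module takes its format from whoever opens the device first, a check like this is a quick way to see which parameters were actually requested.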

The first Skype is started as usual. Launching the second one needs a special command line:

screen skype --dbpath=~/.Skype.vrnfest --secondary

I ran it in screen to detach it from the terminal. Here dbpath is the path to the Skype profile (by default ~/.Skype), and --secondary allows a second instance to start.

Each Skype is, naturally, configured to use its own virtual sound card, device 0. There is a subtle point here in how Skype works with ALSA. It can use different devices for the ringtone and for the sound arriving from the network. If you select the same device for “ringing” and “speakers” in the settings, Skype opens that device twice, landing on different substreams. A hardware card usually mixes them, but snd-aloop does not; besides, I limited the virtual cards to one substream, so clicking “accept call” drops the call with the message “failed to initialize the sound device”. Therefore, in both Skype instances I set the ringtone to play through the physical built-in sound card.

Now that Skype is configured, the other ends of the virtual cards need to be brought into JACK, which is what the alsa_in and alsa_out programs are for. Since Skype is picky, it must be the first to open the audio devices so that it sets their format. So we call from one Skype to the other and accept the call. While the call is up, we can start the bridges:

screen -dmS chat_in alsa_in -d hw:2,1 -j chat_in
screen -dmS chat_out alsa_out -d hw:2,1 -r 48000 -q 1 -c 1 -j chat_out
screen -dmS incoming_in alsa_in -d hw:3,1 -j incoming_in
screen -dmS incoming_out alsa_out -d hw:3,1 -r 48000 -q 1 -c 1 -j incoming_out

As you can see, the JACK client names are given explicitly so that it is clear which alsa_in and alsa_out belongs to what. We use device 1 of each virtual card.

So now Skype is wired into JACK, and:

from chat_in:capture_1 we take the sound of the organizers' conference (there is also capture_2, but it is not used)
to chat_out:playback_1 we send sound to the organizers' conference
from incoming_in:capture_1 we take the sound of an incoming call
to incoming_out:playback_1 we send the sound intended for the caller
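The four bridge commands differ only in the card number and the client name, so they can be generated from a small table. The sketch below only prints the commands; the function name bridge_commands is made up, and all options are copied unchanged from the commands above.

```shell
#!/bin/sh
# Hypothetical wrapper: print (not run) the screen/alsa_in/alsa_out
# commands for each virtual card. Table format: "CARD_INDEX NAME".
bridge_commands() {
    printf '%s\n' '2 chat' '3 incoming' |
    while read -r card name; do
        echo "screen -dmS ${name}_in alsa_in -d hw:$card,1 -j ${name}_in"
        echo "screen -dmS ${name}_out alsa_out -d hw:$card,1 -r 48000 -q 1 -c 1 -j ${name}_out"
    done
}

bridge_commands
```

To actually start the bridges, the output can be piped to a shell (bridge_commands | sh), but only after Skype has opened the devices, as explained above.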

Player, microphone, headphones (monitor) and broadcast

With the headphones everything is simple: they show up as two channels, system:playback_1 and system:playback_2. The microphone is physically connected to the first input of the card JACK runs on, so it appears as system:capture_1. However, one does not simply take a microphone and use it. First, I want a gate to cut off unwanted sounds; second, I want to compress what remains so that its dynamics resemble the signal coming from Skype (Skype itself compresses the dynamic range quite heavily). To do this, run calfjackhost and add the Gate and Compressor plugins to it. For my case I chose the following settings:

Gate: threshold -36 dB, ratio 3, knee 9 dB, attack 20 ms, release 450 ms, max reduction -inf dB
Compressor: gain 10 dB, threshold -18 dB, ratio 3.5, attack 20 ms, release 250 ms, knee 9 dB

The broadcast goes to Icecast through darkice, which is configured via darksnow. There is nothing special there except the output limiter in front of darkice, which evens out the signal level. This is the "Limiter" plugin in calfjackhost, with these settings:

Limiter: input gain: 12 dB, lookahead: 10 ms, limit: -1 dB, release: 300 ms

The player appears in JACK when the first track starts and stays there. It connects itself to the physical output (system:playback); that connection needs to be removed.

jack_mixer and switching

It remains to tie everything together, with the ability to quickly enable and disable individual audio routes. For this I used jack_mixer. It is fairly primitive and has a plain interface, but its functionality turned out to be (almost) sufficient.

Run it and add the following through the menu:

Three incoming mono channels: mic, chat_in, incoming_in
Incoming stereo channel: player
Four outgoing channels (they are always stereo): monitor, chat_out, incoming_out, radio

This will cause jack_mixer to declare a bunch of inputs and outputs in JACK:

inputs: jack_mixer:chat_in, ...:incoming_in, mic, player_L, player_R
outputs: jack_mixer:chat_in Out, ...:chat_out R, chat_out L, incoming_in Out, incoming_out L, incoming_out R, MAIN L, MAIN R, mic Out, Monitor L, Monitor R, monitor L, monitor R, player Out L, player Out R, radio L, radio R

The configuration can be saved to a file for reuse.

As you can see, each input has an output with the same name and the suffix "Out". This is the same signal as the input, but after the fader. All stereo channels have L and R outputs. There are two special stereo outputs, MAIN and Monitor (with a capital letter). MAIN is an ordinary stereo output, and Monitor receives a copy of whichever output has its Mon button pressed in the interface. I did not use these two outputs: I did not need the monitor functionality, and I would have used MAIN if it could be renamed to something more specific.

In addition, each input channel gets one Mute button per output. Together these Mute buttons form the mixer's switching matrix. This is the main feature the program is used for.

In total, the following needs to be connected:

system:capture_1 -> calf:gate_in_l, calf:gate_out_l -> calf:compressor_in_l, calf:compressor_out_l -> jack_mixer:mic
chat_in:capture_1 -> jack_mixer:chat_in, jack_mixer:chat_out L -> chat_out:playback_1
incoming_in:capture_1 -> jack_mixer:incoming_in, jack_mixer:incoming_out L -> incoming_out:playback_1
jack_mixer:radio L -> calf:limiter_in_l, calf:limiter_out_l -> darkice:left, and the same for the right channel
jack_mixer:monitor L -> system:playback_1, and the same for the right channel
audacious_jack:out_0 -> jack_mixer:player_L, and the same for the right channel
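Instead of clicking these connections together in qjackctl every time, they can be made with the jack_connect tool that ships with JACK. The sketch below only prints the commands. The left-channel port names are copied from the list above; the right-channel names (limiter_in_r, limiter_out_r, darkice:right, out_1, player_R, playback_2) are assumptions that mirror the left channel, and all names are worth checking against the output of jack_lsp on the real system.

```shell
#!/bin/sh
# Wiring table, one "source|destination" pair per line.
# Right-channel port names are assumed, not taken from the article.
wiring() {
    printf '%s\n' \
        'system:capture_1|calf:gate_in_l' \
        'calf:gate_out_l|calf:compressor_in_l' \
        'calf:compressor_out_l|jack_mixer:mic' \
        'chat_in:capture_1|jack_mixer:chat_in' \
        'jack_mixer:chat_out L|chat_out:playback_1' \
        'incoming_in:capture_1|jack_mixer:incoming_in' \
        'jack_mixer:incoming_out L|incoming_out:playback_1' \
        'jack_mixer:radio L|calf:limiter_in_l' \
        'calf:limiter_out_l|darkice:left' \
        'jack_mixer:radio R|calf:limiter_in_r' \
        'calf:limiter_out_r|darkice:right' \
        'jack_mixer:monitor L|system:playback_1' \
        'jack_mixer:monitor R|system:playback_2' \
        'audacious_jack:out_0|jack_mixer:player_L' \
        'audacious_jack:out_1|jack_mixer:player_R'
}

# Print one quoted jack_connect command per pair (port names may
# contain spaces, so the quoting matters).
connect_commands() {
    wiring | while IFS='|' read -r src dst; do
        printf 'jack_connect "%s" "%s"\n' "$src" "$dst"
    done
}

connect_commands
```

Piping the output to a shell (connect_commands | sh) makes the connections once all the clients are running.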

Running the radio bridge

So, we gather a conference in one of the Skype instances and make sure that nobody's audio crackles, nobody causes feedback, and so on. Everyone warns the people around them that a broadcast is on and asks them not to make extraneous noise.

In the player a playlist is put together, with the background music kept separately (the hosts' and callers' conversations will sound over it). For this I used the multiple-playlist feature of Audacious (playlists are shown as tabs that can be renamed).

Conference participants, as well as callers on air, must not listen to the broadcast itself. First, it would confuse them, since it arrives with a delay of 10-30 seconds caused by buffering; second, everyone is strongly advised to use headphones so that there is no feedback (radio -> microphone).

Therefore the content of the broadcast, that is, the player output, has to be copied to the conference; the caller needs a copy of the sound from the conference, while background music is optional for them.

In the end I arrived at the following model. To the listener, the system is always in one of three basic states: the hosts talking, a conversation with a caller, or music playing. Behind the scenes, while the hosts talk or music plays, an incoming call may be answered; the operator explains to the caller what is and is not allowed and makes sure the caller does not create feedback either. After that the hosts are told (by text or voice) that a caller is waiting, and at the right moment the caller is patched in.

These states are in fact states of the switching matrix. It is clear right away that some buttons of the matrix are set once and never change, while others are switched on and off. Always on:

mic -> monitor - to hear yourself in the headphones
mic -> incoming_out - either there is no caller and snd-aloop discards the signal, or the caller should hear the operator
incoming_in -> monitor - either there is no caller and the input is silent, or the operator needs to hear the caller
player -> radio - the player is always playing something, whether a track or background music for talk
player -> monitor - so that the operator hears the music
player -> chat_out - music to the conference

The player input level is adjusted depending on whether it is playing background music or a track.

Always disabled:

chat_in -> chat_out - the conference must not get its own sound back
incoming_in -> incoming_out - the caller must not get their own sound back
player -> incoming_out - the caller does not need to hear the background music

During the broadcast the operator can pass remarks to the hosts by voice or text; the voice remarks must not go on air (mic -> radio muted).

Listening to music

In this state only music goes on air. The operator and the conference can talk to each other if they can shout over the music, because the music is also sent to the conference. In addition to the always-on routes, the following are on:

mic -> chat_out - so that the conference hears the operator
chat_in -> monitor - so that the operator hears the conference

Some time before the track ends, the hosts need to be warned about it. If a call comes in, we proceed to answer it: turn off the conference and the music in the monitor, take the call, then tell the conference about it and wait for the end of the track.

Talk on air

Here the music (quietly) and the conference go on air. The routing is as follows:

mic -> chat_out
mic -> radio - if the operator takes part in the discussion
chat_in -> radio

If a call comes in, the microphone is disconnected from the conference and from the air, and the conference is disconnected from the monitor. It is worth warning the conference that there is an incoming call so that they do not announce a new track.

Receiving an incoming call

This can happen while music is playing or while the hosts are talking on air (the image shows the second case). The operator cuts their audio link to the conference and to the radio and answers the Skype call. After briefing the caller, the operator tells the conference about the caller, and when the hosts are ready to bring the caller in, patches the caller into the conference and the radio.
The following routes are on (besides those that are always on):

chat_in -> radio - if the call arrives during talk on air

Conversation with the caller

In this mode the conversation with the caller goes on air. The following routes are on:

chat_in -> radio
chat_in -> monitor
chat_in -> incoming_out - so that the caller hears the conference
incoming_in -> radio
incoming_in -> chat_out - so that the conference hears the caller
mic -> chat_out
mic -> radio - if the operator also takes part in the conversation
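The states above can also be written down as data, which makes it easy to see which Mute buttons have to be toggled when moving from one state to another. This is a sketch only: the state names (music, talk, call_setup, caller) and function names are invented here, the routes are exactly those listed in the text, and the optional routes (such as mic to radio) are treated as on.

```shell
#!/bin/sh
# The switching matrix as data: one "input->output" route per word.
ALWAYS_ON="mic->monitor mic->incoming_out incoming_in->monitor player->radio player->monitor player->chat_out"
ALWAYS_OFF="chat_in->chat_out incoming_in->incoming_out player->incoming_out"

# Routes that are on in state $1 in addition to the always-on ones.
state_routes() {
    case $1 in
        music)      echo "mic->chat_out chat_in->monitor" ;;
        talk)       echo "mic->chat_out mic->radio chat_in->radio" ;;
        call_setup) echo "chat_in->radio" ;;
        caller)     echo "chat_in->radio chat_in->monitor chat_in->incoming_out incoming_in->radio incoming_in->chat_out mic->chat_out mic->radio" ;;
        *) echo "unknown state: $1" >&2; return 1 ;;
    esac
}

# All unmuted routes in state $1, one per line, sorted, no duplicates.
routes_for() {
    extra=$(state_routes "$1") || return 1
    printf '%s\n' $ALWAYS_ON $extra | LC_ALL=C sort -u
}

# Succeed if route $2 is on in state $1.
has_route() {
    routes_for "$1" | grep -qxF -- "$2"
}

# Routes whose Mute button must be toggled to go from state $1 to $2:
# the lines that appear in exactly one of the two route lists.
toggles() {
    { routes_for "$1"; routes_for "$2"; } | LC_ALL=C sort | uniq -u
}

# Fail if any state enables a route from the always-off list.
check_states() {
    for s in music talk call_setup caller; do
        for r in $ALWAYS_OFF; do
            if has_route "$s" "$r"; then
                echo "state $s enables forbidden route $r" >&2
                return 1
            fi
        done
    done
}

check_states
toggles music talk
```

For the move from music to talk on air this prints chat_in->monitor, chat_in->radio and mic->radio, that is, three buttons to press.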

The working environment during a broadcast

The easiest way to describe it is with an illustration:

All other windows that are not needed while running the broadcast are placed on another KDE desktop.

Any participant can gather the conference. It is easier if this is someone other than the radio bridge operator, to take some load off the operator. From the conference's point of view, the caller speaks on behalf of the operator (for example, the operator's icon lights up in Skype when the caller speaks).

The festival radio broadcast of April 19

Using this setup, we ran a pre-festival radio bridge. It began on April 19 at 20:00 and was planned to last three hours, but during the broadcast we decided to extend it by an hour.

Before the broadcast we selected the tracks to play and wrote an approximate script (in the form of a track order). The broadcast turned out to be more talkative than expected, so we did not get to the end of the track script.

At peak there were up to one hundred simultaneous listeners. During the broadcast two people operated the Skype account "for incoming calls", since text questions were also sent to it. A Skype text chat was also set up for the audience. The load on the operator (that was me) turned out to be lower than expected, so I managed to talk to them in that chat as well.

The broadcast was recorded, and afterwards the recording was posted for online listening, like a podcast (after quickly cutting out the moments with dropouts). Three hundred people have downloaded it so far.

The operator's Internet connection dropped three or four times during the broadcast. Each time, the administrator of the relay server promptly switched in the backup stream or started the track that was next in the script. The logic of Icecast's built-in fallback did not suit us.

During the first hour not a single call came in. Then people started calling, and the callers themselves sometimes did not believe they had got through. By the way, it is a rather funny feeling when a caller recognizes you by your voice over Skype and you do not know this person.

Conclusions and direction for further development

Radio was new to us, and we made several mistakes. The following points are worth noting:

Control turned out to be a little more complicated than it could have been. This is solvable: write a program with five switch buttons that controls the mixer via OSC or MIDI
Audacious is not the most convenient player for this job. It is lightweight and free of unwanted media-library features, but it lacks a simple crossfade, which would have been useful. It is also awkward to control programmatically, should a separate broadcast-control program be developed
While a track is playing, it goes to the conference at full volume, which makes it hard to talk there (off air). By the end of the broadcast I had figured out how to fix this, but it was too late. We need to add one more stereo input, "loud", which goes only on air and only for the duration of a track; the existing input becomes "quiet" and is routed always and everywhere (to the conference, the radio and the monitor). JACK is a sample-accurate server, so there is no need to fear phase comb-filter effects from summing identical signals.
The operator losing the Internet connection is the most noticeable of the problems. After the broadcast we devised and tested a way to set up the fallback logic so that it does work the way we need
When darkice stopped, jackd crashed for some reason. This is very unpleasant, because then all JACK applications have to be restarted and the routing set up again. And darkice stopped whenever it lost its connection to the Icecast server on the Internet. To keep an Internet outage from bringing down the whole system, we set up a local Icecast and configured the main one as a relay of the local one

Overall, however, we accomplished the task. Both the organizers and the audience liked it, and they asked us to repeat the broadcast and even to make such radio bridges a regular event.

Thanks for your attention!
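On the point about the "loud" and "quiet" player inputs: when two sample-aligned copies of the same signal are summed, their amplitudes simply add, so the resulting level is easy to estimate. The sketch below does that arithmetic for two fader gains given in dB. The function name sum_db and the gain values are made up for illustration and are not settings from the broadcast.

```shell
#!/bin/sh
# Level (in dB) of the sum of two sample-aligned copies of one signal,
# given the two fader gains in dB. Coherent copies add as amplitudes:
# 20*log10(10^(a/20) + 10^(b/20)).
sum_db() {
    LC_ALL=C awk -v a="$1" -v b="$2" 'BEGIN {
        amp = exp(log(10) * a / 20) + exp(log(10) * b / 20)
        printf "%.1f\n", 20 * log(amp) / log(10)
    }'
}

sum_db 0 0      # two equal copies: prints 6.0
sum_db -12 0    # quiet background at -12 dB under the loud copy at 0 dB
```

So with a background at -12 dB and the loud copy at 0 dB, the on-air level rises by roughly 2 dB while a track is being announced this way, which the output limiter should comfortably absorb.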
