How to write your own VoIP application with background support on Windows Phone

    In this article, I would like to show how to write a simple VoIP application with its own backend that works in the background on the Windows Phone 8 platform, with a minimum of effort.

    Prior to the release of Windows Phone 8, users of VoIP applications were very disappointed with background operation, which was practically absent. The most a developer could do to signal an incoming call while the application was in the background was to show a toast notification, which is barely audible and quickly disappears. On the one hand, this kept the application from draining the battery the way a fully running background process would; on the other hand, it made such applications nearly useless for receiving calls. Before the release of WP8, Microsoft fueled public interest in the new version of the platform with promises to integrate Skype into the operating system and let it work in the background. They kept those promises, and it is now possible to:

    • initiate a Skype call from the phone's contact book
    • continue talking on Skype even if you intentionally or accidentally minimize the application (previously, accidentally hitting the search button during a conversation ended the call)
    • and, most interesting of all, receive incoming calls with an interface like that of a normal GSM call even when Skype is not running (not in the foreground) and, moreover, is doing nothing in the background (and so is not draining the battery)

    Microsoft did not keep these capabilities exclusive to its own product (except for integration into the contact book) and opened an API that lets third-party developers implement the same scenarios without being a privileged partner (as was the case in WP7 with the native SDK). And although you cannot integrate into the contact book as deeply, you can use ContactStore and protocol handlers to change the URL field of a contact and open the application when the user taps it.

    At the end of the article, the source code of two projects is attached. The first is Microsoft's Chatterbox sample, which shows how the background processes work, with a simulated backend, incoming calls and even video. The second is my project with a simple backend, which lets you talk over VoIP between two devices and uses VoIP push notifications. But first things first.



    Architecture of a VoIP application that works in the background


    If you set out to write a full-fledged VoIP application, then unfortunately (or fortunately) you cannot do without a native component in C++, because no adequate API for working with audio devices is available from managed code. In short, a VoIP application that can work in the background should consist of two processes plus a component that connects them:

    • Foreground is the normal process in which the application's user interface runs
    • Background is the second process, which essentially consists of four agents:
      • VoipHttpIncomingCallTask - starts when an incoming call arrives via the push channel (a special type of push notification, described below).
      • VoipForegroundLifetimeAgent - starts when our application becomes active and runs until the application is minimized or closed.
      • VoipCallInProgressAgent - starts when a call begins and signals that the process has been allocated more CPU resources to support the call. Video and audio encoding and decoding should therefore start only after this event.
      • VoipKeepAliveTask - runs periodically, every 6 hours. It is needed to remind your server from time to time that the application is still installed on the phone.
    • Out-of-process is an inter-process component that handles communication between the first two. It actually lives in the same second process.

    Graphically, it looks like this:

    [image: architecture diagram]

    How to write your own VoIP application?


    Let's start in order:

    1. Transport


    First, let's sort out the transport layer for our data. Of course, this is a very simple example that I built in a day, so there is nothing sophisticated here. As you understand, writing the transport and recording and playing audio is not enough: all of this also has to work quickly and without delays even over poor connections, but that is a topic for more than one book. For the transport we will use a very convenient class from the new API, DatagramSocket. It is simple and works over UDP, which is a more logical choice for audio and video streams (we don't need to wait for a delivery confirmation for every audio packet, right?). Thanks to async/await, working with it is very simple:

        const string host = "192.168.1.12";
        const string port = "12398";
        var socket = new DatagramSocket();
        socket.MessageReceived += (s, e) =>
            {
                // read the incoming data as a string
                var reader = e.GetDataReader();
                string message = reader.ReadString(reader.UnconsumedBufferLength);
            };
        // bind to a local port (service name), not to the remote host address
        await socket.BindServiceNameAsync(port);
        var stream = await socket.GetOutputStreamAsync(new HostName(host), port);
        var dataWriter = new DataWriter(stream);
        // send a string to the remote host
        dataWriter.WriteString("Hello!");
        await dataWriter.StoreAsync(); // hands the data to the OS system buffer for sending
    

    I'm so used to async/await that I used the same class on the server side as well (see here for how to use the WinRT API in a desktop application). The protocol is also very simple: COMMAND!BODY, which is enough for our example.
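
    To make the COMMAND!BODY format concrete, here is a minimal sketch of how such messages could be built and parsed. The WireMessage class and the "CALL" command used in the note below are hypothetical names chosen for illustration; the article does not show the project's actual helper or its command set.

```csharp
using System;

// Hypothetical helper for a COMMAND!BODY text protocol (names are illustrative).
public static class WireMessage
{
    private const char Separator = '!';

    // Joins a command and its body into one datagram payload.
    public static string Format(string command, string body)
    {
        if (string.IsNullOrEmpty(command) || command.IndexOf(Separator) >= 0)
            throw new ArgumentException("Command must be non-empty and must not contain '!'.", "command");
        return command + Separator + (body ?? string.Empty);
    }

    // Splits on the first '!' only, so the body may itself contain '!'.
    public static bool TryParse(string message, out string command, out string body)
    {
        command = null;
        body = null;
        if (message == null) return false;

        int index = message.IndexOf(Separator);
        if (index <= 0) return false; // no separator, or an empty command

        command = message.Substring(0, index);
        body = message.Substring(index + 1);
        return true;
    }
}
```

    With this sketch, WireMessage.Format("CALL", "bob") produces the string "CALL!bob", and TryParse splits it back into the command and the body. Splitting on the first separator only keeps the parser trivial while still allowing arbitrary text in the body.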

    2. Voice recording


    In managed code, there are two classes for recording data from the microphone:

    • XNA Microphone
    • AudioVideoCaptureDevice

    In our example we will use the first one (it has been available since WP7), since I personally could not figure out how to play audio from the second without using the native API. Of course, a serious VoIP application will have to use the second one (its StartRecordingToSinkAsync method gives a clean, uncompressed stream of data from the microphone). Recording data from the microphone takes just a couple of lines:

        _microphone = Microphone.Default;
        _microphone.BufferDuration = TimeSpan.FromMilliseconds(500);
        _microphoneBuffer = new byte[_microphone.GetSampleSizeInBytes(_microphone.BufferDuration)];
        _microphone.BufferReady += (s, e) =>
            {
                _microphone.GetData(_microphoneBuffer);
                // send the bytes to the server
            };
        _microphone.Start();
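
    To get a feel for the numbers involved: the XNA microphone delivers 16-bit mono PCM, so the buffer size is the sample rate times 2 bytes times the duration. Assuming a sample rate of 16 kHz (an assumption for illustration; the real value is whatever _microphone.SampleRate reports), a 500 ms buffer is 16,000 bytes, far more than fits into a single UDP datagram on a typical 1,500-byte MTU link without IP fragmentation. The article does not show how the original project frames audio on the wire; the sketch below is only one possible approach, and the AudioChunker name and the 1,024-byte chunk size are hypothetical.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: buffer-size arithmetic and splitting audio into datagram-sized chunks.
public static class AudioChunker
{
    // Bytes needed for 16-bit (2-byte) mono PCM of the given duration.
    public static int PcmBufferSize(int sampleRate, TimeSpan duration)
    {
        const int bytesPerSample = 2;
        return (int)(sampleRate * duration.TotalSeconds) * bytesPerSample;
    }

    // Copies the buffer into consecutive chunks of at most chunkSize bytes.
    public static List<byte[]> Split(byte[] buffer, int chunkSize)
    {
        if (buffer == null) throw new ArgumentNullException("buffer");
        if (chunkSize <= 0) throw new ArgumentOutOfRangeException("chunkSize");

        var chunks = new List<byte[]>();
        for (int offset = 0; offset < buffer.Length; offset += chunkSize)
        {
            int length = Math.Min(chunkSize, buffer.Length - offset);
            var chunk = new byte[length];
            Buffer.BlockCopy(buffer, offset, chunk, 0, length);
            chunks.Add(chunk);
        }
        return chunks;
    }
}
```

    Under these assumptions, one 500 ms buffer becomes 16 chunks (15 of 1,024 bytes and a final one of 640 bytes). A shorter BufferDuration would reduce both the chunk count and the latency added by buffering.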
    

    3. Play audio


    In our example we will use code that is far from optimal, but short and working:

        _soundEffect = new SoundEffect(e.Data, _microphone.SampleRate, AudioChannels.Mono);
        _soundEffect.Play();
    

    Unfortunately, there are no alternatives: managed code cannot play audio through the earpiece used for calls, only through the loudspeaker, hence the echo and other noise (this is a simple example, after all).

    4. VoIP push notifications


    The killer feature of our example is that if you install the application on two devices, you can call the other device through the application without the application being in the foreground on that device. First, both devices need to register a push URI on the server together with some user ID (in Skype this is an arbitrary name, in Viber it is the user's phone number). Then, when device A wants to call device B, it sends a command to the server; the server looks up the push URI for device B and sends MPNS an XML message with some data about the caller. The request must carry the header X-NotificationClass = 4. Before WP8 there were only three classes of push notifications:
    • Tile
    • Raw
    • Toast

    With WP8, a fourth class has been added: VoIP. MPNS delivers this packet to the client over its own channels and starts a ScheduledTaskAgent specifically for this purpose. If this agent does its job correctly, the user is presented with an incoming call screen (similar to a regular GSM call). So what should the ScheduledTaskAgent do?

        var incomingCallTask = task as VoipHttpIncomingCallTask;
        if (incomingCallTask != null)
        {
            // deserialize the XML with the caller's number and name
            Notification pushNotification;
            using (var ms = new MemoryStream(incomingCallTask.MessageBody))
            {
                var xs = new XmlSerializer(typeof(Notification));
                pushNotification = (Notification)xs.Deserialize(ms);
            }
            VoipPhoneCall callObj;
            var callCoordinator = VoipCallCoordinator.GetDefault();
            // request the GSM-call-like incoming call UI
            callCoordinator.RequestNewIncomingCall("/MainPage.xaml?incomingCall=" + pushNotification.Number,
                pushNotification.Name, pushNotification.Number, new Uri(defaultContactImageUri), "Voip.Client.Phone", 
                new Uri(appLogoUri), "I'm a VoIP push!", new Uri(logoUrl), VoipCallMedia.Audio, 
                TimeSpan.FromMinutes(5), out callObj);
            // the handler awaits below, so the lambda must be marked async
            callObj.AnswerRequested += async (s, e) =>
                {
                    s.NotifyCallActive(); // launches our application
                    // what follows is a small workaround for this example:
                    // as noted above, managed code cannot play audio through
                    // the earpiece, and NotifyCallActive switches to exactly that
                    // earpiece, with no way to play sound through the loudspeaker,
                    // so ending the call here turns the earpiece off and the loudspeaker back on
                    await Task.Delay(3000);
                    s.NotifyCallEnded();                    
                };
            callObj.RejectRequested += (s, e) => s.NotifyCallEnded();
        }
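
    The server side of this exchange, the request to MPNS described above, might look roughly like the following sketch. It only builds the HTTP request and does not send it. The Notification class mirrors the two fields the agent reads (Name and Number), but its exact shape, the VoipPush class name and the text/xml content type are assumptions made for illustration; only the X-NotificationClass = 4 header comes from the description above.

```csharp
using System;
using System.IO;
using System.Net.Http;
using System.Net.Http.Headers;
using System.Xml.Serialization;

// Assumed payload shape: the agent above reads Name and Number from it.
public class Notification
{
    public string Name { get; set; }
    public string Number { get; set; }
}

// Hypothetical server-side helper (not taken from the original project).
public static class VoipPush
{
    // Builds, but does not send, the HTTP POST addressed to the device's push URI.
    public static HttpRequestMessage BuildRequest(Uri pushUri, Notification notification)
    {
        byte[] body;
        using (var stream = new MemoryStream())
        {
            // Serializing to a stream keeps the XML bytes and the encoding consistent.
            new XmlSerializer(typeof(Notification)).Serialize(stream, notification);
            body = stream.ToArray();
        }

        var content = new ByteArrayContent(body);
        content.Headers.ContentType = new MediaTypeHeaderValue("text/xml"); // assumed content type

        var request = new HttpRequestMessage(HttpMethod.Post, pushUri) { Content = content };
        // Class 4 marks the message as a VoIP push.
        request.Headers.Add("X-NotificationClass", "4");
        return request;
    }
}
```

    The resulting request can then be passed to HttpClient.SendAsync. Using the same class and XmlSerializer on both sides is a simple way to keep the payload in sync with what the agent deserializes.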
    

    It is worth noting that a VoIP push, unlike all other types, can arrive both while the application is open and while it is closed. Skype accepts incoming calls only through push, even when it is in the foreground, which is a debatable decision because VoIP pushes are sometimes delayed. Alas, our example cannot pick up the call if the VoIP push arrives while the application is running: it has no native inter-process component to inform the main process about the push. (OnNavigatedTo and OnNavigatedFrom are not called when the incoming call UI appears; you can handle the Obscured event on the frame, but it does not give us the caller's number.) In my example, therefore, the receiving side has to exit the application for the call to be picked up correctly.

    Conclusion


    All of this was enough to write a simple VoIP application in a day. Alas, it can only talk through the loudspeaker, cannot turn off the screen when the phone is held to the ear (proximity sensor), and cannot continue a conversation when the application is minimized. All of that requires a native component, which is covered in detail in the Microsoft Chatterbox sample; my example is simpler, but comes with a server side. Initially I only wanted to talk about VoIP push, but it turned out to be a little more. Of course, for full-fledged VoIP applications it is better to look towards the rapidly developing WebRTC, which, by the way, already officially works in Chrome on Android. Still, I hope my example will be useful to someone.

    Sources:
