Video calls from browser to SIP

    In a previous article, I surveyed the available ways to organize voice communication in a browser. This time the task is harder: we want to make video calls from a browser to a remote party using a softphone or a hardware device that supports SIP. This can be needed, for example, when:
    • we want to build an online consultation system for online stores, so that website visitors can hold a video conversation with a consultant who uses an ordinary messenger;
    • we want to extend a Polycom-based teleconferencing system so that participants who have nothing but a browser can join.


    Technology


    I will not repeat all the reasoning from the previous article, but go straight to the conclusions. If we want:
    • support for all desktop browsers
    • no additional software to install
    • resilience to network disturbances
    • minimal latency
    then at the moment we have no choice but to build on Adobe Flash Player and the RTMFP protocol, however sad that may sound. A brighter future is just around the corner: Google has promised to add support for the very interesting WebRTC technology to Chrome soon, and I will cover it in a separate article. In the meantime, we use what users already have.

    Video support in Adobe Flash Player


    Flash can currently play streams compressed with several codecs:
    • H264
    • Sorenson Spark (H263)
    • On2 VP6
    • Flash Screen Video
    Of all this "wealth," we are only interested in H264, because it is almost impossible to find support for the other options in softphones and SIP devices.

    With capturing and encoding video from the camera, things are much worse. Support for H264 encoding appeared only in the recently released FP11; before that, the only option was Sorenson Spark. Unfortunately, the vast majority of users have not yet installed version 11, so we have to account for those who have only FP10.



    We must also not forget who we are dealing with. Adobe managed to break playback of certain kinds of H264 streams in Flash Player versions 11.0 - 11.2. The problem affects streams packetized with packetization-mode 0, and this is exactly the mode most softphones use. Details about the bug can be found in Adobe's bug tracker.

    The result is the following picture. To connect to a SIP client over H264 successfully, we need to:
    • transcode Sorenson -> H264 in one direction if the user has an FP version lower than 11
    • transcode H264 -> H264 (to change the packetization) in one direction if the user has FP 11 with the bug described above
    • pass the traffic through as is in all other cases.
    A combination of ffmpeg and libx264 is well suited for transcoding. For transcoding performance, it is extremely important that the server's CPU supports MMX, SSE and similar instruction sets, the newer the better. Video codecs can use them and run several times faster as a result.
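
    The three cases above can be sketched as a small decision function. This is only an illustration: the function and enum names are hypothetical, the buggy range is assumed to be 11.0 through 11.2 inclusive, and the direction of the H264 -> H264 step (SIP to browser) is my reading of the fact that the bug affects playback.

```python
from enum import Enum
from typing import Dict, Tuple


class Action(Enum):
    PASS_THROUGH = "pass through"
    TRANSCODE_SORENSON_TO_H264 = "Sorenson -> H264"
    TRANSCODE_H264_TO_H264 = "H264 -> H264 (repacketize)"


def plan_video_path(fp_version: Tuple[int, int]) -> Dict[str, Action]:
    """Pick the server-side action per direction for a (major, minor) FP version."""
    plan = {
        "browser_to_sip": Action.PASS_THROUGH,
        "sip_to_browser": Action.PASS_THROUGH,
    }
    major, _minor = fp_version
    if major < 11:
        # FP10 and older can only capture Sorenson Spark (assumption: they
        # still play incoming H264 without help).
        plan["browser_to_sip"] = Action.TRANSCODE_SORENSON_TO_H264
    elif (11, 0) <= fp_version <= (11, 2):
        # Assumed buggy range: playback of packetization-mode 0 streams fails,
        # so the stream coming from the SIP side is re-encoded.
        plan["sip_to_browser"] = Action.TRANSCODE_H264_TO_H264
    return plan
```

    For example, `plan_video_path((10, 3))` transcodes only the browser-to-SIP direction, while a version newer than 11.2 leaves both directions untouched.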

    Video is not voice

    At first glance, it might seem that the only difference between video and voice transmission is the bandwidth required. The bandwidth certainly differs, but there are a number of other significant differences.

    An audio stream is usually divided into frames of 10-20 ms, each of which is encoded and decoded independently of the rest. For video, this would be too wasteful, so for most frames the encoder stores not the image itself but its difference from the previous frame. For even better compression, the difference is taken against a slightly "shifted" previous frame to compensate for the movement of objects. Whole series of articles could be written about video compression, so I will not dwell on it here.
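
    A toy illustration of the idea, not a real codec: frames are short lists of pixel values, a key frame stores the values themselves, and every other frame stores only the difference from the previous one. Motion compensation is left out, and the names and the key frame interval are made up for the example.

```python
from typing import List, Tuple

Frame = List[int]
Encoded = List[Tuple[str, Frame]]


def encode(frames: List[Frame], key_interval: int = 4) -> Encoded:
    """Store a full key frame every key_interval frames, deltas otherwise."""
    encoded: Encoded = []
    previous: Frame = []
    for index, frame in enumerate(frames):
        if index % key_interval == 0:
            encoded.append(("key", list(frame)))
        else:
            encoded.append(("delta", [c - p for c, p in zip(frame, previous)]))
        previous = frame
    return encoded


def decode(encoded: Encoded) -> List[Frame]:
    """Rebuild frames; each delta is applied on top of the previous result."""
    decoded: List[Frame] = []
    previous: Frame = []
    for kind, data in encoded:
        if kind == "key":
            current = list(data)
        else:
            current = [p + d for p, d in zip(previous, data)]
        decoded.append(current)
        previous = current
    return decoded
```

    Note that a delta frame is meaningless on its own: the decoder can only apply it to the frame it reconstructed just before.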

    Something else is important. If we lose one frame of the audio stream, we can simply mask it, for example by playing the previous frame again, and few people will notice. This trick does not work for video, because subsequent frames are applied on top of the lost one. The result is artifacts that will not go away by themselves until you ask the remote side to send an independently compressed key frame. In SIP, this can be done in two ways: at the signaling level via SIP INFO, or at the media level via RTCP.
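
    A minimal sketch of what the two requests can look like. The article does not say which exact messages are meant, so this assumes the commonly used ones: the `picture_fast_update` XML body from RFC 5168 for SIP INFO (sent with Content-Type `application/media_control+xml`), and the Picture Loss Indication feedback packet from RFC 4585 for RTCP. The function name and the SSRC values are hypothetical.

```python
import struct

# Body of a SIP INFO request asking the remote encoder for a key frame
# (RFC 5168). The SIP headers around it are not shown.
SIP_INFO_FAST_UPDATE_BODY = (
    '<?xml version="1.0" encoding="utf-8" ?>'
    "<media_control><vc_primitive><to_encoder>"
    "<picture_fast_update/>"
    "</to_encoder></vc_primitive></media_control>"
)

RTCP_PT_PSFB = 206  # payload-specific feedback
RTCP_FMT_PLI = 1    # Picture Loss Indication


def build_rtcp_pli(sender_ssrc: int, media_ssrc: int) -> bytes:
    """Build a 12-byte RTCP PLI packet (RFC 4585, section 6.3.1)."""
    first_byte = (2 << 6) | RTCP_FMT_PLI  # version 2, no padding, FMT=1
    length_words = 2  # packet length in 32-bit words, minus one
    return struct.pack(
        "!BBHII", first_byte, RTCP_PT_PSFB, length_words, sender_ssrc, media_ssrc
    )
```

    In practice a PLI is usually sent inside a compound RTCP packet alongside a receiver report; the sketch shows only the feedback packet itself.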

    Further, we have to take into account the MTU of the path between the participants, which is usually about 1500 bytes (IP fragmentation cannot be relied on when NAT is involved). Any audio frame fits within this limit, but a video frame most often does not. Hence the need to split frames into pieces, which is called packetization, and this is what the bug in some Flash Player versions is related to.
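
    As an illustration of splitting, here is a sketch of FU-A fragmentation from RFC 6184, the RTP payload format for H264. Keep in mind that FU-A belongs to packetization-mode 1; in mode 0 every NAL unit must fit into a single RTP packet, so the encoder itself has to produce small enough slices. The function name and the 1400-byte payload budget are assumptions for the example, and RTP headers are not shown.

```python
from typing import List

FU_A_TYPE = 28  # NAL unit type reserved for FU-A fragments


def fragment_fu_a(nal: bytes, max_payload: int = 1400) -> List[bytes]:
    """Split one H264 NAL unit into RTP payloads no larger than max_payload."""
    if len(nal) <= max_payload:
        return [nal]  # small enough: send as a single NAL unit packet

    nal_header = nal[0]
    fu_indicator = (nal_header & 0xE0) | FU_A_TYPE  # keep F and NRI bits
    nal_type = nal_header & 0x1F
    body = nal[1:]            # the original header byte is not repeated
    chunk = max_payload - 2   # room for FU indicator + FU header

    payloads = []
    for offset in range(0, len(body), chunk):
        start_bit = 0x80 if offset == 0 else 0
        end_bit = 0x40 if offset + chunk >= len(body) else 0
        fu_header = start_bit | end_bit | nal_type
        payloads.append(bytes([fu_indicator, fu_header]) + body[offset:offset + chunk])
    return payloads
```

    The receiver rebuilds the NAL header from the F/NRI bits of the FU indicator and the type bits of the FU header, then concatenates the fragment bodies in order.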

    Result


    As a result, if you carefully work around all the pitfalls and cushion every spot where you might fall, you can get a fully working solution. We managed to integrate support for video calls from the browser into our cloud platform RTCKit, which in turn lets you embed this functionality in any web service in a matter of hours, saving a lot of time. You can test all this without registering on our test page. The video resolution there is limited to 352x288. We tested with the Jitsi and Linphone softphones; it would be interesting to hear feedback about other clients with H264 support. We will try to withstand the load from the Habr effect! An important note: if you call through RTCKit from browser to browser and your NAT is friendly enough, then RTMFP peer-to-peer is used instead of everything described above.

    In future articles, we will cover voice and video conferencing, call routing, and interaction with mobile devices. Stay tuned.
