A small investigation: how YouTube uses WebRTC for streaming

Original author: Chad Hart

WebRTC is the JavaScript API for video calls in modern browsers. And also for voice calls, screen sharing, NAT traversal, revealing local addresses, and other interesting things. Over the past couple of years, major players have begun switching from proprietary APIs and browser extensions to WebRTC: Skype for Web uses it, Hangouts partially does, and now so does YouTube's in-browser broadcasting. So far it works only in Chrome and with a five-second delay, but it's a start. Under the cut is a translation, adapted for Habr, of a detective story in which WebRTC experts dissect the client side of YouTube and explain what the developers at Google did and how.

Last Thursday, when logging into my YouTube account, I found a new camera icon with a "Go Live" prompt in the upper right corner (translator's note: it seems the feature has not been rolled out to all users yet; YouTube Red subscribers in the comments report having it). Naturally, I clicked on it immediately, and it appears we can now stream directly from the browser. It smelled of WebRTC, so I habitually opened chrome://webrtc-internals/, and yes, it was WebRTC. As developers, we have always been interested in large-scale uses of the technology, so I immediately contacted master reverse engineer Philipp "fippo" Hancke and asked him to dig into YouTube's internals. Below are the results of his work.


Chrome's built-in page webrtc-internals served us well back in 2014, when we studied how Hangouts works, and nothing prevented us from using it again. Since broadcasting is not available on a freshly enabled YouTube account for the first 24 hours, we used a dump kindly provided by Tsahi Levent-Levi (translator's note: yes, the same Tsahi who spoke with us on Intercom and whose posts we regularly translate). You can use this tool to load a dump into your own Chrome and see what is happening through the eyes of WebRTC.

Judging by what we saw, the new YouTube feature uses WebRTC only on the client side, to capture the video stream. On the server side they have something of their own. What does that mean? That it's not real-time. Although our good old friend Chris Kranky says the delay is less than five seconds. We really hope he digs out some interesting technical details.

In the meantime, here are the technical details we were able to pull out...

GetUserMedia calls


After importing the dump, at the very beginning we see the getUserMedia JavaScript API calls made by YouTube. The calls show that the service modestly requests a camera at 1080p resolution:


They also make a separate getUserMedia call to get the microphone.

This screenshot does not show the very first getUserMedia call, which requests the camera and microphone together so that the user sees only one browser permission prompt instead of two.
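As a sketch (not YouTube's actual code), the combined camera-plus-microphone request described above might look like this; the exact constraint values are assumptions:

```javascript
// Hypothetical sketch of the getUserMedia pattern described above;
// the constraint values are assumptions, not YouTube's real code.
function buildConstraints() {
  return {
    audio: true, // ask for the microphone in the same permission prompt
    video: {
      width: { ideal: 1920 },
      height: { ideal: 1080 },
    },
  };
}

// Only runs in a browser; guarded so the sketch stays self-contained.
if (typeof navigator !== 'undefined' && navigator.mediaDevices) {
  navigator.mediaDevices
    .getUserMedia(buildConstraints())
    .then((stream) => console.log('got tracks:', stream.getTracks().length))
    .catch((err) => console.error('getUserMedia failed:', err.name));
}
```

A single call with both `audio` and `video` is what produces the single permission prompt; two separate calls would prompt the user twice.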

RTCPeerConnection Calls


Having examined the getUserMedia calls, we can proceed to the RTCPeerConnection calls. If you want to learn more about WebRTC, I recommend reading the results of the previous study, "How Hangouts Works", or the more general material about webrtc-internals on our testRTC blog.



ICE Servers


The log shows that the RTCPeerConnection object was created with an empty list of ICE servers (translator's note: it is not surprising that this only works in Chrome so far; Firefox would not have allowed such an object to be created).

{
  iceServers: [],
  iceTransportPolicy: all, 
  bundlePolicy: balanced,
  rtcpMuxPolicy: require,
  iceCandidatePoolSize: 0
}

Further on it will become clear why TURN servers are not needed for this use case (translator's note: ICE is a "framework", a textual recipe for doing peer-to-peer between hosts stuck behind NAT with sad 192.168.x.x addresses. TURN servers are not the most important part of that framework; the most important are the STUN servers, which answer the fundamental question "what is my external IP address?". Without at least one STUN server specified, most WebRTC setups simply will not connect).
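For contrast, a typical browser-to-browser configuration (not what YouTube does here) would list at least one STUN server; the address below is Google's public STUN server, used purely as an illustration:

```javascript
// Illustrative only: a "normal" peer-to-peer configuration, unlike
// the empty iceServers list YouTube passes in the dump above.
const typicalConfig = {
  iceServers: [
    // Answers the question "what is my external IP address?"
    { urls: 'stun:stun.l.google.com:19302' },
  ],
  iceTransportPolicy: 'all',
  bundlePolicy: 'balanced',
  rtcpMuxPolicy: 'require',
  iceCandidatePoolSize: 0,
};

// Only construct the connection in a browser environment.
if (typeof RTCPeerConnection !== 'undefined') {
  const pc = new RTCPeerConnection(typicalConfig);
  console.log('signaling state:', pc.signalingState);
}
```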

Next, the client adds a MediaStream using the addStream API. Funny that this API is already declared deprecated. It is strange that the authors do not use the newer addTrack API, which is available natively starting from version 64 of Google Chrome, and in older versions via the adapter.js polyfill.

Signaling and setLocalDescription


After creating the RTCPeerConnection object, the client creates a WebRTC "offer" listing all the audio and video codecs available in Chrome. The offer, without modifications, is set as the local description using setLocalDescription. Incidentally, the absence of modifications means that simulcast (broadcasting several streams of different video quality simultaneously, which avoids transcoding everything on the server and reduces delay and load) is not used.

In accordance with WebRTC's logic, after setLocalDescription is called, Chrome gathers several "candidates": options for how a remote computer could try to connect to the local one. Most likely they are not used, since it is the client (Chrome) that connects out to the server (the YouTube backend).



Update: Finding the signaling server and the protocol used was not very difficult. Filtering the Chrome network log by the keyword "realtimemediaservice" shows an HTTP request and its response. No complicated schemes, no trickle-ICE connection-speed optimizations, no other magic: everything is as simple as possible.
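Such an exchange can be sketched as an ordinary HTTP POST of the offer SDP; the payload shape below is a guess inferred only from the "realtimemediaservice" keyword, not YouTube's real protocol:

```javascript
// Hypothetical sketch of offer/answer signaling over plain HTTP;
// the field names here are assumptions, not YouTube's actual schema.
function buildSignalingRequest(offerSdp) {
  return {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ type: 'offer', sdp: offerSdp }),
  };
}

// In a browser one would send it with fetch and feed the answer back:
//   const res = await fetch(signalingUrl, buildSignalingRequest(offer.sdp));
//   const answer = await res.json();
//   await pc.setRemoteDescription({ type: 'answer', sdp: answer.sdp });
```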

setRemoteDescription


The next step is a call to setRemoteDescription with the information received from the server, where, as we recall, WebRTC is not used. And here things get interesting! The SDP passed to setRemoteDescription looks as if it was produced on the other side by Chrome or a WebRTC library, with a complete list of codecs at the ready. And we know for sure that YouTube does not use ice-lite, as Hangouts does.

In the SDP received from the server, the H.264 codec is indicated as preferred (payload number 102; see here if you are interested in how SDP text packets are structured):

m=video 9 UDP/TLS/RTP/SAVPF 102 96 97 98 99 123 108 109 124


Examining the statistics (partially displayed after loading the dump) confirms that the H.264 codec is used; the curious can search the dump for the keyword "send-googCodecName".
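The preference order can be read straight off the m= line: the first payload type listed wins. A small parser, run on the line quoted above:

```javascript
// Parse an SDP m= line into media type, port, protocol, and the
// payload types in preference order (first listed = preferred).
function parseMLine(line) {
  const parts = line.trim().split(' ');
  return {
    media: parts[0].replace('m=', ''),
    port: Number(parts[1]),
    protocol: parts[2],
    payloadTypes: parts.slice(3).map(Number),
  };
}

const mline = parseMLine('m=video 9 UDP/TLS/RTP/SAVPF 102 96 97 98 99 123 108 109 124');
// 102 is the preferred payload type; the a=rtpmap lines elsewhere in
// the SDP map it to H.264.
console.log('preferred payload type:', mline.payloadTypes[0]); // 102
```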

In addition to the SDP answer, the server passes several connection candidates to Chrome:

a=candidate:3757856892 1 udp 2113939711 2a00:1450:400c:c06::7f 19305 typ host generation 0 network-cost 50
a=candidate:1687053168 1 tcp 2113939711 2a00:1450:400c:c06::7f 19305 typ host tcptype passive generation 0 network-cost 50
a=candidate:1545990220 1 ssltcp 2113939711 2a00:1450:400c:c06::7f 443 typ host generation 0 network-cost 50
a=candidate:4158478555 1 udp 2113937151 66.102.1.127 19305 typ host generation 0 network-cost 50
a=candidate:1286562775 1 tcp 2113937151 66.102.1.127 19305 typ host tcptype passive generation 0 network-cost 50
a=candidate:3430656991 1 ssltcp 2113937151 66.102.1.127 443 typ host generation 0 network-cost 50
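Each a=candidate line follows a fixed grammar: foundation, component, transport, priority, address, port, followed by key/value attribute pairs. A small parser, run on one of the lines above:

```javascript
// Parse an ICE candidate line from SDP into its named fields.
function parseCandidate(line) {
  const parts = line.replace('a=candidate:', '').split(/\s+/);
  const cand = {
    foundation: parts[0],
    component: Number(parts[1]),
    transport: parts[2].toLowerCase(),
    priority: Number(parts[3]),
    address: parts[4],
    port: Number(parts[5]),
  };
  // Remaining tokens come in key/value pairs: "typ host generation 0 ..."
  for (let i = 6; i + 1 < parts.length; i += 2) {
    cand[parts[i]] = parts[i + 1];
  }
  return cand;
}

const cand = parseCandidate(
  'a=candidate:4158478555 1 udp 2113937151 66.102.1.127 19305 typ host generation 0 network-cost 50'
);
console.log(cand.transport, cand.address + ':' + cand.port, cand.typ);
```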

We can see IPv4 and IPv6 UDP candidates, "ICE-TCP" candidates (yes, in a pinch WebRTC can run over TCP, although it doesn't like to), and Chrome's proprietary "SSL-TCP", which we saw earlier in Hangouts. In this situation a TURN server would not improve the chances of establishing a connection, since either way Chrome would be connecting to a public IP address. Apparently that is why no TURN server is used.

Codecs


There is no simulcast. Which is, in general, expected: Chrome has no H.264 simulcast. There is, however, a bug report with a sad lack of feedback. Overall, H.264 is a reasonable choice: the encoding side can use the video card to offload the work, and most players can play the format without transcoding.

Nevertheless, transcoding cannot be avoided entirely: without simulcast, the server has to create streams with lower bitrate and resolution for "weak" clients. Most likely YouTube already has transcoding as part of the infrastructure it has long used for streaming.

WebRTC Statistics


The statistics per se reveal nothing new. The most interesting graph is "picture loss indications" (PLI), data that the server sends (translator's note: WebRTC statistics are interesting because both local and remote statistics are collected at each end of the connection; we wrote about this last week):

[graph: pliCount over time]


pliCount increases every 10 seconds, and accordingly, every 10 seconds the client sends a keyframe to the server. Perhaps this is done to make it easier for YouTube's servers to record or transcode the video.
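That inference can be reproduced with a toy calculation over timestamped pliCount samples; the sample values below are illustrative, not taken from the actual dump:

```javascript
// Estimate the average seconds between PLIs (and hence forced
// keyframes) from successive (timestampMs, pliCount) samples.
function avgPliIntervalSeconds(samples) {
  const first = samples[0];
  const last = samples[samples.length - 1];
  const plis = last.pliCount - first.pliCount;
  if (plis <= 0) return Infinity; // no PLIs observed in this window
  return (last.timestampMs - first.timestampMs) / 1000 / plis;
}

// Made-up samples mimicking a counter that ticks every 10 seconds.
const samples = [
  { timestampMs: 0, pliCount: 0 },
  { timestampMs: 10000, pliCount: 1 },
  { timestampMs: 20000, pliCount: 2 },
  { timestampMs: 30000, pliCount: 3 },
];
console.log(avgPliIntervalSeconds(samples)); // 10
```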

Summary


YouTube uses WebRTC as a user-friendly way to get a video stream from the camera for streaming. Most likely this will not affect professional streamers with expensive, finely tuned rigs, but it significantly lowers the entry barrier for beginners.

Unfortunately, the feature does not work in Firefox. This is another example of Google launching solutions that work only in Chrome. Nils Ohlmeier of Mozilla tried to get it working by faking the user agent, but ran into the deprecated registerElement JavaScript API. Nevertheless, from the WebRTC point of view everything should work, so we will return to this question once the front-end bugs are fixed.

Update: Unfortunately, further study showed that the JavaScript code for this feature also uses the legacy webkitRTCPeerConnection API instead of the modern RTCPeerConnection. We look forward to the day the prefix is removed from Chrome.
