Experience using WebRTC. Yandex lecture

    Which is better for software development, native or web technologies? That holy war will not end any time soon, but few would argue against making native capabilities available in browsers and WebViews as well. Calling applications once existed only outside the browser; now they are easy to implement on the web too. Developer Grigory Kuznetsov explains how to use WebRTC to establish P2P connections.


    - As you all know, quite a few applications have appeared recently that are built on direct data exchange between two browsers, that is, on P2P: all kinds of messengers, chats, calling apps, and video conferencing tools. There are also applications that perform distributed computing of some kind. The possibilities are limited only by imagination.

    How is such a technology built? Imagine we want to make a call from one browser to another, and let's think through the steps needed to get there. First of all, a call is our picture and our voice, so we need access to the media devices connected to the computer: the camera and the microphone. Once we have access, the two browsers, the two clients, need to find each other. We have to help them connect somehow, reach each other, and exchange meta-information.

    Once they have reached each other, they need to start transferring data in P2P mode, that is, to carry the media streams. Now we have all the necessary pieces and are ready to build our shiny new bicycle. But that is a joke: we are engineers, and we understand that this would be expensive, unjustified, and risky. So, like proper engineers, let's first look at what solutions already exist.

    First, there is the old, dying Adobe Flash technology. It really is dying: Adobe will end support for it by 2020. The technology does let you access your media devices, and inside it you can implement all the mechanics needed to help browsers connect and start transmitting information P2P, but you would be reinventing the wheel, because there is no single standard, no unified approach, to implementing this kind of data transfer.

    You can write a browser plugin. This is how Skype works in browsers that do not support more advanced technologies. You would have to reinvent the wheel again, because there is no single standard, and it is bad for users: the user has to install a plugin in the browser, an extra step that users dislike and avoid.

    And there is WebRTC: Google Hangouts and Facebook Messenger are built on it, and Voximplant uses it so you can make calls. Let's look at it in more detail. It is a new, evolving technology: it appeared in 2011 and keeps developing. What does it let you do? Get access to the camera and microphone. Establish a P2P connection between two computers, two browsers. Naturally, it lets you transfer media streams in real time. Beyond that, it lets you transfer arbitrary information: any binary data can also be sent P2P, so you can build your own distributed computing system.

    An important point: WebRTC does not give browsers a way to find each other. We can form all the necessary meta-information about ourselves, but how does one browser learn that another exists? How do we connect them? Consider an example.



    There are two clients. The first client wants to call the second. WebRTC gives each of them all the information needed to identify itself. But the question remains: how does one browser find the other, how do we deliver this meta-information, how do we initiate the call? That is left to developers. We can use absolutely any method: take the meta-information, print it on paper, send it by courier, have the other side type it in, and everything will work.

    Or we can come up with some kind of signaling mechanism: a third-party mechanism that, knowing about our clients, transfers between them the information needed to establish a connection.

    Consider an example with a signaling server. The signaling server keeps a persistent connection to our clients, for example over WebSockets or HTTP. The first client generates its meta-information and sends it to the signaling server over WebSockets or HTTP, along with some identifier of whom it wants to connect to, for example a nickname.

    The signaling server uses this identifier to determine which client the meta-information should be forwarded to, and forwards it. The second client takes it, applies it on its side, forms a response, and sends it via the signaling mechanism back to the signaling server, which in turn relays it to the first client. At this point both clients have all the data and meta-information needed to establish a P2P connection. Done.
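    The relay logic at the heart of such a signaling server can be sketched as a small routing function. The message shape ({ to, from, payload }) and the nickname-to-sender map are assumptions made for illustration, not part of the lecture:

```javascript
// Minimal sketch of a signaling server's relay step.
// `clients` maps a nickname to a send function (for example, a WebSocket's
// send method); the server never inspects the forwarded payload.
function relay(clients, message) {
  const target = clients.get(message.to);
  if (!target) {
    return false; // unknown recipient: nothing to forward
  }
  // Forward the SDP or ICE payload untouched to the addressed client.
  target({ from: message.from, payload: message.payload });
  return true;
}
```

    In a real server each entry in `clients` would wrap an open WebSocket; here it is just a function, which also makes the logic easy to test.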

    Let's take a closer look at what exactly the clients exchange: SDP, the Session Description Protocol.



    It is essentially a text document containing everything needed to establish a connection: the IP address, the ports in use, what kind of data will flow between the clients, whether it is audio or video, which codecs are used. Everything we need is there.

    Pay attention to the second line. It contains the client's IP address, 192.168.0.15. Obviously, this is the address of a computer on a local network. If we have two computers, each on its own local network, each knowing only its address within that network, and they want to call each other, will they manage with such a description? Obviously not: they do not know their external IP addresses. What do we do?
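    To see where that address sits, a tiny parser can pull the connection (`c=`) line out of an SDP text. The SDP fragment below is shortened and illustrative, not a complete session description:

```javascript
// Extract the connection address from an SDP description.
// Returns the IP from the first `c=` line, or null if none is present.
function connectionAddress(sdp) {
  for (const line of sdp.split(/\r?\n/)) {
    if (line.startsWith('c=')) {
      // Format: c=<nettype> <addrtype> <address>, e.g. "c=IN IP4 192.168.0.15"
      return line.split(' ')[2] || null;
    }
  }
  return null;
}

// Shortened, illustrative SDP fragment.
const sampleSdp = [
  'v=0',
  'o=- 20518 0 IN IP4 192.168.0.15',
  'c=IN IP4 192.168.0.15',
  'm=audio 54400 RTP/SAVPF 0 96'
].join('\n');
```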



    Let's step aside and look at how NAT works. On the Internet, many computers are hidden behind routers. There are local networks within which computers know their own addresses, and a router with an external IP address; from the outside, all these computers appear under the router's IP address. When a packet from a computer on the local network reaches the router, the router decides where to forward it. If the destination is on another local network, it simply relays the packet; if the packet must go outside, to the Internet, an entry is added to a translation table.



    Into that table we record the internal IP address of the computer sending the packet and its port, substitute the external IP address, the router's address, and substitute the port as well. Why the port? Imagine two computers accessing the same resource: we need to route the response packets back correctly. We distinguish them by port: the port is unique for each computer, while the external IP address is the same.
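    This port-substitution step can be modeled as a translation table keyed by the allocated external port. Everything below, class name, port range, and addresses, is a toy illustration of the idea, not how a real router is implemented:

```javascript
// Toy model of a NAT translation table.
// Outbound: record (internal ip, port) and hand out an external port.
// Inbound: look up which internal host a response port belongs to.
class NatTable {
  constructor(externalIp) {
    this.externalIp = externalIp;
    this.byExternalPort = new Map();
    this.nextPort = 40000; // arbitrary starting point for allocated ports
  }
  // Called when an internal host sends a packet to the Internet.
  outbound(internalIp, internalPort) {
    const externalPort = this.nextPort++;
    this.byExternalPort.set(externalPort, { internalIp, internalPort });
    return { externalIp: this.externalIp, externalPort };
  }
  // Called when a response packet arrives at the router from outside.
  inbound(externalPort) {
    return this.byExternalPort.get(externalPort) || null;
  }
}
```

    Two internal hosts contacting the same resource get distinct external ports, which is exactly how the router tells their response packets apart.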

    So how do we live with NAT, when computers appear outside under a single IP address and know only their internal addresses?

    ICE, Interactive Connectivity Establishment, comes to the rescue. It describes how to traverse NAT, how to establish a connection when NAT is in the way.

    This framework relies on a so-called STUN server.



    It is a special server that, when queried, tells you your external IP address. So, in the process of establishing a P2P connection, each client queries the STUN server to learn its external IP address, forms an additional piece of information, an ICE candidate, and exchanges it via the signaling mechanism. Then the clients know each other's correct IP addresses and can establish a P2P connection.

    However, there are more complicated cases, for example when a computer sits behind a double NAT. For those, the ICE framework prescribes using a TURN server.



    This is a special server that turns the client-to-client, P2P connection into a client-server-client connection, that is, it acts as a relay. The good news for developers is that whichever of the three scenarios the connection is established under, whether we are on the same local network or need a STUN or TURN server, the API is identical for us. We simply specify the configuration of STUN and TURN servers up front, say how to reach them, and the technology does everything else under the hood.
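    In code, that configuration is a plain object passed to RTCPeerConnection. The server URLs and credentials below are placeholders, not real servers:

```javascript
// ICE server configuration for an RTCPeerConnection.
// All hostnames and credentials here are placeholders.
const iceConfiguration = {
  iceServers: [
    { urls: 'stun:stun.example.org:3478' },   // learns our external IP
    {
      urls: 'turn:turn.example.org:3478',     // relay of last resort
      username: 'demo-user',                  // TURN requires credentials
      credential: 'demo-password'
    }
  ]
};

// In a browser this would be used as:
//   const pc = new RTCPeerConnection(iceConfiguration);
```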



    A brief summary. To establish a connection, we need to choose and implement some kind of signaling mechanism, an intermediary that will help us deliver meta-information. WebRTC will give us all the meta-information we need for that.

    We have to fight NAT, our main enemy at this stage. To get around it, we use a STUN server to learn our external IP address, and a TURN server as a relay.

    What exactly do we transmit? Let's talk about media streams.



    Media streams are channels that contain tracks. The tracks within a media stream are synchronized: audio and video will not drift apart, they arrive with a single timeline. A media stream can contain any number of tracks, and the tracks can be controlled individually; for example, you can mute the audio while keeping the picture. You can also transmit any number of media streams, which lets you implement, say, a conference.
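    Per-track control comes down to flipping a track's `enabled` flag. A small helper for muting, sketched to work on anything with the MediaStream shape (the helper name is ours, and the mock object in the test stands in for a real MediaStream):

```javascript
// Mute or unmute every audio track in a stream, leaving video untouched.
// Works with any object exposing getAudioTracks(), such as a MediaStream.
function setAudioEnabled(stream, enabled) {
  for (const track of stream.getAudioTracks()) {
    track.enabled = enabled; // disabled tracks send silence/black frames
  }
  return stream;
}
```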

    How do we access media from the browser? Let's talk about the API.



    There is a getUserMedia method that takes a set of constraints as input. This is a special object where you specify which devices you want to access, which camera, which microphone, and what characteristics you want, such as the resolution. There are also two arguments, successCallback and errorCallback, called on success or failure. More modern implementations of the technology use promises instead.
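    A typical constraints object might look like this; the concrete values are illustrative, not recommendations. The promise-based call shown in the comment is the modern `navigator.mediaDevices.getUserMedia` form:

```javascript
// Constraints for getUserMedia: which devices and what quality we want.
const constraints = {
  audio: true,                 // any available microphone
  video: {
    width:  { ideal: 1280 },   // preferred resolution, not a hard requirement
    height: { ideal: 720 },
    facingMode: 'user'         // front-facing camera on mobile devices
  }
};

// In a browser (promise-based form):
// navigator.mediaDevices.getUserMedia(constraints)
//   .then(stream => { /* attach the stream to a <video> element */ })
//   .catch(err => { /* access denied or no matching device */ });
```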

    There is also a handy enumerateDevices method that returns a list of all media devices connected to the computer. This lets you show them to the user, draw some kind of selector, so the user can choose which camera to use.



    The central object in the API is RTCPeerConnection. To establish a connection, we instantiate RTCPeerConnection, which gives us a peerConnection object. As its configuration we pass the set of ICE servers, that is, the STUN and TURN servers we will contact while establishing the connection. And there is an important onicecandidate event that fires every time we need the help of our signaling mechanism. That is, WebRTC has made a request, say, to the STUN server, learned our external IP address, a newly formed ICE candidate has appeared, and we need to send it via the third-party mechanism, so the event fires.
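    Wiring this up can be factored so the connection object is injected, which also lets the sketch run outside a browser with a mock. The `send` function and message shape are assumptions for illustration:

```javascript
// Forward locally gathered ICE candidates to the remote peer.
// `pc` is an RTCPeerConnection (or a compatible mock); `send` delivers
// a message to the other side via the signaling server.
function wireIceCandidates(pc, send) {
  pc.onicecandidate = (event) => {
    // event.candidate is null once candidate gathering has finished.
    if (event.candidate) {
      send({ type: 'candidate', candidate: event.candidate });
    }
  };
  return pc;
}

// In a browser, roughly:
//   wireIceCandidates(new RTCPeerConnection(config), msg => ws.send(JSON.stringify(msg)));
```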



    When we have set up the connection object and want to initiate a call, we use the createOffer() method to form the initial SDP, the offer, the same meta-information that must be sent to the other party.

    To install it in the peerConnection, we use the setLocalDescription() method. The other party receives this information via the signaling mechanism, installs it with setRemoteDescription(), and generates a response with createAnswer(), which is sent back to the first client, again via the signaling mechanism.
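    The offer/answer exchange can be sketched as two small async functions over an injected connection object (an RTCPeerConnection in the browser, a mock elsewhere). The function names are ours, not part of the lecture:

```javascript
// Caller side: create an offer and install it as the local description,
// then hand it to the signaling mechanism.
async function makeOffer(pc, send) {
  const offer = await pc.createOffer();
  await pc.setLocalDescription(offer);
  send({ type: 'offer', sdp: offer }); // travels via the signaling server
}

// Callee side: accept the remote offer, produce an answer, send it back.
async function acceptOffer(pc, offer, send) {
  await pc.setRemoteDescription(offer);
  const answer = await pc.createAnswer();
  await pc.setLocalDescription(answer);
  send({ type: 'answer', sdp: answer });
}
```

    Injecting `pc` and `send` keeps the handshake logic separate from both the browser API and the transport used for signaling.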



    Once we have obtained access to the media and received a media stream, we attach it to our P2P connection with the addStream method, and the other party learns about it when the onaddstream event fires on its side. It receives our stream and can render it.



    You can also work with data channels. It is very similar to creating an ordinary peerConnection: you just specify RtpDataChannels: true and call the createDataChannel() method. I will not dwell on this in detail, because working with data channels is very much like working with WebSockets.
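    A minimal sketch of that setup, with the connection object injected so it can be exercised with a mock (the channel label and helper name are ours):

```javascript
// Open a data channel on an existing peer connection and attach a
// message handler, much like one would with a WebSocket.
// `pc` is an RTCPeerConnection (or a compatible mock).
function openChatChannel(pc, onMessage) {
  const channel = pc.createDataChannel('chat'); // label is arbitrary
  channel.onmessage = (event) => onMessage(event.data);
  return channel;
}
```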

    A few words about security. WebRTC works only over HTTPS: your site must have a certificate. The media streams are encrypted too, using DTLS. The technology does not require installing anything extra, no plugins, which is good. And you will not be able to build a spy application: a site cannot silently eavesdrop on or watch the user. The browser shows the user a special prompt requesting access, and the site gets access to the audio and video devices only if the user allows it.



    As for browser support, IE remains red and will stay that way. At the end of last year Safari gained support, so all modern browsers can now work with this technology, and we can safely use it.

    I want to share a set of utilities that will help if you decide to work with WebRTC. First of all, there is adapter.js. The technology is evolving all the time, and browser APIs differ; the adapter library smooths over these differences and makes work easier. PeerJS is a convenient library for working with data channels. You can also look at open-source implementations of STUN and TURN servers. A large collection of tutorials, examples, and articles lives on the awesome-webrtc page; highly recommended.

    The last utility, useful for debugging, is webrtc-internals. During development you can type a special address into the address bar; in Chrome it is chrome://webrtc-internals. You will see a page with all the information about your current WebRTC connection: the sequence of method calls, all the session descriptions the browsers exchanged, and graphs characterizing the connection. In short, everything you will need while debugging and developing. Thank you for your attention.
