DevConf: how VKontakte moved to its own platform for live broadcasts
DevConf 2018 will be held on May 18 in Moscow at Digital October, and we continue publishing talks from last year's conference. Next up is a talk by Alexey Akulovich of VKontakte, about exactly what attentive readers have already guessed from the title.
In 2015, we used a third-party solution. We embedded its player on the site, much like a YouTube embed, and it worked. Not perfectly, but at the time it suited us in terms of the number of broadcasts we could run through it, as well as the quality and the latency it gave. But we outgrew it fairly quickly.
The first reason was latency. When viewers are asking questions in chat and the broadcast is 40 seconds behind, that is unacceptable in most cases. Most of the delay comes from delivering the signal from the streamer's computer to the viewer (there is a useful Habr post on the subject). The main protocols for streaming video are RTMP, RTSP, HLS and DASH. The first two work without repeated requests: we connect to the server once and it simply pushes the stream data to us. The delay is minimal, it can be under a second, so there everything is fine.
HLS and DASH are HTTP-based protocols that make a new request for every chunk of data, and that creates problems. As soon as we start playing the first fragment, we have to request the second one right away, so that by the time the first one finishes the second is already downloaded and parsed; that is what keeps playback continuous. So the minimum delay with these protocols is two fragments, and one fragment is a few seconds long. Achieving an acceptable delay with these formats is therefore impossible.
There are two ways a stream can reach us. The first is through our own applications, where we fully control the resolution, codecs and so on, and can pass the signal on to viewers without any processing. The second is when we let people stream with anything, any software. In that case we have no control over codecs, resolution or quality, yet we receive that input signal and still have to deliver it to viewers in good quality. For example, if someone streams a game in 4K and a viewer tries to watch it on a phone over 3G and sees nothing, we are the ones who get blamed. So a third-party signal has to be processed on our side and delivered to the viewer at a suitable resolution.
Based on the above, we settled on the following protocols:
- For streams without processing: RTMP, to keep the delay minimal.
- For processed streams and as a fallback: HLS, since processing and transcoding already introduce a planned delay anyway.
At the time, we knew about the following solutions.
Red5
Red5 was considered because the team already knew about it, but from the bad side, so we did not even test it.
Erlyvideo
Erlyvideo is a Russian product, quite popular, and its developers spoke at conferences. But they were not interested in cooperation at all: download it and figure it out yourself. We wanted to get everything running as quickly as possible, so we decided not to get bogged down with it and left it for later, in case nothing better turned up.
Wowza
Wowza was being used in a friendly project at the time, so we had someone to ask about it, and we took it for testing.
Pros: it really can do a lot.
Cons:
- Forum-driven configuration. There is documentation, but to find anything you have to google, and in Google all the links lead to forums. Every solution we found, we found on a forum.
- XML everywhere. We even had to paste chunks of XML into the browser interface.
- To get callbacks for such simple things as “user started broadcasting”, “user finished” or “authorization check”, you have to write a Wowza module in Java.
Serious cons:
- We launched 4-5 broadcasts on a test machine (16 GB RAM + 4 GB swap) that essentially watched themselves, with no user traffic at all, and Wowza ate up all the memory until the machine started choking. We had to restart it every day.
- Sometimes Wowza corrupted streams when a streamer reconnected: it wrote the recording to disk but then could not play it back itself. We wrote to support, but they did not help. This was probably the deciding factor in dropping Wowza; the rest we could have lived with.
Nimble Streamer
Another Russian product, and a much friendlier one. All the features we needed were either ready or planned for the near future. It is written in C++ and can in theory run even on a Raspberry Pi, i.e. in terms of resource consumption it is a cut above Java. They also had a library for embedding a player on mobile devices (iPhone, Android).
What does it look like?
- The server binary itself: closed source, but free.
- Paid licenses for the transcoder.
- A paid external control panel, which is far more convenient than Wowza's: no digging through forums, everything is configured with the mouse.
The same goes for configuring transcoding and signal processing: you drag and drop components with the mouse and everything is set up visually, without restarts.
So, the pros:
- A great control panel.
- Resource consumption incomparably better than Wowza's.
- A convenient, well-designed API.
- Lower total cost than Wowza.
Cons:
When we stream over HLS, the player requests a manifest file over HTTP that lists the available qualities. The catch is that Nimble generates this file on the fly, based on the request it received: if the request came over https, it writes https links; if over http, then http links. In our setup Nimble did not face viewers directly but sat behind nginx, so the viewer connects over https while Nimble returns http links, and the player cannot play them. The only solution is a sub_filter at the nginx level that rewrites the link addresses. A crutch.
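To illustrate the crutch, here is a minimal sketch, assuming Nimble listens locally on port 8081 and the playlists are served under /hls/ (both are placeholders):

```nginx
location /hls/ {
    proxy_pass http://127.0.0.1:8081;                 # local Nimble instance (assumed address)
    proxy_set_header Accept-Encoding "";              # sub_filter cannot rewrite compressed responses
    sub_filter 'http://' 'https://';                  # rewrite the links Nimble wrote into the manifest
    sub_filter_once off;                              # replace every occurrence, not just the first
    sub_filter_types application/vnd.apple.mpegurl;   # apply to m3u8 manifests, not only text/html
}
```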
The second minus: management requests through the API go not to the server itself but to an external Nimble service. Access to it is controlled by an IP whitelist that does not support subnets, and we wanted to reach it from a subnet of roughly 128 addresses, while the form in the panel only accepts a single IP. We had to set up a proxy for this API. Not a huge problem, but a problem nonetheless.
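A rough sketch of such a proxy, with every address hypothetical: one machine whose IP is whitelisted in the panel forwards API calls for the rest of our subnet.

```nginx
server {
    listen 8080;

    # Only our own machines may use this proxy.
    allow 192.0.2.0/25;     # hypothetical internal subnet of ~128 addresses
    deny  all;

    location / {
        # Forward everything to the external Nimble control API;
        # the hostname is a placeholder, not the real endpoint.
        proxy_pass https://nimble-panel.example.com;
        proxy_set_header Host nimble-panel.example.com;
    }
}
```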
There is also the issue of the Nimble API being asynchronous: the server binary synchronizes with the control API on a schedule. So if we add a new streamer, the API creates the stream and its quality settings, but the server only picks them up about 30 seconds later.
Current architecture
The streamer sends us an RTMP stream, and we have to deliver it to viewers over both RTMP and HLS. In front of everything we put machines for incoming traffic that route each stream to a specific working machine.
We did this so that we could perform maintenance on the working servers, such as updating software or restarting them. We remembered that nginx has an rtmp module that can route RTMP traffic, and we used it as the incoming node.
This hid our entire internal kitchen from streamers: traffic arrives at nginx, which then proxies it wherever needed. At the level of the module itself you can rewrite the RTMP address and redirect the stream there. The documentation has an example of exactly this (no forums and no XML!).
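Below is a minimal sketch along those lines; the application name, ports and callback URL are placeholders rather than our real configuration:

```nginx
rtmp {
    server {
        listen 1935;

        application live {
            live on;
            # For every incoming publish, nginx-rtmp calls this HTTP endpoint.
            # A 2xx response accepts the stream as-is; a 3xx response with a
            # Location header pointing at an rtmp:// address redirects the
            # stream to the working machine chosen by our backend.
            on_publish http://127.0.0.1:8080/on_publish;
        }
    }
}
```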
nginx calls the on_publish address, our backend rewrites the address to a new one, and the stream goes there. Several incoming servers sit behind the same IP, and the balancer distributes traffic across them.
The same goes for distribution. We wanted to hide our internals from viewers so they never connect directly to the machine that processes the video. By analogy with the incoming side, we have distribution servers, also running nginx. For RTMP the same rtmp module is used; for HLS, proxy_cache backed by tmpfs stores the m3u8 playlists and HLS fragments.
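A rough sketch of such an HLS distribution node; the paths, upstream addresses and cache lifetimes are illustrative, not our real values:

```nginx
# The cache directory sits on a tmpfs mount, so playlists and fragments are served from RAM.
proxy_cache_path /mnt/tmpfs/hls levels=1:2 keys_zone=hls_cache:64m inactive=1m;

upstream workers {
    server 10.0.0.10:8081;   # working machines (placeholder addresses)
    server 10.0.0.11:8081;
}

server {
    listen 80;

    # Playlists change with every new fragment, so cache them very briefly.
    location ~ \.m3u8$ {
        proxy_pass http://workers;
        proxy_cache hls_cache;
        proxy_cache_valid 200 1s;
    }

    # Fragments never change once written, so they can stay in the cache longer.
    location ~ \.ts$ {
        proxy_pass http://workers;
        proxy_cache hls_cache;
        proxy_cache_valid 200 1m;
    }
}
```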
In the summer of 2016, The International (the Dota 2 tournament) came to us, and we realized that our scheme... was bad :)
We had several distribution machines and viewers landed on them more or less evenly. As a result, the same traffic went from a working machine to many distributors, and we quickly hit the ceiling of the working machine's outgoing network. As a first step, we simply added an extra layer of caching servers.
We did it in emergency mode: the event is live, the network is maxed out, so we just added machines that cut the outgoing traffic from the working machines. It was a half-measure, but at least we could now carry the Dota broadcast. The second solution was consistent assignment of machines: instead of handing the stream to a random distribution server, we try to serve it from the same machine for as long as it copes.
This took traffic load off both the working machines and the distribution servers. To send users to the right machines, we put a daemon on each distribution server that polls all the machines in its layer. If a machine is overloaded at a given moment, the daemon tells nginx to redirect traffic elsewhere right now.
To be able to redirect the user, we made a so-called rtmp redirect. The link the player starts from is an https link. If the machine is not loaded, it responds with a redirect to rtmp; otherwise, with a redirect to another https address. The player knows when it can start playing and when it has to follow another redirect.
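A very rough sketch of what such an entry point could look like in nginx terms; everything here, from the variable written by the daemon to the hostnames and paths, is hypothetical:

```nginx
server {
    listen 443 ssl;
    ssl_certificate     /etc/nginx/ssl/edge.crt;    # placeholder certificates
    ssl_certificate_key /etc/nginx/ssl/edge.key;

    # The polling daemon periodically rewrites this file with
    # either "set $overloaded 0;" or "set $overloaded 1;".
    include /etc/nginx/conf.d/load_state.conf;

    location /live/ {
        # Overloaded: send the player to another distribution server over https.
        if ($overloaded) {
            return 302 https://edge-2.example.com$request_uri;
        }
        # Otherwise let the player switch to RTMP on this same machine.
        return 302 rtmp://$host$request_uri;
    }
}
```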
The final scheme came out like this. The streamer streams to one of the incoming machines, which all look the same to them, since they sit behind the balancer on a single IP. The incoming server picks a working machine, which stores and processes the stream and serves it to viewers through two layers of distribution servers.
Not all working machines run Nimble, though. Where transcoding is not needed, we use the same nginx with the rtmp module.
What's next?
At the moment, about 200 thousand streams are started every day (480 thousand at peak), with about 9-14 million viewers daily (22 million at peak). Every stream is recorded, transcoded and made available as a video.
In the near future (which has probably already arrived, since this is last year's talk), we plan to scale to a million viewers and 3 Tb/s, and to switch entirely to SSDs, since the working machines hit the disk limit very quickly. We will probably replace Nimble and nginx with our own home-grown solution, since there are still drawbacks I have not mentioned.
How does attending the conference differ from watching or reading a talk? At the conference you can walk up to Alexey (and not only him!) and ask about the specific details that interest you, talk, and share experience. As a rule, the talks only set the direction for the interesting conversations.
Come to listen to the talks and to chat. Habr readers get registration at a discount.