Developing video hosting in Erlang

We present Maxim Lapshin's talk from the Application Developer Days conference. We have put together the video and audio, the presentation slides, and a transcript of the talk. The transcript took tremendous effort, but it is clearly worth it: a forty-minute talk can be “heard” several times faster.

Stas Fomin (a man and a ~~steam~~ locomotive :)) combined the video and the slides into a single video and also wrote the transcript.

Abstract

Maxim Lapshin (erlyvideo), a developer of scalable web services, talked about developing a video streaming server in Erlang. The subject is the open-source project ErlyVideo, an increasingly reliable, scalable and free server for broadcasting any video, from security cameras to video conferencing. The technology is of particular interest because it was the choice of a language as little known as Erlang that provided high reliability, scalability and speed of development.

Erlang is a reliable object language for building network services. The concepts of processes and data immutability that it adopts make it the only platform that has both garbage collection and a deterministic object destruction time. The semantics of the language are among the simplest of the widely used languages on the market.

These features make Erlang an excellent choice for serving stateful clients: a video streaming server (erlyvideo), the most widespread Jabber server (ejabberd), poker servers (OpenPoker), etc. The talk examined why Erlang is so convenient for this.

Video

Podcast

Link to the podcast.

Presentation

Link to PDF with slides.

Transcript

A transcript of the video was recorded by Stas Fomin.

What is streaming?

Good afternoon, my name is Maxim Lapshin, I am the author of the ErlyVideo video streaming server written in Erlang, and today I would like to tell you what the product is, why it was made ... what it is and why I did it all.

So, what is streaming in general.

Slide:
YouTube is not streaming

What is YouTube? A huge number of different videos are stored there, but it is not a streaming project; there is no streaming there. There are just ten-second videos that are served to you by nginx or, well, some other web server.

Each browser connects, receives a video and plays it. But this has nothing to do with streaming video. So where is streaming used, and what is it all for?

User TV

Slide:
User uploads video files
Makes a playlist
At the request of others, the playlist starts playing
If nobody needs it, the video is not played

Let's look at the user television task that I recently had. The idea is this: the user uploads their files, almost like on YouTube, then builds a playlist from them, which they want to show like a TV program on ordinary television.

After that, other users come at some point, not necessarily at a fixed time. They come whenever they want, they see the interesting playlist, they like the program, they want to see what the author shot. And then, when all the users have left and nobody is interested any more, all the resources that were loaded must be released.

Why not do all of this on plain nginx?

It seems like a proven technology, YouTube itself works this way, but no, it does not work here.

What does a streamer do?

Slide:
What does a streamer do?
Unpacks video and audio from file containers
Packs them into a transport container
Sends frames synchronously with real time

The problem is that the existing files have to be served as a video stream, because all users must be steadily shown the same thing.

For this task, you just need a streamer, that is, a streaming video server that will do this.

It picks up the necessary files from the containers in which they lie on your disk.
It packs them into a transport container, through which the video can be delivered to the users who came to watch it.
And it is very important to understand that it sends frames synchronously with real time.

So, if you connect and stay for half an hour of real time, you receive half an hour of video from the file. If you simply downloaded the file, you could get it much faster.
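Sending frames "synchronously with real time" can be sketched as follows. This is only an illustration, not ErlyVideo's code: the module name `pacing_sketch`, the `{Timestamp, Data}` frame shape and the `{frame, Frame}` message are assumptions made for the example.

```erlang
-module(pacing_sketch).
-export([delay_ms/3, send_frames/2]).

%% How long to wait (ms) before sending a frame whose timestamp is FrameTs
%% (ms from the start of the file), if playback started at StartedAt and the
%% clock now reads Now (both in ms). Late frames are sent immediately.
delay_ms(FrameTs, StartedAt, Now) ->
    max(0, FrameTs - (Now - StartedAt)).

%% Send hypothetical {Timestamp, Data} frames to the Consumer process,
%% each one no earlier than its own timestamp.
send_frames(Frames, Consumer) ->
    StartedAt = erlang:monotonic_time(millisecond),
    lists:foreach(
      fun({Ts, _Data} = Frame) ->
              Now = erlang:monotonic_time(millisecond),
              timer:sleep(delay_ms(Ts, StartedAt, Now)),
              Consumer ! {frame, Frame}
      end, Frames).
```

With this pacing, a file that holds half an hour of video takes half an hour to send, however fast the disk and the network are.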

A digression on codecs

A small digression, so that everyone understands what codecs and containers are and there is no confusion. A codec is what you ... it is a format for representing the raw data captured from a camera sensor or from a microphone.

A container is what the already encoded data is packed into. For example, if you have come across H.264 or AAC, these are video and audio codecs, respectively. And MP4 ... there is no such codec; it is a container into which you can put absolutely any video and absolutely any sound.

Slide:
Codec - representation format for compressed audio and video data
Container - a packaging format for one or more streams of audio and video in a file or stream
H.264 / AAC - Best Codecs
MP4 - the smallest file container

User TV Stages

Slide:
User TV Stages
Load the playlist
Unpack the file
Pack frames in a transport container (RTMP, MPEG-TS, ...)
Clean up when clients leave
Allow updating code without disconnecting clients

So, what does a server that shows user television do? It loads playlists, unpacks files, repackages them, cleans everything up, and, a very important thing that I did not mention separately, in this task it is very important to update the code without disconnecting clients, because there may be up to a thousand ... many thousands of clients who came to watch an interesting video. If we roll out an update at that moment, all these thousands of clients will come back to us and reconnect.

Besides being simply an inconvenience for users, this is also very, very, very expensive, because our channel will be completely clogged with traffic.

Traditional solutions

What do people usually build such solutions on? The traditional solutions are built with traditional tools, Java and C++, and there are products in them that deal with video streaming. These are, for example, the free Red5 and the paid Wowza, both written in Java, or rtmpd, written in C++.

Parsing MP3 in Java

What is the problem…? Well, here is an example: a small piece of code showing how MP3 is parsed in Java. It is a small piece, one hundredth of a file.

This is what any server in Java looks like. You can take a closer look: this is a small, small part of the file. Digging into this is very difficult.

if (id3v1 instanceof ID3V1_1Tag) {
   try {
     // Add the track property
     graph.add(mp3Resource, processor.resolveIdentifier(IdentifierProcessor.TRCK),
         factory.createLiteral("" + ((ID3V1_1Tag) id3v1).getAlbumTrack()));
   } catch (GraphException graphException) {
     throw new ParserException(
         "Unable to add track number to id3v1 resource.",
         graphException);
   } catch (GraphElementFactoryException graphElementFactoryException) {
     throw new ParserException(
         // ... another 600 lines of code
         graphElementFactoryException);
   }
 }

Parsing MP3 in Erlang

This is all you need to write in Erlang to decode MP3. That's it. Five lines. The frame is already unpacked and can be sent to users.

%% version/1, layer/1, bitrate/2 and framelength/1 are helpers not shown on the slide.
decode(<<2#11111111111:11, VsnBits:2, LayerBits:2, _:1, BitRate:4, _/bitstring>> = Packet) ->
  Kbps = bitrate({version(VsnBits), layer(LayerBits)}, BitRate),
  Length = framelength(Kbps),
  <<Frame:Length/binary, Rest/binary>> = Packet,
  {ok, Frame, Rest}.
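The helper functions are not shown on the slide. For readers who want something they can compile, here is a minimal self-contained sketch of the same idea, limited to MPEG-1 Layer III. It is an assumption-laden illustration and not the ErlyVideo source: the module name, the `{more, Packet}` and `{error, unsupported}` results and the inlined tables are all choices made for the example.

```erlang
-module(mp3_frame_sketch).
-export([decode/1]).

%% Split one MPEG-1 Layer III frame off the front of Packet.
%% Header bits: 11 sync, 2 version (2#11 = MPEG-1), 2 layer (2#01 = Layer III),
%% 1 protection, 4 bitrate index, 2 sample rate index, 1 padding, then the rest.
decode(<<2#11111111111:11, 2#11:2, 2#01:2, _Protection:1,
         BitRateIdx:4, RateIdx:2, Padding:1, _/bitstring>> = Packet)
  when BitRateIdx > 0, BitRateIdx < 15, RateIdx < 3 ->
    %% Frame length in bytes, header included.
    Length = 144 * bitrate(BitRateIdx) * 1000 div sample_rate(RateIdx) + Padding,
    case Packet of
        <<Frame:Length/binary, Rest/binary>> -> {ok, Frame, Rest};
        _ -> {more, Packet}   % not enough data yet (hypothetical convention)
    end;
decode(_) ->
    {error, unsupported}.

%% Bitrate in kbit/s for MPEG-1 Layer III, by header index 1..14.
bitrate(Idx) ->
    element(Idx, {32, 40, 48, 56, 64, 80, 96, 112, 128, 160, 192, 224, 256, 320}).

%% Sample rate in Hz for MPEG-1, by header index 0..2.
sample_rate(0) -> 44100;
sample_rate(1) -> 48000;
sample_rate(2) -> 32000.
```

The binary pattern in the function head does the work that takes explicit shifts and masks in other languages: the header fields are named and sized right where they are matched.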

So what do we get? From the very beginning, from unpacking the file, something is off: in Java and C++ there is a lot of code, and we write a lot of overhead logic in our code.

But everything ... all this syntactic sugar becomes completely unimportant when thousands of clients come to us.

Then we have problems of a completely different nature, regardless of whether the code is convenient or inconvenient to write.

What are the problems? Well, the same as always: memory management, so that nothing leaks and no segfaults are caught; and control over the resources of the clients who have come to us, where we need to remember who needs what and when in order to free memory effectively.

In the case of C++ we have another problem; Java at least lets us protect the code somewhat, thanks to the absence of direct memory access. In C++ an error in one place, especially in a multi-threaded application, can destroy the whole application, and you will never debug this bug. You can be sure that any C++ program, especially a multi-threaded one, contains a bug that you simply have not found yet and do not even expect to be there.

And another problem is that when you have thousands of clients, it is difficult to organize I/O. It is not enough to just use threads and write to the socket; you need complex libraries or events that use various complex mechanisms.

Slide:
Problems of classic solutions with thousands of clients
Memory management: leaks or premature release
Client resource control
Chaotic system failure in one place
I/O when serving thousands of clients

What is this? This is the Red5 server collapsing under a hundred users. Eh, unfortunately, it did not come out red here.

Already at one hundred users the server collapses and stops serving clients. Why? Because it is poorly written: the I/O issue was not taken into account during development, and now it simply stops serving.

In the case of Wowza there are other problems, which my clients have had: their Wowza leaks. Even though Java has a garbage collector, a reference is left somewhere, resources are not freed, the server swells, and it looks quite scary.

How does this happen? Well, for example, we have a streaming server that also serves a message delivery channel. The user connects, the object created for them is registered in some channel's list, the reference to it is stored, the test for the object passes, the user disconnects, but we forgot, we forgot to remove the reference to it. That's it.

Their data stays forever, and we cannot find out that it needs to be released. And that's it, restart the server.

Slide:
epoll / kqueue is complicated for long connections due to memory management.

As for I/O, the epoll / kqueue mechanisms, for which there are libraries such as libevent, are the only way to serve thousands of sockets, and they are very, very complicated for ... when you have complicated business logic ... because efficient memory management in the event model is, in my opinion, insanely difficult.

So, with a C++ server we get the following picture. You are guaranteed to start your working day by raking out the core dumps that have piled up on the file system overnight, and you are lucky if there was enough disk space.

Roots of problems

In some ways, the roots of these problems are common to the traditional solutions. The first is shared memory.

Unfortunately, the picture is not visible again.

Memory is shared between all objects in the system. Anyone can go anywhere and take a reference to anything, and in the end we get a design where all objects cross-reference each other, and it becomes much harder to control who has captured which memory and who needs what.

You need to understand that these problems do not concern us when we write a site in PHP. They do not concern us because the application that serves a web request lives for one second at most. After that second, everything it used can be destroyed, because it is all no longer needed: we already have a new request, a new connection.

Slide:
Web-based approach → “let it leak, we'll kill it soon” → does not work.

That will not work here: clients stay connected for hours, days or more. And you need your code to work efficiently, without leaking, for weeks.

Erlang solves these problems radically

Erlang turned out to be a platform that, surprisingly, solves these problems radically and almost completely closes the issues I have talked about.

This is 90% due to its concept of processes.

Slide:
Processes
Parallel threads
Isolated Memory Area
Message exchange
Immutable variables
No data outside processes

Processes in Erlang are something like threads in ordinary systems. They are lightweight, they take up much less space, and most importantly, they are completely isolated.

Every process in Erlang is a box from which nothing leaks out. And we know for sure that all the memory in the system is guaranteed to belong to some process. There can be no data outside a process. That is, if there is some gigabyte chunk of memory that seems to be nobody's, we always know which process owns it, and we can kill that process to free the data.
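A minimal sketch of such a "box" is shown here: a process that owns a running total and can be reached only through messages. The module name and the message shapes are assumptions made for the illustration.

```erlang
-module(box_sketch).
-export([start/0, add/2, total/1]).

%% Start a process whose only state is a running total.
start() ->
    spawn(fun() -> loop(0) end).

%% Asynchronously ask the process to add N to its total.
add(Pid, N) ->
    Pid ! {add, N},
    ok.

%% Ask the process for its total and wait up to a second for the reply.
total(Pid) ->
    Ref = make_ref(),
    Pid ! {total, self(), Ref},
    receive
        {Ref, Total} -> Total
    after 1000 ->
        timeout
    end.

%% The state lives only in the argument of loop/1; nothing outside the
%% process can hold a reference to it.
loop(Total) ->
    receive
        {add, N} ->
            loop(Total + N);
        {total, From, Ref} ->
            From ! {Ref, Total},
            loop(Total)
    end.
```

When such a process exits, its heap is released as a whole, which is what makes the moment of cleanup predictable.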

Question: But aren't binaries passed between processes by reference?

Well, these are implementation subtleties, but in fact these binaries can always be traced. They can be inspected.

Question: What and how?

We know that the process knows which binaries, and of what size, it refers to.

So what do we get: all the data in the system is stored exclusively inside enumerable processes. You can go through all the processes in the system, find out who ate all the memory, and stop this abuse of the system.
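Going through all the processes to find out who ate the memory can be sketched with the standard `erlang:processes/0` and `erlang:process_info/2` calls. The module name and the result shape are assumptions made for the example.

```erlang
-module(memtop_sketch).
-export([top/1]).

%% The N processes that use the most memory, as [{Bytes, Pid}], largest first.
top(N) ->
    Usage = [{Bytes, Pid}
             || Pid <- erlang:processes(),
                %% process_info/2 returns 'undefined' for a process that has
                %% already exited; the pattern below silently skips it.
                {memory, Bytes} <- [erlang:process_info(Pid, memory)]],
    lists:sublist(lists:reverse(lists:sort(Usage)), N).
```

For example, `memtop_sketch:top(5)` lists the five heaviest processes, and `exit(Pid, kill)` stops one of them. `erlang:process_info(Pid, binary)` additionally reports the reference-counted binaries that a process refers to; its exact output format is not guaranteed to stay stable between releases.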

Slide:
All data is stored inside enumerable objects.

The next feature of the process approach to organizing data and threads of execution is that errors that occur are hard ??? process. If there is an error that we did not handle, that we did not want to catch, that we decided to let propagate further because we were not interested in its fate, it is a fatal error and our process ends. And most importantly, this is very, very similar to releasing, destroying an object, because it is a well-defined procedure: we know that once the error has occurred, the process has stopped. Not in an hour, not in two days, but right now. And it is important to understand that the processes that requested ... that want to monitor its state, other neighboring processes, will be informed that their neighbor has died.

Slide:
Error Handling
They can be caught
If not caught, the process ends
Neighbors learn about it through messages
Guaranteed resource cleanup

As a result, we can monitor the state of processes. For example, we start a separate process that serves the connection with the user and start monitoring it, and if some error happens, most likely an error in our own code, the process that is watching it finds out: “aha, the process serving the socket has died.” So there is no further point in serving this user, there is nothing left to serve them with, and all the processes that were created to serve them must be shut down in a cascade.
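Watching a process and learning about its death through a message can be sketched with a monitor. This is an illustration only: the module name, the `serve/1` function and the result tuples are assumptions, and `WorkerFun` stands in for the code that serves the client's socket.

```erlang
-module(watch_sketch).
-export([serve/1]).

%% Start a worker for one client, monitor it and wait until it stops.
serve(WorkerFun) ->
    %% spawn_monitor/1 starts the process and the monitor atomically,
    %% so the 'DOWN' message cannot be missed.
    {Pid, Ref} = spawn_monitor(WorkerFun),
    receive
        {'DOWN', Ref, process, Pid, normal} ->
            {finished, Pid};
        {'DOWN', Ref, process, Pid, Reason} ->
            %% This is the place to shut down everything that was
            %% created for this client.
            {crashed, Pid, Reason}
    end.
```

The `'DOWN'` message arrives whether the worker finished normally or died with an error, so the cleanup path does not depend on the worker behaving well.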

Accordingly, the platform that comes with this language includes a system of supervisors: ready-made mechanisms, a very well polished set of programs with practically no errors in them; that is, I am not aware of errors being found in them lately. They work stably and let you restart your processes.

Why is this needed? For example, one of the most important processes in your system is the daemon, the process that listens on a socket. It is bound to the socket and accepts connections. You need to be sure that it is guaranteed to work, otherwise your whole system may fall off (???). And if it does fall off, there is no point in tormenting your server any further.
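A supervisor that keeps such a listener alive might look roughly like this. It is a minimal sketch using the standard OTP `supervisor` behaviour; the module name, the child id and the stand-in `listener/0` process (which owns no real socket) are assumptions made for the example.

```erlang
-module(listener_sup_sketch).
-behaviour(supervisor).
-export([start_link/0, init/1, listener/0]).

start_link() ->
    supervisor:start_link({local, ?MODULE}, ?MODULE, []).

init([]) ->
    %% Restart only the child that died; give up after more than
    %% 5 restarts within 10 seconds.
    SupFlags = #{strategy => one_for_one, intensity => 5, period => 10},
    Listener = #{id => listener,
                 start => {?MODULE, listener, []},
                 restart => permanent},
    {ok, {SupFlags, [Listener]}}.

%% Stand-in for the process that owns the listening socket.
%% It is called inside the supervisor, so spawn_link/1 links it there.
listener() ->
    {ok, spawn_link(fun() -> receive stop -> ok end end)}.
```

If the listener dies, the supervisor starts a new one. If it keeps dying, more than 5 times within 10 seconds with these settings, the supervisor itself terminates, which matches the point that there is no sense in tormenting the server further.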

Slide:
Process Tracking
Communication
Supervisors
appmon

Unfortunately, I cannot show it from my laptop, I do not have an adapter, but I would like to show such a thing as the app monitor. It is a mechanism, again supplied with the platform, that lets you see graphically a list of all your processes as a tree. That is, we can ... it is a very useful thing: you can see that a user comes to you, you create an object for them, it requests some resources, just climbed under them (???) into some processes ... The user leaves and the processes remain. In effect this is a process leak, and with the app monitor all of this is seen very clearly.

But, unfortunately, I will not be able to show it to you. :(

Erlang's real hot code update

And Erlang has what is probably the only real hot code update among all existing platforms. It looks like this: clients are not disconnected, they continue to work; in the case of a video streamer they continue to receive video, in the case of online games the connection is not lost, and they are already being served by the new code.

Slide:
Without disconnecting clients!

I do not know of other systems that allow this.
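The mechanism behind this can be sketched with a plain process loop. The module name and the messages are assumptions made for the illustration; what matters is the fully qualified call `?MODULE:loop(...)`, which is how a running process moves to a newly loaded version of its module.

```erlang
-module(hot_loop_sketch).
-export([start/0, loop/1]).

%% Start a process that counts the frames it has been asked to send.
start() ->
    spawn(fun() -> loop(0) end).

loop(Sent) ->
    receive
        {frame, _Data} ->
            %% A fully qualified call always enters the newest loaded version
            %% of the module, so an upgrade takes effect at this point while
            %% the state (Sent) is carried over.
            ?MODULE:loop(Sent + 1);
        {sent, From} ->
            From ! {sent, Sent},
            ?MODULE:loop(Sent)
    end.
```

After a new version of the module has been compiled, `code:load_file(hot_loop_sketch)` loads it, and each process switches to the new code on its next message without losing its state or its connections. OTP behaviours offer a more structured path for the same thing through their `code_change` callbacks.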

What are the results of using Erlang?

And what happened in the end, after I decided to use Erlang to build my server? The result is our video streaming server Erlyvideo, which is now one of the top two in its field in terms of the set of features implemented, development speed, stability and efficiency.

Slide:
Erlyvideo:
Multiprotocol server
Holds thousands of clients on one server
Existing infrastructure for plugins

For example, it quite comfortably serves thousands of clients from a single server; it is now in production with BD (???) and it works.

It turned out to be very effective and simple thanks to the dynamic typing of the language, which is dynamic by nature: we cannot know what the other process is, so all communication between processes comes down to exchanging messages. That is why this language ... why we can speak of the dynamic typing of this language.

This resulted in a very convenient infrastructure for plugins, which is also developing very actively.

And this is a very painful topic for any product: how to build a plugin system correctly. It is very unclear where to put the places into which a plugin can be stuck.

As a result, Erlyvideo solves the stated problem perfectly: the server can run for weeks without restarts and without any memory leaks; there are no problems with this. For example, mine has been running for months and it does not swell, keeping its memory at the same level.

Conclusions

Slide:
The tasks of streaming video have specifics that distinguish them from the web:
Tools that are both efficient and high-level are needed
Erlang fits perfectly into this niche
Practical use has shown the effectiveness of choice

In the end, what conclusions can be drawn from using Erlang for this task?

The tasks of streaming video over the Internet have their own specifics, which distinguish them very much from the web. And unfortunately, many solutions that are proven and reliable, and so well understood ... and for which it seems you can easily find programmers, carry in their very structure the roots of the problems that were addressed in Erlang from the start.

You will not have these problems, because ... simply because of the specific way the code is organized.

Therefore Erlang fits the task of serving stateful clients very well. And practical use has shown the extreme effectiveness of this choice.

Where Erlang applies

Well, it is clear that streaming video is a rather narrow niche; there are only five such products on the market, and in principle that is probably enough. It makes no sense to write another video streaming server in Erlang.

Slide:
Video streaming (erlyvideo)
Jabber server (ejabberd)
Bank processing (Privat Bank)
Online Games (Online Poker)

However, it has other niches of applicability. For example, the best Jabber server, ejabberd, is written in the same Erlang; it is so good that Yandex, for example, despite a terrible antipathy to Erlang, decided to use it. Right, Yasha? Grisha (bobuk) strongly dislikes Erlang, doesn't he? He cursed and spat a great deal, but they had no choice, and that says a lot about the product. Also, for example, it is reliably known that banks build banking processing systems on it. I do not know the particular details, of course, the details are not disclosed, but I know that Privat-Bank and a number of companies are moving their long-lived processing systems to Erlang, because they find it convenient.

And of course, online games. After our success with a top VKontakte game, people regularly come to me saying “we need to make ours work well and conveniently too.”

I look at their problems and understand that, most likely, they should implement their game in Erlang. Because there is not much business logic, but it is very specific, and using conventional technologies, Rails or Java, they run into all the problems that I described.

And there is even an online poker implementation in Erlang.

Questions?

So, I think that is everything; that is the talk I have for you. If you have any questions, I will be happy to answer them.

Slide:
Max Lapshin
max@maxidoors.ru
erlyvideo.org

The question-and-answer session did not fit into the size limit of a Habr publication, so look for it on the conference website via the “transcript” link.
