Monitoring latency during online video broadcasts and TV bridges

About a week ago, there was an interesting article about methods of organizing video broadcasts with the minimum possible delay, and a number of legitimate questions were raised in the comments, many of which never received a full and meaningful answer. In this post, I would like to supplement my colleagues' material and share my thoughts on the following questions:

- Why is minimal delay needed at all?
- How can you measure the delay of a broadcast video signal simply and clearly?
- Which elements of the video path increase the delay?



Our top result: a FullHD signal made the round trip to the server and back in less than half a second.

Interesting? Then read on.

So why do you need minimal delay?

My company organizes online video broadcasts, and lately we have been very actively developing the topic of telemedicine: broadcasting surgical workshops and organizing almost full-fledged telebridges between operating rooms and conference halls. The picture is transmitted from external cameras and from the medical equipment itself (endoscopes, laparoscopes, robotic surgeons); the viewers sit in comfortable chairs at the other end of the city, watch the FullHD picture on a huge screen, and ask the doctors questions as needed. In this scenario, comfortable communication is very important for the customers: everyone is used to the phone and Skype, and even a 3-4 second delay significantly complicates the interaction between the hall and the operating room, and is very distracting for surgeons who are performing real operations.

Here are the typical conditions in which we have to work:

- a 720p or 1080i signal, most often in SDI format, taken either directly from a camera or a medical stand, or from the program output of a switcher;
- most often a fairly slow Internet connection, or none at all: a considerable share of our projects run over 4G networks;
- no external IP address;
- a highly dynamic and extremely detailed picture, with the need to maintain adequate color reproduction.

I will say right away: we studied, tested, and safely buried the Skype and VKS (video conferencing) options.

The main problems with Skype are the impossibility of manually adjusting video quality parameters and its "too smart" encoding algorithm, which can start degrading picture quality on its own if it decides that it is short of channel bandwidth. And getting a picture from Skype onto the SDI output of a switcher is a separate kind of sorcery, beyond the reach of most Muggles...

Things did not go smoothly with videoconferencing systems either: significant channel bandwidth requirements, the need for external IPs, the absence of professional video and audio inputs, an exorbitant price tag for both purchase and rental, and, with all that, mediocre quality. Yes, videoconferencing may be fine for showing well-dressed men and women sitting in a fashionably decorated meeting room, but when our competitors broadcast from a laparoscopic stand through a videoconferencing unit at one high-profile medical event, the picture came out terrifying: the system simply could not encode a very dynamic, highly detailed video signal fast enough, and instead of FullHD the screens showed an utterly infernal kaleidoscope, most of all reminiscent of the trailer for the movie "Pixels".

The picture quality of our mass broadcasts, run through our own Wowza-based servers, was much better than either Skype or VKS, plus we had a decent fleet of encoders: powerful compact computers with SDI video capture cards, which let us run several projects at once without frayed nerves.

I set my engineers the task of "squeezing" the maximum possible speed out of Wowza, and the question immediately arose: with what, and how, do we measure the delay? Frankly, we pondered this for quite a while, which makes the result look all the funnier, once again confirming that everything ingenious is simple.



We took as a basis the classic "countdown" leader used (or rather, long since no longer used) in film and television production, making it a little more informative and detailed.

The measurement procedure is ridiculously simple: we start the video in a player, run it through the entire video path, put the screens of the transmitting and receiving computers side by side, photograph both screens with a phone, subtract the smaller number from the larger, and get the delay accurate to a frame. Accordingly, if the transmission and reception points are far apart, you can send the signal from site A, receive it at site B, immediately send it back to site A, take the same kind of photo, and divide the result by two. A running timecode and a red square bouncing back and forth let you visually spot possible stream-transmission glitches such as freezes and picture break-up.
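The arithmetic behind this "two screens in one photo" trick can be sketched in a few lines. This is just an illustration, not part of our toolchain: the helper names are mine, and a 25 fps timecode is assumed (adjust for your signal).

```python
# Sketch of the delay arithmetic for the "photograph both screens" method.
# Assumes an HH:MM:SS:FF timecode at 25 fps (hypothetical helper names).

FPS = 25

def timecode_to_frames(tc: str, fps: int = FPS) -> int:
    """Convert an HH:MM:SS:FF timecode string to a total frame count."""
    hh, mm, ss, ff = (int(x) for x in tc.split(":"))
    return ((hh * 60 + mm) * 60 + ss) * fps + ff

def delay_seconds(sent_tc: str, received_tc: str,
                  round_trip: bool = False, fps: int = FPS) -> float:
    """Delay between the transmitting and receiving screens, in seconds.

    If the signal was looped back to site A (round_trip=True),
    halve the difference to estimate the one-way delay.
    """
    frames = timecode_to_frames(sent_tc, fps) - timecode_to_frames(received_tc, fps)
    delay = frames / fps
    return delay / 2 if round_trip else delay

# The sending screen shows 00:00:12:14, the receiving one 00:00:12:01:
# the receiver is 13 frames behind, i.e. 0.52 s at 25 fps.
print(delay_seconds("00:00:12:14", "00:00:12:01"))  # 0.52
```

The same function with `round_trip=True` covers the A-to-B-to-A case, where the photographed difference is halved to get the one-way figure.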

Using this simple tool, we carried out a total audit of our video path together with the server settings, and squeezed the delays down literally millisecond by millisecond, getting a very good result: when broadcasting over a dedicated line, FullHD video at a bitrate of 4-5 Mbit/s went through in 11-16 frames (about half a second). When broadcasting over 4G networks and over long distances (for example, we tested St. Petersburg - Astana), the delay grew by roughly another half a second; here the complex routing between the transmission and reception points already starts to tell.
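A quick sanity check of the frames-to-seconds conversion, assuming a 25 fps signal (our figures above; your frame rate may differ):

```python
# What does a delay of 11-16 frames mean in seconds, at an assumed 25 fps?
fps = 25
for frames in (11, 16):
    print(f"{frames} frames -> {frames / fps:.2f} s")
# 11 frames -> 0.44 s
# 16 frames -> 0.64 s
```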

For obvious reasons, I will not disclose the nuances of "tuning" the broadcast server, but I do want to draw attention to an important point: the "iron" elements of the video path often introduce noticeable delays that few people think about when preparing a project. For example, you are doing a telebridge with a hotel conference room where all the projectors are connected via VGA, while your entire receiving path is SDI or HDMI: you can be sure that the mixing and VGA conversion will add at least half a second. An antediluvian projector connected via composite? A second. A cheap camera with an HDMI output at the transmission point, with an SDI-HDMI converter taped to it? Three frames lost. Count how many converters, splitters, and other pieces of iron you have in your path, and you will get very impressive numbers, which often negate all the efforts of the broadcast engineers. The conclusion is simple: optimize the path by removing every unnecessary signal conversion.
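The "count your converters" advice amounts to keeping a simple latency budget. The sketch below sums per-device delays for a hypothetical path; the figures are illustrative, taken from the examples above (three frames for an SDI-HDMI converter, about half a second for VGA conversion), and the remaining entries are placeholder assumptions, not measurements of real hardware:

```python
# Rough latency budget for a video path: sum the delay of every conversion stage.
# All figures are illustrative, at an assumed 25 fps.

FPS = 25

# (device, delay in frames) -- a hypothetical path, not a real project
path = [
    ("camera HDMI output", 2),       # placeholder assumption
    ("SDI-HDMI converter", 3),       # ~3 frames, as in the taped-on converter example
    ("mixing + VGA conversion", 12), # ~0.5 s, as in the hotel projector example
    ("encoder buffer", 4),           # placeholder assumption
]

total_frames = sum(delay for _, delay in path)
print(f"hardware path alone: {total_frames} frames = {total_frames / FPS:.2f} s")
# hardware path alone: 21 frames = 0.84 s
```

Even before the network and the streaming server get involved, such a path eats most of a second, which is why removing unnecessary conversions pays off.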

And for testing, you can safely use our video; it is freely available for download at this link.

P.S. For lovers of simple mazes: a block diagram of the switching from one of our medical projects.

