We are promised real-time video without freezes and stutter.
Every time I launch Skype, Zoom or Hangouts, I look forward to a fresh batch of video and audio glitches. The technology rarely disappoints: croaking, background noise, dropped audio, video breaking up into "squares", frozen frames and other joys of video conferencing have haunted video calls for as long as I can remember. My interest is largely professional: in addition to programmable telephony for regular phones, web pages and mobile applications, at Voximplant we also provide video for developers. I want Full HD, in real time, without freezes, in any browser, and in a conference for 50 people. Interestingly, in the lab it works exactly like that. But somewhere in a park on 3G, a video consultation with a doctor can turn into a turn-based strategy game, because packets get lost! The current technology stack cannot yet fight a "flickering" Internet connection on equal terms, but research is ongoing. Below the cut is a translation, adapted for Habr, of an article about Salsify: a fusion of a video codec and a network protocol that minimizes problems when transmitting video in real time.
A team from Stanford ran an experiment: it replaced the entire patchwork of modern video conferencing technologies with a single protocol for compression and transmission.
Video conferencing: laaag, fffffreeze and stutter
After a while, the problems go away by themselves. Sometimes they take the image with them, leaving a black screen instead. The resulting trouble ranges from "wait a couple of minutes, the network is flaky" to "the remote surgery can be stopped, the patient has died." Scientists from Stanford took a fundamental approach to the problem and developed the network stack, the codec and the data transfer method from scratch, with a single goal: to do better than Skype, FaceTime, Hangouts, and Chrome + WebRTC.
Stanford graduate student Sadjad Fouladi, who led the study, presented the results at the NSDI '18 conference. The ideas behind this "from scratch" solution are available to everyone and can be used in commercial products. Provided, of course, that someone is willing to replace the entire stack.
"Video transmission over the Internet has evolved for decades. Today the technology stack looks more like a patchwork quilt," says Keith Winstein, an associate professor of computer science. "Sadjad showed how these pieces can be assembled in a different way to get better video quality with less delay."
Winstein is more cautious about how soon this could be adopted. "We are now thinking about changes that would one day make live video transmission more reliable. It will be very useful in telemedicine and robotic surgery," he says. "But these changes are hard to make in the software that is in use today."
New approach, new name
The Stanford team named its framework "Salsify" (salsify, also known as goatsbeard, is a plant whose flower remotely resembles a young dandelion; translator's note). The framework addresses a problem that arises because "real-time video transmission" is currently built from two separate technologies. One is the codec, which compresses the video. The other is the network protocol, which transmits small pieces of data over the network and tries to guess when to send the next pieces so that they are not dropped somewhere along the way because the network is overloaded. The problem is that these two components evolved separately, often at different companies, and were only later combined in products such as Skype or FaceTime.
Fouladi is convinced that, to solve the problem of freezes and lag, the codec and the network stack must work together. After all, it is not enough just to send a packet over the network. The packet has to contain the right data, not a piece of video from 3 seconds ago that the receiver will discard as "too old" anyway. As the project lead puts it, "when the transport protocol and the codec lose synchronization, problems start." So the team built a new codec that is integrated with the transport protocol as tightly as possible. A single algorithm controls the compression of video frames, the formation of network packets and their sending. As a result, the video stream "knows" the state of the network in real time and tries to fit into it as well as it can.
Even one frame sent at the wrong moment can cause stutter and freezes. Salsify never sends a frame if doing so could cause network problems.
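To make the idea of a joint decision more concrete, here is a minimal sketch in Python. It is not Salsify's actual code or API: the function names, the delay budget and the notion of several pre-encoded candidates per frame are assumptions made for illustration. The sketch only shows the principle described above: the sender compares the size of a compressed frame with what the network can absorb right now, and skips the frame if nothing fits.

```python
from typing import Optional, Sequence


def bytes_budget(estimated_rate_bps: int, target_delay_ms: int, bytes_in_flight: int) -> int:
    """Bytes that can be sent now without queueing longer than target_delay_ms.

    All three inputs are hypothetical measurements a transport layer might keep.
    """
    capacity = estimated_rate_bps * target_delay_ms // 8000  # bits/s * ms -> bytes
    return max(0, capacity - bytes_in_flight)


def pick_frame(candidate_sizes: Sequence[int], budget: int) -> Optional[int]:
    """Return the index of the largest candidate that fits the budget.

    Returns None when no candidate fits, meaning the frame should be skipped
    rather than pushed into an already congested network.
    """
    best = None
    for i, size in enumerate(candidate_sizes):
        if size <= budget and (best is None or size > candidate_sizes[best]):
            best = i
    return best


if __name__ == "__main__":
    budget = bytes_budget(estimated_rate_bps=800_000, target_delay_ms=100, bytes_in_flight=2_000)
    # Two hypothetical encodings of the same frame: higher and lower quality.
    choice = pick_frame([12_000, 6_000], budget)
    print("skip frame" if choice is None else f"send candidate {choice}")
```

With a conventional split between codec and transport, the encoder would have committed to a frame size before the transport knew whether it could be delivered in time. Here the choice is made at the last moment, using the freshest network estimate.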
See and believe
The researchers ran many tests comparing Salsify with Microsoft Skype, Google Hangouts, Apple FaceTime and Google Chrome + WebRTC. On average, Salsify cuts delay by a factor of four (!!!), and image quality improves by 60% as measured by the structural similarity index (SSIM). The team prepared a side-by-side comparison with WebRTC in Chrome 65 and launched a separate website dedicated to the project. The project is open source: you can download it, study it and use its ideas.
Everyone has problems with video conferencing. It's very cool to work on a project that aims to change the situation.