A new algorithm lets researchers create almost perfect "talking heads" of real people

Researchers have learned how to edit video by putting any words and sentences into the mouth of the person on screen. The technology processes the footage so that the result looks natural and organic; the fake is noticeable only if you already suspect editing.
The new algorithm was created by a joint team of researchers from Stanford, the Max Planck Institute, Princeton and Adobe. Editing comes down to writing the text that the person in the video should say; the neural network does the rest. The fake is hard to spot because the “speaker's” facial expressions and movement patterns are preserved, and the technology masks the traces of tampering.
To achieve this, the creators taught the algorithm to analyze video. The neural network selects the necessary gestures, facial expressions and articulated words, and then combines individual frames so that the modified video looks intact. The result is, in effect, a computer model that performs whatever actions the owner of the technology needs.
The movements of the lips, tongue and other articulators are original: the neural network “cuts” them from the source video. At that stage the video does not look natural, because it contains a large number of cuts and pauses, so the technology then “smooths out” the result to make it look as natural as possible.
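The "cutting" step can be illustrated with a toy sketch. It is not the researchers' implementation: it assumes the source video already comes with a phoneme transcript with timestamps (the `Segment` shape and the `find_clips` name are hypothetical), and it greedily covers the new sentence with the longest matching runs of source phonemes. The smoothing step is not covered.

```python
from typing import List, Tuple

# Hypothetical data shape: (phoneme, start_sec, end_sec) for each sound in the source video.
Segment = Tuple[str, float, float]


def find_clips(source: List[Segment], target: List[str]) -> List[Tuple[float, float]]:
    """Greedily cover `target` with the longest matching runs of source phonemes.

    Returns the (start_sec, end_sec) clips to cut from the source video, in the
    order they should be played. Raises ValueError if a target phoneme never
    occurs in the source.
    """
    labels = [phoneme for phoneme, _, _ in source]
    clips = []
    pos = 0  # index of the next target phoneme still to be covered
    while pos < len(target):
        best_start, best_len = -1, 0
        for i in range(len(labels)):
            # Count how many consecutive phonemes match, starting at source index i.
            n = 0
            while (i + n < len(labels) and pos + n < len(target)
                   and labels[i + n] == target[pos + n]):
                n += 1
            if n > best_len:
                best_start, best_len = i, n
        if best_len == 0:
            raise ValueError(f"phoneme {target[pos]!r} not found in source")
        clips.append((source[best_start][1], source[best_start + best_len - 1][2]))
        pos += best_len
    return clips


# Made-up example: the source says "hello world", the edit needs "W ER L OW".
source = [("HH", 0.0, 0.1), ("AH", 0.1, 0.2), ("L", 0.2, 0.3), ("OW", 0.3, 0.5),
          ("W", 0.6, 0.7), ("ER", 0.7, 0.9), ("L", 0.9, 1.0), ("D", 1.0, 1.1)]
target = ["W", "ER", "L", "OW"]
clips = find_clips(source, target)
print(clips)  # [(0.6, 1.0), (0.3, 0.5)]
```

Each boundary between two clips is a visible cut, which is why longer matching runs are preferred here and why a separate smoothing pass is still needed afterwards.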
Before use, the neural network has to be trained: it needs to be “fed” at least 40 minutes of video of the person or people whose speech will be replaced. For now this applies only to English-language videos: English has only 44 phonemes, so it is much easier to train a neural network on English than on Russian or Japanese. Over time, however, the technology could be used to edit videos of people speaking other languages. Below is a video that demonstrates the capabilities of the technology.
Of course, this work raises a number of questions, one of which concerns information and media security. If any words can be put into anyone's mouth and the result looks natural, is the technology dangerous? The authors acknowledge that yes, it can be used by attackers. On the other hand, image editors have existed for a very long time and can also be used to fake almost anything, yet the world has learned to live with them.
The authors also say they understand that the technology could be used by unscrupulous politicians, who would no longer need to make speeches in front of a camera if “talking heads” assembled from earlier recorded speeches stood in for them.
To detect fakes, the authors suggest using specialized watermarks and other techniques that make it possible to recognize a forgery.
Of course, modification is easy to prove when the original video is available. In addition, the authors plan to develop methods for protecting media content by adding “digital fingerprints” to the original version; these would be easy to detect and would show whether a video is original or fake.
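The article does not say how such fingerprints would work. As a rough illustration of the idea only, the sketch below uses a plain SHA-256 hash of the file bytes as a stand-in fingerprint; the function names are hypothetical, and this is an assumption, not the authors' proposed method.

```python
import hashlib
import hmac


def fingerprint(data: bytes) -> str:
    """Return a hex SHA-256 digest used here as a stand-in 'digital fingerprint'."""
    return hashlib.sha256(data).hexdigest()


def is_unmodified(data: bytes, published_fingerprint: str) -> bool:
    """Check whether `data` still matches the fingerprint published with the original."""
    return hmac.compare_digest(fingerprint(data), published_fingerprint)


# Made-up example: placeholder bytes stand in for real video files.
original = b"bytes of the original video file"
edited = b"bytes of the edited video file"
published = fingerprint(original)

print(is_unmodified(original, published))  # True
print(is_unmodified(edited, published))    # False
```

A plain hash like this flags any change at all, including harmless re-encoding, so a practical scheme would likely need a fingerprint or watermark that survives ordinary processing.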
The full text of the study can be found here.