
Researchers recover sound from the vibrations of objects in video
Sound consists of oscillations that propagate outward from a source through the surrounding medium. When these waves reach nearby objects, they make the objects vibrate slightly. A group of researchers at the Massachusetts Institute of Technology has managed to partially recover the original sound, with some distortion, from these vibrations as captured on video.
In their work, Abe Davis, Michael Rubinstein, Neal Wadhwa, Gautham Mysore, Frédo Durand and William Freeman used a camera that records several thousand frames per second and filmed common objects that vibrate easily: a foil bag of chips, the leaves of a houseplant, a box of tissues, and a glass of water. Such a camera is hard to come by in everyday life, but a second technique they developed showed that sound can also be recovered from ordinary video recorded at 60 frames per second.
The recovered sound is good enough to pick out individual words and has a relatively high signal-to-noise ratio. The recovered recordings even make it possible to roughly make out human speech or to use music recognition services.
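For readers unfamiliar with the term, the signal-to-noise ratio compares the power of the original signal with the power of the error in the recovered one. The sketch below is only an illustration of that definition, not the evaluation procedure used by the researchers; the function name `snr_db` and the sample values are hypothetical, and it assumes the two signals are already aligned in time and scaled to the same level.

```python
import math

def snr_db(original, recovered):
    """Signal-to-noise ratio in decibels.

    Assumes both sequences have the same length and are already
    time-aligned and equally scaled (an assumption of this sketch).
    """
    if len(original) != len(recovered):
        raise ValueError("signals must have the same length")
    signal_power = sum(s * s for s in original)
    noise_power = sum((s - r) ** 2 for s, r in zip(original, recovered))
    if noise_power == 0.0:
        return math.inf  # perfect recovery, no error at all
    return 10.0 * math.log10(signal_power / noise_power)

# Hypothetical samples: every recovered value is off by 0.1.
original = [1.0, -1.0, 1.0, -1.0]
recovered = [0.9, -0.9, 0.9, -0.9]
example_snr = snr_db(original, recovered)
```

A higher value means the recovered signal is closer to the original; identical signals give an infinite ratio.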
In the video, at 00:45, and on the project page, you can hear the source sound (the researchers used “Mary Had a Little Lamb”, a song well known to anyone interested in the history of sound recording) and the recovered sound. The vibrations in the high-speed video are invisible to the naked eye: they amount to less than one hundredth of a pixel.
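To illustrate how motion far smaller than a pixel can still be measured from image intensities, here is a minimal sketch. It is not the researchers' method; it swaps in a simple gradient-based (first-order Taylor) shift estimate on a synthetic 1-D intensity pattern. All names and values (`profile`, `estimate_shift`, the 2000 fps frame rate, the 440 Hz tone, the 0.005 pixel amplitude) are assumptions chosen for illustration.

```python
import math

def profile(x, shift=0.0):
    """Smooth synthetic 1-D intensity pattern, displaced by `shift` pixels."""
    return 0.5 + 0.5 * math.sin(2.0 * math.pi * (x - shift) / 32.0)

def estimate_shift(ref, frame):
    """Least-squares sub-pixel shift between two intensity rows.

    Uses the first-order approximation
        frame(x) ~ ref(x) - shift * d(ref)/dx
    and solves for `shift` over all interior pixels.
    """
    num = 0.0
    den = 0.0
    for i in range(1, len(ref) - 1):
        grad = (ref[i + 1] - ref[i - 1]) / 2.0  # central difference
        num += (ref[i] - frame[i]) * grad
        den += grad * grad
    return num / den

# Assumed illustration values, not taken from the study.
FPS = 2000.0          # high-speed frame rate, frames per second
TONE_HZ = 440.0       # tone that makes the object vibrate
AMPLITUDE_PX = 0.005  # peak displacement, below 1/100 of a pixel
N_PIXELS = 256
N_FRAMES = 200

ref = [profile(x) for x in range(N_PIXELS)]

# The object's displacement in each frame follows the sound wave.
true_shifts = [
    AMPLITUDE_PX * math.sin(2.0 * math.pi * TONE_HZ * n / FPS)
    for n in range(N_FRAMES)
]

# One displacement estimate per frame gives a sampled "audio" signal.
recovered = [
    estimate_shift(ref, [profile(x, s) for x in range(N_PIXELS)])
    for s in true_shifts
]

max_error = max(abs(r - t) for r, t in zip(recovered, true_shifts))
```

The point of the sketch is that a tiny shift changes the brightness of many pixels by a small but consistent amount, so pooling over all of them yields a displacement estimate much finer than one pixel. Real footage adds sensor noise, which this synthetic example leaves out.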
Then, at 1:50 in the video, the original sound as recorded by a cell phone microphone is played alongside the recovered human speech. In this case the camera was placed some distance away from the bag of chips vibrating under the sound waves, with glass between the camera and the object, which made the task harder. The researchers again used the first song recorded on Thomas Edison's phonograph.
At 2:35, the video shows that music recognition services can identify the recovered recordings; in particular, Queen's “Under Pressure” was recognized.
The results above were obtained with cameras shooting thousands of frames per second. But the researchers also showed that artifacts of ordinary consumer video cameras (in particular, the rolling shutter) can sometimes be used to recover sound at frequencies much higher than the frame rate of the original video. A rolling shutter reads the sensor out row by row, so each row of a frame captures a slightly different moment in time.
The results of this modified technique can be seen at 3:35: the researchers were able to recover frequencies more than five times higher than the frame rate of the video. The same MIDI file with the melody of the children's song was used.
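The arithmetic behind the rolling-shutter idea can be sketched as follows. If every row of a frame is captured at its own instant, each row acts as a separate time sample, so the sampling rate is set by the row readout rate rather than the frame rate. This is a simplified model under stated assumptions, not the researchers' implementation: the function names, the 720-row sensor, and the 0.8 readout fraction are hypothetical, and rows are assumed to be read top to bottom at a uniform rate with an idle gap before the next frame.

```python
def row_sample_times(fps, rows, readout_fraction, n_frames):
    """Capture time in seconds of every row of every frame.

    `readout_fraction` is the assumed share of the frame period spent
    reading rows; the remainder is a gap in which nothing is sampled.
    """
    frame_period = 1.0 / fps
    line_delay = readout_fraction * frame_period / rows
    return [
        f * frame_period + r * line_delay
        for f in range(n_frames)
        for r in range(rows)
    ]

def nyquist_limits(fps, rows, readout_fraction):
    """Highest recoverable frequency, in Hz, for two sampling models.

    Returns (frame_nyquist, row_nyquist): one sample per frame versus
    one sample per row. The row figure ignores the gaps between frames.
    """
    frame_nyquist = fps / 2.0
    line_rate = rows / (readout_fraction / fps)  # rows read per second
    return frame_nyquist, line_rate / 2.0

# Hypothetical consumer camera: 60 fps, 720 rows, 80% readout time.
times = row_sample_times(fps=60.0, rows=720, readout_fraction=0.8, n_frames=2)
frame_nyquist, row_nyquist = nyquist_limits(fps=60.0, rows=720, readout_fraction=0.8)
```

With one sample per frame, a 60 fps video cannot represent anything above 30 Hz, which is why recovering frequencies several times the frame rate requires the per-row timing. In practice the gaps between frames and noise in individual rows keep the usable bandwidth well below the idealized row-rate limit.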
More information and audio recordings are available on the project page. The research team promises to publish the project code in the near future.