Preparing videos for sound design. Which codec to choose

This article reflects the author's personal experience and does not claim scientific rigor. I welcome any corrections and additions. For those who do not want to read further, the answer to the main question is: MJPEG.

Introduction

Customers often transfer project materials in archaic ways, and it is not uncommon for a 5-minute film to arrive as an email attachment compressed to 20 MB. A preview copy then becomes the working copy, which causes a number of non-obvious problems. The main ones are low image detail (caused by excessive compression) and the use of video codecs that are not intended for audio editing.

Low detail, pixelation and overall blurring of the picture hamper the sound engineer at the very first stage, when the plot and the visual elements of the film that can be given sound are assessed. This leads to aesthetic mismatches: for example, an object designed as plastic gets the sound of metal or glass.

Poor picture quality also makes it harder for the sound engineer to pin down where dynamic visual events begin and end, which leads to loose synchronization with the audio. More often, though, a picture that lags behind or runs ahead of the sound is caused by the properties of the video codec used during audio editing, which are discussed below.

General information

But first, a few words about what a video file actually is. In short, it is a container: a metafile that holds several data streams. Streams include audio, video, images, subtitles, menus, chapter information, metadata, tags, etc. A container can hold several streams of the same type at once (for example, 2 video tracks, 3 audio tracks, subtitles in several languages), and each of them can be compressed with a different codec. It is worth recalling the terms here:

  • mux - packing multiple streams into one container
  • demux - extracting streams from a container into separate files
  • remux - replacing one or more streams in a container

All of these operations are lossless, i.e. they do not change the contents of the streams.
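The operations above can be sketched as ffmpeg command lines. The helper names and file names here are hypothetical, and the sketch only builds argument lists. It relies on ffmpeg's stream copy mode (`-c copy`), which moves packets between containers without re-encoding, and on `-map` to pick streams.

```python
def demux_audio_cmd(src, dst):
    """Extract the first audio stream into its own file without re-encoding."""
    # -map 0:a:0 selects the first audio stream of the first input;
    # -c copy passes the packets through untouched.
    return ["ffmpeg", "-i", src, "-map", "0:a:0", "-c", "copy", dst]


def mux_cmd(video_src, audio_src, dst):
    """Pack the video of one file and the audio of another into one container.

    Using the original container as video_src makes this a remux:
    the old audio stream is replaced by the new one.
    """
    return ["ffmpeg", "-i", video_src, "-i", audio_src,
            "-map", "0:v:0", "-map", "1:a:0", "-c", "copy", dst]


print(" ".join(demux_audio_cmd("film.mov", "film_audio.mka")))
print(" ".join(mux_cmd("film.mov", "new_sound.wav", "film_new.mov")))
```

Stream copy only works if the target container accepts the codec of the copied stream, so the output extensions above are an assumption about the input material.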

AVI, FLV, MOV, MP4, MKV, OGG, TS and WebM are not video codecs but containers, so the file extension says nothing about the nature of the content. Video codecs are DivX, Xvid, H.264, MPEG, MJPEG, Theora, VP9, and for the purposes of this article they come in three types: lossy, lossless and intra-only. The codec determines image quality and suitability for audio editing. Intra-only codecs are discussed below, and the principles of the first two types are well described in this article.

In short, the codec divides the video stream into groups of frames (GOP, Group Of Pictures). To reduce file size, only the first frame of each GOP (the i-frame) is stored in full, while the rest (b- and p-frames) contain only information about changes in the picture. As a result, each GOP looks something like this: ibbpbbpbbp. The more heavily the video is compressed, the higher the threshold a change must exceed to be recorded in the b- and p-frames. The longer the GOP, the more problems there are with scrubbing (stuck frames, etc.), because the player has to rebuild each frame starting from the preceding i-frame. The conclusion follows: for audio editing, lossy and lossless codecs are conditionally suitable only if the video was encoded with a small GOP value.
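Why GOP length matters for scrubbing can be shown with a toy model. It assumes that after a seek the decoder restarts at the nearest preceding i-frame and walks forward frame by frame; b-frame reordering is ignored, and the function name is hypothetical.

```python
def frames_to_decode(frame_index, gop_length):
    """Frames the decoder must process to display frame_index after a seek.

    Toy model: decoding restarts at the nearest preceding i-frame and
    walks forward; b-frame reordering is ignored.
    """
    if gop_length < 1 or frame_index < 0:
        raise ValueError("gop_length must be >= 1 and frame_index >= 0")
    return frame_index % gop_length + 1


for gop in (1, 12, 250):
    worst = max(frames_to_decode(i, gop) for i in range(1000))
    print(f"GOP {gop:>3}: up to {worst} frame(s) decoded per seek")
```

With a GOP of 1 every seek costs a single frame, while in this model the worst case grows linearly with GOP length.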

Synchronization

Streams are synchronized by means of timestamps generated by the codec during encoding and decoding. If an error occurs at that point, the codec skips the affected frames and assigns the timestamp of the problem packet to the next good one. As a result, the damaged stream drifts out of sync with the others. If the file is then converted to a lossy / lossless format again, the effect may get worse.
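The drift described above can be illustrated with a small simulation. This is a toy model, not a description of any particular codec: it assumes a constant frame rate and that each damaged frame is dropped while its timestamp slot goes to the next good frame. The function names and the damaged frame positions are made up for the example.

```python
def presented_timestamps(frame_ok, fps=25.0):
    """Return (source_time, presented_time) pairs for frames that survive.

    Toy model: a damaged frame is skipped and its timestamp slot is handed
    to the next good frame, so every later frame is shown one frame
    duration earlier than intended.
    """
    frame_duration = 1.0 / fps
    pairs = []
    slot = 0  # next free timestamp slot in the output stream
    for index, ok in enumerate(frame_ok):
        if not ok:
            continue  # damaged frame: dropped, its slot stays free
        pairs.append((index * frame_duration, slot * frame_duration))
        slot += 1
    return pairs


def drift_ms(frame_ok, fps=25.0):
    """Offset of the last surviving frame from its source time, in ms."""
    pairs = presented_timestamps(frame_ok, fps)
    if not pairs:
        return 0.0
    source, presented = pairs[-1]
    return (source - presented) * 1000.0


# 100 frames at 25 fps with three damaged frames (positions are arbitrary).
frames = [i not in (10, 40, 70) for i in range(100)]
print(f"picture runs ahead of the sound by about {round(drift_ms(frames))} ms")
```

In this model each lost frame shifts the rest of the picture by one frame duration (40 ms at 25 fps), so the errors accumulate toward the end of the file.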

Intra-frame

A distinctive feature of intra-frame codecs is that every frame in the stream is a key frame (i-frame). There are no intermediate frames that depend on their neighbors. One popular codec of this type is MJPEG (Motion JPEG). It stores video as a sequence of independently compressed JPEG images.

MJPEG pros:

  • fast conversion
  • smooth scrubbing
  • suitable for audio editing

Cons:

  • files can be quite large

You can convert any video file to MJPEG with the ffmpeg utility. The command looks something like this:

ffmpeg -i input.avi -c:v mjpeg -q:v 1 -c:a copy output.mov

To avoid typing this at the command line every time, create a script like this (for Windows) and simply drag and drop video files onto it (several at once is fine):

for %%A in (%*) do ffmpeg -i "%%~A" -c:v mjpeg -q:v 1 -c:a copy "%%~nA_mjpeg.mov"

For convenience, a shortcut to this script can be dropped into the SendTo folder (in the properties of the shortcut you will need to clear the “Start in” field).
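For those not on Windows, the same batch conversion can be sketched in Python. This is a minimal equivalent under stated assumptions: ffmpeg is on the PATH, the helper name `mjpeg_cmd` is hypothetical, and unlike the script above the output is written next to the source file rather than into the current directory.

```python
import subprocess
import sys
from pathlib import Path


def mjpeg_cmd(src):
    """Build the article's ffmpeg command for one input file."""
    src = Path(src)
    # Assumption: write "<name>_mjpeg.mov" next to the source file.
    dst = src.with_name(src.stem + "_mjpeg.mov")
    return ["ffmpeg", "-i", str(src), "-c:v", "mjpeg", "-q:v", "1",
            "-c:a", "copy", str(dst)]


if __name__ == "__main__":
    # Convert every file passed on the command line, one after another.
    for arg in sys.argv[1:]:
        subprocess.run(mjpeg_cmd(arg), check=True)
```

Saved under a name of your choice (for example `to_mjpeg.py`), it would be run as `python to_mjpeg.py clip1.avi clip2.mp4`.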

Finally

Ask the client in advance to send the video in good quality (lossless, or lossy without heavy compression), then convert whatever you receive to MJPEG and do the sound work against that file. When the sound is ready, remux the client's video, adding your audio track to the container. Some containers have limitations: for example, MP4 does not support PCM audio streams (WAV / AIFF), so in that case the sound will have to be converted to MP3 or ALAC. A detailed compatibility table is available on Wikipedia.
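The final remux could look roughly like the sketch below. The function name, the file names and the extension table are assumptions for illustration. The MP4 to ALAC choice follows the paragraph above, with `alac` being ffmpeg's encoder name; every other container is assumed to accept the audio as it is.

```python
from pathlib import Path

# Assumption: containers that need the audio re-encoded, and the encoder to use.
# Containers not listed here are assumed to accept the audio stream unchanged.
AUDIO_CODEC_BY_EXT = {".mp4": "alac"}


def delivery_cmd(client_video, sound, output):
    """Put the finished sound under the client's original video stream."""
    audio_codec = AUDIO_CODEC_BY_EXT.get(Path(output).suffix.lower(), "copy")
    return ["ffmpeg", "-i", client_video, "-i", sound,
            "-map", "0:v:0", "-map", "1:a:0",  # client's video, your audio
            "-c:v", "copy", "-c:a", audio_codec, output]


print(" ".join(delivery_cmd("client.mp4", "final_mix.wav", "client_sound.mp4")))
print(" ".join(delivery_cmd("client.mov", "final_mix.wav", "client_sound.mov")))
```

The video stream is always copied, so the picture the client gets back is bit-for-bit the one they sent; only the audio track changes.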
