FFmpeg-based video player
Hello, Habr!
This article describes the development of a simple video player using libraries from the FFmpeg project.
I did not find any articles on this subject on Habr, so I decided to fill the gap.
Video decoding is done with the FFmpeg libraries, and the display is handled by SDL.
Introduction
With FFmpeg, you can perform a large number of video processing tasks: encoding and decoding, multiplexing and demultiplexing. This greatly facilitates the development of multimedia applications.
As with most open source projects, one of the main problems is documentation: there is very little of it, and what exists is not always up to date, because FFmpeg is a fast-moving project with an ever-changing API. The primary source of documentation is therefore the library source code itself. Of the older articles, I recommend [1] and [2]; they give a good general idea of how to work with the libraries.
FFmpeg is a set of utilities and libraries for working with various media formats. There is probably no need to talk about the utilities, everyone has heard of them, but the libraries deserve a closer look:
- libavutil - a set of auxiliary functions, including random number generators, data structures, mathematical routines, and basic multimedia utilities;
- libavcodec - encoders and decoders for audio/video codecs (try saying that phrase quickly ten times in a row);
- libavformat - multiplexers and demultiplexers for multimedia containers;
- libavdevice - input and output devices for capturing and rendering via common multimedia frameworks (Video4Linux, Video4Linux2, VfW, ALSA);
- libavfilter - a set of filters for media transformations;
- libswscale - well-optimized functions for image scaling and for color space and pixel format conversion;
- libswresample - well-optimized functions for audio resampling and sample format conversion.
To display the video on the screen, we will use SDL. This is a convenient and cross-platform framework with a fairly simple API.
An experienced reader may note that such a player already ships with FFmpeg itself: its code is in the ffplay.c file, and it also uses SDL! But that code is quite hard for an FFmpeg beginner to follow and contains a lot of extra functionality.
A similar player is also described in [1], but it uses functions that have either been removed from FFmpeg or deprecated.
I’ll try to give an example of a minimalistic and understandable player using the current API. For simplicity, we will only display video, without sound.
So, let's begin.
The code
First of all, we include the necessary header files:
#include <libavcodec/avcodec.h>
#include <libavformat/avformat.h>
#include <libswscale/swscale.h>
#include <SDL.h>
#include <stdio.h>
In this small example, all the code will be in main.
First, initialize the FFmpeg library with av_register_all(). During initialization, all file formats and codecs available in the library are registered; after that, they are used automatically when opening files of the corresponding format or codec.
int main(int argc, char* argv[]) {
if (argc < 2) {
printf("Usage: %s filename\n", argv[0]);
return 0;
}
// Register all available file formats and codecs
av_register_all();
Now initialize SDL. As an argument, SDL_Init takes the set of subsystems that should be initialized (several can be combined with a bitwise OR). In this example, we only need the video subsystem.
int err;
// Init SDL with video support
err = SDL_Init(SDL_INIT_VIDEO);
if (err < 0) {
fprintf(stderr, "Unable to init SDL: %s\n", SDL_GetError());
return -1;
}
Now we open the input file. The file name is passed as the first command-line argument.
The avformat_open_input function reads the file header and stores information about the detected format in the AVFormatContext structure. The remaining arguments can be set to NULL, in which case libavformat performs automatic parameter detection.
// Open video file
const char* filename = argv[1];
AVFormatContext* format_context = NULL;
err = avformat_open_input(&format_context, filename, NULL, NULL);
if (err < 0) {
fprintf(stderr, "ffmpeg: Unable to open input file\n");
return -1;
}
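As a side note, libavutil can turn FFmpeg error codes into human-readable messages. A small optional addition (a sketch, not part of the original player) using av_strerror:
// Optional: translate the FFmpeg error code into a readable message.
char errbuf[256];
if (av_strerror(err, errbuf, sizeof(errbuf)) == 0) {
    fprintf(stderr, "ffmpeg: Unable to open input file: %s\n", errbuf);
}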
Since avformat_open_input reads only the file header, the next step is to get information about the streams in the file. This is done by the avformat_find_stream_info function.
// Retrieve stream information
err = avformat_find_stream_info(format_context, NULL);
if (err < 0) {
fprintf(stderr, "ffmpeg: Unable to find stream info\n");
return -1;
}
After that, format_context->streams contains all the streams in the file. Their number is equal to format_context->nb_streams.
You can display detailed information about the file and all of its streams using the av_dump_format function.
// Dump information about file onto standard error
av_dump_format(format_context, 0, argv[1], 0);
Now we find the index of the video stream in format_context->streams. From it we can get the codec context (AVCodecContext), which will later be used to identify packets of the video stream when reading the file.
// Find the first video stream
int video_stream;
for (video_stream = 0; video_stream < format_context->nb_streams; ++video_stream) {
if (format_context->streams[video_stream]->codec->codec_type == AVMEDIA_TYPE_VIDEO) {
break;
}
}
if (video_stream == format_context->nb_streams) {
fprintf(stderr, "ffmpeg: Unable to find video stream\n");
return -1;
}
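As an aside, libavformat also provides av_find_best_stream, which can replace the manual loop above; a sketch, assuming the signature used by FFmpeg versions of this era:
// Alternative (sketch): let libavformat pick the "best" video stream
// and, optionally, return a matching decoder.
AVCodec* decoder = NULL;
int best = av_find_best_stream(format_context, AVMEDIA_TYPE_VIDEO, -1, -1, &decoder, 0);
if (best < 0) {
    fprintf(stderr, "ffmpeg: Unable to find video stream\n");
    return -1;
}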
The codec information in a stream is called the "codec context" (AVCodecContext). Using it, we can find the required codec (AVCodec) and open it.
AVCodecContext* codec_context = format_context->streams[video_stream]->codec;
AVCodec* codec = avcodec_find_decoder(codec_context->codec_id);
err = avcodec_open2(codec_context, codec, NULL);
if (err < 0) {
fprintf(stderr, "ffmpeg: Unable to open codec\n");
return -1;
}
It's time to prepare a window for video output using SDL (we know the video dimensions). In general, we could create a window of any size and scale the video with libswscale, but for simplicity let's make the window the same size as the video.
In addition to the window itself, we also need an overlay into which the video will be drawn. SDL supports many ways of drawing images on the screen, and one of them is designed specifically for displaying video: the YUV overlay. YUV is a color space, like RGB: Y is the luma (brightness) component, and U and V are the chroma (color) components. It is more involved than RGB because part of the color information is discarded; in the 4:2:0 formats used here, there is only one U and one V sample for every four Y samples. The YUV overlay takes an array of YUV data and displays it. It supports several YUV formats, the fastest of which is YV12. There is another YUV format, YUV420P, which is identical to YV12 except that the U and V planes are stored in the opposite order. FFmpeg can convert images to YUV420P, and most video streams are already stored in this format or can easily be converted to it.
Thus, we will use the YV12 overlay in SDL, have FFmpeg convert the video to YUV420P, and swap the U and V planes when displaying.
SDL_Surface* screen = SDL_SetVideoMode(codec_context->width, codec_context->height, 0, 0);
if (screen == NULL) {
fprintf(stderr, "Couldn't set video mode\n");
return -1;
}
SDL_Overlay* bmp = SDL_CreateYUVOverlay(codec_context->width, codec_context->height,
SDL_YV12_OVERLAY, screen);
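To make the plane layout concrete, here is a small illustration (not part of the player) of the buffer sizes for a 4:2:0 frame:
// Illustration only: plane sizes for a width x height 4:2:0 frame.
// Y has one sample per pixel; U and V are each subsampled 2x2.
int y_size = codec_context->width * codec_context->height;
int c_size = (codec_context->width / 2) * (codec_context->height / 2);
// YUV420P stores the planes as Y, U, V; YV12 stores them as Y, V, U.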
Pixel format conversion, like scaling, is done in FFmpeg by libswscale.
The conversion is performed in two stages. The first step is to create a conversion context (struct SwsContext). Previously, this was done with the aptly named sws_getContext function; it is now deprecated, and it is recommended to create the context with sws_getCachedContext instead, so that is what we will use.
struct SwsContext* img_convert_context;
img_convert_context = sws_getCachedContext(NULL,
codec_context->width, codec_context->height,
codec_context->pix_fmt,
codec_context->width, codec_context->height,
PIX_FMT_YUV420P, SWS_BICUBIC,
NULL, NULL, NULL);
if (img_convert_context == NULL) {
fprintf(stderr, "Cannot initialize the conversion context\n");
return -1;
}
Well, here we come to the most interesting part: displaying the video.
Data is read from the file in packets (AVPacket), and a frame (AVFrame) is used for display.
We are only interested in packets belonging to the video stream (remember, we saved its index in the video_stream variable).
The avcodec_decode_video2 function decodes a packet into a frame using the codec we obtained earlier (codec_context). It sets frame_finished to a positive value when a complete frame has been decoded (one frame can span several packets, and frame_finished is set only when the last of them is decoded).
AVFrame* frame = avcodec_alloc_frame();
AVPacket packet;
while (av_read_frame(format_context, &packet) >= 0) {
if (packet.stream_index == video_stream) {
// Video stream packet
int frame_finished;
avcodec_decode_video2(codec_context, frame, &frame_finished, &packet);
if (frame_finished) {
Now we need to prepare the picture for display in the window. First, we lock the overlay, since we are going to write to it. The video in the file can be in any pixel format, while we configured the display for YV12; this is where libswscale helps. Earlier we set up the img_convert_context conversion context, and now it is time to apply it. The main libswscale function is, of course, sws_scale, which performs the required conversion. Note the mismatched indices when assigning the arrays: this is not a typo. As mentioned earlier, YUV420P differs from YV12 only in the order of the color planes; we told libswscale to convert to YUV420P, while SDL expects YV12, so here we swap U and V to make everything line up.
SDL_LockYUVOverlay(bmp);
AVPicture pict;
pict.data[0] = bmp->pixels[0];
pict.data[1] = bmp->pixels[2]; // it's because YV12
pict.data[2] = bmp->pixels[1];
pict.linesize[0] = bmp->pitches[0];
pict.linesize[1] = bmp->pitches[2];
pict.linesize[2] = bmp->pitches[1];
sws_scale(img_convert_context,
frame->data, frame->linesize,
0, codec_context->height,
pict.data, pict.linesize);
SDL_UnlockYUVOverlay(bmp);
We display the image from the overlay in the window.
SDL_Rect rect;
rect.x = 0;
rect.y = 0;
rect.w = codec_context->width;
rect.h = codec_context->height;
SDL_DisplayYUVOverlay(bmp, &rect);
After the packet is processed, we need to free the memory it occupies. This is done by the av_free_packet function.
}
}
// Free the packet that was allocated by av_read_frame
av_free_packet(&packet);
So that the OS does not consider our application hung, and so that the application exits when the window is closed, we process one SDL event at the end of each loop iteration.
// Handling SDL events there
SDL_Event event;
if (SDL_PollEvent(&event)) {
if (event.type == SDL_QUIT) {
break;
}
}
}
Finally, the standard cleanup of all the resources we used.
sws_freeContext(img_convert_context);
// Free the YUV frame
av_free(frame);
// Close the codec
avcodec_close(codec_context);
// Close the video file
avformat_close_input(&format_context);
// Quit SDL
SDL_Quit();
return 0;
}
Now let's build it. The simplest option with gcc looks something like this:
gcc player.c -o player -lavutil -lavformat -lavcodec -lswscale -lz -lbz2 `sdl-config --cflags --libs`
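If the FFmpeg libraries provide pkg-config files on your system (this varies by distribution, and the exact module list may differ), an equivalent command could look like this:
gcc player.c -o player $(pkg-config --cflags --libs libavformat libavcodec libswscale libavutil) `sdl-config --cflags --libs`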
We run it. And what do we see? The video plays at tremendous speed! To be precise, playback happens at whatever speed frames can be read from the file and decoded. Indeed, we did not write a single line of code to control the frame rate, and that topic deserves an article of its own. There are many things that could be improved in this code: for example, adding audio playback, or moving file reading and display into separate threads. If the Habr community is interested, I will cover this in future articles.
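As a teaser, here is a naive frame-pacing sketch (my addition, not part of the player above): it delays each displayed frame by its nominal duration, taken from the stream's average frame rate. This assumes avg_frame_rate is set, which is not guaranteed for every file; real players schedule frames by packet timestamps instead.
// Naive pacing sketch: compute the nominal frame duration once...
AVRational rate = format_context->streams[video_stream]->avg_frame_rate;
int delay_ms = (rate.num > 0) ? 1000 * rate.den / rate.num : 40; // fall back to 25 fps
// ...and, inside the decode loop, sleep after SDL_DisplayYUVOverlay:
SDL_Delay(delay_ms);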
The complete source code.
Thank you all for your attention!
Continuation: Finalization of the ffmpeg video player