CaptureManager SDK
This article presents my hobby project, CaptureManager, for the Windows desktop platform. It is a simple SDK for adding support for a wide range of video and audio sources to an application under development.
CaptureManager is based on Microsoft Media Foundation, a new generation of media technology that has replaced the outdated DirectShow. Microsoft Media Foundation was first included in Windows Vista and received support for video and audio sources starting with Windows 7. Its advantages are a new media processing pipeline model that suits multiprocessor systems, and continued development and support from Microsoft.
In the CaptureManager project I wanted to solve a number of problems that I encountered while writing applications using the Microsoft Media Foundation:
- Implementation of COM functionality. Strange as it may sound, with Microsoft Media Foundation, Microsoft stepped back from its own application model, COM. Of course, all class interfaces in Microsoft Media Foundation are still derived from IUnknown and are associated with a GUID. But the objects themselves are created through direct “C” function calls from statically linked system libraries. This differs from DirectShow, which requires a call to CoCreateInstance and goes through the COM abstraction. In my opinion, this decision by Microsoft is a drawback. Firstly, it makes Microsoft Media Foundation difficult to integrate into projects not written in C/C++, for example C# projects, which on Windows interact with COM objects almost seamlessly by generating the required interface definitions from the TLB. Secondly, there is an increased risk of losing application compatibility with the next version of Windows when a function migrates from one statically linked library to another. With Microsoft Media Foundation this has already happened once, see Library Changes in Windows 7: “Starting in Windows 7, certain Media Foundation functions are exported from different DLL files than previous versions.”
- In my opinion, Microsoft Media Foundation is overloaded with features and interfaces. It would be nice to hide most of them behind an additional level of abstraction to simplify the task of capturing and recording video and audio data.
- A significant drawback, in my opinion, is the limited support for recording video and audio in Microsoft Media Foundation. It provides two mechanisms for working with media: the graph topology and SourceReader-SinkWriter. The first assembles the required configuration from converter (transform) nodes and lets you configure it flexibly. The second has the application itself receive portions of media from SourceReader and send them to SinkWriter. The graph topology is very convenient, in my opinion, and makes it easy to generate the required recording configuration at the user's request. However, this solution from Microsoft does not handle the recording task. The object that runs a topology-based session, which exposes the IMFMediaSession interface and is created by the MFCreateMediaSession function, is optimized for playing media data and does not perform a number of required operations. For example, when recording to a file ends, metrics must be calculated: the average stream bitrate and the playback duration. IMFMediaSession from MFCreateMediaSession does not do this, because for playback this calculation is meaningless. There is also a problem with timing: IMFMediaSession from MFCreateMediaSession counts playback from zero time, which is logical when playing a media file. However, video and audio sources such as webcams or microphones use the current system time. According to the Microsoft Media Foundation documentation they should start from zero time, but they do not fulfill this requirement.
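To make the last point more concrete, here is a minimal Python sketch of the two things a recording session has to do that a playback session skips: rebasing source timestamps so the recording starts at zero, and computing the duration and average bitrate when recording ends. It is only an illustration of the idea, not CaptureManager's code. The class RecordingStats and its methods are hypothetical names, and the sample sizes and timestamps are made up. The only fact taken from Media Foundation is that sample times are expressed in 100-nanosecond units.

```python
from dataclasses import dataclass
from typing import Optional

HNS_PER_SECOND = 10_000_000  # Media Foundation sample times use 100-ns units


@dataclass
class RecordingStats:
    """Hypothetical helper: rebases sample times to zero and gathers metrics."""
    first_timestamp: Optional[int] = None
    end_time: int = 0      # rebased end of the latest sample, in 100-ns units
    total_bytes: int = 0

    def add_sample(self, timestamp: int, duration: int, size_bytes: int) -> int:
        """Account for one sample and return its timestamp rebased to zero."""
        if self.first_timestamp is None:
            # The first sample carries the source's system time; treat it as t=0.
            self.first_timestamp = timestamp
        rebased = timestamp - self.first_timestamp
        self.end_time = max(self.end_time, rebased + duration)
        self.total_bytes += size_bytes
        return rebased

    def duration_seconds(self) -> float:
        """Playback duration of everything recorded so far."""
        return self.end_time / HNS_PER_SECOND

    def average_bitrate(self) -> float:
        """Average stream rate in bits per second (0.0 if nothing was recorded)."""
        seconds = self.duration_seconds()
        return self.total_bytes * 8 / seconds if seconds > 0 else 0.0


# Illustrative input: 25 samples of 40 ms and 1000 bytes each, stamped with a
# large "system time" value instead of starting from zero.
stats = RecordingStats()
base = 132_000_000_000_000_000
rebased = [stats.add_sample(base + i * 400_000, 400_000, 1000) for i in range(25)]
print(rebased[0], stats.duration_seconds(), stats.average_bitrate())
```

In a real pipeline the rebased timestamp would be written back to each sample before it reaches the file sink, and the two metrics would be stored in the container when the file is finalized.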
I believe, and I think many will agree, that the above problems are significant and worth resolving. They were the reason for starting the CaptureManager project (along with the task of capturing video from two webcams and recording it into one media file).
In short, CaptureManager is the following:
- A full-fledged COM in-process server, sometimes called ActiveX. It includes a TLB and can be integrated into C++, C#, and Python projects in the same way as DirectShow.
- CaptureManager depends on the Microsoft Media Foundation libraries but uses “delayed linking”: the libraries are loaded by CaptureManager code and their functions are bound at runtime. If a function cannot be found in the libraries, it is replaced by a stub function that returns the error code E_NOTIMPL. This reduces the risk of the target application crashing when functions migrate from one Microsoft Media Foundation library to another.
- CaptureManager has a simplified set of interfaces. An important feature is that data describing media sources, codecs and media containers is generated as an XML document. An XML document is much easier to process than numerous Variant and PropVariant values, especially in high-level frameworks such as WPF.
- CaptureManager includes a number of video and audio sources that are absent from the original Microsoft Media Foundation: Screen Capture, for capturing images from a display (or multiple displays); AudioLoop Capture, for capturing audio from an audio output; and DirectShow-Crossbar Capture, for capturing video from video capture cards.
- CaptureManager includes a "frame battery", which lets you retrieve a series of the most recent frames.
- CaptureManager includes its own implementation of the IMFMediaSession interface, optimized for the recording task, so it does not call the MFCreateMediaSession function at all.
- CaptureManager includes functionality for changing the webcam's video processing parameters and camera control parameters (focus, exposure, etc.).
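The “delayed linking” idea from the list above can be modeled in a few lines. This is a conceptual Python sketch, not CaptureManager's actual C++ implementation. The functions resolve and make_stub, the function names MFExampleFunction and MFMissingFunction, and the SimpleNamespace objects standing in for loaded DLLs are all hypothetical. The only real value is E_NOTIMPL, the standard HRESULT 0x80004001.

```python
from types import SimpleNamespace
from typing import Any, Callable, Iterable

E_NOTIMPL = 0x80004001  # standard HRESULT meaning "not implemented"


def make_stub(name: str) -> Callable[..., int]:
    """Build a placeholder that accepts any arguments and returns E_NOTIMPL."""
    def stub(*args: Any, **kwargs: Any) -> int:
        return E_NOTIMPL
    stub.__name__ = f"stub_{name}"
    return stub


def resolve(name: str, libraries: Iterable[Any]) -> Callable[..., int]:
    """Search each library in order for `name`; fall back to a stub if absent."""
    for library in libraries:
        function = getattr(library, name, None)
        if callable(function):
            return function
    return make_stub(name)


# Stand-ins for two loaded libraries: the function lives only in the second one,
# as it would after migrating between DLLs in a newer Windows version.
old_library = SimpleNamespace()
new_library = SimpleNamespace(MFExampleFunction=lambda *args: 0)  # 0 is S_OK

found = resolve("MFExampleFunction", [old_library, new_library])
missing = resolve("MFMissingFunction", [old_library, new_library])
print(hex(found(1, 2)), hex(missing(1, 2)))  # prints "0x0 0x80004001"
```

The point of the design is that the caller always gets something callable: a missing export shows up as an ordinary error code that can be checked, rather than as a failure to load the whole module.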
CaptureManager functionality is presented in demo programs available on GitHub, in CaptureManager-SDK-Demos:
- CPPDemos:
- EVRWebCapViewerViaCOMServer is a simple C++ application that demonstrates viewing video sources through the CaptureManager renderer.
- OpenGLWebCamViewerViaCOMServer is a simple C++ application that demonstrates viewing video sources through the OpenGL renderer.
- TextInjectorDemo is a simple C++ application that demonstrates mixing text into a video stream from a camera.
- WaterMarkInjectorDemo is a simple C++ application that demonstrates mixing images into a video stream from a camera.
- EVRVieweingAndRecording is a simple C++ application that demonstrates recording from video and audio sources into one media file.
- NativeMediaFoundationPlayer is a simple C++ application that demonstrates playing multiple video files in a common renderer.
- CSharpDemos:
- WPFMultiSourceRecorder is a simple C# application that demonstrates recording from one, two or more video and audio sources into one common media file.
- WPFMediaFoundationPlayer is a simple C# application that demonstrates playing multiple video files in a common renderer.
- WPFVideoAndAudioRecorder is a simple C# application that demonstrates recording from video and audio sources into one media file.
- WPFIPCameraMJPEGMultiSourceViewer is a simple C# application that demonstrates capturing video from several Internet cameras and playing them in a common renderer.
- WPFMultiSourceViewer is a simple C# application that demonstrates capturing video from several sources and playing them in a common renderer.
- WPFViewerEVRDisplay is a simple C# application that demonstrates integrating the CaptureManager renderer into a WPF application.
- WPFIPCameraMJPEGViewer is a simple C# application that demonstrates capturing video from an Internet camera.
- WPFImageViewer is a simple C# application that demonstrates capturing images from a file.
- WindowsFormsDemo is a simple C# application that demonstrates viewing and recording video sources.
- WPFWebCamSerialShots is a simple C# application that demonstrates the "frame battery" functionality.
- WPFWebCamShot is a simple C# application that demonstrates capturing frames from a video source.
- WPFRecorder is a simple C# application that demonstrates viewing and recording video sources.
- WPFWebViewerEVR is a simple C# application that demonstrates viewing video sources through the CaptureManager renderer.
- WPFWebViewerCallback is a simple C# application that demonstrates capturing frames from a video source by copying them from the CaptureManager stream.
- WPFWebViewerCall is a simple C# application that demonstrates capturing frames from a video source through a direct call to the CaptureManager methods.
- WPFSourceInfoViewer is a simple C# application that demonstrates obtaining information about available video and audio sources.
- PythonDemos:
- CaptureManagerSDKPythonDemo is a simple Python application to demonstrate the functionality of viewing and recording video sources.
- QtMinGWDemos:
- CaptureManagerSDKQtMinGWDemo is a simple C++ Qt application to demonstrate the functionality of viewing and recording video sources.
- UnityDemos:
- UnityWebCamViewer is a simple application to demonstrate the functionality of working with a video source in Unity3D.
More information about the project can be found on the CaptureManager SDK website. A C# wrapper for CaptureManager is available on NuGet.