Capture analog video using STM32F4-DISCOVERY

    image
    In this article I will describe how to capture an analog black-and-white video signal with the STM32F4-DISCOVERY board, and discuss the peculiarities of transferring it to a computer over USB.

    Transferring images to a computer via USB


    The STM32F4-DISCOVERY board can be used to build all sorts of USB devices: the USB peripheral of its microcontroller offers plenty of functionality. Examples of unusual designs based on it are rare on the Internet, though; in most cases USB is used to implement the HID class (emulating keyboards, mice and joysticks) or the CDC class (COM-port emulation). The built-in USB host is usually used to connect USB flash drives.

    I wanted to make a more unusual USB device, for example a webcam. There are two ways to do it: write your own USB device class and a driver for it, or, which is much simpler, use the standard USB Video Class (UVC) intended for video devices. Drivers for such devices are built into Windows starting with Windows XP. The basic UVC description can be found in this document (I used UVC version 1.0, although a newer 1.1 exists).
    There are very few examples of implementing UVC on a microcontroller on the Internet. Composing the device descriptors correctly is rather involved (the descriptors describe all of the device's functionality), and even a small error in a descriptor can prevent the device from being detected correctly, or even cause a BSOD. You could copy the descriptors from an existing webcam, but they tend to be unnecessarily complicated: cameras often contain a microphone, support capturing a single image (Still Image Capture in UVC terminology), and allow a large number of camera settings to be changed. It is easy to get lost in all this, so I wanted to make the project as simple as possible.
    After a long search I accidentally stumbled upon a Chinese project: a Tetris game for the STM32F103 that uses a computer as the display, with the controller appearing to it as a UVC device. The project even implements MJPEG encoding. It is quite interesting, but the code is incredibly confusing and almost entirely without comments. I took the descriptors from it and tweaked them a bit to fit my requirements.

    When composing the descriptors you need, among other things, to specify the parameters of the transmitted image. I settled on an image size of 320x240 pixels and the NV12 image format. For uncompressed images the UVC standard allows only two formats: NV12 and YUY2.
    The second format is more common, but NV12 is better suited to encoding black-and-white images and takes up less space. In this format the data is encoded as YUV 4:2:0: every block of four pixels shares two bytes of color information (one U and one V byte). The brightness information for the entire image comes first (320 * 240 bytes in my case), followed by the color information with the U and V bytes interleaved:

    image

    The total image therefore occupies (320 * 240 * 3/2) bytes. This format has a drawback: not all programs can work with it. The freeware ContaCam is guaranteed to handle it, and Skype also worked fine.
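    Since only brightness is actually transmitted for a black-and-white picture, the chroma plane can simply be filled with the neutral value 128. A minimal sketch of how such an NV12 buffer can be assembled (the helper and its names are illustrative, not taken from the project):

    #include <stdint.h>
    #include <string.h>

    #define FRAME_W  320
    #define FRAME_H  240
    #define NV12_FRAME_SIZE  (FRAME_W * FRAME_H * 3 / 2)   /* 115200 bytes */

    /* Hypothetical helper: pack a grayscale image into an NV12 frame buffer. */
    static void gray_to_nv12(const uint8_t *gray, uint8_t *nv12)
    {
        /* Y plane: one brightness byte per pixel, 320*240 bytes. */
        memcpy(nv12, gray, FRAME_W * FRAME_H);

        /* Interleaved UV plane: one U and one V byte per 2x2 pixel block.
           128 is the neutral chroma value, so the picture stays gray. */
        memset(nv12 + FRAME_W * FRAME_H, 128, FRAME_W * FRAME_H / 2);
    }
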
    To upload test images to the controller, I wrote a small converter that produces .h files with the encoded image data. Besides NV12, the converter can also encode images in the YUY2 format.
    A detailed description of how to compose the descriptors and transmit the data stream for uncompressed images can be found in a separate document: "Universal Serial Bus Device Class Definition for Video Devices: Uncompressed Payload".

    As the base project I took my USB microphone project. It also transferred data to the computer through an isochronous endpoint. USB support is implemented with the library from the controller manufacturer (STSW-STM32046). After replacing the descriptors and the VID/PID (as far as I understand, almost any values can be used), the controller was detected as an imaging device. The next stage is transferring a stream of video data to the computer (to begin with, a test image stored in the controller's memory).

    First, it is worth mentioning the various USB requests that need to be handled. For certain types of requests received from the computer (the host), the USB library calls the usbd_video_Setup function, which has to process them.
    Most of this function is taken from the microphone code: it handles the Standard Device Requests. Worth noting here is the switching between alternate interfaces that happens when a SET_INTERFACE request is received. A UVC device must provide at least two alternate settings of the streaming interface. One of them (Zero Bandwidth, with number 0) is what the host switches the device to when it is not in use, thereby freeing bandwidth on the bus. When a program on the computer is ready to receive data from the device, the host sends a request to switch to the other alternate setting, after which the device starts receiving IN Token Packets from the host, signaling that the host is waiting for data.
    There is another type of request, Class Device Requests, specific to the UVC class. They are used to query the camera's state and to control its operation. Even in the simplest implementation, when no camera parameters can be changed, the program has to handle the requests GET_CUR, GET_DEF, GET_MIN, GET_MAX and SET_CUR. All of them are sent by the computer before the camera is switched on. According to the UVC specification, the computer asks the camera about the modes it can operate in and then tells it which mode to use; there are two kinds of such negotiation requests, Probe and Commit. In my case this data is not used in any way, but if a request is not handled (the sent data is not read, or no response is given), the program on the computer will "freeze" and the controller will have to be restarted.
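    For reference, here is the layout of the Probe/Commit control from the UVC 1.0 specification and a sketch of how these requests can be answered in the simplest case (the handler fragment uses the ST library's control-transfer helpers; it is a simplified illustration, not the exact project code):

    #include <stdint.h>

    /* UVC class request codes */
    #define UVC_SET_CUR  0x01
    #define UVC_GET_CUR  0x81
    #define UVC_GET_MIN  0x82
    #define UVC_GET_MAX  0x83
    #define UVC_GET_DEF  0x87

    /* VideoStreaming Probe/Commit control, UVC 1.0 (26 bytes, packed). */
    #pragma pack(push, 1)
    typedef struct {
        uint16_t bmHint;
        uint8_t  bFormatIndex;
        uint8_t  bFrameIndex;
        uint32_t dwFrameInterval;           /* frame interval in 100 ns units */
        uint16_t wKeyFrameRate;
        uint16_t wPFrameRate;
        uint16_t wCompQuality;
        uint16_t wCompWindowSize;
        uint16_t wDelay;
        uint32_t dwMaxVideoFrameSize;
        uint32_t dwMaxPayloadTransferSize;
    } VideoProbeCommit;
    #pragma pack(pop)

    static VideoProbeCommit probe_commit;   /* describes the single supported mode */

    /* Inside the class-request branch of usbd_video_Setup() (simplified):
     *
     *   case UVC_GET_CUR: case UVC_GET_DEF: case UVC_GET_MIN: case UVC_GET_MAX:
     *       USBD_CtlSendData(pdev, (uint8_t *)&probe_commit, req->wLength);
     *       break;
     *   case UVC_SET_CUR:
     *       USBD_CtlPrepareRx(pdev, (uint8_t *)&probe_commit, req->wLength);
     *       break;
     */
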

    While creating the project I discovered that the USB library sometimes handles data transfers to the host incorrectly: after a small amount of data has been transferred, transmission stops, and only a reboot of the computer restores it. This applies both to the video data (endpoint 1) and to control information (endpoint 0). The fix is to clear the FIFO of the corresponding endpoint before writing to it.
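    A sketch of the workaround, assuming the library's DCD_EP_Flush() function is used (this is the idea of the fix; the exact call site in the project may differ):

    #include "usb_dcd.h"   /* DCD_EP_Flush() and DCD_EP_Tx() from the ST OTG library */

    /* Hypothetical helper: transmit one packet on the video IN endpoint (0x81),
       flushing its TX FIFO first so that a stuck previous transfer cannot block it. */
    static void video_send_packet(USB_OTG_CORE_HANDLE *pdev, uint8_t *buf, uint16_t len)
    {
        DCD_EP_Flush(pdev, 0x81);           /* clear whatever is left in the EP1 IN FIFO */
        DCD_EP_Tx(pdev, 0x81, buf, len);    /* queue the new packet                      */
    }
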

    After all the necessary requests have been handled and the computer has requested switching the alternate interface to the streaming mode, video data transmission can begin. The computer starts issuing IN Token Packets on the bus every millisecond; on receiving one, the controller calls the usbd_video_DataIn function, from which the library transmit function DCD_EP_Tx has to be called.
    Video data is transmitted in packets, and each packet must begin with a 2-byte header (the UVC specification also allows longer headers with additional information). The first byte of the header is always 2 - the total length of the header. The second byte allows the host to detect the start of a frame and frame changes: the least significant bit of this byte (FID) must be toggled in the first packet of every new frame and keep the same value in all subsequent packets of that frame. The remaining bits can be left at zero. The rest of the packet is video data; its length can be arbitrary (up to a certain maximum).
    I deliberately chose the amount of video data per packet so that the image size in bytes is divisible by it without remainder - this way all packets have the same length.
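    Schematically, the packetization inside the DataIn callback can look like this (buffer and state variable names are mine; the real project code differs in details):

    #include <stdint.h>
    #include <string.h>

    #define VIDEO_PACKET_SIZE   (768 + 2)            /* 2-byte header + payload     */
    #define PAYLOAD_SIZE        (VIDEO_PACKET_SIZE - 2)
    #define FRAME_SIZE          (320 * 240 * 3 / 2)  /* NV12 frame = 150 packets    */

    static uint8_t  packet[VIDEO_PACKET_SIZE];
    static uint32_t frame_offset = 0;   /* how much of the frame is already sent       */
    static uint8_t  fid = 0;            /* Frame ID bit, toggled on every new frame    */

    /* Called for every IN token (once per millisecond); 'frame' is the NV12 image. */
    static uint16_t build_video_packet(const uint8_t *frame)
    {
        if (frame_offset == 0)
            fid ^= 1;                   /* a new frame starts: toggle the FID bit      */

        packet[0] = 2;                  /* bHeaderLength: the header is 2 bytes long   */
        packet[1] = fid;                /* bmHeaderInfo: bit 0 = Frame ID              */

        memcpy(&packet[2], frame + frame_offset, PAYLOAD_SIZE);
        frame_offset += PAYLOAD_SIZE;
        if (frame_offset >= FRAME_SIZE)
            frame_offset = 0;           /* frame finished, start over                  */

        return VIDEO_PACKET_SIZE;       /* pass packet[] and this length to DCD_EP_Tx() */
    }
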

    The result looks like this:

    image

    And what about the performance?
    The controller supports USB Full Speed, which gives a theoretical rate of 12 Mbit/s. The best you could therefore count on is a frame transmission time of (320 * 240 * 3/2) / (12 * 10^6 / 8) = 76 ms, which gives 13 FPS. However, USB is a half-duplex protocol, and the microcontroller has its own limitations. The controller transmits USB data through FIFOs; it has 1.25 KB of this memory, and it has to be divided among all endpoints. The allocation is specified in the "usb_conf.h" file, with the sizes given in 32-bit words.

     #define RX_FIFO_FS_SIZE                          64
     #define TX0_FIFO_FS_SIZE                         16
     #define TX1_FIFO_FS_SIZE                         232
     #define TX2_FIFO_FS_SIZE                         0
     #define TX3_FIFO_FS_SIZE                         0
    

    The FIFO that receives data from the computer must be allocated at least 64 words; the FIFO for sending control information to the computer through endpoint 0 needs another 16 words. Everything else can be given to the first endpoint, which carries the video. In total that is (64 + 16 + 232) * 4 = 1248 bytes. Given the limit of 232 words (928 bytes), the packet size (VIDEO_PACKET_SIZE) was set to (768 + 2) bytes. One frame thus consists of (320 * 240 * 3/2) / 768 = 150 packets, which are transmitted in 150 * 1 ms, giving 6.6 FPS.
    The real result coincides with the calculated one:

    image

    Not much, but you cannot get more when transferring an uncompressed image of this size. So I decided to try compressing the image on the microcontroller.

    Transition to MJPEG


    The UVC standard supports several types of compression, one of which is MJPEG. With this type of compression each source frame is compressed as a JPEG image. The resulting compressed frame can be sent to the computer as described above. The descriptor and data transfer specifics for MJPEG are described in the document "Universal Serial Bus Device Class Definition for Video Devices: Motion-JPEG Payload".

    Transferring a static image prepared on a computer turned out to be quite simple: convert a regular JPEG file to an .h file, add it to the project and transfer it in packets as before. Since the size of a compressed image can be arbitrary, the length of the last data packet is also variable, so it has to be calculated.
    With a compressed image size of 30,000 bytes the frame consists of (30000 / 768) => 40 packets, which are transmitted in 40 ms, corresponding to 25 FPS.
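    Only the length of the last packet changes; the calculation is trivial (a sketch with my own names):

    #include <stdint.h>

    #define PAYLOAD_SIZE  768   /* video bytes per packet after the 2-byte header */

    /* Hypothetical helper: length of the next packet for a compressed frame of
       jpeg_size bytes, of which frame_offset bytes have already been sent. */
    static uint16_t next_packet_len(uint32_t jpeg_size, uint32_t frame_offset)
    {
        uint32_t remaining = jpeg_size - frame_offset;
        uint32_t chunk = (remaining > PAYLOAD_SIZE) ? PAYLOAD_SIZE : remaining;
        return (uint16_t)(chunk + 2);   /* only the last packet is shorter than the rest */
    }
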
    For JPEG compression I decided to use the encoder taken from here. It is adapted for ARM and works only with black-and-white images, which suited me, since I was going to take data from a black-and-white camera.
    This encoder started working on the STM32F4 right away; I did not do any adaptation for the Cortex-M4. The test BMP file was compressed in 25 ms, which corresponds to 40 FPS. To read the compressed image out of the controller I used the STM32 ST-LINK Utility: while debugging the program you find out the start address of the array that receives the compressed image and then enter it in the utility. The dump it reads can be saved directly as a .jpg.
    Next, I added support for two output arrays to the encoder for double buffering and combined it with the USB output project.
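    The double buffering is an ordinary ping-pong scheme: while one buffer is being sent over USB, the encoder writes the next frame into the other one. A sketch with hypothetical function names standing in for the real project code:

    #include <stdint.h>

    #define JPEG_BUF_SIZE 32000

    /* Hypothetical functions standing in for the real project code: */
    extern uint32_t encode_jpeg(uint8_t *out);    /* compress the current frame, return its size     */
    extern void     wait_for_frame_sent(void);    /* block until USB has finished the previous frame */

    /* Two output buffers for the encoder (placed in CCM RAM in the project). */
    static uint8_t  jpeg_buf[2][JPEG_BUF_SIZE];
    static uint32_t jpeg_len[2];
    static volatile uint8_t tx_buf = 0;           /* index of the buffer currently sent over USB */

    /* Main loop sketch: encode into the free buffer while the other one is transmitted. */
    void camera_loop(void)
    {
        for (;;) {
            uint8_t enc_buf = tx_buf ^ 1;                        /* the buffer not in use by USB */
            jpeg_len[enc_buf] = encode_jpeg(jpeg_buf[enc_buf]);  /* compress the next frame      */
            wait_for_frame_sent();                               /* let USB finish the old frame */
            tx_buf = enc_buf;                                    /* now transmit the new one     */
        }
    }
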
    Using CCM memory
    In the controller used, the RAM is divided into several blocks. One of them (64 KB) is called CCM, and it cannot be accessed by DMA. I decided to place the two arrays holding the compressed image there.
    To use this memory in IAR, you need to edit the .icf file in use, adding the following lines to it:
    define symbol __ICFEDIT_region_RAMCCM_start__ = 0x10000000;
    define symbol __ICFEDIT_region_RAMCCM_end__   = 0x1000FFFF;
    .......
    define region CCMRAM_region   = mem:[from __ICFEDIT_region_RAMCCM_start__   to __ICFEDIT_region_RAMCCM_end__];
    .......
    place in CCMRAM_region {section .ccmram};
    


    Arrays in the code must be declared like this:
    #pragma location = ".ccmram"
    uint8_t outbytes0[32000];
    #pragma location = ".ccmram"
    uint8_t outbytes1[32000];
    


    The resulting design worked, but only in the ContaCam program and in the browser (checked here). On a static image I managed to get 35 FPS.
    Example of a compressed image (image size 17 Kbytes):

    image

    The image is upside down because that is how the pixel data is stored in BMP files.

    Other programs either did not work at all or produced an image like this:

    image

    This is because the UVC standard does not support transferring black-and-white images via MJPEG.
    The requirements for the JPEG image are:
    • Color encoding - YCbCr
    • Bits per pixel - 8 per color component (before filtering/subsampling)
    • Subsampling - 4:2:2

    So the existing encoder had to be reworked to produce pseudo-color images: in such an image only the brightness data (Y) is actually encoded, and zeros are transmitted instead of the color data (Cb and Cr). This required a deeper look at the structure of the JPEG format.

    Transition from black and white to pseudo color


    How the encoder worked before:
    1. A JPEG file header is generated.
    2. Block-by-block (8x8 pixels) processing of the original image:
    2.1 Each block is read from memory and its discrete cosine transform (DCT) is computed.
    2.2 The resulting 64 values are quantized and the result is packed using Huffman codes.
    3. A data end marker is generated and the size of the compressed image is calculated.
    You can read more about JPEG here and here.
    The color information of the compressed image is described in the JPEG header, so the header has to be changed. The SOF0 and SOS sections must specify three components: for the luminance component the sampling factor is 2x2, for the color components 1x1. As the quantization table identifier I specified 0 everywhere (the component fields are sketched after the figure below).
    Now the encoding procedure itself can be changed. Since the color information is subsampled, four luminance blocks must correspond to two color blocks. So first four blocks of brightness information are encoded in sequence, after which two more blocks of color information have to be encoded (an example from the article mentioned above):

    image
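    The component fields of the modified header can be written out as plain byte tables; this is standard JPEG layout rather than the project's exact code (reusing Huffman table 0 for the chroma components is my assumption, since the encoder has only one set of tables):

    #include <stdint.h>

    /* Component part of the SOF0 segment: three components, 2x2 sampling for Y,
       1x1 for Cb and Cr, quantization table 0 for all of them. */
    static const uint8_t sof0_components[] = {
        0x01, 0x22, 0x00,   /* Y : id=1, H=2 V=2, quantization table 0 */
        0x02, 0x11, 0x00,   /* Cb: id=2, H=1 V=1, quantization table 0 */
        0x03, 0x11, 0x00,   /* Cr: id=3, H=1 V=1, quantization table 0 */
    };

    /* Component part of the SOS segment: Huffman table selectors.  Table 0 is
       reused for the chroma components here (an assumption; a typical color
       JPEG would reference table 1 for chroma). */
    static const uint8_t sos_components[] = {
        0x01, 0x00,         /* Y : DC table 0, AC table 0 */
        0x02, 0x00,         /* Cb */
        0x03, 0x00,         /* Cr */
    };
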

    In the library used, quantization, the final compression of the processed data and writing it to memory are performed by a separate function, so to form the color information it is enough to clear the array of DCT coefficients and call this function twice.
    However, JPEG coding has an important peculiarity: it is not the DC coefficient at the beginning of each block that is encoded, but the difference between the current DC coefficient and the DC coefficient of the previous block of the same component. In the library this difference was originally calculated before quantization, so the function had to be modified so that the difference is not calculated while the Cb and Cr channels are being processed - these components carry nothing but zeros anyway.
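    Putting it together, the encoding of one MCU conceptually becomes the following (the function names are hypothetical stand-ins for the encoder's internal routines):

    #include <stdint.h>
    #include <string.h>

    /* Hypothetical stand-ins for the encoder's internal routines: */
    extern void dct_block(const uint8_t *pixels, int16_t *coef);        /* 8x8 DCT              */
    extern void quantize_and_huffman(int16_t *coef, int16_t *dc_pred);  /* quantize + pack block */

    static int16_t coef[64];
    static int16_t dc_pred_y;             /* DC predictor for the Y component */

    /* Encode one MCU: a 16x16 pixel area = four 8x8 luminance blocks followed by
       one Cb and one Cr block filled with zeros. */
    static void encode_mcu(const uint8_t *y_blocks[4])
    {
        for (int i = 0; i < 4; i++) {
            dct_block(y_blocks[i], coef);
            quantize_and_huffman(coef, &dc_pred_y);   /* normal DC difference coding */
        }

        /* Chroma: all coefficients are zero, and the DC difference must also stay
           zero, so the DC predictor is skipped entirely (NULL) for Cb and Cr. */
        memset(coef, 0, sizeof(coef));
        quantize_and_huffman(coef, NULL);             /* Cb */
        memset(coef, 0, sizeof(coef));
        quantize_and_huffman(coef, NULL);             /* Cr */
    }
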
    As a result, the picture started to display correctly in all the video capture programs I tried. The drawback of this pseudo-color coding is a somewhat lower speed: compressing the test image now takes 35 ms, which gives 28 FPS.

    Analog video capture


    Now that there is a way to transfer video data to the computer at an acceptable rate, it is time to deal with the video signal itself. From the very beginning of the USB experiments I intended to capture the signal of an analog video camera using nothing but the discovery board.
    Since I had previously built a home-made TV on a microcontroller, the method of capturing a black-and-white video signal was not new to me. Of course, the STM32F4 is very different from the ATxmega, so the approach to video capture had to be changed.

    The PAL format has been described many times on various resources, so I will only go over its main points, and only for the black-and-white variant.
    The frame rate of this format is 25 Hz, but it uses interlaced scanning: when a frame is transmitted, first the even and then the odd lines are sent. Each such set of lines is called a field. Fields arrive at a rate of 50 Hz (every 20 ms). One field carries 312.5 lines, of which only 288.5 contain video information. The lines are separated by horizontal sync pulses that follow with a period of 64 μs; the video signal within a line occupies 52 μs.
    Fields are separated by vertical sync and equalizing pulses. An important property of the equalizing pulses is that their period is half the line period, 32 μs, so they can easily be distinguished from the horizontal sync pulses.

    image

    Thus, to capture an image into the controller's memory, you need a program that can detect the sync pulses, pick out the equalizing pulses among them, and start the ADC conversion before the active video of each line begins.

    Now let us look more closely at how the video signal is digitized.
    The STM32F4 contains three separate ADCs, each capable of 2.4 MSPS at 12-bit resolution. The speed increases as the resolution is reduced, but even that is not enough to get a 320 * 240 image. However, the controller allows several ADCs to be combined: they can either sample simultaneously, or sample with a configurable delay between them, which increases the overall conversion rate.

    What conversion rate can be achieved using two ADCs at once (interleaved dual mode)?
    The ADCs are clocked from the APB2 bus, whose frequency is set to half the system clock (168 MHz / 2) = 84 MHz when the controller is initialized. That is too much for the ADC, so its prescaler has to be set to 2. The resulting 42 MHz is still above the datasheet maximum (36 MHz), but my ADCs work fine even at this frequency.
    If each ADC worked separately at 8-bit resolution, the maximum conversion rate would be 42 MHz / (3 + 8) = 3.81 MSPS. By setting the delay between the sampling moments of the two ADCs to 6 cycles you can get 7 MSPS, and with 7 cycles - 6 MSPS.
    I chose the latter option. At this rate an entire line (64 μs) takes 384 bytes, and the active part of the line with the video signal (52 μs) takes 312 bytes (pixels).
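    A sketch of the corresponding common ADC configuration using the ST Standard Peripheral Library (this is my reading of the settings described above; clock and GPIO setup are omitted, and the exact constants should be checked against the project):

    #include "stm32f4xx_adc.h"

    /* ADC1 + ADC2 in dual interleaved mode: ADC clock = APB2/2 = 42 MHz, 8-bit
       resolution, DMA mode 3 (the two 8-bit results are packed into one half-word),
       7-cycle delay between the two ADCs, which gives about 6 MSPS overall. */
    static void adc_dual_interleaved_init(void)
    {
        ADC_CommonInitTypeDef common;
        ADC_InitTypeDef adc;

        common.ADC_Mode             = ADC_DualMode_Interl;
        common.ADC_Prescaler        = ADC_Prescaler_Div2;
        common.ADC_DMAAccessMode    = ADC_DMAAccessMode_3;
        common.ADC_TwoSamplingDelay = ADC_TwoSamplingDelay_7Cycles;
        ADC_CommonInit(&common);

        ADC_StructInit(&adc);
        adc.ADC_Resolution         = ADC_Resolution_8b;
        adc.ADC_ContinuousConvMode = ENABLE;
        adc.ADC_NbrOfConversion    = 1;
        ADC_Init(ADC1, &adc);
        ADC_Init(ADC2, &adc);

        /* Both ADCs sample the same input pin, PC2 = ADC123_IN12. */
        ADC_RegularChannelConfig(ADC1, ADC_Channel_12, 1, ADC_SampleTime_3Cycles);
        ADC_RegularChannelConfig(ADC2, ADC_Channel_12, 1, ADC_SampleTime_3Cycles);
    }
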
    The ADCs transfer their conversion results to memory via DMA. When two 8-bit ADCs are used, the data is written to memory as 16-bit words at the moment the second ADC finishes its conversion. In principle it would be possible to capture almost the whole frame into memory - that would take (384 * 240) = 92.16 KB. But I took a different path: data capture starts after the controller detects a sync pulse and stops after 366 bytes (183 DMA transfers) have been captured. I will explain later why this particular number was chosen. As a result, the video data occupies (366 * 240) = 87.84 KB of RAM.

    Now consider how the sync pulses are detected. Ideally this should be done with a dedicated chip, or at least a comparator, but that complicates the design. Since I still had one unused ADC, I decided to use it to detect the sync pulses. Each ADC includes a special module, the Analog watchdog, which can generate an interrupt when the digitized value goes outside the configured limits. However, this module cannot react to an edge of the digitized signal: it keeps generating interrupts until the input signal or the module settings change. Since I needed to detect signal edges, the module has to be reconfigured on every interrupt. I did not implement automatic detection of the Analog watchdog thresholds, so they are set manually in the code.
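    A sketch of how edge detection can be emulated with the watchdog: after each interrupt the threshold window is flipped, so the next interrupt fires on the opposite crossing (the threshold values here are placeholders, not the ones used in the project):

    #include "stm32f4xx_adc.h"

    /* Placeholder thresholds for the 12-bit watchdog comparison (illustrative values). */
    #define SYNC_LOW_LEVEL   300    /* signal below this -> a sync pulse has started */
    #define SYNC_HIGH_LEVEL  600    /* signal above this -> the sync pulse has ended */

    static volatile uint8_t inside_sync = 0;

    /* ADC1/2/3 share one interrupt; here only the ADC3 Analog watchdog is handled. */
    void ADC_IRQHandler(void)
    {
        if (ADC_GetITStatus(ADC3, ADC_IT_AWD) != RESET) {
            ADC_ClearITPendingBit(ADC3, ADC_IT_AWD);

            if (!inside_sync) {
                /* The signal fell below SYNC_LOW_LEVEL: a sync pulse has started.
                   Re-arm the watchdog to fire when it rises above SYNC_HIGH_LEVEL. */
                ADC_AnalogWatchdogThresholdsConfig(ADC3, SYNC_HIGH_LEVEL, 0);
                inside_sync = 1;
                /* ...this is where the timer is read and the pulse gets classified. */
            } else {
                /* The sync pulse has ended: re-arm for the next falling edge. */
                ADC_AnalogWatchdogThresholdsConfig(ADC3, 0x0FFF, SYNC_LOW_LEVEL);
                inside_sync = 0;
            }
        }
    }
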

    To detect the equalizing pulses, one of the controller's timers is used, running at 1 MHz. The timer runs continuously, and in the Analog watchdog interrupt handler (when the edge of a sync pulse is detected) its current value is read and compared with the previous one. This makes it possible to distinguish horizontal sync pulses from equalizing ones. After the equalizing pulses end, the controller skips 17 horizontal sync pulses, and then, whenever the edge of a sync pulse is detected, it starts capturing the video data of the current line. Because the interrupt entry latency of this controller can vary, and because ADC3 runs slower than the first two working together, the time between the sync edge and the start of capture varies, which leads to "jitter" of the lines. That is why video capture starts right at the edge of the sync pulse and a line occupies 366 bytes: part of the sync pulse ends up in the captured data and can later be removed in software for each line.
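    The pulse classification by timer period then looks roughly like this (a sketch; the timer choice, constants and function names are mine):

    #include <stdint.h>
    #include "stm32f4xx_tim.h"

    #define EQUALIZING_MAX_US  48   /* pulses arriving faster than ~48 us apart are equalizing  */
    #define SKIP_LINES         17   /* horizontal sync pulses skipped after the vertical interval */

    extern void start_line_capture(void);   /* hypothetical: arm DMA for 183 transfers of this line */

    static uint16_t prev_sync_time;
    static uint16_t lines_to_skip;
    static uint8_t  in_vertical_blanking;

    /* Called from the Analog watchdog interrupt at the start of every sync pulse;
       TIM2 is assumed to run freely at 1 MHz. */
    static void on_sync_pulse(void)
    {
        uint16_t now    = (uint16_t)TIM_GetCounter(TIM2);
        uint16_t period = (uint16_t)(now - prev_sync_time);   /* in microseconds */
        prev_sync_time  = now;

        if (period < EQUALIZING_MAX_US) {
            /* Equalizing pulse: the vertical blanking interval is in progress. */
            in_vertical_blanking = 1;
            lines_to_skip = SKIP_LINES;
            return;
        }

        /* Normal horizontal sync pulse (64 us period). */
        if (in_vertical_blanking) {
            if (lines_to_skip == 0) {
                in_vertical_blanking = 0;
                start_line_capture();        /* first captured line of the field      */
            } else {
                lines_to_skip--;
            }
        } else {
            start_line_capture();            /* capture every line until 240 are stored */
        }
    }
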
    The oscillogram shows how the video signal is captured (the "yellow" channel is set to 1 while the DMA is running):

    image

    Capture starts only after the video data appears:

    image

    Not all the lines of a field are captured, since the limit is set to 240 lines.

    image

    The result is this unprocessed image (obtained using ST-Link Utility):

    image

    After the image has been captured into the controller's memory it has to be processed: for each line, the offset caused by capturing the sync pulse must be removed, and the black level must be subtracted from the pixel brightness values. I did not try to optimize this piece of code, so it takes 5 ms.
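    A sketch of this post-processing step (the offset search and the black level value are illustrative, not the project's actual numbers):

    #include <stdint.h>

    #define LINE_BYTES      366   /* raw samples captured per line                   */
    #define FRAME_W         320
    #define FRAME_H         240
    #define BLACK_LEVEL      40   /* illustrative black level in ADC units           */
    #define SYNC_THRESHOLD   60   /* illustrative level separating sync from video   */
    #define PORCH_SKIP       20   /* illustrative gap from the sync end to the video */

    /* Turn the raw capture (366 bytes per line) into a clean 320x240 image. */
    static void process_frame(const uint8_t raw[FRAME_H][LINE_BYTES],
                              uint8_t image[FRAME_H][FRAME_W])
    {
        for (int y = 0; y < FRAME_H; y++) {
            /* Find where the sync pulse ends in this particular line; doing this
               per line compensates for the jitter of the capture start. */
            int start = 0;
            while (start < 40 && raw[y][start] < SYNC_THRESHOLD)
                start++;
            start += PORCH_SKIP;
            if (start > LINE_BYTES - FRAME_W)
                start = LINE_BYTES - FRAME_W;          /* stay inside the line buffer */

            for (int x = 0; x < FRAME_W; x++) {
                int v = raw[y][start + x] - BLACK_LEVEL;   /* subtract the black level */
                image[y][x] = (v < 0) ? 0 : (uint8_t)v;
            }
        }
    }
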

    Once the image has been processed it can be encoded into JPEG, while the already encoded data of the previous image is being transferred over USB.
    Thus a frame is captured in 20 ms, processing takes 5 ms, and encoding together with data transfer takes 35 ms, giving 60 ms in total, or a frame rate of 16.6 FPS. In practice this means that one frame (actually a field) is captured and two are skipped. Since PAL scanning is interlaced, even and odd fields are captured alternately, which makes the image jitter by one pixel. This can be avoided by adding an extra delay between frame captures - then one more field is skipped, and the output frame rate drops to (50 / 4) = 12.5 FPS.

    A bit about the video source


    Initially I planned to use a KPC-190S surveillance camera as the signal source (this camera is almost 15 years old). Its image quality is not great: the picture is fairly noisy, the contrast is not very high, and the signal amplitude is small (the oscillogram shows it is close to 1 V). To allow a slight adjustment of the signal level, the camera is connected to the controller through a resistive divider built on a potentiometer. The camera is connected to a single signal pin of the board, PC2 (all three ADCs are connected to it).
    Appearance of the setup:



    Since this camera did not provide good image quality, I decided to try taking the signal from a Canon A710 camera. It has an analog output for connecting to a TV, which shows everything that is displayed on the camera's screen. The signal quality is better, but the video signal is color. To remove the color component from the signal I used this filter:

    image

    In addition, the sync pulses at the camera output have negative polarity, so for the controller's ADC to detect them I had to add a bias voltage to the signal using an adjustable power supply. I also had to adjust the Analog watchdog thresholds slightly.
    Appearance of the setup with the camera connected:



    An example of the image received from the camera:

    image

    Video of the device in operation:



    Project source codes: github.com/iliasam/STM32F4_UVC_Camera
