Driving the WS2812B protocol on STM32 without busy loops or interrupts. And how to make a proper rainbow



    There are already a couple of articles on Habr about working with WS2812B RGB LEDs, but for some reason they all generate the bit sequence in a rather archaic way: precise time intervals are produced with empty software loops. Perhaps that is the price of using an Arduino, but we switched long ago to the ARM Cortex-M4 in the form of the STM32 and can afford something more elegant.

    So, let me recall the WS2812B “protocol”.



    A WS2812B LED strip has a single digital input, DIN, connected to the first LED on the strip. A special pulse sequence that encodes the bits is fed to it, as shown in the figure. Each LED has one digital output, DOUT, connected to the DIN input of the next LED in the strip. Each LED must be sent 24 bits (8 bits per color: red R, green G and blue B). Thus, to light all the LEDs you have to transmit 24 * N bits, where N is the number of LEDs in the strip.

    Once the bits are received, the LEDs light up and hold their color until they receive a new bit sequence. Each bit sequence starts by holding DIN at logic zero for at least 50 μs.

    As you can see, the bits are encoded with fairly short pulses with tight tolerances. A microcontroller that shapes them with software delays has to, at the very least, disable all interrupts so that a reset or a bad bit is not generated by accident. CPU time is wasted as well: with a bit period of 1.25 μs, lighting 100 LEDs keeps the processor busy for 100 * 24 * 1.25 μs = 3 ms. If you update the LEDs at 100 Hz, such a “protocol” eats 30% of the processor time.

    There are suggestions to use the SPI interface to transmit the bitstream to the WS2812B. The obstacle there may be that the SPI clock, which is derived from the system bus clock, cannot be matched closely enough to the required bit timing, which leads to large errors in the pulse durations.
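    The arithmetic above can be checked with a few lines of C. This is only an illustrative sketch: the function names are made up for this example, and the 1.25 μs bit period is the value used later in the timer setup.

```c
#include <stdint.h>

#define WS_BIT_PERIOD_NS  1250u   /* one bit slot: 1.25 us */
#define WS_BITS_PER_LED   24u     /* 8 bits per color, 3 colors */

/* Time needed to clock out one frame for n_leds LEDs, in microseconds
   (the reset pause is not included). */
static uint32_t ws_frame_time_us(uint32_t n_leds)
{
    return (n_leds * WS_BITS_PER_LED * WS_BIT_PERIOD_NS) / 1000u;
}

/* Share of CPU time, in percent, spent bit-banging at a given refresh rate. */
static uint32_t ws_cpu_load_percent(uint32_t n_leds, uint32_t refresh_hz)
{
    /* busy microseconds per second, divided by 10000, gives percent */
    return (ws_frame_time_us(n_leds) * refresh_hz) / 10000u;
}
```

    For 100 LEDs refreshed at 100 Hz these helpers give 3000 μs per frame and a 30% load, matching the figures in the text.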

    Meanwhile, the STM32, like Cortex-M chips in general, has an excellent direct memory access (DMA) mechanism. The bits can be generated by a timer in pulse-width modulation mode, with the pulse width for each successive bit fetched from RAM by DMA.

    The figure below shows how DMA and timer TIM4 interact in the STM32F407VET6 chip. Debugging was done on my industrial controller, which uses exactly this chip, but everything can be repeated just as well on any chip of the STM32 family. In my case pin 8 of GPIOB happened to be free, so that is the one I used.
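    To make the PWM idea concrete, here is a rough sketch of how pulse widths translate into timer compare values. The 72 MHz timer clock and the 1.25 μs period come from this article; the high times of 0.4 μs for a “0” and 0.8 μs for a “1” are assumptions taken from typical WS2812B datasheet figures, and `ns_to_ticks` is a name invented for this example.

```c
#include <stdint.h>

#define TIMER_CLK_HZ   72000000u  /* timer clock used in the article */
#define BIT_PERIOD_NS  1250u      /* 1.25 us per bit */
#define T0H_NS         400u       /* assumed high time of a "0" bit */
#define T1H_NS         800u       /* assumed high time of a "1" bit */

/* Convert a duration in nanoseconds to timer ticks, rounded to nearest. */
static uint16_t ns_to_ticks(uint32_t ns)
{
    uint64_t ticks = ((uint64_t)ns * TIMER_CLK_HZ + 500000000u) / 1000000000u;
    return (uint16_t)ticks;
}
```

    Under these assumptions the bit period is 90 ticks (hence an auto-reload value of 90 - 1), a “0” bit needs a compare value of about 29 and a “1” bit about 58. These compare values are what the DMA would copy into the timer, one per bit.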



    The initialization code for the timer and the DMA controller follows:
    #define     BIT(n) (1u << (n))
    #define     LSHIFT(v,n) (((unsigned int)(v)) << (n))
    #define LEDS_NUM    80
    #define COLRS       3
    INT16U DMA_buf[LEDS_NUM+2][COLRS][8];
    /*------------------------------------------------------------------------------
      Timer4 generates the pulses for the LED strip
      The timer is clocked from PCLK1 at 72 MHz
      Timer channel 3 is used in Compare mode, with the CCR3 register loaded by DMA to form the bit signals
     ------------------------------------------------------------------------------*/
    void Timer4_init(void)
    {
      TIM_TypeDef *tim = TIM4;
      RCC_TypeDef *rcc = RCC;
      rcc->APB1RSTR |= BIT(2);    // Reset timer 4
      rcc->APB1RSTR &= ~BIT(2);   
      rcc->APB1ENR |= BIT(2);     // Enable the timer 4 clock
      tim->CR1 = BIT(7);          //  1: TIMx_ARR register is buffered.
      tim->CR2 = 0;               
      tim->PSC = 0;               // Prescaler passes the full 72 MHz through
      tim->ARR = 90 - 1;          // Timer reloads every 1.25 us
      tim->CCMR2 = 0
                   + LSHIFT(6, 4) // OC3M: Output compare 3 mode | 110: PWM mode 1 - In upcounting, the channel is active as long as TIMx_CNT < TIMx_CCR3, else inactive
      ;
      tim->CNT = 0;
      tim->CCR3 = 0;
      tim->DIER = BIT(11);        // Bit 11 CC3DE: Capture/Compare 3 DMA request enable. Enable DMA requests
      tim->CR1 |= BIT(0);         // Start the timer
      tim->CCER = BIT(8);         // Enable the output so that DMA requests are generated
    }
    /*------------------------------------------------------------------------------
      Initialization of channel 2 of DMA1 Stream 7
      Used to transfer the bit templates of the WS2812B LED strip control stream to timer TMR4 running in PWM generation mode
     ------------------------------------------------------------------------------*/
    void DMA1_Stream7_Mem_to_TMR4_init(void)
    {
      DMA_Stream_TypeDef *dma_ch = DMA1_Stream7;
      RCC_TypeDef *rcc = RCC;
      rcc->AHB1ENR |= BIT(21);               // Enable the DMA1 clock
      dma_ch->CR = 0;    // Disable the stream
      dma_ch->PAR = (unsigned int)&(TIM4->CCR3);  // Peripheral address: the TIM4 CCR3 register
      dma_ch->M0AR = (unsigned long)&DMA_buf;
      dma_ch->NDTR = (LEDS_NUM + 2) * COLRS * 8;
      dma_ch->CR =
                   LSHIFT(2, 25) + // CHSEL[2:0]: Channel selection |    010: channel 2 selected
                   LSHIFT(0, 23) + // MBURST: Memory burst transfer configuration | 00: single transfer
                   LSHIFT(0, 21) + // PBURST[1:0]: Peripheral burst transfer configuration | 00: single transfer
                   LSHIFT(0, 19) + // CT: Current target (only in double buffer mode) | 0: The current target memory is Memory 0 (addressed by the DMA_SxM0AR pointer)
                   LSHIFT(0, 18) + // DBM: Double buffer mode | 0: No buffer switching at the end of transfer
                   LSHIFT(3, 16) + // PL[1:0]: Priority level | 11: Very high.  PL[1:0]: Priority level
                   LSHIFT(0, 15) + // PINCOS: Peripheral increment offset size | 0: The offset size for the peripheral address calculation is linked to the PSIZE
                   LSHIFT(1, 13) + // MSIZE[1:0]: Memory data size | 01: half-word (16-bit)
                   LSHIFT(1, 11) + // PSIZE[1:0]: Peripheral data size | 01: half-word (16-bit)
                   LSHIFT(1, 10) + // MINC: Memory increment mode | 1: Memory address pointer is incremented after each data transfer (increment is done according to MSIZE)
                   LSHIFT(0, 9) +  // PINC: Peripheral increment mode | 0: Peripheral address pointer is fixed
                   LSHIFT(1, 8) +  // CIRC: Circular mode | 1: Circular mode enabled
                   LSHIFT(1, 6) +  // DIR[1:0]: Data transfer direction | 01: Memory-to-peripheral
                   LSHIFT(0, 5) +  // PFCTRL: Peripheral flow controller | 0: The DMA is the flow controller
                   LSHIFT(1, 4) +  // TCIE: Transfer complete interrupt enable | 1: TC interrupt enabled
                   LSHIFT(0, 3) +  // HTIE: Half transfer interrupt enable | 0: HT interrupt disabled
                   LSHIFT(0, 2) +  // TEIE: Transfer error interrupt enable | 0 : TE interrupt disabled
                   LSHIFT(0, 1) +  // DMEIE: Direct mode error interrupt enable | 0: Direct mode error interrupt disabled
                   LSHIFT(0, 0) +  // EN: Stream enable | 0: Stream disabled (it is enabled at the end of this function)
                   0;
      dma_ch->FCR =
                    LSHIFT(0, 7) + // FEIE: FIFO error interrupt enable
                    LSHIFT(1, 2) + // DMDIS: Direct mode disable | 1: Direct mode disabled. Set so that bytes can be forwarded into a two-byte register
                    LSHIFT(1, 0) + // FTH[1:0]: FIFO threshold selection | 01: 1/2 full FIFO
                    0;
      dma_ch->CR |= BIT(0); //  1: Stream enabled
    }
    




    After this initialization, the bit stream is transferred automatically from the DMA_buf array in RAM to the external output on pin 8 of GPIOB. The 50 μs reset pause is also generated automatically: the buffer holds two LED slots more than the strip has LEDs, which gives 48 bit periods of 1.25 μs, or 60 μs, during which zero compare values can keep the output low. The processor takes no part in the transfer at all; not even interrupts are used. To light any LED, you just write the corresponding words into the DMA_buf array at the corresponding offset. In the project this is done by the LEDstrip_set_led_state function.

    This is not to say that the mechanism does not affect the processor at all. Its operation is slowed down slightly, because it shares access to RAM and the system bus with the DMA. But measurements showed that in this case the slowdown does not exceed 0.2%.

    The project was written in the MDK-ARM Professional development environment, version 4.72.1.0. The processor clock is 144 MHz and the PCLK1 frequency is 72 MHz. The project is easily ported to boards of the STM32 MCU Discovery Kits series. The whole project is posted here.

    The project uses neither the ST libraries nor any other third-party libraries. It is very compact and everything is written through direct register access, which makes the code shorter and clearer and makes it easier to move to other development environments.
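    As an illustration of what writing into the buffer might look like, here is a minimal sketch. It is not the project's LEDstrip_set_led_state: the helper name `led_set_rgb`, the compare values 29 and 58, the GRB byte order with the most significant bit first, and placing the two spare reset slots at the end of the buffer are all assumptions made for this example.

```c
#include <stdint.h>

#define LEDS_NUM  80
#define COLRS     3
#define CCR_BIT0  29u  /* assumed compare value for a "0" bit (~0.4 us high) */
#define CCR_BIT1  58u  /* assumed compare value for a "1" bit (~0.8 us high) */

/* Same layout as DMA_buf in the article; uint16_t stands in for INT16U.
   The two extra slots are never written, so they stay zero
   and keep the line low for the reset pause. */
static uint16_t dma_buf[LEDS_NUM + 2][COLRS][8];

/* Hypothetical helper: encode one LED's color as PWM compare values.
   Assumes the strip expects green, red, blue, each MSB first. */
static void led_set_rgb(unsigned led, uint8_t r, uint8_t g, uint8_t b)
{
    const uint8_t bytes[COLRS] = { g, r, b };

    if (led >= LEDS_NUM)
        return;  /* never touch the reset slots */

    for (unsigned c = 0; c < COLRS; c++)
        for (unsigned bit = 0; bit < 8; bit++)
            dma_buf[led][c][bit] =
                (bytes[c] & (0x80u >> bit)) ? CCR_BIT1 : CCR_BIT0;
}
```

    Because the DMA runs in circular mode, a call such as `led_set_rgb(0, 255, 0, 0)` would simply show up on the strip during the next pass over the buffer, with no further action from the processor.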

    And about the rainbow


    The point is that you cannot draw a beautiful rainbow on a strip of a hundred LEDs simply by linearly incrementing the bytes of a word in RGB format (bit layout: 00000000 RRRRRRRR GGGGGGGG BBBBBBBB). Adjusting the brightness of such a rainbow with simple manipulations of a 32-bit RGB word is harder still. The HSV format is the one to use for this. For example, an entire rainbow is obtained by simply incrementing the H component linearly. HSV is then converted to RGB and sent to the LEDs.
    There are two HSV to RGB converters in the project, one integer and the other using floating-point calculations. Visually I saw no difference between them, and on the STM32 there is, unfortunately, little room for one to stand out anyway.
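    For readers who have not met such a conversion, here is a rough integer-only sketch. It is not the converter from the project. The hue scale of 0..1535 (six sectors of 256 steps) and the function name are choices made for this example, while the sector logic is the common textbook HSV to RGB scheme.

```c
#include <stdint.h>

/* Integer HSV -> RGB sketch.
   h: 0..1535 (six color sectors of 256 steps each), s and v: 0..255. */
static void hsv_to_rgb(uint16_t h, uint8_t s, uint8_t v,
                       uint8_t *r, uint8_t *g, uint8_t *b)
{
    uint32_t sector = (h % 1536u) / 256u;
    uint32_t f = h % 256u;  /* position inside the sector, 0..255 */

    /* Standard intermediate values, scaled to 0..255 */
    uint32_t p = (v * (255u - s)) / 255u;
    uint32_t q = (v * (255u - (s * f) / 255u)) / 255u;
    uint32_t t = (v * (255u - (s * (255u - f)) / 255u)) / 255u;

    switch (sector) {
    case 0:  *r = v;          *g = (uint8_t)t; *b = (uint8_t)p; break;
    case 1:  *r = (uint8_t)q; *g = v;          *b = (uint8_t)p; break;
    case 2:  *r = (uint8_t)p; *g = v;          *b = (uint8_t)t; break;
    case 3:  *r = (uint8_t)p; *g = (uint8_t)q; *b = v;          break;
    case 4:  *r = (uint8_t)t; *g = (uint8_t)p; *b = v;          break;
    default: *r = v;          *g = (uint8_t)p; *b = (uint8_t)q; break;
    }
}
```

    A rainbow across the strip could then be produced by giving LED number i the hue i * 1536 / N with S fixed at 255, and the overall brightness is adjusted through V alone.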
