How to transfer data between microcontrollers at 100 Mbps

    I ran into the following problem: I need to transfer data between two STM32F407 microcontrollers at a rate of at least 100 Mbps. Ethernet (MAC-to-MAC) would be an option, but the trouble is that it is already in use: it is the very interface the data comes from...
    Among the idle peripherals the only candidate is SPI, but it tops out at 42 Mbps.

    Oddly enough, I found nothing ready-made online, so I decided to implement a clocked 8-bit parallel register. After all, the transfer rate can be set to 10 MHz (the clock line itself has to toggle twice as fast, of course, but 20 MHz is nothing difficult), and at such a low frequency there is no need to worry about board layout. With 8 bits per transfer that gives 80 Mbps, which is in the region of the 100 Mbps target.

    No sooner said than done. Overall, the system looks like this. On the transmitting side we use a timer: one of its compare channels is routed to a pin and serves as the clock signal, and a second compare channel triggers one DMA transfer per period.

    My bus runs at 82 MHz (higher frequencies draw too much current :) and the timer runs at the same frequency, so a period of ARR = 8 gives about 10 MHz (strictly, 82 MHz / 9 ticks ≈ 9.1 MHz). The rate will therefore be somewhat under 80 Mbps; well, okay.
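
    The arithmetic behind these numbers can be written down as a small host-side helper. This is only a sketch: the function names are made up for illustration, and it assumes the timer counts up from 0 to ARR, so that one period lasts ARR + 1 ticks and one byte is sent per period.

```c
#include <stdint.h>

/* One timer period lasts ARR + 1 ticks, because the counter runs 0..ARR
 * (assumption: plain upcounting mode, no prescaler). */
double byte_rate_hz(double timer_hz, uint32_t arr)
{
    return timer_hz / ((double)arr + 1.0);
}

/* Raw throughput in Mbps for a parallel bus with bus_bits data lines,
 * one bus word transferred per timer period. */
double throughput_mbps(double timer_hz, uint32_t arr, unsigned bus_bits)
{
    return byte_rate_hz(timer_hz, arr) * bus_bits / 1e6;
}
```

    Under these assumptions, `throughput_mbps(82e6, 8, 8)` comes out at roughly 73 Mbps.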

    The DMA transfers one byte from memory (with auto-increment, of course) directly to the port's output register. In my case PORTE turned out to be suitable: its low 8 bits map exactly onto the address that the DMA writes to.

    On the receiving side, the clock signal clocks a timer on both edges with a period of 1 (ARR = 1), and the timer's update event triggers a DMA transfer that reads the data from the port (PORTE fit here as well) and writes it to memory with auto-increment.
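
    To make the handshake easier to follow, here is a simplified host-side model of it in plain C (not STM32 code). All names in it are hypothetical. It assumes the order of events within one clock period that the register values in the code below suggest: rising edge, then the byte appears on the bus, then the falling edge; the receiver counts both edges and copies the bus on every second one.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of the receiver: an edge counter plus a "DMA" copy. */
typedef struct {
    uint8_t *buf;      /* destination buffer (the DMA memory address) */
    size_t   len;      /* number of bytes expected (the DMA counter) */
    size_t   pos;      /* bytes stored so far */
    unsigned edges;    /* edge counter, wraps after 2 edges (ARR = 1) */
    int      last_clk; /* previous clock level, used to detect edges */
} rx_model;

/* Feed one sample of the clock line and the data bus into the model. */
void rx_sample(rx_model *rx, int clk, uint8_t bus)
{
    if (clk == rx->last_clk)
        return;                    /* no edge, nothing to count */
    rx->last_clk = clk;
    if (++rx->edges < 2)
        return;                    /* first edge of the pair */
    rx->edges = 0;                 /* counter wraps: the "update" event */
    if (rx->pos < rx->len)
        rx->buf[rx->pos++] = bus;  /* the DMA copies the port to memory */
}

/* Model of the sender: raise the clock, place a byte, drop the clock. */
void tx_send(rx_model *rx, const uint8_t *data, size_t len)
{
    uint8_t bus = 0;
    for (size_t i = 0; i < len; i++) {
        rx_sample(rx, 1, bus);     /* rising edge, bus still holds old data */
        bus = data[i];             /* byte written shortly after the edge */
        rx_sample(rx, 0, bus);     /* falling edge: the byte is captured */
    }
}
```

    The point of the model is that the byte is sampled on the falling edge, well after it was written, which is what gives the data lines time to settle.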

    Now all that remains is to configure everything correctly (code below) and run it. The end of the transfer is signalled on both sides by the DMA interrupt.

    For completeness, the code should of course also include checks for transmission timeouts and error handling, but I omit them here.

    In the code below, TIM8 also outputs a signal on channel CC2, just to see what is happening.

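    volatile int transmit_done;
    volatile int receive_done;
    // RX side: DMA2 stream 1 has finished (or reported an error).
    void DMA2_Stream1_IRQHandler(void) {
        TIM8->CR1 &= ~TIM_CR1_CEN;   // stop counting clock edges
        DMA2->LIFCR = 0b1111 << 8;   // clear DMEIF1, TEIF1, HTIF1, TCIF1
        receive_done = 1;
    }
    // TX side: DMA2 stream 4 has finished (or reported an error).
    void DMA2_Stream4_IRQHandler(void) {
        TIM1->CR1 &= ~TIM_CR1_CEN;   // stop the clock generator
        TIM1->EGR |= TIM_EGR_BG;     // software break: clears MOE, disables the clock output
        DMA2->HIFCR = 0b111101;      // clear FEIF4, DMEIF4, TEIF4, HTIF4, TCIF4
        transmit_done = 1;
    }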
    void ii_receive(uint8_t *data, int len) {
        GPIOE->MODER = (GPIOE->MODER & 0xFFFF0000) | 0x0000;
        DMA2_Stream1->PAR = (uint32_t) &(GPIOE->IDR);
        DMA2_Stream1->M0AR = (uint32_t) data;
        DMA2_Stream1->NDTR = len;
        TIM8->CNT = 0;
        TIM8->BDTR |= TIM_BDTR_MOE;
        receive_done = 0;
        DMA2_Stream1->CR |= DMA_SxCR_EN;
        TIM8->CR1 |= TIM_CR1_CEN;
    }
    void ii_transmit(uint8_t *data, int len) {
        GPIOE->MODER = (GPIOE->MODER & 0xFFFF0000) | 0x5555;
        DMA2_Stream4->PAR = (uint32_t) &(GPIOE->ODR);
        DMA2_Stream4->M0AR = (uint32_t) data;
        DMA2_Stream4->NDTR = len;
        TIM1->CNT = 6;
        transmit_done = 0;
        DMA2_Stream4->CR |= DMA_SxCR_EN;
        TIM1->SR = ~TIM_SR_BIF;      // SR flags are cleared by writing 0
        TIM1->BDTR |= TIM_BDTR_MOE;
        TIM1->CR1 |= TIM_CR1_CEN;
    }
    // tx: TIM1 CH4 compare -> DMA2/stream4/channel6, CH1 = clock output on PE9
    // rx: TIM8 update -> DMA2/stream1/channel7, CH1 = clock input on PC6, CH2 = debug output on PC7
    void ii_init() {
        __HAL_RCC_GPIOC_CLK_ENABLE();
        __HAL_RCC_GPIOE_CLK_ENABLE();
        __HAL_RCC_TIM1_CLK_ENABLE();
        __HAL_RCC_TIM8_CLK_ENABLE();
        __HAL_RCC_TIM2_CLK_ENABLE();
        __HAL_RCC_DMA2_CLK_ENABLE();
        GPIOC->MODER |= (0b10 << GPIO_MODER_MODE6_Pos)
                | (0b10 << GPIO_MODER_MODE7_Pos);
        GPIOC->PUPDR |= (0b10 << GPIO_PUPDR_PUPD7_Pos);
        GPIOC->AFR[0] |= (GPIO_AF3_TIM8 << 24) | (GPIO_AF3_TIM8 << 28);
        GPIOE->MODER |= (0b10 << GPIO_MODER_MODE9_Pos);
        GPIOE->OSPEEDR |= GPIO_OSPEEDER_OSPEEDR9 | 0xFFFF;
        GPIOE->AFR[1] |= GPIO_AF1_TIM1 << 4;
        GPIOE->PUPDR |= (0b10 << GPIO_PUPDR_PUPD9_Pos);
        TIM1->ARR = 8;
        TIM1->CCR1 = 5;
        TIM1->CCR4 = 1;
        TIM1->EGR |= TIM_EGR_CC4G;
        TIM1->DIER |= TIM_DIER_CC4DE;
        TIM1->CCMR1 |= (0b110 << TIM_CCMR1_OC1M_Pos);
        TIM1->CCER |= TIM_CCER_CC1E;
        TIM1->EGR |= TIM_EGR_BG;
        TIM8->ARR = 1;
        TIM8->CCR2 = 1;
        TIM8->EGR |= TIM_EGR_UG;
        TIM8->DIER |= TIM_DIER_UDE;
        TIM8->SMCR |= (0b100 << TIM_SMCR_TS_Pos) | (0b111 << TIM_SMCR_SMS_Pos);
        TIM8->CCMR1 = (0b01 << TIM_CCMR1_CC1S_Pos) | (0b110 << TIM_CCMR1_OC2M_Pos);
        TIM8->CCER |= (0b11 << TIM_CCER_CC1P_Pos) | TIM_CCER_CC2E;
        DMA2_Stream1->CR = DMA_CHANNEL_7 | DMA_PRIORITY_VERY_HIGH | DMA_MINC_ENABLE
                | (0b00 << DMA_SxCR_DIR_Pos) | DMA_SxCR_TCIE | DMA_SxCR_TEIE
                | DMA_SxCR_DMEIE;
        DMA2_Stream1->FCR |= DMA_FIFOMODE_ENABLE;
        DMA2_Stream4->CR = DMA_CHANNEL_6 | DMA_PRIORITY_VERY_HIGH | DMA_MINC_ENABLE
                | (0b01 << DMA_SxCR_DIR_Pos) | DMA_SxCR_TCIE | DMA_SxCR_TEIE
                | DMA_SxCR_DMEIE;
        DMA2_Stream4->FCR |= DMA_FIFOMODE_ENABLE;
        HAL_NVIC_SetPriority(DMA2_Stream1_IRQn, 0, 0);
        HAL_NVIC_EnableIRQ(DMA2_Stream1_IRQn);
        HAL_NVIC_SetPriority(DMA2_Stream4_IRQn, 0, 0);
        HAL_NVIC_EnableIRQ(DMA2_Stream4_IRQn);
    }
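
    The interrupt handlers above set the "done" flag on any DMA interrupt, including the error interrupts that the configuration enables. As a minimal sketch of the error handling that was left out, the status bits could be classified before they are cleared. The function name and the enum are hypothetical; the bit positions are those of stream 1 in the DMA low interrupt status register (DMA2->LISR), which mirror the ones cleared through LIFCR.

```c
#include <stdint.h>

/* Stream 1 flag positions in DMA2->LISR (same layout as LIFCR). */
#define S1_FEIF  (1u << 6)   /* FIFO error (not treated as fatal here) */
#define S1_DMEIF (1u << 8)   /* direct mode error */
#define S1_TEIF  (1u << 9)   /* transfer error */
#define S1_HTIF  (1u << 10)  /* half transfer */
#define S1_TCIF  (1u << 11)  /* transfer complete */

typedef enum { XFER_PENDING = 0, XFER_DONE = 1, XFER_ERROR = -1 } xfer_status;

/* Decide what a snapshot of the status register means for the receiver.
 * Errors take priority over the completion flag. */
xfer_status classify_stream1(uint32_t lisr)
{
    if (lisr & (S1_TEIF | S1_DMEIF))
        return XFER_ERROR;
    if (lisr & S1_TCIF)
        return XFER_DONE;
    return XFER_PENDING;
}
```

    In a handler, the value of DMA2->LISR would be read first, passed to such a function, and only then cleared.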
    

    The tests were run on a single board, with the PE9 clock output simply wired to the PC6 input. The main loop looked like this:

     ii_receive(rdata, 256);
     ii_transmit(tdata, 256);
     while (!transmit_done);
     while (!receive_done);
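
    The two `while` loops above spin forever if a clock edge is lost. A bounded wait is one simple way to add the timeout check mentioned earlier. This is a sketch only: `wait_flag` and the spin limit are hypothetical, and a real implementation would more likely compare against a hardware tick counter.

```c
#include <stdbool.h>
#include <stdint.h>

/* Poll a flag set by an interrupt handler, giving up after max_spins
 * extra polls. Returns true if the flag was seen set, false on timeout. */
bool wait_flag(const volatile int *flag, uint32_t max_spins)
{
    for (;;) {
        if (*flag)
            return true;
        if (max_spins-- == 0)
            return false;
    }
}
```

    With it, the loops would become something like `if (!wait_flag(&transmit_done, 100000)) { /* handle timeout */ }`, where the limit is an arbitrary placeholder.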
    

    The result: the 256 bytes were transferred without loss in 30-31 microseconds. The signals look something like this:


    Here, white is the TIM8 timer output, red is the clock signal (TIM1), and orange is the least significant data bit (0-1-0-1-...).

    What I don't like about this approach is that a DMA transfer cannot be triggered directly by an edge on a GPIO input, so you have to go through timers. Perhaps someone can suggest another way?

    P.S. Further experiments showed that raising the clock to 168 MHz, as expected, doubled the speed: the data was transferred in 14 microseconds (about 150 Mbps). However, when the master timer's period was reduced below 7, the receiving side started to glitch because TIM8 could not keep up. At 7 it still works, but at 6 it no longer does, and that would have been 200 Mbps...
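
    The measured times can be converted into effective throughput with one line of arithmetic. The helper below is a hypothetical sketch; it takes the 256-byte test buffer from the main loop as the amount of data transferred.

```c
/* Effective throughput in Mbps: bits transferred per microsecond. */
double effective_mbps(unsigned long bytes, double micros)
{
    return (double)bytes * 8.0 / micros;
}
```

    For 256 bytes, 14 µs works out to about 146 Mbps, and 30-31 µs to roughly 66-68 Mbps.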
