How to implement context switching on STM32

    Good afternoon!

    Threads... Context switching... These are the very foundations of an OS. And of course, when developing libraries and applications, we always rely on the thread implementation being error-free. So it was unexpected to find a gross error in thread switching for STM32 in the Embox RTOS at a time when networking, the file system and many third-party libraries had already been working for a long time. We had even managed to brag about our achievements on Habr.

    I would like to talk about how we implemented thread switching for Cortex-M and tested it on STM32. In addition, I will try to explain how this is done in other OSes, NuttX and FreeRTOS.

    First, a few words about how the problem was discovered. At the time I was building yet another gadget, a robot with various sensors. At some point I wanted to control two stepper motors, each from a separate thread (the threads are absolutely identical). The result: until one motor finished its rotation, the second did not even start.

    I sat down to debug. It turned out that all interrupts were simply disabled in the threads! You may ask how anything could work at all then. It is simple: a lot of code calls sleep(), mutex_lock() and other waiting functions, and thanks to them the threads switched naturally. The problem was obviously related to context switching on the STM32F4, on which I found it.

    Let's analyze the problem in more detail. Thread context switching is triggered, among other things, by the timer, that is, from an interrupt. Schematically, interrupt handling in Embox can be represented as follows:

    void irq_handler(pt_regs_t *regs) {
            ...
        int irq = get_irq_number(regs);
        {
            ipl_enable();
            irq_dispatch(irq);
            ipl_disable();
        }
        irqctrl_eoi(irq);
            ...
        critical_dispatch_pending();
    }
    

    The idea is that the interrupt handler is called first (irq_dispatch), after which interrupt processing “ends”, and the context is switched to another thread inside critical_dispatch_pending if the scheduler requires it. Here it is very important that the processor state in that thread be the same as before it was interrupted, including whether interrupts are enabled or disabled. This state is reflected in the xPSR register (its low bits hold the number of the exception being handled), which the processor itself pushes onto the stack on interrupt entry and pops back on exit. The problem is that, since we have preemptive multitasking, we may enter an interrupt on one thread and want to exit on the stack of another thread, which of course holds no saved xPSR. Moreover, like most OSes, we have synchronization primitives, for example pthread_mutex_lock(), that can cause a context switch outside of an interrupt. We even began to doubt whether preemptive multitasking can be organized on Cortex-M at all, since this architecture is optimized for small tasks. But wait! How do other OSes work, then?

    Interrupt Handling on Cortex-M


    Let's first understand how interrupt handling works on Cortex-M.


    The picture shows the stack frames in two modes, with floating point and without it. When an interrupt occurs, the processor saves the corresponding registers onto the stack and puts into the LR register one of the values listed in the table below. For example, if the interrupt is nested, LR will contain 0xFFFFFFF1.



    Next, the OS interrupt handler is called, at the end of which “bx lr” is usually executed (recall that 0xFFFFFFXX is in LR). After that, automatically saved registers are restored, and program execution continues.
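
    To make the 0xFFFFFFXX values less magical, here is a small sketch of how such a value can be decoded. It relies on the ARMv7-M meaning of the low bits (bit 2 selects the stack, bit 3 the mode, bit 4 the frame type); the type and function names are my own and are not taken from any of the OSes discussed here.

```c
#include <stdbool.h>
#include <stdint.h>

/* Decoded view of an EXC_RETURN value (ARMv7-M, e.g. Cortex-M4). */
struct exc_return_info {
    bool to_thread_mode; /* bit 3: 1 = return to Thread mode, 0 = Handler mode */
    bool uses_psp;       /* bit 2: 1 = restore the frame from the process stack */
    bool basic_frame;    /* bit 4: 1 = 8-word frame, 0 = extended frame with FPU state */
};

/* The values discussed in the text all have the form 0xFFFFFFXX. */
bool is_exc_return(uint32_t lr)
{
    return (lr & 0xFFFFFF00u) == 0xFFFFFF00u;
}

struct exc_return_info decode_exc_return(uint32_t lr)
{
    struct exc_return_info info;

    info.to_thread_mode = (lr & (1u << 3)) != 0;
    info.uses_psp       = (lr & (1u << 2)) != 0;
    info.basic_frame    = (lr & (1u << 4)) != 0;
    return info;
}
```

    For the nested case mentioned above, 0xFFFFFFF1 decodes to "return to Handler mode, main stack, basic frame", while 0xFFFFFFFD means "return to Thread mode using the process stack".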

    Now let's see how context switching occurs in different OSs.

    FreeRTOS


    Let's start with FreeRTOS. To do this, take a look at portable/GCC/ARM_CM4F/port.c. Below is the code of the function xPortSysTickHandler:

    xPortSysTickHandler
    void xPortSysTickHandler( void )
    {
        /* The SysTick runs at the lowest interrupt priority, so when this interrupt
        executes all interrupts must be unmasked.  There is therefore no need to
        save and then restore the interrupt mask value as its value is already
        known. */
        portDISABLE_INTERRUPTS();
        {
            /* Increment the RTOS tick. */
            if( xTaskIncrementTick() != pdFALSE )
            {
                /* A context switch is required.  Context switching is performed in
                the PendSV interrupt.  Pend the PendSV interrupt. */
                portNVIC_INT_CTRL_REG = portNVIC_PENDSVSET_BIT;
            }
        }
        portENABLE_INTERRUPTS();
    }
    


    This is the hardware timer handler. Here we see that if a context switch is needed, a PendSV interrupt is pended. As the documentation says, “PendSV is an interrupt-driven request for system-level service. In an OS environment, use PendSV for context switching when no other exception is active.” The context switch itself happens inside the interrupt handler xPortPendSVHandler:
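
    The "request now, switch later" idea can be modelled on a host machine. The sketch below is only an illustration under my own assumptions: fake_icsr stands in for the real ICSR register, and active_exceptions, current_task and all function names are hypothetical. The only hardware fact used is that PENDSVSET is bit 28 of ICSR.

```c
#include <stdbool.h>
#include <stdint.h>

#define ICSR_PENDSVSET (1u << 28) /* PENDSVSET bit of the ICSR register */

/* Host-side stand-ins for hardware and kernel state (hypothetical names). */
uint32_t fake_icsr;     /* plays the role of the memory-mapped ICSR */
int active_exceptions;  /* number of handlers currently running */
int current_task;       /* index of the running task, 0 or 1 */

/* Timer handler: it only requests a switch and never performs it. */
void tick_handler(bool switch_required)
{
    active_exceptions++;
    if (switch_required)
        fake_icsr |= ICSR_PENDSVSET;
    active_exceptions--;
}

/* PendSV handler: the only place where the task actually changes. */
void pendsv_handler(void)
{
    current_task = (current_task + 1) % 2;
}

/* Models a lowest-priority PendSV: it runs only when nothing else is active. */
void deliver_pending(void)
{
    if (active_exceptions == 0 && (fake_icsr & ICSR_PENDSVSET) != 0) {
        fake_icsr &= ~ICSR_PENDSVSET;
        pendsv_handler();
    }
}
```

    While any other handler is still active, deliver_pending() does nothing, so the switch is postponed until the last interrupt has finished.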

    xPortPendSVHandler
    void xPortPendSVHandler( void )
    {
        /* This is a naked function. */
        __asm volatile
        (
        "   mrs r0, psp                         \n"
        "   isb                                 \n"
        "                                       \n"
        "   ldr r3, pxCurrentTCBConst           \n" /* Get the location of the current TCB. */
        "   ldr r2, [r3]                        \n"
        "                                       \n"
        "   tst r14, #0x10                      \n" /* Is the task using the FPU context?  If so, push high vfp registers. */
        "   it eq                               \n"
        "   vstmdbeq r0!, {s16-s31}             \n"
        "                                       \n"
        "   stmdb r0!, {r4-r11, r14}            \n" /* Save the core registers. */
        "                                       \n"
        "   str r0, [r2]                        \n" /* Save the new top of stack into the first member of the TCB. */
        "                                       \n"
        "   stmdb sp!, {r3}                     \n"
        "   mov r0, %0                          \n"
        "   msr basepri, r0                     \n"
        "   dsb                                 \n"
        "   isb                                 \n"
        "   bl vTaskSwitchContext               \n"
        "   mov r0, #0                          \n"
        "   msr basepri, r0                     \n"
        "   ldmia sp!, {r3}                     \n"
        "                                       \n"
        "   ldr r1, [r3]                        \n" /* The first item in pxCurrentTCB is the task top of stack. */
        "   ldr r0, [r1]                        \n"
        "                                       \n"
        "   ldmia r0!, {r4-r11, r14}            \n" /* Pop the core registers. */
        "                                       \n"
        "   tst r14, #0x10                      \n" /* Is the task using the FPU context?  If so, pop the high vfp registers too. */
        "   it eq                               \n"
        "   vldmiaeq r0!, {s16-s31}             \n"
        "                                       \n"
        "   msr psp, r0                         \n"
        "   isb                                 \n"
        "                                       \n"
        #ifdef WORKAROUND_PMU_CM001 /* XMC4000 specific errata workaround. */
            #if WORKAROUND_PMU_CM001 == 1
        "           push { r14 }                \n"
        "           pop { pc }                  \n"
            #endif
        #endif
        "                                       \n"
        "   bx r14                              \n"
        "                                       \n"
        "   .align 4                            \n"
        "pxCurrentTCBConst: .word pxCurrentTCB  \n"
        ::"i"(configMAX_SYSCALL_INTERRUPT_PRIORITY)
        );
    }
    


    But now let's imagine that we are switching to a new thread that is to execute, say, some function fn. If we simply put the address of fn into PC, we immediately get to the right place, but in the wrong context: we have not exited the interrupt! FreeRTOS offers the following solution. Let's initialize the newly created thread as if we were returning from an interrupt: /* Simulate the stack frame as it would be created by a context switch interrupt. */. In this case we first “honestly” exit the handler xPortPendSVHandler, that is, we end up in the right context, and then, following the prepared stack, we get into fn. Below is the code that prepares a thread in this way:

    pxPortInitialiseStack
    StackType_t *pxPortInitialiseStack( StackType_t *pxTopOfStack, TaskFunction_t pxCode, void *pvParameters )
    {
        /* Simulate the stack frame as it would be created by a context switch
        interrupt. */
        /* Offset added to account for the way the MCU uses the stack on entry/exit
        of interrupts, and to ensure alignment. */
        pxTopOfStack--;
        *pxTopOfStack = portINITIAL_XPSR;   /* xPSR */
        pxTopOfStack--;
        *pxTopOfStack = ( ( StackType_t ) pxCode ) & portSTART_ADDRESS_MASK;    /* PC */
        pxTopOfStack--;
        *pxTopOfStack = ( StackType_t ) portTASK_RETURN_ADDRESS;    /* LR */
        /* Save code space by skipping register initialisation. */
        pxTopOfStack -= 5;  /* R12, R3, R2 and R1. */
        *pxTopOfStack = ( StackType_t ) pvParameters;   /* R0 */
        /* A save method is being used that requires each task to maintain its
        own exec return value. */
        pxTopOfStack--;
        *pxTopOfStack = portINITIAL_EXEC_RETURN;
        pxTopOfStack -= 8;  /* R11, R10, R9, R8, R7, R6, R5 and R4. */
        return pxTopOfStack;
    }
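
    To see which word ends up where, the same layout can be reproduced in a host-side sketch. This is not the FreeRTOS code: the constants, the enum of offsets and init_task_stack are my own illustrative names, uintptr_t is used instead of a 32-bit word so that it also compiles on a 64-bit PC, and unlike FreeRTOS the sketch zeroes the unused registers.

```c
#include <stdint.h>

/* Illustrative constants for a Cortex-M4 style port (hypothetical names). */
#define INITIAL_XPSR       0x01000000u /* only the Thumb bit is set */
#define INITIAL_EXC_RETURN 0xFFFFFFFDu /* Thread mode, process stack, no FPU frame */
#define FRAME_WORDS        17          /* r4-r11 (8) + EXC_RETURN (1) + hardware frame (8) */

/* Word offsets counted from the returned stack pointer. */
enum {
    F_R4 = 0, F_EXC_RETURN = 8,
    F_R0 = 9, F_R12 = 13, F_LR = 14, F_PC = 15, F_XPSR = 16
};

/* Lay out a frame that looks as if the task had been interrupted at `entry`. */
uintptr_t *init_task_stack(uintptr_t *top, uintptr_t entry,
                           uintptr_t arg, uintptr_t exit_hook)
{
    uintptr_t *sp = top - FRAME_WORDS;

    for (int i = 0; i < FRAME_WORDS; i++)
        sp[i] = 0;                       /* r1-r3, r12 and r4-r11 start as zero */

    sp[F_EXC_RETURN] = INITIAL_EXC_RETURN;
    sp[F_R0]   = arg;                    /* first argument of the task function */
    sp[F_LR]   = exit_hook;              /* where the task lands if it returns */
    sp[F_PC]   = entry & ~(uintptr_t)1;  /* bit 0 is cleared in the stacked PC */
    sp[F_XPSR] = INITIAL_XPSR;
    return sp;
}
```

    The lower nine words are the ones the PendSV handler pops itself (r4-r11 and r14), and the upper eight are popped by the processor on exception return.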
    


    So, that was one of the ways suggested by FreeRTOS.

    NuttX


    Let us now look at another method, the one used in NuttX. This is another fairly well-known OS for various small hardware.

    The main part of interrupt handling takes place inside the function up_doirq. It is essentially a second-level interrupt handler called from assembler code. It decides whether to switch to another thread, and it returns the context of the thread to be resumed.

    up_doirq
    uint32_t *up_doirq(int irq, uint32_t *regs)
    {
      board_autoled_on(LED_INIRQ);
    #ifdef CONFIG_SUPPRESS_INTERRUPTS
      PANIC();
    #else
      uint32_t *savestate;
      /* Nested interrupts are not supported in this implementation.  If you want
       * to implement nested interrupts, you would have to (1) change the way that
       * CURRENT_REGS is handled and (2) the design associated with
       * CONFIG_ARCH_INTERRUPTSTACK.  The savestate variable will not work for
       * that purpose as implemented here because only the outermost nested
       * interrupt can result in a context switch.
       */
      /* Current regs non-zero indicates that we are processing an interrupt;
       * CURRENT_REGS is also used to manage interrupt level context switches.
       */
      savestate    = (uint32_t *)CURRENT_REGS;
      CURRENT_REGS = regs;
      /* Acknowledge the interrupt */
      up_ack_irq(irq);
      /* Deliver the IRQ */
      irq_dispatch(irq, regs);
      /* If a context switch occurred while processing the interrupt then
       * CURRENT_REGS may have change value.  If we return any value different
       * from the input regs, then the lower level will know that a context
       * switch occurred during interrupt processing.
       */
      regs = (uint32_t *)CURRENT_REGS;
      /* Restore the previous value of CURRENT_REGS.  NULL would indicate that
       * we are no longer in an interrupt handler.  It will be non-NULL if we
       * are returning from a nested interrupt.
       */
      CURRENT_REGS = savestate;
    #endif
      board_autoled_off(LED_INIRQ);
      return regs;
    }
    


    After returning from the function, we are again in the first-level handler. If a switch to a new thread is needed, we modify the registers that were automatically saved on the stack on interrupt entry, so that when interrupt processing completes we end up in the desired thread. The following is a snippet of that code.

        bl      up_doirq                /* R0=IRQ, R1=register save (msp) */
        mov     r1, r4                  /* Recover R1=main stack pointer */
        /* On return from up_doirq, R0 will hold a pointer to register context
         * array to use for the interrupt return.  If that return value is the same
         * as current stack pointer, then things are relatively easy.
         */
        cmp     r0, r1                  /* Context switch? */
        beq     l2                      /* Branch if no context switch */
        /* Next, copy the registers */
    …
        /* We are returning with a pending context switch.  This case is different
         * because in this case, the register save structure does not lie in the
         * stack but, rather, within a TCB structure.  We'll have to copy some
         * values to the stack.
         */
        add     r1, r0, #SW_XCPT_SIZE   /* R1=Address of HW save area in reg array */
        ldmia   r1, {r4-r11}            /* Fetch eight registers in HW save area */
        ldr     r1, [r0, #(4*REG_SP)]   /* R1=Value of SP before interrupt */
        stmdb   r1!, {r4-r11}           /* Store eight registers in HW save area */
    #ifdef CONFIG_BUILD_PROTECTED
        ldmia   r0, {r2-r11,r14}        /* Recover R4-R11, r14 + 2 temp values */
    #else
        ldmia   r0, {r2-r11}            /* Recover R4-R11 + 2 temp values */
    #endif
    	…
    

    That is, in NuttX (unlike FreeRTOS) the register values automatically saved on the stack are modified. This is perhaps the main difference. In addition, you can see that they do quite well without PendSV (although ARM recommends it :)). And finally, the context switch itself is deferred: it happens through the interrupt stack rather than on the principle “save the old values and immediately load the new ones into the registers”.
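
    The "return a different pointer to request a switch" convention is easy to model on a host. The sketch below follows the shape of up_doirq, but everything in it is a simplification of mine: do_irq, the two handlers and the task save areas are hypothetical names, and the handler is passed as a function pointer instead of being looked up by IRQ number.

```c
#include <stddef.h>
#include <stdint.h>

#define CTX_WORDS 17 /* size of one saved register context, illustrative only */

/* Non-NULL only while an interrupt is being handled. */
uint32_t *current_regs;

/* Register save areas of two hypothetical tasks. */
uint32_t task_a_ctx[CTX_WORDS];
uint32_t task_b_ctx[CTX_WORDS];

/* A handler that decides task B must run next. */
void handler_wakes_task_b(void)
{
    current_regs = task_b_ctx; /* request a switch by redirecting the pointer */
}

/* A handler that leaves the scheduling decision unchanged. */
void handler_no_switch(void)
{
}

/* Second-level handler: returns the context the low-level code should restore. */
uint32_t *do_irq(void (*handler)(void), uint32_t *regs)
{
    uint32_t *saved = current_regs;

    current_regs = regs;
    handler();
    regs = current_regs;   /* may now point at another task's context */
    current_regs = saved;
    return regs;
}
```

    The low-level code then only has to compare the returned pointer with the one it passed in, which is exactly what the "cmp r0, r1" in the assembler snippet does.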

    Embox


    Finally, about how this is done in Embox. The main idea is to add an extra function (let's call it __irq_trampoline) so that contexts are switched in “normal mode” rather than in interrupt handling mode, and only after that do we really exit the interrupt handler. In other words, we tried to fully preserve the logic described at the beginning of the article:

    void irq_handler(pt_regs_t *regs) {
              ...
        int irq = get_irq_number(regs);
        {
              ipl_enable();
              irq_dispatch(irq);
              ipl_disable();
        }
        irqctrl_eoi(irq); // Only here there is now a small trick instead of a direct call
              ...
    }
    

    To begin with, here is a diagram that shows the whole scheme. Then I will explain it piece by piece.



    How is it done? The idea is as follows. The interrupt handler first executes as usual, as on other platforms. But when we exit the handler, we actually modify the stack and exit to a completely different place: __pending_handle! It looks as if the interrupt had really occurred at the entry of the function __pending_handle. Below is the code that modifies the stack so that we exit into __pending_handle. I tried to add comments at the especially important places.

    // Registers saved by the processor on interrupt entry
    struct cpu_saved_ctx {
        uint32_t r[5];
        uint32_t lr;
        uint32_t pc;
        uint32_t psr;
    };
    void interrupt_handle(struct context *regs) {
        uint32_t source;
        struct irq_saved_state state;
        struct cpu_saved_ctx *ctx;
        ... // The usual interrupt handling goes here; omitted
        state.sp = regs->sp;
        state.lr = regs->lr;
        assert(!interrupted_from_fpu_mode(state.lr));
        ctx = (struct cpu_saved_ctx*) state.sp;
        memcpy(&state.ctx, ctx, sizeof *ctx);
        // Below is how we modify the stack
        /* It does not matter what value of psr is, just set up some correct value.
         * This value is only used to go further, after return from interrupt_handle.
         * 0x01000000 is a default value of psr and (ctx->psr & 0xFF) is irq number if any. */
        ctx->psr = 0x01000000 | (ctx->psr & 0xFF);
        ctx->r[0] = (uint32_t) &state; // we want pass the state to __pending_handle()
        ctx->r[1] = (uint32_t) regs; // we want pass the registers to __pending_handle()
        ctx->lr = (uint32_t) __pending_handle;
        ctx->pc = ctx->lr;
        /* Now return from interrupt context into __pending_handle */
        __irq_trampoline(state.sp, state.lr);
    }
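
    The frame patching from the listing above can be isolated into a small function and tried on a host. This is only a sketch: hw_frame mirrors cpu_saved_ctx, redirect_return is a hypothetical helper of mine, and plain 32-bit numbers are passed where the real code passes pointers.

```c
#include <stdint.h>
#include <string.h>

/* Same layout as the hardware-saved frame: r0-r3, r12, then lr, pc, psr. */
struct hw_frame {
    uint32_t r[5];
    uint32_t lr;
    uint32_t pc;
    uint32_t psr;
};

/* Keep a copy of the original frame, then make the exception return go to `target`. */
void redirect_return(struct hw_frame *frame, struct hw_frame *backup,
                     uint32_t target, uint32_t arg0, uint32_t arg1)
{
    memcpy(backup, frame, sizeof *frame);

    /* Default psr value plus the low byte of the old psr, as in the listing. */
    frame->psr  = 0x01000000u | (frame->psr & 0xFFu);
    frame->r[0] = arg0;   /* becomes the first argument of the target function */
    frame->r[1] = arg1;   /* becomes the second argument */
    frame->lr   = target;
    frame->pc   = target; /* exception return will "resume" here */
}
```

    After the call, backup holds everything needed to return later to the place that was really interrupted, while the frame on the stack now describes a fake interruption at the entry of the target function.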
    

    Here is also the code of the function __irq_trampoline. The comments in the function talk about SP, but in order not to overload the article, I will skip this. The main thing is the “bx r1” at the end of the function. Recall that register r1 holds the second argument of __irq_trampoline. In the code above we see the call “__irq_trampoline(state.sp, state.lr)”, which means that r1 holds the value of state.lr, which is 0xFFFFFFXX (see the first section).

    __irq_trampoline
    .global __irq_trampoline
    __irq_trampoline:
        cpsid  i
        # r0 contains SP stored on interrupt handler entry. So we keep some data
        # behind SP for a while, but interrupts are disabled by 'cpsid i'
        mov    sp,  r0
        # Return from interrupt handling to usual mode
        bx     r1
    


    In short, after leaving __irq_trampoline we unwind the stack, exit the interrupt, and land in __pending_handle. In this function we perform all the remaining operations (such as the context switch). On leaving this function we need to put the originally saved register values back onto the stack, after which we enter an interrupt again and exit it, but this time to the original place! This is done as follows. We first prepare the stack, then trigger the PendSV interrupt, and end up in the handler __pendsv_handle. From there we exit the handler honestly, in the usual way, but using the original old stack. The code of __pending_handle and __pendsv_handle is given below:

    __pending_handle and __pendsv_handle
    .global __pending_handle
    __pending_handle:
        // Here we push the "old" context onto the stack in order to exit the interrupt
        // honestly, that is, to the place where we were originally interrupted.
        # Push initial saved context (state.ctx) on top of the stack
        add    r0, #32
        ldmdb  r0, {r4 - r11}
        push   {r4 - r11}
        // Here we restore some registers. This detail is not essential
        // for understanding, so we skip it.
        ...
        cpsie  i
        // This is where we switch contexts, if required
        bl     critical_dispatch_pending
        cpsid  i
        # Generate PendSV interrupt
        // Here we trigger the PendSV interrupt; its handler is shown below
        bl     nvic_set_pendsv
        cpsie  i
        # DO NOT RETURN
    1: b       1b
    .global __pendsv_handle
    __pendsv_handle:
        # 32 == sizeof (struct cpu_saved_ctx)
        add    sp, #32
        # Return to the place we were interrupted at,
        # i.e. before interrupt_handle_enter
        bx     r14
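
    The stack arithmetic of the whole round trip is easier to follow in a toy model. The sketch below simulates a single full-descending stack on a host under simplifying assumptions of mine: frames are always 8 words (no FPU, no alignment padding), only the pc slot of a frame is tracked, and struct sim, exc_enter, exc_return and round_trip are hypothetical names.

```c
#include <stdint.h>
#include <string.h>

#define FRAME_WORDS 8 /* r0-r3, r12, lr, pc, psr */
#define PC_SLOT     6 /* index of pc inside a frame */

/* A tiny host-only model of the stack and the program counter. */
struct sim {
    uint32_t mem[64];
    uint32_t *sp;
    uint32_t pc;
};

/* Exception entry: the core pushes a frame recording where it was interrupted. */
void exc_enter(struct sim *s)
{
    s->sp -= FRAME_WORDS;
    memset(s->sp, 0, FRAME_WORDS * sizeof s->sp[0]);
    s->sp[PC_SLOT] = s->pc;
}

/* Exception return: the core pops a frame and resumes at its pc. */
void exc_return(struct sim *s)
{
    s->pc = s->sp[PC_SLOT];
    s->sp += FRAME_WORDS;
}

/* Runs the whole sequence and returns the pc at which execution resumes. */
uint32_t round_trip(struct sim *s, uint32_t interrupted_pc, uint32_t pending_handle_pc)
{
    uint32_t saved[FRAME_WORDS];

    s->sp = &s->mem[64];
    s->pc = interrupted_pc;

    exc_enter(s);                        /* the IRQ arrives */
    memcpy(saved, s->sp, sizeof saved);  /* interrupt_handle keeps state.ctx */
    s->sp[PC_SLOT] = pending_handle_pc;  /* and patches the frame in place */
    exc_return(s);                       /* __irq_trampoline: bx with 0xFFFFFFXX */

    /* Now we are in __pending_handle, outside of interrupt context. */
    s->sp -= FRAME_WORDS;                /* push the original frame back */
    memcpy(s->sp, saved, sizeof saved);

    exc_enter(s);                        /* PendSV fires inside __pending_handle */
    s->sp += FRAME_WORDS;                /* __pendsv_handle: add sp, #32 */
    exc_return(s);                       /* bx r14 pops the original frame */
    return s->pc;
}
```

    In this model the frame pushed on PendSV entry is simply discarded by the "add sp, #32", so the final exception return consumes the original frame and both pc and sp come back to their values at the moment of the first interrupt.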
    


    In conclusion, a few words about the context switch implementations considered. Each of the methods works and has its own advantages and disadvantages. The FreeRTOS option does not suit us very well, since that OS is aimed primarily at microcontrollers, which entails a context switch that is somewhat “hard-coded” for a specific chip. In our OS we try to let even microcontrollers use the principles of a “big” OS, with all that this implies... NuttX takes roughly the same approach, and perhaps we will either implement something similar or improve ours using the idea of modifying the stack. But at the moment our version copes with its tasks quite well, as you can see for yourself if you take the code from the repository.
