Validating memory addresses on Cortex-M0 / M3 / M4 / M7

Hi, Habr!

In light of the recent loosening of the rules, the complaints in the comments on a neighboring post that articles about microcontrollers are all about blinking an LED, and the untimely demise of my own blog, which I am still too lazy to restore, I am moving over here a useful piece about a trick that gets little coverage in the press when working with Cortex-M cores: checking arbitrary addresses for validity.

One very useful capability of Cortex-M microcontrollers (all of them), which for some reason is almost never described in ready-to-use form, is the ability to check whether a memory address is valid. With it you can determine the size of flash, RAM and EEPROM, detect the presence of specific peripherals and registers on a particular processor, kill crashed processes while keeping the OS as a whole running, and so on.

In normal operation, an access to a non-existent address on Cortex-M3 / M4 / M7 raises a BusFault exception, and if the BusFault handler is not enabled, it is escalated to HardFault. Cortex-M0 has no "detailed" exceptions (MemManage, BusFault, UsageFault), so any fault is escalated to HardFault immediately.

In general, a HardFault cannot be ignored: it may be the result of a hardware failure, for example, after which the device's behavior becomes unpredictable. But in this particular case it can and should be done.

Cortex-M3 and Cortex-M4: the BusFault that never happened

On Cortex-M3 and above, checking an address is quite simple: mask all exceptions (except, obviously, the non-maskable one) via the FAULTMASK register, disable BusFault handling specifically, then poke the address being checked and see whether the BFARVALID flag has been raised. This flag lives in the BusFault Status Register (BFSR, part of CFSR) and indicates that BFAR, the Bus Fault Address Register, holds a valid fault address. If the flag is set, a BusFault has just occurred, i.e. the address is invalid.

The code looks like this; all defines and functions come from the standard (non-vendor) CMSIS, so it should work on any M3, M4 or M7:

#include <stdbool.h>
#include <stdint.h>
/* The CMSIS device header for your part must also be included (SCB, __get_FAULTMASK, ...) */

bool cpu_check_address(volatile const char *address)
{
    /* Cortex-M3, Cortex-M4, Cortex-M4F, Cortex-M7 are supported */
    static const uint32_t BFARVALID_MASK = (0x80UL << SCB_CFSR_BUSFAULTSR_Pos);
    bool is_valid = true;

    /* Clear BFARVALID flag by writing 1 to it. The CFSR bits are
       write-1-to-clear, so a plain write is used: "|=" would also
       clear every other fault flag that happens to be set. */
    SCB->CFSR = BFARVALID_MASK;

    /* Ignore BusFault by enabling BFHFNMIGN and disabling interrupts */
    uint32_t mask = __get_FAULTMASK();
    __disable_fault_irq();
    SCB->CCR |= SCB_CCR_BFHFNMIGN_Msk;

    /* Probe address in question (volatile read, result discarded) */
    (void)*address;

    /* Check BFARVALID flag */
    if ((SCB->CFSR & BFARVALID_MASK) != 0)
    {
        /* Bus Fault occurred reading the address */
        is_valid = false;
    }

    /* Reenable BusFault by clearing BFHFNMIGN */
    SCB->CCR &= ~SCB_CCR_BFHFNMIGN_Msk;
    __set_FAULTMASK(mask);

    return is_valid;
}

Cortex-M0 and Cortex-M0+

With Cortex-M0 and Cortex-M0+ things get harder: as I said above, they have neither BusFault nor any of the related registers, and exceptions escalate straight to HardFault. So there is only one way out: make the HardFault handler able to recognize that the exception was caused deliberately, and return to the function that caused it, passing back a flag that says a HardFault occurred.

This is done purely in assembler. In the example below, register R5 is set to 1, and two "magic numbers" are written to registers R1 and R2. If a HardFault occurs after the attempt to load the value at the address being checked, the handler must inspect R1 and R2 and, if it finds the expected numbers there, set R5 to zero. In the C code, the value of R5 is passed through a special variable rigidly bound to that register; in the assembler, the address being checked is implicit: we simply know that in arm-none-eabi the first function parameter is placed in R0.

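/* Must not be inlined: the assembly relies on the argument still being in R0 */
__attribute__((noinline)) bool cpu_check_address(volatile const char *address)
{
    /* Cortex-M0 doesn't have BusFault so we need to catch HardFault */
    (void)address;

    /* R5 will be set to 0 by HardFault handler */
    /* to indicate HardFault has occurred */
    register uint32_t result __asm("r5");

    __asm__ volatile(
        "ldr  r5, =1            \n" /* set default R5 value */
        "ldr  r1, =0xDEADF00D   \n" /* set magic number     */
        "ldr  r2, =0xCAFEBABE   \n" /* 2nd magic to be sure */
        "ldrb r3, [r0]          \n" /* probe address        */
        : "=r"(result)              /* R5 is written by the block or by the handler */
        :
        : "r1", "r2", "r3", "memory" /* scratch registers used above */
    );

    return result != 0;
}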

The HardFault handler code in its simplest form looks like this:

__attribute__((naked)) void hard_fault_default(void)
{
    /* Get stack pointer where exception stack frame lies */
    __asm__ volatile(
        /* decide if we need MSP or PSP stack */
        "movs r0, #4                        \n" /* r0 = 0x4                   */
        "mov r2, lr                         \n" /* r2 = lr                    */
        "tst r2, r0                         \n" /* if (!(lr & 0x4))           */
        "bne use_psp                        \n" /* {                          */
        "mrs r0, msp                        \n" /*   r0 = msp                 */
        "b out                              \n" /* }                          */
        " use_psp:                          \n" /* else {                     */
        "mrs r0, psp                        \n" /*   r0 = psp                 */
        " out:                              \n" /* }                          */

        /* catch intended HardFaults on Cortex-M0 to probe memory addresses */
        "ldr     r1, [r0, #0x04]            \n" /* read R1 from the stack        */
        "ldr     r2, =0xDEADF00D            \n" /* magic number to be found      */
        "cmp     r1, r2                     \n" /* compare with the magic number */
        "bne     regular_handler            \n" /* no magic -> handle as usual   */
        "ldr     r1, [r0, #0x08]            \n" /* read R2 from the stack        */
        "ldr     r2, =0xCAFEBABE            \n" /* 2nd magic number to be found  */
        "cmp     r1, r2                     \n" /* compare with 2nd magic number */
        "bne     regular_handler            \n" /* no magic -> handle as usual   */
        "ldr     r1, [r0, #0x18]            \n" /* read PC from the stack        */
        "add     r1, r1, #2                 \n" /* move to the next instruction  */
        "str     r1, [r0, #0x18]            \n" /* modify PC in the stack        */
        "ldr     r5, =0                     \n" /* set R5 to indicate HardFault  */
        "bx      lr                         \n" /* exit the exception handler    */
        " regular_handler:                  \n"
        /* here comes the rest of the owl */
    );
}

On exception entry, the Cortex core pushes onto the stack the registers that the handler is bound to clobber (R0-R3, R12, LR, PC ...). The first fragment, which is already present in most ready-made HardFault handlers except those written for pure bare metal, determines which stack that is: when running under an OS it can be either MSP or PSP, and they have different addresses. In bare-metal projects the MSP (Main Stack Pointer) is usually assumed a priori, without checking, since there can be no PSP (Process Stack Pointer) there for lack of processes.

Having determined the right stack and put its address in R0, we read from it the values of R1 (offset 0x04) and R2 (offset 0x08) and compare them with the magic words. If both match, we read the PC value (offset 0x18) from the stack, add 2 to it (2 bytes is the size of an instruction on Cortex-M *) and store it back on the stack. If this is not done, then on return from the handler we land on the very instruction that caused the exception and go round in circles forever. Adding 2 moves us to the next instruction at the moment of return.
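The offsets used by the handler are easier to keep straight if the stacked frame is pictured as a C structure. The following is only a sketch: the type and function names (exception_frame_t, frame_is_probe, frame_skip_instruction) are made up for illustration, and it assumes the basic 8-word frame without floating-point state.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Basic exception stack frame as pushed on exception entry
   (hypothetical type name; assumes no FPU context is stacked) */
typedef struct {
    uint32_t r0;    /* offset 0x00 */
    uint32_t r1;    /* offset 0x04: first magic number  */
    uint32_t r2;    /* offset 0x08: second magic number */
    uint32_t r3;    /* offset 0x0C */
    uint32_t r12;   /* offset 0x10 */
    uint32_t lr;    /* offset 0x14 */
    uint32_t pc;    /* offset 0x18: return address      */
    uint32_t xpsr;  /* offset 0x1C */
} exception_frame_t;

/* The same offsets the assembly handler hard-codes */
_Static_assert(offsetof(exception_frame_t, r1) == 0x04, "R1 offset");
_Static_assert(offsetof(exception_frame_t, r2) == 0x08, "R2 offset");
_Static_assert(offsetof(exception_frame_t, pc) == 0x18, "PC offset");

/* True if the fault was raised deliberately by the address probe */
static bool frame_is_probe(const exception_frame_t *frame)
{
    return frame->r1 == 0xDEADF00Du && frame->r2 == 0xCAFEBABEu;
}

/* Step the stacked PC over the faulting 16-bit instruction */
static void frame_skip_instruction(exception_frame_t *frame)
{
    frame->pc += 2u;
}
```

On a real target the pointer to this structure would be the MSP or PSP value that the handler placed in R0.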

* Upd. A question came up in the comments about instruction size on Cortex-M, so I will put the correct answer here: in this case the fault is caused by the LDRB instruction, which exists in the ARMv7-M architecture in two variants, 16-bit and 32-bit. The second variant is chosen if at least one of the following conditions is met:

the author explicitly wrote LDRB.W instead of LDRB (we did not)
registers above R7 are used (we use R0 and R3)
an offset greater than 31 bytes is specified (we have no offset)

In all other cases (i.e. when the operands fit the 16-bit encoding of the instruction), the assembler must choose the 16-bit version.

Therefore, in our case the instruction to step over is always 2 bytes long, but if you edit the code heavily, other outcomes are possible.
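If the probing code does get edited heavily, the size of the faulting instruction can be worked out from its first halfword instead of being assumed. A minimal sketch, relying on the Thumb encoding rule that a halfword whose top five bits are 0b11101, 0b11110 or 0b11111 starts a 32-bit instruction; the function name thumb_instruction_size is made up for illustration:

```c
#include <stdint.h>

/* Returns the size in bytes (2 or 4) of the Thumb instruction
   that begins with the given halfword (hypothetical helper). */
static uint32_t thumb_instruction_size(uint16_t first_halfword)
{
    uint32_t top5 = (uint32_t)first_halfword >> 11;

    /* 0b11101, 0b11110 and 0b11111 mark the first half of a 32-bit encoding */
    if (top5 == 0x1Du || top5 == 0x1Eu || top5 == 0x1Fu) {
        return 4u;
    }
    return 2u;
}
```

For example, the 16-bit "ldrb r3, [r0]" used above encodes as 0x7803, while the first halfword of the 32-bit LDRB.W form with R0 as the base register is 0xF890.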

Next, we write 0 to R5, which serves as the indicator that a HardFault was hit. Registers R4 through R11 are not saved on the stack and are not restored on exit from the handler, so whether or not to clobber them is on our conscience. In this case we change R5 from 1 to 0 on purpose.

Returning from an exception handler is done in strictly one way. On entry to the handler, a special value called EXC_RETURN is written to the LR register; to exit the handler it must be written to the PC, and not just written, but written with a POP or BX instruction (that is, "mov pc, lr", for example, does not work, even though at first it may seem to). BX LR looks like an attempt to jump to a meaningless address (LR will hold something like 0xFFFFFFF1, which has nothing to do with the real address of the procedure we need to return to), but in reality the processor, seeing this value arrive in the PC, restores the registers from the stack and resumes our procedure at the instruction following the one that caused the HardFault, because we increased the stacked PC by 2.
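The stack selection done at the top of the handler is just a test of bit 2 of EXC_RETURN. As a rough illustration in C (the macro and function names are made up, and the three values listed are the ones used when no floating-point context is stacked):

```c
#include <stdbool.h>
#include <stdint.h>

/* EXC_RETURN values without FPU context (illustrative macro names) */
#define EXC_RETURN_HANDLER_MSP 0xFFFFFFF1u /* return to Handler mode, frame on MSP */
#define EXC_RETURN_THREAD_MSP  0xFFFFFFF9u /* return to Thread mode, frame on MSP  */
#define EXC_RETURN_THREAD_PSP  0xFFFFFFFDu /* return to Thread mode, frame on PSP  */

/* Bit 2 of EXC_RETURN tells which stack holds the exception frame */
static bool exc_return_uses_psp(uint32_t exc_return)
{
    return (exc_return & 0x4u) != 0u;
}
```

This is the same check the handler performs with "tst r2, r0" after loading 4 into R0.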

All of these offsets and instructions are, of course, documented in the obvious place.

If, on the other hand, the magic numbers are not found, execution falls through to regular_handler and the usual HardFault processing follows: as a rule, this is a function that prints the register values to the console, decides what to do with the processor next, and so on.

Determining the size of RAM

Using all of this is simple and straightforward. Do we want to write firmware that runs on several microcontrollers with different amounts of RAM, and uses all of the RAM every time?

Easy:

static uint32_t cpu_find_memory_size(char *base, uint32_t block, uint32_t maxsize){
    char *address = base;
    do {
        address += block;
        if (!cpu_check_address(address)) {
            break;
        }
    } while ((uint32_t)(address - base) < maxsize);
    return (uint32_t)(address - base);
}
uint32_t get_cpu_ram_size(void) {
    return cpu_find_memory_size((char *)SRAM_BASE, 4096, 80*1024);
}

maxsize is needed here because, at the maximum possible amount of RAM, there may be no gap between the RAM and the next block of addresses for cpu_check_address to trip over. In this example it is 80 KB. There is also no point in probing every address: it is enough to look up in the datasheet the smallest possible step between two models of the controller and use that as block.
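The stepping logic can be tried out on a desktop machine by swapping cpu_check_address for a fake probe. The sketch below is an illustration only: the callback type, the fake 20 KB memory map and all names in it are invented, and the loop mirrors cpu_find_memory_size above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical probe callback standing in for cpu_check_address() */
typedef bool (*probe_fn)(uintptr_t address);

/* Made-up memory map: 20 KB of RAM at 0x20000000 */
#define FAKE_RAM_BASE 0x20000000u
#define FAKE_RAM_SIZE (20u * 1024u)

static bool fake_probe(uintptr_t address)
{
    return address >= FAKE_RAM_BASE && address < FAKE_RAM_BASE + FAKE_RAM_SIZE;
}

/* Models a part where no gap follows the RAM at all */
static bool always_valid_probe(uintptr_t address)
{
    (void)address;
    return true;
}

/* Same algorithm as cpu_find_memory_size, with the probe injected */
static uint32_t find_memory_size(uintptr_t base, uint32_t block,
                                 uint32_t maxsize, probe_fn probe)
{
    uint32_t size = 0;
    do {
        size += block;
        if (!probe(base + size)) {
            break; /* first invalid block boundary: RAM ends here */
        }
    } while (size < maxsize);
    return size;
}
```

With the fake map the search stops at the first invalid 4 KB boundary, and with a probe that never fails it is maxsize that ends the loop.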

Jumping from software to a bootloader located who knows where

Sometimes more intricate tricks are possible. For example, imagine that you want to jump from your program into the stock factory STM32 bootloader to switch to firmware update over UART or USB, without bothering to write a bootloader of your own.

The STM32 bootloader lives in an area called System Memory, which is where we need to jump, but there is one problem: this area has different addresses not only in different processor series, but also in different models of the same series (an epic table can be found in AN2606, pages 22 to 26). When adding this functionality to a platform in general, rather than to one specific product, you want it to be universal.

The CMSIS files do not contain the start address of System Memory either. It cannot be determined from the Bootloader ID, since that is a chicken-and-egg problem: the Bootloader ID is stored in the last byte of System Memory, which brings us back to the question of the address.

However, if we look at the STM32 memory map, we will see something like this:

[Figure: fragment of the STM32 memory map around System Memory]

What interests us here is what surrounds System Memory: for example, a one-time programmable area (OTP, not present on all STM32) and the Option bytes (present on all). This layout is found not only across different models but across different STM32 lines, the only differences being whether OTP is present and whether there is a gap in the addresses between System Memory and the Option bytes.

The most important thing for us here is that the start address of the Option bytes is in the regular CMSIS headers, where it is called OB_BASE.

The rest is simple. We write a function that searches up or down from a given address for the first valid or invalid address:

char *cpu_find_next_valid_address(char *start, char *stop, bool valid)
{
    char *address = start;
    while (true) {
        if (address == stop) {
            return NULL;
        }
        if (cpu_check_address(address) == valid) {
            return address;
        }
        if (stop > start) {
            address++;
        } else {
            address--;
        }
    }
    return NULL;
}

Then we search downwards from the Option bytes in two passes: first for the end of either System Memory or the OTP adjacent to it, and then for the beginning of System Memory:

/* System memory is the valid area next _below_ Option bytes */
char *a, *b, *c;
a = (char *)(OB_BASE - 1);
b = 0;
/* Here we have System memory top address */
c = cpu_find_next_valid_address(a, b, true);
/* Here we have System memory bottom address */
c = cpu_find_next_valid_address(c, b, false) + 1;
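To see how the two passes behave, the search can be run on a desktop against an invented memory map. Everything below is a sketch under stated assumptions: the tiny addresses, the fake probe and the function names are made up, and 0 is used as the "not found" value in place of NULL.

```c
#include <stdbool.h>
#include <stdint.h>

/* Made-up layout: system memory, then a gap, then option bytes */
#define FAKE_SYSMEM_START 0x1000u
#define FAKE_SYSMEM_END   0x1800u /* first address past system memory */
#define FAKE_OB_BASE      0x2000u

/* Stand-in for cpu_check_address(): only system memory is valid */
static bool fake_check_address(uintptr_t address)
{
    return address >= FAKE_SYSMEM_START && address < FAKE_SYSMEM_END;
}

/* Same walk as cpu_find_next_valid_address; returns 0 if nothing is found */
static uintptr_t find_next(uintptr_t start, uintptr_t stop, bool valid)
{
    uintptr_t address = start;
    while (address != stop) {
        if (fake_check_address(address) == valid) {
            return address;
        }
        if (stop > start) {
            address++;
        } else {
            address--;
        }
    }
    return 0;
}

/* Two passes downwards from the option bytes */
static uintptr_t find_system_memory_start(void)
{
    /* Pass 1: highest valid address below the option bytes */
    uintptr_t top = find_next(FAKE_OB_BASE - 1u, 0u, true);
    if (top == 0u) {
        return 0u;
    }
    /* Pass 2: first invalid address below that, plus one */
    uintptr_t below = find_next(top, 0u, false);
    if (below == 0u) {
        return 0u;
    }
    return below + 1u;
}
```

Note that each pass is checked for failure before 1 is added to the result, so "not found" cannot be mistaken for a real address.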

Without much difficulty we wrap this into a function that finds the beginning of System Memory and jumps to it, that is, starts the bootloader:

static void jump_to_bootloader(void) __attribute__((noreturn));

/* Sets up and jumps to the bootloader */
static void jump_to_bootloader(void)
{
    /* System memory is the valid area next _below_ Option bytes */
    char *a, *b, *c;
    a = (char *)(OB_BASE - 1);
    b = 0;
    /* Here we have System memory top address */
    c = cpu_find_next_valid_address(a, b, true);
    /* Here we have the last invalid address below System memory */
    if (c) {
        c = cpu_find_next_valid_address(c, b, false);
    }
    /* Check before adding 1: NULL + 1 would no longer test as NULL */
    if (!c) {
        NVIC_SystemReset();
    }
    /* System memory bottom address */
    uint32_t boot_addr = (uint32_t)(c + 1);
    uint32_t boot_stack_ptr = *(uint32_t *)(boot_addr);
    uint32_t dfu_reset_addr = *(uint32_t *)(boot_addr + 4);
    void (*dfu_bootloader)(void) = (void (*)(void))(dfu_reset_addr);
    /* Reset the stack pointer */
    __set_MSP(boot_stack_ptr);
    dfu_bootloader();
    while (1);
}

What does it depend on in a specific processor model? Well, nothing, really. The logic will not work on models that have a gap between OTP and System Memory, but I have not checked whether any exist. If you are going to work with OTP actively, check.

The remaining subtleties concern only the usual procedure for calling a bootloader from your own code: do not forget to reset the stack pointer, and make the jump to the bootloader before initializing processor peripherals, clock frequencies, etc. Owing to its minimalism, the bootloader may skip initializing the peripherals and expect them to be in their default state. A good way to invoke the bootloader from an arbitrary point in your program is to write a magic number to an RTC backup register or simply to a known address in memory, perform a software reset, and check for that number in the earliest stages of initialization.
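The magic-number handshake can be sketched roughly as follows. This is an assumption-laden illustration, not a finished implementation: the magic value and the function names are invented, and the slot pointer stands for whatever survives a reset on your part (an RTC backup register or a RAM word that startup code does not clear).

```c
#include <stdbool.h>
#include <stdint.h>

/* Invented magic value marking "enter the bootloader after reset" */
#define BOOT_MAGIC 0xB007104Du

/* Called from anywhere in the program, right before the software reset */
static void boot_request_set(volatile uint32_t *slot)
{
    *slot = BOOT_MAGIC;
}

/* Called at the very start of initialization, before peripherals and
   clocks are touched. Returns true once per request and clears the
   slot so that the next reset boots normally. */
static bool boot_request_take(volatile uint32_t *slot)
{
    if (*slot != BOOT_MAGIC) {
        return false;
    }
    *slot = 0u;
    return true;
}
```

On the target, boot_request_set would be followed by NVIC_SystemReset(), and a true result from boot_request_take early in startup would lead to jump_to_bootloader().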

P.S. Since all addresses in the processor's memory map are aligned to 4 bytes in the worst case, the procedure described above can obviously be sped up by stepping through them 4 bytes at a time instead of one.

Important note

NB: note that on a specific controller, the validity of a specific address does not necessarily mean that the functionality that may be located at that address is actually present. For example, the address of a register controlling some optional peripheral unit may be valid even though the unit itself is absent in this model. All sorts of interesting dirty tricks are possible on the manufacturer's side, usually rooted in the use of the same dies for different processor models. Still, in most cases these procedures work and are very useful.
