How ARM boots

    My last article was purely theoretical; this one will be practical. The practice will be fairly hardcore (I myself got to grips with this topic only after a year of working with ARMs): initialization of the processor and memory. In other words: what has to be done with the processor in order to reach the main() function. The first part of the article covers build and debugging tools, the second covers exception vectors, and the third covers initializing the stacks and memory.
    But first, one clarification. For some reason, many people think that an ARM is necessarily a monster with external memory, a pile of supporting circuitry, a clock of at least 600 MHz, and so on. This is only partly true (if we are talking about ARM9 and later families). The chip I usually work with (AT91SAM7X512) is not much more complicated than the familiar AVRs. All it needs to run is a crystal and power (it can work without a crystal, but then things get really sad). That's all. It does, of course, have far more capabilities than an AVR. But more on that later. Today's article will not be tied to specific hardware.

    Compilers, linkers, debuggers


    A question that worries a great many people. There are paid toolchains (IAR, Keil MDK, CrossWorks) and free ones (gcc-arm). I will use gcc-arm in the examples. For Windows there are the WinARM (seems to be dead) and YAGARTO builds. In principle, you can build your own. There is also a fun thing called coLinux, but that's a completely different story. Under Linux, the cross-compiler is usually built using the distribution's standard tools. In short, read the docs :)
    There is also such a useful thing as the standard library: the one that implements functions like printf, mktime, malloc and everything else C programmers are used to. glibc won't do, because it is too large. The free newlib is usually used instead. It is included in WinARM / YAGARTO, but Linux users will have to build it by hand. Again, read the documentation :)
    Debuggers are a bit more complicated. You can use emulators, but they are rather buggy when it comes to peripherals. I have no experience here. You can print debug messages to a serial (COM) port. I have been doing this all my life, and it is enough in 99% of cases.
    But the coolest thing is JTAG: an adapter that connects to the processor and lets you debug code directly on the chip (setting breakpoints, tracing, viewing and changing memory, etc.). True, on the one hand it costs money, and on the other hand you have to bring out pins for it on the board.

    Exception handlers


    Okay, let's assume the compiler is installed and configured. Now let's run something. We start from the very beginning: what happens when the processor is reset (for example, after power-on, once the voltage has settled). Everything is simple here: the processor starts executing the program from address 0x0. It would seem you could just place the initialization code at this address and get on with it. But it is not that simple. The point is that the exception vectors are stored at these first addresses.
    For example, if an interrupt occurs, the processor starts handling it from address 0x18, while the "undefined instruction" exception is handled from address 0x04. In total, the first 32 bytes are set aside for the exception vector table: seven vectors of 4 bytes each plus one reserved slot (reset is also an exception).
    [Figure: ARM exception vectors]
    The figure shows this more clearly. It can be seen that 4 bytes, that is, one processor instruction, are allocated for each handler. (In ARM mode, that is. All handlers are entered in this instruction set mode.)
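    The layout of the table can be sketched in C as a simple lookup. This is only an illustration of the offsets on a classic ARM core; the struct, the array and the helper name are hypothetical and are not part of any startup code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One entry per exception vector (illustrative names only). */
struct vector_slot {
    uint32_t    offset;  /* address the core jumps to */
    const char *source;  /* what triggers the jump */
};

static const struct vector_slot vector_table[] = {
    { 0x00, "reset" },
    { 0x04, "undefined instruction" },
    { 0x08, "software interrupt (SWI)" },
    { 0x0C, "prefetch abort" },
    { 0x10, "data abort" },
    { 0x14, "reserved" },
    { 0x18, "IRQ" },
    { 0x1C, "FIQ" },
};

enum { VECTOR_COUNT = sizeof vector_table / sizeof vector_table[0] };

/* Each slot holds exactly one 4-byte ARM instruction,
 * so the offset of a slot is simply its index times 4. */
static uint32_t vector_offset(size_t index)
{
    assert(index < VECTOR_COUNT);
    return (uint32_t)(index * 4u);
}
```

    With eight slots of 4 bytes each, the table spans addresses 0x00 through 0x1F, and the FIQ slot is the last one.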
    Accordingly, the first thing to do is write the exception handlers and place them correctly. Let's do this:
    ldr pc, ResetHandlerAddr        @ 0x00: reset
    ldr pc, UndefHandlerAddr        @ 0x04: undefined instruction
    ldr pc, SWIHandlerAddr          @ 0x08: software interrupt
    ldr pc, PrefetchAbtHandlerAddr  @ 0x0C: prefetch abort
    ldr pc, DataAbtHandlerAddr      @ 0x10: data abort
    nop                             @ 0x14: reserved
    ldr pc, IRQHandlerAddr          @ 0x18: IRQ
    ldr pc, FIQHandlerAddr          @ 0x1C: FIQ

    What does this code do? These are instructions that load the address of the real handler into the pc register, which amounts to an unconditional jump. Right after this code come the variables that store those addresses:
    ResetHandlerAddr: .word ResetHandler
    UndefHandlerAddr: .word UndefHandler
    SWIHandlerAddr: .word SWIHandler
    PrefetchAbtHandlerAddr: .word PrefetchAbtHandler
    DataAbtHandlerAddr: .word DataAbtHandler
    IRQHandlerAddr: .word IRQHandler
    FIQHandlerAddr: .word FIQHandler

    Several tricks that speed up interrupt handling could have been applied here. For example, as you can see, the FIQ vector is the last one in the table, so the handling of this interrupt could begin right there instead of jumping elsewhere.
    The AIC (Advanced Interrupt Controller) registers could also have been used to jump directly to the interrupt handler. But let's not complicate our lives for now. So far, only Reset handling matters to us.
    So let's write the handlers themselves, as simple as possible. They will hang the processor by endlessly executing an unconditional branch to themselves. We don't know how to handle exceptions yet anyway, so a hung processor is perfectly acceptable.
    UndefHandler: B UndefHandler
    SWIHandler: B SWIHandler
    PrefetchAbtHandler: B PrefetchAbtHandler
    DataAbtHandler: B DataAbtHandler
    IRQHandler: B IRQHandler
    FIQHandler: B FIQHandler

    B is the Branch instruction. The next thing to do is set up the stack pointer (sp) for each operating mode, so that if an exception occurs its handler already has its own stack. But first we describe the sizes of all the stacks:
    .EQU IRQ_STACK_SIZE, 0x100
    .EQU FIQ_STACK_SIZE, 0x100
    .EQU ABT_STACK_SIZE, 0x100
    .EQU UND_STACK_SIZE, 0x100
    .EQU SVC_STACK_SIZE, 0x100

    To keep things simple, we allocate 256 bytes of stack for each mode. In fact, that is a lot for most of these modes, although it all depends on the handlers. As you can see, sizes are given here for five of the six stacks. The remaining memory will be shared between the heap and the sixth stack, the one for user mode.
    Now we define constants to make switching between modes easier. The current mode is stored in the CPSR register, which also serves as the status register.
    .EQU ARM_MODE_FIQ, 0x11
    .EQU ARM_MODE_IRQ, 0x12
    .EQU ARM_MODE_SVC, 0x13
    .EQU ARM_MODE_ABT, 0x17
    .EQU ARM_MODE_UND, 0x1B
    .EQU ARM_MODE_USR, 0x10

    .EQU I_BIT, 0x80
    .EQU F_BIT, 0x40

    The constants I_BIT and F_BIT are the bits that disable normal (IRQ) and fast (FIQ) interrupts, respectively. Now we are ready to initialize the stacks. This is done simply: load the pointer to the top of the stack into r0, switch to the desired mode, write the value of r0 to sp, then decrease r0 by the size of that stack and repeat.
    .RAM_TOP:
    .word __TOP_STACK
    ResetHandler:
    ldr r0, .RAM_TOP                @ r0 = top of RAM; the stacks grow down from here

    msr CPSR_c, #ARM_MODE_FIQ | I_BIT | F_BIT
    mov sp, r0
    sub r0, r0, #FIQ_STACK_SIZE

    msr CPSR_c, #ARM_MODE_IRQ | I_BIT | F_BIT
    mov sp, r0
    sub r0, r0, #IRQ_STACK_SIZE

    msr CPSR_c, #ARM_MODE_SVC | I_BIT | F_BIT
    mov sp, r0
    sub r0, r0, #SVC_STACK_SIZE

    msr CPSR_c, #ARM_MODE_ABT | I_BIT | F_BIT
    mov sp, r0
    sub r0, r0, #ABT_STACK_SIZE

    msr CPSR_c, #ARM_MODE_UND | I_BIT | F_BIT
    mov sp, r0
    sub r0, r0, #UND_STACK_SIZE

    msr CPSR_c, #ARM_MODE_USR       @ user mode, IRQ and FIQ enabled
    mov sp, r0                      @ the user stack starts below the five fixed stacks
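
    To double-check the arithmetic, the same carving of RAM can be modelled in C on a PC. This is only a sketch: the struct and function names are hypothetical, and it assumes that the user-mode stack starts right below the five fixed stacks, as described above. The constants are the ones from the listings.

```c
#include <assert.h>
#include <stdint.h>

/* Stack sizes and CPSR constants copied from the assembly listings. */
enum {
    FIQ_STACK_SIZE = 0x100, IRQ_STACK_SIZE = 0x100, SVC_STACK_SIZE = 0x100,
    ABT_STACK_SIZE = 0x100, UND_STACK_SIZE = 0x100,
    FIXED_STACKS_TOTAL = FIQ_STACK_SIZE + IRQ_STACK_SIZE + SVC_STACK_SIZE +
                         ABT_STACK_SIZE + UND_STACK_SIZE
};

enum {
    ARM_MODE_USR = 0x10, ARM_MODE_FIQ = 0x11, ARM_MODE_IRQ = 0x12,
    ARM_MODE_SVC = 0x13, ARM_MODE_ABT = 0x17, ARM_MODE_UND = 0x1B,
    I_BIT = 0x80, F_BIT = 0x40
};

/* Initial stack pointer of every mode (hypothetical helper type). */
struct stack_tops {
    uint32_t fiq, irq, svc, abt, und, usr;
};

/* Mirrors the "mov sp, r0 / sub r0, r0, #SIZE" sequence. */
static struct stack_tops carve_stacks(uint32_t ram_top)
{
    struct stack_tops t;
    uint32_t r0 = ram_top;

    assert(ram_top >= FIXED_STACKS_TOTAL);
    t.fiq = r0; r0 -= FIQ_STACK_SIZE;
    t.irq = r0; r0 -= IRQ_STACK_SIZE;
    t.svc = r0; r0 -= SVC_STACK_SIZE;
    t.abt = r0; r0 -= ABT_STACK_SIZE;
    t.und = r0; r0 -= UND_STACK_SIZE;
    t.usr = r0;  /* whatever is left is shared with the heap */
    return t;
}

/* Value written to CPSR_c: the mode bits, optionally with IRQ and FIQ masked. */
static uint32_t cpsr_control(uint32_t mode, int mask_interrupts)
{
    return mask_interrupts ? (mode | I_BIT | F_BIT) : mode;
}
```

    For a hypothetical RAM top of 0x00220000, carve_stacks() places the IRQ stack at 0x0021FF00 and the user stack at 0x0021FB00, and cpsr_control(ARM_MODE_FIQ, 1) gives 0xD1.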

    Memory initialization


    Now we are in unprivileged mode with interrupts enabled and the stack set up. By the way, it is simply impossible to leave this mode directly; the only way out is through an exception. But more on that in the next article. There is just a little left to do before entering main(). We only need to copy some data into RAM and zero the memory in the .bss segment. This is the memory where global variables without an explicit initializer are stored. The point is that the C standard promises that such variables will be zero at program start, while ARM guarantees us nothing of the sort. So we zero the segment by hand:

                   MOV     R0, #0
                   LDR     R1, =__bss_start__
                   LDR     R2, =__bss_end__
    LoopZI:
                   CMP     R1, R2
                   STRLO   R0, [R1], #4
                   BLO     LoopZI
    

    The constants __bss_start__ and __bss_end__ are kindly provided by the linker.
    By the way, here you can see conditional instructions in use (the LO suffix, "unsigned lower"). They are executed only while R1 is below R2.
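
    For readers more at home in C, the loop above is roughly equivalent to the following sketch. The function name is hypothetical; in real startup code the two pointers would come from the linker symbols, whereas here they are ordinary parameters so that the logic can be tried on a PC.

```c
#include <assert.h>
#include <stdint.h>

/* Zero every 32-bit word in the half-open range [start, end). */
static void zero_words(uint32_t *start, uint32_t *end)
{
    assert(start <= end);
    while (start < end) {   /* CMP R1, R2 and the LO condition */
        *start++ = 0;       /* STRLO R0, [R1], #4 */
    }
}
```

    As in the assembly version, nothing is written when the range is empty, and the word at the end address itself is never touched.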
    You also need to copy the pre-initialized variables (those like int x=42) from ROM to RAM:
                   LDR     R1, =_etext
                   LDR     R2, =_data
                   LDR     R3, =_edata
    LoopRel: 
                   CMP     R2, R3
                   LDRLO   R0, [R1], #4
                   STRLO   R0, [R2], #4
                   BLO     LoopRel
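
    The same copy, written as a C sketch. The function and parameter names are hypothetical: src stands for the load address of the initial values in ROM (_etext in the listing), while dst and dst_end stand for the run-time range in RAM (_data and _edata).

```c
#include <assert.h>
#include <stdint.h>

/* Copy 32-bit words from src into the half-open range [dst, dst_end). */
static void copy_words(const uint32_t *src, uint32_t *dst, uint32_t *dst_end)
{
    assert(dst <= dst_end);
    while (dst < dst_end) {  /* CMP R2, R3 and the LO condition */
        *dst++ = *src++;     /* LDRLO R0, [R1], #4 then STRLO R0, [R2], #4 */
    }
}
```

    Note that the loop is bounded by the destination range only, so the ROM image must hold at least as many words as the RAM range expects.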
    

    If we are writing in C++, we also need to call the constructors of global objects:
                   LDR     r0, =__ctors_start__
                   LDR     r1, =__ctors_end__
    ctor_loop:
                   CMP     r0, r1
                   BEQ     ctor_end
                   LDR     r2, [r0], #4
                   STMFD   sp!, {r0-r1}
                   MOV     lr, pc
                   BX r2
                   LDMFD   sp!, {r0-r1}
                   B       ctor_loop
    ctor_end:
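
    In C terms, this loop walks an array of function pointers and calls each one in turn. The sketch below shows the idea on a PC. All names in it are hypothetical, and the two demo functions merely record the order of calls in place of real constructors.

```c
#include <assert.h>
#include <stddef.h>

typedef void (*ctor_fn)(void);

/* Call every function pointer in the half-open range [start, end), in order. */
static void run_ctors(const ctor_fn *start, const ctor_fn *end)
{
    assert(start <= end);
    for (const ctor_fn *p = start; p != end; ++p) {
        (*p)();  /* LDR r2, [r0], #4 followed by MOV lr, pc and BX r2 */
    }
}

/* Stand-ins for real constructors: they only log the order of calls. */
static int    call_log[4];
static size_t call_count;

static void demo_ctor_a(void) { call_log[call_count++] = 1; }
static void demo_ctor_b(void) { call_log[call_count++] = 2; }
```

    Passing a two-entry table holding demo_ctor_a and demo_ctor_b to run_ctors() leaves 1 and then 2 in call_log, that is, the entries are called in table order.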
    


    Well, that's basically it. Call main():
                   ldr     r0,=main
                   bx      r0
    


    Congratulations, we are now in the function void main(void). Now you can initialize the peripherals. The point is that so far we have only initialized the software environment. So the processor is still running at the lowest possible frequency and all peripherals are disabled. You won't get far like that :)
    But peripheral initialization depends on the specific hardware, and the purpose of this article is to explain how to start up an abstract ARM.
    A few more caveats: this code cannot be compiled and run as is, because the sections it lives in are not declared here. I also did not provide linker scripts (these scripts describe how code and data sections are placed in memory and in the firmware image).
    But the Internet is full of ready-made examples for bringing up a particular piece of hardware, with scripts, makefiles and everything else. Look on the manufacturers' websites :)

    The next article will apparently be devoted to theory again, this time to a description of processor modes and exceptions.
