How ARM boots
My last topic was completely theoretical; this one will be practical. The practice is fairly hardcore (I only dealt with it myself after a year of working with ARMs): initialization of the processor and memory. In other words: what needs to be done with the processor to get into the function main(). The first part of the article covers build and debugging tools, the second covers the exception vectors, and the third covers initializing the stacks and memory.
But first, I want to make one clarification. For some reason, many people think that an ARM is necessarily a monster with external memory and a pile of support circuitry, running at no less than 600 MHz, and so on. This is only partly true (if we are talking about ARM9 and later families). The chip I usually work with (AT91SAM7X512) is not much more complicated than the familiar AVRs. It needs only a crystal and power to work (it can run without the crystal, but then things get really sad). That's all. Of course, it can do more than an AVR, much more. But more on that later. Today's article is not tied to any specific hardware.
Compilers, linkers, debuggers
Choosing a toolchain is a question that worries a lot of people. There are paid compilers (IAR, Keil MDK, CrossWorks) and a free one (gcc-arm). I will use gcc-arm in the examples. For Windows there are the prebuilt WinARM (seems to be dead) and YAGARTO packages. In principle, you can build your own. There is also a fun thing called coLinux, but that is a completely different story. Under Linux, the cross-compiler is usually installed or built with the distribution's standard tools. In short, read the docs :)
There is also such a useful thing as the standard library: the one that implements functions like printf, mktime, malloc and everything else that C programmers are used to. glibc will not do, because it is too large. Instead, people usually use the free newlib. It is included in WinARM / YAGARTO, but Linux users will have to build it by hand. Again, read the documentation :)
Debuggers are a little more complicated. You can use emulators, but they are pretty buggy when it comes to peripherals. I have no experience here. You can print debug messages to the COM port. I have been doing this all my life, and it is enough in 99% of cases.
But the coolest thing is JTAG: a device that connects to the processor and lets you debug the code right on the chip (setting breakpoints, tracing, viewing / changing memory, etc.). True, on the one hand it costs money, and on the other, you have to bring out pins for it on the board.
Exception handlers
Okay, let's assume the compiler is installed and configured. Now let's run something. Let's start from the very beginning: what happens when the processor is reset (for example, after power is applied and the voltage has settled). Everything is simple here: the processor starts executing the program from address 0x0. It would seem that you could place the initialization code at this address and get on with it. But it is not that simple. The thing is that the starting addresses hold the exception vectors.
For example, if an interrupt occurs, the processor will start handling it from address 0x18, and the “undefined instruction” exception is handled from address 0x04. In total, the first 32 bytes (0x00 through 0x1F, including one reserved slot at 0x14) are set aside for the exception vector table (reset is also an exception).
The figure shows this more clearly. From the figure, it can be seen that 4 bytes are allocated for each handler, that is, one processor instruction. (In ARM mode. All handlers are entered in this instruction set mode.)
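Since the figure may not be in front of you, here is a small host-side C sketch of the same table layout. The enum and function names are made up for illustration; only the order of the vectors and the 4-byte step come from the text above.

```c
#include <assert.h>
#include <stdint.h>

/* Exception vectors in the order they sit in memory on a classic ARM core.
 * The names are illustrative, not taken from any header. */
enum arm_vector {
    VEC_RESET = 0,
    VEC_UNDEF,
    VEC_SWI,
    VEC_PREFETCH_ABORT,
    VEC_DATA_ABORT,
    VEC_RESERVED,
    VEC_IRQ,
    VEC_FIQ,
    VEC_COUNT
};

/* Each vector holds exactly one 4-byte ARM instruction,
 * so its address is simply its index times 4. */
uint32_t vector_address(enum arm_vector v)
{
    assert((unsigned)v < (unsigned)VEC_COUNT);
    return (uint32_t)v * 4u;
}
```

With this numbering, IRQ lands at 0x18 and FIQ at 0x1C, and the whole table takes VEC_COUNT * 4 = 32 bytes.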
Accordingly, the first thing we should do is write the exception handlers and place them correctly. Let's do this:
    ldr pc, ResetHandlerAddr        @ 0x00: reset
    ldr pc, UndefHandlerAddr        @ 0x04: undefined instruction
    ldr pc, SWIHandlerAddr          @ 0x08: software interrupt
    ldr pc, PrefetchAbtHandlerAddr  @ 0x0C: prefetch abort
    ldr pc, DataAbtHandlerAddr      @ 0x10: data abort
    nop                             @ 0x14: reserved
    ldr pc, IRQHandlerAddr          @ 0x18: IRQ
    ldr pc, FIQHandlerAddr          @ 0x1C: FIQ
What does this code do? These are instructions that load the address of the real handler into the pc register. In effect, an unconditional jump. Further along in the code are the variables that store these addresses:
    ResetHandlerAddr: .word ResetHandler
    UndefHandlerAddr: .word UndefHandler
    SWIHandlerAddr: .word SWIHandler
    PrefetchAbtHandlerAddr: .word PrefetchAbtHandler
    DataAbtHandlerAddr: .word DataAbtHandler
    IRQHandlerAddr: .word IRQHandler
    FIQHandlerAddr: .word FIQHandler
Several tricks that speed up interrupt handling could have been applied here. For example, as you can see, the FIQ vector is the last one in the table, so the handling of this interrupt could start right on the spot, without a jump.
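To see what the assembler does with `ldr pc, SomeLabel`, here is a rough C sketch of the arithmetic. It assumes the seven address words are placed immediately after the eight vectors, starting at 0x20 (that depends on the section layout, which is not shown here), and the function names are invented for this example.

```c
#include <assert.h>
#include <stdint.h>

/* In ARM state, reading pc yields the address of the current instruction + 8. */
#define ARM_PC_READ_AHEAD 8u

/* Byte offset that has to be encoded into `ldr pc, <label>`, given the
 * address of the instruction and the address of the word it loads. */
uint32_t ldr_pc_offset(uint32_t insn_addr, uint32_t literal_addr)
{
    assert(literal_addr >= insn_addr + ARM_PC_READ_AHEAD); /* word lies ahead */
    return literal_addr - (insn_addr + ARM_PC_READ_AHEAD);
}

/* Machine code for `ldr pc, [pc, #offset]` with the "always" condition. */
uint32_t encode_ldr_pc(uint32_t offset)
{
    assert(offset < 4096u); /* the immediate field is 12 bits wide */
    return 0xE59FF000u | offset;
}
```

Under that assumption the reset vector at 0x00 reads its word from 0x20, which gives an offset of 0x18 and the instruction word 0xE59FF018. The IRQ vector at 0x18 reads from 0x34 (there is no word for the reserved slot), so its offset is 0x14.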
We could also have used the AIC (Advanced Interrupt Controller) registers to jump directly to the interrupt handler. But let's not complicate our lives for now. So far, only Reset handling matters to us.
So let's write the handlers themselves as simply as possible. They will hang the processor (endlessly executing an unconditional branch to themselves). We do not know how to handle exceptions yet anyway, so a hung processor is perfectly acceptable.
    UndefHandler: B UndefHandler
    SWIHandler: B SWIHandler
    PrefetchAbtHandler: B PrefetchAbtHandler
    DataAbtHandler: B DataAbtHandler
    IRQHandler: B IRQHandler
    FIQHandler: B FIQHandler
B is the Branch instruction. The next thing we need to do is set up the stack pointers (sp) for each of the operating modes. That way, if an exception occurs, its handler will already have its own stack. But first we describe the sizes of all the stacks.
    .EQU IRQ_STACK_SIZE, 0x100
    .EQU FIQ_STACK_SIZE, 0x100
    .EQU ABT_STACK_SIZE, 0x100
    .EQU UND_STACK_SIZE, 0x100
    .EQU SVC_STACK_SIZE, 0x100
To keep things simple, we allocate 256 bytes of stack for each mode. In fact, for most of these modes that is a lot, although it all depends on the handlers. As you can see, sizes are given for 5 of the 6 stacks (System mode shares its stack with User mode). The remaining memory will be shared between the heap and the stack of the sixth, User mode.
Now we describe constants that make switching between modes easier:
    .EQU ARM_MODE_FIQ, 0x11
    .EQU ARM_MODE_IRQ, 0x12
    .EQU ARM_MODE_SVC, 0x13
    .EQU ARM_MODE_ABT, 0x17
    .EQU ARM_MODE_UND, 0x1B
    .EQU ARM_MODE_USR, 0x10
    .EQU I_BIT, 0x80
    .EQU F_BIT, 0x40
The current mode is stored in the CPSR register, which also serves as the status register. The constants I_BIT and F_BIT are the bits that disable normal (IRQ) and fast (FIQ) interrupts, respectively. Now we are ready to initialize the stacks. This is done simply: load the pointer to the top of the stack into register r0, then switch to the desired mode, write the value of r0 to sp, then decrease r0 by the size of the stack and repeat.
    .RAM_TOP:
        .word __TOP_STACK
    ResetHandler:
        ldr r0, .RAM_TOP                            @ r0 = top of the stack area
        msr CPSR_c, #ARM_MODE_FIQ | I_BIT | F_BIT   @ switch mode, interrupts masked
        mov sp, r0                                  @ this mode's stack starts here
        sub r0, r0, #FIQ_STACK_SIZE                 @ the next stack goes below it
        msr CPSR_c, #ARM_MODE_IRQ | I_BIT | F_BIT
        mov sp, r0
        sub r0, r0, #IRQ_STACK_SIZE
        msr CPSR_c, #ARM_MODE_SVC | I_BIT | F_BIT
        mov sp, r0
        sub r0, r0, #SVC_STACK_SIZE
        msr CPSR_c, #ARM_MODE_ABT | I_BIT | F_BIT
        mov sp, r0
        sub r0, r0, #ABT_STACK_SIZE
        msr CPSR_c, #ARM_MODE_UND | I_BIT | F_BIT
        mov sp, r0
        sub r0, r0, #UND_STACK_SIZE
        msr CPSR_c, #ARM_MODE_USR                   @ User mode, interrupts enabled
        mov sp, r0                                  @ User stack gets what is left
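To check the arithmetic, the same walk down from the top of RAM can be replayed on a PC. This is only a sketch: the struct and function names are invented here, and the RAM top used in the note below (0x00220000) is a hypothetical value, since the real one comes from the linker symbol __TOP_STACK.

```c
#include <assert.h>
#include <stdint.h>

/* Same values as the .EQU constants in the article. */
#define MODE_FIQ 0x11u
#define MODE_USR 0x10u
#define I_BIT    0x80u
#define F_BIT    0x40u
#define STACK_SIZE 0x100u   /* every exception-mode stack is 256 bytes */

/* Initial sp for each mode (illustrative struct, not from any header). */
struct stack_tops { uint32_t fiq, irq, svc, abt, und, usr; };

/* Replays the "mov sp, r0 / sub r0, r0, #SIZE" sequence in the same order. */
struct stack_tops layout_stacks(uint32_t ram_top)
{
    struct stack_tops t;
    uint32_t r0 = ram_top;
    assert((ram_top & 3u) == 0u);   /* stacks must be at least word aligned */
    t.fiq = r0; r0 -= STACK_SIZE;
    t.irq = r0; r0 -= STACK_SIZE;
    t.svc = r0; r0 -= STACK_SIZE;
    t.abt = r0; r0 -= STACK_SIZE;
    t.und = r0; r0 -= STACK_SIZE;
    t.usr = r0;                     /* User mode takes the rest */
    return t;
}

/* Value written to CPSR_c: mode bits plus optional interrupt masks. */
uint32_t cpsr_control(uint32_t mode, int mask_irq, int mask_fiq)
{
    return mode | (mask_irq ? I_BIT : 0u) | (mask_fiq ? F_BIT : 0u);
}
```

With a RAM top of 0x00220000 the FIQ stack starts at 0x00220000, the IRQ stack at 0x0021FF00, and the User stack at 0x0021FB00. The value written to CPSR_c for FIQ mode with both interrupts masked is 0xD1, and for User mode it is just 0x10.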
Memory initialization
Now we are in unprivileged mode with interrupts enabled and the stack set up. By the way, it is simply impossible to leave this mode, except by raising an exception. But more on that in the next article. There is just a little bit left before we enter the function main(). We only need to copy some data to RAM and zero the memory located in the .bss segment. This is the memory where global variables without an explicit initializer are stored. The thing is that the C language standard promises that such variables are zero at startup, and ARM does not guarantee this for us. Therefore, we zero the segment manually:
    MOV R0, #0
    LDR R1, =__bss_start__
    LDR R2, =__bss_end__
    LoopZI:
    CMP R1, R2
    STRLO R0, [R1], #4
    BLO LoopZI
The constants __bss_start__ and __bss_end__ are kindly provided to us by the linker. By the way, here you can see conditional instructions in use (those with the LO suffix). They are executed only while R1 < R2 (an unsigned comparison).
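For those who read C more easily than assembly, the zeroing loop is roughly equivalent to the sketch below. The function name is made up, and the two pointers stand in for the linker symbols __bss_start__ and __bss_end__; the segment is assumed to be a whole number of 32-bit words.

```c
#include <assert.h>
#include <stdint.h>

/* Zero every 32-bit word in [start, end), like the LoopZI loop. */
void zero_bss(uint32_t *start, uint32_t *end)
{
    assert(start <= end);
    /* "p < end" plays the role of the LO (unsigned lower) condition */
    for (uint32_t *p = start; p < end; ++p) {
        *p = 0u;            /* STRLO R0, [R1], #4 */
    }
}
```

Note that the word at `end` itself is not touched, which matches the assembly: once R1 reaches R2 the store is skipped.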
You also need to copy pre-initialized variables (those declared like int x=42) from ROM to RAM.
    LDR R1, =_etext         @ source: initial values stored in ROM
    LDR R2, =_data          @ destination: start of .data in RAM
    LDR R3, =_edata         @ end of .data in RAM
    LoopRel:
    CMP R2, R3
    LDRLO R0, [R1], #4
    STRLO R0, [R2], #4
    BLO LoopRel
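The copy loop can be written in C in much the same way. Again a sketch with an invented function name: `load` stands in for _etext (where the initial values sit in ROM), `start` and `end` for _data and _edata.

```c
#include <assert.h>
#include <stdint.h>

/* Copy the initial values of .data from ROM to RAM, word by word. */
void copy_data(const uint32_t *load, uint32_t *start, uint32_t *end)
{
    assert(start <= end);
    while (start < end) {       /* CMP R2, R3 + LO condition */
        *start++ = *load++;     /* LDRLO / STRLO with post-increment by 4 */
    }
}
```

Like the assembly version, it assumes that the ROM image and the RAM area do not overlap and that the segment length is a multiple of 4 bytes.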
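If we write in C++, then we also need to call the constructors of global objects:
    LDR r0, =__ctors_start__
    LDR r1, =__ctors_end__
    ctor_loop:
    CMP r0, r1
    BEQ ctor_end
    LDR r2, [r0], #4        @ fetch the next constructor address
    STMFD sp!, {r0-r1}      @ save the loop registers across the call
    MOV lr, pc              @ return address: pc reads as this instruction + 8
    BX r2
    LDMFD sp!, {r0-r1}
    B ctor_loop
    ctor_end: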
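In C terms, the loop above walks an array of function pointers and calls each one. The sketch below shows the idea; `run_ctors` and the two demo constructors are invented for this example, and the array stands in for the table between __ctors_start__ and __ctors_end__ that the linker script is assumed to build.

```c
#include <assert.h>

/* A global constructor is just a function taking and returning nothing. */
typedef void (*ctor_fn)(void);

/* Call every constructor in [start, end), in table order. */
void run_ctors(const ctor_fn *start, const ctor_fn *end)
{
    assert(start <= end);
    for (const ctor_fn *p = start; p != end; ++p) {
        (*p)();             /* MOV lr, pc / BX r2 */
    }
}

/* Hypothetical stand-ins for compiler-generated constructors. */
int ctor_calls = 0;
void demo_ctor_a(void) { ctor_calls += 1; }
void demo_ctor_b(void) { ctor_calls += 10; }
```

Passing a table that holds demo_ctor_a and demo_ctor_b leaves ctor_calls at 11, and an empty table calls nothing.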
Well, that is basically it. Call main():
    ldr r0,=main
    bx r0
Congratulations, now we are in the function void main(void). You can move on to initializing the peripherals. The thing is that up to now we have only initialized the software environment. So the processor is still running at the lowest possible frequency and all peripherals are disabled. You will not get far like this :) But peripheral initialization depends on the specific hardware, and the purpose of this article is to show how to start up an abstract ARM.
A few more caveats: this code cannot be compiled and run as is, because the sections it lives in are not described here. I also did not provide the linker scripts (these scripts describe how the code and data sections are placed in memory and in the firmware image).
But the Internet is full of ready-made examples for bringing up a particular piece of hardware, with scripts, makefiles and everything else. Look on the manufacturers' websites :)
The next article will apparently be devoted to theory again, this time to a description of processor modes and exceptions.