How to tame the processor core *

This article describes how the cores of QorIQ-series processors boot, what role the U-Boot bootloader plays in this, and how to run a standalone program on a separate processor core without an OS. It may interest system programmers who want to understand the diversity of processor architectures. Keep in mind that some of the definitions and techniques apply to other processors and systems as well.

* Using Freescale QorIQ processors with e500mc cores and the PowerPC Book E ISA as an example.

Let's get down to business!

Life begins when core* 0 starts and executes the bootloader code from a fixed address. In our case the bootloader is U-Boot, which resides in flash memory and is accessible at physical address 0x3ffff000. Immediately after reset, this address is mapped to virtual address 0xfffff000 by an entry in the address mapping table (see the e500 documentation).

* Hereinafter, "core" means a processor core, not an OS kernel, unless otherwise specified.

The first instruction the processor executes is the one located at 0xfffffffc. As you have probably guessed, this instruction must be a branch to the U-Boot entry point on this page. Something like this:

	.section .resetvec,"ax"
	b	_start_e500

For those from the Intel world: b ≈ jmp.

For those from the Java world: this is an unconditional jump to the label _start_e500.
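To see where that reset instruction physically lives, the boot-page mapping can be modeled as a single translation entry. This is a host-side sketch: the struct and function names are illustrative, not e500 or U-Boot definitions, and the 4 KB page size is an assumption (it is the largest naturally aligned page that fits at 0xfffff000 below 4 GB).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* One address-mapping (TLB) entry, reduced to what matters here. */
struct map_entry {
	uint32_t va_base;  /* virtual start of the page */
	uint64_t pa_base;  /* physical start of the page */
	uint32_t size;     /* page size in bytes, a power of two */
};

/* Translate va through entry e; returns false if va is outside the page. */
static bool translate(const struct map_entry *e, uint32_t va, uint64_t *pa)
{
	assert(e->size != 0 && (e->size & (e->size - 1)) == 0);
	/* unsigned wrap-around also rejects va < va_base */
	if (va - e->va_base >= e->size)
		return false;
	*pa = e->pa_base + (va - e->va_base);
	return true;
}

/* The boot page from the text: virtual 0xfffff000 backed by U-Boot in
 * flash at physical 0x3ffff000 (4 KB size assumed). */
static const struct map_entry boot_page = {
	.va_base = 0xfffff000u,
	.pa_base = 0x3ffff000u,
	.size    = 0x1000u,
};
```

With this entry, the reset address 0xfffffffc resolves to physical 0x3ffffffc, the last word of the flash page that holds the branch to _start_e500.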

Next, U-Boot's tasks include enabling and configuring the caches, the access control mechanisms, and the address mapping tables, and, of course, mapping the rest of itself into the processor core's address space. On the whole, U-Boot does a good job: it takes on all the dirty work, unlike some (we won't point fingers; it's not their fault).

But what about the other cores? We turn to them once core 0 is done. If we simply started them, they would repeat core 0's sequence of actions. To prevent this, U-Boot changes the translation address of the boot area so that it points to an auxiliary loader for the remaining cores (see the Boot Space Translation Register in the CCSR).

By the way, U-Boot can be configured not to initialize the remaining processor cores, but then the dirty work of initializing them becomes our own concern. That, however, can be solved by copying the code from U-Boot; the core itself is then started by setting the corresponding bit in a CCSR register.
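Setting a per-core start bit amounts to a read-modify-write of a memory-mapped register. The helper below is hypothetical: the register pointer and the assumption that bit n (counted from the least significant bit) releases core n are illustrations only. The reference manual defines the actual register, and PowerPC manuals number bits from the most significant end.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical: assumes bit n of *release_reg releases core n.
 * release_reg must point at the (already mapped) CCSR register. */
static void release_core(volatile uint32_t *release_reg, unsigned core)
{
	assert(core < 32);
	/* read-modify-write: bits of cores already running are preserved */
	*release_reg |= (uint32_t)1 << core;
}
```

On real hardware this write is usually followed by a barrier, and the bit must only be set after the core's boot code is in place.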

The auxiliary loader's tasks also include configuring the cache and adding an entry to the address mapping table, after which the cores spin in this fun carousel:

	/* spin waiting for addr */
2:
	lwz	r4,ENTRY_ADDR_LOWER(r10)	/* load the word at the spin-table entry address */
	andi.	r11,r4,1			/* if the least significant bit is 1... */
	bne	2b				/* ...branch back to label 2 */

The cores spin until an entry-point address whose least significant bit is 0 is written to a specific location in U-Boot. This location is called the spin table, and it is at a fixed address (0xfffff000 + ENTRY_ADDR_LOWER). Besides the entry address, you can write values for registers r3 and r6 into this table; they are loaded before the branch to the entry point.
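The release protocol can be sketched in C. The entry layout below is an assumption modeled on the word layout used by U-Boot's mpc85xx code (entry address, r3, a reserved word, PIR, r6); check it against your U-Boot version. The field and function names are illustrative.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed layout of one spin-table entry (32-bit words). */
struct spin_entry {
	uint32_t addr_upper;
	uint32_t addr_lower;   /* bit 0 set: the core keeps spinning */
	uint32_t r3_upper;
	uint32_t r3_lower;
	uint32_t resv;
	uint32_t pir;
	uint32_t r6_upper;
	uint32_t r6_lower;
};

/* Mirrors the assembly loop: spin while the low bit of addr_lower is 1. */
static bool still_spinning(const volatile struct spin_entry *e)
{
	return (e->addr_lower & 1u) != 0;
}

/* Release a spinning core: fill in r3/r6 first and publish the entry
 * address last, so the core never sees the address before its arguments. */
static void spin_release(volatile struct spin_entry *e, uint32_t entry,
			 uint32_t r3, uint32_t r6)
{
	assert((entry & 1u) == 0);   /* an odd address would keep the core spinning */
	e->r3_upper = 0;
	e->r3_lower = r3;
	e->r6_upper = 0;
	e->r6_lower = r6;
	e->addr_upper = 0;
	atomic_thread_fence(memory_order_seq_cst);   /* order the writes above */
	e->addr_lower = entry;                       /* this write releases the core */
}
```

On a real system the entry sits at the fixed spin-table address, and depending on how it is mapped, the written entry may also have to be flushed from the data cache before the spinning core can see it.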

The entry point must lie within the 64 MB page that U-Boot has already mapped; this is due to U-Boot's internal quirks.
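A pre-release check for this limit might look like the sketch below. The page base is a hypothetical parameter (take it from your U-Boot configuration), and the check covers the whole image rather than only the entry point, on the assumption that code running past the end of the mapped page would fault just the same.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define UBOOT_PAGE_SIZE (64ull << 20)   /* the 64 MB page U-Boot has mapped */

/* True if the image [entry, entry + len) lies entirely inside the mapped
 * page starting at page_base (hypothetical; read it from U-Boot). */
static bool entry_reachable(uint64_t page_base, uint64_t entry, uint64_t len)
{
	/* mapped pages are naturally aligned to their size */
	assert((page_base & (UBOOT_PAGE_SIZE - 1)) == 0);
	return entry >= page_base &&
	       len <= UBOOT_PAGE_SIZE &&
	       entry - page_base <= UBOOT_PAGE_SIZE - len;
}
```

The comparisons are arranged so that no subtraction can underflow, even for an image that would end exactly at the page boundary.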

For those who learned from modern computer science textbooks

In PowerPC and some other architectures, a page is a flexible concept. In particular, on e500-family cores a page can range from 4 KB to 4 GB (for details and restrictions, see the documentation).

Creating an application for the core

Let's develop a program for our core that, by tradition, prints "hello world". We assume that we have already built a cross compiler and a lightweight libc, and that we can boot an OS without multicore support on one of the cores (for example, Linux or LynxOS-178 built accordingly). So we proceed to the hardest part, the programming:

#include <stdio.h>

int main(int argc, char *argv[])
{
	printf("hello world\n");
	return 0;
}

Done. But where and how will printf output anything?! For that, you have to write a few libc stubs; examples can be found in the U-Boot sources. I use a simplified version:

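#include <sys/stat.h>

static void __putc(unsigned char c);	/* outputs one character, defined below */

/* Minimal write(): every file descriptor goes to the serial port. */
int write(int fildes, char *buf, int nbyte)
{
	int wbyte = 0;

	(void)fildes;
	while (nbyte > 0) {
		__putc(*buf);
		buf++;
		nbyte--;
		wbyte++;
	}
	return wbyte;
}

/* Minimal fstat(): report every descriptor as a character device
 * (libc consults this when choosing how to buffer a stream). */
int fstat(int fildes, struct stat *buf)
{
	(void)fildes;
	buf->st_mode = S_IFCHR;
	return 0;
}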

The __putc function, in turn, must output a single character to the serial port.

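extern volatile unsigned char *uart_data;	/* UART transmit data register (mapped) */
extern volatile unsigned char *uart_status;	/* UART line status register (mapped) */

static void __putc(unsigned char c)
{
	unsigned char v;

	/* wait until bit 5 is set: in a 16550-style line status register
	 * this is THRE, "transmitter holding register empty" */
	do {
		v = *uart_status;
	} while (!(v & (1 << 5)));
	*uart_data = c;
}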

There is no need to write a full-fledged driver for this: it is enough to keep the default settings and write the character to the address given in the documentation. That is a physical address, which still has to be mapped. I won't give the mapping function because it is quite platform-specific, but I'm happy to share it on request.
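Once the UART registers are mapped, the two pointers used by __putc can be derived from a single base address. This sketch assumes a 16550-compatible register layout (transmit holding register at offset 0, line status register at offset 5, THRE mask 0x20); the mapping that produces the base address is not shown, and uart_set_base is a hypothetical helper.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed 16550-compatible register offsets and flag. */
#define UART_THR  0      /* transmit holding register */
#define UART_LSR  5      /* line status register */
#define LSR_THRE  0x20   /* bit 5: transmit holding register empty */

/* Definitions for the extern pointers used by __putc. */
volatile unsigned char *uart_data;
volatile unsigned char *uart_status;

/* uart_base must already be a virtual address mapped onto the UART's
 * physical registers; producing it is platform-specific. */
static void uart_set_base(volatile unsigned char *uart_base)
{
	assert(uart_base != 0);
	uart_data = uart_base + UART_THR;
	uart_status = uart_base + UART_LSR;
}
```

On the host, a plain byte array can stand in for the register block, which makes it easy to dry-run __putc before touching hardware.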

Next, we trim down the actual entry point of the executable, the _start function. For initialization, it is enough to keep only the zeroing of the bss segment:

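extern unsigned char __sbss[], __ebss[];	/* start/end of .bss, from the linker script */
int main(int argc, char *argv[]);

int _start(int argc, char **argv)
{
	unsigned char *cp = __sbss;

	while (cp < __ebss) {	/* zero the .bss segment */
		*cp++ = 0;
	}
	return main(argc, argv);
}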

So now we can do without the OS, and printf knows where to send its output. Let's compile:

$ powerpc-eabi-gcc -o hello hello.c start.c

Will it work? No! The processor does not understand the ELF executable format. We need to strip the ELF header and the other attributes of the executable. Let's strip them:

$ powerpc-eabi-objcopy -O binary  hello hello.bin

Will it work? No! The entry point used to be recorded in the ELF header, but now who knows where it ended up. Its exact location can be found with the powerpc-eabi-objdump utility. Of course, you can pass that location to U-Boot as the entry point, but it is better to tell the linker to place the entry point at the beginning of the file:

	OUTPUT_ARCH(powerpc:common)
	ENTRY(_start)
	STARTUP(start.o)
	...

The rest of the script depends on the version of the build tools; see the linker scripts shipped with the toolchain (powerpc-eabi-ld --verbose prints the default one).

Now, as the script directs, the linker places start.o at the beginning of the executable. Functions should appear in the order of the source, but it is safer to keep only one function in this file, _start. Adding STARTUP is a quick and narrow solution: if we want more later, we will have to assign functions to sections and place those sections explicitly in the script.
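Before stripping the ELF header, you can confirm where the entry point ended up. `powerpc-eabi-readelf -h hello` prints it as "Entry point address"; the sketch below does the same check programmatically by reading e_entry from an ELF32 header in memory (e_entry is at byte offset 24, and EI_DATA at byte 5 gives the byte order, big-endian for powerpc-eabi). The function name is illustrative. If the value equals the address of the first byte of .text, the raw .bin starts with _start.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Read e_entry from an ELF32 header held in memory. Returns false for
 * anything that is not a 32-bit ELF file, e.g. objcopy -O binary output. */
static bool elf32_entry(const unsigned char *h, size_t len, uint32_t *entry)
{
	assert(h != NULL && entry != NULL);
	if (len < 28 || h[0] != 0x7f || h[1] != 'E' || h[2] != 'L' || h[3] != 'F')
		return false;
	if (h[4] != 1)                        /* EI_CLASS: 1 = ELFCLASS32 */
		return false;
	const unsigned char *p = h + 24;      /* e_entry */
	if (h[5] == 2)                        /* EI_DATA: 2 = big-endian */
		*entry = (uint32_t)p[0] << 24 | (uint32_t)p[1] << 16 |
			 (uint32_t)p[2] << 8 | p[3];
	else if (h[5] == 1)                   /* 1 = little-endian */
		*entry = (uint32_t)p[3] << 24 | (uint32_t)p[2] << 16 |
			 (uint32_t)p[1] << 8 | p[0];
	else
		return false;
	return true;
}
```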

We build it again, noting how convenient it would be to do this with a makefile:

$ powerpc-eabi-gcc -T hello.ld -o hello hello.c start.c

Now, if everything was built correctly, we have a program ready to run on a separate core without an OS. But it's better to double-check everything with objdump. The output should look something like this:

Disassembly of section .text:
00004000 <_start>:
    4000:       3d 00 00 01     lis     r8,1
    4004:       3c e0 00 32     lis     r7,50
    4008:       39 08 70 00     addi    r8,r8,28672
    400c:       38 e7 80 78     addi    r7,r7,-32648
    4010:       39 48 ff ff     addi    r10,r8,-1
    4014:       39 27 ff ff     addi    r9,r7,-1
...

By the way, did you notice the addresses on the left? Unless told otherwise, the compiler generates position-dependent code, so the program cannot be run from an arbitrary address. That need not bother us now: the offset can be adjusted in the linker script. A correctly built program should even run when started from U-Boot.

Starting the core

From the OS's point of view, the additional core is no different from other peripheral devices that use DMA, and we need to allocate memory for it. This memory will hold our program and, later, will be used to exchange its results, host exception handlers, and so on. The memory is allocated with the usual OS facilities (kmalloc on Linux, alloc_cmem on LynxOS, etc.), but the physical start address of this memory must be aligned to the page size. The general memory mapping scheme looks roughly like this:

[Figure: general memory mapping scheme]

If we don't want to shoot the OS in the foot, the size of the memory allocated by the OS should match the size of the memory mapped for the other core. Otherwise our core may write to memory that the OS uses internally. You're not a hypervisor here.
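Both requirements, a page-aligned start and a mapped page that stays inside the OS allocation, can be checked as in the sketch below. The helper names are illustrative. One common workaround when the allocator cannot guarantee alignment (an assumption, not necessarily the author's approach) is to allocate twice the page size and map the first aligned page inside the buffer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Round addr up to the next multiple of page_size (a power of two). */
static uint64_t align_up(uint64_t addr, uint64_t page_size)
{
	assert(page_size != 0 && (page_size & (page_size - 1)) == 0);
	return (addr + page_size - 1) & ~(page_size - 1);
}

/* True if a page of page_size at phys is aligned and lies entirely inside
 * the OS allocation [alloc_start, alloc_start + alloc_len), so the other
 * core cannot reach memory the OS still owns. */
static bool page_inside_alloc(uint64_t alloc_start, uint64_t alloc_len,
			      uint64_t phys, uint64_t page_size)
{
	return phys >= alloc_start &&
	       (phys & (page_size - 1)) == 0 &&
	       page_size <= alloc_len &&
	       phys - alloc_start <= alloc_len - page_size;
}
```

Allocating 2 × page_size always leaves room for one aligned page: at most page_size - 1 bytes are skipped before the aligned start.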

So, we have allocated memory to hold our program. The program is written at the offset specified by the linker, and after writing it, don't forget to flush the cache for this region. Now we can write the physical address of the program's entry point into the spin table where the already configured core is spinning.
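The copy step can be sketched as follows, assuming the raw image from objcopy, its link address (0x4000 in the listing above), and the virtual address at which the other core will see the start of the buffer. The cache-maintenance hook is hypothetical: on PowerPC it would typically walk the range with dcbst and icbi and finish with sync and isync, or call the OS's own helper. no_flush exists only for host-side dry runs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical cache-maintenance hook supplied by the caller. */
typedef void (*flush_fn)(void *start, size_t len);

/* Placeholder for host-side testing only; does no cache maintenance. */
static void no_flush(void *start, size_t len)
{
	(void)start;
	(void)len;
}

/* Copy a raw image into the OS-allocated buffer at the linker's offset.
 * link_addr: address the image was linked at.
 * map_base:  virtual address the other core sees at the start of buf.
 * Returns the image's location inside buf, or NULL if it does not fit. */
static void *load_image(void *buf, size_t buf_len,
			const void *image, size_t image_len,
			uint32_t link_addr, uint32_t map_base, flush_fn flush)
{
	assert(buf != NULL && image != NULL && flush != NULL);
	if (link_addr < map_base)
		return NULL;
	size_t off = link_addr - map_base;
	if (off > buf_len || image_len > buf_len - off)
		return NULL;
	unsigned char *dst = (unsigned char *)buf + off;
	memcpy(dst, image, image_len);
	flush(dst, image_len);   /* write the code back before the core fetches it */
	return dst;
}
```

Only after load_image returns is it safe to publish the entry address in the spin table.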

In fact, after the address is written, U-Boot jumps to it as a virtual address; since virtual and physical addresses coincide at this stage, you can either treat this as a miracle or figure out how it works.

Have you forgotten about the entry-point limit?
You can't be sure which address range will have free memory in the system, and the allocated memory may fall outside the address range U-Boot has mapped. That triggers a TLB miss exception for which no handler is configured, which ultimately halts the processor core.
There are several ways around this. The one I liked most is to write my own core initializer, placed in OS memory within the allowed entry-point range, which:

overcomes the entry-point limitation;
restricts our application so that only the memory allocated by the OS is visible to it;
configures the devices available to the core;
jumps to the beginning of our program.
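For the second step, the initializer must map exactly the OS-allocated region and nothing beyond it. One way to plan this, sketched below, is to cover the region with naturally aligned power-of-two pages, largest first. This assumes every power-of-two size from 4 KB to 4 GB is available; the core's TLB may accept only some of these sizes (see the reference manual), in which case a real initializer must restrict the sizes it tries. The names are illustrative.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MIN_PAGE 0x1000ull        /* 4 KB */
#define MAX_PAGE 0x100000000ull   /* 4 GB */

struct page { uint64_t base, size; };

/* Cover [base, base + len) exactly with naturally aligned power-of-two
 * pages, largest first, so no page reaches past the OS allocation.
 * Returns the number of pages, or 0 on error or an empty region. */
static size_t cover_region(uint64_t base, uint64_t len,
			   struct page *out, size_t max_pages)
{
	size_t n = 0;

	if ((base | len) & (MIN_PAGE - 1))
		return 0;   /* region must be 4 KB granular */
	while (len != 0) {
		uint64_t size = MAX_PAGE;
		/* shrink until the page is aligned at base and fits in len */
		while (size > len || (base & (size - 1)) != 0)
			size >>= 1;
		assert(size >= MIN_PAGE);
		if (n == max_pages)
			return 0;   /* not enough TLB entries for this region */
		out[n].base = base;
		out[n].size = size;
		n++;
		base += size;
		len -= size;
	}
	return n;
}
```

Because base and len are multiples of 4 KB, the inner loop always stops at 4 KB or larger, so every step makes progress.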

After control passes to the program, "hello world" should appear in the console.
Beyond that, the control registers can be used to stop and restart processor cores, change their frequency, send interrupts to them (an interesting but very specialized topic), and much more.

Conclusion

Of course, many modern operating systems can dedicate processor cores to individual applications, so the need to write your own multicore support code is questionable. However, some hard real-time tasks cannot tolerate even the 2 ms delays that multicore operation in the standard configuration can introduce. Such tasks call for unconventional approaches to system configuration, but that is material for another article.
