madprogrammer September 19, 2016 at 10:01

Running Linux on FPGA: Hello, World

Tutorial

UPD 09/21/16: BusyBox now starts successfully.

Inspired by a series of articles on the Marsohod project website, in which the author runs the open Amber SoC and Linux on the Marsohod2 FPGA board, I decided to repeat the experiment on my Terasic DE2-115 board. But instead of the legacy Linux 2.4.27, which is ~~as old as the mammoth shit~~, I will be running the latest Linux version at the time of writing: 4.8.0-rc5.

The Amber system-on-chip

The Amber processor core is a 32-bit RISC processor, fully compatible with the ARM v2a architecture and instruction set, which makes it possible to compile programs for it with GCC. In addition to the processor itself, the Amber project provides several peripherals as part of the system-on-chip, including a UART, a timer and an Ethernet MAC. The processor core comes in two versions:

	Amber 23	Amber 25
Pipeline	3-stage	5-stage
Cache	unified (code + data)	separate
Wishbone bus width	32 bits	128 bits
Performance	0.75 DMIPS/MHz	1.05 DMIPS/MHz

As you can see, the performance of the processor core is comparable to that of cores based on later versions of the ARM architecture, such as ARMv4 and ARMv5. Amber implements the ARMv2a architecture because it is not covered by patents, so an implementation can be distributed freely. This has a downside, however: GCC considers the architecture obsolete and is gradually "cutting out" support for it, and the Linux kernel dropped support for it long ago.

An important feature of the architecture is that, unlike newer versions of the ARM architecture, the processor does not support Thumb mode, has no CPSR / SPSR registers and no MSR / MRS instructions, and keeps the processor flags in bits of the PC register.

Because of this, the PC register can address at most 64 MB of memory (26 bits). Instructions are always aligned on a word boundary, so the two least significant bits of the address are always 0, and the register uses those two bits to hold the processor operating mode (user / privileged, interrupt handler). Through other registers, the processor can address up to 4 GB of memory. More details about the architecture of the processor core and its instruction set can be found here and here.
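The register layout described above can be made concrete with a short sketch. It assumes the classic 26-bit ARM arrangement of R15: condition flags N, Z, C, V in bits 31-28, the interrupt-disable bits I and F in bits 27-26, the word-aligned program counter in bits 25-2, and the mode in bits 1-0. The helper name `decode_r15` is made up for illustration and is not part of the Amber project.

```python
# Sketch only: decode_r15 is a hypothetical helper, not Amber project code.
# Field positions follow the classic 26-bit ARM layout of R15 (assumption).
MODES = {0b00: "user", 0b01: "FIQ", 0b10: "IRQ", 0b11: "supervisor"}

def decode_r15(r15):
    """Split a 32-bit R15 value into flags, program counter and mode."""
    r15 &= 0xFFFFFFFF
    return {
        "N": bool((r15 >> 31) & 1),  # negative
        "Z": bool((r15 >> 30) & 1),  # zero
        "C": bool((r15 >> 29) & 1),  # carry
        "V": bool((r15 >> 28) & 1),  # overflow
        "I": bool((r15 >> 27) & 1),  # IRQ disabled
        "F": bool((r15 >> 26) & 1),  # FIQ disabled
        "pc": r15 & 0x03FFFFFC,      # bits 25..2: word-aligned, 64 MB range
        "mode": MODES[r15 & 0b11],   # bits 1..0: processor mode
    }

fields = decode_r15(0x6C008003)
print(hex(fields["pc"]), fields["mode"])
```

Because the flags share the register with the program counter, any code that saves or restores R15 also saves or restores the flags and the mode, which is why this architecture needs no separate CPSR.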

Installing the ARM Cross Compiler

Unfortunately, the Sourcery CodeBench Lite compiler, which the author of the articles on porting the project to the Marsohod board used, is no longer available for download, but this is not a big problem. You can install a compiler with crosstool-NG or, on Gentoo Linux, with crossdev.

To install using crosstool-NG, just use the arm-unknown-eabi configuration, which is available "out of the box":

$ ct-ng arm-unknown-eabi
$ ct-ng build

This compiler will be used to build the Linux kernel and bare-metal programs, such as the bootloader, and a simple application that prints Hello, World to the serial port.

Compiling Hello World and running it in the Verilator Verilog simulator

Download the project distribution from GitHub and look inside. The project is divided into two parts: the hw folder contains the Verilog source code of the "hardware" part, and the sw folder contains the source code of the programs that will run on the processor, along with some auxiliary utilities from the Amber project that are used during the build to convert ELF and BIN files into formats supported by the Xilinx tools and the test scripts.

Go to the folder sw/hello-world and compile the program hello-world.c:

$ cd sw/hello-world
$ export AMBER_CROSSTOOL=arm-unknown-eabi
$ make

As a result, among other things, the file hello-world.mem will be generated. It is a text file with the contents of the compiled program, suitable for loading into the simulator and into the Boot ROM of our processor.

The author of the original articles that guided me used Icarus Verilog, a free and very popular simulator, to simulate the project. The problem is that it is terribly slow: on my machine with a 2.6 GHz processor, the simulated Amber core clock frequency in Icarus Verilog is about 16 kHz, and each character of the string "Hello, World" from the example above takes about half a second to appear. This speed is sufficient for debugging a small program, such as the bootloader or the same hello-world, but it is unacceptable for debugging the boot of the whole Linux kernel, because you would have to wait forever.

Therefore, we will use the Verilator simulator, which compiles Verilog into C++ and runs very fast: Hello World prints instantly with no visible delay, and the simulated clock frequency on my machine is about 1.5 MHz, which is roughly 100 times faster than Icarus Verilog! By the way, debugging the Linux kernel boot took me about a week, and simulation helped a lot, because in simulation mode the test bench writes an assembly listing of every instruction executed by the processor to a text log file, including jump addresses, asynchronous and software interrupts, and so on. It is a kind of disassembler implemented in Verilog.
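To get a feel for what these two clock rates mean in practice, here is a small back-of-the-envelope sketch. The two frequencies are the ones quoted above; the workload of 100 million cycles is a hypothetical number picked only for illustration, not a measurement of how long the kernel takes to boot.

```python
# Simulated clock rates quoted in the article (author's machine, 2.6 GHz CPU).
ICARUS_HZ = 16e3       # about 16 kHz under Icarus Verilog
VERILATOR_HZ = 1.5e6   # about 1.5 MHz under Verilator

def wall_clock_seconds(cycles, sim_hz):
    """Seconds of real time needed to simulate `cycles` processor clocks."""
    return cycles / sim_hz

speedup = VERILATOR_HZ / ICARUS_HZ

# HYPOTHETICAL workload: 100 million cycles, chosen only for illustration.
cycles = 100_000_000
print(f"speedup: {speedup:.1f}x")
print(f"Icarus Verilog: {wall_clock_seconds(cycles, ICARUS_HZ) / 3600:.1f} h")
print(f"Verilator: {wall_clock_seconds(cycles, VERILATOR_HZ):.0f} s")
```

Under these assumptions the same run drops from hours to about a minute, which is the difference between a usable debug loop and an unusable one.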

We install Verilator according to the instructions on the official website, go to the folder hw/de2_115/tb, which contains the modified test bench, and run make. Despite the stream of warnings from the Verilog compiler, a folder named obj_dir will appear, and in it an executable file named Vtb, which we will run to simulate the system.

Next, execute the following commands:

$ cp ../../../sw/hello-world/hello-world.mem ./boot-loader.mem
$ ./obj_dir/Vtb

As a result, the simulation will be launched and we will see the long-awaited Hello, World:

Load boot memory from boot-loader.mem
Read in 961 lines
Hello, World!

This means that the processor has successfully read and executed our program compiled by GCC for ARM!

If you wish, you can add the --trace flag to the list of verilator options in the Makefile. The testbench will then also generate a file named out.vcd, which can be opened in GTKWave so that you can see for yourself the waveforms of various signals inside the processor and the other blocks.
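Before opening a large trace in GTKWave it can be handy to see which signals it contains. The sketch below reads only the header of a VCD file, which declares signals with `$scope`, `$var` and `$upscope` sections, and prints the hierarchical signal names. The function name and the module names in the sample (`tb`, `u_system`) are hypothetical; real names depend on the design.

```python
def list_vcd_signals(text):
    """Return hierarchical signal names declared in a VCD header."""
    scope, names = [], []
    tokens = text.split()
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        if tok == "$scope":
            # $scope <type> <name> $end
            scope.append(tokens[i + 2])
            i += 4
        elif tok == "$upscope":
            # $upscope $end
            scope.pop()
            i += 2
        elif tok == "$var":
            # $var <type> <width> <id> <name> [range] $end
            names.append(".".join(scope + [tokens[i + 4]]))
            i = tokens.index("$end", i) + 1
        elif tok == "$enddefinitions":
            break  # value changes follow; the header is finished
        else:
            i += 1
    return names

# Hand-written sample header; module and signal names are made up.
sample = """
$timescale 1ns $end
$scope module tb $end
$var wire 1 ! clk $end
$scope module u_system $end
$var wire 32 # address [31:0] $end
$upscope $end
$upscope $end
$enddefinitions $end
"""
for name in list_vcd_signals(sample):
    print(name)
```

For a real trace, pass the contents of out.vcd instead of the sample string, for example `list_vcd_signals(open("out.vcd").read())`.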

Building the initramfs with Buildroot

Before building the Linux kernel, we create an environment for compiling user programs for our system (based on uClibc-ng) and generate a file that will be embedded into the kernel as an initramfs during the build. To do this, use Buildroot, which can be downloaded from here.

$ make amber_defconfig
$ make

As a result, we will have the arm-buildroot-uclinux-uclibcgnueabi toolchain and a file system image in ./output/images/rootfs.cpio. The path to this image must be specified in the kernel configuration file, in the CONFIG_INITRAMFS_SOURCE parameter. BusyBox is included in the file system image, ~~however, it still does not start up to the end~~ (now it starts), but within this article we restrict ourselves to a simple "Hello, World" running as the /sbin/init process. To do this, in the directory where Buildroot was built, create a file hello.c with the contents familiar to every programmer, and execute the following commands:

$ ./output/host/usr/bin/arm-buildroot-uclinux-uclibcgnueabi-gcc -o hello hello.c
$ mv hello output/target/sbin/init
$ rm hello.gdb
$ make

After these commands complete successfully, ./output/images/rootfs.cpio will be rebuilt with our application in place of BusyBox. This way of substituting files is fine for a quick check; for properly adding and replacing files in the rootfs during the build there is the BR2_ROOTFS_OVERLAY configuration option.

Unlike the example that we ran in the Verilator simulator, this new "Hello, World" no longer works as a bare-metal application but as a user application for Linux. The text is written to the serial port through the standard uClibc library, which makes the write system call; that call transfers control to the kernel through a software interrupt, the kernel passes control to the tty driver, then to the serial port driver, and finally the message is displayed.

Build the Linux kernel and bootloader

Naturally, running the latest kernel required some changes to it. For the most part, these changes concern interrupt handling and processor mode switching, since that code is architecture dependent. Next, I adapted the Integrator platform support code (mach-integrator) and created a new Amber platform on its basis. I chose it because the Amber author's original patch for the 2.4 kernel hints that this platform was the prototype for the Amber SoC architecture; in particular, peripherals such as the interrupt controller, timer and serial port turned out to be compatible with the device drivers used on that platform.

Fortunately, the hours of debugging are behind us, and building a working kernel now takes a flick of the wrist. Those who wish to repeat it can clone the sources and execute the following commands:

$ make ARCH=arm CROSS_COMPILE=arm-none-eabi- amber_defconfig
$ make -j8 ARCH=arm CROSS_COMPILE=arm-none-eabi- Image
$ make ARCH=arm CROSS_COMPILE=arm-none-eabi- amber-de2115.dtb

After the kernel is built, the files arch/arm/boot/Image and arch/arm/boot/dts/amber-de2115.dtb will be created, ready to be uploaded to the board through the bootloader over the serial port using the XMODEM protocol.

To build the bootloader, go to the folder sw/boot-loader-serial and run make (do not forget about the AMBER_CROSSTOOL environment variable). You get a file named boot-loader-serial.mem, which the mem2mif utility can convert into the MIF format that Altera Quartus II accepts as a memory initialization file.

Putting it all together

For those who have a Terasic DE2-115 board, it is time to open the project de2_115.qpf and synthesize it. Note that in my project the serial port is connected to the EXT_IO connector instead of the RS232 port on the board, since my motherboard has no COM ports. Specify the boot-loader-serial.mif file obtained in the previous step as the memory initialization file for de2_115_sram_2048_32_byte_en, and load the bitstream into the board. Since the Amber processor, for reasons known only to its developer, does not implement reset logic, the processor can be returned to its initial state only by reloading the bitstream. If the KEY0 button is held down while the bitstream loads, the processor will not start the program until the button is released. I used this button when debugging the Verilog code with SignalTap. Once the button has been released, only reloading the bitstream will start everything over.

After the bitstream is loaded, the Amber bootloader prompt will immediately appear in a terminal configured for 921600 baud. Next, type the command b 80000 and send the Linux kernel file built earlier (arch/arm/boot/Image) over XMODEM, then type the command b 78000 and send the DTB file, which describes which devices to look for, which drivers to load for them, how much RAM the system has, the kernel command line, and other information. I patched the bootloader so that it passes the address 0x78000 to the kernel as the place to look for the DTB, which is why we load it at this address.
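For readers curious about what the terminal program actually sends during such an upload, here is a minimal sketch of XMODEM framing. It assumes the classic variant with 128-byte blocks and a one-byte arithmetic checksum; the article does not say which variant the Amber bootloader expects, and the ACK/NAK/EOT handshake is left out. The function name is hypothetical.

```python
SOH = 0x01   # start-of-header byte for a 128-byte block
PAD = 0x1A   # padding byte conventionally used for the last, short block

def xmodem_packets(data, block_size=128):
    """Split `data` into classic XMODEM packets (checksum variant)."""
    packets = []
    for n, off in enumerate(range(0, len(data), block_size), start=1):
        chunk = data[off:off + block_size].ljust(block_size, bytes([PAD]))
        seq = n & 0xFF                  # block numbers start at 1 and wrap
        checksum = sum(chunk) & 0xFF    # arithmetic sum of the data bytes
        packets.append(bytes([SOH, seq, 0xFF - seq]) + chunk + bytes([checksum]))
    return packets

# Stand-in payload; a real transfer would read arch/arm/boot/Image instead.
packets = xmodem_packets(b"A" * 130)
print(len(packets), len(packets[0]))
```

Each packet is 132 bytes long: three header bytes, 128 data bytes and the checksum. The receiver acknowledges every packet before the next one is sent, which is one reason a high baud rate matters when uploading a kernel image.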

Finally, when both files are loaded into RAM (SDRAM), you can enter the command j 80000 in the bootloader console. Linux will start booting, and if everything was done correctly, its output will appear in the terminal.

Our "Hello, World" started as the first user process (/sbin/init) and displayed the coveted phrase on the screen through the standard library and the kernel. Great, isn't it?

If you do not have a Terasic DE2-115 board or any other board with a large enough FPGA, you can still run Linux in the Verilator simulator. To do this, add the flags -DAMBER_LOAD_MAIN_MEM=1 and -DAMBER_LOAD_DTB_MEM=1 to hw/de2_115/tb/Makefile and rebuild the Vtb executable. Then, using the amber-bin2mem utility, create the kernel and DTB files for the simulator:

$ amber-bin2mem arch/arm/boot/Image 80000 > vmlinux.mem
$ amber-bin2mem arch/arm/boot/dts/amber-de2115.dtb 78000 > dtb.mem
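The two load addresses used here and in the bootloader commands leave only a limited window for the device tree. A quick sanity check, assuming both addresses are hexadecimal (the article writes the DTB address as 0x78000) and that nothing else lives between them; the helper name is hypothetical:

```python
DTB_ADDR = 0x78000      # where the DTB is loaded
KERNEL_ADDR = 0x80000   # where the kernel Image is loaded

def dtb_fits(dtb_size, dtb_addr=DTB_ADDR, kernel_addr=KERNEL_ADDR):
    """True if a DTB of `dtb_size` bytes ends at or before the kernel start."""
    return dtb_addr + dtb_size <= kernel_addr

print(KERNEL_ADDR - DTB_ADDR)  # room available for the DTB, in bytes
```

The window is 0x8000 bytes, or 32 KiB, so a DTB larger than that would overlap the start of the kernel image.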

In addition, you will need to slightly modify the bootloader code for simulation by commenting out the call to the main function, since in normal mode it prompts the user for commands. The bootloader will then transfer control to the Linux kernel immediately. Copy the *.mem files to the testbench folder, run ./obj_dir/Vtb and watch Linux boot.

Limitations, practical benefits

Of course, the Linux we ended up with is not quite like the one we are used to seeing on servers and workstations. The Amber processor core has no MMU (Memory Management Unit) and therefore no virtual memory (all memory is physical), no memory protection (any application can corrupt kernel memory or talk to devices directly over the Wishbone bus, bypassing the kernel), no copy-on-write, and so on. NOMMU Linux currently does not support ELF executables in their usual form (although there is work on supporting the FDPIC ELF format) or dynamic libraries; instead it uses bFLT (Binary Flat), a simple format based on a.out. And if you run, say, N copies of an application on such a system, exactly N copies of it will sit in memory.
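To show how simple the flat format is, here is a sketch that reads the header of a bFLT file. The field order and the big-endian encoding reflect my understanding of the 64-byte header defined in the uClinux flat.h; treat the layout as an assumption and check it against your kernel sources. The function name is hypothetical, and the sample header is built by hand rather than taken from a real binary.

```python
import struct

# ASSUMED layout: 4-byte magic followed by big-endian 32-bit words;
# the rest of the 64-byte header (build date, filler) is ignored here.
FLAT_HDR = struct.Struct(">4s9I")
FIELDS = ("rev", "entry", "data_start", "data_end", "bss_end",
          "stack_size", "reloc_start", "reloc_count", "flags")

def parse_bflt_header(blob):
    """Return the main header fields of a bFLT image as a dict."""
    if len(blob) < 64:
        raise ValueError("too short for a bFLT header")
    magic, *words = FLAT_HDR.unpack_from(blob)
    if magic != b"bFLT":
        raise ValueError("not a bFLT file")
    return dict(zip(FIELDS, words))

# Hand-made header with made-up values, padded to 64 bytes.
sample = FLAT_HDR.pack(b"bFLT", 4, 0x40, 0x1000, 0x1200,
                       0x1300, 0x1000, 0x1200, 3, 1) + bytes(24)
print(parse_bflt_header(sample))
```

There are no sections, symbol tables or program headers to walk: the loader copies text and data into memory and applies the relocation list, which is what makes the format practical on a system without an MMU.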

There is still practical benefit from the work done: even such "stripped-down" versions of Linux run in many devices based on resource-constrained microcontrollers. I hope that FPGA programming enthusiasts will learn something useful by experimenting with a full-fledged Linux on a processor synthesized in an FPGA (which, by the way, takes up only 8% of the DE2-115's capacity, or about 10,000 LEs). If you have another board based on an Altera or Xilinx device, porting to it will not be difficult, because most of the work has already been done. Of course, there are now more interesting solutions from a practical point of view, such as Xilinx Zynq and Altera Cyclone V SoC, which combine a full-fledged ARM SoC and an FPGA on the same chip, but the solution presented in this article lets even owners of simple boards with modest FPGAs run Linux. The remaining free logic can be used to implement new custom peripherals that are attached to the Wishbone bus and made accessible from the OS through drivers.

Plans

The Terasic DE2-115 is truly one of the most powerful development boards, and interesting projects have already been built on it (here is the brightest example and here is another one). It carries a wide range of peripherals:

128 MB SDRAM
8 MB SPI Flash
LEDs and seven-segment indicators
16x2 liquid crystal display
24-bit audio codec
SD card slot
2 gigabit Ethernet ports
VGA monitor output, PS/2 for keyboard
USB ports

Of all this wealth, so far I have used only the RAM in this project. In the future, if there is time, I want to compile U-Boot, place it in the on-board Flash memory, and have the bootloader code in the FPGA load U-Boot, which would then load the Linux kernel and the root file system from the SD card. In addition, I would like to try to implement support for the peripherals on the board, Ethernet for example.
