RISC-V from scratch

Original author: Tyler Wilcock
  • Transfer
In this article, we explore various low-level concepts (compilation and layout, primitive runtimes, assembler, and more) through the prism of the RISC-V architecture and its ecosystem. I’m a web developer myself, I don’t do anything at work, but it’s very interesting for me, this is where the article came from! Join me on this hectic journey into the depths of low-level chaos.

First, let's talk a bit about RISC-V and the importance of this architecture, configure the RISC-V toolchain, and run a simple C program on emulated RISC-V hardware.

Content


  1. What is RISC-V?
  2. Configuring QEMU and RISC-V Tools
  3. Hi RISC-V!
  4. Naive approach
  5. Lifting the curtain -v
  6. Search our stack
  7. Layout
  8. Stop! Hammertime! Runtime!
  9. Debug but now for real
  10. What's next?
  11. Additionally

What is RISC-V?


RISC-V is a free instruction set architecture. The project originated at the University of California at Berkeley in 2010. An important role in its success was played by openness of code and freedom of use, which was very different from many other architectures. Take ARM: to create a compatible processor, you have to pay an advance fee of $ 1 million to $ 10 million, as well as pay royalties of 0.5–2% on sales . A free and open model makes RISC-V an attractive option for many, including for startups who cannot pay a license for an ARM or other processor, for academic researchers and (obviously) for the open source community.

The rapid growth in popularity of RISC-V did not go unnoticed. ARM launched the sitewho tried (rather unsuccessfully) to emphasize the alleged advantages of ARM over RISC-V (the site is already closed). The RISC-V project is supported by many large companies , including Google, Nvidia and Western Digital.

Configuring QEMU and RISC-V Tools


We cannot run the code on the RISC-V processor until we set up the environment. Fortunately, this does not require a physical RISC-V processor; instead, we take qemu . Follow the instructions for your operating system to install . I have MacOS, so just enter one command:

# also available via MacPorts - `sudo port install qemu`
brew install qemu

Conveniently, it qemucomes with several machines ready for operation (see option qemu-system-riscv32 -machine).

Next, install OpenOCD for RISC-V and RISC-V tools.

Download ready-made assemblies of RISC-V OpenOCD and RISC-V tools here .
We extract the files to any directory, I have it ~/usys/riscv. Remember it for future use.

mkdir -p ~/usys/riscv
cd ~/Downloads
cp openocd--.tar.gz ~/usys/riscv
cp riscv64-unknown-elf-gcc--.tar.gz ~/usys/riscv
cd ~/usys/riscv
tar -xvf openocd--.tar.gz
tar -xvf riscv64-unknown-elf-gcc--.tar.gz

Set environment variables RISCV_OPENOCD_PATHand RISCV_PATHso that other programs can find our tool chain. This may look different depending on the OS and shell: I added the paths to the file ~/.zshenv.

# I put these two exports directly in my ~/.zshenv file - you may have to do something else.
export RISCV_OPENOCD_PATH="$HOME/usys/riscv/openocd--"
export RISCV_PATH="$HOME/usys/riscv/riscv64-unknown-elf-gcc--"
# Reload .zshenv with our new environment variables.  Restarting your shell will have a similar effect.
source ~/.zshenv

Create a /usr/local/binsymbolic link for this executable file so that you can run it at any time without specifying the full path to .~/usys/riscv/riscv64-unknown-elf-gcc--/bin/riscv64-unknown-elf-gcc

# Symbolically link our gcc executable into /usr/local/bin.  Repeat this process for any other executables you want to quickly access.
ln -s ~/usys/riscv/riscv64-unknown-elf-gcc-8.2.0--/bin/riscv64-unknown-elf-gcc /usr/local/bin

And voila, we have a working RISC-V toolkit! All our executable files, such as riscv64-unknown-elf-gcc, riscv64-unknown-elf-gdb, riscv64-unknown-elf-ldand others are in .~/usys/riscv/riscv64-unknown-elf-gcc--/bin/

Hi RISC-V!


May 26, 2019 Patch:

Unfortunately, due to a bug in RISC-V QEMU, the freedom-e-sdk 'hello world' program in QEMU no longer works. A patch has been released to solve this problem, but for now, skip this section. This program will not be needed in subsequent sections of the article. I track the situation and update the article after fixing the bug.

See this comment for more information .


With the tools set up, let's run the simple RISC-V program. Let's start by cloning the SiFive freedom-e-sdk repository:

cd ~/wherever/you/want/to/clone/this
git clone --recursive https://github.com/sifive/freedom-e-sdk.git
cd freedom-e-sdk

By tradition , let's start with the 'Hello, world' program from the repository freedom-e-sdk. We use the ready-made Makefileone that they provide for compiling this program in debug mode:

make PROGRAM=hello TARGET=sifive-hifive1 CONFIGURATION=debug software

And run in QEMU:

qemu-system-riscv32 -nographic -machine sifive_e -kernel software/hello/debug/hello.elf
Hello, World!

This is a great start. You can run other examples from freedom-e-sdk. After that, we will write and try to debug our own program in C.

Naive approach


Let's start with a simple program that infinitely adds two numbers.

cat add.c
int main() {
    int a = 4;
    int b = 12;
    while (1) {
        int c = a + b;
    }
    return 0;
}

We want to run this program, and the first thing we need to compile it for the RISC-V processor.

# -O0 to disable all optimizations. Without this, GCC might optimize 
# away our infinite addition since the result 'c' is never used.
# -g to tell GCC to preserve debug info in our executable.
riscv64-unknown-elf-gcc add.c -O0 -g

A file is created here a.out, this gccdefault name gives executable files. Now run this file in qemu:

# -machine tells QEMU which among our list of available machines we want to
# run our executable against.  Run qemu-system-riscv64 -machine help to list
# all available machines.
# -m is the amount of memory to allocate to our virtual machine.
# -gdb tcp::1234 tells QEMU to also start a GDB server on localhost:1234 where
# TCP is the means of communication.
# -kernel tells QEMU what we're looking to run, even if our executable isn't 
# exactly a "kernel".
qemu-system-riscv64 -machine virt -m 128M -gdb tcp::1234 -kernel a.out

We chose the car virtwith which it was originally deliveredriscv-qemu .

Now that our program works inside QEMU with the GDB server on localhost:1234, connect to it with the RISC-V GDB client from a separate terminal:

# --tui gives us a (t)extual (ui) for our GDB session.
# While we can start GDB without any arguments, specifying 'a.out' tells GDB 
# to load debug symbols from that file for the newly created session.
riscv64-unknown-elf-gdb --tui a.out

And we are inside GDB!

This GDB was configured as "--host = x86_64-apple-darwin17.7.0 --target = riscv64-unknown-elf". │
Type "show configuration" for configuration details. │
For bug reporting instructions, please see: │
. │
Find the GDB manual and other documentation resources online at: │
    . │
                                                                                                      │
For help, type "help". │
Type "apropos word" to search for commands related to "word" ... │
Reading symbols from a.out ... │
(gdb) 

We can try to run commands in GDB runor startfor the executable file a.out, but at the moment this will not work for an obvious reason. We compiled the program as riscv64-unknown-elf-gcc, so the host should work on the architecture riscv64.

But there is a way out! This situation is one of the main reasons for the existence of the client-server model of GDB. We can take the executable file riscv64-unknown-elf-gdband instead of launching it on the host, indicate to it some remote target (GDB server). As you recall, we just started riscv-qemuand said to start the GDB server on localhost:1234. Just connect to this server:

(gdb) target remote: 1234 │
Remote debugging using: 1234

Now you can set some breakpoints:

(gdb) b main
Breakpoint 1 at 0x1018e: file add.c, line 2.
(gdb) b 5 # this is the line within the forever-while loop. int c = a + b;
Breakpoint 2 at 0x1019a: file add.c, line 5.

And finally, specify GDB continue(abbreviated command c) until we reach the breakpoint:

(gdb) c
        Continuing.

You will quickly notice that the process does not end in any way. This is strange ... shouldn't we immediately reach the breakpoint b 5? What happened?



Here you can see several problems:

  1. The text UI cannot find the source. The interface should display our code and any nearby breakpoints.
  2. GDB does not see the current line of execution ( L??) and displays the counter 0x0 ( PC: 0x0).
  3. Some text in the input line, which in full form looks like this: 0x0000000000000000 in ?? ()

Combined with the fact that we cannot reach the breakpoint, these indicators indicate: we did something wrong. But what?

Lifting the curtain -v


To understand what is happening, you need to take a step back and talk about how our simple C program under the hood actually works. The function maindoes a simple addition, but what is it really? Why should it be called main, not originor begin? According to the convention, all executable files begin to execute with a function main, but what magic provides this behavior?

To answer these questions, let's repeat our GCC team with a flag -vto get a more detailed output of what is actually happening.

riscv64-unknown-elf-gcc add.c -O0 -g -v

The output is large, so we will not view the entire listing. It is important to note that although GCC is formally a compiler, it also defaults to compiling (to limit itself to compilation and assembly, you should specify a flag -c). Why is it important? Well, take a look at the snippet from the detailed output gcc:

# The actual `gcc -v` command outputs full paths, but those are quite
# long, so pretend these variables exist.
# $ RV_GCC_BIN_PATH = / Users / twilcock / usys / riscv / riscv64-unknown-elf-gcc--/ bin /
# $ RV_GCC_LIB_PATH = $ RV_GCC_BIN_PATH /../ lib / gcc / riscv64-unknown-elf / 8.2.0
$ RV_GCC_BIN_PATH /../ libexec / gcc / riscv64-unknown-elf / 8.2.0 / collect2 \
  ... truncated ... 
  $ RV_GCC_LIB_PATH /../../../../ riscv64-unknown-elf / lib / rv64imafdc / lp64d / crt0.o \ 
  $ RV_GCC_LIB_PATH / riscv64-unknown-elf / 8.2.0 / rv64imafdc / lp64d / crtbegin.o \
  -lgcc --start-group -lc -lgloss --end-group -lgcc \ 
  $ RV_GCC_LIB_PATH / rv64imafdc / lp64d / crtend.o
  ... truncated ...
COLLECT_GCC_OPTIONS = '- O0' '-g' '-v' '-march = rv64imafdc' '-mabi = lp64d'

I understand that even in abbreviated form this is a lot, so let me explain. The first line gccexecutes the program collect2, passes the arguments crt0.o, crtbegin.oand crtend.o, -lgccand flags --start-group. The description of collect2 can be found here : in short, collect2 organizes various initialization functions at startup, making the layout in one or more passes.

Thus, GCC compiles several files crtwith our code. As you can guess, it crtmeans 'C runtime'. Here it is described in detail what everyone is intended for crt, but we are interested in crt0one that performs one important thing:

"This [crt0] object is expected to contain a character _startthat indicates the bootstrap of the program."

The essence of the “bootstrap” depends on the platform, but usually it includes important tasks, such as setting up a stack frame, passing command line arguments, and calling main. Yes, finally we found the answer to the question: what exactly _startcauses our main function!

Search our stack


We solved one riddle, but how does this bring us closer to the original goal - to run a simple C program in gdb? It remains to solve several problems: the first of them is related to how crt0our stack is configured.

As we saw above, gccit defaults to layout crt0. Default parameters are selected based on several factors:

  • Target triplet corresponding to the structure machine-vendor-operatingsystem. We have itriscv64-unknown-elf
  • Target architecture rv64imafdc
  • Target ABI, lp64d

Usually everything works fine, but not for every RISC-V processor. As mentioned earlier, one of the tasks crt0 is to configure the stack. But he does not know where exactly the stack should be for our CPU ( -machine)? He cannot do it without our help.

In the team qemu-system-riscv64 -machine virt -m 128M -gdb tcp::1234 -kernel a.outwe used the car virt. Fortunately, qemuit makes it easy to dump machine information into a dump dtb(device tree blob).

# Go to the ~/usys/riscv folder we created before and create a new dir 
# for our machine information.
cd ~/usys/riscv && mkdir machines
cd machines
# Use qemu to dump info about the 'virt' machine in dtb (device tree blob) 
# format.
# The data in this file represents hardware components of a given 
# machine / device / board.
qemu-system-riscv64 -machine virt -machine dumpdtb=riscv64-virt.dtb

Dtb data is hard to read because it is basically a binary format, but there is a command line utility dtc(device tree compiler) that can convert the file to something more readable.

# I'm running MacOS, so I use Homebrew to install this. If you're
# running another OS you may need to do something else.
brew install dtc
# Convert our .dtb into a human-readable .dts (device tree source) file.
dtc -I dtb -O dts -o riscv64-virt.dts riscv64-virt.dtb

The output file riscv64-virt.dts, where we see a lot of interesting information about virt: the number of processor cores available, the memory location of various peripheral devices, such as UART, the location of the internal memory (RAM). The stack should be in this memory, so let's look for it with grep:

grep memory riscv64-virt.dts -A 3
        memory@80000000 {
                device_type = "memory";
                reg = <0x00 0x80000000 0x00 0x8000000>;
        };

Как видим, у этого узла в качестве device_type указано 'memory'. Судя по всему, мы нашли то, что искали. По значениям внутри reg = <...> ; можно определить, где начинается банк памяти и какова его длина.

В спецификации devicetree видим, что синтаксис reg — это произвольное количество пар (base_address, length). Однако внутри reg четыре значения. Странно, разве для одного банка памяти не хватит двух значений?

Опять же из спецификации devicetree (поиск свойства reg) мы узнаём, что количество ячеек для указания адреса и длины определяется свойствами #address-cells и #size-cells в родительском узле (или в самом узле). Эти значения не указаны в нашем узле памяти, а родительский узел памяти — просто корневая часть файла. Поищем в ней эти значения:

head -n8 riscv64-virt.dts
/dts-v1/;
/ {
        #address-cells = <0x02>;
        #size-cells = <0x02>;
        compatible = "riscv-virtio";
        model = "riscv-virtio,qemu";

It turns out that both the address and the length require two 32-bit values. This means that with values, reg = <0x00 0x80000000 0x00 0x8000000>;our memory starts с 0x00 + 0x80000000 (0x80000000)and takes up a 0x00 + 0x8000000 (0x8000000)byte, that is, ends at an address 0x88000000, which corresponds to 128 megabytes.

Layout


Using qemuand, dtcwe found the RAM addresses in the virt virtual machine. We also know what gcccompiles by default crt0, without tuning the stack as we need. But how to use this information to eventually run and debug the program?

Since crt0we are not satisfied, there is one obvious option: write your own code, and then link it with the object file that we obtained after compiling our simple program. Ours crt0must know where the top of the stack begins in order to properly initialize it. We could hard code the value 0x80000000directly into crt0, but this is not a very suitable solution, taking into account changes that may be needed in the future. What if we want to use another CPU in the emulator, such assifive_ewith other characteristics?

Fortunately, we are not the first to ask this question, and a good solution already exists. The GNU linker ldallows you to define a character that is accessible from ours crt0. We can define a character __stack_topsuitable for different processors.

Instead of writing your own linker file from scratch, it makes sense to take the default script with ldand modify it a bit to support additional characters. What is a linker script? Here is a good description :

The main purpose of the linker script is to describe how file sections are matched in input and output, and to control the layout of the memory of the output file.

Knowing this, let's copy the default linker script riscv64-unknown-elf-ldto a new file:

cd ~/usys/riscv
# Make a new dir for custom linker scripts out RISC-V CPUs may require.
mkdir ld && cd ld
# Copy the default linker script into riscv64-virt.ld
riscv64-unknown-elf-ld --verbose > riscv64-virt.ld

This file has a lot of interesting information, much more than we can discuss in this article. Detailed key delivery --Verboseincludes version information ld, supported architectures, and more. This is all good to know, but such a syntax is unacceptable in the linker script, so open a text editor and delete everything superfluous from the file.

vim riscv64-virt.ld
# Remove everything above and including the ============= line
GNU ld (GNU Binutils) 2.32
  Supported emulations:
   elf64lriscv
   elf32lriscv
using internal linker script:
===================================================
/ * Script for -z combreloc: combine and sort reloc sections * /
/ * Copyright (C) 2014-2019 Free Software Foundation, Inc.
   Copying and distribution of this script, with or without modification,
   are permitted in any medium without royalty provided the copyright
   notice and this notice are preserved. * /
OUTPUT_FORMAT ("elf64-littleriscv", "elf64-littleriscv",
	      "elf64-littleriscv")
... rest of the linker script ...

After that, run the MEMORY command to manually determine where it will be __stack_top. Find the line that starts with OUTPUT_ARCH(riscv), it should be at the top of the file, and add the command below it MEMORY:

OUTPUT_ARCH(riscv)
/* >>> Our addition. <<< */
MEMORY
{
   /* qemu-system-risc64 virt machine */
   RAM (rwx)  : ORIGIN = 0x80000000, LENGTH = 128M 
}
/* >>> End of our addition. <<< */
ENTRY(_start)

We created a memory block under the name RAMfor which reading ( r), writing ( w), and storing executable code ( x) are permissible .

Great, we have identified a memory layout that matches the specifications of our virtRISC-V machine . Now you can use it. We want to put our stack in memory.

You need to define a symbol __stack_top. Open your linker script ( riscv64-virt.ld) in a text editor and add a few lines:

SECTIONS
{
  /* Read-only sections, merged into text segment: */
  PROVIDE (__executable_start = SEGMENT_START("text-segment", 0x10000));
  . = SEGMENT_START("text-segment", 0x10000) + SIZEOF_HEADERS;
  /* >>> Our addition. <<< */
  PROVIDE(__stack_top = ORIGIN(RAM) + LENGTH(RAM));
  /* >>> End of our addition. <<< */
  .interp         : { *(.interp) }
  .note.gnu.build-id  : { *(.note.gnu.build-id) }

As you can see, we define __stack_topusing the PROVIDE command . The symbol will be accessible from any program associated with this script (assuming that the program itself will not determine something with the name itself __stack_top). Set the value __stack_topas ORIGIN(RAM). We know that this value is equal to 0x80000000plus LENGTH(RAM), which is 128 megabytes ( 0x8000000bytes). This means that our __stack_topinstalled in 0x88000000.

For brevity, I will not list the entire linker file here ; you can view it here .

Stop! Hammertime! Runtime!


Now we have everything we need to create our C runtime. In fact, this is a fairly simple task, here is the whole file crt0.s:

.section .init, "ax"
.global _start
_start:
    .cfi_startproc
    .cfi_undefined ra
    .option push
    .option norelax
    la gp, __global_pointer$
    .option pop
    la sp, __stack_top
    add s0, sp, zero
    jal zero, main
    .cfi_endproc
    .end

Immediately attracts a large number of lines that start with a period. This is a file for assembler as. Lines with dots are called assembler directives : they provide information for assembler. This is not executable code, like RISC-V assembler instructions such as jaland add.

Let's go through the file line by line. We will work with various standard RISC-V registers, so check out this table , which covers all the registers and their purpose.

.section .init, "ax"

As indicated in the GNU assembler 'as' manual , this line tells the assembler to insert the following code into a section .initthat is allocated ( a) and executable ( x). This section is another common convention for running code within the operating system. We work on pure hardware without an OS, so in our case such an instruction may not be absolutely necessary, but in any case this is good practice.

.global _start
_start:

.globalmakes the following character available for ld. Without this, the link will not work, because the command ENTRY(_start)in the linker script points to the symbol _startas the entry point to the executable file. The next line tells the assembler that we are starting the character definition _start.

_start:
  .cfi_startproc
  .cfi_undefined ra
  ...other stuff...
  .cfi_endproc

These directives .cfiinform you about the structure of the frame and how to handle it. The directives .cfi_startprocboth .cfi_endprocsignal the beginning and end of the function, and .cfi_undefined rainforms the assembler that the register rashould not be restored to any value contained in it before starting _start.

.option push
.option norelax
la gp, __global_pointer$
.option pop

These directives .optionchange the assembler behavior according to the code when you need to apply a specific set of options. Here it is described in detail why the use .optionin this segment is important :

... since we possibly relax the addressing of sequences to shorter sequences relative to the GP, the initial loading of the GP should not be weakened and should be something like this:

.option push
.option norelax
la gp, __global_pointer$
.option pop

so that after relaxation you get the following code:

auipc gp, %pcrel_hi(__global_pointer$)
addi gp, gp, %pcrel_lo(__global_pointer$)

instead of simple:

addi gp, gp, 0

And now the last part of ours crt0.s:

_start:
  ...other stuff...
  la sp, __stack_top
  add s0, sp, zero
  jal zero, main
  .cfi_endproc
  .end

Here we can finally use the symbol __stack_topthat we worked so hard to create. Pseudo-instructionla (load address), loads the value __stack_topinto the register sp(stack pointer), setting it for use in the rest of the program.

It then add s0, sp, zeroadds the values ​​of the registers spand zero(which is actually a register x0with a hard reference to 0) and puts the result in the register s0. This is a special register that is unusual in several respects. Firstly, it is a “persistent register”, that is, it is saved when function calls. Secondly,s0sometimes acts as a frame pointer, which gives each function call a small space on the stack to hold the parameters passed to this function. How function calls work with the stack and frame pointers is a very interesting topic that you can easily devote to a separate article, but for now, just know that it is important to initialize the frame pointer in our runtime s0.

Next we see the instructions jal zero, main. Here jalmeans transition and layout (Jump And Link). The instruction expects operands in the form jal rd (destination register), offset_address. Functionally jalwrites the value of the next instruction (register pcplus four) to rd, and then sets the register pcto the current value pcplus the offset address with a character extension, effectively "calling" this address.

As mentioned above, it is x0tightly bound to the literal value 0, and writing to it is useless. Therefore, it may seem strange that we use a register as the destination register zero, which the RISC-V assemblers interpret as a register x0. After all, this means an unconditional transition to offset_address. Why do this, because in other architectures there is an explicit instruction for an unconditional transition?

This weird pattern jal zero, offset_addressis actually smart optimization. Support for each new instruction means an increase and, consequently, a rise in cost of the processor. Therefore, the simpler the ISA, the better. Instead of contaminate the space of two instructions instructions jaland unconditional jumparchitecture RISC-V only supportsjal, and unconditional jumps are supported through jal zero, main.

RISC-V has many such optimizations, most of which take the form of so-called pseudo - instructions . Assemblers know how to translate them into real hardware instructions. For example, j offset_addressRISC-V assemblers translate pseudo- instructions for unconditional jumps to jal zero, offset_address. For a complete list of officially supported pseudo instructions, see the RISC-V specification (version 2.2) .

_start:
  ...other stuff...
  jal zero, main
  .cfi_endproc
  .end

Our last line is the assembler directive .end, which simply marks the end of the file.

Debug but now for real


Trying to debug a simple C program on a RISC-V processor, we solved a lot of problems. First, using qemuand dtcfound our memory in the virtRISC-V virtual machine . Then we used this information to manually control the memory allocation in our version of the default script of the linker riscv64-unknown-elf-ld, which allowed us to accurately determine the symbol __stack_top. Then we used this symbol in our own version crt0.s, which sets up our stack and global pointers, and finally called the function main. Now you can achieve your goal and start debugging our simple program in GDB.

Recall here is the C program itself:

cat add.c
int main() {
    int a = 4;
    int b = 12;
    while (1) {
        int c = a + b;
    }
    return 0;
}

Compiling and linking:

riscv64-unknown-elf-gcc -g -ffreestanding -O0 -Wl,--gc-sections -nostartfiles -nostdlib -nodefaultlibs -Wl,-T,riscv64-virt.ld crt0.s add.c

Here we indicated a lot more flags than last time, so let's go through those that we have not described before.

-ffreestandingtells the compiler that the standard library may not exist , so there is no need to make assumptions about its mandatory presence. This parameter is not required when starting the application on its host (in the operating system), but in this case it is not, therefore it is important to inform the compiler of this information.

-Wl - A comma-separated list of flags to pass to the linker ( ld). Here, it --gc-sectionsmeans “garbage collection sections”, and ldis instructed to remove unused sections after linking. Flags -nostartfiles, -nostdliband -nodefaultlibstell the linker not to process the standard system startup files (for example, defaultcrt0), standard implementations of system stdlib and standard system default linked libraries. We have our own script crt0and linker, so it’s important to pass these flags so that the default values ​​do not conflict with our user preferences.

-Tindicates the path to our linker script, which is simple in our case riscv64-virt.ld. Finally, we specify the files that we want to compile, compile and compose: crt0.sand add.c. As before, the result is a complete and ready to run file called a.out.

Now run our pretty new brand new executable in qemu:

# -S freezes execution of our executable (-kernel) until we explicitly tell 
# it to start with a 'continue' or 'c' from our gdb client
qemu-system-riscv64 -machine virt -m 128M -gdb tcp::1234 -S -kernel a.out

Now run gdb, remember to load the debugging symbols for a.out, specifying it with the last argument:

riscv64-unknown-elf-gdb --tui a.out
GNU gdb (GDB) 8.2.90.20190228-git
Copyright (C) 2019 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later 
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "--host=x86_64-apple-darwin17.7.0 --target=riscv64-unknown-elf".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
.
Find the GDB manual and other documentation resources online at:
    .
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from a.out...
(gdb)

Then connect our client gdbto the server gdbthat we launched as part of the command qemu:

(gdb) target remote :1234                                                                             │
Remote debugging using :1234

Set a breakpoint in main:

(gdb) b main
Breakpoint 1 at 0x8000001e: file add.c, line 2.

And start the program:

(gdb) c
Continuing.
Breakpoint 1, main () at add.c:2

From the given output it is clear that we successfully hit the breakpoint on line 2! This is also visible in the text interface, finally we have the correct line L, the value PC:is L2, and PC: - 0x8000001e. If you did everything as in the article, then the output will be something like this:



From now on, you can use it gdbas usual: -sto go to the next instruction, info all-registersto check the values ​​inside the registers as the program runs, etc. Experiment for your pleasure ... we, of course , worked a lot for this!

What's next?


Today we have achieved a lot and, I hope, have learned a lot! I never had a formal plan for this and subsequent articles, I just followed what was most interesting to me at every moment. Therefore, I’m not sure what will happen next. I especially liked the deep immersion in the manual jal, so maybe in the next article we will take as a basis the knowledge gained here, but replace it with add.csome program in pure RISC-V assembler. If you have something specific that you would like to see or have any questions, open tickets .

Thank you for reading! I hope to meet in the next article!

Additionally


If you liked the article and want to know more, check out Matt Godbolt's presentation titled “Bits Between Bits: How We Get into main ()” from the CppCon2018 conference. She approaches the topic a little differently than we are here. Really good lecture, see for yourself!

Also popular now: