A development model illustrated with a stack-based CPU


Have you ever wondered how a processor works? Yes, the very one inside your PC, laptop, or smartphone. In this article I want to present a processor of my own design, described in Verilog. Verilog looks like a programming language, but it is not quite one: it is a hardware description language (HDL). The code you write is not executed by anything (unless you run it in a simulator, of course); instead it is turned into a physical circuit design, or into a form that an FPGA (Field-Programmable Gate Array) can load.

Disclaimer: this article is the result of a university project, so the time available was limited and many parts of the project are still at an early stage of development.

Please note that the processor built in this article has little in common with the processors in wide use today; in building it I was pursuing a somewhat different goal.

To truly understand the programming process, you need to understand how each of the tools involved works: the language compiler or interpreter, the virtual machine (if there is one), the intermediate code, and, of course, the processor itself. Very often, people learning to program stay at the first stage for a long time: they think only about how the language and its compiler work. This often leads to errors that a beginner cannot solve, because they have no idea where the problems come from. I have seen several real cases that looked very much like this, so I decided to try to improve the situation by creating a set of tools that helps beginning programmers understand every step.

This set consists of:

The invented language itself (OurLang)
A syntax-highlighting plug-in for VS Code
A compiler for the language
An instruction set
A simple processor capable of executing this instruction set (written in Verilog)

Once again, a reminder: this article DOES NOT DESCRIBE ANYTHING RESEMBLING A REAL MODERN PROCESSOR; it describes a model that is easy to understand without going into detail.

What you need if you want to try it yourself:

To run the CPU simulation, you need ModelSim, which can be downloaded from the Intel website.

To run the OurLang compiler, you need Java 8 or later.

Links to projects:
https://github.com/IamMaxim/OurCPU
https://github.com/IamMaxim/OurLang

VS Code extension:
https://github.com/IamMaxim/ourlang-vscode

I usually use a bash script to build the Verilog part:

#!/bin/bash
vlib work
vlog *.v
vsim -c testbench_1 -do "run; exit"

The same steps can also be performed through the ModelSim GUI.

IntelliJ IDEA is convenient for working with the compiler. The main thing is to keep track of which modules each module lists among its dependencies. I did not publish a ready-made .jar because I expect the reader to read the compiler's source code.

The runnable modules are Compiler and Interpreter. The compiler needs no explanation; Interpreter is simply an OurCPU simulator written in Java, but we will not discuss it in this article.

Instruction set

I think it is best to start with the instruction set.

There are several kinds of instruction set architecture:

Stack-based: this is what the article describes. Its distinctive feature is that all operands are pushed onto the stack and popped from it, which makes parallel execution very hard to achieve, but it is one of the simplest approaches to working with data.
Accumulator-based: the essence is that there is a single register holding the value that the instructions modify.
Register-based: this is what modern processors use, because it allows the highest performance through various optimizations, including parallel execution, pipelining, and so on.
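
To make the stack-based model concrete, here is a minimal Python sketch (not code from the OurCPU repository) that evaluates (2 + 3) * 4 using only pushes and pops. The mnemonics putw and add are borrowed from the article; mul is an assumed instruction added purely for illustration, and the real semantics and encodings live in the OurCPU sources.

```python
def run(program):
    """Execute a list of (mnemonic, argument) pairs on a bare operand stack."""
    stack = []
    for op, arg in program:
        if op == "putw":            # push the immediate argument
            stack.append(arg)
        elif op == "add":           # pop two operands, push their sum
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == "mul":           # assumed instruction, for illustration only
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
        else:
            raise ValueError(f"unknown instruction: {op}")
    return stack


# (2 + 3) * 4 written in postfix order: 2 3 add 4 mul
program = [("putw", 2), ("putw", 3), ("add", None), ("putw", 4), ("mul", None)]
print(run(program))  # [20]
```

Note that no instruction names its operands: each one implicitly uses whatever is on top of the stack, which is what makes this model so simple to implement.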

Our processor's instruction set contains 30 instructions.

Next, let us look at the implementation of the processor.

The code consists of several modules:

CPU
Ram
Modules for each instruction

RAM is a module that contains the memory itself, as well as a way to access the data in it.

CPU is a module that directly controls program execution: it reads instructions, hands control to the module for the relevant instruction, and holds the necessary registers (the pointer to the current instruction, and so on).

Practically all instructions work only with the stack, so it is enough simply to execute them. Some (for example, putw, putb, jmp and jif) carry an additional argument in the instruction itself. These must be given the whole instruction so that they can read the data they need from it.
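
As a rough illustration of instructions that carry an argument inside the instruction stream, the Python sketch below walks over a byte sequence and splits it into instructions. Everything about the encoding here is an assumption made for the example: the opcode numbers, the argument widths (4 bytes for putw, jmp and jif, 1 byte for putb) and the little-endian byte order are hypothetical and may not match OurCPU.

```python
# Hypothetical opcode table: opcode byte -> (mnemonic, argument size in bytes).
# These values are invented for the example; the real OurCPU encoding may differ.
OPCODES = {
    0x01: ("putw", 4),
    0x02: ("putb", 1),
    0x03: ("add", 0),
    0x04: ("jmp", 4),
    0x05: ("jif", 4),
}


def decode(code):
    """Split a bytes object into (address, mnemonic, argument) triples."""
    ip = 0
    decoded = []
    while ip < len(code):
        mnemonic, arg_size = OPCODES[code[ip]]
        raw = code[ip + 1 : ip + 1 + arg_size]
        if len(raw) != arg_size:
            raise ValueError(f"truncated argument at address {ip}")
        # Instructions without an inline argument get None.
        arg = int.from_bytes(raw, "little") if arg_size else None
        decoded.append((ip, mnemonic, arg))
        ip += 1 + arg_size          # skip the opcode and its argument
    return decoded


code = bytes([0x01, 10, 0, 0, 0,    # putw 10
              0x02, 5,              # putb 5
              0x03])                # add
for entry in decode(code):
    print(entry)
```

The point of the sketch is only that the decoder must know how many bytes follow each opcode; stack-only instructions such as add occupy a single byte in this assumed layout.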

Here is a general diagram of how the processor operates:

General principles of program structure at the instruction level

I think it is time to look at how programs themselves are structured. As the diagram above shows, after each instruction is executed the address advances to the next one. This gives the program a linear flow. When this linearity needs to be broken (a condition, a loop, and so on), branch instructions are used (in our instruction set, these are jmp and jif).
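
The linear flow and its interruption by branches can be sketched in a few lines of Python. This is a simplified model, not the OurCPU implementation: addresses are list indices rather than byte addresses, jif is assumed to pop a value and jump when it is non-zero, and the helper instructions dup, sub and print are hypothetical names introduced only for this example.

```python
def execute(program, max_steps=1000):
    """Run (mnemonic, argument) pairs; return the values recorded by 'print'."""
    stack, output = [], []
    ip = 0                              # instruction pointer
    for _ in range(max_steps):
        if ip >= len(program):
            break
        op, arg = program[ip]
        ip += 1                         # default: fall through to the next instruction
        if op == "putw":
            stack.append(arg)
        elif op == "dup":               # hypothetical helper: duplicate the top value
            stack.append(stack[-1])
        elif op == "sub":               # hypothetical helper: pop b, pop a, push a - b
            b, a = stack.pop(), stack.pop()
            stack.append(a - b)
        elif op == "print":             # hypothetical helper: record the top value
            output.append(stack[-1])
        elif op == "jmp":               # unconditional branch
            ip = arg
        elif op == "jif":               # assumed: branch if the popped value is non-zero
            if stack.pop() != 0:
                ip = arg
        else:
            raise ValueError(f"unknown instruction: {op}")
    return output


# Count down from 3: the jif at index 5 jumps back to index 1 while the counter is non-zero.
countdown = [
    ("putw", 3),
    ("print", None),
    ("putw", 1),
    ("sub", None),
    ("dup", None),
    ("jif", 1),
]
print(execute(countdown))  # [3, 2, 1]
```

Without the jif instruction the program would run once from top to bottom; the branch is the only thing that turns the straight line into a loop.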

When calling functions, we need to save the current state, and for this purpose there are activation records, which store this information. They are not tied to the processor or its instructions; this is just a concept the compiler uses when generating code. An activation record in OurLang has the following structure:

As this scheme shows, local variables are also stored in the activation record, which makes it possible to compute a variable's position within the record at compile time rather than at run time, and thus speeds up program execution.
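
Here is a hedged Python sketch of what "computing the position at compile time" can look like. The layout is an assumption: a 4-byte header followed by 4-byte slots for parameters and then local variables. The real OurLang record may be arranged differently. The emitted sequence putara, putw, add, lw mirrors the body of the print function in the sample program later in the article, which reads its argument at offset 4.

```python
WORD = 4           # assumed size of an int, in bytes
HEADER_SIZE = 4    # assumed space reserved at the start of the record


def layout(params, local_vars):
    """Assign each name a fixed offset from the start of the activation record."""
    offsets = {}
    offset = HEADER_SIZE
    for name in list(params) + list(local_vars):
        if name in offsets:
            raise ValueError(f"duplicate name: {name}")
        offsets[name] = offset
        offset += WORD
    return offsets, offset      # offset now equals the total record size


def load_variable(offsets, name):
    """Emit stack code that pushes the value of a variable onto the stack."""
    return [
        ("putara", 0),              # push the activation record address
        ("putw", offsets[name]),    # push the offset, known at compile time
        ("add", 0),                 # address = record address + offset
        ("lw", 0),                  # load the word at that address
    ]


offsets, size = layout(["arg1", "arg2"], ["i"])
print(offsets, size)
print(load_variable(offsets, "arg2"))
```

Only the base address of the record changes from call to call; the offsets are constants baked into the generated code, so no lookup by name is needed while the program runs.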

For function calls, our instruction set provides instructions for working with the two registers held in the CPU module (the operation pointer and the activation address pointer): putopa / popopa and putara / popara.

Compiler

And now let us look at the part closest to the programmer: the compiler. In general, a compiler as a program consists of three parts:

Lexer
Parser
Compiler

The lexer translates the program's source code into lexical units (tokens) that the parser can understand.

The parser builds an abstract syntax tree from these tokens.

The compiler walks this tree and generates code consisting of low-level instructions. This can be either bytecode or binary code ready for the processor.

In the OurLang compiler, these parts are represented by the following classes, respectively:

Lexer.java
Parser.java
Compiler.java
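
The three stages can be sketched end to end in Python for a toy grammar of integers, +, - and parentheses. This is not how Lexer.java, Parser.java or Compiler.java are written; it only illustrates the division of labour. The output format of (mnemonic, argument) pairs and the sub mnemonic are assumptions made for the example.

```python
def lex(source):
    """Lexer: turn source text into (kind, value) tokens."""
    tokens, i = [], 0
    while i < len(source):
        ch = source[i]
        if ch.isspace():
            i += 1
        elif ch.isdigit():
            j = i
            while j < len(source) and source[j].isdigit():
                j += 1
            tokens.append(("num", int(source[i:j])))
            i = j
        elif ch in "+-()":
            tokens.append((ch, ch))
            i += 1
        else:
            raise SyntaxError(f"unexpected character {ch!r}")
    return tokens


def parse(tokens):
    """Parser: build a nested-tuple tree for  expr := term (('+' | '-') term)*"""
    pos = 0

    def term():
        nonlocal pos
        if pos >= len(tokens):
            raise SyntaxError("unexpected end of input")
        kind, value = tokens[pos]
        pos += 1
        if kind == "num":
            return ("num", value)
        if kind == "(":
            node = expr()
            if pos >= len(tokens) or tokens[pos][0] != ")":
                raise SyntaxError("expected ')'")
            pos += 1
            return node
        raise SyntaxError(f"unexpected token {value!r}")

    def expr():
        nonlocal pos
        node = term()
        while pos < len(tokens) and tokens[pos][0] in ("+", "-"):
            op = tokens[pos][0]
            pos += 1
            node = (op, node, term())
        return node

    tree = expr()
    if pos != len(tokens):
        raise SyntaxError("trailing tokens")
    return tree


def compile_tree(node):
    """Compiler: walk the tree post-order and emit stack instructions."""
    if node[0] == "num":
        return [("putw", node[1])]
    op, left, right = node
    mnemonic = {"+": "add", "-": "sub"}[op]   # 'sub' is an assumed mnemonic
    return compile_tree(left) + compile_tree(right) + [(mnemonic, 0)]


for instruction in compile_tree(parse(lex("1 + (2 - 3)"))):
    print(instruction)
```

Because the target is a stack machine, code generation is a plain post-order walk: emit the code for both operands, then the operator.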

Language

OurLang is in its infancy: it works, but it does not yet have many features, and even the core of the language is not finished. Still, its current state is enough to understand how the compiler works.

The following code fragment serves as an example for understanding the syntax (it is also used to test the functionality):

// single-line comments
/*
* Multi-line comments
*/ 
function print(int arg) {
    instr(putara, 0);
    instr(putw, 4);
    instr(add, 0);
    instr(lw, 0);
    instr(printword, 0);
}
function func1(int arg1, int arg2): int {
    print(arg1);
    print(arg2);
    if (arg1 == 0) {
        return arg2;
    } else {
        return func1(arg1 - 1, arg2);
    };
}
function main() {
    var i: int;
    i = func1(1, 10);
    if (i == 0) {
        i = 1;
    } else {
        i = 2;
    };
    print(i);
}

I will not dwell on the language; I leave it for you to study. Through the compiler code, naturally ;).

While writing it, I tried to keep the code self-explanatory and understandable without comments, so there should be no problems understanding the compiler code.

And of course, the most interesting part is writing code and then watching what it turns into. Fortunately, the OurLang compiler generates assembly-like code with comments, which helps you keep track of what is going on inside.

I also recommend installing the extension for Visual Studio Code; it makes working with the language easier.

Good luck exploring the project!
