zserge November 24, 2011 at 22:13

Forth CPU. What it is? (Part 1)

On Habré there were few posts about Fort, and impressed by Moore ’s recent excellent work , I’ll tell you about another fort-processor - J1 .
This is probably the most minimalistic processor that can be used in practice.
You can do it yourself on FPGA, but I, as a person far from electronics, will try to write his emulator. And, to add abnormalities to the post, I will write the emulator in Go.

Hardware Features

The processor is 16-bit, it can address up to 32KB of memory (why not understand 64Kb below). The code and data are in different memory areas, the maximum size of the program running on this processor is also up to 32Kb.

The architecture is stacked, the processor does not contain a single register (which could be explicitly accessed). But it contains two stacks:

- a data stack - designed for computing
- a call stack - designed to store the return address when calling functions. Sometimes used as temporary storage.

Both stacks are shallow, 33 elements each. Stacks, of course, are also 16-bit.

It's all. Instead of I / O ports, it is proposed to use memory-mapped I / O. To do this, it is recommended to reserve the upper 16Kb of memory.

Emulator

I do not pretend to be ideologically correct code on Go, but I will try to make it clear to people even unfamiliar with the language.

Let us describe the processor structure:

type J1Stack [33]uint16
type J1Cpu struct {
	pc  uint16
	r   J1Stack
	s   J1Stack
	rp  int
	sp  int
	mem []uint16
}

Total, our emulated processor inside has a pc instruction counter, data stack s, call stack r, pointers to the tops of these stacks sp and rp, respectively, and internal memory (by the way, memory is also only 16-bit, because byte cannot be accessed).

We write a constructor function for the J1Cpu type:

func NewJ1Cpu(memSize int) (j1 *J1Cpu) {
	j1 = new(J1Cpu)
	j1.mem = make([]uint16, memSize)
	j1.sp, j1.rp = -1, -1
	return j1
}

In the constructor, allocate as much memory as indicated, and initialize the stacks.

Instructions

The processor has 5 instructions:

They work as follows:
LIT X: put the value X on the top of the stack s. Since the 1st bit indicates that it is a literal, only 15 bits are allocated to the value. Hence the 15-bit addressing of code and data.
JMP X: go to address X
JZ X: if the value s is at the top of stack s, then go to address X
CALL X: put the value PC + 1 on the top of stack r (next instruction), and go to address X.
ALU: here a little more complicated, I’ll talk about it below.

We implement the execution of these instructions in the Exec method:

const (
	J1_OP  = (7 << 13)
	J1_ARG = 0x1fff
	J1_JMP      = (0 << 13)
	J1_JZ       = (1 << 13)
	J1_CALL     = (2 << 13)
	J1_ALU      = (3 << 13)
	J1_LIT      = (4 << 13)
	J1_LIT_MASK = 0x7fff
)
func (j1 *J1Cpu) Exec(op uint16) {
	if op & J1_LIT != 0 {
		j1.sp++
		j1.s[j1.sp] = op & J1_LIT_MASK
		j1.pc++
		return
	}
	arg := op & J1_ARG
	switch op & J1_OP {
	case J1_JMP:
		j1.pc = arg
	case J1_JZ:
		if j1.s[j1.sp] == 0 {
			j1.pc = arg
		}
		j1.sp--
	case J1_CALL:
		j1.rp++
		j1.r[j1.rp] = j1.pc + 1
		j1.pc = arg
	case J1_ALU:
		j1.execAlu(arg)
	}
}

By changing the pointers to the top of the stack, I use pre-increment and post-decrement. However, in Go you cannot use them as in C (stack [++ sp] = x; ... x = stack [sp--]). Because you need to specify such things explicitly.

It remains to implement the execAlu method - and the job is done!

J1 ALU

This is the most powerful instruction. Let's first decide on the symbols that the authors of the processor use.

T is the element at the top of the stack at the beginning of the execution of the instruction
N is the element following T at the beginning of the execution of the instruction
T 'is the result of arithmetic operations
R is the element at the top of the call stack at the beginning of the execution of the
PC instruction is the processor instruction counter value
[T] is an indirect memory access via the address T of

Arithmetic instructions is a total of 16:
0: T '= T
1: T' = N
2: T '= T + N
3: T' = T and N
4: T '= T or N
5: T' = T xor N
6: T '= ~ T (bitwise inversion)
7: T' = (T == N)
8: T '= (N <T)
9: T '= N >> T (right shift by T bits)
10: T' = T-1
11: T '= R
12: T' = [T]
13: T '= N << T
14: T '= depth (data stack depth)
15: T' = (N <T), this time unsigned comparison

This is how the implementation of these operations will look:

func (j1 *J1Cpu) execAlu(op uint16) {
	var res, t, n, r uint16
	if j1.sp >= 0 {
		t = j1.s[j1.sp]
	}
	if j1.sp > 0 {
		n = j1.s[j1.sp-1]
	}
	if j1.rp > 0 {
		r = j1.r[j1.rp-1]
	}
	j1.pc++
	code := (op & (0xf << 8)) >> 8
	switch code {
	case 0:
		res = t
	case 1:
		res = n
	case 2:
		res = t + n
	case 3:
		res = t & n
	case 4:
		res = t | n
	case 5:
		res = t ^ n
	case 6:
		res = ^t
	case 7:
		if n == t {
			res = 1
		}
	case 8:
		if int16(n) < int16(t) {
			res = 1
		}
	case 9:
		res = n >> t
	case 10:
		res = t - 1
	case 11:
		res = j1.r[r]
	case 12:
		res = j1.mem[t]
	case 13:
		res = n << t
	case 14:
		res = uint16(j1.sp + 1)
	case 15:
		if n < t {
			res = 1
		}
	}
	ds := op & 3
	rs := (op & (3 << 2)) >> 2
	if ds == 1 {
		j1.sp++
	}
	if ds == 2 {
		j1.sp--
	}
	if rs == 1 {
		j1.rp++
	}
	if rs == 2 {
		j1.rp--
	}
	if op&(1<<5) != 0 {
		println("N->[T]")
		j1.mem[t] = n
	}
	if op&(1<<6) != 0 {
		println("T->R")
		if j1.rp < 0 {
			panic("Call stack underrun")
		}
		j1.r[j1.rp] = t
	}
	if op&(1<<7) != 0 {
		println("T->N")
		if j1.sp > 0 {
			j1.s[j1.sp-1] = t
		}
	}
	if op&(1<<12) != 0 {
		println("R->PC")
		j1.pc = j1.r[r]
	}
	if j1.sp >= 0 {
		j1.s[j1.sp] = res
	}
}

The function turned out to be so large because I do not parse the command parameters very optimally. In general terms, it works simply - it receives the current values of T, N, R, calculates the new value of T ', shifts the pointers to the top of the stacks and sets the new value of T.
Now this is exactly all that the processor can do.

For example, adding two numbers is three
LIT 5 commands (on the stack - 5)
LIT 7 (on the stack - 5 7)
ALU OP = 2 (addition), DS = -1 (that is, add and reduce the stack pointer by 1. On the stack, 12).

Interestingly, to exit the function does not require a separate instruction. Just in the last instruction the flag “R-> PC” is added. This, according to the developers, reduces the amount of code by about 7%.

And where does the Forth actually?

The fact is that this set of instructions is optimized specifically for the Fort language. Most of the words in this language correspond to single instructions J1. But a brief introduction to the Fort as applied to the J1 processor will be in the next article, but now I’ll leave a link to the full emulator sources .
Who already knows the fort - cross-compiler for J1 can be found here (crossj1.fs).

Tags: