# We program "Megaprocessor"

- Tutorial

At Geektimes in the summer there was an article about Megaprocessor - a processor made of discrete transistors and LEDs that weighs half a ton and occupies the entire living room in a regular townhouse near Cambridge. I decided to take advantage of my geographical proximity to this megaproject and program something presentable for it - for example, to sport my previous “Digital Rain” software for Megaprocessor .

The Megaprocessor command system is described on the developer's site .

Most instructions consist of a single byte, which can be followed by a direct operand (one or two bytes). There are only four general purpose registers (R0-R3), but they are not equal: for example, for memory access commands, the address must be either in R2 or in R3; and the operand is in one of the two remaining registers.

For programmers who are accustomed to the x86 or ARM instruction set, the Megaprocessor instruction set will seem extremely poor: there are no indirect base + offset addresses, nor direct operands for arithmetic instructions (except

So let's get started.

The program for Megaprocessor begins (at address 0) with a table of interrupt vectors. Four bytes are allocated to each of the four vectors. Starting from 0x10, the actual program code can be located. From 64KB of address space, the entire first half (up to address 0x8000) can be used by code; addresses 0xA000-0xA0FF correspond to a “display” - a discrete memory, each bit of which is equipped with an LED indicator. marksmade a mistake by writing "The amount of memory is 256 bytes." - This is the amount of "video memory", not the main memory for code and data.

Of the four interrupt vectors in our program, only a vector is used

The first thing after starting is to initialize the stack and variables. Let the stack grow down, starting at address 0x2000 - this is enough for us with a large margin. Only two variables will be needed:

Since

The main part of the program is an endless cycle in which we go from right to left for each “display column”, and shift the “drop” in it one position down. The lower two bits of the “drop” indicate its color (3 - “on”; 0, 1 or 2 - “off”), the remaining six bits - the coordinate (0..63), so “shift down” means the addition of 4. How only the "drop" crawled to the bottom of the "display" (the value exceeded 255), we replace it with a new random byte.

It is impossible to add 4 with one command, so we repeat twice

How will we generate random numbers? The RANDU that I used in the 32-bit Digital Rain is no longer suitable: Megaprocessor can only multiply 16-bit numbers; therefore, from the list of simple RNGs we take one where the multiplier is 16-bit. I liked the RNG, designated as “Turbo Pascal”.

This simple and pretty RNG returns the generated value to r0, but unfortunately, it spoils the values of all other registers. Note that in both cases, when we call

The last trick here is that the calculation

Now the most interesting thing is the second half of the cycle

To “light up” or “quench” the desired bit, you first need to calculate the address of the corresponding byte of “video memory”. Since we have the number of the "column" stored in

And here is the result:

There was one final touch: make the “drops” blink like on a GIF-KDPV. In fact, this means that the program will work twice as slow: at each iteration of the cycle it

So, the “drop" must be ignited if a) the two lower bits of its value are both set; b) we are on igniting half-iteration; - and extinguish in all other cases. The easiest way to implement all this is to replace a

At the beginning of the cycle,

The last thing to add is to set another bit on the igniting half-iteration, in byte at an offset of -4 from the “drop” itself:

The check

Now the drops are blinking, as intended:

Now, to try to run your own program on Megaprocessor, you need to agree with its creator about a visit to his home; but in a month, he said, Megaprocessor will move to the Cambridge Computer History Center , and will be available to the general public five days a week.

Good luck in megaprogramming!

The Megaprocessor command system is described on the developer's site .

Most instructions consist of a single byte, which can be followed by a direct operand (one or two bytes). There are only four general purpose registers (R0-R3), but they are not equal: for example, for memory access commands, the address must be either in R2 or in R3; and the operand is in one of the two remaining registers.

For programmers who are accustomed to the x86 or ARM instruction set, the Megaprocessor instruction set will seem extremely poor: there are no indirect base + offset addresses, nor direct operands for arithmetic instructions (except

`addq ±1`

, `addq ±2`

). But there are a couple of unexpected possibilities: a separate command `sqrt`

, and a mode `.wt`

for shift commands, which replaces the result with the sum of the extended bits. Thus, for example, a pair of commands`ld.b r1, #15; lsr.wt r0, r1`

calculate the number of single bits in `r0`

(a question so beloved by job interviewers!). Mnemonics `ld`

for a command that loads a direct value into the register (instead of the usual x86 or ARM mnemonics `mov`

) indicates a way to execute it: in fact, from the point of view of the processor, it is executed `ld.b r1, (pc++)`

. So let's get started.

The program for Megaprocessor begins (at address 0) with a table of interrupt vectors. Four bytes are allocated to each of the four vectors. Starting from 0x10, the actual program code can be located. From 64KB of address space, the entire first half (up to address 0x8000) can be used by code; addresses 0xA000-0xA0FF correspond to a “display” - a discrete memory, each bit of which is equipped with an LED indicator. marksmade a mistake by writing "The amount of memory is 256 bytes." - This is the amount of "video memory", not the main memory for code and data.

Of the four interrupt vectors in our program, only a vector is used

`reset`

, and in the remaining vectors there is a “stub" from one instruction `reti`

. (For programmers for x86 or ARM, the return command from the interrupt handler is familiar under mnemonics `iret`

.) None of these interrupts in our program can happen anyway, so that we could not even put “stubs” for them.```
reset: jmp start;
nop;
ext_int: reti;
nop;
nop;
nop;
div_zero: reti;
nop;
nop;
nop;
illegal: reti;
nop;
nop;
nop;
```

The first thing after starting is to initialize the stack and variables. Let the stack grow down, starting at address 0x2000 - this is enough for us with a large margin. Only two variables will be needed:

`seed`

for the current RNG value, and an array `position`

of 32 values - one for each “display column”, to keep track of where the “drop” creeps in this column. The array is initialized with just 32 random bytes. The command `jsr`

— a subroutine call — matches `call`

in x86 or `bl`

in ARM.```
start:
ld.w r0, #0x2000;
move sp, r0;
// set random positions
ld.b r1, #32;
init_loop:
jsr rand; // returns random value in r0
ld.b r2, #position;
add r2, r1;
st.b (r2), r0;
addq r1, #-1;
bne init_loop;
```

Since

`(#position + r1)`

it is impossible to write a byte to an address with one command, you must first calculate the address with a separate addition command. The main part of the program is an endless cycle in which we go from right to left for each “display column”, and shift the “drop” in it one position down. The lower two bits of the “drop” indicate its color (3 - “on”; 0, 1 or 2 - “off”), the remaining six bits - the coordinate (0..63), so “shift down” means the addition of 4. How only the "drop" crawled to the bottom of the "display" (the value exceeded 255), we replace it with a new random byte.

```
busy_loop:
ld.b r1, #32;
next_col:
ld.b r2, #position;
add r2, r1;
ld.b r0, (r2);
addq r0, #2;
addq r0, #2;
btst r0, #8;
beq save;
jsr rand;
save: st.b (r2), r0;
addq r1, #-1;
bmi busy_loop;
```

It is impossible to add 4 with one command, so we repeat twice

`addq r0, #2`

, and then check the eighth bit of the result to determine whether it has exceeded the value of 255. If it exceeded, then save a `position`

new random value to the array ; otherwise, we keep the old one, increased by 4. The conditional jump command `bmi`

goes to the beginning of the cycle `busy_loop`

if the result of the last action is negative, i.e. after processing the null column. How will we generate random numbers? The RANDU that I used in the 32-bit Digital Rain is no longer suitable: Megaprocessor can only multiply 16-bit numbers; therefore, from the list of simple RNGs we take one where the multiplier is 16-bit. I liked the RNG, designated as “Turbo Pascal”.

```
rand: ld.w r0, seed;
ld.w r1, #33797;
mulu;
addq r2, #1;
st.w seed, r2;
move r0, r2;
ret;
```

This simple and pretty RNG returns the generated value to r0, but unfortunately, it spoils the values of all other registers. Note that in both cases, when we call

`rand`

, we have the `r1`

index of the “display column” stored in it , and it needs to be saved and restored; and then there `r2`

should be an offset `(#position + r1)`

. So, you can put the calculation of this displacement inside `rand`

:```
rand: push r1; // !
ld.w r0, seed;
ld.w r1, #33797;
mulu;
addq r2, #1;
st.w seed, r2;
pop r1; // !
move r0, r2;
ld.b r2, #position; // !
add r2, r1; // !
ret;
start: ld.w r0, #0x2000;
move sp, r0;
// set random positions
ld.b r1, #32;
init_loop:
jsr rand;
st.b (r2), r0;
addq r1, #-1;
bne init_loop;
busy_loop:
ld.b r1, #32;
next_col:
ld.b r2, #position;
add r2, r1;
ld.b r0, (r2);
addq r0, #2;
addq r0, #2;
btst r0, #8;
beq save;
jsr rand;
save: st.b (r2), r0;
addq r1, #-1;
bmi busy_loop;
```

The last trick here is that the calculation

`ld.b r2, #position; add r2, r1;`

at the beginning of the loop `next_col`

can be replaced with a jump inside the subprogram `rand`

:```
rand: push r1;
ld.w r0, seed;
ld.w r1, #33797;
mulu;
addq r2, #1;
st.w seed, r2;
pop r1;
move r0, r2;
add_position:
ld.b r2, #position;
add r2, r1;
ret;
start: <...>
busy_loop:
ld.b r1, #32;
next_col:
jsr add_position; // !
ld.b r0, (r2);
addq r0, #2;
addq r0, #2;
btst r0, #8;
beq save;
jsr rand;
save: st.b (r2), r0;
addq r1, #-1;
bmi busy_loop;
```

Now the most interesting thing is the second half of the cycle

`next_col`

, which will draw a “drop” on the display.```
move r3, r1; // x (0..1f)
lsr r3, #3; // byte addr in row (0..3)
ld.b r2, #0xfc; // y mask
and r2, r0; // y * 4 (0..fc)
add r3, r2; // byte addr in screen
ld.w r2, #0xa000;
add r3, r2; // byte addr in memory
ld.b r2, #2;
lsr.wt r0, r2;
ld.b r2, #7;
and r2, r1; // bit index in byte (0..7)
lsl r2, #1;
lsr r0, #2;
roxr r2, #1;
ld.b r0, (r3);
// and now apply
test r2;
bpl blank;
bset r0, r2;
jmp apply;
blank: bclr r0, r2;
apply: st.b (r3), r0;
jmp next_col;
```

To “light up” or “quench” the desired bit, you first need to calculate the address of the corresponding byte of “video memory”. Since we have the number of the "column" stored in

`r1`

, and the position and "color" of the drop in `r0`

, the address of the byte is calculated as `(r1 >> 3) + (r0 & 0xfc) + 0xa000`

. After that, with the commands `ld.b r2, #2; lsr.wt r0, r2;`

we determine the color of the drop: if both the least significant bits in `r0`

were set, then as a result of these commands the `r0`

value will be 2; otherwise, the value is 0 or 1. Finally, in the lower three bits `r2`

we memorize the number of the desired “video memory” bit, and “push” the `r2`

drop color into the high-order bit by the sequence `lsl r2, #1; lsr r0, #2; roxr r2, #1;`

- the second command pushes the color bit out `r0`

into the CF flag, and the last (a cyclic shift to the right with involving CF) pushes this bit into`r2`

. When there are not enough registers for all the necessary values, you have to get thin! Finally, a byte is retrieved from the "video memory" at the desired address, and depending on the color bit, this byte is either set or the desired bit is reset. Command `bset`

and `bclr`

use only the least significant bits of its second operand, so a bit of color in the most significant bit `r2`

they do not interfere. We check this high-order bit with a sequence `test r2; bpl blank;`

- the conditional branch command `bpl`

executes the transition if the result of the last action is positive, i.e. a bit of color is taken off. And here is the result:

**Entire code**

```
reset: jmp start;
nop;
ext_int: reti;
nop;
nop;
nop;
div_zero: reti;
nop;
nop;
nop;
illegal: reti;
nop;
nop;
nop;
rand: push r1;
ld.w r0, seed;
ld.w r1, #33797;
mulu;
addq r2, #1;
st.w seed, r2;
pop r1;
move r0, r2;
add_position:
ld.b r2, #position;
add r2, r1;
ret;
start: ld.w r0, #0x2000;
move sp, r0;
// set random positions
ld.b r1, #32;
init_loop:
jsr rand;
st.b (r2), r0;
addq r1, #-1;
bne init_loop;
busy_loop:
ld.b r1, #32;
next_col:
jsr add_position;
ld.b r0, (r2);
addq r0, #2;
addq r0, #2;
btst r0, #8;
beq save;
jsr rand;
save: st.b (r2), r0;
addq r1, #-1;
bmi busy_loop;
move r3, r1; // x (0..1f)
lsr r3, #3; // byte addr in row (0..3)
ld.b r2, #0xfc; // y mask
and r2, r0; // y * 4 (0..fc)
add r3, r2; // byte addr in screen
ld.w r2, #0xa000;
add r3, r2; // byte addr in memory
ld.b r2, #2;
lsr.wt r0, r2;
ld.b r2, #7;
and r2, r1; // bit index in byte (0..7)
lsl r2, #1;
lsr r0, #2;
roxr r2, #1;
ld.b r0, (r3);
// and now apply
test r2;
bpl blank;
bset r0, r2;
jmp apply;
blank: bclr r0, r2;
apply: st.b (r3), r0;
jmp next_col;
seed: dw 1;
position:;
```

There was one final touch: make the “drops” blink like on a GIF-KDPV. In fact, this means that the program will work twice as slow: at each iteration of the cycle it

`busy_loop`

will first light up and then extinguish every “drop”. On the igniting half-iteration, it will be necessary to set two bits of video memory: both for the current position of the “drop” and for the previous (extinguished by the last half-iteration). So, the “drop" must be ignited if a) the two lower bits of its value are both set; b) we are on igniting half-iteration; - and extinguish in all other cases. The easiest way to implement all this is to replace a

`ld.b r2, #2; lsr.wt r0, r2;`

fixed value `#2`

with a variable in the sequence of commands that defines the color of the drop ( ) `flag`

, which will have a value of 2 on the igniting half-iteration, and 1 on the quenching one:```
busy_loop:
ld.b r1, #3; // !
ld.b r2, flag; // !
sub r1, r2; // !
st.b flag, r1; // !
ld.b r1, #32;
next_col:
jsr add_position;
ld.b r0, (r2);
ld.b r3, flag; // !
lsr r3, #1; // !
lsl r3, #2; // !
add r0, r3; // !
btst r0, #8;
beq save;
jsr rand;
save: st.b (r2), r0;
addq r1, #-1;
bmi busy_loop;
move r3, r1; // x (0..1f)
lsr r3, #3; // byte addr in row (0..3)
ld.b r2, #0xfc; // y mask
and r2, r0; // y * 4 (0..fc)
add r3, r2; // byte addr in screen
ld.w r2, #0xa000;
add r3, r2; // byte addr in memory
ld.b r2, flag; // !
lsr.wt r0, r2;
```

At the beginning of the cycle,

`busy_loop`

we subtract the current value `flag`

from 3, i.e. we change 2 to 1, and 1 to 2. Then, instead of moving the “drop” down at each iteration ( `addq r0, #2; addq r0, #2;`

), we add to the `r0`

value `(flag >> 1) << 2`

, i.e. 4 on the igniting half-iteration, and 0 on the quenching. The last thing to add is to set another bit on the igniting half-iteration, in byte at an offset of -4 from the “drop” itself:

```
// and now apply
test r2;
bpl blank;
bset r0, r2;
st.b (r3), r0; // !
addq r3, #-2; // !
addq r3, #-2; // !
btst r3, #8; // !
bne next_col; // !
ld.b r0, (r3); // !
bset r0, r2; // !
jmp apply;
blank: bclr r0, r2;
apply: st.b (r3), r0;
jmp next_col;
```

The check

`btst r3, #8; bne next_col;`

ensures that we do not go beyond the upper edge of the “display” and do not try to write something to the address 0x9FFx. Now the drops are blinking, as intended:

**Entire code**

```
reset: jmp start;
nop;
ext_int: reti;
nop;
nop;
nop;
div_zero: reti;
nop;
nop;
nop;
illegal: reti;
nop;
nop;
nop;
rand: push r1;
ld.w r0, seed;
ld.w r1, #33797;
mulu;
addq r2, #1;
st.w seed, r2;
pop r1;
move r0, r2;
add_position:
ld.b r2, #position;
add r2, r1;
ret;
start: ld.w r0, #0x2000;
move sp, r0;
// set random positions
ld.b r1, #32;
init_loop:
jsr rand;
st.b (r2), r0;
addq r1, #-1;
bne init_loop;
busy_loop:
ld.b r1, #3;
ld.b r2, flag;
sub r1, r2;
st.b flag, r1;
ld.b r1, #32;
next_col:
jsr add_position;
ld.b r0, (r2);
ld.b r3, flag;
lsr r3, #1;
lsl r3, #2;
add r0, r3;
btst r0, #8;
beq save;
jsr rand;
save: st.b (r2), r0;
addq r1, #-1;
bmi busy_loop;
move r3, r1; // x (0..1f)
lsr r3, #3; // byte addr in row (0..3)
ld.b r2, #0xfc; // y mask
and r2, r0; // y * 4 (0..fc)
add r3, r2; // byte addr in screen
ld.w r2, #0xa000;
add r3, r2; // byte addr in memory
ld.b r2, flag;
lsr.wt r0, r2;
ld.b r2, #7;
and r2, r1; // bit index in byte (0..7)
lsl r2, #1;
lsr r0, #2;
roxr r2, #1;
ld.b r0, (r3);
// and now apply
test r2;
bpl blank;
bset r0, r2;
st.b (r3), r0;
addq r3, #-2;
addq r3, #-2;
btst r3, #8;
bne next_col;
ld.b r0, (r3);
bset r0, r2;
jmp apply;
blank: bclr r0, r2;
apply: st.b (r3), r0;
jmp next_col;
seed: dw 1;
flag: db 2;
position:;
```

Now, to try to run your own program on Megaprocessor, you need to agree with its creator about a visit to his home; but in a month, he said, Megaprocessor will move to the Cambridge Computer History Center , and will be available to the general public five days a week.

Good luck in megaprogramming!