tyomitch September 11, 2016 at 22:35

We program "Megaprocessor"

Tutorial

At Geektimes in the summer there was an article about Megaprocessor - a processor made of discrete transistors and LEDs that weighs half a ton and occupies the entire living room in a regular townhouse near Cambridge. I decided to take advantage of my geographical proximity to this megaproject and program something presentable for it - for example, to sport my previous “Digital Rain” software for Megaprocessor .

The Megaprocessor command system is described on the developer's site .

Most instructions consist of a single byte, which can be followed by a direct operand (one or two bytes). There are only four general purpose registers (R0-R3), but they are not equal: for example, for memory access commands, the address must be either in R2 or in R3; and the operand is in one of the two remaining registers.

For programmers who are accustomed to the x86 or ARM instruction set, the Megaprocessor instruction set will seem extremely poor: there are no indirect base + offset addresses, nor direct operands for arithmetic instructions (except addq ±1, addq ±2). But there are a couple of unexpected possibilities: a separate command sqrt, and a mode .wtfor shift commands, which replaces the result with the sum of the extended bits. Thus, for example, a pair of commandsld.b r1, #15; lsr.wt r0, r1calculate the number of single bits in r0(a question so beloved by job interviewers!). Mnemonics ldfor a command that loads a direct value into the register (instead of the usual x86 or ARM mnemonics mov) indicates a way to execute it: in fact, from the point of view of the processor, it is executed ld.b r1, (pc++).

So let's get started.

The program for Megaprocessor begins (at address 0) with a table of interrupt vectors. Four bytes are allocated to each of the four vectors. Starting from 0x10, the actual program code can be located. From 64KB of address space, the entire first half (up to address 0x8000) can be used by code; addresses 0xA000-0xA0FF correspond to a “display” - a discrete memory, each bit of which is equipped with an LED indicator. marksmade a mistake by writing "The amount of memory is 256 bytes." - This is the amount of "video memory", not the main memory for code and data.

Of the four interrupt vectors in our program, only a vector is used reset, and in the remaining vectors there is a “stub" from one instruction reti. (For programmers for x86 or ARM, the return command from the interrupt handler is familiar under mnemonics iret.) None of these interrupts in our program can happen anyway, so that we could not even put “stubs” for them.

reset:       jmp    start;
             nop;
ext_int:     reti;
             nop;
             nop;
             nop;        
div_zero:    reti;
             nop;
             nop;
             nop;        
illegal:     reti;
             nop;
             nop;
             nop;

The first thing after starting is to initialize the stack and variables. Let the stack grow down, starting at address 0x2000 - this is enough for us with a large margin. Only two variables will be needed: seedfor the current RNG value, and an array positionof 32 values - one for each “display column”, to keep track of where the “drop” creeps in this column. The array is initialized with just 32 random bytes. The command jsr— a subroutine call — matches callin x86 or blin ARM.

start:
        ld.w    r0, #0x2000;
        move    sp, r0;
        // set random positions
        ld.b    r1, #32;
init_loop:
        jsr     rand; // returns random value in r0
        ld.b    r2, #position;
        add     r2, r1;
        st.b    (r2), r0;
        addq    r1, #-1;
        bne     init_loop;

Since (#position + r1)it is impossible to write a byte to an address with one command, you must first calculate the address with a separate addition command.

The main part of the program is an endless cycle in which we go from right to left for each “display column”, and shift the “drop” in it one position down. The lower two bits of the “drop” indicate its color (3 - “on”; 0, 1 or 2 - “off”), the remaining six bits - the coordinate (0..63), so “shift down” means the addition of 4. How only the "drop" crawled to the bottom of the "display" (the value exceeded 255), we replace it with a new random byte.

busy_loop:
        ld.b    r1, #32;
next_col:
        ld.b    r2, #position;
        add     r2, r1;
        ld.b    r0, (r2);
        addq    r0, #2;
        addq    r0, #2;
        btst    r0, #8;
        beq     save;
        jsr     rand;
save:   st.b    (r2), r0;
        addq    r1, #-1;
        bmi     busy_loop;

It is impossible to add 4 with one command, so we repeat twice addq r0, #2, and then check the eighth bit of the result to determine whether it has exceeded the value of 255. If it exceeded, then save a positionnew random value to the array ; otherwise, we keep the old one, increased by 4. The conditional jump command bmigoes to the beginning of the cycle busy_loopif the result of the last action is negative, i.e. after processing the null column.

How will we generate random numbers? The RANDU that I used in the 32-bit Digital Rain is no longer suitable: Megaprocessor can only multiply 16-bit numbers; therefore, from the list of simple RNGs we take one where the multiplier is 16-bit. I liked the RNG, designated as “Turbo Pascal”.

rand:   ld.w    r0, seed;
        ld.w    r1, #33797;
        mulu;
        addq    r2, #1;
        st.w    seed, r2;
        move    r0, r2;
        ret;

This simple and pretty RNG returns the generated value to r0, but unfortunately, it spoils the values of all other registers. Note that in both cases, when we call rand, we have the r1index of the “display column” stored in it , and it needs to be saved and restored; and then there r2should be an offset (#position + r1). So, you can put the calculation of this displacement inside rand:

rand:   push    r1;            // !
        ld.w    r0, seed;
        ld.w    r1, #33797;
        mulu;
        addq    r2, #1;
        st.w    seed, r2;
        pop     r1;            // !
        move    r0, r2;
        ld.b    r2, #position; // !
        add     r2, r1;        // !
        ret;
start:  ld.w    r0, #0x2000;
        move    sp, r0;
        // set random positions
        ld.b    r1, #32;
init_loop:
        jsr     rand;
        st.b    (r2), r0;
        addq    r1, #-1;
        bne     init_loop;
busy_loop:
        ld.b    r1, #32;
next_col:
        ld.b    r2, #position;
        add     r2, r1;
        ld.b    r0, (r2);
        addq    r0, #2;
        addq    r0, #2;
        btst    r0, #8;
        beq     save;
        jsr     rand;
save:   st.b    (r2), r0;
        addq    r1, #-1;
        bmi     busy_loop;

The last trick here is that the calculation ld.b r2, #position; add r2, r1;at the beginning of the loop next_colcan be replaced with a jump inside the subprogram rand:

rand:   push    r1;
        ld.w    r0, seed;
        ld.w    r1, #33797;
        mulu;
        addq    r2, #1;
        st.w    seed, r2;
        pop     r1;
        move    r0, r2;
add_position:
        ld.b    r2, #position;
        add     r2, r1;
        ret;
start:  <...>
busy_loop:
        ld.b    r1, #32;
next_col:
        jsr     add_position; // !
        ld.b    r0, (r2);
        addq    r0, #2;
        addq    r0, #2;
        btst    r0, #8;
        beq     save;
        jsr     rand;
save:   st.b    (r2), r0;
        addq    r1, #-1;
        bmi     busy_loop;

Now the most interesting thing is the second half of the cycle next_col, which will draw a “drop” on the display.

        move    r3, r1; // x     (0..1f)
        lsr     r3, #3; // byte addr in row (0..3)
        ld.b    r2, #0xfc; // y mask
        and     r2, r0; // y * 4 (0..fc)
        add     r3, r2; // byte addr in screen
        ld.w    r2, #0xa000;
        add     r3, r2; // byte addr in memory
        ld.b    r2, #2;
        lsr.wt  r0, r2;
        ld.b    r2, #7;
        and     r2, r1; // bit index in byte (0..7)
        lsl     r2, #1;
        lsr     r0, #2;
        roxr    r2, #1;
        ld.b    r0, (r3);
        // and now apply
        test    r2;
        bpl     blank;
        bset    r0, r2;
        jmp     apply;
blank:  bclr    r0, r2;
apply:  st.b    (r3), r0;
        jmp     next_col;

To “light up” or “quench” the desired bit, you first need to calculate the address of the corresponding byte of “video memory”. Since we have the number of the "column" stored in r1, and the position and "color" of the drop in r0, the address of the byte is calculated as (r1 >> 3) + (r0 & 0xfc) + 0xa000. After that, with the commands ld.b r2, #2; lsr.wt r0, r2;we determine the color of the drop: if both the least significant bits in r0were set, then as a result of these commands the r0value will be 2; otherwise, the value is 0 or 1. Finally, in the lower three bits r2we memorize the number of the desired “video memory” bit, and “push” the r2drop color into the high-order bit by the sequence lsl r2, #1; lsr r0, #2; roxr r2, #1;- the second command pushes the color bit out r0into the CF flag, and the last (a cyclic shift to the right with involving CF) pushes this bit intor2. When there are not enough registers for all the necessary values, you have to get thin! Finally, a byte is retrieved from the "video memory" at the desired address, and depending on the color bit, this byte is either set or the desired bit is reset. Command bsetand bclruse only the least significant bits of its second operand, so a bit of color in the most significant bit r2they do not interfere. We check this high-order bit with a sequence test r2; bpl blank;- the conditional branch command bplexecutes the transition if the result of the last action is positive, i.e. a bit of color is taken off.

And here is the result:

Entire code

reset:       jmp    start;
             nop;
ext_int:     reti;
             nop;
             nop;
             nop;        
div_zero:    reti;
             nop;
             nop;
             nop;        
illegal:     reti;
             nop;
             nop;
             nop;
rand:   push    r1;
        ld.w    r0, seed;
        ld.w    r1, #33797;
        mulu;
        addq    r2, #1;
        st.w    seed, r2;
        pop     r1;
        move    r0, r2;
add_position:
        ld.b    r2, #position;
        add     r2, r1;
        ret;
start:  ld.w    r0, #0x2000;
        move    sp, r0;
        // set random positions
        ld.b    r1, #32;
init_loop:
        jsr     rand;
        st.b    (r2), r0;
        addq    r1, #-1;
        bne     init_loop;
busy_loop:
        ld.b    r1, #32;
next_col:
        jsr     add_position;
        ld.b    r0, (r2);
        addq    r0, #2;
        addq    r0, #2;
        btst    r0, #8;
        beq     save;
        jsr     rand;
save:   st.b    (r2), r0;
        addq    r1, #-1;
        bmi     busy_loop;
        move    r3, r1; // x     (0..1f)
        lsr     r3, #3; // byte addr in row (0..3)
        ld.b    r2, #0xfc; // y mask
        and     r2, r0; // y * 4 (0..fc)
        add     r3, r2; // byte addr in screen
        ld.w    r2, #0xa000;
        add     r3, r2; // byte addr in memory
        ld.b    r2, #2;
        lsr.wt  r0, r2;
        ld.b    r2, #7;
        and     r2, r1; // bit index in byte (0..7)
        lsl     r2, #1;
        lsr     r0, #2;
        roxr    r2, #1;
        ld.b    r0, (r3);
        // and now apply
        test    r2;
        bpl     blank;
        bset    r0, r2;
        jmp     apply;
blank:  bclr    r0, r2;
apply:  st.b    (r3), r0;
        jmp     next_col;
seed:   dw 1;
position:;

There was one final touch: make the “drops” blink like on a GIF-KDPV. In fact, this means that the program will work twice as slow: at each iteration of the cycle it busy_loopwill first light up and then extinguish every “drop”. On the igniting half-iteration, it will be necessary to set two bits of video memory: both for the current position of the “drop” and for the previous (extinguished by the last half-iteration).

So, the “drop" must be ignited if a) the two lower bits of its value are both set; b) we are on igniting half-iteration; - and extinguish in all other cases. The easiest way to implement all this is to replace a ld.b r2, #2; lsr.wt r0, r2;fixed value #2with a variable in the sequence of commands that defines the color of the drop ( ) flag, which will have a value of 2 on the igniting half-iteration, and 1 on the quenching one:

busy_loop:
        ld.b    r1, #3;   // !
        ld.b    r2, flag; // !
        sub     r1, r2;   // !
        st.b    flag, r1; // !
        ld.b    r1, #32;
next_col:
        jsr     add_position;
        ld.b    r0, (r2);
        ld.b    r3, flag; // !
        lsr     r3, #1;   // !
        lsl     r3, #2;   // !
        add     r0, r3;   // !
        btst    r0, #8;
        beq     save;
        jsr     rand;
save:   st.b    (r2), r0;
        addq    r1, #-1;
        bmi     busy_loop;
        move    r3, r1; // x     (0..1f)
        lsr     r3, #3; // byte addr in row (0..3)
        ld.b    r2, #0xfc; // y mask
        and     r2, r0; // y * 4 (0..fc)
        add     r3, r2; // byte addr in screen
        ld.w    r2, #0xa000;
        add     r3, r2; // byte addr in memory
        ld.b    r2, flag;  // !
        lsr.wt  r0, r2;

At the beginning of the cycle, busy_loopwe subtract the current value flagfrom 3, i.e. we change 2 to 1, and 1 to 2. Then, instead of moving the “drop” down at each iteration ( addq r0, #2; addq r0, #2;), we add to the r0value (flag >> 1) << 2, i.e. 4 on the igniting half-iteration, and 0 on the quenching.

The last thing to add is to set another bit on the igniting half-iteration, in byte at an offset of -4 from the “drop” itself:

        // and now apply
        test    r2;
        bpl     blank;
        bset    r0, r2;
        st.b    (r3), r0;  // !
        addq    r3, #-2;   // !
        addq    r3, #-2;   // !
        btst    r3, #8;    // !
        bne     next_col;  // !
        ld.b    r0, (r3);  // !
        bset    r0, r2;    // !
        jmp     apply;
blank:  bclr    r0, r2;
apply:  st.b    (r3), r0;
        jmp     next_col;

The check btst r3, #8; bne next_col;ensures that we do not go beyond the upper edge of the “display” and do not try to write something to the address 0x9FFx.

Now the drops are blinking, as intended:

Entire code

reset:       jmp    start;
             nop;
ext_int:     reti;
             nop;
             nop;
             nop;        
div_zero:    reti;
             nop;
             nop;
             nop;        
illegal:     reti;
             nop;
             nop;
             nop;
rand:   push    r1;
        ld.w    r0, seed;
        ld.w    r1, #33797;
        mulu;
        addq    r2, #1;
        st.w    seed, r2;
        pop     r1;
        move    r0, r2;
add_position:
        ld.b    r2, #position;
        add     r2, r1;
        ret;
start:  ld.w    r0, #0x2000;
        move    sp, r0;
        // set random positions
        ld.b    r1, #32;
init_loop:
        jsr     rand;
        st.b    (r2), r0;
        addq    r1, #-1;
        bne     init_loop;
busy_loop:
        ld.b    r1, #3;
        ld.b    r2, flag;
        sub     r1, r2;
        st.b    flag, r1;
        ld.b    r1, #32;
next_col:
        jsr     add_position;
        ld.b    r0, (r2);
        ld.b    r3, flag;
        lsr     r3, #1;
        lsl     r3, #2;
        add     r0, r3;
        btst    r0, #8;
        beq     save;
        jsr     rand;
save:   st.b    (r2), r0;
        addq    r1, #-1;
        bmi     busy_loop;
        move    r3, r1; // x     (0..1f)
        lsr     r3, #3; // byte addr in row (0..3)
        ld.b    r2, #0xfc; // y mask
        and     r2, r0; // y * 4 (0..fc)
        add     r3, r2; // byte addr in screen
        ld.w    r2, #0xa000;
        add     r3, r2; // byte addr in memory
        ld.b    r2, flag;
        lsr.wt  r0, r2;
        ld.b    r2, #7;
        and     r2, r1; // bit index in byte (0..7)
        lsl     r2, #1;
        lsr     r0, #2;
        roxr    r2, #1;
        ld.b    r0, (r3);
        // and now apply
        test    r2;
        bpl     blank;
        bset    r0, r2;
        st.b    (r3), r0;
        addq    r3, #-2;
        addq    r3, #-2;
        btst    r3, #8;
        bne     next_col;
        ld.b    r0, (r3);
        bset    r0, r2;
        jmp     apply;
blank:  bclr    r0, r2;
apply:  st.b    (r3), r0;
        jmp     next_col;
seed:   dw 1;
flag:   db 2;
position:;

Now, to try to run your own program on Megaprocessor, you need to agree with its creator about a visit to his home; but in a month, he said, Megaprocessor will move to the Cambridge Computer History Center , and will be available to the general public five days a week.

Good luck in megaprogramming!

Tags:

We program "Megaprocessor"

Also popular now: