
Compilation. 6: intermediate code
The first stage - analysis of the syntax of our j-squeak - passed; getting to the code generation.
Let's start by generating p-code (an intermediate portable pseudo-code) - something like an "abstract machine language." He is chosen so that
Often the p-code is made stack (for example, MSIL): imagine a processor inside which there are no numbered registers, but there is one large stack, and all actions are performed with upper values on the stack. (Those who have programmed for x87 are familiar with this model.) Stack code is really convenient to generate and convenient to execute, but not very convenient to process - for example, it is difficult to track dependencies between teams.
The creator of the Beep language expressively speaks about the choice between the stack and register intermediate code :
So, instead of the stack, we will take a three-address code in the spirit of MIPS:
In addition to the commands corresponding to computational operations, we also need:
Working with memory is not yet relevant for us: if not all variables can be placed in registers, then the programmer was out of luck. Operations that are not yet in the j-script, such as calling functions, all the more, we do not put them in the p-code.
We arrange the opcodes so that commands of a similar structure (in terms of the registers used) go in a row, and we take out the definition of the command structure in a separate file
All variables in the j-creak are global. In a separate,
In general, the idea is the same as with a formatted printout; only now in each node of the tree we store not a text, but a list of commands. For expression elements, we also store the register number in which the result of the calculation.
We take the parser and let's go: we replace the method
(340 lines of code were posted at http://tyomitch.net.ru/jsk.y.html to keep the post size within the acceptable range.)
As you can see, we needed to “split” the node
We map the file with the p-code into memory, and we will work with it as an array of structures
Now, in each node of the tree, a list of all the commands corresponding to it is stored, and each implementation
The most dramatic difference is that now we don’t need a tree at all: for each symbol it is enough to store one scalar . Moreover: operator symbols now have no meaning at all - the whole result of their analysis immediately falls into the vector of the finished p-code; therefore, convolutions where code is not generated do not even need to be inherited.
A small problem arises in constructions of the type
To insert a dummy jump after checking the conditions in
The trick is that the convolution
The distance to jump forward now we know; But how to
An additional advantage of the new system is that for each generated command its address is immediately known, so that for strings we can store not temporary “identifiers”, but lists of all links. This will simplify the substitution of line addresses at the final stage of p-code generation. The compiler code is almost halved, and it compiles in exactly the same way as before. If it makes no difference, why write more?
Let's start by generating p-code (an intermediate portable pseudo-code) - something like an "abstract machine language." He is chosen so that
- it was easy to generate;
- It was easy to handle.
Further in the post:
- Code selection
- Compilation
- Performance
- Backpatching
Code selection
Often the p-code is made stack (for example, MSIL): imagine a processor inside which there are no numbered registers, but there is one large stack, and all actions are performed with upper values on the stack. (Those who have programmed for x87 are familiar with this model.) Stack code is really convenient to generate and convenient to execute, but not very convenient to process - for example, it is difficult to track dependencies between teams.
The creator of the Beep language expressively speaks about the choice between the stack and register intermediate code :
Register without options:
Simplification of runtime. Less manipulation with pointers. There is no concept of stack overflow. Less code, less memory work - less space for critical errors.
Compilation complexity increases: a phase of register allocation appears. In the case of execution on a virtual machine, the number of registers is not important for us, we can make enough of them to not allocate at all, but simply map all the parameters and function variables to the registers (see Lua). If the number of parameters exceeds the number of registers, then you can select the activation record part in heap, but it is easier to make the compiler offer the author of such a code to heal his head.
In any case, if there is a question of simplifying runtime at the cost of complicating the compiler, this should be done.
Optimization opportunity: mapping of N registers of a virtual machine to processor registers. On a stacked machine, this is much more difficult.
So, instead of the stack, we will take a three-address code in the spirit of MIPS:
- All commands are of equal length (we have 4 bytes each)
- The first byte is the command number (by a separate opcode for each possible operation)
- The second byte is the register number for the result (256 registers are enough for anyone!)
- The remainder is either two numbers of the operand registers, or the immediate value is the operand.
In addition to the commands corresponding to computational operations, we also need:
- loading the direct value into the register;
- reading from memory to register;
- write from register to memory;
- conditional jump;
- the output of a string or number;
- entering a number;
- stop.
Working with memory is not yet relevant for us: if not all variables can be placed in registers, then the programmer was out of luck. Operations that are not yet in the j-script, such as calling functions, all the more, we do not put them in the p-code.
We arrange the opcodes so that commands of a similar structure (in terms of the registers used) go in a row, and we take out the definition of the command structure in a separate file
jsk.h
: both the compiler and the interpreter will need it.
So that one byte is really allocated under the opcode, you must specify the key during compilationtypedef unsigned char regnum;
struct command {
enum opcodes {
hlt,
store, // dst>
jz, // dst>
echo, // dst>
mov, // >dst
load, // >dst
input, // >dst
add, // src>dst
sub, // src>dst
mul, // src>dst
div, // src>dst
eq, // src>dst
ne, // src>dst
ge, // src>dst
le, // src>dst
gt, // src>dst
lt // src>dst
};
opcodes opcode;
regnum dest;
union {
struct {
regnum src1, src2;
};
short imm;
};
command(opcodes opcode, regnum dest, regnum src1, regnum src2) :
opcode(opcode), dest(dest), src1(src1), src2(src2) {}
command(opcodes opcode, regnum dest, short imm) :
opcode(opcode), dest(dest), imm(imm) {}
};
-fshort-enums
Compilation
echo
We will store the
lines for along with the program code, at the very end; combine the same lines so that only one copy is stored. To do this, we will store map
all lines where the value is the "identifier" of the line (its serial number in the program). All variables in the j-creak are global. In a separate,
map
we will store by the name of the variable the number of the register allocated to it. In general, the idea is the same as with a formatted printout; only now in each node of the tree we store not a text, but a list of commands. For expression elements, we also store the register number in which the result of the calculation.
We take the parser and let's go: we replace the method
print()
with a similar one that essentially emit()
glues the lists of commands of the child nodes into one big list.(340 lines of code were posted at http://tyomitch.net.ru/jsk.y.html to keep the post size within the acceptable range.)
As you can see, we needed to “split” the node
value
into three different types: literal-number, literal- string, and variable reference. When formatting the code, the differences between them were insignificant, but during generation it was already necessary to distinguish them.Performance
We map the file with the p-code into memory, and we will work with it as an array of structures
command
. Actually, execution is a cycle of 4 lines, and the implementation of command functions; most of the code is helper husks.
Fanfare! We launch the world's first program on a j-script:
If you look closely at the dump, you will notice that the first byte is occupied by the p-code (four bytes), and the rest of the file is the lines in UTF-8.#include
#include
#include
#include
#include
#include "jsk.h"
int pc = 0; // индекс команды (не смещение)
bool halted = false;
int mem[1000]; // размер не важен: всё равно пока не пользуемся
typedef int (*op)(int src1, int src2, int dest, int imm); // все возможные входные значения
const char* fdata = NULL; // весь прочитанный п-код
extern op ops[]; // объявлен ниже
int main(int argc, char** argv) {
if(argc!=2) {
printf("Missing pcode file name.\n");
exit(1);
}
int fd = open(argv[1], O_RDONLY);
if (fd<0) {
printf("Cannot open pcode file.\n");
exit(1);
}
struct stat finfo;
fstat(fd, &finfo);
fdata = (const char*)mmap(0, finfo.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
if (!fdata) {
printf("Cannot read pcode file.\n");
exit(1);
}
const command* pcode = (const command*) fdata;
int r[256] = {0}; // registers
while (!halted) {
command next = pcode[pc++];
r[next.dest] = ops[next.opcode](r[next.src1], r[next.src2], r[next.dest], next.imm);
}
munmap((void*)fdata, finfo.st_size);
close(fd);
return 0;
}
int hlt(int src1, int src2, int dest, int imm) { halted = true; return dest; }
int store(int src1, int src2, int dest, int imm) { mem[imm] = dest; return dest; }
int jz(int src1, int src2, int dest, int imm) { if (!dest) pc+=imm; return dest; }
int echo(int src1, int src2, int dest, int imm) { if (imm) printf("%s", fdata+imm); else printf("%d", dest); return dest; }
int mov(int src1, int src2, int dest, int imm) { return imm; }
int load(int src1, int src2, int dest, int imm) { return mem[imm]; }
int input(int src1, int src2, int dest, int imm) { int d; scanf(" %d", &d); return d; }
int add(int src1, int src2, int dest, int imm) { return src1+src2; }
int sub(int src1, int src2, int dest, int imm) { return src1-src2; }
int mul(int src1, int src2, int dest, int imm) { return src1*src2; }
int div(int src1, int src2, int dest, int imm) { return src1/src2; }
int eq(int src1, int src2, int dest, int imm) { return src1==src2; }
int ne(int src1, int src2, int dest, int imm) { return src1!=src2; }
int ge(int src1, int src2, int dest, int imm) { return src1>=src2; }
int le(int src1, int src2, int dest, int imm) { return src1<=src2; }
int gt(int src1, int src2, int dest, int imm) { return src1>src2; }
int lt(int src1, int src2, int dest, int imm) { return src1
op ops[] = {hlt, store, jz, echo, mov, load, input, add, sub, mul, div, eq, ne, ge, le, gt, lt};
[tyomitch@home ~]$ bison jsk.y
[tyomitch@home ~]$ c++ -fshort-enums jsk.tab.c lex.yy.c -o jskc
[tyomitch@home ~]$ c++ -fshort-enums jsk.c -o jsk
[tyomitch@home ~]$ ./jskc < test.jsk > pcode
[tyomitch@home ~]$ hexdump -C pcode
00000000 04 01 00 00 04 02 e8 03 03 00 26 01 03 01 00 00 |..........&.....|
00000010 03 00 a0 00 03 02 00 00 03 00 a7 00 0e 03 01 02 |................|
00000020 02 03 1d 00 07 04 01 02 04 05 02 00 0a 06 04 05 |................|
00000030 03 00 62 01 03 06 00 00 03 00 cc 00 06 07 00 00 |..b.............|
00000040 04 08 01 00 0b 09 07 08 02 09 04 00 04 0a 01 00 |................|
00000050 08 0b 06 0a 07 02 0b 00 02 00 0e 00 04 0c 02 00 |................|
00000060 0b 0d 07 0c 02 0d 04 00 04 0e 01 00 07 0f 06 0e |................|
00000070 07 01 0f 00 02 00 07 00 04 10 03 00 0b 11 07 10 |................|
00000080 02 11 03 00 03 00 46 01 00 00 00 00 02 00 01 00 |......F.........|
00000090 03 00 6a 01 02 00 e1 ff 03 00 ff 00 00 00 00 00 |..j.............|
000000a0 20 d0 b4 d0 be 20 00 2c 20 d0 b0 20 d1 8f 20 d0 | .... ., .. .. .|
000000b0 b1 d1 83 d0 b4 d1 83 20 d1 83 d0 b3 d0 b0 d0 b4 |....... ........|
000000c0 d1 8b d0 b2 d0 b0 d1 82 d1 8c 0a 00 3f 20 20 28 |............? (|
000000d0 31 3d d0 bc d0 b5 d0 bd d1 8c d1 88 d0 b5 2c 20 |1=............, |
000000e0 32 3d d0 b1 d0 be d0 bb d1 8c d1 88 d0 b5 2c 20 |2=............, |
000000f0 33 3d d0 bf d0 be d0 bf d0 b0 d0 bb 29 20 00 d0 |3=..........) ..|
00000100 92 d1 80 d1 91 d1 88 d1 8c 2c 20 d1 82 d0 b0 d0 |........., .....|
00000110 ba 20 d0 bd d0 b5 20 d0 b1 d1 8b d0 b2 d0 b0 d0 |. .... .........|
00000120 b5 d1 82 21 0a 00 d0 97 d0 b0 d0 b4 d1 83 d0 bc |...!............|
00000130 d0 b0 d0 b9 20 d1 87 d0 b8 d1 81 d0 bb d0 be 20 |.... .......... |
00000140 d0 be d1 82 20 00 d0 a3 d1 80 d0 b0 21 20 d0 af |.... .......! ..|
00000150 20 d0 bc d0 be d0 bb d0 be d0 b4 d0 b5 d1 86 21 | ..............!|
00000160 0a 00 d0 ad d1 82 d0 be 20 00 d0 af 20 d0 bd d0 |........ ... ...|
00000170 b8 d1 87 d0 b5 d0 b3 d0 be 20 d0 bd d0 b5 20 d0 |......... .... .|
00000180 bf d0 be d0 bd d1 8f d0 bb 21 0a 00 |.........!..|
0000018c
[tyomitch@home ~]$ ./jsk pcode
Задумай число от 0 до 1000, а я буду угадывать
Это 500? (1=меньше, 2=больше, 3=попал) 2
Это 750? (1=меньше, 2=больше, 3=попал) 2
Это 875? (1=меньше, 2=больше, 3=попал) 2
Это 938? (1=меньше, 2=больше, 3=попал) 1
Это 906? (1=меньше, 2=больше, 3=попал) 1
Это 890? (1=меньше, 2=больше, 3=попал) 2
Это 898? (1=меньше, 2=больше, 3=попал) 2
Это 902? (1=меньше, 2=больше, 3=попал) 1
Это 900? (1=меньше, 2=больше, 3=попал) 1
Это 899? (1=меньше, 2=больше, 3=попал) 1
Врёшь, так не бывает!
0xa0
Backpatching
Now, in each node of the tree, a list of all the commands corresponding to it is stored, and each implementation
emit
includes a union of commands from child nodes - in the same order (from left to right) in which these nodes were created during parsing. It is possible to save both memory for storing commands and a code for combining them if all generated commands are immediately blamed on the result and only “meta-information” such as register numbers is stored in symbols. The most dramatic difference is that now we don’t need a tree at all: for each symbol it is enough to store one scalar . Moreover: operator symbols now have no meaning at all - the whole result of their analysis immediately falls into the vector of the finished p-code; therefore, convolutions where code is not generated do not even need to be inherited.
A small problem arises in constructions of the type
if
and while
, where after checking the condition you need to insert a jump forward if the condition is not met; and until the end of the analysis of the structure it is not known how much to jump. We'll have to leave the dummy team at the jump site and fill it at the end of the analysis. Such a system of one-pass code generation with dummies, and their subsequent "patching", is called backpatching . It is very versatile, and not only allows you to compile all the usual control structures, but also simplifies the implementation of operators such break
as jumping forward to an unknown distance. To insert a dummy jump after checking the conditions in
if
and while
, add a marker to the grammar - an "empty character":OP: 'if' '(' EXPR ')' M OP 'else' M OP; | 'while' '(' EXPR ')' M OP; M:
The trick is that the convolution
M
will be performed after the convolution EXPR
(which will generate a condition verification code) and before the convolution OP
, so that M
we can add the generation of a dummy to the convolution code . When we collapse the entire structure, fill the dummy with a jump until the next dummy (after if
) or until the end of the entire generated code (after else
and while
). The distance to jump forward now we know; But how to
while
find out the distance to jump back at the end of the cycle? After all, since there are no lists of commands, it means that we can’t find out in the convolution code how many teams the whole structure took. We’ll have to get another marker N
, which does not generate a code, but simply remembers the address of the desired place:OP: 'while' N '(' EXPR ')' M OP; M: N:;
An additional advantage of the new system is that for each generated command its address is immediately known, so that for strings we can store not temporary “identifiers”, but lists of all links. This will simplify the substitution of line addresses at the final stage of p-code generation. The compiler code is almost halved, and it compiles in exactly the same way as before. If it makes no difference, why write more?
%{
#include
#include
#include
#include