Compilation. 9: executable code

    I remind you that we are writing a compiler for the toy language j-creak. We started by compiling into p-code , spent a lot of effort on its optimization , and prepared for the final stage of compilation - to output machine-dependent executable code.
    There are no intricate algorithms here: by and large, only replacing one system of commands with another.

    Further in the post:

    1. Code selection
    2. Loader
    3. Changes to p-code
    4. Generation
    5. What happened?

    Code selection

    I would venture to assume that you are reading this post on a computer with an x86 or x64 processor inside. I apologize to mobile device users: it would be interesting to generate code for a fashionable gadget, but it’s too difficult for me to get such a copy for experiments.

    We will try to generate a "cross-platform" code that is equally functional on x86 and x64. We focus on machine code, and not on the internals of the ELF format and interaction with the bootloader; therefore, we will create the code in a “solid piece”, similar to the dos format COM. Read it from disk and run will be our own bootloader.

    We will use only 32-bit registers, so the same program, even if overflows occur during calculations, will give the same results on x64 and x86.

    To communicate with the outside world (commands ECHOand INPUT), the generated code will have to somehow call the functions of the operating system. How does he know their addresses?
    Our loader will create for the loaded program a “communication area” - an array of pointers to “standard functions”, and pass the address of this array to the program as a parameter. We have three standard functions: input of a number, output of a number, output of a line; therefore, the communication area will consist of three pointers. For the convenience of addressing, immediately at the entrance to the program, save the transmitted address in EBP/ RBP.

    Another interesting point is access to memory (to poured variables). Instead of the typical x86 absolute addressing, x64 introduced "relocatable absolute", relative RIP. To really get the code cross-platform, you must use explicit relative addressing.
    Without further ado, we will address the cells of the poured variables with respect to the same EBP/ RBP- i.e. immediately after the communication region, by offset 0x18, the variable region begins.

    The generated p-code uses four “abstract physical registers” R01..R04, which naturally match real registersEAX,ECX,EDX,EBX: i.e. the number of this register is obtained from the number of "abstract" by subtracting one. If we wanted to use the rest of the processor registers, the correspondence between the numbers would be more complicated.

    The way the parameters are passed to the function on x86 and on x64 is very different. On x86, the default parameters are passed through the stack ( cdecl ), but you can ask to gccpass up to three parameters in registers ECX,EDX,EAX( fastcall ). On x64, six parameters are transferred in registers RDI,RSI,RDX,RCX,R8,R9, and it gccdoes not allow changing the way of calling. Due to this incompatibility, it will be necessary to implement both transmission mechanisms in the code generator, and select the necessary one. Fortunately, passing parameters will be the only platform-specific place in our code.
    On both processors, the called function has the right to spoil the contents EAX,ECX,EDX, so they will need to be saved before the call (if they are alive), and restored after. Thus, the information about the vitality of physical registers, which we used during their assignment, is also needed when generating the code.

    What code will be generated, sorted out; it’s up to the small thing - to realize the plan.


    We take as a basis the long-standing p-code interpreter, leave all the husks (checking input, mapping the file to memory), and replace the gut (the cycle of execution and implementation of commands) with a call to the read code, as if it were a normal c-function. Note that the one that defined the command definitions disappeared : the new bootloader knows nothing about the j-script, nor about our p-code.

    const char* fdata = NULL; // весь прочитанный код

    int input() { int d; scanf(" %d", &d); return d; }
    void echoi(int i) { printf("%d", i); }
    void echos(int offset) { printf("%s", fdata+offset); }

    void* linkarea[1000] = {(void*)input, (void*)echoi, (void*) echos};

    int main(int argc, char** argv) {

        if(argc!=2) {
            printf("Missing code file name.\n");

        int fd = open(argv[1], O_RDONLY);
        if (fd<0) {
            printf("Cannot open code file.\n");
        struct stat finfo;
        fstat(fd, &finfo);
        fdata = (const char*)mmap(0, finfo.st_size, PROT_READ|PROT_EXEC, MAP_PRIVATE, fd, 0);
        if (!fdata) {
            printf("Cannot read code file.\n");

        ((void (*)(void**)) fdata)(linkarea); // запуск

        munmap((void*)fdata, finfo.st_size);
        return 0;

    #include "jsk.h"

    Changes to p-code

    Now that the p-code has become an internal representation, invisible outside the compiler, there is no sense in compressing each command into 4 bytes; therefore, no tricky unionor field size restrictions are needed . Replace jsk.hwith the declaration of the usual structure with fields int: A feature of the x86 / x64 instruction set is its diversity and non-orthogonality. Almost any operation can be encoded in three to four different ways. We will try for each n-team to choose the most compact of the possible implementations. (This is the limiting case of “optimization through the peephole” - peephole optimization , which deals not with the program as a whole, but with short blocks of adjacent teams. In general, each team is processed independently of those around it.) For example, the p-command
    struct command {
        enum opcodes {
            hlt, store, jz, echo, mov, load, input, add, sub, mul, div,
            eq, ne, ge, le, gt, lt
        opcodes opcode;
        regnum dest, src1, src2;
        int imm;
        command(opcodes opcode, regnum dest, regnum src1, regnum src2) :
                opcode(opcode), dest(dest), src1(src1), src2(src2) {}
        command(opcodes opcode, regnum dest, int imm) :
                opcode(opcode), dest(dest), imm(imm) {}

    addwe can compile in MOV, in INC, in DEC, in one of three forms ADDor in one of three forms LEA- depending on the combination of arguments.

    Some local optimizations, such as replacing mul r, r, 2with add r, r, r, are conveniently performed at the p-code level, before being translated into executable code. At the same stage - after choosing physical registers, but before the final translation - we will get rid of the registers, the value of which is known in all places where they are used: during translation we will use the known value.
    // в подстановке констант:
    //      ...
            if(i->cmd.opcode==command::mul) { // несколько замен
                if(i->known.count(i->cmd.src1)) std::swap(i->cmd.src1, i->cmd.src2);
                if(i->known.count(i->cmd.src2)) switch(i->known[i->cmd.src2]) {
                    case -1: i->cmd = command(command::sub, i->cmd.dest, 0, i->cmd.src1); break;
                    case 0:  i->cmd = command(command::mov, i->cmd.dest, 0); break;
                    case 1:  nopOut(i); break;
                    case 2:  i->cmd = command(command::add, i->cmd.dest, i->cmd.src1, i->cmd.src1); break;

    // удалить регистры, значение которых известно всюду, где используется
    void postalloc() {
            std::vector needed(lastreg+1);
            foreach(i, pcode) {
                if(i->has2src()) {
                    if(!i->known.count(i->cmd.src1)) needed[i->cmd.src1]=true;
                    if(!i->known.count(i->cmd.src2)) needed[i->cmd.src2]=true;
                    else // src2 известен: есть один особый случай
                        if(i->cmd.opcode==command::div && i->known[i->cmd.src2]!=2)
                            needed[i->cmd.src2]=true; // под делитель нужен регистр
                if(i->readsdest() && !i->known.count(i->cmd.dest))
            foreach(i, pcode)
                if(i->writesdest() && !needed[i->cmd.dest])

    // после успешного выбора всех физических регистров:
                // ...
                // вычистить NOP-ы
                // подставляем физические регистры: и в командах, и в known
                foreach(i, pcode) {
                    std::map known;
                        known[physmap[i->cmd.dest]] = i->known[i->cmd.dest];
                    i->cmd.dest = physmap[i->cmd.dest];
                    if (i->has2src()) {
                            known[physmap[i->cmd.src1]] = i->known[i->cmd.src1];
                            known[physmap[i->cmd.src2]] = i->known[i->cmd.src2];
                        i->cmd.src1 = physmap[i->cmd.src1];
                        i->cmd.src2 = physmap[i->cmd.src2];
                    i->known = known;


    All the generated code will be stored in one large vector . We will need several new fields. will determine the position of the team in the vector . In addition, for the and commands , the offset value of which is unknown until the end of the code generation, we will store the index of the unfilled offset in the field . So, on the first pass through the p-code, we generate all the executable code in the vector ; then go through the code a second time and fill in the missing offsets. After that, we print the result of all the code and all the lines. I tried to remove the most uninteresting pieces from the listing, in order to at least slightly reduce it. The full compiler code is available at code;
    struct commandnint offset, length;codeJZECHOint needfixat;


    // последний этап: генерация кода для x86/x64
    const char regsize = sizeof(void*); // 4 либо 8
    // пролог
    if(regsize==4) { // PUSH EBP / MOV EBP, [ESP+8] / PUSH EDI / PUSH ESI / PUSH EBX
        code.push_back(0x55); code.push_back(0x8b); code.push_back(0x6c); code.push_back(0x24);
        code.push_back(0x08); code.push_back(0x57); code.push_back(0x56); code.push_back(0x53);
    } else {         // PUSH EBP / MOV RBP, RDI / PUSH EBX
        code.push_back(0x55); code.push_back(0x48); code.push_back(0x8b); code.push_back(0xef);
    foreach(i, pcode) {
        int imm, doffset, joffset;
        bool jshort, saveAX, saveDX;
        i->offset = code.size();
        switch(i->cmd.opcode) {
        case command::hlt:
            if(regsize==4) {
                i->emit(0x5b, 0x5e, 0x5f, 0x5d); // POP EBX / POP ESI / POP EDI / POP EBP
                i->emit(0xc2, 2, 0);             // RET 2
            } else
                i->emit(0x5b, 0x5d, 0xc3);       // POP EBX / POP EBP / RET
        case command::store: // MOV [EBP+18+(imm-1)*4], dst
            doffset = 0x18+(i->cmd.imm-1)*4;
            if(i->known.count(i->cmd.dest)) {
                    i->emit(0x45, (char)doffset);
                else {
                    i->emit14(0x85, doffset);
            } else {
                    i->emit(0x45|(i->cmd.dest-1)<<3, (char)doffset);
                    i->emit14(0x85|(i->cmd.dest-1)<<3, doffset);
        case command::jz:
            joffset = i->tgt->offset - (i->offset+2); // предполагаем JMP SHORT
            jshort = i->tgt->offset && (joffset>=-128); // прыжок назад и близко
            if(!i->cmd.dest) { // JMP off
                    i->emit(0xeb, (char)joffset);
                else {
                    if(i->tgt->offset) // прыжок назад
                    else {
                        i->needfixat = code.size();
            if(jshort && (i->cmd.dest==2)) { // JECXZ
                i->emit(0xe3, (char)joffset);
            // OR dst, dst / JZ off
            i->emit(0x0b, 0xc0|(i->cmd.dest-1)<<3|(i->cmd.dest-1));
            if(jshort && (joffset>=-126))
                i->emit(0x74, (char)(joffset-2));
            else {
                i->emit(0x0f, 0x84);
                if(i->tgt->offset) // прыжок назад
                else {
                    i->needfixat = code.size();
        case command::echo:  // PUSH live / PUSH dst / CALL [EBP+?] / ADD ESP, 4 / POP live
            foreach(rp, i->onexitp) if(*rp!=4) i->emit(0x50|(*rp-1));
            if(!i->cmd.dest) { // imm, [EBP+8]
                if(regsize==4) i->emit(0x68); else i->emit(0xbf);
                i->needfixat = code.size();
                i->emit(0xff, 0x55, 2*regsize);
            } else {
                if(i->known.count(i->cmd.dest)) { // imm, [EBP+4]
                    imm = i->known[i->cmd.dest];
                        i->emit14(0xbf, imm);
                    else if((imm>=-128)&&(imm<128))
                        i->emit(0x6a, (char)imm);
                        i->emit14(0x68, imm);
                } else if(regsize==4) // dst, [EBP+4]
                    i->emit(0x8b, 0xf8|(i->cmd.dest-1));
                i->emit(0xff, 0x55, regsize);
            if(regsize==4) i->emit(0x83, 0xc4, 4);
            foreachr(rp, i->onexitp) if(*rp!=4) i->emit(0x58|(*rp-1));
        case command::mov:   // MOV dst, imm
                i->emit14(0xb8|(i->cmd.dest-1), i->cmd.imm);
            else             // XOR dst, dst
                i->emit(0x33, 0xc0|(i->cmd.dest-1)<<3|(i->cmd.dest-1));
        case command::load:  // MOV dst, [EBP+(3+i)*4]
            doffset = 0x18+(i->cmd.imm-1)*4;
                i->emit(0x45|(i->cmd.dest-1)<<3, (char)doffset);
                i->emit14(0x85|(i->cmd.dest-1)<<3, doffset);
        case command::input: // PUSH live / CALL [EBP+0] / XCHG EAX, dst / POP live
            foreach(rp, i->onenterp) if(*rp!=4) i->emit(0x50|(*rp-1));
            i->emit(0xff, 0x55, 0);
            if(i->cmd.dest!=1) i->emit(0x90|(i->cmd.dest-1));
            foreachr(rp, i->onenterp) if(*rp!=4) i->emit(0x58|(*rp-1));
        case command::add:
            // максимум 1 операнд известен; поставим его в src2
            if(i->known.count(i->cmd.src1) || (i->cmd.src2==i->cmd.dest))
                std::swap(i->cmd.src1, i->cmd.src2);
            if(!i->cmd.src2) // MOV dst, src1
                i->emit(0x8b, 0xc0|(i->cmd.dest-1)<<3|(i->cmd.src1-1));
            else if(i->known.count(i->cmd.src2)) {
                imm = i->known[i->cmd.src2];
                if(i->cmd.dest==i->cmd.src1) // ADD dst, imm
                    if(imm==1) // INC dst
                        i->emit(0xff, 0xc0|(i->cmd.dest-1));
                    else if(imm==-1) // DEC dst
                        i->emit(0xff, 0xc8|(i->cmd.dest-1));
                    else if((imm>=-128)&&(imm<128))
                        i->emit(0x83, 0xc0|(i->cmd.dest-1), (char)imm);
                    else // for imm=128 we might use SUB Edst, -128
                        i->emit114(0x81, 0xc0|(i->cmd.dest-1), imm);
                else // LEA dst, [src1+imm]
                        i->emit(0x8d, 0x40|(i->cmd.dest-1)<<3|(i->cmd.src1-1), (char)imm);
                        i->emit114(0x8d, 0x80|(i->cmd.dest-1)<<3|(i->cmd.src1-1), imm);
            } else if(i->cmd.dest==i->cmd.src1) // ADD dst, src2
                i->emit(0x03, 0xc0|(i->cmd.dest-1)<<3|(i->cmd.src2-1));
            else // LEA dst, [src1+src2]
                i->emit(0x8d, (i->cmd.dest-1)<<3|4, (i->cmd.src1-1)<<3|(i->cmd.src2-1));
        case command::sub: // ...
        case command::mul: // ...
        case command::div:
            if(i->known.count(i->cmd.src2) && i->known[i->cmd.src2]==2) {
                if(i->cmd.dest!=i->cmd.src1) // MOV dst, src1 / SAR dst, 1
                    i->emit(0x8b, 0xc0|(i->cmd.dest-1)<<3|(i->cmd.src1-1));
                i->emit(0xd1, 0xf8|(i->cmd.dest-1));
            // для деления нужны именно EAX,EDX;
            // сохраним их текущее содержимое в EDI,ESI, а после деления восстановим
            // MOV EDI, EAX / MOV EAX, src1 / MOV ESI, EDX / XOR EDX, EDX
            // IDIV src2 / XCHG dst, EAX / MOV EDX, ESI / MOV EAX, EDI
            saveAX = i->onexitp.count((physreg)1) && (i->cmd.dest!=1);
            saveDX = i->onexitp.count((physreg)3) && (i->cmd.dest!=3);
            if(saveAX || (i->cmd.src2==1)) i->emit(0x8b, 0xf8);
                    i->emit14(0xb8, i->known[i->cmd.src1]);
                    i->emit(0x8b, 0xc0|(i->cmd.src1-1));
            if(saveDX || (i->cmd.src2==3)) i->emit(0x8b, 0xf2);
            i->emit(0x33, 0xd2);
            if(i->cmd.src2==1) // переехал в EDI
                i->emit(0xf7, 0xff);
            else if(i->cmd.src2==3) // переехал ESI
                i->emit(0xf7, 0xfe);
                i->emit(0xf7, 0xf8|(i->cmd.src2-1));
            if(saveDX) i->emit(0x8b, 0xd6);
            if(saveAX) i->emit(0x8b, 0xc7);
        case command::eq: case command::ne: case command::ge:
        case command::le: case command::gt: case command::lt:
            // максимум 1 операнд известен; поставим его в src2
            if(i->known.count(i->cmd.src1)) {
                std::swap(i->cmd.src1, i->cmd.src2);
                switch(i->cmd.opcode) {
                case command::ge: i->cmd.opcode = command::le; break;
                case command::le: i->cmd.opcode = command::ge; break;
                case command::gt: i->cmd.opcode = command::lt; break;
                case command::lt: i->cmd.opcode = command::gt; break;
            // CMP src1, src2 / SETcc dstL / MOVZX dst, dstL
            if(i->known.count(i->cmd.src2)) {
                imm = i->known[i->cmd.src2];
                    i->emit(0x83, 0xf8|(i->cmd.src1-1), (char)imm);
                else if(i->cmd.src1==1)
                    i->emit14(0x3d, imm);
                    i->emit114(0x81, 0xf8|(i->cmd.src1-1), imm);
            } else
                i->emit(0x3b, 0xc0|(i->cmd.src1-1)<<3|(i->cmd.src2-1));
            switch(i->cmd.opcode) {
            case command::eq: i->emit(0x94); break;
            case command::ne: i->emit(0x95); break;
            // ...
            i->emit(0x0f, 0xb6, 0xc0|(i->cmd.dest-1)<<3|(i->cmd.dest-1));
            yyerror("unknown opcode");
    // после всего кода -- строки
    int offset = code.size();
    std::vector offsets(strings.size()+1);
    foreach(i, strings) {
        offsets[i->second] = offset;
        offset += i->first.length();
    // второй проход: подставляем смещения
    foreach(i, pcode) if(i->needfixat)
        else if (i->cmd.opcode==command::echo)

    write(1, &*code.begin(), code.size()); // вывод кода
    foreach(i, strings)                     // вывод строк
        write(1, i->first.c_str(), i->first.length()+1);

    What happened?

    We compile the compiler, compile a test program with it, compile the bootloader, and run: I wonder what our program compiled into?

    [tyomitch@home ~]$ bison jsk.y
    [tyomitch@home ~]$ c++ lex.yy.c -o jskc
    [tyomitch@home ~]$ ./jskc < test.jsk > code
    [tyomitch@home ~]$ cc jskld.c -o jskld
    [tyomitch@home ~]$ ./jskld code
    Задумай число от 0 до 1000, а я буду угадывать
    Это 500? (1=меньше, 2=больше, 3=попал) 1
    Это 249? (1=меньше, 2=больше, 3=попал) 3
    Ура! Я молодец!

    00:55 push rbp
    01: 48 8b ef mov rbp, rdi
    04:39 push rbx
    05:33 c0 xor eax, eax
    07: b9 e8 03 00 00 mov ecx, 0x3e8
    0c: 50 push rax
    0d: 51 push rcx
    0e: bf 80 01 00 00 mov edi, 0x180
    13: ff 55 10 call qword ptr [rbp + 0x10]
    16: 59 pop rcx
    17: 58 pop rax
    18: 50 push rax
    19: 51 push rcx
    1a: bf 00 00 00 00 mov edi, 0x0
    1f: ff 55 08 call qword ptr [rbp + 0x8]
    22: 59 pop rcx
    23: 58 pop rax
    24: 50 push rax
    25: 51 push rcx
    26: bf fa 00 00 00 mov edi, 0xfa
    2b: ff 55 10 call qword ptr [rbp + 0x10]
    2e: 59 pop rcx
    2f: 58 pop rax
    30: 50 push rax
    31: 51 push rcx
    32: bf e8 03 00 00 mov edi, 0x3e8
    37: ff 55 08 call qword ptr [rbp + 0x8]
    3a: 59 pop rcx
    3b: 58 pop rax
    3c: 50 push rax
    3d: 51 push rcx
    3e: bf 01 01 00 00 mov edi, 0x101
    43: ff 55 10 call qword ptr [rbp + 0x10]
    46: 59 pop rcx
    47: 58 pop rax
    48: 3b c1 cmp eax, ecx
    4a: 0f 9e c2 setle dl
    4d: 0f b6 d2 movzx edx, dl
    50: 0b d2 or edx, edx
    52: 0f 84 97 00 00 00 je 0xef
    58: 8d 14 01 lea edx, [rcx + rax * 1]
    5b: d1 fa sar edx, 1
    5d: 50 push rax
    5e: 51 push rcx
    5f: 52 push rdx
    60: bf bc 01 00 00 mov edi, 0x1bc
    65: ff 55 10 call qword ptr [rbp + 0x10]
    68: 5a pop rdx
    69: 59 pop rcx
    6a: 58 pop rax
    6b: 50 push rax
    6c: 51 push rcx
    6d: 52 push rdx
    6e: 8b fa mov edi, edx
    70: ff 55 08 call qword ptr [rbp + 0x8]
    73: 5a pop rdx
    74: 59 pop rcx
    75: 58 pop rax
    76: 50 push rax
    77: 51 push rcx
    78: 52 push rdx
    79: bf 26 01 00 00 mov edi, 0x126
    7e: ff 55 10 call qword ptr [rbp + 0x10]
    81: 5a pop rdx
    82: 59 pop rcx
    83: 58 pop rax
    84: 50 push rax
    85: 51 push rcx
    86: 52 push rdx
    87: ff 55 00 call qword ptr [rbp + 0x0]
    8a: 93 xchg ebx, eax
    8b: 5a pop rdx
    8c: 59 pop rcx
    8d: 58 pop rax
    8e: 89 45 18 mov dword ptr [rbp + 0x18], eax
    91: 83 fb 01 cmp ebx, 0x1
    94: 0f 94 c0 sete al
    97: 0f b6 c0 movzx eax, al
    9a: 0b c0 or eax, eax
    9c: 0f 84 08 00 00 00 je 0xaa
    a2: 8b 45 18 mov eax, dword ptr [rbp + 0x18]
    a5: 8d 4a ff lea ecx, [rdx-0x1]
    a8: eb 9e jmp 0x48
    aa: 83 fb 02 cmp ebx, 0x2
    ad: 0f 94 c0 sete al
    b0: 0f b6 c0 movzx eax, al
    b3: 0b c0 or eax, eax
    b5: 0f 84 04 00 00 00 je 0xbf
    bb: 03 c2 add eax, edx
    bd: eb 89 jmp 0x48
    bf: 8b 45 18 mov eax, dword ptr [rbp + 0x18]
    c2: 83 fb 03 cmp ebx, 0x3
    c5: 0f 94 c2 sete dl
    c8: 0f b6 d2 movzx edx, dl
    cb: 0b d2 or edx, edx
    cd: 0f 84 0b 00 00 00 je 0xde
    d3: bf a0 01 00 00 mov edi, 0x1a0
    d8: ff 55 10 call qword ptr [rbp + 0x10]
    db: 5b pop rbx
    dc: 5d pop rbp
    dd: c3 ret
    de: 50 push rax
    df: 51 push rcx
    e0: bf c4 01 00 00 mov edi, 0x1c4
    e5: ff 55 10 call qword ptr [rbp + 0x10]
    e8: 59 pop rcx
    e9: 58 pop rax
    ea: e9 59 ff ff ff jmp 0x48
    ef: bf 59 01 00 00 mov edi, 0x159
    f4: ff 55 10 call qword ptr [rbp + 0x10]
    f7: 5b pop rbx
    f8: 5d pop rbp
    f9: c3 ret

    With the naked eye you can see at least two relevant optimizations that need an “eye” for more than one team:
    • when the command ends on POP reg, and the next starts on PUSH reg, both of these instructions can be deleted;
    • the combination SETcc regl / MOVZX reg, regl / OR reg, reg / JZ offsetcan be replaced with one instruction JNcc offset, if regnot alive at the exit JZ.
    This “big peephole optimization” is what we’ll do next time. The more important remaining task is to generate a real ELF so that you don't need a makeshift bootloader.

    Also popular now: