x86 assembly guide for beginners

Nowadays, it is rarely necessary to write pure assembly, but I definitely recommend trying it to anyone interested in programming. You will see things from a different angle, and the skills will come in handy when debugging code in other languages.

In this article we will write a reverse Polish notation (RPN) calculator from scratch in pure x86 assembly. When we are done, we will be able to use it like this:

$ ./calc "32+6*"  # "(3+2)*6" in infix notation
30

All the code for this article is here. It is heavily commented and can serve as study material for those who already know assembly.

Let's start by writing a basic Hello world! program to check the environment setup. Then we will move on to system calls, the call stack, stack frames, and the x86 calling convention. After that, for practice, we will write some basic functions in x86 assembly, and then start writing the RPN calculator.

It is assumed that the reader has some C programming experience and basic knowledge of computer architecture (for example, what a processor register is). Since we will use Linux, you should also be comfortable with the Linux command line.

Environment setup

As already mentioned, we use Linux (64- or 32-bit). The code in this article does not work on Windows or Mac OS X.

All you need to install is the GNU linker ld from binutils, which is preinstalled on most distros, and the NASM assembler. On Ubuntu and Debian, you can install both with one command:

$ sudo apt-get install binutils nasm

I would also recommend keeping an ASCII table handy.

Hello, world!

To test your environment, save the following code in a file calc.asm:

; The linker looks for the _start symbol and begins program execution
; there.
global _start
; The .rodata section holds constants (read-only data)
; The order of sections does not matter, but I like to put this one first
section .rodata
    ; Declare some bytes labeled hello_world. NASM's db pseudo-instruction
    ; accepts a single-byte value, a string constant, or a combination of them,
    ; as here. 0xA = newline, 0x0 = string-terminating null
    hello_world: db "Hello world!", 0xA, 0x0
; Start of the .text section, where the program code lives
section .text
_start:
    mov eax, 0x04           ; put the number 4 into eax (0x04 = write())
    mov ebx, 0x1            ; file descriptor (1 = standard output, 2 = standard error)
    mov ecx, hello_world    ; pointer to the string to print
    mov edx, 13             ; length in bytes: 12 characters plus the newline
                            ;   (the terminating null is not written)
    int 0x80                ; raise interrupt 0x80, which the OS
                            ;   interprets as a system call
    mov eax, 0x01           ; 0x01 = exit()
    mov ebx, 0              ; 0 = no errors
    int 0x80

The comments explain the general structure. For a list of registers and common instructions, see the University of Virginia's x86 Assembly Guide. The program will make more sense once we discuss system calls below.

The following commands assemble the source file into an object file and then link it into an executable:

$ nasm -f elf32 calc.asm -o calc.o
$ ld -m elf_i386 calc.o -o calc

After launch, you should see:

$ ./calc
Hello world!

Makefile

This part is optional, but to simplify assembling and linking in the future you can create a Makefile. Save it in the same directory as calc.asm:

CFLAGS= -f elf32
LFLAGS= -m elf_i386
all: calc
calc: calc.o
	ld $(LFLAGS) calc.o -o calc
calc.o: calc.asm
	nasm $(CFLAGS) calc.asm -o calc.o
clean:
	rm -f calc.o calc
.INTERMEDIATE: calc.o

Then, instead of the above instructions, just run make.

System calls

Linux system calls ask the OS to do something for us. In this article, we use only two system calls: write() to write a string to a file or stream (in our case, standard output and standard error) and exit() to exit the program:

syscall 0x01: exit(int error_code)
  error_code - use 0 to exit without errors and any other value (such as 1) for errors
syscall 0x04: write(int fd, char *string, int length)
  fd - use 1 for standard output, 2 for standard error
  string - pointer to the first character of the string
  length - length of the string in bytes

A system call is set up by storing the system call number in the register eax and its arguments in ebx, ecx, and edx, in that order; it is then triggered with int 0x80. You may notice that exit() takes only one argument; in that case ecx and edx do not matter.

eax	ebx	ecx	edx
System call number	arg1	arg2	arg3
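To connect the register table to something familiar, here is a minimal C sketch of the same write call made through the C library's syscall() wrapper. The system call number comes first, just as it goes into eax, followed by the three arguments. The helper name raw_write is made up for this illustration, and the sketch assumes Linux with glibc. SYS_write is used instead of a literal 4 because the number differs between 32-bit and 64-bit Linux.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE         /* exposes syscall() in <unistd.h> on glibc */
#endif
#include <stddef.h>
#include <sys/syscall.h>    /* SYS_write */
#include <unistd.h>         /* syscall() */

/* Hypothetical helper: issues write(fd, buf, len) as a raw system call.
 * On 32-bit x86 this corresponds to eax = number, ebx = fd, ecx = buf,
 * edx = len. Returns the number of bytes written, or -1 on error. */
long raw_write(int fd, const char *buf, size_t len)
{
    return syscall(SYS_write, fd, buf, len);
}
```

Calling raw_write(1, "Hello world!\n", 13) prints the greeting to standard output, which is what the first four instructions of _start do by hand.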

Call stack

The call stack is a data structure that stores information about each function call. Each call has its own section of the stack, called a frame. It stores information about the current call: the function's local variables and the return address (where the program should continue once the function finishes).

One non-obvious thing to note right away: the stack grows downward in memory. When you add something to the top of the stack, it is placed at a lower memory address than the previous item. In other words, as the stack grows, the address of the top of the stack decreases. To avoid confusion, I will keep reminding you of this fact.

The push instruction puts something on top of the stack, and pop takes data off it. For example, push eax allocates space at the top of the stack and places the value of the register eax there, and pop eax moves whatever is at the top of the stack into eax and releases that memory.

The purpose of the register esp is to point to the top of the stack. Any data above esp (that is, at lower addresses) is considered not to be on the stack; it is garbage. Executing push (or pop) moves esp. You can also manipulate esp directly, as long as you know what you are doing.

The register ebp is similar to esp, except that it always points roughly to the middle of the current stack frame, just before the local variables of the current function (more on this later). However, calling another function does not move ebp automatically; you need to do that manually each time.
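The direction of growth is easy to get backwards, so here is a toy model in C of how push and pop move esp. It is a sketch only: the ToyStack type and its functions are invented for this illustration, memory is a small array of dwords, and there is no overflow or underflow checking.

```c
#include <stddef.h>
#include <stdint.h>

#define TOY_WORDS 16            /* size of the simulated stack memory, in dwords */

typedef struct {
    uint32_t mem[TOY_WORDS];    /* mem[i] stands for the dword at address 4*i */
    size_t   esp;               /* index of the current top of the stack */
} ToyStack;

/* An empty stack: esp sits just past the highest address. */
void toy_init(ToyStack *s) { s->esp = TOY_WORDS; }

/* push: move esp down first, then store the value at the new top. */
void toy_push(ToyStack *s, uint32_t value)
{
    s->esp -= 1;
    s->mem[s->esp] = value;
}

/* pop: read the value at the top, then move esp back up. */
uint32_t toy_pop(ToyStack *s)
{
    uint32_t value = s->mem[s->esp];
    s->esp += 1;
    return value;
}
```

Starting from an empty stack, each toy_push lowers esp by one slot (4 bytes on the real machine) and each toy_pop raises it again, so values come back in the reverse of the order they went in.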

Calling convention for the x86 architecture

x86 has no built-in concept of a function like the one in high-level languages. The call instruction is essentially just a jmp (goto) to another memory address that first pushes the return address onto the stack. To use subroutines like functions in other languages (which can take arguments and return data), you need to follow a calling convention (there are many conventions, but we use CDECL, the most popular convention for x86 among C compilers and assembly programmers). It also ensures that a subroutine's registers are not clobbered when it calls another function.

Caller rules

Before calling a function, the caller must:

1. Save the caller-saved registers on the stack. The called function may change some registers: to avoid losing their contents, the caller must push them onto the stack before the call. These registers are eax, ecx, and edx. If you are not using some of them, you do not need to save those.
2. Push the function arguments onto the stack in reverse order (the last argument first, the first argument last). This order ensures that the called function finds its arguments on the stack in the correct order.
3. Call the subroutine.

The function places its result, if there is one, in eax. Immediately after the call, the caller must:

1. Remove the function arguments from the stack. This is usually done by simply adding the number of bytes to esp. Remember that the stack grows downward, so removing data from the stack means adding to esp.
2. Restore the saved registers by popping them off the stack in reverse order with the pop instruction. The called function will not change any other registers.

The following example demonstrates how these rules are applied. Suppose a function _subtract takes two integer (4-byte) arguments and returns the first argument minus the second. In the subroutine _mysubroutine, we call _subtract with the arguments 10 and 2:

_mysubroutine:
    ; ...
    ; some code here
    ; ...
    push ecx       ; save registers (I chose not to save eax)
    push edx
    push 2         ; rule two: push the arguments in reverse order
    push 10
    call _subtract ; eax is now 10-2=8
    add esp, 8     ; remove 8 bytes from the stack (two 4-byte arguments)
    pop edx        ; restore the saved registers
    pop ecx
    ; ...
    ; more code, where I use the wonderfully useful value in eax
    ; ...

Called subroutine rules

On entry, the called subroutine must:

1. Save the previous frame's base pointer, ebp, by pushing it onto the stack.
2. Adjust ebp from the previous frame to the current one (the current value of esp).
3. Allocate stack space for local variables, if needed, by moving esp. Since the stack grows downward, you subtract the required number of bytes from esp.
4. Save the callee-saved registers on the stack. These are ebx, edi, and esi. Registers that you do not plan to change need not be saved.

Call stack after step 1:

Call stack after step 2:

Call stack after step 4:

In these diagrams, a return address is shown in each stack frame. It is pushed onto the stack automatically by the call instruction. The ret instruction pops the address off the top of the stack and jumps to it. We do not need to manage it ourselves; I show it only to explain why a function's local variables start 4 bytes above ebp, while its arguments start 8 bytes below ebp.

The last diagram also shows that a function's local variables always start 4 bytes above ebp, at the address ebp-4 (subtraction, because we are moving up the stack), and its arguments always start 8 bytes below ebp, at the address ebp+8 (addition, because we are moving down the stack). If you follow the rules of this convention, this holds for the variables and arguments of any function.

When the function is done and you want to return, first set eax to the function's return value, if there is one. Then you need to:

1. Restore the saved registers by popping them off the stack in reverse order.
2. Free the stack space allocated for local variables in step 3, if any: this is done by simply setting esp to ebp.
3. Restore the previous frame's base pointer, ebp, by popping it off the stack.
4. Return with ret.

Now let's implement the _subtract function from our example:

_subtract:
    push ebp            ; save the previous frame's base pointer
    mov ebp, esp        ; set up ebp
    ; Here I would allocate stack space for local variables, but I do not need any
    ; Here I would save the callee-saved registers, but I am not going to
    ; change any of them
    ; The function body starts here
    mov eax, [ebp+8]    ; copy the function's first argument into eax. The brackets
                        ; mean a memory access at the address ebp+8
    sub eax, [ebp+12]   ; subtract the second argument, at address ebp+12, from the
                        ; first argument
    ; The function body ends here; eax holds its return value
    ; Here I would restore the registers, but none were saved
    ; Here I would free the stack space for variables, but none was allocated
    pop ebp             ; restore the previous frame's base pointer
    ret

enter and leave

In the example above, you can see that a function always starts the same way: push ebp, mov ebp, esp, and allocating memory for local variables. The x86 instruction set has a handy instruction that does all of this: enter a, b, where a is the number of bytes to allocate for local variables and b is the “nesting level”, which we will always set to 0. Likewise, a function always ends with the instructions mov esp, ebp and pop ebp (the mov is strictly necessary only when memory was allocated for local variables, but it does no harm in any case). These can also be replaced by a single instruction: leave. Let's make the changes:

_subtract:
    enter 0, 0          ; save the previous frame's base pointer and set up ebp
    ; Here I would save the callee-saved registers, but I am not going to
    ; change any of them
    ; The function body starts here
    mov eax, [ebp+8]    ; copy the function's first argument into eax. The brackets
                        ; mean a memory access at the address ebp+8
    sub eax, [ebp+12]   ; subtract the second argument, at address ebp+12, from
                        ; the first argument
    ; The function body ends here; eax holds its return value
    ; Here I would restore the registers, but none were saved
    leave               ; restore the previous frame's base pointer
    ret

Writing some basic functions

Having mastered the calling convention, we can start writing some subroutines. Why not generalize the code that prints "Hello world!" so that it can print any string? We will call this function _print_msg.

For this we need another function, _strlen, to calculate the length of a string. In C, it might look like this:

#include <stddef.h>     // size_t

size_t strlen(char *s) {
    size_t length = 0;
    while (*s != 0)
    {           // start of the loop
        length++;
        s++;
    }           // end of the loop
    return length;
}

In other words, starting at the beginning of the string, we add 1 to the return value for every character that is not the null terminator. As soon as we see the null character, we return the value accumulated in the loop. In assembly this is also fairly simple: we can use the previously written _subtract function as a base:

_strlen:
    enter 0, 0          ; save the previous frame's base pointer and set up ebp
    ; Here I would save the callee-saved registers, but I am not going to
    ; change any of them
    ; The function body starts here
    mov eax, 0          ; length = 0
    mov ecx, [ebp+8]    ; the function's first argument (a pointer to the first
                        ; character of the string) is copied into ecx (it is
                        ; caller-saved, so we do not need to save it)
_strlen_loop_start:     ; this is a label that we can jump to
    cmp byte [ecx], 0   ; dereference the pointer and compare the value with zero.
                        ; Neither operand is a register, so NASM cannot infer
                        ; the operand size and we must state it explicitly.
                        ; Here we ask for a single byte (one character)
    je _strlen_loop_end ; leave the loop when we hit the null
    inc eax             ; we are inside the loop now: add 1 to the return value
    add ecx, 1          ; move to the next character in the string
    jmp _strlen_loop_start  ; jump back to the start of the loop
_strlen_loop_end:
    ; The function body ends here; eax holds the return value
    ; Here I would restore the registers, but none were saved
    leave               ; restore the previous frame's base pointer
    ret

Not bad so far, right? Writing the C code first can help, because most of it translates directly into assembly. Now we can use this function in _print_msg, where we apply everything we have learned:

_print_msg:
    enter 0, 0
    ; The function body starts here
    mov eax, 0x04       ; 0x04 = the write() system call
    mov ebx, 0x1        ; 0x1 = standard output
    mov ecx, [ebp+8]    ; we want to print this function's first argument,
    ; but first we need to set edx to the string's length. Time to call _strlen
    push eax            ; save the caller-saved registers (I chose not to save edx)
    push ecx
    push dword [ebp+8]  ; push _strlen's argument, which is _print_msg's own argument.
                        ; NASM requires an explicit size here because a memory
                        ; operand alone does not imply one. A pointer is a
                        ; dword (4 bytes, 32 bits)
    call _strlen        ; eax now holds the length of the string
    mov edx, eax        ; move the length into edx, where we need it
    add esp, 4          ; remove 4 bytes from the stack (one 4-byte char* argument)
    pop ecx             ; restore the caller-saved registers
    pop eax
    ; we are done with _strlen and can make the system call
    int 0x80
    leave
    ret
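Following the same advice of writing the C first, _print_msg boils down to roughly the following. This is only a sketch of the logic, not code from the article: the names my_strlen and print_msg are invented here, and the POSIX write() wrapper stands in for the int 0x80 call.

```c
#include <stddef.h>
#include <unistd.h>     /* write() */

/* Same loop as _strlen: count bytes until the terminating null. */
size_t my_strlen(const char *s)
{
    size_t length = 0;
    while (s[length] != 0)
        length++;
    return length;
}

/* Same steps as _print_msg: measure the string, then write it to
 * standard output (file descriptor 1). Returns write()'s result. */
long print_msg(const char *msg)
{
    return (long)write(1, msg, my_strlen(msg));
}
```

Calling print_msg("Hello world!\n") corresponds to pushing the string's address and executing call _print_msg; the length never has to be hard-coded.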

Now let's see the fruits of our hard work by using this function in a complete “Hello, world!” program.

_start:
    enter 0, 0
    ; save the caller-saved registers (I chose not to save any)
    push hello_world    ; push the argument for _print_msg
    call _print_msg
    mov eax, 0x01           ; 0x01 = exit()
    mov ebx, 0              ; 0 = no errors
    int 0x80

Believe it or not, we have covered all the main topics needed to write basic programs in x86 assembly! Now that the introductory material and theory are out of the way, we will concentrate entirely on the code and apply this knowledge to write our RPN calculator. The functions will be much longer and will even use some local variables. If you want to see the finished program right away, here it is.

For those of you unfamiliar with reverse Polish notation (sometimes called postfix notation), expressions in it are evaluated using a stack. So we need to create a stack, as well as the functions _pop and _push to manipulate it. We will also need a function _print_answer, which prints a string representation of the numerical result at the end of the calculation.

Stack creation

First, we define a space in memory for our stack, as well as a global variable stack_size. Because we want to modify these variables, they must go in the .data section rather than .rodata.

section .data
    stack_size: dd 0        ; create a dword (4-byte) variable with the value 0
    stack: times 256 dd 0   ; fill the stack with zeros

Now we can implement the functions _push and _pop:

_push:
    enter 0, 0
    ; Save the registers that we are going to use
    push eax
    push edx
    mov eax, [stack_size]
    mov edx, [ebp+8]
    mov [stack + 4*eax], edx    ; Store the argument on the stack. Scale the index by
                                ; four bytes to match the dword size
    inc dword [stack_size]      ; Add 1 to stack_size
    ; Restore the registers we saved
    pop edx
    pop eax
    leave
    ret
_pop:
    enter 0, 0
    ; Here I would save registers, but none need saving
    dec dword [stack_size]      ; First subtract 1 from stack_size
    mov eax, [stack_size]
    mov eax, [stack + 4*eax]    ; Load the number at the top of the stack into eax
    ; Here I would restore the registers, but none were saved
    leave
    ret
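In C terms, the two routines above amount to something like the sketch below. The names calc_stack, calc_stack_size, calc_push, and calc_pop are chosen here for illustration. Like the assembly, the sketch does no bounds checking, so popping an empty stack or pushing a 257th value is the caller's problem.

```c
#define CALC_STACK_MAX 256

int calc_stack[CALC_STACK_MAX];     /* mirrors "stack: times 256 dd 0" */
int calc_stack_size = 0;            /* mirrors "stack_size: dd 0" */

/* Mirrors _push: store at index stack_size, then increment it. */
void calc_push(int value)
{
    calc_stack[calc_stack_size] = value;
    calc_stack_size++;
}

/* Mirrors _pop: decrement stack_size, then return the element there. */
int calc_pop(void)
{
    calc_stack_size--;
    return calc_stack[calc_stack_size];
}
```

Note that, unlike the hardware stack addressed through esp, this software stack grows upward: the top element always lives at index stack_size - 1.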

Printing numbers

_print_answer is much harder: we have to convert numbers to strings and use several helper functions. We need a function _putc that prints a single character, a function _mod that computes the remainder of dividing its first argument by its second (the modulus), and _pow_10 for raising 10 to a power. You will see later why they are needed. They are fairly simple; here is the code:

_pow_10:
    enter 0, 0
    mov ecx, [ebp+8]    ; set ecx (caller-saved) to the function's
                        ; argument
    mov eax, 1          ; the first power of 10 (10**0 = 1)
_pow_10_loop_start:     ; multiply eax by 10 while ecx is not 0
    cmp ecx, 0
    je _pow_10_loop_end
    imul eax, 10
    sub ecx, 1
    jmp _pow_10_loop_start
_pow_10_loop_end:
    leave
    ret
_mod:
    enter 0, 0
    push ebx
    mov edx, 0          ; explained below
    mov eax, [ebp+8]
    mov ebx, [ebp+12]
    idiv ebx            ; divides the 64-bit integer [edx:eax] by ebx. We only want to
                        ; divide the 32-bit integer eax, so we set edx to
                        ; zero (valid as long as the dividend is non-negative).
                        ; The quotient is stored in eax and the remainder in edx. As usual,
                        ; details on a particular instruction can be found in the references
                        ; listed at the end of the article.
    mov eax, edx        ; return the remainder (the modulus)
    pop ebx
    leave
    ret
_putc:
    enter 0, 0
    mov eax, 0x04       ; write()
    mov ebx, 1          ; standard output
    lea ecx, [ebp+8]    ; address of the input character
    mov edx, 1          ; print only 1 character
    int 0x80
    leave
    ret
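The remark about edx in _mod is easier to see with a small model of what idiv computes. The C sketch below (the function name idiv_quotient is invented for this illustration) builds the 64-bit dividend edx:eax and divides it the way the instruction does. It shows that zeroing edx is fine for a non-negative eax, but that a negative eax needs edx filled with copies of the sign bit instead, which is what the cdq instruction does. The conversion to int64_t assumes a two's-complement target such as GCC or Clang on x86.

```c
#include <stdint.h>

/* Model of the quotient idiv leaves in eax: the signed 64-bit value
 * edx:eax divided by a signed 32-bit divisor. Assumes the quotient
 * fits in 32 bits (the real instruction faults otherwise). */
int32_t idiv_quotient(uint32_t edx, uint32_t eax, int32_t divisor)
{
    int64_t dividend = (int64_t)(((uint64_t)edx << 32) | (uint64_t)eax);
    return (int32_t)(dividend / divisor);
}
```

For example, idiv_quotient(0, 7, 2) is 3, while idiv_quotient(0, (uint32_t)-1, 2) is 2147483647 rather than 0, because with edx = 0 the dividend is read as 4294967295 instead of -1. Passing 0xFFFFFFFF as edx gives the expected 0.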

So, how do we print the individual digits of a number? First, note that the last digit of a number equals the remainder of dividing it by 10 (for example, 123 % 10 = 3), and the next digit is the remainder of dividing by 100, divided by 10 (for example, (123 % 100)/10 = 2). In general, you can find a particular digit of a number (counting from right to left) by computing (number % 10**n) / 10**(n-1), where n = 1 gives the ones digit, n = 2 the tens digit, and so on.

Using this, we can find all the digits of a number from n = 1 to n = 10 (the maximum number of digits in a signed 4-byte integer). But it is much easier to go from left to right: that way we can print each character as soon as we find it and skip the leading zeros. So we iterate over the digits from n = 10 down to n = 1.

In C, the program would look something like this:

#define MAX_DIGITS 10
void print_answer(int a) {
    if (a < 0) { // if the number is negative
        putc('-'); // print a minus sign
        a = -a; // convert to a positive number
    }
    int started = 0;
    for (int i = MAX_DIGITS; i > 0; i--) {
        int digit = (a % pow_10(i)) / pow_10(i-1);
        if (digit == 0 && started == 0) continue; // do not print leading zeros
        started = 1;
        putc(digit + '0');
    }
}
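As a worked check of the digit formula, here is a small self-contained C sketch that extracts digits and builds the decimal string from left to right. The names pow10_i, digit_at, and format_digits are invented for this illustration. It deviates from the formula in two ways, and both are assumptions of this sketch rather than part of the article's method. First, 10**10 does not fit in a 32-bit integer, so digit_at uses the equivalent form (a / 10**(n-1)) % 10, which never needs a power above 10**9. Second, format_digits emits a single '0' when every digit is zero, since skipping all leading zeros would otherwise produce an empty string for 0.

```c
#include <stddef.h>

#define MAX_DIGITS 10   /* a signed 32-bit integer has at most 10 decimal digits */

/* 10 raised to the power n, for 0 <= n <= 9 (10**10 would overflow int). */
int pow10_i(int n)
{
    int result = 1;
    while (n > 0) {
        result *= 10;
        n--;
    }
    return result;
}

/* Digit n of a non-negative number, counting from the right starting at 1. */
int digit_at(int a, int n)
{
    return (a / pow10_i(n - 1)) % 10;
}

/* Writes the decimal digits of a non-negative number into out (which must
 * hold at least MAX_DIGITS + 1 chars), skipping leading zeros.
 * Returns the number of digits written. */
size_t format_digits(int a, char *out)
{
    size_t len = 0;
    for (int n = MAX_DIGITS; n >= 1; n--) {
        int digit = digit_at(a, n);
        if (digit == 0 && len == 0)
            continue;               /* still in the leading zeros */
        out[len++] = (char)('0' + digit);
    }
    if (len == 0)
        out[len++] = '0';           /* the number was 0 */
    out[len] = '\0';
    return len;
}
```

For 123, digit_at returns 3, 2, and 1 for n = 1, 2, and 3, and format_digits(123, buf) leaves "123" in buf.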

Now you understand why we need these three functions. Let's implement this in assembly:

%define MAX_DIGITS 10
_print_answer:
    enter 4, 0              ; reserve 4 bytes so that [ebp-4], used for the "started"
                            ; variable from the C code, lies inside this frame's
                            ; local area and not in the registers pushed next
    push ebx
    push edi
    push esi
    mov eax, [ebp+8]        ; our argument "a"
    cmp eax, 0              ; if the number is not negative, skip this conditional
                            ; block
    jge _print_answer_negate_end
    ; call putc for '-'
    push eax
    push 0x2d               ; the '-' character
    call _putc
    add esp, 4
    pop eax
    neg eax                 ; convert to a positive number
_print_answer_negate_end:
    mov byte [ebp-4], 0     ; started = 0
    mov ecx, MAX_DIGITS     ; the variable i
_print_answer_loop_start:
    cmp ecx, 0
    je _print_answer_loop_end
    ; call pow_10 for ecx. We will try to make ebx the "digit" variable from the C code.
    ; For now, set edx = pow_10(i-1) and ebx = pow_10(i)
    push eax
    push ecx
    dec ecx             ; i-1
    push ecx            ; first argument for _pow_10
    call _pow_10
    mov edx, eax        ; edx = pow_10(i-1)
    add esp, 4
    pop ecx             ; restore the value of i in ecx
    pop eax
    ; end pow_10 call
    mov ebx, edx        ; digit = ebx = pow_10(i-1)
    imul ebx, 10        ; digit = ebx = pow_10(i) (wraps around 32 bits when i = 10)
    ; call _mod for (a % pow_10(i)), that is, (eax mod ebx)
    push eax
    push ecx
    push edx
    push ebx            ; arg2, ebx = digit = pow_10(i)
    push eax            ; arg1, eax = a
    call _mod
    mov ebx, eax        ; digit = ebx = a % pow_10(i), almost there
    add esp, 8
    pop edx
    pop ecx
    pop eax
    ; end mod call
    ; divide ebx (the "digit" variable) by pow_10(i-1) (edx). We have to save a couple of
    ; registers, because idiv uses both edx and eax for the division. Since
    ; edx holds our divisor, we move it to some
    ; other register
    push esi
    mov esi, edx
    push eax
    mov eax, ebx
    mov edx, 0
    idiv esi            ; eax holds the result (the digit)
    mov ebx, eax        ; ebx = (a % pow_10(i)) / pow_10(i-1), the "digit" variable from the C code
    pop eax
    pop esi
    ; end division
    cmp ebx, 0                        ; if digit == 0
    jne _print_answer_trailing_zeroes_check_end
    cmp byte [ebp-4], 0               ; if started == 0
    jne _print_answer_trailing_zeroes_check_end
    jmp _print_answer_loop_continue   ; continue
_print_answer_trailing_zeroes_check_end:
    mov byte [ebp-4], 1     ; started = 1
    add ebx, 0x30           ; digit + '0'
    ; call putc
    push eax
    push ecx
    push edx
    push ebx
    call _putc
    add esp, 4
    pop edx
    pop ecx
    pop eax
    ; end putc call
_print_answer_loop_continue:
    sub ecx, 1
    jmp _print_answer_loop_start
_print_answer_loop_end:
    pop esi
    pop edi
    pop ebx
    leave
    ret

That was an ordeal! I hope the comments help you follow along. If you are now thinking, “Why not just write printf("%d")?”, then you will like the end of the article, where we replace the function with exactly that!

Now we have all the necessary functions; all that remains is to implement the main logic in _start, and that's it!

Evaluating reverse Polish notation

As already mentioned, reverse Polish notation is evaluated using a stack. When a number is read, it is pushed onto the stack; when an operator is read, it is applied to the two items at the top of the stack.

For example, if we want to evaluate 84/3+6* (this expression can also be written as 6384/+*), the process is as follows:

Step	Symbol	Stack before	Stack after
1	`8`	`[]`	`[8]`
2	`4`	`[8]`	`[8, 4]`
3	`/`	`[8, 4]`	`[2]`
4	`3`	`[2]`	`[2, 3]`
5	`+`	`[2, 3]`	`[5]`
6	`6`	`[5]`	`[5, 6]`
7	`*`	`[5, 6]`	`[30]`

If the input contains a valid postfix expression, then at the end of the calculations there is only one element left on the stack - this is the answer, the result of the calculations. In our case, the number is 30.

In assembly, we need to implement something like this C code:

int stack[256];         // 256 is probably more than enough for our stack
int stack_size = 0;
int main(int argc, char *argv[]) {
    char *input = argv[1];  // argv[0] is the program name
    size_t input_length = strlen(input);
    for (int i = 0; i < input_length; i++) {
        char c = input[i];
        if (c >= '0' && c <= '9') { // if the character is a digit
            push(c - '0'); // convert the character to an integer and push it onto the stack
        } else {
            int b = pop();
            int a = pop();
            if (c == '+') {
                push(a+b);
            } else if (c == '-') {
                push(a-b);
            } else if (c == '*') {
                push(a*b);
            } else if (c == '/') {
                push(a/b);
            } else {
                error("Invalid input\n");
                exit(1);
            }
        }
    }
    if (stack_size != 1) {
        error("Invalid input\n");
        exit(1);
    }
    print_answer(stack[0]);
    exit(0);
}

Now we have all the functions necessary to implement this, let's begin.

_start:
    ; _start does not receive its arguments the way other functions do.
    ; Instead, esp points directly at argc (the number of arguments), and
    ; esp+4 points at argv. So esp+4 points at the program
    ; name, esp+8 at the first argument, and so on
    mov esi, [esp+8]         ; esi = "input" = argv[1]
    ; call _strlen to find the size of the input
    push esi
    call _strlen
    mov ebx, eax             ; ebx = input_length
    add esp, 4
    ; end _strlen call
    mov ecx, 0               ; ecx = "i"
_main_loop_start:
    cmp ecx, ebx             ; if (i >= input_length)
    jge _main_loop_end
    mov edx, 0
    mov dl, [esi + ecx]      ; load one byte from memory into the low byte of
                             ; edx. The rest of edx was zeroed above.
                             ; edx = the variable c = input[i]
    cmp edx, '0'
    jl _check_operator
    cmp edx, '9'
    jg _print_error
    sub edx, '0'
    mov eax, edx             ; eax = the variable c - '0' (a digit, not a character)
    jmp _push_eax_and_continue
_check_operator:
    ; call _pop twice to get the variable b into edi and the variable a into eax
    push ecx
    push ebx
    call _pop
    mov edi, eax             ; edi = b
    call _pop                ; eax = a
    pop ebx
    pop ecx
    ; end call _pop
    cmp edx, '+'
    jne _subtract
    add eax, edi                 ; eax = a+b
    jmp _push_eax_and_continue
_subtract:
    cmp edx, '-'
    jne _multiply
    sub eax, edi                 ; eax = a-b
    jmp _push_eax_and_continue
_multiply:
    cmp edx, '*'
    jne _divide
    imul eax, edi                ; eax = a*b
    jmp _push_eax_and_continue
_divide:
    cmp edx, '/'
    jne _print_error
    push edx                     ; save edx, because idiv uses it as the high half of the dividend
    cdq                          ; sign-extend eax into edx:eax so that a negative a divides correctly
    idiv edi                     ; eax = a/b
    pop edx
    ; now push eax onto the stack and continue
_push_eax_and_continue:
    ; call _push
    push eax
    push ecx
    push edx
    push eax          ; first argument
    call _push
    add esp, 4
    pop edx
    pop ecx
    pop eax
    ; end call _push
    inc ecx
    jmp _main_loop_start
_main_loop_end:
    cmp dword [stack_size], 1     ; if (stack_size != 1), print an error
    jne _print_error
    mov eax, [stack]
    push eax
    call _print_answer
    ; print a final newline
    push 0xA
    call _putc
    ; exit successfully
    mov eax, 0x01           ; 0x01 = exit()
    mov ebx, 0              ; 0 = no errors
    int 0x80                ; execution ends here
_print_error:
    push error_msg
    call _print_msg
    mov eax, 0x01
    mov ebx, 1
    int 0x80

We also need to add another string, error_msg, to the .rodata section:

section .rodata
    ; Declare some bytes labeled error_msg. NASM's db pseudo-instruction
    ; accepts a single-byte value, a string constant, or a combination of
    ; them. 0xA = newline, 0x0 = string-terminating null
    error_msg: db "Invalid input", 0xA, 0x0

And we are done! Impress all your friends, if you have any. I hope you will now look on high-level languages with greater warmth, especially when you recall that many old programs were written entirely or almost entirely in assembly, for example the original RollerCoaster Tycoon!

All the code is here. Thank you for reading! I can continue if you are interested.

Next steps

You can practice by implementing several additional functions:

Print an error message instead of segfaulting when the program is given no argument.
Add support for extra spaces between operands and operators in the input.
Add support for multi-digit operands.
Allow negative numbers in the input.
Replace _strlen with the function from the standard C library, and replace _print_answer with a call to printf.

Additional materials

The University of Virginia's x86 Assembly Guide gives a more detailed account of many of the topics covered here, including additional information on all the popular x86 instructions.
“The Art of Picking Intel Registers”. Although most x86 registers are general-purpose, many have a historical meaning. Following these conventions can improve the readability of your code and, as an interesting side effect, even slightly reduce the size of your binaries.
NASM: Intel x86 Instruction Reference is a complete guide to all the obscure x86 instructions.
