ASM on Unix
Assembler for Unix naturally differs from assembler for DOS or Windows. Assemblers for those operating systems use the syntax imposed by Intel, which is full of ambiguities that have to be resolved by type coercion (byte ptr, word ptr, dword ptr). Assemblers for *nix use the AT&T (SysV/386) syntax, which was developed specifically to eliminate ambiguity in how instructions are interpreted. There are assemblers for Unix with Intel syntax, such as NASM, but this article discusses the syntax of the standard assemblers for this platform.
It is worth starting with the rules. An assembler using AT&T syntax works with all Latin letters, digits, and a few additional symbols such as the percent sign, comma, period, underscore, asterisk, and dollar sign. Any sequence of allowed characters that does not start with a special character or a digit and does not end with a colon is treated as a processor instruction:
// halts the processor
hlt
If such a sequence starts with a percent sign, it is a processor register:
// pushes the contents of the %eax register onto the stack
pushl %eax
If it starts with a dollar sign ($), it is an immediate operand. The code below pushes the numbers 0 and 10h and the address of the variable qwerty onto the stack:
pushl $0
pushl $0x10
pushl $qwerty
If the sequence starts with a period, it is an assembler directive:
.align 2
Finally, if the sequence ends with a colon, it is a label (used just as in assemblers for DOS and Windows). One special symbol deserves mention: the dot (.), which, like $ in DOS assemblers, stands for the current address.
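As an illustration of labels and the dot symbol, here is a minimal sketch that sums a small byte table. It assumes 32-bit Linux, and the names `next`, `bytes` and `bytes_len` are hypothetical. It ends with the exit system call described later in the article, so the sum becomes the exit status.

```asm
.text
.globl _start
_start:
    movl $bytes_len, %ecx    # loop counter = 5
    xorl %ebx, %ebx          # running sum
    movl $bytes, %esi        # address of the first byte
next:                        # label marking the top of the loop
    movzbl (%esi), %eax      # load one byte, zero-extended to 32 bits
    addl %eax, %ebx
    incl %esi
    decl %ecx
    jnz next                 # repeat until the counter reaches 0
    movl $1, %eax            # system call 1: exit, status = sum (15)
    int $0x80
.data
bytes:
    .byte 1, 2, 3, 4, 5
bytes_len = . - bytes        # "." is the current address, so bytes_len = 5
```

The length is computed by the assembler, so it stays correct if more bytes are added to the table.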
Type conversion instructions in AT&T syntax have four-letter names: c, the source size, t, and the destination size:
//cbw
cbtw
//cwde
cwtl
//cwd
cwtd
//cdq
cltd
where:
b - byte
w - word
l - double word
q - quadruple word
s - 32-bit floating-point number
l - 64-bit floating-point number
t - 80-bit floating-point number
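The conversion instructions are typically chained before a signed division. The following is a minimal sketch, assuming 32-bit Linux; the values -5 and 2 are arbitrary illustrations. It widens a byte step by step and then divides.

```asm
.text
.globl _start
_start:
    movb $-5, %al     # al = 0xFB
    cbtw              # sign-extend al into ax:   ax  = 0xFFFB
    cwtl              # sign-extend ax into eax:  eax = 0xFFFFFFFB
    cltd              # sign-extend eax into edx:eax, edx = 0xFFFFFFFF
    movl $2, %ecx
    idivl %ecx        # edx:eax / ecx: eax = -2 (quotient), edx = -1 (remainder)
    movl %eax, %ebx
    negl %ebx         # ebx = 2, used as the exit status
    movl $1, %eax     # system call 1: exit
    int $0x80
```

Without cltd, idivl would divide whatever happened to be in %edx together with %eax, so the widening step is part of the division and not optional.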
One of the most important differences between the assemblers is the order of the source and destination operands: unlike DOS assemblers, in Unix the source operand is always written first:
//mov ax,bx
movw %bx,%ax
//imul eax,ecx,16
imull $16,%ecx,%eax
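To make the operand order concrete, here is a short sketch with arbitrary values, assuming 32-bit Linux. In every instruction the last operand is the one that is modified.

```asm
.text
.globl _start
_start:
    movl $7, %eax          # eax = 7  (source first, destination last)
    movl %eax, %ebx        # ebx = 7
    subl $2, %ebx          # ebx = ebx - 2 = 5
    imull $16, %ebx, %ebx  # ebx = ebx * 16 = 80
    movl $1, %eax          # system call 1: exit, status = 80
    int $0x80
```

Note that subl $2,%ebx subtracts 2 from %ebx, not %ebx from 2, which is the mistake most often made when coming from Intel syntax.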
Types of addressing: as mentioned earlier, register operands and immediate operands are distinguished by the prefixes % and $:
//xor ebx,ebx
xorl %ebx,%ebx
//mov edx,offset qwerty
movl $qwerty,%edx
To address memory directly, the bare variable name is used, as in the Intel version. More complex addressing modes are best considered in terms of displacement, base, index and scale, written as displacement(base,index,scale):
//push dword ptr qwerty
pushl qwerty
//mov eax,base_addr[ebx+edi*4]
movl base_addr(%ebx,%edi,4),%eax
//lea eax,[eax+eax*4]
leal (%eax,%eax,4),%eax
//mov ax,word ptr [bp-2]
movw -2(%ebp),%ax
//mov edx,dword ptr [edi*2]
movl (,%edi,2),%edx
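The general memory operand computes displacement + base + index * scale, where scale is 1, 2, 4 or 8 and any part may be omitted. The following is a minimal sketch of array indexing, assuming 32-bit Linux; the table and its contents are hypothetical.

```asm
.text
.globl _start
_start:
    movl $2, %edi                 # index = 2
    movl table(,%edi,4), %eax     # address = table + edi*4: eax = 30
    movl $table, %ebx             # base = address of the table
    addl 4(%ebx,%edi,4), %eax     # address = 4 + ebx + edi*4: eax = 30 + 40 = 70
    movl %eax, %ebx
    movl $1, %eax                 # system call 1: exit, status = 70
    int $0x80
.data
table:
    .long 10, 20, 30, 40          # four 32-bit values
```

The scale of 4 matches the element size of .long, so the index register can hold the element number instead of a byte offset.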
Programming itself divides into programming with the libc library and programming without it. Since the system itself is written in C and much of it relies on this library, programs written in assembler can call it too. A library function is called with the call instruction. There is one catch: Unix systems differ, and on some of them the library function name must be prefixed with an underscore. Consider the following program, which prints the famous phrase:
.text
.globl main
main:
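pushl $message      # argument for puts: address of the string
call puts
addl $4,%esp        # remove the argument without clobbering %ebx
xorl %eax,%eax      # return 0 from main
ret
.data
message:
.string "Hello world!"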
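Functions with several arguments follow the same pattern. The sketch below assumes the usual C calling convention on i386: arguments are pushed right to left and the caller removes them afterwards. The format string and the value 42 are hypothetical, and the code makes no attempt to keep the stack 16-byte aligned, which some toolchains expect.

```asm
.text
.globl main
main:
    pushl $42              # second argument, pushed first
    pushl $format          # first argument: address of the format string
    call printf
    addl $8, %esp          # caller removes both arguments (2 * 4 bytes)
    xorl %eax, %eax        # return 0 from main
    ret
.data
format:
    .string "answer = %d\n"
```

Because main is entered from the C startup code, this version has to be linked through the C compiler driver and not with the linker alone.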
Without glibc, the program will look like this:
.text
.globl _start
_start:
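movl $4,%eax        # system call 4: write
xorl %ebx,%ebx
incl %ebx           # %ebx = 1: file descriptor of stdout
movl $message,%ecx  # address of the buffer
movl $mesg_len,%edx # number of bytes to write
int $0x80
xorl %eax,%eax
incl %eax           # system call 1: exit
xorl %ebx,%ebx      # exit status 0
int $0x80
hlt                 # never reached: exit does not return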
.data
message:
.ascii "Hello World!\012"   # .ascii adds no trailing zero byte, so mesg_len counts only the text
mesg_len= .-message
In this example we used two system calls: write, to print to the screen, and exit. For write, the value 4 is placed in the %eax register; this is the number under which the function is recorded in the system call table. The arguments go in %ebx, %ecx and %edx (this register-based convention is the one used by Linux on i386).
The call itself is made by raising interrupt 0x80 (int $0x80). Exiting the program, i.e. terminating it, corresponds to system call 1.
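A system call also returns a result. The following is a minimal sketch, assuming Linux on i386, where the kernel leaves the return value in %eax: the number of bytes written, or a negative error code. The label `done` and the choice of exit status 1 on failure are hypothetical.

```asm
.text
.globl _start
_start:
    movl $4, %eax          # system call 4: write
    movl $1, %ebx          # file descriptor 1: stdout
    movl $message, %ecx    # address of the buffer
    movl $mesg_len, %edx   # number of bytes to write
    int $0x80
    xorl %ebx, %ebx        # assume success: exit status 0
    cmpl $mesg_len, %eax   # did write report the full length?
    je done
    incl %ebx              # short write or error: exit status 1
done:
    movl $1, %eax          # system call 1: exit
    int $0x80
.data
message:
    .ascii "Hello World!\012"
mesg_len = . - message
```

On a typical 64-bit Linux system with GNU binutils, such a file can be built as 32-bit code with as --32 and linked with ld -m elf_i386.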