ASM on Unix

Assembly programming on Unix differs from assembly programming on DOS or Windows. Assemblers for those systems use the syntax introduced by Intel, which is full of ambiguities that have to be resolved with explicit size overrides (byte ptr, word ptr, dword ptr). Assemblers on Unix use the AT&T syntax (SysV/386), which was designed specifically to remove ambiguity in how instructions are interpreted. There are Unix assemblers with Intel syntax as well, such as NASM, but this article covers the syntax of the standard assemblers for the platform.

It makes sense to start with the rules. An assembler using AT&T syntax accepts all Latin letters and digits, plus a few additional symbols: percent sign, comma, period, underscore, asterisk, and dollar sign. Processor instructions: any sequence of allowed characters that does not start with a special symbol or a digit and does not end with a colon is treated as a processor instruction:

//halt the processor
hlt


If such a sequence starts with a percent sign, it is a processor register:

//pushes the contents of register %eax onto the stack
pushl %eax

If the sequence starts with a dollar sign ($), it is an immediate operand. The code below pushes the number 0, the number 10h, and the address of the variable qwerty onto the stack:

pushl $0
pushl $0x10
pushl $qwerty

If the sequence starts with a period, it is an assembler directive:

.align 2


Finally, if the sequence ends with a colon, it is a label (used just as in assemblers for DOS and Windows). One special label is worth noting, the dot (.): like its counterpart in DOS assemblers, it stands for the current address.

Type conversion instructions in AT&T syntax have four-letter names: C, the source size, T, and the destination size:

//cbw
	cbtw
//cwde
	cwtl
//cwd
	cwtd
//cdq
	cltd

where:
b - byte
w - word
l - double word
q - quadruple word
s - 32-bit floating-point number
l - 64-bit floating-point number
t - 80-bit floating-point number
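
The conversion instructions named by this scheme all perform sign extension: the top bit of the source is copied into every added bit. As a rough illustration, the same behavior can be modeled with C integer conversions. The function names below are hypothetical and exist only for this sketch; they are not part of any assembler or library.

```c
#include <stdint.h>

/* cbtw (Intel cbw): sign-extend %al into %ax. */
static int16_t model_cbtw(int8_t al)
{
    return (int16_t)al;
}

/* cwtl (Intel cwde): sign-extend %ax into %eax. */
static int32_t model_cwtl(int16_t ax)
{
    return (int32_t)ax;
}

/* cwtd (Intel cwd): sign-extend %ax into the pair %dx:%ax.
   Returns the value that ends up in %dx (all ones or all zeros). */
static uint16_t model_cwtd_dx(int16_t ax)
{
    return ax < 0 ? 0xFFFFu : 0x0000u;
}

/* cltd (Intel cdq): sign-extend %eax into the pair %edx:%eax.
   Returns the value that ends up in %edx (all ones or all zeros). */
static uint32_t model_cltd_edx(int32_t eax)
{
    return eax < 0 ? 0xFFFFFFFFu : 0x00000000u;
}
```

For instance, model_cbtw(-128) yields a 16-bit value whose bit pattern is 0xFF80, and model_cltd_edx(7) yields 0, which is why cltd is typically issued before a signed 32-bit division.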

One of the most important differences between the assemblers is the order of the source and destination operands: unlike in DOS assemblers, on Unix the source operand always comes first:

//mov ax,bx
	movw	%bx,%ax
//imul eax,ecx,16
	imull $16,%ecx,%eax


Addressing modes: as mentioned earlier, register operands and immediate operands are distinguished by the prefixes % and $:

//xor ebx,ebx
	xorl %ebx,%ebx
//mov edx,offset qwerty
	movl $qwerty,%edx


To access memory directly, the unmodified variable name is used, just as in Intel syntax:

//push dword ptr qwerty
	pushl qwerty

More complex addressing modes, with a displacement, a base and an index, are written as displacement(base,index,scale):

//mov eax,base_addr[ebx+edi*4]
	movl base_addr(%ebx,%edi,4),%eax
//lea eax,[eax+eax*4]
	leal (%eax,%eax,4),%eax
//mov ax,word ptr [ebp-2]
	movw -2(%ebp),%ax
//mov edx,dword ptr [edi*2]
	movl (,%edi,2),%edx
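
A memory operand built from a displacement, a base register and a scaled index resolves to the address displacement + base + index * scale. The sketch below models that arithmetic in C for 32-bit addresses. The function name effective_address is hypothetical, and the model assumes that an omitted base or index is passed as 0 and an omitted scale as 1.

```c
#include <stdint.h>

/* Model of the address computed for a memory operand with a
   displacement, a base register and a scaled index register.
   On x86 the scale can only be 1, 2, 4 or 8; this sketch does not
   enforce that. Unsigned arithmetic wraps modulo 2^32, like 32-bit
   address arithmetic. */
static uint32_t effective_address(int32_t disp, uint32_t base,
                                  uint32_t index, uint32_t scale)
{
    return (uint32_t)disp + base + index * scale;
}
```

With %ebp holding 0x1000, the operand -2(%ebp) corresponds to effective_address(-2, 0x1000, 0, 1), which is 0xFFE. With base and index both equal to 5 and a scale of 4, the result is 25, which is how an lea with the same register as base and index multiplies it by five.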


Programs can be written either with the libc library or without it. Since the system itself is written in C and many of its functions rely on this library, programs written in assembly can call into it as well. A library function is called with the call instruction. There is one catch: Unix systems differ, and on some of them the name of a library function must be prefixed with an underscore. Consider the following program, which prints the famous phrase:

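.text
.globl main
main:
	pushl $message		# argument for puts: address of the string
	call puts
	popl %ecx		# remove the argument from the stack
	xorl %eax,%eax		# return 0 from main
	ret
.data
message:
	.string	"Hello world!"	# .string appends the terminating zero byte itself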

Without libc, the program looks like this:

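.text
.globl _start
_start:
	movl	$4,%eax			# system call number 4: write
	xorl	%ebx,%ebx
	incl	%ebx			# %ebx = 1: file descriptor of standard output
	movl	$message,%ecx		# address of the string
	movl	$mesg_len,%edx		# length of the string
	int	$0x80
	xorl	%eax,%eax
	incl	%eax			# system call number 1: exit
	xorl	%ebx,%ebx		# exit status 0
	int	$0x80
	hlt				# never reached
.data
message:
	.ascii	"Hello World!\012"	# .ascii adds no zero byte, so the length covers the text only
mesg_len = . - message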


This example uses two system calls: write, to print to the screen, and exit. For the write call, the value 4 is placed in the %eax register; this is the number under which the function is listed in the system call table.

The function is invoked by triggering interrupt 0x80 (int $0x80). Exiting the program, that is, terminating it, corresponds to system call number 1.
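
The register convention used with int $0x80 can be pictured as a dispatcher: %eax selects the call, and %ebx, %ecx and %edx carry its arguments. The following C sketch is only a model of that convention, not how the kernel implements it. The name model_int80 is hypothetical, the POSIX functions write and _exit stand in for the real kernel entry points, and only the two call numbers mentioned in this article are handled.

```c
#include <unistd.h>

/* Model of the int $0x80 register convention on 32-bit x86:
   eax = system call number, ebx/ecx/edx = first three arguments.
   The returned value models what the kernel leaves in %eax. */
static long model_int80(long eax, long ebx, const void *ecx, long edx)
{
    switch (eax) {
    case 4:
        /* write(fd = ebx, buf = ecx, count = edx) */
        return (long)write((int)ebx, ecx, (size_t)edx);
    case 1:
        /* exit(status = ebx); this call does not return */
        _exit((int)ebx);
    default:
        /* any other number is outside the scope of this sketch */
        return -1;
    }
}
```

Calling model_int80(4, 1, "Hello World!\n", 13) mirrors the first half of the assembly program: descriptor 1 in %ebx, the buffer address in %ecx, and the length in %edx.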
