File Formats for FASM Programs for Windows

When writing an assembly program on Windows (FASM is used as the example here), the question arises of which file format to choose.
The “format” directive, followed by a format identifier, determines the format of the generated executable file.
Below is a brief description of COM programs and of EXE programs in the MZ and PE formats, each with a program template (in the form of the traditional "Hello World!").

The default format is a flat binary file; it can also be selected explicitly with the “format binary” directive. This is the format used to produce .COM programs.
The “use16” and “use32” directives instruct the assembler to generate 16-bit or 32-bit code, overriding the default setting for the selected output format. “use64” enables code generation for the long mode of x86-64 processors.
Various output formats with directives specific to them are described below.
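
Before moving on to the individual formats, here is a minimal sketch of how the code-size directives override the default of the flat binary format, which is 16-bit. The load address given to "org" is an assumption made purely for illustration, and the fragment is not meant to be run as a program.

```asm
format binary            ; flat binary output (this is also the default)
use32                    ; generate 32-bit code instead of the 16-bit default
org 0                    ; assumed load address, purely illustrative

    mov eax,1            ; assembled as a 32-bit instruction
    ret

use16                    ; switch back to 16-bit code generation
    mov ax,1             ; assembled as a 16-bit instruction
    ret
```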

.COM programs

A .COM file on disk is an unmodified image of the machine code that is loaded into memory. The .COM format is one of the simplest x86 executable file formats. A .COM program is limited to a single 64 KB segment, and all data must be defined in the same segment as the code. When a COM program starts, all segment registers contain the address of the program segment prefix (PSP), a 256-byte (100h) block that DOS reserves in memory immediately before a COM or EXE program. Since the program itself begins at offset 100h from the start of the PSP, the directive ORG 100h is placed in the program. This directive sets the base address from which the program's addresses are calculated. The DOS loader sets the instruction pointer to this offset.

An example of a simple program in .COM format:

use16               ; Generate 16-bit code
org 100h            ; The program starts at address 100h
    mov dx,hello    ; DX = address of the string
    mov ah,9        ; DOS function number
    int 21h         ; Call the DOS function
    mov ax,4C00h    ; AH = 4Ch, AL = 00h
    int 21h         ; Terminate the program
;-------------------------------------------------------
hello db 'Hello, world!$'

The “use16” directive selects 16-bit code generation. "org 100h" skips 256 bytes (addresses 0000h - 00FFh). These addresses are reserved for service data (the PSP).
The instructions follow. The address of the hello string is placed in the DX register. Then function 9 of interrupt 21h is called to display the string on the screen.
The program terminates by calling function 4Ch of the same interrupt 21h.
The hello string ends with the '$' character, which DOS function 9 treats as the end of the string.

Keep in mind that COM programs are not supported by 64-bit Windows. To run such programs on these operating systems, use an emulator such as DOSBox, or build the program in the PE format described below instead.

MZ format

MZ is a standard format for 16-bit executables with the .EXE extension for DOS. It is named after the signature - ASCII characters MZ (4D 5A) in the first two bytes.

An example of a simple program using the MZ format:

format MZ                       ; DOS executable file (MZ EXE)
entry code_seg:start            ; Program entry point
stack 200h                      ; Stack size
;--------------------------------------------------------------------
segment data_seg                ; Data segment
    hello db 'Hello, asmworld!$'    ; The string
;--------------------------------------------------------------------
segment code_seg                ; Code segment
start:                          ; Program entry point
    mov ax,data_seg             ; Initialize the DS register
    mov ds,ax
    mov ah,09h
    mov dx,hello                ; Print the string
    int 21h
    mov ax,4C00h
    int 21h                     ; Terminate the program


To create such a file, use the “format MZ” directive. By default, the code for this format is 16-bit.
“segment” defines a new segment. It is followed by a label whose value is the number of the segment being defined. Optionally, this can be followed by "use16" or "use32" to specify whether the code in the segment is 16-bit or 32-bit. The beginning of the segment is aligned to a paragraph (16 bytes). All labels defined after it have values relative to the beginning of this segment. In the example above, two segments are declared: "data_seg" and "code_seg".
“entry” sets the entry point for the MZ format. It is followed by the far address (segment name, colon and offset within the segment) of the desired entry point. In our case, this is the label “start” in the segment "code_seg".
"stack" sets up the stack for MZ. The directive can be followed by a numerical expression giving the size of a stack to be created automatically, or by the far address of the initial stack frame if you want to set up the stack manually. If the stack is not defined, it is created with a default size of 4096 bytes.
“heap”, followed by a value, sets the maximum size of the additional heap in paragraphs (this space is in addition to the stack and undefined data). Use "heap 0" to always allocate only the memory that the program really needs.
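
The directives described above can be combined as in the following minimal sketch. The segment names, the label names and the sizes chosen here are hypothetical and serve only as an illustration.

```asm
format MZ
entry main_seg:start     ; far address of the entry point
stack 400h               ; let FASM create a 1024-byte stack automatically
heap 0                   ; request no additional heap space

segment main_seg use16   ; explicit 16-bit code (already the MZ default)
start:
    mov ax,text_seg
    mov ds,ax            ; DS = segment that holds the string
    mov dx,msg
    mov ah,9             ; DOS function 09h: print a '$'-terminated string
    int 21h
    mov ax,4C00h         ; DOS function 4Ch: exit with return code 0
    int 21h

segment text_seg
msg db 'Stack and heap set explicitly$'
```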

The MZ format, like COM programs, is not supported by 64-bit Windows.

PE format

PE is short for Portable Executable, i.e. a portable (universal) executable file. The format appeared in the late days of Windows 3.11, but became widespread in the heyday of Windows 95. It is fair to say that on computers running Windows 9x / 2K / XP / Vista / 7, about 95% of executable files (exe, dll, sys drivers) are PE files.

To select the PE format, use the “format PE” directive. It can be followed by additional format settings: the “console”, “GUI” or “native” keyword selects the target subsystem (a floating-point value giving the subsystem version may follow it), and “DLL” marks the output file as a dynamic-link library. Next may come the “at” operator with a numerical expression giving the base address of the PE image, and optionally the “on” operator followed by a quoted file name, which selects the MZ stub for the PE program (if the specified file is not in the MZ format, it is treated as a flat binary executable and converted to the MZ format). By default, the code for this format is 32-bit.

An example of a PE format declaration with all the settings:

format PE GUI 4.0 DLL at 7000000h on 'stub.exe'

“section” defines a new section. It must be followed by a quoted string giving the name of the section, and then one or more section flags may follow. The possible flags are: “code”, “data”, “readable”, “writeable”, “executable”, “shareable”, “discardable”, “notpageable”. The beginning of the section is aligned to a page (4096 bytes).

An example of a PE section declaration:

section '.text' code readable executable

Along with the flags, one of the special PE data identifiers can also be specified, marking the entire section as special data. The possible identifiers are “export”, “import”, “resource” and “fixups”. If a section is marked as containing fixups (relocations), they are generated automatically and no further data needs to be defined in it. Resource data can also be generated automatically from a resource file, by writing the "from" operator and a quoted file name after the "resource" identifier.

Below are examples of sections containing special data:

section '.reloc' data discardable fixups
section '.rsrc' data readable resource from 'my.res'

“entry” sets the entry point for PE. It must be followed by the value of the entry point.
“stack” sets the stack size for PE. It must be followed by the stack reserve size, optionally followed by a comma and the stack commit size. If the stack is not defined, it is given a default size of 4096 bytes.
“heap” sets the heap size for PE. It must be followed by the heap reserve size, optionally followed by a comma and the heap commit size. If the heap is not defined, it is set to 65536 bytes by default; if the commit size is not specified, it is set to 0.
"data" begins the definition of special PE data. The directive must be followed by one of the data identifiers ("export", "import", "resource" or "fixups") or by the number of the data entry in the PE header. The data is defined on the following lines and ends with the “end data” directive. If fixups are selected, they are generated automatically and no further data needs to be defined. The same applies to resources if the "resource" identifier is followed by the “from” operator and a quoted file name: in this case, the data is taken from that resource file.

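As a minimal sketch of the "data" directive described above, the import table below is placed inside an ordinary data section instead of a section of its own. The section name, the "dummy" variable and the choice of ExitProcess as the only import are assumptions made for illustration. The "library" and "import" macros are assumed to come from the standard 'win32a.inc' header of the FASM package.

```asm
format PE console
entry start
include 'win32a.inc'     ; provides the library and import macros

section '.text' code readable executable
start:
    push 0               ; exit code
    call [ExitProcess]

section '.data' data readable writeable
    dummy dd 0           ; hypothetical placeholder for ordinary data

    data import          ; the import table lives inside this section
        library kernel32,'kernel32.dll'
        import kernel32,ExitProcess,'ExitProcess'
    end data
```
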
An example of a simple program using the PE format:

format PE console              ; Windows executable file (EXE)
entry start                    ; Program entry point
include 'win32a.inc'
section '.text' code executable
start:
    push hello
    call [printf]
    push 0
    ccall   [getchar]
    call [ExitProcess]
section '.rdata' data readable
    hello db 'Hello World!', 0
section '.idata' data readable import
    library kernel32, 'kernel32.dll', \
            msvcrt,   'msvcrt.dll'
    import kernel32, ExitProcess, 'ExitProcess'
    import msvcrt, printf, 'printf', getchar, '_fgetchar'

In this example, the C runtime functions printf and getchar from msvcrt.dll are used to work with the console, and the WinAPI function ExitProcess terminates the program.

That concludes this short (and hopefully useful to someone) overview of the COM, MZ and PE formats. ELF and COFF were left out of this article, for which I ask you not to judge too harshly.
