PE (Portable Executable): On Stranger Tides



This article is a story about how executable files are laid out (to the point: these are the files with the .exe extension that you get after compiling an application). Once the code is written, the libraries are linked, and the resources (window icons, text files, pictures, and so on) are added to the project, everything is built into a single executable file, usually with the .exe extension. This is the pool we are about to dive into.
* The article is aimed at beginners, so it is full of diagrams and descriptions of the elements that matter for loading.

Introduction




PE is the executable file format of all 32-bit and 64-bit Windows systems. There are currently two variants of it: PE32 and PE32+. PE32 is for x86 systems, and PE32+ is for x64. The structures described here can be found in the WINNT.h header file that ships with the SDK. The format description from Microsoft is available for download, but for now I will leave a small schematic diagram here. Just skim it; as the article goes on, things will start to click and fall into place.



Any file is just a sequence of bytes, and the format is a kind of (treasure) map for it. It shows what is where: where the islands with coconuts are, where the ones with bananas are, where the sandy beaches are, and where the Somali waters are that you had better stay out of. So let's explore the vast expanses of this ocean. Cast off!

“Now you will hear a sad story about the boy Bobby.”
(Treasure Island)

Dos-Header (IMAGE_DOS_HEADER) and Dos-stub




DOS header. This is the very first structure in the file (the very first island we meet on our way), and it is 64 bytes long. The most important fields in it are e_magic and e_lfanew. Let's see what the structure looks like:



Studying every field at this stage is pointless, because most of them carry no particular meaning for us. We will look only at those that are needed for loading and are of particular interest. (From here on, fields are described in the form name : TYPE - description.)

e_magic : WORD - the signature located at offset 0 from the beginning of the file and equal to “MZ”. Rumor has it that MZ stands for Mark Zbikowski, the most vicious pirate in these waters, or rather, a leading developer of MS-DOS and the EXE format. If this signature is not equal to MZ, the file will not load.

e_lfanew : DWORD - offset of the PE header relative to the beginning of the file. The PE header must begin with the signature PE\0\0. The PE header can be located anywhere in the file. If you look at the structure, you can see that e_lfanew sits at offset 0x3C (60 in decimal). That is, to read this value we must step 60 bytes forward from the pointer to the beginning of the file (let's call it ptrFile), and then we stand face to face with e_lfanew. We read this value (call it peStep) and add peStep to ptrFile. Mission accomplished: we have arrived, and this should be the PE header. We can make sure by checking the first four bytes of this header. As mentioned above, they must be equal to PE\0\0.
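The walk described above (check MZ, read the DWORD at offset 0x3C, jump there, check PE\0\0) can be sketched in a few lines. This is only a minimal sketch: it assumes the whole file has already been read into a byte buffer, and the function names find_pe_header and read_u32 are my own, not something from WINNT.h.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Read a little-endian DWORD from a byte buffer. */
static uint32_t read_u32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Returns the offset of the PE header (peStep), or -1 if the buffer
   does not look like a PE file. ptrFile points at the first byte. */
long find_pe_header(const uint8_t *ptrFile, size_t size)
{
    if (size < 64 || ptrFile[0] != 'M' || ptrFile[1] != 'Z')
        return -1;                               /* e_magic check */

    uint32_t peStep = read_u32(ptrFile + 0x3C);  /* the field at 0x3C */
    if (peStep > size || size - peStep < 4)
        return -1;                               /* offset outside the file */

    if (memcmp(ptrFile + peStep, "PE\0\0", 4) != 0)
        return -1;                               /* signature mismatch */

    return (long)peStep;
}
```

The bounds checks are there because the offset is read from the file itself and cannot be trusted.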

After the first 64 bytes of the file comes the DOS stub (pirates also simply call it the stub). This is an area of the file that is mostly filled with zeros. (Take another look at the layout: the stub lies after the DOS header and before the PE header.) It exists only for backward compatibility and is useless on current systems. A mini DOS program can be written into it; in the layout shown here it is limited to 192 bytes (256 is the end of the stub, 64 is the size of the DOS header). But it is easier to find an access point in Zimbabwe than such a program. The standard behavior is that if you run the program under DOS, it prints a message like “This program cannot be run in DOS mode.” or “This program must be run under win32”. If you see these lines, it means that you are ... back in the distant '85.



“Fuck money, I'm talking about Flint's papers!”
(Treasure Island)

PE-Header (IMAGE_NT_HEADERS)




We have read e_lfanew and moved peStep bytes from the beginning of the file. Now we can start analyzing the PE header. This is a new island for us; its first 0x18 bytes hold the signature and the file header, and the optional header follows them. The structure is shown below:

typedef struct _IMAGE_NT_HEADERS {
  DWORD                 Signature;
  IMAGE_FILE_HEADER     FileHeader;
  IMAGE_OPTIONAL_HEADER OptionalHeader;
} IMAGE_NT_HEADERS, *PIMAGE_NT_HEADERS;

This structure is interesting because it contains substructures. If you picture a PE file as an ocean, each structure is a continent (or an island). On a continent there are states that can tell us about their territory, and each story is made up of the histories of the individual cities (fields) of that state. So the NT header is a continent that contains such countries as Signature (a city-state), FileHeader and OptionalHeader. As already mentioned, Signature : DWORD contains the 4-byte signature that identifies the file format. Let's see what else this continent can tell us.

File-Header (IMAGE_FILE_HEADER)


This is a country (no, not one where they always shoot, sell drugs and engage in prostitution) where each city tells you what an ideal state it belongs to. That is the informal description; the formal one is this: a set of fields that describe the basic characteristics of the file. Let's look at the structure of this state:

typedef struct _IMAGE_FILE_HEADER {
  WORD  Machine;
  WORD  NumberOfSections;
  DWORD TimeDateStamp;
  DWORD PointerToSymbolTable;
  DWORD NumberOfSymbols;
  WORD  SizeOfOptionalHeader;
  WORD  Characteristics;
} IMAGE_FILE_HEADER, *PIMAGE_FILE_HEADER;

I will describe these fields only dryly, because their names are intuitive and they hold plain values, not VA, RVA, RAW and other scary, intriguing things that so far we have only heard about from old pirates. Although we have in fact already met RAW: these are simply offsets relative to the beginning of the file (they are also called raw pointers or file offsets). That is, if we have a RAW address, we need to step RAW positions from the beginning of the file (ptrFile + RAW), and then we can start reading values. A striking example is e_lfanew, which we examined above in the DOS header.

* Machine : WORD - this number (2 bytes) identifies the processor architecture the application can run on.
NumberOfSections : WORD - the number of sections in the file. The section headers (hereinafter, the section table) follow immediately after the PE header. The documentation says that the number of sections is limited to 96.
TimeDateStamp : DWORD - a number that stores the date and time the file was created.
PointerToSymbolTable : DWORD - the offset (RAW) of the symbol table, and NumberOfSymbols : DWORD - the number of entries in that table. The table was meant to store debugging information, but hardly anyone noticed its absence from the very start. Most often both fields are filled with zeros.
SizeOfOptionalHeader : WORD - the size of the optional header (which immediately follows this one). The documentation states that for an object file it is set to 0.
* Characteristics : WORD - file characteristics.

* - fields whose values come from a defined set. Tables of the possible values are given in the description of the structure on the official site and are not reproduced here, because they carry nothing especially important for understanding the format.
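Just to give a feel for the two starred fields, here is a small sketch that decodes a couple of their values. The numeric values are the ones from the official specification (WINNT.h defines them as IMAGE_FILE_MACHINE_I386, IMAGE_FILE_MACHINE_AMD64 and IMAGE_FILE_DLL); the short macro and function names below are my own and are used only for illustration.

```c
#include <stdint.h>

/* Values from the PE specification; the short names are hypothetical. */
#define MACHINE_I386   0x014c   /* IMAGE_FILE_MACHINE_I386  */
#define MACHINE_AMD64  0x8664   /* IMAGE_FILE_MACHINE_AMD64 */
#define FILE_DLL       0x2000   /* IMAGE_FILE_DLL bit in Characteristics */

/* Turn the Machine field into a readable label. */
const char *machine_name(uint16_t machine)
{
    switch (machine) {
    case MACHINE_I386:  return "x86";
    case MACHINE_AMD64: return "x64";
    default:            return "other";
    }
}

/* Characteristics is a set of bit flags; test one of them. */
int is_dll(uint16_t characteristics)
{
    return (characteristics & FILE_DLL) != 0;
}
```

Only two architectures are listed here; the full table in the specification is much longer.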

Let's leave this island! We need to move on. Next landmark: a country called Optional-Header.

“Where's the map, Billy? I need a map.”
(Treasure Island)

Optional-Header (IMAGE_OPTIONAL_HEADER)





The name of this continent is not a very good one. This header is mandatory and comes in two formats, PE32 and PE32+ (IMAGE_OPTIONAL_HEADER32 and IMAGE_OPTIONAL_HEADER64 respectively). The format is stored in the Magic : WORD field. The header contains the information needed to load the file. As always:

typedef struct _IMAGE_OPTIONAL_HEADER {
  WORD                 Magic;
  BYTE                 MajorLinkerVersion;
  BYTE                 MinorLinkerVersion;
  DWORD                SizeOfCode;
  DWORD                SizeOfInitializedData;
  DWORD                SizeOfUninitializedData;
  DWORD                AddressOfEntryPoint;
  DWORD                BaseOfCode;
  DWORD                BaseOfData;
  DWORD                ImageBase;
  DWORD                SectionAlignment;
  DWORD                FileAlignment;
  WORD                 MajorOperatingSystemVersion;
  WORD                 MinorOperatingSystemVersion;
  WORD                 MajorImageVersion;
  WORD                 MinorImageVersion;
  WORD                 MajorSubsystemVersion;
  WORD                 MinorSubsystemVersion;
  DWORD                Win32VersionValue;
  DWORD                SizeOfImage;
  DWORD                SizeOfHeaders;
  DWORD                CheckSum;
  WORD                 Subsystem;
  WORD                 DllCharacteristics;
  DWORD                SizeOfStackReserve;
  DWORD                SizeOfStackCommit;
  DWORD                SizeOfHeapReserve;
  DWORD                SizeOfHeapCommit;
  DWORD                LoaderFlags;
  DWORD                NumberOfRvaAndSizes;
  IMAGE_DATA_DIRECTORY DataDirectory[IMAGE_NUMBEROF_DIRECTORY_ENTRIES];
} IMAGE_OPTIONAL_HEADER, *PIMAGE_OPTIONAL_HEADER;


* As always, we will study only the main fields, those that matter most for understanding loading and for moving further through the file. Let's agree on one thing: the fields of this structure contain VA (virtual address) and RVA (relative virtual address) values. These addresses are not like RAW, and you have to know how to read them (more precisely, how to compute them). We will certainly learn how, but first we will go through the structures that follow one another, so as not to get confused. For now, just remember that these are addresses which, after some calculation, point to a specific place in the file. There will also be a new concept: alignment. We will consider it together with RVA addresses, since the two are quite closely related.

AddressOfEntryPoint : DWORD - RVA of the entry point. It may point anywhere in the image's address space. For .exe files it is the address at which the program starts executing and must not be zero; a DLL without an entry point stores zero here.
BaseOfCode : DWORD - RVA of the beginning of the program code (code section).
BaseOfData : DWORD - RVA of the beginning of the program data (data section). This field exists only in PE32.
ImageBase : DWORD - the preferred base address for loading the program (8 bytes wide in PE32+). Must be a multiple of 64 KB. For 32-bit .exe files it is 0x00400000 in most cases.
SectionAlignment : DWORD - the alignment (in bytes) of sections when they are loaded into virtual memory.
FileAlignment : DWORD - the alignment (in bytes) of sections inside the file.
SizeOfImage : DWORD - the size (in bytes) of the image in memory, including all headers. Must be a multiple of SectionAlignment.
SizeOfHeaders : DWORD - the combined size of all headers (DOS header, DOS stub, PE header, section headers), rounded up to a multiple of FileAlignment.
NumberOfRvaAndSizes : DWORD - the number of entries in the directory table (the table itself follows). In practice this field is almost always equal to the symbolic constant IMAGE_NUMBEROF_DIRECTORY_ENTRIES, which is 16.
DataDirectory [NumberOfRvaAndSizes] : IMAGE_DATA_DIRECTORY - the data directory. Simply put, it is an array (of 16 elements), each element of which is a structure of two DWORD values.

Consider what the IMAGE_DATA_DIRECTORY structure is :

typedef struct _IMAGE_DATA_DIRECTORY {
  DWORD VirtualAddress;
  DWORD Size;
} IMAGE_DATA_DIRECTORY, *PIMAGE_DATA_DIRECTORY;

What do we have? An array of 16 elements, each of which holds an address and a size (of what? how? why? all in a minute). The question is what exactly these values describe. For this, Microsoft provides special constants that map each index to its table. They can be seen at the very end of the structure description. In the meantime:

// Directory Entries
#define IMAGE_DIRECTORY_ENTRY_EXPORT          0   // Export Directory
#define IMAGE_DIRECTORY_ENTRY_IMPORT          1   // Import Directory
#define IMAGE_DIRECTORY_ENTRY_RESOURCE        2   // Resource Directory
#define IMAGE_DIRECTORY_ENTRY_EXCEPTION       3   // Exception Directory
#define IMAGE_DIRECTORY_ENTRY_SECURITY        4   // Security Directory
#define IMAGE_DIRECTORY_ENTRY_BASERELOC       5   // Base Relocation Table
#define IMAGE_DIRECTORY_ENTRY_DEBUG           6   // Debug Directory
//      IMAGE_DIRECTORY_ENTRY_COPYRIGHT       7   // (X86 usage)
#define IMAGE_DIRECTORY_ENTRY_ARCHITECTURE    7   // Architecture Specific Data
#define IMAGE_DIRECTORY_ENTRY_GLOBALPTR       8   // RVA of GP
#define IMAGE_DIRECTORY_ENTRY_TLS             9   // TLS Directory
#define IMAGE_DIRECTORY_ENTRY_LOAD_CONFIG    10   // Load Configuration Directory
#define IMAGE_DIRECTORY_ENTRY_BOUND_IMPORT   11   // Bound Import Directory in headers
#define IMAGE_DIRECTORY_ENTRY_IAT            12   // Import Address Table
#define IMAGE_DIRECTORY_ENTRY_DELAY_IMPORT   13   // Delay Load Import Descriptors
#define IMAGE_DIRECTORY_ENTRY_COM_DESCRIPTOR 14   // COM Runtime descriptor

Aha! We see that each element of the array is responsible for the table assigned to it. But alas, these shores are out of our reach for now, because we do not know how to work with VA and RVA addresses. And to learn that, we need to study what sections are. They will tell us about their structure and how they work, after which it will become clear why VA, RVA and alignment are needed. In this article we will touch only on export and import. The purpose of the remaining entries can be found in the official documentation or in books. So, the fields themselves:

VirtualAddress : DWORD - RVA of the table that the array element corresponds to.
Size : DWORD - the size of the table in bytes.
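To make the array a little more tangible, here is a sketch that pulls one entry out of the raw bytes of the optional header. It relies on two facts from the format: Magic is 0x10b for PE32 and 0x20b for PE32+, and the fixed part before DataDirectory is 96 bytes in PE32 and 112 bytes in PE32+. The names DirEntry and read_data_directory are hypothetical helpers of mine, and the buffer is assumed to start at the first byte of the optional header.

```c
#include <stddef.h>
#include <stdint.h>

static uint16_t read_u16(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

static uint32_t read_u32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

typedef struct {
    uint32_t rva;    /* VirtualAddress */
    uint32_t size;   /* Size */
} DirEntry;

/* opt points at the optional header; returns 0 on success, -1 on failure. */
int read_data_directory(const uint8_t *opt, size_t optSize,
                        unsigned index, DirEntry *out)
{
    size_t dirOffset;

    if (optSize < 2)
        return -1;
    switch (read_u16(opt)) {                 /* Magic */
    case 0x10b: dirOffset = 96;  break;      /* PE32  */
    case 0x20b: dirOffset = 112; break;      /* PE32+ */
    default:    return -1;
    }
    if (optSize < dirOffset)
        return -1;

    /* NumberOfRvaAndSizes is the DWORD right before the array. */
    uint32_t count = read_u32(opt + dirOffset - 4);
    if (index >= count)
        return -1;

    size_t entry = dirOffset + (size_t)index * 8;   /* two DWORDs per entry */
    if (optSize < entry + 8)
        return -1;

    out->rva  = read_u32(opt + entry);
    out->size = read_u32(opt + entry + 4);
    return 0;
}
```

Calling it with index 1 would give the RVA and size of the import table, which still has to be converted to a file offset before it can be read.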

So! To reach such exotic shores as the import, export and resource tables, we need to complete the quest with sections. Well then, cabin boy, let's glance at the general map, work out where we are now and move on:



And we are right in front of the wide open spaces of the sections. We definitely need to find out what they are hiding and finally deal with the other kind of addressing. We want real adventure! We want to get to such republics as the import and export tables as soon as possible. Old pirates say that not everyone managed to reach them, and those who did came back with gold and women, or rather, with sacred knowledge about the ocean. We set sail and hold course for the Section header.

“You are deposed, Silver! Get off the barrel!”
(Treasure Island)

Section-header (IMAGE_SECTION_HEADER)




The section headers follow one another immediately after the DataDirectory array. The section table is a sovereign state divided into NumberOfSections cities. Each city has its own trade, its own rights, and a size of 0x28 bytes. The number of sections is given by the NumberOfSections field stored in the File-header. So, consider the structure:

typedef struct _IMAGE_SECTION_HEADER {
  BYTE  Name[IMAGE_SIZEOF_SHORT_NAME];
  union {
    DWORD PhysicalAddress;
    DWORD VirtualSize;
  } Misc;
  DWORD VirtualAddress;
  DWORD SizeOfRawData;
  DWORD PointerToRawData;
  DWORD PointerToRelocations;
  DWORD PointerToLinenumbers;
  WORD  NumberOfRelocations;
  WORD  NumberOfLinenumbers;
  DWORD Characteristics;
} IMAGE_SECTION_HEADER, *PIMAGE_SECTION_HEADER;

Name : BYTE [IMAGE_SIZEOF_SHORT_NAME] - section name. Currently 8 bytes long.
VirtualSize : DWORD - the size of the section in virtual memory.
VirtualAddress : DWORD - RVA of the section.
SizeOfRawData : DWORD - the size of the section in the file. Must be a multiple of FileAlignment.
PointerToRawData : DWORD - RAW offset of the beginning of the section. Must also be a multiple of FileAlignment.
Characteristics : DWORD - the section's access attributes and the rules for loading it into virtual memory. For example, there are attributes that describe the contents of the section (initialized data, uninitialized data, code) and access attributes (read, write, execute). This is not the full range. The characteristics are set by constants from the same WINNT.h whose names begin with IMAGE_SCN_. More details on section attributes can be found in the description of the structure. The attributes are also well described in Chris Kaspersky's books; see the list of references at the end of the article.
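Finding the section table itself is a matter of adding up sizes: the 4-byte signature, the 20-byte File-header and however many bytes SizeOfOptionalHeader says. A minimal sketch of that arithmetic follows; the function names are my own, and peStep is the PE header offset read from the DOS header.

```c
#include <stdint.h>
#include <string.h>

/* File offset of the first section header:
   PE signature (4) + IMAGE_FILE_HEADER (20) + optional header. */
uint32_t section_table_offset(uint32_t peStep, uint16_t sizeOfOptionalHeader)
{
    return peStep + 4 + 20 + sizeOfOptionalHeader;
}

/* File offset of the i-th section header; each header is 0x28 bytes. */
uint32_t section_header_offset(uint32_t tableOffset, unsigned i)
{
    return tableOffset + i * 0x28u;
}

/* The 8-byte Name is not guaranteed to be zero-terminated,
   so copy it into a 9-byte buffer and terminate it ourselves. */
void section_name(const uint8_t *sectionHeader, char out[9])
{
    memcpy(out, sectionHeader, 8);
    out[8] = '\0';
}
```

For example, with peStep = 0x80 and a 0xE0-byte optional header, the table starts at 0x178 and the third header at 0x1C8.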

As for the name, remember the following: the section with resources should keep the name .rsrc. Some loaders and tools look resources up by this name, so with a different one the resources may fail to load. As for the remaining sections, the name can be anything. Usually the names are meaningful, for example .data, .src and so on. But this also happens:



A section is a region that is loaded into virtual memory, and all work is done directly with that data. An address in virtual memory, without any offsets, is called a virtual address, VA for short. The preferred address for loading the application is given in the ImageBase field. It is the point at which the application's region in virtual memory begins, and RVA (relative virtual address) offsets are counted from it. That is, VA = ImageBase + RVA. ImageBase is always known to us, so given either a VA or an RVA we can express one through the other.

That seems clear enough. But this is virtual memory, and we are in the physical file! Virtual memory is for us now like a journey to other galaxies, which we can only imagine. We cannot get into virtual memory at the moment, but we can find out what will be there, because it is taken from our file.

Alignment




To picture correctly how sections are loaded into virtual memory, you need to understand a mechanism called alignment. To begin with, let's look at a diagram of how sections are loaded into memory.



As you can see, a section is not loaded into memory at its own size. Alignment is applied here: it is the value that the size of the section in memory must be a multiple of. Looking at the diagram, we see that the section size is 0x28, yet it is loaded with a size of 0x50. This is because of the alignment size. 0x28 “falls short” of 0x50, so the section is loaded and the remaining 0x50 - 0x28 bytes are filled with zeros. And what if the section were larger than the alignment size? For example, sectionSize = 0x78 while sectionAlignment = 0x50, that is, unchanged. In that case the section would occupy 0xA0 (0xA0 = 0x50 * 0x02) bytes in memory, a value that is a multiple of sectionAlignment and completely covers sectionSize. Note that sections in the file are aligned in the same way, only by FileAlignment. With this groundwork in place, we can figure out how to convert from RVA to RAW.
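The rounding itself fits in one line. Below is a small sketch that works for any non-zero alignment, including the 0x50 from the example (real files use powers of two, for which a bit mask is the usual shortcut). The name align_up is my own.

```c
#include <stdint.h>

/* Round size up to the nearest multiple of align.
   align must be non-zero; size + align is assumed not to overflow. */
uint32_t align_up(uint32_t size, uint32_t align)
{
    return (size + align - 1) / align * align;
}
```

With the numbers from the text, align_up(0x28, 0x50) gives 0x50 and align_up(0x78, 0x50) gives 0xA0.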

“This is no plain for you; the climate here is different.”
(V.S. Vysotsky)

A little lesson in arithmetic




Before execution starts, part of the program must be placed into the address space of the process. The address space is the range of memory addresses available to it. The “piece” of the address space into which the program is loaded is called the virtual image. The image is characterized by its base load address (image base) and its size (image size). So a VA (virtual address) is an address relative to the beginning of virtual memory, and an RVA (relative virtual address) is relative to the place where the program was loaded. How do we find the base load address of the application? There is a separate field for that in the optional header, called ImageBase. That was a small prelude to refresh our memory. Now consider a schematic representation of the different addresses:



So how do you read information from a file without loading it into virtual memory? For that, addresses must be converted to RAW. Then we can step through the file to the region we need and read the necessary data. Since an RVA is an address in virtual memory whose data was mapped from the file, we can perform the reverse process. All it takes is a 9-by-16 wrench, or rather, some simple arithmetic. Here are a few formulas:

VA = ImageBase + RVA;
RAW = RVA - sectionRVA + rawSection;
// rawSection - offset of the section from the beginning of the file
// sectionRVA - RVA of the section (this field is stored in the section header)

As you can see, to calculate RAW we need to determine which section the RVA belongs to. To do this, go through all the sections and check the following condition:

RVA >= sectionVirtualAddress && RVA < sectionVirtualAddress + ALIGN_UP(sectionVirtualSize, sectionAlignment)
// sectionAlignment - section alignment. The value is stored in the Optional-header.
// sectionVirtualAddress - RVA of the section, stored in the section header
// ALIGN_UP() - computes how much space the section occupies in memory, taking alignment into account

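Putting all the pieces together, we get this listing:

#include <stdint.h>

typedef uint32_t DWORD;
typedef uint16_t WORD;
typedef uint8_t BYTE;

/* Both macros assume that align is a power of two. */
#define ALIGN_DOWN(x, align)  ((x) & ~((align) - 1))
#define ALIGN_UP(x, align)    (((x) & ((align) - 1)) ? ALIGN_DOWN(x, align) + (align) : (x))

/* Filled in while parsing the headers (initialization is not shown). */
extern IMAGE_SECTION_HEADER *sections;   /* the section table */
extern int numberOfSections;             /* FileHeader.NumberOfSections */
extern DWORD sectionAlignment;           /* OptionalHeader.SectionAlignment */

/* Returns the index of the section that contains rva, or -1. */
int defSection(DWORD rva)
{
    for (int i = 0; i < numberOfSections; ++i)
    {
        DWORD start = sections[i].VirtualAddress;
        DWORD end = start + ALIGN_UP(sections[i].VirtualSize, sectionAlignment);
        if (rva >= start && rva < end)
            return i;
    }
    return -1;
}

/* Converts an RVA to a file offset; returns 0 if no section contains it. */
DWORD rvaToOff(DWORD rva)
{
    int indexSection = defSection(rva);
    if (indexSection != -1)
        return rva - sections[indexSection].VirtualAddress + sections[indexSection].PointerToRawData;
    else
        return 0;
}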

* I did not include the type declarations and array initialization in the code, only the functions that help with the address calculations. As you can see, the code is not very complicated, just a little confusing. That passes once you spend a little more time poking around inside an .exe with a disassembler.
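For those who would like to see the numbers move, here is a self-contained variant of the same conversion with a tiny made-up section table. Everything concrete in it is an assumption for the sake of the example: the DemoSection type keeps only the three fields the formula needs, and the two sections and their values are invented.

```c
#include <stdint.h>

typedef struct {                 /* only the fields the conversion needs */
    uint32_t VirtualAddress;
    uint32_t VirtualSize;
    uint32_t PointerToRawData;
} DemoSection;

/* An invented section table, for illustration only. */
const DemoSection demoSections[] = {
    { 0x1000, 0x0C50, 0x0400 },  /* hypothetical .text */
    { 0x2000, 0x0230, 0x1200 },  /* hypothetical .data */
};

/* Returns 1 and stores the file offset in *raw if rva lies inside
   one of the sections; returns 0 otherwise. */
int rva_to_raw(const DemoSection *s, int count, uint32_t sectionAlignment,
               uint32_t rva, uint32_t *raw)
{
    for (int i = 0; i < count; ++i) {
        /* size of the section in memory, rounded up to the alignment */
        uint32_t span = (s[i].VirtualSize + sectionAlignment - 1)
                        / sectionAlignment * sectionAlignment;
        if (rva >= s[i].VirtualAddress && rva - s[i].VirtualAddress < span) {
            *raw = rva - s[i].VirtualAddress + s[i].PointerToRawData;
            return 1;
        }
    }
    return 0;
}
```

With an alignment of 0x1000, the RVA 0x2010 falls into the second section and becomes 0x2010 - 0x2000 + 0x1200 = 0x1210. Returning a success flag instead of 0 avoids confusing "not found" with a legitimate offset of zero.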

HURRAH! We've figured it out. Now we can head for the lands of resources, the import and export tables, and generally wherever the heart desires. We have just learned how to work with a new kind of addressing. Let's hit the road!

“Not bad, not bad! Yet they got their rations for today!”
(Treasure Island)

Export table




The very first element of the DataDirectory array stores the RVA of the export table, which is represented by the IMAGE_EXPORT_DIRECTORY structure. This table is typical of dynamic-link library (.dll) files. Its main purpose is to link exported functions with their RVAs. The description is given in the official specification:

typedef struct _IMAGE_EXPORT_DIRECTORY {
  DWORD   Characteristics;
  DWORD   TimeDateStamp;
  WORD    MajorVersion;
  WORD    MinorVersion;
  DWORD   Name;
  DWORD   Base;
  DWORD   NumberOfFunctions;
  DWORD   NumberOfNames;
  DWORD   AddressOfFunctions;
  DWORD   AddressOfNames;
  DWORD   AddressOfNameOrdinals;
} IMAGE_EXPORT_DIRECTORY, *PIMAGE_EXPORT_DIRECTORY;

This structure contains three pointers to three different tables: the table of (function) names (AddressOfNames), the table of ordinals (AddressOfNameOrdinals) and the table of addresses (AddressOfFunctions). The Name field stores the RVA of the name of the dynamic library. The ordinal table acts as an intermediary between the name table and the address table; it is an array of indices (each index is 2 bytes). For greater clarity, consider the diagram:



Consider an example. Suppose the i-th element of the name array points to the name of a function. Then the address of this function is found through the i-th element of the ordinal array: the value stored there is the index into the address array.

Attention! If you take, say, the 2nd element of the ordinal table, this does not mean that 2 is the index for the name and address tables. The index is the value stored in the second element of the ordinal array.

The name table and the ordinal table have the same number of entries (NumberOfNames), which does not always coincide with the number of entries in the address table (NumberOfFunctions).
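The two-step lookup (name, then ordinal value, then address) is easy to get wrong, so here is a sketch of it. To keep it short, it assumes the three tables have already been read from the file and the name RVAs have been turned into ordinary C strings; the ExportView type and find_export function are hypothetical names of mine. A plain linear scan is used for clarity.

```c
#include <stdint.h>
#include <string.h>

/* A simplified in-memory view of the three export tables. */
typedef struct {
    const char *const *names;      /* NumberOfNames entries     */
    const uint16_t    *ordinals;   /* NumberOfNames entries     */
    const uint32_t    *functions;  /* NumberOfFunctions entries */
    uint32_t numberOfNames;
    uint32_t numberOfFunctions;
} ExportView;

/* Returns the RVA of the named function, or 0 if it is not found. */
uint32_t find_export(const ExportView *e, const char *name)
{
    for (uint32_t i = 0; i < e->numberOfNames; ++i) {
        if (strcmp(e->names[i], name) == 0) {
            /* The index is the VALUE stored at ordinals[i], not i itself. */
            uint16_t index = e->ordinals[i];
            if (index >= e->numberOfFunctions)
                return 0;          /* malformed table */
            return e->functions[index];
        }
    }
    return 0;
}
```

If the names are { "Alpha", "Beta" }, the ordinals are { 2, 0 } and the addresses are { 0x1000, 0x1100, 0x1200 }, then "Alpha" resolves to 0x1200 and "Beta" to 0x1000, while the address 0x1100 has no name at all.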

“They came for me. Thanks for attention. They must be killing now! ”
(Treasure Island)

Import table




The import table is an integral part of any application that uses dynamic libraries. It helps to match calls to functions from dynamic libraries with the corresponding addresses. Import can work in three different modes: standard, bound import and delay import. Because the topic of import is quite multifaceted and deserves a separate article, I will describe only the standard mechanism in detail and give just a “skeleton” of the others.

Standard import - the import table is stored in DataDirectory under the index IMAGE_DIRECTORY_ENTRY_IMPORT (= 1). It is an array of elements of type IMAGE_IMPORT_DESCRIPTOR. The import table stores (in arrays) the names / ordinals of the functions and the places where the loader should write the effective address of each function. This mechanism is not very efficient because, frankly speaking, it comes down to searching the entire export table for every function required.

Bound import - in this scheme, -1 is written to the TimeDateStamp and ForwarderChain fields (of the first element of the standard import table), and the binding information is stored in the DataDirectory entry with index IMAGE_DIRECTORY_ENTRY_BOUND_IMPORT (= 11). In other words, this is a kind of flag telling the loader to use bound import. The “chain of bound imports” also gets structures of its own. The algorithm is as follows: the required library is mapped into the application's virtual memory and all the necessary addresses are “bound” in advance, at build time. One of the drawbacks is that when the DLL is rebuilt, the application itself has to be rebuilt too, because the function addresses will change.

Delay import - with this method the .dll file is attached to the executable but is not loaded into memory right away (as in the previous two methods), only when the application first accesses a symbol (this is what elements imported from dynamic libraries are called). That is, the program runs in memory, and as soon as execution reaches a call to a function from the dynamic library, a special handler is invoked that loads the DLL and fills in the effective addresses of its functions. For delay import the loader uses DataDirectory [IMAGE_DIRECTORY_ENTRY_DELAY_IMPORT] (index 13).

Having briefly covered the import methods, let's move on to the import table itself.

“This was a sailor! His clothes were a seaman's.” “Oh, really? Did you expect to find a bishop here?”
(Treasure Island - John Silver)

Import-descriptor (IMAGE_IMPORT_DESCRIPTOR)




To find the coordinates of the import table, we need to turn to the DataDirectory array, namely to the element IMAGE_DIRECTORY_ENTRY_IMPORT (= 1), and read the RVA of the table. Here is a general outline of the path to be taken:



Then we get RAW from the RVA using the formulas given above and “step” through the file. Now we are right next to an array of structures called IMAGE_IMPORT_DESCRIPTOR. The end of the array is marked by a “zero” structure.

typedef struct _IMAGE_IMPORT_DESCRIPTOR {
  union {
    DWORD   Characteristics;
    DWORD   OriginalFirstThunk;
  } DUMMYUNIONNAME;
  DWORD   TimeDateStamp;
  DWORD   ForwarderChain;
  DWORD   Name;
  DWORD   FirstThunk;
} IMAGE_IMPORT_DESCRIPTOR, *PIMAGE_IMPORT_DESCRIPTOR;

I could not find a link to the description of this structure on MSDN, but you can look it up in the WINNT.h file. Let's get started.

OriginalFirstThunk : DWORD - RVA of the import name table (INT).
TimeDateStamp : DWORD - date and time.
ForwarderChain : DWORD - index of the first forwarded symbol.
Name : DWORD - RVA of the string with the name of the library.
FirstThunk : DWORD - RVA of the import address table (IAT).
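Since the array has no length field, the only way to know how many libraries are imported is to walk it until the zero structure. A minimal sketch of that walk over raw bytes is shown below; it assumes the table has already been located in the file buffer, and the function name and the avail parameter (how many bytes remain in the buffer) are my own. Treating only an entirely zeroed descriptor as the terminator is a simplification.

```c
#include <stddef.h>
#include <stdint.h>

#define DESCRIPTOR_SIZE 20   /* five DWORD fields */

/* Counts the descriptors that precede the all-zero terminator.
   table points at the first descriptor; avail is the number of
   bytes that may safely be read from it. */
size_t count_import_descriptors(const uint8_t *table, size_t avail)
{
    size_t n = 0;

    while ((n + 1) * DESCRIPTOR_SIZE <= avail) {
        const uint8_t *d = table + n * DESCRIPTOR_SIZE;
        int allZero = 1;

        for (int i = 0; i < DESCRIPTOR_SIZE; ++i) {
            if (d[i] != 0) {
                allZero = 0;
                break;
            }
        }
        if (allZero)
            break;             /* the "zero" structure: end of the array */
        ++n;
    }
    return n;
}
```

Each counted descriptor corresponds to one imported library, whose name can then be read through the Name field.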

Everything here is somewhat similar to export: there is also a table of names (INT) and also a table of addresses (IAT), as well as an RVA of the library name. Only now the INT and the IAT refer to arrays of IMAGE_THUNK_DATA structures. The structure comes in two forms, for 64-bit and 32-bit systems, which differ only in the size of the fields. Consider the x86 version:

typedef struct _IMAGE_THUNK_DATA32 {
  union {
    DWORD ForwarderString;
    DWORD Function;
    DWORD Ordinal;
    DWORD AddressOfData;
  } u1;
} IMAGE_THUNK_DATA32, *PIMAGE_THUNK_DATA32;

It is important to note that further actions depend on the high bit of the structure. If it is set, the remaining bits hold the number of the imported symbol (import by ordinal). Otherwise (the most significant bit is cleared), the remaining bits hold the RVA of the symbol to be imported (import by name). In the case of import by name, the pointer holds the address of the following structure:

typedef struct _IMAGE_IMPORT_BY_NAME {
    WORD    Hint;
    BYTE    Name[1];
} IMAGE_IMPORT_BY_NAME, *PIMAGE_IMPORT_BY_NAME;

Here Hint is the number of the function (a hint for where to look in the export name table), and Name is its name.
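The high-bit test for the 32-bit thunk can be sketched like this. The mask 0x80000000 is the same value as IMAGE_ORDINAL_FLAG32 in WINNT.h; the ThunkInfo type and decode_thunk32 function are hypothetical names used only for this illustration.

```c
#include <stdint.h>

#define ORDINAL_FLAG32 0x80000000u   /* same value as IMAGE_ORDINAL_FLAG32 */

typedef struct {
    int      byOrdinal;   /* 1: import by number, 0: import by name */
    uint32_t value;       /* the ordinal, or the RVA of Hint + Name */
} ThunkInfo;

/* Splits a 32-bit thunk into "how" and "what". */
ThunkInfo decode_thunk32(uint32_t thunk)
{
    ThunkInfo info;

    if (thunk & ORDINAL_FLAG32) {
        info.byOrdinal = 1;
        info.value = thunk & 0xFFFFu;        /* ordinals are 16 bits wide */
    } else {
        info.byOrdinal = 0;
        info.value = thunk & 0x7FFFFFFFu;    /* RVA of IMAGE_IMPORT_BY_NAME */
    }
    return info;
}
```

For a 64-bit file the idea is the same, only the thunk is 8 bytes wide and the flag is its top bit.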

What is all this for? All these arrays, structures... Let's take a look at the remarkable diagram from exelab:



What is happening here? The OriginalFirstThunk field refers to an array that stores information about the imported functions. The FirstThunk field refers to a similar array of the same size, but it is filled with the effective addresses of the functions during loading. That is, the loader walks OriginalFirstThunk, determines the real address of the function for each of its elements and writes this address into FirstThunk. In other words, the imported symbols get bound.

“I don't like this expedition! I don't like these sailors! And in general... what?! Oh yes! No! I don't like anything at all, sir!”
(Treasure Island - Captain Smollett)

Overboard




The article has presented only the basics of executable files. It has not touched on the other kinds of import, or on behavior in conflicting situations (for example, when the physical size of a section is larger than the virtual one) or ambiguous ones (for the same import, which method to use). All of that belongs to a more detailed study and depends on the particular loaders in the OS and on the compilers that built the program. The resource directory, the debug directory and others are not covered either. Those who are interested can read the more detailed manuals in the list of references at the end of the article.

“Tell me, ham, how long will we wag like a maritime boat?” I’m sick and tired of the captain. Stop commanding him! I want to live in his cabin. ”
(Treasure Island)

Conclusion




Now that we are back from the trip, let me sum up a little of what we saw and what we learned. Today we figured out a lot. Specifically, I will describe the process of loading an application in general terms.

  • First, the headers are read and checked to make sure the file is executable. Otherwise, the work stops before it starts.
  • The loader allocates the required amount of virtual memory for the application. If possible, the application is loaded at the preferred address. If not, another region of memory is allocated and the application is loaded at that address.
  • Then, for each section, its address in virtual memory (relative to the base load address) and its required size are computed. After that, the attributes are set for this region and the section is loaded into memory.
  • If the base address differs from the preferred one, the addresses are fixed up.
  • The import table is analyzed and the required DLLs are pulled in. Then the binding takes place.
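The section step of this list can be played out on plain buffers. The sketch below is a toy model and not what the Windows loader literally does: it copies one section from a file buffer into an image buffer and zero-fills the tail, ignoring page protection, alignment of the size, and the special case of VirtualSize being zero. The MappedSection type and map_section function are hypothetical names.

```c
#include <stdint.h>
#include <string.h>

typedef struct {                 /* the four fields this step needs */
    uint32_t VirtualAddress;
    uint32_t VirtualSize;
    uint32_t SizeOfRawData;
    uint32_t PointerToRawData;
} MappedSection;

/* Copies one section into the image buffer at its RVA.
   Returns 0 on success, -1 if the section does not fit. */
int map_section(uint8_t *image, uint32_t imageSize,
                const uint8_t *file, uint32_t fileSize,
                const MappedSection *s)
{
    /* copy no more than the section occupies in memory */
    uint32_t n = s->SizeOfRawData < s->VirtualSize ? s->SizeOfRawData
                                                   : s->VirtualSize;

    if (s->PointerToRawData > fileSize || n > fileSize - s->PointerToRawData)
        return -1;
    if (s->VirtualAddress > imageSize ||
        s->VirtualSize > imageSize - s->VirtualAddress)
        return -1;

    memset(image + s->VirtualAddress, 0, s->VirtualSize);     /* zero fill */
    memcpy(image + s->VirtualAddress, file + s->PointerToRawData, n);
    return 0;
}
```

The image buffer here plays the role of the memory allocated at the base load address, so an RVA is simply an index into it.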

This is where the article ends. I think this information is enough for a basic understanding of executable files. For the most curious, the road leads to exelab, wasm, msdn, an assembler and a disassembler.

For a clearer understanding, I recommend studying diagrams. They really do give a more complete picture of what happens inside. As an example, I can suggest the article by alizar or the more general diagrams that Google turns up. I hope you enjoyed our trip.

List of references


en.wikipedia.org/wiki/Portable_Executable
msdn.microsoft.com/en-us/library/ms809762.aspx
acmvm2.srv.mst.edu/wp-content/uploads/2014/07/PE-Header-Bible.pdf
cs.usu.edu.ru/docs/pe
Chris Kaspersky - Technique for debugging programs without source code
exelab.ru/faq
