Intercepting API function calls

    - Dad, I ran for a trolley bus and saved five cents!
    - Son, I would run after a taxi - I would save five rubles!


    Today I want to tell you how to save 10 thousand dollars and, along the way, something no less interesting: how to intercept calls to Win32 API functions, and not only those. Although Win32 API functions are, of course, the main target.


    Key Points


    There are exactly two well-known methods for intercepting API functions; all the rest are their variations.

    The first method relies on the fact that calls from a module to functions in other DLLs go through the module's import table. The loader fills this table in when the module is loaded, writing into it the addresses of all the imported functions the module may need. So, to intercept an API call, you find the import table, find in it the entry for the function you want to intercept, save the address stored there (the pointer to the function body) in a variable so that you can still call the original, and then write a pointer to your own function in its place. Naturally, this has to be done for every module (exe or dll) in the process, since each has its own import table. In addition, to intercept functions that are called through late binding (GetProcAddress), you have to make a similar replacement in the export table of the module that exports the function (this time there is only one such module). After that, you should prevent your DLL from being unloaded while the interception is active (for example, DllCanUnloadNow should return S_FALSE, or you can take an extra lock); otherwise the interceptor address becomes invalid and you get an access violation with all the consequences.
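
    The save-and-replace step can be shown on a toy model. The sketch below does no PE parsing at all: ImportSlot, patch_slot, original_api and my_hook are hypothetical names invented for the illustration, and a plain array of function pointers stands in for the real import table.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical stand-in for one import-table entry: the imported name and
// the address that every call to this import goes through.
struct ImportSlot {
    const char* name;
    int (*target)(int);
};

// Stand-in for the API function being imported.
inline int original_api(int x) { return x + 1; }

// The saved original address, so the interceptor can still call it.
static int (*g_saved_original)(int) = nullptr;

// Stand-in for our interceptor: forwards to the original, then alters the result.
inline int my_hook(int x) {
    assert(g_saved_original != nullptr);
    return g_saved_original(x) * 10;
}

// Find the entry by name, save the pointer stored there, put ours in its place.
inline bool patch_slot(ImportSlot* table, std::size_t count, const char* name,
                       int (*replacement)(int), int (**saved)(int)) {
    for (std::size_t i = 0; i < count; ++i) {
        if (std::strcmp(table[i].name, name) == 0) {
            *saved = table[i].target;       // keep the original address
            table[i].target = replacement;  // redirect all future calls
            return true;
        }
    }
    return false;  // this module does not import the function
}
```

    In a real process the same loop would have to run over the import table of every loaded module, which is exactly why this method needs so much bookkeeping.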

    This method, in principle, has been repeatedly described in the relevant literature, and ready-made implementations can be found, for example, on RSDN [1] , [2] . Therefore, we will not dwell on it.

    The second method, intercepting a function by injecting code into it, is much more interesting. Its idea is also quite simple and has been described many times. We overwrite the first few bytes of the original function with an unconditional jump to our interceptor, which does whatever processing it needs. If the interceptor has to call the original function, it first executes the instructions that were overwritten and then jumps unconditionally into the body of the original function, skipping, naturally, the overwritten beginning.

    It sounds simple enough, but for someone who has written in high-level languages all their life it can turn out to be an insoluble task. The problem is further complicated by the fact that there are no ready-made implementations of this method, because of certain problems that I will discuss a bit later. Although... I am being slightly disingenuous, of course. Microsoft has a whole framework dedicated to exactly this problem. It is called Microsoft Detours [3], it is easy to find with a search engine, and the commercial version costs 10 thousand dollars.

    Naturally, for that kind of money people buy it only when they really need it. If you do not need it that badly but still want it, the implementation of the second method that I am about to describe will do. It is far from universal, but some features of the Win32 API let it work in our applications and successfully replace the expensive framework.

    Code injection method implementation



    Let's start from the very beginning and prepare a small test bench on which we will check that our actions succeed. It will be a console project in C++. I will use MS Visual Studio 2010 Beta for development; adjust my steps to the IDE you use.

    #include <windows.h>
    #include <tchar.h>

    int _tmain(int argc, _TCHAR* argv[])
    {
        HANDLE hFile = CreateFile(L"d:\\test.txt", GENERIC_WRITE, 0, NULL, CREATE_ALWAYS, FILE_ATTRIBUTE_NORMAL, NULL);
        CloseHandle(hFile);
        return 0;
    }


    Our task will be to intercept the CreateFile and CloseHandle functions.

    Set a breakpoint on the line that calls CreateFile and run the program. As soon as it stops, choose Go To Disassembly from the context menu of the code editor. This is what we will see there.

    HANDLE hFile = CreateFile(L"d:\\test.txt", GENERIC_WRITE, 0, NULL, CREATE_ALWAYS, FILE_ATTRIBUTE_NORMAL, NULL);
    01138A8B mov esi,esp
    01138A8D push 0
    01138A8F push 80h
    01138A94 push 2
    01138A96 push 0
    01138A98 push 0
    01138A9A push 40000000h
    01138A9F push offset string L"d:\\test.txt" (11415B0h)
    01138AA4 call dword ptr [__imp__CreateFileW@28 (114527Ch)]
    01138AAA cmp esi,esp
    01138AAC call @ILT+1000(__RTC_CheckEsp) (11313EDh)
    01138AB1 mov dword ptr [hFile],eax


    Now, pressing F10, we step to the call dword ptr [__imp__CreateFileW@28 (114527Ch)] instruction (this is the actual function call) and press F11. We land in the body of the CreateFile function.

    76D60B7D mov edi,edi
    76D60B7F push ebp
    76D60B80 mov ebp,esp
    76D60B82 push ecx
    76D60B83 push ecx
    76D60B84 push dword ptr [ebp+8]
    76D60B87 lea eax,[ebp-8]
    76D60B8A push eax
    76D60B8B call dword ptr ds:[76D11568h]


    So what do we see here?

    The first instruction, mov edi,edi, is nothing more than a two-byte nop (no operation). It does nothing except take up two bytes of code; Microsoft puts it at the start of API functions precisely so that they can be patched at run time. It may look wasteful, but the presence of this instruction is very convenient for us.

    The next two instructions occupy three bytes, and here is what they do. The esp register, as you know, points to the top of the stack, which holds all the parameters passed to the function with push instructions. At the top of the stack (in assembler this address is written as [esp]) is the return address, placed there by the call instruction (in our case it is 0x01138AAA). Further up the stack (the stack grows downward, as you know), [esp + 4] holds the file name, [esp + 8] the access flags, and so on.

    The stack also stores local variables that are used by the function itself. If you look closely at the code, you will see two instructions.

    76D60B82 push ecx
    76D60B83 push ecx


    These instructions simply reserve 8 bytes on the stack, that is, they leave room for two local variables of type DWORD. We read them this way because the function uses the stdcall calling convention (parameters are passed on the stack rather than in registers, as with fastcall), and ecx is a general-purpose register: since the function has not written anything to it, it can contain any garbage left there by earlier code. There is no point in passing garbage to another function, which is why we interpret these pushes as a reservation.

    However, after each push instruction the top of the stack, esp, moves 4 bytes down, so [esp] no longer points to the return address but to the garbage value just pushed. That is, we would lose our fixed offsets to the parameters passed to the function! This cannot be allowed, and so the following is done.

    76D60B7F push ebp
    76D60B80 mov ebp,esp


    The current value of the base register ebp is saved on the stack, and the current top of the stack (esp) is copied into ebp. Now we can address the parameters passed to the function through the base register (at [ebp] we have the saved value of ebp, [ebp + 4] is the return address, [ebp + 8] is the file name, and so on) while manipulating the stack freely.

    This pair of instructions (push ebp / mov ebp,esp) is called the standard prologue, and it has a mirror image, the standard epilogue, which looks like this:

    mov esp,ebp
    pop ebp


    However, we will not find it here: it is replaced by the leave instruction, which does the same thing.

    76D60BC7 leave
    76D60BC8 ret 1Ch


    The last instruction returns from the function and removes 0x1C bytes from the stack, as the stdcall convention requires: the called function itself must clean up the stack when it finishes.
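
    The numbers in the listings can be checked with a little arithmetic. The sketch below assumes 32-bit x86, where every CreateFileW argument occupies one 4-byte stack slot; the helper names are made up for the illustration. It ties together the @28 suffix in __imp__CreateFileW@28, the ret 1Ch instruction and the [ebp + 8] offset of the first parameter.

```cpp
#include <cassert>
#include <cstddef>

// Assumption: 32-bit x86, one 4-byte stack slot per stdcall argument.
constexpr std::size_t kStackSlot = 4;

// Number of bytes the callee removes with "ret N" for argc arguments.
constexpr std::size_t stdcall_cleanup_bytes(std::size_t argc) {
    return argc * kStackSlot;
}

// Offset from ebp of the n-th argument (0-based) after push ebp / mov ebp,esp:
// [ebp] is the saved ebp, [ebp+4] the return address, [ebp+8] the first argument.
constexpr std::size_t arg_offset_from_ebp(std::size_t n) {
    return 8 + n * kStackSlot;
}

// Compare with the disassembly: CreateFileW takes 7 arguments.
inline void check_against_listing() {
    assert(stdcall_cleanup_bytes(7) == 0x1C);  // ret 1Ch, and the @28 suffix
    assert(arg_offset_from_ebp(0) == 8);       // lpFileName at [ebp+8]
    assert(arg_offset_from_ebp(1) == 12);      // dwDesiredAccess at [ebp+0Ch]
}
```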

    Looking at other API functions, we find that they start in exactly the same way:

    mov edi,edi
    push ebp
    mov ebp,esp


    That is, in 99% of cases, for “household” interception we are guaranteed 5 bytes at the beginning of the function that we can safely replace with our own code and then reproduce somewhere else. This is good: it means our jump instruction may be up to 5 bytes long, which is more than enough.
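
    Checking for this prologue comes down to comparing five bytes. A minimal sketch, assuming the usual x86 encodings of the three instructions (8B FF, 55, 8B EC); the function name has_standard_prologue is invented here and is not taken from the Detours.cpp linked later in the article.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// mov edi,edi (8B FF) / push ebp (55) / mov ebp,esp (8B EC)
static const std::uint8_t kStandardPrologue[5] = {0x8B, 0xFF, 0x55, 0x8B, 0xEC};

// Returns true if the code starts with the 5-byte standard prologue,
// i.e. if the first 5 bytes can be moved elsewhere without any fix-ups.
inline bool has_standard_prologue(const std::uint8_t* code, std::size_t size) {
    assert(code != nullptr);
    return size >= sizeof(kStandardPrologue) &&
           std::memcmp(code, kStandardPrologue, sizeof(kStandardPrologue)) == 0;
}
```

    For the beginning of CreateFileW shown above (8B FF 55 8B EC 51 51) the check succeeds; for a function that starts directly with push ebp it fails, and such a function would have to be analyzed by hand.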

    So, now we have figured out how the function is called and are ready to intercept it. One detail remains: how do we actually perform the interception?

    All we need to do is put a jmp instruction at the beginning of the function, with an address that points to the beginning of our function. However, it is not that simple. A 5-byte jmp instruction that takes the absolute address of our function simply does not exist: the jumps that work with absolute addresses (an indirect jump through memory, or a far jump) need at least 6 bytes.

    Therefore we will use a near jump, which takes a relative address (that is, the difference between the address of the destination and the address of the instruction following the jump). To calculate the operand of the near jump, subtract from the destination address the address of the jump instruction plus 5 bytes (that is how much space the instruction takes).

    size_t _CalculateDispacement(void* lpFirst, void* lpSecond)
    {
        // Destination minus the address of the instruction that follows the 5-byte jmp.
        return reinterpret_cast<size_t>(lpSecond) - (reinterpret_cast<size_t>(lpFirst) + 5);
    }
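
    The same formula can be tried on plain 32-bit numbers, without touching any real code. This is only a sketch: rel32 and jump_target are names made up for the illustration, and the addresses are the ones that appear in the debugger listings later in this article (the jmp at 76D60B7D leading to 0D13E8h, and the jmp at 00060005 leading to 76D60B82).

```cpp
#include <cassert>
#include <cstdint>

// Size of the near jump: one opcode byte (E9) plus a 4-byte displacement.
constexpr std::uint32_t kJmpNearSize = 5;

// Displacement = destination - (address of the jmp + 5), modulo 2^32.
constexpr std::uint32_t rel32(std::uint32_t jmp_address, std::uint32_t destination) {
    return destination - (jmp_address + kJmpNearSize);
}

// The inverse: where a jmp located at jmp_address with this displacement lands.
constexpr std::uint32_t jump_target(std::uint32_t jmp_address, std::uint32_t rel) {
    return jmp_address + kJmpNearSize + rel;
}

inline void check_article_addresses() {
    // Backward jump from kernel32 to the interceptor: the difference wraps around.
    assert(rel32(0x76D60B7Du, 0x000D13E8u) == 0x89370866u);
    // Forward jump from the copied prologue back into the original function.
    assert(rel32(0x00060005u, 0x76D60B82u) == 0x76D00B78u);
}
```

    Unsigned wrap-around does the right thing for backward jumps, because the processor interprets the displacement as a signed 32-bit value.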


    Turning to the literature, we learn that the opcode of the near jump instruction is 0xE9. Thus, we can perform the interception as follows:

    HANDLE WINAPI _My_CreateFileW(LPCWSTR lpFileName, DWORD dwDesiredAccess, DWORD dwShareMode, LPSECURITY_ATTRIBUTES lpSecurity, DWORD dwCreationDisp, DWORD dwFlags, HANDLE hTemplate)
    {
        OutputDebugStringA(__FUNCTION__);
        return (HANDLE)-1; // INVALID_HANDLE_VALUE: pretend the call failed
    }


    #pragma pack(push, 1)
    struct jump_near
    {
        BYTE opcode; // 0xe9
        DWORD relativeAddress;
    };
    #pragma pack(pop)

    int _tmain(int argc, _TCHAR* argv[])
    {
        HMODULE hKernel32 = GetModuleHandle(L"kernel32.dll");
        jump_near* lpFunc = reinterpret_cast<jump_near*>(GetProcAddress(hKernel32, "CreateFileW"));
        lpFunc->opcode = 0xe9;
        lpFunc->relativeAddress = static_cast<DWORD>(_CalculateDispacement(lpFunc, reinterpret_cast<void*>(&_My_CreateFileW)));

        HANDLE hFile = CreateFile(L"d:\\test.txt", GENERIC_WRITE, 0, NULL, CREATE_ALWAYS, FILE_ATTRIBUTE_NORMAL, NULL);
        CloseHandle(hFile);
        return 0;
    }


    To keep the compiler from inserting padding between the fields of the structure for alignment, we use the #pragma pack directive, which in our case aligns data on a 1-byte boundary (that is, does not align it at all).
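
    What the directive changes is easy to see in a small portable sketch. The names packed_jump, unpacked_jump and encode_jump are made up for this illustration, and a little-endian target such as x86 is assumed for the byte order of the displacement.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

#pragma pack(push, 1)
struct packed_jump {              // same layout as jump_near above
    std::uint8_t opcode;          // 0xE9
    std::uint32_t relativeAddress;
};
#pragma pack(pop)

struct unpacked_jump {            // same fields, default alignment
    std::uint8_t opcode;
    std::uint32_t relativeAddress;
};

// With 1-byte packing the structure matches the instruction byte for byte.
static_assert(sizeof(packed_jump) == 5, "opcode + rel32 must be exactly 5 bytes");
static_assert(offsetof(packed_jump, relativeAddress) == 1, "rel32 must follow the opcode");

// Writes the 5 bytes of "jmp rel32" into a buffer.
inline void encode_jump(std::uint8_t* out, std::uint32_t rel) {
    assert(out != nullptr);
    packed_jump j;
    j.opcode = 0xE9;
    j.relativeAddress = rel;
    std::memcpy(out, &j, sizeof(j));
}
```

    Without the directive, typical compilers insert three padding bytes after opcode, so the displacement would land 4 bytes into the function instead of 1 and the patched code would be garbage.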

    We run it and... oops, access violation. The thing is that code pages are write-protected, as a safeguard against buffer overflows.

    However, not everything is so bad. They are protected from the outside, while we are working from the inside, so we can get around this mechanism with the VirtualProtect function. Let's put a call before writing the opcode:

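    // Keep the page executable: it still holds live kernel32 code.
    DWORD dwProtect = PAGE_EXECUTE_READWRITE;
    VirtualProtect(lpFunc, sizeof(jump_near), dwProtect, &dwProtect);


    And after writing the jump, restore the previous protection:

    VirtualProtect(lpFunc, sizeof(jump_near), dwProtect, &dwProtect);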


    We launch it and, voila, the interception is completed.

    Now there is a second problem: we need to be able to call the original function. For this, we must do the following:
    1. Save a pointer to the beginning of the function.
    2. Allocate a 10-byte block of memory with execute rights (without them, trying to run the code will give an access violation because of the NX-bit protection).
    3. Copy the first 5 bytes of the original function into it before installing the interceptor.
    4. Write a similar near jump into the last 5 bytes; it redirects execution to the original function, skipping the 5 bytes that we have overwritten.
    5. Save the address of the 10-byte block and cast it to the CreateFileWProc type, which is declared as follows:
    typedef HANDLE (WINAPI * CreateFileWProc) (LPCWSTR, DWORD, DWORD, LPSECURITY_ATTRIBUTES, DWORD, DWORD, HANDLE);
    6. Now, whenever we need to call the original, we simply use this pointer.
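
    Steps 3 and 4, and the installation of the interceptor itself, can be rehearsed on ordinary byte arrays. This is only a sketch under stated assumptions, not the code from the files linked below: the function names are invented, the "addresses" are plain 32-bit numbers passed alongside the buffers, and nothing here is ever executed.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

constexpr std::size_t kPatchSize = 5;        // bytes overwritten in the original
constexpr std::size_t kTrampolineSize = 10;  // saved bytes + jump back

// Writes "E9 rel32" (little-endian) into out, as if out were located at address at.
inline void write_jump(std::uint8_t* out, std::uint32_t at, std::uint32_t destination) {
    const std::uint32_t rel = destination - (at + 5);
    out[0] = 0xE9;
    for (int i = 0; i < 4; ++i)
        out[1 + i] = static_cast<std::uint8_t>(rel >> (8 * i));
}

// Steps 3 and 4: copy the first 5 bytes of the original into the 10-byte block
// and append a jump to the 6th byte of the original.
inline void build_trampoline(std::uint8_t* trampoline, std::uint32_t trampoline_address,
                             const std::uint8_t* original, std::uint32_t original_address) {
    assert(trampoline != nullptr && original != nullptr);
    std::memcpy(trampoline, original, kPatchSize);
    write_jump(trampoline + kPatchSize,
               trampoline_address + static_cast<std::uint32_t>(kPatchSize),
               original_address + static_cast<std::uint32_t>(kPatchSize));
}

// Only after the trampoline exists: overwrite the start of the original
// with a jump to the interceptor.
inline void install_hook(std::uint8_t* original, std::uint32_t original_address,
                         std::uint32_t hook_address) {
    assert(original != nullptr);
    write_jump(original, original_address, hook_address);
}
```

    The order matters: if the interceptor were installed first, the bytes copied into the 10-byte block would already be our own jump rather than the original prologue.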

    The code that implements this functionality in a more general case is available here:

    pastebin.com/5gZdr6Hm (Detours.h header file)
    pastebin.com/RCJ896TM (Detours.cpp implementation)

    I’ll briefly tell you how it works.

    We will include both files in our project, define a pair of interceptors, and run the code with a breakpoint on CreateFile.

    #include "Detours.h"

    typedef HANDLE (WINAPI *CreateFileWProc)(LPCWSTR, DWORD, DWORD, LPSECURITY_ATTRIBUTES, DWORD, DWORD, HANDLE);
    typedef BOOL (WINAPI* CloseHandleProc)(HANDLE);

    // Pointers through which the interceptors call the original functions.
    CreateFileWProc _Std_CreateFileW;
    CloseHandleProc _Std_CloseHandle;

    HANDLE WINAPI _My_CreateFileW(LPCWSTR lpFileName, DWORD dwDesiredAccess, DWORD dwShareMode, LPSECURITY_ATTRIBUTES lpSecurity, DWORD dwCreationDisp, DWORD dwFlags, HANDLE hTemplate)
    {
        OutputDebugStringA(__FUNCTION__);
        return _Std_CreateFileW(lpFileName, dwDesiredAccess, dwShareMode, lpSecurity, dwCreationDisp, dwFlags, hTemplate);
    }

    BOOL WINAPI _My_CloseHandle(HANDLE handle)
    {
        OutputDebugStringA(__FUNCTION__);
        return _Std_CloseHandle(handle);
    }

    int _tmain(int argc, _TCHAR* argv[])
    {
        HMODULE hKernel32 = GetModuleHandle(L"kernel32.dll");
        void* lpFunc = reinterpret_cast<void*>(GetProcAddress(hKernel32, "CreateFileW"));
        // Assumption: HookFunction receives the "call the original" pointer as void**.
        Detours::HookFunction(lpFunc, _My_CreateFileW, reinterpret_cast<void**>(&_Std_CreateFileW));
        lpFunc = reinterpret_cast<void*>(GetProcAddress(hKernel32, "CloseHandle"));
        Detours::HookFunction(lpFunc, _My_CloseHandle, reinterpret_cast<void**>(&_Std_CloseHandle));
        HANDLE hFile = CreateFile(L"d:\\test.txt", GENERIC_WRITE, 0, NULL, CREATE_ALWAYS, FILE_ATTRIBUTE_NORMAL, NULL);
        CloseHandle(hFile);
        return 0;
    }


    Out of habit, let's go to Disassembly and step to the instruction

    000D8AD0 call dword ptr [__imp__CreateFileW@28 (0E527Ch)]


    And press F11.

    Where did we go?

    76D60B7D jmp _My_CreateFileW (0D13E8h)


    This is the jump planted by our interceptor. This means that the interception was successful!

    Press F10 a couple of times (passing through one more intermediate thunk, which is inserted in DEBUG builds), and ...

    HANDLE WINAPI _My_CreateFileW(LPCWSTR lpFileName, DWORD dwDesiredAccess, DWORD dwShareMode, LPSECURITY_ATTRIBUTES lpSecurity, DWORD dwCreationDisp, DWORD dwFlags, HANDLE hTemplate)
    {
    000D8910 push ebp
    000D8911 mov ebp,esp
    000D8913 sub esp,0C0h
    000D8919 push ebx
    000D891A push esi
    000D891B push edi
    000D891C lea edi,[ebp-0C0h]
    000D8922 mov ecx,30h
    000D8927 mov eax,0CCCCCCCCh
    000D892C rep stos dword ptr es:[edi]


    Now, the most interesting moment. We get to the call of the original function.

    000D8960 call dword ptr [_Std_CreateFileW (0E4240h)]


    And press F11.

    00060000 mov edi,edi
    00060002 push ebp
    00060003 mov ebp,esp
    00060005 jmp 76D60B82


    We have landed in the so-called trampoline (springboard): a piece of code that performs the instructions we overwrote and transfers control to the original function. We step to the jmp, press F10 and see a wonderful picture.

    76D60B7D jmp _My_CreateFileW (0D13E8h)
    76D60B82 push ecx
    76D60B83 push ecx


    This time we skipped the jmp instruction and landed directly on the first meaningful instruction, push ecx. So everything works as it should.

    Potential issues and opportunities for modernization



    Unfortunately, the code is not universal: it decides whether interception is possible by checking for the standard WinAPI prologue. A universal solution is hard to build, because in the general case a function can begin with absolutely any instructions, including ones that use relative addressing, which have to be adjusted when they are moved. Microsoft Detours solves this problem with a table-driven disassembler and an instruction fixer.

    In addition, if the function is shorter than 5 bytes, interception is simply not possible. Such functions do occur, but I have never come across a task that required intercepting them. Microsoft Detours gives up in this case.

    You cannot use the new and delete operators to create the trampoline function, because they allocate memory on the heap, where code execution is prohibited, and changing the access rights of heap memory opens the door to buffer-overflow attacks. As it stands, the program is wasteful: it allocates 4 KB of memory for each interceptor, because that is the minimum size of a virtual memory allocation. Ideally, you would write your own memory manager and use it. MS Detours does just that.
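
    Such a memory manager does not have to be complicated. Below is a minimal sketch of the idea, with invented names (TrampolinePool, kTrampolineSlot) and an ordinary buffer in place of the page; in a real interceptor the page would have to be executable memory obtained from the system, for example with VirtualAlloc. It shows only the bookkeeping: one 4 KB page is cut into 16-byte slots, so 256 interceptors share a single allocation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr std::size_t kPageSize = 4096;      // one page of virtual memory
constexpr std::size_t kTrampolineSlot = 16;  // 10 bytes of code, rounded up

// Hands out fixed-size trampoline slots from a single page.
// Slots are never returned individually; the whole page is released at once
// by whoever owns it.
class TrampolinePool {
public:
    // page must point to kPageSize bytes that outlive the pool.
    explicit TrampolinePool(std::uint8_t* page) : page_(page), used_(0) {
        assert(page != nullptr);
    }

    // Returns the next free slot, or nullptr when the page is exhausted.
    std::uint8_t* allocate() {
        if (used_ == capacity()) return nullptr;
        return page_ + (used_++) * kTrampolineSlot;
    }

    static constexpr std::size_t capacity() { return kPageSize / kTrampolineSlot; }
    std::size_t used() const { return used_; }

private:
    std::uint8_t* page_;
    std::size_t used_;
};
```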

    Still, what I have written is perfectly working code that will come in handy if you really need it but have no money. The missing table-driven analyzer can be replaced with an analyzer of your own: disassemble the functions you need, analyze their code, and add their signatures to _Analyze. And 4 KB of memory per interceptor is not that much if the program has 5-6 interceptors.

    References


    1. Barry Brey. Intel microprocessors. Architecture, programming and interfaces. Sixth Edition. "BHV-Petersburg", 2005
