Writing a Windows Debugger [Part 2]

Original author: Ajay Vijayvargiya
  • Translation
  • Tutorial


Be sure to read the first part if you have not done so yet. Otherwise, it will be difficult to understand the second part.

Foreword


This article continues the previous part, “Writing Your Windows Debugger.” It is very important that you read and understand it first: without a full understanding of the first part, you will not be able to follow this article or get the most out of it.
The only point left unmentioned in the previous article is that our debugger can debug only native (machine) code; it cannot debug managed code. If there is a fourth part of the article, I may cover debugging managed code there.
In this part, I would like to show you some important aspects of debugging: showing the source code and the call stack, setting breakpoints, stepping into functions (step into), attaching the debugger to a running process, setting the debugger as the system default debugger, and a few others.

Task list:
  • Starting debugging at the main() function
  • Getting the start address of the process
  • Setting a breakpoint at the start address
  • Stopping at the breakpoint and restoring the original instruction
  • Pausing execution and waiting for user actions
  • Resuming execution on a user command

  • CDebuggerCore: the debugging interface class

  • Getting source files and line numbers
  • Setting user breakpoints
  • Stepping through code (step-in, step-out, step-over)
  • Conditional breakpoints
  • Debugging a running process
  • Detaching from a process: terminate or wait?
  • Debugging a crashed process
  • Attaching a debugger manually


Let's start debugging!


So what do you do when you want to debug your program? Most of the time, we press F5 to start debugging, and the Visual Studio debugger stops the program wherever you have set breakpoints (including conditional ones). Clicking the Retry button in the Debug Assertion Failed dialog box also opens the source code at the right place and stops execution. Calling DebugBreak or _asm int 3 does the same thing. And these are only a few of the ways to debug an application.
Occasionally, we start debugging from the very beginning by pressing F11 (step-into), and Visual Studio starts debugging at the main/wmain or WinMain/wWinMain function (or their _t-prefixed variants, such as _tmain). This is the logical start address of the process being debugged. I call it “logical” because it is not the real start address, which is known as the entry point of the module. For console applications, the entry point is the mainCRTStartup function, which in turn calls main, and the Visual Studio debugger starts at main. DLLs can also have their own entry point. If you want to know more, read the documentation for the /ENTRY linker option.
All this means that we must pause the program at the application's entry point and let the developer continue debugging from there. Yes, I said “pause” the program at the module's entry point: the process is already running, and if we do not pause it, it will simply run on to its end. The call stack (image below) appears as soon as you press F11.

What do we need to do to pause the process at the entry point?
In a nutshell:
  1. Get the process start address
  2. Change the instruction at this address, for example by replacing it with a breakpoint instruction (_asm int 3)
  3. Handle the stop as soon as execution reaches this breakpoint, and restore the original instruction
  4. Stop execution and show the call stack, registers and source code, if possible
  5. Continue execution when the user requests it

Only five steps! But in fact, the task is not an easy one.

Getting the start address of the process


The start address (the real entry point) and the logical* entry point (the main/WinMain function) are quite a jungle! Before explaining these concepts briefly, let me give you a general picture. The first thing to understand is that the first instruction at the start address is where the program actually begins, and the debugger works only with this address.
* [This term was coined by me and is relevant only to this article!]
This is how the WinMain function looks in disassembled form in Visual Studio (with annotations):

You can switch to the same view by starting debugging, right-clicking in the source editor and choosing “Go To Disassembly”. Code bytes (highlighted in green) are not displayed by default, but you can enable them through the context menu.
Relax! You do not need to understand machine instructions or any dialect of assembler; this is just for illustration. In the example above, 0x00978F10 is the start address, and 8B FF is the first instruction. We just need to replace it with a breakpoint instruction. We know that the DebugBreak API function does this, but at such a low level we cannot use it. For x86, the breakpoint instruction is _asm int 3, and its opcode is 0xCC (204).
It turns out that we just need to replace the byte 8B with CC, and that's it! When execution reaches this address, the debugger receives an EXCEPTION_DEBUG_EVENT with the exception code EXCEPTION_BREAKPOINT. We know that we caused it, so we handle this exception however we need. If you do not understand this paragraph, I ask you one last time: first read the first part of the article [http://habrahabr.ru/post/154847/].
x86 instructions are not fixed-length, but who cares? We do not need to know how many bytes (1, 2, N) the instruction occupies; we just change the first byte. The first byte of an instruction can be anything, not just 8B. But we must guarantee that, as soon as it is time to continue executing the program, we restore the original byte.
A small remark for those who already know all this, and for those who don't. First, a breakpoint is not the only way to stop the program at the start address; a more appropriate alternative is a one-time breakpoint, which we will discuss later. Second, CC is not the only breakpoint instruction, but it is more than enough for our purposes.
There are some difficulties with getting the start address, but to keep you interested, let me show you the C++ code for getting it right away. The lpStartAddress member of the CREATE_PROCESS_DEBUG_INFO structure contains the start address. We can read it while processing the very first debugging event:

// This is inside the debugger loop controlled by WaitForDebugEvent/ContinueDebugEvent
switch(debug_event.dwDebugEventCode)
{
   case CREATE_PROCESS_DEBUG_EVENT:
   {
        LPVOID pStartAddress = (LPVOID)debug_event.u.CreateProcessInfo.lpStartAddress;
        // Do something with pStartAddress to set BREAKPOINT.
   ...
...


The type of CREATE_PROCESS_DEBUG_INFO::lpStartAddress is LPTHREAD_START_ROUTINE, and I think you know what that is (a function pointer). But, as I said, there are some difficulties with the start address. In short, this address is relative to where the application image was loaded in memory. To make this more convincing, let me show you the output of the dumpbin utility with the /headers option:

dumpbin /headers DebugMe.exe
...
OPTIONAL HEADER VALUES
             10B magic # (PE32)
            8.00 linker version
            A000 size of code
            F000 size of initialized data
               0 size of uninitialized data
           11767 entry point (00411767) @ILT+1890(_wWinMainCRTStartup)
            1000 base of code


This address (00411767) is what lpStartAddress contains while our application is being debugged. But when I started debugging under Visual Studio, the address of wWinMainCRTStartup was different (and @ILT has nothing to do with it).
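For reference, the two numbers in that dumpbin line follow the usual PE rule. The entry point is stored as a relative virtual address (RVA), and the absolute address is the image base plus that RVA. The sketch below only does this arithmetic.

It assumes the preferred image base 0x400000 that the parenthesized 00411767 implies. The base 0x00960000 is a made-up value, used to show that a relocated image (for example, under ASLR) gets a different absolute entry point from the same RVA. This is not the author's GetStartAddress() function.

```cpp
#include <cassert>
#include <cstdint>

// Absolute address = image base + RVA (PE/COFF convention).
constexpr std::uint32_t EntryPointAddress(std::uint32_t imageBase,
                                          std::uint32_t entryPointRva) {
    return imageBase + entryPointRva;
}

// RVA taken from the dumpbin output above.
constexpr std::uint32_t kEntryRva = 0x00011767;
// Assumption: the preferred (default) EXE image base.
constexpr std::uint32_t kPreferredBase = 0x00400000;
// Hypothetical base chosen by the loader, e.g. because of ASLR.
constexpr std::uint32_t kRelocatedBase = 0x00960000;

static_assert(EntryPointAddress(kPreferredBase, kEntryRva) == 0x00411767,
              "matches the address printed by dumpbin");
static_assert(EntryPointAddress(kRelocatedBase, kEntryRva) == 0x00971767,
              "same RVA, different absolute address");
```

This is why a breakpoint address cannot simply be copied from the file headers: it has to be computed from wherever the image actually ended up in memory.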
So let me postpone the intricacies of getting the start address and simply use a GetStartAddress() function, whose code I will show later. It returns the exact address where the breakpoint should be set!

Changing the instruction at the start address to a breakpoint instruction


Once we have the start address, replacing the instruction there with a breakpoint (CC) is completely trivial. We need to:
  1. Read one byte at this address and save it
  2. Write the byte 0xCC in its place
  3. Flush the instruction cache
  4. Continue debugging

Now you should ask two important questions:
  1. How do we read, write and flush instructions?
  2. When do we do this?

Let me answer the second question first. We read, write and flush instructions while processing the CREATE_PROCESS_DEBUG_EVENT event (or, if you prefer, when the first EXCEPTION_BREAKPOINT arrives). When the process starts loading, we get the real start address (I mean the CRT main address), read the first instruction byte at this address, save it, and write the byte 0xCC in its place. Then our debugger calls ContinueDebugEvent().
For a better understanding, let me show you the code:

DWORD dwStartAddress = GetStartAddress(m_cProcessInfo.hProcess, m_cProcessInfo.hThread);
BYTE cInstruction;
SIZE_T dwBytes; // ReadProcessMemory/WriteProcessMemory take a SIZE_T*
// Read the first byte of the instruction at the start address
ReadProcessMemory(m_cProcessInfo.hProcess, (void*)dwStartAddress, &cInstruction, 1, &dwBytes);
// Save it so that it can be restored later
m_cOriginalInstruction = cInstruction;
// Replace it with the breakpoint instruction (int 3)
cInstruction = 0xCC;
WriteProcessMemory(m_cProcessInfo.hProcess, (void*)dwStartAddress, &cInstruction, 1, &dwBytes);
// Make sure the CPU does not execute a stale cached copy
FlushInstructionCache(m_cProcessInfo.hProcess, (void*)dwStartAddress, 1);


A little about the code:
• m_cProcessInfo is a member of our class: a PROCESS_INFORMATION structure filled in by CreateProcess.
• The GetStartAddress() function returns the start address of the process. For a Unicode GUI application, this is the address of the wWinMainCRTStartup() function.
• Then we call ReadProcessMemory to get the byte located at the start address, and save its value.
• After that, we write the breakpoint instruction (0xCC) at this address using WriteProcessMemory.
• Finally, we call FlushInstructionCache so that the CPU executes the new instruction rather than a stale cached one. The CPU may well not have cached the instruction, but you should always call FlushInstructionCache.
Note that ReadProcessMemory requires PROCESS_VM_READ access, and WriteProcessMemory requires PROCESS_VM_WRITE | PROCESS_VM_OPERATION. All these rights are granted to the debugger when it passes the debug flag to CreateProcess. So we do not need to do anything, and reading/writing will always succeed (for valid memory addresses, of course!).
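The save, patch and restore sequence is easy to try without a real debuggee. The sketch below is a simulation, not Win32 code:
  • a std::vector<std::uint8_t> stands in for the debuggee's memory;
  • the hypothetical SimulatedBreakpoint type plays the role of the ReadProcessMemory/WriteProcessMemory calls above;
  • the instruction-cache flush has no equivalent here, so it appears only as a comment.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the debuggee's memory (not a Win32 API).
using SimulatedMemory = std::vector<std::uint8_t>;

// x86 one-byte breakpoint opcode (int 3).
constexpr std::uint8_t kInt3 = 0xCC;

// Remembers the byte it overwrote so that it can be restored later.
struct SimulatedBreakpoint {
    std::size_t address = 0;
    std::uint8_t originalByte = 0;
    bool installed = false;

    // Save the original byte, then write 0xCC in its place
    // (the ReadProcessMemory / WriteProcessMemory steps).
    void Install(SimulatedMemory& mem, std::size_t addr) {
        assert(addr < mem.size());
        address = addr;
        originalByte = mem[addr];
        mem[addr] = kInt3;
        installed = true;
        // A real debugger calls FlushInstructionCache at this point.
    }

    // Put the saved byte back so the original instruction can execute.
    void Remove(SimulatedMemory& mem) {
        if (!installed) return;
        mem[address] = originalByte;
        installed = false;
    }
};
```

For example, take the memory bytes 8B FF 55: the 8B FF from the disassembly above, followed by an arbitrary byte. Install(mem, 0) leaves CC FF 55 and remembers 8B. Remove(mem) restores 8B FF 55.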

Handling the breakpoint instruction and restoring the original instruction


As you know, a breakpoint (EXCEPTION_BREAKPOINT) is a type of exception that arrives with the EXCEPTION_DEBUG_EVENT debugging event, and we handle it through the EXCEPTION_DEBUG_INFO structure. The code below will refresh your memory:

// Inside the debugger loop
switch(debug_event.dwDebugEventCode)
{
   case EXCEPTION_DEBUG_EVENT:
   {
        EXCEPTION_DEBUG_INFO & Exception = debug_event.u.Exception; // Out of union
        // Exception.ExceptionRecord.ExceptionCode would be the actual exception code.
...


The operating system always sends the debugger one breakpoint exception to indicate that the process is loading. That is why you can set the breakpoint instruction at the start address when this very first breakpoint exception arrives: every breakpoint after the first one is then yours.
Wherever your breakpoints are, you still need to ignore the first breakpoint event. Debuggers such as WinDbg show you this breakpoint, whereas the Visual Studio debugger ignores it and starts execution from the logical beginning of the program (main/WinMain, not the CRT main).
So the breakpoint-handling code looks like this:

// 'Exception' is the same variable declared above
switch(Exception.ExceptionRecord.ExceptionCode)
 {
 case EXCEPTION_BREAKPOINT:
  if(m_bBreakpointOnceHit) // Would be set to false, before debugging starts
  {
     // Handle the actual breakpoint event
  }
  else
  {
     // This is the first breakpoint event, sent by the kernel; just ignore it.
     // Optionally tell the user that the first BP was ignored.
     m_bBreakpointOnceHit = true; 
  }
  break;
...


You can also use the else branch to set the breakpoint, instead of setting it during the process-creation event. In either case, the main handling of the breakpoint event happens in the if branch: there we handle the breakpoint that we placed at the start address.
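The decision made in the if/else above can be isolated and tested on its own. This sketch makes simplifying assumptions:
  • a plain enum replaces the real DEBUG_EVENT and EXCEPTION_RECORD types;
  • BreakpointFilter is a hypothetical class that models only the m_bBreakpointOnceHit flag.

It ignores the loader's first breakpoint and handles every later one.

```cpp
#include <cassert>

// What the debugger loop should do with an EXCEPTION_BREAKPOINT.
enum class BreakpointAction { IgnoreLoaderBreakpoint, HandleOurBreakpoint };

// Models m_bBreakpointOnceHit from the snippet above.
class BreakpointFilter {
public:
    BreakpointAction OnBreakpoint() {
        if (m_bBreakpointOnceHit) {
            // Any later breakpoint was placed by us (or the user).
            return BreakpointAction::HandleOurBreakpoint;
        }
        // The first breakpoint is raised by the system while the process loads.
        m_bBreakpointOnceHit = true;
        return BreakpointAction::IgnoreLoaderBreakpoint;
    }

private:
    bool m_bBreakpointOnceHit = false;  // false before debugging starts
};
```

One filter object per debugging session is enough, because the flag only ever flips from false to true.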
Now it gets tricky and interesting, so concentrate, read carefully and sit back. If you have not taken a break while reading this article, take one now!
In simple terms, the breakpoint event occurred where we placed it. Now we just pause execution, show the call stack (and other useful information), restore the original instruction and wait for the user to do something to continue debugging.
At the assembly or machine-code level, by the time the breakpoint event is generated and sent to the debugger, the breakpoint instruction has already executed, even though it is only one byte long. The instruction pointer has already moved past this byte.
So, in addition to writing the original instruction back at our address, we also need to adjust the processor registers. We can get and set the registers of a particular thread using the GetThreadContext and SetThreadContext functions. Both take a CONTEXT structure. Strictly speaking, the members of this structure depend on the processor architecture. Since this article is about x86, we will follow the structure definition found in the winnt.h header file.
Here's how we get the thread context:

CONTEXT lcContext;
lcContext.ContextFlags = CONTEXT_ALL;
GetThreadContext(m_cProcessInfo.hThread, &lcContext);
Okay, we have it. What now?
The EIP register holds the address of the next instruction to execute. It is represented by the Eip member of the CONTEXT structure. As I mentioned earlier, EIP has already moved forward, and we must move it back. Luckily for us, we only need to move it back by exactly one byte, since the breakpoint instruction is one byte long. That is exactly what the code below does:
lcContext.Eip --; // Move back one byte
SetThreadContext(m_cProcessInfo.hThread, &lcContext);


EIP holds the address from which the processor will fetch and execute the next instruction. These functions succeed only if you have THREAD_GET_CONTEXT and THREAD_SET_CONTEXT access to the thread, and you already have it.
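To see why a single decrement is enough, the following sketch models the thread state right after int 3 traps. It uses two hypothetical names:
  • MockContext stands in for the Eip member of CONTEXT;
  • SimulateInt3 is an illustrative helper, not a Windows function. It simply encodes the behavior described above: EIP ends up one byte past the 0xCC.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical subset of the x86 CONTEXT structure.
struct MockContext {
    std::uint32_t Eip = 0;
};

// The breakpoint opcode 0xCC is a single byte.
constexpr std::uint32_t kInt3Length = 1;

// Models the state reported to the debugger: EIP already points past the opcode.
MockContext SimulateInt3(std::uint32_t breakpointAddress) {
    MockContext ctx;
    ctx.Eip = breakpointAddress + kInt3Length;
    return ctx;
}

// What the debugger does between GetThreadContext and SetThreadContext.
void RewindToBreakpoint(MockContext& ctx) {
    ctx.Eip -= kInt3Length;  // same effect as lcContext.Eip--
}
```

Take the start address 0x00978F10 from the disassembly example. The trap leaves EIP at 0x00978F11, and the rewind brings it back to 0x00978F10, so the restored original instruction is the next one to execute.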
Now for the other part: restoring the original instruction! To write the original instruction back into the running process, we call WriteProcessMemory followed by FlushInstructionCache. Here's how:

SIZE_T dwWriteSize; // WriteProcessMemory takes a SIZE_T*
// Put back the byte that was replaced with 0xCC
WriteProcessMemory(m_cProcessInfo.hProcess, StartAddress, &m_cOriginalInstruction, 1, &dwWriteSize);
FlushInstructionCache(m_cProcessInfo.hProcess, StartAddress, 1);


The original instruction is restored, and we can call ContinueDebugEvent. What we have done:
  1. GetThreadContext, decrement EIP by one, SetThreadContext.
  2. Restored the original instruction.
  3. Continued debugging.

Well, where is the call stack? The registers? The source code? As it stands, the program simply runs on to its end, all without any user interaction!

Stopping execution: call stack, register values and source code, if available


To display the call stack, we need to load the debug symbols stored in *.PDB files. A set of functions from DbgHelp.dll helps us load symbols, enumerate source files, walk the call stack and much more. We will get to all of this shortly.
To display the CPU registers, we just need to show the data from the CONTEXT structure. To display the 10 registers that the Visual Studio debugger shows (Debug -> Windows -> Registers, or Alt+5), you can use the following code:

CString strRegisters;
strRegisters.Format(
  L"EAX = %08X\nEBX = %08X\nECX = %08X\n"
  L"EDX = %08X\nESI = %08X\nEDI = %08X\n"
  L"EIP = %08X\nESP = %08X\nEBP = %08X\n"
  L"EFL = %08X",
  lcContext.Eax, lcContext.Ebx, lcContext.Ecx,
  lcContext.Edx, lcContext.Esi, lcContext.Edi,
  lcContext.Eip, lcContext.Esp, lcContext.Ebp,
  lcContext.EFlags
  );


And that’s it! Display this text in the appropriate window.
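If you want to check this layout outside MFC, std::snprintf produces the same text as the CString::Format call. MockRegisters is a hypothetical struct; in the debugger, the values would come from the CONTEXT filled in by GetThreadContext.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <string>

// Hypothetical copy of the ten CONTEXT fields used above.
struct MockRegisters {
    std::uint32_t Eax, Ebx, Ecx, Edx, Esi, Edi, Eip, Esp, Ebp, EFlags;
};

// Produces the same text layout as the CString::Format call above.
std::string FormatRegisters(const MockRegisters& r) {
    // %08X expects an unsigned int, so convert each value explicitly.
    auto u = [](std::uint32_t v) { return static_cast<unsigned>(v); };
    char buf[256];
    std::snprintf(buf, sizeof(buf),
                  "EAX = %08X\nEBX = %08X\nECX = %08X\n"
                  "EDX = %08X\nESI = %08X\nEDI = %08X\n"
                  "EIP = %08X\nESP = %08X\nEBP = %08X\n"
                  "EFL = %08X",
                  u(r.Eax), u(r.Ebx), u(r.Ecx),
                  u(r.Edx), u(r.Esi), u(r.Edi),
                  u(r.Eip), u(r.Esp), u(r.Ebp),
                  u(r.EFlags));
    return std::string(buf);
}
```

The fixed-width %08X keeps every value eight hex digits wide, so the ten lines stay aligned in the registers window.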
To pause the program until the user gives a command (Continue, Step-in, Stop Debugging and others), we must not call ContinueDebugEvent. Since the debugger thread and the GUI thread are different, we simply ask the GUI thread to display the current information and block the debugger thread until some kind of “event” arrives, for example from the user.
Confused? The word “event” is in quotation marks because it is nothing more than an event object created by the CreateEvent function. To pause execution, we call WaitForSingleObject in the debugger thread. To resume the debugger thread, we simply call SetEvent from the GUI thread. Of course, depending on your preferences, you can use other thread-synchronization techniques. This only gives a general idea of how to implement “pause execution / resume execution”.
With this reasoning in place, we can outline the logic:
  1. GetThreadContext, decrement EIP by one, SetThreadContext
  2. Restore the original instruction using WriteProcessMemory and FlushInstructionCache
  3. Display the current register values
  4. Using the symbol functions and *.PDB files, display the source code and line number (if possible)
  5. Using the stack-walking and symbol functions, get the call stack and display it
  6. Wait for the user's response
  7. Perform the action requested by the user (Continue, Step, Stop, ...)
  8. Call ContinueDebugEvent

Surprised? Fine! I hope you enjoy debugging!
One important point worth mentioning: the thread that triggers a breakpoint is not necessarily the debuggee's main thread; any thread that executes a breakpoint instruction will do. So far, we are still handling the first breakpoint event to pause execution. But the eight steps listed above apply to every debugging event (from any debuggee thread) that can pause execution.
There is still a minor difficulty with changing EIP. Let me describe the problem now and show the solution later. The user can also set breakpoints, and we replace the instructions at those addresses with CC as well (keeping the original instructions, of course). As soon as execution reaches such a breakpoint, we restore the instruction and perform the eight steps described above. Detailed enough? Well, if we do only this, the program will stop at that place only once; and if we do not restore the original instruction, we get complete chaos!

Anyway, let me continue!
Oh yes, the source code! I know you are dying to find out how it's done!
Any *.EXE or *.DLL image can have debugging information supplied with it in a *.PDB file. A little about it:
  • Debugging information is available only if the /DEBUG linker option was used when building. In Visual Studio, you can change this in the project properties (Linker -> Debugging -> Generate Debug Info).
  • The /DEBUG option does not mean that the EXE/DLL is built in the Debug configuration. The _DEBUG/DEBUG preprocessor macros take effect at compile time, whereas /DEBUG takes effect at link time.
  • This means that even a Release build can contain debugging information, and a Debug build may lack it.
  • The *.PDB file stores the debugging information and is usually named <program_name>.pdb, but the name can be changed using linker options. The file contains information about the source code: functions, classes, types, and much more.
  • The linker places a small piece of information about the *.PDB file in the header of the EXE/DLL image. Since this information lives in the header, it does not affect runtime performance; the file only grows by a few bytes or kilobytes.

To get debugging information, we use the Sym* functions in DbgHelp.dll. This library is the most important component for source-level debugging. It also contains stack-walking functions and functions for getting information about EXE/DLL images. To use them, include Dbghelp.h and link with DbgHelp.lib.
To get debugging information, you need to initialize the symbol handler for the process. Since our target process is the debuggee, we initialize it with the debuggee's process handle. To initialize the symbol handler, we call the SymInitialize function:

BOOL bSuccess = SymInitialize(m_cProcessInfo.hProcess, NULL, false);


The first parameter is the handle of the running process whose symbol information we need. The second parameter is the search path for *.PDB files, with multiple directories separated by semicolons. The third parameter specifies whether the symbol handler should automatically load symbols for all modules of the process.
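Since the second argument is a single string, a small helper can build it from a list of directories. BuildSymbolSearchPath is a hypothetical helper, and the directories in the usage note below are only examples. Passing NULL instead, as in the call above, lets DbgHelp form a default search path from the current directory and the _NT_SYMBOL_PATH / _NT_ALTERNATE_SYMBOL_PATH environment variables.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Joins directories into the semicolon-separated form that SymInitialize expects.
std::string BuildSymbolSearchPath(const std::vector<std::string>& dirs) {
    std::string path;
    for (const std::string& dir : dirs) {
        if (dir.empty()) continue;        // skip blank entries
        if (!path.empty()) path += ';';   // separator between directories
        path += dir;
    }
    return path;
}
```

For example, the directories C:\Symbols and D:\Build\Debug, plus an empty entry that is skipped, become C:\Symbols;D:\Build\Debug. The result's c_str() can then be passed as the second argument of SymInitialize.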
Now the lines below make sense:

'Debugger.exe': Loaded 'C:\Windows\SysWOW64\msvcrt.dll', Cannot find or open the PDB file
'Debugger.exe': Loaded 'C:\Windows\SysWOW64\mfc100ud.dll', Symbols loaded.


Visual Studio 2010 could not find the symbols for msvcrt.dll, whereas the mfc100ud.dll library has its own debugging symbols, so Visual Studio was able to load them. In essence, this means that for MFC libraries Visual Studio can display symbolic information, source code, class/function names, the call stack, etc. To explicitly load the symbols for a given DLL/EXE, we call the SymLoadModule64/SymLoadModuleEx function.
Where and when should we call these functions? I spent a lot of time trying to initialize and load debugging information before the debugging loop (i.e., before any debugging event, but after CreateProcess). That did not work. It must be done while processing CREATE_PROCESS_DEBUG_EVENT. Since we chose not to load symbols for dependent modules automatically, we need to call SymLoadModule64/Ex for the newly loaded EXE file. We also need to call it for every incoming LOAD_DLL_DEBUG_EVENT. Depending on the module settings, we may or may not show debugging information to the user.
Below is an example of loading debugging information while handling the DLL-load event. The GetFileNameFromHandle function was described in the previous part of the article.

case LOAD_DLL_DEBUG_EVENT:
   {
    CStringA sDLLName;
    sDLLName = GetFileNameFromHandle(debug_event.u.LoadDll.hFile);
    DWORD64 dwBase = SymLoadModule64 (m_cProcessInfo.hProcess, NULL, sDLLName,
     0, (DWORD64)debug_event.u.LoadDll.lpBaseOfDll, 0);
    // %hs prints a narrow string inside a wide format; %p prints a pointer
    strEventMessage.Format(L"Loaded DLL '%hs' at address %p.",
                            (LPCSTR)sDLLName, debug_event.u.LoadDll.lpBaseOfDll);
...


Of course, similar code is needed when the process itself loads. A small caveat: successfully initializing the symbol handler and loading a module does not mean that source code will be available! We need to call SymGetModuleInfo64 to find out what, if anything, was loaded from the *.PDB. Here's how:

// Code continues from above
IMAGEHLP_MODULE64 module_info;
module_info.SizeOfStruct = sizeof(module_info);
BOOL bSuccess = SymGetModuleInfo64(m_cProcessInfo.hProcess,dwBase, &module_info);
// Check and notify
if (bSuccess && module_info.SymType == SymPdb)
{
     strEventMessage += ", Symbols Loaded";
}
else
{
     strEventMessage +=", No debugging symbols found.";
}


I am very grateful to Jochen Kalmbach for his excellent article on stack walking, which helped me find out how to get source-code information and walk the stack.
When the symbol type is SymPdb, we have source-code information. A *.PDB contains only information about the source code; the source files themselves (*.h and *.cpp) must be available at the recorded paths! A *.PDB contains symbol names, file names, line numbers, and more. Walking the stack (without showing the source code) is entirely possible as long as we have function names.
Finally, when the breakpoint event arrives, we can get the call stack and show it. To do this, we call the StackWalk64 function. Below is a stripped-down code sample that uses it. For a complete understanding, please read the article by Jochen Kalmbach that I mentioned.

void RetrieveCallstack(HANDLE hThread)
{
   STACKFRAME64 stack={0};
   // Initialize 'stack' with some required stuff.
   StackWalk64(IMAGE_FILE_MACHINE_I386, m_cProcessInfo.hProcess, hThread, &stack,
               &context, _ProcessMemoryReader, SymFunctionTableAccess64,
               SymGetModuleBase64, 0);
...


STACKFRAME64 is a data structure that holds the addresses from which call-stack information is retrieved. As Jochen writes, on x86 we need to initialize this structure before calling StackWalk64:

CONTEXT context;
context.ContextFlags = CONTEXT_FULL;
GetThreadContext(hThread, &context);
// Must be like this
stack.AddrPC.Offset = context.Eip;    // EIP - Instruction Pointer
stack.AddrPC.Mode = AddrModeFlat;
stack.AddrFrame.Offset = context.Ebp; // EBP
stack.AddrFrame.Mode = AddrModeFlat;
stack.AddrStack.Offset = context.Esp; // ESP - Stack Pointer
stack.AddrStack.Mode = AddrModeFlat;


When calling StackWalk64, the first argument specifies the machine type, x86 here. The next argument is the handle of the process being debugged. The third is the handle of the thread whose call stack we want (not necessarily the main thread). The fourth parameter, the STACKFRAME64 structure, is the most important one for us. The fifth is the CONTEXT structure, which supplies the addresses needed for initialization. _ProcessMemoryReader is a function we declared ourselves; it does nothing but call ReadProcessMemory. The other two Sym* functions come from DbgHelp.dll. The last parameter is also a function pointer, but we do not need it.
To walk the call stack, you need a loop that runs until the walk ends. Since issues such as an invalid or endless call stack are still open, I decided to keep it simple: call the function until the return address becomes NULL or until StackWalk64 fails. Here is how we get the call stack (getting function names comes a bit later):

BOOL bSuccess;
do
{
    bSuccess = StackWalk64(IMAGE_FILE_MACHINE_I386, ... ,0);
    if(!bSuccess)
       break;
    // Symbol retrieval code goes here.
    // The contents of 'stack' help determine the symbols,
    // and the results would be stored in a vector.
}while ( stack.AddrReturn.Offset != 0 );
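StackWalk64 copes with many cases the loop above cannot see, such as frame-pointer omission and unwind data. For code compiled with frame pointers, though, the core idea can be illustrated with a plain EBP-chain walk. This is a different and much cruder technique than StackWalk64, shown only to make the loop concrete.

The sketch rests on these assumptions:
  • standard prologues, where [EBP] holds the caller's saved EBP and [EBP+4] holds the return address;
  • the StackMemory map and the ReadDword helper are hypothetical stand-ins for ReadProcessMemory.

Like the loop above, it stops when the return address is 0 or a read fails. It also caps the number of frames as a simple guard against an endless stack.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <vector>

// Hypothetical debuggee stack: address -> 32-bit value.
using StackMemory = std::map<std::uint32_t, std::uint32_t>;

// Plays the role of ReadProcessMemory for one DWORD; a missing entry means failure.
bool ReadDword(const StackMemory& mem, std::uint32_t addr, std::uint32_t& out) {
    auto it = mem.find(addr);
    if (it == mem.end()) return false;
    out = it->second;
    return true;
}

// Returns the current EIP followed by the return address of each caller.
std::vector<std::uint32_t> WalkEbpChain(const StackMemory& mem,
                                        std::uint32_t eip, std::uint32_t ebp,
                                        std::size_t maxFrames = 64) {
    std::vector<std::uint32_t> frames{eip};
    while (frames.size() < maxFrames) {
        std::uint32_t savedEbp = 0, returnAddress = 0;
        if (!ReadDword(mem, ebp, savedEbp) ||
            !ReadDword(mem, ebp + 4, returnAddress)) {
            break;                       // unreadable frame: stop, like a failed StackWalk64
        }
        if (returnAddress == 0) break;   // end of the chain
        frames.push_back(returnAddress);
        if (savedEbp <= ebp) break;      // the stack grows down; guard against loops
        ebp = savedEbp;
    }
    return frames;
}
```

Consider a fake stack where EBP = 0x1000 links to 0x1020 and then to 0x1040, with return addresses 0x00401234, 0x00405678 and finally 0. Starting from EIP 0x00411767, the walk yields three entries: 0x00411767, 0x00401234 and 0x00405678.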


A debug symbol has several properties:
  • Module name (EXE or DLL)
  • Symbol name, decorated or undecorated
  • Symbol type: function, class, parameter, local variable, etc.
  • Virtual address of the symbol

A stack frame also includes the source file, the line number, and the first processor instruction on that line.

We don't need the first processor instruction of a line until we disassemble code, but we may need to work relative to it: at the source level, one line can contain several statements (for example, multiple function calls), each compiled into its own instructions. I am leaving this out for now.
So, to build complete call-stack data, we need the module name, the name of the called function, and the line number.
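Once those three pieces, plus the displacement inside the function, are known, a frame can be turned into one line of text. The StackFrameInfo struct and the "module!function+0xNN  file(line)" layout are illustrative choices, not a DbgHelp or Visual Studio format. The fields would be filled from the DbgHelp calls described in this section.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <string>

// Hypothetical per-frame record assembled from DbgHelp results.
struct StackFrameInfo {
    std::string moduleName;      // e.g. from IMAGEHLP_MODULE64::ModuleName
    std::string functionName;    // e.g. from the symbol's Name field
    std::uint64_t displacement;  // bytes past the start of the function
    std::string fileName;        // empty if no line information
    unsigned lineNumber;
};

// Formats a frame as "module!function+0xNN  file(line)".
std::string FormatFrame(const StackFrameInfo& f) {
    char offset[32];
    std::snprintf(offset, sizeof(offset), "+0x%llX",
                  static_cast<unsigned long long>(f.displacement));
    std::string text = f.moduleName + "!" + f.functionName + offset;
    if (!f.fileName.empty()) {
        text += "  " + f.fileName + "(" + std::to_string(f.lineNumber) + ")";
    }
    return text;
}
```

With made-up values (module DebugMe, function wWinMain, displacement 0x1A, file debugme.cpp, line 42), this gives DebugMe!wWinMain+0x1A  debugme.cpp(42). Without a file name, only DebugMe!wWinMain+0x1A is printed.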
To get the name of the module that contains an address on the stack, we call SymGetModuleInfo64. As you may recall, there is a similar function that loads module information, SymLoadModuleXX, which must be called before SymGetModuleInfo64 for the debugger to work correctly. The following code (placed right after the StackWalk64 call inside the loop) shows how to get module information for a given address:

IMAGEHLP_MODULE64 module={0};
module.SizeOfStruct = sizeof(module);
SymGetModuleInfo64(m_cProcessInfo.hProcess, (DWORD64)stack.AddrPC.Offset, &module);
The module.ModuleName field contains the module name, without extension or path. The module.LoadedImageName field contains the full file name. module.LineNumbers indicates whether line information is available (1 means it is). The structure has several other useful fields as well.
After that, we get the function name for this stack frame using SymGetSymFromAddr64 or SymFromAddr. The first returns information through an IMAGEHLP_SYMBOL64 structure (passed as PIMAGEHLP_SYMBOL64), which has 6 fields; the second returns more detailed information through SYMBOL_INFO. Both take four arguments: the first three are the same, and the last is a pointer to the structure. Below is an example using the first function:
IMAGEHLP_SYMBOL64 *pSymbol;
DWORD64 dwDisplacement; // SymGetSymFromAddr64 takes a PDWORD64
pSymbol = (IMAGEHLP_SYMBOL64*)new BYTE[sizeof(IMAGEHLP_SYMBOL64)+MAX_SYM_NAME];
memset(pSymbol, 0, sizeof(IMAGEHLP_SYMBOL64) + MAX_SYM_NAME);
pSymbol->SizeOfStruct = sizeof(IMAGEHLP_SYMBOL64); // Required
pSymbol->MaxNameLength = MAX_SYM_NAME;             // Required
SymGetSymFromAddr64(m_cProcessInfo.hProcess, stack.AddrPC.Offset,
                   &dwDisplacement, pSymbol); // Returns TRUE on success
// ... use pSymbol->Name and pSymbol->Address ...
delete[] (BYTE*)pSymbol; // Free the buffer allocated above


A few notes on this odd-looking code:
  • The symbol name can be of variable length, so we need to allocate a buffer large enough for it. The predefined macro MAX_SYM_NAME has a value of 2000.
  • The IMAGEHLP_SYMBOL64 structure may have different sizes in the DbgHelp.dll library and at compile time. Therefore, we must explicitly set its size during initialization (this is the standard mechanism for guarding against different structure versions; translator's note): SizeOfStruct must be initialized before the structure is used. The purpose of MaxNameLength is pretty obvious.
  • The most important fields for us are Name (a null-terminated string) and Address, the virtual address of the symbol (including the module's base address).
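The displacement output is easiest to picture with a toy symbol table. DbgHelp does the real lookup internally; the sketch below only illustrates the idea under a simplifying assumption. Given a table of function start addresses (hypothetical values), it picks the last symbol that starts at or below the address and reports how far past that start the address lies.

Unlike the real lookup, it ignores symbol sizes. An address in a gap between functions would therefore still be attributed to the preceding symbol.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

// Hypothetical symbol table: function start address -> function name.
using SymbolTable = std::map<std::uint64_t, std::string>;

struct SymbolHit {
    std::string name;
    std::uint64_t displacement;  // address minus symbol start, like dwDisplacement
};

// Finds the last symbol starting at or below 'address' (symbol sizes are ignored).
bool LookupSymbol(const SymbolTable& table, std::uint64_t address,
                  SymbolHit& hit) {
    auto it = table.upper_bound(address);   // first symbol starting after the address
    if (it == table.begin()) return false;  // the address precedes every symbol
    --it;                                   // step back to the containing candidate
    hit.name = it->second;
    hit.displacement = address - it->first;
    return true;
}
```

With wWinMain starting at 0x00412000, the address 0x0041201A resolves to wWinMain with a displacement of 0x1A.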

Using SymFromAddr and initializing SYMBOL_INFO is much the same, and I prefer this newer function. Although it gives us no additional information right now, I will explain the fields of that structure later, as needed.
Finally, to complete the call stack, we need the source file path and the line number. Remember that the *.PDB file provides this information only if the debugging symbols were loaded successfully. Also, the PDB contains only information about the source code, not the source code itself.
To get line information, we use SymGetLineFromAddr64, which returns it through the IMAGEHLP_LINE64 structure. This function takes 4 arguments; the first three correspond to those of the function above, except that the displacement is a DWORD rather than a DWORD64. We only need to initialize the structure with the correct size. It looks like this:

IMAGEHLP_LINE64 line;
DWORD dwLineDisplacement; // SymGetLineFromAddr64 takes a PDWORD
line.SizeOfStruct = sizeof(line);
bSuccess = SymGetLineFromAddr64(m_cProcessInfo.hProcess,
                               stack.AddrPC.Offset, // already a DWORD64
                               &dwLineDisplacement, &line);
if(bSuccess)
{
   // Use line.FileName and line.LineNumber
}


Neither the symbol functions nor any other DbgHelp.dll functions support loading and displaying source code. We have to do that ourselves. If the line information or the source file is not available, we will not be able to show the source code.
At the time of writing, I have not yet decided what to display when the source code is not available. We could show the instructions, but x86 instructions are not fixed in size. We could simply show a sequence of bytes on one line (for example, "55 04 FF 76 78 AE ..."). Or we could disassemble the instructions and show the result. I do have a module for disassembling x86 code, but it does not understand all instructions.
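The simplest of these options, a raw byte listing, needs only a formatting helper. FormatBytes is a hypothetical name; in the debugger, the bytes would come from ReadProcessMemory at the current EIP.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <string>
#include <vector>

// Renders bytes as "55 04 FF ..." (upper-case hex, separated by spaces).
std::string FormatBytes(const std::vector<std::uint8_t>& bytes) {
    std::string out;
    char hex[4];
    for (std::size_t i = 0; i < bytes.size(); ++i) {
        std::snprintf(hex, sizeof(hex), "%02X", static_cast<unsigned>(bytes[i]));
        if (i != 0) out += ' ';
        out += hex;
    }
    return out;
}
```

Because it never decodes instructions, this listing is always correct, even where a partial disassembler would fail.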
So far, I have shown you the important steps for stopping the debuggee at a specific address: obtaining the start address, setting a breakpoint at that address, handling the breakpoint event, restoring the original instruction, and retrieving the register values, the call stack, and the source code, plus the basics of displaying debugging information in the UI. I also explained that we need Windows events to suspend the debugged application and resume it at the user's request.

Suspend a debugged application and resume its execution at the user's request


As mentioned earlier, to pause the debugged application we simply do not call ContinueDebugEvent. By design, while a debug event has not been continued, all threads of the debugged application stay suspended.
Here is the sequence of pausing and resuming debugging (UT is the user-interface thread, DT is the debugger thread):
[UT] The user starts debugging through the user interface, which launches the DT. The UI keeps responding while this happens.
[DT] Creates the resume event with CreateEvent.
[DT] Launches the debugged application with CreateProcess and enters the debugging loop.
[DT] Reaches a breakpoint, displays the information, and pauses execution.
[DT] Uses WaitForSingleObject to keep itself paused.
[UT] The user chooses a debugging action (Continue, Stop, Step Into).
[UT] The UI calls the appropriate debugger function to resume execution, which calls SetEvent to wake the DT.
[DT] Wakes up, checks the user's choice, and either continues the debugging loop or stops debugging.
Everything will become clearer when I show you the interface (class) of the debugging core. If you have a good memory or are simply curious, you may have noticed that I left out a couple of things:
First, the actual code for obtaining the start address (the GetStartAddress function). Recall that CREATE_PROCESS_DEBUG_INFO::lpStartAddress is the start address, but it is not always the address we want. Second, how do we handle breakpoints that must stop execution more than once? The code written so far sets breakpoints that fire only once, because it restores the original instruction so that the program can continue running normally.
Now that I have described DbgHelp.dll and the Sym* functions, I can show you how to get the start address of the process. The function we need is SymFromName: it takes a symbol name and fills a SYMBOL_INFO structure. Its older counterpart is SymGetSymFromName64, which returns the information through PIMAGEHLP_SYMBOL64. Using the code below, we can get the address of wWinMainCRTStartup with SymFromName:
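One common way to make a breakpoint fire more than once (not necessarily the approach the final part will take) is: restore the original byte, move the instruction pointer back by one, single-step the original instruction with the trap flag set, and write 0xCC back when the single-step event arrives. The sketch below simulates this on a plain byte vector instead of real process memory; BreakpointTable and its methods are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <vector>

// Simulated breakpoint handling on a plain byte vector that stands in for the
// debuggee's code. In a real debugger each read/write below would be a
// ReadProcessMemory/WriteProcessMemory call followed by FlushInstructionCache.
class BreakpointTable
{
public:
    static constexpr std::uint8_t kInt3 = 0xCC; // opcode of "int 3"

    explicit BreakpointTable(std::vector<std::uint8_t>& memory) : m_memory(memory) {}

    // Saves the original byte and patches in int 3.
    void Set(std::size_t address)
    {
        m_original[address] = m_memory[address];
        m_memory[address] = kInt3;
    }

    // Called on the breakpoint exception. The CPU has already executed the 0xCC,
    // so the instruction pointer is one past the breakpoint. Restores the original
    // byte and returns the address to resume at (a real debugger would store it
    // in the thread context with SetThreadContext).
    std::size_t OnHit(std::size_t instructionPointer)
    {
        std::size_t address = instructionPointer - 1;
        auto it = m_original.find(address);
        if (it == m_original.end())
            return instructionPointer;      // not one of ours: leave the IP alone
        m_memory[address] = it->second;     // restore the original instruction byte
        m_pendingAddress = address;
        m_hasPending = true;
        return address;
    }

    // Persistent breakpoints: the debugger sets the trap flag (EFLAGS bit 0x100),
    // lets the original instruction execute once, and on the resulting
    // EXCEPTION_SINGLE_STEP event calls this to patch int 3 back in.
    // One-shot breakpoints simply never call it.
    void OnSingleStep()
    {
        if (m_hasPending)
        {
            m_memory[m_pendingAddress] = kInt3;
            m_hasPending = false;
        }
    }

private:
    std::vector<std::uint8_t>& m_memory;
    std::map<std::size_t, std::uint8_t> m_original;
    std::size_t m_pendingAddress = 0;
    bool m_hasPending = false;
};
```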

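DWORD64 GetStartAddress( HANDLE hProcess, HANDLE hThread )
{
   // hThread is not used here; the parameter is kept from the original signature.
   // SYMBOL_INFO ends in a one-character Name array, so reserve room for the full name.
   SYMBOL_INFO *pSymbol;
   pSymbol = (SYMBOL_INFO *)new BYTE[sizeof(SYMBOL_INFO) + MAX_SYM_NAME * sizeof(TCHAR)];
   ZeroMemory(pSymbol, sizeof(SYMBOL_INFO));
   pSymbol->SizeOfStruct = sizeof(SYMBOL_INFO);
   pSymbol->MaxNameLen = MAX_SYM_NAME;
   DWORD64 dwAddress = 0;
   if (SymFromName(hProcess, "wWinMainCRTStartup", pSymbol))
      dwAddress = pSymbol->Address;  // Store address before deleting the pointer
   delete [](BYTE*)pSymbol; // Valid syntax!
   return dwAddress;        // 0 if the symbol was not found
}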


Of course, this retrieves only the address of wWinMainCRTStartup, which may not be the program's starting point: an ANSI GUI build starts at WinMainCRTStartup, and console builds start at mainCRTStartup or wmainCRTStartup. Determining which one applies, as well as whether the EXE file is corrupted, whether it is 32-bit, whether it is compiled as native (unmanaged) code, whether it is a Unicode or ANSI build, and the like, is beyond the scope of this article.
As for user-defined breakpoints, I will write about them a little later.
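Without going into PE-header parsing, a debugger could at least try each of the four Visual C++ CRT startup routines in turn. In this sketch the lookup callback stands in for a name-parameterized variant of GetStartAddress (hypothetical: it is assumed to return 0 when SymFromName fails), and FindCrtStartup is likewise a hypothetical helper.

```cpp
#include <cstdint>
#include <functional>
#include <string>

// Tries the CRT startup routines in turn and returns the address of the first
// one found, or 0 if none is found.
std::uint64_t FindCrtStartup(const std::function<std::uint64_t(const std::string&)>& lookup)
{
    static const char* const kNames[] = {
        "wWinMainCRTStartup", // Unicode GUI application
        "WinMainCRTStartup",  // ANSI GUI application
        "wmainCRTStartup",    // Unicode console application
        "mainCRTStartup",     // ANSI console application
    };
    for (const char* name : kNames)
    {
        std::uint64_t address = lookup(name);
        if (address != 0)
            return address;
    }
    return 0;
}
```

This only finds the routine that the symbols describe; a program built with a custom entry point would still need a different approach.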

CDebuggerCore - The debugging-interface class


I wrote an abstract class with only the few pure virtual functions needed for debugging. Unlike the code in the previous article, which was tightly coupled with the user interface and MFC, this class depends on neither. It uses only native Windows types, the STL, and the CString class. Note that you can use CString in non-MFC applications by including <atlstr.h>: you do not need to link the program with the MFC libraries, a single header file is enough. If CString still bothers you, replace it with your favorite string class and that's it.
Here is the basic skeleton of CDebuggerCore:

class CDebuggerCore
{
    HANDLE m_hDebuggerThread;    // Handle to debugger thread
    HANDLE m_heResumeDebugging;  // Set by ResumeDebugging
    PROCESS_INFORMATION m_cProcessInfo;
    // Other members not shown for now.
public:
    // Asynchronous call to start debugging.
    // Spawns a thread that runs the debugging loop, which
    // calls the following virtual functions to notify debugging events.
    int StartDebugging(const CString& strExeFullPath);
    // How the user chose to continue debugging;
    // this may also mean stopping debugging.
    // To be called from the UI thread.
    int ResumeDebugging(EResumeMode);
    // Don't want to listen to anything! Terminate!
    void StopDebugging();
protected:
    // Abstract methods
    virtual void OnDebugOutput(const TDebugOutput&) = 0;
    virtual void OnDllLoad(const TDllLoadEvent&) = 0;
    virtual void OnUpdateRegisters(const TRegisters&) = 0;
    virtual void OnUpdateCallStack(const TCallStack&) = 0;
    virtual void OnHaltDebugging(EHaltReason) = 0;
};


Some readers may not like this class. But in order to explain how the debugger code works, I have to write the code itself!
Since this class is abstract, it must serve as the base of another class, and all of its virtual On* methods must be overridden. These virtual functions are called by the base class in response to the corresponding debugging events. None of them returns a value, so you can leave an implementation empty if you do not need that notification.
Suppose you created the CDebugger class inherited from CDebuggerCore and implemented all the virtual functions. Then you can start debugging with this code:

// In header, or at some persist-able location
CDebugger theDebugger;
// The point in code where you ask it to start debugging:
theDebugger.StartDebugging("Path to executable");


StartDebugging simply initializes the variables that hold the debugging state, creates the resume event mentioned above, and starts the debugger thread. Then it returns immediately; in other words, StartDebugging is asynchronous.
The debugging loop itself lives in the DebuggerThread method (not shown here). It launches the executable with CreateProcess, enters the debugging loop, and drives it with the WaitForDebugEvent and ContinueDebugEvent functions. When a debugging event occurs, this method calls the corresponding On* virtual method. For example, when an OUTPUT_DEBUG_STRING_EVENT arrives, it calls OnDebugOutput with the string, and other debugging events are routed to their methods in the same way. The derived class then handles each event as it sees fit.
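The routing of events to virtual handlers can be illustrated without any Windows headers. In this sketch EventKind, DebugEvent, DebuggerCoreSketch and LoggingDebugger are hypothetical stand-ins, not the article's real types, and Dispatch plays the role of the code that runs after WaitForDebugEvent returns in DebuggerThread.

```cpp
#include <string>
#include <vector>

// Hypothetical, platform-neutral stand-ins for the Windows debug event codes
// (OUTPUT_DEBUG_STRING_EVENT, LOAD_DLL_DEBUG_EVENT, EXCEPTION_DEBUG_EVENT, ...).
enum class EventKind { DebugString, DllLoad, Breakpoint };

struct DebugEvent
{
    EventKind kind;
    std::string text; // message or DLL name, depending on kind
};

// Cut-down analogue of CDebuggerCore: the loop body only dispatches.
class DebuggerCoreSketch
{
public:
    virtual ~DebuggerCoreSketch() = default;

    // One iteration of the debugging loop, after an event has been received.
    void Dispatch(const DebugEvent& ev)
    {
        switch (ev.kind)
        {
        case EventKind::DebugString: OnDebugOutput(ev.text); break;
        case EventKind::DllLoad:     OnDllLoad(ev.text);     break;
        // Stands in for HaltDebugging, which also waits for the user.
        case EventKind::Breakpoint:  OnHaltDebugging();      break;
        }
        // The real loop would call ContinueDebugEvent at this point.
    }

protected:
    virtual void OnDebugOutput(const std::string& text) = 0;
    virtual void OnDllLoad(const std::string& dllName) = 0;
    virtual void OnHaltDebugging() = 0;
};

// A derived class decides what each event means for its UI; this one just logs.
class LoggingDebugger : public DebuggerCoreSketch
{
public:
    std::vector<std::string> log;

protected:
    void OnDebugOutput(const std::string& text) override { log.push_back("output: " + text); }
    void OnDllLoad(const std::string& dllName) override { log.push_back("dll: " + dllName); }
    void OnHaltDebugging() override { log.push_back("halt"); }
};
```

The base class owns the loop and the order of calls; the derived class only decides how to present each notification, which is what keeps CDebuggerCore free of UI code.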
For debugging events that should pause the debuggee, such as a breakpoint event, the debugging loop first calls the appropriate On* method and then calls HaltDebugging with the appropriate reason code. This function is a private member of CDebuggerCore and is declared as follows:

// Enum
enum EHaltReason
{
     // Reason codes, like Breakpoint
};
// In CDebuggerCore
private:
   void HaltDebugging(EHaltReason);


Its implementation is shown below:

void CDebuggerCore::HaltDebugging(EHaltReason eHaltReason)
{
   // Halt the debugging
   OnHaltDebugging(eHaltReason);
   // And wait until ResumeDebugging is called, which sets the event
   WaitForSingleObject(m_heResumeDebugging,INFINITE);
}


Since the debugging loop knows the exact reason for the stop, it passes it to HaltDebugging, which delegates it to OnHaltDebugging. How OnHaltDebugging is overridden is entirely up to the developer, who decides how to handle each halt reason. The UI thread does not freeze; it simply waits for the user's next action, while the DT stays paused.
Through the UI (a menu, hotkeys, and so on), the UI thread calls ResumeDebugging with the appropriate resume mode (for example, Continue, Step Into, or Stop). ResumeDebugging, which takes an EResumeMode flag as its argument, stores this flag in a member variable and then calls SetEvent to signal the event. This lets the debugger thread continue.
Once HaltDebugging returns, the debugging loop checks which action the user chose by reading m_eResumeMode, the member set by ResumeDebugging, and then either continues debugging or ends it if the stop code arrives. Just as an example, EResumeMode looks something like this:

// What action was initiated by the user to resume
enum EResumeMode
{
    Continue,  // F5
    Stop,      // Shift+F5
    StepOver,  // F10
    // More ..
};


The final part will cover:
  • Getting source codes and line numbers
  • Setting custom breakpoints
  • Code Trace (step-in, step-out, step-over)
  • Conditional Breakpoints
  • Debugging a running process
  • Disconnecting from a process, terminating or waiting?
  • Debugging a crashed process
  • Manually connecting a debugger
