What happens behind the scenes in C#: the basics of working with the stack

I propose to look at everything that lies behind the simple lines that initialize objects, call methods, and pass parameters. And, of course, we will put this information to practical use by reading the stack of the calling method.

Disclaimer

Before proceeding with the story, I strongly recommend that you read the first post about StructLayout, because it contains an example that will be used in this article.

All the low-level code shown here was produced in debug mode, because that is the mode that shows the conceptual basis. Everything is also considered for a 32-bit platform. JIT optimization is a separate and big topic that will not be covered here.

I would also like to warn that this article does not contain material that should be used in real projects.

We start with the theory

Any code eventually becomes a set of machine instructions. The most readable representation of them is assembly language, where each instruction corresponds directly to one (or several) machine instructions.

Before turning to a simple example, I propose to get acquainted with what a software stack is. A software stack is primarily a region of memory that is used, as a rule, for storing all sorts of temporary data. It is also worth remembering that the stack grows towards smaller addresses. That is, the later an object is placed on the stack, the lower its address will be.

Now let's look at what the following piece of code looks like in assembly language (I have dropped some of the calls that are specific to debug mode).

C#:

public class StubClass 
{
    public static int StubMethod(int fromEcx, int fromEdx, int fromStack) 
    {
        int local = 5;
        return local + fromEcx + fromEdx + fromStack;
    }
    public static void CallingMethod()
    {
        int local1 = 7, local2 = 8, local3 = 9;
        int result = StubMethod(local1, local2, local3);
    }
}

Asm:

StubClass.StubMethod(Int32, Int32, Int32)
    1: push ebp
    2: mov ebp, esp
    3: sub esp, 0x10
    4: mov [ebp-0x4], ecx
    5: mov [ebp-0x8], edx
    6: xor edx, edx
    7: mov [ebp-0xc], edx
    8: xor edx, edx
    9: mov [ebp-0x10], edx
    10: nop
    11: mov dword [ebp-0xc], 0x5
    12: mov eax, [ebp-0xc]
    13: add eax, [ebp-0x4]
    14: add eax, [ebp-0x8]
    15: add eax, [ebp+0x8]
    16: mov [ebp-0x10], eax
    17: mov eax, [ebp-0x10]
    18: mov esp, ebp
    19: pop ebp
    20: ret 0x4
StubClass.CallingMethod()
    1: push ebp
    2: mov ebp, esp
    3: sub esp, 0x14
    4: xor eax, eax
    5: mov [ebp-0x14], eax
    6: xor edx, edx
    7: mov [ebp-0xc], edx
    8: xor edx, edx
    9: mov [ebp-0x8], edx
    10: xor edx, edx
    11: mov [ebp-0x4], edx
    12: xor edx, edx
    13: mov [ebp-0x10], edx
    14: nop
    15: mov dword [ebp-0x4], 0x7
    16: mov dword [ebp-0x8], 0x8
    17: mov dword [ebp-0xc], 0x9
    18: push dword [ebp-0xc]
    19: mov ecx, [ebp-0x4]
    20: mov edx, [ebp-0x8]
    21: call StubClass.StubMethod(Int32, Int32, Int32)
    22: mov [ebp-0x14], eax
    23: mov eax, [ebp-0x14]
    24: mov [ebp-0x10], eax
    25: nop
    26: mov esp, ebp
    27: pop ebp
    28: ret

The first thing you should pay attention to is the EBP and ESP registers and operations with them.

A common misconception among my friends is that the EBP register is the pointer to the top of the stack. It is not.

The pointer to the top of the stack is the ESP register. Accordingly, with each PUSH instruction (which puts a value on the top of the stack) the value of this register is decremented (the stack grows towards smaller addresses), and with each POP instruction it is incremented. The CALL instruction also pushes the return address onto the stack, thereby decrementing ESP. In fact, ESP changes not only when these instructions are executed: for example, when an interrupt occurs, the same thing happens as with a CALL instruction.
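The behavior of ESP described above can be sketched with a toy model. This is not real memory access: the SimulatedStack class, the starting address 0x1000 and the pushed values are all assumptions made up for illustration, and every value is assumed to be 4 bytes wide.

```csharp
using System;
using System.Collections.Generic;

// Toy model of a 32-bit stack. The class name, the starting address and the
// values are hypothetical; nothing here touches the real machine stack.
public class SimulatedStack
{
    private readonly Dictionary<uint, uint> memory = new Dictionary<uint, uint>();

    // ESP: address of the value currently on top of the stack.
    public uint Esp { get; private set; }

    public SimulatedStack(uint initialEsp) { Esp = initialEsp; }

    // PUSH: decrement ESP by 4, then store the value at the new top.
    public void Push(uint value)
    {
        Esp -= 4;
        memory[Esp] = value;
    }

    // POP: read the value at the top, then increment ESP by 4.
    public uint Pop()
    {
        uint value = memory[Esp];
        Esp += 4;
        return value;
    }
}

public static class PushPopDemo
{
    public static void Main()
    {
        var stack = new SimulatedStack(0x1000);
        stack.Push(7);                  // ESP: 0x1000 -> 0x0FFC
        stack.Push(8);                  // ESP: 0x0FFC -> 0x0FF8
        Console.WriteLine($"ESP after two pushes: 0x{stack.Esp:X4}");
        uint top = stack.Pop();         // reads 8, ESP: 0x0FF8 -> 0x0FFC
        Console.WriteLine($"Popped {top}, ESP is now 0x{stack.Esp:X4}");
    }
}
```

The value pushed later ends up at the lower address, which is exactly what "the stack grows towards smaller addresses" means.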

Consider StubMethod.

In the first line, the contents of the EBP register are saved (put on the stack). Before returning from a function, this value will be restored.

The second line saves the current address of the top of the stack (the value of the ESP register is copied into EBP). From this point on, the EBP register acts as a kind of zero point in the context of the current call: addressing is done relative to it. Next, we move the top of the stack by as many positions as we need to store local variables and parameters (line 3). It is something like allocating memory for all local needs.

All of the above is called the function prologue.

After this, variables on the stack are accessed through the saved EBP, which points to the place where the variables of this particular method begin.
Next comes the initialization of local variables.

Fastcall reminder: .NET natively uses the fastcall calling convention.
The convention governs where and in what order the parameters are passed to the function.
With fastcall, the first and second parameters are passed via the ECX and EDX registers, respectively; the remaining parameters are passed via the stack.

For non-static methods, the first parameter is implicit and contains the address of the object on which the method is called (the this reference).
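As a rough illustration of this rule, here is a minimal sketch that assigns a location to each parameter. It is a simplified model, not the actual JIT logic: the FastcallModel class and the parameter names are hypothetical, every parameter is assumed to be a 4-byte value (an int or an object reference), and the order of the stack parameters is deliberately not modeled.

```csharp
using System;
using System.Collections.Generic;

public static class FastcallModel
{
    // Simplified model: every parameter is assumed to be a 4-byte value.
    // The real rules for other types (64-bit values, floats, structs) differ.
    public static List<string> Place(bool isInstanceMethod, params string[] parameters)
    {
        var all = new List<string>();
        if (isInstanceMethod) all.Add("this");   // implicit first parameter
        all.AddRange(parameters);

        var result = new List<string>();
        for (int i = 0; i < all.Count; i++)
        {
            // First parameter -> ECX, second -> EDX, everything else -> stack.
            string where = i == 0 ? "ECX" : i == 1 ? "EDX" : "stack";
            result.Add($"{all[i]} -> {where}");
        }
        return result;
    }

    public static void Main()
    {
        // A static method with three parameters, like StubMethod.
        foreach (string line in Place(false, "a", "b", "c"))
            Console.WriteLine(line);

        // A hypothetical instance method with the same parameters:
        // "this" occupies ECX, so only one explicit parameter fits in a register.
        foreach (string line in Place(true, "a", "b", "c"))
            Console.WriteLine(line);
    }
}
```

For the static method the sketch prints a -> ECX, b -> EDX, c -> stack; for the instance method, this takes ECX and both b and c end up on the stack.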

In lines 4 and 5, the parameters that were passed through the registers (the first 2) are stored on the stack.

Next, the stack space reserved for local variables is zeroed (lines 6-9) and the local variable is initialized (line 11).

It is worth recalling that a function returns its result in the EAX register.

In lines 12-16, the variables are added together. Note line 15: it accesses an address above the base of the current frame (EBP), that is, in the stack of the previous method. Before the call, the caller pushed a parameter onto the top of the stack, and here we read it. The result of the addition is taken from the EAX register and stored in a slot on the stack (line 16). Since it is the return value of StubMethod, line 17 loads it back into EAX. Of course, such absurd instruction sequences are found only in debug mode, but they show exactly what our code looks like without a smart optimizer that does the lion's share of the work.

Line 18 restores the pointer to the top of the stack to the value it had right after the caller's EBP was saved, and line 19 restores the previous EBP (that of the calling method).

The last line returns from the method. I will explain the value 0x4 a little later.
This sequence of instructions is called the function epilogue.

Now let's take a look at CallingMethod. Let's go straight to line 18. Here we put the third parameter on the top of the stack. Note that we do this with the PUSH instruction, so the ESP value is decremented. The other two parameters are placed in registers (fastcall). Next comes the call to StubMethod. Now recall the RET 0x4 instruction. What is 0x4? As mentioned above, the caller pushed a parameter of the called function onto the stack, and after the call it is no longer needed. The value 0x4 says how many bytes RET must remove from the stack in addition to the return address. Since one 4-byte parameter was pushed, 4 bytes are removed.
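The arithmetic behind RET 0x4 can be sketched as follows. This is a simplified model under stated assumptions: the RetImmediateDemo class is hypothetical, all parameters are assumed to be 4 bytes wide, the first two are assumed to travel in ECX and EDX, and the callee's prologue and epilogue are left out because together they leave ESP unchanged.

```csharp
using System;

public static class RetImmediateDemo
{
    // Bytes the callee removes with RET n, assuming 4-byte parameters and
    // that the first two are passed in ECX and EDX (simplified model).
    public static int BytesToClean(int parameterCount)
    {
        int onStack = Math.Max(0, parameterCount - 2);
        return onStack * 4;
    }

    // Follows ESP through the call sequence of CallingMethod.
    public static uint EspAfterCall(uint espBefore, int parameterCount)
    {
        uint esp = espBefore;
        int onStack = Math.Max(0, parameterCount - 2);
        esp -= (uint)(onStack * 4);                  // one PUSH per stack parameter
        esp -= 4;                                    // CALL pushes the return address
        esp += 4;                                    // RET pops the return address...
        esp += (uint)BytesToClean(parameterCount);   // ...and its immediate drops the parameters
        return esp;
    }

    public static void Main()
    {
        // StubMethod has three parameters, so one of them goes through the stack.
        Console.WriteLine($"ret 0x{BytesToClean(3):X}");
        Console.WriteLine($"ESP restored: {EspAfterCall(0x1000, 3) == 0x1000}");
    }
}
```

The point of the sketch is that the caller gets ESP back exactly as it was before it pushed the parameter, without executing any cleanup instruction of its own.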

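Here is the layout of the stack inside StubMethod after the prologue (higher addresses at the top):

    ...         stack of CallingMethod
    [ebp+0x8]   third parameter (fromStack), pushed by the caller
    [ebp+0x4]   return address, pushed by CALL
    [ebp]       saved EBP of the calling method
    [ebp-0x4]   copy of ECX (fromEcx)
    [ebp-0x8]   copy of EDX (fromEdx)
    [ebp-0xc]   local
    [ebp-0x10]  temporary slot for the return value (ESP points here)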

Thus, if we turn around and look at what lies behind us on the stack immediately after the method call, the first thing we will see is the saved EBP (pushing it was, in fact, the first line of the current method). Next comes the return address, which says where execution will continue (it is used by the RET instruction). Past these two fields we will see the parameters of the current function (starting from the third one, since the first two are passed through registers). And behind them hides the stack of the calling method!
These first two fields, 4 bytes each, explain the +0x8 offset used when accessing parameters.
Accordingly, when a function is called, the parameters must lie at the top of the stack in a strictly defined order. That is why each stack parameter is pushed before the method is called.
But what if the caller does not push them, and the function reads them anyway?

Small example

All of the above gave me an overwhelming desire to read the stack of the method that calls my function. The thought that literally one position away from the third argument (the one closest to the stack of the calling method) lies the cherished data that I so want to get did not let me sleep.

Thus, to read the stack of the calling method, I need to climb a little further than the parameters.

When referring to parameters, the calculation of the address of a particular parameter is based only on the fact that the caller has pushed them all onto the stack.

But the implicit passing of a parameter via EDX (for those interested, see the previous article) suggests that in some cases we can outsmart the compiler.

The tool I used for this is called StructLayoutAttribute (its features are covered in the first article). // Someday I will learn something other than this attribute, I promise.

We use the same favorite trick of overlapping reference types.

If the overlapping methods have a different number of parameters, the compiler does not push the required ones onto the stack (if only because it does not know which ones are needed).
However, the method that is actually called (the one at the same offset in a different type) accesses positive offsets relative to its EBP, that is, the places where it expects to find its parameters.

But it does not find them there and starts reading the stack of the calling method instead.
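The effect can be sketched with a toy model of memory. Everything in it is an assumption made up for illustration: the MissingPushDemo class, the addresses, the return address and the values 0xCAFE and 0x1234. In a real process, which of the caller's slots ends up at [ebp+0x8] depends on how the JIT laid out the caller's frame.

```csharp
using System;
using System.Collections.Generic;

public static class MissingPushDemo
{
    // Toy memory: address -> 4-byte value. All addresses and values are invented.
    private static readonly Dictionary<uint, uint> Memory = new Dictionary<uint, uint>();
    private static uint esp;

    private static void Push(uint value) { esp -= 4; Memory[esp] = value; }

    // Runs the callee's prologue and returns what it finds at [ebp+0x8].
    public static uint ReadFirstStackParameter(bool callerPushesArgument)
    {
        Memory.Clear();
        esp = 0x1000;
        uint callerEbp = 0x1010;                  // made-up frame base of the caller

        Push(0xCAFE);                             // data on top of the caller's stack
        if (callerPushesArgument) Push(0x1234);   // the argument the callee expects

        Push(0x00401000);                         // CALL: made-up return address
        Push(callerEbp);                          // callee: push ebp
        uint ebp = esp;                           // callee: mov ebp, esp

        return Memory[ebp + 8];                   // callee: mov eax, [ebp+0x8]
    }

    public static void Main()
    {
        Console.WriteLine($"0x{ReadFirstStackParameter(true):X}");    // the pushed argument
        Console.WriteLine($"0x{ReadFirstStackParameter(false):X}");   // the caller's own data
    }
}
```

When the caller pushes the argument, the callee reads 0x1234. When it does not, the same instruction reads 0xCAFE, a value that belongs to the caller.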

Spoiler code

using System;
using System.Runtime.InteropServices;
namespace Magic
{
    public class StubClass
    {
        public StubClass(int id)
        {
            Id = id;
        }
        public int Id;
    }
    [StructLayout(LayoutKind.Explicit)]
    public class CustomStructWithLayout
    {
        [FieldOffset(0)]
        public Test1 Test1;
        [FieldOffset(0)]
        public Test2 Test2;
    }
    public class Test1
    {
        public virtual void Useless(int skipFastcall1, int skipFastcall2, StubClass adressOnStack)
        {
            adressOnStack.Id = 189;
        }
    }
    public class Test2
    {
        public virtual int Useless()
        {
            return 888;
        }
    }
    class Program
    {
        static void Main()
        {
            Test2 objectWithLayout = new CustomStructWithLayout
            {
                Test2 = new Test2(),
                Test1 = new Test1()
            }.Test2;
            StubClass adressOnStack = new StubClass(3);
            objectWithLayout.Useless();
            Console.WriteLine($"MAGIC - {adressOnStack.Id}"); // MAGIC - 189
        }
    }
}

I will not give the assembly language code here, since everything is pretty clear, but if you have questions, I will try to answer them in the comments.

I understand perfectly well that this example cannot be used in practice, but in my opinion it can be very useful for understanding how things work in general.
