Boxing and unboxing - which is faster?

Curious about the speed of boxing and unboxing operations in .NET, I decided to publish my small and extremely subjective observations and measurements on the topic.
The sample code is available on GitHub, so I invite everyone to report their own measurement results in the comments.
Theory
Boxing allocates memory on the managed heap for a value-type object, copies the value there, and assigns a reference to that memory location to a variable on the stack.
Unboxing, on the contrary, allocates memory on the execution stack for the value and copies it there from the managed heap through the reference.
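To make the two operations concrete, here is a minimal sketch of their copy semantics. The class and method names (BoxingDemo, CopyOnBox, and so on) are my own illustration and are not part of the benchmark code.

```csharp
using System;

// Hypothetical demo class; the names are illustrative only.
public static class BoxingDemo
{
    // Boxing copies the value into a new heap object, so later changes
    // to the original variable do not affect the box.
    public static (int original, int unboxed) CopyOnBox()
    {
        int value = 1;
        object boxed = value;      // box: heap allocation + copy
        value = 2;                 // changes only the stack variable
        int unboxed = (int)boxed;  // unbox: type check + copy out of the box
        return (value, unboxed);
    }

    // Every boxing operation produces its own heap object.
    public static bool BoxesAreDistinct()
    {
        int value = 1;
        object a = value;
        object b = value;
        return !ReferenceEquals(a, b);
    }

    // Unboxing requires the exact value type: a boxed int cannot be
    // unboxed directly to long.
    public static bool UnboxToWrongTypeThrows()
    {
        object boxed = 1;
        try
        {
            long wrong = (long)boxed;
            return wrong != 1; // not reached
        }
        catch (InvalidCastException)
        {
            return true;
        }
    }

    public static void Main()
    {
        Console.WriteLine(CopyOnBox());              // prints (2, 1)
        Console.WriteLine(BoxesAreDistinct());       // prints True
        Console.WriteLine(UnboxToWrongTypeThrows()); // prints True
    }
}
```

The type check in the last method is part of every unboxing operation, so unboxing is not entirely free even though it allocates nothing on the heap.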
It would seem that memory is allocated in both cases and there should not be much difference, were it not for one extremely important detail: the memory area involved.
Recalling that the garbage collector (GC) is responsible for allocating memory on the managed heap in .NET, it is important to note that it does so nonlinearly, because the heap may be fragmented (contain free gaps) and a free block of the required size has to be found.
Update:
As blanabrother noted in the comments, allocating memory and copying the value on the managed heap involves no search for a free block and no fragmentation, because allocation simply advances an incrementing pointer and the GC later compacts the heap. However, based on the following measurements of the speed of memory allocation in C++, I dare to assume that the area (type) of memory is the main reason for this difference in performance.
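Pointer-increment allocation can be pictured with a toy model like the one below. It is only a sketch of the idea over a plain byte array: the BumpAllocator class is hypothetical and does not reproduce how the CLR actually manages its heap.

```csharp
using System;

// Toy model of pointer-increment ("bump") allocation.
// Hypothetical class for illustration; not the CLR allocator.
public sealed class BumpAllocator
{
    private readonly byte[] _memory;
    private int _next; // offset of the first free byte

    public BumpAllocator(int capacity)
    {
        _memory = new byte[capacity];
    }

    public int Used => _next;

    // Reserves `size` bytes and returns the offset of the block,
    // or -1 when the region has no room left.
    public int Allocate(int size)
    {
        if (size < 0 || size > _memory.Length - _next)
        {
            return -1;
        }
        int start = _next;
        _next += size; // the whole allocation is one addition
        return start;
    }

    public static void Main()
    {
        var heap = new BumpAllocator(32);
        Console.WriteLine(heap.Allocate(16)); // prints 0
        Console.WriteLine(heap.Allocate(8));  // prints 16
        Console.WriteLine(heap.Allocate(16)); // prints -1 (only 8 bytes left)
    }
}
```

In this model no search for a free block ever happens: a request either fits behind the current pointer or it does not.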
In the case of unboxing, memory is allocated on the execution stack, which keeps a pointer to its end; that pointer is at the same time the start of the memory for a new object.
My conclusion is that boxing should take much longer than unboxing, because of possible side effects associated with the GC and the slower allocation and copying of values on the managed heap.
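One way to check the allocation side of this reasoning independently of timing is to count the bytes allocated on the managed heap. The sketch below assumes a runtime that provides GC.GetAllocatedBytesForCurrentThread (.NET Core, or .NET Framework 4.8); the AllocationProbe class and its members are hypothetical names of mine.

```csharp
using System;

// Hypothetical probe class; the names are illustrative only.
public static class AllocationProbe
{
    // Static sinks keep the results observable so the loops are not optimized away.
    private static object _sink = new object();
    private static int _intSink;

    // Bytes allocated on the managed heap by `iterations` boxing operations.
    public static long BytesAllocatedByBoxing(int iterations)
    {
        long before = GC.GetAllocatedBytesForCurrentThread();
        for (int i = 0; i < iterations; i++)
        {
            _sink = i; // implicit boxing of an int
        }
        return GC.GetAllocatedBytesForCurrentThread() - before;
    }

    // Bytes allocated on the managed heap by `iterations` unboxing operations.
    public static long BytesAllocatedByUnboxing(int iterations)
    {
        object boxed = 1000; // boxed once, outside the measured region
        long before = GC.GetAllocatedBytesForCurrentThread();
        for (int i = 0; i < iterations; i++)
        {
            _intSink = (int)boxed; // unboxing only
        }
        return GC.GetAllocatedBytesForCurrentThread() - before;
    }

    public static void Main()
    {
        Console.WriteLine("boxing:   " + BytesAllocatedByBoxing(100000) + " bytes");
        Console.WriteLine("unboxing: " + BytesAllocatedByUnboxing(100000) + " bytes");
    }
}
```

I would expect the boxing figure to grow with the iteration count and the unboxing figure to stay at or near zero, but the exact numbers depend on the runtime and platform.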
Practice
To verify this claim, I sketched four small functions: two for boxing and two for unboxing, covering int and a struct.
public class BoxingUnboxingBenchmark {
    private long LoopCount = 1000000;
    private object BoxedInt = 1;
    private object BoxedStruct = new ExampleStruct {
        Amount = 1000,
        Currency = "RUB"
    };

    [Benchmark]
    public object BoxingInt() {
        int unboxed = 1000;
        for (var i = 0; i < LoopCount; i++) {
            BoxedInt = (object) unboxed;
        }
        return BoxedInt;
    }

    [Benchmark]
    public int UnboxingInt() {
        int unboxed = 1000;
        for (var i = 0; i < LoopCount; i++) {
            unboxed = (int)BoxedInt;
        }
        return unboxed;
    }

    [Benchmark]
    public object BoxingStruct() {
        ExampleStruct unboxed = new ExampleStruct()
        {
            Amount = 1000,
            Currency = "RUB"
        };
        for (var i = 0; i < LoopCount; i++) {
            BoxedStruct = (object) unboxed;
        }
        return BoxedStruct;
    }

    [Benchmark]
    public ExampleStruct UnBoxingStruct() {
        ExampleStruct unboxed = new ExampleStruct();
        for (var i = 0; i < LoopCount; i++) {
            unboxed = (ExampleStruct) BoxedStruct;
        }
        return unboxed;
    }
}
To measure performance, I used the BenchmarkDotNet library in Release mode (I will be glad if DreamWalker tells me how to make these measurements more objective). The measurement results follow:


I should say up front that I cannot be sure the compiler applied no optimizations to the final code; however, judging by the IL code, each of the functions contains exactly one operation under test.
The measurements were carried out on several machines with different LoopCount values, and unboxing was consistently 3-8 times faster than boxing.
.method public hidebysig instance object
        BoxingInt() cil managed
{
  .custom instance void [BenchmarkDotNet.Core]BenchmarkDotNet.Attributes.BenchmarkAttribute::.ctor() = ( 01 00 00 00 )
  // Code size 43 (0x2b)
  .maxstack 2
  .locals init ([0] int32 unboxed,
                [1] int32 i)
  IL_0000: ldc.i4 0x3e8
  IL_0005: stloc.0
  IL_0006: ldc.i4.0
  IL_0007: stloc.1
  IL_0008: br.s IL_001a
  IL_000a: ldarg.0
  IL_000b: ldloc.0
  IL_000c: box [mscorlib]System.Int32
  IL_0011: stfld object ConsoleApp1.BoxingUnboxingBenchmark::BoxedInt
  IL_0016: ldloc.1
  IL_0017: ldc.i4.1
  IL_0018: add
  IL_0019: stloc.1
  IL_001a: ldloc.1
  IL_001b: conv.i8
  IL_001c: ldarg.0
  IL_001d: ldfld int64 ConsoleApp1.BoxingUnboxingBenchmark::LoopCount
  IL_0022: blt.s IL_000a
  IL_0024: ldarg.0
  IL_0025: ldfld object ConsoleApp1.BoxingUnboxingBenchmark::BoxedInt
  IL_002a: ret
} // end of method BoxingUnboxingBenchmark::BoxingInt
.method public hidebysig instance valuetype ConsoleApp1.ExampleStruct
        UnBoxingStruct() cil managed
{
  .custom instance void [BenchmarkDotNet.Core]BenchmarkDotNet.Attributes.BenchmarkAttribute::.ctor() = ( 01 00 00 00 )
  // Code size 40 (0x28)
  .maxstack 2
  .locals init ([0] valuetype ConsoleApp1.ExampleStruct unboxed,
                [1] int32 i)
  IL_0000: ldloca.s unboxed
  IL_0002: initobj ConsoleApp1.ExampleStruct
  IL_0008: ldc.i4.0
  IL_0009: stloc.1
  IL_000a: br.s IL_001c
  IL_000c: ldarg.0
  IL_000d: ldfld object ConsoleApp1.BoxingUnboxingBenchmark::BoxedStruct
  IL_0012: unbox.any ConsoleApp1.ExampleStruct
  IL_0017: stloc.0
  IL_0018: ldloc.1
  IL_0019: ldc.i4.1
  IL_001a: add
  IL_001b: stloc.1
  IL_001c: ldloc.1
  IL_001d: conv.i8
  IL_001e: ldarg.0
  IL_001f: ldfld int64 ConsoleApp1.BoxingUnboxingBenchmark::LoopCount
  IL_0024: blt.s IL_000c
  IL_0026: ldloc.0
  IL_0027: ret
} // end of method BoxingUnboxingBenchmark::UnBoxingStruct