When this == null: a true story from the CLR world
I once had to debug some C# code that, "out of the blue", threw a
NullReferenceException:

```csharp
public class Tester {
    public string Property { get; set; }

    public void Foo() {
        this.Property = "Some string"; // NullReferenceException
    }
}
```
Yes, it failed right here, on the line with the property assignment, with a NullReferenceException. What's going on, I thought: has the runtime stopped checking for an instance before calling instance methods? As it turned out, in a way it has. Although the compiler turned out not to be the one I thought it was, and the runtime does not guarantee these checks at all... More details under the cut.
For those who are not familiar with the specifics of C#, let me explain my chain of thought. The class Tester has an instance method Foo and an instance property Property. Someone called Foo, but accessing this.Property brought a surprise: the runtime threw a NullReferenceException.

Normally, this exception on that line would mean that this == null, so the line this.Property = smth cannot access the property. But to a C# programmer this sounds completely impossible: if Foo was called at all, then an instance of the class exists, and this cannot be null! How could you call a method on null? And yet here it is, pointing at this very line! We begin to doubt everything, including our own sanity, and write the following test program in C#:
```csharp
static class Program {
    static void Main() {
        Tester t = null;
        t.Foo();
    }
}
```
Compile and run: yes, the program crashes with NullReferenceException on the line t.Foo();, but execution never enters Foo. So did the runtime, under some conditions, forget to check for null? Not really. (The runtime never performed this check on method entry in the first place.) Of course, the compiler is to blame for all of this. Only it is not the C# compiler (which obviously obeys the law and does not let you call a method on null), but the C++/CLI compiler, which compiled the code that originally called Foo. Yes, the involvement of C++/CLI in this story would immediately arouse plenty of suspicion, so at first I deliberately kept quiet about it to make things more interesting :)

Well, let's continue our experiments and write the same program in C++/CLI (for this you need to add a reference to the assembly containing the Tester class):

```cpp
using namespace ThisIsNull; // namespace of Tester, as seen in the IL listings

int main() {
    Tester ^t = nullptr;
    t->Foo();
}
```
Compile and run, and bam! It fails with NullReferenceException inside the method Foo, just like in the original case. That is, the instance method Foo was somehow called on a null reference, bypassing any checks.

What is going on? We have two programs in different languages that do exactly the same thing. If the compilers of both languages conform to the CLI specification, we would expect them to produce nearly the same (or at least similar) bytecode. Let's examine the bytecode we got. We take ildasm and disassemble the C# program. Here is the complete listing of the Program.Main method (in the comments I quoted the source lines that correspond to the bytecode):

```
.method private hidebysig static void Main() cil managed
{
  .entrypoint
  // Code size       11 (0xb)
  .maxstack  1
  .locals init ([0] class [Shared]ThisIsNull.Tester t)
  IL_0000:  nop
  IL_0001:  ldnull
  IL_0002:  stloc.0    // Tester t = null;
  IL_0003:  ldloc.0
  IL_0004:  callvirt   instance void [Shared]ThisIsNull.Tester::Foo() // t.Foo()
  IL_0009:  nop
  IL_000a:  ret
}
```
The most interesting line here is IL_0004: the compiler calls Foo with the callvirt instruction. Now compare this with the corresponding C++/CLI code:

```
.method assembly static int32 modopt([mscorlib]System.Runtime.CompilerServices.CallConvCdecl)
        main() cil managed
{
  .vtentry 1 : 1
  // Code size       12 (0xc)
  .maxstack  1
  .locals ([0] class [Shared]ThisIsNull.Tester t)
  IL_0000:  ldnull
  IL_0001:  stloc.0    // Tester ^t = nullptr;
  IL_0002:  ldnull
  IL_0003:  stloc.0    // t = nullptr;
  IL_0004:  ldloc.0
  IL_0005:  call       instance void [Shared]ThisIsNull.Tester::Foo() // t->Foo();
  IL_000a:  ldc.i4.0
  IL_000b:  ret
}
```
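To check which opcode a compiler picked without opening ildasm, you can scan a method's raw IL. The sketch below is my own illustration, not code from the article. It reads the bytes via MethodBody.GetILAsByteArray and looks for the single-byte call (0x28) or callvirt (0x6F) opcode followed by the callee's 4-byte metadata token. It assumes that caller and callee live in the same module, so the operand is the callee's MethodDef token. In the article's setup Tester lives in the separate Shared assembly, where the operand would be a MemberRef token and this naive scan would not match. OpcodeInspector and CallFoo are hypothetical names.

```csharp
using System;
using System.Reflection;

public class Tester
{
    public void Foo() { }
}

public static class OpcodeInspector
{
    // Single-byte CIL opcodes from ECMA-335, Partition III.
    private const byte CallOpcode = 0x28;
    private const byte CallvirtOpcode = 0x6F;

    // Same shape as the article's C# test: an instance call through a reference.
    public static void CallFoo(Tester t) => t.Foo();

    // Returns "call", "callvirt" or null, depending on how 'caller' invokes 'callee'.
    // Naive scan: an opcode byte followed by the callee's little-endian token.
    public static string FindCallOpcode(MethodInfo caller, MethodInfo callee)
    {
        byte[] il = caller.GetMethodBody().GetILAsByteArray();
        int token = callee.MetadataToken;
        for (int i = 0; i + 4 < il.Length; i++)
        {
            int operand = il[i + 1] | (il[i + 2] << 8) | (il[i + 3] << 16) | (il[i + 4] << 24);
            if (operand != token) continue;
            if (il[i] == CallOpcode) return "call";
            if (il[i] == CallvirtOpcode) return "callvirt";
        }
        return null;
    }

    public static void Main()
    {
        MethodInfo caller = typeof(OpcodeInspector).GetMethod(nameof(CallFoo));
        MethodInfo callee = typeof(Tester).GetMethod(nameof(Tester.Foo));
        Console.WriteLine(FindCallOpcode(caller, callee));
    }
}
```

Compiled with the C# compiler, CallFoo should report callvirt, matching IL_0004 in the first listing.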
Of the differences that interest us, apart from the variable being zeroed twice, the method here is called not with callvirt but with call. The CIL instruction callvirt is intended for virtual calls. However, it has one more small feature. Virtual calls in the CLI are usually dispatched through the virtual method table, so callvirt is also responsible for checking the reference for null and throwing NullReferenceException if something is wrong. The call instruction simply calls the method without checking the reference (and without any virtual dispatch). It turns out that the C# compiler deliberately takes advantage of this feature of callvirt and therefore emits it for virtually all calls (except calls to static methods and explicit calls to base class methods through base.), simply because it protects the code from calling a method on a null reference. The C++/CLI compiler, on the other hand, follows the good old laws of the wild West of undefined behavior: if the contents of the reference are undefined, then the behavior of the program is undefined as well. If the compiler knows that the method cannot be virtual, it will not try to generate virtual calls.

Whether this behavior of the C# compiler affects performance, and if so to what extent, is an open question. In principle, in most cases the JIT should cope with optimizing and inlining such code if the called methods are not actually virtual. In this respect the C# compiler relies entirely on the JIT and makes no optimization attempts of its own.
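You can reproduce the C++/CLI behavior without leaving C# by emitting the IL by hand. The sketch below is an illustration under my own assumptions, not the article's code. It uses System.Reflection.Emit.DynamicMethod to generate "ldnull; call (or callvirt) Tester::Foo(); ret". It also adds a hypothetical static flag, Tester.Entered, which Foo sets before touching this.Property. With call, Foo should be entered and then fail inside its body. With callvirt, the NullReferenceException should be thrown before Foo runs.

```csharp
using System;
using System.Reflection;
using System.Reflection.Emit;

public class Tester
{
    // Hypothetical flag (not in the article): set first thing in Foo so we can
    // tell whether the method body was entered before the exception.
    public static bool Entered;

    public string Property { get; set; }

    public void Foo()
    {
        Entered = true;
        this.Property = "Some string"; // throws NullReferenceException if this == null
    }
}

public static class CallVsCallvirt
{
    // Emits "ldnull; <opcode> instance void Tester::Foo(); ret",
    // mirroring the two compiler outputs in the listings above.
    public static Action BuildNullCall(OpCode opcode)
    {
        MethodInfo foo = typeof(Tester).GetMethod(nameof(Tester.Foo));
        var method = new DynamicMethod("CallFooOnNull", typeof(void), Type.EmptyTypes,
                                       typeof(CallVsCallvirt).Module);
        ILGenerator il = method.GetILGenerator();
        il.Emit(OpCodes.Ldnull);   // the would-be 'this'
        il.Emit(opcode, foo);      // OpCodes.Call or OpCodes.Callvirt
        il.Emit(OpCodes.Ret);
        return (Action)method.CreateDelegate(typeof(Action));
    }

    // Runs the generated call and reports whether Foo's body started executing.
    public static bool EnteredFooBeforeException(OpCode opcode)
    {
        Tester.Entered = false;
        try
        {
            BuildNullCall(opcode)();
        }
        catch (NullReferenceException)
        {
            // Thrown in both cases; only the place where it is thrown differs.
        }
        return Tester.Entered;
    }

    public static void Main()
    {
        Console.WriteLine($"call:     Foo entered = {EnteredFooBeforeException(OpCodes.Call)}");
        Console.WriteLine($"callvirt: Foo entered = {EnteredFooBeforeException(OpCodes.Callvirt)}");
    }
}
```

The Entered flag avoids testing this == null inside Foo directly. Observing a side effect that happens before the failing property access is a more robust signal than comparing this to null.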
In light of these findings, it is also interesting to look at this fragment of the published source code of the System.String class, which once raised questions on StackOverflow:

```csharp
public bool Equals(String value) {
    if (this == null)                        //this is necessary to guard against reverse-pinvokes and
        throw new NullReferenceException();  //other callers who do not use the callvirt instruction

    if (value == null)
        return false;

    if (Object.ReferenceEquals(this, value))
        return true;

    return EqualsHelper(this, value);
}
```
Now it becomes clear what the comment is talking about (these comments were not always there, by the way) and under what conditions this check can fire.
In several methods, the framework developers had to protect themselves against calls on null in this way. The point is that EqualsHelper compares strings with unsafe code, which may well try to access memory at address zero, and that is sure to lead to all kinds of bad consequences.

UPD: The user a553 correctly points out in the comments that, with this code, the developers also fixed a potential bug where the call ((string)null).Equals(null) could return false instead of throwing NullReferenceException as it should.

Conclusions:
- The CLI does not guarantee that this != null, even when instance methods and properties are invoked.
- The C# compiler enforces this check when generating bytecode for C# code (by emitting callvirt), but your code can also be called from other languages.
- In particular, the C++/CLI compiler does not follow these rules and may well transfer control to an instance method without a corresponding instance.
- It follows that your code may sometimes be called in a context where this == null, for various reasons (code generation, reflection, compilers of other languages), and you need to be prepared for this. If you are developing a library intended for wide use in interop scenarios, it might even be worth adding tests that call the public methods of your externally accessible classes on null.
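A test for null-this calls needs a way to invoke a public instance method without callvirt. Below is a hedged sketch of such a helper. NullThisInvoker and the NameTag class are hypothetical names made up for this example. The helper relies on System.Reflection.Emit.DynamicMethod emitting a plain call, the same path C++/CLI takes. It does not handle ref/out parameters or generic methods. The usage in Main shows the failure mode described in the UPD, applied to a hand-written equality method without a guard: a null this quietly yields false instead of a NullReferenceException.

```csharp
using System;
using System.Reflection;
using System.Reflection.Emit;

// Hypothetical library type with an equality-style method that has no null-'this' guard.
public class NameTag
{
    public string Text { get; set; }

    public bool Matches(NameTag other)
    {
        if (other == null)
            return false;              // reached even when 'this' is null
        return ReferenceEquals(this, other) || Text == other.Text;
    }
}

public static class NullThisInvoker
{
    // Invokes an instance method with this == null via the 'call' opcode,
    // which (unlike callvirt) performs no null check on the receiver.
    public static object Invoke(MethodInfo method, params object[] args)
    {
        ParameterInfo[] parameters = method.GetParameters();
        var dm = new DynamicMethod("NullThisCall", typeof(object), new[] { typeof(object[]) },
                                   typeof(NullThisInvoker).Module, skipVisibility: true);
        ILGenerator il = dm.GetILGenerator();
        il.Emit(OpCodes.Ldnull);                         // 'this'
        for (int i = 0; i < parameters.Length; i++)
        {
            Type type = parameters[i].ParameterType;
            il.Emit(OpCodes.Ldarg_0);                    // object[] args
            il.Emit(OpCodes.Ldc_I4, i);
            il.Emit(OpCodes.Ldelem_Ref);                 // args[i]
            il.Emit(type.IsValueType ? OpCodes.Unbox_Any : OpCodes.Castclass, type);
        }
        il.Emit(OpCodes.Call, method);
        if (method.ReturnType == typeof(void))
            il.Emit(OpCodes.Ldnull);                     // nothing to return
        else if (method.ReturnType.IsValueType)
            il.Emit(OpCodes.Box, method.ReturnType);     // return value as object
        il.Emit(OpCodes.Ret);

        var invoker = (Func<object[], object>)dm.CreateDelegate(typeof(Func<object[], object>));
        return invoker(args);
    }

    public static void Main()
    {
        MethodInfo matches = typeof(NameTag).GetMethod(nameof(NameTag.Matches));
        // Without a guard, a null 'this' produces false instead of an exception.
        Console.WriteLine(Invoke(matches, new object[] { null }));
    }
}
```

In a real test suite you would loop over the public instance methods of your exported types and assert that each one either throws NullReferenceException or behaves sensibly. What counts as "sensibly" is a per-API decision.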
PS: All the code used in the article is available on GitHub.