Nullable Reference Types in C# 8.0 and Static Analysis



    It's no secret that Microsoft has been working on the eighth version of C# for quite some time. The new version of the language (C# 8.0) is already available in the recent release of Visual Studio 2019, but so far only as a beta. Several of the planned features are implemented in a way that may seem not quite obvious, or rather, not quite expected. One of these innovations is Nullable Reference types, whose stated purpose is to fight Null Reference Exceptions (NRE).

    We are glad that the language is evolving and that new features should help developers. Coincidentally, our PVS-Studio analyzer for C# has recently expanded its ability to detect those very NREs in code. So we asked ourselves: is there any point now for static analyzers in general, and PVS-Studio in particular, to look for potential null dereferences if, at least in new code that uses Nullable Reference types, such dereferences become "impossible"? Let's try to answer this question.

    Pros and Cons of Innovation


    To begin with, recall that in the latest beta of C# 8.0 available at the time of writing, Nullable Reference types are turned off by default, i.e. the behavior of reference types does not change.

    So what are nullable reference types in C# 8.0 once you enable them? They are the same good old reference types, except that variables that may hold null must now be marked with '?' (e.g. string?), similar to how it is already done for Nullable<T>, i.e. nullable value types (e.g. int?). At the same time, a plain string without '?' is now interpreted as a non-nullable reference, i.e. a reference type whose variable must not contain null.

    Null Reference Exception is one of the most annoying exceptions because it says little about the source of the problem, especially if the method that throws it contains several dereferences in a row. The ability to forbid passing null to a variable of a reference type looks great, but what if null used to be passed to the method and some further logic depended on it? What should we do now? Of course, instead of null you can pass a literal, a constant, or simply an "impossible" value that, by the program's logic, cannot be assigned to this variable anywhere else. But then a crash of the whole program may be replaced by "silent" incorrect execution further on, and that is not always better than seeing the error right away.

    What if we throw an exception instead? A meaningful exception thrown where something went wrong is always better than an NRE somewhere up or down the stack. But that works well only in our own project, where we can fix the consumers and add a try-catch block. When developing a library with (non-)nullable references, we take on the responsibility that a method always returns a value. And even in your own code it is not always possible (or at least easy) to replace returning null with throwing an exception, because too much code may be affected.

    You can enable Nullable Reference types at the project level by adding the NullableContextOptions property with the value enable, or at the file level using a preprocessor directive:
    #nullable enable 
    string cantBeNull = string.Empty;
    string? canBeNull = null;
    cantBeNull = canBeNull!;

    Types now become more descriptive. A method's signature tells you how it behaves: whether it checks for null or not, and whether it can return null or not. Now, if you try to dereference a nullable reference variable without a check, the compiler issues a warning.

    This is quite convenient when using third-party libraries, but it also opens the door to misinformation. The fact is that passing null is still possible, for example with the new null-forgiving operator (!). That is, a single exclamation mark is enough to break all further assumptions made about an interface that uses these variables:
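
    As a minimal sketch of that warning: the class and the FindName helper below are hypothetical and exist only for illustration. The diagnostic ID CS8602 is the one used by the released C# 8.0 compiler; the beta discussed in this article may have reported it differently.

```csharp
#nullable enable
using System;

public static class NullableWarningDemo
{
    // Hypothetical helper: the '?' in the return type tells callers that null is possible.
    public static string? FindName(int id) => id == 1 ? "Alice" : null;

    public static int UncheckedLength(int id)
    {
        string? name = FindName(id);
        // The compiler is expected to warn here (CS8602: dereference of a possibly
        // null reference). It is only a warning, so the code still compiles
        // and throws a NullReferenceException at run time when id != 1.
        return name.Length;
    }

    public static int CheckedLength(int id)
    {
        string? name = FindName(id);
        // No warning: flow analysis sees the null check before the dereference.
        return name is null ? 0 : name.Length;
    }
}
```

    Calling CheckedLength with an unknown id returns 0, while UncheckedLength with the same id fails at run time despite compiling.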
    #nullable enable 
    String GetStr() { return _count > 0 ? _str : null!; }
    String str = GetStr();
    var len = str.Length;

    Yes, you can say that writing code like this is wrong and nobody would ever do it, but as long as the possibility remains, you can no longer fully rely on the contract imposed by the method's interface (that it cannot return null).

    By the way, you can write the same thing with several ! operators in a row, because C# now allows it (and this code compiles just fine):
    cantBeNull = canBeNull!!!!!!!;

    That is, we seem to add emphasis: pay attention, this can be null!!! (On our team we call this "emotional" programming.) In fact, when building the syntax tree, the compiler (Roslyn) interprets the ! operator much like plain parentheses, so their number, as with parentheses, is unlimited. However, if you write a great many of them, the compiler can crash. Perhaps this will change in the final version of C# 8.0.

    In a similar way, you can bypass the compiler warning when accessing a nullable reference variable without checking:
    canBeNull!.ToString();

    You can write more emotionally:
    canBeNull!!!?.ToString();

    Such syntax is hard to imagine in a real project. By adding the null-forgiving operator we tell the compiler: everything is fine here, no check is needed. By adding the Elvis operator (?.) we say: actually, things may not be fine, so let's check.

    And now a legitimate question arises: if the concept of a non-nullable reference type implies that a variable of this type cannot contain null, why can we still write null to it so easily? The fact is that "under the hood", at the IL level, our non-nullable reference type remains... the same "ordinary" reference type. The whole nullability syntax is really just an annotation for the static analyzer built into the compiler (and, in our opinion, not the most convenient analyzer, but more on that later). In our opinion, adding new syntax to the language merely as an annotation for a third-party tool (even one built into the compiler) is not the most "beautiful" solution: a programmer using the language may not find it obvious at all that this is just an annotation, since the very similar syntax for nullable structs works in a completely different way.

    Let's get back to the ways Nullable Reference types can still be "broken". At the time of writing, if a solution contains several projects and a reference variable, for example of type String, is passed from a method declared in one project to a method in another project where NullableContextOptions is enabled, the compiler decides that it is already a non-nullable String and gives no warning. And this is despite the multitude of [Nullable(1)] attributes added to every field and class method in the IL code when Nullable Reference types are turned on. By the way, keep these attributes in mind if you work with the list of attributes through reflection and count on finding only the attributes you added yourself.

    This situation can create additional problems when converting a large code base to Nullable Reference types. Most likely the process will be gradual, project by project. Of course, with a careful approach you can switch to the new functionality step by step, but if you already have a working project, any changes to it are dangerous and undesirable (if it works, don't touch it!). That is why the PVS-Studio analyzer does not require you to edit the source code or mark it up in any way to detect potential NREs. To check the places where a NullReferenceException can occur, you just need to run the analyzer and look at the V3080 warnings. There is no need to change project properties or source code, and no need to add directives, attributes, or operators.

    When adding support for Nullable Reference types to the PVS-Studio analyzer, we faced a choice: should the analyzer treat non-nullable reference variables as always holding non-null values? After studying the ways this guarantee can be "broken", we concluded that the answer is no: the analyzer should not make such an assumption. Indeed, even if non-nullable reference types are used everywhere in a project, the analyzer can complement them by detecting exactly those situations in which null may end up in such a variable.
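
    The difference between the two kinds of '?' can be checked with a few lines of reflection. This is only a sketch: the class and its GetText / GetTextOrNull methods are hypothetical names made up for the illustration.

```csharp
#nullable enable
using System;
using System.Reflection;

public static class AnnotationOnlyDemo
{
    // Hypothetical methods that differ only in the nullability annotation.
    public static string GetText() => "text";
    public static string? GetTextOrNull() => null;

    // For reference types, 'string' and 'string?' are the same runtime type.
    public static bool ReferenceTypesMatch()
    {
        Type plain = typeof(AnnotationOnlyDemo).GetMethod(nameof(GetText))!.ReturnType;
        Type annotated = typeof(AnnotationOnlyDemo).GetMethod(nameof(GetTextOrNull))!.ReturnType;
        return plain == annotated && plain == typeof(string);
    }

    // For value types, 'int?' is Nullable<int>, a genuinely different type from 'int'.
    public static bool ValueTypesDiffer() => typeof(int?) != typeof(int);
}
```

    Both methods return true, which shows that the reference-type annotation lives only in metadata attributes, while int? changes the actual type.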

    How PVS-Studio looks for Null Reference Exceptions


    The dataflow mechanisms in the PVS-Studio C# analyzer track the possible values of variables during analysis. In particular, PVS-Studio performs interprocedural analysis, i.e. it tries to determine the possible value returned by a method, as well as by the methods called inside that method, and so on. Among other things, the analyzer remembers variables that can potentially be null. If the analyzer later sees such a variable dereferenced without a check, either in the code currently being checked or inside a method called from that code, it issues warning V3080 about a potential Null Reference Exception.

    The main idea behind this diagnostic is that the analyzer issues a warning only if it has seen null assigned to the variable somewhere. This is the main difference between this diagnostic and the analyzer built into the compiler that works with Nullable Reference types. The built-in analyzer warns about any dereference of an unchecked variable of a nullable reference type, unless, of course, it is "tricked" with the ! operator or in some other way (absolutely any analyzer can be fooled, especially if you set out to do so, and PVS-Studio is no exception).

    PVS-Studio warns only if it sees null (in the local context or coming from a method). At the same time, even if the variable is a non-nullable reference variable, the analyzer's behavior does not change: it still warns if it sees that null was written to it. This approach seems more correct to us (or at least more convenient for the analyzer's user), since you don't have to "cover" all the code with null checks to find potential dereferences. After all, that could be done before, without Nullable Reference types, for example with contracts. In addition, the analyzer can now be used as an extra control over those same non-nullable reference variables. If they are used "honestly" and are never assigned null, the analyzer stays silent. If null is assigned and the variable is dereferenced without a check, the analyzer warns about it with message V3080:
    #nullable enable 
    String GetStr() { return _count > 0 ? _str : null!; }
    String str = GetStr();
    var len = str.Length; // <== V3080: Possible null dereference.
                          //     Consider inspecting 'str'


    Let's look at some examples of V3080 triggering on the code of Roslyn itself. We checked this project not long ago, but this time we will consider only potential Null Reference Exceptions that were not covered in previous articles. We'll see how the PVS-Studio analyzer finds potential null dereferences and how these places can be fixed using the new Nullable Reference syntax.

    V3080 [CWE-476] Possible null dereference inside method. Consider inspecting the 2nd argument: chainedTupleType. Microsoft.CodeAnalysis.CSharp TupleTypeSymbol.cs 244
    NamedTypeSymbol chainedTupleType;
    if (_underlyingType.Arity < TupleTypeSymbol.RestPosition)
      { ....  chainedTupleType = null; }
    else { .... }
    return Create(ConstructTupleUnderlyingType(firstTupleType,
      chainedTupleType, newElementTypes), elementNames: _elementNames);

    As you can see, the chainedTupleType variable can be null in one of the execution branches. It is then passed to the ConstructTupleUnderlyingType method, where it is used after a check through Debug.Assert. This situation is very common in Roslyn, but keep in mind that Debug.Assert is removed from the release build. Therefore the analyzer still considers the dereference inside ConstructTupleUnderlyingType dangerous. Here is the body of this method, where the dereference occurs:
    internal static NamedTypeSymbol ConstructTupleUnderlyingType(
      NamedTypeSymbol firstTupleType, 
      NamedTypeSymbol chainedTupleTypeOpt, 
      ImmutableArray elementTypes)
    {
      Debug.Assert
        (chainedTupleTypeOpt is null ==
         elementTypes.Length < RestPosition);
      ....
      while (loop > 0)
      {   
        ....
        currentSymbol = chainedTupleTypeOpt.Construct(chainedTypes);
        loop--;
      }
      return currentSymbol;
    }

    Whether the analyzer should take such an Assert into account is actually debatable (some of our users want it to), since the analyzer already takes into account, for example, the contracts from System.Diagnostics.Contracts. I'll share just one small example from our own experience of using Roslyn in our analyzer. We recently added support for the new version of Visual Studio and at the same time updated Roslyn in the analyzer to version 3. After that, the analyzer began to crash on certain code that it had previously checked without problems. Moreover, the crash happened not in our code but inside Roslyn itself, with a Null Reference Exception. Further debugging showed that just a couple of lines above the place where Roslyn now crashed there was a null check through Debug.Assert. And, as we can see, it did not help.

    This is a very good illustration of the problems with Nullable Reference types, because the compiler considers Debug.Assert a valid check in any configuration. That is, if you simply add #nullable enable and mark the chainedTupleTypeOpt argument as a nullable reference, the compiler gives no warning at the dereference inside the ConstructTupleUnderlyingType method.

    Let's look at the next PVS-Studio warning.
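
    The same pattern can be reduced to a few lines. This is a sketch with a hypothetical class, assuming a compiler and framework in which nullable analysis recognizes Debug.Assert as a null check, which is the behavior described above.

```csharp
#nullable enable
using System;
using System.Diagnostics;

public static class AssertDemo
{
    public static int Length(string? text)
    {
        // Debug.Assert is marked [Conditional("DEBUG")], so this call is
        // removed from Release builds and checks nothing there.
        Debug.Assert(text != null);
        // No nullable warning is expected here, yet in a Release build
        // a null argument reaches this line and throws NullReferenceException.
        return text.Length;
    }

    public static int SafeLength(string? text)
    {
        // An explicit check works in every build configuration.
        if (text is null)
            throw new ArgumentNullException(nameof(text));
        return text.Length;
    }
}
```

    SafeLength fails with a meaningful exception at the point where the contract is violated, regardless of configuration, while Length behaves differently in Debug and Release.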

    V3080 Possible null dereference. Consider inspecting 'effectiveRuleset'. RuleSet.cs 146
    var effectiveRuleset = 
      ruleSet.GetEffectiveRuleSet(includedRulesetPaths);
    effectiveRuleset = 
      effectiveRuleset.WithEffectiveAction(ruleSetInclude.Action);
    if (IsStricterThan(effectiveRuleset.GeneralDiagnosticOption, ....))
       effectiveGeneralOption = effectiveRuleset.GeneralDiagnosticOption;
    

    This warning says that the WithEffectiveAction method may return null, yet its result is used without a check (effectiveRuleset.GeneralDiagnosticOption). Here is the body of WithEffectiveAction, which can return null and whose result is written to the effectiveRuleset variable:
    public RuleSet WithEffectiveAction(ReportDiagnostic action)
    {
      if (!_includes.IsEmpty)
        throw new ArgumentException(....);
      switch (action)
      {
        case ReportDiagnostic.Default:
          return this;
        case ReportDiagnostic.Suppress:
          return null;
        ....     
          return new RuleSet(....);
        default:
          return null;
      }
    }


    If we enable Nullable Reference mode for the GetEffectiveRuleSet method, there will be two places where the behavior has to change. Since the method above throws an exception, it is logical to assume that its call is wrapped in a try-catch block and that it would be correct to rewrite the method to throw an exception instead of returning null. But going up the call chain, we see that the exception is caught far too high, and the consequences can be quite unpredictable. Let's look at the consumer of the effectiveRuleset variable, the IsStricterThan method:
    private static bool 
      IsStricterThan(ReportDiagnostic action1, ReportDiagnostic action2)
    {
      switch (action2)
      {
        case ReportDiagnostic.Suppress:
          ....;
        case ReportDiagnostic.Warn:
          return action1 == ReportDiagnostic.Error;
        case ReportDiagnostic.Error:
          return false;
        default:
          return false;
      }
    }

    As you can see, this is a simple switch over two enumeration values, with ReportDiagnostic.Default as a possible value. So it is best to rewrite the call as follows.

    The signature of WithEffectiveAction will change:
    #nullable enable
    public RuleSet? WithEffectiveAction(ReportDiagnostic action)

    The call will look like this:
    RuleSet? effectiveRuleset = 
      ruleSet.GetEffectiveRuleSet(includedRulesetPaths);
    effectiveRuleset = 
      effectiveRuleset?.WithEffectiveAction(ruleSetInclude.Action);
    if (IsStricterThan(effectiveRuleset?.GeneralDiagnosticOption ?? 
                         ReportDiagnostic.Default,
                       effectiveGeneralOption))
       effectiveGeneralOption = effectiveRuleset.GeneralDiagnosticOption;

    Knowing that IsStricterThan only performs a comparison, the condition can be rewritten so that the body, which dereferences effectiveRuleset, runs only when it is not null, for example like this:
    if (effectiveRuleset != null && 
        IsStricterThan(effectiveRuleset.GeneralDiagnosticOption,
                       effectiveGeneralOption))

    Let's move on to the next message from the analyzer.

    V3080 Possible null dereference. Consider inspecting 'propertySymbol'. BinderFactory.BinderFactoryVisitor.cs 372
    var propertySymbol = GetPropertySymbol(parent, resultBinder);
    var accessor = propertySymbol.GetMethod;
    if ((object)accessor != null)
      resultBinder = new InMethodBinder(accessor, resultBinder);

    When fixing this warning, we have to take into account how propertySymbol is obtained and used further. The GetPropertySymbol method can return null:
    private SourcePropertySymbol GetPropertySymbol(
      BasePropertyDeclarationSyntax basePropertyDeclarationSyntax,
      Binder outerBinder)
    {
      ....
      NamedTypeSymbol container 
        = GetContainerType(outerBinder, basePropertyDeclarationSyntax);
      if ((object)container == null)
        return null;
      ....
      return (SourcePropertySymbol)GetMemberSymbol(propertyName,
        basePropertyDeclarationSyntax.Span, container,
        SymbolKind.Property);
    }

    The GetMemberSymbol method may also return null in some cases.
    private Symbol GetMemberSymbol(
      string memberName, 
      TextSpan memberSpan, 
      NamedTypeSymbol container, 
      SymbolKind kind)
    {
      foreach (Symbol sym in container.GetMembers(memberName))
      {
        if (sym.Kind != kind)
          continue;
        if (sym.Kind == SymbolKind.Method)
        {
          ....
          var implementation =
            ((MethodSymbol)sym).PartialImplementationPart;
          if ((object)implementation != null)
            if (InSpan(implementation.Locations[0],
                this.syntaxTree, memberSpan))
              return implementation;
        }
        else if (InSpan(sym.Locations, this.syntaxTree, memberSpan))
          return sym;
      }
      return null;
    }

    Using a nullable reference type, the call will change like this:
    #nullable enable
    SourcePropertySymbol? propertySymbol 
      = GetPropertySymbol(parent, resultBinder);
    MethodSymbol? accessor = propertySymbol?.GetMethod;
    if ((object)accessor != null)
      resultBinder = new InMethodBinder(accessor, resultBinder);

    Pretty simple once you know where to fix it. Static analysis finds this potential error easily by collecting all possible values across all chains of procedure calls.

    V3080 Possible null dereference. Consider inspecting 'simpleName'. CSharpCommandLineParser.cs 1556
    string simpleName;
    simpleName = PathUtilities.RemoveExtension(
      PathUtilities.GetFileName(sourceFiles.FirstOrDefault().Path));
    outputFileName = simpleName + outputKind.GetDefaultExtension();
    if (simpleName.Length == 0 && !outputKind.IsNetModule())
      ....

    The problem is in the line that checks simpleName.Length. The simpleName variable is the result of a whole chain of method calls and may be null. By the way, out of curiosity you can look at the RemoveExtension method and find how it differs from Path.GetFileNameWithoutExtension. Here we could limit ourselves to a simpleName != null check, but in the context of non-nullable references the code will look something like this:
    #nullable enable
    public static string? RemoveExtension(string path) { .... }
    string simpleName;

    The call will look like this:
    simpleName = PathUtilities.RemoveExtension(
      PathUtilities.GetFileName(sourceFiles.FirstOrDefault().Path)) ?? 
      String.Empty;

    Conclusion


    Nullable Reference types can be a great help when designing an architecture from scratch, but reworking existing code can take a lot of time and care, as it can introduce many subtle errors. In this article we did not aim to discourage anyone from using Nullable Reference types in their projects. We believe this innovation is generally useful for the language, although the way it was implemented may raise questions.

    Always keep in mind the limitations inherent in this approach: turning on Nullable Reference mode does not protect you from null dereferences, and if used incorrectly it can even lead to them. It is worth considering a modern static analyzer that supports interprocedural analysis, such as PVS-Studio, as an additional tool that, together with Nullable Reference types, can protect you from null dereferences. Each of these approaches, both in-depth interprocedural analysis and annotation of method signatures (which is essentially what Nullable Reference does), has its advantages and disadvantages. The analyzer lets you get a list of potentially dangerous places and, when you change existing code, see all the consequences of such changes. If you assign null somewhere, the analyzer should immediately point out all consumers of the variable where it is not checked before being dereferenced.

    You can look for other errors yourself, both in the project considered here and in your own. To do this, just download and try the PVS-Studio analyzer.



    If you want to share this article with an English-speaking audience, please use the link to the translation: Paul Eremeev, Alexander Senichkin. Nullable Reference types in C# 8.0 and static analysis
