Extending C# with Roslyn. Safe calls

    Have you ever felt that the language X you currently program in is missing something? Some small but pleasant treat that would not make your life completely happy, but would certainly add plenty of joyful moments. And so you look with black envy at language Y, which has that very thing, sigh sadly, and at night secretly shed tears of helplessness into your favorite pillow. Has that happened to you?

    Perhaps C# gives its adherents fewer reasons for such envy than many other languages, because it evolves rapidly and keeps adding features that make life easier. And yet there is no limit to perfection, and everyone has their own idea of it.

    I should say right away that the main motivation for this work was the wish to try Roslyn out, and the idea I describe below was more of an excuse and a test case for trying the library. However, while studying and implementing it, I found that, albeit with some hoops to jump through, the result can actually be used in practice to extend the language syntax. How to do that I will briefly describe further on. In the meantime, let's get started.

    Safe Calls and Maybe Monad


    The idea of safe calls is to get rid of the annoying null checks on class instances, which, while necessary, significantly clutter the code and hurt its readability. At the same time, nobody wants to live under the constant threat of a NullReferenceException.

    In functional programming languages this problem is solved with the Maybe monad. Its essence is that a value wrapped in the monad and passed through a pipeline of computations either contains some value or is Nothing. If the previous computation in the pipeline produced a result, the next computation is performed; if it returned Nothing, Nothing is returned in place of the next computation.

    C# has everything needed to implement this monad: null plays the role of Nothing, and for value types their Nullable version can be used. The idea is already in the air, and several articles have implemented this monad in C# using LINQ. One of them is by Dmitry Nesteruk (mezastel), and there is another one.

    It should be noted, however, that for all the appeal of this approach, code that uses the monad reads rather obscurely, because direct calls have to be replaced with wrappers built from lambda functions and LINQ. Without syntactic support in the language, though, it can hardly be implemented more elegantly.
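
    To make the complaint concrete, here is a minimal sketch of the lambda-wrapper style. It assumes a hypothetical extension method With; the name, the signature and the Person and Address classes are chosen for illustration and are not taken from the articles mentioned above:

```csharp
using System;

public static class MaybeExtensions
{
    // Hypothetical helper: applies evaluator only when input is not null,
    // otherwise propagates null further down the chain.
    public static TResult With<TInput, TResult>(this TInput input, Func<TInput, TResult> evaluator)
        where TInput : class
        where TResult : class
    {
        return input == null ? null : evaluator(input);
    }
}

// Illustrative data classes.
public class Address { public string City; }
public class Person { public Address Address; }

public static class MaybeDemo
{
    // Every step of the chain has to be wrapped in a lambda.
    public static string CityOf(Person person)
    {
        return person.With(p => p.Address).With(a => a.City);
    }
}
```

    The chain never throws on null, but compared with a plain person.Address.City the intent is buried under lambdas, which is exactly the readability cost described above.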

    I found a rather elegant, as it seemed to me, way of implementing this idea in the specification of Kotlin, a not-yet-released language for the JVM from the folks at my beloved JetBrains (Null-safety). As it turned out, Groovy already has this, and perhaps other languages do too.

    So what is this safe call operator? Suppose we have an expression:
    string text = SomeObject.ToString();

    If SomeObject is null, we will inevitably get a NullReferenceException, as already mentioned. To avoid this, in addition to the direct call operator '.', we define the safe call operator '?.', used as follows:
    string text = SomeObject?.ToString();

    which is actually equivalent to the expression:
    string text = SomeObject != null ? SomeObject.ToString() : null;

    If a safely called method or property returns a value type, the variable being assigned must be of the corresponding Nullable type:
    int? count = SomeList?.Count;

    Like regular calls, these safe calls can be used in chains, for example:
    int? length = SomeObject?.ToString()?.Length;

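    which translates to the expression (the (int?) cast gives both branches of the conditional a common type, which compilers before C# 9 require):
    int? length = SomeObject != null ? SomeObject.ToString() != null ? (int?)SomeObject.ToString().Length : null : null;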

    This exposes a drawback of the transformation I am proposing: it generates additional method calls. Ideally, the first example above would be transformed into something like:
    var temp = SomeObject;
    string text = null;
    if (temp != null)
        text = temp.ToString();

    However, because Roslyn is somewhat verbose, and so that the examples do not become too bloated and boring, I decided to keep the transformation simpler. More on this in the following parts.
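
    The cost of the repeated call can be seen in a small sketch. The Counted class and both helper methods are hypothetical and exist only for this illustration; the (int?) cast is there because compilers before C# 9 cannot find a common type for an int branch and a null branch:

```csharp
using System;

// Illustrative class that counts how many times ToString is called.
public class Counted
{
    public int Calls;
    public override string ToString() { Calls++; return "abc"; }
}

public static class EvaluationDemo
{
    // Ternary expansion: the receiver expression obj.ToString() is repeated,
    // so it is evaluated twice when it returns a non-null value.
    public static int? LengthNaive(Counted obj)
    {
        return obj != null ? obj.ToString() != null ? (int?)obj.ToString().Length : null : null;
    }

    // Expansion with temporaries: every call in the chain runs at most once.
    public static int? LengthWithTemps(Counted obj)
    {
        if (obj == null) return null;
        string text = obj.ToString();
        if (text == null) return null;
        return text.Length;
    }
}
```

    Both methods return the same value, but on a fresh Counted instance LengthNaive increments Calls twice and LengthWithTemps only once. For methods with side effects the difference is no longer just a matter of performance.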

    Project Roslyn


    As you may have heard, a CTP version of Project Roslyn was recently released. As part of this project, the developers of C# and VB completely rewrote the language compilers in managed code and opened access to these compilers as an API. With it, developers can do many useful things: analyze, optimize and generate code conveniently and simply, write extensions and code fixes for Visual Studio, and possibly create their own DSLs. True, the release is not coming soon, only in the Visual Studio version after next, but I want to get my hands on it now.

    Let's turn to our problem and first imagine how we would like this language extension to work in practice. Obviously: we write code as usual in our favorite IDE, use safe call operators where needed, and press Build; during compilation, a utility we wrote with Project Roslyn converts all of this into syntactically correct C# code, and voila, everything compiles. I must disappoint you: Roslyn does not let you interfere with the work of the current csc.exe compiler, which is quite understandable. It is likely that if a future version of Visual Studio replaces the compiler with its managed counterpart, such an opportunity will appear. For now, it does not exist.

    Still, there are already two workarounds:
    1. You can build your own compiler to replace csc.exe using the same Roslyn API and change your build system to call it instead of csc.exe, adding your preliminary code transformations to the default compilation (which, by the way, is quite easy to program).
    2. You can run your console program as a pre-build task that transforms the source files and saves the resulting sources to the obj folder. WPF is compiled in a very similar way today: xaml files are converted to .g.cs files at the pre-build stage.


    Project Roslyn offers several kinds of functionality, but one of the key ones is building, analyzing and transforming an abstract syntax tree. That is the functionality we will use below.

    Implementation


    Of course, everything below is just an example; it has many defects and cannot be used in practice without significant improvements, but it shows that such things are possible in principle.
    Let's move on to the implementation. To write the program, we first need to install the Roslyn SDK, which is available for download; before that, Service Pack 1 for Visual Studio 2010 and the Visual Studio 2010 SDK SP1 must be installed.
    After that, a Roslyn item appears in the new project dialog with several project templates (some of which can integrate into the IDE). We will create a simple console application.
    As an example, we will use the following "source code":
    public class Example
    {
        public const string CODE =
        @"using System;
        using System.Linq;
        using System.Windows;
        namespace HelloWorld
        {
            public class TestClass
            {
                public string TestField;
                public string TestProperty { get; set; }
                public string TestMethod() { return null; }
                public string TestMethod2(int k, string p) { return null; }
                public TestClass ChainTest;
            }
            public class OtherClass
            {
                public void Test()
                {
                    TestClass test;
                    string testStr1;
                    testStr1 = test?.TestField;
                    string testStr3 = test?.TestProperty;
                    string testStr4 = test?.TestMethod();
                    string testStr5 = test?.TestMethod2(100, testStr3);
                    var test3 = test?.ChainTest?.TestField;
                }
            }
        }";
    }

    Apart from the safe call operators, this source code is syntactically correct. It does not have to compile for our transformation to work (as written, for instance, the local variable test is never assigned), because we only work with the syntax tree.

    First of all, we need to build an abstract syntax tree from the source file. This takes two lines:
    SyntaxTree tree = SyntaxTree.ParseCompilationUnit(Example.CODE);
    SyntaxNode root = tree.Root;

    The syntax tree is represented by the SyntaxTree class and, unsurprisingly, is a tree of nodes derived from the base SyntaxNode type, each of which represents some construct: binary expressions, conditional expressions, method invocation expressions, property and variable declarations. Naturally, absolutely any C# construct can be represented by an instance of some SyntaxNode descendant. In addition, the tree contains SyntaxToken elements, which describe the source code at the level of minimal syntactic units: keywords, literals, identifiers and punctuation (braces and parentheses, commas, semicolons). Finally, the tree contains SyntaxTrivia elements, which by and large do not matter for understanding the code: spaces and tabs, comments, preprocessor directives, etc.

    One small detail is worth knowing here: Roslyn is very tolerant when parsing files. Although strictly speaking it should be given syntactically correct source code, in practice it tries to turn absolutely any text into some kind of AST, including our syntactically incorrect code. We will take advantage of this. Let's build a syntax tree and see how Roslyn represents our safe call operator in it.

    It turns out to be simple: from Roslyn's point of view, the expression test?.TestField is a ternary operator with the condition "test", the "when true" expression ".TestField", and an empty "when false" expression. Armed with this information, we will transform our tree. Here we run into another Roslyn feature: the syntax tree it builds is immutable, so nothing can be fixed in place in the existing structure. That is not a problem. For such operations Roslyn offers the SyntaxRewriter class, which derives from SyntaxVisitor and, as the name implies, implements the well-known Visitor pattern. It contains many virtual methods, each handling a visit to a node of a particular type (for example, VisitFieldDeclaration, VisitEnumMemberDeclaration, ...; there are about 180 of them in total).
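
    Rewriting an immutable tree with a visitor can be illustrated without Roslyn. The following is a toy sketch; the Node, Name, Conditional, Rewriter and RenameRewriter types are invented here and only mimic the general shape of a syntax node and a rewriter, not Roslyn's actual API:

```csharp
using System;

// Toy immutable expression tree (illustrative types, not Roslyn's).
public abstract class Node
{
    public abstract Node Accept(Rewriter rewriter);
}

public sealed class Name : Node
{
    public readonly string Text;
    public Name(string text) { Text = text; }
    public override Node Accept(Rewriter rewriter) { return rewriter.VisitName(this); }
    public override string ToString() { return Text; }
}

public sealed class Conditional : Node
{
    public readonly Node Condition, WhenTrue, WhenFalse;
    public Conditional(Node condition, Node whenTrue, Node whenFalse)
    {
        Condition = condition; WhenTrue = whenTrue; WhenFalse = whenFalse;
    }
    public override Node Accept(Rewriter rewriter) { return rewriter.VisitConditional(this); }
    public override string ToString() { return Condition + " ? " + WhenTrue + " : " + WhenFalse; }
}

// Base rewriter: one virtual method per node type, each returning a node.
public class Rewriter
{
    public Node Visit(Node node) { return node.Accept(this); }
    public virtual Node VisitName(Name node) { return node; }
    public virtual Node VisitConditional(Conditional node)
    {
        Node c = Visit(node.Condition), t = Visit(node.WhenTrue), f = Visit(node.WhenFalse);
        // Reuse the old node when nothing below it changed; otherwise build a new one.
        if (c == node.Condition && t == node.WhenTrue && f == node.WhenFalse) return node;
        return new Conditional(c, t, f);
    }
}

// Example rewriter: replaces one identifier with another.
public sealed class RenameRewriter : Rewriter
{
    private readonly string oldName, newName;
    public RenameRewriter(string oldName, string newName) { this.oldName = oldName; this.newName = newName; }
    public override Node VisitName(Name node) { return node.Text == oldName ? new Name(newName) : node; }
}
```

    Visiting a tree with RenameRewriter returns a new root and leaves the original tree untouched, while subtrees that did not change are shared between the two. The rewriter in this article follows the same idea: an overridden Visit method returns a replacement node instead of modifying the existing one.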

    We need to create our own descendant of SyntaxRewriter and override the VisitConditionalExpression method, which is called when the visitor reaches a ternary operator expression. Below is the entire implementation, since it is small, with just a few explanations:
    // Finds safe call operators in the syntax tree and replaces them with ternary operators
    public class SafeCallRewriter : SyntaxRewriter
    {
        // Whether at least one ?. operator was replaced during the current pass
        public bool IsSafeCallRewrited { get; set; }
        protected override SyntaxNode VisitConditionalExpression(ConditionalExpressionSyntax node)
        {
            if (IsSafeCallExpression(node))
            {
                // Build an expression for the object that is checked for null
                string identTxt = node.Condition.GetText();
                ExpressionSyntax ident = Syntax.ParseExpression(identTxt);
                // Build an expression for the code that runs when the != null check succeeds
                string exprTxt = node.WhenTrue.GetText();
                exprTxt = exprTxt.Substring(1, exprTxt.Length - 1); // strip the leading dot from the expression text
                exprTxt = identTxt + '.' + exprTxt;
                ExpressionSyntax expr = Syntax.ParseExpression(exprTxt);
                ExpressionSyntax synt =
                    Syntax.ConditionalExpression( // ternary operator
                    condition: Syntax.BinaryExpression( // condition being checked: ident != null
                        SyntaxKind.NotEqualsExpression,
                        left: ident, // left operand: the object being checked
                        right: Syntax.LiteralExpression(SyntaxKind.NullLiteralExpression)), // null literal
                    whenTrue: expr,
                    whenFalse: Syntax.LiteralExpression(SyntaxKind.NullLiteralExpression));
                IsSafeCallRewrited = true;
                return synt;
            }
            return base.VisitConditionalExpression(node);
        }
        // Whether the ternary operator is actually a safe call operator
        private bool IsSafeCallExpression(ConditionalExpressionSyntax node)
        {
            return node.WhenTrue.GetText()[0] == '.';
        }
    }

    I should note that my first implementation tried to work only with the logical structure of the AST, disdaining the textual representation of expressions, but its complexity very quickly exceeded all reasonable limits. It needed three functions just to detect a safe call and its kind (for fields and properties, for method calls, and for chains of safe calls), because all of these turned out to be different SyntaxNode descendants, plus many more functions to transform the various kinds of safe calls. Completely worn out, I threw the first version in the trash, and the second time around used the convenient GetText and ParseExpression functions provided by Roslyn along with some dirty string-level hacks :).

    I also recommend noting how a syntax node is created (ConditionalExpression in this case) and how pleasant a C# feature like named parameters is here. I guarantee that without it, constructing syntax nodes could drive one crazy.

    Here is the code for the main procedure:
    static void Main(string[] args)
    {
        // Build the syntax tree
        SyntaxTree tree = SyntaxTree.ParseCompilationUnit(Example.CODE);
        SyntaxNode root = tree.Root;
        SafeCallRewriter rewriter = new SafeCallRewriter();
        do
        {
            rewriter.IsSafeCallRewrited = false;
            // Walk the tree, applying the rewriter to each node type and rebuilding the tree
            root = rewriter.Visit(root);
        } while (rewriter.IsSafeCallRewrited); // the previous pass found and rewrote at least one safe call operator
        root = root.Format(); // programmatic Ctrl+K, Ctrl+D
        Console.WriteLine(root.ToString());
    }

    Let me explain: several passes over the tree are needed to process call chains. This could of course be done with recursion, but in this case it would probably only obscure the code. Also note the wonderful Format function. It programmatically applies standard stylistic formatting to the code, i.e. adds all the necessary SyntaxTrivia to the AST.
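
    The recursive alternative can be sketched at the string level. This is only a toy that assumes simple member chains such as a?.b?.c with no other question marks in the text; the SafeCallText class and its Expand method are hypothetical and do not use Roslyn:

```csharp
using System;

public static class SafeCallText
{
    // Expands the leftmost "?." and then recurses into the rest of the chain.
    public static string Expand(string expr)
    {
        int i = expr.IndexOf("?.", StringComparison.Ordinal);
        if (i < 0) return expr; // no safe calls left
        string target = expr.Substring(0, i);  // the part that is checked for null
        string member = expr.Substring(i + 2); // everything after "?."
        // The remaining chain is rooted at target.member and may contain more "?.".
        return target + " != null ? " + Expand(target + "." + member) + " : null";
    }
}
```

    For the chain from the example, Expand("test?.ChainTest?.TestField") returns "test != null ? test.ChainTest != null ? test.ChainTest.TestField : null : null", the same nested ternary that the multi-pass rewriter produces.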

    As a result, the program prints the following code:
    using System;
    using System.Linq;
    using System.Windows;
    namespace HelloWorld
    {
        public class TestClass
        {
            public string TestField;
            public string TestProperty
            {
                get;
                set;
            }
            public string TestMethod()
            {
                return null;
            }
            public string TestMethod2(int k, string p)
            {
                return null;
            }
            public TestClass ChainTest;
        }
        public class OtherClass
        {
            public void Test()
            {
                TestClass test;
                string testStr1;
                testStr1 = test != null ? test.TestField : null;
                string testStr3 = test != null ? test.TestProperty : null;
                string testStr4 = test != null ? test.TestMethod() : null;
                string testStr5 = test != null ? test.TestMethod2(100, testStr3) : null;
                var test3 = test != null ? test.ChainTest != null ? test.ChainTest.TestField : null : null;
            }
        }
    }

    So, my first acquaintance with Roslyn was a success, and its prospects in general, not only for writing language extensions, look very good. If there are enthusiasts, this could be explored more deeply and seriously. There is still a lot that we miss in C#. :)

    P.S. Another example of a similar use of Roslyn, which helped me a lot, is given here.
