The inner world of Razor. Part 1 – Recursive Ping Pong

Original author: VibrantCode
This is the first article in a series about Razor, the new ASP.NET parser. We have been working on it for quite a while, and I would like to tell readers how it works.

The Razor parser is very different from the existing ASPX parser. In fact, the ASPX parser is built almost entirely on regular expressions, because its syntax is simple enough to parse that way. The Razor parser is divided into three components:
  1. A markup parser that has a basic understanding of HTML syntax.
  2. A code parser that has a basic understanding of C# or VB.
  3. The main “conductor”, which knows how to connect the two parsers together.

When I say “basic understanding” I really mean basic: we are not talking about complete, standalone C# and HTML parsers. On our team we jokingly call them the “Markup identifier” and the “Code interpreter” :)

In total, three “actors” play on the Razor stage: the core parser, the markup parser, and the code parser. All three work together to parse a Razor document. Now let's take a Razor file and walk through the whole parsing procedure using these actors. We will use the following example:

So, let's start from the top. At any moment, the Razor parser is in one of three states: parsing the markup of a document, parsing a markup block, or parsing a code block. The first two are handled by the markup parser, and the last by the code parser. When the core parser is first launched, it calls the markup parser and asks it to parse the markup of the document and return the result. The parser is now in the markup document state. In this state it simply searches for the “@” character; it does not care which tags it comes across or about anything else in the HTML. When it finds an “@”, it has to decide: is this a switch to code, or part of an email address? The decision is based on the characters immediately before and after the “@”, which are checked to see whether they could form a valid email address. This is just a standard procedure, there is a sequence of checks,
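The article does not list the actual checks, so the following is only a minimal sketch of the idea, assuming a much simpler rule than the real parser uses: an “@” counts as part of an email address only when it has a letter or digit on both sides. The function name `is_code_transition` is hypothetical.

```python
def is_code_transition(text: str, at: int) -> bool:
    """Guess whether the '@' at index `at` switches to code.

    Assumption (not from the article): an email address needs a letter
    or digit directly before and directly after the '@'.
    """
    if text[at] != "@":
        raise ValueError("index does not point at '@'")
    before = text[at - 1] if at > 0 else ""
    after = text[at + 1] if at + 1 < len(text) else ""
    # "".isalnum() is False, so an '@' at either edge of the text is code.
    looks_like_email = before.isalnum() and after.isalnum()
    return not looks_like_email


code_sample = "<ul> @foreach (var p in products) {"
mail_sample = "Write to bob@example.com"
code_result = is_code_transition(code_sample, code_sample.index("@"))
mail_result = is_code_transition(mail_sample, mail_sample.index("@"))
```

Here the first “@” follows a space, so it is treated as a switch to code, while the second sits between two alphanumeric characters and is left alone as text.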

In this case, the first “@” character is preceded by a space, which is not valid in an email address. So we know for sure that we need to switch to code mode. The markup parser calls into the code parser and asks it to parse a code block. A block, as the Razor parser defines it, is basically a single piece of code or markup with a clear start and end. So the “foreach” in our case is an example of a code block: it starts with the character “f” and ends with “}”. The code parser knows enough about C# to understand this, so it starts parsing the code. The code parser does some simple tracking of C# statements, so when it gets to “<li>”, it knows that the tag is at the beginning of a C# statement. A “<li>” cannot appear at the beginning of a C# statement, so the code parser knows that a markup block starts at this point. It therefore calls back into the markup parser to parse the HTML block. This creates a kind of recursive ping-pong between the code and markup parsers. We started with markup, then called into code, then into markup again, and so on, until we ended up with the following chain of calls:

(Of course, I left many auxiliary methods out of the list :)
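The ping-pong between the two parsers can be sketched as a pair of mutually recursive methods. This is a toy under stated assumptions, not Razor's implementation: the class `ToyRazorParser`, its method names, and the template are all hypothetical, only `foreach` blocks and simple `@identifier` expressions are recognized, and a markup block is assumed to end at the first matching close tag.

```python
import re

IDENT = re.compile(r"[A-Za-z_][\w.]*")
TAG = re.compile(r"<([A-Za-z]\w*)[^>]*>")


class ToyRazorParser:
    def __init__(self, text):
        self.text, self.pos, self.depth = text, 0, 0
        self.trace = []  # one (depth, parser state) entry per call

    def _call(self, name, body):
        self.trace.append((self.depth, name))
        self.depth += 1
        body()
        self.depth -= 1

    def parse_document(self):
        self._call("markup document", self._markup_document)
        return self.trace

    def _markup_document(self):
        # At document level only '@' matters; tags are plain text.
        while self.pos < len(self.text):
            self.pos += 1
            if self.text[self.pos - 1] == "@":
                self._call("code block", self._code_block)

    def _code_block(self):
        ident = IDENT.match(self.text, self.pos)
        if ident is None:
            return
        self.pos = ident.end()
        if ident.group() != "foreach":
            return  # a simple expression ends with its identifier
        self.pos = self.text.index("{", self.pos) + 1
        statement_start = True
        while self.pos < len(self.text) and self.text[self.pos] != "}":
            ch = self.text[self.pos]
            if ch == "<" and statement_start:
                # A tag cannot begin a C# statement, so markup starts here.
                self._call("markup block", self._markup_block)
                statement_start = True
            else:
                if not ch.isspace():
                    statement_start = ch == ";"
                self.pos += 1
        self.pos += 1  # consume the closing '}'

    def _markup_block(self):
        tag = TAG.match(self.text, self.pos)
        close = "</" + tag.group(1) + ">"
        self.pos = tag.end()
        # Scan to the close tag; a '}' in here is just text.
        while self.pos < len(self.text) and not self.text.startswith(close, self.pos):
            self.pos += 1
            if self.text[self.pos - 1] == "@":
                self._call("code block", self._code_block)
        self.pos += len(close)


# Hypothetical template, not the article's original example.
template = "<ul>\n@foreach (var p in products) {\n<li>@p.Name }</li>\n}\n</ul>"
trace = ToyRazorParser(template).parse_document()
for depth, name in trace:
    print("  " * depth + name)
```

Each level of indentation in the printed trace is one more frame on the call stack. Note that the “}” inside the `<li>` is consumed by the markup block and never reaches the `foreach` loop.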

This highlights a fundamental difference between ASPX and Razor. In ASPX files, you can think of code and markup as two parallel streams: you write some markup, then jump over and write some code, then come back and write more markup, and so on. Razor files, by contrast, are a tree: you write markup, then put code inside it, then put markup inside that code, and so on.
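To make the tree picture concrete, here is a small sketch of how nested blocks might be represented. The `Block` class and the `outline` helper are hypothetical illustrations, and the block contents are placeholders rather than the article's example.

```python
from dataclasses import dataclass, field


@dataclass
class Block:
    kind: str                                      # "markup" or "code"
    text: str = ""
    children: list = field(default_factory=list)   # nested Block objects


def outline(block, depth=0):
    """Return one indented line per block, parents before children."""
    lines = ["  " * depth + f"{block.kind}: {block.text}"]
    for child in block.children:
        lines.extend(outline(child, depth + 1))
    return lines


# Markup contains code, which contains markup, which contains code.
tree = Block("markup", "<ul> ... </ul>", [
    Block("code", "foreach (...) { ... }", [
        Block("markup", "<li> ... </li>", [
            Block("code", "p.Name"),
        ]),
    ]),
])
tree_lines = outline(tree)
```

An ASPX-style model would instead be a flat list of alternating markup and code segments, with no parent and child relationship between them.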

So we have just called the markup parser to parse a markup block. The block starts with “<li>” and ends with “</li>”. Until we find the “</li>”, we do not consider the markup block finished. So even if there is a “}” somewhere inside the “<li>”, it will not terminate the “foreach”, because we have not moved far enough back up the stack.

While parsing the “<li>”, the markup parser sees many “@” characters, each of which results in a call to the code parser. Thus, the call stack grows:

I will go into the details of how these blocks are processed later, because the process is a bit complicated. For now, the result is that we finish with these code blocks and return to the “<li>” block. Next we see “</li>”, so we complete this block and return to the “foreach” block. The “}” closes that block, so now we are back at the top of the stack, in the markup document. After that, we read until we reach the end of the file, finding no more “@” characters. And voilà! We have parsed the file!

I hope the general structure of the parsing algorithm is clear. The main thing is to stop thinking of code and markup as two separate streams and instead think of the constructs as nested inside one another. Here's a hint: we took our inspiration from PowerShell ;).
