yallie June 29, 2014 at 23:33

Grammar inheritance in Sprache (or yet another expression calculator for .NET)

Tutorial

This article demonstrates how to build parsers using grammar inheritance. Inheritance lets you describe a new grammar on top of an existing one by adding new rules or overriding inherited ones, which greatly simplifies writing new parsers. Changes to the base grammar automatically become available in all derived grammars. The main use for this technique is supporting several dialects or versions of a language.

Grammar inheritance is supported by some parser generators (for example, ANTLR and Nitra), and it comes for free in tools that use an object-oriented language as the DSL for describing grammars (for example, Sprache and Irony).

The example application for this article is an expression calculator library with support for user-defined functions and variables. The calculator compiles strings into LINQ expressions, which are easily converted to strongly typed delegates. Unlike interpreting calculators such as NCalc, compiled expressions run as fast as methods written in C#. An example of using the finished calculator:

// expression with variables
var expr = calc.ParseExpression("Sin(y/x)", x => 2, y => System.Math.PI);
var func = expr.Compile();
Console.WriteLine("Result = {0}", func());
// user-defined functions
calc.RegisterFunction("Mul", (a, b, c) => a * b * c);
expr = calc.ParseExpression("2 ^ Mul(PI, a, b)", a => 2, b => 10);
Console.WriteLine("Result = {0}", expr.Compile()());

Sprache at a Glance

Sprache is a minimalistic functional library for building parser combinators. As its authors modestly put it, the library sits somewhere between regular expressions and full-fledged parser construction tools like ANTLR.

I would say that Sprache is an excellent tool that suits a wide range of tasks. It is especially appealing because it encourages step-by-step grammar development and TDD. Parser combinators do have drawbacks (for example, error diagnostics and error recovery are difficult), but those details are not relevant to the topic of this article.

A parser in Sprache is a function that transforms an input string into some other object. Unlike most compiler construction tools, Sprache does not use code generation. Parsers are defined directly in the program source and can be used to parse text right away. This lets you write unit tests for parsers as you define them, which is very convenient. Here is an example of a simple parser that accepts a string of one or more letters A:

var parseA = Parse.Char('A').AtLeastOnce();

Simple parsers are combined into more complex ones. To combine parsers, Sprache defines many extension methods (for example, Or, Then, Many, and so on), but defining parsers as LINQ queries is especially impressive:

Parser<string> identifier =
    from leading in Parse.WhiteSpace.Many()
    from first in Parse.Letter.Once()
    from rest in Parse.LetterOrDigit.Many()
    from trailing in Parse.WhiteSpace.Many()
    select new string(first.Concat(rest).ToArray());
var id = identifier.Parse(" abc123  ");
Assert.AreEqual("abc123", id);

The complete set of rules, that is, the grammar of a language, usually looks in Sprache like a static class with parser fields. You can read more about Sprache in an overview article, a translation of which is available on Habr:

Building a DSL in C# using parser combinators

Calculator design

Our calculator can work in three modes: simple, scientific and extensible.

The simple calculator supports the usual arithmetic operations on floating-point numbers, unary minus and parentheses. Scientific mode adds support for binary and hexadecimal numbers, exponential notation and calls to any function of the System.Math class. In extensible mode you can also use parameters and register your own functions (with support for overloading).

Each mode supports all the features of the previous modes and adds new ones. The hierarchy of grammar classes that describe the input languages of the calculator is arranged in the same way. The calculator parser is a function that converts an input string into a LINQ expression, which can be compiled into a delegate and called like a regular function:

var expr = "4*(1/1-1/3+1/5-1/7+1/9-1/11+1/13-1/15+1/17-1/19+10/401)";
var func = calc.ParseExpression(expr).Compile();
var result = func();

Simple calculator

As the basis for the simple calculator we took a sample that ships with Sprache: the ultra-compact LinqyCalculator. The grammar is split into rules in a way that simplifies building LINQ expressions during compilation:

Expr ::= Term (("+"|"-") Term)*
Term ::= InnerTerm (("*"|"/"|"%") InnerTerm)*
InnerTerm ::= Operand ("^" Operand)*
Operand ::= NegativeFactor | Factor
NegativeFactor ::= "-" Factor
Factor ::= "(" Expr ")" | Constant
Constant ::= Decimal

Sprache parsers are usually declared as static fields. This does not suit us, because static fields cannot be overridden in descendant classes, so we will declare the rules as virtual properties instead.

// Before:
public static readonly Parser<Expression> ExpressionInParentheses =
	from lparen in Parse.Char('(')
	from expr in Expr
	from rparen in Parse.Char(')')
	select expr;
// After:
protected virtual Parser<Expression> ExpressionInParentheses
{
	get
	{
		return
			from lparen in Parse.Char('(')
			from expr in Expr
			from rparen in Parse.Char(')')
			select expr;
	}
}

After this change the grammar grows slightly in size, but now any rule can be overridden in descendant classes. To be able to write unit tests for each rule, you will have to declare the parser properties as public or protected internal.

I will not reproduce the full grammar of the simple calculator here; it can be viewed on GitHub. For the most part it repeats the standard LinqyCalculator sample from Sprache.

Scientific calculator

Since the scientific calculator can do at least everything the simple one can, its class inherits from the grammar of the simple calculator. To support binary and hexadecimal numbers, we add new rules:

protected virtual Parser<string> Binary
{
	get
	{
		return Parse.IgnoreCase("0b").Then(x =>
			Parse.Chars("01").AtLeastOnce().Text()).Token();
	}
}
protected virtual Parser<string> Hexadecimal
{
	get
	{
		return Parse.IgnoreCase("0x").Then(x =>
			Parse.Chars("0123456789ABCDEFabcdef").AtLeastOnce().Text()).Token();
	}
}

Defining new rules is not enough, because the base grammar does not know where they can be applied. Since binary and hexadecimal numbers are a kind of constant, we add them to the Constant parser.

Note: the Binary and Hexadecimal parsers return a string, while the Constant parser returns a LINQ expression. You will need helper methods that convert the strings to numbers, which are then wrapped in Expression.Constant as double values. The finished Constant parser with support for decimal, binary and hexadecimal numbers looks like this:

protected override Parser<Expression> Constant
{
	get
	{
		return
			Hexadecimal.Select(x => Expression.Constant((double)ConvertHexadecimal(x)))
			.Or(Binary.Select(b => Expression.Constant((double)ConvertBinary(b))))
			.Or(base.Constant);
	}
}
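The ConvertHexadecimal and ConvertBinary helpers are not shown in the article. Here is a minimal sketch of what they might look like, assuming they receive only the digits (the Binary and Hexadecimal parsers above drop the 0b and 0x prefixes). The class name NumberLiterals and the ulong return type are assumptions made for illustration, not taken from the calculator's source:

```csharp
using System;

// Hypothetical container for the conversion helpers used by the Constant parser.
public static class NumberLiterals
{
    // Converts hexadecimal digits (without the "0x" prefix) to a number.
    public static ulong ConvertHexadecimal(string hexDigits)
    {
        return Convert.ToUInt64(hexDigits, 16);
    }

    // Converts binary digits (without the "0b" prefix) to a number.
    public static ulong ConvertBinary(string binaryDigits)
    {
        return Convert.ToUInt64(binaryDigits, 2);
    }
}
```

With these definitions, a literal that does not fit into 64 bits makes Convert.ToUInt64 throw an OverflowException; a real grammar would probably want to report that as a parse error instead.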

To support functions in expressions, you need two more rules:

protected virtual Parser<string> Identifier
{
	get { return Parse.Letter.AtLeastOnce().Text().Token(); }
}
protected virtual Parser<Expression> FunctionCall
{
	get
	{
		return
			from name in Identifier
			from lparen in Parse.Char('(')
			from expr in Expr.DelimitedBy(Parse.Char(',').Token())
			from rparen in Parse.Char(')')
			select CallFunction(name, expr.ToArray());
	}
}

The CallFunction helper method simply generates a LINQ expression that invokes the static method of the System.Math class with the specified name:

protected virtual Expression CallFunction(string name, Expression[] parameters)
{
	var methodInfo = typeof(Math).GetMethod(name, parameters.Select(e => e.Type).ToArray());
	if (methodInfo == null)
	{
		throw new ParseException(string.Format("Function '{0}({1})' does not exist.",
			name, string.Join(",", parameters.Select(e => e.Type.Name))));
	}
	return Expression.Call(methodInfo, parameters);
}

Since the base grammar knows nothing about the new rules, they have to be plugged into one of its rules. Choosing the right rule here is not as easy as it was for constants. The suitable rule is determined by the precedence of the function call operation.

It is easy to see that this precedence should be the highest, the same as for expressions in parentheses. For example, to evaluate the expression Sin(2) ^ Cos(3) you first need to compute the function values and only then perform the exponentiation.

In the base grammar, parenthesized expressions are handled in the Factor rule, so this is the rule we need to override:

protected override Parser<Expression> Factor
{
	get { return base.Factor.XOr(FunctionCall); }
}

Adding custom functions

For the most advanced version of the calculator, we create a new class derived from the scientific calculator. Adding custom functions does not require any new grammar rules, because the syntax of the expressions stays the same. Only the method responsible for calling functions changes. Pseudocode:

override Expression CallFunction(string name, Expression[] parameters)
{
	if there is a user-defined function called name,
	{
		return an expression that calls this function with the given parameters;
	}
	// otherwise fall back to System.Math
	return base.CallFunction(name, parameters);
}

Any custom function for the calculator can be represented as a Func<double[], double> delegate. Named functions are conveniently stored in a dictionary: Dictionary<string, Func<double[], double>>. To allow function overloading, it is enough to append the number of parameters to the name:

"Min:2" is the function Min with two parameters
"Min:5" is the function Min with five parameters
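To make the name-mangling scheme concrete, here is a small self-contained sketch of a function registry. The FunctionRegistry class, its Mangle and Invoke members and the set of RegisterFunction overloads are hypothetical and exist only for illustration; the real calculator may organize this differently:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical registry that stores overloads under "Name:ParameterCount" keys.
public class FunctionRegistry
{
    private readonly Dictionary<string, Func<double[], double>> functions =
        new Dictionary<string, Func<double[], double>>();

    // "Min" with two parameters becomes "Min:2".
    public static string Mangle(string name, int parameterCount)
    {
        return name + ":" + parameterCount;
    }

    // Each overload adapts a typed delegate to the common Func<double[], double> shape.
    public void RegisterFunction(string name, Func<double, double> f)
    {
        functions[Mangle(name, 1)] = args => f(args[0]);
    }

    public void RegisterFunction(string name, Func<double, double, double> f)
    {
        functions[Mangle(name, 2)] = args => f(args[0], args[1]);
    }

    public void RegisterFunction(string name, Func<double, double, double, double> f)
    {
        functions[Mangle(name, 3)] = args => f(args[0], args[1], args[2]);
    }

    // Picks the overload by the number of arguments actually passed.
    public double Invoke(string name, params double[] args)
    {
        return functions[Mangle(name, args.Length)](args);
    }
}
```

Registering Min once with a two-argument lambda and once with a three-argument lambda creates two independent entries, "Min:2" and "Min:3", so the number of arguments at the call site selects the overload.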

As a result, the above pseudocode will turn into something like this:

protected override Expression CallFunction(string name, Expression[] parameters)
{
	// try to find a user-defined function
	var mangledName = name + ":" + parameters.Length;
	if (CustomFuctions.ContainsKey(mangledName))
	{
		return Expression.Call(...); // call the function with this name
	}
	// call a System.Math method
	return base.CallFunction(name, parameters);
}

The Expression.Call expression that has to be generated to call a user-defined function is somewhat tricky. Expression.Call can only call existing methods, and user-defined functions are obviously not among them. To work around this, it is enough to define the following method in the calculator class:

protected virtual double CallCustomFunction(string mangledName, double[] parameters)
{
	return CustomFuctions[mangledName](parameters);
}

This is the method that the generated Expression.Call expression will invoke. All that remains is to pack the list of parameters into a single array parameter:

protected override Expression CallFunction(string name, Expression[] parameters)
{
	// try to find a user-defined function
	var mangledName = name + ":" + parameters.Length;
	if (CustomFuctions.ContainsKey(mangledName))
	{
		// prepare the parameters
		var newParameters = new List<Expression>();
		newParameters.Add(Expression.Constant(mangledName));
		newParameters.Add(Expression.NewArrayInit(typeof(double), parameters));
		// call this.CallCustomFunction(mangledName, double[]);
		var callCustomFunction = new Func<string, double[], double>(CallCustomFunction).Method;
		return Expression.Call(Expression.Constant(this), callCustomFunction, newParameters.ToArray());
	}
	// call a System.Math method
	return base.CallFunction(name, parameters);
}
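The same trick can be tried out in isolation, without any parser. The following sketch builds and compiles an expression equivalent to () => this.CallCustomFunction("Mul:3", new[] { ... }). The CustomCallSketch class and its BuildCall method are hypothetical names introduced for this illustration, and the arguments are plain constants rather than parsed subexpressions:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;

// Hypothetical stand-alone demo of calling a dictionary-stored function from an expression tree.
public class CustomCallSketch
{
    private readonly Dictionary<string, Func<double[], double>> functions =
        new Dictionary<string, Func<double[], double>>
        {
            // a three-argument function stored under its mangled name
            { "Mul:3", p => p[0] * p[1] * p[2] },
        };

    // An ordinary instance method that the expression tree is able to call.
    public double CallCustomFunction(string mangledName, double[] parameters)
    {
        return functions[mangledName](parameters);
    }

    // Builds and compiles: () => this.CallCustomFunction(mangledName, new double[] { args })
    public Func<double> BuildCall(string name, params double[] args)
    {
        var mangledName = name + ":" + args.Length;
        var argExpressions = args.Select(a => (Expression)Expression.Constant(a));
        var method = new Func<string, double[], double>(CallCustomFunction).Method;
        var call = Expression.Call(
            Expression.Constant(this),
            method,
            Expression.Constant(mangledName),
            Expression.NewArrayInit(typeof(double), argExpressions));
        return Expression.Lambda<Func<double>>(call).Compile();
    }
}
```

Note that the dictionary lookup happens every time the compiled delegate runs, not when the expression is built, so a function re-registered under the same mangled name would be picked up by already compiled expressions.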

Adding parameters

To support parameters, the grammar has to be refined: we add a new rule and update an existing one. A parameter is just an identifier that can occur in the same places as a constant or a function call:

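protected virtual Parser<Expression> Parameter
{
	// GetParameterExpression, described below, turns a name into an expression
	get { return Identifier.Select(GetParameterExpression); }
}
protected override Parser<Expression> Factor
{
	get { return Parameter.Or(base.Factor); }
}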

Here we run into a conflict for the first time. The Factor rule now has two alternatives that both begin with an identifier: a parameter and a function call. When the parser encounters an identifier, it cannot tell whether it is looking at a parameter or a function until it looks ahead. If the identifier is followed by an opening parenthesis "(", it is a function; otherwise it is a parameter.

As far as I know, Sprache does not help to find such conflicts in any way. You can only detect them by looking closely. You add rules, write unit tests for them, and at some point you discover that some tests fail with parsing errors. The case of parameters and functions is quite trivial, but more often finding and resolving conflicts is a serious task that takes a lot of time and effort.

To resolve the conflict between parameters and functions, we can define a parameter as "an identifier that is not followed by an opening parenthesis". Such a rule does not lead to a conflict, because it removes the ambiguity. It looks like this:

protected virtual Parser<Expression> Parameter
{
	get
	{
		// identifier not followed by a '(' is a parameter reference
		return
			from id in Identifier
			from n in Parse.Not(Parse.Char('('))
			select GetParameterExpression(id);
	}
}

The Parse.Not parser is similar to a negative lookahead in regular expressions: it does not advance the current position in the input, and it succeeds if the parser passed to it, in this case Parse.Char('('), fails.

As with function calls, we need to generate an expression that returns the value of the parameter. It is time to decide how parameters will be passed to the calculator. At first glance, we can treat parameters the same way as user-defined functions and register them in a special Dictionary<string, double> stored in the calculator:

calc.Parameters["MyPI"] = 355/113d;
calc.Parameters["MyE"] = 2.718d;

Compiling a reference to such a parameter works much like compiling a call to a user-defined function. The GetParameterExpression method generates a call to the GetParameterValue method, passing it the parameter name. If the parameter is not defined, we can try to find it among the constants of the System.Math class:

protected virtual Expression GetParameterExpression(string id)
{
	// try to get the parameter value if it is defined
	if (Parameters.ContainsKey(id))
	{
		// call this.GetParameterValue(id)
		var getParameterValue = new Func<string, double>(GetParameterValue).Method;
		return Expression.Call(Expression.Constant(this), getParameterValue, Expression.Constant(id)) as Expression;
	}
	// try to find a constant with this name in the System.Math class
	var systemMathConstants = typeof(System.Math).GetFields(BindingFlags.Public | BindingFlags.Static);
	var constant = systemMathConstants.FirstOrDefault(c => c.Name == id);
	if (constant == null)
	{
		throw new ParseException(string.Format("Parameter or constant '{0}' does not exist.", id));
	}
	// return the value of the System.Math constant
	return Expression.Constant(constant.GetValue(null));
}

As soon as we try to use such a calculator, the inconvenience of this parameter store becomes apparent. The calculator can compile many expressions, but it has a single parameter store. All expressions share the same pool of parameters tied to the calculator instance.

calc.Parameters["P"] = 3.14d;
calc.Parameters["R"] = 10;
var func1 = calc.ParseExpression("2*P*R").Compile();
var result1 = func1();
var func2 = calc.ParseExpression("R+P").Compile();
calc.Parameters["P"] = 123;
var result2 = func2();

A common pool of parameters makes it impossible to use expressions in a multithreaded program. One thread will set one parameter value, another thread will set another, and the result of the calculation will become undefined. Obviously, for passing parameters you need to come up with a more reliable mechanism.

Another way to pass parameters

It would be more logical to tie the list of parameters to the expression rather than to the calculator. To do this, we have to change the type of the resulting expression: instead of Expression<Func<double>> the calculator will need to generate Expression<Func<Dictionary<string, double>, double>>. Here is what it might look like:

var function = calc.ParseFunction("y/x").Compile();
var parameters = new Dictionary<string, double>{ { "x", 2 }, { "y", 4 } };
var result = function(parameters);

This scheme does not require many changes. Instead of calling this.GetParameterValue, we just need to generate an access to the parameter dictionary: parameters[name]. An indexer in C# is compiled into a call to the get_Item method, so parameter access looks like this:

var getItemMethod = typeof(Dictionary<string, double>).GetMethod("get_Item");
return Expression.Call(ParameterExpression, getItemMethod, Expression.Constant(name));
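The following self-contained sketch shows the whole round trip for the expression y/x without involving the parser: it declares the dictionary as the lambda parameter, reads both values through get_Item and compiles the result. The ParameterAccessDemo class and the BuildDivision method are hypothetical names used only for this illustration:

```csharp
using System;
using System.Collections.Generic;
using System.Linq.Expressions;

// Hypothetical demo: builds parameters => parameters[left] / parameters[right].
public static class ParameterAccessDemo
{
    public static Func<Dictionary<string, double>, double> BuildDivision(string left, string right)
    {
        // the dictionary is the single argument of the compiled function
        var parameters = Expression.Parameter(typeof(Dictionary<string, double>), "parameters");
        // the C# indexer parameters[name] is a call to get_Item(name)
        var getItemMethod = typeof(Dictionary<string, double>).GetMethod("get_Item");
        var leftValue = Expression.Call(parameters, getItemMethod, Expression.Constant(left));
        var rightValue = Expression.Call(parameters, getItemMethod, Expression.Constant(right));
        var body = Expression.Divide(leftValue, rightValue);
        return Expression
            .Lambda<Func<Dictionary<string, double>, double>>(body, parameters)
            .Compile();
    }
}
```

Because every call receives its own dictionary, the same compiled delegate can be used from several threads with different parameter values. A missing name surfaces as a KeyNotFoundException thrown by the dictionary at call time.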

In order not to complicate the expression, we will not check whether there is a parameter with the specified name in the dictionary. If there is no parameter, the Dictionary class itself will complain about it. Here is the complete method for compiling the parameters:

protected virtual Expression GetParameterExpression(string name)
{
	// try to find a constant in System.Math
	var systemMathConstants = typeof(System.Math).GetFields(BindingFlags.Public | BindingFlags.Static);
	var constant = systemMathConstants.FirstOrDefault(c => c.Name == name);
	if (constant != null)
	{
		// return System.Math constant value
		return Expression.Constant(constant.GetValue(null));
	}
	// return parameter value: Parameters[name]
	var getItemMethod = typeof(ParameterList).GetMethod("get_Item");
	return Expression.Call(ParameterExpression, getItemMethod, Expression.Constant(name));
}

Syntactic sugar for parameters

At the beginning of the article, an example of using a calculator to calculate an expression with parameters is given:

var expr = calc.ParseExpression("Sin(y/x)", x => 2, y => System.Math.PI);

This syntax is read much better than creating and populating a Dictionary. It is convenient to use when the list of valid expression parameters is fixed. Although this is not related to the actual analysis of expressions, I’ll explain how this method works:

public Expression<Func<double>> ParseExpression(string text, params Expression<Func<double, double>>[] parameters)
{
	var paramList = new Dictionary<string, double>();
	foreach (var p in parameters)
	{
		var paramName = p.Parameters.Single().Name;
		var paramValue = p.Compile()(0);
		paramList[paramName] = paramValue;
	}
	return ParseExpression(text, paramList);
}

A few words about unit tests

When developing parsers with Sprache, it is very convenient to write unit tests alongside the grammar. You add a new parser and immediately write a set of tests for it (TDD adherents will do it in the reverse order). Since the Sprache library does not analyze the grammar in any way, it cannot report problems such as conflicts or left recursion (although it can detect simple left recursion at run time), so the unit test suite becomes your only safety net.

Grammar inheritance puts additional responsibility on unit tests: for each class, you need to make sure that all inherited rules continue to work and interact normally with rules overridden in descendant classes. To do this, you can use the ForEachCalculator helper method, which runs tests on all versions of the calculator:

private void ForEachCalculator(Action<SimpleCalculator> fact)
{
	foreach (var calc in new[] { new SimpleCalculator(), new ScientificCalculator(), new XtensibleCalculator() })
	{
		fact(calc);
	}
}
[Fact]
public void ExprCombinesTermsWithAddSubOperators()
{
	ForEachCalculator(calc =>
	{
		Assert.Equal(4d, calc.Expr.Parse("2+2").Execute());
		Assert.Equal(2d, calc.Expr.Parse("2*3-4*1").Execute());
		Assert.Throws<ParseException>(() => calc.Expr.Parse("+"));
	});
}

However, a more elegant solution, and the one used in the calculator's unit tests, is test inheritance. The base test class defines a virtual method CreateCalculator, which creates the calculator to be tested. The base test class creates and tests SimpleCalculator. Its descendant creates and tests ScientificCalculator, inheriting all the tests of the base class but running them against the descendant calculator, and so on.
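The pattern can be sketched without Sprache or a test framework. In the example below, BaseGrammar and HexGrammar are hypothetical, heavily simplified stand-ins for the calculator grammars, and the test methods throw on failure instead of using assertions. With a test framework such as xUnit, the test methods would carry the [Fact] attribute:

```csharp
using System;
using System.Globalization;

// Hypothetical stand-in for a base grammar: understands decimal constants only.
public class BaseGrammar
{
    public virtual double Constant(string text)
    {
        return double.Parse(text, CultureInfo.InvariantCulture);
    }
}

// Hypothetical stand-in for a derived grammar: adds hexadecimal constants.
public class HexGrammar : BaseGrammar
{
    public override double Constant(string text)
    {
        return text.StartsWith("0x", StringComparison.OrdinalIgnoreCase)
            ? Convert.ToUInt64(text.Substring(2), 16)
            : base.Constant(text);
    }
}

public class BaseGrammarTests
{
    // Descendant test classes override this factory to test their own grammar.
    protected virtual BaseGrammar CreateGrammar()
    {
        return new BaseGrammar();
    }

    public void DecimalConstantIsParsed()
    {
        if (CreateGrammar().Constant("42") != 42d)
            throw new InvalidOperationException("decimal constant was not parsed");
    }
}

public class HexGrammarTests : BaseGrammarTests
{
    protected override BaseGrammar CreateGrammar()
    {
        return new HexGrammar();
    }

    // Runs in addition to the inherited DecimalConstantIsParsed test.
    public void HexConstantIsParsed()
    {
        if (CreateGrammar().Constant("0x1F") != 31d)
            throw new InvalidOperationException("hexadecimal constant was not parsed");
    }
}
```

Calling DecimalConstantIsParsed on a HexGrammarTests instance exercises the inherited rule through the derived grammar, which is exactly the check that grammar inheritance calls for.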

Summary

So, we ended up with three calculator variants that differ in their feature sets and in the syntax of their input expressions. Thanks to inheritance, most rules of the base grammar are reused in the descendant grammars, which lets you focus development on the differences from the base version. The full source code of the calculator project is available on GitHub.

Grammar inheritance is a powerful technique that in many cases allows you to control the complexity of grammars and speed up parser development. In the case of the Sprache library, grammar inheritance seems to be a good tool, further encouraging decomposition and step-by-step development in parallel with unit testing.
