Internal DSL & Expression Trees - dynamic creation of serialize, copy, clone, equals functions (Part I)
The article is devoted to the dual use of the Expression Trees API: parsing expressions and generating code. Parsing expressions helps build representation structures (which are also the representation structures of a problem-oriented Internal DSL), while code generation allows you to dynamically create efficient functions — sets of instructions specified by those representation structures.
I will demonstrate the dynamic creation of property iterators: serialize, copy, clone, equals. Using serialize as an example, I will show how to optimize serialization (compared to streaming serializers) in the classic situation where prior knowledge is used to improve performance. The idea is that a call to a streaming serializer always loses to a "non-streaming" function that knows exactly which nodes of the tree need to be traversed. At the same time, such a serializer is created not "by hand" but dynamically, according to predefined traversal rules. The proposed Internal DSL solves the problem of compactly describing the rules for traversing tree structures of objects by their properties/fields (and, more generally, traversing a tree of computations while naming its nodes). The serializer's benchmark win is modest, but it is important because it lends the necessary persuasiveness to the approach built around a particular Internal DSL, Includes (a dialect of the Include / ThenInclude from EF Core), and to the use of Internal DSLs as a whole.
Introduction
Compare:
var p = new Point(){X=-1,Y=1};
// which has better performance?
var json1 = JsonConvert.SerializeObject(p);
var json2 = $"{{\"X\":{p.X}, \"Y\":{p.Y}}}";
The second method is obviously faster (the nodes are known and "baked into the code"), while of course being more complicated. But when you get this code as a function (dynamically generated and compiled), the complexity is hidden (it even becomes unclear where the reflection is and where the run-time code generation is).
var p = new Point(){X=-1,Y=1};
// which has better performance?
var json1 = JsonConvert.SerializeObject(p);
var formatter = JsonManager.ComposeFormatter<Point>();
var json2 = formatter(p);
Here JsonManager.ComposeFormatter is a real tool. The rule by which the structure traversal is generated for serialization is not obvious, but it reads roughly as "with the default parameters, for a custom type traverse all first-level fields of value types". The traversal can also be set explicitly:
// traversal specified explicitly
var formatter2 = JsonManager.ComposeFormatter<Point>(
chain=>chain
.Include(e=>e.X)
.Include(e=>e.Y) // DSL Includes
)
This is a metadata description via DSL Includes. The analysis of the pros and cons of describing metadata with a DSL lies ahead, but for now, leaving the metadata notation aside, I emphasize that C# provides the ability to compose and compile the "perfect serializer" using Expression Trees.
The transition from formatter to serializer (for now without expression trees):
Func<StringBuilder, Point, bool> serializer = ... // later
string formatter(Point p)
{
var stringBuilder = new StringBuilder();
serializer(stringBuilder, p);
return stringBuilder.ToString();
}
In turn, serializer is built like this (when specified with static code):
Expression<Func<StringBuilder, Point, bool>> serializerExpression =
    (sb, p) => SerializeAssociativeArray(sb, p,
        (sb1, t1) => SerializeValueProperty(sb1, t1, "X", o => o.X, SerializeValueToString),
        (sb2, t2) => SerializeValueProperty(sb2, t2, "Y", o => o.Y, SerializeValueToString)
    );
Func<StringBuilder, Point, bool> serializer = serializerExpression.Compile();
Why so "functional" — why can't the serialization of the two fields be written with a "semicolon"? In short: because the expression above can be assigned to a variable of type Expression<Func<StringBuilder, Point, bool>>, while a "semicolon" (a statement block) cannot.
Why couldn't we directly write Func<StringBuilder, Point, bool> serializer = (sb, p) => SerializeAssociativeArray(sb, p, ...? We could, but I am demonstrating not the creation of a delegate but the assembly (here with static code) of an expression tree, with subsequent compilation into a delegate; in practical use serializerExpression will be specified quite differently — dynamically (below).
What matters in this solution: SerializeAssociativeArray accepts a params Func<..>[] propertySerializers array — one entry per node to traverse. Some of them can be "leaf" serializers, SerializeValueProperty (which takes a formatter, SerializeValueToString), and others can again be SerializeAssociativeArray (i.e. branches) — and this is how the traversal iterator (tree) is built.
If Point contained a NextPoint property:
var @delegate =
    SerializeAssociativeArray(sb, p,
        (sb1, t1) => SerializeValueProperty(sb1, t1, "X", o => o.X, SerializeValueToString),
        (sb2, t2) => SerializeValueProperty(sb2, t2, "Y", o => o.Y, SerializeValueToString),
        (sb3, t3) => SerializeValueProperty(sb3, t3, "NextPoint", o => o.NextPoint,
            (sb4, t4) => SerializeAssociativeArray(sb4, t4,
                (sb5, t5) => SerializeValueProperty(sb5, t5, "X", o => o.X, SerializeValueToString),
                (sb6, t6) => SerializeValueProperty(sb6, t6, "Y", o => o.Y, SerializeValueToString)
            )
        )
    );
The internals of the three functions SerializeAssociativeArray, SerializeValueProperty, and SerializeValueToString are not complicated:
public static bool SerializeAssociativeArray<T>(StringBuilder stringBuilder, T t, params Func<StringBuilder, T, bool>[] propertySerializers)
{
    var @value = false;
    stringBuilder.Append('{');
    foreach (var propertySerializer in propertySerializers)
    {
        var notEmpty = propertySerializer(stringBuilder, t);
        if (notEmpty)
        {
            if (!@value)
                @value = true;
            stringBuilder.Append(',');
        }
    }
    stringBuilder.Length--; // removes the trailing ',' (or the opening '{' when nothing was serialized)
    if (@value)
        stringBuilder.Append('}');
    return @value;
}

public static bool SerializeValueProperty<T, TProp>(StringBuilder stringBuilder, T t, string propertyName,
    Func<T, TProp> getter, Func<StringBuilder, TProp, bool> serializer) where TProp : struct
{
    stringBuilder.Append('"').Append(propertyName).Append('"').Append(':');
    var value = getter(t);
    var notEmpty = serializer(stringBuilder, value);
    if (!notEmpty)
        stringBuilder.Length -= (propertyName.Length + 3); // roll back '"name":'
    return notEmpty;
}

public static bool SerializeValueToString<T>(StringBuilder stringBuilder, T t) where T : struct
{
    stringBuilder.Append(t);
    return true;
}
Many details are omitted here (support for lists, reference types, and nullable). Still, it is clear that I will really get json as output, and everything else is just more functions of the same kind: SerializeArray, SerializeNullable, SerializeRef.
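To make the mechanics concrete, here is a condensed, self-contained sketch of the three helpers wired together. The Point struct and the PointJson wrapper are illustrative names of mine, not part of the library:

```csharp
using System;
using System.Text;

public struct Point { public int X; public int Y; }

public static class Serializers
{
    // Traverses the given property serializers and wraps their output in '{...}'.
    public static bool SerializeAssociativeArray<T>(StringBuilder sb, T t,
        params Func<StringBuilder, T, bool>[] propertySerializers)
    {
        var notEmpty = false;
        sb.Append('{');
        foreach (var propertySerializer in propertySerializers)
            if (propertySerializer(sb, t))
            {
                notEmpty = true;
                sb.Append(',');
            }
        sb.Length--; // trailing ',' (or the '{' when nothing was serialized)
        if (notEmpty)
            sb.Append('}');
        return notEmpty;
    }

    // Serializes one '"name":value' pair; rolls back the name if the value was empty.
    public static bool SerializeValueProperty<T, TProp>(StringBuilder sb, T t, string propertyName,
        Func<T, TProp> getter, Func<StringBuilder, TProp, bool> serializer) where TProp : struct
    {
        sb.Append('"').Append(propertyName).Append('"').Append(':');
        var notEmpty = serializer(sb, getter(t));
        if (!notEmpty)
            sb.Length -= propertyName.Length + 3; // roll back '"name":'
        return notEmpty;
    }

    public static bool SerializeValueToString<T>(StringBuilder sb, T t) where T : struct
    {
        sb.Append(t);
        return true;
    }
}

public static class PointJson
{
    public static string Serialize(Point p)
    {
        var sb = new StringBuilder();
        Serializers.SerializeAssociativeArray(sb, p,
            (sb1, t1) => Serializers.SerializeValueProperty(sb1, t1, "X", o => o.X, Serializers.SerializeValueToString),
            (sb2, t2) => Serializers.SerializeValueProperty(sb2, t2, "Y", o => o.Y, Serializers.SerializeValueToString));
        return sb.ToString();
    }
}
```

For `new Point { X = -1, Y = 1 }` this produces `{"X":-1,"Y":1}`.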
That was a static Expression Tree — not a dynamic one, not eval in C#.
You can see how an Expression Tree is built dynamically in two steps:
Step 1 — look with a decompiler at the code assigned to Expression<T>.
The first time, this will certainly surprise you. Nothing is clear, but you can notice how the first four lines compose something like:
("sb","t") .. SerializeAssociativeArray..
Then the connection to the source code can be grasped. And it should become clear that, once you master this notation (combining Expression.Constant, Expression.Parameter, Expression.Call, Expression.Lambda, etc.), you can indeed compose any node traversal dynamically (based on metadata). This is eval in C#.
Step 2 — follow this link.
The same code the decompiler shows, but written by a human.
Only the author of an interpreter has to get drawn into this bead embroidery. All this artistry stays inside the serialization library. The important idea to take away is that you can provide libraries that dynamically generate compiled, efficient functions in C# (and .NET Standard).
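As an illustration of that notation, here is a minimal sketch of my own (not the library's code) that composes and compiles a tiny append function out of Expression.Parameter / Expression.Call / Expression.Constant / Expression.Lambda nodes:

```csharp
using System;
using System.Linq.Expressions;
using System.Text;

public struct Point2 { public int X; }

public static class ExpressionDemo
{
    // Dynamically builds (sb, p) => sb.Append("\"X\":").Append(p.X) and compiles it.
    public static Func<StringBuilder, Point2, StringBuilder> ComposeAppendX()
    {
        var sb = Expression.Parameter(typeof(StringBuilder), "sb");
        var p = Expression.Parameter(typeof(Point2), "p");
        var appendString = typeof(StringBuilder).GetMethod(nameof(StringBuilder.Append), new[] { typeof(string) });
        var appendInt = typeof(StringBuilder).GetMethod(nameof(StringBuilder.Append), new[] { typeof(int) });
        // sb.Append("\"X\":").Append(p.X) — the outer Call's instance is the inner Call
        var body = Expression.Call(
            Expression.Call(sb, appendString, Expression.Constant("\"X\":")),
            appendInt,
            Expression.Field(p, nameof(Point2.X)));
        var lambda = Expression.Lambda<Func<StringBuilder, Point2, StringBuilder>>(body, sb, p);
        return lambda.Compile();
    }
}
```

The same combinators scale up to composing a whole traversal from metadata instead of one hard-coded member.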
However, a streaming serializer will overtake the dynamically generated function if compilation is invoked before every serialization (the compilation inside ComposeFormatter is a costly operation); but you can save the reference and reuse it:
static Func<Point, string> formatter = JsonManager.ComposeFormatter<Point>();
public string Get(Point p){
    // which has better performance?
    var json1 = JsonConvert.SerializeObject(p);
var json2 = formatter(p);
return json2;
}
If, however, you need to build and save a serializer of anonymous types for reuse, then additional infrastructure is needed:
static CachedFormatter cachedFormatter = new CachedFormatter();
public string Get(List<Point> list){
    // the json formatter will be built only for the first call
    // and assigned to cachedFormatter.Formatter;
    // in all subsequent calls cachedFormatter.Formatter will be used.
    // since building the formatter is deterministic, it is lock free
    var json3 = list.Select(e => new { X = e.X, Sum = e.X + e.Y })
        .ToJson(cachedFormatter, e => e.Sum);
return json3;
}
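The caching idea can be sketched like this (a hypothetical CachedFormatter<T> of mine, not the library's actual class): because building the formatter is deterministic, the cache can tolerate a benign data race instead of taking a lock.

```csharp
using System;

// Hypothetical sketch: a lock-free cache for a deterministically built formatter.
public sealed class CachedFormatter<T>
{
    // May be written by several threads concurrently; all written values are identical,
    // so the race is benign and no lock is needed.
    private Func<T, string> formatter;

    public string Format(T value, Func<Func<T, string>> compose)
    {
        var f = formatter;
        if (f == null)
            formatter = f = compose(); // last writer wins; the result is the same either way
        return f(value);
    }
}
```

The first call pays for building; every later call just invokes the stored delegate.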
After that, we confidently count our first micro-optimization and keep accumulating, accumulating, accumulating... A joke to some, not to others; but before turning to what the new serializer can do that is new, let me record the obvious advantage: it will be faster.
What in return?
The DSL Includes interpreter in serialize (and likewise the possible iterators equals, copy, clone — those will come up too) required the following costs:
1 - the cost of infrastructure for storing references to compiled code.
These costs are not strictly necessary, nor is the use of Expression Trees with compilation: an interpreter could also build a reflection-based serializer and even polish it so far that it approaches streaming serializers in speed (by the way, the copy, clone and equals shown at the end of the article are not assembled via expression trees, and nobody polished them — there is no such task, in contrast to "overtaking" ServiceStack and Json.NET within the commonly understood json optimization problem, which is a necessary condition for presenting a new solution).
2 - you need to keep leaking abstractions in mind, as well as a related problem: changes in semantics compared to existing solutions.
For example, Point and IEnumerable<Point> need two different serializers:
var formatter1 = JsonManager.ComposeFormatter<Point>();
var formatter2 = JsonManager.ComposeEnumerableFormatter<Point>();
// but not
// var formatter2 = JsonManager.ComposeEnumerableFormatter<List<Point>>();
Or: "do closures work?" They do, only the node must be given a (unique) name:
string DATEFORMAT= "YYYY";
var formatter3 = JsonManager.ComposeFormatter<Record>(
chain => chain
.Include(i => i.RecordId)
.Include(i => i.CreatedAt.ToString(DATEFORMAT), "CreatedAt")
);
This behavior is dictated by the internal design of the specific interpreter, ComposeFormatter.
Costs of this kind are a necessary evil. Moreover, it turns out that as the functionality grows and the scope of an Internal DSL expands, the abstraction leaks grow too. The Internal DSL developer will of course feel oppressed by this; a philosophical mood should be stocked up.
For the user, abstraction leaks are overcome by knowledge of the technical details of the Internal DSL (what to expect?) and by the richness of the functionality of the particular DSL and its interpreters (what in return?). Therefore, the answer to the question "is it worth creating and using an Internal DSL?" can only be a story about the functionality of the particular DSL — about all its little conveniences and its application possibilities (interpreters) — i.e. a story about overcoming the costs.
Having all this in mind, I return to the effectiveness of a particular DSL Includes.
Much greater efficiency is achieved when the triple (DTO, transformation to DTO, serialization of DTO) is replaced on the spot by a single, thoroughly instructed, generated serialization function. In the end, function-object dualism lets you state "a DTO is such a function" and set a goal: learn how to specify that DTO function.
Serialization must be configurable by:
- The traversal tree (describe the nodes to be serialized — which, incidentally, solves the circular-reference problem); for leaves, assign a formatter (by type).
- A leaf inclusion rule (when leaves are not listed explicitly): properties vs fields? readonly?
- The ability to specify both a branch (a node with navigation) and a leaf not only by a MemberExpression (e => e.Name) but by any function at all (e.g. `e => e.Name.ToUpper()`, "MyMemberName") — and to set a formatter on a specific node.
Other features that increase flexibility:
- serialize a leaf containing a json string "as is" (a special string formatter);
- set formatters for groups, i.e. for whole branches: one way in this branch, another way in that one (for example, dates with time here and without time there).
Everywhere the same constructs participate: tree traversal, branch, leaf — and all of this can be written with DSL Includes.
DSL Includes
Since everyone is familiar with EF Core, the meaning of the following expressions should be grasped immediately (this is a subset of xpath):
// DSL Includes
Include<User> include1 = chain=> chain
.IncludeAll(e => e.Groups)
.IncludeAll(e => e.Roles)
.ThenIncludeAll(e => e.Privileges)
// EF Core syntax
// https://docs.microsoft.com/en-us/ef/core/querying/related-data
var users = context.Users
.Include(blog => blog.Groups)
.Include(blog => blog.Roles)
.ThenInclude(blog => blog.Privileges);
These are the nodes "with navigation" — the "branches".
The answer to the question of which "leaf" nodes (fields/properties) are included in a tree specified this way is: none. To include the leaves, you must either list them explicitly:
Include<User> include2 = chain=> chain
.Include(e => e.UserName) // leaf member
.IncludeAll(e => e.Groups)
.ThenInclude(e => e.GroupName) // leaf member
.IncludeAll(e => e.Roles)
.ThenInclude(e => e.RoleName) // leaf member
.IncludeAll(e => e.Roles)
.ThenIncludeAll(e => e.Privileges)
.ThenInclude(e => e.PrivilegeName) // leaf member
Or add dynamically according to the rule, through a specialized interpreter:
// Func<ChainNode, MemberInfo> rule = ...
var include2 = IncludeExtensions.AppendLeafs(include1, rule);
Here rule is a rule that, based on ChainNode.Type — i.e. the type of the expression the node returns (ChainNode is the internal representation of DSL Includes; more on it later) — selects the properties (MemberInfo) that participate in serialization: for example, only properties, or only read/write properties, or only those for which a formatter exists; selection by a list of types is possible, and even an include expression itself can act as a rule (if it lists leaf nodes, i.e. serves as the shape of a tree to match).
Or ... leave it to the discretion of the interpreter's user, who decides what to do with the nodes. DSL Includes is just a metadata notation; how to interpret that notation depends on the interpreter. It can interpret the metadata however it wants, even ignore it. Some interpreters perform actions themselves; others build a function ready to perform them (via Expression Trees, or even Reflection.Emit). A good Internal DSL is designed for universal use and for the existence of many interpreters, each of which has its own specifics, its own abstraction leaks.
Code using Internal DSL can be very different from what it was before.
Out of the box
Integration with EF Core.
A common task is to "cut off circular references": to allow only what is specified in the include expression to be serialized:
static CachedFormatter cachedFormatter1 = new CachedFormatter();
string GetJson()
{
    using (var dbContext = GetEfCoreContext())
    {
        string json =
            EfCoreExtensions.ToJsonEf<User>(cachedFormatter1, dbContext, chain => chain
                .IncludeAll(e => e.Roles)
                .ThenIncludeAll(e => e.Privileges));
        return json;
    }
}
Here ToJsonEf accepts the navigation sequence for the interpreter and uses it for serialization (it selects leaves with the "default for EF Core" rule, i.e. public read/write properties), consults the model for where a string should be inserted as json "as is", and uses default field formatters (byte[] to string, DateTime to ISO, etc.). It also executes the IQueryable itself, under the hood.
When the result is transformed, the rules change: there is no need to use DSL Includes to specify navigation (if the rule is not reused), a different interpreter is used, and the configuration happens locally:
static CachedFormatter cachedFormatter1 = new CachedFormatter();
string GetJson()
{
    using (var dbContext = GetEfCoreContext())
    {
        var json = dbContext.ParentRecords
            // back to EF Core includes
            // but .Include(include1) is also possible
            .IncludeAll(e => e.Roles)
            .ThenIncludeAll(e => e.Privileges)
            .Select(e => new { FieldA = e.FieldA, FieldJson = "[1,2,3]", Role = e.Roles.First() })
            .ToJson(cachedFormatter1,
                chain => chain.Include(e => e.Role),
                LeafRuleManager.DefaultEfCore,
                config: rules => rules
                    .AddRule<string[]>(GetStringArrayFormatter)
                    .SubTree(
                        chain => chain.Include(e => e.FieldJson),
                        stringAsJsonLiteral: true) // json as is
                    .SubTree(
                        chain => chain.Include(e => e.Role),
                        subRules => subRules
                            .AddRule<DateTime>(
                                dateTimeFormat: "YYYMMDD",
                                floatingPointFormat: "N2"
                            )
                    ),
                useToString: false, // no default ToString for unknown leaf type (throw exception)
                dateTimeFormat: "YYMMDD",
                floatingPointFormat: "N2");
        return json;
    }
}
Clearly, all these details, all these defaults, can only be remembered if you really need them and/or if this is your own interpreter. On the other hand, we once again return to the pluses: the DTO is not smeared across the code but given by a specific function, and the interpreters are universal. There is less code — that alone is good.
A warning is due: although in ASP.NET prior knowledge would seem to always be available, and a streaming serializer is not such a necessary thing in the web world, where even databases emit data as json, using DSL Includes in ASP.NET MVC is not the simplest thing. How to combine functional programming with ASP.NET MVC deserves a separate study.
In this article I limit myself to the subtleties of DSL Includes; I will show both new functionality and abstraction leaks, to demonstrate that the problem of analyzing "costs and gains" is, on the whole, covered.
More DSL Includes
Include<Point> include = chain => chain.Include(e=>e.X).Include(e=>e.Y);
This differs from EF Core Includes, which is built on static functions that cannot be assigned to variables and passed as parameters. DSL Includes itself was born from the need to pass an "include" into my implementation of the Repository pattern without degrading the type information, which would happen if includes were translated into strings in the standard way.
The most fundamental difference is still the purpose. EF Core Includes enables navigation properties (branch nodes); DSL Includes is a record of a computation-tree traversal, assigning a name (a path) to the result of each computation.
The internal representation of EF Core Includes is a list of strings obtained via MemberExpression.Member (the expression in e => e.User.Name can only be a MemberExpression, and only the string "Name" is saved in the internal representation).
In DSL Includes the internal representation — the ChainNode and ChainMemberNode classes — stores the whole expression (e.g. e => e.User.Name), which can be embedded into an Expression Tree as is. From this it follows that DSL Includes supports fields, custom value types, and function calls:
Execution of functions:
Include<User> include = chain => chain
.Include(i => i.UserName)
.Include(i => i.Email.ToUpper(),"EAddress");
What to do with it depends on the interpreter. ComposeFormatter will output {"UserName": "John", "EAddress": "JOHN@MAIL.COM"}.
Function execution can also be useful for specifying a traversal of nullable structs:
Include<StrangePointF> include
= chain => chain
.Include(e => e.NextPoint) // NextPoint is nullable struct
.ThenIncluding(e => e.Value.X)
.ThenInclude(e => e.Value.Y);
// but not this way (abstraction leak)
// Include<StrangePointF> include
//     = chain => chain
//         // now this can throw an exception
//         .Include(e => e.NextPoint.Value)
//         .ThenIncluding(e => e.X)
//         .ThenInclude(e => e.Y);
DSL Includes also has a short form for multi-level traversal: ThenIncluding.
Include<User> include = chain => chain
.Include(i => i.UserName)
.IncludeAll(i => i.Groups)
// ING-form - doesn't change current node
.ThenIncluding(e => e.GroupName) // leaf
.ThenIncluding(e => e.GroupDescription) // leaf
.ThenInclude(e => e.AdGroup); // leaf
compare with
Include<User> include = chain => chain
.Include(i => i.UserName)
.IncludeAll(i => i.Groups)
.ThenInclude(e => e.GroupName)
.IncludeAll(i => i.Groups)
.ThenInclude(e => e.GroupDescription)
.IncludeAll(i => i.Groups)
.ThenInclude(e => e.AdGroup);
And here, too, there is an abstraction leak. Having written a similar form of navigation, I need to know how the interpreter that calls QueryableExtensions works: it translates the Include and ThenInclude calls into the "string" Include. This may matter (and must be borne in mind).
Algebra of include expressions.
Include expressions can be compared, cloned, and merged:
var b1 = IncludeExtensions.IsEqualTo(include1, include2);
var b2 = IncludeExtensions.IsSubTreeOf(include1, include2);
var b3 = IncludeExtensions.IsSuperTreeOf(include1, include2);
var include3 = IncludeExtensions.Clone(include1);
var include4 = IncludeExtensions.Merge(include1, include2);
IReadOnlyCollection<string> paths1 = IncludeExtensions.ListLeafXPaths(include); // as xpaths
IReadOnlyCollection<string[]> paths2 = IncludeExtensions.ListLeafKeyPaths(include); // as string[]
etc.
The good news is that there are no abstraction leaks here; a level of pure abstraction has been reached: there is metadata, and there is work with metadata.
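To show how such metadata algebra can be implemented, here is a simplified model of mine (a hypothetical Node class, not the library's actual ChainNode) with an analogue of ListLeafXPaths:

```csharp
using System;
using System.Collections.Generic;

// Simplified model of an includes tree: each node has a member name and children;
// a node with no children is a leaf.
public sealed class Node
{
    public string Name;
    public Dictionary<string, Node> Children = new Dictionary<string, Node>();

    public Node Add(string name)
    {
        var child = new Node { Name = name };
        Children[name] = child;
        return child;
    }
}

public static class IncludeAlgebra
{
    // Analogue of ListLeafXPaths: collect a "/a/b/c" path for every leaf.
    public static List<string> ListLeafXPaths(Node root)
    {
        var result = new List<string>();
        void Visit(Node node, string path)
        {
            if (node.Children.Count == 0)
                result.Add(path);
            else
                foreach (var child in node.Children.Values)
                    Visit(child, path + "/" + child.Name);
        }
        Visit(root, "");
        return result;
    }
}
```

For a tree UserId, Groups/GroupId this yields the paths "/UserId" and "/Groups/GroupId"; comparison and merge operations are walks over the same structure.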
Dialectics
DSL Includes lets you reach a new level of abstraction, but at the moment you reach it, a need forms to move to the next level: generating the include expressions themselves.
In this case there is no need to generate the DSL as fluent chains; you can simply create the internal-representation structures directly:
var root = new ChainNode(typeof(Point));
var child = new ChainPropertyNode(
    typeof(int),
    expression: typeof(Point).CreatePropertyLambda("X"),
    memberName: "X", isEnumerable: false, parent: root
);
root.Children.Add("X", child);
// or there is a number of extension methods, e.g.:
// var child = root.AddChild("X");
Include<Point> include = ChainNodeExtensions.ComposeInclude<Point>(root);
These structures can be passed to interpreters just the same. Why, then, write includes with the fluent DSL at all? A purely speculative question, and the answer is: because in practice the internal representation (which also evolves) only gets worked out together with the DSL — that is, together with a short, expressive notation convenient for static code. This will come up again closer to the conclusion.
Copy, Clone, Equals
All of the above is also true for interpreters of include expressions that implement iterators copy , clone , equals .
Equals. Comparison only by the leaves of the include expression. A hidden semantic problem: whether or not to take the order of lists into account.
Include<User> include = chain => chain.Include(e => e.UserId).IncludeAll(e => e.Groups).ThenInclude(e => e.GroupId);
bool b1 = ObjectExtensions.Equals(user1, user2, include);
bool b2 = ObjectExtensions.EqualsAll(userList1, userList2, include);
Clone. A pass over the nodes of the expression. Properties matching the rule are copied.
Include<User> include = chain => chain.Include(e => e.UserId).IncludeAll(e => e.Groups).ThenInclude(e => e.GroupId);
var newUser = ObjectExtensions.Clone(user1, include, leafRule1);
var newUserList = ObjectExtensions.CloneAll(userList1, leafRule1);
An interpreter could exist that picks the leaves out of the includes. Why is it done through a separate rule instead? To be similar to the semantics of ObjectExtensions.Copy.
Copy. A pass over the branch nodes of the expression, with identification by the leaf nodes. Properties matching the rule are copied (similar to Clone).
Include<User> include = chain => chain.IncludeAll(e => e.Groups);
ObjectExtensions.Copy(user1, user2, include, supportedLeafsRule);
ObjectExtensions.CopyAll(userList1, userList2, include, supportedLeafsRule);
An interpreter could exist that picks the leaves out of the includes. Why a separate rule? To match the declaration of ObjectExtensions.Copy (there the separation is forced: the include specifies how we identify, supportedLeafsRule — what we copy).
For copy / clone you should keep in mind:
- The impossibility of copying readonly properties — and these are the popular types Tuple<,> and anonymous types. A similar problem exists for cloning, from a slightly different angle.
- An abstract type (for example, an IEnumerable implemented by a private type): which public type should replace it?
- All expressions in the include expression that do not denote properties or fields will be discarded.
- "Copying into an array": it is unclear what that should mean.
The DSL author has to rely on the user being able to foresee such uncertain situations, which arise from the conflict between the semantics and the way the metadata is written — i.e. the user will assume they lead to an undefined result and will not count on the existing interpreters. By the way, serialization of anonymous types and Tuple<,> (i.e. types with readonly properties), or copying of ValueTuple<,> with writable fields, are not uncertain situations (and are implemented as you would expect).
The good news here is that it is quite simple to write your own interpreter of include expressions (without aspiring to compile Expression Trees). All the algebra for working with the Include DSL is already implemented.
It is possible to create interpreters Detach, FindDifferences, etc.
Why run-time, not .cs generated before compilation starts?
Being able to generate .cs is better than not being able to, but run-time generation has its advantages:
- We avoid the costly fuss with generated sources (directory settings, file names, source control).
- We avoid binding to a programming environment, plugins, event interceptions, scripting languages — everything that raises the entry threshold.
- The guaranteed absence of manual changes in the generated code makes updating easier.
- We avoid the "chicken and egg" problem. Dev-time code generation requires scheduling, otherwise you can get into the situation: "A" cannot be compiled because "B" is not yet generated, and "B" cannot be generated because "A" is not yet compiled.
The latter is solved by Roslyn, but that solution brings limitations and raises the entry threshold. However, if TypeScript bindings are needed (I wrote the DTO as a function, i.e. now this is a problem), you would need to pull the DSL Includes expressions out with Roslyn (the hard part) and write their interpreter in TypeScript (the simple part). Then, "for the company", a "perfect serializer" could also be written into .cs (and not into Expression Trees).
To summarize: run-time code generation is almost pure code generation, with a minimum of infrastructure. You just need to remember to avoid repeated re-creation of functions that can be reused (and to accept the verbose notation of Expression Trees).
Problems with the efficiency of the compiled functions of Expression Trees
When programming an Internal DSL using Expression Trees, you should keep in mind that:
- LambdaExpression.Compile compiles only the top-level lambda. A nested lambda expression still works, but slowly. You have to compile every lambda in the course of "gluing" the expression tree, passing into a CallExpression as parameters not LambdaExpressions but delegates (i.e. compiled LambdaExpressions) wrapped into ConstantExpression constants. This inflates the code considerably, since "pass a lambda/function along" is the most popular operation of the functional style that Expression Trees demand.
- Compilation happens into a dynamically generated anonymous assembly, and a method call passes a security check (10 nanoseconds in my tests; my assembly is not signed — perhaps with a signed one it takes longer). That is not much, of course, but if the code is heavily fragmented it can accumulate.
You could try to formulate an optimization strategy that accounts for these and other aspects of code generation (into an anonymous assembly); I cannot do that yet, because I do not have an exhaustive understanding of all the details. But there is a practical way out: I stopped at benchmarks that were sufficient for me. And, by the way, yes — generation into .cs would remove all the listed problems.
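A minimal sketch of the workaround described in the first point: pre-compile the inner lambda and embed the resulting delegate as a ConstantExpression, so that the composed outer function only invokes already-compiled code (the names here are illustrative):

```csharp
using System;
using System.Linq.Expressions;

public static class CompileDemo
{
    // Composes y => inner(y) + 1, where inner is a pre-compiled delegate embedded
    // as a constant, instead of a nested LambdaExpression that Compile() would
    // leave to run through the slow expression interpreter.
    public static Func<int, int> ComposeOuter()
    {
        Expression<Func<int, int>> innerExpr = x => x * 2;
        Func<int, int> inner = innerExpr.Compile();      // compiled once, here
        var innerConst = Expression.Constant(inner);     // delegate wrapped in a constant
        var y = Expression.Parameter(typeof(int), "y");
        var body = Expression.Add(
            Expression.Invoke(innerConst, y),            // calls the compiled delegate
            Expression.Constant(1));
        return Expression.Lambda<Func<int, int>>(body, y).Compile();
    }
}
```

ComposeOuter()(20) evaluates inner(20) + 1 = 41; the same pattern applies when gluing many property serializers together.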
Serialization benchmark
Data: an object containing an array of 600 records with 15 fields of simple types. The streaming JSON.NET and ServiceStack serializers need two reflection GetProperties() calls.
dslComposeFormatter — ComposeFormatter in first place; the rest of the details are here.
BenchmarkDotNet = v0.10.14, OS = Windows 10.0.17134
Intel Core i5-2500K CPU 3.30GHz (Sandy Bridge), 1 CPU, 4 logical and 4 physical cores.
.NET Core SDK = 2.1.300
| Method | Mean | Error | StdDev | Min | Max | Median | Allocated |
|---|---|---|---|---|---|---|---|
| dslComposeFormatter | 2.208 ms | 0.0093 ms | 0.0078 ms | 2.193 ms | 2.220 ms | 2.211 ms | 849.47 KB |
| JsonNet_Default | 2.902 ms | 0.0160 ms | 0.0150 ms | 2.883 ms | 2.934 ms | 2.899 ms | 658.63 KB |
| JsonNet_NullIgnore | 2.944 ms | 0.0089 ms | 0.0079 ms | 2.932 ms | 2.960 ms | 2.942 ms | 564.97 KB |
| JsonNet_DateFormatFF | 3.480 ms | 0.0121 ms | 0.0113 ms | 3.458 ms | 3.497 ms | 3.479 ms | 757.41 KB |
| JsonNet_DateFormatSS | 3.880 ms | 0.0139 ms | 0.0130 ms | 3.854 ms | 3.899 ms | 3.877 ms | 785.53 KB |
| ServiceStack_SerializeToString | 4.225 ms | 0.0120 ms | 0.0106 ms | 4.201 ms | 4.243 ms | 4.226 ms | 805.13 KB |
| fake_expressionManuallyConstruted | 54.396 ms | 0.1758 ms | 0.1644 ms | 54.104 ms | 54.629 ms | 54.383 ms | 7401.58 KB |
fake_expressionManuallyConstruted — an expression where only the top lambda is compiled (the price of that mistake).
Formalization
Code generation and DSLs are related as follows: to create an efficient DSL, code generation into the runtime's language is required; to create an efficient internal DSL, run-time code generation is required.
We use Expression Trees because they are the only way to have code generation while staying within .NET Standard.
On the other hand, using Expression Trees to parse expressions is not the attribute that distinguishes an Internal DSL from the whole Fluent API class. That attribute is the use of C# grammar to express the relationships of the problem domain.
Building representation structures can be done by simply executing fluent code chains (without any Expression Trees), while the most characteristic pattern for an Internal DSL in C# is to combine the execution of fluent chains, each link of which does a little parsing with Expression Trees.
Inside DSL Includes, Expression Trees play a very small role (obtaining node names when they are not specified manually); conversely, for creating an efficient run-time-compiled interpreter/serializer they are crucial (run-time compilation).
Creating an Internal DSL matters most for the subsequent creative process: the created library of property iterators serialize, copy, clone, equals is derivative with respect to the discovered way of writing down the iteration, which efficiently simplifies recording the "traversal". One might think it would be enough to invent only the "representation structures", but the creative process does not go that way. A convenient symbolic notation is necessary: the includes algebra is much more expressive (and therefore helps thinking) than the same operations written against structures (although direct use of the structures can also be justified, since it is efficient).
Conclusion
With the help of DSL Includes it became possible to write a DTO as what it actually is in a significant number of cases: a serialization function (to json). It was possible to reach a new level of abstraction without losing but gaining in performance, both in computation speed and in "less code" — yet at the price of increased applied complexity. Growth of abstraction = growth of abstraction leaks.
The DSL developer's response to this problem is to draw the user's attention to the semantics of the operations implemented by the DSL interpreters, to the need to know the structures of the Internal DSL's representation (in what form the Expression is stored), and to the importance of knowing the interpreter's internals (whether or not it uses Expression Tree compilation).
Both DSL Includes and the json serializer ComposeFormatter live in the DashboardCode.Routines library, available via nuget and GitHub.