Developing applications with DSLs and code generation

What are we talking about

In this post I want to think out loud, in fairly abstract terms, about application development. At first I planned to write only about code generation, but as I thought the topic over, other ideas came up that I also want to share. So the post turned out somewhat broader than just DSLs.

What is a DSL (Domain-Specific Language), and what is code generation

A DSL is a domain-specific language, i.e. a language that operates directly on the concepts of a given subject area. It is usually contrasted with general-purpose languages. In principle, nothing prevents such a language from being a purely formal syntax that no computer can interpret, but a language like that is of little use. A computer language normally implies some kind of processing, so it would be nice to have something that executes the DSL. There are two standard approaches: interpretation and compilation. Interpretation is more or less clear; compilation needs a word of explanation. You could, of course, translate the DSL straight into processor instructions or, at worst, into assembler, but why bother when you can "write" ordinary code instead? That is, you compile the DSL into the text of a high-level language, and that language's compiler then turns the text into something the computer can run. This is why people often say “code generation” rather than compilation, although the latter term is also correct and in use.
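To make the difference concrete, here is a minimal sketch in Java. The two-instruction language ("add N", "mul N") and the TinyDsl class are invented purely for illustration. The same program is first interpreted and then translated into Java source text.

```java
import java.util.List;

/** Hypothetical two-instruction DSL ("add N", "mul N"), handled in two ways. */
final class TinyDsl {

    /** Interpretation: walk the DSL program and execute each instruction immediately. */
    static int interpret(List<String> program, int start) {
        int value = start;
        for (String line : program) {
            String[] parts = line.trim().split("\\s+");
            int arg = Integer.parseInt(parts[1]);
            switch (parts[0]) {
                case "add": value += arg; break;
                case "mul": value *= arg; break;
                default: throw new IllegalArgumentException("Unknown instruction: " + line);
            }
        }
        return value;
    }

    /** Code generation: translate the same program into Java source text. */
    static String generate(List<String> program) {
        StringBuilder src = new StringBuilder("static int run(int value) {\n");
        for (String line : program) {
            String[] parts = line.trim().split("\\s+");
            int arg = Integer.parseInt(parts[1]);
            switch (parts[0]) {
                case "add": src.append("    value += ").append(arg).append(";\n"); break;
                case "mul": src.append("    value *= ").append(arg).append(";\n"); break;
                default: throw new IllegalArgumentException("Unknown instruction: " + line);
            }
        }
        return src.append("    return value;\n}\n").toString();
    }

    public static void main(String[] args) {
        List<String> program = List.of("add 5", "mul 3");
        System.out.println(interpret(program, 1)); // (1 + 5) * 3 = 18
        System.out.print(generate(program));       // Java text, to be compiled by javac later
    }
}
```

The interpreter produces a result right away. The generator produces text that still has to go through the ordinary Java compiler before anything runs.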

Labor productivity

If we take application development, I consider the main problem to be low productivity, i.e. the “amount of product” per unit of effort. In principle, every industry has a similar problem, and there are both general and industry-specific ways to address it. We have plenty of things meant to raise productivity: high-level languages, powerful IDEs, continuous integration tools, Scrum, Kanban, coffee points, coffee ladies and much more. Nevertheless, product development takes a lot of time. This is especially noticeable when what needs to be done can be described in words in a few minutes, yet doing it takes weeks. There is a significant gap between the “what” and the “how”. “What to do” is simple and clear; “how to do it” is simple and clear too, but takes a long time. I want the “how” to be done quickly, or ideally not at all.

Levels of abstraction

There is a very useful concept: the level of abstraction. It helps structure applications. Say we have an application for a certain subject area. On one side (above) are the concepts of that subject area, which will appear in the application in some form. On the other side (below) is a general-purpose programming language with its bytes, types, methods and similar elements that have nothing in common with the subject area (we will not go further down to the operating system, electrical impulses, transistors, molecules, atoms, protons, quarks ...). The programmer's job is precisely to connect these two layers, that is, to fill the area in the picture (left picture). If the application is large and the subject area is “far” enough from the language, various intermediate levels of abstraction arise in the application.

The levels do arise, but only logically. It takes some effort to make the code respect them as well. This is especially difficult when everything is written in one language and runs in one process. After all, nothing prevents a method on level 1 from calling straight into level 3, and functions or classes are usually not labeled with their level of abstraction. What does a DSL with code generation offer here? We still need to fill the same area, but now the upper part is filled with our language and the lower part with the generated code.

Unlike the previous example, the boundary between levels here is impenetrable: DSL instructions cannot be called from the generated code (especially since they do not exist there). We will not consider cases where the generator emits code in the same DSL ... Another important point is that the generated code can be treated like compiled output: it is created automatically and there is no need to look inside it, provided that the generator has already been written (and well tested). In other words, writing a language and a generator for it can significantly shrink the area that the application itself has to fill. This is especially valuable when developing several applications in the same domain or when constantly changing one.

Managing complexity

Let's imagine a situation that, it seems to me, is quite common. Suppose you get an order to develop some system. You are handed an ideal specification and you come up with an ideal system architecture where everything is fine: components, interfaces, encapsulation and many other equally beautiful patterns. Take a concrete example, an online bike store. You wrote the store according to the specification and everyone is happy. The store thrives and decides to expand its business, namely to start selling scooters and motorcycles as well. So they come to you and ask you to modify the store. You had a beautiful architecture tailored to bicycles, and now it has to be stretched. On the one hand, scooters and motorcycles are similar to bicycles: all of them have spare parts, accessories and related products. On the other hand, there are differences.
The system as a whole stays the same, but some functions must support the new types of objects, or separate functions must appear for them.
The subject domain has become more complex: instead of just bicycles, you now need to support bicycles, scooters and motorcycles. Our system has to become more complex as well. I think that, in general, the complexity of a software system corresponds to the complexity of the system it models. There is a lowest possible level of complexity at which the problem can still be solved. (There is no upper limit: you can come up with an infinitely complex solution to any problem.) I believe we should aim for that minimum, since of all possible solutions the simplest is the best. In short, the code should be simple.
Back to our online store. Suppose there is a function written for a bicycle. Now it has to work for the new types.

public void process(Bicycle b) { /* genericCode, then Bicycle-specific code */ }

To handle motorcycles, the function needs specificForMotobike code inside. What are the options?

Copy / paste

public void process(Motobike b) { /* genericCode, then specificForMotobike code */ }
We copy the method, replace the type-specific code, and that's it. Simple, but there is a problem: if genericCode needs to change, the same change has to be made in several places, which costs time and invites errors ...
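A fuller sketch of the copy/paste variant might look like this. The class name CopyPasteShop and the concrete step names ("validate order", "check frame size" and so on) are invented for illustration; the steps are collected in a list only so that the result is easy to inspect.

```java
import java.util.ArrayList;
import java.util.List;

final class CopyPasteShop {
    static final class Bicycle { }
    static final class Motobike { }

    static List<String> process(Bicycle b) {
        List<String> steps = new ArrayList<>();
        steps.add("validate order");      // genericCode, first copy
        steps.add("check frame size");    // Bicycle-specific step (made up)
        steps.add("reserve stock");       // genericCode, first copy
        return steps;
    }

    static List<String> process(Motobike m) {
        List<String> steps = new ArrayList<>();
        steps.add("validate order");      // genericCode, second copy: must be edited in sync
        steps.add("check engine volume"); // Motobike-specific step (made up)
        steps.add("reserve stock");       // genericCode, second copy
        return steps;
    }

    public static void main(String[] args) {
        System.out.println(process(new Bicycle()));
        System.out.println(process(new Motobike()));
    }
}
```

Every change to the generic steps now has to be repeated in each overload by hand.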

If / else

public void process(Object b) {
    if (b instanceof Bicycle) { /* Bicycle-specific code */ }
    else if (b instanceof Motobike) { /* specificForMotobike code */ }
}

Add the conditions and you're done. This is a little better than copy / paste, but again there is a problem: tomorrow they will want to sell ATVs, and you will have to hunt for such pieces throughout the code and add another else to each.
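As a rough sketch, the if / else variant could be written as follows. IfElseShop and the step names are again made up for illustration. The generic steps are now written once, but every new type needs a new branch in each such chain.

```java
import java.util.ArrayList;
import java.util.List;

final class IfElseShop {
    static final class Bicycle { }
    static final class Motobike { }

    static List<String> process(Object b) {
        List<String> steps = new ArrayList<>();
        steps.add("validate order");              // genericCode, written once
        if (b instanceof Bicycle) {
            steps.add("check frame size");        // Bicycle-specific step (made up)
        } else if (b instanceof Motobike) {
            steps.add("check engine volume");     // Motobike-specific step (made up)
        } else {
            // A new type (say, an ATV) ends up here until every such chain is extended.
            throw new IllegalArgumentException("Unsupported type: " + b.getClass().getSimpleName());
        }
        steps.add("reserve stock");               // genericCode, written once
        return steps;
    }

    public static void main(String[] args) {
        System.out.println(process(new Bicycle()));
        System.out.println(process(new Motobike()));
    }
}
```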

Abstract method

abstract void specific();
public void process(Vehicle b) { /* genericCode, then a call to specific() */ }

At the type-specific point, an abstract method is called, and each type provides its own implementation. In principle, this may turn out to be an acceptable option, or it can significantly complicate the system. Multi-level inheritance hierarchies with a bunch of overridden methods, where it is hard to figure out which particular method gets called, are a common situation.
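Here is one way this could look in full. This sketch assumes that specific() is declared on a common Vehicle base class and returns the name of the type-specific step; the original fragment does not say where the method lives, so treat that placement, the AbstractMethodShop wrapper and the step names as my assumptions.

```java
import java.util.ArrayList;
import java.util.List;

final class AbstractMethodShop {
    /** Assumed common base class; each type supplies its own specific step. */
    static abstract class Vehicle {
        abstract String specific();
    }

    static final class Bicycle extends Vehicle {
        @Override String specific() { return "check frame size"; }    // made-up step
    }

    static final class Motobike extends Vehicle {
        @Override String specific() { return "check engine volume"; } // made-up step
    }

    static List<String> process(Vehicle v) {
        List<String> steps = new ArrayList<>();
        steps.add("validate order"); // genericCode, written once
        steps.add(v.specific());     // dispatched to the concrete type at run time
        steps.add("reserve stock");  // genericCode, written once
        return steps;
    }

    public static void main(String[] args) {
        System.out.println(process(new Bicycle()));
        System.out.println(process(new Motobike()));
    }
}
```

With two types this reads well. The trouble described above starts when the hierarchy grows several levels deep and specific() is overridden at more than one of them.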

DSL and code generation

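The DSL is designed so that all the features of each type can be described in it. The code generator contains templates that are applied to each type description, and the resulting code looks just like the copy / paste variant, only it is produced automatically.
public void process("TYPE" b) { /* genericCode, then code specific to "TYPE" */ }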

Next, for each type described in the DSL, the template is expanded into concrete code. In my experience it is hard to design, on the first try, a language that supports new entities without any changes, but the required changes to the language and the generator are usually small and simple. The general approach is this: generate a lot of simple code that is easy to read and understand. It does not matter that there are many files and that they can be several thousand lines long; after all, nobody has to write them by hand. A description in the DSL might look like this:

type Bicycle:
     property A, ( description, value, links ...)
type Motobike:
     property B,
     property C,
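To show how a template and type descriptions come together, here is a minimal generator sketch in Java. It assumes the DSL text has already been parsed into simple TypeDescription objects. The class names, the %TYPE% and %SPECIFIC% placeholders and the generated handleX(b) calls are all invented for illustration; a real generator would be richer.

```java
import java.util.List;

final class ProcessGenerator {
    /** In-memory form of one "type ...:" entry from the DSL (parsing is omitted). */
    static final class TypeDescription {
        final String name;
        final List<String> properties;

        TypeDescription(String name, List<String> properties) {
            this.name = name;
            this.properties = properties;
        }
    }

    // Template for one generated method; the placeholders are hypothetical.
    static final String TEMPLATE =
          "public void process(%TYPE% b) {\n"
        + "    genericCode();\n"
        + "%SPECIFIC%"
        + "}\n";

    /** Expands the template for a single type description. */
    static String generate(TypeDescription type) {
        StringBuilder specific = new StringBuilder();
        for (String property : type.properties) {
            // One made-up handler call per declared property.
            specific.append("    handle").append(property).append("(b);\n");
        }
        return TEMPLATE
            .replace("%TYPE%", type.name)
            .replace("%SPECIFIC%", specific.toString());
    }

    public static void main(String[] args) {
        List<TypeDescription> model = List.of(
            new TypeDescription("Bicycle", List.of("A")),
            new TypeDescription("Motobike", List.of("B", "C")));
        for (TypeDescription type : model) {
            System.out.print(generate(type));
        }
    }
}
```

The generic part lives in exactly one place, the template, while the output is as flat and readable as hand-made copies. Supporting ATVs then means adding one more description to the model, not touching the template.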

DSL first, or a formalized specification

Here I come to the most important thing (everything before this was the introduction :). What is the usual process of starting a project? Specifications are written, diagrams are drawn, the architecture and the stages of the project are worked out. When all that is done, people start writing code. Specifications are free-form documents. Why not formalize them? My main idea is to begin by developing a language that describes the system in terms of the subject domain. It will be partly a description of the architecture and partly a formalized specification. The customer will understand this language, since it operates directly with the terms of the subject area, and will be able to take part in developing the system. The idea, of course, is not mine. In the literature, this approach is called Domain-Driven Design (DDD). I only claim that the DDD approach works well with DSLs and code generation.
Formalization means that automatic processing becomes possible. You can add various checks for consistency and for contradictions. In addition, the system developers get a ready-made formal declaration of what the system should be. What remains is to write the converter that turns it into a working system, i.e. those same code generators.
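As an illustration of such an automatic check, here is a small sketch. It assumes the specification has been parsed into a map from each declared type to the names of the types it links to. The SpecValidator class, that representation, and the Helmet and Trailer types are invented for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class SpecValidator {
    /** Reports every link that points to a type the specification never declares. */
    static List<String> validate(Map<String, List<String>> linksByType) {
        List<String> problems = new ArrayList<>();
        for (Map.Entry<String, List<String>> entry : linksByType.entrySet()) {
            for (String target : entry.getValue()) {
                if (!linksByType.containsKey(target)) {
                    problems.add(entry.getKey() + " links to undeclared type " + target);
                }
            }
        }
        return problems;
    }

    public static void main(String[] args) {
        Map<String, List<String>> spec = new TreeMap<>();
        spec.put("Bicycle", List.of("Helmet"));
        spec.put("Helmet", List.of());
        spec.put("Motobike", List.of("Trailer")); // Trailer is never declared
        validate(spec).forEach(System.out::println);
    }
}
```

A free-form document cannot be checked like this. A formal description can be, on every change, before any code is generated.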

Not everything is so smooth

Of course, not everything is so simple and smooth. Like any other approach, this one has its problems and disadvantages.
  • It is not always clear what to generate. You have to picture the final system. Not all of the code is generated, so you need to understand what will be generated, what will be written by hand, and how the two will work together. Sometimes it is easier to write everything by hand first (keeping future generation in mind) and then move part of the code into templates and generators.
  • The second problem is the balance between generated and manual code. It makes no sense to put code into a template if it is not actually parameterized and always stays the same. It is bad practice to use the approaches from the examples above simultaneously.
  • Dependencies between manual and generated code. Manual code should not break when the DSL text (the description written in the DSL) changes.
  • "Damage" to the brain from code generation. Writing code generators is somewhat different from writing regular programs, and carrying the “wrong” style over leads to “not very good” code. Code review and "healthy" colleagues help here.
  • Another thing I have run into: it is hard to convince the customer that the approach is right. The reaction is along the lines of: we got by somehow before and we will manage fine from here on, and here you are with your ideas. And anyway, where is the scooter support you were supposed to deliver yesterday? Get to work.
  • Have you ever seen job postings for DSL developers? Probably it is like becoming a Haskell programmer: you get hired as a Java (C++, Perl, Python, etc.) programmer, do something cool with Haskell or a DSL, and now you are a DSL developer.

Tools for developing DSL and writing code generators

Everything I wrote above would have little practical meaning without decent development tools. Fortunately, such tools exist. There are several of them, but my choice is Eclipse Xtext. The most important thing Xtext offers is integration with the Eclipse IDE, with all the standard features: syntax highlighting, errors and warnings, content assist, quick fixes. All of this comes "out of the box." Beyond that, the only limit is your imagination. I think I will write a few more practical posts on the topic if there is interest.


I don't think I have discovered America here. Much of what I wrote is commonplace. On the other hand, I think the topic of DSLs and code generation is not covered well enough, so I decided to try my hand at spreading the word. Few people have heard of Eclipse Xtext, and fewer still use it.
