New Mash programming language

For several years, I have been trying my hand at designing my own programming language. I wanted to create what is, in my opinion, the simplest, most full-featured and most convenient language possible.

In this article I want to highlight the main stages of my work and, to begin with, describe the concept of the language and its first implementation, which I am working on now.

I will say in advance that I wrote the whole project in Free Pascal, because programs written in it can be built for a huge number of platforms, and the compiler itself produces well-optimized binaries (I build all the components of the project with the -O2 flag).

Language runtime


The first thing to talk about is the virtual machine, which I had to write to run future applications in my language. I decided to implement a stack architecture, perhaps because it was the easiest way. I did not find a single decent article in Russian on how to do this, so after getting familiar with the English-language material I sat down to design and write my own reinvented wheel. Below I will present my "advanced" ideas and developments on this matter.

Stack implementation


Obviously, the stack is at the heart of the VM. In my implementation, it works in blocks. In essence, it is a simple array of pointers plus a variable that stores the index of the top of the stack.
When it is initialized, an array of 256 elements is created. If more pointers are pushed onto the stack, its size grows by another 256 elements. Accordingly, when items are removed from the stack, its size is adjusted.
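
The block-wise growth described above can be sketched roughly as follows. This is only an illustration in Python, not the author's Free Pascal code: the class name BlockStack, its methods, and in particular the rule for when a block is given back are assumptions, since the article only says that the size is adjusted.

```python
BLOCK = 256  # growth step taken from the text above


class BlockStack:
    """Stack backed by a preallocated array that grows and shrinks in blocks."""

    def __init__(self):
        self.cells = [None] * BLOCK  # storage for "pointers"
        self.top = 0                 # index of the next free cell

    def push(self, value):
        if self.top == len(self.cells):
            # Out of room: extend the array by one more block.
            self.cells.extend([None] * BLOCK)
        self.cells[self.top] = value
        self.top += 1

    def pop(self):
        if self.top == 0:
            raise IndexError("pop from empty stack")
        self.top -= 1
        value = self.cells[self.top]
        self.cells[self.top] = None
        # Assumed shrink rule: release one block once two whole blocks are free,
        # so that push/pop at a block boundary does not resize on every call.
        if len(self.cells) - self.top >= 2 * BLOCK:
            del self.cells[-BLOCK:]
        return value

    def capacity(self):
        return len(self.cells)
```

Pushing the 257th element makes the capacity jump from 256 to 512, and popping everything brings it back to 256.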

The VM uses several stacks:

  1. Main stack.
  2. Stack for storing return points.
  3. Garbage collector stack.
  4. Stack of try / catch / finally block handlers.

Constants and variables


It's all simple. Constants are processed by a separate small piece of code and are later available to applications at static addresses. Variables are an array of pointers of a certain size; its cells are accessed by index, i.e. by static address. A variable can be pushed onto the top of the stack or read from there. Since variables essentially store pointers to values in the VM memory, working with implicit pointers prevails in the language.

Garbage collector


In my VM it is semi-automatic. That is, the developer decides when to call the garbage collector. It does not rely on the usual reference counting, as in Python or Perl, for example. It is implemented through a system of markers. That is, when a variable is expected to be assigned a temporary value, a pointer to this value is added to the garbage collector's stack. Later, the collector quickly walks through the already prepared list of pointers.
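
A minimal model of the marker approach might look like this. It is a Python sketch under assumptions: the Heap class, the temporary flag and the integer "addresses" are invented for illustration and do not come from the Mash sources.

```python
class Heap:
    """Toy VM memory with a separate stack of values marked as temporary."""

    def __init__(self):
        self.objects = {}   # address -> value, stands in for VM memory
        self.next_addr = 0
        self.gc_stack = []  # addresses the collector will free later

    def alloc(self, value, temporary=False):
        addr = self.next_addr
        self.next_addr += 1
        self.objects[addr] = value
        if temporary:
            # Mark the value right away so the collector never has to search.
            self.gc_stack.append(addr)
        return addr

    def collect(self):
        """Free every marked value; return how many cells were released."""
        freed = 0
        while self.gc_stack:
            addr = self.gc_stack.pop()
            if addr in self.objects:
                del self.objects[addr]
                freed += 1
        return freed
```

The point of the design is that collect() touches only the prepared list, not the whole heap, and that nothing is freed until the program asks for it.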

Handling try / catch / finally blocks


As in any modern language, exception handling is an important component. The VM core is wrapped in a try..catch block, which can return to code execution after catching an exception, putting some information about it on the stack. In application code, you can define try / catch / finally blocks by specifying the entry points of catch (the exception handler) and finally / end (the end of the block).
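
The idea of a handler stack with entry points can be illustrated with a toy interpreter. Everything here is an assumption made for the sketch: the instruction names (push, div, print, try, endtry, jump) are invented and are not Mash bytecode, and the example is in Python rather than Free Pascal.

```python
def run(program):
    """Execute a list of (op, arg) tuples and return the printed lines."""
    out, stack, handlers = [], [], []
    pc = 0
    while pc < len(program):
        op, arg = program[pc]
        pc += 1
        try:
            if op == "push":
                stack.append(arg)
            elif op == "div":
                b, a = stack.pop(), stack.pop()
                stack.append(a / b)
            elif op == "print":
                out.append(str(stack.pop()))
            elif op == "try":
                handlers.append(arg)  # arg is the address of the catch block
            elif op == "endtry":
                handlers.pop()        # protected block finished normally
            elif op == "jump":
                pc = arg
        except Exception as exc:
            if not handlers:
                raise                 # no handler installed: propagate
            pc = handlers.pop()       # resume at the catch entry point
            stack.append(type(exc).__name__)  # error info for the handler
    return out


PROGRAM = [
    ("try", 6),        # 0: install a handler whose catch block starts at 6
    ("push", 10),      # 1
    ("push", 0),       # 2
    ("div", None),     # 3: raises, control moves to address 6
    ("endtry", None),  # 4: skipped because of the error
    ("jump", 7),       # 5: skipped because of the error
    ("print", None),   # 6: catch block prints the error info
    ("push", "done"),  # 7
    ("print", None),   # 8
]
```

Calling run(PROGRAM) returns ["ZeroDivisionError", "done"]: the interpreter loop itself survives the error and continues from the handler's entry point.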

Multithreading


It is supported at the VM level. It is easy and convenient to use. It works without an interrupt system, so code running in several threads should execute correspondingly several times faster.

External libraries for the VM


You cannot do without this. The VM supports imports, similar to how they are implemented in other languages. You can write part of the code in Mash and part in native languages, and then link them together.

Translator from the high-level Mash language to bytecode for the VM


Intermediate language


To quickly write a translator from a complex language to VM code, I first developed an intermediate language. The result is a scary assembler-like sight that is not worth examining in detail here. I will only say that at this level the translator processes most constants and variables and calculates their static addresses and the addresses of entry points.

Translator architecture


I did not choose the best architecture for the implementation. The translator does not build a code tree, as other translators properly do. It looks at the beginning of a construct. That is, if the piece of code being parsed has the form "while <condition>:", then it is obviously a while loop and should be processed as a while loop. It is something like a complex switch-case.
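
Dispatching on the beginning of a construct, without building a tree, could look something like the sketch below. It is written in Python for illustration only; the function classify and the BLOCK_OPENERS set are hypothetical names, and the real translator certainly handles far more cases.

```python
# Keywords that open a block in the Mash examples shown in this article.
BLOCK_OPENERS = {"while", "for", "until", "if", "switch",
                 "try", "proc", "func", "class"}


def classify(line):
    """Decide what a source line is by looking only at how it starts."""
    text = line.strip()
    if not text:
        return "empty"
    head = text.split()[0].rstrip(":")  # first word, without a trailing colon
    if head == "end":
        return "end"
    if head in BLOCK_OPENERS and text.endswith(":"):
        return head                     # e.g. "while x < 10:" -> "while"
    return "statement"                  # anything else is an ordinary statement
```

A translator built this way is essentially one big switch over the result of classify, which is why adding a new construct is cheap: it is one more branch rather than a change to a grammar.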

Because of this architectural decision, the translator turned out to be not very fast. However, it became significantly easier to extend. I added the necessary constructs faster than my coffee could cool. Full OOP support was implemented in less than a week.

Code optimization


Here, of course, things could have been done better (and they will be, but later, when I get around to it). So far, the optimizer only knows how to cut unused code, constants, and imports from the build. Also, several constants with the same value are replaced by one. That's all.
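
The two constant-related steps, dropping unused constants and merging equal ones, can be sketched as follows. This is a Python illustration under assumptions: the function name, the (name, value) representation and the alias table are invented here and say nothing about how the Mash optimizer stores its data.

```python
def optimize_constants(constants, used):
    """Drop unused constants and merge constants that hold the same value.

    constants: list of (name, value) pairs in declaration order.
    used: set of constant names actually referenced by the code.
    Returns (kept, alias): the surviving constants and a map from every
    used name to the name that now represents its value.
    """
    kept = []
    first_by_value = {}
    alias = {}
    for name, value in constants:
        if name not in used:
            continue                       # unused constant is cut
        key = (type(value), value)         # keep 1 and 1.0 distinct
        if key in first_by_value:
            alias[name] = first_by_value[key]  # duplicate: reuse the first one
        else:
            first_by_value[key] = name
            alias[name] = name
            kept.append((name, value))
    return kept, alias
```

After this pass, every reference to a merged constant would be rewritten through alias, so the constant table holds each distinct value only once.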

The Mash language


Basic language concept


The main idea was to develop the most functional and simple language possible. I believe the result handles this task with flying colors.

Blocks of code, procedures and functions


All constructs in the language are opened with a colon (:) and closed with the end operator.

Procedures and functions are declared as proc and func, respectively. Arguments are listed in parentheses, just like in most other languages.

The return statement returns a value from a function; the break statement exits a procedure / function (if it is used outside of loops).

Code example:

...
func summ(a, b):
  return a + b
end
proc main():
  println(summ(inputln(), inputln()))
end

Supported constructs


  • Loops: for..end, while..end, until..end
  • Conditions: if .. [else ..] end, switch .. [case..end ..] [else ..] end
  • Methods: proc <name> (): ... end, func <name> (): ... end
  • Labels and goto: <name>:, jump <name>
  • Enumerations and constant arrays.

Variables


The translator can detect them automatically, or the developer can write var before defining them.

Code examples:

a ?= 10
b ?= a + 20

var a = 10, b = a + 20

Global and local variables are supported.

OOP


Well, we have reached the most delicious topic. Mash supports all the object-oriented programming paradigms, that is, classes, inheritance, polymorphism (including dynamic), and dynamic automatic reflection and introspection (complete).

Without further ado, I'd rather just give code examples.

A simple class and working with it:

uses <bf>
uses <crt>
class MyClass:
  var a, b
  proc Create, Free
  func Summ
end
proc MyClass::Create(a, b):
  $a = new(a)
  $b = new(b)
end
proc MyClass::Free():
  Free($a, $b)
  $rem()
end
func MyClass::Summ():
  return $a + $b
end
proc main():
  x ?= new MyClass(10, 20)
  println(x->Summ())
  x->Free()
end

Displays: 30.

Inheritance and polymorphism:

uses <bf>
uses <crt>
class MyClass:
  var a, b
  proc Create, Free
  func Summ
end
proc MyClass::Create(a, b):
  $a = new(a)
  $b = new(b)
end
proc MyClass::Free():
  Free($a, $b)
  $rem()
end
func MyClass::Summ():
  return $a + $b
end
class MyNewClass(MyClass):
  func Summ
end
func MyNewClass::Summ():
  return ($a + $b) * 2
end
proc main():
  x ?= new MyNewClass(10, 20)
  println(x->Summ())
  x->Free()
end

Displays: 60.

What about dynamic polymorphism? Yes, this is reflection!

uses <bf>
uses <crt>
class MyClass:
  var a, b
  proc Create, Free
  func Summ
end
proc MyClass::Create(a, b):
  $a = new(a)
  $b = new(b)
end
proc MyClass::Free():
  Free($a, $b)
  $rem()
end
func MyClass::Summ():
  return $a + $b
end
class MyNewClass(MyClass):
  func Summ
end
func MyNewClass::Summ():
  return ($a + $b) * 2
end
proc main():
  x ?= new MyClass(10, 20)
  x->Summ ?= MyNewClass::Summ
  println(x->Summ())
  x->Free()
end

Displays: 60.

Now let's take a moment for introspection of simple values and classes:

uses <bf>
uses <crt>
class MyClass:
  var a, b
end
proc main():
  x ?= new MyClass
  println(BoolToStr(x->type == MyClass))
  x->rem()
  println(BoolToStr(typeof(3.14) == typeReal))
end

Displays: true, true.

About assignment operators and explicit pointers


The ?= operator assigns a variable a pointer to a value in memory.
The = operator changes the value in memory through the pointer held by the variable.
And now a little about explicit pointers. I added them to the language just so that it has them.
@<variable> - take an explicit pointer to a variable.
?<variable> - get the variable by a pointer.
@= - assign a value to a variable through an explicit pointer to it.

Code example:

uses <bf>
uses <crt>
proc main():
  var a = 10, b
  b ?= @a
  PrintLn(b)
  b ?= ?b
  PrintLn(b)
  b++
  PrintLn(a)
  InputLn()
end

Displays: some number, 10, 11.
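
The difference between ?= and = described above can be modelled outside Mash as well. The following Python sketch is only an analogy built on an assumption about the semantics: a variable is treated as a reference to a cell, ?= as rebinding that reference, and = as writing into the cell. The Cell class is hypothetical.

```python
class Cell:
    """A value living in VM memory; variables hold references to cells."""

    def __init__(self, value):
        self.value = value


a = Cell(10)            # like "var a = 10"
b = a                   # like "b ?= a": copy the pointer, share the cell
b.value = 25            # like "b = 25": overwrite the value in the shared cell
shared = a.value        # a sees the change because both names share one cell

b = Cell(7)             # like "b ?= 7": rebind b to a new cell
after_rebind = a.value  # a is untouched by the rebinding
```

In this model shared and after_rebind are both 25, while b.value is 7, which mirrors why b++ in the listing above changes a after b has been pointed back at it.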

Try .. [catch ..] [finally ..] end


Code example:

uses <bf>
uses <crt>
proc main():
  println("Start")
  try:
    println("Trying to do something...")
    a ?= 10 / 0
  catch:
    println(getError())
  finally:
    println("Finally")
  end
  println("End")
  inputln()
end

Future plans


I keep looking at GraalVM & Truffle. My runtime does not have a JIT compiler, so in terms of performance it can, for the time being, compete only with Python. I hope that I will be able to implement JIT compilation based on GraalVM or LLVM.

Repository


You can play with what has been built so far and follow the project yourself.

Site
Repository on GitHub

Thank you for reading to the end, if you did.
