My compiler for Lisp
- Transfer
I am very pleased to announce the completion of my first compiler for a programming language! Malcc is an Lisp incremental AOT compiler written in C. I
will briefly talk about its many years of development and what I learned in the process. Alternative article title: "How to write a compiler in ten years or less."
(In the end there is TL; DR , if you don't care about the background).
For almost ten years, I dreamed of writing a compiler. I have always been fascinated by the work of programming languages, especially compilers. Although I imagined the compiler as dark magic and understood that it was impossible for a mere mortal like me to make it from scratch.
But I still tried and studied along the way!
In 2011, I started work on a simple interpreter for the fictional language Airball (airball can be translated as “muff"). By name you can evaluate the degree of my uncertainty that it will work. It was a fairly simple Ruby program that parsed code and walked through an abstract syntax tree (AST). When the interpreter still worked, I renamed it to Lydia and rewrote it to C to make it faster.
I remember Lydia's syntax seemed very smart to me! I still enjoy its simplicity.
Although Lydia was far from a perfect compiler, it inspired me to continue experimenting. However, I was still tormented by questions, how to make the compiler work: what to compile into? do i need to learn assembler?
As a next step, in 2014, I started work on scheme-vm , a virtual machine for Scheme written in Ruby. I thought that a virtual machine with its own stack and bytecode would be a transitional stage from an interpreter with AST passes and a full-fledged compiler. And since Scheme is formally defined , there is no need to invent anything.
I have been messing around with scheme-vm for over three years and have learned a lot about compilation. In the end, I realized that I could not finish this project. The code turned into real chaos, but there was no end in sight. Without a mentor or experience, I seemed to wander in the dark. As it turned out, the language specification is not the same as the manual for it. Lesson learned!
By the end of 2017, I put off scheme-vm in search of something better.
Sometime in 2018, I came across Mal , a Clojure-style Lisp interpreter.
Mal was invented by Joel Martin as a training tool. Since then, more than 75 implementations in different languages have been developed! When I looked at these implementations, I realized that they help a lot: if I'm stuck, then I can go look for tips in the Ruby or Python version. Finally, at least someone speaks my language!
I also thought that if I could write an interpreter for Mal, I could repeat the same steps - and make a compiler for Mal.
First, I started developing the interpreter according to the walkthrough . At that time, I was actively studying Rust (I'll leave it for another article), so I wrote my own implementation of Mal in Rust: mal-rust . For more information about this experiment, see. Here .
It was a perfect pleasure! I don’t know how to thank or praise Joel for creating an excellent guide to Mal. Each step is described in detail , there are flowcharts, pseudo-code and tests ! All that a developer needs to create a programming language from start to finish.
Towards the end of the tutorial, I managed to run my Mal implementation for Mal, written in Mal, on top of my Rust implementation. (two levels of depth, wow). When she worked for the first time, I jumped on a chair with excitement!
As soon as I proved the viability of mal-rust, I immediately began to research how to write a compiler. Compile to assembler? Can I compile machine code directly?
I saw the x86 assembler written in Ruby. He intrigued me, but the thought of working with assembler made me stop.
At one point, I stumbled upon this comment on Hacker News , which referred to the Tiny C Compiler as a “compilation backend”. It seemed like a great idea!
TinyCC has a test file showing how to use libtcc to compile C code from C program. This is the starting point for “hello world”.
Returning again to the walkthrough of Mal, recalling my knowledge of C, in a couple of months of free evenings and weekends, I was able to write the Mal compiler. It was a real pleasure. If you are used to developing through testing, then evaluate the availability of a preliminary set of tests. Tests lead to a working implementation. I can’t say much about this process, unless I repeat: the Mal manual is a real treasure. At every step, I knew exactly what to do!
Looking back, here are some difficulties when writing the Mal compiler, where I had to tinker:
I like that it took almost a decade for the compiler. No really. Each step on the path is a pleasant memory of how I gradually became an ever better programmer.
But this does not mean that I "finished". There are still hundreds of methods and tools that you need to learn to feel like a true compiler author. But I can confidently say: "I did it."
Here is the whole process in a concise form, how to make your own Lisp compiler:
I believe that this method can be used with any programming language that compiles into an executable file. For example, you can:
Ideally, it's better to control the Go compiler as a library, but this is also a way to make a compiler!
With the help of Mal's guide and your ingenuity, you can do all this. If even I could, then you can!
will briefly talk about its many years of development and what I learned in the process. Alternative article title: "How to write a compiler in ten years or less."
(In the end there is TL; DR , if you don't care about the background).
Compiler demo
tim ~/pp/malcc master 0 → ./malcc Mal [malcc]
user> (println "hello world")
hello world
nil
user> (+ 1 2)
3
user> (def! fib2 (fn* (n) (let* (f (fn* (n1 n2 c) (if (= c n) n2 (f n2 (+ n1 n2) (+ c 1))))) (f 0 1 1))))
user> (fib2 25)
75025
user> ^D%
tim ~/pp/malcc master 0 → ./malcc examples/hello.mal hello world
tim ~/pp/malcc master 0 → ./malcc --compile examples/hello.mal hello gcc -g -I ./tinycc -I . -o hello hello.c ./reader.c ./printer.c ./hashmap.c ./types.c ./util.c ./env.c ./core.c ./tinycc/libtcc.a -ledit -lgc -lpcre
-ldl
tim ~/pp/malcc master 0 → ./hello hello world
tim ~/pp/malcc master 0 →
Successful failures
For almost ten years, I dreamed of writing a compiler. I have always been fascinated by the work of programming languages, especially compilers. Although I imagined the compiler as dark magic and understood that it was impossible for a mere mortal like me to make it from scratch.
But I still tried and studied along the way!
First, the interpreter
In 2011, I started work on a simple interpreter for the fictional language Airball (airball can be translated as “muff"). By name you can evaluate the degree of my uncertainty that it will work. It was a fairly simple Ruby program that parsed code and walked through an abstract syntax tree (AST). When the interpreter still worked, I renamed it to Lydia and rewrote it to C to make it faster.
I remember Lydia's syntax seemed very smart to me! I still enjoy its simplicity.
Although Lydia was far from a perfect compiler, it inspired me to continue experimenting. However, I was still tormented by questions, how to make the compiler work: what to compile into? do i need to learn assembler?
Secondly, the bytecode compiler and interpreter
As a next step, in 2014, I started work on scheme-vm , a virtual machine for Scheme written in Ruby. I thought that a virtual machine with its own stack and bytecode would be a transitional stage from an interpreter with AST passes and a full-fledged compiler. And since Scheme is formally defined , there is no need to invent anything.
I have been messing around with scheme-vm for over three years and have learned a lot about compilation. In the end, I realized that I could not finish this project. The code turned into real chaos, but there was no end in sight. Without a mentor or experience, I seemed to wander in the dark. As it turned out, the language specification is not the same as the manual for it. Lesson learned!
By the end of 2017, I put off scheme-vm in search of something better.
Meeting with Mal
Sometime in 2018, I came across Mal , a Clojure-style Lisp interpreter.
Mal was invented by Joel Martin as a training tool. Since then, more than 75 implementations in different languages have been developed! When I looked at these implementations, I realized that they help a lot: if I'm stuck, then I can go look for tips in the Ruby or Python version. Finally, at least someone speaks my language!
I also thought that if I could write an interpreter for Mal, I could repeat the same steps - and make a compiler for Mal.
Mal interpreter on Rust
First, I started developing the interpreter according to the walkthrough . At that time, I was actively studying Rust (I'll leave it for another article), so I wrote my own implementation of Mal in Rust: mal-rust . For more information about this experiment, see. Here .
It was a perfect pleasure! I don’t know how to thank or praise Joel for creating an excellent guide to Mal. Each step is described in detail , there are flowcharts, pseudo-code and tests ! All that a developer needs to create a programming language from start to finish.
Towards the end of the tutorial, I managed to run my Mal implementation for Mal, written in Mal, on top of my Rust implementation. (two levels of depth, wow). When she worked for the first time, I jumped on a chair with excitement!
Compiler Mal C
As soon as I proved the viability of mal-rust, I immediately began to research how to write a compiler. Compile to assembler? Can I compile machine code directly?
I saw the x86 assembler written in Ruby. He intrigued me, but the thought of working with assembler made me stop.
At one point, I stumbled upon this comment on Hacker News , which referred to the Tiny C Compiler as a “compilation backend”. It seemed like a great idea!
TinyCC has a test file showing how to use libtcc to compile C code from C program. This is the starting point for “hello world”.
Returning again to the walkthrough of Mal, recalling my knowledge of C, in a couple of months of free evenings and weekends, I was able to write the Mal compiler. It was a real pleasure. If you are used to developing through testing, then evaluate the availability of a preliminary set of tests. Tests lead to a working implementation. I can’t say much about this process, unless I repeat: the Mal manual is a real treasure. At every step, I knew exactly what to do!
Difficulties
Looking back, here are some difficulties when writing the Mal compiler, where I had to tinker:
- Macros must compile on the fly and be ready to be executed at compile time. This is a little perplexing.
- It is necessary to provide an “environment” (a tree of hashes / associative arrays / dictionaries with variables and their values) both for the compiler code and for the final code of the compiled program. This allows you to define macros at compile time.
- Since the environment is available at compile time, initially Malcc caught undefined errors during compilation (access to a variable that was not defined), and in a couple of places this violated the expectations of the test suite. In the end, to pass the tests, I turned off this feature. It would be great to add it back as an additional flag of the compiler, since this way you can catch a lot of errors in advance.
- I compiled C code by writing in three lines of the structure:
top
: top level code - here are the functionsdecl
: declaration and initialization of variables used in the bodybody
: body where the main work is done
- All day I wondered if I could write my own garbage collector, but I decided to leave this exercise for later. The Boehm-Demers-Weiser garbage collection library is easy to connect and available on many platforms.
- It is important to look at the code that your compiler writes. Whenever the compiler encountered an environment variable
DEBUG
, it would produce compiled C code, where errors could be viewed.
What would I do otherwise
- Writing C code and trying to keep the indentation was not easy, then I would not refuse automation. It seems to me that some compilers write ugly code, and then a special library "decorates" it before issuing it. It needs to be studied!
- Adding to lines during code generation is a bit messy. You could consider creating an AST and then converting it to the last line of C code. This should bring the code in order and give harmony.
Now advice
I like that it took almost a decade for the compiler. No really. Each step on the path is a pleasant memory of how I gradually became an ever better programmer.
But this does not mean that I "finished". There are still hundreds of methods and tools that you need to learn to feel like a true compiler author. But I can confidently say: "I did it."
Here is the whole process in a concise form, how to make your own Lisp compiler:
- Choose the language in which you feel comfortable. You do not want to simultaneously learn a new language and how to write another new language.
- Following the Mal manual, write an interpreter.
- Rejoice!
- Follow the instructions again, but instead of executing the code, write code that executes the code. (Not just “refactoring” the existing interpreter. You need to start from scratch, although copy-paste is not prohibited).
I believe that this method can be used with any programming language that compiles into an executable file. For example, you can:
- Write the Mal interpreter on Go .
- Modify your code to:
- create a line of Go code and write it to a file;
- compile this resulting file with
go build
.
Ideally, it's better to control the Go compiler as a library, but this is also a way to make a compiler!
With the help of Mal's guide and your ingenuity, you can do all this. If even I could, then you can!