My compiler for Lisp

Original author: Tim Morgan
  • Transfer
I am very pleased to announce the completion of my first compiler for a programming language! Malcc is an Lisp incremental AOT compiler written in C. I

will briefly talk about its many years of development and what I learned in the process. Alternative article title: "How to write a compiler in ten years or less."

(In the end there is TL; DR , if you don't care about the background).

Compiler demo

tim ~/pp/malcc master 0 → ./malcc                                                                   Mal [malcc]
user> (println "hello world")
hello world
user> (+ 1 2)
user> (def! fib2 (fn* (n) (let* (f (fn* (n1 n2 c) (if (= c n) n2 (f n2 (+ n1 n2) (+ c 1))))) (f 0 1 1))))

user> (fib2 25)
user> ^D%
tim ~/pp/malcc master 0 → ./malcc examples/hello.mal                                                                                         hello world
tim ~/pp/malcc master 0 → ./malcc --compile examples/hello.mal hello                                                                         gcc -g -I ./tinycc -I . -o hello hello.c ./reader.c ./printer.c ./hashmap.c ./types.c ./util.c ./env.c ./core.c ./tinycc/libtcc.a -ledit -lgc -lpcre
tim ~/pp/malcc master 0 → ./hello                                                                                                            hello world
tim ~/pp/malcc master 0 →

Successful failures

For almost ten years, I dreamed of writing a compiler. I have always been fascinated by the work of programming languages, especially compilers. Although I imagined the compiler as dark magic and understood that it was impossible for a mere mortal like me to make it from scratch.

But I still tried and studied along the way!

First, the interpreter

In 2011, I started work on a simple interpreter for the fictional language Airball (airball can be translated as “muff"). By name you can evaluate the degree of my uncertainty that it will work. It was a fairly simple Ruby program that parsed code and walked through an abstract syntax tree (AST). When the interpreter still worked, I renamed it to Lydia and rewrote it to C to make it faster.

I remember Lydia's syntax seemed very smart to me! I still enjoy its simplicity.

Although Lydia was far from a perfect compiler, it inspired me to continue experimenting. However, I was still tormented by questions, how to make the compiler work: what to compile into? do i need to learn assembler?

Secondly, the bytecode compiler and interpreter

As a next step, in 2014, I started work on scheme-vm  , a virtual machine for Scheme written in Ruby. I thought that a virtual machine with its own stack and bytecode would be a transitional stage from an interpreter with AST passes and a full-fledged compiler. And since Scheme is formally defined , there is no need to invent anything.

I have been messing around with scheme-vm for over three years and have learned a lot about compilation. In the end, I realized that I could not finish this project. The code turned into real chaos, but there was no end in sight. Without a mentor or experience, I seemed to wander in the dark. As it turned out, the language specification is not the same as the manual for it. Lesson learned!

By the end of 2017, I put off scheme-vm in search of something better.

Meeting with Mal

Sometime in 2018, I came across Mal  , a Clojure-style Lisp interpreter.

Mal was invented by Joel Martin as a training tool. Since then, more than 75 implementations in different languages ​​have been developed! When I looked at these implementations, I realized that they help a lot: if I'm stuck, then I can go look for tips in the Ruby or Python version. Finally, at least someone speaks my language!

I also thought that if I could write an interpreter for Mal, I could repeat the same steps - and make a compiler for Mal.

Mal interpreter on Rust

First, I started developing the interpreter according to the walkthrough . At that time, I was actively studying Rust (I'll leave it for another article), so I wrote my own implementation of Mal in Rust: mal-rust . For more information about this experiment, see. Here .

It was a perfect pleasure! I don’t know how to thank or praise Joel for creating an excellent guide to Mal. Each step is described in detail , there are flowcharts, pseudo-code and tests ! All that a developer needs to create a programming language from start to finish.

Towards the end of the tutorial, I managed to run my Mal implementation for Mal, written in Mal, on top of my Rust implementation. (two levels of depth, wow). When she worked for the first time, I jumped on a chair with excitement!

Compiler Mal C

As soon as I proved the viability of mal-rust, I immediately began to research how to write a compiler. Compile to assembler? Can I compile machine code directly?

I saw the x86 assembler written in Ruby. He intrigued me, but the thought of working with assembler made me stop.

At one point, I stumbled upon this comment on Hacker News , which referred to the Tiny C Compiler as a “compilation backend”. It seemed like a great idea!

TinyCC has a test file showing how to use libtcc to compile C code from C program. This is the starting point for “hello world”.

Returning again to the walkthrough of Mal, recalling my knowledge of C, in a couple of months of free evenings and weekends, I was able to write the Mal compiler. It was a real pleasure. If you are used to developing through testing, then evaluate the availability of a preliminary set of tests. Tests lead to a working implementation. I can’t say much about this process, unless I repeat: the Mal manual is a real treasure. At every step, I knew exactly what to do!


Looking back, here are some difficulties when writing the Mal compiler, where I had to tinker:

  1. Macros must compile on the fly and be ready to be executed at compile time. This is a little perplexing.
  2. It is necessary to provide an “environment” (a tree of hashes / associative arrays / dictionaries with variables and their values) both for the compiler code and for the final code of the compiled program. This allows you to define macros at compile time.
  3. Since the environment is available at compile time, initially Malcc caught undefined errors during compilation (access to a variable that was not defined), and in a couple of places this violated the expectations of the test suite. In the end, to pass the tests, I turned off this feature. It would be great to add it back as an additional flag of the compiler, since this way you can catch a lot of errors in advance.
  4. I compiled C code by writing in three lines of the structure:
    • top: top level code - here are the functions
    • decl: declaration and initialization of variables used in the body
    • body: body where the main work is done
  5. All day I wondered if I could write my own garbage collector, but I decided to leave this exercise for later. The Boehm-Demers-Weiser garbage collection library is easy to connect and available on many platforms.
  6. It is important to look at the code that your compiler writes. Whenever the compiler encountered an environment variable DEBUG, it would produce compiled C code, where errors could be viewed.

What would I do otherwise

  1. Writing C code and trying to keep the indentation was not easy, then I would not refuse automation. It seems to me that some compilers write ugly code, and then a special library "decorates" it before issuing it. It needs to be studied!
  2. Adding to lines during code generation is a bit messy. You could consider creating an AST and then converting it to the last line of C code. This should bring the code in order and give harmony.

Now advice

I like that it took almost a decade for the compiler. No really. Each step on the path is a pleasant memory of how I gradually became an ever better programmer.

But this does not mean that I "finished". There are still hundreds of methods and tools that you need to learn to feel like a true compiler author. But I can confidently say: "I did it."

Here is the whole process in a concise form, how to make your own Lisp compiler:

  1. Choose the language in which you feel comfortable. You do not want to simultaneously learn a new language and how to write another new language.
  2. Following the Mal manual, write an interpreter.
  3. Rejoice!
  4. Follow the instructions again, but instead of executing the code, write code that executes the code. (Not just “refactoring” the existing interpreter. You need to start from scratch, although copy-paste is not prohibited).

I believe that this method can be used with any programming language that compiles into an executable file. For example, you can:

  1. Write the Mal interpreter on Go .
  2. Modify your code to:
    • create a line of Go code and write it to a file;
    • compile this resulting file with go build.

Ideally, it's better to control the Go compiler as a library, but this is also a way to make a compiler!

With the help of Mal's guide and your ingenuity, you can do all this. If even I could, then you can!

Also popular now: