How the Swift compiler works. Part 1
Swift is not just a programming language. It is a project that, besides the compiler, includes many other components. And the compiler itself is not a big, scary box that magically turns your code into a set of machine-friendly instructions. It, too, can be broken down into components. If you want to know which ones, read on.
I am not a compiler expert and have no professional experience in this area. But I wondered how it all works, so I started studying the Swift compiler. The article turned out too long, so I split it into four parts:
- general overview of components
- parsing the source file
- Swift Intermediate Language (SIL)
- LLVM IR and code generation
Swift has been an open source project for more than two years. During this time the community has contributed many improvements to it. You can follow them on the project's website and on its forum. There you can also discuss proposals for improving the language or post your own ideas. But to do that, you first need to understand how the project is organized.
Swift Standard Library
The main parts of Swift are, of course, the compiler and the standard library. They are developed in parallel and are practically inseparable from each other.
The compiler is written in C++, and most of the standard library (stdlib) is written in Swift. However, the Swift used there has a few peculiarities:
- Through the Builtin module, the standard library has direct access to compiler built-ins. This lets it work with low-level representations of the language and with "raw" pointers.
- The standard library does not use the private access modifier. Instead, the names of entities that are not public begin with an underscore.
- To reduce repetition in its code, the standard library relies on code generation with the Generate Your Boilerplate (GYB) utility.
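The underscore convention from the list above can be illustrated with a minimal sketch. `Stack` and `_storage` are hypothetical names invented for this example and are not taken from the standard library; the sketch only shows the naming style, assuming the detail is left at the default internal access level.

```swift
// Hypothetical type for illustration; it is not part of the standard library.
public struct Stack<Element> {
    // Implementation detail. Instead of marking it `private`, the name
    // starts with an underscore to signal "not part of the public API".
    var _storage: [Element] = []

    public init() {}

    public mutating func push(_ element: Element) {
        _storage.append(element)
    }

    public mutating func pop() -> Element? {
        return _storage.popLast()
    }

    public var count: Int {
        return _storage.count
    }
}

var stack = Stack<Int>()
stack.push(1)
stack.push(2)
let top = stack.pop()
```

Here `top` holds the last pushed element, and one element remains on the stack.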
A standard library is usually associated with containers and handy functions that make a developer's life easier, but that is only one part of it. Three components are the most interesting:
- Core. The core of the library, with all of its protocols, data types, and functions.
- Runtime. The intermediate layer between the standard library and the compiler. It is responsible for type casting, memory management, reflection, and other dynamic features of the language. It is written in C++ and Objective-C.
- SDK Overlays. Wrappers over Foundation and other system frameworks that make them more convenient to use from Swift.
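To make the role of the runtime more concrete, here is a small sketch of two language features that, as described above, depend on it: dynamic casting and reflection. `Point` is a hypothetical type used only for this example.

```swift
// Hypothetical type for illustration.
struct Point {
    var x: Int
    var y: Int
}

// The static type is erased to Any; the concrete type is recovered dynamically.
let value: Any = Point(x: 1, y: 2)

// Conditional casts with `as?` check the dynamic type of the value.
let asPoint = value as? Point
let asString = value as? String

// Mirror exposes the stored properties of a value through reflection.
let fieldNames = Mirror(reflecting: value).children.compactMap { $0.label }
```

The cast to `Point` succeeds, the cast to `String` yields `nil`, and `fieldNames` contains the names of the two stored properties.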
Besides the compiler and the standard library, many other subprojects are open source as well. Some of them are listed below.
IDE support framework: indexing, syntax highlighting, code completion, and so on.
Swift Package Manager
Package manager for Swift projects.
A port of the Foundation library, one of the core libraries of Apple’s operating systems, to third-party platforms.
GCD for third-party platforms.
XCTest for third-party platforms.
LLDB with Swift and REPL support.
The project includes two frameworks, PlaygroundSupport and PlaygroundLogger. The first provides interaction with Xcode, and the second provides a readable presentation of data.
Utility for code generation.
Implementation of the standard C ++ library.
In the broad sense, a compiler is a program that converts code from one language to another. More often, though, compilation means transforming source code into machine code (or another low-level representation) that can then be used to create an executable file.
A compiler is often divided into three parts: the frontend, the middle end, and the backend. The frontend transforms the source code into an intermediate representation that is convenient for the compiler to work with. The middle end performs optimization, and the backend generates machine code from the optimized intermediate representation.
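The classic three-stage split can be sketched with a toy compiler for sums such as `1 + 2 + x`. Everything here (`Expr`, `frontend`, `middleEnd`, `backend`, and the stack-machine mnemonics) is hypothetical and invented for illustration; it is not how the Swift compiler is implemented.

```swift
// Toy intermediate representation shared by all three stages.
indirect enum Expr: Equatable {
    case constant(Int)
    case variable(String)
    case add(Expr, Expr)
}

// Frontend: source text -> IR. Understands only sums such as "1 + 2 + x".
func frontend(_ source: String) -> Expr? {
    var result: Expr? = nil
    for piece in source.split(separator: "+") {
        let text = String(piece.filter { $0 != " " })
        guard !text.isEmpty else { return nil }
        let term: Expr
        if let number = Int(text) {
            term = .constant(number)
        } else {
            term = .variable(text)
        }
        if let left = result {
            result = .add(left, term)
        } else {
            result = term
        }
    }
    return result
}

// Middle end: IR -> optimized IR. Folds additions of two constants.
func middleEnd(_ expr: Expr) -> Expr {
    switch expr {
    case .constant, .variable:
        return expr
    case let .add(lhs, rhs):
        let left = middleEnd(lhs)
        let right = middleEnd(rhs)
        if case let .constant(a) = left, case let .constant(b) = right {
            return .constant(a + b)
        }
        return .add(left, right)
    }
}

// Backend: optimized IR -> instructions for an imaginary stack machine.
func backend(_ expr: Expr) -> [String] {
    switch expr {
    case let .constant(number):
        return ["push \(number)"]
    case let .variable(name):
        return ["load \(name)"]
    case let .add(lhs, rhs):
        return backend(lhs) + backend(rhs) + ["add"]
    }
}

let code = frontend("1 + 2 + x").map { backend(middleEnd($0)) } ?? []
print(code) // prints ["push 3", "load x", "add"]
```

The constant part `1 + 2` is folded into `3` before any instructions are emitted, which is the kind of work the middle stage exists for.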
In Swift, however, optimization is performed both in the frontend and (for the most part) in the backend. That is why the diagram does not show a separate middle stage.
The Swift compiler uses LLVM as its backend. LLVM is a large project that includes many technologies. At its core is the intermediate representation (IR): a universal intermediate form of code that can be converted to executable code for any platform LLVM supports.
If a new architecture appears, it is enough to teach LLVM to generate machine code from IR for that platform. After that, every language whose compiler generates IR will support the new architecture.
Conversely, to create a compiler for a new programming language, it is enough to write the translation of source code into IR, and LLVM will take care of supporting the various architectures.
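The benefit of a shared IR can be shown with a toy model in which two hypothetical "languages" and two hypothetical "targets" meet only through a common instruction list. All names below (`Instr`, `Frontend`, `Target`, `InfixLanguage`, `WordLanguage`, `TargetA`, `TargetB`) are invented for this sketch and have nothing to do with the real LLVM API.

```swift
// Toy IR: a flat list of stack operations. Real LLVM IR is far richer.
enum Instr: Equatable {
    case push(Int)
    case add
}

// A frontend only knows how to turn its own language into IR.
protocol Frontend {
    func lower(_ source: String) -> [Instr]
}

// A target only knows how to turn IR into its own instructions.
protocol Target {
    func emit(_ ir: [Instr]) -> [String]
}

// Shared helper: IR that sums a list of numbers.
func sumIR(_ numbers: [Int]) -> [Instr] {
    var ir: [Instr] = []
    for (index, number) in numbers.enumerated() {
        ir.append(.push(number))
        if index > 0 { ir.append(.add) }
    }
    return ir
}

// Hypothetical language 1: "1+2" (no spaces).
struct InfixLanguage: Frontend {
    func lower(_ source: String) -> [Instr] {
        return sumIR(source.split(separator: "+").compactMap { Int($0) })
    }
}

// Hypothetical language 2: "sum 1 2".
struct WordLanguage: Frontend {
    func lower(_ source: String) -> [Instr] {
        return sumIR(source.split(separator: " ").compactMap { Int($0) })
    }
}

// Hypothetical target with upper-case mnemonics.
struct TargetA: Target {
    func emit(_ ir: [Instr]) -> [String] {
        return ir.map { instr -> String in
            switch instr {
            case let .push(number): return "PUSH \(number)"
            case .add: return "ADD"
            }
        }
    }
}

// Hypothetical target with a different instruction syntax.
struct TargetB: Target {
    func emit(_ ir: [Instr]) -> [String] {
        return ir.map { instr -> String in
            switch instr {
            case let .push(number): return "li \(number)"
            case .add: return "addi"
            }
        }
    }
}

let irFromInfix = InfixLanguage().lower("1+2")
let irFromWords = WordLanguage().lower("sum 1 2")
let codeA = TargetA().emit(irFromInfix)
let codeB = TargetB().emit(irFromWords)
```

Both languages lower to the same IR, so adding a third `Target` would make it available to both of them without touching either frontend.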
Another advantage of this design is that LLVM can optimize the intermediate representation itself, so the frontend does not have to deal with optimization. This greatly simplifies compiler development.
IR has three forms of representation:
- A tree of objects in memory. Each object corresponds to a specific entity in the source code: a function, an operator, a string, a pointer, and so on. The frontend builds this tree at the IR generation stage.
- Textual form. IR can be printed as low-level source code, saved to a file, and executed with an interpreter.
- Serialized bitcode (not to be confused with bytecode, which is used, for example, in Java). It can serve as the final output of the backend and be passed to the linker for link-time optimization. In that case the linker performs the conversion to machine code.
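To see the textual and bitcode forms for real code, you can ask the compiler to stop at the corresponding stage. The sketch below assumes the source is saved in a file called `add.swift` (a hypothetical name); the `-emit-ir` and `-emit-bc` options of `swiftc` are shown in comments.

```swift
// add.swift (hypothetical file name)
func add(_ a: Int, _ b: Int) -> Int {
    return a + b
}

let sum = add(2, 3)
print(sum)

// To inspect the IR that the compiler generates for this file:
//   swiftc -emit-ir add.swift    (textual LLVM IR)
//   swiftc -emit-bc add.swift    (serialized LLVM bitcode)
```

The exact IR depends on the compiler version and optimization level, so no sample output is given here.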
The linker is a program that produces the executable file. Describing it is beyond the scope of this article.
As you can see, Apple has open-sourced a lot of interesting projects. In the next part I will talk about parsing the source file and generating the AST.