How the Swift compiler is built. Part 1

Swift is not just a programming language. It is a project that, in addition to the compiler, includes many other components. And the compiler itself is not a big, scary box that magically turns your code into a set of machine-friendly instructions. It can also be broken down into components. If you want to know which ones, read on.

I am not a compiler expert and have no experience in this area. But I wondered how it all works, so I started studying the Swift compiler. The article turned out too big, so I had to split it into four parts:

a general overview of the components,
parsing the source file,
Swift Intermediate Language (SIL),
LLVM IR and code generation.

Swift

Swift has been an open source project for more than two years. During this time, the community has contributed many improvements to it. You can follow them on a dedicated site, as well as on the forum. There you can also discuss proposals for improving the language or post your own ideas. But to do this, you first need to understand how the project is organized.

Swift Standard Library

The main parts of Swift are, of course, the compiler and the standard library. They are developed in parallel and are practically inseparable from each other.

The compiler is written in C++, and most of the standard library (stdlib) is written in Swift. However, the Swift used there has several peculiarities:

Through the Builtin module, the standard library has direct access to compiler functions. This lets it work with low-level language representations and "raw" pointers.
The standard library does not use the private access modifier. Instead, the names of entities that are not public begin with an underscore. Read more here.
Code generation with the Generate Your Boilerplate (GYB) utility is used to reduce repetition in the standard library code.
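To make the naming convention more concrete, here is a small sketch written in that style. The Stack type and its members are hypothetical and are not taken from the standard library; the only point is that the non-public storage gets a leading underscore instead of private.

```swift
// Hypothetical container written in the stdlib naming style (not real stdlib code).
public struct Stack<Element> {
    // Not part of the public API: internal access plus a leading underscore
    // is used instead of the private modifier.
    internal var _storage: [Element] = []

    public init() {}

    public var count: Int {
        return _storage.count
    }

    public mutating func push(_ element: Element) {
        _storage.append(element)
    }

    public mutating func pop() -> Element? {
        return _storage.popLast()
    }
}

var stack = Stack<Int>()
stack.push(1)
stack.push(2)
let top = stack.pop()
print(top as Any, stack.count)
```

Within the defining module _storage stays reachable, so the underscore works as a signal to readers rather than as a compiler-enforced barrier.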

A standard library is usually associated with containers and handy functions that make a developer's life easier, but that is only one part of it. Three components are the most interesting:

Core. The core of the library, with all its protocols, data types and functions. Sources.
Runtime. The intermediate layer between the standard library and the compiler. It is responsible for type casting, memory management, reflection and other dynamic features of the language. Written in C++ and Objective-C. Sources.
SDK Overlays. Wrappers over Foundation and other system frameworks that make them more convenient to use from Swift. Sources.

Other subprojects

In addition to the compiler and the standard library, many other subprojects are open source. Some of them are listed below.

SourceKit

An IDE support framework: indexing, syntax highlighting, code completion, and so on.

SourceKit-LSP

An implementation of the Language Server Protocol (LSP) for Swift, based on SourceKit. You can read about what LSP is here.

Swift Package Manager

Package manager for Swift projects.

Foundation

A port of the Foundation library, one of the core libraries of Apple's operating systems, to third-party platforms.

libdispatch (GCD)

GCD for third-party platforms.

XCTest

XCTest for third-party platforms.

LLDB

LLDB with Swift and REPL support.

Playground support

The project includes two frameworks: PlaygroundSupport and PlaygroundLogger. They provide interaction with Xcode and nicely formatted display of data, respectively.

llbuild

A build system.

gyb

Utility for code generation.

libcxx

An implementation of the C++ standard library.

Swift compiler

In a broad sense, a compiler is a program that converts code from one language to another. More often, though, compilation means transforming source code into machine code (or another low-level representation) that can then be used to create an executable file.

A compiler is often divided into three parts: the frontend, the middle end, and the backend. The frontend transforms the source code into an intermediate representation that is convenient for the compiler to work with. The middle end performs optimization, and the backend generates machine code from the optimized intermediate representation.
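The three-stage split can be sketched with a toy example. Everything in it (the Expr type, the frontend, middleEnd and backend functions, and the imaginary stack machine) is invented for illustration and has nothing to do with how swiftc is actually implemented.

```swift
// Toy intermediate representation: an expression tree (hypothetical).
indirect enum Expr {
    case number(Int)
    case add(Expr, Expr)
}

// "Frontend": turns source text such as "1 + 2 + 3" into the IR.
// Returns nil if a term is not an integer.
func frontend(_ source: String) -> Expr? {
    var result: Expr? = nil
    for part in source.split(separator: "+") {
        let digits = String(part.filter { $0 != " " })
        guard let value = Int(digits) else { return nil }
        let term = Expr.number(value)
        if let lhs = result {
            result = .add(lhs, term)
        } else {
            result = term
        }
    }
    return result
}

// "Middle end": a single optimization, constant folding (IR in, simpler IR out).
func middleEnd(_ expr: Expr) -> Expr {
    switch expr {
    case .number:
        return expr
    case let .add(lhs, rhs):
        let left = middleEnd(lhs)
        let right = middleEnd(rhs)
        if case let .number(a) = left, case let .number(b) = right {
            return .number(a + b)
        }
        return .add(left, right)
    }
}

// "Backend": emits instructions for an imaginary stack machine.
func backend(_ expr: Expr) -> [String] {
    switch expr {
    case let .number(value):
        return ["push \(value)"]
    case let .add(lhs, rhs):
        return backend(lhs) + backend(rhs) + ["add"]
    }
}

let ir = frontend("1 + 2 + 3")!
let unoptimized = backend(ir)
let optimized = backend(middleEnd(ir))
print(unoptimized) // ["push 1", "push 2", "add", "push 3", "add"]
print(optimized)   // ["push 6"]
```

The backend is the same in both calls; only the IR it receives differs, which is why the optimization stage can be developed separately from parsing and code generation.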

In Swift, however, optimization is performed both in the frontend and (for the most part) in the backend. That is why the middle stage is not shown in the diagram.

LLVM

The Swift compiler uses LLVM as its backend. LLVM is a large project that includes many technologies. At its heart is the intermediate representation (IR): a universal intermediate form of code that can be turned into executable code for any platform LLVM supports.

If a new architecture appears, it is enough to teach LLVM to generate machine code from IR for that architecture. After that, every language whose compiler generates IR will support it.

On the other hand, to create a compiler for a new programming language, it is enough to write a translator from source code into IR, and LLVM will take care of supporting the various architectures.
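The same idea can be shown in miniature. In the sketch below, the Op type, the Target protocol, both targets and the postfixFrontend function are hypothetical and far simpler than real LLVM IR; the sketch only shows that the frontend and the targets know about the shared IR and nothing about each other.

```swift
// Toy shared "IR": a flat list of stack operations (hypothetical, not LLVM IR).
enum Op {
    case push(Int)
    case add
}

// A target only has to know how to lower the shared IR.
protocol Target {
    func lower(_ op: Op) -> String
}

struct StackMachineTarget: Target {
    func lower(_ op: Op) -> String {
        switch op {
        case let .push(value): return "PUSH \(value)"
        case .add: return "ADD"
        }
    }
}

// Adding a second target does not touch the frontend at all.
struct VerboseTarget: Target {
    func lower(_ op: Op) -> String {
        switch op {
        case let .push(value): return "load_const \(value)"
        case .add: return "pop2_add_push"
        }
    }
}

// Frontend for a tiny postfix language such as "1 2 +".
// Unknown tokens are silently skipped to keep the sketch short.
func postfixFrontend(_ source: String) -> [Op] {
    return source.split(separator: " ").compactMap { token -> Op? in
        if token == "+" { return .add }
        return Int(String(token)).map { Op.push($0) }
    }
}

func compile<T: Target>(_ ir: [Op], for target: T) -> [String] {
    return ir.map { target.lower($0) }
}

let sharedIR = postfixFrontend("1 2 +")
let stackCode = compile(sharedIR, for: StackMachineTarget())
let verboseCode = compile(sharedIR, for: VerboseTarget())
print(stackCode)
print(verboseCode)
```

A frontend for another toy language would only need to produce [Op] to get both targets for free.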

Another advantage of this design is that LLVM can optimize the intermediate representation itself, so the frontend does not have to perform optimization. This greatly simplifies compiler development.

IR has three forms of representation:

An in-memory object tree. Each object corresponds to a specific entity in the source code: a function, an operator, a string, a pointer, and so on. This tree is created by the frontend at the IR generation stage.
A textual form. IR can be printed as low-level source code. It can be saved to a file and executed using an interpreter.
A serialized bitcode format (not to be confused with bytecode, which is used, for example, in Java). It can serve as the final output of the backend and be passed to the linker for link-time optimization. In that case the conversion to machine code is carried out by the linker.
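The textual and bitcode forms are easy to look at for yourself. Below is a minimal sketch, assuming a Swift toolchain is installed; the file name square.swift and the function are made up for the example, while -emit-ir and -emit-bc are swiftc options.

```swift
// square.swift (hypothetical file name)
func square(_ x: Int) -> Int {
    return x * x
}

print(square(7)) // 49

// The IR generated for this file can be written out with:
//   swiftc -emit-ir square.swift -o square.ll   (textual LLVM IR)
//   swiftc -emit-bc square.swift -o square.bc   (serialized LLVM bitcode)
```

The .ll file can be opened in any text editor, while the .bc file is binary.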

The linker is a program that produces an executable file. Describing it is beyond the scope of this article.

The source code for the LLVM version used in Swift can be found here, and the documentation is on the official website.

As you can see, Apple has open-sourced a lot of interesting projects. In the next part I will talk about parsing the source file and generating the AST.
