How the Swift compiler works. Part 3


    We continue to study the Swift compiler. This part is dedicated to Swift Intermediate Language.


    If you have not read the previous parts, I recommend reading them first.



    SILGen


    The next step is to convert the typed AST to raw SIL. Swift Intermediate Language (SIL) is an intermediate representation created specifically for Swift. A description of all its instructions can be found in the documentation.


    SIL is in SSA form. Static Single Assignment (SSA) is a representation of code in which each variable is assigned a value only once. It is derived from ordinary code by introducing additional variables, for example by adding a numeric suffix that indicates the version of the variable after each assignment.


    Thanks to this form it is easier for the compiler to optimize the code. Below is an example in pseudocode. Obviously, the first line is unnecessary:


    a = 1
    a = 2
    b = a

    But this is obvious only to us. Teaching the compiler to detect it would require non-trivial algorithms. With SSA it is much easier. Now even a simple compiler can see that the value of the a1 variable is never used, so this line can be deleted:


    a1 = 1
    a2 = 2
    b1 = a2
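
    The renaming and the "unused value" check can be sketched in a few lines of Swift. This is a toy model for straight-line code only, not how the Swift compiler does it; the types Value and Assignment and the functions toSSA and unusedDefinitions are hypothetical names introduced for this illustration.

```swift
// A toy straight-line program: each statement assigns either a constant
// or a copy of another variable. Hypothetical types, for illustration only.
enum Value {
    case constant(Int)
    case variable(String)
}

struct Assignment {
    let target: String
    let value: Value
}

/// Renames every assignment target to a fresh version (a1, a2, ...) and
/// rewrites each use to the latest version of the variable it reads.
func toSSA(_ program: [Assignment]) -> [Assignment] {
    var versions: [String: Int] = [:]
    var result: [Assignment] = []
    for statement in program {
        // Resolve the use first, so it refers to a version defined earlier.
        let value: Value
        switch statement.value {
        case .constant:
            value = statement.value
        case .variable(let name):
            value = .variable("\(name)\(versions[name, default: 0])")
        }
        let version = versions[statement.target, default: 0] + 1
        versions[statement.target] = version
        result.append(Assignment(target: "\(statement.target)\(version)", value: value))
    }
    return result
}

/// In SSA form a definition is unused if its name never appears as an operand.
func unusedDefinitions(_ ssa: [Assignment]) -> [String] {
    var used: Set<String> = []
    for statement in ssa {
        if case .variable(let name) = statement.value {
            used.insert(name)
        }
    }
    return ssa.map { $0.target }.filter { !used.contains($0) }
}

let program = [
    Assignment(target: "a", value: .constant(1)),
    Assignment(target: "a", value: .constant(2)),
    Assignment(target: "b", value: .variable("a")),
]
let ssa = toSSA(program)
let unused = unusedDefinitions(ssa)
print(ssa.map { $0.target })  // ["a1", "a2", "b1"]
print(unused)                 // ["a1", "b1"]
```

    Here b1 is reported as well, simply because nothing in this three-line fragment reads it. The point is that after renaming, the check is a plain set lookup.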

    SIL allows you to apply specific optimizations and checks to the Swift code that would be difficult or impossible to implement at the AST stage.


    Using SIL Generator


    To generate SIL, the -emit-silgen flag is used:


    swiftc -emit-silgen main.swift

    The result of the command:


    sil_stage raw
    import Builtin
    import Swift
    import SwiftShims
    let x: Int
    // x
    sil_global hidden [let] @$S4main1xSivp : $Int
    // main
    sil @main : $@convention(c) (Int32, UnsafeMutablePointer<Optional<UnsafeMutablePointer<Int8>>>) -> Int32 {
    bb0(%0 : $Int32, %1 : $UnsafeMutablePointer<Optional<UnsafeMutablePointer<Int8>>>):
      alloc_global @$S4main1xSivp                     // id: %2
      %3 = global_addr @$S4main1xSivp : $*Int         // user: %8
      %4 = metatype $@thin Int.Type                   // user: %7
      %5 = integer_literal $Builtin.Int2048, 16       // user: %7
      // function_ref Int.init(_builtinIntegerLiteral:)
      %6 = function_ref @$SSi22_builtinIntegerLiteralSiBi2048__tcfC : $@convention(method) (Builtin.Int2048, @thin Int.Type) -> Int // user: %7
      %7 = apply %6(%5, %4) : $@convention(method) (Builtin.Int2048, @thin Int.Type) -> Int // user: %8
      store %7 to [trivial] %3 : $*Int                // id: %8
      %9 = integer_literal $Builtin.Int32, 0          // user: %10
      %10 = struct $Int32 (%9 : $Builtin.Int32)       // user: %11
      return %10 : $Int32                             // id: %11
    } // end sil function 'main'
    // Int.init(_builtinIntegerLiteral:)
    sil [transparent] [serialized] @$SSi22_builtinIntegerLiteralSiBi2048__tcfC : $@convention(method) (Builtin.Int2048, @thin Int.Type) -> Int
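
    The source file itself is not shown here. Judging from the output above (a global constant x of type Int initialized from the literal 16), main.swift presumably consisted of a single declaration like the one below. This is an inference from the listing, not the author's file.

```swift
// main.swift (assumed contents, reconstructed from the SIL output)
let x = 16
```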

    SIL, like LLVM IR, can be printed in textual form. You can see that at this stage imports of the Builtin, Swift and SwiftShims modules have been added.


    Although Swift lets you write code directly in the global scope, SILGen generates a main function, the entry point of the program. All of the code is placed inside it, except for the declaration of the constant: it is global and must be accessible everywhere.


    Most of the lines have a similar structure. On the left is a pseudo-register that stores the result of the instruction. Then comes the instruction itself with its operands, and at the end a comment naming the register whose computation will use this one.


    For example, this line creates an integer literal of type Builtin.Int2048 with the value 16. The literal is stored in register %5 and will be used to compute the value of register %7:


    %5 = integer_literal $Builtin.Int2048, 16       // user: %7

    A function declaration begins with the keyword sil. It is followed by the name with the @ prefix, the calling convention, the parameters, the return type, and the function body. For the initializer Int.init(_builtinIntegerLiteral:) the body is, of course, omitted: the function comes from another module, so it only needs to be declared, not defined. The dollar sign marks the start of a type:


    // Int.init(_builtinIntegerLiteral:)
    sil [transparent] [serialized] @$SSi22_builtinIntegerLiteralSiBi2048__tcfC : $@convention(method) (Builtin.Int2048, @thin Int.Type) -> Int

    The calling convention specifies how a function must be called. This is needed to generate machine code. A detailed description of calling conventions is beyond the scope of this article.


    The names of initializers, as well as the names of structures, classes, methods and protocols, are mangled (name mangling). This solves several problems at once.


    First, it allows the same name to be used in different modules and nested entities. For example, the first fff method gets the name S4main3AAAV3fffSiyF, and the second one gets S4main3BBBV3fffSiyF:


    struct AAA {
        func fff() -> Int {
            return 8
        }
    }
    struct BBB {
        func fff() -> Int {
            return 8
        }
    }

    S means Swift, 4 is the number of characters in the module name, and 3 is the number of characters in the struct name. In the initializer, Si denotes the standard type Swift.Int.


    Second, the names and types of the function arguments are added to the name. This makes overloading possible. For example, S4main3AAAV3fff3iiiS2i_tF is generated for the first method, and S4main3AAAV3fff3dddSiSd_tF for the second:


    struct AAA {
        func fff(iii internalName: Int) -> Int {
            return 8
        }
        func fff(ddd internalName: Double) -> Int {
            return 8
        }
    }

    After the external parameter names comes the return type, followed by the parameter types. The internal parameter names are not included. Unfortunately, there is no documentation for name mangling in Swift, and its implementation may change at any time.
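
    The length-prefix scheme can be reproduced with a small Swift sketch. It covers only one shape, a parameterless method returning Int declared in a struct, and rebuilds the first name from the earlier example. The functions lengthPrefixed and toyMangledName are hypothetical, and reading "V" as the struct marker and "yF" as "no parameters, function" is my interpretation of the names shown above, not something taken from the compiler sources.

```swift
/// Length-prefixes an identifier the way the names above do: "main" -> "4main".
func lengthPrefixed(_ identifier: String) -> String {
    return "\(identifier.count)\(identifier)"
}

/// Hypothetical helper: builds the mangled name of a method that takes no
/// parameters, returns Int and is declared in a struct. Nothing else is handled.
func toyMangledName(module: String, structName: String, method: String) -> String {
    let parts = [
        "S",                         // Swift prefix
        lengthPrefixed(module),      // module name
        lengthPrefixed(structName),  // struct name
        "V",                         // assumed marker for a struct
        lengthPrefixed(method),      // method name
        "Si",                        // return type Swift.Int
        "yF",                        // assumed: empty parameter list, function
    ]
    return parts.joined()
}

print(toyMangledName(module: "main", structName: "AAA", method: "fff"))
// S4main3AAAV3fffSiyF
```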


    The function name is followed by its definition. It consists of one or more basic blocks. A basic block is a sequence of instructions with one entry point and one exit point that contains no branch instructions or early exits in the middle.


    The main function has a single basic block that accepts all the parameters passed to the function and contains all of its code, since there are no branches in it:


    bb0(%0 : $Int32, %1 : $UnsafeMutablePointer<Optional<UnsafeMutablePointer<Int8>>>):

    Roughly speaking, each scope delimited by curly braces is a separate basic block. Suppose the code contains a branch:


    // before
    if 2 > 5 {
        // true
    } else {
        // false
    }
    // after

    In this case, at least 4 basic blocks will be generated for:


    • the code before the branch,
    • the case where the expression is true,
    • the case where the expression is false,
    • the code after the branch.

    cond_br is the conditional branch instruction. If the value of the pseudo-register %14 is true, control passes to block bb1; otherwise, to bb2. br is an unconditional jump that transfers control to the specified basic block:


    // before
    cond_br %14, bb1, bb2                           // id: %15
    bb1:   
    // true
      br bb3                                          // id: %21
    bb2:                                              // Preds: bb0
    // false
      br bb3                                          // id: %27
    bb3:                                              // Preds: bb2 bb1
    // after
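
    The "Preds" comments in the listing name the predecessors of each block. A minimal Swift model of the same four blocks shows where they come from. Terminator, BasicBlock and predecessors(of:) are hypothetical names for this sketch and do not correspond to compiler APIs.

```swift
/// How a basic block ends. Only the terminators discussed above are modeled.
enum Terminator {
    case br(String)                               // unconditional jump
    case condBr(ifTrue: String, ifFalse: String)  // conditional jump
    case ret                                      // leaves the function
}

struct BasicBlock {
    let name: String
    let terminator: Terminator

    /// Blocks that control can reach directly from this one.
    var successors: [String] {
        switch terminator {
        case .br(let target):
            return [target]
        case .condBr(let ifTrue, let ifFalse):
            return [ifTrue, ifFalse]
        case .ret:
            return []
        }
    }
}

/// Inverts the successor edges: for each block, which blocks jump to it.
func predecessors(of blocks: [BasicBlock]) -> [String: [String]] {
    var result: [String: [String]] = [:]
    for block in blocks {
        for successor in block.successors {
            result[successor, default: []].append(block.name)
        }
    }
    return result
}

let blocks = [
    BasicBlock(name: "bb0", terminator: .condBr(ifTrue: "bb1", ifFalse: "bb2")),
    BasicBlock(name: "bb1", terminator: .br("bb3")),
    BasicBlock(name: "bb2", terminator: .br("bb3")),
    BasicBlock(name: "bb3", terminator: .ret),
]
let preds = predecessors(of: blocks)
print(preds["bb3"] ?? [])  // ["bb1", "bb2"]
```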




    SIL guaranteed transformations


    The raw intermediate representation obtained at the previous stage is checked for correctness and transformed into canonical form: functions marked transparent are inlined (the function call is replaced by its body), the values of constant expressions are computed, functions that return a value are checked to do so on every code path, and so on.


    These conversions are mandatory and are performed even if code optimization is disabled.


    Generating canonical SIL


    To generate canonical SIL, the -emit-sil flag is used:


    swiftc -emit-sil main.swift

    The result of the command:


    sil_stage canonical
    import Builtin
    import Swift
    import SwiftShims
    let x: Int
    // x
    sil_global hidden [let] @$S4main1xSivp : $Int
    // main
    sil @main : $@convention(c) (Int32, UnsafeMutablePointer<Optional<UnsafeMutablePointer<Int8>>>) -> Int32 {
    bb0(%0 : $Int32, %1 : $UnsafeMutablePointer<Optional<UnsafeMutablePointer<Int8>>>):
      alloc_global @$S4main1xSivp                     // id: %2
      %3 = global_addr @$S4main1xSivp : $*Int         // user: %6
      %4 = integer_literal $Builtin.Int64, 16         // user: %5
      %5 = struct $Int (%4 : $Builtin.Int64)          // user: %6
      store %5 to %3 : $*Int                          // id: %6
      %7 = integer_literal $Builtin.Int32, 0          // user: %8
      %8 = struct $Int32 (%7 : $Builtin.Int32)        // user: %9
      return %8 : $Int32                              // id: %9
    } // end sil function 'main'
    // Int.init(_builtinIntegerLiteral:)
    sil public_external [transparent] [serialized] @$SSi22_builtinIntegerLiteralSiBi2048__tcfC : $@convention(method) (Builtin.Int2048, @thin Int.Type) -> Int {
    // %0                                             // user: %2
    bb0(%0 : $Builtin.Int2048, %1 : $@thin Int.Type):
      %2 = builtin "s_to_s_checked_trunc_Int2048_Int64"(%0 : $Builtin.Int2048) : $(Builtin.Int64, Builtin.Int1) // user: %3
      %3 = tuple_extract %2 : $(Builtin.Int64, Builtin.Int1), 0 // user: %4
      %4 = struct $Int (%3 : $Builtin.Int64)          // user: %5
      return %4 : $Int                                // id: %5
    } // end sil function '$SSi22_builtinIntegerLiteralSiBi2048__tcfC'

    In this simple example, there are few changes. To see the real work of the optimizer, you need to complicate the code a bit. For example, add addition:


    let x = 16 + 8

    In its raw SIL you can find the call that adds these literals:


    %13 = function_ref @$SSi1poiyS2i_SitFZ : $@convention(method) (Int, Int, @thin Int.Type) -> Int // user: %14
    %14 = apply %13(%8, %12, %4) : $@convention(method) (Int, Int, @thin Int.Type) -> Int // user: %15

    In the canonical SIL the call is gone. Instead, the constant value 24 is used:


    %4 = integer_literal $Builtin.Int64, 24         // user: %5
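
    The idea behind this transformation, constant folding, can be sketched on a toy expression tree in Swift. The Expr type and the fold function are hypothetical and far simpler than the real SIL pass; overflow handling is left out of the sketch.

```swift
/// A toy expression tree (hypothetical, not a compiler data structure).
indirect enum Expr: Equatable {
    case literal(Int)
    case variable(String)
    case add(Expr, Expr)
}

/// Replaces every addition whose operands both fold to literals
/// with a single literal holding the sum.
func fold(_ expr: Expr) -> Expr {
    switch expr {
    case .literal, .variable:
        return expr
    case .add(let lhs, let rhs):
        let left = fold(lhs)
        let right = fold(rhs)
        if case .literal(let a) = left, case .literal(let b) = right {
            return .literal(a + b)
        }
        // At least one operand is not a constant: keep the addition.
        return .add(left, right)
    }
}

let folded = fold(.add(.literal(16), .literal(8)))
print(folded == .literal(24))  // true
```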




    SIL optimization


    If optimization is enabled, additional Swift-specific transformations are applied. Among them are generic specialization (optimizing generic code for a concrete parameter type), devirtualization (replacing dynamic calls with static ones), inlining, ARC optimization and many others. An explanation of these techniques does not fit into this already overgrown article.
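
    To make two of these terms more concrete, here is a small Swift program containing typical candidates for them. It only shows source code that such passes may target; whether a given compiler version actually specializes or devirtualizes these calls depends on the optimization flags and is not verified here. All names in it are made up for the example.

```swift
// A generic function: with optimization enabled, the compiler may emit
// a copy specialized for each concrete type it is called with (here, Int).
func largest<T: Comparable>(_ values: [T]) -> T? {
    var result: T? = nil
    for value in values {
        if let current = result, current >= value { continue }
        result = value
    }
    return result
}

class Shape {
    func area() -> Double { return 0 }
}

// `final` rules out further overrides, so a call through a Square
// reference is a candidate for devirtualization.
final class Square: Shape {
    let side: Double
    init(side: Double) { self.side = side }
    override func area() -> Double { return side * side }
}

let biggest = largest([3, 16, 8])
let square = Square(side: 2)
print(biggest ?? 0, square.area())
```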





    Since SIL is specific to Swift, I did not show an example implementation this time. We will return to the parentheses compiler in the next part, when we deal with the generation of LLVM IR.

