How the Swift compiler works. Part 4


    This is the last part of my overview of the Swift compiler. I will show how LLVM IR can be generated from an AST and what a real frontend produces. If you have not read the previous parts, start with them first.



    LLVM IR Gen


    For the frontend, this is the final step. The LLVM IR generator converts SIL into LLVM's intermediate representation, which is then passed to the backend for further optimization and machine code generation.


    Implementation example


    To generate the intermediate representation, you need to interact with the LLVM library. It is written in C++, and since C++ cannot be called from Swift directly, you have to use the C interface. But a C library cannot simply be imported either.


    It needs to be wrapped in a module. This is easy to do, and there is a good guide on the subject. For LLVM, such a wrapper is already publicly available, so it is simpler to use that.


    A Swift wrapper over the LLVM-C library is published under the same account, but it is not used in this article.


    To generate the intermediate representation, I created a class called LLVMIRGen. Its initializer accepts the AST produced by the parser:


    import cllvm
    class LLVMIRGen {
        private let ast: ASTNode
        init(ast: ASTNode) {
            self.ast = ast
        }

    The printTo(_:dump:) method starts the generation and saves the result to a file in human-readable form. The dump parameter optionally prints the same output to the console:


    func printTo(_ fileName: String, dump: Bool) {

    First, you need to create a module. Its creation, like the creation of the other entities, is moved into a separate method and is discussed below. Since this is a C API, memory has to be managed manually. To free a module, use the LLVMDisposeModule() function:


    let module = generateModule()
    defer {
        LLVMDisposeModule(module)
    }

    The names of all LLVM functions and types begin with the corresponding prefix. For example, a pointer to a module has the type LLVMModuleRef, and a pointer to a builder has the type LLVMBuilderRef. The builder is a helper class (ordinary classes and methods are hidden behind the inconvenient C interface) that helps generate IR:


    let builder = generateBuilder()
    defer {
        LLVMDisposeBuilder(builder)
    }
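
    The create-then-defer pattern above relies on defer blocks running in reverse order when the scope exits, so the builder is disposed before the module. A minimal sketch of that ordering in plain Swift, without LLVM; the generate(events:) function and the event strings are hypothetical stand-ins for the real calls:

```swift
// Hypothetical stand-in for the generation method: each "create" step
// immediately registers its matching "dispose" step with defer.
func generate(events: inout [String]) {
    events.append("create module")
    defer { events.append("dispose module") }

    events.append("create builder")
    defer { events.append("dispose builder") }

    events.append("emit IR")
}

var events: [String] = []
generate(events: &events)
// The deferred blocks ran after "emit IR", last registered first.
```

    Registering the cleanup right next to the allocation keeps the two together even if the method later gains early returns.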

    The number inside the braces will be printed to the console with the standard puts function. To refer to it, you first need to declare it. This happens in the generateExternalPutsFunction method. The module is passed to it because the declaration has to be added to the module. The putsFunction constant holds a pointer to the function so that it can be called later:


    let putsFunction = generateExternalPutsFunction(module: module)

    The Swift compiler creates the main function at the SIL stage. Since the brace compiler has no such intermediate representation, the function is generated directly in LLVM IR.


    To do this, use the generateMainFunction(builder:module:mainInternalGenerator:) method. The main function is never called from the generated code, so there is no need to keep a pointer to it:


    generateMainFunction(builder: builder, module: module) {
        // ...
    }

    The last parameter of the method is a closure inside which the AST is converted to the corresponding LLVM IR. A separate handleAST(_:putsFunction:builder:) method is created for this:


    generateMainFunction(builder: builder, module: module) {
        handleAST(ast, putsFunction: putsFunction, builder: builder)
    }

    At the end of the method, the resulting intermediate representation is optionally printed to the console and saved to the file:


    if dump {
        LLVMDumpModule(module)
    }
    LLVMPrintModuleToFile(module, fileName, nil)

    Now for the methods in more detail. The module is created by calling the LLVMModuleCreateWithName() function with the desired name:


    private func generateModule() -> LLVMModuleRef {
        let moduleName = "BraceCompiller"
        return LLVMModuleCreateWithName(moduleName)
    }

    Creating the builder is even simpler. It takes no parameters at all:


    private func generateBuilder() -> LLVMBuilderRef {
        return LLVMCreateBuilder()
    }

    To declare the function, you first need to allocate memory for its parameter types and store there the type of a pointer to Int8. Next, call LLVMFunctionType() to create the function type, passing it the return type, the array of argument types (a C array, that is, a pointer to the corresponding sequence of values), the number of arguments, and a flag that says whether the function is variadic. LLVMAddFunction() adds the puts function to the module and returns a pointer to it:


    private func generateExternalPutsFunction(module: LLVMModuleRef) -> LLVMValueRef {
        // A C array with one element: the type of the only parameter of puts.
        let putParamTypes = UnsafeMutablePointer<LLVMTypeRef?>.allocate(capacity: 1)
        defer {
            putParamTypes.deallocate()
        }
        // i8* in address space 0, that is, a C string.
        putParamTypes[0] = LLVMPointerType(LLVMInt8Type(), 0)
        // i32 puts(i8*); the final 0 means the function is not variadic.
        let putFunctionType = LLVMFunctionType(LLVMInt32Type(), putParamTypes, 1, 0)
        return LLVMAddFunction(module, "puts", putFunctionType)
    }
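
    Manual allocation is not the only way to hand a C array to such a function. As an alternative sketch (not what the article's code does), a Swift array can lend its storage for the duration of a call through withUnsafeMutableBufferPointer. The sumElements function below is a hypothetical stand-in for a C function that takes a pointer and a count, so the example runs without LLVM:

```swift
// Hypothetical stand-in for a C function such as LLVMFunctionType:
// it receives a pointer to the first element and the element count.
func sumElements(_ elements: UnsafeMutablePointer<Int32>?, _ count: UInt32) -> Int32 {
    var total: Int32 = 0
    for index in 0..<Int(count) {
        total += elements?[index] ?? 0
    }
    return total
}

var values: [Int32] = [8, 32]
// The pointer is only valid inside the closure, so no allocate/deallocate pair is needed.
let total = values.withUnsafeMutableBufferPointer { buffer in
    sumElements(buffer.baseAddress, UInt32(buffer.count))
}
```

    The trade-off is that the pointer must not escape the closure, which is fine for calls that only read the array while they run.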

    main is created in a similar way, but it also gets a body. As in SIL, the body consists of basic blocks. To add one, call the LLVMAppendBasicBlock() function, passing it the function and the name of the block.


    Now the builder comes into play. A call to LLVMPositionBuilderAtEnd() moves it to the end of the still empty block, and inside the mainInternalGenerator() closure the builder is used to add the body of the function.


    At the end of the method, the constant value 0 is returned from main. This is the last instruction in the function:


    private func generateMainFunction(builder: LLVMBuilderRef,
                                      module: LLVMModuleRef,
                                      mainInternalGenerator: () -> Void) {
        let mainFunctionType = LLVMFunctionType(LLVMInt32Type(), nil, 0, 0)
        let mainFunction = LLVMAddFunction(module, "main", mainFunctionType)
        let mainEntryBlock = LLVMAppendBasicBlock(mainFunction, "entry")
        LLVMPositionBuilderAtEnd(builder, mainEntryBlock)
        mainInternalGenerator()
        let zero = LLVMConstInt(LLVMInt32Type(), 0, 0)
        LLVMBuildRet(builder, zero)
    }
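
    The shape of this method (fixed prologue, caller-supplied body, fixed epilogue) can be illustrated without LLVM. The following is only a sketch that assembles text lines; emitMain and the placeholder body line are hypothetical and are not part of the brace compiler:

```swift
// Hypothetical text-based analogue of generateMainFunction:
// the closure supplies the body, the function adds the frame around it.
func emitMain(body: () -> [String]) -> [String] {
    var lines = ["define i32 @main() {", "entry:"]
    lines += body()
    lines.append("  ret i32 0")
    lines.append("}")
    return lines
}

let ir = emitMain {
    ["  ; body instructions go here"]
}
```

    Passing the body as a closure keeps the return instruction in one place, so the caller cannot forget to terminate the block.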

    Generating IR from the AST in the brace compiler is very simple, since the only thing this “programming language” can do is print a single number to the console. You need to walk the whole tree recursively and, when you find a number node, add a call to the puts function. If there is no such node, the main function only returns zero:


    private func handleAST(_ ast: ASTNode, putsFunction: LLVMValueRef, builder: LLVMBuilderRef) {
        switch ast {
        case let .brace(childNode):
            guard let childNode = childNode else {
                break
            }
            handleAST(childNode, putsFunction: putsFunction, builder: builder)
        case let .number(value):
            generatePrint(value: value, putsFunction: putsFunction, builder: builder)
        }
    }
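
    The same traversal can be tried out without LLVM. In the sketch below, the ASTNode definition is an assumption reconstructed from the two cases matched above, and findNumber(in:) is a hypothetical helper that returns the number instead of emitting a call:

```swift
// Assumed shape of the AST: braces may wrap another node or be empty.
indirect enum ASTNode {
    case brace(childNode: ASTNode?)
    case number(value: Int)
}

// Walks down through nested braces until a number node is found.
func findNumber(in ast: ASTNode) -> Int? {
    switch ast {
    case let .brace(childNode):
        guard let childNode = childNode else {
            return nil
        }
        return findNumber(in: childNode)
    case let .number(value):
        return value
    }
}

let nested = ASTNode.brace(childNode: .brace(childNode: .number(value: 5678)))
let empty = ASTNode.brace(childNode: nil)
```

    For nested the helper reaches the number node, while for empty it returns nil, which corresponds to a main function that only returns zero.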

    The call to puts is generated with the LLVMBuildCall() function. You need to pass it the builder, the pointer to the function, the arguments, and their number. LLVMBuildGlobalStringPtr() creates a global constant that holds the string. It will be the only argument:


    private func generatePrint(value: Int, putsFunction: LLVMValueRef, builder: LLVMBuilderRef) {
        // A C array with one element: the only argument of puts.
        let putArguments = UnsafeMutablePointer<LLVMValueRef?>.allocate(capacity: 1)
        defer {
            putArguments.deallocate()
        }
        // Global constant holding the number as a null-terminated string.
        putArguments[0] = LLVMBuildGlobalStringPtr(builder, "\(value)", "print")
        _ = LLVMBuildCall(builder, putsFunction, putArguments, 1, "put")
    }

    To start generating LLVM IR, create an instance of the LLVMIRGen class and call the printTo(_:dump:) method:


    let llvmIRGen = LLVMIRGen(ast: ast)
    llvmIRGen.printTo(outputFilePath, dump: false)

    Now that the brace compiler is complete, you can run it from the command line. To do this, build it (see the instructions) and run the command:


    build/debug/BraceCompiler Example/input.b Example/output.ll

    The result is the following intermediate representation:


    ; ModuleID = 'BraceCompiller'
    source_filename = "BraceCompiller"
    @print = private unnamed_addr constant [5 x i8] c"5678\00"
    declare i32 @puts(i8*)
    define i32 @main() {
    entry:
      %put = call i32 @puts(i8* getelementptr inbounds ([5 x i8], [5 x i8]* @print, i32 0, i32 0))
      ret i32 0
    }

    Using the Swift LLVM IR generator


    LLVM IR is also in SSA form, but it is low-level and closer to assembly language. A description of the instructions can be found in the documentation.


    Global identifiers begin with the @ symbol, local ones with %. In the example above, the string "5678\00" is stored in the global constant @print and then passed to the function @puts using the call instruction.
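
    The 5 in [5 x i8] is the number of bytes in the string plus one for the terminating \00. A minimal sketch of how such a line is composed; globalStringDeclaration is a hypothetical helper, not part of LLVM, and it assumes plain ASCII text without quotes or backslashes (in the brace compiler this work is done by LLVMBuildGlobalStringPtr):

```swift
// Hypothetical helper that formats a global string constant as IR text.
// Assumes `value` is plain ASCII without quotes or backslashes.
func globalStringDeclaration(name: String, value: String) -> String {
    // One extra byte for the terminating NUL that C strings require.
    let byteCount = value.utf8.count + 1
    return "@\(name) = private unnamed_addr constant [\(byteCount) x i8] c\"\(value)\\00\""
}

let declaration = globalStringDeclaration(name: "print", value: "5678")
```

    For the name print and the value 5678, this reproduces the declaration from the generated module shown above.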


    To see something interesting in the LLVM IR generated by the Swift compiler, the code needs to be a little more complicated. For example, add an addition:


    let x = 16
    let y = x + 7

    The -emit-ir flag tells the compiler to generate LLVM IR:


    swiftc -emit-ir main.swift

    The result of the command:


    ; ModuleID = '-'
    source_filename = "-"
    target datalayout = "e-m:o-i64:64-f80:128-n8:16:32:64-S128"
    target triple = "x86_64-apple-macosx10.14.0"
    %TSi = type <{ i64 }>
    @"$S4main1xSivp" = hidden global %TSi zeroinitializer, align 8
    @"$S4main1ySivp" = hidden global %TSi zeroinitializer, align 8
    @__swift_reflection_version = linkonce_odr hidden constant i16 3
    @llvm.used = appending global [1 x i8*] [i8* bitcast (i16* @__swift_reflection_version to i8*)], section "llvm.metadata", align 8
    define i32 @main(i32, i8**) #0 {
    entry:
      %2 = bitcast i8** %1 to i8*
      store i64 16, i64* getelementptr inbounds (%TSi, %TSi* @"$S4main1xSivp", i32 0, i32 0), align 8
      %3 = load i64, i64* getelementptr inbounds (%TSi, %TSi* @"$S4main1xSivp", i32 0, i32 0), align 8
      %4 = call { i64, i1 } @llvm.sadd.with.overflow.i64(i64 %3, i64 7)
      %5 = extractvalue { i64, i1 } %4, 0
      %6 = extractvalue { i64, i1 } %4, 1
      br i1 %6, label %8, label %7
    ; <label>:7:                                      ; preds = %entry
      store i64 %5, i64* getelementptr inbounds (%TSi, %TSi* @"$S4main1ySivp", i32 0, i32 0), align 8
      ret i32 0
    ; <label>:8:                                      ; preds = %entry
      call void @llvm.trap()
      unreachable
    }
    ; Function Attrs: nounwind readnone speculatable
    declare { i64, i1 } @llvm.sadd.with.overflow.i64(i64, i64) #1
    ; Function Attrs: noreturn nounwind
    declare void @llvm.trap() #2
    attributes #0 = { "no-frame-pointer-elim"="true" "no-frame-pointer-elim-non-leaf" "target-cpu"="penryn" "target-features"="+cx16,+fxsr,+mmx,+sahf,+sse,+sse2,+sse3,+sse4.1,+ssse3,+x87" }
    attributes #1 = { nounwind readnone speculatable }
    attributes #2 = { noreturn nounwind }
    !llvm.module.flags = !{!0, !1, !2, !3, !4, !5, !6, !7}
    !llvm.linker.options = !{!8, !9, !10}
    !llvm.asan.globals = !{!11}
    !0 = !{i32 1, !"Objective-C Version", i32 2}
    !1 = !{i32 1, !"Objective-C Image Info Version", i32 0}
    !2 = !{i32 1, !"Objective-C Image Info Section", !"__DATA,__objc_imageinfo,regular,no_dead_strip"}
    !3 = !{i32 4, !"Objective-C Garbage Collection", i32 1536}
    !4 = !{i32 1, !"Objective-C Class Properties", i32 64}
    !5 = !{i32 1, !"wchar_size", i32 4}
    !6 = !{i32 7, !"PIC Level", i32 2}
    !7 = !{i32 1, !"Swift Version", i32 6}
    !8 = !{!"-lswiftSwiftOnoneSupport"}
    !9 = !{!"-lswiftCore"}
    !10 = !{!"-lobjc"}
    !11 = !{[1 x i8*]* @llvm.used, null, null, i1 false, i1 true}

    The intermediate representation produced by a real compiler is somewhat more complicated. It contains additional operations, but the relevant instructions are not hard to find. Here the global constants x and y are declared with mangled names:


    @"$S4main1xSivp" = hidden global %TSi zeroinitializer, align 8
    @"$S4main1ySivp" = hidden global %TSi zeroinitializer, align 8

    Here the definition of the main function begins:


    define i32 @main(i32, i8**) #0 {

    First, the value 16 is stored in the constant x:


    store i64 16, i64* getelementptr inbounds (%TSi, %TSi* @"$S4main1xSivp", i32 0, i32 0), align 8

    It is then loaded into register %3 and passed, together with the literal 7, to the addition intrinsic:


    %3 = load i64, i64* getelementptr inbounds (%TSi, %TSi* @"$S4main1xSivp", i32 0, i32 0), align 8
    %4 = call { i64, i1 } @llvm.sadd.with.overflow.i64(i64 %3, i64 7)

    Addition with an overflow check returns a structure. Its first value is the result of the addition, and the second is a flag that indicates whether an overflow occurred.


    A structure in LLVM is more like a Swift tuple. Its fields have no names, and a value is read with the extractvalue instruction. Its first operand is the type of the structure (the list of field types), the second is the structure itself, and after the comma comes the index of the field whose value you want to extract:


    %5 = extractvalue { i64, i1 } %4, 0
    %6 = extractvalue { i64, i1 } %4, 1
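
    The standard library exposes the same idea at the Swift level: addingReportingOverflow(_:) returns a tuple with the partial value and an overflow flag. The sketch below only illustrates the shape of the result and makes no claim about how the compiler lowers the + operator:

```swift
let x = 16
let sum = x.addingReportingOverflow(7)
// sum.partialValue plays the role of %5, sum.overflow the role of %6.

// At the upper bound the flag is set and the value wraps around.
let wrapped = Int.max.addingReportingOverflow(1)
```

    Unlike the LLVM structure, the tuple fields have names here, but they can also be read by index as sum.0 and sum.1.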

    Register %6 now holds the overflow flag. This value is checked by the branch instruction. If an overflow occurred, control passes to block %8, otherwise to block %7:


    br i1 %6, label %8, label %7

    In block %8, the program is aborted by a call to llvm.trap. In block %7, the result of the addition is stored in the constant y, and 0 is returned from the main function:


    ; <label>:7:                                      ; preds = %entry
      store i64 %5, i64* getelementptr inbounds (%TSi, %TSi* @"$S4main1ySivp", i32 0, i32 0), align 8
      ret i32 0
    ; <label>:8:                                      ; preds = %entry
      call void @llvm.trap()
      unreachable
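
    Written out by hand, the control flow of these two blocks looks roughly like the function below. This is a sketch for reading the IR, with a hypothetical checkedAdd helper; the + operator does this for you:

```swift
// Hypothetical helper that mirrors the branch in the generated IR.
func checkedAdd(_ lhs: Int, _ rhs: Int) -> Int {
    let (result, didOverflow) = lhs.addingReportingOverflow(rhs)
    guard !didOverflow else {
        // Corresponds to block %8: trap, after which nothing is reachable.
        fatalError("arithmetic overflow")
    }
    // Corresponds to block %7: the result is used and execution continues.
    return result
}

let y = checkedAdd(16, 7)
```

    Since fatalError never returns, the compiler accepts the guard without a return in its else branch, which matches the unreachable instruction after the trap.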

    Assembly Code Generation


    The Swift compiler can also output assembly code. To do this, pass the -emit-assembly flag:


    swiftc -emit-assembly main.swift

    The result of the command:


        .section    __TEXT,__text,regular,pure_instructions
        .build_version macos, 10, 14
        .globl  _main
        .p2align    4, 0x90
    _main:
        .cfi_startproc
        pushq   %rbp
        .cfi_def_cfa_offset 16
        .cfi_offset %rbp, -16
        movq    %rsp, %rbp
        .cfi_def_cfa_register %rbp
        movq    $16, _$S4main1xSivp(%rip)
        movq    _$S4main1xSivp(%rip), %rax
        addq    $7, %rax
        seto    %cl
        movl    %edi, -4(%rbp)
        movq    %rsi, -16(%rbp)
        movq    %rax, -24(%rbp)
        movb    %cl, -25(%rbp)
        jo  LBB0_2
        xorl    %eax, %eax
        movq    -24(%rbp), %rcx
        movq    %rcx, _$S4main1ySivp(%rip)
        popq    %rbp
        retq
    LBB0_2:
        ud2
        .cfi_endproc
        .private_extern _$S4main1xSivp
        .globl  _$S4main1xSivp
    .zerofill __DATA,__common,_$S4main1xSivp,8,3
        .private_extern _$S4main1ySivp
        .globl  _$S4main1ySivp
    .zerofill __DATA,__common,_$S4main1ySivp,8,3
        .private_extern ___swift_reflection_version
        .section    __TEXT,__const
        .globl  ___swift_reflection_version
        .weak_definition    ___swift_reflection_version
        .p2align    1
    ___swift_reflection_version:
        .short  3
        .no_dead_strip  ___swift_reflection_version
        .linker_option "-lswiftSwiftOnoneSupport"
        .linker_option "-lswiftCore"
        .linker_option "-lobjc"
        .section    __DATA,__objc_imageinfo,regular,no_dead_strip
    L_OBJC_IMAGE_INFO:
        .long   0
        .long   1600
    .subsections_via_symbols

    Having understood the intermediate representation described above, you can find the assembly instructions generated from it. Here 16 is stored in the constant x and then loaded into the %rax register:


    movq    $16, _$S4main1xSivp(%rip)
    movq    _$S4main1xSivp(%rip), %rax

    Here 7 is added to the value of the constant. The result of the addition is placed in the %rax register:


    addq    $7, %rax

    And this is how the result is stored in the constant y:


    movq    %rax, -24(%rbp)
    movq    -24(%rbp), %rcx
    movq    %rcx, _$S4main1ySivp(%rip)



    Conclusion


    The Swift compiler is well structured, and its overall architecture was not difficult to understand. I was also surprised at how easy it is to write your own programming language using LLVM. Of course, the brace compiler is quite primitive, but implementing Kaleidoscope is realistic too. I recommend reading at least the first three chapters of the tutorial.


    Thanks to everyone who read this far. I will continue to study the Swift compiler and, perhaps, write about what comes of it. What topics would you be interested in?

