Code profiling with LLVM

Original author: Arun Ramachandran
  • Transfer

Curse of non-determinism



My first attempt to write an LLVM pass - I love these segolvts.

Recently, I was faced with an interesting task - I needed a deterministic and cross-platform method for determining the execution time of C ++ code. By the word "deterministic" I mean that the same code will be executed for the same number of units of time. By cross-platform, I understand that the same code under Windows and under Ubuntu will be executed in the same amount of time units.

Naturally, the time measurement on the CPU does not satisfy these conditions. The machine code varies depending on the architecture and operating system, and the same code will take a different amount of time to execute. Even on the same machine, factors such as cache misses will play a large role — enough to distort the results of measuring the execution time of the same code. I needed something smarter ...

Motivation


I ran into this problem when I was working on my project, Code Character. Code Character is an online AI competition in which participants write bots to control an army in a turn-based strategy. I wanted to limit the amount of code that a participant can perform in one move.

My first thought was to simply measure the execution time of the code, but, as we can see, this strategy is not deterministic, and the participant who sent the same code twice will get completely different results. In fact, we tried to implement this solution, and the results change so much that the participant can win or lose with the same code. The result will be completely random, and we have dropped the idea of ​​measuring time.

Baytk LLVM


Since we cannot measure time, we decided instead to measure the number of instructions executed. Therefore, we need to instruct the code of participants. If you are not familiar with this term, this is the addition of some code to an application, in order to monitor any parameter, for example, CPU usage or runtime. Naturally, we do not expect the participants to do it themselves, we must automate the process.

We wanted to avoid the overhead of running time tools when working on our server, and therefore something like a PIN tool.not suitable for our purposes. Instead, we decided to directly instruct the participant code in order to count the number of instructions to be executed. Instead of tooling the binaries (machine code), we decided to use Clang to compile the code and tool the LLVM bytecode.

If you are not familiar with LLVM, it is a collection of modular and reusable compiler and toolchain technologies. One of the main projects is LLVM IR and backend. In simple terms, an intermediate representation (Intermediate Representation) has been developed, into which the compile front-end compiles the code. This code, LLVM IR, is then compiled into the machine code by the LLVM backend. Thus, if you are making a new language, you can decide to allow LLVM to support the generation of machine code and its optimization, and write a front-end to convert your language to LLVM IR.


Clang converts C ++ code to LLVM IR, which is then converted to native code by the LLVM backend.

Clang is an LLVM frontend. Since we need a cross-platform code measurement method, we cannot instruct a binary code. LLVM IR, however, is platform-independent, as it is only an intermediate presentation of code. Instrumenting IR code using LLVM libraries is a simple cross-platform solution.

Decision


A simple calculation of LLVM IR instructions is obviously not appropriate, since we need the number of instructions to be actually executed, and not just the number of instructions in the code. In the end, we developed a simple algorithm based on the concept of basic code blocks.

The base block is a set of instructions in which the input point can be only the first instruction, and the output point is only the last instruction. ( Also, any transitions within the base unit are prohibited - approx. transl.) To understand this, try to divide the code into pieces in which the branch instructions (transitions, cycles and returns) can only be the last in the set, and the input to the block (the first instruction in a function, loop or if / else block) is possible only on the first instructions. The result is a set of basic blocks - blocks of sequential code, which is simply executed sequentially, without making decisions about which instruction is executed next.

Why don't we try it right now? This is a code snippet provided by Code Character:

// Make the soldiers who aren't patrolling attack the enemyfor (int i = NUM_SOLDIERS / 2; i < NUM_SOLDIERS; ++i) {
    auto &soldier = state.soldiers[i];
    if (soldier.hp == 0) // If this soldier is dead, skip itcontinue;
    for (auto enemy_soldier : state.enemy_soldiers) {
        if (enemy_soldier.hp != 0) { // Ensure your prospective target has// not already been slain
            soldier.soldier_target = enemy_soldier.id;
            break;
        }
    }
}

Link to Github: https://gist.github.com/JHurricane96/8c9c3a45ec5e969de4d9fecb47ebef69#file-player_code-cpp

Using the fact that the base unit has only one input point (the first instruction), we can divide this fragment into the following basic blocks:

//-------------------------------- BB 1 ----------------------------------for (int i = NUM_SOLDIERS / 2; i < NUM_SOLDIERS; ++i) {
//-------------------------------- BB 2 ----------------------------------auto &soldier = state.soldiers[i];
    if (soldier.hp == 0)
//-------------------------------- BB 3 ----------------------------------continue;
//-------------------------------- BB 4 ----------------------------------for (auto enemy_soldier : state.enemy_soldiers) {
//-------------------------------- BB 5 ----------------------------------if (enemy_soldier.hp != 0) {
//-------------------------------- BB 6 ----------------------------------
            soldier.soldier_target = enemy_soldier.id;
            break;
//-------------------------------- BB 7 ----------------------------------
        }
    }
}

Link to Github
This helped us to understand how the basic blocks work, now consider this algorithm in LLVM IR:


; <label>:140:                                    ; preds = %181, %139
%141 = load i32, i32* %19, align 4
%142 = sext i32 %141 to i64
%143 = icmp slt i64 %142, 20
br i1 %143, label %144, label %184
; <label>:144:                                    ; preds = %140
%145 = getelementptr inbounds %"struct.player_state::State",
    %"struct.player_state::State"* %2, i32 0, i32 1
%146 = load i32, i32* %19, align 4
%147 = sext i32 %146 to i64
%148 = call dereferenceable(72) %"struct.player_state::Soldier"*
    @_ZNSt5arrayIN12player_state7SoldierELm20EEixEm(
    %"struct.std::array.10"* %145, i64 %147) #3
store %"struct.player_state::Soldier"* %148,
    %"struct.player_state::Soldier"** %20, align 8
%149 = load %"struct.player_state::Soldier"*,
    %"struct.player_state::Soldier"** %20, align 8
%150 = getelementptr inbounds %"struct.player_state::Soldier",
    %"struct.player_state::Soldier"* %149, i32 0, i32 2
%151 = load i64, i64* %150, align 8
%152 = icmp eq i64 %151, 0
br i1 %152, label %153, label %154
; <label>:153:                                    ; preds = %144
br label %181

Link to Github

If you look carefully, you will notice that the code snippet above is the first three blocks of the C ++ snippet compiled into LLVM IR (each line is the beginning of the base block).

LLVM has libraries that allow us to write “passages” —a code that can transform LLVM IR. The LLVM API allows us to easily read and analyze the LLVM IR by iterating over modules, functions, and base blocks, and modifying the LLVM IR before it is compiled into native code.

Now we have the basic blocks and the LLVM API, it becomes simple to count the number of instructions to be executed using such a simple algorithm:

  1. We write the function IncrementCount, which accepts an integer and increments a static integer by this value, with each call. It must be linked to the participant code.
  2. We iterate over all base blocks. For each base unit perform steps 3 and 4
  3. Find n - the number of instructions in this base unit.
  4. Insert the call to the IncrementCount function before the last instruction of the base unit, with the argument n.
  5. The static integer with which IncrementCount works and will be the counter of instructions after the code is executed. It can be saved somewhere and then checked.

Our instrumented IR works like this:

; <label>:140:                                    ; preds = %181, %139
%141 = load i32, i32* %19, align 4
%142 = sext i32 %141 to i64
%143 = icmp slt i64 %142, 20
call void @_Z14IncrementCountm(i32 4)
br i1 %143, label %144, label %184
; <label>:144:                                    ; preds = %140
%145 = getelementptr inbounds %"struct.player_state::State",
    %"struct.player_state::State"* %2, i32 0, i32 1
%146 = load i32, i32* %19, align 4
%147 = sext i32 %146 to i64
%148 = call dereferenceable(72) %"struct.player_state::Soldier"*
    @_ZNSt5arrayIN12player_state7SoldierELm20EEixEm(
    %"struct.std::array.10"* %145, i64 %147) #3
store %"struct.player_state::Soldier"* %148,
    %"struct.player_state::Soldier"** %20, align 8
%149 = load %"struct.player_state::Soldier"*,
    %"struct.player_state::Soldier"** %20, align 8
%150 = getelementptr inbounds %"struct.player_state::Soldier",
    %"struct.player_state::Soldier"* %149, i32 0, i32 2
%151 = load i64, i64* %150, align 8
%152 = icmp eq i64 %151, 0
call void @_Z14IncrementCountm(i32 10)
br i1 %152, label %153, label %154
; <label>:153:                                    ; preds = %144
call void @_Z14IncrementCountm(i32 1)
br label %181

Link to Github

As we can see, the IncrementCount call is made at the end of each base block right before the last instruction. Using the static int with which IncrementCount works, we can get the number of instructions at the end of each participant’s turn. This method is deterministic and cross-platform, because the same source code is guaranteed to generate the same LLVM IR if we use the same version of the compiler and the same flags.

Conclusion


Code profiling is not as simple as I once thought. In the process of working on such a relatively simple task, I became familiar with how the compiler works and how to write LLVM passages. If you are interested in code generation and you want to write your own passages, LLVM has a beginner's guide . also has an excellent article in the blog, which I used while writing my own passage. Since the LLVM API does not have backward compatibility between major versions, pay attention to which version of LLVM you are using.

The source code of the passage you can take here .

Also popular now: