Towards a brighter future for smart compilers

    Machine learning and artificial intelligence are extremely popular right now: thanks to the computing power of modern machines, ideas and algorithms conceived long ago can finally be implemented and significantly improved. Almost every day you can read news about new achievements in this area. Machine learning is being applied in almost every field... and compiler development is no exception. However, this area is quite specific, with its own peculiarities and difficulties in creating smart compilers. At the same time, there is a great deal of research on the topic, conducted for many years both in academia and inside various companies.

    Where exactly are people trying to apply machine learning methods when building compilers? And why have “smart” compilers still not become part of a developer's daily life?

    Options for using machine learning in compiler development

    Let's start with the first question: the specific uses of machine learning. Modern compilers are complex systems with a large number of optimizations that produce more efficient machine code. However, some optimizations and related tasks, such as register allocation, are NP-complete, which forces compiler developers to use heuristic algorithms. As a result, most compilers expose a large number of optimization flags for tuning those heuristics. In LLVM, almost every pass has several hidden options that affect its behavior; they can be set either with the -mllvm flag when invoking clang, or directly in the opt utility. This variety of flags, however, is hidden behind the much more commonly used options that bundle many settings at once, usually called optimization levels. For C/C++ compilers these are the familiar -O1, -O2, -O3 for optimizing execution time and -Os for optimizing code size. Unfortunately, the result is not always optimal code (assembly experts can often rewrite the generated code better); a lot depends on the source code in the high-level language, the processor architecture, language features, and so on.

    Even though modern processors have plenty of RAM and fairly high performance, there are still areas where application performance, energy efficiency and machine-code size play a key role. Examples include software for embedded systems with limited RAM, digital signal processing, real-time systems, and so on. So in cases where high-performance machine code is needed for reasonably large systems, selecting the compilation options that give the best result is an important task. In addition, the worst-case execution time (WCET) problem has not gone anywhere: in real-time systems the time to complete a specific task on the platform must be computed and, where possible, minimized. To this day, programmers working with RAM-constrained systems cannot fully rely on compilers and often optimize the generated machine code by hand.

    It is hard for a person to predict which optimizations will pay off and which can lead to regressions: doing so requires a good understanding of the intricacies of the heuristic algorithms used, a good knowledge of the structure and passes of the compiler, and complete knowledge of the code of the program being compiled, which is impossible in the current application development process. As a result, identifying the best compilation options for a program comes down to an exhaustive search over combinations of options, with performance and code-size measurements for each.
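    That exhaustive search can be sketched in a few lines. The flag names and the timing function below are purely illustrative stand-ins, not a real measurement harness:

```python
import itertools

# Hypothetical per-flag toggles; real compilers expose hundreds of these.
FLAGS = ["-funroll-loops", "-finline-functions", "-fvectorize"]

def measure_runtime(flag_combo):
    """Stand-in for 'compile the program with these flags and time it'.
    Here we just simulate: pretend each flag speeds things up independently."""
    runtime = 100.0
    speedups = {"-funroll-loops": 0.9, "-finline-functions": 0.85, "-fvectorize": 0.8}
    for f in flag_combo:
        runtime *= speedups[f]
    return runtime

def best_flags(flags):
    """Try every subset of flags: 2^n 'compilations' for n flags."""
    best_combo, best_time = (), measure_runtime(())
    for r in range(1, len(flags) + 1):
        for combo in itertools.combinations(flags, r):
            t = measure_runtime(combo)
            if t < best_time:
                best_combo, best_time = combo, t
    return best_combo, best_time

combo, t = best_flags(FLAGS)
print(combo, round(t, 2))
```

    The exponential number of combinations is exactly why this brute-force approach does not scale to the hundreds of knobs a real compiler has.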

    In addition, there is a limitation in the form of the compilation unit that you can work with and choose options for. For C/C++ this is still a single file, which can contain a lot of code that might benefit from being optimized in different ways, but at the moment this is not possible. A “smart” compiler that could be trained and would then produce code well optimized for a variety of cases is therefore a dream for some developers.

    Existing Research and Solutions

    Naturally, the problem of automatically selecting compilation options has interested researchers for many years. One of the best-known projects is MILEPOST GCC, developed by G. Fursin and his team: a version of the gcc compiler capable of selecting optimization passes based on prior training on a collected data sample. That work used a set of 55 program features and a fairly simple model based on propagating good solutions via the k-nearest-neighbors algorithm. It was this project that showed that tuning optimization passes can produce code up to twice as fast as code obtained with the standard maximum optimization level -O3.
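    The nearest-neighbor idea behind MILEPOST can be illustrated with a toy sketch. The feature vectors and flag strings below are invented; the real system used 55 static features per program:

```python
import math

# Toy training set: (feature vector, best-known flags for that program).
# Real MILEPOST features include counts of basic blocks, branches, etc.
training = [
    ((4.0, 1.0, 0.0), "-O2 -funroll-loops"),
    ((3.5, 1.2, 0.1), "-O2 -funroll-loops"),
    ((0.5, 8.0, 2.0), "-Os"),
]

def predict_flags(features, k=1):
    """Reuse the flags of the most similar previously trained program(s)."""
    neighbors = sorted(training, key=lambda ex: math.dist(ex[0], features))[:k]
    # With k > 1 one would vote among neighbors; with k = 1 just copy.
    return neighbors[0][1]

print(predict_flags((3.8, 1.1, 0.0)))  # closest to the first two programs
```

    The appeal of this scheme is that no explicit model of the compiler is needed: a new program simply inherits the decisions that worked for its nearest neighbors in feature space.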

    There is also research by G. Pekhimenko and A. D. Brown for IBM's TPO (Toronto Portable Optimizer). Their main task was to select heuristically chosen values for optimizations, as well as the set of code transformations itself. The implementation used logistic regression, which made it possible to tune penalties efficiently for faster training. The classifier was built in Matlab. A probability was computed for each pass, and the pass was applied if it exceeded 50%. The metric they tried to reduce in that study was static compilation time.
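    A minimal sketch of that per-pass scheme: one logistic model per pass, applied at the 50% threshold. The weights and feature meanings here are invented, and the original classifier was built in Matlab, not Python:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical learned weights: one logistic model per optimization pass,
# mapping code features to the probability that the pass helps.
pass_models = {
    "inline":      {"w": [0.8, -0.2], "b": 0.1},
    "loop-unroll": {"w": [-0.5, 1.1], "b": -1.0},
}

def select_passes(features):
    """Apply a pass iff its predicted probability of benefit exceeds 50%."""
    chosen = []
    for name, model in pass_models.items():
        z = sum(w * x for w, x in zip(model["w"], features)) + model["b"]
        if sigmoid(z) > 0.5:
            chosen.append(name)
    return chosen

print(select_passes([1.0, 0.0]))
```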

    A. Askhari worked on directly selecting compilation options for a whole program at once, minimizing execution time, compilation time, code size and power consumption. This used the cTuning Framework and the Collective Mind Framework developed by G. Fursin and A. Lokhmotov (also available on GitHub).

    There is also research by M. Stephenson and S. Amarasinghe on selecting optimizations for particular, especially important algorithms (register allocation, data prefetching, hyperblock formation). Each function used its own set of features accordingly. A genetic algorithm was used for the search. The developed tool was tested on the Open Research Compiler (ORC).
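    The genetic-algorithm approach can be sketched as follows; the fitness function is a made-up stand-in for "compile with these on/off decisions and benchmark the result":

```python
import random

random.seed(0)

N_OPTS = 8  # toy genome: 8 on/off optimization decisions

def fitness(genome):
    """Stand-in for 'compile with these decisions and benchmark'.
    Purely illustrative: odd positions help, even positions hurt."""
    return sum(g if i % 2 else -g for i, g in enumerate(genome))

def evolve(pop_size=20, generations=30):
    pop = [[random.randint(0, 1) for _ in range(N_OPTS)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        survivors = pop[: pop_size // 2]          # keep the fitter half
        children = []
        for _ in range(pop_size - len(survivors)):
            a, b = random.sample(survivors, 2)
            cut = random.randrange(1, N_OPTS)     # one-point crossover
            child = a[:cut] + b[cut:]
            if random.random() < 0.1:             # occasional mutation
                i = random.randrange(N_OPTS)
                child[i] ^= 1
            children.append(child)
        pop = survivors + children
    return max(pop, key=fitness)

best = evolve()
print(best, fitness(best))
```

    Because survivors are carried over unchanged, the best genome's fitness never decreases between generations; in the real setting each fitness evaluation is a full compile-and-run, which is what makes these searches expensive.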

    There is also the MAGEEC (Machine Guided Energy Efficient Compiler) project, whose goals are somewhat different. The infrastructure it develops uses machine learning to select the optimizations needed to generate code with maximum energy efficiency for high-performance computing systems. MAGEEC is designed to work with both gcc and LLVM. This compiler is part of the larger TSERO (Total Software Energy Reporting and Optimization) project.

    One piece of research directly related to LLVM is LLVMTuner, a software product developed at the University of Illinois by I. Chen and V. Adve; a 2017 talk described the results available at that time. This work optimizes individual “hot” loops. The framework is designed for automated tuning of large programs: it operates on LLVM IR, uses profiling to identify hot loops, and then automatically tunes heuristics for them. The focus is on top-level loops. The selected loops and any functions they call are extracted into a separate module, which is then subjected to the necessary optimizations. This approach yields improved performance on large programs.

    Existing problems

    And yet there is no widely used compiler that tunes the heuristics of its optimization passes on its own. What is the problem? As is well known, the effectiveness of machine learning methods and the quality of the resulting models depend on choosing the right features and on the quality of the training data (even though algorithms less sensitive to “noisy” data exist). Without knowing the structure and algorithms used inside a compiler, it is not easy to select a complete and sufficient feature set for training, although some features are quite intuitive and logical, for example loop size, the number of loop exits, and so on. This makes it difficult to develop a universal solution suitable for many compilers at once, and it is not clear that one is possible at all. Most likely, it is not even required.
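    To make "features" concrete: for a loop one might record its size, exit count and similar properties in a small record and flatten it into the numeric vector a model consumes. The field set below is invented for illustration:

```python
from dataclasses import dataclass

@dataclass
class LoopInfo:
    """Toy per-loop record; a real compiler would fill this from its IR."""
    body_instructions: int   # loop size
    exit_count: int          # number of loop exits
    nesting_depth: int
    has_calls: bool

def to_feature_vector(loop):
    """Flatten a loop description into a plain numeric vector."""
    return [
        loop.body_instructions,
        loop.exit_count,
        loop.nesting_depth,
        int(loop.has_calls),
    ]

vec = to_feature_vector(LoopInfo(42, 2, 1, False))
print(vec)  # [42, 2, 1, 0]
```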

    Since compiler development should be efficient and feasible within a fairly short time, it is natural that even large companies build their industrial compilers on top of ready-made solutions. Most modern solutions fall into two categories: those running on virtual machines, for example JIT compilers for the JVM, and static and dynamic compilers based on LLVM, a system implementing a virtual machine with RISC-like instructions. Companies' own solutions still exist, of course, but they are becoming less competitive due to the lack of a large community working on the underlying technologies. As a result, many large companies such as Google, Apple, Adobe and ARM use LLVM today to develop their own solutions.

    Collecting features for training is itself a big problem, since multi-pass compilers heavily transform the intermediate representation: features collected at an early stage are not quite relevant to later compiler optimizations and are highly likely to have changed by then. Moreover, it makes sense to collect features separately for different kinds of elements: modules, loops, basic blocks, since optimizations are usually designed to transform a particular kind of element; in LLVM the passes are even divided by this criterion.

    But, firstly, there is the question of identifying the elements for which features must be collected. There are several ways to compute unique identifiers that could survive all optimizations, for example:

    • a hash of the AST computed in the front end
    • unique numbers assigned during front-end parsing
    • a 64-bit number derived from the arcs of the CFG (control-flow graph) using a checksum (similar to PGO (profile-guided optimization) in LLVM)

    However, these identifiers must be properly preserved through transformations, during which elements can merge, split, be created anew or be deleted, and that is not an easy task.
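    The third option from the list above, a checksum over CFG arcs, might look like this sketch (illustrative only; LLVM's actual PGO hashing is more involved):

```python
import zlib

def cfg_hash(edges):
    """Derive a stable 64-bit identifier from a function's CFG arcs.
    `edges` is an iterable of (src_block, dst_block) index pairs.
    Sorting makes the id independent of edge enumeration order."""
    data = b"".join(
        src.to_bytes(4, "little") + dst.to_bytes(4, "little")
        for src, dst in sorted(edges)
    )
    # Combine two CRC32 checksums into a single 64-bit value.
    lo = zlib.crc32(data)
    hi = zlib.crc32(data[::-1])
    return (hi << 32) | lo

# The same CFG always hashes the same; changing one arc changes the id.
cfg_a = [(0, 1), (1, 2), (1, 3), (2, 3)]
cfg_b = [(0, 1), (1, 2), (1, 3), (3, 2)]
print(hex(cfg_hash(cfg_a)), cfg_hash(cfg_a) != cfg_hash(cfg_b))
```

    The hard part, as noted, is not computing such an id but keeping it meaningful while passes merge, split and delete the blocks it was computed from.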

    Secondly, it is difficult in principle to map the boundaries of the original loops, basic blocks, etc. written in the source code onto the already transformed IR. For example, because of the multi-stage machine code generation adopted in LLVM, information about machine basic blocks is lost after code is generated from machine instructions in AsmPrinter, and with it the identifiers of basic blocks and loops. If, say, the identifier is an offset from the beginning of the function, that offset can only be obtained, as a byte count, at the machine code generation stage. At later stages, however, when machine fragments are processed, various alignments may be inserted that change the sizes of the instructions counted earlier, and nop instructions are added as well. Because of this, for basic blocks near the end of large functions the error can be very large, up to a complete shift into another block or loop. And although some of the late-stage transformations can be tracked and accounted for, this gives no guarantee of accuracy, since instruction sizes can keep changing right up to the linker.
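    The offset-drift problem can be demonstrated with a small model: compute block offsets from instruction sizes, then let a later "pass" pad every block with nops so the next block starts on a 16-byte boundary. The numbers are invented, and real back ends align only selected blocks:

```python
def block_offsets(block_sizes):
    """Byte offset of each basic block from the start of the function."""
    offsets, pos = [], 0
    for size in block_sizes:
        offsets.append(pos)
        pos += size
    return offsets

def align_blocks(block_sizes, align=16):
    """Model a later pass that appends nop padding to each block so the
    following block starts on an `align`-byte boundary."""
    padded, pos = [], 0
    for size in block_sizes:
        pos += size
        pad = (-pos) % align          # nop bytes appended after the block
        padded.append(size + pad)
        pos += pad
    return padded

sizes = [7, 12, 5, 30]                # instruction bytes per basic block
before = block_offsets(sizes)
after = block_offsets(align_blocks(sizes))
print(before)   # [0, 7, 19, 24]
print(after)    # [0, 16, 32, 48]
```

    Note how the error accumulates: the last block's offset drifts by 24 bytes, which is why offsets measured early are least reliable for blocks at the end of large functions.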

    As you can see, even collecting the features on which training must be based, and which will most likely become the input of the trained decision-making model, is complex and labor-intensive. There are no obvious solutions to these problems, which complicates the machine learning work itself and makes it hard to attract more people, given the lack of sufficient datasets. On top of that come the typical difficulties of any machine learning problem: choosing models and methods, finding the right subset of features out of a large number of them, and so on. Almost everyone who has dealt with machine learning knows them, and there is probably nothing unique or compiler-specific here.

    It is hard to predict when smart compilers will become widespread. Modern compilers have other problems that are unlikely to be solved this way and that are probably higher priority at the moment. Still, compilers have already become much more intelligent than they were at the dawn of their existence, and this process will continue, though perhaps somewhat more slowly than most developers would like.
