LLVM for TensorFlow, or a Compiler for the End of Moore's Law

Original author: The TensorFlow MLIR Team
The TensorFlow ecosystem contains a number of compilers and optimizers working at various levels of the software and hardware stack. For those who use TensorFlow daily, this multi-level stack can produce hard-to-understand errors, at both compile time and runtime, when working with various kinds of hardware (GPUs, TPUs, mobile platforms, and so on).

These components, starting from the TensorFlow graph, can be represented as a diagram like this:



In reality, everything is even more complicated

This diagram shows that TensorFlow graphs can be executed in several different ways.

Note: in TensorFlow 2.0, graphs can be implicit; eager execution can run operations individually, in groups, or on a full graph. Either way, these graphs or graph fragments must be optimized and executed.
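To make this concrete, here is a minimal sketch assuming TensorFlow 2.x (the function name is purely illustrative): the same Python code can run eagerly, operation by operation, or be traced by tf.function into a graph that the stack below then optimizes and executes.

```python
import tensorflow as tf

# Illustrative only: eager execution runs each operation as it is called.
def add_and_scale(x, y):
    return (x + y) * 2.0

eager_result = add_and_scale(tf.constant(1.0), tf.constant(2.0))

# Wrapping the same function with tf.function traces it into a tf.Graph
# on the first call; that graph is what gets optimized and executed.
graph_fn = tf.function(add_and_scale)
graph_result = graph_fn(tf.constant(1.0), tf.constant(2.0))

print(eager_result.numpy(), graph_result.numpy())  # 6.0 6.0
```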

For instance: 

  • We send the graphs to the TensorFlow executor, which invokes hand-written, specialized kernels.
  • We convert them to XLA HLO (the XLA High-Level Optimizer representation), which in turn can invoke the LLVM compiler to generate code for the CPU or GPU, continue with XLA for the TPU, or combine the two.
  • We convert them to TensorRT, nGraph, or another format for a specialized instruction set implemented in hardware.
  • We convert them to the TensorFlow Lite format and run them in the TensorFlow Lite runtime, or further convert them into code that runs on a GPU or DSP via the Android Neural Networks API (NNAPI) or the like (a sketch of two of these paths follows this list).
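As a hedged illustration of two of the paths above, assuming TensorFlow 2.x APIs (the exact flag and converter names have shifted between releases), the XLA path can be requested per function and the TensorFlow Lite path goes through the converter:

```python
import tensorflow as tf

# XLA path (sketch): ask TensorFlow to compile this function's graph with XLA.
# In some releases the flag is spelled experimental_compile instead of jit_compile.
@tf.function(jit_compile=True)
def matmul(a, b):
    return tf.matmul(a, b)

print(matmul(tf.random.normal([4, 8]), tf.random.normal([8, 2])).shape)

# TensorFlow Lite path (sketch): convert a small Keras model to the TF Lite
# flatbuffer, which the TF Lite runtime (or NNAPI delegates) can then run.
model = tf.keras.Sequential([tf.keras.layers.Dense(2, input_shape=(8,))])
tflite_bytes = tf.lite.TFLiteConverter.from_keras_model(model).convert()
with open("model.tflite", "wb") as f:
    f.write(tflite_bytes)
```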

There are also more sophisticated paths, including multiple optimization passes at each layer, as in the Grappler framework, which optimizes TensorFlow operations.
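As a rough sketch of what this looks like from user code, assuming TensorFlow 2.x, individual Grappler passes can be toggled globally; the option names below are the documented ones, though availability may vary by release.

```python
import tensorflow as tf

# Sketch: toggle a few Grappler optimization passes globally.
tf.config.optimizer.set_experimental_options({
    "constant_folding": True,         # pre-compute constant subgraphs
    "arithmetic_optimization": True,  # simplify and combine arithmetic ops
    "layout_optimizer": True,         # choose data layouts (e.g. NHWC vs NCHW)
})

# Inspect which options are currently set.
print(tf.config.optimizer.get_experimental_options())
```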

Although these various compiler implementations and intermediate representations improve performance, their diversity poses a problem for end users, such as confusing error messages at the boundaries between these subsystems. Also, creators of new software and hardware stacks must adjust the optimization and conversion passes for each new case.

Because of all this, we are pleased to announce MLIR, the Multi-Level Intermediate Representation. MLIR is an intermediate representation format and a set of compiler libraries that sit between a model representation and a low-level compiler that generates hardware-specific code. With MLIR, we want to enable new research into optimizing compilers and compiler implementations built from production-quality components.

We expect MLIR to be of interest to many groups, including:

  • compiler researchers, as well as practitioners who want to optimize the performance and memory consumption of machine learning models;
  • hardware manufacturers looking for a way to connect their hardware to TensorFlow, such as TPUs, mobile neuroprocessors in smartphones, and other custom ASICs;
  • people who want to give programming languages the benefits provided by optimizing compilers and hardware accelerators.

What is MLIR?


MLIR is essentially a flexible infrastructure for modern optimizing compilers. It consists of a specification for an intermediate representation (IR) and a set of tools for transforming that representation. In compiler terms, moving from a higher-level representation to a lower-level one is called lowering, and we will use this term below.

MLIR is built under the influence of LLVM and unabashedly borrows many good ideas from it. It has a flexible type system and is designed to represent, analyze, and transform graphs, combining multiple levels of abstraction in a single compilation unit. These abstractions include TensorFlow operations, nested polyhedral loop regions, LLVM instructions, and fixed-point operations and types.

Dialects of MLIR 


In order to separate the various software and hardware targets, MLIR has “dialects”, including:

  • TensorFlow IR, which covers everything that can be done in TensorFlow graphs;
  • XLA HLO IR, designed to take advantage of everything the XLA compiler provides, from which we can generate code for TPUs and more;
  • An experimental affine dialect, designed specifically for polyhedral representations and optimizations;
  • LLVM IR, which matches the native LLVM representation 1:1, allowing MLIR to generate GPU and CPU code via LLVM;
  • TensorFlow Lite, designed to generate code for mobile platforms.

Each dialect contains a set of defined operations with invariants placed on them, such as: "this is a binary operator, and its inputs and outputs have the same type."

MLIR extensions


MLIR does not have a fixed, built-in list of global intrinsic operations. Dialects can define entirely custom types, which is how MLIR can model things like the LLVM IR type system (with first-class aggregates), domain-language abstractions such as quantized types, important for ML-optimized accelerators, and, in the future, even the Swift or Clang type systems.

If you want to connect a new low-level compiler to this system, you can create a new dialect and a lowering from the TensorFlow graph dialect to your dialect. This simplifies the path for hardware developers and compiler developers. You can even target dialects at different levels of the same model, with higher-level optimizers responsible for specific parts of the IR.

For compiler researchers and framework developers, MLIR makes it possible to create transformations at every level, and to define your own operations and abstractions in the IR, letting you better model your application domain. Thus, MLIR is more of a pure compiler infrastructure than LLVM is.

Although MLIR acts as a compiler for ML, it also enables the use of machine learning techniques within compilers! This is especially important for engineers developing numerical libraries, who cannot provide support for the full variety of ML models and hardware. The flexibility of MLIR makes it easier to explore lowering strategies when moving between levels of abstraction.

What's next


We have opened a GitHub repository and invite everyone interested to contribute (check out our guide!). In the coming months we will be releasing more than just this toolbox: the specifications of the TensorFlow and TF Lite dialects. To find out more, see the presentation by Chris Lattner and our README on GitHub.

If you want to stay up to date with everything related to MLIR, join our new mailing list, which will soon focus on announcements of future releases of our project. Stay tuned!
