mkevac February 8, 2017 at 09:25

How to work with JIT

Translation


On some internal systems at Badoo, we use JIT to search large bitmaps quickly. It is a very interesting topic, but not a widely known one. To help remedy that, I have translated a useful article by Eli Bendersky on what JIT is and how to use it.

I previously published an introductory article on libjit, aimed at programmers who are already at least somewhat familiar with JIT. That post described JIT only briefly; this one gives a fuller overview and backs it up with examples whose code requires no additional libraries.

Defining JIT

JIT is an acronym for “Just In Time”, which in this context means roughly “on the fly”. On its own, that tells us little and sounds as if it has nothing to do with programming. I think the following description of JIT comes closest to the truth:

If a program, while it is running, creates and executes new executable code that was not part of the original program on disk, that is JIT.

But where did the name come from? Fortunately, John Aycock of the University of Calgary has written a very interesting paper, “A Brief History of Just-In-Time”, which looks at JIT techniques from a historical perspective. According to the paper, the first mention of generating and executing code while a program is running appeared in McCarthy's 1960 paper on LISP. In later work (for example, Thompson's 1968 paper on regular expressions) the approach is even more explicit: regular expressions are compiled into machine code and executed on the fly.

The term JIT itself was first brought into the computing literature by James Gosling, in connection with Java. Aycock says Gosling borrowed the term from manufacturing and began using it in the early 1990s. If you are interested in the details, read Aycock's paper. Now let's see how everything described above works in practice.

JIT: generate machine code and run it

I find JIT easier to understand if you split it into two phases from the start:

Phase 1: generating machine code while the program is running
Phase 2: execution of machine code while the program is running

The first phase accounts for 99% of the complexity of JIT. At the same time, it is the most mundane part of the process: it is exactly what an ordinary compiler does. Well-known compilers such as gcc and clang/LLVM translate C/C++ source into machine code. That machine code is usually written to a file, but there is no reason not to keep it in memory instead (in fact, both gcc and clang/LLVM have ready-made facilities for keeping code in memory for use in JIT). In this article, though, I want to focus on the second phase.

Executing Generated Code

Modern operating systems are very picky about what a program is allowed to do while it runs. The wild-west days ended with the advent of protected mode, which lets the operating system assign different permissions to different regions of a process's memory. In “normal” mode you can allocate memory on the heap, but you cannot simply execute code that lives on the heap without first explicitly asking the OS for permission.

I hope everyone understands that machine code is just data, a sequence of bytes. Like this, for example:

unsigned char code[] = {0x48, 0x89, 0xf8};

To some, these three bytes are just three bytes; to others, they are the binary encoding of valid x86-64 code:

mov %rdi, %rax

Getting this machine code into memory is easy. But how do we make it executable, and then actually execute it?

Let's look at the code

The code examples in the rest of this article target a POSIX-compatible UNIX operating system (specifically Linux). On other operating systems (such as Windows) the code differs in the details but not in the approach. All modern operating systems have convenient APIs for doing the same thing.

Without further ado, let's see how to create a function in memory dynamically and execute it. The function is deliberately very simple. In C, it looks like this:

long add4(long num) {
  return num + 4;
}

Here is the first attempt (the full source code along with the Makefile is available in the repository):

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>

// Allocates RWX memory of the given size and returns a pointer to it. On failure,
// prints the error and returns NULL.
void* alloc_executable_memory(size_t size) {
  void* ptr = mmap(0, size,
                   PROT_READ | PROT_WRITE | PROT_EXEC,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
  if (ptr == MAP_FAILED) {
    perror("mmap");
    return NULL;
  }
  return ptr;
}

// Copies the machine code for add4 into the memory pointed to by m.
void emit_code_into_memory(unsigned char* m) {
  unsigned char code[] = {
    0x48, 0x89, 0xf8,                   // mov %rdi, %rax
    0x48, 0x83, 0xc0, 0x04,             // add $4, %rax
    0xc3                                // ret
  };
  memcpy(m, code, sizeof(code));
}

const size_t SIZE = 1024;
typedef long (*JittedFunc)(long);

// Allocates RWX memory directly.
void run_from_rwx() {
  void* m = alloc_executable_memory(SIZE);
  if (m == NULL) {
    return;
  }
  emit_code_into_memory(m);
  JittedFunc func = m;
  long result = func(2);
  printf("result = %ld\n", result);
}

The code performs three main steps:

Use mmap to allocate a chunk of heap memory that is readable, writable and executable.
Copy the machine code implementing add4 into that memory.
Execute the code from that memory by casting the pointer to a function pointer and calling through it.

Note that the third step is possible only when the chunk of memory holding the machine code has execute permission. Without it, the function call would cause an OS error (most likely a segmentation fault). That is what happens if, for example, we allocate m with an ordinary call to malloc, which returns RW memory but not X.

A digression: heap, malloc and mmap

Attentive readers may have noticed that I referred to memory allocated by mmap as “memory from the heap”. Strictly speaking, “heap” is the name for the source of memory used by malloc, free and related functions, as opposed to the stack, which is managed directly by the compiler.

But it is not that simple. :-) Traditionally (that is, a very long time ago) malloc used only one source for the memory it handed out (the sbrk system call), but nowadays most malloc implementations use mmap in many cases. The details differ between operating systems and implementations, but typically mmap is used for large chunks of memory and sbrk for small ones. The difference comes down to how efficient each method is at obtaining memory from the operating system.

So calling memory obtained from mmap “memory from the heap” is not a mistake, in my opinion, and I intend to keep using the name.

We care about security

The code above has a serious vulnerability: the block of RWX memory it allocates is a paradise for exploits. Let's be a little more responsible. Here is slightly modified code:

// Allocates RW memory of the given size and returns a pointer to it. On failure,
// prints the error and returns NULL. Unlike malloc, the memory is allocated
// on a page boundary, so it can be used in a call to mprotect.
void* alloc_writable_memory(size_t size) {
  void* ptr = mmap(0, size,
                   PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
  if (ptr == MAP_FAILED) {
    perror("mmap");
    return NULL;
  }
  return ptr;
}

// Sets RX permissions on this chunk of page-aligned memory. Returns
// 0 on success. On failure, prints the error and returns -1.
int make_memory_executable(void* m, size_t size) {
  if (mprotect(m, size, PROT_READ | PROT_EXEC) == -1) {
    perror("mprotect");
    return -1;
  }
  return 0;
}

// Allocates RW memory, stores the code in it and changes the permissions
// to RX before executing.
void emit_to_rw_run_from_rx() {
  void* m = alloc_writable_memory(SIZE);
  if (m == NULL) {
    return;
  }
  emit_code_into_memory(m);
  if (make_memory_executable(m, SIZE) == -1) {
    return;
  }
  JittedFunc func = m;
  long result = func(2);
  printf("result = %ld\n", result);
}

This example is equivalent to the previous one in every respect but one: the memory is first allocated with RW permissions (just as with ordinary malloc). Those permissions are all we need to write our piece of code into it. Once the code is in memory, we use mprotect to change the permissions from RW to RX, which prohibits writing. The end effect is the same, but at no point is the memory both writable and executable. That is good and correct from a security standpoint.

What about malloc?

Could we have used malloc instead of mmap to allocate memory in the previous code? After all, RW memory is exactly what malloc gives us. Yes, we could. But it would bring more trouble than convenience. Permissions can be set only on whole pages, so when allocating with malloc we would have to ensure manually that the memory is aligned on a page boundary. mmap takes care of this by always returning page-aligned memory (because mmap, by design, works only with whole pages).
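
To give an idea of what that manual work might look like, here is a small sketch. It assumes posix_memalign is used as the malloc-family allocator, and the helper names round_up_to_page and alloc_page_aligned are hypothetical. Note that POSIX only specifies mprotect for memory obtained from mmap; Linux accepts other memory too, but that is a platform-specific allowance.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <unistd.h>

// Rounds `size` up to the next multiple of `page_size`.
// `page_size` is assumed to be a power of two.
size_t round_up_to_page(size_t size, size_t page_size) {
  return (size + page_size - 1) & ~(page_size - 1);
}

// Hypothetical helper: returns heap memory that starts on a page boundary
// and spans whole pages, or NULL on failure. Release it with free().
void* alloc_page_aligned(size_t size) {
  long page = sysconf(_SC_PAGESIZE);
  if (page <= 0) {
    return NULL;
  }
  void* p = NULL;
  size_t rounded = round_up_to_page(size, (size_t)page);
  // posix_memalign returns 0 on success and an error number otherwise.
  if (posix_memalign(&p, (size_t)page, rounded) != 0) {
    return NULL;
  }
  return p;
}
```

Rounding the size up matters as much as aligning the start: mprotect changes permissions for every page the range touches, so a block that shares its last page with other heap data would change the permissions of that data as well.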

To summarize

This article began with a general overview of JIT and of what we mean when we say “JIT”, and ended with code examples that show how to dynamically execute a piece of machine code from memory. The techniques presented here are pretty much how real JIT systems (LLVM or libjit) do it. All that remains is the “simple” part: generating machine code from some other representation.

LLVM contains a full-fledged compiler, so it can translate C and C++ code (via LLVM IR) into machine code on the fly and execute it. libjit works at a much lower level: it can serve as a backend for a compiler. My introductory article on libjit shows how to generate and execute non-trivial code with that library. But JIT is a much more general concept. You can generate code on the fly for data structures, regular expressions, and even for accessing C from the virtual machines of various languages. Digging through my blog archives, I found a mention of JIT in an article from eight years ago. It is about Perl code that generates other Perl code on the fly (from an XML description file), but the idea is the same.

This is why I think it is important to describe JIT by separating the two phases. The second phase (the one described in this article) has a fairly mundane implementation that uses standard operating system APIs. For the first phase, the possibilities are endless, and what it ends up containing depends on the specific application you are developing.
