November 30, 2017 at 10:01
Infecting for good: how we execute parasitic code
Recently, we have been talking a lot about CRIU, a system for live migration of containers. Today we will talk about an even more interesting development: live patching of applications, and the Compel library that makes all this mischief possible and gives hyperconverged systems a new level of flexibility.

In the last post, we explained that live migration matters for operating systems which, unlike containers, run continuously and hold a lot of useful data. But that is only one side of the work in this area. Long-lived applications, such as database management systems and file storage services, are just as sensitive to being shut down.
Take a DBMS, for example. Yes, it can be killed and restarted like a container. But nobody knows in advance how long it will take to come back up. Rebuilding a cache from terabytes of diverse data can consume a great deal of additional resources, and it goes without saying that the performance of every dependent system can drop significantly while the cache is being rebuilt. That process can last several hours. Another example is services that hold long-lived network connections. Alas, not all protocols today offer a REST-style API, and in practice there are many cases in which a connection to a client has to be kept open for a long time. Restarting such an application risks loss of access to the service.
Compel Library
The CRIU project relied on one very interesting capability: making a process execute parasitic code. This was needed to prepare a container for migration. Practice has shown, however, that this capability has many more uses, so we moved it into a separate library, Compel. Now anyone can write a program in C, compile it in a special way, and then “load” it into a live process. The program runs there, and afterwards the victim process returns to its main job.

By the way, Compel also powers our live application patching system, the first “third-party” user of the library (that is, the first apart from CRIU itself). It works as follows: with the help of Compel, a running program is “patched” while it keeps running. In other words, you can install an updated version of the software on your machine and have that version running almost instantly.
There are two difficulties here. First, you need to generate the “patch” itself, because what gets updated is the binary code (the executable instructions), not the program's source code. Second, you need to apply the changes in such a way that the running program does not crash. It is akin to surgery on a beating heart. Until now, this approach was used only for the OS kernel, since switching to a new kernel meant rebooting the machine, which took a long time. Nobody did this for applications, but if it is possible for the kernel, why not for applications? We recently implemented live patching technology, and it has already been published on GitHub.
    static long process_syscall(struct process_ctx_s *ctx, int nr,
                                unsigned long arg1, unsigned long arg2,
                                unsigned long arg3, unsigned long arg4,
                                unsigned long arg5, unsigned long arg6)
    {
        int ret;
        long sret = -ENOSYS;

        /* Make the traced process execute syscall nr; its raw result is stored in sret. */
        ret = compel_syscall(ctx->ctl, nr, &sret,
                             arg1, arg2, arg3, arg4, arg5, arg6);
        if (ret < 0) {
            pr_err("Failed to execute syscall %d in %d\n", nr, ctx->pid);
            return ret;
        }

        /* A negative raw result is a negated errno value. */
        if (sret < 0) {
            errno = -sret;
            return -1;
        }

        return sret;
    }
Sample code that forces a process to make a system call
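The `sret < 0` check in the listing relies on the Linux convention that a raw system call reports failure by returning a negated error code, which wrapper code then turns into `-1` plus `errno`. The sketch below shows that conversion in isolation. `decode_raw_sysret` is a hypothetical helper, not part of Compel, and unlike the listing it treats only the range from -4095 to -1 as errors, which is the range the kernel reserves for error codes.

```c
#include <errno.h>

/* Largest error number the kernel encodes in a raw syscall result. */
#define DEMO_MAX_ERRNO 4095

/*
 * Hypothetical helper (not a Compel function): convert a raw syscall
 * result into the libc-style convention of -1 plus errno.
 * Values outside [-4095, -1] are passed through unchanged.
 */
long decode_raw_sysret(long sret)
{
    if (sret < 0 && sret >= -DEMO_MAX_ERRNO) {
        errno = (int)-sret;
        return -1;
    }
    return sret;
}
```

For example, a raw result of `-ENOENT` becomes a return value of `-1` with `errno` set to `ENOENT`, while a successful result such as a file descriptor number is returned as is.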
Technical details
Compel is a library for preparing parasitic code, injecting it into an arbitrary process, executing it and unloading it. To pull all this off, Compel uses the debugger interface, the ptrace system call. Technically, it looks like this: Compel attaches to the process, stops it, and then starts modifying its memory and registers.
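To make the attach, stop, modify sequence concrete, here is a minimal sketch that uses plain ptrace requests rather than Compel itself. For brevity it uses the classic PTRACE_ATTACH request, and it only overwrites one word of memory in a child process that it forked itself. The names `demo_word`, `spawn_paused_child` and `poke_word` are hypothetical and exist only for this illustration.

```c
#include <errno.h>
#include <signal.h>
#include <stddef.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* After fork() the child has its own copy of this word at the same address. */
volatile long demo_word = 0x1111;

/* Hypothetical helper: fork a child that does nothing but sleep. */
pid_t spawn_paused_child(void)
{
    pid_t pid = fork();

    if (pid == 0) {
        for (;;)
            pause();
    }
    return pid;
}

/*
 * Attach to pid, wait until it stops, overwrite one word at addr and
 * read it back. Returns 0 and stores the value read back in *readback,
 * or -1 on failure. The target is detached again in both cases.
 */
int poke_word(pid_t pid, void *addr, long value, long *readback)
{
    int status;
    int ret = -1;

    if (ptrace(PTRACE_ATTACH, pid, NULL, NULL))
        return -1;
    /* PTRACE_ATTACH sends SIGSTOP; wait for the resulting stop. */
    if (waitpid(pid, &status, 0) != pid || !WIFSTOPPED(status))
        goto out;
    if (ptrace(PTRACE_POKEDATA, pid, addr, (void *)value))
        goto out;
    /* PEEKDATA returns the word itself, so errors are reported via errno. */
    errno = 0;
    *readback = ptrace(PTRACE_PEEKDATA, pid, addr, NULL);
    if (errno == 0)
        ret = 0;
out:
    ptrace(PTRACE_DETACH, pid, NULL, NULL);
    return ret;
}
```

Calling `poke_word(pid, (void *)&demo_word, 0x2222, &seen)` on a child created with `spawn_paused_child()` changes the word only in the child; the parent's copy keeps its old value. Running this requires permission to trace the target, which a parent normally has for its own children.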
However, doing all this was not so simple. Stopping a process has its subtleties: a process can also be stopped without a debugger, with the SIGSTOP signal. For a long time this caused a serious problem in the kernel: when a debugger (via ptrace) stopped a process that had already been stopped by a signal (SIGSTOP), the process “woke up” again as soon as the debugger detached. Of course, the debugger could send the process another STOP signal before detaching, and then the process would end up stopped in any case. But there was no way, short of jumping through hoops, to find out whether the process had already been stopped by a stop signal at the moment of attaching. For debugging this is more or less acceptable, but for a program that snapshots the state of processes (that is, for CRIU or application patching) it is not. To get around exactly this problem, an alternative way of stopping processes was developed, one that does not lose the information about whether the process was stopped by a signal. It is called PTRACE_SEIZE; today it is present in all distribution kernels and is used by, among others, the strace utility.
    int compel_interrupt_task(int pid)
    {
        int ret;

        ret = ptrace(PTRACE_SEIZE, pid, NULL, 0);
        if (ret) {
            /*
             * ptrace API doesn't allow to distinguish
             * attaching to zombie from other errors.
             * All errors will be handled in compel_wait_task().
             */
            pr_warn("Unable to interrupt task: %d (%s)\n", pid, strerror(errno));
            return ret;
        }

        /*
         * If we SEIZE-d the task stop it before going
         * and reading its stat from proc. Otherwise task
         * may die _while_ we're doing it and we'll have
         * inconsistent seize/state pair.
         *
         * If task dies after we seize it but before we
         * do this interrupt, we'll notice it via proc.
         */
        ret = ptrace(PTRACE_INTERRUPT, pid, NULL, NULL);
        if (ret < 0) {
            pr_warn("SEIZE %d: can't interrupt task: %s", pid, strerror(errno));
            if (ptrace(PTRACE_DETACH, pid, NULL, NULL))
                pr_perror("Unable to detach from %d", pid);
        }

        return ret;
    }
Code for SEIZE
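What the tracer sees after SEIZE and INTERRUPT can be sketched with plain ptrace calls. The example below is an illustration, not Compel code: `spawn_idle_child` and `seize_and_stop` are hypothetical helpers, and the constant 128 is the value of PTRACE_EVENT_STOP from the Linux headers, written out in case the libc headers in use do not define it. According to ptrace(2), a seized task that was running or sleeping reports SIGTRAP as the stop signal here, while a task that was in a signal-induced stop reports the stop signal itself, which is the information the older attach method loses.

```c
#include <signal.h>
#include <stddef.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Numeric value of PTRACE_EVENT_STOP in the Linux headers. */
#define DEMO_EVENT_STOP 128

/* Hypothetical helper: fork a child that just sleeps in pause(). */
pid_t spawn_idle_child(void)
{
    pid_t pid = fork();

    if (pid == 0) {
        for (;;)
            pause();
    }
    return pid;
}

/*
 * Seize pid, interrupt it and wait for the resulting stop.
 * On success returns 0, stores the reported stop signal in *stop_sig
 * and leaves the task seized and stopped. On failure returns -1.
 */
int seize_and_stop(pid_t pid, int *stop_sig)
{
    int status;

    if (ptrace(PTRACE_SEIZE, pid, NULL, NULL))
        return -1;
    if (ptrace(PTRACE_INTERRUPT, pid, NULL, NULL))
        goto err;
    if (waitpid(pid, &status, 0) != pid || !WIFSTOPPED(status))
        goto err;
    /* This kind of stop carries PTRACE_EVENT_STOP in the upper bits of status. */
    if ((status >> 16) != DEMO_EVENT_STOP)
        goto err;
    *stop_sig = WSTOPSIG(status);
    return 0;
err:
    ptrace(PTRACE_DETACH, pid, NULL, NULL);
    return -1;
}
```

A caller would typically compare the value stored in `*stop_sig` with SIGTRAP to decide whether the task was running before it was seized. The real Compel code performs further checks in `compel_wait_task()`, which this sketch does not attempt to reproduce.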
By the way, strace falls back to the old method if the SEIZE operation fails. For CRIU that would be pointless: if SEIZE does not work, the process state cannot be saved. We are sometimes asked whether CRIU can be made to work on kernels that lack SEIZE. Our answer is that it is theoretically possible, but it would require adding support to Compel for attaching without SEIZE. We deliberately do not do this, because it would then be impossible to guarantee that CRIU works correctly on stopped processes.
There is another subtlety, this time in signal handling. A signal can be sent to a process that is stopped by the debugger; to handle it, the debugger itself is woken up and decides what to do with the signal that arrived. While loading parasitic code, Compel inevitably runs into situations in which the process being “prepared” receives signals from outside at that very moment.
At first we tried to write code that could cope with this situation, but it turned out to be too hard to maintain, and any change carried a huge risk of breaking signal handling. So we decided to go another way. Fortunately, Linux can block signals for a process, which makes debugging much easier. However, the blocking interface is designed so that a process can block signals only for itself. You may ask: we are loading parasitic code into the process and can block signals from inside it, so what is the problem? The problem is that signals still have to be handled while the parasite is being loaded, and loading the parasite is, as you know, quite complicated in itself, so not having to handle signals only after that point does not simplify the task much.
To make life easier for ourselves and, as it soon turned out, for the developers of the gdb debugger, a way to block signals for a debugged process was added to the kernel, as another extension of the ptrace call. After that, all the code that works with the parasite became much simpler, but, alas, Compel (and CRIU) lost the ability to work on kernels without this interface. Unlike the SEIZE operation, though, CRIU and Compel could be taught to work without the ability to block signals for an arbitrary process, although that would take tremendous effort.
    static int parasite_run(pid_t pid, int cmd, unsigned long ip, void *stack,
                            user_regs_struct_t *regs, struct thread_ctx *octx)
    {
        k_rtsigset_t block;

        /* Block every signal in the tracee while the parasite runs. */
        ksigfillset(&block);
        if (ptrace(PTRACE_SETSIGMASK, pid, sizeof(k_rtsigset_t), &block)) {
            pr_perror("Can't block signals for %d", pid);
            goto err_sig;
        }

        /* Point the tracee's registers at the parasite code and stack. */
        parasite_setup_regs(ip, stack, regs);
        if (ptrace_set_regs(pid, regs)) {
            pr_perror("Can't set registers for %d", pid);
            goto err_regs;
        }

        if (ptrace(cmd, pid, NULL, NULL)) {
            pr_perror("Can't run parasite at %d", pid);
            goto err_cont;
        }

        return 0;

    err_cont:
        if (ptrace_set_regs(pid, &octx->regs))
            pr_perror("Can't restore regs for %d", pid);
    err_regs:
        if (ptrace(PTRACE_SETSIGMASK, pid, sizeof(k_rtsigset_t), &octx->sigmask))
            pr_perror("Can't restore sigmask for %d", pid);
    err_sig:
        return -1;
    }
The method of blocking signals for a debugged process
Fortunately, this problem is no longer acute. Both SEIZE and signal blocking have been part of the Linux kernel since version 3.11, so the minimum requirement for running Compel, and CRIU in particular, is kernel 3.11 or later.
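A program that depends on these interfaces may want to check the kernel version up front. The following is a rough sketch of such a check based on the release string reported by `uname()`. The function names are hypothetical and this is not how CRIU itself verifies kernel features. Since distribution kernels sometimes backport features to older version numbers, a version comparison like this is only a heuristic.

```c
#include <stdio.h>
#include <sys/utsname.h>

/*
 * Hypothetical helper: compare a release string such as
 * "5.15.0-91-generic" against a required major.minor version.
 * Returns 1 if the release is at least major.minor, 0 if it is older,
 * and -1 if the string cannot be parsed.
 */
int release_at_least(const char *release, int major, int minor)
{
    int maj = 0, min = 0;

    if (sscanf(release, "%d.%d", &maj, &min) != 2)
        return -1;
    if (maj != major)
        return maj > major;
    return min >= minor;
}

/* Same check applied to the kernel the program is running on. */
int running_kernel_at_least(int major, int minor)
{
    struct utsname u;

    if (uname(&u))
        return -1;
    return release_at_least(u.release, major, minor);
}
```

With these helpers, `running_kernel_at_least(3, 11)` gives a quick indication of whether the running kernel is new enough for the requirement stated above.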
Infect, use!
Compel is currently available on GitHub and can be used by anyone to run parasitic code in any process. You can simply use the trampoline written in assembly, which helps you attach to a process and make it do something: dump part of its memory, replace data in it, or update it. Today there are many processes that it would be nice to fix without stopping them, and Compel lets you do that your own way... or you can use the ready-made utility for patching applications.
