Limiting the memory available to the program

Original author: avd
I decided to tackle the task of sorting a million integers with only 1 MB of memory available. But first I had to work out how to limit the amount of memory available to a program. Here is what I came up with.

Process virtual memory


Before diving into the different ways of limiting memory, you need to know how a process's virtual memory works. The best article on this topic is “Anatomy of a Program in Memory”.

After reading the article, I can see two ways to limit memory: shrink the virtual address space or shrink the heap.

The first, shrinking the address space, is simple but not quite right for our purposes. We cannot cut the whole address space down to 1 MB, because the kernel and the libraries would not fit.

The second, shrinking the heap, is harder and rarely done, because it requires fiddling with the linker. For our task, though, it would be the more accurate option.

I will also look at other methods: tracking memory usage by intercepting library and system calls, and changing the program's environment through emulation and sandboxing.

For testing, we will use a small program called big_alloc, which allocates and then frees about 100 MiB (1000 blocks of 100 KiB).

#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
// 1000 allocations of 100 KiB = 100,000 KiB, roughly 100 MiB
#define NALLOCS 1000
#define ALLOC_SIZE (1024*100) // 100 KiB
int main(int argc, const char *argv[])
{
    int i = 0;
    int **pp;
    bool failed = false;
    // calloc zeroes the table, so slots we never reach stay NULL for the cleanup loop
    pp = calloc(NALLOCS, sizeof(int *));
    if (!pp)
    {
        perror("calloc");
        return 1;
    }
    for(i = 0; i < NALLOCS; i++)
    {
        pp[i] = malloc(ALLOC_SIZE);
        if (!pp[i])
        {
            perror("malloc");
            printf("Failed after %d allocations\n", i);
            failed = true;
            break;
        }
        // Touch a few bytes of the memory to defeat copy-on-write.
        memset(pp[i], 0xA, 100);
        printf("pp[%d] = %p\n", i, pp[i]);
    }
    if (!failed)
        printf("Successfully allocated %d bytes\n", NALLOCS * ALLOC_SIZE);
    for(i = 0; i < NALLOCS; i++)
    {
        if (pp[i])
            free(pp[i]);
    }
    free(pp);
    return 0;
}


All sources are on GitHub.

ulimit


This is the first thing an old Unix hacker thinks of when memory needs to be limited. ulimit is a bash builtin that lets you limit a program's resources. Under the hood it is an interface to setrlimit.

We can set a limit on the amount of memory for the program.

$ ulimit -m 1024


We check:

$ ulimit -a
core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 7802
max locked memory       (kbytes, -l) 64
max memory size         (kbytes, -m) 1024
open files                      (-n) 1024
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 8192
cpu time               (seconds, -t) unlimited
max user processes              (-u) 1024
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited


We set a limit of 1024 KiB, that is, 1 MiB. But if we run the program, it completes without errors. Despite the 1024 KiB limit, top shows the program occupying as much as 4872 KiB.
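The resident size that top reports can also be sampled from inside the process. The following is a minimal sketch, assuming Linux, where getrusage reports ru_maxrss in kilobytes. The helper name peak_rss_kib is hypothetical, and it returns the peak value rather than the current one.

```c
#include <sys/time.h>
#include <sys/resource.h>

/* Returns the peak resident set size of the calling process,
 * or -1 on error. On Linux, ru_maxrss is reported in kilobytes. */
long peak_rss_kib(void)
{
    struct rusage ru;

    if (getrusage(RUSAGE_SELF, &ru) != 0)
        return -1;
    return ru.ru_maxrss;
}
```

Printing this value at the end of big_alloc would be one way to compare actual usage against the limit set with ulimit -m.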

The reason is that Linux does not enforce this limit, and the man page says as much:

ulimit [-HSTabcdefilmnpqrstuvx [limit]]
    ...
    -m     The maximum resident set size (many systems do not honor this limit)

There is also ulimit -d, which should work, but in practice it does not either because of mmap (see the Linker section).
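Because ulimit is a front end to setrlimit, a limit can also be set from C before the allocations start. The sketch below is an illustration under assumptions, not part of the original sources: it assumes Linux and uses RLIMIT_AS (the resource behind ulimit -v), since RLIMIT_RSS, the resource behind ulimit -m, is not enforced there. The names limit_address_space and can_allocate are hypothetical.

```c
#include <stdlib.h>
#include <sys/resource.h>

/* Lowers the soft RLIMIT_AS (address space) limit to at most `bytes`.
 * The hard limit is left untouched. Returns 0 on success, -1 on error. */
int limit_address_space(rlim_t bytes)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_AS, &rl) != 0)
        return -1;
    if (rl.rlim_max != RLIM_INFINITY && bytes > rl.rlim_max)
        bytes = rl.rlim_max;    /* the soft limit may not exceed the hard one */
    rl.rlim_cur = bytes;
    return setrlimit(RLIMIT_AS, &rl);
}

/* Returns 1 if an allocation of `size` bytes succeeds, 0 otherwise. */
int can_allocate(size_t size)
{
    volatile char *p = malloc(size);

    if (p == NULL)
        return 0;
    p[0] = 1;                   /* touch it so the compiler keeps the malloc */
    free((void *)p);
    return 1;
}
```

As the QEMU section shows, an address-space limit cannot be set very low: the binary and its shared libraries must fit under it too, so this limits the whole process rather than just the heap.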

QEMU


QEMU is perfect for manipulating a program's environment. It has the -R option, which limits the virtual address space. It cannot be set too low, though, or libc and the kernel will not fit.

Look:

$ qemu-i386 -R 1048576 ./big_alloc
big_alloc: error while loading shared libraries: libc.so.6: failed to map segment from shared object: Cannot allocate memory

Here -R 1048576 leaves 1 MiB of virtual address space.

For the program to load, you need to reserve something on the order of 20 MB:

$ qemu-i386 -R 20M ./big_alloc
malloc: Cannot allocate memory
Failed after 100 allocations

It stops after 100 iterations (10 MB).

Overall, QEMU is so far the leader among the limiting methods; you just have to experiment with the -R value.

Container


Another option is to run the program in a container and limit its resources. To do this, you can:
  • use Docker
  • use the user-space tools from the lxc package
  • write your own script with libvirt
  • something else…


Either way, the resources end up being limited by a Linux subsystem called cgroups. You can work with cgroups directly, but I recommend going through LXC. I would have liked to use Docker, but it only works on 64-bit machines.

LXC stands for LinuX Containers. It is a set of user-space tools and libraries for managing kernel features and creating containers: isolated, secure environments for applications or for an entire system.

The kernel features involved are:
  • Control groups (cgroups)
  • Kernel namespaces
  • chroot
  • Kernel capabilities
  • SELinux, AppArmor
  • Seccomp policies


The documentation can be found on the official site or on the author's blog.

To run an application in a container, you must give lxc-execute a config file that specifies all the container's settings. You can start from the examples in /usr/share/doc/lxc/examples. The man page recommends starting with lxc-macvlan.conf. Let's begin:

# cp /usr/share/doc/lxc/examples/lxc-macvlan.conf lxc-my.conf
# lxc-execute -n foo -f ./lxc-my.conf ./big_alloc
Successfully allocated 102400000 bytes


Works!

Now let's limit memory with cgroups. LXC lets you configure the memory subsystem of the container's cgroup by setting memory limits. The parameters are described in the Red Hat documentation. I found two:

  • memory.limit_in_bytes: sets the maximum amount of user memory, including the file cache
  • memory.memsw.limit_in_bytes: sets the maximum for memory and swap combined

Here is what I added to lxc-my.conf:

lxc.cgroup.memory.limit_in_bytes = 2M
lxc.cgroup.memory.memsw.limit_in_bytes = 2M

We launch:

# lxc-execute -n foo -f ./lxc-my.conf ./big_alloc
#

Silence. Apparently there is too little memory. Let's try running it from a shell:

# lxc-execute -n foo -f ./lxc-my.conf /bin/bash
#

bash did not start. Let's try /bin/sh:

# lxc-execute -n foo -f ./lxc-my.conf -l DEBUG -o log /bin/sh
sh-4.2# ./dev/big_alloc/big_alloc 
Killed

And in dmesg you can track the glorious death of the process:

[15447.035569] big_alloc invoked oom-killer: gfp_mask=0xd0, order=0, oom_score_adj=0
...
[15447.035779] Task in /lxc/foo
[15447.035785]  killed as a result of limit of 
[15447.035789] /lxc/foo
[15447.035795] memory: usage 3072kB, limit 3072kB, failcnt 127
[15447.035800] memory+swap: usage 3072kB, limit 3072kB, failcnt 0
[15447.035805] kmem: usage 0kB, limit 18014398509481983kB, failcnt 0
[15447.035808] Memory cgroup stats for /lxc/foo: cache:32KB rss:3040KB rss_huge:0KB mapped_file:0KB writeback:0KB swap:0KB inactive_anon:1588KB active_anon:1448KB inactive_file:16KB active_file:16KB unevictable:0KB
[15447.035836] [ pid ]   uid  tgid total_vm      rss nr_ptes swapents oom_score_adj name
[15447.035963] [ 9225]     0  9225      942      308      10        0 0 init.lxc
[15447.035971] [ 9228]     0  9228      833      698       6        0 0 sh
[15447.035978] [ 9252]     0  9252    16106      843      36        0 0 big_alloc
[15447.035983] Memory cgroup out of memory: Kill process 9252 (big_alloc) score 1110 or sacrifice child
[15447.035990] Killed process 9252 (big_alloc) total-vm:64424kB, anon-rss:2396kB, file-rss:976kB

Although we got no error message from big_alloc about a failed malloc or the amount of available memory, it seems to me that we have successfully limited memory using containers. Let's leave it at that for now.

Linker


Let's try to change the binary image so that less heap space is available. Linking is the last step in building a program. It is done by the linker, driven by a linker script. The script describes the program's sections in memory, along with various attributes and other details.

An example linker script:

ENTRY(main)
SECTIONS
{
  . = 0x10000;
  .text : { *(.text) }
  . = 0x8000000;
  .data : { *(.data) }
  .bss : { *(.bss) }
}

The dot denotes the current position. For example, the .text section starts at 0x10000, and then, starting at 0x8000000, come the next two sections: .data and .bss. The entry point is main.

This is all very nice, but it will not work for real programs. The main function you write in a C program is not actually the first one called. A lot of initialization and cleanup happens around it. That code lives in the C runtime (crt) and is spread across the crt*.o files in /usr/lib.

You can see the details by running gcc -v. First it calls cc1 to produce assembly code, translates that into an object file with as, and finally links everything into an ELF binary with collect2, which is a wrapper around ld. It takes our object file and 5 additional files to create the final binary image:

    /usr/lib/gcc/i686-redhat-linux/4.8.3/./././crt1.o
    /usr/lib/gcc/i686-redhat-linux/4.8.3/./././crti.o
    /usr/lib/gcc/i686-redhat-linux/4.8.3/crtbegin.o
    /tmp/ccEZwSgF.o <- our program's object file
    /usr/lib/gcc/i686-redhat-linux/4.8.3/crtend.o
    /usr/lib/gcc/i686-redhat-linux/4.8.3/./././crtn.o

All of this is quite complicated, so instead of writing my own script I will edit the linker's default one. You can get it by passing -Wl,-verbose to gcc:

gcc big_alloc.c -o big_alloc -Wl,-verbose

Now let's think about how to change it. First, let's see how the binary is laid out by default. Compile it and look at the address of the .data section. Here is the output of objdump -h big_alloc:

Sections:
Idx Name          Size      VMA       LMA       File off  Algn
...
12 .text         000002e4  080483e0  080483e0  000003e0  2**4
                 CONTENTS, ALLOC, LOAD, READONLY, CODE
...
23 .data         00000004  0804a028  0804a028  00001028  2**2
                 CONTENTS, ALLOC, LOAD, DATA
24 .bss          00000004  0804a02c  0804a02c  0000102c  2**2
                 ALLOC

The .text, .data and .bss sections sit at around the 128 MiB mark of the address space.

Let's see where the stack is with gdb:

[restrict-memory]$ gdb big_alloc 
...
Reading symbols from big_alloc...done.
(gdb) break main
Breakpoint 1 at 0x80484fa: file big_alloc.c, line 12.
(gdb) r
Starting program: /home/avd/dev/restrict-memory/big_alloc 
Breakpoint 1, main (argc=1, argv=0xbffff164) at big_alloc.c:12
12              int i = 0;
Missing separate debuginfos, use: debuginfo-install glibc-2.18-16.fc20.i686
(gdb) info registers 
eax            0x1      1
ecx            0x9a8fc98f       -1701852785
edx            0xbffff0f4       -1073745676
ebx            0x42427000       1111650304
esp            0xbffff0a0       0xbffff0a0
ebp            0xbffff0c8       0xbffff0c8
esi            0x0      0
edi            0x0      0
eip            0x80484fa        0x80484fa 
eflags         0x286    [ PF SF IF ]
cs             0x73     115
ss             0x7b     123
ds             0x7b     123
es             0x7b     123
fs             0x0      0
gs             0x33     51

esp points to 0xbffff0a0, which is just under 3 GiB. So we have a heap of about 2.9 GiB.
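The same estimate can be made at run time instead of in gdb. The following is a rough sketch, assuming Linux and a glibc-style sbrk; break_to_stack_gap is a hypothetical helper that measures the distance from the current program break (the top of the brk heap) to a variable on the stack.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE 1   /* exposes sbrk() under strict -std= modes */
#endif
#include <stdint.h>
#include <unistd.h>

/* Distance in bytes between the current program break and a local
 * variable on the stack. Returns 0 if the stack is not above the break. */
uintptr_t break_to_stack_gap(void)
{
    int on_stack = 0;
    uintptr_t brk_top = (uintptr_t)sbrk(0);      /* sbrk(0) only queries */
    uintptr_t stack_addr = (uintptr_t)&on_stack;

    return stack_addr > brk_top ? stack_addr - brk_top : 0;
}
```

On a 32-bit build like the one in this article the result should be in the neighborhood of the 2.9 GiB figure above. On 64-bit systems the gap is vastly larger, and shared libraries and other mappings live inside it either way.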

In the real world the top address of the stack is randomized; you can see this, for example, in the output of:

# cat /proc/self/maps

As we know, the heap grows from the end of .data toward the stack. What if we move the .data section as high as possible?

Let's place the data segment 2 MiB below the stack. Take the top of the stack and subtract 2 MiB:

0xbffff0a0 - 0x200000 = 0xbfdff0a0

We move all sections starting with .data to this address:

. = 0xbfdff0a0;
.data           :
{
  *(.data .data.* .gnu.linkonce.d.*)
  SORT(CONSTRUCTORS)
}

We compile:

$ gcc big_alloc.c -o big_alloc -Wl,-T hack.lst

The -Wl and -T hack.lst options make gcc pass hack.lst to the linker as its script.

Let's look at the section headers:

Sections:
Idx Name          Size      VMA       LMA       File off  Algn
 ...
 23 .data         00000004  bfdff0a0  bfdff0a0  000010a0  2**2
                  CONTENTS, ALLOC, LOAD, DATA
 24 .bss          00000004  bfdff0a4  bfdff0a4  000010a4  2**2
                  ALLOC

And yet the memory still gets allocated. How? When I looked at the pointer values returned by malloc, I saw that allocation starts somewhere after the end of the .data section, at addresses like 0xbf8b7000, continues with increasing pointers for a while, and then jumps to lower addresses like 0xb5e76000. It looks as if the heap is growing downward.

If you think about it, there is nothing strange here. I checked the glibc sources and found that when brk fails, mmap is used instead. So glibc asks the kernel for pages, the kernel sees that the process has plenty of holes in its virtual address space, maps the pages into one of the empty regions, and glibc returns a pointer into it.
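The shape of that fallback can be sketched in a few lines. This is not glibc's actual code, only an illustration of the idea, assuming Linux; grow_or_map is a hypothetical name.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE 1   /* exposes sbrk() and MAP_ANONYMOUS */
#endif
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/* Tries to extend the program break by `size` bytes; if that fails,
 * asks the kernel for an anonymous mapping instead. Sets *via_mmap to
 * 1 when the mmap path was taken. Returns NULL if both attempts fail. */
void *grow_or_map(size_t size, int *via_mmap)
{
    void *p = sbrk((intptr_t)size);

    *via_mmap = 0;
    if (p != (void *)-1)
        return p;           /* the old break is the start of the new region */

    /* No room above the break: let the kernel pick any free hole. */
    *via_mmap = 1;
    p = mmap(NULL, size, PROT_READ | PROT_WRITE,
             MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return p == MAP_FAILED ? NULL : p;
}
```

Since mmap is called with a NULL address hint, the kernel is free to choose the location, which is why squeezing the space above the break does not cap the total amount a process can allocate.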

Running big_alloc under strace confirmed the theory. Here is the normal binary:

brk(0)                                  = 0x8135000
mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb77df000
mmap2(NULL, 95800, PROT_READ, MAP_PRIVATE, 3, 0) = 0xb77c7000
mmap2(0x4226d000, 1825436, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x4226d000
mmap2(0x42425000, 12288, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x1b8000) = 0x42425000
mmap2(0x42428000, 10908, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x42428000
mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb77c6000
mprotect(0x42425000, 8192, PROT_READ)   = 0
mprotect(0x8049000, 4096, PROT_READ)    = 0
mprotect(0x42269000, 4096, PROT_READ)   = 0
munmap(0xb77c7000, 95800)               = 0
brk(0)                                  = 0x8135000
brk(0x8156000)                          = 0x8156000
brk(0)                                  = 0x8156000
mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb77de000
brk(0)                                  = 0x8156000
brk(0x8188000)                          = 0x8188000
brk(0)                                  = 0x8188000
brk(0x81ba000)                          = 0x81ba000
brk(0)                                  = 0x81ba000
brk(0x81ec000)                          = 0x81ec000
...
brk(0)                                  = 0x9c19000
brk(0x9c4b000)                          = 0x9c4b000
brk(0)                                  = 0x9c4b000
brk(0x9c7d000)                          = 0x9c7d000
brk(0)                                  = 0x9c7d000
brk(0x9caf000)                          = 0x9caf000
...
brk(0)                                  = 0xe29c000
brk(0xe2ce000)                          = 0xe2ce000
brk(0)                                  = 0xe2ce000
brk(0xe300000)                          = 0xe300000
brk(0)                                  = 0xe300000
brk(0)                                  = 0xe300000
brk(0x8156000)                          = 0x8156000
brk(0)                                  = 0x8156000
+++ exited with 0 +++

And now for the modified one:

brk(0)                                  = 0xbf896000
mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb778f000
mmap2(NULL, 95800, PROT_READ, MAP_PRIVATE, 3, 0) = 0xb7777000
mmap2(0x4226d000, 1825436, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x4226d000
mmap2(0x42425000, 12288, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x1b8000) = 0x42425000
mmap2(0x42428000, 10908, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x42428000
mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb7776000
mprotect(0x42425000, 8192, PROT_READ)   = 0
mprotect(0x8049000, 4096, PROT_READ)    = 0
mprotect(0x42269000, 4096, PROT_READ)   = 0
munmap(0xb7777000, 95800)               = 0
brk(0)                                  = 0xbf896000
brk(0xbf8b7000)                         = 0xbf8b7000
brk(0)                                  = 0xbf8b7000
mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb778e000
brk(0)                                  = 0xbf8b7000
brk(0xbf8e9000)                         = 0xbf8e9000
brk(0)                                  = 0xbf8e9000
brk(0xbf91b000)                         = 0xbf91b000
brk(0)                                  = 0xbf91b000
brk(0xbf94d000)                         = 0xbf94d000
brk(0)                                  = 0xbf94d000
brk(0xbf97f000)                         = 0xbf97f000
...
brk(0)                                  = 0xbff8e000
brk(0xbffc0000)                         = 0xbffc0000
brk(0)                                  = 0xbffc0000
brk(0xbfff2000)                         = 0xbffc0000
mmap2(NULL, 1048576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb7676000
brk(0)                                  = 0xbffc0000
brk(0xbfffa000)                         = 0xbffc0000
mmap2(NULL, 1048576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb7576000
brk(0)                                  = 0xbffc0000
brk(0xbfffa000)                         = 0xbffc0000
mmap2(NULL, 1048576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb7476000
brk(0)                                  = 0xbffc0000
brk(0xbfffa000)                         = 0xbffc0000
mmap2(NULL, 1048576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb7376000
...
brk(0)                                  = 0xbffc0000
brk(0xbfffa000)                         = 0xbffc0000
mmap2(NULL, 1048576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb1c76000
brk(0)                                  = 0xbffc0000
brk(0xbfffa000)                         = 0xbffc0000
mmap2(NULL, 1048576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb1b76000
brk(0)                                  = 0xbffc0000
brk(0xbfffa000)                         = 0xbffc0000
mmap2(NULL, 1048576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb1a76000
brk(0)                                  = 0xbffc0000
brk(0)                                  = 0xbffc0000
brk(0)                                  = 0xbffc0000
...
brk(0)                                  = 0xbffc0000
brk(0)                                  = 0xbffc0000
brk(0)                                  = 0xbffc0000
+++ exited with 0 +++

So moving the .data section up toward the stack in order to shrink the heap is pointless, since the kernel will simply place pages in whatever empty space it finds.

Sandbox


Another way to limit a program's memory is sandboxing. The difference from emulation is that we do not emulate anything; we simply monitor and control certain aspects of the program's behavior. Sandboxing is commonly used in security research, where malware is isolated and analyzed so that it cannot harm your system.

Trick with LD_PRELOAD

LD_PRELOAD is a special environment variable that makes the dynamic linker load the listed libraries first, ahead of everything else, including libc. Incidentally, this trick is also used by some malware.

I wrote a simple sandbox that intercepts malloc/free calls, keeps track of memory, and returns ENOMEM once the limit has been reached.

To do this, I built a shared library with my own wrappers around malloc/free that increase a counter by the size passed to malloc and decrease it when free is called. It is preloaded via LD_PRELOAD.

My malloc implementation:

void *malloc(size_t size)
{
    void *p = NULL;
    if (libc_malloc == NULL) 
        save_libc_malloc();
    if (mem_allocated <= MEM_THRESHOLD)
    {
        p = libc_malloc(size);
    }
    else
    {
        errno = ENOMEM;
        return NULL;
    }
    if (!no_hook) 
    {
        no_hook = 1;
        account(p, size);
        no_hook = 0;
    }
    return p;
}

libc_malloc is a pointer to the original malloc from libc. no_hook is a thread-local flag. It lets the hooks themselves call malloc without triggering recursive calls.
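The save_libc_malloc helper is not shown here. One common way to obtain such a pointer is dlsym with RTLD_NEXT, which looks up the next definition of a symbol after the calling object. The sketch below assumes glibc on Linux; find_next_malloc and malloc_fn are hypothetical names, not taken from the library's source.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE         /* RTLD_NEXT is a GNU extension */
#endif
#include <dlfcn.h>
#include <stddef.h>

typedef void *(*malloc_fn)(size_t);

/* Looks up the next definition of malloc after the calling object in
 * the dynamic linker's search order. Returns NULL if none is found. */
malloc_fn find_next_malloc(void)
{
    /* POSIX allows converting the void * returned by dlsym to a
     * function pointer, although ISO C leaves the conversion undefined. */
    return (malloc_fn)dlsym(RTLD_NEXT, "malloc");
}
```

Inside a preloaded library this would return libc's malloc, which the wrapper can then call. On glibc versions older than 2.34 the program has to be linked with -ldl.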

malloc is used implicitly in the account function by the uthash library. Why a hash table? Because free receives only a pointer, and inside free there is no way to know how much memory was allocated. So I keep a table with pointers as keys and allocation sizes as values. Here is what I do on malloc:

struct malloc_item *item, *out;
item = malloc(sizeof(*item));
item->p = ptr;
item->size = size;
HASH_ADD_PTR(HT, p, item);
mem_allocated += size;
fprintf(stderr, "Alloc: %p -> %zu\n", ptr, size);

mem_allocated is the static variable that malloc compares against the limit.

Now, when free is called, the following happens:

struct malloc_item *found;
HASH_FIND_PTR(HT, &ptr, found);
if (found)
{
    mem_allocated -= found->size;
    fprintf(stderr, "Free: %p -> %zu\n", found->p, found->size);
    HASH_DEL(HT, found);
    free(found);
}
else
{
    fprintf(stderr, "Freeing unaccounted allocation %p\n", ptr);
}

Yes, it simply decrements mem_allocated.
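The bookkeeping idea can be tried in isolation. The sketch below is a simplified stand-in, not the library's code: it replaces uthash with a fixed-size array and a linear scan so that it needs no allocation of its own. SLOTS, account_alloc, account_free and accounted_bytes are hypothetical names.

```c
#include <stddef.h>

#define SLOTS 1024                  /* hypothetical capacity of the table */

struct slot { void *ptr; size_t size; };

static struct slot table[SLOTS];    /* a NULL ptr marks a free slot */
static size_t mem_allocated;

/* Records that `ptr` holds `size` bytes.
 * Returns 0 on success, -1 if ptr is NULL or the table is full. */
int account_alloc(void *ptr, size_t size)
{
    if (ptr == NULL)
        return -1;
    for (size_t i = 0; i < SLOTS; i++) {
        if (table[i].ptr == NULL) {
            table[i].ptr = ptr;
            table[i].size = size;
            mem_allocated += size;
            return 0;
        }
    }
    return -1;
}

/* Forgets `ptr` and returns the size recorded for it, or 0 if unknown. */
size_t account_free(void *ptr)
{
    if (ptr == NULL)
        return 0;
    for (size_t i = 0; i < SLOTS; i++) {
        if (table[i].ptr == ptr) {
            size_t size = table[i].size;
            table[i].ptr = NULL;
            mem_allocated -= size;
            return size;
        }
    }
    return 0;
}

/* Total number of bytes currently accounted for. */
size_t accounted_bytes(void)
{
    return mem_allocated;
}
```

A static table avoids the recursion problem that the no_hook flag solves, at the cost of a hard cap on tracked allocations and a slow lookup. A real hash table such as uthash scales much better.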

And the best part is that it works.

[restrict-memory]$ LD_PRELOAD=./libmemrestrict.so ./big_alloc
pp[0] = 0x25ac210
pp[1] = 0x25c5270
pp[2] = 0x25de2d0
pp[3] = 0x25f7330
pp[4] = 0x2610390
pp[5] = 0x26293f0
pp[6] = 0x2642450
pp[7] = 0x265b4b0
pp[8] = 0x2674510
pp[9] = 0x268d570
pp[10] = 0x26a65d0
pp[11] = 0x26bf630
pp[12] = 0x26d8690
pp[13] = 0x26f16f0
pp[14] = 0x270a750
pp[15] = 0x27237b0
pp[16] = 0x273c810
pp[17] = 0x2755870
pp[18] = 0x276e8d0
pp[19] = 0x2787930
pp[20] = 0x27a0990
malloc: Cannot allocate memory
Failed after 21 allocations

The full library code is on GitHub.

So LD_PRELOAD turns out to be a great way to limit memory.

ptrace

ptrace is another way to build a sandbox. It is a system call that lets you control the execution of another process, and it is available on various Unix-like systems.

It is the basis of tracers such as strace and ltrace, of almost all sandboxing programs (systrace, sydbox, mbox), and of debuggers, including gdb.

I built my own tool with ptrace. It tracks brk calls and measures the distance between the initial break value and the new value set by each subsequent brk call.

The program forks, giving two processes: the parent is the tracer and the child is the tracee. In the child, I call ptrace(PTRACE_TRACEME) and then execv. In the parent, I use ptrace(PTRACE_SYSCALL) to stop at the syscall and filter out brk calls from the child, and then another ptrace(PTRACE_SYSCALL) to get the value returned by brk.
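The fork, PTRACE_TRACEME and PTRACE_SYSCALL skeleton described above can be sketched without any register access, which keeps it independent of the CPU architecture. This is a minimal illustration assuming Linux, not the tool's actual code; count_syscall_stops is a hypothetical name, and it only counts the stops instead of filtering brk.

```c
#include <signal.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Runs argv[0] under ptrace and returns the number of syscall stops
 * seen (a syscall normally produces two: entry and exit), or -1 on error. */
long count_syscall_stops(char *const argv[])
{
    int status;
    long stops = 0;
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0) {
        /* Child (tracee): ask to be traced, then become the target. */
        ptrace(PTRACE_TRACEME, 0, NULL, NULL);
        execvp(argv[0], argv);
        _exit(127);                     /* exec failed */
    }

    /* Parent (tracer): the first stop is the SIGTRAP raised by exec. */
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    while (!WIFEXITED(status) && !WIFSIGNALED(status)) {
        /* Resume the child until the next syscall entry or exit. */
        if (ptrace(PTRACE_SYSCALL, pid, NULL, NULL) < 0)
            return -1;
        if (waitpid(pid, &status, 0) < 0)
            return -1;
        if (WIFSTOPPED(status) && WSTOPSIG(status) == SIGTRAP)
            stops++;
    }
    return stops;
}
```

A real tracer would read the registers at each stop with PTRACE_GETREGS to learn which syscall is in progress, and that is the point where the code becomes tied to the architecture.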

When brk goes beyond the given limit, I set -ENOMEM as the return value of brk. The return value lives in the eax register, so I simply overwrite it with ptrace(PTRACE_SETREGS). Here is the juiciest part:

// Get the return value
if (!syscall_trace(pid, &state))
{
    dbg("brk return: 0x%08X, brk_start 0x%08X\n", state.eax, brk_start);
    if (brk_start) // We have start of brk
    {
        diff = state.eax - brk_start;
        // If the child process has exceeded the limit,
        // replace the brk return value with -ENOMEM
        if (diff > THRESHOLD || threshold) 
        {
            dbg("THRESHOLD!\n");
            threshold = true;
            state.eax = -ENOMEM;
            ptrace(PTRACE_SETREGS, pid, 0, &state);
        }
        else
        {
            dbg("diff 0x%08X\n", diff);
        }
    }
    else
    {
        dbg("Assigning 0x%08X to brk_start\n", state.eax);
        brk_start = state.eax;
    }
}

I also intercept mmap/mmap2 calls, since libc is smart enough to fall back to them when brk has problems. So once the threshold has been exceeded and I see an mmap call, I fail it with ENOMEM.

Works!

[restrict-memory]$ ./ptrace-restrict ./big_alloc
pp[0] = 0x8958fb0
pp[1] = 0x8971fb8
pp[2] = 0x898afc0
pp[3] = 0x89a3fc8
pp[4] = 0x89bcfd0
pp[5] = 0x89d5fd8
pp[6] = 0x89eefe0
pp[7] = 0x8a07fe8
pp[8] = 0x8a20ff0
pp[9] = 0x8a39ff8
pp[10] = 0x8a53000
pp[11] = 0x8a6c008
pp[12] = 0x8a85010
pp[13] = 0x8a9e018
pp[14] = 0x8ab7020
pp[15] = 0x8ad0028
pp[16] = 0x8ae9030
pp[17] = 0x8b02038
pp[18] = 0x8b1b040
pp[19] = 0x8b34048
pp[20] = 0x8b4d050
malloc: Cannot allocate memory
Failed after 21 allocations

But I do not like it. It is tied to the ABI: on a 64-bit machine you have to use rax instead of eax, so you either need a separate version, or #ifdef, or you have to force the -m32 option. And most likely it will not work on other POSIX-like systems, which may have a different ABI.

Other ways


What else could you try (these options were rejected for various reasons):

  • malloc hooks. The man page says they are no longer supported.
  • Seccomp and prctl with PR_SET_MM_START_BRK. This might work, but as the documentation says, it is not a sandbox but a way to minimize the exposed kernel surface. In other words, it would be even more awkward than using ptrace by hand.
  • libvirt-sandbox. Just a wrapper around lxc and qemu.
  • SELinux sandbox. It does not work, because it uses cgroups.
