Creating JPEGs out of thin air

Original author: Michal Zalewski
  • Translation
Here is an interesting demonstration of afl's capabilities; I was genuinely surprised that it works!

$ mkdir in_dir
$ echo 'hello' >in_dir/hello
$ ./afl-fuzz -i in_dir -o out_dir ./jpeg-9a/djpeg

In essence, I created a text file containing only the word "hello" and asked the fuzzer to keep feeding it to a program that expects a JPEG image as input (djpeg is a simple utility bundled with the ubiquitous IJG jpeg image library; libjpeg-turbo should also work). Of course, my input does not resemble a valid image, so the utility immediately rejects it:

$ ./djpeg '../out_dir/queue/id:000000,orig:hello'
Not a JPEG file: starts with 0x68 0x65
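The rejection comes from the decoder's start-of-image (SOI) check. The C sketch below is a simplified, hypothetical stand-in for that check: `check_soi` is not libjpeg code or API, it only mimics the error text shown above. It is written as two separate tests so that each byte gets its own branch, which is what lets an instrumented fuzzer notice partial progress.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical, simplified start-of-image (SOI) check; not libjpeg's code.
 * Returns 0 if buf starts with the SOI marker 0xFF 0xD8. Otherwise writes
 * a libjpeg-style error message into msg and returns -1. */
static int check_soi(const unsigned char *buf, size_t len,
                     char *msg, size_t msglen)
{
    assert(msg != NULL && msglen > 0);

    /* Missing bytes are read as 0x00 in this sketch. */
    unsigned c  = len > 0 ? buf[0] : 0;
    unsigned c2 = len > 1 ? buf[1] : 0;

    /* Two separate tests: the inner one is only reached when the first
     * byte is already 0xFF, so such an input takes a different internal
     * path than "hello" even though both are rejected the same way. */
    int ok = 0;
    if (c == 0xFF) {
        if (c2 == 0xD8)
            ok = 1;
    }

    if (!ok) {
        snprintf(msg, msglen, "Not a JPEG file: starts with 0x%02x 0x%02x",
                 c, c2);
        return -1;
    }
    msg[0] = '\0';
    return 0;
}
```

Called on the bytes of "hello", it produces `Not a JPEG file: starts with 0x68 0x65`; with 0xff as the first byte it still fails, but only after passing the first test.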

Usually such fuzzing would be completely pointless: there is essentially no chance that a traditional, format-agnostic fuzzer could ever turn the word "hello" into a valid JPEG image, because the probability that dozens of random tweaks line up just right is astronomically small.

Fortunately, afl-fuzz can take advantage of simple assembly-level instrumentation, and within a millisecond or so it notices that although setting the first byte to 0xff does not change the externally observable output, it triggers a slightly different internal code path in the tested program. Equipped with this information, it decides to use this test case as a seed for future fuzzing rounds:

$ ./djpeg '../out_dir/queue/id:000001,src:000000,op:int8,pos:0,val:-1,+cov'
Not a JPEG file: starts with 0xff 0x65
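afl's technical documentation describes its instrumentation as recording branch edges in a shared 64 kB bitmap, indexed by XOR-ing a compile-time random ID of the current block with a right-shifted ID of the previous one. The sketch below hand-instruments a toy version of the SOI check with that scheme. The block IDs, `toy_target`, and `run_has_new_coverage` are hypothetical names for illustration, and unlike real afl the sketch ignores hit-count bucketing and only tracks whether an edge was seen at all.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAP_SIZE 65536 /* 64 kB bitmap, the size afl uses by default */

static uint8_t trace_bits[MAP_SIZE];  /* edge hit counts for the current run */
static uint8_t virgin_bits[MAP_SIZE]; /* 1 = edge seen in some earlier run */
static uint16_t prev_location;

/* What the instrumentation does at the start of every basic block;
 * cur_location is a random ID assigned to the block at compile time. */
static void log_block(uint16_t cur_location)
{
    trace_bits[cur_location ^ prev_location]++;
    /* Shifting keeps A->B distinct from B->A and A->A distinct from B->B. */
    prev_location = cur_location >> 1;
}

/* Hand-instrumented toy version of the SOI check; the block IDs are
 * arbitrary, hypothetical values. buf must hold at least 2 bytes. */
static void toy_target(const uint8_t *buf)
{
    log_block(0x1a2b);         /* function entry */
    if (buf[0] == 0xFF) {
        log_block(0x3c4d);     /* first byte matched */
        if (buf[1] == 0xD8)
            log_block(0x5e6f); /* full SOI marker matched */
    }
    log_block(0x7081);         /* common exit */
}

/* Runs the target once; returns 1 if it hit any edge no earlier run hit.
 * Real afl also buckets hit counts; this sketch ignores them. */
static int run_has_new_coverage(const uint8_t *buf)
{
    int found_new = 0;

    assert(buf != NULL);
    memset(trace_bits, 0, sizeof trace_bits);
    prev_location = 0;
    toy_target(buf);

    for (size_t i = 0; i < MAP_SIZE; i++) {
        if (trace_bits[i] != 0 && virgin_bits[i] == 0) {
            virgin_bits[i] = 1;
            found_new = 1;
        }
    }
    return found_new;
}
```

Feeding it "hello", then "jello", then 0xff followed by "ello" returns 1, 0, 1: the third input is rejected just like the first two, yet it reaches a new edge, so a fuzzer driven by this signal would keep it as a seed.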

After processing that second-generation test case, the fuzzer almost immediately notices that setting the second byte to 0xd8 does something even more interesting:

$ ./djpeg '../out_dir/queue/id:000004,src:000001,op:havoc,rep:16,+cov'
Premature end of JPEG file
JPEG datastream contains no image

Here the fuzzer has managed to synthesize a valid file header, and it actually recognized its significance. Using this output as the seed for the next round of fuzzing, it quickly starts getting deeper and deeper into the format. Within several hundred generations and several hundred million execve() calls, it discovers more and more of the control structures that a valid JPEG file requires (SOFs, Huffman tables, quantization tables, SOS markers, and so on):

$ ./djpeg '../out_dir/queue/id:000008,src:000004,op:havoc,rep:2,+cov'
Invalid JPEG file structure: two SOI markers
...
$ ./djpeg '../out_dir/queue/id:001005,src:000262+000979,op:splice,rep:2'
Quantization table 0x0e was not defined
...
$ ./djpeg '../out_dir/queue/id:001282,src:001005+001270,op:splice,rep:2,+cov' >.tmp; ls -l .tmp
-rw-r--r-- 1 lcamtuf lcamtuf 7069 Nov  7 09:29 .tmp
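The generational process can be illustrated with a toy loop. This is a minimal sketch under simplifying assumptions, not afl's actual algorithm: the hypothetical `toy_depth` pretends that each matched byte of a magic sequence is a distinct branch (as in a two-test SOI check), and the hypothetical `guided_search` keeps a single seed, mutates one random byte per execution, and keeps a mutant only when it gets deeper. For a 4-byte magic, a blind random guess succeeds with probability 2^-32 per try, while this loop needs on the order of 4 × 4 × 256 = 4096 executions on average (4 bytes to find, each with a 1-in-(4 × 256) chance per mutation).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Simple deterministic PRNG (xorshift32) so runs are reproducible. */
static uint32_t rng_state = 2463534242u;

static uint32_t next_rand(void)
{
    uint32_t x = rng_state;
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    return rng_state = x;
}

/* Toy target: compares the input with a magic sequence one byte at a time.
 * The returned depth stands in for the coverage signal, i.e. how many of
 * the per-byte checks were passed. */
static size_t toy_depth(const uint8_t *buf, const uint8_t *magic, size_t k)
{
    size_t i = 0;
    while (i < k && buf[i] == magic[i])
        i++;
    return i;
}

/* Coverage-guided search over a k-byte buffer, updated in place: mutate one
 * random byte and keep the mutant only if it gets deeper into the target.
 * Returns the number of executions used; the caller checks success with
 * toy_depth(buf, magic, k) == k. */
static unsigned long guided_search(uint8_t *buf, const uint8_t *magic,
                                   size_t k, unsigned long budget)
{
    assert(k > 0);
    size_t best = toy_depth(buf, magic, k);
    unsigned long execs = 0;

    while (best < k && execs < budget) {
        size_t pos = next_rand() % k;
        uint8_t old = buf[pos];
        buf[pos] = (uint8_t)next_rand();
        execs++;

        size_t depth = toy_depth(buf, magic, k);
        if (depth > best)
            best = depth;      /* new coverage: the mutant becomes the seed */
        else
            buf[pos] = old;    /* nothing new: discard the mutant */
    }
    return execs;
}
```

Starting from the bytes "hell" with the magic 0xFF 0xD8 0xFF 0xE0 (an SOI marker followed by an APP0 marker), the loop converges well within a budget of one million executions; the exact count depends on the PRNG seed.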

The first image, obtained after six hours of fuzzing on an 8-core system, looks very modest: it is a plain gray rectangle 3 pixels high and 748 pixels wide. But the moment it is discovered, the fuzzer starts using this image as a seed, and quickly produces a wide range of more interesting images, one for each new execution path:



Of course, synthesizing a complete image out of thin air is an extreme case, and hardly a practical one. More prosaically, though, fuzzers are meant to stress-test every feature of the target program. With instrumentation-guided, evolutionary fuzzing, lesser-known features (e.g., progressive, arithmetic-coded, or black-and-white JPEGs) can be exercised without needing a giant, high-quality corpus of diverse test cases to seed the fuzzer.

The great thing about the libjpeg case is that it works without any special preparation: there is nothing special about the "hello" string, the fuzzer knows nothing about image parsing, and it is not designed or tuned to work with this particular library. There aren't even any command-line switches to turn on. You can point afl-fuzz at many other kinds of parsers with similar results: with bash, it will write valid scripts; with giflib, it will produce GIFs; with fileutils, it will create and flag ELF files, Atari 68xxx executables, x86 boot sectors, and UTF-8 with BOM. In almost all cases, the performance impact of the instrumentation is also minimal.

Of course, not everything is so rosy. At its core, afl-fuzz is still a brute-force tool. This makes it simple, fast, and robust, but it also means that certain kinds of atomic checks over a large search space can pose an insurmountable obstacle for the fuzzer. Here is a good example:

if (strcmp(header.magic_password, "h4ck3d by p1gZ")) goto terminate_now;
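To put a rough number on the obstacle: assuming the header field is filled with random bytes and the instrumented code only sees the final result of strcmp() (typical when the C library itself is not instrumented), every attempt must get at least the 14 characters of "h4ck3d by p1gZ" right at once, so the chance per attempt is at most:

```latex
P_{\text{attempt}} \le 256^{-14} = 2^{-112} \approx 1.9 \times 10^{-34}
```

By contrast, if each byte produced its own coverage signal, the values could be found one at a time, on the order of 14 × 256 = 3584 value guesses in total, which is the kind of incremental progress the JPEG header allowed.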

In practice, this means that afl-fuzz is unlikely to "invent" PNG files or non-trivial HTML documents from scratch, and it needs a better starting point than just "hello". To consistently handle code constructs like the one above, a general-purpose fuzzer would need to understand the operation of the target binary at a completely different level. Academia has made some progress on this, but frameworks that can do it quickly, easily, and reliably across diverse and complex codebases are probably still years away.

Several people asked me about symbolic execution and other things that influenced afl-fuzz; I have collected some notes in this document.
