First steps with Unicorn Engine
When searching for "Unicorn Engine" on Habré, I was surprised to find that this tool has never been in the article. I will try to fill this void. Let's start with the basics, and look at an example of using an emulator in real life. In order not to reinvent the wheel, I decided to just translate this manual. Before starting, I will say that all my comments or observations will look like this .
What is the Unicorn Engine?
The developers themselves write about Unicorn Engine Unicorn Engine like this:
Unicorn is a lightweight, multi-platform and multi-architecture processor emulator.
This is not a standard emulator. It does not emulate the work of the entire program or the whole OS. It does not support system commands (such as opening a file, outputting a character to the console, etc.). You will have to independently engage in memory markup and data loading into it, and then you simply start execution from some specific address.
So why is it useful?
- When analyzing viruses, you can call single functions without creating a malicious process.
- To address CTF.
- For fuzzing .
- Plugin for gdb to predict the future state, for example, future jumps or register values.
- Emulation obfursification code.
What do you need?
- Installed Unicorn Engine with link to Python.
- Disassembler.
Example
For example, let's take the task with hxp CTF 2017 under the name Fibonacci . Binary can be downloaded here .
When you start the program, it starts outputting our flag to the console, but very slowly. Each next byte of the flag is considered slower and slower.
The flag is: hxp{F
This means that in order to get the flag in a reasonable time, we need to optimize the operation of this application.
Using IDA Pro ( I personally used radare2 + Cutter ) we decompiled the code into C-like pseudocode. Despite the fact that the code was not properly decompiled, we can still get information from it about what is going on inside.
__int64 __fastcall main(__int64 a1, char **a2, char **a3){
void *v3; // rbp@1int v4; // ebx@1signed __int64 v5; // r8@2char v6; // r9@3
__int64 v7; // r8@3char v8; // cl@3
__int64 v9; // r9@5int a2a; // [sp+Ch] [bp-1Ch]@3
v3 = &encrypted_flag;
v4 = 0;
setbuf(stdout, 0LL);
printf("The flag is: ", 0LL);
while ( 1 )
{
LODWORD(v5) = 0;
do
{
a2a = 0;
fibonacci(v4 + v5, &a2a);
v8 = v7;
v5 = v7 + 1;
}
while ( v5 != 8 );
v4 += 8;
if ( (unsigned __int8)(a2a << v8) == v6 )
break;
v3 = (char *)v3 + 1;
_IO_putc((char)(v6 ^ ((_BYTE)a2a << v8)), stdout);
v9 = *((char *)v3 - 1);
}
_IO_putc(10, stdout);
return0LL;
}
unsignedint __fastcall fibonacci(int i, _DWORD *a2){
_DWORD *v2; // rbp@1unsignedint v3; // er12@3unsignedint result; // eax@3unsignedint v5; // edx@3unsignedint v6; // esi@3unsignedint v7; // edx@4
v2 = a2;
if ( i )
{
if ( i == 1 )
{
result = fibonacci(0, a2);
v5 = result - ((result >> 1) & 0x55555555);
v6 = ((result - ((result >> 1) & 0x55555555)) >> 2) & 0x33333333;
}
else
{
v3 = fibonacci(i - 2, a2);
result = v3 + fibonacci(i - 1, a2);
v5 = result - ((result >> 1) & 0x55555555);
v6 = ((result - ((result >> 1) & 0x55555555)) >> 2) & 0x33333333;
}
v7 = v6 + (v5 & 0x33333333) + ((v6 + (v5 & 0x33333333)) >> 4);
*v2 ^= ((BYTE1(v7) & 0xF) + (v7 & 0xF) + (unsigned __int8)((((v7 >> 8) & 0xF0F0F) + (v7 & 0xF0F0F0F)) >> 16)) & 1;
}
else
{
*a2 ^= 1u;
result = 1;
}
return result;
}
Here is the assembler code of the main and fibonacci functions:
.text:0x4004E0 main proc near ; DATA XREF: start+1Do
.text:0x4004E0
.text:0x4004E0 var_1C = dword ptr -1Ch
.text:0x4004E0
.text:0x4004E0 push rbp
.text:0x4004E1 push rbx
.text:0x4004E2 xor esi, esi ; buf
.text:0x4004E4 mov ebp, offset unk_4007E1
.text:0x4004E9 xor ebx, ebx
.text:0x4004EB sub rsp, 18h
.text:0x4004EF mov rdi, cs:stdout ; stream
.text:0x4004F6 call _setbuf
.text:0x4004FB mov edi, offset format ; "The flag is: "
.text:0x400500 xor eax, eax
.text:0x400502 call _printf
.text:0x400507 mov r9d, 49h
.text:0x40050D nop dword ptr [rax]
.text:0x400510
.text:0x400510 loc_400510: ; CODE XREF: main+8Aj
.text:0x400510 xor r8d, r8d
.text:0x400513 jmp short loc_40051B
.text:0x400513 ; ---------------------------------------------------------------------------
.text:0x400515 align 8
.text:0x400518
.text:0x400518 loc_400518: ; CODE XREF: main+67j
.text:0x400518 mov r9d, edi
.text:0x40051B
.text:0x40051B loc_40051B: ; CODE XREF: main+33j
.text:0x40051B lea edi, [rbx+r8]
.text:0x40051F lea rsi, [rsp+28h+var_1C]
.text:0x400524 mov [rsp+28h+var_1C], 0
.text:0x40052C call fibonacci
.text:0x400531 mov edi, [rsp+28h+var_1C]
.text:0x400535 mov ecx, r8d
.text:0x400538 add r8, 1
.text:0x40053C shl edi, cl
.text:0x40053E mov eax, edi
.text:0x400540 xor edi, r9d
.text:0x400543 cmp r8, 8
.text:0x400547 jnz short loc_400518
.text:0x400549 add ebx, 8
.text:0x40054C cmp al, r9b
.text:0x40054F mov rsi, cs:stdout ; fp
.text:0x400556 jz short loc_400570
.text:0x400558 movsx edi, dil ; c
.text:0x40055C add rbp, 1
.text:0x400560 call __IO_putc
.text:0x400565 movzx r9d, byte ptr [rbp-1]
.text:0x40056A jmp short loc_400510
.text:0x40056A ; ---------------------------------------------------------------------------
.text:0x40056C align 10h
.text:0x400570
.text:0x400570 loc_400570: ; CODE XREF: main+76j
.text:0x400570 mov edi, 0Ah ; c
.text:0x400575 call __IO_putc
.text:0x40057A add rsp, 18h
.text:0x40057E xor eax, eax
.text:0x400580 pop rbx
.text:0x400581 pop rbp
.text:0x400582 retn
.text:0x400582 main endp
.text:0x400670 fibonacci proc near ; CODE XREF: main+4Cp
.text:0x400670 ; fibonacci+19p ...
.text:0x400670 test edi, edi
.text:0x400672 push r12
.text:0x400674 push rbp
.text:0x400675 mov rbp, rsi
.text:0x400678 push rbx
.text:0x400679 jz short loc_4006F8
.text:0x40067B cmp edi, 1
.text:0x40067E mov ebx, edi
.text:0x400680 jz loc_400710
.text:0x400686 lea edi, [rdi-2]
.text:0x400689 call fibonacci
.text:0x40068E lea edi, [rbx-1]
.text:0x400691 mov r12d, eax
.text:0x400694 mov rsi, rbp
.text:0x400697 call fibonacci
.text:0x40069C add eax, r12d
.text:0x40069F mov edx, eax
.text:0x4006A1 mov ebx, eax
.text:0x4006A3 shr edx, 1
.text:0x4006A5 and edx, 55555555h
.text:0x4006AB sub ebx, edx
.text:0x4006AD mov ecx, ebx
.text:0x4006AF mov edx, ebx
.text:0x4006B1 shr ecx, 2
.text:0x4006B4 and ecx, 33333333h
.text:0x4006BA mov esi, ecx
.text:0x4006BC
.text:0x4006BC loc_4006BC: ; CODE XREF: fibonacci+C2j
.text:0x4006BC and edx, 33333333h
.text:0x4006C2 lea ecx, [rsi+rdx]
.text:0x4006C5 mov edx, ecx
.text:0x4006C7 shr edx, 4
.text:0x4006CA add edx, ecx
.text:0x4006CC mov esi, edx
.text:0x4006CE and edx, 0F0F0F0Fh
.text:0x4006D4 shr esi, 8
.text:0x4006D7 and esi, 0F0F0Fh
.text:0x4006DD lea ecx, [rsi+rdx]
.text:0x4006E0 mov edx, ecx
.text:0x4006E2 shr edx, 10h
.text:0x4006E5 add edx, ecx
.text:0x4006E7 and edx, 1
.text:0x4006EA xor [rbp+0], edx
.text:0x4006ED pop rbx
.text:0x4006EE pop rbp
.text:0x4006EF pop r12
.text:0x4006F1 retn
.text:0x4006F1 ; ---------------------------------------------------------------------------
.text:0x4006F2 align 8
.text:0x4006F8
.text:0x4006F8 loc_4006F8: ; CODE XREF: fibonacci+9j
.text:0x4006F8 mov edx, 1
.text:0x4006FD xor [rbp+0], edx
.text:0x400700 mov eax, 1
.text:0x400705 pop rbx
.text:0x400706 pop rbp
.text:0x400707 pop r12
.text:0x400709 retn
.text:0x400709 ; ---------------------------------------------------------------------------
.text:0x40070A align 10h
.text:0x400710
.text:0x400710 loc_400710: ; CODE XREF: fibonacci+10j
.text:0x400710 xor edi, edi
.text:0x400712 call fibonacci
.text:0x400717 mov edx, eax
.text:0x400719 mov edi, eax
.text:0x40071B shr edx, 1
.text:0x40071D and edx, 55555555h
.text:0x400723 sub edi, edx
.text:0x400725 mov esi, edi
.text:0x400727 mov edx, edi
.text:0x400729 shr esi, 2
.text:0x40072C and esi, 33333333h
.text:0x400732 jmp short loc_4006BC
.text:0x400732 fibonacci endp
At this stage, we have many opportunities to solve this problem. For example, we can recover the code using one of the programming languages and apply optimization there, but the process of recovering the code is a very difficult task during which we can make mistakes. Well, then comparing the code to find an error is generally no good. But, if we use the Unicorn Engine, then we can skip the code reconstruction step and avoid the problem described above. Of course, we can avoid these troubles using frida or writing scripts for gdb, but this is not about that.
Before starting the optimization, we will start the emulation in the Unicorn Engine without any changes to the program. And only after a successful launch, let's move on to optimization.
Step 1: Let Virtualization Come
Let's create the fibonacci.py file and save it next to the binary.
Let's start by importing the required libraries:
from unicorn import *
from unicorn.x86_const import *
import struct
The first line loads the main binary and base Unicorn constants. The second line loads constants for the two x86 and x86_64 architectures.
Next, add some necessary functions:
defread(name):with open(name) as f:
return f.read()
defu32(data):return struct.unpack("I", data)[0]
defp32(num):return struct.pack("I", num)
Here we declared the functions that we will need later:
- read simply returns the contents of the file,
- u32 takes a 4-byte string in LE encoding and converts to int,
- p32 does the opposite — it takes a number and turns it into a 4-byte string in LE encoding.
Note: If you have pwntools installed , then you do not need to create these functions, you just need to import them:
from pwn import *
And so, let's finally start initializing our Unicorn Engine class for the x86_64 architecture:
mu = Uc (UC_ARCH_X86, UC_MODE_64)
Here we call Uc functions with the following parameters:
- The first parameter is the basic architecture. Constants start with UC_ARCH_ ;
- The second parameter is the architecture specification. Constants begin with UC_MODE_ .
You can find all the constants in Cheatsheet .
As I wrote above, to use the Unicorn Engine, we need to initialize the virtual memory manually. For this example, we need to place the code and the stack somewhere in memory.
The base address (Base addr) of the binary starts at 0x400000. Let's place our stack in 0x0 and allocate 1024 * 1024 memory for it. Most likely, we do not need so much space, but it still does not hurt.
We can mark up memory by calling the mem_map method .
Add these lines:
BASE = 0x400000
STACK_ADDR = 0x0
STACK_SIZE = 1024*1024
mu.mem_map(BASE, 1024*1024)
mu.mem_map(STACK_ADDR, STACK_SIZE)
Now we need to load the binary into its main address in the same way as the bootloader does. After that we need to put the RSP in the end of the stack.
mu.mem_write(BASE, read("./fibonacci"))
mu.reg_write(UC_X86_REG_RSP, STACK_ADDR + STACK_SIZE - 1)
Now we can start the emulation and run the code, but we need to figure out which address to start working with and when the emulator should stop.
Take the address of the first command from main () , we can start the emulation with 0x004004e0. The end will be a call to putc ("\ n") , which is located at 0x00400575, after removing the entire flag.
.text:0x400570 mov edi, 0Ah ; c
.text:0x400575 call __IO_putc
We can start emulation:
mu.emu_start(0x004004e0,0x00400575)
Now run the script:
a@x:~/Desktop/unicorn_engine_lessons$ python solve.py
Traceback (most recent call last):
File "solve.py", line 32, in <module>
mu.emu_start(0x00000000004004E0, 0x0000000000400575)
File "/usr/local/lib/python2.7/dist-packages/unicorn/unicorn.py", line 288, in emu_start
raise UcError(status)
unicorn.unicorn.UcError: Invalid memory read (UC_ERR_READ_UNMAPPED)
Oops, something went wrong, but we don't even know what. Immediately before calling mu.emu_start, we can add:
defhook_code(mu, address, size, user_data):
print('>>> Tracing instruction at 0x%x, instruction size = 0x%x' %(address, size))
mu.hook_add(UC_HOOK_CODE, hook_code)
This code adds a hook. We declare our own hook_code function , which is called by the emulator before each command. It takes the following parameters:
- our copy of Uc ,
- instruction address
- size of instruction,
- user data (we can pass this value using an optional argument to hook_add () ).
Now, if we run the script, we should see the following output:a@x:~/Desktop/unicorn_engine_lessons$ python solve.py >>> Tracing instruction at 0x4004e0, instruction size = 0x1 >>> Tracing instruction at 0x4004e1, instruction size = 0x1 >>> Tracing instruction at 0x4004e2, instruction size = 0x2 >>> Tracing instruction at 0x4004e4, instruction size = 0x5 >>> Tracing instruction at 0x4004e9, instruction size = 0x2 >>> Tracing instruction at 0x4004eb, instruction size = 0x4 >>> Tracing instruction at 0x4004ef, instruction size = 0x7 Traceback (most recent call last): File "solve.py", line 41, in <module> mu.emu_start(0x00000000004004E0, 0x0000000000400575) File "/usr/local/lib/python2.7/dist-packages/unicorn/unicorn.py", line 288, in emu_start raise UcError(status) unicorn.unicorn.UcError: Invalid memory read (UC_ERR_READ_UNMAPPED)
At the address where the error occurred, we can understand that our script cannot process this command:
.text:0x4004EF mov rdi, cs:stdout ; stream
This instruction reads data from address 0x601038 (you can see this in IDA Pro). This is a .bss section that we didn’t mark up. My solution would be to simply skip all the problematic instructions, if this does not affect the logic of the program.
Below is another problem instruction:.text:0x4004F6 call _setbuf
We cannot call any functions with glibc, since we do not have loaded glibc in memory. In any case, we don’t need this command, so we can skip it too.
Here is the complete list of commands to skip:.text:0x4004EF mov rdi, cs:stdout ; stream .text:0x4004F6 call _setbuf .text:0x400502 call _printf .text:0x40054F mov rsi, cs:stdout ; fp
To skip commands, we need to overwrite the RIP with the following instruction:
mu.reg_write(UC_X86_REG_RIP, address+size)
Now hook_code should look something like this:
instructions_skip_list = [0x004004ef,0x004004f6,0x00400502,0x0040054f] defhook_code(mu, address, size, user_data): print('>>> Tracing instruction at 0x%x, instruction size = 0x%x' %(address, size)) if address in instructions_skip_list: mu.reg_write(UC_X86_REG_RIP, address+size)
We also need to do something with instructions that output the flag to the console byte-by-byte.
.text:0x400558 movsx edi, dil ; c .text:0x40055C add rbp, 1 .text:0x400560 call __IO_putc
__IO_putc accepts bytes for output as the first argument (this is the RDI register ).
We can read data directly from the register, output the data to the console and skip this set of instructions. The updated hook_code is presented below:
instructions_skip_list = [0x004004ef,0x004004f6,0x00400502,0x0040054f] defhook_code(mu, address, size, user_data):#print('>>> Tracing instruction at 0x%x, instruction size = 0x%x' %(address, size))if address in instructions_skip_list: mu.reg_write(UC_X86_REG_RIP, address+size) elif address == 0x400560: # c = mu.reg_read(UC_X86_REG_RDI) print(chr(c),end="") mu.reg_write(UC_X86_REG_RIP, address+size)
We can run and it will all work, but still slow.
Step 2: Increase Speed!
Let's think about increasing the speed of work. Why is this program so slow?
If we look at the decompiled code, we see that main () calls fibonacci () several times and fibonacci () is a recursive function. Take a closer look at this function, it takes and returns two arguments. The first return value is passed through the RAX register, the second is returned via the link that was passed through the second argument of the function. If we look deeper at the link between main () and fibonacci () , we will see that the second argument takes only two possible values: 0 or 1. If you still don’t see this, then run gdb and set a breakpoint at the beginning of the function fibonacci () .
To optimize the operation of the algorithm, we can use dynamic programming to remember the return value for the incoming parameters. Think for yourself, the second argument can take only two possible values, so we only need to remember par.
For those who do not understandfibonacci is a recursive function that calculates the next value as the sum of the two previous ones. At each step, it goes deeper. Every time she starts from the beginning, she goes the same way as before, plus one new value.
Example:
Suppose depth = 6, then: 1 1 2 3 5 8 .
And now depth = 8, then: 1 1 2 3 5 8 13 21.Мы бы могли просто запомнить, что первые 6 членов это 1 1 2 3 5 8, и когда нас просят посчитать больше, чем мы запомнили, то мы берем то, что запомнили и считаем только то, чего не хватает.
Once the RIP is at the beginning of fibonacci () , we can get the function arguments. We know that a function returns a result when it exits a function. Since we cannot operate with two parameters at once, we need a stack for returning parameters. When entering fibonacci (), we need to put the arguments on the stack, and pick them up on exit. We can use a dictionary for storing counted pairs.
How to handle a couple of values?
- At the very beginning of the function, we can check whether this pair is in the results we already know:
- if there is, then we can return this pair. We just need to write the returned values in RAX and at the address of the link, which is in the second argument. We also assign a RIP address for exiting a function. We cannot use RET in fibonacci () , since these calls are hooked, so we take some RET from main () ;
- if there are no these values, then we simply add them to the stack.
- Before exiting the function, we can save the returned pair. We know the incoming arguments, since we can read them from our stack.
This code is presented here.FIBONACCI_ENTRY = 0x00400670 FIBONACCI_END = [ 0x004006f1, 0x00400709] instructions_skip_list = [0x004004ef,0x004004f6,0x00400502,0x0040054f] # Стек для хранения аргументов stack = [] # Словарь, в котором мы храним известные пары d = {} defhook_code(mu, address, size, user_data):if address in instructions_skip_list: mu.reg_write(UC_X86_REG_RIP, address+size) # Эта инструкция выводит байт флагаelif address == 0x400560: c = mu.reg_read(UC_X86_REG_RDI) print(chr(c),end="") mu.reg_write(UC_X86_REG_RIP, address+size) # Мы находимся в начале функции?elif address == FIBONACCI_ENTRY: # Считать первый аргумент из RDI arg0 = mu.reg_read(UC_X86_REG_RDI) # Считать второй аргумент (ссылка) r_rsi = mu.reg_read(UC_X86_REG_RSI) # Считать второй аргумент, перейдя по ссылке arg1 = u32(mu.mem_read(r_rsi, 4)) # Проверить, сохраненная ли это пара?if (arg0,arg1) in d: (ret_rax, ret_ref) = d[(arg0,arg1)] # Поместить сохраненное значение в RAX mu.reg_write(UC_X86_REG_RAX, ret_rax) # Записать результат через ссылку mu.mem_write(r_rsi, p32(ret_ref)) # Поменять RIP на RET инструкцию, так как мы хотим выйти из fibonacci mu.reg_write(UC_X86_REG_RIP, 0x400582) else: # Если аргументы не были ранее сохранены, то помещаем их в стек stack.append((arg0,arg1,r_rsi)) elif address in FIBONACCI_END: # Получаем аргументы из стека (arg0, arg1, r_rsi) = stack.pop() # Считываем возвращаемое значение из RAX ret_rax = mu.reg_read(UC_X86_REG_RAX) # Считываем значение, которое было передано по ссылке ret_ref = u32(mu.mem_read(r_rsi,4)) # Запоминаем эту пару в словарь d[(arg0, arg1)]=(ret_rax, ret_ref)
Here is the whole script#!/usr/bin/env python# -*- coding: utf-8 -*-from __future__ import print_function from unicorn import * from unicorn.x86_const import * import struct defread(name):with open(name) as f: return f.read() defu32(data):return struct.unpack("I", data)[0] defp32(num):return struct.pack("I", num) FIBONACCI_ENTRY = 0x00400670 FIBONACCI_END = [ 0x004006f1, 0x00400709] instructions_skip_list = [0x004004ef,0x004004f6,0x00400502,0x0040054f] # Стек для хранения аргументов stack = [] # Словарь, в котором мы храним известные пары d = {} defhook_code(mu, address, size, user_data):if address in instructions_skip_list: mu.reg_write(UC_X86_REG_RIP, address+size) # Эта инструкция выводит байт флагаelif address == 0x400560: c = mu.reg_read(UC_X86_REG_RDI) print(chr(c),end="") mu.reg_write(UC_X86_REG_RIP, address+size) # Мы находимся в начале функции?elif address == FIBONACCI_ENTRY: # Считать первый аргумент из RDI arg0 = mu.reg_read(UC_X86_REG_RDI) # Считать второй аргумент (ссылка) r_rsi = mu.reg_read(UC_X86_REG_RSI) # Считать второй аргумент, перейдя по ссылке arg1 = u32(mu.mem_read(r_rsi, 4)) # Проверить, сохраненная ли это пара?if (arg0,arg1) in d: (ret_rax, ret_ref) = d[(arg0,arg1)] # Поместить сохраненное значение в RAX mu.reg_write(UC_X86_REG_RAX, ret_rax) # Записать результат через ссылку mu.mem_write(r_rsi, p32(ret_ref)) # Поменять RIP на RET инструкцию. Мы хотим выйти из fibonacci mu.reg_write(UC_X86_REG_RIP, 0x400582) else: # Если аргументы не были ранее сохранены, то помещаем их в стек stack.append((arg0,arg1,r_rsi)) elif address in FIBONACCI_END: # Получаем аргументы из стека (arg0, arg1, r_rsi) = stack.pop() # Считываем возвращаемое значение из RAX ret_rax = mu.reg_read(UC_X86_REG_RAX) # Считываем значение, которое было переданное через ссылку ret_ref = u32(mu.mem_read(r_rsi,4)) # Запоминаем эту пару в словарь d[(arg0, arg1)]=(ret_rax, ret_ref) mu = Uc (UC_ARCH_X86, UC_MODE_64) BASE = 0x400000 STACK_ADDR = 0x0 STACK_SIZE = 1024*1024 mu.mem_map(BASE, 1024*1024) mu.mem_map(STACK_ADDR, STACK_SIZE) mu.mem_write(BASE, read("./fibonacci")) mu.reg_write(UC_X86_REG_RSP, STACK_ADDR + STACK_SIZE - 1) mu.hook_add(UC_HOOK_CODE, hook_code) mu.emu_start(0x004004e0, 0x00400575) print()
Hurray, we were finally able to optimize the application using the Unicorn Engine. Good job!
The note
Now I decided to give you a little homework.
Here you can find three more problems, each of which has a hint and a complete solution. You can pry into Cheatsheet while solving problems.One of the most annoying problems is to remember the name of the desired constant. This is easy to deal with if using IPython Tab add-ons . When you have IPython installed, you can write from unicorn import UC_ARCH_, press Tab and you will see all the constants that start the same way.
- At the very beginning of the function, we can check whether this pair is in the results we already know: