Code Obfuscation Techniques with LLVM

Habr has many excellent articles about the capabilities of LLVM and the ways it can be applied. I would like to say more about the popular obfuscation techniques that can be implemented with LLVM to make applications harder to analyze.
Introduction
This article is more theoretical than practical. It assumes the reader has some background and is willing to work through interesting problems on their own rather than receive ready-made solutions. Most LLVM IR instructions follow the three-address code form: an instruction takes two operands and produces a single result. The set of instructions available to us is also limited.
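As a small illustration of the three-address form, the expression d = a - b + c * 2 can be split into steps that each take two operands and produce one result. This is only a source-level sketch; the temporaries t1 and t2 and the function name are made up for the example and are not LLVM output.

```cpp
#include <cassert>

// Each statement mirrors one three-address instruction:
// two operands in, one result out.
int three_address(int a, int b, int c) {
    int t1 = a - b;    // sub
    int t2 = c * 2;    // mul
    int d  = t1 + t2;  // add
    assert(d == a - b + c * 2);  // same value as the one-line expression
    return d;
}
```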
Used software:
- GCC 4.8.2 (mingw64)
- IDA DEMO
- Clang 3.4
- LLVM 3.4
What can be implemented using LLVM?
1) Random CFG
This method modifies the program's control flow graph by adding basic blocks to it. The original entry block can be moved and diluted with junk.
Examples
Original
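#include <stdio.h>
#include <stdlib.h>

int rand_func()
{
    return 5 + rand();
}

int main()
{
    int a = rand_func();
    goto test;
    exit(0);   /* never reached: the goto jumps over it */
test:          /* a declaration right after a label is valid in C++ */
    int b = a + a;
}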
Original graph

Obfuscation 1 start, function main

Obfuscation 2 start, function main

Obfuscation 3 start, function main

2) Inserting a large number of basic blocks into the CFG, with optional execution (see the screenshots from item 1).
A random basic block is taken and its terminator (the instruction that ends the basic block) is changed. Many new basic blocks are created and mixed together; whether they are ever executed is up to the author's imagination.
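At the source level, the effect of such inserted blocks can be sketched roughly as follows. This is a minimal sketch, not the output of an actual pass: it assumes an opaque predicate (a condition whose value is known in advance but is not obvious to the reader) decides which block runs, and the function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Original computation.
int original(int a) { return a + a; }

// The same computation with an extra block behind an opaque predicate.
// x * (x + 1) is a product of two consecutive integers, so it is always
// even; unsigned arithmetic keeps the wraparound well defined.
int with_bogus_block(int a, std::uint32_t x) {
    int result = 0;
    if ((x * (x + 1u)) % 2u == 0u) {
        result = a + a;        // real block, always executed
    } else {
        result = a * 31 - 7;   // junk block, never executed
    }
    assert(result == original(a));  // the junk branch must never have run
    return result;
}
```

The junk block still shows up in the disassembly as a reachable-looking branch, which is what makes the graph harder to read.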
3) Junk code insertion.
Suppose we have some specific code. It is diluted with junk instructions that try to look useful. They can read and modify our data without affecting the execution of the program as a whole. The goal is to complicate analysis of the application as much as possible with minimal loss of performance.
One of the many obfuscation options available.

#include <stdio.h>
#include <stdlib.h>
#include <time.h>

void test()
{
    int a = 32;
    int b = time(0);
    int c = a + a + b;
    int d = a - b + c * 2;
    printf("%d", d);
}

int main()
{
    test();
}
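A source-level sketch of what junk insertion might do to the arithmetic in test() is shown below. It is an assumption-laden illustration rather than the result of a real pass: b is passed in as a parameter instead of coming from time(0) so that the result is reproducible, and the junk variable and function names are hypothetical.

```cpp
#include <cassert>

// The arithmetic from test(), with b supplied by the caller.
int compute(int a, int b) {
    int c = a + a + b;
    int d = a - b + c * 2;
    return d;
}

// The same arithmetic diluted with junk that touches real data.
int compute_with_junk(int a, int b) {
    // volatile keeps the compiler from deleting the junk as dead code
    volatile unsigned junk = static_cast<unsigned>(b) ^ 0x5Au;
    int c = a + a + b;
    junk = junk + static_cast<unsigned>(c);  // reads c, result never used
    int d = a - b + c * 2;
    junk = (junk << 1) | 1u;                 // more noise
    assert(d == compute(a, b));              // the real result is untouched
    return d;
}
```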

4) Hiding constants and data.
Suppose we have the constant 15h. We arrange for it to be computed at runtime in the native code, so that it never appears in plain form. Constant data can also be hidden with any encryption algorithm.
Example
#include <stdio.h>
#include <stdlib.h>

int main()
{
    const char *habr = "habrahabr";
    printf("%s", habr);
}
We find the constant data, namely habrahabr, and insert our decryptor for it. The image shows an example with xor, but any encryption algorithm (AES, RC4, etc.) can be added. After use (printf), the data on the stack is encrypted again with a random key.

Suppose you want to add data encryption; what is the easiest way to do it?
From a cpp file, LLVM can generate the C++ code that builds the same IR through the LLVM API, and you can insert that code into your project.
See the hint in the answers to questions section.
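Written by hand in C++, the idea looks roughly like the sketch below. It assumes a fixed one-byte xor key (0x5A) purely for illustration; a real pass would choose keys at random and emit the decryptor as IR. All names here are hypothetical, and re-encrypting the data after use is not shown.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical key; a real pass would pick a random one per build.
static const unsigned char kKey = 0x5A;

// "habrahabr" xor-ed byte by byte with kKey, so the plain string
// does not appear in the binary.
static const unsigned char kEncrypted[] = {
    0x32, 0x3B, 0x38, 0x28, 0x3B, 0x32, 0x3B, 0x38, 0x28
};

// Rebuilds the string at runtime.
std::string decrypt_string() {
    std::string out;
    for (std::size_t i = 0; i < sizeof(kEncrypted); ++i)
        out.push_back(static_cast<char>(kEncrypted[i] ^ kKey));
    assert(out.size() == sizeof(kEncrypted));
    return out;
}

// The constant 15h is never written literally: it is formed from two
// other values at runtime. volatile stops the compiler from folding it.
int hidden_constant() {
    volatile int x = 0x2A;
    volatile int y = 0x3F;
    return x ^ y;  // 0x2A ^ 0x3F == 0x15
}
```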
5) Cloning functions and using them in random order.
The same function is cloned into many copies (possibly with changes), a dispatcher is inserted at the call site, and the clones are called in random order.
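A hand-written sketch of the idea, assuming a simple function-pointer table as the dispatcher. The clones differ slightly in how they compute the same value; the names and the use of rand() are illustrative choices, not something prescribed by LLVM.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Clones of one function, x * x + 1, each with a different instruction mix.
int clone_a(int x) { return x * x + 1; }
int clone_b(int x) { return 1 + x * x; }
int clone_c(int x) { return (x + 1) * (x - 1) + 2; }  // x*x - 1 + 2

typedef int (*Fn)(int);

// Dispatcher placed at the call site: picks a clone at random.
int call_random_clone(int x) {
    static const Fn clones[] = { clone_a, clone_b, clone_c };
    const std::size_t count = sizeof(clones) / sizeof(clones[0]);
    std::size_t index = static_cast<std::size_t>(std::rand()) % count;
    assert(index < count);
    return clones[index](x);
}
```

Whichever clone is chosen, the caller sees the same result, while an analyst sees several apparently unrelated functions.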
6) Combining functions.
All functions and their code are merged into a single function. In some cases this method can cause problems.
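One possible shape of the result, sketched by hand: two functions with the same signature are folded into one body, and a selector argument chooses which original code runs. The selector enum and the names are assumptions made for this example.

```cpp
#include <cassert>

// Original functions.
int add(int a, int b) { return a + b; }
int mul(int a, int b) { return a * b; }

// Hypothetical selector that replaces the choice of callee.
enum Selector { kAdd = 0, kMul = 1 };

// Merged function: both bodies live in one place.
int merged(Selector which, int a, int b) {
    switch (which) {
    case kAdd: return a + b;
    case kMul: return a * b;
    }
    assert(false && "unknown selector");
    return 0;
}
```

Even this toy version hints at where trouble can come from: functions with different parameter lists or return types need a common signature before they can be merged.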
7) Turning the code into a finite state machine or switch.
A new basic block is created and becomes the entry point of the main function; from it, branches are created (possibly based on a variable) to the other basic blocks.
Example
The original code.
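#include <stdio.h>
#include <stdlib.h>

int rand_func()
{
    return 5 + rand();
}

int main()
{
    const char *habr = "habrahabr";
    printf("%s", habr);
    int a = rand_func();
    goto test;
    exit(0);   /* never reached: the goto jumps over it */
test:          /* a declaration right after a label is valid in C++ */
    int b = a + a;
}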
First time.

Second time.
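Since the screenshots are not reproduced here, the following hand-written sketch shows the general shape of such a state machine on a tiny straight-line function. It assumes a state variable and a dispatch switch inside a loop; the state numbers are deliberately out of order, and all names are hypothetical.

```cpp
#include <cassert>

// Straight-line original: three steps in a fixed order.
int straight_line(int a) {
    int b = a + a;
    b = b * 3;
    return b - 1;
}

// The same steps as a state machine: every block returns to the
// dispatcher, and the state variable decides what runs next.
int flattened(int a) {
    int state = 0;
    int b = 0;
    for (;;) {
        switch (state) {
        case 0: b = a + a; state = 2; break;
        case 2: b = b * 3; state = 1; break;
        case 1: b = b - 1; state = 3; break;
        case 3: return b;
        default: assert(false && "unknown state"); return 0;
        }
    }
}
```

In the resulting graph every block has the dispatcher as its only successor, so the original order of the blocks is no longer visible from the edges alone.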

8) Creating pseudo-loops from code.
This applies to functions: a basic block of a particular function is taken and several more blocks are added around it to form a loop that executes only once.
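A minimal source-level sketch of a pseudo-loop, assuming the loop counter is made volatile so the compiler does not simply unroll the loop away. The names are hypothetical.

```cpp
#include <cassert>

// The block "result = a + a" wrapped in a loop that runs exactly once.
int pseudo_loop(int a) {
    int result = 0;
    int iterations = 0;
    // i starts at 0, the condition holds once, then i becomes 1.
    for (volatile int i = 0; i < 1; i = i + 1) {
        result = a + a;
        ++iterations;
    }
    assert(iterations == 1);  // the loop body ran exactly once
    return result;
}
```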
9) A random virtual machine is created and all existing code is translated for it. For me, this item is so far possible only in theory.
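To give a feel for what this would mean, here is a toy stack machine that executes the statement int b = a + a; as bytecode. It is only a sketch of the concept under simple assumptions (four made-up opcodes, an int operand stack) and says nothing about how a real LLVM-based translator would be built.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical opcodes; a "random" VM could renumber them on every build.
enum Op { OP_PUSH_ARG, OP_DUP, OP_ADD, OP_RET };

// Interprets the bytecode on a small operand stack.
int run(const std::vector<int>& code, int arg) {
    std::vector<int> stack;
    for (std::size_t pc = 0; pc < code.size(); ++pc) {
        switch (code[pc]) {
        case OP_PUSH_ARG:
            stack.push_back(arg);
            break;
        case OP_DUP: {
            assert(!stack.empty());
            int top = stack.back();
            stack.push_back(top);
            break;
        }
        case OP_ADD: {
            assert(stack.size() >= 2);
            int rhs = stack.back();
            stack.pop_back();
            stack.back() += rhs;
            break;
        }
        case OP_RET:
            assert(!stack.empty());
            return stack.back();
        }
    }
    return 0;  // program ended without OP_RET
}

// Bytecode for "a + a".
std::vector<int> double_program() {
    return std::vector<int>{OP_PUSH_ARG, OP_DUP, OP_ADD, OP_RET};
}
```

Calling run(double_program(), a) gives the same value as a + a, but the native code now contains only the interpreter loop and a data array.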
Where to start learning?
Start by looking through the list of available three-address instructions
//===-- llvm/Instruction.def - File that describes Instructions -*- C++ -*-===//
//
// The LLVM Compiler Infrastructure
//
// This file is distributed under the University of Illinois Open Source
// License. See LICENSE.TXT for details.
//
//===----------------------------------------------------------------------===//
//
// This file contains descriptions of the various LLVM instructions. This is
// used as a central place for enumerating the different instructions and
// should eventually be the place to put comments about the instructions.
//
//===----------------------------------------------------------------------===//
FIRST_TERM_INST (1)
HANDLE_TERM_INST (1, Ret, ReturnInst)
HANDLE_TERM_INST (2, Br, BranchInst)
HANDLE_TERM_INST (3, Switch, SwitchInst)
HANDLE_TERM_INST (4, IndirectBr, IndirectBrInst)
HANDLE_TERM_INST (5, Invoke, InvokeInst)
HANDLE_TERM_INST (6, Resume, ResumeInst)
HANDLE_TERM_INST ( 7, Unreachable, UnreachableInst)
LAST_TERM_INST (7)
// Standard binary operators ...
FIRST_BINARY_INST (8)
HANDLE_BINARY_INST (8, Add, BinaryOperator)
HANDLE_BINARY_INST (9, FAdd, BinaryOperator)
HANDLE_BINARY_INST (10, Sub, BinaryOperator)
HANDLE_BINARY_INST (11, FSub, BinaryOperator)
HANDLE_BINARY_INST (12, Mul, BinaryOperator)
HANDLE_BINARY_INST (13, FMul, BinaryOperator)
HANDLE_BINARY_INST (14, UDiv, BinaryOperator)
HANDLE_BINARY_INST (15, SDiv, BinaryOperator)
HANDLE_BINARY_INST (16, FDiv, BinaryOperator)
HANDLE_BINARY_INST (17, URem, BinaryOperator)
HANDLE_BINARY_INST (18, SRem, BinaryOperator)
HANDLE_BINARY_INST (19, FRem, BinaryOperator)
// Logical operators (integer operands)
HANDLE_BINARY_INST (20, Shl, BinaryOperator) // Shift left (logical)
HANDLE_BINARY_INST (21, LShr, BinaryOperator) // Shift right (logical)
HANDLE_BINARY_INST (22, AShr, BinaryOperator) // Shift right (arithmetic)
HANDLE_BINARY_INST (23, And, BinaryOperator)
HANDLE_BINARY_INST (24, Or, BinaryOperator)
HANDLE_BINARY_INST (25, Xor, BinaryOperator)
LAST_BINARY_INST (25)
// Memory operators ...
FIRST_MEMORY_INST (26)
HANDLE_MEMORY_INST (26, Alloca, AllocaInst) // Stack management
HANDLE_MEMORY_INST (27, Load, LoadInst) // Memory manipulation instrs
HANDLE_MEMORY_INST (28, Store, StoreInst)
HANDLE_MEMORY_INST (29, GetElementPtr, GetElementPtrInst)
HANDLE_MEMORY_INST (30, Fence, FenceInst)
HANDLE_MEMORY_INST (31, AtomicCmpXchg, AtomicCmpXchgInst)
HANDLE_MEMORY_INST (32, AtomicRMW, AtomicRMWInst)
LAST_MEMORY_INST (32)
// Cast operators ...
// NOTE: The order matters here because CastInst::isEliminableCastPair
// NOTE: (see Instructions.cpp) encodes a table based on this ordering.
FIRST_CAST_INST (33)
HANDLE_CAST_INST (33, Trunc, TruncInst) // Truncate integers
HANDLE_CAST_INST (34, ZExt, ZExtInst) // Zero extend integers
HANDLE_CAST_INST (35, SExt, SExtInst) // Sign extend integers
HANDLE_CAST_INST (36, FPToUI, FPToUIInst) // floating point -> UInt
HANDLE_CAST_INST (37, FPToSI, FPToSIInst) // floating point -> SInt
HANDLE_CAST_INST (38, UIToFP, UIToFPInst) // UInt -> floating point
HANDLE_CAST_INST (39, SIToFP, SIToFPInst) // SInt -> floating point
HANDLE_CAST_INST ( 40, FPTrunc, FPTruncInst) // Truncate floating point
HANDLE_CAST_INST (41, FPExt, FPExtInst) // Extend floating point
HANDLE_CAST_INST (42, PtrToInt, PtrToIntInst) // Pointer -> Integer
HANDLE_CAST_INST (43, IntToPtr, IntToPtrInst) // Integer -> Pointer
HANDLE_CAST_INST (44, BitCast, BitCastInst) // Type cast
LAST_CAST_INST (44)
// Other operators ...
FIRST_OTHER_INST (45)
HANDLE_OTHER_INST (45, ICmp, ICmpInst) // Integer comparison instruction
HANDLE_OTHER_INST (46, FCmp, FCmpInst) // Floating point comparison instr.
HANDLE_OTHER_INST (47, PHI, PHINode) // PHI node instruction
HANDLE_OTHER_INST (48, Call, CallInst) // Call a function
HANDLE_OTHER_INST (49, Select, SelectInst) // select instruction
HANDLE_OTHER_INST (50, UserOp1, Instruction) // May be used internally in a pass
HANDLE_OTHER_INST (51, UserOp2, Instruction) // Internal to passes only
HANDLE_OTHER_INST (52, VAArg, VAArgInst) // vaarg instruction
HANDLE_OTHER_INST (53, ExtractElement, ExtractElementInst) // extract from vector
HANDLE_OTHER_INST (54, InsertElement, InsertElementInst) // insert into vector
HANDLE_OTHER_INST (55, ShuffleVector, ShuffleVectorInst) // shuffle two vectors.
HANDLE_OTHER_INST (56, ExtractValue, ExtractValueInst) // extract from aggregate
HANDLE_OTHER_INST (57, InsertValue, InsertValueInst) // insert into aggregate
HANDLE_OTHER_INST (58, LandingPad, LandingPadInst) // Landing pad instruction.
LAST_OTHER_INST (58)
The following documentation should be consulted:
LLVM-CheatSheet
LLVM Programmers Manual
LLVM-CheatSheet 2
LLVMBackendCPU
Obfuscating C++ programs via CFF
It is worth looking at public implementations of code obfuscation for reference.
1) Obfuscator-llvm
Implements instruction substitution and control flow flattening.
2) Kryptonite
Replaces instructions with equivalents and decomposes instructions.
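To illustrate what replacing an instruction with an equivalent means, here are three well-known bitwise identities written as C++ functions. They are generic textbook identities and are not claimed to be the exact substitutions either project performs; the function names are made up for the example.

```cpp
#include <cassert>
#include <cstdint>

// a + b  ==  (a ^ b) + 2 * (a & b)
// xor adds without carries, and the shifted "and" adds the carries back.
std::uint32_t add_substituted(std::uint32_t a, std::uint32_t b) {
    std::uint32_t r = (a ^ b) + ((a & b) << 1);
    assert(r == a + b);
    return r;
}

// a - b  ==  a + ~b + 1   (two's complement negation)
std::uint32_t sub_substituted(std::uint32_t a, std::uint32_t b) {
    return a + ~b + 1u;
}

// a ^ b  ==  (a | b) & ~(a & b)
std::uint32_t xor_substituted(std::uint32_t a, std::uint32_t b) {
    return (a | b) & ~(a & b);
}
```

Unsigned types are used so that wraparound is well defined and the identities hold for every input.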
Snippets
To insert asm instructions you can use llvm::InlineAsm or a machine pass (MachineFunctionPass); machine passes let you change and add instructions. A good example is here.
Some useful code to get you started
How to read a bytecode file?
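std::string file = "1.bc";
std::string error;
llvm::LLVMContext context;
// LLVM 3.4 API: getFile fills an OwningPtr with the file contents
llvm::OwningPtr<llvm::MemoryBuffer> bytecode;
llvm::MemoryBuffer::getFile(file.c_str(), bytecode);
llvm::Module *module = llvm::ParseBitcodeFile(bytecode.get(), context, &error);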
How to iterate functions in a module?
for (auto i = module->getFunctionList().begin(); i != module->getFunctionList().end(); ++i)
{
    // getName() returns a StringRef; convert it before passing to printf
    printf("Function %s\n", i->getName().str().c_str());
}
How to check what kind of instruction this is?
if (llvm::isa<llvm::BranchInst>(currentInstruction))
    printf("BranchInst!");
How to replace the terminator with another instruction?
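llvm::BasicBlock *block = /* initialization */;
// newTerminator is the instruction we are replacing the terminator with;
// ReplaceInstWithInst is declared in llvm/Transforms/Utils/BasicBlockUtils.h
llvm::ReplaceInstWithInst(block->getTerminator(), newTerminator);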
How to cast one instruction to another?
llvm::Instruction* test = basicBlock->getTerminator();
// dyn_cast returns a null pointer if test is not a BranchInst
llvm::BranchInst* branchInst = llvm::dyn_cast<llvm::BranchInst>(test);
How to get the first non-PHI instruction in a basic block?
llvm::Instruction *inst = currentInstruction->getParent()->getFirstNonPHI();
How to iterate instructions in a function?
// inst_iterator is declared in llvm/Support/InstIterator.h (LLVM 3.4)
for (llvm::inst_iterator i = llvm::inst_begin(function); i != llvm::inst_end(function); ++i)
{
    llvm::Instruction* inst = &*i;
}
How to find out if the instruction is used elsewhere?
bool IsUsedOutsideParentBlock(llvm::Instruction* inst)
{
    for (llvm::Value::use_iterator i = inst->use_begin(); i != inst->use_end(); ++i)
    {
        // In LLVM 3.4 dereferencing a use_iterator yields the User
        llvm::User* user = *i;
        if (llvm::cast<llvm::Instruction>(user)->getParent() != inst->getParent())
            return true;
    }
    return false;
}
How to get / change the basic blocks referenced by InvokeInst and others?
invokeInst->getSuccessor(0); // get a pointer to the basic block
invokeInst->setSuccessor(0, basicBlock); // set it
Answers to questions
Q: What about deobfuscation?
A: That is entirely up to you; there is an LLVM-based project for removing obfuscation.
Q: How to generate a bitcode file from the source?
A:
clang -emit-llvm -o 1.bc -c 1.c
Q: How to compile bitcode?
A:
clang -o 1 1.bc
Q: How to generate an asm file from the LLVM IR representation?
A:
llc foo.ll
Q: How to generate an IR file from the source?
A:
clang -S -emit-llvm 1.c
Q: How to compile a .s file (assembler)?
A:
gcc -o exe 1.s
Q: How to get an obj file from bitcode?
A:
llc -filetype=obj 1.bc
Q: How to get the LLVM API source (a cpp file) that builds a given cpp file?
A: clang++ -c -emit-llvm 1.cpp -o 1.ll then
llc -march=cpp -o 1.ll.cpp 1.ll
Q: I built clang under Windows, but it cannot find the header files. How do I fix this?
A: You need to find InitHeaderSearch.cpp and add the necessary paths; look at AddMinGWCPlusPlusIncludePaths and AddMinGW64CXXPaths.
Q: Does Clang built with Visual Studio work properly?
A: Currently not; it can compile only the simplest C code.
Q: With optimization enabled, Clang strips out the generated instructions / functions. What should I do?
A: Your passes need to be built into clang, in this cpp file. You can also make the compiler believe that the code we added is needed: the code must actually be used, and in the case of functions, they must be called. For tests, you can use -O0.