
Encoding and decoding PHP code
I recover PHP source code from its encoded form.
In this article I will describe the current state of encoding and decoding PHP code.
A very brief primer on the internals of the PHP interpreter
When a PHP script is executed, it is first parsed and compiled into opcodes of PHP's internal virtual machine.
Each PHP file yields:
- an array of classes, where each class holds information about the class, its properties and an array of its methods
- an array of functions
- a “script body”: the code outside of classes and functions
// Classes
class A
{
    public $prop1 = NULL;
    public function method1() { }
}
// Functions
function FFF() { }
// Script body
echo "Hello, world!";
For brevity, in this article I call the entire internal structure of a compiled, ready-to-execute file "opcodes".
The opcodes themselves (the operations of PHP's internal virtual machine) look like this inside a function:
[0000] ZEND_INIT_FCALL_BY_NAME -, "defined" -> -
[0001] ZEND_SEND_VAL (61) "MVMMALL", - -> -
[0002] ZEND_DO_FCALL_BY_NAME (1) -, - -> $_z_var_120
[0003] ZEND_JMPZ $_z_var_120, #0008 -> -
[0004] ZEND_INIT_FCALL_BY_NAME -, "defined" -> -
[0005] ZEND_SEND_VAL (61) "IN_ADMINCP", - -> -
[0006] ZEND_DO_FCALL_BY_NAME (1) -, - -> $_z_var_120
[0007] ZEND_JMPNZ $_z_var_120, #0009 -> -
[0008] ZEND_EXIT "Access Denied", - -> -
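Reading the dump above by hand, one plausible source for it is the guard below. This is a reconstruction, not the original file: the if/|| spelling is an assumption (other spellings may compile to a similar sequence), and the two define() calls are added only so that the sketch runs to the end instead of exiting.

```php
<?php
// Added for the example only: the including script would normally define these.
define('MVMMALL', true);
define('IN_ADMINCP', true);

// [0000]-[0003]: call defined("MVMMALL"), jump to EXIT (#0008) if the result is false
// [0004]-[0007]: call defined("IN_ADMINCP"), jump past EXIT (#0009) if the result is true
// [0008]:        EXIT "Access Denied"
if (!defined('MVMMALL') || !defined('IN_ADMINCP')) {
    exit('Access Denied');
}

echo "access granted\n";
```

The two conditional jumps are what a decompiler has to fold back into a single boolean condition.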
An important point: the compiled form differs noticeably even between minor versions of the PHP interpreter. That is understandable: the interpreter compiles the code for itself and executes it itself.
How encoders work
There are two fundamentally different types of encoders.
The first type works purely by means of the language itself. Such encoders make the code unreadable with base64 encoding, compression and various string manipulations, and all of them eventually call eval(). This closely resembles JavaScript obfuscators. It looks something like this:
eval(base64_decode("DQplcnJvcl9yZXBvcnRpbmcoMCk7DQokcWF6c --- [cut] --- KfQ0KfQ=="));
Such protection is very easy to remove: even the hardest cases take a few hours. Another major drawback is that performance suffers noticeably. For serious use, this kind of protection is therefore not recommended.
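To illustrate why this kind of protection falls so quickly, here is a minimal sketch that peels eval(base64_decode("...")) layers statically, without ever executing the payload. The function name peelEvalBase64 and the exact wrapper shape it matches are assumptions for this example; real-world samples mix in compression and string tricks that would need their own handling.

```php
<?php
// Hypothetical helper: statically unwrap eval(base64_decode("...")) layers.
// Nothing is executed; the payload is only decoded and returned as text.
function peelEvalBase64(string $code, int $maxLayers = 20): string
{
    $pattern = '/^\s*(?:<\?php\s*)?eval\s*\(\s*base64_decode\s*\(\s*([\'"])([A-Za-z0-9+\/=]+)\1\s*\)\s*\)\s*;?\s*(?:\?>\s*)?$/i';
    for ($i = 0; $i < $maxLayers; $i++) {
        if (!preg_match($pattern, $code, $m)) {
            break; // no more wrappers of this exact shape
        }
        $decoded = base64_decode($m[2], true);
        if ($decoded === false) {
            break; // not valid base64, leave the layer untouched
        }
        $code = $decoded;
    }
    return $code;
}

// Build a two-layer sample ourselves so the example is self-contained.
$original = 'echo "Hello, world!";';
$layer1   = 'eval(base64_decode("' . base64_encode($original) . '"));';
$layer2   = 'eval(base64_decode("' . base64_encode($layer1) . '"));';

echo peelEvalBase64($layer2), "\n";
```

Replacing eval with a function that only returns the decoded text is the whole trick; the loop just repeats it until no wrapper is left.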
The second type of encoder uses its own extensions for the PHP interpreter, called loaders. As a rule, it is not the source code that is encoded but the result of compiling it, i.e. the internal structures and opcodes. This is a much more serious defense: even after decoding the opcodes you still have to restore the original PHP code from them. In terms of performance, the extra cost of decoding is often offset by the time saved on compilation, so encoded scripts often run even faster than the original source.
When the PHP interpreter starts, the encoders' loaders hook their handlers into the functions that load, compile and execute PHP files, so that working with encoded files is transparent to the interpreter itself.
The main difficulty for encoders is making opcodes that were compiled under one PHP version at encoding time work under a different PHP version at decoding time. Almost every loader of every encoder makes the necessary adjustments after decoding to ensure this compatibility. The main player in this market, IonCube, put great effort into this problem in its day, and its loaders can correctly execute opcodes from PHP 4.x on PHP 5.x on the fly, and even the other way around where possible!
Obfuscation
For extra protection, most encoders can also obfuscate identifiers: the names of variables, functions and classes. As a rule this process is one-way, like hashing, and it often produces names with unprintable characters, which work fine but cannot be used directly in decompiled source. For example, how would you write a function whose name consists of the bytes 0x0D, 0x07, 0x03, 0x0B, 0x02, 0x04, 0x06?
Special care is taken to make sure obfuscated names keep working. For example, when the code calls a function by the name checkLicense, the loader obfuscates the name on the fly, gets {0x0D, 0x07, 0x03, 0x0B, 0x02, 0x04, 0x06} and looks this key up in the hash table of function names.
Zend Guard even provides the run-time functions zend_obfuscate_function_name and zend_obfuscate_class_name, which compute the obfuscated names of functions and classes to make it easier to link encoded files with unencoded ones.
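The lookup described above can be modeled in a few lines. This is only a sketch: obfuscateName and callObfuscated are made-up names, and truncated MD5 is a stand-in, since real encoders use their own undisclosed schemes.

```php
<?php
// Hypothetical one-way name obfuscation: the first 7 raw bytes of an MD5 digest.
function obfuscateName(string $name): string
{
    // PHP function names are case-insensitive, so normalize first.
    return substr(md5(strtolower($name), true), 0, 7);
}

// "Encoded file": the function table is keyed by obfuscated names only.
$functionTable = [
    obfuscateName('checkLicense') => function (string $key): bool {
        return $key === 'demo-key';
    },
];

// "Loader": obfuscate the requested name on the fly, then look it up.
function callObfuscated(array $table, string $name, ...$args)
{
    $key = obfuscateName($name);
    if (!isset($table[$key])) {
        throw new RuntimeException("Call to undefined function $name()");
    }
    return $table[$key](...$args);
}

var_dump(callObfuscated($functionTable, 'checkLicense', 'demo-key'));
echo bin2hex((string) array_keys($functionTable)[0]), "\n"; // the stored name, as hex
```

The readable name never has to be stored anywhere: both the definition and every call site go through the same one-way function.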
Decoders Strike Back
To build a decoder you need two things: to obtain the decrypted opcodes and to be able to decompile them into PHP source code.
To get the opcodes, someone came up with a bright idea: make a custom build of the PHP interpreter that, instead of executing a decoded script, sends it off for decompilation. There is no need to bother with the encoder's format and its protections, because the encoder's own loader does all the necessary work!
This worked well for a while. Then the authors of some encoders thought of replacing the decoded functions with stubs, hiding the real code, and fetching each function only at the moment it is actually executed.
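A toy model of the stub trick, under loud assumptions: the class LazyFunctionTable is invented for this sketch, str_rot13 stands in for the encoder's real cipher, and "calling" a function merely returns its decoded body instead of executing it.

```php
<?php
// Toy model (all names hypothetical): function bodies stay encoded in memory
// and are decoded only when a function is first called.
final class LazyFunctionTable
{
    private array $encoded = []; // name => still-encoded body
    private array $decoded = []; // name => decoded body, filled on demand

    public function add(string $name, string $body): void
    {
        // str_rot13 stands in for the encoder's real, much stronger cipher.
        $this->encoded[$name] = str_rot13($body);
    }

    // What a dumper sees if it walks the table without calling anything.
    public function dump(): array
    {
        $view = [];
        foreach ($this->encoded as $name => $_) {
            $view[$name] = $this->decoded[$name] ?? '<stub>';
        }
        return $view;
    }

    // Calling through the stub triggers decoding of that one function.
    public function call(string $name): string
    {
        if (!isset($this->decoded[$name])) {
            $this->decoded[$name] = str_rot13($this->encoded[$name]);
        }
        return $this->decoded[$name]; // a real loader would execute it here
    }
}

$table = new LazyFunctionTable();
$table->add('checkLicense', 'return $key === "demo-key";');
$table->add('showAdminPanel', 'echo "admin";');

$table->call('checkLicense');
print_r($table->dump());
```

A modified interpreter that dumps everything right after loading sees only the stub for showAdminPanel, because nothing has called it yet.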
In response, the authors of decoders began to patch the encoders' loaders so that they would not use such stubs.
A rather big drawback was that each encoder had a separate loader for each PHP version, and these were updated often. So patching had to be done a lot and often, although disabling a function call or two is not hard.
And finally, the authors of one popular encoder took the next step: they began to additionally encode individual operands in some instructions and to install their own handlers for the corresponding instructions of the PHP virtual machine. For example, the code
$a = 0;
turned into
$a = 5;
and at execution time the custom handler turned the 5 back into 0. For a long time this held back those who “patched loaders”. Firstly, it took a long time to figure out why seemingly correctly extracted opcodes decompiled with errors. Secondly, it was no longer possible to simply change a couple of bytes in the loader.
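The effect can be reproduced with a toy virtual machine. Everything here is an assumption made for illustration: the XOR mask, the array-based instruction layout and the function names are not any real encoder's scheme.

```php
<?php
// Toy model of operand masking; the mask value is arbitrary.
const MASK = 5;

// "Encoded" program: ASSIGN keeps its constant operand XOR-ed with MASK.
$program = [
    ['op' => 'ASSIGN', 'var' => 'a', 'value' => 0 ^ MASK], // source: $a = 0;
];

// Stock decompiler: trusts the operand exactly as stored.
function naiveDecompile(array $program): string
{
    $out = '';
    foreach ($program as $ins) {
        $out .= '$' . $ins['var'] . ' = ' . $ins['value'] . ";\n";
    }
    return $out;
}

// Loader's replacement handler: unmasks the operand right before use.
function runWithCustomHandler(array $program): array
{
    $vars = [];
    foreach ($program as $ins) {
        if ($ins['op'] === 'ASSIGN') {
            $vars[$ins['var']] = $ins['value'] ^ MASK;
        }
    }
    return $vars;
}

echo naiveDecompile($program);                 // what a dump of the stored opcodes yields
var_dump(runWithCustomHandler($program)['a']); // what the script really computes
```

The dump is structurally valid, which is exactly why the wrong constant is so hard to notice.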
At this point the few who had put in more effort took the stage: those who had reverse-engineered and understood the format of the encoded files.
The second part of a decoder's work is decompilation. This is a difficult but interesting, purely algorithmic task.
Once upon a time some bright minds wrote a couple of good decompilation algorithms for PHP. Most of the people doing PHP decoding today cannot write a decompiler of their own, so they use those existing ones with minimal fixes.
All publicly available decompilers correctly recover only 90-95% of the code. The rest has to be fixed by hand, and here experience in PHP programming and in decompilation plays a very big role. The errors tend to be typical ones.
To summarize: there is still no fully automatic decoding for the major commercial encoders.
How to protect yourself from decoding
Clearly, any encoded code will be cracked sooner or later if someone really needs it. But knowing how decoders work, you can make the process considerably harder:
- if possible, use new versions of PHP and the full range of language features: namespaces, traits, lambdas
- be sure to obfuscate names, and try to avoid short and standard names: $ch, $ci, $arr, 'license', 'valid' ...
- decoders "adore" constructs of the form
connect(...) or die(...);
and their variations, such as
defined('MYCONST') or define('MYCONST', true);
or
($_alias = $object_name) OR $_alias = $class;
- decoders understand rare constructs of this form especially “well”:
$valid ? $a : exit('Error!');
$valid ? $valid : print('Error!'); // a question for PHP experts: do you know why it is print here? ;)
- use the decompilers' “favorite” language element:
list( , , $c, $d)
and constructs such as
while(list($k, $v) = each($arr))
(note that each() was deprecated in PHP 7.2 and removed in PHP 8.0, so this one only applies to older code)
- try the “dessert for the decompiler”:
switch($eatThis) { default: $doNothing = 0; }
(the joke is that decompilers usually expect at least one CASE, and without it they do not realize there was a switch construct here)
- some publicly available decoders fail on complex method or property names:
$obj->{'alpha' . $beta}
- others crash on magic methods, even on __construct
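For reference, here are several of the constructs from the list above gathered into one file that runs on current PHP versions. The helper connect() and all variable names are made up for the example, and the each() loop is left out because that function no longer exists in PHP 8.

```php
<?php
// Hypothetical stand-in for a real connection function.
function connect(): bool { return true; }

// "or" used for control flow instead of if.
connect() or die('no connection');
defined('MYCONST') or define('MYCONST', true);

// Ternary used as a statement, with print in one branch.
$valid = true;
$a = 1;
$valid ? $a : print("Error!\n");

// list() with skipped elements.
list( , , $c, $d) = [1, 2, 3, 4];

// switch with nothing but a default branch.
$eatThis = 1;
switch ($eatThis) { default: $doNothing = 0; }

// Property name computed at run time.
$obj = new stdClass();
$beta = 'Beta';
$obj->{'alpha' . $beta} = $c + $d;

echo $obj->alphaBeta, "\n";
```

A file like this also makes a handy smoke test: encode it, run it through a decoder, and compare the result with the original.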
Legal Aspects
Generally speaking, decoding PHP files produced by commercial encoders is illegal. Technically, this is because full decoding requires decompiling and analyzing the encoders themselves, which both the law and the user agreements directly prohibit.
There is a loophole within the European Union: it is permitted to “ensure the compatibility of software instances that you own, and for this, if necessary, bypass the built-in protection systems”. At the same time, each encoder's direct ban on reverse engineering still takes precedence.
So “I downloaded a program from the Internet that gave me the decrypted opcodes” or “I used a special build of the PHP interpreter that saves the decrypted opcodes” turn out to be conditionally legal ways of decoding. “Conditionally”, because if a case does reach court, it is still unclear who would be found right.
Naturally, the creators of encoders would prefer that nobody could ever decode encoded files. But those who have been left with encoded code by dishonest freelancers, or after the development company disappeared (which happens very often), hold exactly the opposite view of decoding.
Interesting facts and tales
Over the past couple of years most encoders have merely tweaked the file format "under the hood" and released the result as a new version.
When short names are obfuscated, collisions often occur. Apparently, in such cases the encoders' tech support simply advises against using obfuscation.
Freelancers use code snippets from the PHP documentation and from Stack Overflow so often that a dictionary of identifiers taken from those examples usually makes it possible to deobfuscate close to 90% of all names in an average project.
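A dictionary attack of this kind is short to write once the obfuscation function is known. In the sketch below, obfuscate() is a hypothetical stand-in (truncated MD5) for whatever the encoder really uses, and the word list is a tiny hand-made sample rather than one harvested from documentation.

```php
<?php
// Hypothetical one-way obfuscation, standing in for the encoder's own.
function obfuscate(string $name): string
{
    return substr(md5(strtolower($name), true), 0, 7);
}

// Build a reverse lookup table from a word list of likely identifiers.
function buildReverseMap(array $dictionary): array
{
    $map = [];
    foreach ($dictionary as $word) {
        $map[bin2hex(obfuscate($word))] = $word;
    }
    return $map;
}

// Names as found in a decoded file (hex-encoded to keep them printable).
$found = array_map(
    fn (string $n): string => bin2hex(obfuscate($n)),
    ['ch', 'arr', 'checkLicense', 'myVerySpecificHelper42']
);

// In practice the dictionary would be far larger.
$map = buildReverseMap(['ch', 'ci', 'arr', 'result', 'checkLicense', 'valid']);

foreach ($found as $hex) {
    echo $hex, ' => ', $map[$hex] ?? '(unknown)', "\n";
}
```

Only the last name stays unresolved, which is why long, project-specific identifiers survive this attack while short and standard ones do not.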
In all this time I have come across only five different PHP decompilers. Three of them were written by Russian-speaking programmers, one by a Chinese programmer, and one, I was assured, by a Frenchman. A small thing, but a nice one: I am proud of “ours” :)
Then again, most Russian-speaking clients ask, as one of their own, to have the work done for free :)
And finally, a couple of stories
One Arab, after a long discussion of his project, said: "My budget is $15, but we all understand... there is a lot of work, so just send over all your programs, and we will somehow decode everything ourselves."
Several times it turned out that I was the only one who could decode a particular file format, and the same files would reach me for decoding through several different intermediaries at once.
I was especially amused by this story. A black man with an African name and Swiss citizenship quarreled with a freelance programmer from Australia, did not pay him for his work, and was left with a couple of encoded, unfinished files on his website. He spent a long time searching the freelance exchanges for someone to decode them, until at last an Indian offered him his services.
For three weeks this Indian kept putting the customer off with promises while he himself searched frantically for someone who could actually do the job. In parallel, the customer (quite a sly one himself) kept looking for other decoders under a different name on the same freelance exchanges. He found me and gave me the project... and then, literally half an hour later, the Indian contacted me and, with obvious relief, began persuading me to take on his project too. I compared the files, and...
Of course, for educational purposes it would have been worth taking a 100% prepayment from both... but I simply got them to talk to each other and sort it out.
As a result, the Indian still never forgets to wish me a happy birthday.
The customer even gave me a bonus. He has since moved to Estonia (!) because it is cheaper to live there, and from time to time he tries to talk me into taking part in some dubious project of his.
UPD: I had to cut out part of the eval-encoded example, because Kaspersky raised a warning on it. Thanks, nokimaro!