
Reverse engineering for the smallest: breaking keygen
- Tutorial
This post will be really interesting for those who are just starting to be interested in this topic. In people with experience, he may only cause yawns. Except maybe, perhaps, the epilogue ...
Reverse engineering in that less legal part, where it does not concern debugging and optimization of your own product, also includes the following task: "find out how it works for them." In other words, the restoration of the original algorithm of the program, having in its hands its executable file.
In order to stick to the basics and avoid some problems, we’ll “crack” not something, but ... keygen. In 90%, it will not be packed, encrypted or otherwise protected - including international law ...
So, we need keygen and disassembler. As for the second, let’s assume that it will be Ida Pro. An experimental nameless keygen found on the Web:

Opening the keygen file in Ida, we see a list of functions.

After analyzing this list, we see several standard functions (WinMain, start, DialogFunc) and a bunch of auxiliary-system functions. All of these are standard features that make up the framework.
The disassembler does not recognize user functions that represent the implementation of the tasks of the program, and not its wrapper from API and system calls, and simply calls sub_digits. Given that there is only one such function, it should attract our attention as, most likely, containing the algorithm of interest to us or part of it.

Let's run keygen. He asks for two 4-digit lines. Suppose eight characters are sent to the key calculation function at once. We analyze the function code sub_401100. The answer to the hypothesis is contained in the first two lines:
The second line clearly hints at getting the function argument at offset 8. However, the size of the argument is a double word equal to 4 bytes, not 8. So, most likely, the function processes one line of four characters in one pass, and it is called twice.
The question that may surely arise is: why is an offset of 8 bytes reserved for receiving the function argument, but points to 4, because there is only one argument? As we recall, the stack grows down; when a value is added to the stack, the stack pointer decreases by the corresponding number of bytes. Therefore, after adding a function argument to the stack and before it starts working, something else is added to the stack. This is obviously the return address added to the stack after calling the system call function.
Find the places in the program where sub401100 function calls are found. There really are two of them: at DialogFunc + 97 and DialogFunc + 113. The instructions that interest us begin here:
First, two SendDlgItemMessageA functions are called in a row. This function takes the handle of the element and sends it the Msg system message. In our case, Msg in both cases is 0Dh, which is the hexadecimal equivalent of the WM_GETTEXT constant. Here, the values of two text fields are retrieved into which the user entered “two 4-character strings”. The letter A in the function name indicates that the ASCII format is used - one byte per character.
The first line is written at the offset lParam, the second, which is obvious - at the offset var_1C.
So, after the SendDlgItemMessageA functions are executed, the current state of the registers is saved on the stack using the pusha command, then one byte of one of the lines is written to the ecx, edx and eax registers. As a result, each of the registers takes the form: 000000 ##. Then:
Then this value is stored in the "variable" - in the memory area at offset arg_4. The state of the registers (their previous values), previously stored on the stack, is pulled from the stack and distributed to the registers. Then, the value at the offset arg_4 is again written to the eax register and this value is pushed from the register onto the stack. After that the sub_401100 function call follows.
What is the meaning of these operations? To find out is very simple even in practice, without theory. We set a breakpoint in the debugger, for example, on the push eax instruction (just before the subfunction is called) and run the program for execution. Keygen starts up, asks for lines. Having entered qwer and tyui and stopped at the breakpoint, we look at the value еах: 72657771. Decode to the text: rewq. That is, the physical meaning of these operations is line inversion.
Now we know that in sub_401100 one of the original lines is transmitted, turned upside down, in the size of a double word, which fits entirely in any of the standard registers. Perhaps you can take a look at the instructions sub_401100.
At the very beginning, there is nothing interesting here - the state of the registers is carefully stored on the stack. And here is the first command that we are interested in - following the PUSHA instruction. It writes in exx the function argument stored at offset arg_0. Then this value is transferred to eax. And cut off half: as we recall, in our example 72657771 is passed to sub_401100; a logical shift to the left by 10h (16 in decimal) turns the register value into 77710000.
After that, the register value is inverted with the NOT instruction. This means that in the binary representation of the register, all zeros are converted to units, and units to zeros. The register after executing this instruction contains 888EFFFFF.
The ADD instruction adds (adds, pluses, etc.) the resulting value to the original value of the argument, which is still in the exx register (now it is clear why it was written first in exx and then in exax?). The result is saved in ex. Check what exh looks like after this operation is completed: FAF47770.
This result is copied from exx to exah, after which the SHR instruction is applied to the contents of exax. This operation is the opposite of SHL - if the latter shifts the digits to the left, then the first shifts them to the right. Just as a logical left shift operation is equivalent to multiplying by a power of two, a logical right shift operation is equivalent to the same division. Let's see what value will be the result of this operation: 7D7A3BB.
Now we commit another violence against the contents of ex and ex: the XOR instruction is an addition modulo 2 or “exclusive OR”. The essence of this operation, roughly speaking, is that its result is equal to unity (truth) only if its operands are unambiguous. For example, in the case of 0 xor 1, the result is true, or one. In the case of 0 xor 0 or 1 xor 1, the result will be false, or zero. In our case, as a result of following this instruction with respect to the registers eax (7D7A3BB) and exx (FAF47770), the value FD23D4CB is written to the register eax.
The next LEA ecx command, [eax + eax * 8] elegantly and easily multiplies ex by 9 and writes the result in exx. Then this value is copied to edx and shifted to the right by 13 digits: we get 73213 in edx and E6427B23 in exx. Then - again Xorim exx and edx, writing in exx E6454930. Copy this to eax, shift it to the left by 9 digits: 8A926000, then invert it, getting 756D9FFF. Add this value to the exx register - we have 5BB2E92F. Copy this to eax, shift right already by 17 digits - 2DD9 - and Xorim with exx. We get a total of 5BB2C4F6. Then ... then ... what do we have there? What all?..
So, we save this value to the memory area at var_4 offset, load from the register status stack, again take the final value from the memory and finally remove from the stack the remaining register states stored at the beginning. Exit the function. Hurray! .. however, it’s too early to rejoice, so far at the exit from the first function call we have a maximum of four half-printed characters, and yet we still have a whole raw string, and we still need to bring this to a divine form.
Let's move on to a higher level of analysis - from the disassembler to the decompiler. Imagine the entire DialogFunc function, which contains sub_401100 calls, in the form of a C-like pseudocode. Actually, this disassembler calls it a "pseudo-code", in fact, this is practically the C code, only ugly. We look:
This is already easier to read than assembly listing. However, not in all cases you can rely on the decompiler: you need to be ready to watch the thread of assembler logic for hours, the status of the registers and the stack in the debugger ... and then give written explanations to the FSB or FBI. In the evening I have especially funny jokes.
As I said, reading is easier, but still far from perfect. Let's analyze the code and give the variables more readable names. We will give clear and logical names to key variables, and simpler and simpler to counters.
Level complete . The next (and final) goal is to write your keygen according to this algorithm. Out of habit, I will write in the Linux shell script language bash. test $ {# reg1} -gt && reg1 = `echo" $ {reg1: -8} "` is a truncation of a string containing an emulated register value, up to 8 low characters. When performing operations, extra senior bits were added there. All the rest is a painstaking emulation of assembler listing. I pointed out the Abnormal Programming hub at the top, right? ..
bash implementation of the notorious sub_401100: Keygen

main function:

Testing and comparison:


Now we could generate keys to some gaming software directly from the Linux console, but this is impossible for several reasons: firstly, I don’t know what kind of software this keygen is intended for - I downloaded it at random on the Internet; secondly, the use of fake keys and unlicensed proprietary software is prohibited by international law. ;)

Reverse engineering in that less legal part, where it does not concern debugging and optimization of your own product, also includes the following task: "find out how it works for them." In other words, the restoration of the original algorithm of the program, having in its hands its executable file.
In order to stick to the basics and avoid some problems, we’ll “crack” not something, but ... keygen. In 90%, it will not be packed, encrypted or otherwise protected - including international law ...
In the beginning was the word. Double
So, we need keygen and disassembler. As for the second, let’s assume that it will be Ida Pro. An experimental nameless keygen found on the Web:

Opening the keygen file in Ida, we see a list of functions.

After analyzing this list, we see several standard functions (WinMain, start, DialogFunc) and a bunch of auxiliary-system functions. All of these are standard features that make up the framework.
The disassembler does not recognize user functions that represent the implementation of the tasks of the program, and not its wrapper from API and system calls, and simply calls sub_digits. Given that there is only one such function, it should attract our attention as, most likely, containing the algorithm of interest to us or part of it.

Let's run keygen. He asks for two 4-digit lines. Suppose eight characters are sent to the key calculation function at once. We analyze the function code sub_401100. The answer to the hypothesis is contained in the first two lines:
var_4 = dword ptr -4
arg_0 = dword ptr 8
The second line clearly hints at getting the function argument at offset 8. However, the size of the argument is a double word equal to 4 bytes, not 8. So, most likely, the function processes one line of four characters in one pass, and it is called twice.
The question that may surely arise is: why is an offset of 8 bytes reserved for receiving the function argument, but points to 4, because there is only one argument? As we recall, the stack grows down; when a value is added to the stack, the stack pointer decreases by the corresponding number of bytes. Therefore, after adding a function argument to the stack and before it starts working, something else is added to the stack. This is obviously the return address added to the stack after calling the system call function.
Find the places in the program where sub401100 function calls are found. There really are two of them: at DialogFunc + 97 and DialogFunc + 113. The instructions that interest us begin here:
Relatively long piece of code
loc_401196:
mov esi, [ebp+hDlg]
mov edi, ds:SendDlgItemMessageA
lea ecx, [ebp+lParam]
push ecx ; lParam
push 0Ah ; wParam
push 0Dh ; Msg
push 3E8h ; nIDDlgItem
push esi ; hDlg
call edi ; SendDlgItemMessageA
lea edx, [ebp+var_1C]
push edx ; lParam
push 0Ah ; wParam
push 0Dh ; Msg
push 3E9h ; nIDDlgItem
push esi ; hDlg
call edi ; SendDlgItemMessageA
pusha
movsx ecx, byte ptr [ebp+lParam+2]
movsx edx, byte ptr [ebp+lParam+1]
movsx eax, byte ptr [ebp+lParam+3]
shl eax, 8
or eax, ecx
movsx ecx, byte ptr [ebp+lParam]
shl eax, 8
or eax, edx
shl eax, 8
or eax, ecx
mov [ebp+arg_4], eax
popa
mov eax, [ebp+arg_4]
push eax
call sub_401100
First, two SendDlgItemMessageA functions are called in a row. This function takes the handle of the element and sends it the Msg system message. In our case, Msg in both cases is 0Dh, which is the hexadecimal equivalent of the WM_GETTEXT constant. Here, the values of two text fields are retrieved into which the user entered “two 4-character strings”. The letter A in the function name indicates that the ASCII format is used - one byte per character.
The first line is written at the offset lParam, the second, which is obvious - at the offset var_1C.
So, after the SendDlgItemMessageA functions are executed, the current state of the registers is saved on the stack using the pusha command, then one byte of one of the lines is written to the ecx, edx and eax registers. As a result, each of the registers takes the form: 000000 ##. Then:
- The SHL command shifts the bit contents of the eax register by 1 byte or, in other words, multiplies the arithmetic contents by 100 in the hexadecimal system or by 256 in decimal. As a result, eax takes the form 0000 ## 00 (for example, 00001200).
- An OR operation is performed between the received eax value and the ecx register in the form 000000 ## (let it be 00000034). As a result, eax will look like this: 00001234.
- The last, fourth byte of the string is written to the "free" exx.
- The contents of ex are again shifted by byte, freeing up space in the low byte for the next OR command. Now eax looks like this: 00123400.
- The OR instruction is executed, this time between ex and edx, which contains, say, 00000056. Now ex is 00123456.
- The two steps SHL eax, 8 and OR are repeated, as a result of which the new ecx content (00000078) is added to the “end” eax. As a result, eax stores the value 12345678.
Then this value is stored in the "variable" - in the memory area at offset arg_4. The state of the registers (their previous values), previously stored on the stack, is pulled from the stack and distributed to the registers. Then, the value at the offset arg_4 is again written to the eax register and this value is pushed from the register onto the stack. After that the sub_401100 function call follows.
What is the meaning of these operations? To find out is very simple even in practice, without theory. We set a breakpoint in the debugger, for example, on the push eax instruction (just before the subfunction is called) and run the program for execution. Keygen starts up, asks for lines. Having entered qwer and tyui and stopped at the breakpoint, we look at the value еах: 72657771. Decode to the text: rewq. That is, the physical meaning of these operations is line inversion.
Now we know that in sub_401100 one of the original lines is transmitted, turned upside down, in the size of a double word, which fits entirely in any of the standard registers. Perhaps you can take a look at the instructions sub_401100.
Another relatively long piece of code
sub_401100 proc near
var_4= dword ptr -4
arg_0= dword ptr 8
push ebp
mov ebp, esp
push ecx
push ebx
push esi
push edi
pusha
mov ecx, [ebp+arg_0]
mov eax, ecx
shl eax, 10h
not eax
add ecx, eax
mov eax, ecx
shr eax, 5
xor eax, ecx
lea ecx, [eax+eax*8]
mov edx, ecx
shr edx, 0Dh
xor ecx, edx
mov eax, ecx
shl eax, 9
not eax
add ecx, eax
mov eax, ecx
shr eax, 11h
xor eax, ecx
mov [ebp+var_4], eax
popa
mov eax, [ebp+var_4]
pop edi
pop esi
pop ebx
mov esp, ebp
pop ebp
retn
sub_401100 endp
At the very beginning, there is nothing interesting here - the state of the registers is carefully stored on the stack. And here is the first command that we are interested in - following the PUSHA instruction. It writes in exx the function argument stored at offset arg_0. Then this value is transferred to eax. And cut off half: as we recall, in our example 72657771 is passed to sub_401100; a logical shift to the left by 10h (16 in decimal) turns the register value into 77710000.
After that, the register value is inverted with the NOT instruction. This means that in the binary representation of the register, all zeros are converted to units, and units to zeros. The register after executing this instruction contains 888EFFFFF.
The ADD instruction adds (adds, pluses, etc.) the resulting value to the original value of the argument, which is still in the exx register (now it is clear why it was written first in exx and then in exax?). The result is saved in ex. Check what exh looks like after this operation is completed: FAF47770.
This result is copied from exx to exah, after which the SHR instruction is applied to the contents of exax. This operation is the opposite of SHL - if the latter shifts the digits to the left, then the first shifts them to the right. Just as a logical left shift operation is equivalent to multiplying by a power of two, a logical right shift operation is equivalent to the same division. Let's see what value will be the result of this operation: 7D7A3BB.
Now we commit another violence against the contents of ex and ex: the XOR instruction is an addition modulo 2 or “exclusive OR”. The essence of this operation, roughly speaking, is that its result is equal to unity (truth) only if its operands are unambiguous. For example, in the case of 0 xor 1, the result is true, or one. In the case of 0 xor 0 or 1 xor 1, the result will be false, or zero. In our case, as a result of following this instruction with respect to the registers eax (7D7A3BB) and exx (FAF47770), the value FD23D4CB is written to the register eax.
The next LEA ecx command, [eax + eax * 8] elegantly and easily multiplies ex by 9 and writes the result in exx. Then this value is copied to edx and shifted to the right by 13 digits: we get 73213 in edx and E6427B23 in exx. Then - again Xorim exx and edx, writing in exx E6454930. Copy this to eax, shift it to the left by 9 digits: 8A926000, then invert it, getting 756D9FFF. Add this value to the exx register - we have 5BB2E92F. Copy this to eax, shift right already by 17 digits - 2DD9 - and Xorim with exx. We get a total of 5BB2C4F6. Then ... then ... what do we have there? What all?..
So, we save this value to the memory area at var_4 offset, load from the register status stack, again take the final value from the memory and finally remove from the stack the remaining register states stored at the beginning. Exit the function. Hurray! .. however, it’s too early to rejoice, so far at the exit from the first function call we have a maximum of four half-printed characters, and yet we still have a whole raw string, and we still need to bring this to a divine form.
Let's move on to a higher level of analysis - from the disassembler to the decompiler. Imagine the entire DialogFunc function, which contains sub_401100 calls, in the form of a C-like pseudocode. Actually, this disassembler calls it a "pseudo-code", in fact, this is practically the C code, only ugly. We look:
Need more code. Need to build a ziggurat.
SendDlgItemMessageA(hDlg, 1000, 0xDu, 0xAu, (LPARAM)&lParam);
SendDlgItemMessageA(hDlg, 1001, 0xDu, 0xAu, (LPARAM)&v15);
v5 = sub_401100((char)lParam | ((SBYTE1(lParam) | ((SBYTE2(lParam) | (SBYTE3(lParam) << 8)) << 8)) << 8));
v6 = 0;
do
{
v21[v6] = v5 % 0x24;
v7 = v21[v6];
v5 /= 0x24u;
if ( v7 >= 10 )
v8 = v7 + 55;
else
v8 = v7 + 48;
v21[v6++] = v8;
}
while ( v6 < 4 );
v22 = 0;
v9 = sub_401100(v15 | ((v16 | ((v17 | (v18 << 8)) << 8)) << 8));
v10 = 0;
do
{
v19[v10] = v9 % 0x24;
v11 = v19[v10];
v9 /= 0x24u;
if ( v11 >= 10 )
v12 = v11 + 55;
else
v12 = v11 + 48;
v19[v10++] = v12;
}
while ( v10 < 4 );
v20 = 0;
wsprintfA(&v13, "%s-%s-%s-%s", &lParam, &v15, v21, v19);
SendDlgItemMessageA(hDlg, 1002, 0xCu, 0, (LPARAM)&v13);
This is already easier to read than assembly listing. However, not in all cases you can rely on the decompiler: you need to be ready to watch the thread of assembler logic for hours, the status of the registers and the stack in the debugger ... and then give written explanations to the FSB or FBI. In the evening I have especially funny jokes.
As I said, reading is easier, but still far from perfect. Let's analyze the code and give the variables more readable names. We will give clear and logical names to key variables, and simpler and simpler to counters.
The same, only translated from Chinese to Hindu.
SendDlgItemMessageA(hDlg, 1000, 0xDu, 0xAu, (LPARAM)&first_given_string);
SendDlgItemMessageA(hDlg, 1001, 0xDu, 0xAu, (LPARAM)&second_given_string);
first_given_string_encoded = sub_401100((char)first_given_string | ((SBYTE1(first_given_string) | ((SBYTE2(first_given_string) | (SBYTE3(first_given_string) << 8)) << 8)) << 8));
i = 0;
do
{
first_result_string[i] = first_string_encoded % 0x24;
temp_char = first_result_string[i];
first_string_encoded /= 0x24u;
if ( temp_char >= 10 )
next_char = temp_char + 55;
else
next_char = temp_char + 48;
first_result_string[i++] = next_char;
}
while ( i < 4 );
some_kind_of_data = 0;
second_string_encoded = sub_401100(byte1 | ((byte2 | ((byte3 | (byte4 << 8)) << 8)) << 8));
j = 0;
do
{
second_result_string[j] = second_string_encoded % 0x24;
temp_char2 = second_result_string[j];
second_string_encoded /= 0x24u;
if ( temp_char2 >= 10 )
next_char2 = temp_char2 + 55;
else
next_char2 = temp_char2 + 48;
second_result_string[j++] = next_char2;
}
while ( j < 4 );
yet_another_some_kind_of_data = 0;
wsprintfA(&buffer, "%s-%s-%s-%s", &first_given_string, &second_given_string, first_result_string, second_result_string);
SendDlgItemMessageA(hDlg, 1002, 0xCu, 0, (LPARAM)&buffer);
Epilogue
Level complete . The next (and final) goal is to write your keygen according to this algorithm. Out of habit, I will write in the Linux shell script language bash. test $ {# reg1} -gt && reg1 = `echo" $ {reg1: -8} "` is a truncation of a string containing an emulated register value, up to 8 low characters. When performing operations, extra senior bits were added there. All the rest is a painstaking emulation of assembler listing. I pointed out the Abnormal Programming hub at the top, right? ..
bash implementation of the notorious sub_401100: Keygen

main function:

Testing and comparison:


Conclusion
Now we could generate keys to some gaming software directly from the Linux console, but this is impossible for several reasons: firstly, I don’t know what kind of software this keygen is intended for - I downloaded it at random on the Internet; secondly, the use of fake keys and unlicensed proprietary software is prohibited by international law. ;)
