Cracking a Simple Crackme with Ghidra - Part 2

In the first part of the article, we used Ghidra to automatically analyze a simple crackme program (downloaded from crackmes.one). We learned how to rename “incomprehensible” functions right in the decompiler listing, and we worked out the algorithm of the “top-level” program, i.e. what main() does.

In this part, as promised, we will analyze the _construct_key() function, which, as we found out, reads the binary file passed to the program and validates the data it reads.

Step 5 - Overview of the _construct_key() Function


Let's look at the full listing of this function right away:

_construct_key() function listing
char ** __cdecl _construct_key(FILE *param_1)
{
  int iVar1;
  size_t sVar2;
  uint uVar3;
  uint local_3c;
  byte local_36;
  char local_35;
  int local_34;
  char *local_30 [4];
  char *local_20;
  undefined4 local_19;
  undefined local_15;
  char **local_14;
  int local_10;
  local_14 = (char **)__prepare_key();
  if (local_14 == (char **)0x0) {
    local_14 = (char **)0x0;
  }
  else {
    local_19 = 0;
    local_15 = 0;
    _text(&local_19,1,4,param_1);
    iVar1 = _text((char *)&local_19,*(char **)local_14[1],4);
    if (iVar1 == 0) {
      _text(local_14[1] + 4,2,1,param_1);
      _text(local_14[1] + 6,2,1,param_1);
      if ((*(short *)(local_14[1] + 6) == 4) && (*(short *)(local_14[1] + 4) == 5)) {
        local_30[0] = *local_14;
        local_30[1] = *local_14 + 0x10c;
        local_30[2] = *local_14 + 0x218;
        local_30[3] = *local_14 + 0x324;
        local_20 = *local_14 + 0x430;
        local_10 = 0;
        while (local_10 < 5) {
          local_35 = 0;
          _text(&local_35,1,1,param_1);
          if (*local_30[local_10] != local_35) {
            _free_key(local_14);
            return (char **)0x0;
          }
          local_36 = 0;
          _text(&local_36,1,1,param_1);
          if (local_36 == 0) {
            _free_key(local_14);
            return (char **)0x0;
          }
          *(uint *)(local_30[local_10] + 0x104) = (uint)local_36;
          _text(local_30[local_10] + 1,1,*(size_t *)(local_30[local_10] + 0x104),param_1);
          sVar2 = _text(local_30[local_10] + 1);
          if (sVar2 != *(size_t *)(local_30[local_10] + 0x104)) {
            _free_key(local_14);
            return (char **)0x0;
          }
          local_3c = 0;
          _text(&local_3c,1,1,param_1);
          local_3c = local_3c + 7;
          uVar3 = _text(param_1);
          if (local_3c < uVar3) {
            _free_key(local_14);
            return (char **)0x0;
          }
          *(uint *)(local_30[local_10] + 0x108) = local_3c;
          _text(param_1,local_3c,0);
          local_10 = local_10 + 1;
        }
        local_34 = 0;
        _text(&local_34,4,1,param_1);
        if (*(int *)(*local_14 + 0x53c) == local_34) {
          _text("Markers seem to still exist");
        }
        else {
          _free_key(local_14);
          local_14 = (char **)0x0;
        }
      }
      else {
        _free_key(local_14);
        local_14 = (char **)0x0;
      }
    }
    else {
      _free_key(local_14);
      local_14 = (char **)0x0;
    }
  }
  return local_14;
}


We will treat this function the same way we treated main(): first, we go through the “veiled” function calls. As expected, all of them are standard C library functions. I will not describe the renaming procedure again; see the first part of the article if necessary. After renaming, the following standard functions were “found”:

  • fread()
  • strncmp()
  • strlen()
  • ftell()
  • fseek()
  • puts()

We renamed the corresponding wrapper functions in our code (the ones the decompiler brazenly hid behind the name _text) by appending the suffix 2, to avoid confusion with the original C functions. Almost all of them work with file streams. This is not surprising: a quick glance at the code shows that it sequentially reads data from a file (whose FILE pointer is passed to the function as its only parameter) and compares it with a two-dimensional byte array, local_14.

Let's assume that this array holds the data for key verification and call it, say, key_array. Ghidra lets you rename variables as well as functions, so we rename the cryptic local_14 to the more descriptive key_array. This is done the same way as for functions: through the right-click menu (Rename local) or with the L key.

So, immediately after the local variable declarations, a function named __prepare_key() is called:

key_array = (char **)__prepare_key();
if (key_array == (char **)0x0) {
  key_array = (char **)0x0;
}

We will come back to __prepare_key() later; it is the third level of nesting in our call hierarchy: main() -> _construct_key() -> __prepare_key(). For now, let's accept that it creates and somehow initializes this two-dimensional “check” array. The function continues only if this array was actually allocated (the pointer is not null), as the else block right after the condition above shows.

Next, the program reads the first 4 bytes from the file and compares them with the corresponding part of key_array. (The code below is shown after renaming; in particular, I renamed local_19 to first_4bytes.)

first_4bytes = 0;
				/* read the first 4 bytes from the file */
fread2(&first_4bytes,1,4,param_1);
				/* compare with key_array[1][0...3] */
iVar1 = strncmp2((char *)&first_4bytes,*(char **)key_array[1],4);
if (iVar1 == 0) { ... }

So execution continues only if the first 4 bytes match (remember this). Then two 2-byte blocks are read from the file, with the same key_array used as the destination buffer:

fread2(key_array[1] + 4,2,1,param_1);
fread2(key_array[1] + 6,2,1,param_1);

Again, the function continues only if the following condition holds:

if ((*(short *)(key_array[1] + 6) == 4) && (*(short *)(key_array[1] + 4) == 5)) { 
   // continue execution ...
}

It is easy to see that the first of the two 2-byte blocks read above must be the number 5 and the second must be the number 4 (the short data type occupies exactly 2 bytes on 32-bit platforms).
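The two header checks seen so far can be sketched in Python. This is only an illustration of the logic, not the crackme's code: the function name check_header and the sample bytes are made up for the example, and the expected 4-byte signature is passed in as a parameter because at this point we do not yet know what it is. Little-endian byte order is assumed.

```python
import io
import struct

def check_header(stream, signature):
    """Return True if the stream starts with `signature` followed by the 16-bit values 5 and 4."""
    magic = stream.read(4)                      # fread2(&first_4bytes, 1, 4, f)
    if magic != signature:                      # strncmp2(...) != 0
        return False
    raw = stream.read(4)                        # the two fread2(..., 2, 1, f) calls
    if len(raw) != 4:
        return False
    first, second = struct.unpack("<2h", raw)   # two little-endian shorts
    return first == 5 and second == 4

# Hypothetical sample: a placeholder signature b"ABCD", then 5 and 4 as shorts.
sample = b"ABCD" + struct.pack("<2h", 5, 4)
print(check_header(io.BytesIO(sample), b"ABCD"))   # True
```

Swapping the two shorts, or changing any byte of the signature, makes the function return False, which mirrors the early exits in the decompiled code.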

Next is this:

local_30[0] = *key_array;  // i.e. key_array[0]
local_30[1] = *key_array + 0x10c;
local_30[2] = *key_array + 0x218;
local_30[3] = *key_array + 0x324;
local_20 = *key_array + 0x430;

Here we see that the local_30 array (declared as char *local_30[4]) holds pointers at fixed offsets into the buffer that key_array[0] points to. In other words, local_30 is an array of marker strings into which data from the file will probably be read. On this assumption, I renamed local_30 to markers. Only the last line of this fragment looks a little suspicious: the last offset (0x430, i.e. 1072) is assigned not to the next element of markers but to a separate variable, local_20 (a char *). We will get to the bottom of that shortly; for now, let's move on!

Next comes a loop:

 i = 0; // local_10 renamed to i
 while (i < 5) {
    // ...
    i = i + 1;
}

That is, there are only 5 iterations, from 0 to 4 inclusive. The loop body starts right away by reading from the file and checking the data against our markers array:

char c_marker = 0; // renamed from local_35
		/* read the next byte from the file */
fread2(&c_marker, 1, 1, param_1);
if (*markers[i] != c_marker) {
		/*  here and below: return a null pointer on error */
	_free_key(key_array);
	return (char **)0x0;
}

That is, the next byte of the file is read into c_marker (local_35 in the original decompiled code) and compared with the first character of the i-th element of markers. On a mismatch, key_array is freed and a null double pointer is returned. Further down the code, we see that the same happens whenever the data read does not match the verification data.

But here, as they say, is the catch. Let's take a closer look at this loop. It has 5 iterations, as we found out. If you like, you can verify this in the assembly code:





Indeed, the CMP instruction compares the value of local_10 (which we now call i) with the number 4, and if it is less than or equal to 4 (the JLE instruction), execution jumps to the label LAB_004017eb, i.e. the beginning of the loop body. So the condition holds for i = 0, 1, 2, 3 and 4: five iterations in total! That would be fine, except that markers is indexed by this same variable inside the loop, and the array is declared with only 4 elements:

char *markers [4];

So someone is clearly trying to fool someone :) Remember I said this line looked suspicious?

local_20 = *key_array + 0x430;

And how! Look through the entire listing of the function and try to find even one more reference to local_20. There isn't one! We conclude that this offset should also be stored in the markers array, and that the array itself should have 5 elements. Let's fix that. Go to the variable's declaration, press Ctrl + L (Retype variable) and boldly change the array size to 5:



Done. Scroll down to the code that assigns the pointer offsets to markers and, lo and behold, the mysterious extra variable disappears and everything falls into place:

markers[0] = *key_array;
markers[1] = *key_array + 0x10c;
markers[2] = *key_array + 0x218;
markers[3] = *key_array + 0x324;
markers[4] = *key_array + 0x430; // the assignment that tried to run away... caught you!
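Why did the fifth pointer show up as a separate variable in the first place? A small sketch of the arithmetic, assuming (as Ghidra's default local_XX names suggest) that local_30 and local_20 live at stack offsets -0x30 and -0x20, and that pointers are 4 bytes wide in this 32-bit binary. The helper name element_offset is made up for the example.

```python
POINTER_SIZE = 4      # 32-bit binary: each char* takes 4 bytes
BASE = -0x30          # assumed stack offset of local_30 (our markers array)

def element_offset(index):
    """Stack offset of markers[index] if the array starts at BASE."""
    return BASE + index * POINTER_SIZE

for i in range(5):
    print(i, hex(element_offset(i)))

# The data offsets assigned to the five pointers are multiples of 0x10c:
print([hex(i * 0x10c) for i in range(5)])
```

Under these assumptions the fifth element lands at offset -0x20, exactly where the decompiler placed local_20, so a 4-element array followed by one extra pointer is indistinguishable from a 5-element array. The data offsets also form a regular series with a step of 0x10c (268) bytes.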

Let's return to our while loop (in the source code it is most likely a for loop, but that doesn't matter to us). Next, another byte is read from the file and its value is checked:

byte n_strlen1 = 0; // renamed from local_36
		/* read the next byte from the file */
fread2(&n_strlen1,1,1,param_1);
if (n_strlen1 == 0) {
		/* the value must not be zero */
	_free_key(key_array);
	return (char **)0x0;
}

OK, so n_strlen1 must be non-zero. Why? You will see in a moment, and you will also understand why I gave the variable this name:

          /* store n_strlen1 at (markers[i] + 0x104) */
*(uint *)(markers[i] + 0x104) = (uint)n_strlen1;
          /* read n_strlen1 bytes from the file (--> some string?) */
fread2(markers[i] + 1,1,*(size_t *)(markers[i] + 0x104),param_1);
n_strlen2 = strlen2(markers[i] + 1); // renamed from sVar2
if (n_strlen2 != *(size_t *)(markers[i] + 0x104)) {
          /* the length of the string read (n_strlen2) must equal n_strlen1 */
       _free_key(key_array);
       return (char **)0x0;
}

The comments should make everything clear. n_strlen1 bytes are read from the file and stored as a sequence of characters (i.e. a string) in markers[i], right after the corresponding “stop symbol” that was already placed there via key_array. Storing n_strlen1 in markers[i] at offset 0x104 (260) plays no role for this check (see the first line of the code above). In effect, this code can be simplified as follows (and this is probably how it looks in the source code):

fread2(markers[i] + 1, 1, (size_t) n_strlen1, param_1);
n_strlen2 = strlen2(markers[i] + 1);
if (n_strlen2 != (size_t) n_strlen1) { ... }

The code also checks that the length of the string read equals n_strlen1. This may seem redundant, since that value was passed to fread, but fread reads at most the requested number of bytes and may read fewer, for example when it hits the end of the file (EOF). The strlen check also rejects strings that contain a zero byte. So everything is strict: the file specifies the length of the string (in bytes), followed by the string itself, and this repeats exactly 5 times. But we are getting ahead of ourselves.
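The length-prefixed read and the strlen check can be mimicked in a few lines of Python. This is a sketch under the assumption that the destination buffer starts out zero-filled, so a short read leaves a shorter C string behind; the function name read_counted_string is invented for the example.

```python
import io

def read_counted_string(stream):
    """Read a 1-byte length followed by that many bytes; return the bytes or None on failure."""
    length_byte = stream.read(1)
    if len(length_byte) != 1 or length_byte[0] == 0:
        return None                       # a missing or zero length is rejected
    length = length_byte[0]
    data = stream.read(length)            # may return fewer bytes at end of file
    # C's strlen stops at the first zero byte, so emulate it with split().
    if len(data.split(b"\x00", 1)[0]) != length:
        return None
    return data

print(read_counted_string(io.BytesIO(b"\x03abc")))      # b'abc'
print(read_counted_string(io.BytesIO(b"\x05abc")))      # None (file too short)
print(read_counted_string(io.BytesIO(b"\x03a\x00c")))   # None (embedded zero byte)
```

The second and third calls show the two situations in which comparing strlen with the stored length actually catches something.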

Next comes this code (which I also commented right away):

uint n_pos = 0; // renamed from local_3c
		/* read the next byte from the file */
fread2(&n_pos,1,1,param_1);
		/* add 7 */
n_pos = n_pos + 7;
		/* get the file cursor position */
uint n_filepos = ftell2(param_1); // renamed from uVar3
if (n_pos < n_filepos) {
		/* n_pos must be >= n_filepos */
	_free_key(key_array);
	return (char **)0x0;
}

This part is even simpler: we take the next byte from the file, add 7 and compare the result with the current cursor position in the file stream, obtained with ftell(). The value of n_pos must be no less than the cursor position (i.e. the offset in bytes from the beginning of the file).
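The position check is easy to get wrong by one when building a file by hand, so here is the arithmetic as a sketch. Both function names and the sample offsets are made up for illustration; the only facts taken from the listing are the "+ 7" and the comparison with the cursor position.

```python
def check_next_position(pos_byte, cursor):
    """Return the target offset (pos_byte + 7), or None if it lies before the cursor."""
    target = pos_byte + 7
    if target < cursor:          # the crackme frees the key and returns NULL here
        return None
    return target

def smallest_valid_byte(byte_offset):
    """Smallest accepted value for a position byte stored at `byte_offset` in the file."""
    cursor_after_read = byte_offset + 1      # ftell() right after reading that byte
    return cursor_after_read - 7

print(check_next_position(5, 12))    # 12
print(check_next_position(4, 12))    # None
print(smallest_valid_byte(11))       # 5
```

With the smallest valid value, the target equals the cursor position itself, so the next read continues immediately after the position byte. Larger values would leave a gap of ignored bytes in the file.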

The final line in the loop:

fseek2(param_1,n_pos,0);

That is, fseek() moves the file cursor to the position given by n_pos, counted from the beginning of the file. We perform all of these operations 5 times in the loop. The _construct_key() function ends with the following code:

int i_lastmarker = 0; // renamed from local_34
			/* read the last 4 bytes from the file (int32) */
fread2(&i_lastmarker,4,1,param_1);
if (*(int *)(*key_array + 0x53c) == i_lastmarker) {
			/* this number must equal key_array[0][1340]
			   ...then everything is OK :) */
  puts2("Markers seem to still exist");
}
else {
  _free_key(key_array);
  key_array = (char **)0x0;
}

Thus, the last block of data in the file must be a 4-byte integer equal to the value at key_array[0][1340]. In that case we get a congratulatory message in the console. Otherwise, a null pointer is returned, with no praise at all :)

Step 6 - Overview of the __prepare_key() Function


Only one function remains unanalyzed: __prepare_key(). We have already guessed that this is where the verification data is generated, in the form of the key_array array, which _construct_key() then uses to check the data from the file. It remains to find out what that data is!

I will not analyze this function in detail; here is the complete listing with comments, after all the necessary variable renaming:

__prepare_key() function listing
void ** __prepare_key(void)
{
  void **key_array;
  void *pvVar1;
                    /* key_array = new char*[2]; // two 4-byte pointers (char*) */
  key_array = (void **)calloc2(1,8);
  if (key_array == (void **)0x0) {
    key_array = (void **)0x0;
  }
  else {
    pvVar1 = calloc2(1,0x540);
                    /* key_array[0] = new char[1344] (0x540 bytes) */
    *key_array = pvVar1;
    pvVar1 = calloc2(1,8);
                    /* key_array[1] = new char[8] */
    key_array[1] = pvVar1;
                    /* "VOID" */
    *(undefined4 *)key_array[1] = 0x404024;
                    /* 5 and 4 (2-byte words) */
    *(undefined2 *)((int)key_array[1] + 4) = 5;
    *(undefined2 *)((int)key_array[1] + 6) = 4;
                    /* key_array[0][0] = 'b' */
    *(undefined *)*key_array = 0x62;
    *(undefined4 *)((int)*key_array + 0x104) = 3;
                    /* 'W' */
    *(undefined *)((int)*key_array + 0x218) = 0x57;
                    /* 'p' */
    *(undefined *)((int)*key_array + 0x324) = 0x70;
                    /* 'l' */
    *(undefined *)((int)*key_array + 0x10c) = 0x6c;
                    /* 152 (not ASCII) */
    *(undefined *)((int)*key_array + 0x430) = 0x98;
                    /* last marker = 1122 (int32) */
    *(undefined4 *)((int)*key_array + 0x53c) = 0x462;
  }
  return key_array;
}


The only place worth a closer look is this line:

*(undefined4 *)key_array[1] = 0x404024;

How do I know that the string "VOID" is stored here? The point is that 0x404024 is an address in the program's address space that points into the .rdata section. Double-clicking this value shows at a glance what is stored there:



By the way, the same can be seen from the assembly code for this line:

004015da c7 00 24 40 40 00    MOV dword ptr [EAX], .rdata    = 56h V

The data corresponding to the "VOID" string is at the very beginning of the .rdata section (at zero offset from the corresponding address). So, on exit from this function, the two-dimensional array should contain the following data:





[0] [0]:'b' [268]:'l' [536]:'W' [804]:'p' [1072]:152 [1340]:1122
[1] [0-3]:"VOID" [4-5]:5 [6-7]:4
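The offsets in this table fit a regular layout: five slots of 0x10c (268) bytes each, followed by a 4-byte value. The ctypes sketch below models that layout and lets Python confirm the sizes. The structure and field names are guesses made for illustration (the original source is not available), and the split of each slot into a marker byte plus a 0x103-byte text area is an assumption; only the offsets 0x104, 0x108, 0x10c, 0x53c and the total size 0x540 come from the decompiled code.

```python
import ctypes

class MarkerRecord(ctypes.Structure):
    """One 0x10c-byte slot inside key_array[0] (field names are guesses)."""
    _fields_ = [
        ("marker", ctypes.c_ubyte),          # offset 0x000: the marker byte
        ("text", ctypes.c_char * 0x103),     # offset 0x001: string read from the file
        ("length", ctypes.c_uint32),         # offset 0x104: string length
        ("position", ctypes.c_uint32),       # offset 0x108: position value
    ]

class KeyBlock(ctypes.Structure):
    """The whole buffer behind key_array[0]."""
    _fields_ = [
        ("records", MarkerRecord * 5),       # offsets 0x000 .. 0x53b
        ("last_marker", ctypes.c_uint32),    # offset 0x53c
    ]

print(hex(ctypes.sizeof(MarkerRecord)))      # 0x10c
print(hex(KeyBlock.last_marker.offset))      # 0x53c
print(hex(ctypes.sizeof(KeyBlock)))          # 0x540
```

The total of 0x540 bytes matches the calloc2(1,0x540) call in the listing, which supports reading key_array[0] as an array of five records plus a trailing integer rather than as a flat byte array.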

Step 7 - Prepare the binary file for the crackme


Now we can start synthesizing the binary file. We have all the input data in hand:
1) the verification data (“stop symbols”) and their positions in the verification array;
2) the sequence of data in the file.

Let's reconstruct the structure of the required file by following the algorithm of the _construct_key() function. The sequence of data in the file is as follows:

File structure
  1. 4 bytes == key_array[1][0...3] == "VOID"
  2. 2 bytes == key_array[1][4] == 5
  3. 2 bytes == key_array[1][6] == 4
  4. 1 byte == key_array[0][0] == 'b' (marker)
  5. 1 byte == n_strlen1 (length of the string that follows)
  6. n_strlen1 bytes == any string of length n_strlen1
  7. 1 byte == n_pos (n_pos + 7 == offset of the next marker)
  8. 1 byte == key_array[0][268] == 'l' (marker)
  9. 1 byte == n_strlen1 (length of the string that follows)
  10. n_strlen1 bytes == any string of length n_strlen1
  11. 1 byte == n_pos (n_pos + 7 == offset of the next marker)
  12. 1 byte == key_array[0][536] == 'W' (marker)
  13. 1 byte == n_strlen1 (length of the string that follows)
  14. n_strlen1 bytes == any string of length n_strlen1
  15. 1 byte == n_pos (n_pos + 7 == offset of the next marker)
  16. 1 byte == key_array[0][804] == 'p' (marker)
  17. 1 byte == n_strlen1 (length of the string that follows)
  18. n_strlen1 bytes == any string of length n_strlen1
  19. 1 byte == n_pos (n_pos + 7 == offset of the next marker)
  20. 1 byte == key_array[0][1072] == 152 (marker)
  21. 1 byte == n_strlen1 (length of the string that follows)
  22. n_strlen1 bytes == any string of length n_strlen1
  23. 1 byte == n_pos (n_pos + 7 == offset of the final 4-byte value)
  24. 4 bytes == key_array[0][1340] == 1122
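The structure above can be turned into a small validator that walks a candidate file the same way _construct_key() does. This is a Python sketch of the reconstructed logic, not the crackme's actual code: the function name parse_key_file and the sample file are invented, and little-endian byte order is assumed.

```python
import io
import struct

MARKERS = [ord("b"), ord("l"), ord("W"), ord("p"), 152]
LAST_MARKER = 1122

def parse_key_file(data):
    """Return the five strings from a key file, or None if any check fails."""
    f = io.BytesIO(data)
    if f.read(4) != b"VOID":
        return None
    header = f.read(4)
    if len(header) != 4 or struct.unpack("<2h", header) != (5, 4):
        return None
    strings = []
    for marker in MARKERS:
        if f.read(1) != bytes([marker]):          # marker byte
            return None
        length = f.read(1)                        # string length, must be non-zero
        if len(length) != 1 or length[0] == 0:
            return None
        text = f.read(length[0])
        if len(text) != length[0] or b"\x00" in text:
            return None
        pos = f.read(1)                           # position byte
        if len(pos) != 1:
            return None
        target = pos[0] + 7
        if target < f.tell():
            return None
        f.seek(target)                            # jump to the next marker
        strings.append(text)
    tail = f.read(4)
    if len(tail) != 4 or struct.unpack("<i", tail)[0] != LAST_MARKER:
        return None
    return strings

# Hypothetical minimal file: every string is a single letter.
sample = (b"VOID" + struct.pack("<2h", 5, 4)
          + b"b" + bytes([1]) + b"A" + bytes([5])
          + b"l" + bytes([1]) + b"B" + bytes([9])
          + b"W" + bytes([1]) + b"C" + bytes([13])
          + b"p" + bytes([1]) + b"D" + bytes([17])
          + bytes([152, 1]) + b"E" + bytes([21])
          + struct.pack("<i", 1122))
print(parse_key_file(sample))   # [b'A', b'B', b'C', b'D', b'E']
```

Each position byte in the sample is the offset of the following marker minus 7 (for example, the marker 'l' sits at offset 12, so the byte before it is 5).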


For clarity, I put together a table in Excel with the data of the required file:



Here, row 7 holds the data itself as characters and numbers, row 6 their hexadecimal representation, row 8 the size of each element (in bytes) and row 9 the offset from the beginning of the file. This layout is very convenient because it lets you enter any strings into the future file (the cells with a yellow fill), while the lengths of those strings and the position offsets of the next stop symbol are calculated automatically by formulas, as the program's algorithm requires. Above that (rows 1-4), the structure of the key_array check array is shown.

The Excel file itself, along with other source materials for the article, can be downloaded here.

Binary file generation and validation


All that remains is to generate the required file in binary format and feed it to our crackme. To generate the file, I wrote a simple Python script:

Script to generate the file
import os
import struct
import subprocess
import sys

# Strings stored after each of the five markers
out_str = ['!', 'I', ' solved', ' this', ' crackme!']


def write_file(file_path):
    try:
        with open(file_path, 'wb') as outfile:
            # Header: "VOID" signature, then the 2-byte values 5 and 4
            outfile.write('VOID'.encode('ascii'))
            outfile.write(struct.pack('2h', 5, 4))
            # Block 1: marker 'b', string length, string, position byte
            outfile.write('b'.encode('ascii'))
            outfile.write(struct.pack('B', len(out_str[0])))
            outfile.write(out_str[0].encode('ascii'))
            # pos = number of bytes written so far; the crackme adds 7 to the
            # position byte, so (pos - 6) + 7 == pos + 1, the next marker
            pos = 10 + len(out_str[0])
            outfile.write(struct.pack('B', pos - 6))
            # Block 2: marker 'l'
            outfile.write('l'.encode('ascii'))
            outfile.write(struct.pack('B', len(out_str[1])))
            outfile.write(out_str[1].encode('ascii'))
            pos += 3 + len(out_str[1])
            outfile.write(struct.pack('B', pos - 6))
            # Block 3: marker 'W'
            outfile.write('W'.encode('ascii'))
            outfile.write(struct.pack('B', len(out_str[2])))
            outfile.write(out_str[2].encode('ascii'))
            pos += 3 + len(out_str[2])
            outfile.write(struct.pack('B', pos - 6))
            # Block 4: marker 'p'
            outfile.write('p'.encode('ascii'))
            outfile.write(struct.pack('B', len(out_str[3])))
            outfile.write(out_str[3].encode('ascii'))
            pos += 3 + len(out_str[3])
            outfile.write(struct.pack('B', pos - 6))
            # Block 5: marker 152 (not an ASCII character)
            outfile.write(struct.pack('B', 152))
            outfile.write(struct.pack('B', len(out_str[4])))
            outfile.write(out_str[4].encode('ascii'))
            pos += 3 + len(out_str[4])
            outfile.write(struct.pack('B', pos - 6))
            # Final 4-byte marker
            outfile.write(struct.pack('i', 1122))
    except Exception as err:
        print(err)
        raise


def main():
    if len(sys.argv) != 2:
        print('USAGE: {this_script.py} path_to_crackme[.exe]')
        return
    if not os.path.isfile(sys.argv[1]):
        print('File "{}" unavailable!'.format(sys.argv[1]))
        return
    # The key file is placed next to the crackme, with a .dat extension
    file_path = os.path.splitext(sys.argv[1])[0] + '.dat'
    try:
        write_file(file_path)
    except Exception:
        return
    try:
        # Run the crackme with the generated key file and show its output
        outputstr = subprocess.check_output([sys.argv[1], '-f', file_path], stderr=subprocess.STDOUT)
        print(outputstr.decode('utf-8'))
    except Exception as err:
        print(err)


if __name__ == '__main__':
    main()


The script takes the path to the crackme as its only parameter, generates the binary key file in the same directory and runs the crackme with the corresponding parameter, relaying the program's output to the console.

To convert text data to binary, the struct package is used. The pack() function writes binary data according to a format string that specifies the data type ("B" = byte, "i" = int, etc.) and, optionally, the byte order (">" = big-endian, "<" = little-endian). By default, struct uses the machine's native byte order, which is little-endian on x86. Since we already determined in the first article that this is exactly our case, we specify only the type.
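To see what the byte-order prefix changes, here is a short illustration using the same values that appear in the key file. The variable names are made up for the example; the explicit "<" prefix makes the result independent of the machine the script runs on.

```python
import struct

little_shorts = struct.pack("<2h", 5, 4)   # two 16-bit values, little-endian
big_shorts = struct.pack(">2h", 5, 4)      # the same values, big-endian
last_marker = struct.pack("<i", 1122)      # 1122 == 0x462 as a 32-bit integer
marker_byte = struct.pack("B", 152)        # a single unsigned byte

print(little_shorts)   # b'\x05\x00\x04\x00'
print(big_shorts)      # b'\x00\x05\x00\x04'
print(last_marker)     # b'b\x04\x00\x00'
print(marker_byte)     # b'\x98'
```

Note that the low byte of 1122 is 0x62, which Python prints as the letter b; it is a coincidence, not a marker.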

Overall, the code reproduces the program algorithm we found. As the string printed on success, I chose “I solved this crackme!” (you can modify the script so that any string can be specified).
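One possible way to make the strings configurable is to build the file contents in a loop. The following is a sketch, not the author's script: the function name build_key and its validation rules are assumptions derived from the checks found in _construct_key(), and little-endian byte order is written out explicitly.

```python
import struct

MARKERS = [ord("b"), ord("l"), ord("W"), ord("p"), 152]

def build_key(strings):
    """Build key-file bytes for five non-empty ASCII strings."""
    if len(strings) != len(MARKERS):
        raise ValueError("exactly five strings are required")
    out = bytearray(b"VOID" + struct.pack("<2h", 5, 4))
    for marker, text in zip(MARKERS, strings):
        payload = text.encode("ascii")
        if not 1 <= len(payload) <= 255 or b"\x00" in payload:
            raise ValueError("each string must be 1..255 bytes without zero bytes")
        out += bytes([marker, len(payload)]) + payload
        # After the position byte is read, the cursor is at len(out) + 1;
        # the stored value plus 7 must point exactly there.
        next_offset = len(out) + 1
        if next_offset - 7 > 255:
            raise ValueError("file too long for a one-byte position")
        out.append(next_offset - 7)
    out += struct.pack("<i", 1122)
    return bytes(out)

key = build_key(["!", "I", " solved", " this", " crackme!"])
print(len(key))   # 50
```

Because each position is stored in a single byte, the whole file has to stay short: the offset of every marker minus 7 must fit into the range 0..255, which is why the sketch raises an error for long inputs.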

Let's check the output:



Hooray, everything works! So, after a little sweat and a couple of analyzed functions, we were able to fully reconstruct the program's algorithm and “crack” it. Of course, this is just a simple crackme, a test program, and only of difficulty level 2 (out of the 5 offered on that site). In real life we would face a complex call hierarchy with dozens or hundreds of functions and, in some cases, encrypted data sections, junk code and other obfuscation techniques, up to the use of internal virtual machines and P-code... But that, as they say, is a completely different story.

Materials for the article.
