Investigating an unknown archive

Relocation. A new city. A job search. Even for an IT professional this can take a long time: a series of interviews that are, on the whole, very similar to one another. And, as usually happens, once you have already found a job, an interesting company turns up a little later.

It was hard to tell what exactly the company did, but its area of interest was studying other people's software. That sounds intriguing, although once you realize that it does not appear to be a vendor of cybersecurity software, you pause for a second and start scratching your head.

In short: they sent me an archive and offered, as a test task, to examine it and compute a certain signature from the given input data. I should note that I had very little experience with this kind of work, which is probably why my first attempt lasted only a couple of hours before the motivation fizzled out. And yes, of course, the first thing I did was try to run it on a phone and in an emulator: the application is invalid.

What we have: an archive with the ".apk" extension. I have put the task itself under a spoiler so that search engines do not index it: what if the guys don't like that I posted the solution on Habr?

Task itself
The APK contains functionality for generating signatures for an associative array.
Try to get a signature for the following data set:

     "user" : "LeetD3vM4st3R",
     "password": "__s33cr$$tV4lu3__",
     "hash": "34765983265937875692356935636464"

Roll up the sleeves

The task says that the archive contains functionality for signing an associative array. From the file extension it is immediately clear that we are dealing with an Android application. First we unpack the archive. It is in fact a regular ZIP archive, and any archiver will handle it easily. I used the apktool utility and, as it turned out, accidentally sidestepped a couple of pitfalls. Yes, that happens (usually it's the other way around, right?). The incantation is pretty simple:

apktool d

It turns out that the code and resources inside an APK are themselves stored packed in separate binary files, and extracting them requires additional software. apktool handled this implicitly: it pulled out the class bytecode and the resources and laid everything out in a natural file hierarchy. We can proceed.

├── AndroidManifest.xml
├── apktool.yml
├── lib
│   └── arm64-v8a
├── original
│   ├── AndroidManifest.xml
│   └── META-INF
├── res
│   ├── anim
│   ├── color
│   ├── drawable
│   ├── layout
│   ├── layout-watch-v20
│   ├── mipmap-anydpi-v26
│   ├── values
│   └── values-af
├── smali
│   ├── android
│   ├── butterknife
│   ├── com
│   ├── net
│   └── org
└── unknown
    └── org

We see roughly this hierarchy (I have left a simplified version) and try to figure out where to start. I should note that I had once written a couple of small Android applications, so the purpose of some of the directories and the general principles of how Android applications are built are fairly clear to me.
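Since an APK is an ordinary ZIP archive, its raw entries can also be listed without apktool. The sketch below is in Python purely for illustration; the archive built in `build_fake_apk` is a made-up stand-in, not the APK from the task.

```python
import io
import zipfile


def list_apk_entries(apk_bytes: bytes) -> list[str]:
    """Return the names of all entries in an APK, treating it as a plain ZIP."""
    with zipfile.ZipFile(io.BytesIO(apk_bytes)) as archive:
        return sorted(archive.namelist())


def build_fake_apk() -> bytes:
    """Build a tiny in-memory ZIP that mimics an APK layout (hypothetical content)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("AndroidManifest.xml", b"binary xml")
        archive.writestr("classes.dex", b"dex bytecode")
        archive.writestr("resources.arsc", b"resource table")
    return buffer.getvalue()


entries = list_apk_entries(build_fake_apk())
print(entries)
```

On a real APK such a listing shows the packed `classes.dex` and `resources.arsc` binaries, which apktool decodes into the `smali` and `res` directories of the tree above.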

To begin with, I decide to simply walk through the files. I open AndroidManifest.xml and start reading attentively. A strange attribute catches my attention.


It turns out that it is responsible for supporting right-to-left languages in the application. I start to tense up. Not good.

Next, my eye catches on the unknown folder. Under it is a package hierarchy with a huge number of text files with obscure content. I google the full package name, and it turns out that what is stored here is related to an algorithm for finding words that sound phonetically similar to a given one. Frankly, at this point I tensed up even more. After poking around the directories a bit, I finally found the code itself, and then the fun began. What I found was not the usual Java bytecode, which I had once played around with, but something else. Very similar, but different.

As it turned out, Android has its own virtual machine, Dalvik. And, like every self-respecting virtual machine, it has its own bytecode. It seems that during my first attempt at this task it was on this sad note that I called an intermission, bowed, dropped the curtain, and abandoned the whole thing for 4 months, until curiosity finally got the better of me.

Roll up the sleeves [2]

“But couldn't it all be simpler?” That is the question I asked myself when I took up the task for the second time. I started searching the internet for a smali-to-Java decompiler. All I found was that this process cannot be done unambiguously. Frowning a little, I went to Github and typed a couple of key phrases into the search box. The first hit was smali2java.

git clone
gradle build
java -jar smali2java.jar ..

Errors. I see a huge stack trace and several terminal pages of errors. After reading into the content a little (and holding back my emotions at the size of the stack trace), I find that this tool works from a formally described grammar, and the bytecode it encountered clearly does not conform to it. I open the smali bytecode and see annotations, synthetic methods, and other strange constructs in it. There was nothing like that in Java bytecode! How long can this go on? Delete them!

More details
The Dalvik virtual machine (like the JVM), as it turned out, is unaware of concepts such as inner and outer classes (that is, nested classes), so the compiler generates so-called “synthetic” methods to give a nested class access to, for example, the fields of its outer class.

As an example:

If the outer class (OuterClass) has a field

public class OuterClass {
	List a;
	// ... a nested class that reads `a` is declared here
}

So that the nested class can access this field of the outer class, the compiler implicitly generates a method along these lines (pseudo-Java: "synthetic" is a bytecode flag, not a Java keyword):

static synthetic java.util.List getList(OuterClass p1) {
	return p1.a;
}

The same kind of under-the-hood machinery also powers some of the other mechanisms the language provides.

You can start studying this question in more detail from here.

It doesn't help. The tool complains even about seemingly innocent bytecode. I open the decompiler's source code, start reading, and see something very strange: nobody would write code like this by hand. A thought creeps in: could this really be generated code? I push the idea away for about 30 minutes while trying to understand what the error is. It's COMPLICATED. I open Github again, and indeed, the parser is generated from a grammar. And here is the generator itself. I put all of this aside and try to approach from another direction.

I should note that a little later I did try to change the grammar, and in places even the bytecode itself, so that the decompiler could digest it. But even when the bytecode became valid as far as the decompiler's grammar was concerned, the program simply returned nothing. Open source...

I leaf through the bytecode and stumble upon constants I do not recognize. Googling them, I find the same ones in a book on reverse engineering Android applications. I recall that these are simply the IDs that the build tools assign to the Android application's resources (at coding time these are the R.* constants). For the next half hour to an hour I briefly study which registers are responsible for what, in what order arguments are passed, and generally dig into the syntax.
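Such constants are easy to spot once you know their layout: an Android resource ID is a 32-bit value of the form 0xPPTTEEEE (package, type, entry), and application resources use package 0x7F. A small Python sketch that splits one apart; the sample value 0x7F0A001C is hypothetical and not taken from the task's APK.

```python
def split_resource_id(resource_id: int) -> tuple[int, int, int]:
    """Split a 32-bit Android resource ID into (package, type, entry)."""
    package_id = (resource_id >> 24) & 0xFF
    type_id = (resource_id >> 16) & 0xFF
    entry_id = resource_id & 0xFFFF
    return package_id, type_id, entry_id


def looks_like_app_resource(value: int) -> bool:
    """True if the constant has the 0x7F package byte used by application resources."""
    return ((value >> 24) & 0xFF) == 0x7F


# Hypothetical constant of the kind that shows up in smali as `const v0, 0x7f0a001c`
package_id, type_id, entry_id = split_resource_id(0x7F0A001C)
print(hex(package_id), hex(type_id), hex(entry_id))
```

Which type byte maps to `layout`, `id`, `string` and so on differs between builds, so the sketch deliberately does not try to name it.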

What does it look like?

I found the layout of the application's main window, and from it I understood what the application does: the main screen (Activity) contains a RecyclerView (roughly, a View that can reuse UI objects that are not currently displayed, to save memory) with input fields for key/value pairs, a couple of buttons responsible for adding a new key/value pair to some abstract container, and a button that generates a signature for this container.

Looking at the annotations and noticing a certain amount of code that looks suspiciously generated, I start googling. The project uses the ButterKnife library, which uses annotations to perform the inflate() -> bind() of UI elements automatically. If a class contains such annotations, the ButterKnife annotation processor implicitly creates another binder class with a name of the form <ClassName>_ViewBinding, which does all the dirty work under the hood. I got all of this information from a single file, MainActivity, after manually recreating an approximation of its Java source. After half an hour I realized that this library's annotations can also attach callbacks to button actions, and I found the key functions that were actually responsible for adding a key/value pair to the container and for generating the signature.

Of course, during the study I had to dig into the guts of various libraries and plugins, because even pretty landing pages with cookies do not cover every use case and detail. For any reverse engineer, I think, that is common practice.

Laziness is a programmer's friend

Having spent some more time on the second source file, I got thoroughly tired and realized that I would get nowhere this way. I go to Github again, and this time I search more carefully. I find the Smali2PsuedoJava project, a decompiler into “pseudo-Java code”. If this utility can bring at least something into human-readable form, then I owe the author a mug of their favorite beer (or at least a star on Github, for starters).

And it really works! The effect is plain to see:


A little later, studying the project's Java pseudo-code and incredulously comparing it with the smali bytecode, I find a strange library in the code. Googling, I find out that it encrypts a set of compile-time values inside the APK archive. This is usually needed when the application uses constants such as IP addresses, credentials for an external database, authorization tokens, and so on: anything that could be obtained by reverse engineering the application. True, the author clearly states that the project is abandoned, as in, move along. This is getting interesting.

This library provides access to the values through a Java library in which a specific method corresponds to the key we are interested in. That only fuels my interest, and I start digging deeper.

In short, what the library does and how it works:

  • the keys and their corresponding values are declared in the project's Gradle file
  • all the values are automatically packed into a separate dynamic library (.so) that is generated at compile time. Yes, yes, it really IS generated.
  • these values can then be obtained from Java methods generated by the library
  • after the APK is created, the key names are hashed with MD5 (for greater security, of course)
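The last step is easy to reproduce. A minimal Python sketch, assuming the key name is hashed as plain UTF-8 text with no salt (an assumption; the function name `obfuscate_key_name` is mine, not the library's):

```python
import hashlib


def obfuscate_key_name(name: str) -> str:
    """Return the MD5 hex digest of a key name (assumes unsalted UTF-8 input)."""
    return hashlib.md5(name.encode("utf-8")).hexdigest()


digest = obfuscate_key_name("password")
print(digest, len(digest))
```

An MD5 hex digest is always 32 characters long, which makes such obfuscated names easy to recognize among the other strings in a binary.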

Having found the dynamic library I need in the archive folder, I start picking at it. To begin with, as an experienced reverse engineer (not), I try something simple: I decide to look at the constants section and at interesting strings in the ELF binary. Unfortunately, Mac users don't get readelf out of the box, so before starting we say the magic words:

brew install binutils

And do not forget to add the path under /usr/local to PATH, because brew chivalrously protects you from everything...

greadelf -p .rodata lib/arm64-v8a/ | head -n 15

We limit the output to the first 15 lines, otherwise it may send an unprepared engineer into shock.

At the lower addresses we notice suspicious strings. As I found out by studying the library's sources, the keys and values are stored in an ordinary std::map. This tells us little, but we know that the binary contains obfuscated keys alongside the encrypted values.

How are the values encrypted? Studying the source, I found that encryption is done with AES, the standard symmetric cipher. So, if there are encrypted values, the key should be nearby... After some study I came across an issue in the same project with the provocative title “Insecure key storage: secrets are very easy to retreive”. From it I learned that the key is stored in the binary in the clear, and I also found the decryption procedure. In the example, the key was at address zero, and although I understood that the compiler could place it elsewhere in the binary's .rodata section, I decided that the suspicious string at address zero was the key.

Attempt #1: I start decrypting the values, assuming that this string is the encryption key. Error. OpenSSL hints that something is not right. After reading the library's sources a little more, I understand that if the user does not specify a key at build time, a default key is used.

Attempt #2: An error again. Hmm... Is that constant really being overridden? It would be easy to get this wrong: the Gradle code is tangled and its formatting is mangled. I check again. Everything seems right.

In place of the key names there are their MD5 hashes, so I try my luck and open a rainbow-table service. Voila: one of the keys is the word "password". The second one is not found. That does not give us much, of course. The two keys are at addresses 240 and 2a2, respectively. They are easy to recognize at a glance: 32 characters each (MD5).
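Both steps, spotting 32-character hex strings in the section and trying to reverse them, can be sketched in Python. This is an illustration under assumptions: the section bytes below are fabricated, the strings are taken to be NUL-separated, only lowercase hex is matched, and a plain word list stands in for the online rainbow-table service.

```python
import hashlib
import re
from typing import Iterable, Optional

MD5_HEX = re.compile(rb"[0-9a-f]{32}")


def find_md5_like_strings(section: bytes) -> list[tuple[int, str]]:
    """Return (offset, text) for each NUL-separated string that is exactly 32 hex chars."""
    found = []
    offset = 0
    for chunk in section.split(b"\x00"):
        if MD5_HEX.fullmatch(chunk):
            found.append((offset, chunk.decode("ascii")))
        offset += len(chunk) + 1  # +1 for the NUL separator
    return found


def crack_with_wordlist(md5_hex: str, words: Iterable[str]) -> Optional[str]:
    """Return the first word whose MD5 hex digest equals md5_hex, or None."""
    for word in words:
        if hashlib.md5(word.encode("utf-8")).hexdigest() == md5_hex:
            return word
    return None


# Fabricated stand-in for a .rodata dump, not the data from the task
fake_section = b"junk\x00" + hashlib.md5(b"password").hexdigest().encode() + b"\x00more\x00"
hits = find_md5_like_strings(fake_section)
recovered = [crack_with_wordlist(text, ["admin", "password"]) for _, text in hits]
print(hits, recovered)
```

A word list only recovers names that someone has already guessed, which matches what happened here: "password" was found, the second key was not.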

I checked everything again and tried decrypting with each of the other strings (those at the lower addresses) as the key. All in vain.
So there is some other secret key, while the procedure itself seems correct. I put this task aside for now so as not to bury myself in it.

Having rummaged a bit in the container-signing algorithm, I see more calls to the library, as well as code that uses the cryptographic functions of the Java standard library.

A riddle (which I never solved)

In the function responsible for encryption, there is a check of the container's keys at the very beginning.

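public byte[] a(java/util/Map p1) {
		v0 = p1.size()
		v1 = 0x0;
		if (v0 != 0) goto :cond_0
		p1 = new byte[v1];	// empty container: empty signature
		return p1;
	:cond_0
		v0 = "user";
		v0 = p1.containsKey(v0)
		if (v0 == 0) goto :cond_1
		p1 = new byte[v1];	// "user" key present: empty signature
		return p1;
	:cond_1
		// ... the actual signing code follows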

Literally: if the "user" key is present, the container is not signed (an empty signature is returned). A strange feeling: the problem seems solved, but somehow suspiciously easily. Then why invent everything else? To lead me astray? And why hadn't I skimmed this code earlier? Hmm...
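The same two early exits, restated as a Python sketch of my reading of the pseudo-code above. The name `guard_result` is hypothetical, and `None` here simply means "execution would continue to the real signing code", which is not reproduced.

```python
from typing import Mapping, Optional


def guard_result(container: Mapping[str, str]) -> Optional[bytes]:
    """Mirror the two early exits; None means the real signing code would run next."""
    if len(container) == 0:
        return b""  # empty container: empty signature
    if "user" in container:
        return b""  # "user" key present: empty signature
    return None


task_input = {
    "user": "LeetD3vM4st3R",
    "password": "__s33cr$$tV4lu3__",
    "hash": "34765983265937875692356935636464",
}
print(guard_result(task_input))
```

For the data set from the task the guard fires immediately, because the container has a "user" key.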

No, that is not the answer. I checked it with a certain user in the blue messenger whose contact I had been given along with the assignment. I keep digging. Perhaps the input key/value set somehow changes as it is added to the container? I read the code carefully.

Note that the decompiler removed the annotations from the smali code. What if it removed something important? I check the main files: nothing significant, it seems. Everything important is in place and the meaning is not lost. I check the callback functions responsible for writing a key/value pair from the text fields into the internal containers. I find nothing suspicious.

I became as skeptical as possible about every line of code - I can no longer trust anyone.

Simple solution #2: I noticed that the signing procedure begins by checking whether a certain value occurs (as a substring) in the signature of the certificate with which the application was signed.

@OnClick // signature generation
protected void huvot324yo873yvo837yvo() {
	String signature = "no data";
	boolean result = some_packages.isKeyInSignature(this);
	if (result) {
		Map map = new HashMap();

The value itself, of course, lies encrypted in that ill-fated binary. And if this value is not in the signature, the algorithm will not sign anything but will simply return the string “no data” as the signature... So we take up Cipher again...

Key decryption: the final fight

To understand the scale of the tragedy, I went to the following lengths:

I made a hex dump of this section and looked at the first two lines, which had aroused my suspicion from the very beginning.

If you look closely, the character separating the strings here is 0x00. It is also what the standard C library uses as the terminator in its string functions. That makes it all the more interesting: what is that space character doing in the middle of the first string? Next come frantic attempts in which the key is:

  • the whole first string
  • the first string up to the space
  • the first string from the space to the end
  • ...
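Enumerating such candidates by hand is tedious, so here is a Python sketch of the idea. It is illustrative only: the sample bytes are made up, the part "from the space to the end" is taken without the space itself (an assumption), and the remaining NUL-separated strings are appended as further candidates.

```python
def candidate_keys(section: bytes) -> list[bytes]:
    """List key candidates: first string, its halves around the space, then other strings."""
    strings = [s for s in section.split(b"\x00") if s]
    if not strings:
        return []
    first = strings[0]
    candidates = [first]
    if b" " in first:
        before, _, after = first.partition(b" ")
        candidates.extend([before, after])
    candidates.extend(strings[1:])
    # Drop empties and duplicates while keeping the original order
    unique: list[bytes] = []
    for candidate in candidates:
        if candidate and candidate not in unique:
            unique.append(candidate)
    return unique


# Made-up section contents, not the bytes from the task's binary
keys = candidate_keys(b"abc def\x00second\x00")
print(keys)
```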

You can already gauge the degree of paranoia. When you do not know how hard and cunning the task is supposed to be, you start to go off the rails. And still, no luck. Then a thought occurred to me: “Does the procedure from the issue work correctly on my machine?” The sequence of actions there is logical and raised no questions, but do the commands on my machine actually do what is expected of them? So what do you think?

Checking every step manually, I found that

echo "some_base64_input" | openssl base64 -d 

suddenly returns an empty string for some inputs. Hmm.

After replacing it with the first base64 decoder I found on the machine and going through the main candidates, a suitable key turned up immediately, and the values were decrypted accordingly.
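The article does not say why `openssl base64 -d` came back empty. One commonly reported cause, offered here only as an assumption, is that without the `-A` flag it expects line-wrapped input and silently yields nothing for a long single-line string. A decoder that fails loudly instead avoids this kind of trap; a minimal Python sketch (the function name is mine):

```python
import base64
import binascii


def decode_base64_strict(text: str) -> bytes:
    """Decode base64 regardless of line wrapping; raise instead of returning nothing."""
    compact = "".join(text.split())  # remove newlines and other whitespace
    try:
        return base64.b64decode(compact, validate=True)
    except binascii.Error as error:
        raise ValueError(f"not valid base64: {error}") from error


# A long single-line input (136 characters) of the kind that can trip up line-based decoders
long_input = base64.b64encode(bytes(range(100))).decode("ascii")
decoded = decode_base64_strict(long_input)
print(len(long_input), len(decoded))
```

Whatever the real cause was, cross-checking each step of a pipeline with a second tool is what finally exposed the problem here.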

Retrieving Signatures from a Certificate

class a {
public static boolean isKeyInSignature(android.content.Context p1) {
	v0 = 0x0;
	try TRY_0{
		v1 = p1.getPackageManager()
		p0 = p1.getPackageName()
		v2 = 0x40; // GET_SIGNATURES
		PackageInfo p0 = v1.getPackageInfo(p0, v2)
		Signature[] p0 = p0.signatures;
		// Order is not guaranteed
		v1 = p0.length;
		v2 = 0x0;
	:goto_0
		if (v2 >= v1) goto :cond_1
		v3 = p0[v2];
		String v3 = v3.toCharsString()
		String v4 =
		v3 = v3.contains(v4)
	catch TRY_0 (android/content/pm/PackageManager$NameNotFoundException) goto :catch_0;
		if (v3 == 0) goto :cond_0
		p1 = 0x1;
		return p1;
	:cond_0
		v2 = v2 + 0x1;
		goto :goto_0
	:catch_0
		p0 = Thrown Exception
	:cond_1
		return v0;

This is what the generated pseudocode looks like after my minor edits. A few things bother me:

  • my poor knowledge of cryptography and of the inner workings of certificates
  • according to the documentation, this method does not guarantee the order of certificates in the returned collection, so the loop may not visit them in the same order each time. What if the application were signed with more than one certificate?
  • my lack of knowledge about how to extract the certificate from the APK, given that it is unclear what the Android Runtime does in this case

I had to dig into all of these questions, and the result was as follows:

  • the certificate itself is in original/META-INF/CERT.RSA

    this directory contains only one file with this extension, which means the application is signed with just one certificate
  • on a site about reverse engineering Android applications I found a listing that extracts the signature we need the way Android does. According to its author, at least.

Running this code, I obtain the signature, and indeed the key we need is a substring of it. Moving on. Simple solution #2 is ruled out.

So the key really is in the certificate. It only remains to understand what comes next, because if the "user" key is present we still get an empty signature, and as we learned above, that is the wrong answer.

Write the documentation carefully!

Further investigation into whether the data entered in the text fields gets modified is dropped for lack of evidence. Paranoia comes rolling back with renewed vigor: maybe the code that pulled the signature from the certificate is incorrect, or is an implementation for old Android releases? I open the documentation again and see the following:

Attention: the function encodes the signature as ASCII text. The output I obtained above was a hex representation of the data. This API seemed strange to me, but if the documentation is to be believed, I have been misled again and the encrypted key is not a substring of the signature. After sitting thoughtfully over the code for a while, I could not stand it and opened the source code of this class.

The answer was not long in coming. Right there in the code, a picture worth framing: the output format is an ordinary hex string. Now you decide: either I do not understand something, or the documentation is written “slightly” incorrectly. After cursing at nobody in particular, I set to work again.
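To make the check concrete, here is a Python sketch of what the loop in isKeyInSignature appears to do once the hex encoding is taken into account. It rests on assumptions: the hex is taken to be lowercase, two characters per byte, and the signature bytes and key below are invented for the example.

```python
from typing import Iterable


def signature_to_chars_string(signature: bytes) -> str:
    """Hex-encode raw signature bytes (assumed lowercase, two characters per byte)."""
    return signature.hex()


def is_key_in_signatures(signatures: Iterable[bytes], key: str) -> bool:
    """True if the key occurs as a substring of any hex-encoded signature."""
    return any(key in signature_to_chars_string(sig) for sig in signatures)


# Invented signature bytes and key, not the values from the task
signatures = [bytes.fromhex("3082deadbeef")]
print(is_key_in_signatures(signatures, "deadbe"))
```

Because the result is a simple "any of them matches", it does not depend on the order in which the signatures are returned, which takes some of the sting out of the ordering concern raised earlier.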


The next n hours went into:

  • checking that the code works correctly with RecyclerView and verifying its behavior from the source code, since, once again, not everything is covered in detail in the docs or even on Stackoverflow
  • manually decompiling the code fragment responsible for signing the collection into compilable Java. I assumed that I had still missed something and that the first key in the container ("user") was implicitly dropped from the collection, so I decided to run the rest of the data through the code.

In the end, this code refused to sign even the remaining arguments (further on, in the part that works with cryptography, these arguments implicitly caused an exception to be thrown).

No. It turned out that this input cannot be signed. Unfortunately, I will not be able to submit this work and find out whether that is really so. A pity. It occupied my thoughts for a while, but I reassured myself that I had done everything I could.

In fact, I spent a lot of time on this task and, along the way, on filling gaps in my knowledge. It was really useful. You can trace the whole path and see how at first I latched onto parts that had nothing to do with the solution. Perhaps this will help someone understand how beginners solve problems of this kind, because we usually read “success stories” in which every step is logical and consistent and leads to the right result.

If anyone wants to dig a little deeper into this task or ask a question, write to me in the blue messenger: arturbrsg.

Stay tuned.
