6 ways to hide data in an Android application
Hi, dear reader. I’ve been studying mobile apps for quite a while now. Most applications make no attempt to hide their “secret” functionality from me, and that makes me happy, because I do not have to study someone's obfuscated code.
In this article, I would like to share my view of obfuscation and describe an interesting method of hiding business logic in applications that use the NDK, which I found relatively recently. So if you are interested in live examples of obfuscated code in Android, read on.
In this article, obfuscation means transforming the executable code of an Android application into a form that is hard to analyze. There are several reasons to make code hard to analyze:
- No business wants anyone poking around in its "insides".
- Even in a trivial application you can always find something interesting (see the Instagram example).
Many developers solve the problem by simply dropping in a ProGuard config. This is not the best way to protect data (if this is the first time you have heard of ProGuard, see the wiki).
I want to give a good example of why the supposed “protection” with ProGuard does not work. Take any simple example from Google Samples.
After enabling ProGuard with the standard config, we get the following decompiled code:
“Oooh, nothing is readable,” we say, and relax. But after a couple of minutes of switching between files, we find pieces of code like this:
In this example the application does fairly involved work (logging data, setting up video capture), so some of the methods used in the original code are still easy to recognize after ProGuard has processed it.
Furthermore, take a look at data classes in Kotlin. By default, a data class gets a generated “toString” method that contains the names of its properties and the name of the class itself.
Source data class:
It can turn into a tasty morsel for a reverse engineer:
(the toString method auto-generated by Kotlin)
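To make the leak concrete, here is a minimal Java sketch of roughly what a renamed data class looks like after decompilation. It is not real compiler output: the class `a` and the original names `User`, `name` and `token` are hypothetical and chosen only for illustration. The point is that a renamer rewrites identifiers but leaves the string constants inside the generated toString untouched.

```java
// Hypothetical result of renaming a Kotlin data class
//   data class User(val name: String, val token: String)
// The identifiers are now meaningless, but the string constants are not.
final class a {
    private final String a;
    private final String b;

    a(String a, String b) {
        this.a = a;
        this.b = b;
    }

    // Mirrors the shape of the auto-generated toString: the literals
    // still spell out the original class and property names.
    @Override
    public String toString() {
        return "User(name=" + a + ", token=" + b + ")";
    }
}

final class DataClassLeakDemo {
    public static void main(String[] args) {
        System.out.println(new a("alice", "s3cr3t")); // prints User(name=alice, token=s3cr3t)
    }
}
```

Searching the decompiled code for string literals that contain "(" and "=" is usually enough to find such classes and give them their names back.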
So ProGuard hides far from all of the project's source code.
If I still have not convinced you that protecting code this way is inadequate, let's try leaving the “.source” attribute in our project.
-keepattributes SourceFile
This line appears in many open-source projects. It keeps stack traces readable when the application crashes. However, by pulling the “.source” directives out of the smali code, we recover the original file names, and with them the project hierarchy and the real names of the classes.
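A minimal sketch of how little effort this takes, assuming the smali text has already been produced by a disassembler. The class name `SmaliSourceScanner` and the sample input are made up for the example; only the `.source "..."` directive itself is real smali syntax.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class SmaliSourceScanner {
    // Matches a smali directive such as:  .source "LoginPresenter.kt"
    private static final Pattern SOURCE =
            Pattern.compile("^\\.source\\s+\"([^\"]+)\"", Pattern.MULTILINE);

    /** Returns every file name found in ".source" directives of the given smali text. */
    static List<String> sourceFiles(String smali) {
        List<String> names = new ArrayList<>();
        Matcher m = SOURCE.matcher(smali);
        while (m.find()) {
            names.add(m.group(1));
        }
        return names;
    }

    public static void main(String[] args) {
        // Hypothetical smali header of a renamed class.
        String smali = String.join("\n",
                ".class public final La/b/c;",
                ".super Ljava/lang/Object;",
                ".source \"LoginPresenter.kt\"");
        System.out.println(sourceFiles(smali)); // prints [LoginPresenter.kt]
    }
}
```

The renamed class `a.b.c` immediately gets its original name back from the file name.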
By definition, obfuscation is “transforming source code into an unreadable form in order to resist various kinds of analysis”. However, ProGuard (when used with a standard config) does not make the code unreadable - it works as a minifier, shortening names and throwing unused classes out of the project.
Using ProGuard this way is an easy “maybe it will work” solution, but it is not good obfuscation. A good developer needs to make the researcher (or attacker) recoil at “Chinese characters” that are hard to deobfuscate.
If you are interested in learning more about ProGuard, I suggest the following informative article.
What are we hiding
Now let's see what is usually hidden in applications.
- Encryption keys:
- The specific logic of the application:
Something more unexpected is often hidden in the code as well (observations from personal experience), for example:
- Project developer names
- Full path to the project
- “client_secret” for the OAuth2 protocol
- A PDF book "How to develop for Android" (probably to keep it always at hand)
Now we know what can be hidden in Android applications and can move on to the main point: how to hide this data.
Ways to hide data
Option 1: Do not hide anything, leave everything in sight
In that case, I'll just show you this picture :)
“Help Dasha find business logic.”
This is a zero-effort and completely free solution suitable for:
- Simple applications that do not interact with the network and do not store sensitive user information;
- Applications that use only public APIs.
Option 2: Use ProGuard with the right settings
This solution still has a right to exist, first of all because it is simple and free. Despite the drawbacks mentioned above, it has a significant advantage: if the ProGuard rules are configured properly, the application really can end up obfuscated.
However, you need to understand that this solution requires the developer to decompile the application after each build and check that everything is in order. After spending a few minutes studying the APK file, the developer (and their company) can be more confident in the safety of the product.
Checking an application for obfuscation is quite simple.
There are several ways to get the APK file of a project:
- take it from the project directory (in Android Studio the folder is usually named “build”);
- install the application on a smartphone and pull the APK out with the “Apk Extractor” application.
After that, use the Apktool utility to get the Smali code (instructions are here: https://ibotpeaches.github.io/Apktool/documentation) and try to find anything suspiciously readable in the project's strings. By the way, you can prepare ready-made bash commands in advance for finding readable code.
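The same check can be scripted. Below is a minimal sketch that pulls string literals out of smali text and keeps the ones that look readable. The class name, the sample input and the "readable" heuristic are all assumptions for illustration; a real check would walk every .smali file that Apktool produced and would need a heuristic tuned to the project.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class SmaliStringScanner {
    // const-string v0, "text"   or   const-string/jumbo v0, "text"
    private static final Pattern CONST_STRING = Pattern.compile(
            "const-string(?:/jumbo)?\\s+[vp]\\d+,\\s*\"((?:[^\"\\\\]|\\\\.)*)\"");

    // Crude heuristic (an assumption): at least four characters,
    // starting with a letter, made of plain ASCII identifier-like symbols.
    private static final Pattern READABLE =
            Pattern.compile("[A-Za-z][A-Za-z0-9_ .:/=-]{3,}");

    /** Returns the string literals in the smali text that pass the heuristic. */
    static List<String> readableLiterals(String smali) {
        List<String> hits = new ArrayList<>();
        Matcher m = CONST_STRING.matcher(smali);
        while (m.find()) {
            String literal = m.group(1);
            if (READABLE.matcher(literal).matches()) {
                hits.add(literal);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        String smali = String.join("\n",
                "const-string v0, \"client_secret=\"",
                "const-string v1, \"\\u4e2d\\u6587\"",
                "const-string/jumbo v2, \"ok\"");
        System.out.println(readableLiterals(smali)); // prints [client_secret=]
    }
}
```

Any hit in the output is a candidate for a closer look before the build is released.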
This solution is suitable for:
- Games, online store applications, etc.;
- Applications that really are thin clients, where all data comes exclusively from the server side;
- Applications that do not advertise themselves on every banner as "Secure application number 1".
Option 3: Use an open-source obfuscator
Unfortunately, I don’t know of any really good free obfuscators for mobile applications. The obfuscators you can find online may give you a lot of headaches, because it becomes too difficult to build the project for new API versions.
Historically, the really good existing obfuscators target machine code (C / C++). Good examples:
- Obfuscator-LLVM, https://github.com/obfuscator-llvm/obfuscator
- Movfuscator, https://github.com/xoreaxeaxeax/movfuscator
For example, Movfuscator replaces all instructions with mov and makes the code linear by removing all branches. However, using this kind of obfuscation in a production project is strongly discouraged, because the code risks becoming very slow and heavy.
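The idea of removing branches can be illustrated at the source level. This is only a sketch in the spirit of the technique, not what Movfuscator actually emits: it replaces a conditional with an index computation and a table load. The class and method names are hypothetical.

```java
final class BranchFreeDemo {
    /** Ordinary version: one conditional branch. */
    static int maxWithBranch(int a, int b) {
        return a < b ? b : a;
    }

    /** Same result without a conditional: compute an index, then load from a table. */
    static int maxBranchFree(int a, int b) {
        int[] table = {a, b};
        // Sign bit of (a - b), computed in 64 bits to avoid overflow:
        // 1 when a < b, otherwise 0.
        int index = (int) (((long) a - (long) b) >>> 63);
        return table[index];
    }

    public static void main(String[] args) {
        System.out.println(maxBranchFree(3, 7));   // prints 7
        System.out.println(maxBranchFree(-2, -9)); // prints -2
    }
}
```

Even this tiny function trades one comparison for an allocation, a subtraction, a shift and a load, which hints at why whole programs rewritten this way become slow.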
This solution is suitable for applications where most of the code is native (NDK).
Option 4: Use a proprietary solution
This is the most sensible choice for serious applications, because proprietary software:
a) is supported;
b) will always be up to date.
An example of obfuscated code when using such solutions:
In this code snippet you can see:
- Variable names that are as incomprehensible as possible (including Russian letters);
- Chinese characters in the strings, which give no hint of what is really happening in the project;
- A lot of traps added to the project (“switch”, “goto”), which greatly change the control flow of the application.
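Unreadable strings like these usually come from string encryption: literals are stored in transformed form and decoded only at run time. Below is a deliberately naive sketch of the idea using a single-byte XOR. The key, class name and scheme are assumptions for illustration and are not taken from any particular product; real tools use much stronger and less uniform schemes.

```java
import java.nio.charset.StandardCharsets;

final class StringHider {
    // Hypothetical one-byte key; real tools use far stronger schemes.
    private static final byte KEY = 0x5A;

    /** XORs every byte with KEY; applying it twice restores the input. */
    static byte[] xor(byte[] data) {
        byte[] out = new byte[data.length];
        for (int i = 0; i < data.length; i++) {
            out[i] = (byte) (data[i] ^ KEY);
        }
        return out;
    }

    /** Build step: turn a readable literal into the bytes that get shipped. */
    static byte[] hide(String plain) {
        return xor(plain.getBytes(StandardCharsets.UTF_8));
    }

    /** Runtime step: recover the literal just before it is used. */
    static String reveal(byte[] hidden) {
        return new String(xor(hidden), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] shipped = hide("client_secret");
        System.out.println(reveal(shipped)); // prints client_secret
    }
}
```

A scan for readable literals no longer finds "client_secret" in the shipped bytes, although anyone who locates the decoding routine can still recover it.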
This solution is suitable for:
- Banks;
- Insurance companies;
- Mobile operators, applications for storing passwords, etc.
Option 5: Use React Native
I decided to single out this option because writing cross-platform applications has become really popular.
Besides a very large community, JS has a very large number of open obfuscators. For example, they can turn your application into emoticons:
I would really like to recommend this solution, but then your project would run barely faster than a turtle.
However, by relaxing the obfuscation requirements we can still build a really well-protected project. So we google “js obfuscator” and obfuscate our output bundle file.
This solution is suitable for those who are ready to write a cross-platform application in React Native.
It would be very interesting to hear about obfuscators for Xamarin. If you have experience using them, please share it in the comments.
Option 6: Use NDK
I have often had to use the NDK in my own code. I also know that some developers believe that using the NDK saves their application from reverse engineers. This is not quite true. First you need to understand exactly how hiding code with the NDK works.
It turns out to be very simple. JNI has a naming convention: when C / C++ code is called from the project, the call is resolved as follows.
Native class NativeSummator:
Implementation of the native sum method:
Implementation of the native static sum method:
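It becomes clear that to call a native method, the runtime searches the dynamic library for a function named Java_<package name>_<class>_<method>. A static method uses the same name; only the second parameter of the native function differs (jclass instead of jobject).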
If you look at the Dalvik / ART code, you will find the following lines:
(source)
First, the runtime builds the string Java_<package name>_<class>_<method> from the Java method, and then it tries to resolve the method in the dynamic library with a “dlsym” call, which looks for the function we need in the native library.
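The naming rule can be sketched in a few lines. This is an illustration of the JNI name-mangling scheme for non-overloaded methods, not the runtime's own code; the package `com.example` is an assumption, since only the class name NativeSummator and the method name sum come from the example above.

```java
final class JniNames {
    /** Escapes one name according to the JNI name-mangling rules. */
    static String mangle(String name) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < name.length(); i++) {
            char c = name.charAt(i);
            if (c == '.' || c == '/') {
                sb.append('_');                               // package separator
            } else if (c == '_') {
                sb.append("_1");
            } else if (c == ';') {
                sb.append("_2");
            } else if (c == '[') {
                sb.append("_3");
            } else if ((c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || (c >= '0' && c <= '9')) {
                sb.append(c);
            } else {
                sb.append(String.format("_0%04x", (int) c));  // any other character
            }
        }
        return sb.toString();
    }

    /** Symbol name the runtime looks up for a non-overloaded native method. */
    static String symbolFor(String className, String methodName) {
        return "Java_" + mangle(className) + "_" + mangle(methodName);
    }

    public static void main(String[] args) {
        System.out.println(symbolFor("com.example.NativeSummator", "sum"));
        // prints Java_com_example_NativeSummator_sum
    }
}
```

Because the rule is fixed and public, every exported symbol of this form maps straight back to a Java class and method.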
This is how JNI works. Its main problem is that by decompiling the dynamic library we see all of the methods at a glance:
So we need a solution in which the address of the function is obfuscated.
At first, I tried to write data directly into the JNI table, but I realized that ASLR and the differences between Android versions simply would not let me make this work on all devices. Then I decided to find out what methods the NDK offers developers.
And, lo and behold, there is a method called “RegisterNatives” that does exactly what we need (it calls the internal function dvmRegisterJNIMethod).
We define an array describing our native method:
Then we register the declared method in the JNI_OnLoad function (it is called once the dynamic library has been loaded):
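Each entry passed to RegisterNatives pairs the Java method name with its JNI type signature and a function pointer. The signature string is easy to get wrong by hand, so here is a minimal sketch that derives it with reflection. The parameter types of `hideFunc` are an assumption made for the demonstration, and the class name `JniSignatures` is hypothetical.

```java
import java.lang.reflect.Method;

final class JniSignatures {
    /** JNI type descriptor for a single Java type. */
    static String descriptor(Class<?> type) {
        if (type == void.class)    return "V";
        if (type == boolean.class) return "Z";
        if (type == byte.class)    return "B";
        if (type == char.class)    return "C";
        if (type == short.class)   return "S";
        if (type == int.class)     return "I";
        if (type == long.class)    return "J";
        if (type == float.class)   return "F";
        if (type == double.class)  return "D";
        if (type.isArray()) {
            return "[" + descriptor(type.getComponentType());
        }
        return "L" + type.getName().replace('.', '/') + ";";
    }

    /** Signature string in the form "(parameters)return". */
    static String signature(Method method) {
        StringBuilder sb = new StringBuilder("(");
        for (Class<?> p : method.getParameterTypes()) {
            sb.append(descriptor(p));
        }
        return sb.append(')').append(descriptor(method.getReturnType())).toString();
    }

    // Hypothetical Java-side declaration used only for the demonstration.
    static int hideFunc(int a, int b) { return a + b; }

    public static void main(String[] args) throws Exception {
        Method m = JniSignatures.class.getDeclaredMethod("hideFunc", int.class, int.class);
        System.out.println(signature(m)); // prints (II)I
    }
}
```

With registration done this way, the native function can have any name in the library, because the runtime no longer resolves it by the Java_... symbol.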
Hurray, we have hidden the “hideFunc” function ourselves. Now we apply our favorite llvm-obfuscator and enjoy secure code in its final form.
This solution is suitable for applications that already use the NDK (adding the NDK to a project brings many difficulties, so for non-NDK applications this solution is less relevant).
Conclusion
Ideally, an application should not store any sensitive data at all, or that data should become available only after user authentication. However, business logic sometimes forces developers to store tokens, keys and specific pieces of code logic inside the application. I hope this article helps you if you do not want to share such sensitive data and be an “open book” for reverse engineers.
I consider obfuscation to be an important structural part of any modern application.
Be thoughtful about hiding code and don’t look for easy ways! :)
By the way, thanks to the user miproblema for help with some issues. Subscribe to her Telegram channel, it's interesting there.
Many thanks also to the users sverkunchik and SCaptainCAP for their help in editing the article.