MadHacker November 5, 2013 at 14:26

Hashing strings at compile time with an annotation

Recently I started developing an Android application and needed to protect it from reverse engineering. A quick Google search suggested that ProGuard, which ships with Android Studio, would do the job. The result suited me except for one small detail: strings.
The application exchanges information with its service using Intents, and the key part of an Intent is the action string. When interacting with the system or other applications this string must follow a specific format, but for exchanges within the application it only has to be unique. For convenience, it is recommended to build it from the package name and the action name. For example:

public final class HandlerConst {
    public static final String ACTION_LOGIN = "com.example.app.ACTION_LOGIN";
}

This is convenient for debugging, but it greatly reduces the quality of obfuscation. In the release build I would like to see, for example, the MD5 hash of this string instead:

public final class HandlerConst {
    public static final String ACTION_LOGIN = "7f315954193d1fd99b017081ef8acdc3";
}

Below the cut I describe how to achieve this by reinventing the wheel with whatever was at hand.

A lyrical digression

I was very surprised to learn that ProGuard does not process strings. According to the documentation on the official website, only the advanced paid version can work with strings, and even then it merely encrypts them so that they can be decrypted back to the original while the program is running. I could not find a ready-made way to replace a string with its MD5 value.
Searching for a solution led me to an article demonstrating the wonders of optimizing C++ compilers: "Computing CRC32 of strings at compile time". In Java, however, a similar approach did not take off: ProGuard collapsed the methods quite well, but stumbled on getting a byte array from a string.
After that I decided not to waste time on automation and simply solved the problem by hand:

public final class HandlerConst {
    public static final String ACTION_LOGIN;
    static {
        if (BuildConfig.DEBUG) ACTION_LOGIN = "com.example.app.ACTION_LOGIN";
        else ACTION_LOGIN = "7f315954193d1fd99b017081ef8acdc3";
    }
}

But when I saw the article "Custom Annotation Preprocessor: creating an Android-based application and configuration in IntelliJ IDEA" on Habr, I realized that this was the solution to my problem.

Implementing the annotation

By tradition, studying annotations began with a lack of the necessary information in Russian. Most articles discuss runtime annotations. However, a suitable article was found on Habr: "Counting the execution time of a method through annotation".
To create a compile-time annotation, we need to:

Describe the annotation;
Implement a subclass of AbstractProcessor that will process our annotation;
Tell the compiler where to find our processor.

An annotation description might look like this:

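package com.example.annotation;

import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

@Target({ElementType.FIELD})
@Retention(RetentionPolicy.SOURCE)
public @interface Hashed {
    String method() default "MD5";
}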

Target defines the elements to which the annotation can be applied. In this case, the annotation can be applied to field declarations in a class. Unfortunately, that means fields of any type, but more on that later.
Retention is the annotation's lifetime. We indicate that it exists only in the source code.
In the annotation itself we declare an element that defines the hashing method. The default is MD5.
This is enough to use the annotation in code, but it will do nothing until we write an annotation processor.

An annotation processor inherits from javax.annotation.processing.AbstractProcessor. A minimal processor class looks like this:

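package com.example.annotation;

import java.util.Set;
import javax.annotation.processing.AbstractProcessor;
import javax.annotation.processing.RoundEnvironment;
import javax.annotation.processing.SupportedAnnotationTypes;
import javax.annotation.processing.SupportedSourceVersion;
import javax.lang.model.SourceVersion;
import javax.lang.model.element.TypeElement;

@SupportedAnnotationTypes(value = {"com.example.annotation.Hashed"})
@SupportedSourceVersion(SourceVersion.RELEASE_7)
public class HashedAnnotationProcessor extends AbstractProcessor {
    @Override
    public boolean process(Set<? extends TypeElement> annotations, RoundEnvironment roundEnv) {
        return false;
    }
}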

SupportedAnnotationTypes lists the names of the annotation classes that our processor will handle.
SupportedSourceVersion is the supported source version. The point is to make sure the processor does not break on language constructs that appeared in newer versions of the language.
Instead of these annotations, you can override the getSupportedAnnotationTypes and getSupportedSourceVersion methods.
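As a minimal sketch of that alternative, the processor below declares the same information through method overrides. The class name HashedProcessorSkeleton is hypothetical, and returning SourceVersion.latestSupported() instead of a fixed release is an assumption of this sketch, not something the article prescribes.

```java
import java.util.Collections;
import java.util.Set;
import javax.annotation.processing.AbstractProcessor;
import javax.annotation.processing.RoundEnvironment;
import javax.lang.model.SourceVersion;
import javax.lang.model.element.TypeElement;

// Hypothetical skeleton: same declarations as the annotated version, done with overrides.
public class HashedProcessorSkeleton extends AbstractProcessor {

    @Override
    public Set<String> getSupportedAnnotationTypes() {
        // Fully qualified name of the annotation this processor wants to receive.
        return Collections.singleton("com.example.annotation.Hashed");
    }

    @Override
    public SourceVersion getSupportedSourceVersion() {
        // Assumption of this sketch: accept whatever the running compiler supports.
        return SourceVersion.latestSupported();
    }

    @Override
    public boolean process(Set<? extends TypeElement> annotations, RoundEnvironment roundEnv) {
        // Does nothing yet and does not claim the annotation.
        return false;
    }
}
```

The overrides are handy when the supported set has to be computed at run time; for a fixed list the annotations are shorter.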
The process method receives the set of supported annotations that have not yet been processed and an object for interacting with the compiler. If the method returns false, the compiler passes the annotation on to the next processor that supports this annotation type. If the method returns true, the annotation is considered processed and goes no further. Keep this in mind so as not to accidentally swallow other people's annotations.
If any processor changes or adds source files while it runs, the compiler goes on to another pass.

RoundEnvironment alone is not enough for us to change the source code, so we override the init method and obtain a JavacProcessingEnvironment in it. This class gives access to the source code, to the mechanism for reporting warnings and compilation errors, and to much more. From it we also get a TreeMaker, a helper for modifying the source tree.

    private JavacProcessingEnvironment javacProcessingEnv;
    private TreeMaker maker;
    @Override
    public void init(ProcessingEnvironment procEnv) {
        super.init(procEnv);
        this.javacProcessingEnv = (JavacProcessingEnvironment) procEnv;
        this.maker = TreeMaker.instance(javacProcessingEnv.getContext());
    }

Now all that remains is to iterate over the annotated fields and replace the values of the string constants. The code is given in abbreviated form; a link to GitHub is at the end of the article.

    @Override
    public boolean process(Set<? extends TypeElement> annotations, RoundEnvironment roundEnv) {
        if (annotations == null || annotations.isEmpty()) {
            return false;
        }
        for (TypeElement annotation : annotations)
        {
            // Select all elements that carry our annotation
            final Set<? extends Element> fields = roundEnv.getElementsAnnotatedWith(annotation);
            JavacElements utils = javacProcessingEnv.getElementUtils();
            for (final Element field : fields) {
                // Get the annotation; later we take the hashing method from it.
                Hashed hashed = field.getAnnotation(Hashed.class);
                // Convert the annotated element into a tree
                JCTree blockNode = utils.getTree(field);
                if (blockNode instanceof JCTree.JCVariableDecl) {
                    // Remember that the field is not necessarily a string.
                    JCTree.JCVariableDecl var = (JCTree.JCVariableDecl) blockNode;
                    // Get the initializer (the part after the = sign)
                    JCTree.JCExpression initializer = var.getInitializer();
                    // This check cuts off fields initialized in a constructor, as well as constructs like:
                    // "" + 1
                    // new String("new string")
                    if ((initializer != null) && (initializer instanceof JCTree.JCLiteral)) {
                        JCTree.JCLiteral lit = (JCTree.JCLiteral) initializer;
                        // Get the string
                        String value = lit.getValue().toString();
                        try {
                            MessageDigest md = MessageDigest.getInstance(hashed.method());
                            // Specify the encoding explicitly for consistent results across platforms.
                            md.update(value.getBytes("UTF-8"));
                            byte[] hash = md.digest();
                            StringBuilder str = new StringBuilder(hash.length * 2);
                            for (byte val : hash) {
                                str.append(String.format("%02X", val & 0xFF));
                            }
                            value = str.toString();
                            lit = maker.Literal(value);
                            var.init = lit;
                        } catch (NoSuchAlgorithmException e) {
                            // Compilation error: invalid hashing algorithm
                        } catch (UnsupportedEncodingException e) {
                            // Compilation error: is that even possible??
                        }
                    } else {
                        // Compilation error: the annotation is applied incorrectly.
                    }
                }
            }
        }
        // The abbreviated listing omitted the return value; true (annotation handled) is assumed here.
        return true;
    }

In the method we iterate over the list of annotations (remember that, in general, a processor handles more than one annotation), and for each annotation we select the annotated elements. Then the magic begins. We use the tools provided by com.sun.tools.javac to convert the elements into a source tree, which has a huge number of features and, by tradition, no Russian-language documentation at all. So please do not be surprised that the code working with this tree is far from ideal.
Once we have the variable declaration as a JCTree.JCVariableDecl var tree, we can make sure that it is a string variable. In my case, this check is done with a crutch:

if (!"String".equals(var.vartype.toString())){
    // Compilation error: the annotation applies only to strings.
    continue;
}

vartype is the type of the field. It can surely be compared with some constant or checked for belonging to a particular class, but, as I said, there is no documentation, and a quick check showed that converting it to a string gives the type name.
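One possible alternative to comparing type names as text is the standard javax.lang.model API: Element.asType() gives the field's type, and Types.isSameType can compare it with the type of java.lang.String. The sketch below is not the article's processor. StringFieldCheckSketch, Collector and stringFieldsOf are hypothetical names, and the in-memory compilation is only there to make the example runnable on a JDK.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Set;
import javax.annotation.processing.AbstractProcessor;
import javax.annotation.processing.RoundEnvironment;
import javax.annotation.processing.SupportedAnnotationTypes;
import javax.lang.model.SourceVersion;
import javax.lang.model.element.Element;
import javax.lang.model.element.ElementKind;
import javax.lang.model.element.TypeElement;
import javax.lang.model.type.TypeMirror;
import javax.tools.JavaCompiler;
import javax.tools.JavaFileObject;
import javax.tools.SimpleJavaFileObject;
import javax.tools.ToolProvider;

public class StringFieldCheckSketch {

    // Hypothetical demo processor: remembers the names of all fields whose type is String.
    @SupportedAnnotationTypes("*")
    static class Collector extends AbstractProcessor {
        final List<String> stringFields = new ArrayList<>();

        @Override
        public SourceVersion getSupportedSourceVersion() {
            return SourceVersion.latestSupported();
        }

        @Override
        public boolean process(Set<? extends TypeElement> annotations, RoundEnvironment roundEnv) {
            TypeMirror stringType = processingEnv.getElementUtils()
                    .getTypeElement("java.lang.String").asType();
            for (Element root : roundEnv.getRootElements()) {
                for (Element member : root.getEnclosedElements()) {
                    // Compare resolved types instead of the text written in the source.
                    if (member.getKind() == ElementKind.FIELD
                            && processingEnv.getTypeUtils().isSameType(member.asType(), stringType)) {
                        stringFields.add(member.getSimpleName().toString());
                    }
                }
            }
            return false;
        }
    }

    // Runs only annotation processing (-proc:only) on an in-memory source file.
    static List<String> stringFieldsOf(final String className, final String source) {
        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler(); // requires a JDK
        JavaFileObject file = new SimpleJavaFileObject(
                URI.create("string:///" + className + ".java"), JavaFileObject.Kind.SOURCE) {
            @Override
            public CharSequence getCharContent(boolean ignoreEncodingErrors) {
                return source;
            }
        };
        Collector collector = new Collector();
        JavaCompiler.CompilationTask task = compiler.getTask(
                null, null, null, Arrays.asList("-proc:only"), null,
                Collections.singletonList(file));
        task.setProcessors(Collections.singletonList(collector));
        task.call();
        return collector.stringFields;
    }
}
```

Because the comparison works on resolved types, a field declared as java.lang.String is recognized as well, which a check on the source text of the type would presumably miss.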

The second interesting point is that we can only process strings declared as in the example at the very beginning of the article. The reason is that at this stage we are working with the source text. If the variable is initialized in a constructor, then JCTree.JCExpression initializer = var.getInitializer(); returns null. An equally unpleasant situation arises if we try to process constructs such as:

public String demo1 = new String("habrahabr");
public String demo2 = "habra"+"habr";
public String demo3 = "" + 1;

To handle this, a second check is introduced: (initializer instanceof JCTree.JCLiteral). It cuts off all the examples above, since they are not pure literals and are represented in the tree by an expression of several elements.
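To see what the parser actually produces for different initializers, one can use the public Compiler Tree API (com.sun.source), where a literal initializer has kind STRING_LITERAL. This is a sketch under assumptions: it uses the public API rather than the internal JCTree classes from the article, the names InitializerKindSketch and initializerKinds are hypothetical, and it needs a JDK at run time.

```java
import com.sun.source.tree.CompilationUnitTree;
import com.sun.source.tree.Tree;
import com.sun.source.tree.VariableTree;
import com.sun.source.util.JavacTask;
import com.sun.source.util.TreeScanner;
import java.net.URI;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.tools.JavaCompiler;
import javax.tools.JavaFileObject;
import javax.tools.SimpleJavaFileObject;
import javax.tools.ToolProvider;

public class InitializerKindSketch {

    // Parses an in-memory source file and maps each variable name to the kind of
    // its initializer tree (null when the declaration has no initializer).
    static Map<String, Tree.Kind> initializerKinds(final String className, final String source) {
        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler(); // requires a JDK
        JavaFileObject file = new SimpleJavaFileObject(
                URI.create("string:///" + className + ".java"), JavaFileObject.Kind.SOURCE) {
            @Override
            public CharSequence getCharContent(boolean ignoreEncodingErrors) {
                return source;
            }
        };
        final Map<String, Tree.Kind> kinds = new LinkedHashMap<>();
        try {
            JavacTask task = (JavacTask) compiler.getTask(
                    null, null, null, null, null, Collections.singletonList(file));
            for (CompilationUnitTree unit : task.parse()) {
                new TreeScanner<Void, Void>() {
                    @Override
                    public Void visitVariable(VariableTree node, Void unused) {
                        Tree init = node.getInitializer();
                        kinds.put(node.getName().toString(), init == null ? null : init.getKind());
                        return super.visitVariable(node, unused);
                    }
                }.scan(unit, null);
            }
        } catch (Exception e) {
            throw new IllegalStateException("Parsing failed", e);
        }
        return kinds;
    }

    public static void main(String[] args) {
        String src = "class Demo {"
                + " String a = \"habr\";"
                + " String b = new String(\"habrahabr\");"
                + " String c = \"\" + 1;"
                + " String d; }";
        // Prints the initializer kind for each field in declaration order.
        System.out.println(initializerKinds("Demo", src));
    }
}
```

Only the first field has a literal initializer; the constructor call and the concatenation are different tree kinds, and the field without an initializer has no initializer tree at all.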
The rest of the code is obvious. We take the string, hash it, replace it, and rejoice? No.
The comments mark several places where obvious errors can occur, and in our case ignoring them is not correct behavior. To inform the user about an error, we need the javax.annotation.processing.Messager object. It lets you emit a warning, a compilation error, or just an informational message. For example, we can report an invalid hashing algorithm:

catch (NoSuchAlgorithmException e) {
    javacProcessingEnv.getMessager().printMessage(Diagnostic.Kind.ERROR,
        String.format("Unsupported digest method %s", hashed.method()),
        field);
}

Note that reporting an error does not interrupt the execution of the method. The compiler waits at least until our method returns before aborting the compilation. This lets us report all annotation usage errors to the user at once. The third argument of printMessage specifies the element on which we stumbled. It is optional, but it makes life much easier.
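The behavior described above can be observed with a small experiment: a processor that reports an error on every field still visits all of them, and the compilation fails only afterwards. This is a sketch, not the article's code. MessagerSketch, ErrorOnEveryField and run are hypothetical names, and the in-memory compilation assumes a JDK is available.

```java
import java.net.URI;
import java.util.Arrays;
import java.util.Collections;
import java.util.Set;
import javax.annotation.processing.AbstractProcessor;
import javax.annotation.processing.RoundEnvironment;
import javax.annotation.processing.SupportedAnnotationTypes;
import javax.lang.model.SourceVersion;
import javax.lang.model.element.Element;
import javax.lang.model.element.ElementKind;
import javax.lang.model.element.TypeElement;
import javax.tools.Diagnostic;
import javax.tools.DiagnosticCollector;
import javax.tools.JavaCompiler;
import javax.tools.JavaFileObject;
import javax.tools.SimpleJavaFileObject;
import javax.tools.ToolProvider;

public class MessagerSketch {

    // Hypothetical demo processor: reports an error on every field and keeps going.
    @SupportedAnnotationTypes("*")
    static class ErrorOnEveryField extends AbstractProcessor {
        int fieldsVisited = 0;

        @Override
        public SourceVersion getSupportedSourceVersion() {
            return SourceVersion.latestSupported();
        }

        @Override
        public boolean process(Set<? extends TypeElement> annotations, RoundEnvironment roundEnv) {
            for (Element root : roundEnv.getRootElements()) {
                for (Element member : root.getEnclosedElements()) {
                    if (member.getKind() != ElementKind.FIELD) {
                        continue;
                    }
                    // The third argument attaches the message to the offending element.
                    processingEnv.getMessager().printMessage(Diagnostic.Kind.ERROR,
                            "Demo error on field " + member.getSimpleName(), member);
                    fieldsVisited++; // still reached: printMessage does not throw
                }
            }
            return false;
        }
    }

    // Returns {fields visited, error diagnostics collected, 1 if the task succeeded else 0}.
    static int[] run(final String className, final String source) {
        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler(); // requires a JDK
        JavaFileObject file = new SimpleJavaFileObject(
                URI.create("string:///" + className + ".java"), JavaFileObject.Kind.SOURCE) {
            @Override
            public CharSequence getCharContent(boolean ignoreEncodingErrors) {
                return source;
            }
        };
        DiagnosticCollector<JavaFileObject> diagnostics = new DiagnosticCollector<>();
        ErrorOnEveryField processor = new ErrorOnEveryField();
        JavaCompiler.CompilationTask task = compiler.getTask(
                null, null, diagnostics, Arrays.asList("-proc:only"), null,
                Collections.singletonList(file));
        task.setProcessors(Collections.singletonList(processor));
        boolean ok = task.call();
        int errors = 0;
        for (Diagnostic<? extends JavaFileObject> d : diagnostics.getDiagnostics()) {
            if (d.getKind() == Diagnostic.Kind.ERROR) {
                errors++;
            }
        }
        return new int[] {processor.fieldsVisited, errors, ok ? 1 : 0};
    }
}
```

Calling run("Demo", "class Demo { String a; String b; }") visits both fields even though the first one already produced an error, and the task as a whole is reported as failed.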

Connecting the annotation processor

It remains to tell the compiler that we exist and are ready to accept annotations for processing. Many articles give instructions on how to add your processor to <development environment name>. Apparently this goes back to ancient times, when such things were done by craftsmen on their knees. However, annotation processing has long been part of javac, and our processor class is in fact a plug-in for javac. This means we can connect our annotation in any environment using quite standard means, without shamanic rituals in the settings.
We need to create a services subdirectory in the META-INF directory, and in it a file named javax.annotation.processing.Processor. The file must list our processor classes, in this case com.example.annotation.HashedAnnotationProcessor. And that's it. Now we build our library containing the annotation and its processor, add the library to the project, and it works.
Neither the library itself nor any trace of the annotations ends up in the compiled code.

Usage

The annotation is ready. Strings are hashed. But the problem is still not solved.
If we connect the annotation to the project in this form, our strings will always be hashed, and we need that only in the release build.
In Java, the notions of debug and release builds are quite arbitrary and depend on the user's views. So we will make sure that the assembleDebug task of the Android project does not hash the strings, while in all other cases the strings are replaced with their hashes.
To do this, we will pass an additional parameter to our annotation processor.
First, modify the processor:

@SupportedOptions({"Hashed"})
public class HashedAnnotationProcessor extends AbstractProcessor {
    // Name of the processor option, passed to javac as -AHashed=...
    private static final String ENABLE_OPTIONS_NAME = "Hashed";
    private boolean enable = true;
    @Override
    public void init(ProcessingEnvironment procEnv) {
        // ... the init code shown earlier (super.init, javacProcessingEnv, maker) ...
        // Added code
        java.util.Map<String, String> opt = javacProcessingEnv.getOptions();
        if ("disable".equals(opt.get(ENABLE_OPTIONS_NAME))) {
            enable = false;
        }
    }
    @Override
    public boolean process(Set<? extends TypeElement> annotations, RoundEnvironment roundEnv) {
        if (!enable){
            javacProcessingEnv.getMessager().printMessage(Diagnostic.Kind.NOTE,
                    "Annotation Hashed is disabled");
            return false;
        }
        //...
    }
}

We declared that we expect the "Hashed" option, and if it is set to "disable" we do nothing and print an informational message for the user. Messages of kind Diagnostic.Kind.NOTE are informational, and with default settings many development environments will not show them at all.
At the same time we tell the compiler that we did not process the annotation. If the system has other processors that handle annotations of this type, or that accept all types, they can get our annotation. Admittedly, I can say nothing about the order in which the compiler offers the annotation to processors. As long as we have only our library and exactly one annotation this does not matter, but if you use several annotation libraries, be prepared for pitfalls.
It remains to pass this option to the compiler. Options for processors are passed to the compiler with the -A switch, in our case "-AHashed=disable".
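The mapping from the -A switch to the processor's option map can be sketched as a small pure function: -AHashed=disable arrives as the key "Hashed" with the value "disable", and an option given without a value arrives with a null value. HashedOptionSketch and isHashingEnabled are hypothetical names used only for this illustration.

```java
import java.util.Collections;
import java.util.Map;

public class HashedOptionSketch {

    // Option name as declared in @SupportedOptions in the article.
    static final String OPTION_NAME = "Hashed";

    // Mirrors the check in init(): hashing is switched off only by the exact value "disable".
    static boolean isHashingEnabled(Map<String, String> options) {
        return !"disable".equals(options.get(OPTION_NAME));
    }

    public static void main(String[] args) {
        // Equivalent of compiling with -AHashed=disable
        System.out.println(isHashingEnabled(Collections.singletonMap("Hashed", "disable")));
        // Equivalent of compiling without the option
        System.out.println(isHashingEnabled(Collections.<String, String>emptyMap()));
    }
}
```

Writing the comparison as "disable".equals(...) keeps it safe when the option is absent or has no value.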
All that is left is to make Gradle pass this option at the right moment. And again, crutches:

tasks.withType(JavaCompile) {
      if (name == "compileDebug"){
          options.compilerArgs << "-AHashed=disable"
      }
}

This is for the current version of Android Studio. For earlier versions, use tasks.withType(Compile).
It is a crutch because this block is called for every build type, regardless of the task. In theory there should be something similar to buildTypes from the android block, but I no longer had the strength to look for a beautiful solution. After all, everyone has already guessed that, by tradition, there is no documentation in Russian?
In code, the annotations may look like this:

    @Hashed
    public static final String demo1 = "habr";
    @Hashed(method="SHA-1")
    public static final String demo2 = "habrahabr";
    @Hashed(method="SHA-256")
    public static final String demo3 = "habracadabra";

The method can be any algorithm supported by MessageDigest.
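To preview what a constant will turn into, the same hashing steps can be run outside the compiler. The helper below is a sketch that follows the processor's formatting (UTF-8 bytes, upper-case hex via %02X). HashPreview and hash are hypothetical names, and wrapping the checked exception is a choice of this sketch.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class HashPreview {

    // Hashes the value with the given MessageDigest algorithm and returns upper-case hex.
    static String hash(String method, String value) {
        try {
            MessageDigest md = MessageDigest.getInstance(method);
            byte[] digest = md.digest(value.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                hex.append(String.format("%02X", b & 0xFF));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // Same situation the processor reports as a compilation error.
            throw new IllegalArgumentException("Unsupported digest method " + method, e);
        }
    }

    public static void main(String[] args) {
        // Prints the replacement values for the three annotated fields above.
        System.out.println(hash("MD5", "habr"));
        System.out.println(hash("SHA-1", "habrahabr"));
        System.out.println(hash("SHA-256", "habracadabra"));
    }
}
```

MD5, SHA-1 and SHA-256 are algorithm names every Java platform implementation is required to support; other names depend on the installed security providers.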

Summary

The problem is solved. Of course, only for one very specific way of declaring constants, and of course not in the most efficient way, and for many readers the problem statement itself will raise more questions than the material in the article. I just hope that someone will spend less time and nerves if a similar problem comes their way.
But even more I hope that someone will become interested in this topic and that Habr will see articles explaining why all this magic works.
And, of course, the promised code: GitHub :: DemoAnnotation
