Compiling nested classes: javac and ecj

    As you know, the Java language allows classes to be declared inside another class. There are four kinds of them: static nested, inner, local and anonymous (in this article we do not touch on the lambda expressions that appeared in Java 8). All of them share one interesting feature: the Java virtual machine has no idea that these classes are special. From its point of view, they are ordinary classes located in the same package as the outer class. All the work of converting nested classes into regular ones falls on the compiler, and it is interesting to see how different compilers deal with it. We will look at the behavior of javac 1.8.0.20 and of the ecj compiler from Eclipse JDT Core 3.10 (bundled with Eclipse Luna).

    Here are the main problems associated with compiling nested classes:
    • Access rights;
    • Passing a reference to an object of the outer class (irrelevant for static nested classes);
    • Passing local variables from the enclosing context (similar to a closure).

    This article covers the first two issues.

    Access rights


    Access rights cause a lot of trouble. We can declare a field or method of a nested class as private, and according to the Java specification, this field or method can still be accessed from the outer class. The reverse also works: a nested class can refer to a private field or method of the outer class, and one nested class can use the private members of another. From the point of view of the Java virtual machine, however, accessing private members of another class is not allowed. The same goes for access to protected members of a parent class located in another package. To get around this limitation, compilers create special access methods. They are all static, have package-private access, and have names starting with access$. ecj simply names them access$0, access$1, etc., while javac uses at least three digits: the last two encode the specific operation (read = 00, write = 02), and the leading ones identify the field or method. Access methods are generated for reading fields, writing fields, and calling methods.
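    To make the javac naming scheme concrete, here is a small sketch that builds and splits such names. It is a hypothetical helper written for this article, not code taken from javac: it assumes the layout described above (member index followed by a two-digit operation code) and only handles codes below 100. Besides read and write, it knows the increment and decrement codes described later in this article; odd codes and anything else are reported as "other".

```java
// Hypothetical helper, not part of javac: it models the naming scheme
// described above, where the digits after "access$" are the member index
// followed by a two-digit operation code.
final class AccessNames {
    private static final String PREFIX = "access$";

    // Human-readable names for the even operation codes discussed in this article.
    static String opName(int opCode) {
        switch (opCode) {
            case 0: return "read";
            case 2: return "write";
            case 4: return "pre-increment";
            case 6: return "pre-decrement";
            case 8: return "post-increment";
            case 10: return "post-decrement";
            default: return "other";
        }
    }

    // Builds a name such as access$000 (member 0, read) or access$102 (member 1, write).
    static String encode(int memberIndex, int opCode) {
        if (opCode < 0 || opCode > 99) {
            throw new IllegalArgumentException("this sketch only handles two-digit codes");
        }
        return PREFIX + memberIndex + String.format("%02d", opCode);
    }

    // Splits a javac-style name back into {memberIndex, opCode}.
    static int[] decode(String name) {
        String digits = name.substring(PREFIX.length());
        int split = digits.length() - 2;
        return new int[] {
            Integer.parseInt(digits.substring(0, split)),
            Integer.parseInt(digits.substring(split))
        };
    }

    public static void main(String[] args) {
        for (String name : new String[] {"access$000", "access$002", "access$108"}) {
            int[] parts = decode(name);
            System.out.println(name + " -> member " + parts[0] + ", " + opName(parts[1]));
        }
    }
}
```

    Keeping the operation code at a fixed width of two digits is what lets the member index grow freely (access$1202 is member 12, write) while the name stays unambiguous.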

    Access methods for reading fields have one parameter (the object), and methods for writing fields have two (the object and the new value). In ecj the write methods return void, while in javac they return the new value (the second parameter). Take, for example, the following code:

    public class Outer {
      private int a;
      static class Nested {
        int b;
        void method(Outer i) {
          b = i.a;
          i.a = 5;
        }
      }
    }


    If you translate the bytecode generated by javac back to Java, you get something like this:
    public class Outer {
      private int a;
      static int access$000(Outer obj) {
        return obj.a;
      }
      static int access$002(Outer obj, int val) {
        return (obj.a = val);
      }
    }
    class Outer$Nested {
      int b;
      void method(Outer i) {
        b = Outer.access$000(i);
        Outer.access$002(i, 5);
      }
    }
    

    The ecj code is similar, except that the methods are called access$0 and access$1 and the second one returns void. Everything becomes much simpler if you remove the word private: then no access methods are needed and the fields are accessed directly.
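    One way to see these accessors yourself is reflection: the generated methods are static and carry the synthetic flag. The sketch below repeats the article's Outer/Nested example next to a variant without private and lists the synthetic static methods whose names start with access$. The class names are made up for the example. What it prints depends on the compiler: javac 8 and ecj produce accessors for the private field, while javac 11 and later use nest-based access control (JEP 181) and produce none. Only the package-private variant behaves the same everywhere.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.List;

// The article's Outer/Nested example, a variant without private,
// and a reflective probe that lists compiler-generated accessors.
class AccessorProbe {
    static class Outer {
        private int a;

        int getA() { return a; }  // package-private getter, so the probe needs no accessor

        static class Nested {
            int b;

            void method(Outer i) {
                b = i.a;  // read of a private field of the outer class
                i.a = 5;  // write of a private field of the outer class
            }
        }
    }

    static class PlainOuter {
        int a;  // package-private: accessed directly, no accessor needed

        static class Nested {
            void method(PlainOuter i) {
                i.a = 5;
            }
        }
    }

    // Names of static synthetic methods starting with "access$" declared by a class.
    static List<String> syntheticAccessors(Class<?> type) {
        List<String> names = new ArrayList<>();
        for (Method m : type.getDeclaredMethods()) {
            if (m.isSynthetic()
                    && Modifier.isStatic(m.getModifiers())
                    && m.getName().startsWith("access$")) {
                names.add(m.getName());
            }
        }
        return names;
    }

    public static void main(String[] args) {
        // Compiler-dependent: javac 8 and ecj report accessors here,
        // javac 11+ (nestmates) reports an empty list.
        System.out.println("Outer: " + syntheticAccessors(Outer.class));
        System.out.println("PlainOuter: " + syntheticAccessors(PlainOuter.class));
    }
}
```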

    Interestingly, javac behaves smarter when a field is incremented. For example, compile this code:
    public class Outer {
      private int a;
      static class Nested {
        void inc(Outer i) {
          i.a++;
        }
      }
    }

    Javac will output something like the following:
    public class Outer {
      private int a;
      static int access$008(Outer obj) {
        return obj.a++;
      }
    }
    class Outer$Nested {
      void inc(Outer i) {
        Outer.access$008(i);
      }
    }

    Similar behavior is observed with post-decrement (the method name ends with 10), as well as with pre-increment and pre-decrement (04 and 06). In all these cases ecj first calls the read method, then adds or subtracts one, and then calls the write method. If you wonder where the odd numbers went, they are used for direct access to protected fields of the outer class's parent (for example, Outer.super.x = 2; I have no idea where this could come in handy!).
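    The two strategies can be emulated by hand to check that they are equivalent. In this sketch the helper names are invented for readability: javac would call the single post-increment accessor access$008, and ecj would call the read and write accessors access$0 and access$1. The field is package-private here so that ordinary hand-written methods can reach it.

```java
// Hand-written emulation of the two ways of lowering i.a++ described above.
// The method names are ours; javac would call the first one access$008,
// ecj would call the read and write helpers access$0 and access$1.
class IncrementLowering {
    static class Outer {
        int a;  // package-private here so the hand-written helpers can reach it
    }

    // javac style: a single accessor performs the whole post-increment.
    static int javacPostIncrement(Outer obj) {
        return obj.a++;
    }

    // ecj style: separate read and write accessors; the write returns void.
    static int read(Outer obj) {
        return obj.a;
    }

    static void write(Outer obj, int val) {
        obj.a = val;
    }

    // ecj lowering of i.a++: read, add one, write back, yield the old value.
    static int ecjPostIncrement(Outer obj) {
        int old = read(obj);
        write(obj, old + 1);
        return old;
    }

    public static void main(String[] args) {
        Outer x = new Outer();
        Outer y = new Outer();
        System.out.println(javacPostIncrement(x) + " " + x.a);  // 0 1
        System.out.println(ecjPostIncrement(y) + " " + y.a);    // 0 1
    }
}
```

    Both variants yield the old value and leave the field incremented; javac's version just needs one call instead of two.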

    Curiously, javac 1.7 behaved even smarter, generating special methods for any compound assignment such as +=, <<=, etc. (the right-hand side was evaluated and passed to the generated method as a separate parameter). A special method was generated even if you applied += to an inaccessible String field. In javac 1.8 this functionality broke, apparently by accident: the corresponding code is still present in the compiler's source.

    If the programmer declares a method with a matching signature (for example, access$000; never do that!), javac refuses to compile the file with the message “the symbol (method) conflicts with a compiler-synthesized symbol in (class)”. The ecj compiler handles such conflicts calmly by simply incrementing its counter until it finds a free method name.
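    The ecj strategy amounts to a simple search for an unused name. The sketch below is a hypothetical model of that behavior, not ecj's actual code: it assumes the names already declared in the class are known as a set of strings and that the compiler's counter starts at a given value.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical model of ecj's conflict handling as described above:
// keep incrementing the counter until access$N is not already taken.
class FreeAccessorName {
    static String nextFreeName(Set<String> declared, int counter) {
        int n = counter;
        while (declared.contains("access$" + n)) {
            n++;  // name already declared by the programmer: try the next one
        }
        return "access$" + n;
    }

    public static void main(String[] args) {
        Set<String> declared = new HashSet<>(Arrays.asList("access$0", "access$1"));
        System.out.println(nextFreeName(declared, 0));  // access$2
    }
}
```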

    When an inaccessible method is called, an auxiliary static method is created with the same parameters and return type, plus an additional parameter for passing the object. A more interesting case is the use of a private constructor. Constructing an object requires calling its constructor, so compilers generate a new non-private constructor that calls the desired private one. But how do you create a constructor whose signature does not conflict with the existing ones? javac generates a new class for this purpose! Take this code:

    public class Outer {
      private Outer() {}
      static class Nested {
        void create() {
          new Outer();
        }
      }
    }

    When this is compiled, not only Outer.class and Outer$Nested.class are created, but also another class, Outer$1.class. The code generated by the compiler looks something like this:
    public class Outer {
      private Outer() {}
      Outer(Outer$1 ignore) {
        this();
      }
    }
    class Outer$1 {} // this class has no constructor at all, not even a private one, so it can never be instantiated
    class Outer$Nested {
      void create() {
        new Outer((Outer$1)null);
      }
    }
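    The same idea can be imitated in ordinary source code, which makes it easy to experiment with. In this sketch the Marker class and all other names are invented for the example and only play the role of javac's synthetic Outer$1; since Java source cannot declare a class with no constructor at all, a private constructor is the closest approximation.

```java
// Hand-written imitation of javac's private-constructor trick.
// Marker plays the role of the synthetic Outer$1 class; all names are ours.
class PrivateCtorBridge {
    static class Outer {
        // Stand-in for Outer$1. Java source cannot declare a class with no
        // constructor at all, so a private constructor is the closest match.
        static final class Marker {
            private Marker() {}
        }

        private Outer() {}

        // Bridge: Marker exists only for this purpose, so this signature
        // cannot clash with a constructor declared for any other reason.
        Outer(Marker ignore) {
            this();
        }

        static class Nested {
            Outer create() {
                return new Outer((Marker) null);  // goes through the bridge
            }
        }
    }

    public static void main(String[] args) {
        Outer created = new Outer.Nested().create();
        System.out.println(created != null);  // true
    }
}
```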
    

    This solution is convenient because it guarantees that the new constructor's signature cannot conflict with any existing one. The ecj compiler does without an extra class and instead adds a constructor whose dummy parameter has the type of the same class:
    public class Outer {
      private Outer() {}
      Outer(Outer ignore) {
        this();
      }
    }
    class Outer$Nested {
      void create() {
        new Outer((Outer)null);
      }
    }
    

    If this conflicts with an existing constructor, more dummy parameters are added. For example, suppose you have three constructors:
    private Outer() {}
    private Outer(Outer i1) {}
    private Outer(Outer i1, Outer i2) {}
    

    If you use each of them from a nested class, ecj will create three new ones that will have three, four, and five Outer parameters.
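    The three-four-five result follows from a simple rule, which the sketch below models: append at least one dummy Outer parameter and keep adding more while the signature collides with an existing constructor or with a bridge generated earlier. This is a hypothetical model that reproduces the outcome described above, not ecj's actual implementation. Because every parameter in the example has type Outer, a signature is represented simply by its parameter count, and the private constructors are assumed to be processed in declaration order.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical model (not ecj's code) of how dummy Outer parameters are added.
// All parameters have type Outer, so a signature is just its parameter count.
class DummyParamModel {
    static List<Integer> bridgeArities(List<Integer> privateCtorArities) {
        Set<Integer> taken = new HashSet<>(privateCtorArities);
        List<Integer> result = new ArrayList<>();
        for (int arity : privateCtorArities) {
            int candidate = arity + 1;  // at least one dummy Outer parameter
            while (taken.contains(candidate)) {
                candidate++;            // conflict: add another dummy parameter
            }
            taken.add(candidate);       // the bridge itself now occupies this signature
            result.add(candidate);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(bridgeArities(Arrays.asList(0, 1, 2)));  // [3, 4, 5]
    }
}
```

    With a single private no-argument constructor the model yields one bridge with one Outer parameter, which matches the Outer(Outer ignore) constructor shown earlier.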

    Passing a reference to an object of the outer class


    Inner classes (including local and anonymous ones) are tied to a specific object of the outer class. To achieve this, the compiler adds a new final field to the inner class (usually named this$0) that holds a reference to the enclosing object, and a corresponding parameter is added to each constructor. Take this simple code:
    public class Outer {
      class Nested {}
      void test() {
        new Nested();
      }
    }

    Compilers (ecj and javac behave similarly here) turn this code into something like the following (recall that I restore the code from bytecode by hand to make it clearer):
    public class Outer {
      void test() {
        new Outer$Nested(this);
      }
    }
    class Outer$Nested {
      final Outer this$0;
      Outer$Nested(Outer obj) {
        this.this$0 = obj;
        super();
      }
    }
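    The generated field is visible through reflection, where it is marked synthetic. The sketch below, with class names invented for the example, finds it by flag and type rather than by name, since the name is the compiler's choice; with the compilers discussed here it is this$0. The inner class deliberately uses Outer.this, because newer javac versions may omit the field from inner classes that never use their enclosing instance.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;

// Reflective look at the field that holds the enclosing instance.
// The field is matched by being synthetic and having the outer type,
// not by its name, which the compiler is free to choose.
class OuterRefProbe {
    static class Outer {
        String name = "outer";

        class Nested {
            // Using Outer.this keeps the field: newer javac versions may
            // drop it from inner classes that never use the enclosing instance.
            String outerName() {
                return Outer.this.name;
            }
        }
    }

    static Field enclosingInstanceField(Class<?> inner, Class<?> outer) {
        for (Field f : inner.getDeclaredFields()) {
            if (f.isSynthetic() && f.getType() == outer) {
                return f;
            }
        }
        return null;  // not found
    }

    public static void main(String[] args) {
        Field f = enclosingInstanceField(Outer.Nested.class, Outer.class);
        if (f == null) {
            System.out.println("no enclosing-instance field found");
        } else {
            System.out.println(f.getName() + ", final: " + Modifier.isFinal(f.getModifiers()));
        }
    }
}
```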

    Curiously, this$0 is assigned before the parent class constructor is called. In normal Java code you cannot assign a value to a field before the parent constructor has run, but the bytecode does not prevent this. Thanks to this, if you override a method that is called by the parent class constructor, this$0 is already initialized and you can freely access the fields and methods of the outer class.
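    Here is a small program that relies on this ordering. A superclass constructor calls an overridden method, and the override in an inner class reads a field of the enclosing object. The names are invented for the example; the code works only because the compilers store this$0 before calling super(), so treat it as a demonstration of compiler behavior rather than a pattern to copy (calling overridable methods from a constructor is fragile in general).

```java
// Demonstrates the consequence of assigning this$0 before super():
// a superclass constructor that calls an overridden method can already
// reach the enclosing instance.
class EarlyOuterAccess {
    static abstract class Base {
        final String seenDuringConstruction;

        Base() {
            // Calling an overridable method from a constructor is normally
            // discouraged; it is done here only to expose the ordering.
            seenDuringConstruction = describe();
        }

        abstract String describe();
    }

    static class Outer {
        final String label;

        Outer(String label) {
            this.label = label;
        }

        class Inner extends Base {
            @Override
            String describe() {
                return Outer.this.label;  // reads this$0 while Base() is still running
            }
        }
    }

    public static void main(String[] args) {
        Outer outer = new Outer("outer-label");
        Outer.Inner inner = outer.new Inner();
        System.out.println(inner.seenDuringConstruction);  // outer-label
    }
}
```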

    If you create a name conflict by declaring a field called this$0 in the Nested class (never do this!), the compilers will not be confused: they will name their internal field this$0$.

    The Java language allows you to create an instance of an inner class not only on the basis of this, but also on the basis of another object of the same type:
    public class Outer {
      class Nested {}
      void test(Outer other) {
        other.new Nested();
      }
    }

    An interesting point arises here: other may turn out to be null. Properly, the code should fail at this point with a NullPointerException. Usually the virtual machine itself ensures that you do not dereference null, but here no actual dereference happens until the outer object is used inside the Nested object, which may happen much later or not at all. Once again the compilers have to find a workaround: they insert a dummy call, turning the code into something like this:
    public class Outer {
      void test(Outer other) {
        other.getClass();
        new Outer$Nested(other);
      }
    }

    The call to getClass() is safe: it succeeds for any non-null object and is cheap. If other turns out to be null, the exception is thrown before the Nested object is created.
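    The effect of the inserted check is easy to observe: creating an inner object through a null qualifier throws NullPointerException, and the inner constructor never runs. The sketch below counts constructor calls to show that; the names are invented for the example. The exact form of the null check (getClass() in the output shown above) may differ between compiler versions, but the observable behavior is the same.

```java
// With a null qualifier, the compiler-inserted null check throws
// before the inner class constructor gets a chance to run.
class NullQualifier {
    static int constructed = 0;  // counts how many Nested objects were actually built

    static class Outer {
        class Nested {
            Nested() {
                constructed++;
            }
        }

        Nested create(Outer other) {
            return other.new Nested();
        }
    }

    // Returns true if creating the inner object through a null qualifier
    // threw NullPointerException.
    static boolean throwsOnNull() {
        try {
            new Outer().create(null);
            return false;
        } catch (NullPointerException expected) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("NPE thrown: " + throwsOnNull() + ", constructed: " + constructed);
    }
}
```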

    If classes are nested more than one level deep, the innermost ones get new fields: this$1 and so on. As an example, consider this:

    public class Outer {
      class Nested {
        class SubNested {
          { test(); }
        }
      }
      void test() {
        new Nested().new SubNested();
      }
    }

    Here javac will create something like this:

    public class Outer {
      void test() {
        Outer$Nested tmp = new Outer$Nested(this);
        tmp.getClass(); // clearly redundant, but fine
        new Outer$Nested$SubNested(tmp);
      }
    }
    class Outer$Nested {
      final Outer this$0;
      Outer$Nested(Outer obj) {
        this.this$0 = obj;
        super();
      }
    }
    class Outer$Nested$SubNested {
      final Outer$Nested this$1;
      Outer$Nested$SubNested(Outer$Nested obj) {
        this.this$1 = obj;
        super();
        this.this$1.this$0.test();
      }
    }

    The call to getClass() could be removed, since the object has just been created, but the compiler does not bother. ecj, on the other hand, unexpectedly generated an access method:

    class Outer$Nested {
      final Outer this$0;
      Outer$Nested(Outer obj) {
        this.this$0 = obj;
        super();
      }
      static Outer access$0(Outer$Nested obj) {
        return obj.this$0;
      }
    }
    class Outer$Nested$SubNested {
      final Outer$Nested this$1;
      Outer$Nested$SubNested(Outer$Nested obj) {
        this.this$1 = obj;
        super();
        Outer$Nested.access$0(obj).test();
      }
    }

    This is very strange, given that this$0 does not have the private flag. On the other hand, ecj was smart enough to reuse the obj parameter instead of reading the field this.this$1.
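    The chain of references can also be checked at run time. In the sketch below (names invented for the example, and without the recursive test() call of the original snippet), SubNested reaches the outermost object through its enclosing Nested instance, and reflection shows that SubNested stores a synthetic field of type Nested (this$1 in the compilers discussed here) rather than a direct reference to Outer. The field is counted by flag and type, not by name.

```java
import java.lang.reflect.Field;

// Two levels of inner classes: SubNested reaches Outer through Nested.
class TwoLevelInner {
    static class Outer {
        String label = "outer";

        class Nested {
            String viaNested() {
                return Outer.this.label;  // uses Nested's enclosing-instance field
            }

            class SubNested {
                // Reaches Outer through the enclosing Nested instance,
                // i.e. this$1.this$0 in the lowered code shown above.
                String viaSubNested() {
                    return Outer.this.label;
                }
            }
        }
    }

    // Counts synthetic fields of the given type declared by a class.
    static int syntheticFieldsOfType(Class<?> owner, Class<?> type) {
        int count = 0;
        for (Field f : owner.getDeclaredFields()) {
            if (f.isSynthetic() && f.getType() == type) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        Outer outer = new Outer();
        Outer.Nested.SubNested sub = outer.new Nested().new SubNested();
        System.out.println(sub.viaSubNested());
        System.out.println("fields of type Nested in SubNested: "
                + syntheticFieldsOfType(Outer.Nested.SubNested.class, Outer.Nested.class));
        System.out.println("fields of type Outer in SubNested: "
                + syntheticFieldsOfType(Outer.Nested.SubNested.class, Outer.class));
    }
}
```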

    Conclusions


    Nested classes are something of a headache for compilers. Do not shy away from package-private access: with it, the compiler can do without autogenerated methods. Of course, modern virtual machines almost always inline them, but these methods still take extra memory, inflate the class's constant pool, lengthen stack traces and add extra steps when debugging.

    Different compilers can generate very different code in similar situations: even the number of generated classes can differ. If you write tools that analyze bytecode, you must take the behavior of different compilers into account.
