Java bytecode "Hello world"

    There is already an article on the hub about Java bytecode. I decided to supplement it a little and develop the topic as far as I am able. It seems logical to me to take apart a simple Java application, and what could be simpler than "Hello world"?
    For my experiment I created a src directory and put the App.java file into its hello folder:

    package hello;
    public class App {
        public static void main(String[] args) {
            System.out.println("Hello world!");
        }
    }
    




    Compile the file with the command:

    javac src/hello/App.java -d classes/

    As output I got the file App.class in the classes folder. First, let's compare the sizes of the java and class files.

    App.java 139B
    App.class 418B

    This was unexpected: for some reason I thought the compiled file would be smaller. Let's try to open the class file:

    hexdump App.class

    0000000 ca fe ba be 00 00 00 34 00 1d 0a 00 06 00 0f 09
    0000010 00 10 00 11 08 00 12 0a 00 13 00 14 07 00 15 07
    0000020 00 16 01 00 06 3c 69 6e 69 74 3e 01 00 03 28 29
    0000030 56 01 00 04 43 6f 64 65 01 00 0f 4c 69 6e 65 4e
    0000040 75 6d 62 65 72 54 61 62 6c 65 01 00 04 6d 61 69
    0000050 6e 01 00 16 28 5b 4c 6a 61 76 61 2f 6c 61 6e 67
    0000060 2f 53 74 72 69 6e 67 3b 29 56 01 00 0a 53 6f 75
    0000070 72 63 65 46 69 6c 65 01 00 08 41 70 70 2e 6a 61
    0000080 76 61 0c 00 07 00 08 07 00 17 0c 00 18 00 19 01
    0000090 00 0c 48 65 6c 6c 6f 20 77 6f 72 6c 64 21 07 00
    00000a0 1a 0c 00 1b 00 1c 01 00 09 68 65 6c 6c 6f 2f 41
    00000b0 70 70 01 00 10 6a 61 76 61 2f 6c 61 6e 67 2f 4f
    00000c0 62 6a 65 63 74 01 00 10 6a 61 76 61 2f 6c 61 6e
    00000d0 67 2f 53 79 73 74 65 6d 01 00 03 6f 75 74 01 00
    00000e0 15 4c 6a 61 76 61 2f 69 6f 2f 50 72 69 6e 74 53
    00000f0 74 72 65 61 6d 3b 01 00 13 6a 61 76 61 2f 69 6f
    0000100 2f 50 72 69 6e 74 53 74 72 65 61 6d 01 00 07 70
    0000110 72 69 6e 74 6c 6e 01 00 15 28 4c 6a 61 76 61 2f
    0000120 6c 61 6e 67 2f 53 74 72 69 6e 67 3b 29 56 00 21
    0000130 00 05 00 06 00 00 00 00 00 02 00 01 00 07 00 08
    0000140 00 01 00 09 00 00 00 1d 00 01 00 01 00 00 00 05
    0000150 2a b7 00 01 b1 00 00 00 01 00 0a 00 00 00 06 00
    0000160 01 00 00 00 03 00 09 00 0b 00 0c 00 01 00 09 00
    0000170 00 00 25 00 02 00 01 00 00 00 09 b2 00 02 12 03
    0000180 b6 00 04 b1 00 00 00 01 00 0a 00 00 00 0a 00 02
    0000190 00 00 00 06 00 08 00 07 00 01 00 0d 00 00 00 02
    00001a0 00 0e
    00001a2
    


    Quite an unusual look for Java code. Let's try to understand what is encoded here, using the description of the class file format.

    ca fe ba be 
    


    These 4 bytes are magic, the number that identifies the class file format.

    00 00 
    

    minor_version: 2 bytes for the minor version, as the name suggests.

    00 34 
    

    major_version: 2 bytes for the major version (0x34 = 52).
    The combination of minor and major version shows that I compiled this code with a Java SE 8 compiler.
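
    The first eight bytes can also be read programmatically. Below is a minimal sketch, not the JDK's own parser: the class name HeaderSketch and the method describeHeader are hypothetical, and the mapping "release = major - 44" is assumed to hold only for major versions 49 and above.

```java
import java.nio.ByteBuffer;

final class HeaderSketch {
    /** Reads magic, minor_version and major_version from the first 8 bytes. */
    static String describeHeader(byte[] classBytes) {
        // ByteBuffer is big-endian by default, like the class file format.
        ByteBuffer buf = ByteBuffer.wrap(classBytes);
        int magic = buf.getInt();
        if (magic != 0xCAFEBABE) {
            throw new IllegalArgumentException("not a class file");
        }
        int minor = buf.getShort() & 0xFFFF; // u2 fields are unsigned
        int major = buf.getShort() & 0xFFFF;
        // Assumption: for major >= 49 the Java SE release is major - 44 (52 -> 8).
        String release = major >= 49 ? "Java SE " + (major - 44) : "older than Java SE 5";
        return "minor=" + minor + " major=" + major + " (" + release + ")";
    }

    public static void main(String[] args) {
        // The first 8 bytes of the hexdump above.
        byte[] header = {(byte) 0xca, (byte) 0xfe, (byte) 0xba, (byte) 0xbe, 0x00, 0x00, 0x00, 0x34};
        System.out.println(describeHeader(header)); // prints "minor=0 major=52 (Java SE 8)"
    }
}
```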

    00 1d 
    

    These two bytes are constant_pool_count, which determines the size of constant_pool. In my case the count is 0x1d = 29, so the pool holds 28 entries, indexed from 1 to 28. Next come the entries themselves, each of the form:

    cp_info {
    u1 tag; // 1 byte for the tag
    u1 info []; // bytes describing the entry; the layout depends on the tag
    }

    Let's go through the elements of constant_pool.

    1st element:

    0a 
    

    This tag (0x0a = 10) corresponds to CONSTANT_Methodref, so the rest of the entry has this layout:

    CONSTANT_Methodref_info {
    u1 tag;
    u2 class_index;
    u2 name_and_type_index;
    }
    Accordingly:
    00 06 
    

    class_index, which points to element 6 of constant_pool
    00 0f 
    

    name_and_type_index, which points to element 15 of constant_pool

    It is not yet clear which method this reference points to, so let's move on.

    2nd element:

    09 
    

    This is CONSTANT_Fieldref, so we look at this layout:

    CONSTANT_Fieldref_info {
    u1 tag;
    u2 class_index;
    u2 name_and_type_index;
    }

    Everything here is very similar to the previous element, although it is not yet clear what field this is: I did not declare any fields in my class.
    00 10 
    

    class_index points to element 16
    00 11 
    

    name_and_type_index points to element 17

    3rd element:
    08
    

    This is the tag for CONSTANT_String.

    According to the layout:

    CONSTANT_String_info {
        u1 tag;
        u2 string_index;
    }
    


    the most interesting part lies in element 18:
    00 12 
    


    4th element:
    0a 
    

    The tag for a method reference again,
    whose class is described in element 19
    00 13  
    

    and whose name and type are in element 20:
    00 14  
    


    5th element:
    Tag for CONSTANT_Class
    07 
    

    with its name in element 21
    00 15 
    


    6th element:
    CONSTANT_Class again
    07 
    

    with its name in element 22
    00 16
    

    As we recall, the 1st element of constant_pool refers to this class.

    7th element:
    The CONSTANT_Utf8 tag, our first string
    01 
    

    It must match this layout:

    CONSTANT_Utf8_info {
    u1 tag;
    u2 length;
    u1 bytes [length];
    }

    So the length of our string is 6 bytes:
    00 06 
    

    And its value is "<init>":
    3c 69 6e 69 74 3e 
    

    This is a special name: it is how constructors are named.
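
    Decoding such an entry by program takes only a few lines. This is a sketch under assumptions: the class name Utf8EntrySketch is hypothetical, and plain UTF-8 is used as a stand-in for the JVM's modified UTF-8, which is enough for the ASCII strings in this file.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

final class Utf8EntrySketch {
    /** Decodes one CONSTANT_Utf8_info entry: u1 tag, u2 length, then the bytes. */
    static String decode(byte[] entry) {
        ByteBuffer buf = ByteBuffer.wrap(entry);
        int tag = buf.get() & 0xFF;
        if (tag != 1) {
            throw new IllegalArgumentException("not CONSTANT_Utf8, tag=" + tag);
        }
        int length = buf.getShort() & 0xFFFF;
        byte[] bytes = new byte[length];
        buf.get(bytes);
        // Simplification: plain UTF-8 instead of the JVM's modified UTF-8.
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // The 7th constant pool element from the dump.
        byte[] seventh = {0x01, 0x00, 0x06, 0x3c, 0x69, 0x6e, 0x69, 0x74, 0x3e};
        System.out.println(decode(seventh)); // prints "<init>"
    }
}
```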

    8th element:

    CONSTANT_Utf8
    01 
    


    a string of length 3, "()V":

    00 03 
    28 29 56
    


    This is the descriptor of our parameterless constructor, whose name was in the seventh element.
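
    A descriptor such as "()V" packs the parameter types and the return type into one string. As an illustration, here is a small sketch that turns a method descriptor into readable text. The class DescriptorSketch and its output format are my own assumptions, not something javac or javap provides.

```java
import java.util.ArrayList;
import java.util.List;

final class DescriptorSketch {
    private final String d;
    private int pos;

    private DescriptorSketch(String descriptor) {
        this.d = descriptor;
    }

    /** Converts a method descriptor, e.g. "()V", into "returnType (paramTypes)". */
    static String readable(String descriptor) {
        DescriptorSketch p = new DescriptorSketch(descriptor);
        if (p.d.charAt(p.pos++) != '(') {
            throw new IllegalArgumentException(descriptor);
        }
        List<String> params = new ArrayList<>();
        while (p.d.charAt(p.pos) != ')') {
            params.add(p.type());
        }
        p.pos++; // skip ')'
        return p.type() + " (" + String.join(", ", params) + ")";
    }

    /** Reads one field type starting at pos and advances past it. */
    private String type() {
        char c = d.charAt(pos++);
        switch (c) {
            case 'B': return "byte";
            case 'C': return "char";
            case 'D': return "double";
            case 'F': return "float";
            case 'I': return "int";
            case 'J': return "long";
            case 'S': return "short";
            case 'Z': return "boolean";
            case 'V': return "void";
            case '[': return type() + "[]"; // array of the following type
            case 'L': {                     // class name terminated by ';'
                int end = d.indexOf(';', pos);
                String name = d.substring(pos, end).replace('/', '.');
                pos = end + 1;
                return name;
            }
            default:
                throw new IllegalArgumentException("unexpected character: " + c);
        }
    }

    public static void main(String[] args) {
        System.out.println(readable("()V"));                    // prints "void ()"
        System.out.println(readable("([Ljava/lang/String;)V")); // prints "void (java.lang.String[])"
    }
}
```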

    9th element:
    CONSTANT_Utf8
    01 
    


    The string "Code":

    00 04 
    43 6f 64 65 
    


    10th element:
    The string "LineNumberTable"
    01 
    00 0f 
    4c 69 6e 65 4e 75 6d 62 65 72 54 61 62 6c 65
    


    11th element:
    "main":
    01 
    00 04 
    6d 61 69 6e 
    


    12th element:
    "([Ljava/lang/String;)V"
    01 
    00 16 
    28 5b 4c 6a 61 76 61 2f 6c 61 6e 67 2f 53 74 72 69 6e 67 3b 29 56 
    


    13th element:
    "SourceFile"
    01 
    00 0a 
    53 6f 75 72 63 65 46 69 6c 65 
    


    14th element:
    "App.java":
    01 
    00 08 
    41 70 70 2e 6a 61 76 61 
    


    15th element:
    The tag corresponding to CONSTANT_NameAndType
    0c  
    


    which means we need this layout:

    CONSTANT_NameAndType_info {
    u1 tag;
    u2 name_index;
    u2 descriptor_index;
    }

    and then:
    a reference to element 7
    00 07 
    

    and a reference to element 8
    00 08  
    


    Since the first element refers to this entry, we can conclude that the first element is a reference to a constructor without parameters. The name of its class should be found in element 22.
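
    The chain followed by hand here (Methodref, then Class and NameAndType, then the Utf8 strings) can be sketched as a few map lookups. The in-memory model below is hypothetical and filled in manually with only the entries needed for element 1. The string for element 22 is "java/lang/Object", as the dump shows further on.

```java
import java.util.HashMap;
import java.util.Map;

final class ResolveSketch {
    // Hypothetical model of the pool, keyed by constant pool index.
    static final Map<Integer, String> UTF8 = new HashMap<>();
    static final Map<Integer, Integer> CLASS_NAME = new HashMap<>();  // Class -> name_index
    static final Map<Integer, int[]> NAME_AND_TYPE = new HashMap<>(); // -> {name_index, descriptor_index}
    static final Map<Integer, int[]> METHODREF = new HashMap<>();     // -> {class_index, name_and_type_index}

    static {
        METHODREF.put(1, new int[] {6, 15});
        CLASS_NAME.put(6, 22);
        UTF8.put(7, "<init>");
        UTF8.put(8, "()V");
        NAME_AND_TYPE.put(15, new int[] {7, 8});
        UTF8.put(22, "java/lang/Object");
    }

    /** Follows a Methodref down to its class name, method name and descriptor. */
    static String resolveMethodref(int index) {
        int[] ref = METHODREF.get(index);
        String className = UTF8.get(CLASS_NAME.get(ref[0]));
        int[] nameAndType = NAME_AND_TYPE.get(ref[1]);
        return className + "." + UTF8.get(nameAndType[0]) + ":" + UTF8.get(nameAndType[1]);
    }

    public static void main(String[] args) {
        System.out.println(resolveMethodref(1)); // prints "java/lang/Object.<init>:()V"
    }
}
```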

    16th element:
    The tag for CONSTANT_Class
    07 
    

    with its name in element 23
    00 17
    


    17th element:
    The CONSTANT_NameAndType tag, with references to elements 24 and 25 of constant_pool
    0c  
    00 18 
    00 19 
    


    18th element:
    Hooray, here is "Hello world!"
    01
    00 0c 
    48 65 6c 6c 6f 20 77 6f 72 6c 64 21 
    


    19th element:
    The CONSTANT_Class tag, with its name in element 26
    07 
    00 1a 
    


    20th element:
    The CONSTANT_NameAndType tag, with references to elements 27 and 28
    0c 
    00 1b 
    00 1c 
    


    21st element:
    "hello/App"
    01 
    00 09 
    68 65 6c 6c 6f 2f 41 70 70 
    


    22nd element:
    "java/lang/Object"
    01 
    00 10 
    6a 61 76 61 2f 6c 61 6e 67 2f 4f 62 6a 65 63 74
    


    23rd element:
    "java/lang/System"
    01
    00 10
    6a 61 76 61 2f 6c 61 6e 67 2f 53 79 73 74 65 6d 
    


    24th element:
    "out"
    01
    00 03
    6f 75 74 
    


    25th element:
    "Ljava/io/PrintStream;"
    01
    00 15 
    4c 6a 61 76 61 2f 69 6f 2f 50 72 69 6e 74 53 74 72 65 61 6d 3b
    


    26th element:
    "java/io/PrintStream"
    01
    00 13
    6a 61 76 61 2f 69 6f 2f 50 72 69 6e 74 53 74 72 65 61 6d 
    


    27th element:
    "println"
    01
    00 07
    70 72 69 6e 74 6c 6e 
    


    28th element:
    "(Ljava/lang/String;)V"
    01 
    00 15 
    28 4c 6a 61 76 61 2f 6c 61 6e 67 2f 53 74 72 69 6e 67 3b 29 56 
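
    The entry-by-entry walk above can be sketched as a loop over tags. This is a sketch under assumptions: it handles only the six tags that occur in this file (1, 7, 8, 9, 10, 12), the class name PoolWalkSketch is hypothetical, and the usage feeds it the first six entries of the dump with constant_pool_count trimmed to 7 so that the sample stays short.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

final class PoolWalkSketch {
    /** Walks a constant pool that starts with its u2 constant_pool_count. */
    static List<String> walk(byte[] data) {
        ByteBuffer buf = ByteBuffer.wrap(data);
        int count = u2(buf);
        List<String> out = new ArrayList<>();
        for (int i = 1; i < count; i++) { // valid indices are 1 .. count - 1
            int tag = buf.get() & 0xFF;
            String body;
            switch (tag) {
                case 1: { // CONSTANT_Utf8: u2 length, then the bytes
                    byte[] bytes = new byte[u2(buf)];
                    buf.get(bytes);
                    body = "Utf8 " + new String(bytes, StandardCharsets.UTF_8);
                    break;
                }
                case 7:  body = "Class #" + u2(buf); break;
                case 8:  body = "String #" + u2(buf); break;
                case 9:  body = "Fieldref #" + u2(buf) + ".#" + u2(buf); break;
                case 10: body = "Methodref #" + u2(buf) + ".#" + u2(buf); break;
                case 12: body = "NameAndType #" + u2(buf) + ":#" + u2(buf); break;
                default:
                    throw new IllegalArgumentException("tag not handled in this sketch: " + tag);
            }
            out.add("#" + i + " = " + body);
        }
        return out;
    }

    private static int u2(ByteBuffer buf) {
        return buf.getShort() & 0xFFFF;
    }

    public static void main(String[] args) {
        // Count trimmed to 7, followed by elements 1..6 exactly as in the dump.
        byte[] sample = {0x00, 0x07,
                0x0a, 0x00, 0x06, 0x00, 0x0f,
                0x09, 0x00, 0x10, 0x00, 0x11,
                0x08, 0x00, 0x12,
                0x0a, 0x00, 0x13, 0x00, 0x14,
                0x07, 0x00, 0x15,
                0x07, 0x00, 0x16};
        walk(sample).forEach(System.out::println);
    }
}
```

    The other tags defined by the format are left out on purpose; some of them (Long and Double) occupy two pool slots, so a complete walker would need extra care.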
    


    This is where the constant_pool table ends. Next comes
    access_flags (docs.oracle.com/javase/specs/jvms/se7/html/jvms-4.html#jvms-4.1-200-E.1):
    00 21 
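
    The value 0x0021 is a bit mask. A minimal sketch of decoding it is shown below; the class name ClassFlagsSketch is hypothetical, and the table covers only the class-level flags (method flags reuse some of the same bits with other meanings).

```java
import java.util.ArrayList;
import java.util.List;

final class ClassFlagsSketch {
    // Class-level access flags and their bit masks.
    private static final int[] MASKS = {
        0x0001, 0x0010, 0x0020, 0x0200, 0x0400, 0x1000, 0x2000, 0x4000
    };
    private static final String[] NAMES = {
        "ACC_PUBLIC", "ACC_FINAL", "ACC_SUPER", "ACC_INTERFACE",
        "ACC_ABSTRACT", "ACC_SYNTHETIC", "ACC_ANNOTATION", "ACC_ENUM"
    };

    /** Lists the names of all flags set in a class access_flags value. */
    static String decode(int accessFlags) {
        List<String> set = new ArrayList<>();
        for (int i = 0; i < MASKS.length; i++) {
            if ((accessFlags & MASKS[i]) != 0) {
                set.add(NAMES[i]);
            }
        }
        return String.join(", ", set);
    }

    public static void main(String[] args) {
        System.out.println(decode(0x0021)); // prints "ACC_PUBLIC, ACC_SUPER"
    }
}
```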
    

    this_class
    00 05 
    

    super_class
    00 06
    

    00 00 // interfaces_count
    00 00 // fields_count
    

    methods_count: we have 2 methods in the class, the default constructor and the main method:
    00 02 
    

    Method 1: the constructor
    00 01 // access_flags
    00 07 // name_index
    00 08 // descriptor_index
    00 01 // attributes_count
    

    Attribute 1
    00 09 // name_index (Code)
    00 00 00 1d // attribute_length 
    00 01 // max_stack 
    00 01 // max_locals 
    00 00 00 05  // code_length
    

    One of the most interesting parts of the attribute is code[code_length], the code of our method. Parsing instructions is a big separate topic:
    2a  // aload_0 
    b7 00 01 // invokespecial (invoke the method referenced by element 1 of constant_pool)
    b1 // return
    


    The code has ended, and the description of the Code attribute continues:
    00 00 // exception_table_length 
    00 01 // attributes_count 
    00 0a // attribute_name_index (LineNumberTable, element 10)
    00 00 00 06 // attribute_length
    00 01 // line_number_table_length
    00 00 // start_pc
    00 03 // line_number
    


    Method 2 - main
    00 09 // access_flags 
    00 0b // name_index
    00 0c // descriptor_index
    00 01 // attributes_count
    

    Attribute 1: the code of the main method
    00 09  // name_index (Code)
    00 00 00 25 // attribute_length
    00 02 // max_stack
    00 01 // max_locals 
    00 00 00 09  // code_length
    


    code[code_length]
    b2 00 02 // getstatic 2; following the chain leads to java.lang.System.out
    12 03 // ldc 3
    b6 00 04 // invokevirtual 4  
    b1 // return
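
    The two code arrays seen so far use only six different opcodes, so a toy disassembler for them is short. This is a sketch, not a general tool: the class name MiniDisassembler is hypothetical, and any opcode outside these six is rejected.

```java
import java.util.ArrayList;
import java.util.List;

final class MiniDisassembler {
    /** Disassembles code that uses only the six opcodes found in App.class. */
    static List<String> disassemble(byte[] code) {
        List<String> out = new ArrayList<>();
        int pc = 0;
        while (pc < code.length) {
            int opcode = code[pc] & 0xFF;
            switch (opcode) {
                case 0x2a: out.add(pc + ": aload_0"); pc += 1; break;
                case 0xb1: out.add(pc + ": return"); pc += 1; break;
                case 0x12: // ldc takes a one-byte constant pool index
                    out.add(pc + ": ldc #" + (code[pc + 1] & 0xFF)); pc += 2; break;
                case 0xb2: out.add(pc + ": getstatic #" + u2(code, pc + 1)); pc += 3; break;
                case 0xb6: out.add(pc + ": invokevirtual #" + u2(code, pc + 1)); pc += 3; break;
                case 0xb7: out.add(pc + ": invokespecial #" + u2(code, pc + 1)); pc += 3; break;
                default:
                    throw new IllegalArgumentException("opcode not handled in this sketch: " + opcode);
            }
        }
        return out;
    }

    /** Reads a big-endian unsigned 16-bit value. */
    private static int u2(byte[] b, int at) {
        return ((b[at] & 0xFF) << 8) | (b[at + 1] & 0xFF);
    }

    public static void main(String[] args) {
        byte[] mainCode = {(byte) 0xb2, 0x00, 0x02, 0x12, 0x03, (byte) 0xb6, 0x00, 0x04, (byte) 0xb1};
        // prints "0: getstatic #2", "3: ldc #3", "5: invokevirtual #4", "8: return"
        disassemble(mainCode).forEach(System.out::println);
    }
}
```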
    


    00 00 // exception_table_length 
    00 01 // attributes_count
    00 0a // attribute_name_index (LineNumberTable, element 10)
    00 00 00 0a // attribute_length
    00 02 // line_number_table_length
    00 00 // start_pc
    00 06 // line_number
    00 08 // start_pc
    00 07 // line_number
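
    The body of a LineNumberTable attribute is simply a counted list of (start_pc, line_number) pairs, so it can be decoded as in the sketch below. The class name LineNumberSketch is hypothetical, and the input is assumed to start right after attribute_length.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

final class LineNumberSketch {
    /** Decodes the LineNumberTable body: u2 count, then (start_pc, line_number) pairs. */
    static List<String> decode(byte[] body) {
        ByteBuffer buf = ByteBuffer.wrap(body);
        int entries = buf.getShort() & 0xFFFF;
        List<String> out = new ArrayList<>();
        for (int i = 0; i < entries; i++) {
            int startPc = buf.getShort() & 0xFFFF;
            int line = buf.getShort() & 0xFFFF;
            out.add("line " + line + ": " + startPc);
        }
        return out;
    }

    public static void main(String[] args) {
        // The 10 bytes that follow attribute_length for the main method.
        byte[] body = {0x00, 0x02, 0x00, 0x00, 0x00, 0x06, 0x00, 0x08, 0x00, 0x07};
        decode(body).forEach(System.out::println); // prints "line 6: 0" and "line 7: 8"
    }
}
```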
    


    The description of the methods is complete; next comes the description of the class attributes:
    00 01 // attributes_count
    00 0d // name_index (SourceFile)
    00 00 00 02 // attribute_length
    00 0e // sourcefile_index(App.java)
    


    Now that we are done with the byte-by-byte parsing of the class file, it becomes clear how this command works:

    javap -c -s -verbose classes/hello/App.class

    It automatically prints the same things that I wrote out by hand:

    Classfile /.../classes/hello/App.class
      Last modified Aug 14, 2015; size 418 bytes
      MD5 checksum e9d96126a9f5bbd95f154f1a40d46b53
      Compiled from "App.java"
    public class hello.App
      minor version: 0
      major version: 52
      flags: ACC_PUBLIC, ACC_SUPER
    Constant pool:
       #1 = Methodref          #6.#15         // java/lang/Object."<init>":()V
       #2 = Fieldref           #16.#17        // java/lang/System.out:Ljava/io/PrintStream;
       #3 = String             #18            // Hello world!
       #4 = Methodref          #19.#20        // java/io/PrintStream.println:(Ljava/lang/String;)V
       #5 = Class              #21            // hello/App
       #6 = Class              #22            // java/lang/Object
       #7 = Utf8               <init>
       #8 = Utf8               ()V
       #9 = Utf8               Code
      #10 = Utf8               LineNumberTable
      #11 = Utf8               main
      #12 = Utf8               ([Ljava/lang/String;)V
      #13 = Utf8               SourceFile
      #14 = Utf8               App.java
      #15 = NameAndType        #7:#8          // "<init>":()V
      #16 = Class              #23            // java/lang/System
      #17 = NameAndType        #24:#25        // out:Ljava/io/PrintStream;
      #18 = Utf8               Hello world!
      #19 = Class              #26            // java/io/PrintStream
      #20 = NameAndType        #27:#28        // println:(Ljava/lang/String;)V
      #21 = Utf8               hello/App
      #22 = Utf8               java/lang/Object
      #23 = Utf8               java/lang/System
      #24 = Utf8               out
      #25 = Utf8               Ljava/io/PrintStream;
      #26 = Utf8               java/io/PrintStream
      #27 = Utf8               println
      #28 = Utf8               (Ljava/lang/String;)V
    {
      public hello.App();
        descriptor: ()V
        flags: ACC_PUBLIC
        Code:
          stack=1, locals=1, args_size=1
             0: aload_0
             1: invokespecial #1                  // Method java/lang/Object."<init>":()V
             4: return
          LineNumberTable:
            line 3: 0
      public static void main(java.lang.String[]);
        descriptor: ([Ljava/lang/String;)V
        flags: ACC_PUBLIC, ACC_STATIC
        Code:
          stack=2, locals=1, args_size=1
             0: getstatic     #2                  // Field java/lang/System.out:Ljava/io/PrintStream;
             3: ldc           #3                  // String Hello world!
             5: invokevirtual #4                  // Method java/io/PrintStream.println:(Ljava/lang/String;)V
             8: return
          LineNumberTable:
            line 6: 0
            line 7: 8
    }
    SourceFile: "App.java"
    


    And here you can see an example of code that parses a class file:

    ClassFile(InputStream in, Attribute.Factory attributeFactory) throws IOException, ConstantPoolException {
        ClassReader cr = new ClassReader(this, in, attributeFactory);
        magic = cr.readInt();
        minor_version = cr.readUnsignedShort();
        major_version = cr.readUnsignedShort();
        constant_pool = new ConstantPool(cr);
        access_flags = new AccessFlags(cr);
        this_class = cr.readUnsignedShort();
        super_class = cr.readUnsignedShort();
        int interfaces_count = cr.readUnsignedShort();
        interfaces = new int[interfaces_count];
        for (int i = 0; i < interfaces_count; i++)
            interfaces[i] = cr.readUnsignedShort();
        int fields_count = cr.readUnsignedShort();
        fields = new Field[fields_count];
        for (int i = 0; i < fields_count; i++)
            fields[i] = new Field(cr);
        int methods_count = cr.readUnsignedShort();
        methods = new Method[methods_count];
        for (int i = 0; i < methods_count; i++)
            methods[i] = new Method(cr);
        attributes = new Attributes(cr);
    }
    
