String concatenation, or patching bytecode

I recently read an article about optimizing the performance of Java code, in particular string concatenation. It left one question open: why does the code below run slower with StringBuilder than with plain addition, even though += is turned into calls to StringBuilder.append() at compile time?

I immediately wanted to get to the bottom of it.

// ~20,000,000 operations per second
public String stringAppend() {
    String s = "foo";
    s += ", bar";
    s += ", baz";
    s += ", qux";
    s += ", bar";
    s += ", bar";
    s += ", bar";
    s += ", bar";
    s += ", bar";
    s += ", bar";
    s += ", baz";
    s += ", qux";
    s += ", baz";
    s += ", qux";
    s += ", baz";
    s += ", qux";
    s += ", baz";
    s += ", qux";
    s += ", baz";
    s += ", qux";
    s += ", baz";
    s += ", qux";
    return s;
}
// ~7,000,000 operations per second
public String stringAppendBuilder() {
    StringBuilder sb = new StringBuilder();
    sb.append("foo");
    sb.append(", bar");
    sb.append(", bar");
    sb.append(", baz");
    sb.append(", qux");
    sb.append(", baz");
    sb.append(", qux");
    sb.append(", baz");
    sb.append(", qux");
    sb.append(", baz");
    sb.append(", qux");
    sb.append(", baz");
    sb.append(", qux");
    sb.append(", baz");
    sb.append(", qux");
    sb.append(", baz");
    sb.append(", qux");
    sb.append(", baz");
    sb.append(", qux");
    sb.append(", baz");
    sb.append(", qux");
    sb.append(", baz");
    sb.append(", qux");
    return sb.toString();
}
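For anyone who wants to reproduce the comparison, here is a rough timing sketch. It is not the harness that produced the figures above: the class name ConcatTiming and the helper opsPerSecond are made up for illustration, the methods are shortened to three appends, and a hand-rolled loop like this is only a crude approximation (a proper benchmark harness such as JMH would be more trustworthy).

```java
import java.util.function.Supplier;

class ConcatTiming {
    // Shortened version of stringAppend from the listing above.
    static String stringAppend() {
        String s = "foo";
        s += ", bar";
        s += ", baz";
        s += ", qux";
        return s;
    }

    // Shortened version of stringAppendBuilder from the listing above.
    static String stringAppendBuilder() {
        StringBuilder sb = new StringBuilder();
        sb.append("foo");
        sb.append(", bar");
        sb.append(", baz");
        sb.append(", qux");
        return sb.toString();
    }

    // Hypothetical helper: runs the task for about `millis` milliseconds
    // and returns the number of calls per second.
    static double opsPerSecond(Supplier<String> task, long millis) {
        long start = System.nanoTime();
        long deadline = start + millis * 1_000_000L;
        long calls = 0;
        long sink = 0; // accumulate results so the work is not discarded
        while (System.nanoTime() < deadline) {
            sink += task.get().length();
            calls++;
        }
        long elapsed = System.nanoTime() - start;
        if (sink < 0) {
            System.out.println(sink); // never true in practice; keeps `sink` live
        }
        return calls * 1e9 / elapsed;
    }

    public static void main(String[] args) {
        // Warm-up runs so both methods are JIT-compiled before measuring.
        opsPerSecond(ConcatTiming::stringAppend, 500);
        opsPerSecond(ConcatTiming::stringAppendBuilder, 500);

        System.out.printf("+=            : %.0f ops/s%n",
                opsPerSecond(ConcatTiming::stringAppend, 1000));
        System.out.printf("StringBuilder : %.0f ops/s%n",
                opsPerSecond(ConcatTiming::stringAppendBuilder, 1000));
    }
}
```

The absolute numbers will depend on the machine and the JVM version; only the ratio between the two lines is of interest.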

Back then, my reasoning boiled down to "this is inexplicable magic inside the JVM", and I gave up trying to understand what was going on. However, during a later discussion about how string-handling speed differs between platforms, my friend yegorf1 and I decided to figure out why and how exactly this magic happens.

Oracle Java SE


upd: the tests were run on Java 8.

The obvious approach is to compile the sources to bytecode and look at the result, so that is what we did. Commenters had suggested that the speedup comes from an optimization: constant strings should obviously be joined at compile time. That turned out not to be the case. Here is part of the bytecode as disassembled by javap:

public java.lang.String stringAppend();
    Code:
       0: ldc           #2  // String foo
       2: astore_1
       3: new           #3  // class java/lang/StringBuilder
       6: dup
       7: invokespecial #4  // Method java/lang/StringBuilder."<init>":()V
      10: aload_1
      11: invokevirtual #5  // Method java/lang/StringBuilder.append:(Ljava/lang/String;)Ljava/lang/StringBuilder;
      14: ldc           #6  // String , bar
      16: invokevirtual #5  // Method java/lang/StringBuilder.append:(Ljava/lang/String;)Ljava/lang/StringBuilder;
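Read back into source form, the instructions above correspond roughly to the following. This is a sketch reconstructed from the excerpt, assuming the part of the listing that is not shown ends each statement with toString() and a store back into s; the class and method names are made up for illustration. It applies to Java 8, the version used for the tests; on Java 9 and later javac emits an invokedynamic call for string concatenation instead.

```java
class PlusEqualsDesugared {
    // What the source says.
    static String withPlusEquals() {
        String s = "foo";
        s += ", bar";
        return s;
    }

    // Roughly what the Java 8 compiler generates for the same statement:
    // a fresh StringBuilder, two chained append calls, then toString().
    static String desugared() {
        String s = "foo";
        s = new StringBuilder().append(s).append(", bar").toString();
        return s;
    }

    public static void main(String[] args) {
        // Both variants build the same string.
        System.out.println(withPlusEquals().equals(desugared())); // prints "true"
    }
}
```

Note that the two append calls are chained: the second one consumes the reference returned by the first, with nothing stored or reloaded in between.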

Notice that no optimization has taken place. Strange, isn't it? Now let's look at the bytecode of the second function.

public java.lang.String stringAppendBuilder();
    Code:
       0: new           #3  // class java/lang/StringBuilder
       3: dup
       4: invokespecial #4  // Method java/lang/StringBuilder."<init>":()V
       7: astore_1
       8: aload_1
       9: ldc           #2  // String foo
      11: invokevirtual #5  // Method java/lang/StringBuilder.append:(Ljava/lang/String;)Ljava/lang/StringBuilder;
      14: pop
      15: aload_1
      16: ldc           #6  // String , bar
      18: invokevirtual #5  // Method java/lang/StringBuilder.append:(Ljava/lang/String;)Ljava/lang/StringBuilder;

Again, no optimizations. Moreover, look at the instructions at offsets 8, 14 and 15. Something odd happens there: a reference to the StringBuilder object is loaded onto the stack, the reference returned by append is then popped off the stack, and the very same reference is loaded again. The simplest fix comes to mind:

public java.lang.String stringAppendBuilder();
    Code:
       0: new           #41 // class java/lang/StringBuilder
       3: dup
       4: invokespecial #4  // Method java/lang/StringBuilder."<init>":()V
       7: astore_1
       8: aload_1
       9: ldc           #2  // String foo
      11: invokevirtual #5  // Method java/lang/StringBuilder.append:(Ljava/lang/String;)Ljava/lang/StringBuilder;
      14: ldc           #6  // String , bar
      16: invokevirtual #5  // Method java/lang/StringBuilder.append:(Ljava/lang/String;)Ljava/lang/StringBuilder;

With the redundant instructions removed, we get code that runs 1.5 times faster than the stringAppend version, in which this optimization is already in place. So the culprit behind the "magic" is an immature bytecode compiler that fails to perform fairly simple optimizations.
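The same instruction sequence can probably be obtained without patching, simply by chaining the calls in the source. The sketch below is an illustration, not the code we measured: the class and method names are made up, and the speedup was not re-measured for this variant. With chained calls javac has no reason to emit pop followed by aload_1 between appends, because each append consumes the reference returned by the previous one.

```java
class ChainedAppend {
    // One statement per append, as in the article: the compiler discards the
    // returned reference (pop) and reloads the local variable (aload_1).
    static String separateStatements() {
        StringBuilder sb = new StringBuilder();
        sb.append("foo");
        sb.append(", bar");
        sb.append(", baz");
        return sb.toString();
    }

    // Chained calls: the reference returned by each append stays on the
    // operand stack and becomes the receiver of the next call.
    static String chained() {
        StringBuilder sb = new StringBuilder();
        sb.append("foo").append(", bar").append(", baz");
        return sb.toString();
    }

    public static void main(String[] args) {
        // Both variants build the same string.
        System.out.println(separateStatements().equals(chained())); // prints "true"
    }
}
```

Compiling this class and running javap -c on it shows the difference between the two methods directly.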

Android ART


upd: the code was built against SDK 28 with the release build tools.

So it turned out that the problem lies in how the Java compiler generates bytecode for the stack-based JVM. That reminded us of ART, which is part of the Android Open Source Project. This virtual machine, or rather its compiler from bytecode to native code, was written under the conditions of Oracle's lawsuit, which gives us every reason to believe that it differs significantly from Oracle's implementation. In addition, due to the specifics of ARM processors, this virtual machine is register-based rather than stack-based.

Let's take a look at Smali (one of the bytecode representations under ART):

# virtual methods
.method public stringAppend()Ljava/lang/String;
    .registers 4
    .prologue
    .line 6
    const-string/jumbo v0, "foo"
    .line 7
    .local v0, "s":Ljava/lang/String;
    new-instance v1, Ljava/lang/StringBuilder;
    invoke-direct {v1}, Ljava/lang/StringBuilder;-><init>()V
    invoke-virtual {v1, v0}, Ljava/lang/StringBuilder;->append(Ljava/lang/String;)Ljava/lang/StringBuilder;
    move-result-object v1
    const-string/jumbo v2, ", bar"
    invoke-virtual {v1, v2}, Ljava/lang/StringBuilder;->append(Ljava/lang/String;)Ljava/lang/StringBuilder;
    move-result-object v1
//...
.method public stringAppendBuilder()Ljava/lang/String;
    .registers 3
    .prologue
    .line 13
    new-instance v0, Ljava/lang/StringBuilder;
    invoke-direct {v0}, Ljava/lang/StringBuilder;-><init>()V
    .line 14
    .local v0, "sb":Ljava/lang/StringBuilder;
    const-string/jumbo v1, "foo"
    invoke-virtual {v0, v1}, Ljava/lang/StringBuilder;->append(Ljava/lang/String;)Ljava/lang/StringBuilder;
    .line 15
    const-string/jumbo v1, ", bar"
    invoke-virtual {v0, v1}, Ljava/lang/StringBuilder;->append(Ljava/lang/String;)Ljava/lang/StringBuilder;
//...

In this variant of stringAppendBuilder the stack problem is gone: the machine is register-based, so it cannot arise in principle. That does not, however, rule out some truly magical things:

move-result-object v1

This line in stringAppend does nothing: the reference to the StringBuilder object we need is already in register v1. It is logical to assume that stringAppend is the one that will run slower. This is confirmed empirically: the result is similar to that of the "patched" program on the stack-based JVM, with StringBuilder running almost one and a half times faster.
