Analysis of performance tasks with JBreak (part 4)

    Analysis of the fourth and final task:

        public double octaPow(double a) {
            return Math.pow(a, 8);
        }
        public double octaPow(double a) {
            return a * a * a * a * a * a * a * a;
        }
        public double octaPow(double a) {
            return Math.pow(Math.pow(Math.pow(a, 2), 2), 2);
        }
        public double octaPow(double a) {
            a *= a; a *= a; return a * a;
        }

    The task (simplified):
    Determine which methods are fast and which are slow (JRE 1.8.0_161).
    Below the cut: benchmarks, assembly listings, and an analysis of the JVM's optimizations.

    Other publications in the series: Part 1, Part 2, and Part 3.

    Commentary on the task


    As you know, floating-point operations are notorious for several reasons:

    1. They are complex and implementation-dependent.
    2. They are not associative.
    3. They give counterintuitive results.
    4. Comparing their results with == makes no sense in most cases.

    It is therefore important to understand that the proposed methods can produce different results, not in terms of performance but in the arithmetic sense: the computed values themselves may differ.

    A couple of examples:
        public static void main(String[] args) {
            double value = 1e15;
            double delta = 0.0001;
            System.out.println(value + delta == value); // true
            double a = 1.010101;
            double b = 101.0101;
            double c = 10101.01;
            System.out.println((a * b) * c != a * (b * c)); // true
        }
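    Because the variants may round differently, a tolerance-based comparison is a safer way to check that two of them agree. The sketch below is only an illustration: the helper name nearlyEqual and the tolerance of 16 ULPs are assumptions made for this example, not part of the task.

```java
class ToleranceCompare {
    // Hypothetical helper: true when x and y differ by at most maxUlps units in the last place.
    static boolean nearlyEqual(double x, double y, int maxUlps) {
        if (Double.isNaN(x) || Double.isNaN(y)) {
            return false;                  // NaN never matches anything
        }
        if (Double.isInfinite(x) || Double.isInfinite(y)) {
            return x == y;                 // an infinity only matches the same infinity
        }
        double tolerance = maxUlps * Math.max(Math.ulp(x), Math.ulp(y));
        return Math.abs(x - y) <= tolerance;
    }

    public static void main(String[] args) {
        double a = 1.010101;
        double viaPow = Math.pow(a, 8);
        double viaMultiplication = a * a * a * a * a * a * a * a;

        // Exact comparison may or may not hold, depending on how each variant rounds.
        System.out.println(viaPow == viaMultiplication);
        // The tolerance-based comparison accepts small rounding differences.
        System.out.println(nearlyEqual(viaPow, viaMultiplication, 16));
    }
}
```

    The tolerance is deliberately generous: seven rounded multiplications plus the error allowed for Math.pow() can add up to a few ULPs.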


    Obviously wrong answers


    The task had four variants of the algorithm, so there are more potential answers:
    1. All the options are the same, because Java has a cool JIT compiler! /* A glorious answer */
    2. The second or the fourth option is the fastest, because it is just plain multiplication.

    Detailed analysis of the investigated methods


        public double mathOctaPow(double a) {
            return Math.pow(a, 8);
        }
        public double plainOctaPow(double a) {
            return a * a * a * a * a * a * a * a;
        }
        public double trickyMathOctaPow(double a) {
            return Math.pow(Math.pow(Math.pow(a, 2), 2), 2);
        }
        public double trickyPlainOctaPow(double a) {
            a *= a; a *= a; return a * a;
        }

    The disassembled code was printed using the following set of flags:

    -XX:+UnlockDiagnosticVMOptions
    -XX:CompileCommand=print,<class>.<method>
    -XX:PrintAssemblyOptions=intel

    plainOctaPow


    Let's start with the simplest case, plainOctaPow. The code

    a * a * a * a * a * a * a * a

    is in fact equivalent to

    ((((((a * a) * a) * a) * a) * a) * a) * a

    because multiplication is left-associative.

    The JIT compiler (C1) compiled this code into the following set of instructions (the only thing to keep in mind is that register xmm0 holds the value of the parameter double a):

      0x0000000002c96a3e: vmovapd xmm1, xmm0
      0x0000000002c96a42: vmulsd  xmm1, xmm1, xmm0
      0x0000000002c96a46: vmulsd  xmm1, xmm1, xmm0
      0x0000000002c96a4a: vmulsd  xmm1, xmm1, xmm0
      0x0000000002c96a4e: vmulsd  xmm1, xmm1, xmm0
      0x0000000002c96a52: vmulsd  xmm1, xmm1, xmm0
      0x0000000002c96a56: vmulsd  xmm1, xmm1, xmm0
      0x0000000002c96a5a: vmulsd  xmm1, xmm1, xmm0
      0x0000000002c96a5e: vmovapd xmm0, xmm1
    

    A brief guide to the instructions
    vmovapd xmm1, xmm2 - move aligned packed double-precision floating-point values (double-precision floating-point is called simply double from here on) from register xmm2 to register xmm1. Since XMM registers are 128 bits wide, up to two doubles can be moved at a time. The instruction also supports YMM and ZMM registers, which are 256 and 512 bits wide, respectively.

    vmulsd xmm1, xmm2, xmm3 - multiply the low double of register xmm2 by the low double of register xmm3 and place the result in the low double of register xmm1. Unlike the previous instruction, this one is scalar: it multiplies exactly one pair of doubles. Its packed counterpart, vmulpd, multiplies up to two doubles at once in XMM registers, and up to four and eight doubles in YMM and ZMM registers, respectively.

    The instruction sequence corresponds exactly to what is written in our code: the intermediate result is multiplied by a over and over. The compiler cannot violate left-associativity here, so it cannot optimize the resulting code any further.
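    The point about grouping can be checked directly. In the sketch below (class and method names are made up for illustration), the implicit and the explicitly parenthesized forms perform the same sequence of multiplications, while a regrouped form is mathematically the same but rounds at different points, so its result is allowed to differ.

```java
class AssociativityDemo {
    static double plain(double a) {
        return a * a * a * a * a * a * a * a;
    }

    // The same expression with the left-to-right grouping written out.
    static double explicitLeft(double a) {
        return ((((((a * a) * a) * a) * a) * a) * a) * a;
    }

    // A different grouping: same mathematical value, different sequence of roundings.
    static double regrouped(double a) {
        return ((a * a) * (a * a)) * ((a * a) * (a * a));
    }

    public static void main(String[] args) {
        double[] samples = {1.010101, 0.1, 3.7};
        for (double a : samples) {
            System.out.println("a = " + a);
            System.out.println("  plain == explicitLeft: " + (plain(a) == explicitLeft(a)));
            System.out.println("  plain == regrouped:    " + (plain(a) == regrouped(a)));
        }
    }
}
```

    Whether the last comparison prints true depends on the input, which is exactly why the compiler is not free to regroup the multiplications on its own.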

    trickyPlainOctaPow


    Recall that obtaining a bit-for-bit equivalent result is not a requirement here. We can therefore try to optimize the code ourselves by reducing the number of operations, for example by replacing the consecutive multiplications with 3 squaring operations.

    The code of the trickyPlainOctaPow() method compiles into the following set of instructions:

    0x0000000002b501be: vmovapd xmm1, xmm0
    0x0000000002b501c2: vmulsd  xmm1, xmm1, xmm0
    0x0000000002b501c6: vmovapd xmm0, xmm1
    0x0000000002b501ca: vmulsd  xmm0, xmm0, xmm1
    0x0000000002b501ce: vmovapd xmm1, xmm0
    0x0000000002b501d2: vmulsd  xmm1, xmm1, xmm0
    0x0000000002b501d6: vmovapd xmm0, xmm1

    As you can see, the total number of operations has decreased: instead of 7 multiplications we get 3 multiplications plus 2 extra vmovapd instructions that prepare the second operand of each multiplication. The resulting code is roughly twice as fast if you count in terms of the latency of each instruction.
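    The squaring trick is a special case of exponentiation by squaring, which works for any non-negative integer exponent. The following is a minimal sketch, not code from the task: the name powBySquaring and its signature are assumptions made for illustration.

```java
class SquaringSketch {
    // Hypothetical generalization: base raised to a non-negative int exponent.
    static double powBySquaring(double base, int exponent) {
        if (exponent < 0) {
            throw new IllegalArgumentException("exponent must be non-negative");
        }
        double result = 1.0;
        double power = base;               // base^(2^k) after k squarings
        while (exponent > 0) {
            if ((exponent & 1) == 1) {
                result *= power;           // this bit of the exponent is set
            }
            exponent >>= 1;
            if (exponent > 0) {
                power *= power;            // square only if more bits remain
            }
        }
        return result;
    }

    // The method from the task, copied for comparison.
    static double trickyPlainOctaPow(double a) {
        a *= a; a *= a; return a * a;
    }

    public static void main(String[] args) {
        double a = 1.010101;
        System.out.println(powBySquaring(a, 8));
        System.out.println(trickyPlainOctaPow(a));
    }
}
```

    For an exponent of 8 the loop performs the same three squarings as trickyPlainOctaPow(), followed by one multiplication by 1.0, which does not change the value.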

    mathOctaPow


    Let's look inside the implementation of the method Math.pow():

        public static double pow(double a, double b) {
            return StrictMath.pow(a, b);
        }

    The first thing to note is that the exponent, passed as the second argument, is of type double. For this reason the implementation can no longer be as simple as in the case of ordinary multiplication.
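    To see what a double exponent implies, here are a few calls that plain repeated multiplication could never express. This is a small illustration that relies only on behavior documented for Math.pow(); the class name is made up.

```java
class PowExponentDemo {
    public static void main(String[] args) {
        System.out.println(Math.pow(2.0, 8.0));        // integer-valued exponent
        System.out.println(Math.pow(2.0, -1.0));       // negative exponent
        System.out.println(Math.pow(2.0, 0.5));        // fractional exponent
        System.out.println(Math.pow(-8.0, 1.0 / 3.0)); // negative base, non-integer exponent: NaN
        System.out.println(Math.pow(5.0, Double.NaN)); // NaN exponent: NaN
        System.out.println(Math.pow(Double.NaN, 0.0)); // zero exponent: 1.0, even for a NaN base
    }
}
```

    Each of these cases needs its own handling inside the implementation, on top of the general computation for arbitrary real exponents.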

    StrictMath.pow(), in turn, is a native method:

        public static native double pow(double a, double b);

    In practice this would mean that a call to Math.pow() comes down to invoking a native method through JNI, which, as you know, is expensive. The JDK, however, makes extensive use of intrinsics (see the full list of intrinsics in HotSpot). Among them is _dpow, the intrinsic that replaces the call to Math.pow().

    This means that after warm-up, once the code has been compiled by the JIT compiler, we get the following code for computing the power in the mathOctaPow() method:

    Assembly code of the mathOctaPow() method
      0x0000000002aaacd0: vmovsd xmm1,QWORD PTR [rip+0xffffffffffffff68]        # 0x0000000002aaac40
                                                    ;   {section_word}
      0x0000000002aaacd8: vmovsd QWORD PTR [rsp],xmm1
      0x0000000002aaacdd: fld    QWORD PTR [rsp]
      0x0000000002aaace0: vmovsd QWORD PTR [rsp],xmm0
      0x0000000002aaace5: fld    QWORD PTR [rsp]
      0x0000000002aaace8: movabs rax,0x6c4ba7d0     ;   {external_word}
      0x0000000002aaacf2: fld    QWORD PTR [rax]
      0x0000000002aaacf4: fucomip st,st(2)
      0x0000000002aaacf6: jp     0x0000000002aaad0f
      0x0000000002aaacfc: jne    0x0000000002aaad0f
      0x0000000002aaad02: fxch   st(1)
      0x0000000002aaad04: ffree  st(0)
      0x0000000002aaad06: fincstp 
      0x0000000002aaad08: fmul   st,st(0)
      0x0000000002aaad0a: jmp    0x0000000002aab166
      0x0000000002aaad0f: fldz   
      0x0000000002aaad11: fucomip st,st(1)
      0x0000000002aaad13: ja     0x0000000002aaad96
      0x0000000002aaad19: fld    st(1)
      0x0000000002aaad1b: fld    st(1)
      0x0000000002aaad1d: sub    rsp,0x8
      0x0000000002aaad21: fstcw  WORD PTR [rsp]
      0x0000000002aaad25: mov    eax,DWORD PTR [rsp]
      0x0000000002aaad28: or     eax,0x300
      0x0000000002aaad2e: push   rax
      0x0000000002aaad2f: fldcw  WORD PTR [rsp]
      0x0000000002aaad32: pop    rax
      0x0000000002aaad33: fyl2x  
      0x0000000002aaad35: sub    rsp,0x8
      0x0000000002aaad39: fld    st(0)
      0x0000000002aaad3b: frndint 
      0x0000000002aaad3d: fsubr  st(1),st
      0x0000000002aaad3f: fistp  DWORD PTR [rsp]
      0x0000000002aaad42: f2xm1  
      0x0000000002aaad44: fld1   
      0x0000000002aaad46: faddp  st(1),st
      0x0000000002aaad48: mov    eax,DWORD PTR [rsp]
      0x0000000002aaad4b: mov    ecx,0xfffff800
      0x0000000002aaad50: add    eax,0x3ff
      0x0000000002aaad56: mov    edx,eax
      0x0000000002aaad58: shl    eax,0x14
      0x0000000002aaad5b: add    edx,0x1
      0x0000000002aaad5e: cmove  eax,ecx
      0x0000000002aaad61: cmp    edx,0x1
      0x0000000002aaad64: cmove  eax,ecx
      0x0000000002aaad67: test   ecx,edx
      0x0000000002aaad69: cmovne eax,ecx
      0x0000000002aaad6c: mov    DWORD PTR [rsp+0x4],eax
      0x0000000002aaad70: mov    DWORD PTR [rsp],0x0
      0x0000000002aaad77: fmul   QWORD PTR [rsp]
      0x0000000002aaad7a: add    rsp,0x8
      0x0000000002aaad7e: fldcw  WORD PTR [rsp]
      0x0000000002aaad81: add    rsp,0x8
      0x0000000002aaad85: fucomi st,st(0)
      0x0000000002aaad87: jp     0x0000000002aaae36
      0x0000000002aaad8d: ffree  st(2)
      0x0000000002aaad8f: ffree  st(1)
      0x0000000002aaad91: jmp    0x0000000002aab166
      0x0000000002aaad96: fld    st(1)
      0x0000000002aaad98: frndint 
      0x0000000002aaad9a: fucomi st,st(2)
      0x0000000002aaad9c: jne    0x0000000002aaae36
      0x0000000002aaada2: sub    rsp,0x8
      0x0000000002aaada6: fistp  QWORD PTR [rsp]
      0x0000000002aaada9: fld    st(1)
      0x0000000002aaadab: fld    st(1)
      0x0000000002aaadad: fabs   
      0x0000000002aaadaf: sub    rsp,0x8
      0x0000000002aaadb3: fstcw  WORD PTR [rsp]
      0x0000000002aaadb7: mov    eax,DWORD PTR [rsp]
      0x0000000002aaadba: or     eax,0x300
      0x0000000002aaadc0: push   rax
      0x0000000002aaadc1: fldcw  WORD PTR [rsp]
      0x0000000002aaadc4: pop    rax
      0x0000000002aaadc5: fyl2x  
      0x0000000002aaadc7: sub    rsp,0x8
      0x0000000002aaadcb: fld    st(0)
      0x0000000002aaadcd: frndint 
      0x0000000002aaadcf: fsubr  st(1),st
      0x0000000002aaadd1: fistp  DWORD PTR [rsp]
      0x0000000002aaadd4: f2xm1  
      0x0000000002aaadd6: fld1   
      0x0000000002aaadd8: faddp  st(1),st
      0x0000000002aaadda: mov    eax,DWORD PTR [rsp]
      0x0000000002aaaddd: mov    ecx,0xfffff800
      0x0000000002aaade2: add    eax,0x3ff
      0x0000000002aaade8: mov    edx,eax
      0x0000000002aaadea: shl    eax,0x14
      0x0000000002aaaded: add    edx,0x1
      0x0000000002aaadf0: cmove  eax,ecx
      0x0000000002aaadf3: cmp    edx,0x1
      0x0000000002aaadf6: cmove  eax,ecx
      0x0000000002aaadf9: test   ecx,edx
      0x0000000002aaadfb: cmovne eax,ecx
      0x0000000002aaadfe: mov    DWORD PTR [rsp+0x4],eax
      0x0000000002aaae02: mov    DWORD PTR [rsp],0x0
      0x0000000002aaae09: fmul   QWORD PTR [rsp]
      0x0000000002aaae0c: add    rsp,0x8
      0x0000000002aaae10: fldcw  WORD PTR [rsp]
      0x0000000002aaae13: add    rsp,0x8
      0x0000000002aaae17: fucomi st,st(0)
      0x0000000002aaae19: pop    rax
      0x0000000002aaae1a: jp     0x0000000002aaae36
      0x0000000002aaae20: ffree  st(2)
      0x0000000002aaae22: ffree  st(1)
      0x0000000002aaae24: test   eax,0x1
      0x0000000002aaae29: je     0x0000000002aab166
      0x0000000002aaae2f: fchs   
      0x0000000002aaae31: jmp    0x0000000002aab166
      0x0000000002aaae36: ffree  st(0)
      0x0000000002aaae38: fincstp 
      0x0000000002aaae3a: mov    QWORD PTR [rsp-0x28],rsp
      0x0000000002aaae3f: sub    rsp,0x80
      0x0000000002aaae46: mov    QWORD PTR [rsp+0x78],rax
      0x0000000002aaae4b: mov    QWORD PTR [rsp+0x70],rcx
      0x0000000002aaae50: mov    QWORD PTR [rsp+0x68],rdx
      0x0000000002aaae55: mov    QWORD PTR [rsp+0x60],rbx
      0x0000000002aaae5a: mov    QWORD PTR [rsp+0x50],rbp
      0x0000000002aaae5f: mov    QWORD PTR [rsp+0x48],rsi
      0x0000000002aaae64: mov    QWORD PTR [rsp+0x40],rdi
      0x0000000002aaae69: mov    QWORD PTR [rsp+0x38],r8
      0x0000000002aaae6e: mov    QWORD PTR [rsp+0x30],r9
      0x0000000002aaae73: mov    QWORD PTR [rsp+0x28],r10
      0x0000000002aaae78: mov    QWORD PTR [rsp+0x20],r11
      0x0000000002aaae7d: mov    QWORD PTR [rsp+0x18],r12
      0x0000000002aaae82: mov    QWORD PTR [rsp+0x10],r13
      0x0000000002aaae87: mov    QWORD PTR [rsp+0x8],r14
      0x0000000002aaae8c: mov    QWORD PTR [rsp],r15
      0x0000000002aaae90: sub    rsp,0x100
      0x0000000002aaae97: vextractf128 XMMWORD PTR [rsp],ymm0,0x1
      0x0000000002aaae9e: vextractf128 XMMWORD PTR [rsp+0x10],ymm1,0x1
      0x0000000002aaaea6: vextractf128 XMMWORD PTR [rsp+0x20],ymm2,0x1
      0x0000000002aaaeae: vextractf128 XMMWORD PTR [rsp+0x30],ymm3,0x1
      0x0000000002aaaeb6: vextractf128 XMMWORD PTR [rsp+0x40],ymm4,0x1
      0x0000000002aaaebe: vextractf128 XMMWORD PTR [rsp+0x50],ymm5,0x1
      0x0000000002aaaec6: vextractf128 XMMWORD PTR [rsp+0x60],ymm6,0x1
      0x0000000002aaaece: vextractf128 XMMWORD PTR [rsp+0x70],ymm7,0x1
      0x0000000002aaaed6: vextractf128 XMMWORD PTR [rsp+0x80],ymm8,0x1
      0x0000000002aaaee1: vextractf128 XMMWORD PTR [rsp+0x90],ymm9,0x1
      0x0000000002aaaeec: vextractf128 XMMWORD PTR [rsp+0xa0],ymm10,0x1
      0x0000000002aaaef7: vextractf128 XMMWORD PTR [rsp+0xb0],ymm11,0x1
      0x0000000002aaaf02: vextractf128 XMMWORD PTR [rsp+0xc0],ymm12,0x1
      0x0000000002aaaf0d: vextractf128 XMMWORD PTR [rsp+0xd0],ymm13,0x1
      0x0000000002aaaf18: vextractf128 XMMWORD PTR [rsp+0xe0],ymm14,0x1
      0x0000000002aaaf23: vextractf128 XMMWORD PTR [rsp+0xf0],ymm15,0x1
      0x0000000002aaaf2e: sub    rsp,0x100
      0x0000000002aaaf35: vmovdqu XMMWORD PTR [rsp],xmm0
      0x0000000002aaaf3a: vmovdqu XMMWORD PTR [rsp+0x10],xmm1
      0x0000000002aaaf40: vmovdqu XMMWORD PTR [rsp+0x20],xmm2
      0x0000000002aaaf46: vmovdqu XMMWORD PTR [rsp+0x30],xmm3
      0x0000000002aaaf4c: vmovdqu XMMWORD PTR [rsp+0x40],xmm4
      0x0000000002aaaf52: vmovdqu XMMWORD PTR [rsp+0x50],xmm5
      0x0000000002aaaf58: vmovdqu XMMWORD PTR [rsp+0x60],xmm6
      0x0000000002aaaf5e: vmovdqu XMMWORD PTR [rsp+0x70],xmm7
      0x0000000002aaaf64: vmovdqu XMMWORD PTR [rsp+0x80],xmm8
      0x0000000002aaaf6d: vmovdqu XMMWORD PTR [rsp+0x90],xmm9
      0x0000000002aaaf76: vmovdqu XMMWORD PTR [rsp+0xa0],xmm10
      0x0000000002aaaf7f: vmovdqu XMMWORD PTR [rsp+0xb0],xmm11
      0x0000000002aaaf88: vmovdqu XMMWORD PTR [rsp+0xc0],xmm12
      0x0000000002aaaf91: vmovdqu XMMWORD PTR [rsp+0xd0],xmm13
      0x0000000002aaaf9a: vmovdqu XMMWORD PTR [rsp+0xe0],xmm14
      0x0000000002aaafa3: vmovdqu XMMWORD PTR [rsp+0xf0],xmm15
      0x0000000002aaafac: sub    rsp,0x10
      0x0000000002aaafb0: fstp   QWORD PTR [rsp]
      0x0000000002aaafb3: fstp   QWORD PTR [rsp+0x8]
      0x0000000002aaafb7: vmovsd xmm0,QWORD PTR [rsp]
      0x0000000002aaafbc: vmovsd xmm1,QWORD PTR [rsp+0x8]
      0x0000000002aaafc2: sub    rsp,0x20
      0x0000000002aaafc6: test   esp,0xf
      0x0000000002aaafcc: je     0x0000000002aaafe4
      0x0000000002aaafd2: sub    rsp,0x8
      0x0000000002aaafd6: call   0x000000006bf240d0  ;   {runtime_call}
      0x0000000002aaafdb: add    rsp,0x8
      0x0000000002aaafdf: jmp    0x0000000002aaafe9
      0x0000000002aaafe4: call   0x000000006bf240d0  ;   {runtime_call}
      0x0000000002aaafe9: add    rsp,0x20
      0x0000000002aaafed: vmovsd QWORD PTR [rsp],xmm0
      0x0000000002aaaff2: fld    QWORD PTR [rsp]
      0x0000000002aaaff5: add    rsp,0x10
      0x0000000002aaaff9: vmovdqu xmm0,XMMWORD PTR [rsp]
      0x0000000002aaaffe: vmovdqu xmm1,XMMWORD PTR [rsp+0x10]
      0x0000000002aab004: vmovdqu xmm2,XMMWORD PTR [rsp+0x20]
      0x0000000002aab00a: vmovdqu xmm3,XMMWORD PTR [rsp+0x30]
      0x0000000002aab010: vmovdqu xmm4,XMMWORD PTR [rsp+0x40]
      0x0000000002aab016: vmovdqu xmm5,XMMWORD PTR [rsp+0x50]
      0x0000000002aab01c: vmovdqu xmm6,XMMWORD PTR [rsp+0x60]
      0x0000000002aab022: vmovdqu xmm7,XMMWORD PTR [rsp+0x70]
      0x0000000002aab028: vmovdqu xmm8,XMMWORD PTR [rsp+0x80]
      0x0000000002aab031: vmovdqu xmm9,XMMWORD PTR [rsp+0x90]
      0x0000000002aab03a: vmovdqu xmm10,XMMWORD PTR [rsp+0xa0]
      0x0000000002aab043: vmovdqu xmm11,XMMWORD PTR [rsp+0xb0]
      0x0000000002aab04c: vmovdqu xmm12,XMMWORD PTR [rsp+0xc0]
      0x0000000002aab055: vmovdqu xmm13,XMMWORD PTR [rsp+0xd0]
      0x0000000002aab05e: vmovdqu xmm14,XMMWORD PTR [rsp+0xe0]
      0x0000000002aab067: vmovdqu xmm15,XMMWORD PTR [rsp+0xf0]
      0x0000000002aab070: add    rsp,0x100
      0x0000000002aab077: vinsertf128 ymm0,ymm0,XMMWORD PTR [rsp],0x1
      0x0000000002aab07e: vinsertf128 ymm1,ymm1,XMMWORD PTR [rsp+0x10],0x1
      0x0000000002aab086: vinsertf128 ymm2,ymm2,XMMWORD PTR [rsp+0x20],0x1
      0x0000000002aab08e: vinsertf128 ymm3,ymm3,XMMWORD PTR [rsp+0x30],0x1
      0x0000000002aab096: vinsertf128 ymm4,ymm4,XMMWORD PTR [rsp+0x40],0x1
      0x0000000002aab09e: vinsertf128 ymm5,ymm5,XMMWORD PTR [rsp+0x50],0x1
      0x0000000002aab0a6: vinsertf128 ymm6,ymm6,XMMWORD PTR [rsp+0x60],0x1
      0x0000000002aab0ae: vinsertf128 ymm7,ymm7,XMMWORD PTR [rsp+0x70],0x1
      0x0000000002aab0b6: vinsertf128 ymm8,ymm8,XMMWORD PTR [rsp+0x80],0x1
      0x0000000002aab0c1: vinsertf128 ymm9,ymm9,XMMWORD PTR [rsp+0x90],0x1
      0x0000000002aab0cc: vinsertf128 ymm10,ymm10,XMMWORD PTR [rsp+0xa0],0x1
      0x0000000002aab0d7: vinsertf128 ymm11,ymm11,XMMWORD PTR [rsp+0xb0],0x1
      0x0000000002aab0e2: vinsertf128 ymm12,ymm12,XMMWORD PTR [rsp+0xc0],0x1
      0x0000000002aab0ed: vinsertf128 ymm13,ymm13,XMMWORD PTR [rsp+0xd0],0x1
      0x0000000002aab0f8: vinsertf128 ymm14,ymm14,XMMWORD PTR [rsp+0xe0],0x1
      0x0000000002aab103: vinsertf128 ymm15,ymm15,XMMWORD PTR [rsp+0xf0],0x1
      0x0000000002aab10e: add    rsp,0x100
      0x0000000002aab115: mov    r15,QWORD PTR [rsp]
      0x0000000002aab119: mov    r14,QWORD PTR [rsp+0x8]
      0x0000000002aab11e: mov    r13,QWORD PTR [rsp+0x10]
      0x0000000002aab123: mov    r12,QWORD PTR [rsp+0x18]
      0x0000000002aab128: mov    r11,QWORD PTR [rsp+0x20]
      0x0000000002aab12d: mov    r10,QWORD PTR [rsp+0x28]
      0x0000000002aab132: mov    r9,QWORD PTR [rsp+0x30]
      0x0000000002aab137: mov    r8,QWORD PTR [rsp+0x38]
      0x0000000002aab13c: mov    rdi,QWORD PTR [rsp+0x40]
      0x0000000002aab141: mov    rsi,QWORD PTR [rsp+0x48]
      0x0000000002aab146: mov    rbp,QWORD PTR [rsp+0x50]
      0x0000000002aab14b: mov    rbx,QWORD PTR [rsp+0x60]
      0x0000000002aab150: mov    rdx,QWORD PTR [rsp+0x68]
      0x0000000002aab155: mov    rcx,QWORD PTR [rsp+0x70]
      0x0000000002aab15a: mov    rax,QWORD PTR [rsp+0x78]
      0x0000000002aab15f: add    rsp,0x80
      0x0000000002aab166: fstp   QWORD PTR [rsp]
      0x0000000002aab169: vmovsd xmm0,QWORD PTR [rsp]  ;*invokestatic pow
                                                    ; - ru.gnkoshelev.jbreak2018.perf_tests.pow.MathBenchmark::mathOctaPow@4 (line 55)

    Here the first instruction loads the constant 8.0 into register xmm1, while the value of a is already in register xmm0. What follows is the body of the intrinsic.

    trickyMathOctaPow


    Great: instead of one expensive Math.pow() call, we now have three. The JIT compiler replaced the body of trickyMathOctaPow() with three consecutive copies of the _dpow implementation.

    Sequentially inlined _dpow
      0x0000000002a70b14: vmovsd xmm1,QWORD PTR [rip+0xffffffffffffff44]        # 0x0000000002a70a60
                                                    ;   {section_word}
      0x0000000002a70b1c: vmovsd QWORD PTR [rsp],xmm1
      0x0000000002a70b21: fld    QWORD PTR [rsp]
      0x0000000002a70b24: vmovsd QWORD PTR [rsp],xmm0
      0x0000000002a70b29: fld    QWORD PTR [rsp]
      0x0000000002a70b2c: movabs rax,0x6c4ba7d0     ;   {external_word}
      0x0000000002a70b36: fld    QWORD PTR [rax]
      0x0000000002a70b38: fucomip st,st(2)
      0x0000000002a70b3a: jp     0x0000000002a70b53
      0x0000000002a70b40: jne    0x0000000002a70b53
      0x0000000002a70b46: fxch   st(1)
      0x0000000002a70b48: ffree  st(0)
      0x0000000002a70b4a: fincstp 
      0x0000000002a70b4c: fmul   st,st(0)
      0x0000000002a70b4e: jmp    0x0000000002a70faa
      0x0000000002a70b53: fldz   
      0x0000000002a70b55: fucomip st,st(1)
      0x0000000002a70b57: ja     0x0000000002a70bda
      0x0000000002a70b5d: fld    st(1)
      0x0000000002a70b5f: fld    st(1)
      0x0000000002a70b61: sub    rsp,0x8
      0x0000000002a70b65: fstcw  WORD PTR [rsp]
      0x0000000002a70b69: mov    eax,DWORD PTR [rsp]
      0x0000000002a70b6c: or     eax,0x300
      0x0000000002a70b72: push   rax
      0x0000000002a70b73: fldcw  WORD PTR [rsp]
      0x0000000002a70b76: pop    rax
      0x0000000002a70b77: fyl2x  
      0x0000000002a70b79: sub    rsp,0x8
      0x0000000002a70b7d: fld    st(0)
      0x0000000002a70b7f: frndint 
      0x0000000002a70b81: fsubr  st(1),st
      0x0000000002a70b83: fistp  DWORD PTR [rsp]
      0x0000000002a70b86: f2xm1  
      0x0000000002a70b88: fld1   
      0x0000000002a70b8a: faddp  st(1),st
      0x0000000002a70b8c: mov    eax,DWORD PTR [rsp]
      0x0000000002a70b8f: mov    ecx,0xfffff800
      0x0000000002a70b94: add    eax,0x3ff
      0x0000000002a70b9a: mov    edx,eax
      0x0000000002a70b9c: shl    eax,0x14
      0x0000000002a70b9f: add    edx,0x1
      0x0000000002a70ba2: cmove  eax,ecx
      0x0000000002a70ba5: cmp    edx,0x1
      0x0000000002a70ba8: cmove  eax,ecx
      0x0000000002a70bab: test   ecx,edx
      0x0000000002a70bad: cmovne eax,ecx
      0x0000000002a70bb0: mov    DWORD PTR [rsp+0x4],eax
      0x0000000002a70bb4: mov    DWORD PTR [rsp],0x0
      0x0000000002a70bbb: fmul   QWORD PTR [rsp]
      0x0000000002a70bbe: add    rsp,0x8
      0x0000000002a70bc2: fldcw  WORD PTR [rsp]
      0x0000000002a70bc5: add    rsp,0x8
      0x0000000002a70bc9: fucomi st,st(0)
      0x0000000002a70bcb: jp     0x0000000002a70c7a
      0x0000000002a70bd1: ffree  st(2)
      0x0000000002a70bd3: ffree  st(1)
      0x0000000002a70bd5: jmp    0x0000000002a70faa
      0x0000000002a70bda: fld    st(1)
      0x0000000002a70bdc: frndint 
      0x0000000002a70bde: fucomi st,st(2)
      0x0000000002a70be0: jne    0x0000000002a70c7a
      0x0000000002a70be6: sub    rsp,0x8
      0x0000000002a70bea: fistp  QWORD PTR [rsp]
      0x0000000002a70bed: fld    st(1)
      0x0000000002a70bef: fld    st(1)
      0x0000000002a70bf1: fabs   
      0x0000000002a70bf3: sub    rsp,0x8
      0x0000000002a70bf7: fstcw  WORD PTR [rsp]
      0x0000000002a70bfb: mov    eax,DWORD PTR [rsp]
      0x0000000002a70bfe: or     eax,0x300
      0x0000000002a70c04: push   rax
      0x0000000002a70c05: fldcw  WORD PTR [rsp]
      0x0000000002a70c08: pop    rax
      0x0000000002a70c09: fyl2x  
      0x0000000002a70c0b: sub    rsp,0x8
      0x0000000002a70c0f: fld    st(0)
      0x0000000002a70c11: frndint 
      0x0000000002a70c13: fsubr  st(1),st
      0x0000000002a70c15: fistp  DWORD PTR [rsp]
      0x0000000002a70c18: f2xm1  
      0x0000000002a70c1a: fld1   
      0x0000000002a70c1c: faddp  st(1),st
      0x0000000002a70c1e: mov    eax,DWORD PTR [rsp]
      0x0000000002a70c21: mov    ecx,0xfffff800
      0x0000000002a70c26: add    eax,0x3ff
      0x0000000002a70c2c: mov    edx,eax
      0x0000000002a70c2e: shl    eax,0x14
      0x0000000002a70c31: add    edx,0x1
      0x0000000002a70c34: cmove  eax,ecx
      0x0000000002a70c37: cmp    edx,0x1
      0x0000000002a70c3a: cmove  eax,ecx
      0x0000000002a70c3d: test   ecx,edx
      0x0000000002a70c3f: cmovne eax,ecx
      0x0000000002a70c42: mov    DWORD PTR [rsp+0x4],eax
      0x0000000002a70c46: mov    DWORD PTR [rsp],0x0
      0x0000000002a70c4d: fmul   QWORD PTR [rsp]
      0x0000000002a70c50: add    rsp,0x8
      0x0000000002a70c54: fldcw  WORD PTR [rsp]
      0x0000000002a70c57: add    rsp,0x8
      0x0000000002a70c5b: fucomi st,st(0)
      0x0000000002a70c5d: pop    rax
      0x0000000002a70c5e: jp     0x0000000002a70c7a
      0x0000000002a70c64: ffree  st(2)
      0x0000000002a70c66: ffree  st(1)
      0x0000000002a70c68: test   eax,0x1
      0x0000000002a70c6d: je     0x0000000002a70faa
      0x0000000002a70c73: fchs   
      0x0000000002a70c75: jmp    0x0000000002a70faa
      0x0000000002a70c7a: ffree  st(0)
      0x0000000002a70c7c: fincstp 
      0x0000000002a70c7e: mov    QWORD PTR [rsp-0x28],rsp
      0x0000000002a70c83: sub    rsp,0x80
      0x0000000002a70c8a: mov    QWORD PTR [rsp+0x78],rax
      0x0000000002a70c8f: mov    QWORD PTR [rsp+0x70],rcx
      0x0000000002a70c94: mov    QWORD PTR [rsp+0x68],rdx
      0x0000000002a70c99: mov    QWORD PTR [rsp+0x60],rbx
      0x0000000002a70c9e: mov    QWORD PTR [rsp+0x50],rbp
      0x0000000002a70ca3: mov    QWORD PTR [rsp+0x48],rsi
      0x0000000002a70ca8: mov    QWORD PTR [rsp+0x40],rdi
      0x0000000002a70cad: mov    QWORD PTR [rsp+0x38],r8
      0x0000000002a70cb2: mov    QWORD PTR [rsp+0x30],r9
      0x0000000002a70cb7: mov    QWORD PTR [rsp+0x28],r10
      0x0000000002a70cbc: mov    QWORD PTR [rsp+0x20],r11
      0x0000000002a70cc1: mov    QWORD PTR [rsp+0x18],r12
      0x0000000002a70cc6: mov    QWORD PTR [rsp+0x10],r13
      0x0000000002a70ccb: mov    QWORD PTR [rsp+0x8],r14
      0x0000000002a70cd0: mov    QWORD PTR [rsp],r15
      0x0000000002a70cd4: sub    rsp,0x100
      0x0000000002a70cdb: vextractf128 XMMWORD PTR [rsp],ymm0,0x1
      0x0000000002a70ce2: vextractf128 XMMWORD PTR [rsp+0x10],ymm1,0x1
      0x0000000002a70cea: vextractf128 XMMWORD PTR [rsp+0x20],ymm2,0x1
      0x0000000002a70cf2: vextractf128 XMMWORD PTR [rsp+0x30],ymm3,0x1
      0x0000000002a70cfa: vextractf128 XMMWORD PTR [rsp+0x40],ymm4,0x1
      0x0000000002a70d02: vextractf128 XMMWORD PTR [rsp+0x50],ymm5,0x1
      0x0000000002a70d0a: vextractf128 XMMWORD PTR [rsp+0x60],ymm6,0x1
      0x0000000002a70d12: vextractf128 XMMWORD PTR [rsp+0x70],ymm7,0x1
      0x0000000002a70d1a: vextractf128 XMMWORD PTR [rsp+0x80],ymm8,0x1
      0x0000000002a70d25: vextractf128 XMMWORD PTR [rsp+0x90],ymm9,0x1
      0x0000000002a70d30: vextractf128 XMMWORD PTR [rsp+0xa0],ymm10,0x1
      0x0000000002a70d3b: vextractf128 XMMWORD PTR [rsp+0xb0],ymm11,0x1
      0x0000000002a70d46: vextractf128 XMMWORD PTR [rsp+0xc0],ymm12,0x1
      0x0000000002a70d51: vextractf128 XMMWORD PTR [rsp+0xd0],ymm13,0x1
      0x0000000002a70d5c: vextractf128 XMMWORD PTR [rsp+0xe0],ymm14,0x1
      0x0000000002a70d67: vextractf128 XMMWORD PTR [rsp+0xf0],ymm15,0x1
      0x0000000002a70d72: sub    rsp,0x100
      0x0000000002a70d79: vmovdqu XMMWORD PTR [rsp],xmm0
      0x0000000002a70d7e: vmovdqu XMMWORD PTR [rsp+0x10],xmm1
      0x0000000002a70d84: vmovdqu XMMWORD PTR [rsp+0x20],xmm2
      0x0000000002a70d8a: vmovdqu XMMWORD PTR [rsp+0x30],xmm3
      0x0000000002a70d90: vmovdqu XMMWORD PTR [rsp+0x40],xmm4
      0x0000000002a70d96: vmovdqu XMMWORD PTR [rsp+0x50],xmm5
      0x0000000002a70d9c: vmovdqu XMMWORD PTR [rsp+0x60],xmm6
      0x0000000002a70da2: vmovdqu XMMWORD PTR [rsp+0x70],xmm7
      0x0000000002a70da8: vmovdqu XMMWORD PTR [rsp+0x80],xmm8
      0x0000000002a70db1: vmovdqu XMMWORD PTR [rsp+0x90],xmm9
      0x0000000002a70dba: vmovdqu XMMWORD PTR [rsp+0xa0],xmm10
      0x0000000002a70dc3: vmovdqu XMMWORD PTR [rsp+0xb0],xmm11
      0x0000000002a70dcc: vmovdqu XMMWORD PTR [rsp+0xc0],xmm12
      0x0000000002a70dd5: vmovdqu XMMWORD PTR [rsp+0xd0],xmm13
      0x0000000002a70dde: vmovdqu XMMWORD PTR [rsp+0xe0],xmm14
      0x0000000002a70de7: vmovdqu XMMWORD PTR [rsp+0xf0],xmm15
      0x0000000002a70df0: sub    rsp,0x10
      0x0000000002a70df4: fstp   QWORD PTR [rsp]
      0x0000000002a70df7: fstp   QWORD PTR [rsp+0x8]
      0x0000000002a70dfb: vmovsd xmm0,QWORD PTR [rsp]
      0x0000000002a70e00: vmovsd xmm1,QWORD PTR [rsp+0x8]
      0x0000000002a70e06: sub    rsp,0x20
      0x0000000002a70e0a: test   esp,0xf
      0x0000000002a70e10: je     0x0000000002a70e28
      0x0000000002a70e16: sub    rsp,0x8
      0x0000000002a70e1a: call   0x000000006bf240d0  ;   {runtime_call}
      0x0000000002a70e1f: add    rsp,0x8
      0x0000000002a70e23: jmp    0x0000000002a70e2d
      0x0000000002a70e28: call   0x000000006bf240d0  ;   {runtime_call}
      0x0000000002a70e2d: add    rsp,0x20
      0x0000000002a70e31: vmovsd QWORD PTR [rsp],xmm0
      0x0000000002a70e36: fld    QWORD PTR [rsp]
      0x0000000002a70e39: add    rsp,0x10
      0x0000000002a70e3d: vmovdqu xmm0,XMMWORD PTR [rsp]
      0x0000000002a70e42: vmovdqu xmm1,XMMWORD PTR [rsp+0x10]
      0x0000000002a70e48: vmovdqu xmm2,XMMWORD PTR [rsp+0x20]
      0x0000000002a70e4e: vmovdqu xmm3,XMMWORD PTR [rsp+0x30]
      0x0000000002a70e54: vmovdqu xmm4,XMMWORD PTR [rsp+0x40]
      0x0000000002a70e5a: vmovdqu xmm5,XMMWORD PTR [rsp+0x50]
      0x0000000002a70e60: vmovdqu xmm6,XMMWORD PTR [rsp+0x60]
      0x0000000002a70e66: vmovdqu xmm7,XMMWORD PTR [rsp+0x70]
      0x0000000002a70e6c: vmovdqu xmm8,XMMWORD PTR [rsp+0x80]
      0x0000000002a70e75: vmovdqu xmm9,XMMWORD PTR [rsp+0x90]
      0x0000000002a70e7e: vmovdqu xmm10,XMMWORD PTR [rsp+0xa0]
      0x0000000002a70e87: vmovdqu xmm11,XMMWORD PTR [rsp+0xb0]
      0x0000000002a70e90: vmovdqu xmm12,XMMWORD PTR [rsp+0xc0]
      0x0000000002a70e99: vmovdqu xmm13,XMMWORD PTR [rsp+0xd0]
      0x0000000002a70ea2: vmovdqu xmm14,XMMWORD PTR [rsp+0xe0]
      0x0000000002a70eab: vmovdqu xmm15,XMMWORD PTR [rsp+0xf0]
      0x0000000002a70eb4: add    rsp,0x100
      0x0000000002a70ebb: vinsertf128 ymm0,ymm0,XMMWORD PTR [rsp],0x1
      0x0000000002a70ec2: vinsertf128 ymm1,ymm1,XMMWORD PTR [rsp+0x10],0x1
      0x0000000002a70eca: vinsertf128 ymm2,ymm2,XMMWORD PTR [rsp+0x20],0x1
      0x0000000002a70ed2: vinsertf128 ymm3,ymm3,XMMWORD PTR [rsp+0x30],0x1
      0x0000000002a70eda: vinsertf128 ymm4,ymm4,XMMWORD PTR [rsp+0x40],0x1
      0x0000000002a70ee2: vinsertf128 ymm5,ymm5,XMMWORD PTR [rsp+0x50],0x1
      0x0000000002a70eea: vinsertf128 ymm6,ymm6,XMMWORD PTR [rsp+0x60],0x1
      0x0000000002a70ef2: vinsertf128 ymm7,ymm7,XMMWORD PTR [rsp+0x70],0x1
      0x0000000002a70efa: vinsertf128 ymm8,ymm8,XMMWORD PTR [rsp+0x80],0x1
      0x0000000002a70f05: vinsertf128 ymm9,ymm9,XMMWORD PTR [rsp+0x90],0x1
      0x0000000002a70f10: vinsertf128 ymm10,ymm10,XMMWORD PTR [rsp+0xa0],0x1
      0x0000000002a70f1b: vinsertf128 ymm11,ymm11,XMMWORD PTR [rsp+0xb0],0x1
      0x0000000002a70f26: vinsertf128 ymm12,ymm12,XMMWORD PTR [rsp+0xc0],0x1
      0x0000000002a70f31: vinsertf128 ymm13,ymm13,XMMWORD PTR [rsp+0xd0],0x1
      0x0000000002a70f3c: vinsertf128 ymm14,ymm14,XMMWORD PTR [rsp+0xe0],0x1
      0x0000000002a70f47: vinsertf128 ymm15,ymm15,XMMWORD PTR [rsp+0xf0],0x1
      0x0000000002a70f52: add    rsp,0x100
      0x0000000002a70f59: mov    r15,QWORD PTR [rsp]
      0x0000000002a70f5d: mov    r14,QWORD PTR [rsp+0x8]
      0x0000000002a70f62: mov    r13,QWORD PTR [rsp+0x10]
      0x0000000002a70f67: mov    r12,QWORD PTR [rsp+0x18]
      0x0000000002a70f6c: mov    r11,QWORD PTR [rsp+0x20]
      0x0000000002a70f71: mov    r10,QWORD PTR [rsp+0x28]
      0x0000000002a70f76: mov    r9,QWORD PTR [rsp+0x30]
      0x0000000002a70f7b: mov    r8,QWORD PTR [rsp+0x38]
      0x0000000002a70f80: mov    rdi,QWORD PTR [rsp+0x40]
      0x0000000002a70f85: mov    rsi,QWORD PTR [rsp+0x48]
      0x0000000002a70f8a: mov    rbp,QWORD PTR [rsp+0x50]
      0x0000000002a70f8f: mov    rbx,QWORD PTR [rsp+0x60]
      0x0000000002a70f94: mov    rdx,QWORD PTR [rsp+0x68]
      0x0000000002a70f99: mov    rcx,QWORD PTR [rsp+0x70]
      0x0000000002a70f9e: mov    rax,QWORD PTR [rsp+0x78]
      0x0000000002a70fa3: add    rsp,0x80
      0x0000000002a70faa: fstp   QWORD PTR [rsp]
      0x0000000002a70fad: vmovsd xmm0,QWORD PTR [rsp]  ;*invokestatic pow
                                                    ; - ru.gnkoshelev.jbreak2018.perf_tests.pow.MathBenchmark::trickyMathOctaPow@4 (line 63)
      0x0000000002a70fb2: vmovsd xmm1,QWORD PTR [rip+0xfffffffffffffaae]        # 0x0000000002a70a68
                                                    ;   {section_word}
      0x0000000002a70fba: vmovsd QWORD PTR [rsp],xmm1
      0x0000000002a70fbf: fld    QWORD PTR [rsp]
      0x0000000002a70fc2: vmovsd QWORD PTR [rsp],xmm0
      0x0000000002a70fc7: fld    QWORD PTR [rsp]
      0x0000000002a70fca: movabs rax,0x6c4ba7d0     ;   {external_word}
      0x0000000002a70fd4: fld    QWORD PTR [rax]
      0x0000000002a70fd6: fucomip st,st(2)
      0x0000000002a70fd8: jp     0x0000000002a70ff1
      0x0000000002a70fde: jne    0x0000000002a70ff1
      0x0000000002a70fe4: fxch   st(1)
      0x0000000002a70fe6: ffree  st(0)
      0x0000000002a70fe8: fincstp 
      0x0000000002a70fea: fmul   st,st(0)
      0x0000000002a70fec: jmp    0x0000000002a71448
      0x0000000002a70ff1: fldz   
      0x0000000002a70ff3: fucomip st,st(1)
      0x0000000002a70ff5: ja     0x0000000002a71078
      0x0000000002a70ffb: fld    st(1)
      0x0000000002a70ffd: fld    st(1)
      0x0000000002a70fff: sub    rsp,0x8
      0x0000000002a71003: fstcw  WORD PTR [rsp]
      0x0000000002a71007: mov    eax,DWORD PTR [rsp]
      0x0000000002a7100a: or     eax,0x300
      0x0000000002a71010: push   rax
      0x0000000002a71011: fldcw  WORD PTR [rsp]
      0x0000000002a71014: pop    rax
      0x0000000002a71015: fyl2x  
      0x0000000002a71017: sub    rsp,0x8
      0x0000000002a7101b: fld    st(0)
      0x0000000002a7101d: frndint 
      0x0000000002a7101f: fsubr  st(1),st
      0x0000000002a71021: fistp  DWORD PTR [rsp]
      0x0000000002a71024: f2xm1  
      0x0000000002a71026: fld1   
      0x0000000002a71028: faddp  st(1),st
      0x0000000002a7102a: mov    eax,DWORD PTR [rsp]
      0x0000000002a7102d: mov    ecx,0xfffff800
      0x0000000002a71032: add    eax,0x3ff
      0x0000000002a71038: mov    edx,eax
      0x0000000002a7103a: shl    eax,0x14
      0x0000000002a7103d: add    edx,0x1
      0x0000000002a71040: cmove  eax,ecx
      0x0000000002a71043: cmp    edx,0x1
      0x0000000002a71046: cmove  eax,ecx
      0x0000000002a71049: test   ecx,edx
      0x0000000002a7104b: cmovne eax,ecx
      0x0000000002a7104e: mov    DWORD PTR [rsp+0x4],eax
      0x0000000002a71052: mov    DWORD PTR [rsp],0x0
      0x0000000002a71059: fmul   QWORD PTR [rsp]
      0x0000000002a7105c: add    rsp,0x8
      0x0000000002a71060: fldcw  WORD PTR [rsp]
      0x0000000002a71063: add    rsp,0x8
      0x0000000002a71067: fucomi st,st(0)
      0x0000000002a71069: jp     0x0000000002a71118
      0x0000000002a7106f: ffree  st(2)
      0x0000000002a71071: ffree  st(1)
      0x0000000002a71073: jmp    0x0000000002a71448
      0x0000000002a71078: fld    st(1)
      0x0000000002a7107a: frndint 
      0x0000000002a7107c: fucomi st,st(2)
      0x0000000002a7107e: jne    0x0000000002a71118
      0x0000000002a71084: sub    rsp,0x8
      0x0000000002a71088: fistp  QWORD PTR [rsp]
      0x0000000002a7108b: fld    st(1)
      0x0000000002a7108d: fld    st(1)
      0x0000000002a7108f: fabs   
      0x0000000002a71091: sub    rsp,0x8
      0x0000000002a71095: fstcw  WORD PTR [rsp]
      0x0000000002a71099: mov    eax,DWORD PTR [rsp]
      0x0000000002a7109c: or     eax,0x300
      0x0000000002a710a2: push   rax
      0x0000000002a710a3: fldcw  WORD PTR [rsp]
      0x0000000002a710a6: pop    rax
      0x0000000002a710a7: fyl2x  
      0x0000000002a710a9: sub    rsp,0x8
      0x0000000002a710ad: fld    st(0)
      0x0000000002a710af: frndint 
      0x0000000002a710b1: fsubr  st(1),st
      0x0000000002a710b3: fistp  DWORD PTR [rsp]
      0x0000000002a710b6: f2xm1  
      0x0000000002a710b8: fld1   
      0x0000000002a710ba: faddp  st(1),st
      0x0000000002a710bc: mov    eax,DWORD PTR [rsp]
      0x0000000002a710bf: mov    ecx,0xfffff800
      0x0000000002a710c4: add    eax,0x3ff
      0x0000000002a710ca: mov    edx,eax
      0x0000000002a710cc: shl    eax,0x14
      0x0000000002a710cf: add    edx,0x1
      0x0000000002a710d2: cmove  eax,ecx
      0x0000000002a710d5: cmp    edx,0x1
      0x0000000002a710d8: cmove  eax,ecx
      0x0000000002a710db: test   ecx,edx
      0x0000000002a710dd: cmovne eax,ecx
      0x0000000002a710e0: mov    DWORD PTR [rsp+0x4],eax
      0x0000000002a710e4: mov    DWORD PTR [rsp],0x0
      0x0000000002a710eb: fmul   QWORD PTR [rsp]
      0x0000000002a710ee: add    rsp,0x8
      0x0000000002a710f2: fldcw  WORD PTR [rsp]
      0x0000000002a710f5: add    rsp,0x8
      0x0000000002a710f9: fucomi st,st(0)
      0x0000000002a710fb: pop    rax
      0x0000000002a710fc: jp     0x0000000002a71118
      0x0000000002a71102: ffree  st(2)
      0x0000000002a71104: ffree  st(1)
      0x0000000002a71106: test   eax,0x1
      0x0000000002a7110b: je     0x0000000002a71448
      0x0000000002a71111: fchs   
      0x0000000002a71113: jmp    0x0000000002a71448
      0x0000000002a71118: ffree  st(0)
      0x0000000002a7111a: fincstp 
      0x0000000002a7111c: mov    QWORD PTR [rsp-0x28],rsp
      0x0000000002a71121: sub    rsp,0x80
      0x0000000002a71128: mov    QWORD PTR [rsp+0x78],rax
      0x0000000002a7112d: mov    QWORD PTR [rsp+0x70],rcx
      0x0000000002a71132: mov    QWORD PTR [rsp+0x68],rdx
      0x0000000002a71137: mov    QWORD PTR [rsp+0x60],rbx
      0x0000000002a7113c: mov    QWORD PTR [rsp+0x50],rbp
      0x0000000002a71141: mov    QWORD PTR [rsp+0x48],rsi
      0x0000000002a71146: mov    QWORD PTR [rsp+0x40],rdi
      0x0000000002a7114b: mov    QWORD PTR [rsp+0x38],r8
      0x0000000002a71150: mov    QWORD PTR [rsp+0x30],r9
      0x0000000002a71155: mov    QWORD PTR [rsp+0x28],r10
      0x0000000002a7115a: mov    QWORD PTR [rsp+0x20],r11
      0x0000000002a7115f: mov    QWORD PTR [rsp+0x18],r12
      0x0000000002a71164: mov    QWORD PTR [rsp+0x10],r13
      0x0000000002a71169: mov    QWORD PTR [rsp+0x8],r14
      0x0000000002a7116e: mov    QWORD PTR [rsp],r15
      0x0000000002a71172: sub    rsp,0x100
      0x0000000002a71179: vextractf128 XMMWORD PTR [rsp],ymm0,0x1
      0x0000000002a71180: vextractf128 XMMWORD PTR [rsp+0x10],ymm1,0x1
      0x0000000002a71188: vextractf128 XMMWORD PTR [rsp+0x20],ymm2,0x1
      0x0000000002a71190: vextractf128 XMMWORD PTR [rsp+0x30],ymm3,0x1
      0x0000000002a71198: vextractf128 XMMWORD PTR [rsp+0x40],ymm4,0x1
      0x0000000002a711a0: vextractf128 XMMWORD PTR [rsp+0x50],ymm5,0x1
      0x0000000002a711a8: vextractf128 XMMWORD PTR [rsp+0x60],ymm6,0x1
      0x0000000002a711b0: vextractf128 XMMWORD PTR [rsp+0x70],ymm7,0x1
      0x0000000002a711b8: vextractf128 XMMWORD PTR [rsp+0x80],ymm8,0x1
      0x0000000002a711c3: vextractf128 XMMWORD PTR [rsp+0x90],ymm9,0x1
      0x0000000002a711ce: vextractf128 XMMWORD PTR [rsp+0xa0],ymm10,0x1
      0x0000000002a711d9: vextractf128 XMMWORD PTR [rsp+0xb0],ymm11,0x1
      0x0000000002a711e4: vextractf128 XMMWORD PTR [rsp+0xc0],ymm12,0x1
      0x0000000002a711ef: vextractf128 XMMWORD PTR [rsp+0xd0],ymm13,0x1
      0x0000000002a711fa: vextractf128 XMMWORD PTR [rsp+0xe0],ymm14,0x1
      0x0000000002a71205: vextractf128 XMMWORD PTR [rsp+0xf0],ymm15,0x1
      0x0000000002a71210: sub    rsp,0x100
      0x0000000002a71217: vmovdqu XMMWORD PTR [rsp],xmm0
      0x0000000002a7121c: vmovdqu XMMWORD PTR [rsp+0x10],xmm1
      0x0000000002a71222: vmovdqu XMMWORD PTR [rsp+0x20],xmm2
      0x0000000002a71228: vmovdqu XMMWORD PTR [rsp+0x30],xmm3
      0x0000000002a7122e: vmovdqu XMMWORD PTR [rsp+0x40],xmm4
      0x0000000002a71234: vmovdqu XMMWORD PTR [rsp+0x50],xmm5
      0x0000000002a7123a: vmovdqu XMMWORD PTR [rsp+0x60],xmm6
      0x0000000002a71240: vmovdqu XMMWORD PTR [rsp+0x70],xmm7
      0x0000000002a71246: vmovdqu XMMWORD PTR [rsp+0x80],xmm8
      0x0000000002a7124f: vmovdqu XMMWORD PTR [rsp+0x90],xmm9
      0x0000000002a71258: vmovdqu XMMWORD PTR [rsp+0xa0],xmm10
      0x0000000002a71261: vmovdqu XMMWORD PTR [rsp+0xb0],xmm11
      0x0000000002a7126a: vmovdqu XMMWORD PTR [rsp+0xc0],xmm12
      0x0000000002a71273: vmovdqu XMMWORD PTR [rsp+0xd0],xmm13
      0x0000000002a7127c: vmovdqu XMMWORD PTR [rsp+0xe0],xmm14
      0x0000000002a71285: vmovdqu XMMWORD PTR [rsp+0xf0],xmm15
      0x0000000002a7128e: sub    rsp,0x10
      0x0000000002a71292: fstp   QWORD PTR [rsp]
      0x0000000002a71295: fstp   QWORD PTR [rsp+0x8]
      0x0000000002a71299: vmovsd xmm0,QWORD PTR [rsp]
      0x0000000002a7129e: vmovsd xmm1,QWORD PTR [rsp+0x8]
      0x0000000002a712a4: sub    rsp,0x20
      0x0000000002a712a8: test   esp,0xf
      0x0000000002a712ae: je     0x0000000002a712c6
      0x0000000002a712b4: sub    rsp,0x8
      0x0000000002a712b8: call   0x000000006bf240d0  ;   {runtime_call}
      0x0000000002a712bd: add    rsp,0x8
      0x0000000002a712c1: jmp    0x0000000002a712cb
      0x0000000002a712c6: call   0x000000006bf240d0  ;   {runtime_call}
      0x0000000002a712cb: add    rsp,0x20
      0x0000000002a712cf: vmovsd QWORD PTR [rsp],xmm0
      0x0000000002a712d4: fld    QWORD PTR [rsp]
      0x0000000002a712d7: add    rsp,0x10
      0x0000000002a712db: vmovdqu xmm0,XMMWORD PTR [rsp]
      0x0000000002a712e0: vmovdqu xmm1,XMMWORD PTR [rsp+0x10]
      0x0000000002a712e6: vmovdqu xmm2,XMMWORD PTR [rsp+0x20]
      0x0000000002a712ec: vmovdqu xmm3,XMMWORD PTR [rsp+0x30]
      0x0000000002a712f2: vmovdqu xmm4,XMMWORD PTR [rsp+0x40]
      0x0000000002a712f8: vmovdqu xmm5,XMMWORD PTR [rsp+0x50]
      0x0000000002a712fe: vmovdqu xmm6,XMMWORD PTR [rsp+0x60]
      0x0000000002a71304: vmovdqu xmm7,XMMWORD PTR [rsp+0x70]
      0x0000000002a7130a: vmovdqu xmm8,XMMWORD PTR [rsp+0x80]
      0x0000000002a71313: vmovdqu xmm9,XMMWORD PTR [rsp+0x90]
      0x0000000002a7131c: vmovdqu xmm10,XMMWORD PTR [rsp+0xa0]
      0x0000000002a71325: vmovdqu xmm11,XMMWORD PTR [rsp+0xb0]
      0x0000000002a7132e: vmovdqu xmm12,XMMWORD PTR [rsp+0xc0]
      0x0000000002a71337: vmovdqu xmm13,XMMWORD PTR [rsp+0xd0]
      0x0000000002a71340: vmovdqu xmm14,XMMWORD PTR [rsp+0xe0]
      0x0000000002a71349: vmovdqu xmm15,XMMWORD PTR [rsp+0xf0]
      0x0000000002a71352: add    rsp,0x100
      0x0000000002a71359: vinsertf128 ymm0,ymm0,XMMWORD PTR [rsp],0x1
      0x0000000002a71360: vinsertf128 ymm1,ymm1,XMMWORD PTR [rsp+0x10],0x1
      0x0000000002a71368: vinsertf128 ymm2,ymm2,XMMWORD PTR [rsp+0x20],0x1
      0x0000000002a71370: vinsertf128 ymm3,ymm3,XMMWORD PTR [rsp+0x30],0x1
      0x0000000002a71378: vinsertf128 ymm4,ymm4,XMMWORD PTR [rsp+0x40],0x1
      0x0000000002a71380: vinsertf128 ymm5,ymm5,XMMWORD PTR [rsp+0x50],0x1
      0x0000000002a71388: vinsertf128 ymm6,ymm6,XMMWORD PTR [rsp+0x60],0x1
      0x0000000002a71390: vinsertf128 ymm7,ymm7,XMMWORD PTR [rsp+0x70],0x1
      0x0000000002a71398: vinsertf128 ymm8,ymm8,XMMWORD PTR [rsp+0x80],0x1
      0x0000000002a713a3: vinsertf128 ymm9,ymm9,XMMWORD PTR [rsp+0x90],0x1
      0x0000000002a713ae: vinsertf128 ymm10,ymm10,XMMWORD PTR [rsp+0xa0],0x1
      0x0000000002a713b9: vinsertf128 ymm11,ymm11,XMMWORD PTR [rsp+0xb0],0x1
      0x0000000002a713c4: vinsertf128 ymm12,ymm12,XMMWORD PTR [rsp+0xc0],0x1
      0x0000000002a713cf: vinsertf128 ymm13,ymm13,XMMWORD PTR [rsp+0xd0],0x1
      0x0000000002a713da: vinsertf128 ymm14,ymm14,XMMWORD PTR [rsp+0xe0],0x1
      0x0000000002a713e5: vinsertf128 ymm15,ymm15,XMMWORD PTR [rsp+0xf0],0x1
      0x0000000002a713f0: add    rsp,0x100
      0x0000000002a713f7: mov    r15,QWORD PTR [rsp]
      0x0000000002a713fb: mov    r14,QWORD PTR [rsp+0x8]
      0x0000000002a71400: mov    r13,QWORD PTR [rsp+0x10]
      0x0000000002a71405: mov    r12,QWORD PTR [rsp+0x18]
      0x0000000002a7140a: mov    r11,QWORD PTR [rsp+0x20]
      0x0000000002a7140f: mov    r10,QWORD PTR [rsp+0x28]
      0x0000000002a71414: mov    r9,QWORD PTR [rsp+0x30]
      0x0000000002a71419: mov    r8,QWORD PTR [rsp+0x38]
      0x0000000002a7141e: mov    rdi,QWORD PTR [rsp+0x40]
      0x0000000002a71423: mov    rsi,QWORD PTR [rsp+0x48]
      0x0000000002a71428: mov    rbp,QWORD PTR [rsp+0x50]
      0x0000000002a7142d: mov    rbx,QWORD PTR [rsp+0x60]
      0x0000000002a71432: mov    rdx,QWORD PTR [rsp+0x68]
      0x0000000002a71437: mov    rcx,QWORD PTR [rsp+0x70]
      0x0000000002a7143c: mov    rax,QWORD PTR [rsp+0x78]
      0x0000000002a71441: add    rsp,0x80
      0x0000000002a71448: fstp   QWORD PTR [rsp]
      0x0000000002a7144b: vmovsd xmm0,QWORD PTR [rsp]  ;*invokestatic pow
                                                    ; - ru.gnkoshelev.jbreak2018.perf_tests.pow.MathBenchmark::trickyMathOctaPow@10 (line 63)
      0x0000000002a71450: vmovsd xmm1,QWORD PTR [rip+0xfffffffffffff618]        # 0x0000000002a70a70
                                                    ;   {section_word}
      0x0000000002a71458: vmovsd QWORD PTR [rsp],xmm1
      0x0000000002a7145d: fld    QWORD PTR [rsp]
      0x0000000002a71460: vmovsd QWORD PTR [rsp],xmm0
      0x0000000002a71465: fld    QWORD PTR [rsp]
      0x0000000002a71468: movabs rax,0x6c4ba7d0     ;   {external_word}
      0x0000000002a71472: fld    QWORD PTR [rax]
      0x0000000002a71474: fucomip st,st(2)
      0x0000000002a71476: jp     0x0000000002a7148f
      0x0000000002a7147c: jne    0x0000000002a7148f
      0x0000000002a71482: fxch   st(1)
      0x0000000002a71484: ffree  st(0)
      0x0000000002a71486: fincstp 
      0x0000000002a71488: fmul   st,st(0)
      0x0000000002a7148a: jmp    0x0000000002a718e6
      0x0000000002a7148f: fldz   
      0x0000000002a71491: fucomip st,st(1)
      0x0000000002a71493: ja     0x0000000002a71516
      0x0000000002a71499: fld    st(1)
      0x0000000002a7149b: fld    st(1)
      0x0000000002a7149d: sub    rsp,0x8
      0x0000000002a714a1: fstcw  WORD PTR [rsp]
      0x0000000002a714a5: mov    eax,DWORD PTR [rsp]
      0x0000000002a714a8: or     eax,0x300
      0x0000000002a714ae: push   rax
      0x0000000002a714af: fldcw  WORD PTR [rsp]
      0x0000000002a714b2: pop    rax
      0x0000000002a714b3: fyl2x  
      0x0000000002a714b5: sub    rsp,0x8
      0x0000000002a714b9: fld    st(0)
      0x0000000002a714bb: frndint 
      0x0000000002a714bd: fsubr  st(1),st
      0x0000000002a714bf: fistp  DWORD PTR [rsp]
      0x0000000002a714c2: f2xm1  
      0x0000000002a714c4: fld1   
      0x0000000002a714c6: faddp  st(1),st
      0x0000000002a714c8: mov    eax,DWORD PTR [rsp]
      0x0000000002a714cb: mov    ecx,0xfffff800
      0x0000000002a714d0: add    eax,0x3ff
      0x0000000002a714d6: mov    edx,eax
      0x0000000002a714d8: shl    eax,0x14
      0x0000000002a714db: add    edx,0x1
      0x0000000002a714de: cmove  eax,ecx
      0x0000000002a714e1: cmp    edx,0x1
      0x0000000002a714e4: cmove  eax,ecx
      0x0000000002a714e7: test   ecx,edx
      0x0000000002a714e9: cmovne eax,ecx
      0x0000000002a714ec: mov    DWORD PTR [rsp+0x4],eax
      0x0000000002a714f0: mov    DWORD PTR [rsp],0x0
      0x0000000002a714f7: fmul   QWORD PTR [rsp]
      0x0000000002a714fa: add    rsp,0x8
      0x0000000002a714fe: fldcw  WORD PTR [rsp]
      0x0000000002a71501: add    rsp,0x8
      0x0000000002a71505: fucomi st,st(0)
      0x0000000002a71507: jp     0x0000000002a715b6
      0x0000000002a7150d: ffree  st(2)
      0x0000000002a7150f: ffree  st(1)
      0x0000000002a71511: jmp    0x0000000002a718e6
      0x0000000002a71516: fld    st(1)
      0x0000000002a71518: frndint 
      0x0000000002a7151a: fucomi st,st(2)
      0x0000000002a7151c: jne    0x0000000002a715b6
      0x0000000002a71522: sub    rsp,0x8
      0x0000000002a71526: fistp  QWORD PTR [rsp]
      0x0000000002a71529: fld    st(1)
      0x0000000002a7152b: fld    st(1)
      0x0000000002a7152d: fabs   
      0x0000000002a7152f: sub    rsp,0x8
      0x0000000002a71533: fstcw  WORD PTR [rsp]
      0x0000000002a71537: mov    eax,DWORD PTR [rsp]
      0x0000000002a7153a: or     eax,0x300
      0x0000000002a71540: push   rax
      0x0000000002a71541: fldcw  WORD PTR [rsp]
      0x0000000002a71544: pop    rax
      0x0000000002a71545: fyl2x  
      0x0000000002a71547: sub    rsp,0x8
      0x0000000002a7154b: fld    st(0)
      0x0000000002a7154d: frndint 
      0x0000000002a7154f: fsubr  st(1),st
      0x0000000002a71551: fistp  DWORD PTR [rsp]
      0x0000000002a71554: f2xm1  
      0x0000000002a71556: fld1   
      0x0000000002a71558: faddp  st(1),st
      0x0000000002a7155a: mov    eax,DWORD PTR [rsp]
      0x0000000002a7155d: mov    ecx,0xfffff800
      0x0000000002a71562: add    eax,0x3ff
      0x0000000002a71568: mov    edx,eax
      0x0000000002a7156a: shl    eax,0x14
      0x0000000002a7156d: add    edx,0x1
      0x0000000002a71570: cmove  eax,ecx
      0x0000000002a71573: cmp    edx,0x1
      0x0000000002a71576: cmove  eax,ecx
      0x0000000002a71579: test   ecx,edx
      0x0000000002a7157b: cmovne eax,ecx
      0x0000000002a7157e: mov    DWORD PTR [rsp+0x4],eax
      0x0000000002a71582: mov    DWORD PTR [rsp],0x0
      0x0000000002a71589: fmul   QWORD PTR [rsp]
      0x0000000002a7158c: add    rsp,0x8
      0x0000000002a71590: fldcw  WORD PTR [rsp]
      0x0000000002a71593: add    rsp,0x8
      0x0000000002a71597: fucomi st,st(0)
      0x0000000002a71599: pop    rax
      0x0000000002a7159a: jp     0x0000000002a715b6
      0x0000000002a715a0: ffree  st(2)
      0x0000000002a715a2: ffree  st(1)
      0x0000000002a715a4: test   eax,0x1
      0x0000000002a715a9: je     0x0000000002a718e6
      0x0000000002a715af: fchs   
      0x0000000002a715b1: jmp    0x0000000002a718e6
      0x0000000002a715b6: ffree  st(0)
      0x0000000002a715b8: fincstp 
      0x0000000002a715ba: mov    QWORD PTR [rsp-0x28],rsp
      0x0000000002a715bf: sub    rsp,0x80
      0x0000000002a715c6: mov    QWORD PTR [rsp+0x78],rax
      0x0000000002a715cb: mov    QWORD PTR [rsp+0x70],rcx
      0x0000000002a715d0: mov    QWORD PTR [rsp+0x68],rdx
      0x0000000002a715d5: mov    QWORD PTR [rsp+0x60],rbx
      0x0000000002a715da: mov    QWORD PTR [rsp+0x50],rbp
      0x0000000002a715df: mov    QWORD PTR [rsp+0x48],rsi
      0x0000000002a715e4: mov    QWORD PTR [rsp+0x40],rdi
      0x0000000002a715e9: mov    QWORD PTR [rsp+0x38],r8
      0x0000000002a715ee: mov    QWORD PTR [rsp+0x30],r9
      0x0000000002a715f3: mov    QWORD PTR [rsp+0x28],r10
      0x0000000002a715f8: mov    QWORD PTR [rsp+0x20],r11
      0x0000000002a715fd: mov    QWORD PTR [rsp+0x18],r12
      0x0000000002a71602: mov    QWORD PTR [rsp+0x10],r13
      0x0000000002a71607: mov    QWORD PTR [rsp+0x8],r14
      0x0000000002a7160c: mov    QWORD PTR [rsp],r15
      0x0000000002a71610: sub    rsp,0x100
      0x0000000002a71617: vextractf128 XMMWORD PTR [rsp],ymm0,0x1
      0x0000000002a7161e: vextractf128 XMMWORD PTR [rsp+0x10],ymm1,0x1
      0x0000000002a71626: vextractf128 XMMWORD PTR [rsp+0x20],ymm2,0x1
      0x0000000002a7162e: vextractf128 XMMWORD PTR [rsp+0x30],ymm3,0x1
      0x0000000002a71636: vextractf128 XMMWORD PTR [rsp+0x40],ymm4,0x1
      0x0000000002a7163e: vextractf128 XMMWORD PTR [rsp+0x50],ymm5,0x1
      0x0000000002a71646: vextractf128 XMMWORD PTR [rsp+0x60],ymm6,0x1
      0x0000000002a7164e: vextractf128 XMMWORD PTR [rsp+0x70],ymm7,0x1
      0x0000000002a71656: vextractf128 XMMWORD PTR [rsp+0x80],ymm8,0x1
      0x0000000002a71661: vextractf128 XMMWORD PTR [rsp+0x90],ymm9,0x1
      0x0000000002a7166c: vextractf128 XMMWORD PTR [rsp+0xa0],ymm10,0x1
      0x0000000002a71677: vextractf128 XMMWORD PTR [rsp+0xb0],ymm11,0x1
      0x0000000002a71682: vextractf128 XMMWORD PTR [rsp+0xc0],ymm12,0x1
      0x0000000002a7168d: vextractf128 XMMWORD PTR [rsp+0xd0],ymm13,0x1
      0x0000000002a71698: vextractf128 XMMWORD PTR [rsp+0xe0],ymm14,0x1
      0x0000000002a716a3: vextractf128 XMMWORD PTR [rsp+0xf0],ymm15,0x1
      0x0000000002a716ae: sub    rsp,0x100
      0x0000000002a716b5: vmovdqu XMMWORD PTR [rsp],xmm0
      0x0000000002a716ba: vmovdqu XMMWORD PTR [rsp+0x10],xmm1
      0x0000000002a716c0: vmovdqu XMMWORD PTR [rsp+0x20],xmm2
      0x0000000002a716c6: vmovdqu XMMWORD PTR [rsp+0x30],xmm3
      0x0000000002a716cc: vmovdqu XMMWORD PTR [rsp+0x40],xmm4
      0x0000000002a716d2: vmovdqu XMMWORD PTR [rsp+0x50],xmm5
      0x0000000002a716d8: vmovdqu XMMWORD PTR [rsp+0x60],xmm6
      0x0000000002a716de: vmovdqu XMMWORD PTR [rsp+0x70],xmm7
      0x0000000002a716e4: vmovdqu XMMWORD PTR [rsp+0x80],xmm8
      0x0000000002a716ed: vmovdqu XMMWORD PTR [rsp+0x90],xmm9
      0x0000000002a716f6: vmovdqu XMMWORD PTR [rsp+0xa0],xmm10
      0x0000000002a716ff: vmovdqu XMMWORD PTR [rsp+0xb0],xmm11
      0x0000000002a71708: vmovdqu XMMWORD PTR [rsp+0xc0],xmm12
      0x0000000002a71711: vmovdqu XMMWORD PTR [rsp+0xd0],xmm13
      0x0000000002a7171a: vmovdqu XMMWORD PTR [rsp+0xe0],xmm14
      0x0000000002a71723: vmovdqu XMMWORD PTR [rsp+0xf0],xmm15
      0x0000000002a7172c: sub    rsp,0x10
      0x0000000002a71730: fstp   QWORD PTR [rsp]
      0x0000000002a71733: fstp   QWORD PTR [rsp+0x8]
      0x0000000002a71737: vmovsd xmm0,QWORD PTR [rsp]
      0x0000000002a7173c: vmovsd xmm1,QWORD PTR [rsp+0x8]
      0x0000000002a71742: sub    rsp,0x20
      0x0000000002a71746: test   esp,0xf
      0x0000000002a7174c: je     0x0000000002a71764
      0x0000000002a71752: sub    rsp,0x8
      0x0000000002a71756: call   0x000000006bf240d0  ;   {runtime_call}
      0x0000000002a7175b: add    rsp,0x8
      0x0000000002a7175f: jmp    0x0000000002a71769
      0x0000000002a71764: call   0x000000006bf240d0  ;   {runtime_call}
      0x0000000002a71769: add    rsp,0x20
      0x0000000002a7176d: vmovsd QWORD PTR [rsp],xmm0
      0x0000000002a71772: fld    QWORD PTR [rsp]
      0x0000000002a71775: add    rsp,0x10
      0x0000000002a71779: vmovdqu xmm0,XMMWORD PTR [rsp]
      0x0000000002a7177e: vmovdqu xmm1,XMMWORD PTR [rsp+0x10]
      0x0000000002a71784: vmovdqu xmm2,XMMWORD PTR [rsp+0x20]
      0x0000000002a7178a: vmovdqu xmm3,XMMWORD PTR [rsp+0x30]
      0x0000000002a71790: vmovdqu xmm4,XMMWORD PTR [rsp+0x40]
      0x0000000002a71796: vmovdqu xmm5,XMMWORD PTR [rsp+0x50]
      0x0000000002a7179c: vmovdqu xmm6,XMMWORD PTR [rsp+0x60]
      0x0000000002a717a2: vmovdqu xmm7,XMMWORD PTR [rsp+0x70]
      0x0000000002a717a8: vmovdqu xmm8,XMMWORD PTR [rsp+0x80]
      0x0000000002a717b1: vmovdqu xmm9,XMMWORD PTR [rsp+0x90]
      0x0000000002a717ba: vmovdqu xmm10,XMMWORD PTR [rsp+0xa0]
      0x0000000002a717c3: vmovdqu xmm11,XMMWORD PTR [rsp+0xb0]
      0x0000000002a717cc: vmovdqu xmm12,XMMWORD PTR [rsp+0xc0]
      0x0000000002a717d5: vmovdqu xmm13,XMMWORD PTR [rsp+0xd0]
      0x0000000002a717de: vmovdqu xmm14,XMMWORD PTR [rsp+0xe0]
      0x0000000002a717e7: vmovdqu xmm15,XMMWORD PTR [rsp+0xf0]
      0x0000000002a717f0: add    rsp,0x100
      0x0000000002a717f7: vinsertf128 ymm0,ymm0,XMMWORD PTR [rsp],0x1
      0x0000000002a717fe: vinsertf128 ymm1,ymm1,XMMWORD PTR [rsp+0x10],0x1
      0x0000000002a71806: vinsertf128 ymm2,ymm2,XMMWORD PTR [rsp+0x20],0x1
      0x0000000002a7180e: vinsertf128 ymm3,ymm3,XMMWORD PTR [rsp+0x30],0x1
      0x0000000002a71816: vinsertf128 ymm4,ymm4,XMMWORD PTR [rsp+0x40],0x1
      0x0000000002a7181e: vinsertf128 ymm5,ymm5,XMMWORD PTR [rsp+0x50],0x1
      0x0000000002a71826: vinsertf128 ymm6,ymm6,XMMWORD PTR [rsp+0x60],0x1
      0x0000000002a7182e: vinsertf128 ymm7,ymm7,XMMWORD PTR [rsp+0x70],0x1
      0x0000000002a71836: vinsertf128 ymm8,ymm8,XMMWORD PTR [rsp+0x80],0x1
      0x0000000002a71841: vinsertf128 ymm9,ymm9,XMMWORD PTR [rsp+0x90],0x1
      0x0000000002a7184c: vinsertf128 ymm10,ymm10,XMMWORD PTR [rsp+0xa0],0x1
      0x0000000002a71857: vinsertf128 ymm11,ymm11,XMMWORD PTR [rsp+0xb0],0x1
      0x0000000002a71862: vinsertf128 ymm12,ymm12,XMMWORD PTR [rsp+0xc0],0x1
      0x0000000002a7186d: vinsertf128 ymm13,ymm13,XMMWORD PTR [rsp+0xd0],0x1
      0x0000000002a71878: vinsertf128 ymm14,ymm14,XMMWORD PTR [rsp+0xe0],0x1
      0x0000000002a71883: vinsertf128 ymm15,ymm15,XMMWORD PTR [rsp+0xf0],0x1
      0x0000000002a7188e: add    rsp,0x100
      0x0000000002a71895: mov    r15,QWORD PTR [rsp]
      0x0000000002a71899: mov    r14,QWORD PTR [rsp+0x8]
      0x0000000002a7189e: mov    r13,QWORD PTR [rsp+0x10]
      0x0000000002a718a3: mov    r12,QWORD PTR [rsp+0x18]
      0x0000000002a718a8: mov    r11,QWORD PTR [rsp+0x20]
      0x0000000002a718ad: mov    r10,QWORD PTR [rsp+0x28]
      0x0000000002a718b2: mov    r9,QWORD PTR [rsp+0x30]
      0x0000000002a718b7: mov    r8,QWORD PTR [rsp+0x38]
      0x0000000002a718bc: mov    rdi,QWORD PTR [rsp+0x40]
      0x0000000002a718c1: mov    rsi,QWORD PTR [rsp+0x48]
      0x0000000002a718c6: mov    rbp,QWORD PTR [rsp+0x50]
      0x0000000002a718cb: mov    rbx,QWORD PTR [rsp+0x60]
      0x0000000002a718d0: mov    rdx,QWORD PTR [rsp+0x68]
      0x0000000002a718d5: mov    rcx,QWORD PTR [rsp+0x70]
      0x0000000002a718da: mov    rax,QWORD PTR [rsp+0x78]
      0x0000000002a718df: add    rsp,0x80
      0x0000000002a718e6: fstp   QWORD PTR [rsp]
      0x0000000002a718e9: vmovsd xmm0,QWORD PTR [rsp]  ;*invokestatic pow
                                                    ; - ru.gnkoshelev.jbreak2018.perf_tests.pow.MathBenchmark::trickyMathOctaPow@16 (line 63)

    However, the implementation of the _dpow intrinsic has an interesting feature: the handling of a “special case”. Below is a snippet from library_call.cpp in the OpenJDK 8 sources:

    //------------------------------inline_pow-------------------------------------
    // Inline power instructions, if possible.
    bool LibraryCallKit::inline_pow() {
      // Pseudocode for pow
      // if (y == 2) {
      //   return x * x;
      // } else {
      //   if (x <= 0.0) {
      //     long longy = (long)y;
      //     if ((double)longy == y) { // if y is long
      //       if (y + 1 == y) longy = 0; // huge number: even
      //       result = ((1&longy) == 0)?-DPow(abs(x), y):DPow(abs(x), y);
      //     } else {
      //       result = NaN;
      //     }
      //   } else {
      //     result = DPow(x,y);
      //   }
      //   if (result != result)?  {
      //     result = uncommon_trap() or runtime_call();
      //   }
      //   return result;
      // }
    /* code omitted */
    }
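
    The top-level branch of this pseudocode can be mirrored in plain Java to make the fast path visible. This is only an illustrative sketch: powSketch and trickyOctaPow are hypothetical names, and the general path simply delegates to Math.pow instead of reproducing the DPow sequence, the sign handling for non-positive bases, or the NaN check.

```java
final class PowSketch {
    // Hypothetical helper: mirrors only the "y == 2" branch of the pseudocode.
    static double powSketch(double x, double y) {
        if (y == 2.0) {
            return x * x; // special case: a single multiplication
        }
        // Stand-in for the general path (DPow, sign handling, NaN check).
        return Math.pow(x, y);
    }

    // Three nested squarings, as in Math.pow(Math.pow(Math.pow(a, 2), 2), 2).
    static double trickyOctaPow(double a) {
        return powSketch(powSketch(powSketch(a, 2), 2), 2);
    }

    public static void main(String[] args) {
        double a = 1234567.890;
        double b = a;
        b *= b; b *= b; b *= b; // the same three multiplications, written by hand
        System.out.println(trickyOctaPow(a) == b);
    }
}
```

    In this sketch every nested call takes the first branch, so the whole expression performs the same three multiplications as the hand-written squaring.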

    The HotSpot developers handled one particular case separately: squaring a number. Thanks to this, for Math.pow(x, 2) the code substituted by the JIT compiler executes only x * x. We can find this check in the disassembled code, using the first call, Math.pow(a, 2), as an example:

      0x0000000002a70b14: vmovsd xmm1,QWORD PTR [rip+0xffffffffffffff44] ; loaded the constant 2.0 into xmm1
      0x0000000002a70b1c: vmovsd QWORD PTR [rsp],xmm1
      0x0000000002a70b21: fld    QWORD PTR [rsp] ; pushed the value 2.0 onto the FPU register stack
      0x0000000002a70b24: vmovsd QWORD PTR [rsp],xmm0
      0x0000000002a70b29: fld    QWORD PTR [rsp] ; pushed the value a onto the FPU register stack
      0x0000000002a70b2c: movabs rax,0x6c4ba7d0
      0x0000000002a70b36: fld    QWORD PTR [rax] ; pushed the value 2.0 onto the FPU register stack
      0x0000000002a70b38: fucomip st,st(2) ; compared 2.0 with 2.0
      0x0000000002a70b3a: jp     0x0000000002a70b53
      0x0000000002a70b40: jne    0x0000000002a70b53
      0x0000000002a70b46: fxch   st(1) ; brought the value a to the top of the FPU stack
      0x0000000002a70b48: ffree  st(0)
      0x0000000002a70b4a: fincstp 
      0x0000000002a70b4c: fmul   st,st(0) ; multiplied a by a
      0x0000000002a70b4e: jmp    0x0000000002a70faa
      ; code omitted
      0x0000000002a70faa: fstp   QWORD PTR [rsp]
      0x0000000002a70fad: vmovsd xmm0,QWORD PTR [rsp]  ; stored the value a * a in xmm0
      ; code omitted

    Benchmarks


    Benchmark Code:

    @Fork(value = 3, warmups = 0)
    @Warmup(iterations = 5, time = 1_000, timeUnit = TimeUnit.MILLISECONDS)
    @Measurement(iterations = 10, time = 1_000, timeUnit = TimeUnit.MILLISECONDS)
    @OutputTimeUnit(value = TimeUnit.NANOSECONDS)
    @BenchmarkMode(Mode.AverageTime)
    @State(Scope.Benchmark)
    public class MathBenchmark {
        public double a;
        @Setup
        public void setup() {
            a = 1234567.890;
        }
        @Benchmark
        public void mathOctaPowBenchmark(Blackhole bh) {
            bh.consume(mathOctaPow(a));
        }
        @Benchmark
        public void plainOctaPowBenchmark(Blackhole bh) {
            bh.consume(plainOctaPow(a));
        }
        @Benchmark
        public void trickyMathOctaPowBenchmark(Blackhole bh) {
            bh.consume(trickyMathOctaPow(a));
        }
        @Benchmark
        public void trickyPlainOctaPowBenchmark(Blackhole bh) {
            bh.consume(trickyPlainOctaPow(a));
        }
        public double mathOctaPow(double a) {
            return Math.pow(a, 8);
        }
        public double plainOctaPow(double a) {
            return a * a * a * a * a * a * a * a;
        }
        public double trickyMathOctaPow(double a) {
            return Math.pow(Math.pow(Math.pow(a, 2), 2), 2);
        }
        public double trickyPlainOctaPow(double a) {
            a *= a; a *= a; return a * a;
        }
    }
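
    The benchmark above measures only time, but plainOctaPow and trickyPlainOctaPow also differ arithmetically: the first performs seven multiplications from left to right, the second only three, so the rounding steps are not the same. The sketch below (the class name OctaPowCompare and the helper relativeDifference are hypothetical) prints both values so that the difference, if any, can be inspected; it makes no claim about which result is closer to the exact value.

```java
final class OctaPowCompare {
    // Seven multiplications, evaluated left to right.
    static double plainOctaPow(double a) {
        return a * a * a * a * a * a * a * a;
    }

    // Three multiplications: a^2, then a^4, then a^8.
    static double trickyPlainOctaPow(double a) {
        a *= a; a *= a; return a * a;
    }

    // Relative difference between the two results (0.0 if they are identical).
    static double relativeDifference(double a) {
        double plain = plainOctaPow(a);
        double tricky = trickyPlainOctaPow(a);
        return Math.abs(plain - tricky) / Math.abs(tricky);
    }

    public static void main(String[] args) {
        double a = 1234567.890;
        System.out.println("plain:  " + plainOctaPow(a));
        System.out.println("tricky: " + trickyPlainOctaPow(a));
        System.out.println("relative difference: " + relativeDifference(a));
    }
}
```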

    Results:

    Benchmark                                  Mode  Cnt   Score   Error  Units
    MathBenchmark.mathOctaPowBenchmark         avgt   30  76,041 ± 0,428  ns/op
    MathBenchmark.plainOctaPowBenchmark        avgt   30   4,174 ± 0,027  ns/op
    MathBenchmark.trickyMathOctaPowBenchmark   avgt   30   3,010 ± 0,014  ns/op
    MathBenchmark.trickyPlainOctaPowBenchmark  avgt   30   3,011 ± 0,015  ns/op
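
    To put these scores in perspective, the ratios can be computed directly. The snippet below is a small helper sketch (the class BenchmarkRatios and the method slowdown are hypothetical names); the numbers are copied from the table above and reflect this particular machine and JRE only.

```java
import java.util.Locale;

final class BenchmarkRatios {
    // Scores in ns/op, copied from the results table.
    static final double MATH = 76.041;
    static final double PLAIN = 4.174;
    static final double TRICKY_MATH = 3.010;
    static final double TRICKY_PLAIN = 3.011;

    // How many times slower a score is than the chosen baseline.
    static double slowdown(double score, double baseline) {
        return score / baseline;
    }

    public static void main(String[] args) {
        System.out.printf(Locale.ROOT, "mathOctaPow vs trickyMathOctaPow:        %.1fx%n",
                slowdown(MATH, TRICKY_MATH));
        System.out.printf(Locale.ROOT, "plainOctaPow vs trickyMathOctaPow:       %.2fx%n",
                slowdown(PLAIN, TRICKY_MATH));
        System.out.printf(Locale.ROOT, "trickyPlainOctaPow vs trickyMathOctaPow: %.3fx%n",
                slowdown(TRICKY_PLAIN, TRICKY_MATH));
    }
}
```

    On these numbers, Math.pow(a, 8) is roughly 25 times slower than the nested squarings, plain multiplication is about 1.4 times slower, and the two "tricky" variants are indistinguishable within the reported error.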

    Full benchmark results
    # JMH version: 1.20
    # VM version: JDK 1.8.0_161, VM 25.161-b12
    # VM invoker: C:\Program Files\Java\jre1.8.0_161\bin\java.exe
    # VM options: 
    # Warmup: 5 iterations, 1000 ms each
    # Measurement: 10 iterations, 1000 ms each
    # Timeout: 10 min per iteration
    # Threads: 1 thread, will synchronize iterations
    # Benchmark mode: Average time, time/op
    # Benchmark: ru.gnkoshelev.jbreak2018.perf_tests.pow.MathBenchmark.mathOctaPowBenchmark
    # Run progress: 0,00% complete, ETA 00:03:00
    # Fork: 1 of 3
    # Warmup Iteration   1: 77,026 ns/op
    # Warmup Iteration   2: 76,561 ns/op
    # Warmup Iteration   3: 77,623 ns/op
    # Warmup Iteration   4: 76,192 ns/op
    # Warmup Iteration   5: 76,012 ns/op
    Iteration   1: 75,947 ns/op
    Iteration   2: 75,739 ns/op
    Iteration   3: 75,864 ns/op
    Iteration   4: 76,179 ns/op
    Iteration   5: 75,934 ns/op
    Iteration   6: 75,783 ns/op
    Iteration   7: 75,820 ns/op
    Iteration   8: 75,898 ns/op
    Iteration   9: 75,798 ns/op
    Iteration  10: 76,053 ns/op
    # Run progress: 8,33% complete, ETA 00:02:48
    # Fork: 2 of 3
    # Warmup Iteration   1: 75,975 ns/op
    # Warmup Iteration   2: 76,008 ns/op
    # Warmup Iteration   3: 75,867 ns/op
    # Warmup Iteration   4: 76,061 ns/op
    # Warmup Iteration   5: 75,710 ns/op
    Iteration   1: 75,874 ns/op
    Iteration   2: 75,862 ns/op
    Iteration   3: 76,080 ns/op
    Iteration   4: 75,948 ns/op
    Iteration   5: 75,848 ns/op
    Iteration   6: 75,883 ns/op
    Iteration   7: 76,004 ns/op
    Iteration   8: 75,790 ns/op
    Iteration   9: 75,894 ns/op
    Iteration  10: 75,847 ns/op
    # Run progress: 16,67% complete, ETA 00:02:33
    # Fork: 3 of 3
    # Warmup Iteration   1: 75,778 ns/op
    # Warmup Iteration   2: 75,850 ns/op
    # Warmup Iteration   3: 75,878 ns/op
    # Warmup Iteration   4: 76,025 ns/op
    # Warmup Iteration   5: 76,450 ns/op
    Iteration   1: 75,791 ns/op
    Iteration   2: 75,941 ns/op
    Iteration   3: 75,652 ns/op
    Iteration   4: 75,795 ns/op
    Iteration   5: 75,906 ns/op
    Iteration   6: 78,971 ns/op
    Iteration   7: 76,055 ns/op
    Iteration   8: 75,736 ns/op
    Iteration   9: 75,816 ns/op
    Iteration  10: 77,537 ns/op
    Result "ru.gnkoshelev.jbreak2018.perf_tests.pow.MathBenchmark.mathOctaPowBenchmark":
      76,041 ±(99.9%) 0,428 ns/op [Average]
      (min, avg, max) = (75,652, 76,041, 78,971), stdev = 0,640
      CI (99.9%): [75,614, 76,469] (assumes normal distribution)
    # JMH version: 1.20
    # VM version: JDK 1.8.0_161, VM 25.161-b12
    # VM invoker: C:\Program Files\Java\jre1.8.0_161\bin\java.exe
    # VM options: 
    # Warmup: 5 iterations, 1000 ms each
    # Measurement: 10 iterations, 1000 ms each
    # Timeout: 10 min per iteration
    # Threads: 1 thread, will synchronize iterations
    # Benchmark mode: Average time, time/op
    # Benchmark: ru.gnkoshelev.jbreak2018.perf_tests.pow.MathBenchmark.plainOctaPowBenchmark
    # Run progress: 25,00% complete, ETA 00:02:17
    # Fork: 1 of 3
    # Warmup Iteration   1: 4,622 ns/op
    # Warmup Iteration   2: 4,406 ns/op
    # Warmup Iteration   3: 4,169 ns/op
    # Warmup Iteration   4: 4,163 ns/op
    # Warmup Iteration   5: 4,153 ns/op
    Iteration   1: 4,141 ns/op
    Iteration   2: 4,144 ns/op
    Iteration   3: 4,141 ns/op
    Iteration   4: 4,141 ns/op
    Iteration   5: 4,149 ns/op
    Iteration   6: 4,136 ns/op
    Iteration   7: 4,143 ns/op
    Iteration   8: 4,136 ns/op
    Iteration   9: 4,140 ns/op
    Iteration  10: 4,134 ns/op
    # Run progress: 33,33% complete, ETA 00:02:02
    # Fork: 2 of 3
    # Warmup Iteration   1: 4,567 ns/op
    # Warmup Iteration   2: 4,267 ns/op
    # Warmup Iteration   3: 4,162 ns/op
    # Warmup Iteration   4: 4,155 ns/op
    # Warmup Iteration   5: 4,157 ns/op
    Iteration   1: 4,157 ns/op
    Iteration   2: 4,151 ns/op
    Iteration   3: 4,161 ns/op
    Iteration   4: 4,175 ns/op
    Iteration   5: 4,136 ns/op
    Iteration   6: 4,154 ns/op
    Iteration   7: 4,192 ns/op
    Iteration   8: 4,206 ns/op
    Iteration   9: 4,203 ns/op
    Iteration  10: 4,180 ns/op
    # Run progress: 41,67% complete, ETA 00:01:47
    # Fork: 3 of 3
    # Warmup Iteration   1: 4,569 ns/op
    # Warmup Iteration   2: 4,204 ns/op
    # Warmup Iteration   3: 4,172 ns/op
    # Warmup Iteration   4: 4,151 ns/op
    # Warmup Iteration   5: 4,159 ns/op
    Iteration   1: 4,141 ns/op
    Iteration   2: 4,175 ns/op
    Iteration   3: 4,182 ns/op
    Iteration   4: 4,205 ns/op
    Iteration   5: 4,246 ns/op
    Iteration   6: 4,186 ns/op
    Iteration   7: 4,273 ns/op
    Iteration   8: 4,240 ns/op
    Iteration   9: 4,169 ns/op
    Iteration  10: 4,270 ns/op
    Result "ru.gnkoshelev.jbreak2018.perf_tests.pow.MathBenchmark.plainOctaPowBenchmark":
      4,174 ±(99.9%) 0,027 ns/op [Average]
      (min, avg, max) = (4,134, 4,174, 4,273), stdev = 0,040
      CI (99.9%): [4,147, 4,201] (assumes normal distribution)
    # JMH version: 1.20
    # VM version: JDK 1.8.0_161, VM 25.161-b12
    # VM invoker: C:\Program Files\Java\jre1.8.0_161\bin\java.exe
    # VM options: 
    # Warmup: 5 iterations, 1000 ms each
    # Measurement: 10 iterations, 1000 ms each
    # Timeout: 10 min per iteration
    # Threads: 1 thread, will synchronize iterations
    # Benchmark mode: Average time, time/op
    # Benchmark: ru.gnkoshelev.jbreak2018.perf_tests.pow.MathBenchmark.trickyMathOctaPowBenchmark
    # Run progress: 50,00% complete, ETA 00:01:31
    # Fork: 1 of 3
    # Warmup Iteration   1: 3,396 ns/op
    # Warmup Iteration   2: 3,237 ns/op
    # Warmup Iteration   3: 3,156 ns/op
    # Warmup Iteration   4: 3,020 ns/op
    # Warmup Iteration   5: 3,001 ns/op
    Iteration   1: 2,995 ns/op
    Iteration   2: 3,012 ns/op
    Iteration   3: 3,014 ns/op
    Iteration   4: 2,997 ns/op
    Iteration   5: 3,025 ns/op
    Iteration   6: 3,015 ns/op
    Iteration   7: 3,004 ns/op
    Iteration   8: 2,999 ns/op
    Iteration   9: 3,033 ns/op
    Iteration  10: 3,003 ns/op
    # Run progress: 58,33% complete, ETA 00:01:16
    # Fork: 2 of 3
    # Warmup Iteration   1: 3,409 ns/op
    # Warmup Iteration   2: 3,230 ns/op
    # Warmup Iteration   3: 3,057 ns/op
    # Warmup Iteration   4: 3,027 ns/op
    # Warmup Iteration   5: 3,010 ns/op
    Iteration   1: 3,001 ns/op
    Iteration   2: 3,001 ns/op
    Iteration   3: 3,023 ns/op
    Iteration   4: 3,097 ns/op
    Iteration   5: 3,017 ns/op
    Iteration   6: 2,997 ns/op
    Iteration   7: 3,017 ns/op
    Iteration   8: 3,011 ns/op
    Iteration   9: 2,998 ns/op
    Iteration  10: 2,991 ns/op
    # Run progress: 66,67% complete, ETA 00:01:01
    # Fork: 3 of 3
    # Warmup Iteration   1: 3,476 ns/op
    # Warmup Iteration   2: 3,188 ns/op
    # Warmup Iteration   3: 2,998 ns/op
    # Warmup Iteration   4: 2,984 ns/op
    # Warmup Iteration   5: 3,023 ns/op
    Iteration   1: 2,999 ns/op
    Iteration   2: 3,004 ns/op
    Iteration   3: 2,998 ns/op
    Iteration   4: 3,059 ns/op
    Iteration   5: 3,001 ns/op
    Iteration   6: 3,006 ns/op
    Iteration   7: 3,002 ns/op
    Iteration   8: 2,994 ns/op
    Iteration   9: 3,005 ns/op
    Iteration  10: 2,989 ns/op
    Result "ru.gnkoshelev.jbreak2018.perf_tests.pow.MathBenchmark.trickyMathOctaPowBenchmark":
      3,010 ±(99.9%) 0,014 ns/op [Average]
      (min, avg, max) = (2,989, 3,010, 3,097), stdev = 0,022
      CI (99.9%): [2,996, 3,025] (assumes normal distribution)
    # JMH version: 1.20
    # VM version: JDK 1.8.0_161, VM 25.161-b12
    # VM invoker: C:\Program Files\Java\jre1.8.0_161\bin\java.exe
    # VM options: 
    # Warmup: 5 iterations, 1000 ms each
    # Measurement: 10 iterations, 1000 ms each
    # Timeout: 10 min per iteration
    # Threads: 1 thread, will synchronize iterations
    # Benchmark mode: Average time, time/op
    # Benchmark: ru.gnkoshelev.jbreak2018.perf_tests.pow.MathBenchmark.trickyPlainOctaPowBenchmark
    # Run progress: 75,00% complete, ETA 00:00:45
    # Fork: 1 of 3
    # Warmup Iteration   1: 3,353 ns/op
    # Warmup Iteration   2: 3,169 ns/op
    # Warmup Iteration   3: 2,985 ns/op
    # Warmup Iteration   4: 3,004 ns/op
    # Warmup Iteration   5: 3,018 ns/op
    Iteration   1: 2,994 ns/op
    Iteration   2: 2,986 ns/op
    Iteration   3: 2,986 ns/op
    Iteration   4: 3,041 ns/op
    Iteration   5: 3,000 ns/op
    Iteration   6: 2,993 ns/op
    Iteration   7: 2,999 ns/op
    Iteration   8: 3,001 ns/op
    Iteration   9: 3,024 ns/op
    Iteration  10: 2,995 ns/op
    # Run progress: 83,33% complete, ETA 00:00:30
    # Fork: 2 of 3
    # Warmup Iteration   1: 3,371 ns/op
    # Warmup Iteration   2: 3,190 ns/op
    # Warmup Iteration   3: 3,010 ns/op
    # Warmup Iteration   4: 2,992 ns/op
    # Warmup Iteration   5: 2,995 ns/op
    Iteration   1: 2,993 ns/op
    Iteration   2: 3,007 ns/op
    Iteration   3: 2,999 ns/op
    Iteration   4: 3,006 ns/op
    Iteration   5: 2,992 ns/op
    Iteration   6: 3,009 ns/op
    Iteration   7: 3,013 ns/op
    Iteration   8: 3,012 ns/op
    Iteration   9: 3,010 ns/op
    Iteration  10: 3,000 ns/op
    # Run progress: 91,67% complete, ETA 00:00:15
    # Fork: 3 of 3
    # Warmup Iteration   1: 3,388 ns/op
    # Warmup Iteration   2: 3,239 ns/op
    # Warmup Iteration   3: 3,046 ns/op
    # Warmup Iteration   4: 3,146 ns/op
    # Warmup Iteration   5: 3,008 ns/op
    Iteration   1: 3,023 ns/op
    Iteration   2: 3,048 ns/op
    Iteration   3: 3,039 ns/op
    Iteration   4: 3,094 ns/op
    Iteration   5: 3,024 ns/op
    Iteration   6: 3,004 ns/op
    Iteration   7: 2,991 ns/op
    Iteration   8: 3,025 ns/op
    Iteration   9: 3,006 ns/op
    Iteration  10: 3,006 ns/op
    Result "ru.gnkoshelev.jbreak2018.perf_tests.pow.MathBenchmark.trickyPlainOctaPowBenchmark":
      3,011 ±(99.9%) 0,015 ns/op [Average]
      (min, avg, max) = (2,986, 3,011, 3,094), stdev = 0,023
      CI (99.9%): [2,996, 3,026] (assumes normal distribution)
    # Run complete. Total time: 00:03:03
    Benchmark                                  Mode  Cnt   Score   Error  Units
    MathBenchmark.mathOctaPowBenchmark         avgt   30  76,041 ± 0,428  ns/op
    MathBenchmark.plainOctaPowBenchmark        avgt   30   4,174 ± 0,027  ns/op
    MathBenchmark.trickyMathOctaPowBenchmark   avgt   30   3,010 ± 0,014  ns/op
    MathBenchmark.trickyPlainOctaPowBenchmark  avgt   30   3,011 ± 0,015  ns/op

    The benchmark results confirm our reasoning. The difference between Math.pow(a, 2) and (a * a) is within the measurement error: 3,010 ± 0,014 ns/op versus 3,011 ± 0,015 ns/op.
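
    One way to read "not significant" is to check whether the 99.9% confidence intervals reported by JMH overlap. The sketch below does that for the interval bounds copied from the log above; the class and method names are illustrative, and overlapping intervals are only a rough indication, not a formal statistical test.

```java
public class CiOverlap {
    /** Returns true if the closed intervals [lo1, hi1] and [lo2, hi2] share at least one point. */
    static boolean overlaps(double lo1, double hi1, double lo2, double hi2) {
        return lo1 <= hi2 && lo2 <= hi1;
    }

    public static void main(String[] args) {
        // CI (99.9%) bounds in ns/op, copied from the JMH output
        boolean trickyPair = overlaps(2.996, 3.025, 2.996, 3.026);    // trickyMath vs trickyPlain
        boolean plainVsTricky = overlaps(4.147, 4.201, 2.996, 3.026); // plain vs trickyPlain

        System.out.println("trickyMath vs trickyPlain overlap: " + trickyPair);
        System.out.println("plain vs trickyPlain overlap: " + plainVsTricky);
    }
}
```

    The intervals of the two "tricky" variants overlap almost completely, while the interval of the plain seven-multiplication variant lies well apart from them.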

    To demonstrate how effective the intrinsic is, you can run the same benchmark with the _dpow intrinsic disabled:

    Benchmark                                  Mode  Cnt    Score   Error  Units
    MathBenchmark.mathOctaPowBenchmark         avgt   30  195,222 ± 0,850  ns/op
    MathBenchmark.plainOctaPowBenchmark        avgt   30    4,183 ± 0,030  ns/op
    MathBenchmark.trickyMathOctaPowBenchmark   avgt   30   41,158 ± 0,381  ns/op
    MathBenchmark.trickyPlainOctaPowBenchmark  avgt   30    3,081 ± 0,032  ns/op
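
    To put the two summary tables side by side, the slowdown from disabling the intrinsic can be computed directly from the reported scores. This is only a small arithmetic sketch; the class and method names are made up for illustration, and the numbers are the ns/op scores from the tables.

```java
import java.util.Locale;

public class IntrinsicSpeedup {
    /** Ratio of two average times, i.e. how many times slower the first one is. */
    static double ratio(double slowNsPerOp, double fastNsPerOp) {
        return slowNsPerOp / fastNsPerOp;
    }

    public static void main(String[] args) {
        // Scores in ns/op: with the _dpow intrinsic (first table) and without it (second table)
        double mathOn = 76.041, mathOff = 195.222;   // Math.pow(a, 8)
        double trickyOn = 3.010, trickyOff = 41.158; // three chained Math.pow(x, 2)

        System.out.println(String.format(Locale.ROOT,
                "Math.pow(a, 8) without intrinsic: %.1fx slower", ratio(mathOff, mathOn)));
        System.out.println(String.format(Locale.ROOT,
                "chained Math.pow(x, 2) without intrinsic: %.1fx slower", ratio(trickyOff, trickyOn)));
    }
}
```

    The ratios come out at roughly 2.6 for Math.pow(a, 8) and roughly 13.7 for the chained Math.pow(x, 2) variant, while the two multiplication-only variants are practically unaffected.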

    Entire benchmark results
    # JMH version: 1.20
    # VM version: JDK 1.8.0_161, VM 25.161-b12
    # VM invoker: C:\Program Files\Java\jre1.8.0_161\bin\java.exe
    # VM options: -XX:+UnlockDiagnosticVMOptions -XX:DisableIntrinsic=_dpow
    # Warmup: 5 iterations, 1000 ms each
    # Measurement: 10 iterations, 1000 ms each
    # Timeout: 10 min per iteration
    # Threads: 1 thread, will synchronize iterations
    # Benchmark mode: Average time, time/op
    # Benchmark: ru.gnkoshelev.jbreak2018.perf_tests.pow.MathBenchmark.mathOctaPowBenchmark

    # Run progress: 0,00% complete, ETA 00:03:00
    # Fork: 1 of 3
    # Warmup Iteration   1: 194,013 ns/op
    # Warmup Iteration   2: 197,926 ns/op
    # Warmup Iteration   3: 197,374 ns/op
    # Warmup Iteration   4: 197,242 ns/op
    # Warmup Iteration   5: 202,265 ns/op
    Iteration   1: 198,168 ns/op
    Iteration   2: 198,107 ns/op
    Iteration   3: 197,629 ns/op
    Iteration   4: 195,174 ns/op
    Iteration   5: 194,771 ns/op
    Iteration   6: 194,804 ns/op
    Iteration   7: 194,732 ns/op
    Iteration   8: 194,932 ns/op
    Iteration   9: 194,964 ns/op
    Iteration  10: 194,774 ns/op

    # Run progress: 8,33% complete, ETA 00:02:48
    # Fork: 2 of 3
    # Warmup Iteration   1: 200,032 ns/op
    # Warmup Iteration   2: 200,323 ns/op
    # Warmup Iteration   3: 195,602 ns/op
    # Warmup Iteration   4: 194,705 ns/op
    # Warmup Iteration   5: 194,277 ns/op
    Iteration   1: 194,657 ns/op
    Iteration   2: 195,459 ns/op
    Iteration   3: 199,108 ns/op
    Iteration   4: 195,154 ns/op
    Iteration   5: 195,208 ns/op
    Iteration   6: 194,692 ns/op
    Iteration   7: 194,406 ns/op
    Iteration   8: 194,979 ns/op
    Iteration   9: 194,950 ns/op
    Iteration  10: 194,234 ns/op

    # Run progress: 16,67% complete, ETA 00:02:33
    # Fork: 3 of 3
    # Warmup Iteration   1: 193,094 ns/op
    # Warmup Iteration   2: 192,849 ns/op
    # Warmup Iteration   3: 195,101 ns/op
    # Warmup Iteration   4: 195,456 ns/op
    # Warmup Iteration   5: 194,698 ns/op
    Iteration   1: 194,806 ns/op
    Iteration   2: 194,887 ns/op
    Iteration   3: 194,863 ns/op
    Iteration   4: 195,134 ns/op
    Iteration   5: 194,379 ns/op
    Iteration   6: 193,851 ns/op
    Iteration   7: 194,085 ns/op
    Iteration   8: 194,743 ns/op
    Iteration   9: 194,486 ns/op
    Iteration  10: 194,508 ns/op

    Result "ru.gnkoshelev.jbreak2018.perf_tests.pow.MathBenchmark.mathOctaPowBenchmark":
      195,222 ±(99.9%) 0,850 ns/op [Average]
      (min, avg, max) = (193,851, 195,222, 199,108), stdev = 1,272
      CI (99.9%): [194,372, 196,071] (assumes normal distribution)

    # JMH version: 1.20
    # VM version: JDK 1.8.0_161, VM 25.161-b12
    # VM invoker: C:\Program Files\Java\jre1.8.0_161\bin\java.exe
    # VM options: -XX:+UnlockDiagnosticVMOptions -XX:DisableIntrinsic=_dpow
    # Warmup: 5 iterations, 1000 ms each
    # Measurement: 10 iterations, 1000 ms each
    # Timeout: 10 min per iteration
    # Threads: 1 thread, will synchronize iterations
    # Benchmark mode: Average time, time/op
    # Benchmark: ru.gnkoshelev.jbreak2018.perf_tests.pow.MathBenchmark.plainOctaPowBenchmark

    # Run progress: 25,00% complete, ETA 00:02:17
    # Fork: 1 of 3
    # Warmup Iteration   1: 4,569 ns/op
    # Warmup Iteration   2: 4,238 ns/op
    # Warmup Iteration   3: 4,167 ns/op
    # Warmup Iteration   4: 4,211 ns/op
    # Warmup Iteration   5: 4,267 ns/op
    Iteration   1: 4,185 ns/op
    Iteration   2: 4,280 ns/op
    Iteration   3: 4,186 ns/op
    Iteration   4: 4,202 ns/op
    Iteration   5: 4,193 ns/op
    Iteration   6: 4,360 ns/op
    Iteration   7: 4,191 ns/op
    Iteration   8: 4,181 ns/op
    Iteration   9: 4,176 ns/op
    Iteration  10: 4,170 ns/op

    # Run progress: 33,33% complete, ETA 00:02:02
    # Fork: 2 of 3
    # Warmup Iteration   1: 4,573 ns/op
    # Warmup Iteration   2: 4,218 ns/op
    # Warmup Iteration   3: 4,176 ns/op
    # Warmup Iteration   4: 4,155 ns/op
    # Warmup Iteration   5: 4,279 ns/op
    Iteration   1: 4,251 ns/op
    Iteration   2: 4,207 ns/op
    Iteration   3: 4,175 ns/op
    Iteration   4: 4,174 ns/op
    Iteration   5: 4,182 ns/op
    Iteration   6: 4,196 ns/op
    Iteration   7: 4,169 ns/op
    Iteration   8: 4,164 ns/op
    Iteration   9: 4,175 ns/op
    Iteration  10: 4,157 ns/op

    # Run progress: 41,67% complete, ETA 00:01:47
    # Fork: 3 of 3
    # Warmup Iteration   1: 4,561 ns/op
    # Warmup Iteration   2: 4,193 ns/op
    # Warmup Iteration   3: 4,139 ns/op
    # Warmup Iteration   4: 4,152 ns/op
    # Warmup Iteration   5: 4,154 ns/op
    Iteration   1: 4,141 ns/op
    Iteration   2: 4,144 ns/op
    Iteration   3: 4,157 ns/op
    Iteration   4: 4,141 ns/op
    Iteration   5: 4,162 ns/op
    Iteration   6: 4,135 ns/op
    Iteration   7: 4,166 ns/op
    Iteration   8: 4,156 ns/op
    Iteration   9: 4,160 ns/op
    Iteration  10: 4,144 ns/op

    Result "ru.gnkoshelev.jbreak2018.perf_tests.pow.MathBenchmark.plainOctaPowBenchmark":
      4,183 ±(99.9%) 0,030 ns/op [Average]
      (min, avg, max) = (4,135, 4,183, 4,360), stdev = 0,045
      CI (99.9%): [4,152, 4,213] (assumes normal distribution)

    # JMH version: 1.20
    # VM version: JDK 1.8.0_161, VM 25.161-b12
    # VM invoker: C:\Program Files\Java\jre1.8.0_161\bin\java.exe
    # VM options: -XX:+UnlockDiagnosticVMOptions -XX:DisableIntrinsic=_dpow
    # Warmup: 5 iterations, 1000 ms each
    # Measurement: 10 iterations, 1000 ms each
    # Timeout: 10 min per iteration
    # Threads: 1 thread, will synchronize iterations
    # Benchmark mode: Average time, time/op
    # Benchmark: ru.gnkoshelev.jbreak2018.perf_tests.pow.MathBenchmark.trickyMathOctaPowBenchmark

    # Run progress: 50,00% complete, ETA 00:01:31
    # Fork: 1 of 3
    # Warmup Iteration   1: 41,544 ns/op
    # Warmup Iteration   2: 41,150 ns/op
    # Warmup Iteration   3: 41,312 ns/op
    # Warmup Iteration   4: 41,196 ns/op
    # Warmup Iteration   5: 41,002 ns/op
    Iteration   1: 43,681 ns/op
    Iteration   2: 41,183 ns/op
    Iteration   3: 41,598 ns/op
    Iteration   4: 41,703 ns/op
    Iteration   5: 41,365 ns/op
    Iteration   6: 41,210 ns/op
    Iteration   7: 41,380 ns/op
    Iteration   8: 41,413 ns/op
    Iteration   9: 41,481 ns/op
    Iteration  10: 41,763 ns/op

    # Run progress: 58,33% complete, ETA 00:01:16
    # Fork: 2 of 3
    # Warmup Iteration   1: 41,665 ns/op
    # Warmup Iteration   2: 40,970 ns/op
    # Warmup Iteration   3: 40,872 ns/op
    # Warmup Iteration   4: 40,926 ns/op
    # Warmup Iteration   5: 40,794 ns/op
    Iteration   1: 41,103 ns/op
    Iteration   2: 40,991 ns/op
    Iteration   3: 40,859 ns/op
    Iteration   4: 41,046 ns/op
    Iteration   5: 41,241 ns/op
    Iteration   6: 40,711 ns/op
    Iteration   7: 40,571 ns/op
    Iteration   8: 40,928 ns/op
    Iteration   9: 40,662 ns/op
    Iteration  10: 40,911 ns/op

    # Run progress: 66,67% complete, ETA 00:01:01
    # Fork: 3 of 3
    # Warmup Iteration   1: 42,068 ns/op
    # Warmup Iteration   2: 41,017 ns/op
    # Warmup Iteration   3: 41,260 ns/op
    # Warmup Iteration   4: 41,147 ns/op
    # Warmup Iteration   5: 40,777 ns/op
    Iteration   1: 41,060 ns/op
    Iteration   2: 40,881 ns/op
    Iteration   3: 41,014 ns/op
    Iteration   4: 40,826 ns/op
    Iteration   5: 40,977 ns/op
    Iteration   6: 40,837 ns/op
    Iteration   7: 41,023 ns/op
    Iteration   8: 40,749 ns/op
    Iteration   9: 40,959 ns/op
    Iteration  10: 40,611 ns/op

    Result "ru.gnkoshelev.jbreak2018.perf_tests.pow.MathBenchmark.trickyMathOctaPowBenchmark":
      41,158 ±(99.9%) 0,381 ns/op [Average]
      (min, avg, max) = (40,571, 41,158, 43,681), stdev = 0,570
      CI (99.9%): [40,777, 41,538] (assumes normal distribution)

    # JMH version: 1.20
    # VM version: JDK 1.8.0_161, VM 25.161-b12
    # VM invoker: C:\Program Files\Java\jre1.8.0_161\bin\java.exe
    # VM options: -XX:+UnlockDiagnosticVMOptions -XX:DisableIntrinsic=_dpow
    # Warmup: 5 iterations, 1000 ms each
    # Measurement: 10 iterations, 1000 ms each
    # Timeout: 10 min per iteration
    # Threads: 1 thread, will synchronize iterations
    # Benchmark mode: Average time, time/op
    # Benchmark: ru.gnkoshelev.jbreak2018.perf_tests.pow.MathBenchmark.trickyPlainOctaPowBenchmark

    # Run progress: 75,00% complete, ETA 00:00:45
    # Fork: 1 of 3
    # Warmup Iteration   1: 3,384 ns/op
    # Warmup Iteration   2: 3,214 ns/op
    # Warmup Iteration   3: 3,063 ns/op
    # Warmup Iteration   4: 3,051 ns/op
    # Warmup Iteration   5: 3,073 ns/op
    Iteration   1: 3,090 ns/op
    Iteration   2: 3,045 ns/op
    Iteration   3: 3,054 ns/op
    Iteration   4: 3,074 ns/op
    Iteration   5: 3,058 ns/op
    Iteration   6: 3,059 ns/op
    Iteration   7: 3,075 ns/op
    Iteration   8: 3,092 ns/op
    Iteration   9: 3,155 ns/op
    Iteration  10: 3,089 ns/op

    # Run progress: 83,33% complete, ETA 00:00:30
    # Fork: 2 of 3
    # Warmup Iteration   1: 3,442 ns/op
    # Warmup Iteration   2: 3,315 ns/op
    # Warmup Iteration   3: 3,027 ns/op
    # Warmup Iteration   4: 3,031 ns/op
    # Warmup Iteration   5: 3,051 ns/op
    Iteration   1: 3,032 ns/op
    Iteration   2: 3,051 ns/op
    Iteration   3: 3,050 ns/op
    Iteration   4: 3,076 ns/op
    Iteration   5: 3,067 ns/op
    Iteration   6: 3,018 ns/op
    Iteration   7: 3,034 ns/op
    Iteration   8: 3,017 ns/op
    Iteration   9: 3,041 ns/op
    Iteration  10: 3,023 ns/op

    # Run progress: 91,67% complete, ETA 00:00:15
    # Fork: 3 of 3
    # Warmup Iteration   1: 3,415 ns/op
    # Warmup Iteration   2: 3,276 ns/op
    # Warmup Iteration   3: 3,344 ns/op
    # Warmup Iteration   4: 3,226 ns/op
    # Warmup Iteration   5: 3,072 ns/op
    Iteration   1: 3,150 ns/op
    Iteration   2: 3,132 ns/op
    Iteration   3: 3,172 ns/op
    Iteration   4: 3,101 ns/op
    Iteration   5: 3,053 ns/op
    Iteration   6: 3,061 ns/op
    Iteration   7: 3,106 ns/op
    Iteration   8: 3,150 ns/op
    Iteration   9: 3,097 ns/op
    Iteration  10: 3,204 ns/op

    Result "ru.gnkoshelev.jbreak2018.perf_tests.pow.MathBenchmark.trickyPlainOctaPowBenchmark":
      3,081 ±(99.9%) 0,032 ns/op [Average]
      (min, avg, max) = (3,017, 3,081, 3,204), stdev = 0,048
      CI (99.9%): [3,049, 3,113] (assumes normal distribution)

    # Run complete. Total time: 00:03:03

    Benchmark                                  Mode  Cnt    Score   Error  Units
    MathBenchmark.mathOctaPowBenchmark         avgt   30  195,222 ± 0,850  ns/op
    MathBenchmark.plainOctaPowBenchmark        avgt   30    4,183 ± 0,030  ns/op
    MathBenchmark.trickyMathOctaPowBenchmark   avgt   30   41,158 ± 0,381  ns/op
    MathBenchmark.trickyPlainOctaPowBenchmark  avgt   30    3,081 ± 0,032  ns/op

    Here we see the cost of an honest call to the native method StrictMath.pow(). Interestingly, several calls to StrictMath.pow(x, 2) are still faster than a single StrictMath.pow(x, 8): 41,158 ns/op versus 195,222 ns/op. This suggests that the native implementation also has a special case for squaring.
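
    The timing only hints at a special case for squaring. A rough way to probe it from the arithmetic side is to compare the chained StrictMath.pow(x, 2) result bit by bit with plain multiplication. The sketch below is an illustration with made-up class and method names; it assumes nothing about the outcome, since a zero mismatch count is consistent with pow(x, 2) being computed as x * x but does not prove it.

```java
public class SquareSpecialCase {
    /** Eighth power via three chained StrictMath.pow(x, 2) calls. */
    static double chainedStrictPow(double a) {
        return StrictMath.pow(StrictMath.pow(StrictMath.pow(a, 2), 2), 2);
    }

    /** Eighth power via three multiplications. */
    static double chainedMultiply(double a) {
        a *= a;
        a *= a;
        return a * a;
    }

    /** Counts inputs from + i * step (i = 0..n-1) whose two results differ in at least one bit. */
    static int countBitMismatches(double from, double step, int n) {
        int mismatches = 0;
        for (int i = 0; i < n; i++) {
            double x = from + i * step;
            long viaPow = Double.doubleToLongBits(chainedStrictPow(x));
            long viaMul = Double.doubleToLongBits(chainedMultiply(x));
            if (viaPow != viaMul) {
                mismatches++;
            }
        }
        return mismatches;
    }

    public static void main(String[] args) {
        // The sample range is arbitrary; other ranges may behave differently
        System.out.println("bit mismatches: " + countBitMismatches(0.5, 0.001, 10_000));
    }
}
```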

    Conclusion


    The story of how the _dpow intrinsic is implemented deserves a chapter of its own: judging by the changes in the OpenJDK repository, the intrinsic keeps changing from release to release, and the developers keep forgetting about the special case. Andrey Pangin (apangin) talked about this at the Joker 2016 conference in his talk "Myths and facts about slow Java".

    Correct answer


    Options 3 and 4 are equally fast thanks to a special case in the intrinsic's implementation: Math.pow(x, 2) essentially reduces to x * x.

    Option 2 is slower because it performs more operations: seven multiplications instead of three.

    Option 1 is significantly slower because, despite the intrinsic, the complex general-purpose logic for raising a number of type double to a power is executed.

    Statistics


    Two conference participants gave the correct answer. Another five answers were partially correct. Let me remind you that 32 answers were submitted in total.

    PS


    All the code is on GitHub: jbreak2018-pow-perf-tests.
