The Way to Understanding V8 Bytecode

Original author: Franziska Hinkelmann
V8 is Google's open source JavaScript engine. It is used by Chrome, Node.js, and many other applications. This article, written by Google engineer Franziska Hinkelmann, describes the V8 bytecode format. Bytecode is fairly easy to read once you understand a few basic concepts.


V8 compilation pipeline



Ignition! We have lift-off! The Ignition interpreter has been part of the V8 compilation pipeline since 2016.

When V8 compiles JavaScript code, the parser generates an abstract syntax tree (AST). A syntax tree is a tree representation of the syntactic structure of the JavaScript code. Ignition, the interpreter, generates bytecode from this data structure. TurboFan, the optimizing compiler, eventually takes the bytecode and generates optimized machine code from it.
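To make the AST-to-bytecode step a little more concrete, here is a hypothetical sketch for the expression `return 1 + obj.x`, which previews the incrementX example discussed later. The AST node shapes loosely follow the ESTree convention, and the `generate` function is invented for illustration; V8's real parser and bytecode generator are C++ code with their own data structures, and they also emit things this toy omits (StackCheck, feedback-slot operands, real register allocation).

```javascript
// Hypothetical AST for `return 1 + obj.x` (ESTree-like shapes, not V8's).
const ast = {
  type: "ReturnStatement",
  argument: {
    type: "BinaryExpression",
    operator: "+",
    left: { type: "Literal", value: 1 },
    right: {
      type: "MemberExpression",
      object: { type: "Identifier", name: "obj" },
      property: { type: "Identifier", name: "x" },
    },
  },
};

// Toy bytecode generator: walks the AST and emits Ignition-style mnemonics.
// It always spills to r0, so it only handles a single binary expression.
function generate(node, out, params, constants) {
  switch (node.type) {
    case "ReturnStatement":
      generate(node.argument, out, params, constants); // value ends up in the accumulator
      out.push("Return");
      break;
    case "Literal":
      out.push(`LdaSmi [${node.value}]`); // assumes a small integer literal
      break;
    case "MemberExpression": {
      // Property names live in a constant pool; reuse an entry if present.
      let index = constants.indexOf(node.property.name);
      if (index === -1) index = constants.push(node.property.name) - 1;
      out.push(`LdaNamedProperty a${params.indexOf(node.object.name)}, [${index}]`);
      break;
    }
    case "BinaryExpression":
      if (node.operator !== "+") throw new Error(`unsupported operator: ${node.operator}`);
      generate(node.left, out, params, constants);  // left operand -> accumulator
      out.push("Star r0");                          // spill it to register r0
      generate(node.right, out, params, constants); // right operand -> accumulator
      out.push("Add r0");                           // accumulator = r0 + accumulator
      break;
    default:
      throw new Error(`unsupported node: ${node.type}`);
  }
  return out;
}

console.log(generate(ast, [], ["obj"], []).join("\n"));
// LdaSmi [1]
// Star r0
// LdaNamedProperty a0, [0]
// Add r0
// Return
```

The left operand is computed first and parked in a register so that the accumulator is free for the right operand; this is the same pattern that shows up in the real listing later on.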


V8 Compilation Pipeline

If you want to know why V8 has two execution modes, take a look at my talk from JSConf EU.

V8 Bytecode Basics


Bytecode is an abstraction of machine code. Compiling bytecode to machine code is easier if the bytecode was designed with the same computational model as the physical CPU. That is why interpreters are often register machines or stack machines.
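For contrast with Ignition's model, here is a minimal sketch of a stack machine. It is a hypothetical toy, not V8 code: operands are pushed onto an implicit stack instead of being kept in named registers, and an `add` instruction pops two values and pushes their sum.

```javascript
// A toy stack machine (hypothetical; for contrast with Ignition, not V8 code).
// Instructions are [opcode, operand] pairs; operands live on an implicit stack.
function runStackMachine(program) {
  const stack = [];
  for (const [op, arg] of program) {
    switch (op) {
      case "push":
        stack.push(arg);
        break;
      case "add": {
        // Pop the right operand first, then the left one.
        const right = stack.pop();
        const left = stack.pop();
        stack.push(left + right);
        break;
      }
      default:
        throw new Error(`unknown op: ${op}`);
    }
  }
  return stack.pop(); // the result is whatever is left on top of the stack
}

// 1 + 42 as stack code: push both operands, then "add" consumes them.
const result = runStackMachine([["push", 1], ["push", 42], ["add"]]);
console.log(result); // 43
```

A register machine names its operands explicitly instead, which usually means fewer instructions but longer ones; Ignition's accumulator is a compromise between the two.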

The Ignition interpreter is a register machine with an accumulator register.


The code on the left is for humans. The code on the right is for machines.

You can think of V8's bytecodes as small building blocks that, put together, can implement any JavaScript functionality. V8 has several hundred bytecodes. There are bytecodes for operators such as Add or TypeOf, and for property loads such as LdaNamedProperty. V8 also has some fairly specific bytecodes, such as CreateObjectLiteral or SuspendGenerator. The header file bytecodes.h contains the complete list of V8's bytecodes.

Each bytecode specifies its inputs and outputs as register operands. Ignition uses the registers r0, r1, r2, ... and an accumulator register. Almost all bytecodes use the accumulator. It works like a regular register, except that bytecodes never name it explicitly. For example, Add r1 adds the value in register r1 to the value in the accumulator. This keeps bytecodes shorter and saves memory.
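The idea of an implicit accumulator can be sketched with a tiny machine model. The `makeMachine` and `add` names are hypothetical and the structure is a simplification, not V8's real frame layout; the point is that `Add r1` only encodes the register operand, while the accumulator is both the other input and the output.

```javascript
// A toy accumulator machine modeled loosely on Ignition (hypothetical sketch).
// Registers r0, r1, ... plus one implicit accumulator.
function makeMachine(registerCount) {
  return { acc: undefined, regs: new Array(registerCount).fill(undefined) };
}

// "Add rN": only rN is encoded; the accumulator is read and overwritten implicitly.
function add(machine, reg) {
  machine.acc = machine.regs[reg] + machine.acc;
}

const m = makeMachine(2);
m.regs[1] = 1; // r1 holds 1
m.acc = 42;    // the accumulator holds 42
add(m, 1);     // Add r1
console.log(m.acc); // 43
```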

Many bytecode names begin with Lda or Sta. The a in Lda and Sta stands for accumulator.

For example, LdaSmi [42] loads the small integer (Smi) 42 into the accumulator. Star r0 stores the value currently in the accumulator in register r0.
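The Lda/Sta naming convention can be illustrated with two hypothetical helper functions on a toy machine. They are illustrations of the semantics described above, not V8 APIs. Note that Star copies the value: the accumulator still holds 42 afterwards.

```javascript
// Hypothetical toy model of the Lda*/Sta* naming convention (not V8 code).
const machine = { acc: undefined, regs: [undefined] };

// LdaSmi [n]: "Load accumulator" with a small integer immediate.
function ldaSmi(m, value) {
  m.acc = value;
}

// Star rN: "Store accumulator" into register rN (the accumulator is unchanged).
function star(m, reg) {
  m.regs[reg] = m.acc;
}

ldaSmi(machine, 42); // LdaSmi [42]
star(machine, 0);    // Star r0
console.log(machine.acc, machine.regs[0]); // 42 42
```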

Analyzing the Bytecode of a Function


Now that we have covered the basic concepts, let's look at the bytecode of a real function.

function incrementX(obj) {
  return 1 + obj.x;
}
incrementX({x: 42});  // The V8 compiler is lazy, so if you never call the function, it won't interpret it

To see the bytecode for your JavaScript code, run the d8 shell or Node.js (version 8.3 or later) with the --print-bytecode flag. For Chrome, start it from the command line with --js-flags="--print-bytecode"; the Chromium documentation explains how to run Chromium with command-line flags.

$ node --print-bytecode incrementX.js
...
[generating bytecode for function: incrementX]
Parameter count 2
Frame size 8
  12 E> 0x2ddf8802cf6e @    StackCheck
  19 S> 0x2ddf8802cf6f @    LdaSmi [1]
        0x2ddf8802cf71 @    Star r0
  34 E> 0x2ddf8802cf73 @    LdaNamedProperty a0, [0], [4]
  28 E> 0x2ddf8802cf77 @    Add r0, [6]
  36 S> 0x2ddf8802cf7a @    Return
Constant pool (size = 1)
0x2ddf8802cf21: [FixedArray] in OldSpace
 - map = 0x2ddfb2d02309 
 - length: 1
           0: 0x2ddf8db91611 
Handler Table (size = 16)

We can ignore most of this output and focus on the bytecodes. Here is what each of them does.
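If you want to filter the listing mechanically, a small helper can strip the source positions and addresses and keep only the instructions. `extractBytecodes` is a hypothetical helper, not part of V8 or Node.js, and it assumes the exact line format shown in the listing above; other V8 versions print bytecode lines differently (for example with bytecode offsets and raw bytes), so the regular expression would need adjusting there.

```javascript
// Hypothetical helper (not part of V8 or Node.js): keeps only the
// "Mnemonic operands" part of --print-bytecode lines in the format shown above.
function extractBytecodes(listing) {
  const result = [];
  for (const line of listing.split("\n")) {
    // Bytecode lines contain "0x<address> @" followed by the instruction.
    const match = line.match(/0x[0-9a-f]+ @\s+(.*)$/);
    if (match) result.push(match[1].trim());
  }
  return result;
}

const sample = [
  "  12 E> 0x2ddf8802cf6e @    StackCheck",
  "  19 S> 0x2ddf8802cf6f @    LdaSmi [1]",
  "        0x2ddf8802cf71 @    Star r0",
  "Constant pool (size = 1)",
].join("\n");

console.log(extractBytecodes(sample).join(" | ")); // StackCheck | LdaSmi [1] | Star r0
```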

LdaSmi [1]

LdaSmi [1] loads the constant 1 into the accumulator.



Star r0

Star r0 stores the value in the accumulator, which is 1, in register r0.



LdaNamedProperty a0, [0], [4]

LdaNamedProperty loads a named property of a0 into the accumulator. The notation ai refers to the i-th argument of incrementX(). In this example, we look up a named property on a0, the first argument of incrementX(). The name is determined by the constant 0: LdaNamedProperty uses 0 to look up the name in a separate table:

- length: 1
           0: 0x2ddf8db91611 

Here, 0 maps to x. So this bytecode loads obj.x.
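The constant-pool lookup can be modeled in a few lines. This is a hypothetical sketch: `constantPool`, `args`, and `ldaNamedProperty` are invented names, and the feedback-slot operand [4] is ignored. The property name is not encoded in the bytecode itself; the bytecode only carries the index 0 into the constant pool.

```javascript
// Hypothetical model of "LdaNamedProperty a0, [0], [4]" (not V8's implementation).
// The feedback-slot operand [4] is ignored in this toy.
const constantPool = ["x"];  // constant pool entry 0 holds the property name "x"
const args = [{ x: 42 }];    // a0 is obj, the first declared argument

function ldaNamedProperty(argIndex, nameIndex) {
  const name = constantPool[nameIndex]; // resolve the name via the constant pool
  return args[argIndex][name];          // the loaded value goes to the accumulator
}

const acc = ldaNamedProperty(0, 0);
console.log(acc); // 42
```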

What is the operand 4 for? It is an index into the so-called feedback vector of the function incrementX(). The feedback vector contains runtime information that is used for performance optimizations.
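As a rough intuition for what a feedback slot is, here is a toy sketch. It is hypothetical throughout: V8's real feedback vector stores compact type and shape information, not JavaScript Sets of `typeof` strings, and `feedbackVector` and `recordFeedback` are invented names. The idea it illustrates is that each bytecode collecting feedback owns a slot index (like [4] and [6] in the listing), and an optimizing compiler such as TurboFan can use what was recorded there to specialize code, for example for additions that have only ever seen numbers.

```javascript
// A toy feedback vector (hypothetical; not V8's real representation).
const feedbackVector = new Map();

// Record the kind of value a bytecode produced at the given slot.
function recordFeedback(slot, value) {
  if (!feedbackVector.has(slot)) feedbackVector.set(slot, new Set());
  feedbackVector.get(slot).add(typeof value);
}

// Simulate calls to incrementX: slot 6 belongs to the Add bytecode.
for (const x of [42, 7, 1.5]) {
  recordFeedback(6, 1 + x);
}
console.log([...feedbackVector.get(6)].join(", ")); // number

// A call with a string argument means slot 6 is no longer number-only.
recordFeedback(6, 1 + "a");
console.log([...feedbackVector.get(6)].join(", ")); // number, string
```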

Now r0 holds 1 and the accumulator holds 42, the value of obj.x.


Add r0, [6]

Add r0 adds the contents of r0 to the accumulator, which gives the final value 43. The 6 is another index into the feedback vector.
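One detail worth making explicit: in `Add r0`, the register is the left operand and the accumulator is the right one, matching the source expression `1 + obj.x`. The order matters because JavaScript's `+` also concatenates strings. The sketch below is a hypothetical toy (`addR0` is an invented name and the feedback slot is ignored).

```javascript
// Toy model of "Add r0, [6]" (hypothetical; the feedback slot is ignored).
// Register r0 is the left operand, the accumulator is the right operand.
function addR0(frame) {
  frame.acc = frame.r0 + frame.acc;
}

const frame = { r0: 1, acc: 42 }; // state after LdaNamedProperty
addR0(frame);
console.log(frame.acc); // 43

// Operand order matters because + also concatenates strings:
const stringFrame = { r0: 1, acc: "a" }; // as if obj.x were "a"
addR0(stringFrame);
console.log(stringFrame.acc); // 1a
```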



Return

Return returns the value in the accumulator. This ends the execution of incrementX(). The caller of incrementX() continues with 43 in the accumulator and can keep working with that value.
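Putting the steps together, here is a minimal sketch of an Ignition-style interpreter that runs the bytecode of incrementX from the listing above. The instruction encoding (arrays of a mnemonic plus numeric operands) and the `interpret` function are hypothetical; feedback-slot operands are accepted but ignored, and StackCheck is a no-op in this toy.

```javascript
// Minimal Ignition-style interpreter sketch (hypothetical, not V8 code).
function interpret(bytecode, constantPool, args) {
  const regs = [];
  let acc; // the implicit accumulator
  for (const [op, ...operands] of bytecode) {
    switch (op) {
      case "StackCheck":
        break; // V8 checks the stack limit here; the toy does nothing
      case "LdaSmi":
        acc = operands[0]; // load a small integer into the accumulator
        break;
      case "Star":
        regs[operands[0]] = acc; // store the accumulator into a register
        break;
      case "LdaNamedProperty": {
        const [argIndex, nameIndex] = operands; // feedback slot ignored
        acc = args[argIndex][constantPool[nameIndex]];
        break;
      }
      case "Add":
        acc = regs[operands[0]] + acc; // register + accumulator
        break;
      case "Return":
        return acc; // hand the accumulator back to the caller
      default:
        throw new Error(`unsupported bytecode: ${op}`);
    }
  }
  return acc;
}

// incrementX's bytecode, with r0 and a0 written as index 0.
const incrementXBytecode = [
  ["StackCheck"],
  ["LdaSmi", 1],
  ["Star", 0],
  ["LdaNamedProperty", 0, 0, 4],
  ["Add", 0, 6],
  ["Return"],
];

console.log(interpret(incrementXBytecode, ["x"], [{ x: 42 }])); // 43
```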

Please note that the bytecode described here is the one used in V8 6.2, Chrome 62, and the not-yet-released Node 9. At Google, we are constantly working on V8 to improve performance and reduce memory consumption, so other V8 versions may produce bytecode that differs from what is described here.

Summary


At first glance, V8 bytecode may look rather cryptic, especially when it is printed with a lot of extra information. However, once you know that Ignition is a register machine with an accumulator register, you can figure out what most bytecodes do.

