Compiled languages such as C++ usually compile source code directly into machine code that the CPU can execute. Java, by contrast, splits the process into two stages to achieve its “compile once, run everywhere” goal: source code is first compiled into bytecode, which the JVM then executes.
How the JVM executes Java code
A JVM usually contains two core modules: the executor and the memory manager. The executor is responsible for executing bytecode. The most widely used virtual machine is HotSpot, whose executor includes an interpreter and a JIT compiler.
Before the interpreter can start executing Java code, the source code must first be compiled into bytecode with javac. This step includes lexical analysis, syntax analysis, and semantic analysis. Next, the interpreter executes the bytecode directly, one instruction at a time, without compiling it further. Meanwhile, the virtual machine collects profiling data about the program's execution. Based on this data, the JIT compiler gradually comes into play: it compiles bytecode into machine code in the background. However, the JIT only compiles code that the JVM has identified as a hot spot.
Let’s look at an example.
import java.lang.Math;

public class ByteCodeDemo {
    public static int absDifference(int a, int b) {
        int difference = a - b;
        return Math.abs(difference);
    }

    public static void main(String[] args) {
        System.out.println(absDifference(2, 1));
    }
}
One can use the javap command to see its bytecode:
>>> javac ByteCodeDemo.java
>>> javap -c ByteCodeDemo.class
Compiled from "ByteCodeDemo.java"
public class ByteCodeDemo {
  public ByteCodeDemo();
    Code:
       0: aload_0
       1: invokespecial #1 // Method java/lang/Object."<init>":()V
       4: return

  public static int absDifference(int, int);
    Code:
       0: iload_0
       1: iload_1
       2: isub
       3: istore_2
       4: iload_2
       5: invokestatic #2 // Method java/lang/Math.abs:(I)I
       8: ireturn

  public static void main(java.lang.String[]);
    Code:
       0: getstatic #3 // Field java/lang/System.out:Ljava/io/PrintStream;
       3: iconst_2
       4: iconst_1
       5: invokestatic #4 // Method absDifference:(II)I
       8: invokevirtual #5 // Method java/io/PrintStream.println:(I)V
      11: return
}
Each bytecode instruction consists of an opcode followed by zero or more operands. Each opcode is an unsigned integer that occupies exactly one byte in the class file, which is why JVM instructions are called bytecodes.
Note that javap translates each opcode into a human-readable mnemonic.
Let's look at the bytecode of the absDifference() method. The number 0 preceding iload_0 is the offset of that instruction within the method's code. The number 1 on the next line is likewise the offset of iload_1. The invokestatic instruction differs from the preceding ones: it takes an operand, #2, which is an index into the constant pool, and it is 3 bytes long (one byte for the opcode and two for the index). That is why the next instruction, ireturn, sits at offset 8 rather than 6.
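To make the offsets concrete, here is a minimal sketch that walks the raw bytes of absDifference() and prints each instruction with its offset. The class OffsetWalker and its methods are hypothetical helpers written for this illustration, and the lookup covers only the seven opcodes that appear in this method, not the full instruction set.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; not part of the JDK.
class OffsetWalker {
    // Mnemonics for the handful of opcodes used by absDifference().
    static String mnemonic(int opcode) {
        switch (opcode) {
            case 0x1a: return "iload_0";
            case 0x1b: return "iload_1";
            case 0x1c: return "iload_2";
            case 0x3d: return "istore_2";
            case 0x64: return "isub";
            case 0xac: return "ireturn";
            case 0xb8: return "invokestatic";
            default:
                throw new IllegalArgumentException("opcode not covered by this sketch: " + opcode);
        }
    }

    // Number of operand bytes that follow the opcode.
    static int operandBytes(int opcode) {
        // invokestatic carries a 2-byte constant pool index; the others here have no operands.
        return opcode == 0xb8 ? 2 : 0;
    }

    // Returns one "offset: mnemonic" line per instruction.
    static List<String> walk(byte[] code) {
        List<String> lines = new ArrayList<>();
        int offset = 0;
        while (offset < code.length) {
            int opcode = code[offset] & 0xff; // Java bytes are signed; mask to get 0..255
            lines.add(offset + ": " + mnemonic(opcode));
            offset += 1 + operandBytes(opcode);
        }
        return lines;
    }

    public static void main(String[] args) {
        // The code bytes of absDifference(): the operand 0x00 0x02 is constant pool index #2.
        byte[] absDifference = {
            0x1a, 0x1b, 0x64, 0x3d, 0x1c,
            (byte) 0xb8, 0x00, 0x02,
            (byte) 0xac
        };
        walk(absDifference).forEach(System.out::println);
    }
}
```

The walker advances by one byte for most instructions and by three for invokestatic, which reproduces the jump from offset 5 to offset 8 seen in the javap listing.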
So how does the interpreter work? The interpreter is in fact a stack machine: it executes bytecode instructions in order, and each instruction pushes values onto or pops values off an operand stack according to its semantics. Take the subtraction above as an example. The interpreter first pushes the two operands onto the stack by executing iload_0 and iload_1, which load local variables 0 and 1 (the parameters a and b). Then it executes isub, which pops both operands off the stack, performs the subtraction, and pushes the result back onto the stack. Finally, it executes istore_2, which pops the result off the stack and stores it in local variable 2 (difference).
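The stack behavior can be sketched as a toy interpreter. This is only an illustration under simplifying assumptions, not how HotSpot's interpreter is implemented: the class TinyStackMachine is hypothetical, it handles only the operand-free instructions of absDifference(), and it leaves out the invokestatic call to Math.abs, so it returns the raw difference.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical toy interpreter for illustration; it is not HotSpot's implementation.
class TinyStackMachine {
    static final int ILOAD_0 = 0x1a;
    static final int ILOAD_1 = 0x1b;
    static final int ILOAD_2 = 0x1c;
    static final int ISTORE_2 = 0x3d;
    static final int ISUB = 0x64;
    static final int IRETURN = 0xac;

    // Executes the code against the given local variable slots and returns the ireturn value.
    static int run(int[] code, int[] locals) {
        Deque<Integer> stack = new ArrayDeque<>(); // the operand stack
        for (int pc = 0; pc < code.length; pc++) {
            switch (code[pc]) {
                case ILOAD_0: stack.push(locals[0]); break; // push local 0
                case ILOAD_1: stack.push(locals[1]); break; // push local 1
                case ILOAD_2: stack.push(locals[2]); break; // push local 2
                case ISUB: {
                    int right = stack.pop(); // the top of the stack is the second operand
                    int left = stack.pop();
                    stack.push(left - right); // the result goes back onto the stack
                    break;
                }
                case ISTORE_2: locals[2] = stack.pop(); break; // pop into local 2
                case IRETURN: return stack.pop();
                default:
                    throw new IllegalArgumentException("opcode not covered by this sketch: " + code[pc]);
            }
        }
        throw new IllegalStateException("reached the end of the code without ireturn");
    }

    public static void main(String[] args) {
        // absDifference() without the Math.abs call.
        int[] code = {ILOAD_0, ILOAD_1, ISUB, ISTORE_2, ILOAD_2, IRETURN};
        int[] locals = {2, 1, 0}; // a = 2, b = 1, difference not yet assigned
        System.out.println(run(code, locals));
    }
}
```

Note the operand order in isub: the value pushed last (b) is popped first, so the machine computes a - b rather than b - a.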
JIT
JIT compilers rely on two core mechanisms:
- Requesting memory regions that are both writable and executable, so that machine code can be generated and run at runtime.
- Profile-guided optimization, which in some cases lets JIT-compiled code outperform code produced by a static compiler.
Specifically, the first mechanism is to request a memory block with both write and execute permissions, compile the Java method, and write the resulting machine code into that block. When the Java method is called afterwards, the JVM jumps directly to the machine code in that block instead of interpreting the method.
The second mechanism involves runtime profiling. The JVM uses two counters for this: the Invocation Counter and the Back Edge Counter.
- Invocation Counter: counts the number of times a method has been called.
- Back Edge Counter: counts the number of times loop bodies in a method are executed. A bytecode instruction that jumps backward is called a back edge.
The JVM triggers JIT compilation when these counters exceed a preset threshold. Code that crosses the threshold is considered a hot spot.
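The counter idea can be modeled in a few lines. The sketch below is a deliberately simplified model, not HotSpot's actual policy: the class HotnessCounters, the single shared threshold, and the "either counter exceeds it" rule are all assumptions made for illustration, and a real JVM's triggering logic is more involved.

```java
// Hypothetical model of per-method hotness counters; real JVM policies are more complex.
class HotnessCounters {
    private final int threshold; // assumed single threshold for both counters
    private int invocationCount;
    private int backEdgeCount;

    HotnessCounters(int threshold) {
        this.threshold = threshold;
    }

    // Called on every method entry; returns true once the method counts as hot.
    boolean onInvocation() {
        invocationCount++;
        return isHot();
    }

    // Called on every backward jump (one per loop iteration); returns true once hot.
    boolean onBackEdge() {
        backEdgeCount++;
        return isHot();
    }

    // Assumed rule: the method is hot when either counter exceeds the threshold.
    boolean isHot() {
        return invocationCount > threshold || backEdgeCount > threshold;
    }

    public static void main(String[] args) {
        HotnessCounters counters = new HotnessCounters(3);
        counters.onInvocation(); // the method is entered once
        for (int i = 1; i <= 5; i++) {
            // Simulate a loop inside the method: each iteration is one back edge.
            boolean hot = counters.onBackEdge();
            System.out.println("iteration " + i + " hot=" + hot);
        }
    }
}
```

The model shows why a method that is called only once can still become hot: a long-running loop inside it keeps incrementing the back edge counter. On HotSpot, running a program with the -XX:+PrintCompilation flag logs methods as they are actually compiled.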