5.3.1 Generic Java Virtual Machine overview
Figure 5-3 Generic JVM implementation architecture
Class file format
Java class files have a predefined format laid out in the Java Virtual Machine Specification. This format uses byte code, not native machine instructions, to express the Java instructions. A class file is divided into sections that identify constants, field names and types, method names and types, and referenced classes, fields, and methods. The methods contain the byte code that is later translated into native code for execution.
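To make the layout concrete, the following sketch reads the fixed fields at the start of a class file: the magic number 0xCAFEBABE, the minor and major version numbers, and the count that sizes the constant pool. The class `ClassFileHeader` and its `parse` method are hypothetical names chosen for illustration; the field order follows chapter 4 of the Java Virtual Machine Specification. Reading `Object.class` as a resource assumes a Java 9 or later runtime.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;

public class ClassFileHeader {
    // The first fields of every class file, in the order the JVM specification defines them.
    static final class Header {
        final int minor;
        final int major;
        final int constantPoolCount;

        Header(int minor, int major, int constantPoolCount) {
            this.minor = minor;
            this.major = major;
            this.constantPoolCount = constantPoolCount;
        }
    }

    // Parses only the fixed header; the constant pool, fields, and methods follow it in the file.
    static Header parse(byte[] bytes) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes))) {
            int magic = in.readInt();                       // u4 magic
            if (magic != 0xCAFEBABE) {
                throw new IllegalArgumentException("not a class file");
            }
            int minor = in.readUnsignedShort();             // u2 minor_version
            int major = in.readUnsignedShort();             // u2 major_version
            int constantPoolCount = in.readUnsignedShort(); // u2 constant_pool_count (entries + 1)
            return new Header(minor, major, constantPoolCount);
        } catch (IOException e) {
            // EOFException (an IOException) signals a header shorter than 10 bytes.
            throw new IllegalArgumentException("truncated class file header", e);
        }
    }

    public static void main(String[] args) throws IOException {
        // Assumes Java 9 or later, where JDK class files can be read as module resources.
        try (InputStream in = Object.class.getResourceAsStream("Object.class")) {
            if (in == null) {
                System.out.println("class file for java.lang.Object is not available");
                return;
            }
            Header h = parse(in.readAllBytes());
            System.out.println("major=" + h.major + " minor=" + h.minor
                    + " constant_pool_count=" + h.constantPoolCount);
        }
    }
}
```

The printed version numbers depend on which JDK runs the program; everything after the header (the constant pool and the method byte code) needs a full parser.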
Classloaders
The Java class files are loaded by the Java virtual machine using classloaders. From the very beginning, Java allowed multiple classloaders, in a loose hierarchy, to be used to load code within a single Java virtual machine.
At the core is the Java bootstrap classloader, which loads classes in a default way, usually from local disk. It is part of the JVM implementation itself and cannot be modified by users. All other classloaders are typically written in Java. They are designed to extend the functionality of the classloading mechanism; for example, the URLClassLoader class can load classes across a network, using URLs to locate the classes to be loaded.
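The hierarchy can be observed from Java code by following `ClassLoader.getParent()` upward from the system classloader. Classes loaded by the bootstrap classloader report `null` from `getClassLoader()`, because that loader is not a Java object. The class name `LoaderChain` is a hypothetical example, and the concrete loader class names it prints vary between JVM versions and vendors.

```java
import java.util.ArrayList;
import java.util.List;

public class LoaderChain {
    // Walks from a loader up through its parents until the chain ends at the bootstrap loader.
    static List<String> chain(ClassLoader start) {
        List<String> names = new ArrayList<>();
        for (ClassLoader cl = start; cl != null; cl = cl.getParent()) {
            names.add(cl.getClass().getName());
        }
        // The bootstrap loader has no Java object, so getParent() returns null when it is reached.
        names.add("<bootstrap>");
        return names;
    }

    public static void main(String[] args) {
        // Core classes such as java.lang.String are loaded by the bootstrap loader.
        System.out.println("String loaded by: " + String.class.getClassLoader());
        for (String name : chain(ClassLoader.getSystemClassLoader())) {
            System.out.println(name);
        }
    }
}
```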
The classloaders load Java byte code into the virtual machine, pass it through a class verifier, and check it against the protection domain security mechanisms to ensure that the code is safe for the given user to execute. The goal of the classloaders is to put the byte code itself into the class area and to create the constant pool, static variables, and other artifacts that the code needs at run time.
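One visible effect of these checks is that `ClassLoader.defineClass` refuses bytes that do not form a valid class file. The sketch below assumes a hypothetical `ByteLoader` subclass that only exposes `defineClass`; it shows the structural format check alone, since byte code verification of method bodies happens later, when the class is linked, and protection domain checks are not shown.

```java
public class RejectBadBytes {
    // Hypothetical loader that exposes the protected defineClass method.
    static final class ByteLoader extends ClassLoader {
        Class<?> define(String name, byte[] bytes) {
            return defineClass(name, bytes, 0, bytes.length);
        }
    }

    // Returns "defined" if the JVM accepted the bytes, otherwise the simple name of the error.
    static String tryDefine(byte[] bytes) {
        try {
            new ByteLoader().define("Bogus", bytes);
            return "defined";
        } catch (ClassFormatError e) {
            return e.getClass().getSimpleName();
        }
    }

    public static void main(String[] args) {
        // These bytes lack the 0xCAFEBABE magic number, so the JVM rejects them.
        byte[] notAClass = {0x00, 0x01, 0x02, 0x03};
        System.out.println(tryDefine(notAClass));
    }
}
```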
In the IBM implementation, each class is split into two parts that are stored in different places. The first part is the immutable part of the class (primarily, method names and byte code). It is placed in a ROMClass area (ROM stands for Read Only Memory, signifying that this part never changes after it is first written).
The second part is placed in a RAMClass area and holds the data that a class needs once it has been loaded and can change at run time, such as static variables and several caches. Because there can be multiple classes with the same name, each loaded by a different classloader, classes are segregated by classloader.
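The ROMClass and RAMClass split is internal to the IBM implementation and cannot be seen from Java code, but the segregation by classloader can. The following sketch builds a minimal class file named `Demo` by hand (a hypothetical class with no fields or methods) and defines it in two separate loaders; the JVM treats the results as two distinct classes that share a name. `IsolatedLoader` and `minimalClass` are illustrative names, not part of any library.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

public class SameNameTwoLoaders {
    // Builds the bytes of "public class <name> extends java.lang.Object" with no members.
    static byte[] minimalClass(String name) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(0xCAFEBABE);                            // magic
            out.writeShort(0);                                   // minor_version
            out.writeShort(52);                                  // major_version (Java 8 format)
            out.writeShort(5);                                   // constant_pool_count: entries #1..#4
            out.writeByte(1); out.writeUTF(name);                // #1 Utf8: class name
            out.writeByte(7); out.writeShort(1);                 // #2 Class -> #1
            out.writeByte(1); out.writeUTF("java/lang/Object");  // #3 Utf8: superclass name
            out.writeByte(7); out.writeShort(3);                 // #4 Class -> #3
            out.writeShort(0x0021);                              // access_flags: ACC_PUBLIC | ACC_SUPER
            out.writeShort(2);                                   // this_class
            out.writeShort(4);                                   // super_class
            out.writeShort(0);                                   // interfaces_count
            out.writeShort(0);                                   // fields_count
            out.writeShort(0);                                   // methods_count
            out.writeShort(0);                                   // attributes_count
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Each instance is its own namespace; parent null delegates only to the bootstrap loader.
    static final class IsolatedLoader extends ClassLoader {
        IsolatedLoader() {
            super(null);
        }

        Class<?> define(String name, byte[] bytes) {
            return defineClass(name, bytes, 0, bytes.length);
        }
    }

    public static void main(String[] args) {
        byte[] demo = minimalClass("Demo");
        Class<?> a = new IsolatedLoader().define("Demo", demo);
        Class<?> b = new IsolatedLoader().define("Demo", demo);
        System.out.println(a.getName() + " == " + b.getName() + " ? " + (a == b)); // Demo == Demo ? false
    }
}
```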
Stack-based
The Java virtual machine is a stack-based machine: it works by pushing information onto a stack and popping it off, performing operations that manipulate this information and leave a result on the stack. Apart from a program counter for each thread, there are no Java virtual registers.
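For instance, `javac` compiles `static int f(int a, int b, int c) { return (a + b) * c; }` to `iload_0`, `iload_1`, `iadd`, `iload_2`, `imul`, `ireturn`, as `javap -c` shows: each operand is pushed, and each arithmetic instruction pops two values and pushes the result. The sketch below is a toy interpreter with hypothetical `PUSH`, `ADD`, and `MUL` opcodes loosely modeled on those instructions; it is not how a production JVM is built, but it shows an operand stack plus a program counter doing all the work.

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class TinyStackMachine {
    // Hypothetical opcodes, loosely modeled on the JVM's iconst/iload, iadd and imul.
    enum Op { PUSH, ADD, MUL }

    static final class Insn {
        final Op op;
        final int operand; // only used by PUSH

        Insn(Op op, int operand) {
            this.op = op;
            this.operand = operand;
        }
    }

    // Executes the instructions against an operand stack and returns the value left on top.
    static int run(Insn... program) {
        Deque<Integer> stack = new ArrayDeque<>();
        int pc = 0; // the only "register": index of the next instruction
        while (pc < program.length) {
            Insn insn = program[pc++];
            switch (insn.op) {
                case PUSH:
                    stack.push(insn.operand);
                    break;
                case ADD:
                    stack.push(stack.pop() + stack.pop());
                    break;
                case MUL:
                    stack.push(stack.pop() * stack.pop());
                    break;
                default:
                    throw new IllegalStateException("unknown op " + insn.op);
            }
        }
        return stack.pop();
    }

    public static void main(String[] args) {
        // (2 + 3) * 4
        int result = run(new Insn(Op.PUSH, 2), new Insn(Op.PUSH, 3), new Insn(Op.ADD, 0),
                         new Insn(Op.PUSH, 4), new Insn(Op.MUL, 0));
        System.out.println(result); // 20
    }
}
```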
Because the language is multithreaded, the JVM must accommodate multiple Java stacks, each with multiple stack frames. In addition, the Java language allows calls into native, platform-specific code through the Java Native Interface (JNI), and the JVM has no control over how that code works. The JVM therefore plays it safe by giving native code a separate stack, and it takes care of weaving the two kinds of execution together correctly.
Execution
Up to now we have been working with Java code and Java code that "wraps" and calls into native code. When the Java code from the class area is executed by a thread, it is passed to an execution engine that actually executes the code on the platform processor.
It is important to recognize that there may be a difference between the number of Java threads and the number of platform threads, and a difference in sizes between Java variables and platform variables. These differences are implementation-specific.
The Java language itself is usually thought of as 32-bit. This statement is not strictly accurate because the Java language does not have true pointers (it has references), and it does not have registers. The language implementation can be 32-bit, 64-bit, or even 16-bit, as long as the Java language specification is adhered to and expected results are produced.
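The sizes of Java's primitive types are fixed by the language specification rather than by the platform, which is why the same program behaves identically on 32-bit and 64-bit JVMs. The sketch below prints those sizes and shows that `int` arithmetic wraps at 32 bits regardless of the processor. The `sun.arch.data.model` property it reads is non-standard and is an assumption here: some JVMs set it, and others may not, so the code falls back to `"unknown"`.

```java
public class PrimitiveSizes {
    // int arithmetic wraps at 32 bits even when the JVM runs on a 64-bit processor.
    static int wrapAround() {
        return Integer.MAX_VALUE + 1;
    }

    public static void main(String[] args) {
        // Widths fixed by the Java Language Specification, identical on every platform.
        System.out.println("byte   " + Byte.SIZE + " bits");
        System.out.println("short  " + Short.SIZE + " bits");
        System.out.println("char   " + Character.SIZE + " bits");
        System.out.println("int    " + Integer.SIZE + " bits");
        System.out.println("long   " + Long.SIZE + " bits");
        System.out.println("float  " + Float.SIZE + " bits");
        System.out.println("double " + Double.SIZE + " bits");
        System.out.println("Integer.MAX_VALUE + 1 = " + wrapAround());
        // Whether the JVM itself is a 32-bit or 64-bit build is implementation-specific;
        // this non-standard property may not be set on every JVM.
        System.out.println("JVM data model: " + System.getProperty("sun.arch.data.model", "unknown"));
    }
}
```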
The execution engine implementation for the Java Virtual Machine is not specified beyond the behavior and results of the Java byte code instructions. The implementer can provide an execution engine that performs interpretation, mapping each byte code instruction, one by one, to a stream of native instructions. Alternatively, the implementer can perform Just-In-Time (JIT) compilation to translate Java methods into native code before executing them, or use a mix of the two approaches, choosing between them based on statistics gathered as the code runs, such as how often each method is invoked. Ultimately, the Java byte code is translated to platform-specific native code before it executes on the real processors.
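A mixed-mode policy can be sketched as an invocation counter per method: a method runs in the interpreter until it becomes "hot", after which compiled code is used. The threshold value and the `Method` wrapper below are hypothetical; real JVMs use their own heuristics, and this sketch only records which mode would have been used rather than generating native code. The point it illustrates is that the mode may change speed but never the result.

```java
import java.util.function.IntUnaryOperator;

public class MixedModeSketch {
    // Hypothetical threshold; real JVMs use their own, usually tunable, heuristics.
    static final int COMPILE_THRESHOLD = 3;

    static final class Method {
        private final IntUnaryOperator body; // the method's behavior, identical in both modes
        private int invocations;
        private boolean compiled;
        String lastMode = "none";

        Method(IntUnaryOperator body) {
            this.body = body;
        }

        int invoke(int arg) {
            invocations++;
            if (!compiled && invocations > COMPILE_THRESHOLD) {
                // A real JIT would generate native code here; this sketch only flips a flag.
                compiled = true;
            }
            lastMode = compiled ? "compiled" : "interpreted";
            // Either way the result must be the same: only speed may differ.
            return body.applyAsInt(arg);
        }
    }

    public static void main(String[] args) {
        Method square = new Method(x -> x * x);
        for (int i = 1; i <= 5; i++) {
            int result = square.invoke(i);
            System.out.println("call " + i + ": " + result + " (" + square.lastMode + ")");
        }
    }
}
```

With a threshold of 3, calls 1 to 3 report `interpreted` and calls 4 and 5 report `compiled`.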
Instructions
Java has instructions that can be expected of any real machine, such as addition and subtraction, and these map fairly well to native instructions. Some instructions can also be optimized in hardware; for example, some of those used for arithmetic with the BigDecimal class can be optimized on the POWER6 processor, which provides decimal floating-point hardware. However, Java also has some instructions that are complex to implement because of the differences between a virtual machine and a real machine.
For example, the Java INVOKEVIRTUAL byte code instruction selects the target method based on the run-time class of the object it is invoked on, creates the stack frame entries for the call, loads the necessary byte code for the target method, and then calls into it. This is difficult for a processor to optimize because the Java language specifies additional requirements on execution that real machines do not typically have, such as exception throwing, and the target may well be byte code rather than native code.
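The following sketch shows the dynamic dispatch that INVOKEVIRTUAL performs. The call `s.describe()` in `call` compiles to a single `invokevirtual` instruction naming `Shape.describe` (visible with `javap -c VirtualDispatch`), yet the method that actually runs depends on the run-time class of `s`. A `null` receiver makes the same instruction throw `NullPointerException`, one of the extra requirements that a real processor does not impose. The class names are hypothetical.

```java
public class VirtualDispatch {
    static class Shape {
        String describe() {
            return "shape";
        }
    }

    static class Circle extends Shape {
        @Override
        String describe() {
            return "circle";
        }
    }

    // Compiled to invokevirtual VirtualDispatch$Shape.describe; the target is chosen at run time.
    static String call(Shape s) {
        return s.describe();
    }

    public static void main(String[] args) {
        System.out.println(call(new Shape()));  // shape
        System.out.println(call(new Circle())); // circle
        try {
            call(null);
        } catch (NullPointerException e) {
            System.out.println("null receiver -> NullPointerException");
        }
    }
}
```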
Memory management
Java does not allow the programmer to explicitly deallocate objects or the memory they use. Instead, the Java language specification tasks the Java virtual machine with performing garbage collection to tidy up the Java heap, reclaiming objects that are no longer reachable, that is, objects that cannot be reached by following references from any executing thread or from other roots such as static variables. How garbage collection is implemented is left to the implementer, but usually a traditional mark-sweep algorithm is chosen to identify what can be tidied up. There are options that allow parts of the work to be performed on a background thread. For more information about this topic, refer to Garbage collection.
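The core of mark-sweep can be sketched over a toy object graph: the mark phase follows references from the roots, and the sweep phase frees everything left unmarked. The `Obj` type, the list standing in for the heap, and the single root are all hypothetical simplifications; production collectors add much more, such as compaction and work done on background threads. Note that the unreachable cycle between `c` and `d` is still collected, because reachability, not reference counts, decides what is live.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.HashSet;
import java.util.Iterator;
import java.util.List;
import java.util.Set;

public class MarkSweepSketch {
    // A hypothetical heap object: a name plus outgoing references to other objects.
    static final class Obj {
        final String name;
        final List<Obj> refs = new ArrayList<>();

        Obj(String name) {
            this.name = name;
        }
    }

    // Mark phase: every object reachable from the roots is live.
    static Set<Obj> mark(List<Obj> roots) {
        Set<Obj> marked = new HashSet<>();
        Deque<Obj> work = new ArrayDeque<>(roots);
        while (!work.isEmpty()) {
            Obj o = work.pop();
            if (marked.add(o)) { // add() returns false if already marked, so cycles terminate
                work.addAll(o.refs);
            }
        }
        return marked;
    }

    // Sweep phase: unmarked objects are removed from the heap; their names are returned.
    static List<String> sweep(List<Obj> heap, Set<Obj> marked) {
        List<String> freed = new ArrayList<>();
        for (Iterator<Obj> it = heap.iterator(); it.hasNext(); ) {
            Obj o = it.next();
            if (!marked.contains(o)) {
                freed.add(o.name);
                it.remove();
            }
        }
        return freed;
    }

    public static void main(String[] args) {
        Obj a = new Obj("a"), b = new Obj("b"), c = new Obj("c"), d = new Obj("d");
        a.refs.add(b); // a is a root and refers to b
        c.refs.add(d); // c and d refer to each other, but no root reaches them
        d.refs.add(c);
        List<Obj> heap = new ArrayList<>(Arrays.asList(a, b, c, d));
        List<String> freed = sweep(heap, mark(Collections.singletonList(a)));
        System.out.println("freed " + freed); // freed [c, d]
    }
}
```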