Garbage collection basics

Garbage collection algorithms, like JVMs themselves, have evolved and become increasingly complex. Understanding how the Garbage Collector (GC) works is necessary for designing and tuning Java applications and application servers. The following is a broad, somewhat simplified overview of the Mark-Sweep-Compact (MSC) garbage collection technique implemented by IBM JVMs. For an in-depth study of additional, state-of-the-art heap management and garbage collection techniques, refer to the articles listed in Additional JVM and garbage collection related resources.

The Garbage Collector allocates areas of storage inside the Java heap, where objects, arrays, and classes are stored. An allocated object is considered live when at least one reference to it exists, that is, when it is used by someone, commonly another object. Such an object is also called reachable. When the object is no longer used by anyone and all references to it have been removed, it is considered garbage, and its storage area should be reclaimed for reuse. This task is performed by the Garbage Collector.

When the JVM is unable to allocate an object from the current Java heap because there is not enough free, contiguous space, a memory allocation fault (allocation failure) occurs and the Garbage Collector is invoked. (The GC can also be requested by a specific function call: System.gc(). However, when System.gc() is invoked, the JVM can simply 'take it under advisement' and defer the GC operation if it has more pressing needs to attend to.) The first task of the GC is to acquire all locks required for garbage collection and then to stop all other threads. Because of this, garbage collection is also referred to as stop-the-world (STW) collection. Garbage collection then takes place in three phases: mark, sweep, and, optionally, compact.
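The advisory nature of System.gc() can be observed with a small program. The following is a minimal sketch, not a benchmark: the class name GcHintDemo and the helper usedHeapBytes() are hypothetical names chosen for illustration, and the figures printed depend entirely on the JVM, its heap settings, and whether it chooses to honor the request.

```java
// Minimal sketch: System.gc() is only a request; the JVM may defer or ignore it.
// GcHintDemo and usedHeapBytes() are illustrative names, not part of any API.
class GcHintDemo {

    // Heap currently in use as reported by the runtime (an approximation).
    static long usedHeapBytes() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    public static void main(String[] args) {
        // Allocate about 4 MB, then drop the only reference to it.
        byte[][] junk = new byte[16][];
        for (int i = 0; i < junk.length; i++) {
            junk[i] = new byte[256 * 1024];
        }
        long before = usedHeapBytes();
        junk = null; // the arrays are now unreachable, that is, garbage

        System.gc(); // a hint, not a command

        long after = usedHeapBytes();
        System.out.println("used before hint: " + before + " bytes");
        System.out.println("used after hint:  " + after + " bytes");
    }
}
```

If the second figure is not lower than the first, that does not indicate a fault; the JVM is free to postpone the collection.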

 

Mark phase

In the mark phase, all reachable objects are identified: those referenced directly by the JVM, for example through thread stacks, and those referenced in turn by other reachable objects. Everything that is not marked is considered garbage.

 

Sweep phase

All allocated objects that are not marked are swept away, that is, the space used by them is reclaimed.
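The mark and sweep phases can be illustrated with a toy model. This is a sketch under simplifying assumptions, not how an IBM JVM is implemented: the heap is modeled as a plain list, and ToyObject, ToyMarkSweep, and their methods are hypothetical names invented for the example.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;

// Toy heap object: a name, outgoing references, and a mark bit.
class ToyObject {
    final String name;
    final List<ToyObject> refs = new ArrayList<>();
    boolean marked;

    ToyObject(String name) {
        this.name = name;
    }
}

// Illustrative model only; a real collector works on raw heap memory.
class ToyMarkSweep {

    // Mark phase: flag every object reachable from the roots.
    static void mark(List<ToyObject> roots) {
        Deque<ToyObject> pending = new ArrayDeque<>(roots);
        while (!pending.isEmpty()) {
            ToyObject o = pending.pop();
            if (o.marked) {
                continue; // already visited, also protects against cycles
            }
            o.marked = true;
            pending.addAll(o.refs);
        }
    }

    // Sweep phase: drop unmarked objects and reset marks for the next cycle.
    // Returns the number of objects reclaimed.
    static int sweep(List<ToyObject> heap) {
        int reclaimed = 0;
        for (Iterator<ToyObject> it = heap.iterator(); it.hasNext();) {
            ToyObject o = it.next();
            if (o.marked) {
                o.marked = false;
            } else {
                it.remove();
                reclaimed++;
            }
        }
        return reclaimed;
    }

    static int collect(List<ToyObject> heap, List<ToyObject> roots) {
        mark(roots);
        return sweep(heap);
    }

    public static void main(String[] args) {
        ToyObject a = new ToyObject("a");
        ToyObject b = new ToyObject("b");
        ToyObject c = new ToyObject("c");
        ToyObject d = new ToyObject("d");
        a.refs.add(b); // a -> b is reachable from the root
        c.refs.add(d); // c <-> d reference each other but nothing reaches them
        d.refs.add(c);

        List<ToyObject> heap = new ArrayList<>(Arrays.asList(a, b, c, d));
        int reclaimed = collect(heap, Arrays.asList(a));
        System.out.println("reclaimed " + reclaimed + ", live " + heap.size());
    }
}
```

In this model the objects c and d are reclaimed even though they reference each other, because neither can be reached from a root.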

 

Compaction phase

Once the garbage has been removed, the heap is typically riddled with holes left by the freed objects, and the GC can consider compacting it. If, after garbage collection, no chunk of free memory is large enough to satisfy the allocation request, the heap has to be compacted. Because heap compaction means moving objects around and updating all references to them, it is extremely costly in terms of time, and the GC avoids it where possible. Modern JVM implementations try to avoid heap compaction by optimizing object placement in the heap.
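To show why compaction helps with fragmentation, and why it requires updating references, the following sketch models the heap as an int array in which 0 means a free slot and any other value is a live object id. ToyCompactor and its methods are hypothetical names; this is a simple sliding scheme chosen for illustration, not the algorithm used by any particular JVM.

```java
import java.util.Arrays;

// Illustrative model: 0 = free slot, non-zero = id of a live object.
class ToyCompactor {

    // Length of the longest run of contiguous free slots.
    static int largestFreeRun(int[] heap) {
        int best = 0;
        int run = 0;
        for (int slot : heap) {
            run = (slot == 0) ? run + 1 : 0;
            best = Math.max(best, run);
        }
        return best;
    }

    // Slides live slots toward index 0, preserving their order.
    // Returns a forwarding table: old index -> new index, or -1 for free slots.
    // A real collector would use such a table to update every reference.
    static int[] compact(int[] heap) {
        int[] forwarding = new int[heap.length];
        Arrays.fill(forwarding, -1);
        int next = 0;
        for (int i = 0; i < heap.length; i++) {
            if (heap[i] != 0) {
                forwarding[i] = next;
                heap[next] = heap[i];
                if (next != i) {
                    heap[i] = 0; // the old location becomes free
                }
                next++;
            }
        }
        return forwarding;
    }

    public static void main(String[] args) {
        int[] heap = {1, 0, 2, 0, 0, 3, 0};
        System.out.println("largest free run before: " + largestFreeRun(heap));
        int[] forwarding = compact(heap);
        System.out.println("heap after compaction:   " + Arrays.toString(heap));
        System.out.println("forwarding table:        " + Arrays.toString(forwarding));
        System.out.println("largest free run after:  " + largestFreeRun(heap));
    }
}
```

In the sample heap four slots are free before compaction, but no more than two are contiguous, so a request for three slots would fail until the live objects are moved together.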

Note Do not confuse the Java heap with the native (or system) heap! The native heap is never garbage collected; it is used by the JVM process and stores all the objects that the JVM itself needs during its entire lifetime. The native heap is typically much smaller than the Java heap.

 

Heap expansion and shrinkage

Heap expansion occurs after garbage collection if the ratio of free space to total heap size falls below the value specified by the -Xminf parameter. The default is 0.3 (or 30%).

Heap shrinkage occurs after garbage collection if the ratio of free space to total heap size exceeds the value specified by the -Xmaxf parameter. The default is 0.6 (or 60%).

The amount of expansion is governed by the minimum expansion size, set by the -Xmine parameter, and the maximum expansion size, set by the -Xmaxe parameter. The default for -Xmine is 1 MB; the default for -Xmaxe is 0, which means unlimited.

None of these parameters has any effect on a fixed-size heap, where the -Xms and -Xmx values are equal.
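The resize rules described in this section can be restated as a small decision function. This is a sketch of the rules as written here, not of the JVM's internal logic: HeapResizePolicy, decide(), and clampExpansion() are hypothetical names, the requested expansion amount is an assumed input (how the JVM arrives at it is not covered in this section), and the -Xmx ceiling is deliberately left out.

```java
// Illustrative restatement of the -Xminf / -Xmaxf / -Xmine / -Xmaxe rules.
class HeapResizePolicy {

    enum Action { EXPAND, SHRINK, NONE }

    // Decide what happens after a collection, given the free-space ratio.
    // fixedSize is true when -Xms and -Xmx are equal.
    static Action decide(long freeBytes, long totalBytes,
                         double minf, double maxf, boolean fixedSize) {
        if (fixedSize) {
            return Action.NONE; // the ratios have no effect on a fixed-size heap
        }
        double freeRatio = (double) freeBytes / totalBytes;
        if (freeRatio < minf) {
            return Action.EXPAND;
        }
        if (freeRatio > maxf) {
            return Action.SHRINK;
        }
        return Action.NONE;
    }

    // Bound a requested expansion by -Xmine and -Xmaxe (0 means unlimited).
    static long clampExpansion(long requestedBytes, long mine, long maxe) {
        long amount = Math.max(requestedBytes, mine);
        if (maxe > 0) {
            amount = Math.min(amount, maxe);
        }
        return amount;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        // Defaults from the text: -Xminf 0.3, -Xmaxf 0.6, -Xmine 1 MB, -Xmaxe 0.
        System.out.println(decide(20 * mb, 100 * mb, 0.3, 0.6, false));
        System.out.println(decide(70 * mb, 100 * mb, 0.3, 0.6, false));
        System.out.println(decide(45 * mb, 100 * mb, 0.3, 0.6, false));
        System.out.println(clampExpansion(512 * 1024L, mb, 0) + " bytes");
    }
}
```

With the default ratios, a heap that is 20% free after collection is a candidate for expansion, one that is 70% free is a candidate for shrinkage, and one that is 45% free is left alone.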

 

Parallel versus concurrent operation

Recent JVM releases implement multiple helper threads that run in parallel on a multiprocessor machine during the mark and sweep phases. These threads sleep during normal operation; only during garbage collection is the work divided between the main GC thread and its helper threads, using all processors simultaneously. Parallel mark mode has been enabled by default since IBM JVM version 1.3.0, and parallel sweep since version 1.3.1.

As mentioned before, garbage collection is basically a stop-the-world operation. This is not entirely true, however: IBM JVM 1.3.1 and higher also support an optional concurrent mark mode, in which the JVM starts a background thread that does some of the mark-phase work while all application threads are still active. This reduces the STW pause when garbage collection finally occurs. Concurrent mark is disabled by default and can be activated with the -Xgcpolicy:optavgpause parameter.

Note In some cases, concurrent mark may reduce the throughput of an application. Compare application performance with and without concurrent mark under identical loads to measure its effect.
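One way to make such a comparison is to record collection counts and accumulated collection time around the same workload in both configurations. The sketch below assumes a JVM that provides the java.lang.management API (Java 5 or later, which is newer than the 1.3 and 1.4 releases discussed here); GcStats and its methods are hypothetical names, and the allocation loop merely stands in for a real, repeatable load.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Illustrative helper: sums GC activity over all collectors the JVM reports.
class GcStats {

    // Total number of collections so far; collectors reporting -1 are skipped.
    static long totalCollections() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long count = gc.getCollectionCount();
            if (count > 0) {
                total += count;
            }
        }
        return total;
    }

    // Approximate accumulated collection time in milliseconds.
    static long totalCollectionMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long millis = gc.getCollectionTime();
            if (millis > 0) {
                total += millis;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        long countBefore = totalCollections();
        long millisBefore = totalCollectionMillis();
        long start = System.currentTimeMillis();

        // Placeholder workload: short-lived allocations. Replace with a real load.
        long checksum = 0;
        for (int i = 0; i < 200000; i++) {
            byte[] chunk = new byte[1024];
            chunk[0] = (byte) i;
            checksum += chunk[0];
        }

        long elapsed = System.currentTimeMillis() - start;
        System.out.println("checksum:      " + checksum);
        System.out.println("elapsed ms:    " + elapsed);
        System.out.println("collections:   " + (totalCollections() - countBefore));
        System.out.println("collection ms: " + (totalCollectionMillis() - millisBefore));
    }
}
```

Running the same program once with the default policy and once with -Xgcpolicy:optavgpause gives two sets of figures to compare; a single short run is not representative, so repeat the measurement under a realistic load.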

In addition, IBM JVM 1.4 supports an incremental compaction mode, which spreads the compaction work across several garbage collection cycles to reduce pause times. For an introductory explanation of parallel and concurrent modes, refer to the developerWorks article Fine-tuning Java garbage collection performance by Sumit Chawla found at:

http://www.ibm.com/developerworks/ibm/library/i-gctroub/

or the IBM JVM Diagnostics Guides (see Additional JVM and garbage collection related resources) for an in-depth discussion of these features.

 

Java HotSpot Virtual Machine

Be sure to raise the size parameters below above their default values, and to enable the diagnostic options listed with them:

-XX:MaxPermSize=128M -XX:MaxNewSize=128M -XrunIBM_HeapDump -XX:+PrintGCTimeStamps

See:

  1. JavaHotSpot JVM 1.4.2 Garbage Collection
  2. IBM HeapDump for Solaris
  3. Tuning JVMs


 

WebSphere is a trademark of the IBM Corporation in the United States, other countries, or both.

 

IBM is a trademark of the IBM Corporation in the United States, other countries, or both.