Diagnose common runtime problems with hprof

 

Run with hprof

To run your program with HProf Profiler Agent invoke the Java runtime with:

java -Xrunhprof[:help][:<suboption>=<value>,...] ClassName
...for example...

java -classic -Xrunhprof:heap=sites,cpu=samples,depth=10,monitor=y,thread=y,doe=y MemoryLeak

...where...

format=[a|b] Set to b if you want to generate a binary file which can be used by the HAT utility.
heap=sites Generate stack traces indicating where memory was allocated
cpu=samples Use statistical sampling to determine where the CPU spends its time
depth=10 Show stack traces 10 levels deep, at most
monitor=y Generate information on contention monitors used to synchronize the work of multiple threads
thread=y Identify threads in stack traces
doe=y Produce a dump of profiling data upon exit

Running your program with hprof will leave a file called java.hprof.txt in your working directory. You can also generate a dump at any time while your program is running by pressing Ctrl-\ in your Java console window on Unix or Ctrl-Break on Windows.

 

Anatomy of an hprof output file

An hprof output file includes sections describing various characteristics of the profiled Java program. It starts with a header that describes its format, which the header claims is subject to change without notice.

The output file's Thread and Trace sections help you figure out what threads were active when your program ran and what they did. The Thread section provides a list of all threads started and terminated during the program's life. The Trace section includes a list of numbered stack traces for some threads. These stack trace numbers are cross-referenced in other file sections.

The Heap Dump and Sites sections help you analyze memory usage. Depending on the heap suboption you choose when you start the virtual machine (VM), you can get a dump of all live objects in the Java heap (heap=dump) and/or a sorted list of allocation sites that identifies the most heavily allocated objects (heap=sites).

The CPU Samples and CPU Time sections help you understand CPU utilization; the section you get depends on your cpu suboption (cpu=samples or cpu=time). CPU Samples provides a statistical execution profile. CPU Time includes measurements of how many times a given method was called and how long each method took to execute.

The Monitor Time and Monitor Dump sections help you understand how synchronization affects your program's performance. Monitor Time shows how much time your threads experience contention for locked resources. Monitor Dump is a snapshot of monitors currently in use. As you'll see, Monitor Dump is useful for finding deadlocks.

Note: You might want to try IBM_HEAPDUMP before HProf. Heapdump is more robust, and has better performance than HProf..... Note that IBM_HEAPDUMP does not currently work with Solaris.

 

Diagnose a memory leak

Memory leaks are generally an unintentional failure to dereference discarded objects so that the garbage collector cannot reclaim the memory they use. Here is a program you can run that will generate a memory leak...

import java.util.Vector;

public class MemoryLeak {

   public static void main(String[] args) {

      int    MAX_CONSUMERS        = 10000;
      int    SLEEP_BETWEEN_ALLOCS = 5;

      ConsumerContainer objectHolder = new ConsumerContainer();

      while(objectHolder.size() < MAX_CONSUMERS) {
          System.out.println("Allocating object " + 
                             Integer.toString(objectHolder.size())
                            );
          objectHolder.add(new MemoryConsumer());
          try {
              Thread.currentThread().sleep(SLEEP_BETWEEN_ALLOCS);
          } catch (InterruptedException ie) {
              // Do nothing.
          }
      } // while.
   } // main.

} // End of MemoryLeak.

/** Named container class to hold object references. */
class ConsumerContainer extends Vector {}

/** Class that consumes a fixed amount of memory. */
class MemoryConsumer {
  public static final int MEMORY_BLOCK        = 1024;
  public byte[]           memoryHoldingArray;

  MemoryConsumer() {
      memoryHoldingArray = new byte[MEMORY_BLOCK];
  }
} // End MemoryConsumer.

Compile by running...

javac MemoryLeak.java
...and run by executing...

java -Xrunhprof:heap=sites,cpu=samples,depth=10,thread=y,doe=y MemoryLeak

Use <CTRL>-\ to generate content in the java.hprof.txt file.

When the program runs, it creates a ConsumerContainer object, then starts creating and adding MemoryConsumer objects at least 1 KB in size to that ConsumerContainer object. Keeping the objects accessible makes them unavailable for garbage collection, simulating a memory leak.

Let's look at select parts of the profile file. The Sites section's first few lines clearly show what is happening:

SITES BEGIN (ordered by live bytes) Mon Sep  3 19:16:29 2001
          percent         live     alloc'ed       stack class
rank   self  accum    bytes objs     bytes  objs trace name
    1 97.31% 97.31% 10280000 10000 10280000 10000  1995 [B
    2  0.39% 97.69%    40964     1    81880    10  1996 [L<Unknown>;
    3  0.38% 98.07%    40000 10000    40000 10000  1994 MemoryConsumer
    4  0.16% 98.23%    16388     1    16388     1  1295 [C
    5  0.16% 98.38%    16388     1    16388     1  1304 [C
    ...
There are 10,000 objects of type byte[] ([B in VM-speak) as well as 10,000 MemoryConsumer objects. The byte arrays take up 10,280,000 bytes, so apparently there is overhead just above the raw bytes that each array consumes. Since the number of objects allocated equals the number of live objects, we can conclude that none of these objects could be garbage collected. This is consistent with our expectations.

Another interesting point: the memory reported to be consumed by the MemoryConsumer objects does not include the memory consumed by the byte arrays. This shows that our profiling tool does not expose hierarchical containment relationships, but rather class-by-class statistics. This is important to understand when using hprof to pinpoint a memory leak.

Now, where did those leaky byte arrays come from? Notice that the MemoryConsumer objects and the byte arrays reference traces 1994 and 1995 in the following Trace section. Lo and behold, these traces tell us that the MemoryConsumer objects were created in the MemoryLeak class's main() method and that the byte arrays were created in the constructor (<init>() method in VM-speak). We've found our memory leak, line numbers and all:

TRACE 1994: (thread=1)
        MemoryLeak.main(MemoryLeak.java:16)
TRACE 1995: (thread=1)
        MemoryConsumer.<init>(MemoryLeak.java:36)
        MemoryLeak.main(MemoryLeak.java:16)

 

Diagnose a CPU hog

In Listing 2, a BusyWork class has each thread call a method that regulates how much the thread works by varying its sleep time in between bouts of performing CPU-intensive calculations:

Listing 2. CPUHog program

01 /** Main class for control test. */
02 public class CPUHog {
03    public static void main(String[] args) {
04 
05       Thread slouch, workingStiff, workaholic;
06       slouch       = new Slouch();
07       workingStiff = new WorkingStiff();
08       workaholic   = new Workaholic();
09 
10       slouch.start();
11       workingStiff.start();
12       workaholic.start();
13    }
14 }
15 
16 /** Low CPU utilization thread. */
17 class Slouch extends Thread {
18   public Slouch() {
19      super("Slouch");
20   }
21   public void run() {
22      BusyWork.slouch();
23   }
24 }
25 
26 /** Medium CPU utilization thread. */
27 class WorkingStiff extends Thread {
28   public WorkingStiff() {
29       super("WorkingStiff");
30   }
31   public void run() {
32      BusyWork.workNormally();
33   }
34 }
35 
36 /** High CPU utilization thread. */
37 class Workaholic extends Thread {
38   public Workaholic() {
39       super("Workaholic");
40   }
41   public void run() {
42      BusyWork.workTillYouDrop();
43   }
44 }
45 
46 /** Class with static methods to consume varying amounts
47  *  of CPU time. */
48 class BusyWork {
49 
50   public static int callCount = 0;
51 
52   public static void slouch() {
53     int SLEEP_INTERVAL = 1000;
54     computeAndSleepLoop(SLEEP_INTERVAL); 
55   }
56 
57   public static void workNormally() {
58     int SLEEP_INTERVAL = 100;
59     computeAndSleepLoop(SLEEP_INTERVAL); 
60   }
61 
62   public static void workTillYouDrop() {
63     int SLEEP_INTERVAL = 10;
64     computeAndSleepLoop(SLEEP_INTERVAL); 
65   }
66   
67   private static void computeAndSleepLoop(int sleepInterval) {
68      int MAX_CALLS = 10000;
69      while (callCount < MAX_CALLS) {
70         computeAndSleep(sleepInterval);
71      }
72   } 
73 
74   private static void computeAndSleep(int sleepInterval) {
75         int    COMPUTATIONS = 1000;
76         double result;
77         
78         // Compute.
79         callCount++;
80         for (int i = 0; i < COMPUTATIONS; i++) {
81            result = Math.atan(callCount * Math.random());         
82         }
83 
84         // Sleep.
85         try {
86            Thread.currentThread().sleep(sleepInterval);
87         } catch (InterruptedException ie) {
88            // Do nothing.
89         }
90 
91   } // End computeAndSleep.
92 } // End BusyWork.

There are three threads -- Workaholic, WorkingStiff, and Slouch -- whose work ethics vary by orders of magnitude, judging by the work they choose to do. Examine the profile's CPU Samples section shown below. The three highest ranked traces show that the CPU spent most of its time calculating random numbers and arc tangents, as we would expect:

CPU SAMPLES BEGIN (total = 935) Tue Sep  4 20:44:49 2001
rank   self  accum   count trace method
   1 39.04% 39.04%     365  2040 java/util/Random.next
   2 26.84% 65.88%     251  2042 java/util/Random.nextDouble
   3 10.91% 76.79%     102  2041 java/lang/StrictMath.atan
   4  8.13% 84.92%      76  2046 BusyWork.computeAndSleep
   5  4.28% 89.20%      40  2050 java/lang/Math.atan
   6  3.21% 92.41%      30  2045 java/lang/Math.random
   7  2.25% 94.65%      21  2051 java/lang/Math.random
   8  1.82% 96.47%      17  2044 java/util/Random.next
   9  1.50% 97.97%      14  2043 java/util/Random.nextDouble
  10  0.43% 98.40%       4  2047 BusyWork.computeAndSleep
  11  0.21% 98.61%       2  2048 java/lang/StrictMath.atan
  12  0.11% 98.72%       1  1578 java/io/BufferedReader.readLine
  13  0.11% 98.82%       1  2054 java/lang/Thread.sleep
  14  0.11% 98.93%       1  1956 java.security.PermissionCollection.setReadOnly
  15  0.11% 99.04%       1  2055 java/lang/Thread.sleep
  16  0.11% 99.14%       1  1593 java/lang/String.valueOf
  17  0.11% 99.25%       1  2052 java/lang/Math.random
  18  0.11% 99.36%       1  2049 java/util/Random.nextDouble
  19  0.11% 99.47%       1  2031 BusyWork.computeAndSleep
  20  0.11% 99.57%       1  1530 sun/io/CharToByteISO8859_1.convert
  ..
Note that calls to the BusyWork.computeAndSleep() method take up 8.13 percent, 0.43 percent, and 0.11 percent for the Workaholic, WorkingStiff, and Slouch threads, respectively. We can tell which threads these are by examining the traces referenced in the CPU Samples section's trace column above (ranks 4, 10, and 19) in the following Trace section:

TRACE 2040: (thread=8)
        java/util/Random.next(Random.java:Unknown line)
        java/util/Random.nextDouble(Random.java:373)
        java/lang/Math.random(Math.java:557)
        BusyWork.computeAndSleep(CPUHog.java:81)
        BusyWork.computeAndSleepLoop(CPUHog.java:70)
        BusyWork.workTillYouDrop(CPUHog.java:64)
        Workaholic.run(CPUHog.java:42)
TRACE 2046: (thread=8)
        BusyWork.computeAndSleep(CPUHog.java:81)
        BusyWork.computeAndSleepLoop(CPUHog.java:70)
        BusyWork.workTillYouDrop(CPUHog.java:64)
        Workaholic.run(CPUHog.java:42)
TRACE 2047: (thread=7)
        BusyWork.computeAndSleep(CPUHog.java:81)
        BusyWork.computeAndSleepLoop(CPUHog.java:70)
        BusyWork.workNormally(CPUHog.java:59)
        WorkingStiff.run(CPUHog.java:32)
TRACE 2031: (thread=6)
        BusyWork.computeAndSleep(CPUHog.java:81)
        BusyWork.computeAndSleepLoop(CPUHog.java:70)
        BusyWork.slouch(CPUHog.java:54)
        Slouch.run(CPUHog.java:22)
Perhaps just as interesting as what the profiler shows us is what it doesn't clearly show. Note that thread 8 (the Workaholic) is responsible for the top CPU hog call to the Random.next() method. The workTillYouDrop() method ultimately calls the Random.next() method, but that does not mean that it shows up as a top CPU hog on a line by itself, as we might expect.

The profiler flattens the hierarchy of method calls the way it flattened the hierarchy of object containment in the MemoryLeak example. It does not count CPU utilization by internal methods toward the total of a method lower on the stack trace. The calls to the BusyWork.computeAndSleep() method show up because this method is called many times by each thread. BusyWork.computeAndSleep() is often part of the sampled stack traces, so it frequently gets caught in the act. As was the case in the Sites section, here we can only discern what is really happening when we look at the stack traces.

Note: hprof's cpu=time mode should show CPU usage hierarchically, but this feature appears to have a bug that leads to a segmentation fault in the 1.3 VMs I worked with.

 

Diagnose a deadlock

The third control program exhibits a classic deadlock, as shown in Listing 3:

Listing 3. Deadlock program

01 /** Main class for the control test. */
02 public class ClassicDeadlock {
03 
04    public static void main(String[] args) {
05       SynchronizedObject left;
06       SynchronizedObject right;
07 
08       LeftToRightGrabber leftToRight;
09       RightToLeftGrabber rightToLeft;
10 
11       left  = new SynchronizedObject("left");
12       right = new SynchronizedObject("right");
13 
14       leftToRight = new LeftToRightGrabber(left,right);
15       rightToLeft = new RightToLeftGrabber(left,right);
16 
17       leftToRight.start();
18       rightToLeft.start();
19    }
20 
21 } // End ClassicDeadlock.
22 
23 /** An object containing a synchronized method. */
24 class SynchronizedObject {
25   public String name;
26   public SynchronizedObject(String aName) {
27      name = aName;
28   }
29   public synchronized void synchronizedMethod() {
30       String message = name                        + 
31                        " lock acquired by thread " +
32                        Thread.currentThread().getName(); 
33       System.out.println(message);
34       System.out.flush();
35   }
36 } // End of SynchronizedObject.
37 
38 /** Base class for the Left and Right Grabber. */
39 abstract class GrabberBase extends Thread {
40    protected SynchronizedObject leftObject;
41    protected SynchronizedObject rightObject;
42 
43    public GrabberBase(SynchronizedObject left,
44                       SynchronizedObject right,
45                       String             name
46                      ) {
47      super(name);
48      leftObject  = left;
49      rightObject = right; 
50    } 
51 } // End of GrabberBase.
52 
53 /** Class to grab locks right and then left. */
54 class RightToLeftGrabber extends GrabberBase {
55       public RightToLeftGrabber( SynchronizedObject left,
56                                  SynchronizedObject right) {
57           super(left, right, "RightToLeft");
58       }
59 
60       public void run() {
61          while(true) {
62            grabRightToLeft();
63          }
64       }
65 
66       private void grabRightToLeft() {
67           synchronized (rightObject) {
68              leftObject.synchronizedMethod();
69           }
70       }
71 } // End of RightToLeftGrabber.
72 
73 /** Class to grab locks left and then right. */
74 class LeftToRightGrabber extends GrabberBase {
75       public LeftToRightGrabber( SynchronizedObject left,
76                                  SynchronizedObject right) {
77           super(left, right, "LeftToRight");
78       }
79 
80       public void run() {
81          while(true) {
82             grabLeftToRight();
83          }
84       }
85  
86       private void grabLeftToRight() {
87           synchronized (leftObject) {
88              rightObject.synchronizedMethod();
89           }
90       }
91 } // End of LeftToRightGrabber.
This program creates two equivalent objects, left and right, each having its own synchronization monitor. Two threads then try to grab locks on these objects in a particular order (by entering a block synchronized on one object and then calling a synchronized method on the other). If the LeftToRightGrabber thread can enter its synchronized block and the RightToLeftGrabber thread is scheduled to run before LeftToRightGrabber completes its work, a deadlock will occur. How long the program must run to hit a deadlock depends on the JVM's thread scheduling algorithm. For me, this typically occurred during the third switch between thread contexts.

Once the deadlock occurs, hitting Ctrl-\ generates a profile file containing a Monitor Dump section, like so:

MONITOR DUMP BEGIN
    THREAD 8, trace 2018, status: CW
    THREAD 7, trace 2027, status: MW
    THREAD 6, trace 2016, status: MW
    THREAD 5, trace 2036, status: MW
    THREAD 4, trace 2037, status: CW
    THREAD 3, trace 2038, status: CW
    THREAD 2, trace 2039, status: R
    MONITOR java/lang/ref/Reference$Lock(80ddd80) unowned
        waiting to be notified: thread 3
    MONITOR SynchronizedObject(81ece98)
        owner: thread 7, entry count: 1
        waiting to enter: thread 6
    MONITOR SynchronizedObject(81ecd40)
        owner: thread 6, entry count: 1
        waiting to enter: thread 7
    MONITOR java/lang/ref/ReferenceQueue$Lock(80e7010) unowned
        waiting to be notified: thread 4
    ...
We can see that threads 6 and 7 (LeftToRight and RightToLeft from the Thread section) are both in waiting states. Thread 7 holds one of the SynchronizedObject instance locks while thread 6 waits for that lock, and thread 6 holds the other lock while thread 7 waits for it. The traces below confirm that each thread dutifully waits to obtain the lock held by the other thread -- exactly where we expect them to wait:

TRACE 2027: (thread=7)
        RightToLeftGrabber.grabRightToLeft(ClassicDeadlock.java:68)
        RightToLeftGrabber.run(ClassicDeadlock.java:62)
TRACE 2016: (thread=6)
        LeftToRightGrabber.grabLeftToRight(ClassicDeadlock.java:88)
        LeftToRightGrabber.run(ClassicDeadlock.java:82)
hprof has shown us what stopped this program in its tracks.

 

Tackle uncharted territory

By writing simple control experiment programs and running them under your chosen profiling tool, you can gain experience with the tool and confidence to use it when you venture into uncharted territory. The programs examined in this article model three common problems. You can use them as-is with other profilers and let them serve as a starting point for your own experimentation.

 

See Also

  1. The HProf Profiler Agent
  2. prophIt: the Java profiling experts
  3. Troubleshooting by component: What is not working?