Troubleshoot performance problems
- Measure system performance and collect performance data
- Locate a bottleneck
- Eliminate a bottleneck
Measure system performance and collect performance data
Always start with a benchmark: a standard set of operations to run that exercises the application functions experiencing performance problems.
Let the system warm up first to cache objects, optimize code paths, and so on. System performance during the warm-up period is usually slower than after it. The benchmark must be able to generate work that warms up the system before the measurements used for performance analysis are recorded. Depending on the system complexity, a warm-up period can range from a few thousand transactions to longer than 30 minutes.
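The following minimal harness sketch illustrates the warm-up-then-measure pattern; runTransaction() and the transaction counts are assumptions that stand in for the real benchmark operations.

    // Minimal benchmark harness sketch: warm up first, then measure.
    // runTransaction() and the transaction counts are placeholders for the
    // real benchmark operations against the system under test.
    public class BenchmarkHarness {

        static final int WARMUP_TRANSACTIONS = 10000;
        static final int MEASURED_TRANSACTIONS = 50000;

        public static void main(String[] args) {
            // Warm-up phase: results are discarded while caches fill and
            // code paths are optimized.
            for (int i = 0; i < WARMUP_TRANSACTIONS; i++) {
                runTransaction();
            }

            // Measurement phase: record throughput and per-transaction
            // response times (kept for graphing and later analysis).
            long[] responseTimes = new long[MEASURED_TRANSACTIONS];
            long start = System.nanoTime();
            for (int i = 0; i < MEASURED_TRANSACTIONS; i++) {
                long t0 = System.nanoTime();
                runTransaction();
                responseTimes[i] = System.nanoTime() - t0;
            }
            double elapsedSeconds = (System.nanoTime() - start) / 1000000000.0;
            System.out.printf("Throughput: %.1f transactions per second%n",
                    MEASURED_TRANSACTIONS / elapsedSeconds);
        }

        static void runTransaction() {
            // Placeholder: issue one request against the application under test.
        }
    }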
For performance issues that occur when a large number of clients use the system, the benchmark must simulate multiple users.
The benchmark must produce repeatable results. If the results vary more than a few percent from one run to another, consider the following possibilities (a small repeatability check is sketched after this list):
- The initial state of the system might not be the same for each run
- Measurements are made during the warm-up period
- The system is running additional workloads
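A repeatability check, with hypothetical throughput figures, might look like this:

    // Sketch: flag benchmark runs whose throughput deviates more than a few
    // percent from the mean. The figures below are hypothetical run results.
    public class RepeatabilityCheck {
        public static void main(String[] args) {
            double[] runsPerSecond = { 405.0, 408.7, 410.1, 371.9 };

            double mean = 0;
            for (double r : runsPerSecond) {
                mean += r;
            }
            mean /= runsPerSecond.length;

            for (double r : runsPerSecond) {
                double deviationPercent = Math.abs(r - mean) / mean * 100.0;
                if (deviationPercent > 3.0) {   // "a few percent" threshold
                    System.out.printf(
                        "Run at %.1f tps deviates %.1f%% from the mean; check the "
                        + "initial state, warm-up, and background workloads%n",
                        r, deviationPercent);
                }
            }
        }
    }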
IBM Rational has sophisticated benchmarking tools that can generate complex interactions with the system under test and simulate thousands of users. Producing a useful benchmark requires effort and needs to be part of the development process. Do not wait until an application goes into production to determine how to measure performance.
The benchmark records throughput and response time results in a form that allows graphing and other analysis. The performance data provided by the Performance Monitoring Infrastructure (PMI) helps you monitor and tune appserver performance.
Request metrics is another source of performance data that is provided by WAS. Request metrics allows a request to be timed at server component boundaries, so that you can determine the time spent in each major component.
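The following servlet-filter sketch shows the same idea in generic form: timing a request as it crosses one component boundary. It is an illustration only, not the WAS request metrics facility.

    import java.io.IOException;
    import javax.servlet.Filter;
    import javax.servlet.FilterChain;
    import javax.servlet.FilterConfig;
    import javax.servlet.ServletException;
    import javax.servlet.ServletRequest;
    import javax.servlet.ServletResponse;

    // Illustration of the idea behind request metrics: time a request as it
    // crosses a component boundary (here, the web container). This is a
    // generic sketch, not the WAS request metrics implementation.
    public class BoundaryTimingFilter implements Filter {

        public void init(FilterConfig filterConfig) throws ServletException {
        }

        public void doFilter(ServletRequest request, ServletResponse response,
                             FilterChain chain) throws IOException, ServletException {
            long start = System.nanoTime();
            try {
                chain.doFilter(request, response);   // time spent in downstream components
            } finally {
                long elapsedMs = (System.nanoTime() - start) / 1000000;
                System.out.println("Web container boundary: " + elapsedMs + " ms");
            }
        }

        public void destroy() {
        }
    }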
Locate a bottleneck
Scenario: Poor performance occurs with only a single user.
Use request metrics to determine how much each component contributes to the overall response time, and focus on the component that accounts for the most time. Use Tivoli Performance Viewer to check resource consumption, including the frequency of garbage collections. Code profiling tools might be needed to isolate the problem to a specific method.
Scenario: Poor performance only occurs with multiple users.
Check whether any system has high CPU, network, or disk utilization and address it. For clustered configurations, check for uneven loading across cluster members.
Scenario: None of the systems seems to have a CPU, memory, network, or disk constraint but performance problems occur with multiple users.
Check that work is reaching the system under test. Ensure that some external device does not limit the amount of work reaching the system. Tivoli Performance Viewer helps determine the number of requests in the system.
A thread dump might reveal a bottleneck at a synchronized method or a large number of threads waiting for a resource.
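A generic way to spot that pattern programmatically is the standard java.lang.management API, as in this sketch:

    import java.lang.management.ManagementFactory;
    import java.lang.management.ThreadInfo;
    import java.lang.management.ThreadMXBean;

    // Sketch: report the pattern a thread dump reveals, namely many threads
    // blocked on the same monitor (for example, a synchronized method).
    public class BlockedThreadReport {
        public static void main(String[] args) {
            ThreadMXBean threadBean = ManagementFactory.getThreadMXBean();
            for (ThreadInfo info : threadBean.dumpAllThreads(true, true)) {
                if (info.getThreadState() == Thread.State.BLOCKED) {
                    System.out.printf("%s is blocked on %s held by %s%n",
                            info.getThreadName(),
                            info.getLockName(),
                            info.getLockOwnerName());
                }
            }
        }
    }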
Make sure that enough threads are available to process the work in IBM HTTP Server, the database, and the appservers. Conversely, too many threads can increase resource contention and reduce throughput.
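As a generic illustration (not a WAS thread pool setting), the following sketch creates a bounded worker pool; the pool size shown is an assumed starting point to tune against measured throughput.

    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;

    // Sketch: a bounded worker pool. Too few threads leave requests queued;
    // too many threads increase contention for CPU, locks, and connections.
    public class WorkerPool {
        public static void main(String[] args) {
            int poolSize = 50;   // assumed value; tune against measured throughput
            ExecutorService pool = Executors.newFixedThreadPool(poolSize);
            for (int i = 0; i < 1000; i++) {
                pool.submit(() -> {
                    // Placeholder for one unit of work, for example one request.
                });
            }
            pool.shutdown();
        }
    }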
Monitor garbage collections with Tivoli Performance Viewer or the verbosegc option of the Java virtual machine. Excessive garbage collection can limit throughput.
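In addition to those tools, the standard JMX garbage collector beans report cumulative collection time; this sketch estimates the share of elapsed time spent in garbage collection.

    import java.lang.management.GarbageCollectorMXBean;
    import java.lang.management.ManagementFactory;

    // Sketch: estimate how much wall-clock time the JVM spends in garbage
    // collection by sampling the standard GarbageCollectorMXBeans.
    public class GcShareSampler {
        public static void main(String[] args) throws InterruptedException {
            long sampleMillis = 60000;
            long gcBefore = totalGcTimeMillis();
            Thread.sleep(sampleMillis);
            long gcDuring = totalGcTimeMillis() - gcBefore;
            System.out.printf("GC time: %d ms over %d ms (%.1f%%)%n",
                    gcDuring, sampleMillis, 100.0 * gcDuring / sampleMillis);
        }

        static long totalGcTimeMillis() {
            long total = 0;
            for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
                total += gc.getCollectionTime();   // cumulative milliseconds
            }
            return total;
        }
    }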
Eliminate a bottleneck
- Reduce the demand
- Increase resources
- Improve workload distribution
- Reduce synchronization
Reducing the demand for resources can be accomplished in several ways. Caching can greatly reduce the use of system resources by returning a previously cached response, thereby avoiding the work needed to construct the original response. Caching is supported at several points in the following systems (a sketch of the principle follows the list):
- IBM HTTP Server
- Command
- Enterprise bean
- Operating system
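The following sketch illustrates the caching principle in generic form; buildResponse() is a hypothetical expensive operation whose result is reused on later requests.

    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;

    // Sketch of the caching principle: return a previously built response
    // instead of constructing it again. buildResponse() is a hypothetical
    // expensive call.
    public class ResponseCache {
        private final Map<String, String> cache = new ConcurrentHashMap<>();

        public String get(String key) {
            // computeIfAbsent builds the response only on a cache miss.
            return cache.computeIfAbsent(key, ResponseCache::buildResponse);
        }

        private static String buildResponse(String key) {
            // Placeholder for the expensive work that caching avoids.
            return "response for " + key;
        }
    }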
Application code profiling can reduce CPU demand by pointing out hot spots that can be optimized. IBM Rational and other companies have tools to perform code profiling. An analysis of the application might also reveal areas where the work done for some types of transactions can be reduced.
Some resources can be increased by changing tuning parameters, for example, the number of file handles; others require a hardware change, for example, more or faster CPUs or additional appservers. Key tuning parameters are described for each major server component to facilitate solving performance problems. Also, the performance advisors can provide advice on tuning a production system under a real or simulated load.
Workload distribution can affect performance when some resources are underutilized and others are overloaded. Workload management functions provide several ways to determine how the work is distributed. Workload distribution applies both to a single server and to configurations with multiple servers and nodes.
Some critical sections of the application and server code require synchronization to prevent multiple threads from running the code simultaneously and producing incorrect results. Synchronization preserves correctness, but it can also reduce throughput when several threads must wait for one thread to exit the critical section. When several threads are waiting to enter a critical section, a thread dump shows them waiting in the same procedure. Synchronization can often be reduced by (see the sketch after this list):
- changing the code to use synchronization only when necessary
- reducing the path length of the synchronized code
- reducing the frequency of invoking the synchronized code
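The following sketch illustrates the idea with a generic counter: replacing a synchronized method with an atomic update removes the monitor that threads would otherwise queue on.

    import java.util.concurrent.atomic.AtomicLong;

    // Sketch: reduce synchronization by shrinking or removing the critical section.
    public class RequestCounter {

        // Before: every caller serializes on the object's monitor.
        private long count;
        public synchronized void incrementSynchronized() {
            count++;
        }

        // After: an atomic update needs no monitor, so threads do not block each other.
        private final AtomicLong atomicCount = new AtomicLong();
        public void incrementAtomic() {
            atomicCount.incrementAndGet();
        }
    }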
See also
WAS V6 Scalability and Performance Handbook
WAS Performance Web site
All SPEC jAppServer2004 Results Published by SPEC.