
Using IBM Rational Performance Tester: Application Monitoring Part 2, Enabling real-time monitoring

 


 

Real-time application monitoring enables IBM® Rational® Performance Tester to take a preventive approach to isolating bottlenecks in applications during the development and test phases of software development. This approach is advantageous because it can identify problems before the application is deployed into the production environment, which reduces the number of critical problems that could arise soon after the application is released. Alternatively, you can use real-time application monitoring to validate the performance of an application after it has been patched by the development team.

Data collection architecture

Real-time application monitoring in Rational Performance Tester is enabled by using the data collection infrastructure (DCI). The role of the DCI is to transform and forward to the client the Application Response Measurement (ARM) standard events that are reported to the IBM® Tivoli® ARM engine by an ARM-instrumented application (see Figure 1).



Transaction flow

To understand the architecture, it helps to consider the flow of a business transaction through the environment:

  1. The DCI located on the machine where the root (or edge) transaction will be executed must be started for monitoring. For example, executing an HTTP request for http://ibm.com/PlantsByWebSphere indicates that the application PlantsByWebSphere has to be instrumented and that a DCI is installed on the machine located at ibm.com.
  2. A client invokes an HTTP transaction involving an application that is ARM-instrumented. We can use either of these clients:
    • Rational Performance Tester: Behaves similarly to a browser by executing HTTP requests for each page in the test. When response time breakdown (RTB) is enabled, the Rational Performance Tester client adds the ARM_CORRELATOR header attribute to the request, which enables the DCI to monitor the transaction. This client automatically establishes a connection to the DCI on the machine where the root transaction will be executed.
    • Internet browser: Executes HTTP requests for a selected URL. You must manually establish a connection to the DCI on the machine where the root transaction will be executed.
  3. As the transaction invokes the application that is operating in the execution environment, the instrumentation code (or probes) will execute (read Part 1 for more about probes). These probes initiate ARM transactions that are monitored by the Tivoli ARM engine.
  4. The IBM® Performance Optimization Toolkit (hereafter, toolkit) agent is a Java™ technology-based application that implements the ARM plug-in interface provided by Tivoli for registering third-party applications with the Tivoli ARM engine. As ARM events are created and reported to the engine, they are also reported to the toolkit agent. The events are collected and organized into a transactional hierarchy, from the root transaction to all of the subtransactions. This hierarchy is then converted into a set of events that are modeled after the Eclipse Test and Performance Tools Platform (TPTP) trace model format. The set of events is then sent to the presentation system.
  5. The target and presentation systems communicate with each other by using the IBM® Rational® Agent Controller. This component integrates with Eclipse TPTP, which Rational Performance Tester is based on. The component also manages the toolkit agent (starts and stops monitoring).

Each ARM event reported to the DCI contains information about the start time and completion time of the ARM transaction, in addition to metadata. The timing information helps to compute metrics that help an analyst determine whether a performance problem is present. The metadata helps to indicate the context of the transaction in the entire transaction hierarchy and the type of transaction being invoked.
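The relationship between ARM events, their timing information, and the transaction hierarchy can be sketched in a few lines of Python. This is only an illustration, not the toolkit agent's implementation: the `ArmEvent` class, its field names, and the sample values are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ArmEvent:
    """Hypothetical, simplified stand-in for a reported ARM event."""
    tx_id: str                 # identifier of this transaction
    parent_id: Optional[str]   # None marks a root (edge) transaction
    name: str                  # metadata: the type of transaction invoked
    start: float               # start time, in seconds
    stop: float                # completion time, in seconds

    @property
    def elapsed(self) -> float:
        # Metric computed from the timing information.
        return self.stop - self.start


def build_hierarchy(events):
    """Map each transaction id to the ids of its direct subtransactions."""
    children = {e.tx_id: [] for e in events}
    for e in events:
        if e.parent_id is not None:
            children.setdefault(e.parent_id, []).append(e.tx_id)
    return children


def find_roots(events):
    """Return the ids of transactions that have no parent."""
    return [e.tx_id for e in events if e.parent_id is None]


# Sample data (invented for the example).
events = [
    ArmEvent("t1", None, "GET /PlantsByWebSphere", 0.00, 1.20),
    ArmEvent("t2", "t1", "servlet", 0.05, 1.10),
    ArmEvent("t3", "t2", "JDBC query", 0.30, 0.90),
]
```

Calling `build_hierarchy(events)` groups the subtransactions under their parents, and `find_roots(events)` identifies the root transaction from which the hierarchy starts.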

Execution environment

Within the execution environment, there are two agents that implement the Java profiling interface:

Part 1 of this series discusses JITI. The JVMPI agent (or Java profiling agent) used in Eclipse TPTP has been enhanced in Rational Performance Tester to include features such as security and support for Citrix and SAP integration.

When you set up a profiling configuration to monitor an application, you choose an analysis type to specify the type of data that you want to collect. Depending on the analysis type that you choose, the data collection infrastructure uses one or both agents to collect the data. The agent is selected automatically to correspond with your profiling settings. These are the two collection agents:

Each agent offers its own set of data collection features, as Table 1 shows.

Table 1. Comparison of data collection features between ARM and JVMPI agents

Feature | ARM agent | JVMPI agent
--- | --- | ---
Provides the ability to correlate remote calls between processes and hosts | Yes | No
Affects application performance when the agent runs in profiling mode | Small effect | Large effect
Offers filtering mechanisms | By J2EE component type (for example, servlet or JDBC), host URL | By package, class, and method
Collects memory information and object interactions (for the UML2 Object Interactions view) | No | Yes

With two agents that can potentially be used in parallel, the load on the execution environment can increase dramatically. As a result, the profiling interface (PI) virtualizer (the virt.dll file on Windows systems) was added. This component is the only agent that the JVM recognizes. When the PI virtualizer receives events from the JVM, it broadcasts those events to every JVMPI-based agent that it detects. In this case, those agents are JITI and the Java profiling agent.

To use the Java profiling agent, add this Virtual Machine (VM) argument:

  -XrunpiAgent:server=enabled


When the PI virtualizer is present, use this VM argument:

-Xrunvirt:agent=jvmpi:C:\PROGRA~1\IBM\DCI\rpa_prod\TIVOLI~1\app\instrument\61\appServers\server1_100\config\jiti.properties,agent=piAgent:server=enabled


Notice that the argument specifies both agents that are to receive the broadcast events. This configuration can be found in the same configuration files as the instrumentation, as previously discussed.
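The shape of the two VM arguments can be made explicit with a small helper. This is a sketch that only reproduces the string patterns shown in this article; the function name `build_vm_argument` is hypothetical, and the path passed to it must be the real location of the jiti.properties file on your system.

```python
def build_vm_argument(jiti_properties_path=None):
    """Compose the VM argument for the Java profiling agent.

    Without a path, the Java profiling agent is loaded directly.
    With a path, the PI virtualizer is loaded and told about both
    agents: JITI (configured by jiti.properties) and piAgent.
    """
    pi_agent = "piAgent:server=enabled"
    if jiti_properties_path is None:
        return "-Xrun" + pi_agent
    return "-Xrunvirt:agent=jvmpi:{},agent={}".format(jiti_properties_path, pi_agent)
```

For example, `build_vm_argument()` yields the standalone form, and `build_vm_argument(path)` yields the PI virtualizer form with both agents listed.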

Data collection infrastructure

To collect end-to-end transaction information, the DCI must be installed on all systems involved in the path of a transaction, as shown in Figure 2 (for more on end-to-end transactions, see Part 1). This requirement is present because the ARM correlator goes across the physical machine boundary. As a result, when an ARM correlator is sent from one machine to another, the DCI on the machine receiving the ARM correlator has to execute special processing that is known as dynamic discovery.
Figure 2. Data collection infrastructure for a distributed environment

diagram of data collection infrastructure

The purpose of the dynamic discovery process is to automatically have the client workbench attach and begin monitoring the DCI located on the machine where the remote method is being invoked. This is required because, when a transaction is first executed, the client is attached only to the first machine in the transaction's path. Therefore, rather than expecting us to know all of the physical machines that would be involved in any given transaction and to manually attach to each one, this process is automated.

This process is made possible by sending information in the ARM correlator about the Agent Controller on the caller machine. Therefore, when the RMI-IIOP transaction is invoked, an ARM event is sent to the DCI at this newly discovered machine. (RMI-IIOP stands for Remote Method Invocation [RMI] over Internet Inter-ORB Protocol [IIOP].) Specifically, this RMI-IIOP transaction is detected by the IBM toolkit agent. At that moment, the toolkit agent invokes a peer request to the Agent Controller at the caller machine by using the Agent Controller at the call-recipient machine. This request asks for the known client attached at the caller machine to also establish a connection to the newly discovered machine (see Figure 3). In turn, transactional information from the call-recipient machine flows to the client for analysis.

Note:
The security feature on the Agent Controller must be disabled for dynamic discovery to work. When this article was initially published (May 2007), the security feature was not yet fully functional across distributed environments. To disable it or verify that it is disabled, execute the SetConfig script located in the bin directory of the Agent Controller installation.
Figure 3. Dynamic discovery process

diagram of the Dynamic discovery process

For the most part, the focus here has been on collecting transactional information from application servers. However, a few database systems, such as IBM® DB2®, include ARM instrumentation in the product. Enabling this ARM instrumentation allows deeper end-to-end transaction monitoring. For example, rather than knowing only that a transaction has used Java Database Connectivity™ (JDBC™) to invoke an SQL transaction from a Java™ application, analysts can get the transaction information from within the database for that specific query. This information greatly helps to determine whether a problem is caused by a Java application or by the software solution's database.

  1. The first step to enable end-to-end transaction monitoring within the database is simply to follow the database product's instructions to enable this ARM instrumentation.
  2. The second step is to install the DCI on the same machine where the database executes. Starting the DCI in the monitoring mode will allow database transaction information to be sent to the client.
  3. Furthermore, you can configure the environment to collect the exact SQL query that is running on a database. To enable this configuration, disable database privacy and security for the local DCI. The instructions for enabling the collection of SQL statements are as follows (do this only after you have already instrumented the application server):
    1. Shut down the application server and DCI.
    2. Navigate to this directory: <DCI_INSTALL_DIRECTORY>/rpa_prod/tivoli_comp/app/instrument/61/appServers/<servername>/config
    3. Open the monitoringApplication.properties file.
    4. Add the following two lines:
      tmtp.isPrivacyEnabled=false
      tmtp.isSecurityEnabled=false
    5. Start monitoring for the DCI.
    6. Start the application server.
    7. Initiate data collection by invoking transactions.
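Step 4 of the nested procedure above can also be scripted. The following sketch works on the text of monitoringApplication.properties and sets the two properties, replacing existing entries or appending missing ones. The function name is hypothetical, and the parser assumes simple `key=value` lines, which is a simplification of the full Java properties format.

```python
def enable_sql_collection(properties_text):
    """Return properties text with privacy and security disabled.

    Assumes one key=value pair per line. Lines with other keys,
    comments, and blank lines are passed through unchanged.
    """
    wanted = {
        "tmtp.isPrivacyEnabled": "false",
        "tmtp.isSecurityEnabled": "false",
    }
    seen = set()
    out = []
    for line in properties_text.splitlines():
        key = line.split("=", 1)[0].strip()
        if key in wanted:
            # Overwrite an existing entry with the required value.
            out.append("{}={}".format(key, wanted[key]))
            seen.add(key)
        else:
            out.append(line)
    for key, value in wanted.items():
        if key not in seen:
            # Append entries that the file did not contain yet.
            out.append("{}={}".format(key, value))
    return "\n".join(out) + "\n"
```

To use it, read the file's contents while the application server and DCI are shut down, pass the text through `enable_sql_collection`, and write the result back before restarting.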

Configuring Rational Performance Tester

You can use the response time breakdown feature in Rational Performance Tester to see statistics that were captured for any page element while you were running a performance test or performance schedule. Response time breakdown shows how much time was spent in each part of the system under test. The Response Time Breakdown view is associated with a page element (URL) from a particular execution of a test or schedule. This view shows the inside of the system under test, because the data collection mechanisms are on the systems under test, not the load drivers.

You will typically capture the response time breakdown in real time (test) environments during development, rather than in production environments. To capture response time breakdown data, enable it in a test or schedule, and then specify the amount of data to be captured.

To configure a performance test:

  1. Use the check box in the Test Element Details section of the Performance Test Editor (see Figure 4).
  2. If the top-most node in the Test Contents tree is selected (that is, the performance test itself), then selecting Enable response time breakdown from Test Element Details enables application monitoring on every page and page element in the performance test.
  3. If only a specific page or page element requires application monitoring, then select it from the Test Contents tree. This action displays the selected item's configuration in Test Element Details, from which you can enable response time breakdown.

Figure 4. Response time breakdown configuration in a performance test

image of performance test editor workspace

To configure a performance schedule:

  1. Under Schedule Element Details, select the Response Time Breakdown tab and select Enable collection of response time data. This activates the test list and Options.
  2. After enabling response time breakdown data collection, set logging detail levels (see Figure 5).

Figure 5. Response time breakdown configuration in a performance schedule

image of workspace

The Response Time Breakdown page allows you to set the type of data that you see during a run, the sampling rate for that data, and whether data is collected from all users or a representative sample.

Enabling response time breakdown in a performance test will not affect the response time breakdown in any performance schedule that is referencing that test. The reverse is true, too: Enabling response time breakdown in a performance schedule will not affect the response time breakdown configuration in a performance test. The test and the schedule are separate entities.

After making the appropriate response time breakdown configuration, execute the test or schedule to initiate application monitoring during the load test. This action can be executed by using the Run menu to launch the test or schedule. Response time breakdown can be exploited only when executing a test or schedule from the GUI. If a test or schedule is configured for response time breakdown, and then executed by using the command line, response time breakdown data will not be collected.

Profiling for real-time observation

Within a development environment, you can collect response time breakdown data for analysis in several ways:

To collect real-time response time breakdown data, make sure that you comply with these requirements:

Before profiling or collecting real-time response time breakdown data from an application system, establish a connection to the data collection infrastructure. First, identify the server that will process the initial transaction request. More specifically, identify the first application server that the transaction request reaches.

Note:
You do not have to explicitly create a connection to every computer involved in the distributed application that you want to monitor. Through the dynamic discovery process, the workbench automatically connects to each computer as the transaction flows to it.

To initiate real-time data collection:

  1. Select Run > Profile, or use the toolbar button to open the Launch Configuration dialog (see Figure 6).
  2. This profile action asks you to switch to the Rational Performance Tester Profiling and Logging perspective. This perspective enables profiling tools and has a different layout of the views on the workbench.

Figure 6. Profile launch configuration action

image of run profile menu command

The Profile Configuration dialog allows you to create, manage, and run configurations for real-time application monitoring. Use this method for monitoring J2EE, non-J2EE, and Web service applications that do not have the automated application monitoring that a performance test or schedule provides.

  1. In the Launch Configuration dialog, select J2EE Application and click New (see Figure 7). If your application is not running on a J2EE application server, but is instrumented manually for the ARM standard, select ARM Instrumented Application instead.
  2. On the Host page, select the host for the Agent Controller. If the host that you need is not on the list, click Add and provide the host name and port number.
  3. Test the connection before proceeding.

Figure 7. Profile launch configuration for J2EE Application: Host page

image of workspace host tab

  1. On the Monitor page, for Analysis type, select either J2EE Performance Analysis or, if you are profiling by using the ARM-instrumented application launch configuration, ARM Performance Analysis (see Figure 8).
  2. If you want to customize the profiling settings and filters, click Edit Options.

Figure 8. Profile launch configuration for J2EE Application: Monitor page

image of workspace monitor tab

  1. On the Components page, select the types of J2EE components from which you want to collect data (Figure 9).

Figure 9. J2EE performance analysis components to monitor

image of edit profiling options dialog box, components tab

  1. On the Filters page, specify the hosts and transactions that you want to monitor. The filters indicate the types of data that you do want to collect. That is, they include, rather than exclude (see Figure 10).

Figure 10. J2EE performance analysis filters to apply

image of edit profiling options dialog box, filters tab

  1. On the Sampling page, you can limit the amount of data being collected by specifying either a fixed percentage or a fixed rate of all the data to collect (Figure 11).
    • Sample a percentage of all transactions: The profiler alternates between collecting and discarding calls. For example, with a setting of 25%, the first call is collected, the next three are not.
    • Sample a specific number of transactions each minute: The first n calls (where n is the specified value) are collected, and nothing else is collected during that minute. Thus, you never receive data for more than n calls in any given minute.
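The two sampling modes can be illustrated with a short sketch. It models only the behavior described in the two bullets above and is not the profiler's actual code: the class names are hypothetical, and the percentage sampler assumes a percentage that divides 100 evenly (such as 25%), so that one call in every fixed-size group is collected.

```python
class PercentageSampler:
    """Collect one call, then discard the rest of each group."""

    def __init__(self, percent):
        if not 0 < percent <= 100:
            raise ValueError("percent must be in the range (0, 100]")
        # With 25%, the group size is 4: collect 1, discard 3.
        self.period = round(100 / percent)
        self.count = 0

    def should_collect(self):
        collect = self.count % self.period == 0
        self.count += 1
        return collect


class RateSampler:
    """Collect only the first n calls that arrive in each minute."""

    def __init__(self, per_minute):
        self.per_minute = per_minute
        self.current_minute = None
        self.collected = 0

    def should_collect(self, timestamp_seconds):
        minute = int(timestamp_seconds // 60)
        if minute != self.current_minute:
            # A new minute has started, so reset the quota.
            self.current_minute = minute
            self.collected = 0
        if self.collected < self.per_minute:
            self.collected += 1
            return True
        return False
```

With `PercentageSampler(25)`, the first call is collected and the next three are discarded, and the pattern then repeats. With `RateSampler(2)`, calls at 0, 10, and 20 seconds yield two collected calls and one discarded call, and the quota resets for a call at 61 seconds.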

Figure 11. J2EE Performance Analysis sampling options

image of edit profiling options dialog box, sampling tab

You can set up filters and sampling to specify how much data is collected. A filter specifies which transactions you want to collect data from. Sampling specifies what percentage or number of those transactions you want to collect data from. Filters and sampling work at the root (or edge) transaction level. A root transaction is one that is not a subtransaction of any other (at least, as far as the data collection is concerned).

Thus, when using a browser to access a Web application, a root transaction is a URL request when it first hits the server. The filtering and sampling apply at this level. That is, if you enter a transaction filter, it acts on the URL; whereas, if you sample, it will collect only some of the URLs and discard others. It is all or nothing; it does not sample parts of the URL transactions.

If the Web application resides on multiple servers, some instrumented and some not, only data about instrumented servers will be included. For example, if the user accesses the application through Server A, which is not instrumented, and Server A uses a method call to contact Server B, which is instrumented, then the method call is the root transaction. Filtering will not work, because it uses URLs, not method names, to filter.

Also, if you are using performance tests or schedules to generate profiling data, then the root transaction takes place on the first application response measurement (ARM) instrumented test element to be run. If the entire test or schedule is ARM-instrumented (that is, Enable response time breakdown is selected for the top-level test or schedule element), there will only be one root transaction; therefore, filtering and sampling will be ineffective.

  1. Before you start monitoring, bring the application to the state that it was in immediately before the performance problem trigger. For example, if an action on a particular Web page is slow, navigate to that page.
  2. Click Profile. The connected agent and its host and process are shown in the Profiling Monitor view (Figure 12). Depending on your profiling configuration, more than one agent may be shown if you are operating in a distributed environment.
  3. Start the monitoring agent by selecting the agent and Start Monitoring (Figure 12). If there is more than one agent for this profile configuration, start monitoring each of the agents. Any activity in the application that fits the profiling settings that you specified earlier will now be recorded. The agent state will change from <attached> to <monitoring>, and then to <monitoring...collecting> whenever data is being received from the agent.

Figure 12. Profiling monitor for J2EE performance analysis

image of menu command

  1. In the application, perform the steps required to trigger the performance problem.
  2. Stop monitoring the agent by selecting Stop Monitoring from the pop-up menu. For best results, detach from the agent so that other users can profile the system. Select the agent and Detach in its pop-up menu. Repeat this step for each agent involved in collecting the data.

Finding the causes of performance problems

Application failures can be caused by a number of coding problems, each of which you can investigate by using the data collection and analysis techniques and tools available. Typical application failures include stoppages, when the application unexpectedly terminates, and lockups, when the application becomes unresponsive because, for example, it enters an infinite loop or waits for an event that will never happen.

In the case of a lockup, you might not see any actual errors logged. Use interaction diagrams, statistical views, and thread analysis tools to find the problem. For example, if the problem is an endless loop, the UML sequence diagrams show you repeating sequences of calls, and statistical tables show methods that take a long time. If the problem is a deadlock, thread analysis tools will show that threads that you expect to be working are actually waiting for something that is never going to happen.

After you have collected response time breakdown data, you can analyze the results in the profiling tools to identify exactly what part of the code is causing the problem. Generally, you first narrow it down to which component is causing the problem (which application on which server). Then you can continue narrowing down to determine which package, class, and, finally, which method is causing the problem. When you know which method the problem is in, you can go directly to the source code to fix it.

You can view response time breakdown data in the Test perspective or in the Profiling and Logging perspective if a performance test or performance schedule is executed with response time breakdown. Otherwise, if you manually launched the J2EE application or ARM-instrumented application launch configuration, then the data collected is viewable only in the Profiling and Logging perspective.

Tracking such problems can involve trial-and-error as you collect and analyze data in various ways. You might find your own way that works best.

Rational Performance Tester report analysis

After a performance test or performance schedule has completed execution, the default Rational Performance Tester report will be displayed to the user. There are two mechanisms that can be used to analyze response time breakdown data (Figure 13).


Figure 13. Sample of a Page Performance report

image of Page Performance report

In these two reporting mechanisms, this hierarchy provides a structure for organizing and categorizing a transaction and its subtransactions:

  1. Host: The physical machine where the application is executing the method invocation, in context.
  2. Application: The containing application where the method invocation was observed. Typically, for J2EE applications, this is an application server.
  3. Application component: A categorization label, or metadata, given to the method invocation, in context. For J2EE applications, a method invocation may be labeled as Session EJB, which is also the application component.
  4. Package: The package to which the class and method invocation, in context, belong.
  5. Class: The class to which the method invocation, in context, belongs.
  6. Method: An invocation of a specific method.

Analysis using the Statistics view

Selecting the Display Response Time Breakdown Statistics option from the Page Performance report displays, in tabular format, the response times of a transaction and its subtransactions for a particular page or page element. First, a Page Element Selection Wizard (Figure 14) is presented so that you can pick a particular page element to view its transaction decomposition. All page elements are listed in descending order of response time.
Figure 14. Page Element Selection Wizard

image of Page Element Selection Wizard

The top view of the Response Time Breakdown Statistics view displays an aggregation of all of the subtransactions for the selected page element (Figure 15). There are various tools available in this view that can be helpful when analyzing a performance problem.


Figure 15. Response Time Breakdown Statistics view

image of workspace

The directory (tree) layout also shows this hierarchy (Figure 16):

  1. Host
  2. Application
  3. Component
  4. Package
  5. Class
  6. Method

Each host is a tier in your enterprise environment. Within each host, there are tiers of applications. Within each application, there are tiers of components, and so on.

The tree layout (Figure 16) helps you identify which tier has the slowest response time.
Figure 16. Response Time Breakdown Statistics: Tree layout

image of workspace

Use the first icon in the toolbar in the upper-right corner to toggle between tree and simple layouts. The default layout is the simple layout.

The simple layout is a flattened version of the tree layout. Use this layout if you want to see all of the methods without seeing the relationships shown in the tree layout. The simple layout provides a quick and easy way to see the slowest or fastest methods.
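The difference between the two layouts can be sketched as follows. The record fields, sample values, and function names are assumptions made for illustration, and summing method times up the hierarchy is just one plausible way to rank tiers, not a description of how the view computes its columns.

```python
from collections import defaultdict

# The hierarchy levels, from outermost tier to innermost.
LEVELS = ("host", "application", "component", "package", "class", "method")


def tree_totals(records):
    """Tree layout: total time for every node on every path."""
    totals = defaultdict(float)
    for rec in records:
        path = ()
        for level in LEVELS:
            path += (rec[level],)
            totals[path] += rec["seconds"]
    return dict(totals)


def simple_layout(records):
    """Simple layout: all methods in a flat list, slowest first."""
    return sorted(records, key=lambda r: r["seconds"], reverse=True)


# Sample records (invented for the example).
records = [
    {"host": "hostA", "application": "server1", "component": "Servlet",
     "package": "com.example.web", "class": "OrderServlet",
     "method": "doGet", "seconds": 0.25},
    {"host": "hostA", "application": "server1", "component": "JDBC",
     "package": "com.example.db", "class": "OrderDao",
     "method": "executeQuery", "seconds": 0.5},
    {"host": "hostB", "application": "server2", "component": "Session EJB",
     "package": "com.example.ejb", "class": "CartBean",
     "method": "checkout", "seconds": 0.125},
]
```

Here `tree_totals(records)` lets you compare whole tiers, such as one host against another, while `simple_layout(records)` puts the slowest method at the top regardless of where it sits in the hierarchy.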

Click a column heading to sort the table by that column. Drag the columns to change the order in which they are displayed. Except for the first column in the tree layout, all of the columns are moveable. The exact URL of the selected page element is displayed above the table.

Four results are displayed for each object: Base Time, Average Base Time, Cumulative Time, and Calls. All times are in seconds. The following list defines the times that you can view:

Actions

Figure 17 shows the toolbar buttons (also shown in Figure 16) that are associated with the following actions:
Figure 17. Toolbar buttons

image of toolbar buttons

  1. Click the Filter icon (second from left in the toolbar) to open the Filters window. There, you can add, edit, or remove filters applied to the displayed results.
  2. Click the Columns icon (third) to open the Select Columns page. There, you can select which columns are displayed in the table. These settings are saved with the current workspace and are applied to all response time breakdown tables in the workspace.
  3. Click the Percentage icon (fourth) to toggle the display between percentages and absolute values. In the percentage format, the table shows percentages instead of absolute values. The percentage figure represents the percentage of the total of all values for that column.
  4. Click the Source icon (fifth) to jump to the source code (if available) in your workspace. You must first select a method before clicking the Source button.
  5. Click the Export icon (last) to open the New Report window. There, you can export the response time breakdown table to CSV, HTML, or XML formats.
  6. Use the navigation information in the upper-left corner to navigate back to previous views. The navigation is presented in the form of a breadcrumb trail, which makes it easy to drill down and drill up through the various reports.
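The percentage format toggled by the Percentage icon expresses each value as a share of its column total. A minimal sketch of that arithmetic, with a hypothetical function name:

```python
def as_percentages(values):
    """Convert a column of absolute values to percentages of the total."""
    total = sum(values)
    if total == 0:
        # Avoid dividing by zero when the column is empty or all zeros.
        return [0.0 for _ in values]
    return [100.0 * v / total for v in values]
```

For a column with the values 0.5, 0.25, and 0.25 seconds, the percentage format shows 50%, 25%, and 25%.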

In the past, there has been some confusion about the terminology and values computed for delivery time and response time. If you are familiar with performance testing, note that response time here does not equate to the Rational Performance Tester definition of the term. These definitions apply here, instead:

When viewing the transaction decomposition for a Graphics Interchange Format (GIF) file, you normally see a very small delivery time (often zero) because the entire data stream for the file arrives in a single packet. However, in a primary request of a page, if the server is taking a long time to respond, you typically see that the server responds with the header relatively quickly, as compared to the remainder of the response. In that case, the response may be broken down across several packets.

Note:
You will always see a delivery time and response time. You may also see transactions marked with the words DNS Lookup and Connect for requests that are of this nature.

Interactive Graphical Analysis

The interactive graphical analysis method involves a graphical drill-down process for decomposing a transaction using the existing Rational Performance Tester reports. If you choose to proceed toward response time breakdown analysis by selecting Display Page Element Responses (Figure 13), you are presented with a bar chart that shows all of the page element responses for the selected page (Figure 18).
Figure 18. Graphical drill-down reports for Response Time Breakdown

image of workspace

From this page element drill-down report, you can choose to drill down further into individual host responses, application and application component responses, and package, class, or method invocation responses for a particular element. The right-click context menu on the bar graph displays which drill-down action is available from the current report. For example, if you are viewing the application response time breakdown for a particular transaction, the context menu will display a response time breakdown item for viewing the application components of the selected application (see Figure 19).

At each level in the drill-down process, the breadcrumb path becomes more detailed. This behavior provides an easy way to jump between reports during the analysis process. You can also choose to jump directly to the Response Time Breakdown Statistics view from any drill-down graph.
Figure 19. Component level drill-down report

image of workspace

Profiling for real-time observation analysis

The UML Sequence Diagram view presents a sequence of causally dependent events, where events are defined as method entries and exits, as well as outbound calls and return calls (see Figure 20). Specifically, it presents interactions between class instances. Those interactions have the form of method calls and call returns. The implementation of the Trace Interaction tool extends that definition to one that generalizes actors of interactions, as well as their means. In other words, the views provided by the tool are able to present not only the interactions of classes and class instances, but also interactions among threads, processes, and hosts. This extended use of the execution flow notation is motivated by the need to provide a hierarchy of data representation, which is necessary for large-scale, distributed traces.
Figure 20. UML Sequence Diagram view

image of workspace

A view-linking service is available to help correlate application trace events, as shown in the UML Sequence Diagram view, with the statistical and log views (see Figure 21). Correlation is computed by using the timestamps of the two events. If an exact match is not found during correlation, the next event with the closest timestamp to the event being correlated is selected, within an acceptable range.
Figure 21. View-linking service between UML Sequence Diagram and statistical or log views

image of shortcut menu
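Timestamp-based correlation of the kind used by the view-linking service can be sketched as a nearest-match search. This is an interpretation, not the tool's algorithm: the function name is hypothetical, the candidate timestamps are assumed to be sorted in ascending order, and the "acceptable range" is modeled as a simple tolerance in the same time unit.

```python
from bisect import bisect_left


def correlate(timestamp, candidates, tolerance):
    """Return the index of the best-matching candidate, or None.

    An exact timestamp match wins. Otherwise the closest neighbor
    is chosen, provided that it lies within the tolerance.
    """
    if not candidates:
        return None
    i = bisect_left(candidates, timestamp)
    if i < len(candidates) and candidates[i] == timestamp:
        return i
    # Only the entries on either side of the insertion point can be closest.
    neighbours = [j for j in (i - 1, i) if 0 <= j < len(candidates)]
    best = min(neighbours, key=lambda j: abs(candidates[j] - timestamp))
    if abs(candidates[best] - timestamp) <= tolerance:
        return best
    return None
```

With candidates at 1.0, 2.0, and 4.0 seconds and a tolerance of 0.5, an event at 2.25 seconds links to the candidate at 2.0, while an event at 3.0 seconds links to nothing because no candidate is within range.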

The Execution Statistics view displays statistics about the application execution time (Figure 22). It provides data such as the number of methods called and the amount of time taken to execute every method. Execution statistics are available at the package, class, method, and instance levels.


Figure 22. Execution Statistics view

image of workspace

Tip:
Sort the entire list of methods by the Base Time metric. This action forces the method that consumes the most processing time to appear at the top of the list. From there, select the item and use the right-click menu to explore various analysis options. Select an item from the Execution Statistics view, and display the right-click context menu (Figure 23).
Figure 23. Execution Statistics view right-click context menu

image of right-click context menu

In this menu, there are various actions that you can execute, notably:



The Method Invocation Details view displays:

This concludes the second of three articles about using Rational Performance Tester for application monitoring. See Resources for links to the other parts of the series and other useful information.

Resources

Learn

Get products and technologies

Discuss

When RPT runs on a laptop, the heap size is set on the Java command line; the default is 1900M. With the defaults, the maximum number of users that you can run is approximately 450. To run more users than that, set up Agent Controllers to spread the load.