Configure the autonomic request flow manager
We can fine-tune the autonomic request flow manager (ARFM) by changing the default settings in the administrative console. We can enable node-based ARFM by setting a custom property.
To change the settings on the autonomic request flow manager, you must have operator, configurator, or administrator administrative privileges. Operators can only view the information on the configuration tab, but can change the settings on the runtime tab. The configurator can change settings on the configuration tab, but cannot change settings on the runtime tab. Administrators have all privileges.
When security is enabled, some fields are not editable without proper security authorization.
The autonomic request flow manager contains the following components:
- A controller per target cell, that is, a cell to which an ARFM gateway directly sends work. This is an HAManagedItem process that runs in any node agent or deployment manager.
- A gateway per used combination of protocol family, proxy process, and deployment target. A gateway runs in its proxy process. For HTTP and Session Initiation Protocol (SIP), the proxy processes are the on demand routers; for Java Message Service (JMS) and Internet Inter-ORB Protocol (IIOP), the proxy processes are the WAS application servers.
- A work factor estimator per target cell, which is an HAManagedItem process that can run in any node agent, ODR, or deployment manager.
The gateways intercept and queue the incoming HTTP, SIP, JMS, and IIOP requests, while the controller provides control signals, or directions, to the gateways and the placement controller. The work profiler continually estimates the computational requirements of the various kinds of requests, based on observations of the system in operation. Working together, these components properly prioritize incoming requests.
(zos) Dynamic placement function with job scheduler is not supported on z/OS servers.
- Modify the appropriate ARFM settings. In the administrative console, click Operational policies > Autonomic managers > Autonomic request flow manager.
- Click OK or Apply when we have completed the changes.
- Click Save to save the changes to the master repository.
- Test the settings we have just defined and iterate as often as necessary to get the request flow performance we want.
Example
The following sections provide specific guidance for configuring each setting.

Aggregation period
Each ARFM gateway broadcasts aggregated statistics periodically, and this parameter specifies the period. The statistics reported by the gateways support the runtime charting in the administrative console, the operation of ARFM controllers, the operation of the application placement controller, and the operation of work profilers. When setting the aggregation period, ensure that the value is high enough to allow for the collection of a sufficient number of performance samples. The gateways collect samples for each request, and a few hundred samples are necessary to produce a good statistical measure.
For example, suppose that requests associated with a service class run in 250 milliseconds, and on average 10 requests run concurrently. The concurrency value is calculated automatically, based on the cluster size and the resources in the environment, and can be seen on the visualization panels under the Runtime Operations category in the console. In this case, the service class handles about 40 requests per second, so setting the aggregation period to 15 seconds results in the collection of 600 samples per aggregation period. The metrics provided by a 600 sample survey are useful and reliable.
Setting an aggregation period value that is too low results in unreliable performance metrics: metrics derived from fewer samples are noisier and less reliable than metrics derived from a larger sample size. Because the ARFM controller is activated when new statistics are produced, setting an aggregation period value that is too long results in less frequent recomputation of the control settings. Therefore, Intelligent Management becomes less responsive to sudden changes in traffic intensities and patterns.
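The sample-count arithmetic from the example above can be sketched as a quick check. This is an illustrative calculation using the numbers from the example, not a product API:

```python
# Estimate how many samples a gateway collects per aggregation period.
# Values are from the example: 250 ms service time, 10 concurrent requests.
service_time_s = 0.250        # average request service time
concurrency = 10              # average concurrent requests (shown on the visualization panels)
aggregation_period_s = 15     # configured aggregation period

# Little's law: throughput = concurrency / service time
throughput_rps = concurrency / service_time_s               # 40 requests per second
samples_per_period = throughput_rps * aggregation_period_s  # 600 samples

print(int(throughput_rps), int(samples_per_period))         # 40 600
```

A period that yields only a few dozen samples would produce noisy metrics; several hundred samples per period is the target.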
Control cycle length minimum
This parameter defines how often the ARFM controller is activated. Controller activation is the process of evaluating inputs and producing new control settings as a result of the input received. An ARFM controller activates when new statistics are received from one of its gateways and either the elapsed time since the previous activation is greater than or equal to the control cycle minimum length or the controller has never activated before. This setting gives the control cycle length a lower bound. For example, if we have just one ODR and set the aggregation period to 30 seconds and the control cycle minimum length to 60 seconds, one activation might occur at 12:00:00.0 and the next 90.1 seconds later at 12:01:30.1, because the previous statistics arrived at 12:00:59.9. To ensure a reliable control cycle of around 60 seconds, set the control cycle minimum length to 58 or 59 seconds.

Smoothing window
This setting defines how sensitively the ARFM controller reacts to the incoming gateway statistics by combining several gateway statistics reports. For any gateway, its ARFM controller uses a running average of the last few statistics reports from that gateway, and the smoothing window controls the number of reports that are combined. A low smoothing window value makes the controller more sensitive and react more quickly. However, a low value also makes the controller sensitive to noise, or anomalies, in the data.
The product of the smoothing window and the aggregation period should be roughly the same as the actual control cycle length, which is sometimes slightly greater than the configured control cycle minimum length.
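The activation rule and the smoothing behavior described above can be modeled in a few lines. This is a simplified sketch with hypothetical names; the real controller logic is internal to ARFM:

```python
from collections import deque

def should_activate(now, last_activation, min_cycle_s):
    """Activate when new statistics arrive AND either the controller has
    never run before or the minimum control cycle length has elapsed."""
    if last_activation is None:
        return True
    return now - last_activation >= min_cycle_s

class SmoothedStats:
    """Running average over the last `window` gateway reports."""
    def __init__(self, window):
        self.reports = deque(maxlen=window)

    def add(self, value):
        self.reports.append(value)
        return sum(self.reports) / len(self.reports)

# With a 60 s minimum cycle, a report arriving at t=59.9 does not
# trigger activation; the next report at t=89.9 does.
print(should_activate(59.9, 0.0, 60))   # False
print(should_activate(89.9, 0.0, 60))   # True
```

The `deque(maxlen=window)` mirrors the smoothing window: each new report displaces the oldest one, so a smaller window tracks the latest data more closely but is more affected by outliers.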
Maximum queue length
This parameter bounds the length of each ARFM queue, that is, the maximum number of requests that can be held in the queue. ARFM divides all incoming traffic into flows and maintains a separate queue for each flow. A flow consists of the requests that share a particular service class, deployment target, and ODR.
When a request arrives and its queue is full, the request is rejected.
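The per-flow bounded queue behaves like the following sketch. The class and method names are hypothetical; the maximum length corresponds to the Maximum queue length setting:

```python
from collections import deque

class FlowQueue:
    """A bounded per-flow request queue: requests that arrive when the
    queue is at its maximum length are rejected rather than enqueued."""
    def __init__(self, max_length=1000):   # 1000 matches the ARFM default
        self.max_length = max_length
        self.queue = deque()

    def offer(self, request):
        if len(self.queue) >= self.max_length:
            return False                   # queue full: reject the request
        self.queue.append(request)
        return True

q = FlowQueue(max_length=2)
print(q.offer("r1"), q.offer("r2"), q.offer("r3"))   # True True False
```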
A lower value increases the possibility that a request is rejected due to short-term traffic bursts, while a higher value can allow requests to linger longer in the queues. Queued requests consume memory. The default setting is 1000, but we can experiment with this setting to find the best match for the environment.

Maximum CPU usage
The ARFM provides overload protection in addition to its prioritization capabilities. An ARFM queues requests in its gateways to avoid overloading the application servers.
For this release, load is determined in terms of processor utilization on the first tier of application servers. The maximum CPU utilization parameter tells ARFM how heavily to load the servers. During severe peak conditions this utilization limit might be briefly exceeded.
Higher values give better resource utilization; lower values give more robust operation. Real load is noisy and variable. The performance management techniques in Intelligent Management react to changes in the load, but with some time delay. During that reaction time, the system might operate outside its configured region; this includes having higher processor utilization than configured. Operation with one application server at 100 percent processor utilization for multiple minutes has been observed to break some internal communication mechanisms, to the detriment of many features.
The performance management in this release of Intelligent Management does not work well if the first tier of application server machines is loaded with work other than WebSphere requests that arrive over HTTP through the ODRs.
This setting affects application placement. If the total predicted demand exceeds the maximum CPU utilization limit, the placement controller uniformly reduces the demand of all the dynamic clusters before calculating the best placement.
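The uniform demand reduction can be sketched as follows. This is a simplified illustration with hypothetical names and inputs; the actual placement algorithm is more involved:

```python
def cap_demand(cluster_demands, max_cpu_utilization, capacity):
    """Uniformly scale down per-cluster predicted CPU demand so that the
    total does not exceed the configured utilization limit."""
    limit = max_cpu_utilization * capacity
    total = sum(cluster_demands.values())
    if total <= limit:
        return dict(cluster_demands)       # within the limit: no reduction
    scale = limit / total                  # same factor for every cluster
    return {name: d * scale for name, d in cluster_demands.items()}

# Predicted demand of 150 CPU units against a 90% limit on 100 units of
# capacity: both clusters are scaled by the same factor (90/150 = 0.6).
demands = {"clusterA": 60.0, "clusterB": 90.0}
print(cap_demand(demands, 0.9, 100.0))
```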
Set the arfmManageCpu custom property to false to disable processor overload protection and request prioritization. The arfmManageCpu property is a cell custom property that you must create.
We can determine CPU utilization by doing the following:
- For VMware, configure Intelligent Management to communicate with VMware vCenter to obtain accurate statistics. When configured, Intelligent Management obtains CPU utilization directly from VMware vCenter.
- For Solaris Zones, a WebSphere node agent must run on the global zone for WebSphere Virtual Enterprise to accurately report CPU utilization on non-global zones. In this case, CPU utilization is obtained from the global zone by running the following command, where the CPU utilization is the value of the current_clock_Hz field:
/usr/bin/kstat -m cpu_info
- Linux uses steal time. If the custom cell property enableStealTimeCalculation is set to true and a maxStealTime value is set (the default is 3), CPU utilization is calculated with the following formula:
100 - idle cpu + idle cpu x (steal/max steal)
Otherwise, the formula is:
100 - idle cpu
- AIX does not use steal time and uses the following formula:
CPU usage time/time
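The platform-specific formulas above can be transcribed directly into code. The function names and sample inputs are illustrative; in practice the idle and steal percentages come from the operating system, and the percentage scaling on the AIX ratio is an assumption:

```python
def linux_cpu_utilization(idle_pct, steal_pct, max_steal=3.0,
                          steal_time_enabled=True):
    """Linux: when enableStealTimeCalculation is true, add a steal-time
    adjustment scaled by maxStealTime (default 3); otherwise 100 - idle."""
    if steal_time_enabled:
        return 100 - idle_pct + idle_pct * (steal_pct / max_steal)
    return 100 - idle_pct

def aix_cpu_utilization(cpu_usage_time, elapsed_time):
    """AIX: CPU usage time divided by elapsed time, here expressed as a
    percentage (the percentage scaling is an assumption)."""
    return 100.0 * cpu_usage_time / elapsed_time

# 70% idle CPU with 1.5% steal time against a maxStealTime of 3:
print(linux_cpu_utilization(70.0, 1.5))                             # 65.0
print(linux_cpu_utilization(70.0, 1.5, steal_time_enabled=False))   # 30.0
```

Note how steal time can raise the reported utilization well above the raw 100 - idle value, which is the point of the adjustment on virtualized Linux hosts.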
Admission control for CPU overload protection
The purpose of admission control for processor overload protection is to deliberately reject some dialogs, based on judgments about how much work can be accepted without overloading the compute power of the managed nodes and compromising the response time of the accepted messages.
The Admission control for CPU overload protection value applies only to HTTP and Session Initiation Protocol (SIP); it does not apply to IIOP and JMS.
Enable this setting when queuing for processor overload protection is not enough, that is, when it is important to deliberately refuse some of the offered load.
Admission control is disabled by default. To configure it:
- Define service policies with achievable performance goals, and set the goal type of the policies either to response time or percentile, not discretionary.
- In the ARFM panel, set the CPU utilization limit to no higher than 90%. Select the third button for Rejection policy. The rejection policy determines whether the admission control for processor overload protection is enabled and, if so, how the response time threshold used for admission control is related to the response time threshold that appears in the performance goal.
- At the cell level, set a cell custom property named arfmInitialMsgDlgRatio. The value is a decimal-formatted float that is the initial estimate for the ratio of each of the dialog-continuing message flows to the dialog-initiating message flow within the same (protocol family, deployment target). That is, it is the number of incoming follow-up messages per dialog. Set arfmInitialMsgDlgRatio to a value that is comparable among the collection of all dialog-continuing message flows.
This custom property is also relevant when dialog orientation for processor overload protection and differentiated service is enabled.
- Save the changes.
The admission control for processor overload protection is working if, in a heavily loaded system, the processor utilization is about the same as the setting for processor overload protection.
Maximum percentage of the heap size to be used for each application server
The maximum percentage of the WAS heap size to use for memory overload protection. Set the value to less than 100. For details, read about memory overload protection.
Request rejection policy
The behavior for HTTP, SIP, and SOAP requests that are associated with a performance goal when an overload condition is detected.
Choose among the options to determine when to reject messages to prevent the CPU from being overloaded. We can reject no messages, or specify a rejection threshold value that determines when to reject messages. The default is to reject no messages.
Discretionary work is assumed to have a response time threshold of 60 seconds.
Enable node-based ARFM

To enable node-based ARFM, set the custom property arfmQueueMode to node. To use a CPU-based predictor for the application placement controller (APC) when using dynamic clusters in automatic mode, set the custom property APC.predictor to CPU.
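A wsadmin (Jython) sketch of creating these cell custom properties is shown below. This is an assumption-laden illustration, not documented product scripting: it uses the standard AdminConfig API, must run inside a wsadmin session, and does not check whether the properties already exist.

```python
# Hypothetical wsadmin Jython sketch: create the arfmQueueMode and
# APC.predictor cell custom properties. Run inside a wsadmin session.
cell = AdminConfig.list('Cell').splitlines()[0]   # first (usually only) cell

def set_cell_custom_property(name, value):
    # Create a new custom property on the cell (sketch only; an existing
    # property with the same name would need AdminConfig.modify instead).
    AdminConfig.create('Property', cell,
                       [['name', name], ['value', value]])

set_cell_custom_property('arfmQueueMode', 'node')   # enable node-based ARFM
set_cell_custom_property('APC.predictor', 'CPU')    # CPU-based APC predictor
AdminConfig.save()                                  # commit to the master repository
```

After saving, synchronize the nodes and restart the affected processes for the properties to take effect.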
What to do next
Use mustGather documents to troubleshoot autonomic request flow manager and application placement issues.
Subtopics
- (dist)(zos) arfmController.py script
We can use the arfmController.py script to force the autonomic request flow manager (ARFM) to forget all of its historical data.
- (dist)(zos) Intelligent Management: autonomic request flow manager custom properties
We can use the following custom properties to change the behavior of the autonomic request flow manager (ARFM). Some custom properties are set on deployment targets.
- (dist)(zos) Intelligent Management: autonomic request flow manager advanced custom properties
We can use these properties to configure the autonomic request flow manager (ARFM).
- (dist)(zos) TCModuleStatsCache
This log file contains information about the transaction class module cache.
- Rate-based autonomic request flow manager (ARFM)
The autonomic request flow manager (ARFM) uses a rate-based algorithm that loads and protects application server resources more consistently.
- (dist)(zos) Configure emergency throttle
The on demand router (ODR) and associated autonomic managers support business goals in times of intense request flows by making smart decisions about the work coming into the server. The autonomic request flow manager (ARFM) controls HTTP request prioritization in the ODR. Emergency conditions result when certain sensors detect overload situations, which include extremely high node utilization, intermittent communication failures between the ARFM controller and the request scheduling gateways, and intermittent communication failures between AsyncPMI monitoring data producers and the gateways. To prevent these conditions from being prolonged, with the accompanying degradation in performance, the gateways are equipped with emergency throttle controllers that control and safeguard request dispatch rates to the backend nodes. For IIOP and JMS requests, ARFM is handled in the back end.
- (dist)(zos) Memory overload protection
Memory overload protection limits the rate at which the on demand router (ODR) forwards traffic in order to prevent an out of memory exception from occurring in an application server. If traffic without server affinity arrives at the ODR and the rate for all potential servers has been exceeded, the traffic is rejected. Memory overload protection does not reject traffic that has server affinity. For example, HTTP requests with session affinity or SIP in-dialog messages.
- (dist)(zos) Configure memory overload protection
Follow these instructions to configure memory overload protection from the administrative console.
- (dist)(zos) Intelligent Management: trace settings for autonomic request flow manager and application placement
To troubleshoot the autonomic request flow manager and application placement, we can enable diagnostic trace.
- (dist)(zos) Intelligent Management: request prioritization problems
Occasionally, you might encounter flow prioritization behavior that is unexpected. We can look for some common things when request flow prioritization is not working as expected.
Related concepts
Memory overload protection
Related tasks
Manage the Intelligent Management environment
Configure work factors in multiple tier configurations
Define a service policy
Intelligent Management: administrative roles and privileges
arfmController.py script
Related information:
Intelligent Management: autonomic request flow manager custom properties
Intelligent Management: autonomic request flow manager advanced custom properties
MustGather documents