plugin-cfg.xml: IBM recommendations

MaxConnections

MaxConnections limits the number of PENDING requests per server. When the limit is reached, the plug-in stops sending requests to that appserver. Note that the appserver is not marked down. A WAS application that cannot handle requests in a timely manner is considered overwhelmed, and pending requests start to build up. If the appserver handles requests from the plug-in quickly, the default of -1 (unlimited) can be left in place.
The optimal value for MaxConnections depends on how quickly the application and appserver respond to each request. If normal responses are returned in less than one second, set a low value, such as 20. If it takes several seconds to get a response from the application, use a higher value, such as 100. Once the limit has been reached, the plug-in sends no more requests to that server until responses come back for the current PENDING requests and the pending count drops back below the MaxConnections limit.
Recommended value = 20 - 100 depending on application response times
To choose a value, set MaxConnections="-1" and LogLevel="Stats", then monitor the pending requests in the plug-in log under normal conditions. Choose a value for MaxConnections that is significantly higher than the highest number shown in the log.
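
The measurement setup described above might look like the following plugin-cfg.xml fragment. This is a minimal sketch: the log path, cluster name, server name, host name, and port are placeholders rather than values from this document. Only LogLevel="Stats" and MaxConnections="-1" come from the text.

```xml
<Config>
   <!-- Placeholder log path; Stats logging reports pending request counts -->
   <Log LogLevel="Stats" Name="/path/to/http_plugin.log"/>
   <ServerCluster Name="MyCluster">
      <!-- -1 = unlimited while measuring; placeholder server name and host -->
      <Server Name="node1_server1" MaxConnections="-1">
         <Transport Hostname="appserver1.example.com" Port="9080" Protocol="http"/>
      </Server>
   </ServerCluster>
</Config>
```

Once the peak pending count under normal load is known, replace -1 with the chosen limit (for example, a value in the 20 - 100 range).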

LoadBalanceWeight

The weight is changed dynamically by the plug-in at runtime: it is lowered each time a request is assigned to that clone. The default is 2, which means that the weights reach 0 very quickly and the plug-in is constantly readjusting them. IBM recommends starting with a higher LoadBalanceWeight. The WAS console allows a value up to 20; a higher value can be specified by manually editing plugin-cfg.xml.
The LoadBalanceWeight values of the appservers in a cluster are normalized by their highest common factor. For example, 80, 50, and 30 have a highest common factor of 10, so these configured weights are divided by 10 at runtime, resulting in actual starting weights of only 8, 5, and 3. Setting all clones to the same starting LoadBalanceWeight (for example: 20, 20, 20) results in an actual starting weight of only 1 for each, because of normalization. To avoid this, set the weight of at least one of the clones to be off by 1. For example, with 3 clones, the starting LoadBalanceWeights could be 20, 20, 19. These values share no common factor greater than 1, so the weights are unchanged after normalization.
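
As an illustration of the off-by-one weighting, a three-member cluster could be configured roughly as follows. The cluster and server names are hypothetical, and the Transport child elements are omitted for brevity.

```xml
<ServerCluster Name="MyCluster" LoadBalance="Round Robin">
   <!-- 20, 20, 19 share no common factor greater than 1,
        so the runtime starting weights stay 20, 20, 19 -->
   <Server Name="node1_server1" LoadBalanceWeight="20"/>
   <Server Name="node2_server2" LoadBalanceWeight="20"/>
   <Server Name="node3_server3" LoadBalanceWeight="19"/>
   <!-- By contrast, 20, 20, 20 would be normalized to 1, 1, 1,
        and 80, 50, 30 would be normalized to 8, 5, 3 -->
</ServerCluster>
```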

ConnectTimeout

Number of seconds the plug-in will wait when trying to open a socket to the application server. If streams are already open and available, the plug-in will use one of those. Opening a new stream should not take very long, so the value for ConnectTimeout should be small. A ConnectTimeout value of 0 means never time out: the timeout is left up to the OS TCP layer. IBM recommends specifying a small positive number.
Recommended value = 5
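
In plugin-cfg.xml, ConnectTimeout is set per Server element. A minimal sketch, with a placeholder server name, host name, and port:

```xml
<!-- Give up on opening a new socket after 5 seconds -->
<Server Name="node1_server1" ConnectTimeout="5">
   <Transport Hostname="appserver1.example.com" Port="9080" Protocol="http"/>
</Server>
```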

ServerIOTimeout

Sets a timeout on the HTTP requests sent to the appserver. By default, it is not included in plugin-cfg.xml. If the portal does not answer in the allotted time, the server can be marked down. There are certain classes of failures in which the portal cluster member accepts a socket open request, but the JVM has hung and will not respond to HTTP requests. Without ServerIOTimeout, the plug-in does not mark the cluster member as down even though it is unable to handle requests, so requests continue to be routed to a hung server.
If the application is very quick to respond, use a lower value for ServerIOTimeout (for example, 60 seconds). If the application requires more time to process the request, use a higher number (180 seconds or more). A value of 0 means that the plug-in will NOT time out the request.
A positive value means that the plug-in will NOT mark the appserver down after a ServerIOTimeout expires. A negative value means that the plug-in will mark the appserver down after the time limit is exceeded. To have the plug-in mark the appserver down and fail over to another appserver in the same cluster, use a negative value.
Recommended value = -60
The ability to use a negative ServerIOTimeout value was introduced in plug-in APAR PK72097.
Use trace to determine how long the application takes to respond to requests under normal conditions. Then choose a value that is two to three times (or more) the longest response time.
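
A sketch of the recommended negative setting, assuming a plug-in level that includes APAR PK72097. The server name, host name, and port are placeholders.

```xml
<!-- Negative value: after 60 seconds without a response, time out the
     request AND mark this appserver down so the plug-in can fail over -->
<Server Name="node1_server1" ServerIOTimeout="-60">
   <Transport Hostname="appserver1.example.com" Port="9080" Protocol="http"/>
</Server>
```

With ServerIOTimeout="60" instead, the request would still time out after 60 seconds, but the appserver would not be marked down.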

RetryInterval

The time that the plug-in will wait before trying again to use an appserver that was marked down. Optimal value depends on the number of appservers in the cluster, and the value used for ServerIOTimeout.
Recommended maximum value = (number of appservers in cluster - 1) x (absolute value of ServerIOTimeout) - 1
For example, if there are two appservers in the cluster, and the value of ServerIOTimeout is -60, set RetryInterval to...
(2 - 1) x (60) - 1 = 59 seconds or less
If there are four appservers in the cluster, and the value of ServerIOTimeout is -60, set RetryInterval to...
(4 - 1) x (60) - 1 = 179 seconds or less

Setting RetryInterval to a value higher than the recommended maximum can lead to a situation where all of the appservers in the cluster may be marked down simultaneously resulting in all requests temporarily failing.
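
The four-appserver case from the formula could be written as follows. The cluster and server names are hypothetical, and Transport child elements are omitted for brevity.

```xml
<!-- (4 - 1) x 60 - 1 = 179, the recommended maximum for this cluster -->
<ServerCluster Name="MyCluster" RetryInterval="179">
   <Server Name="node1_server1" ServerIOTimeout="-60"/>
   <Server Name="node2_server2" ServerIOTimeout="-60"/>
   <Server Name="node3_server3" ServerIOTimeout="-60"/>
   <Server Name="node4_server4" ServerIOTimeout="-60"/>
</ServerCluster>
```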

Affinity requests

Requests containing a session cookie (JSESSIONID). All requests with the same JSESSIONID are sent to the same application server, regardless of LoadBalanceWeight.
By default, affinity requests do not lower the weight, which can cause an uneven distribution across the servers. Set IgnoreAffinityRequests="false" to have the weight lowered for each affinity request in a Round Robin environment. If using Random instead of Round Robin, the affinity requests are still handled correctly, but new requests are routed randomly, and the LoadBalanceWeight is not used.
Recommendation = IgnoreAffinityRequests="false"
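
In plugin-cfg.xml, this recommendation is applied on the ServerCluster element. A minimal sketch with hypothetical cluster and server names (Transport child elements omitted):

```xml
<!-- With Round Robin, "false" means affinity requests also lower the weight -->
<ServerCluster Name="MyCluster" LoadBalance="Round Robin" IgnoreAffinityRequests="false">
   <Server Name="node1_server1" LoadBalanceWeight="20"/>
   <Server Name="node2_server2" LoadBalanceWeight="19"/>
</ServerCluster>
```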

Fail-over

Occurs when the plug-in marks a cluster member appserver (or clone) as "down", and then sends the pending requests to other members of the same cluster.
Happens when the plug-in...

Is unable to open a new connection to the appserver within the ConnectTimeout
Has already sent the request to the appserver, but does not receive a response from the application within ServerIOTimeout

When the plug-in marks a cluster member appserver "down", how it handles the PENDING requests depends on the plug-in level. Prior to plug-in APAR PM12112, the plug-in would send all of the pending requests to the very next appserver in the cluster. After APAR PM12112, the plug-in sends the pending requests randomly to any of the available appservers in the cluster.
While the appserver is marked "down", the plug-in no longer sends any requests to it. After RetryInterval, the plug-in checks whether that appserver can be used successfully again. If so, the "down" flag is removed and the appserver is used again.
By default, the number of attempts to handle a request is limited by the number of appservers in the cluster. For example, if there are only two appservers in the cluster, and the request fails once, the plug-in will only attempt that request one more time (total of two attempts). If there are five appservers in the same cluster, and the request fails once, the plug-in will attempt to retry that same request up to four more times (total of five attempts). That number includes both retries sent to the same appserver (session affinity) and attempts sent to different appservers (fail-over).
Plug-in APAR PM70559 introduced a new setting called "ServerIOTimeoutRetry" that can be used to control the number of retries due to ServerIOTimeout.
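
A hedged sketch of how ServerIOTimeoutRetry might be used. The text above does not say where the setting goes or what values it takes. The sketch assumes it is a ServerCluster attribute that caps the number of retries after a ServerIOTimeout, and the value 1 is purely illustrative. Check the plug-in documentation for your level before relying on it. Names are hypothetical and Transport child elements are omitted.

```xml
<!-- Assumption: allow at most 1 retry of a request that failed due to
     ServerIOTimeout, instead of one attempt per cluster member -->
<ServerCluster Name="MyCluster" ServerIOTimeoutRetry="1">
   <Server Name="node1_server1" ServerIOTimeout="-60"/>
   <Server Name="node2_server2" ServerIOTimeout="-60"/>
   <Server Name="node3_server3" ServerIOTimeout="-60"/>
</ServerCluster>
```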

GetDWLMTable

If Memory-to-Memory (M2M) session replication is enabled in WAS, set GetDWLMTable to true in the plug-in config. M2M uses partition IDs rather than clone IDs, which can lead to broken session affinity if GetDWLMTable is left at its default of false.
Recommendation = GetDWLMTable="true" whenever M2M is used in WAS.
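
In plugin-cfg.xml, GetDWLMTable is set on the ServerCluster element. A minimal sketch with hypothetical cluster and server names (Transport child elements omitted):

```xml
<!-- Needed when M2M session replication is enabled in WAS -->
<ServerCluster Name="MyCluster" GetDWLMTable="true">
   <Server Name="node1_server1"/>
   <Server Name="node2_server2"/>
</ServerCluster>
```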

StartServers

Each web server child process loads a separate instance of the web server plug-in. Multiple running instances of the web server plug-in do not share information with each other. If the IHS web server is configured to start 3 child processes (StartServers 3), there will be 3 instances of the web server plug-in running, one for each IHS child process. The dynamically changing LoadBalanceWeight of each cluster member is not shared between the plug-in instances. So, in one instance of the plug-in member1 might be considered UP with a weight of 5, when in another instance of the plug-in member1 might be considered DOWN and unusable. This can result in possibly different behaviors depending on which child process / plug-in instance handles each incoming request. For this reason IBM recommends using only a single web server child process with many threads.
If you choose to use more than one web server child process, keep in mind that the plug-in settings are handled on a per instance basis. For example, MaxConnections means the number of pending requests that will be allowed on that server, for each plug-in instance. If MaxConnections = 20, and there are 3 web server child processes (3 plug-in instances), then each instance will allow 20 pending connections to that application server for a total of 60 pending connections.
