Resource workload routing
(ZOS) On a z/OS system, there are two ways that resource routing can be accomplished:
- Configure an alternate resource.
- Configure an action notification.
(Dist) On distributed systems, we can enable resource routing only by configuring an alternate resource.
Configure an alternate resource
A data source and connection factory can fail over and fail back automatically when a specified or default failure threshold value is reached. When fail over occurs, the application switches from using the primary resource to using the alternate resource. Fail back occurs when the application switches back from the alternate resource to the primary resource.
The alternate resource is created the same way as any other connection factory or data source, and its configuration should mirror the primary resource configuration. For example, the security configurations of the alternate and primary resources, with respect to the application and resources, should mirror each other so that the application and database can access the required data. After the alternate resource is created, we can change the database values that are necessary for the alternate resource backend configuration.
If the alternate resource is not compatible with the primary resource, fail over is likely to fail. The following errors might occur: tables or fields that do not exist, expected records that do not exist, and unexpected resource errors. As a test option, before the alternate resource is configured for the primary data source, we can test the application by changing the JNDI name in the application from the primary JNDI name to the alternate JNDI name.
When the primary resource is used, the alternate resource is paused. Until the primary resource is first used, the alternate resource is not yet paused and remains available. Using the alternate resource before it is paused is not recommended unless there is a special reason to access it before the primary resource, such as testing the application for compatibility.
An alternate resource cannot be used as a primary. When using the fail over feature with non-relational resource adapters that have back-ends that also support fail over, verify that fail over is not configured for these back-ends. Fail over works with non-relational resource adapters that have a ManagedConnection object that implements a testConnection method. The testConnection method is used to ping the alternate and primary resources for success before re-establishing a connection to the currently available resource. If the resource adapter does not implement a testConnection method or testConnection throws a javax.resource.NotSupportedException error, the fail over feature is disabled.
For resource adapters that do not meet the requirement for testConnection, partial fail over can be used. With partial fail over, we must manually fail back to the primary resource, using MBeans, when the primary resource becomes available. Partial fail over is enabled by setting the enablePartialResourceAdapterFailoverSupport property to true.
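The rules above for full, partial, and disabled fail over can be summarized in a short decision sketch. This is a conceptual model only, not product code: the names FailoverMode and failover_mode are hypothetical, and the choice to treat enablePartialResourceAdapterFailoverSupport as taking precedence when the adapter also supports testConnection is an assumption based on the property description.

```python
from enum import Enum


class FailoverMode(Enum):
    """Hypothetical labels for the three behaviors described in the text."""
    FULL = "automatic fail over and automatic fail back"
    PARTIAL = "automatic fail over, manual fail back"
    DISABLED = "fail over disabled"


def failover_mode(test_connection_usable, enable_partial=False):
    """Return the fail over behavior for a resource adapter.

    test_connection_usable: True when the ManagedConnection implements
        testConnection and the call does not throw NotSupportedException.
    enable_partial: value of enablePartialResourceAdapterFailoverSupport.
    """
    # Assumption: when partial support is switched on, fail back is manual
    # even if the adapter could have supported automatic fail back.
    if enable_partial:
        return FailoverMode.PARTIAL
    if test_connection_usable:
        return FailoverMode.FULL
    return FailoverMode.DISABLED


if __name__ == "__main__":
    for usable, partial in [(True, False), (False, True), (False, False)]:
        print(usable, partial, failover_mode(usable, partial).value)
```

A table like this can be useful when reviewing a resource adapter before enabling the feature: only the last case leaves the application without any fail over.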
Test the suitability of this feature with the system environment and resources before enabling fail over support.
(ZOS) For more information about using optimized local adapters, see the topic on enabling optimized local adapter high availability support.
MBean operations
- failOverToAlternateResource
Values: none, hold or automated; default is hold without automated fail back.
Description: Manual fail over to alternate resource. This action is issued on the primary resource.
- failBackToPrimaryResource
Values: none, hold or automated; default is automated with automated fail over.
Description: Manual fail back to primary resource. This action is issued on the primary resource.
MBean attribute properties
- currentActivePool
Values: A string value returned containing a JNDI name.
Description: A primary or alternate JNDI name is returned depending on which one is currently being used.
- populateAlternateResource
Values: boolean
Description: False - Disables populate alternate resource. True - Enables populate alternate resource. This action is issued on the alternate resource.
- resourceFailOver
Values: boolean
Description: False - Disables resource fail over. True - Enables resource fail over. This action is issued on the primary resource.
- resourceFailBack
Values: boolean
Description: False - Disables resource fail back. True - Enables resource fail back. This action is issued on the primary resource.
(ZOS) Manual resource routing with modify command for resources configured with an alternate resource
On the z/OS platform, some of the MBean functionality is available through the modify command. The modify command can manually disable or enable resource failover support, fail over to a configured alternate resource, and fail back to the configured primary resource. See the Modify command topic for details on how to issue the command and for the fail over parameters.
(ZOS) Configure an action notification
When action notification is configured for a particular resource and requests to that resource start to fail beyond a specified or default threshold value, the WebSphere Application Server for z/OS run time receives notification to perform that particular action. The action is a configurable value. The action codes are defined in the failureNotificationActionCode property content located in the "Custom properties" section later in this topic.
The various failure notification actions are designed to assist with high availability environments so that when a resource failure occurs, work can be routed to other servers in a cluster. The following action codes can be used:
- Action code 1
This action code issues messages BBOJ0130I and BBOJ0131I to the hard copy logging stream in the controller. The BBOJ0130I message is issued when the resource is unavailable, and BBOJ0131I is issued when the resource is restarted and available. WAS does not take any further automated action.
Action code 1 is designed to provide a notification to WebSphere administrators so that manual or automated mitigation actions can be configured outside of the application server. BBOJ0130I contains the following information:
- The JNDI name that identifies the resource that has failed.
- The name of the server where the resource that has failed was used.
- The action that was taken. For example: NONE, PAUSING LISTENERS
BBOJ0131I contains the following information:
- The JNDI name that identifies the resource that is restarted.
- The name of the server on which the resource is restarted.
- The action that was taken. For example: NONE, RESUMING LISTENERS
- The reason the action was taken: 1: Normal servant region availability notification. 2: Unknown resource availability.
- Action code 2
This action code pauses and resumes the listeners on the server where the resource resides for which this action was configured. Server listeners are paused when the resource is deemed unavailable. Server listeners are resumed when the resource is deemed available. When combined with a front-end router that supports high availability, such as a proxy server or an on-demand router, work for this server is routed to other servers in the cluster. As part of this action, two informational messages are issued to hardcopy in the controller region. The BBOJ0130I message is issued when the resource is unavailable, and BBOJ0131I is issued when the resource is restarted and available.
- Action code 3
This action code stops and starts all applications with locally installed modules that use this resource for which this action was configured. Applications are stopped when the resource that these applications use is deemed unavailable. Applications are started when the resource that these applications use is deemed available.
As part of this action, two informational messages are issued to hardcopy in the controller region. The BBOJ0130I message is issued when the resource is unavailable, and BBOJ0131I is issued when the resource is restarted and available.
Only applications for which a resource reference is defined are stopped, and only on the server that experienced the resource failure. Therefore, if the application is installed in a cluster, the application remains started on the other servers in the cluster.
Messages BBOJ0130I and BBOJ0131I contain the following information:
BBOJ0130I:
- The JNDI name that identifies the resource that has failed.
- The name of the server where the resource that has failed was used.
- The action that was taken; for example: NONE, "PAUSING LISTENERS", "STOPPING APPLICATIONS THAT USE THIS RESOURCE"
BBOJ0131I:
- The JNDI name that identifies the resource that is restarted.
- The name of the server on which the resource is restarted.
- The action that was taken; for example: NONE, "RESUMING LISTENERS", "STARTING APPLICATIONS THAT USE THIS RESOURCE"
- The reason the action was taken: 1: Normal servant region availability notification. 2: Unknown resource availability.
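The relationship between action codes, message IDs, and action text can be sketched as a lookup table. This is an illustrative model, not the product's implementation: the function name notification is hypothetical, and the pairing of each action string with a specific action code is an inference from the order of the examples listed above, not something the messages section states outright.

```python
# Action text per action code: (text when the resource fails,
# text when the resource is available again).
# Assumption: the pairing follows the order of the examples in the text.
ACTIONS = {
    1: ("NONE", "NONE"),
    2: ("PAUSING LISTENERS", "RESUMING LISTENERS"),
    3: ("STOPPING APPLICATIONS THAT USE THIS RESOURCE",
        "STARTING APPLICATIONS THAT USE THIS RESOURCE"),
}


def notification(action_code, resource_available):
    """Return (message ID, action text) for a resource state change."""
    on_failure, on_recovery = ACTIONS[action_code]
    if resource_available:
        # BBOJ0131I is issued when the resource is restarted and available.
        return ("BBOJ0131I", on_recovery)
    # BBOJ0130I is issued when the resource is unavailable.
    return ("BBOJ0130I", on_failure)


if __name__ == "__main__":
    print(notification(2, resource_available=False))
    print(notification(2, resource_available=True))
```

Automation that watches the hardcopy log could use a table of this kind to decide whether the server has already taken an action or whether external mitigation is still needed.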
Custom properties
All properties for this feature must be created as new custom properties on the connection pool for a particular data source or connection factory. In the administrative console, navigate to the data source or connection factory that notification is to be enabled for. Click the Connection pool properties link. On the Pool Properties panel, click Connection pool custom properties link. The Custom properties panel for the resource connection pool displays. Click New to create the custom properties described as follows:
- (ZOS) failureNotificationActionCode
Values: {1,2,3}
Description: The failureNotificationActionCode property is used to enable the notification feature. If this property is not set to one of the following valid integer values, the notification feature is disabled:
- 1 = A BBOJ0130I message is output to hardcopy indicating that this resource is unavailable. When the resource becomes available, a BBOJ0131I message is output to hardcopy indicating that the resource is again available.
- 2 = A pause listeners command is issued to the server, preventing the server from receiving new incoming work. Message BBOJ0130I is also issued to describe the action taken. When the resource is available, a resume listeners command is issued to allow the server to again receive incoming work. Message BBOJ0131I is issued to describe the action taken.
- 3 = All applications with locally installed modules that use this resource are stopped on this server. Message BBOJ0130I is also issued to describe the action taken. When the resource becomes available, these applications are started. Message BBOJ0131I is also issued to describe the action taken.
- failureThreshold
Values: Must be an integer and > 0.
Description: The failureThreshold property is only read if the failureNotificationActionCode or alternateResourceJNDIName property is set to a valid value. If the failureThreshold property is not set or is set to an invalid number, the default value of 5 is used. The integer value specified for the failureThreshold is the number of consecutive resource exceptions that must occur for a particular resource before notification is sent or failover occurs.
The following is an example of how this value works: If the failureThreshold property is set to 5 for data source B, then data source B must get five consecutive resource exceptions while attempting to establish connections, with no successful attempts in-between these failures, before notification is sent or the resource can fail over. An attempt to send the notification or fail over is made after five consecutive resource exceptions occur. However, in a multi-threaded environment, after the failure threshold value is reached, the timing of the notification or fail over might occur after additional resource exceptions.
Resource exception counters are not shared between resources. Resource exceptions must be consecutive to reach the failure threshold.
- alternateResourceJNDIName
Values: String value containing a direct JNDI name.
Description: The alternate connection factory or data source resource should mirror the primary resource. Provide the JNDI name of the alternate resource to enable the fail over feature.
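The consecutive-failure rule described for failureThreshold can be illustrated with a small counter. This is a minimal sketch of the counting logic only, assuming one counter per resource; the class name FailureCounter and its methods are hypothetical and do not correspond to any product API.

```python
class FailureCounter:
    """Counts consecutive resource exceptions for one resource."""

    DEFAULT_THRESHOLD = 5  # documented default for failureThreshold

    def __init__(self, threshold=None):
        # An unset or invalid value (not an integer > 0) falls back to 5.
        valid = (isinstance(threshold, int)
                 and not isinstance(threshold, bool)
                 and threshold > 0)
        self.threshold = threshold if valid else self.DEFAULT_THRESHOLD
        self.consecutive_failures = 0

    def record(self, succeeded):
        """Record one connection attempt.

        Returns True when the threshold has been reached, that is, when
        notification or fail over would be attempted.
        """
        if succeeded:
            # A successful attempt breaks the run of consecutive failures.
            self.consecutive_failures = 0
            return False
        self.consecutive_failures += 1
        return self.consecutive_failures >= self.threshold


if __name__ == "__main__":
    counter = FailureCounter(5)
    # Four failures, one success, then five failures.
    attempts = [False] * 4 + [True] + [False] * 5
    reached = [n for n, ok in enumerate(attempts, start=1) if counter.record(ok)]
    print(reached)  # prints [10]
```

In the example, the success on the fifth attempt resets the count, so the threshold is reached only on the tenth attempt. Because each resource has its own counter, failures on one data source never contribute to the threshold of another.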
Advanced fail over properties
- resourceAvailabilityTestRetryInterval
Values: int value, default is 10.
Description: The test connection interval by default is 10 seconds. Every 10 seconds, the test connection thread attempts to test the primary resource. Depending on system resources, this value can be changed to any value from 1 to MAXINIT seconds.
- enablePartialResourceAdapterFailoverSupport
Values: boolean value, default is false.
Description: If true, fail over to the alternate resource occurs, but fail back to the primary is manual. This property can be set if the resource adapter being used does not meet the requirements for connection fail over, for example, it does not have testConnection implemented or it throws a not supported exception.
- disableResourceFailOver
Values: boolean value, default is false.
Description: If true, automatic fail over does not occur.
- disableResourceFailBack
Values: boolean value, default is false.
Description: If true, automatic fail back does not occur.
- populateAlternateResource
Values: boolean value, default is false.
Description: If true, the alternate resource is populated with connections to maximum connections. Every attempt is made to keep the alternate resource at maximum connections. If the database goes down for the alternate resource, the stale connections are removed and the populate thread repopulates the alternate resource when the database is available. We can dynamically enable and disable the populate alternate resource using the MBean operations, disablePopulateAlternateResource and enablePopulateAlternateResource.
We might see performance gains during a failover with populate alternate resource enabled, but the cost is high to keep the alternate resource populated. Most of the cost is in keeping twice the number of connections typically required for one data source. Because connections are large objects, keeping the connections uses additional memory resource on the server and extra resource on the database.
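The way the custom properties interact (which ones enable a feature, which ones are only read when a feature is enabled, and which defaults apply) can be sketched as a small resolver. This is a conceptual model under stated assumptions, not the server's parsing logic: the function effective_settings and the result keys notificationEnabled and alternateConfigured are hypothetical, property values are assumed to arrive as strings, booleans are assumed to be the case-insensitive text "true", and falling back to 10 for an invalid resourceAvailabilityTestRetryInterval is an assumption that the text does not confirm.

```python
BOOLEAN_PROPERTIES = (
    "enablePartialResourceAdapterFailoverSupport",
    "disableResourceFailOver",
    "disableResourceFailBack",
    "populateAlternateResource",
)


def _positive_int(text, default):
    """Parse an integer > 0, returning the default for anything else."""
    try:
        value = int(text)
    except (TypeError, ValueError):
        return default
    return value if value > 0 else default


def _flag(text):
    # Assumption: only the text "true" (any case) enables a boolean property.
    return str(text).strip().lower() == "true"


def effective_settings(props):
    """Resolve connection pool custom properties into effective settings."""
    code = props.get("failureNotificationActionCode")
    action_code = int(code) if code in ("1", "2", "3") else None
    alternate = props.get("alternateResourceJNDIName") or None

    settings = {
        "notificationEnabled": action_code is not None,
        "alternateConfigured": alternate is not None,
    }
    # failureThreshold is only read when one of the two features is enabled.
    if action_code is not None or alternate is not None:
        settings["failureThreshold"] = _positive_int(
            props.get("failureThreshold"), 5)
    # The advanced fail over properties only matter with an alternate resource.
    if alternate is not None:
        settings["resourceAvailabilityTestRetryInterval"] = _positive_int(
            props.get("resourceAvailabilityTestRetryInterval"), 10)
        for name in BOOLEAN_PROPERTIES:
            settings[name] = _flag(props.get(name, "false"))
    return settings


if __name__ == "__main__":
    print(effective_settings({
        "alternateResourceJNDIName": "jdbc/alternateDS",  # example name only
        "failureThreshold": "3",
    }))
```

Resolving the properties this way before changing a production pool makes it easier to see, for example, that a failureThreshold value has no effect unless an action code or an alternate JNDI name is also set.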
(ZOS) Enable optimized local adapter high availability support
(ZOS) Modify command