WS-ReliableMessaging troubleshooting tips

Tips for troubleshooting the WS-ReliableMessaging configuration.

IBM recommends using the High Performance Extensible Logging (HPEL) log and trace infrastructure. We view HPEL log and trace information using the logViewer command.
To help identify and resolve problems with WS-ReliableMessaging, we can use the WebSphere Application Server trace and logging facilities. If we are using Eclipse-based tools, we can also use the TCP/IP monitor in Eclipse to view the messages that flow between the client applications and reliable messaging enabled web services.

To enable trace for WS-ReliableMessaging, set the application server trace string as follows:
For either of the managed qualities of service:
org.apache.sandesha2*=all=enabled:com.ibm.ws.websvcs.rm*=all=enabled:org.apache.axis2*=all=enabled:com.ibm.ws.sib.wsrm*=all=enabled
For the Unmanaged Non-Persistent quality of service:
org.apache.sandesha2*=all=enabled:com.ibm.ws.websvcs.rm*=all=enabled:org.apache.axis2*=all=enabled
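The two trace strings differ only in the com.ibm.ws.sib.wsrm* entry, so they can be assembled from one list to avoid typing mistakes. The following is a minimal sketch in Python; the helper name build_trace_string and the constant names are hypothetical and are not part of WAS. Only the package patterns are taken from the strings above.

```python
# Package patterns copied from the trace strings shown above.
COMMON_PACKAGES = [
    "org.apache.sandesha2*",
    "com.ibm.ws.websvcs.rm*",
    "org.apache.axis2*",
]
# Extra pattern that appears only in the managed trace string.
MANAGED_ONLY_PACKAGES = ["com.ibm.ws.sib.wsrm*"]


def build_trace_string(managed: bool) -> str:
    """Return the trace string for a managed or unmanaged quality of service.

    Hypothetical helper: it only joins the patterns with ':' and appends
    '=all=enabled' to each one, matching the format used on this page.
    """
    packages = COMMON_PACKAGES + (MANAGED_ONLY_PACKAGES if managed else [])
    return ":".join(f"{pattern}=all=enabled" for pattern in packages)


if __name__ == "__main__":
    print(build_trace_string(managed=True))
    print(build_trace_string(managed=False))
```

The resulting string is what we paste into the application server trace setting; the sketch does not change any server configuration itself.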
If we encounter a problem that we think might be related to WS-ReliableMessaging, we can check for error messages in the WAS administrative console, and in the application server SystemOut.log file. We can also enable the application server debug trace to provide a detailed exception dump.
A list of the main known restrictions that apply when using WS-ReliableMessaging is provided in WS-ReliableMessaging known restrictions.

WAS system messages are logged from a variety of sources, including application server components and applications. Messages logged by application server components and associated IBM products start with a unique message identifier that indicates the component or application that issued the message. The prefix for the WS-ReliableMessaging component is CWSKA.

The Troubleshooter reference: Messages topic contains information about all WAS messages, indexed by message prefix. For each message there is an explanation of the problem, and details of any action that we can take to resolve the problem.
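Because every message starts with a component prefix, a log can be filtered for WS-ReliableMessaging entries by searching for CWSKA. The sketch below is one way to do that in Python. The function name find_messages is hypothetical, and the message identifier pattern (four or five uppercase letters, four digits, one severity letter) is an assumption inferred from the identifiers quoted on this page, such as CWSKA0102E and CWSIT0019E.

```python
import re

# Assumed identifier shape, inferred from examples such as CWSKA0102E:
# 4-5 uppercase letters, 4 digits, then a single severity letter.
MESSAGE_ID = re.compile(r"\b([A-Z]{4,5})(\d{4})([A-Z])\b")


def find_messages(lines, prefix="CWSKA"):
    """Return (message_id, line) pairs for log lines that carry the given prefix."""
    hits = []
    for line in lines:
        for match in MESSAGE_ID.finditer(line):
            if match.group(1) == prefix:
                hits.append((match.group(0), line.strip()))
    return hits


if __name__ == "__main__":
    sample = [
        "CWSIT0019E: No suitable messaging engine is available on bus yourBus",
        "CWSKA0102E: The managed web services reliable messaging storage manager could not be initialized",
    ]
    for message_id, text in find_messages(sample):
        print(message_id, "->", text)
```

To scan a real log, we would pass the lines of the SystemOut.log file (or the text output of the HPEL viewer) instead of the sample list.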

Here is a set of tips to help troubleshoot commonly experienced problems:

If a sequence is reallocated, we might see more sequences than expected
If reliable messaging is running on a cluster, when you examine the runtime state of inbound or outbound sequences you see multiple entries for each sequence
When we examine the runtime state of inbound or outbound sequences we might see multiple entries for each sequence
Runtime errors occur when we migrate persisted WS-ReliableMessaging messages from v6.1.0.9 or 6.1.0.11
When an application server starts, a messaging engine used for reliable messaging is reported as unavailable
A client application is unable to invoke a reliable messaging enabled web service
A sequence is not established and WS-ReliableMessaging cannot ensure messages are transmitted
A sequence is established but cannot be used and WS-ReliableMessaging cannot ensure messages are transmitted
Reliable messaging managed store is not initialized because the policy set binding is not complete or valid
Reliable messaging is interrupted because a server is unavailable
Socket timeout errors are received when running multiple reliable messaging client applications in a cluster
A message is not recovered after a server becomes unavailable
An exception message states that the security context token is not valid
A servant region experiences a timeout abend when we are using a managed quality of service

If a sequence is reallocated, we might see more sequences than expected

When we examine the runtime state of inbound or outbound sequences, we might see more sequences than expected because of sequence reallocation.

If a sequence is reallocated, the original and new sequences are both visible. Ignore the multiple entries.

If reliable messaging is running on a cluster, when you examine the runtime state of inbound or outbound sequences you see multiple entries for each sequence

This is because, although reliable messaging only binds to one messaging engine in a cluster, the runtime panel calculates and displays the sequence information once for every cluster member. Ignore the duplicate entries. Note that the slight differences in the statistics displayed for each duplicate entry are due to the entries being created sequentially, while polling for messages continues.

(ZOS) When we examine the runtime state of inbound or outbound sequences we might see multiple entries for each sequence

This is because, although reliable messaging only binds to one messaging engine, the runtime panel calculates and displays the sequence information once for every servant region. Ignore the duplicate entries. Note that the slight differences in the statistics displayed for each duplicate entry are due to the entries being created sequentially, while polling for messages continues.

Runtime errors occur when we migrate persisted WS-ReliableMessaging messages from v6.1.0.9 or 6.1.0.11

If we are migrating from WAS Version 6.1, and we are using v6.1.0.9 or 6.1.0.11 of the Feature Pack for Web Services, and the configuration includes WS-ReliableMessaging configured for the managed persistent quality of service, we need to remove all persisted messages before migrating.

Each message is persisted as part of a sequence that is currently being processed. To remove all persisted messages, use the administrative console to complete the following steps:

Navigate to the Inbound sequence collection runtime panel for our reliable messaging application.
Select all the inbound sequences, then click delete sequence and messages to delete the sequences.
Navigate to the Outbound sequence collection runtime panel, then repeat the previous steps for the outbound sequences.

When an application server starts, a messaging engine used for reliable messaging is reported as unavailable

When we use reliable messaging with a managed quality of service, we might see the following exception message when the application server starts:
CWSIT0019E: No suitable messaging engine is available on bus yourBus that matched the specified connection properties 
In a network deployment environment, this can occur because the messaging engine is on an application server or cluster member that has started later than the server that hosts your reliable messaging application. In this case we need do nothing but wait; reliable messaging will keep trying to connect until the messaging engine becomes available.

If we suspect there is an underlying problem, for example the bindings are incorrect or the server that hosts the messaging engine is not going to start, complete the following checks:

Check that the specified messaging engine and service integration bus exist.
Check the SystemOut.log file to ensure that the server that hosts the messaging engine has started.

A client application is unable to invoke a reliable messaging enabled web service

If our client application is unable to invoke a reliable messaging enabled web service, we can use the TCP/IP monitor to view the messages that are flowing between the client and the service. We should also check the following:

The endpoint is available.
The service is running.
The service has been invoked.
WS-ReliableMessaging is running.
WS-ReliableMessaging is correctly configured. In particular, for either of the managed qualities of service, check that we have configured a valid binding to a service integration bus and messaging engine, and that the messaging engine is running. See Attaching and binding a WS-ReliableMessaging policy set to a web service application.
There are not too many applications sharing a single messaging engine.
When many applications use the same messaging engine, it can impact performance. Factors to consider include the number of applications that are already binding to the messaging engine, the CPU utilization, and the message throughput. To improve performance for a single server configuration, create a new messaging engine to bind to the application.

A sequence is not established and WS-ReliableMessaging cannot ensure messages are transmitted

After a sequence has been established, WS-ReliableMessaging provides retransmission of messages to a service. However, if the sequence is not established, the messages are not transmitted to the service and a message similar to the following example is displayed:
org.apache.axis2.AxisFault: The Create Sequence request has been refused by the RM Destination
The initial CreateSequence message has been refused. The refusal is propagated back to the client and causes it to fail. For information about CreateSequence and CreateSequenceRefused, see WS-ReliableMessaging: supported specifications and standards.

We might also see a subsequent message to help explain why the request has been refused. For example:
Caused by: javax.xml.ws.soap.SOAPFaultException: com.ibm.ws.sib.wsrm.exceptions.WSRMRuntimeException: 
CWSJZ0202I: A messaging engine connection is unavailable for bus myBus
This indicates a problem with the reliable messaging configuration. Complete the following checks:

Check that the policy sets are correctly applied. Specifically, check that the destination has reliable messaging correctly enabled.
Check the logs for server-side problems.
For the managed persistent quality of service, check that the associated messaging engine is available.

For further checks that might help you resolve the problem, see also the following troubleshooting tips:

A sequence is established but cannot be used and WS-ReliableMessaging cannot ensure messages are transmitted.
A client application is unable to invoke a reliable messaging enabled web service.

A sequence is established but cannot be used and WS-ReliableMessaging cannot ensure messages are transmitted

If we get an exception such as the following, the sequence is established but cannot be used:
javax.xml.ws.WebServiceException: org.apache.axis2.AxisFault: The value of wsrm:Identifier is not a known Sequence identifier
The most common reason is that we are working in a clustered environment but the server-side policy set specifies the unmanaged non-persistent quality of service. For example, the WS-I RSP default policy set specifies the unmanaged non-persistent quality of service. To use reliable asynchronous messaging in a clustered environment, we must use a managed quality of service to enable the cluster members to correlate reliable messaging state. To do this, either use the WS-I RSP ND default policy set, or modify our custom policy set so that the WS-ReliableMessaging policy specifies a managed quality of service and an associated binding to a service integration bus and messaging engine. For information about how to do this, see Configure a WS-ReliableMessaging policy set and Attaching and binding a WS-ReliableMessaging policy set to a web service application.
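The rule in this tip can be written down as a small pre-deployment check. The following Python sketch only encodes the two conditions stated above: a cluster needs a managed quality of service, and a managed quality of service needs a binding to a service integration bus and messaging engine. The function name, the quality of service identifiers, and the parameter names are all hypothetical labels chosen for illustration; they are not WAS configuration keys.

```python
# Hypothetical labels for the qualities of service discussed on this page.
UNMANAGED = "unmanaged_non_persistent"
MANAGED = {"managed_non_persistent", "managed_persistent"}


def check_rm_policy(clustered, quality_of_service, bus=None, messaging_engine=None):
    """Return a list of configuration problems; an empty list means none were found."""
    if quality_of_service != UNMANAGED and quality_of_service not in MANAGED:
        raise ValueError(f"unknown quality of service: {quality_of_service!r}")

    problems = []
    if clustered and quality_of_service == UNMANAGED:
        # Cluster members cannot correlate reliable messaging state without a managed store.
        problems.append("clustered environment requires a managed quality of service")
    if quality_of_service in MANAGED and not (bus and messaging_engine):
        # A managed quality of service needs a bus and messaging engine binding.
        problems.append("managed quality of service requires a bus and messaging engine binding")
    return problems


if __name__ == "__main__":
    print(check_rm_policy(clustered=True, quality_of_service=UNMANAGED))
    print(check_rm_policy(True, "managed_persistent", bus="myBus", messaging_engine="myEngine"))
```

A check like this does not replace the administrative console; it is only a way to reason about which combination of settings leads to the "not a known Sequence identifier" fault.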

For further checks that might help you resolve the problem, see also the following troubleshooting tips:

A sequence is not established and WS-ReliableMessaging cannot ensure messages are transmitted.
A client application is unable to invoke a reliable messaging enabled web service.

Reliable messaging managed store is not initialized because the policy set binding is not complete or valid

If our policy set specifies a managed quality of service, but we have not specified a binding to a messaging engine to support that quality of service, we get the following exception message:
CWSKA0102E: The managed web services reliable messaging storage manager could not be initialized because the policy set binding was incomplete or invalid
A likely cause is that we have attached a managed policy set to the application and used the default bindings, which do not support the managed qualities of service. We must create a new binding for our application that specifies a service integration bus and messaging engine to support the managed qualities of service. To do this, see Attaching and binding a WS-ReliableMessaging policy set to a web service application.

Reliable messaging is interrupted because a server is unavailable

Clustering offers maximum protection against servers becoming unavailable. It provides highly available service endpoints, and (through the service integration bus) high availability of the reliable messaging layer.

For more information about configuring high availability for web services and messaging engines, see Balancing workloads and Add a messaging engine to a cluster.

Socket timeout errors are received when running multiple reliable messaging client applications in a cluster

When many applications use the same messaging engine, it can impact performance and in some cases lead to timeout errors.

For more information about configuring high availability for web services and messaging engines, see Balancing workloads and Add a messaging engine to a cluster.

A message is not recovered after a server becomes unavailable

When the reliable messaging layer receives a request message, it sends an acknowledgement then delivers the message to the target service. There is a marginal possibility that the server hosting the reliable messaging layer might become unavailable after the request message has been acknowledged and before it has been delivered. In this case, the message is only recovered if we are using in-order delivery as well as managed persistent quality of service. To specify in-order delivery, select the WS-ReliableMessaging policy option to "Deliver messages in the order that they were sent" as described in Configure the WS-ReliableMessaging policy.

There is a performance overhead in using in-order delivery, because messages are held in a queue until they can be delivered in order. However, where the highest level of reliability is required, we should always specify in-order delivery in conjunction with the managed persistent quality of service.
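The recovery condition described in this tip is a conjunction of two settings, which the following Python sketch makes explicit. The class name RmPolicy, its field names, and the quality of service strings are hypothetical labels for illustration only; the logic is limited to what this tip states.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RmPolicy:
    """Hypothetical summary of the two policy settings that matter for recovery."""
    quality_of_service: str  # for example "managed_persistent"
    in_order_delivery: bool  # "Deliver messages in the order that they were sent"


def recovers_acknowledged_message(policy: RmPolicy) -> bool:
    """True if a message that was acknowledged but not yet delivered is recovered
    after the server becomes unavailable, according to the rule in this tip."""
    return policy.quality_of_service == "managed_persistent" and policy.in_order_delivery


if __name__ == "__main__":
    print(recovers_acknowledged_message(RmPolicy("managed_persistent", True)))
    print(recovers_acknowledged_message(RmPolicy("managed_persistent", False)))
```

Either setting on its own is not enough: persistence without in-order delivery, or in-order delivery without persistence, leaves the window between acknowledgement and delivery unprotected.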

An exception message states that the security context token is not valid

When using reliable messaging with a persistent WS-I RSP profile and WS-SecureConversation, an exception message states that the security context token is not valid.

When we use a persistent WS-I RSP policy set, which includes WS-SecureConversation, and the scoping security context token has expired by the time the server is restarted, WS-ReliableMessaging cannot resend its messages. System messages are then written to the log file stating that the reliable messaging sequence was not secured using the correct security token. For example:
CWWSS7215E: Cannot get valid security context token from the cache.
To ensure that the scoping security context token does not expire before WS-ReliableMessaging can recover and resend its messages, complete the following task: Configure WS-SecureConversation to work with WS-ReliableMessaging.

(ZOS) A servant region experiences a timeout abend when we are using a managed quality of service

A timeout abend could be caused by the value of the sib.wsrm.tokenLockTimeout custom property being set too high. This property is set on the messaging engine that is specified in the WS-ReliableMessaging policy binding. The value should be less than the amount of time that the control region waits before ending an inactive servant region. Refer to the Service integration custom properties topic for more information about this property.
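The constraint above is a simple comparison between two durations. The Python sketch below expresses it; the function and parameter names are hypothetical, and both values are assumed to be expressed in the same unit, because this page does not state the unit of sib.wsrm.tokenLockTimeout.

```python
def token_lock_timeout_is_safe(token_lock_timeout, servant_inactivity_limit):
    """Return True if the token lock timeout is below the time the control region
    waits before ending an inactive servant region.

    Both arguments must use the same unit (an assumption of this sketch).
    """
    if token_lock_timeout < 0 or servant_inactivity_limit <= 0:
        raise ValueError("durations must be positive")
    return token_lock_timeout < servant_inactivity_limit


if __name__ == "__main__":
    # Illustrative numbers only; they are not recommended values.
    print(token_lock_timeout_is_safe(100, 300))
    print(token_lock_timeout_is_safe(300, 300))
```

Note that equal values are treated as unsafe, because the text asks for the property value to be less than the control region wait time.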

Configure WS-SecureConversation to work with WS-ReliableMessaging
WS-ReliableMessaging
Attaching and binding a WS-ReliableMessaging policy set to a web service application
Tune web services reliable messaging applications
Use High Performance Extensible Logging to troubleshoot applications
(ZOS) Service integration custom properties
WS-ReliableMessaging sequence reallocation