Detecting and fixing problems with WS-ReliableMessaging
The nature of WS-ReliableMessaging is that network and server failures are assumed, and therefore the target web service or message store might not be available. In these cases, message sequences cannot be completed and collections of web service messages are held awaiting transmission. We can use the SystemOut.log file, system events, and the runtime administrative panels to monitor the system and detect and fix problems with WS-ReliableMessaging.
If a sequence fails, a message is written to the application server SystemOut.log file and a system event is generated. Therefore we can detect failed sequences by looking at the SystemOut.log file, or by writing an event listener (or by using third party software) to monitor system events.
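As a minimal sketch of the log-based approach, the following Python script scans a log file for lines that look like sequence failures. The pattern `FAILED_SEQUENCE_PATTERN` is a hypothetical placeholder, not the actual message text or message ID that the application server writes; replace it with the text seen in our own SystemOut.log file. The function names are also illustrative.

```python
import re
from typing import Iterable, Iterator, List, Tuple

# Hypothetical pattern (an assumption, not the real server message):
# replace it with the message text or ID that appears in your own log.
FAILED_SEQUENCE_PATTERN = re.compile(r"sequence.*fail", re.IGNORECASE)


def find_failed_sequence_lines(
    lines: Iterable[str], pattern=FAILED_SEQUENCE_PATTERN
) -> Iterator[Tuple[int, str]]:
    """Yield (line_number, text) for each log line that matches the pattern."""
    for number, line in enumerate(lines, start=1):
        if pattern.search(line):
            yield number, line.rstrip("\n")


def scan_log(path: str) -> List[Tuple[int, str]]:
    """Read a log file and return every line that matches the pattern."""
    with open(path, encoding="utf-8", errors="replace") as log:
        return list(find_failed_sequence_lines(log))
```

For example, `scan_log("SystemOut.log")` returns the matching lines together with their line numbers, which we can then review or pass to an alerting tool.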
This topic references one or more of the application server log files. As a recommended alternative, we can configure the server to use the High Performance Extensible Logging (HPEL) log and trace infrastructure instead of using the SystemOut.log, SystemErr.log, trace.log, and activity.log files on distributed and IBM i systems. We can also use HPEL in conjunction with the native z/OS logging facilities. If we are using HPEL, we can access all of the log and trace information using the LogViewer command-line tool from the server profile bin directory. See the information about using HPEL to troubleshoot applications for more information on using HPEL.
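The following sketch shows one way to assemble a LogViewer invocation from a script. It assumes a Unix-like system where the tool is named `logViewer.sh` (on Windows the equivalent is a `.bat` file), and it assumes we want only warnings and above from the most recent server instance. The profile path in the usage note is hypothetical.

```python
import posixpath
from typing import List


def build_logviewer_command(profile_root: str, min_level: str = "WARNING") -> List[str]:
    """Build the argument list for the HPEL LogViewer tool.

    profile_root is the server profile directory (an assumption of this
    sketch); the tool is expected in its bin subdirectory.
    """
    tool = posixpath.join(profile_root, "bin", "logViewer.sh")
    # -latestInstance limits output to the most recent server run;
    # -minLevel filters out entries below the given level.
    return [tool, "-latestInstance", "-minLevel", min_level]
```

The returned list can be passed to `subprocess.run`, for example `subprocess.run(build_logviewer_command("/opt/IBM/WebSphere/AppServer/profiles/AppSrv01"))`, and the output searched for WS-ReliableMessaging entries.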
After a sequence has been established, WS-ReliableMessaging provides retransmission of messages to a service. However if the sequence is not established (that is, if the initial CreateSequence request is refused) then the messages are not transmitted to the service. For more information, see the troubleshooting tip A sequence is not established and WS-ReliableMessaging cannot ensure messages are transmitted.
For more detailed status information at run time, and facilities to help fix problems, use the WS-ReliableMessaging administrative console runtime panels. These panels are available at many different scopes (for example cell; application server; messaging engine). For a full list of the WS-ReliableMessaging runtime panels, and details of the scopes at which they are available, see WS-ReliableMessaging - administrative console panels.
At all scopes, the parent panel is Reliable messaging state settings. From this panel we can investigate each of the three key runtime aspects of reliable messaging:
- Message stores
- Inbound sequences
- Outbound sequences
The following icons are displayed here and on several other reliable messaging runtime panels:
- OK: Everything here, and (if there is a link) in all runtime panels below this link, is running normally.
- Warning: Something here, or (if there is a link) in one of the runtime panels below this link, is in an unusual state and we might have to take some action to resolve it. For example, the system might be awaiting a response from an endpoint. In this case, either the response will be received (in which case we need take no action and the runtime information will be updated to "OK") or the reliable messaging destination has stopped acknowledging messages (in which case we have to take some action to resolve the failed sequence).
- Error: There is a definite error that you must take some action to resolve, either here or (if there is a link) in one of the runtime panels below this link.
Note that for troubleshooting purposes you only have to follow links to the sub-panels if states other than "OK" are displayed.
To use the reliable messaging runtime panels to detect and fix problems with WS-ReliableMessaging, complete one or more of the following steps:
- Investigate problems with message stores.
In the navigation pane, click one of the paths to this panel. For example Servers > Server Types > WebSphere application servers > server_name > [Additional Properties] Reliable messaging state > Runtime > Message store. The list of reliable messaging storage managers for the current scope is displayed in the Message store collection form.
For the managed qualities of service, the messages are written to a messaging engine. For the unmanaged non-persistent quality of service, the messages are stored in memory. For in-memory stores, the only possible value is "Running". For messages stored by a messaging engine, the possible values are "Running" or "Messaging engine not contactable"; the latter probably indicates that the messaging engine is not running. The "OK" icon indicates that the message store is running. If the messaging engine is not contactable, the "Error" icon is displayed.
For each message store in the list, the name of the associated reliable messaging application is given in the description column. If a messaging engine is not contactable, restart the message store for that application.
- Investigate problems with inbound sequences.
In the navigation pane, click one of the paths to this panel. For example Servers > Server Types > WebSphere application servers > server_name > [Additional Properties] Reliable messaging state > Runtime > Inbound sequences. The runtime state of each of the inbound sequences for the current scope is displayed in the Inbound sequence collection form.
We can use a filter to look at sequences that are in a particular state (for example Failed due to missing message) or that have a large number of messages awaiting dispatch to applications. If the sequence status is Error, there is a problem with the sequence and the source server hosting the other end of the sequence has terminated it. If the sequence is active and there are a large number of messages awaiting dispatch to the application, then there might be a problem with the application or, if in-order delivery is specified, delivery might be held up because the sequence has gaps in it.
We can select one or more sequences, and use the buttons provided to dispatch the messages to their associated applications, to export the messages to compressed files, to close or terminate the selected sequences, or to delete the selected sequences and all their messages.
Delete or terminate sequences only if necessary. If we delete or terminate an active sequence, the resulting messaging behavior is unpredictable and can cause loss of messages. If we are not sure whether we can safely delete or terminate a sequence, do not delete or terminate it; the system automatically deletes sequences that have been inactive for 12 hours.
To see more detailed information about a particular sequence, click the Sequence identifier field. The Inbound sequences settings form is displayed. This detailed information includes addressing information to help identify the source of the sequence, and the value (true or false) for in-order delivery for the sequence. From this panel we can also display the following forms:
- The Acknowledgement state collection form. (The ranges of message sequence numbers received from the WS-ReliableMessaging source. If more than one range is displayed, this indicates a gap in the messages received. If "In-order delivery" is selected for the sequence manager, messages with a sequence number greater than the lowest gap cannot be delivered to the application until the gap is closed.)
- The Inbound message collection form. (The messages on the inbound sequence. We can use this form to delete individual messages.)
- The Message settings form. (The contents of an individual message in the sequence.)
For more guidance on diagnosing problems with inbound sequences, see Diagnosing the problem when a reliable messaging source cannot deliver its messages.
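To illustrate how the acknowledgement ranges relate to gaps and in-order delivery, here is a minimal sketch. It assumes that each range is an inclusive pair of message numbers and relies on WS-ReliableMessaging message numbers starting at 1. The function names are hypothetical and are not part of any product API.

```python
from typing import List, Tuple

Range = Tuple[int, int]  # inclusive (lower, upper) message numbers


def merge_ranges(ranges: List[Range]) -> List[Range]:
    """Sort the ranges and merge any that overlap or are adjacent."""
    merged: List[Range] = []
    for lower, upper in sorted(ranges):
        if merged and lower <= merged[-1][1] + 1:
            merged[-1] = (merged[-1][0], max(merged[-1][1], upper))
        else:
            merged.append((lower, upper))
    return merged


def find_gaps(ranges: List[Range]) -> List[Range]:
    """Return the message-number ranges that have not been received."""
    gaps: List[Range] = []
    expected = 1  # message numbers in a sequence start at 1
    for lower, upper in merge_ranges(ranges):
        if lower > expected:
            gaps.append((expected, lower - 1))
        expected = upper + 1
    return gaps


def highest_in_order_deliverable(ranges: List[Range]) -> int:
    """Return the highest message number deliverable in order (0 if none)."""
    merged = merge_ranges(ranges)
    if not merged or merged[0][0] != 1:
        return 0
    return merged[0][1]
```

With ranges `[(1, 3), (5, 7)]`, message 4 is the gap, so with in-order delivery only messages 1 to 3 can reach the application until message 4 arrives.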
- Investigate problems with outbound sequences.
In the navigation pane, click one of the paths to this panel. For example Servers > Server Types > WebSphere application servers > server_name > [Additional Properties] Reliable messaging state > Runtime > Outbound sequences. The runtime state of each of the outbound sequences for the current scope is displayed in the Outbound sequence collection form.
We can use a filter to look at sequences that are in a particular state. For example, the state Cannot contact the remote endpoint indicates that the sequence has been established but the reliable messaging destination has stopped acknowledging messages (which, coupled with a high number of messages awaiting transmission, might indicate a potential problem). If the sequence status is Error, there is a problem with the sequence and the server hosting the other end of the sequence has terminated it.
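The filtering logic described above can be sketched in Python for the case where sequence details have been copied out of the panel into a script. The `OutboundSequence` record, its field names, and the `backlog_threshold` default are all assumptions made for illustration; they do not correspond to a documented product API.

```python
from dataclasses import dataclass
from typing import List

# State text as shown on the Outbound sequences panel.
UNREACHABLE_STATE = "Cannot contact the remote endpoint"


@dataclass
class OutboundSequence:
    """Hypothetical record of one row from the Outbound sequences panel."""
    sequence_id: str
    status: str  # "OK", "Warning", or "Error"
    state: str
    messages_awaiting_transmission: int


def sequences_needing_attention(
    sequences: List[OutboundSequence], backlog_threshold: int = 100
) -> List[OutboundSequence]:
    """Select sequences in Error status, or unreachable with a large backlog."""
    return [
        seq
        for seq in sequences
        if seq.status == "Error"
        or (
            seq.state == UNREACHABLE_STATE
            and seq.messages_awaiting_transmission >= backlog_threshold
        )
    ]
```

The threshold is a judgment call: a sequence that cannot contact its endpoint but holds only a few messages might recover by itself, whereas a growing backlog suggests that the destination has stopped acknowledging messages.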
We can select one or more sequences, and use one of the buttons provided to export the messages to compressed files, to close or terminate the selected sequences, or to delete the selected sequences and all their messages. For more information about deleting sequences, see Delete a failed WS-ReliableMessaging outbound sequence.
Delete or terminate sequences only if necessary. If we delete or terminate an active sequence, the resulting messaging behavior is unpredictable and can cause loss of messages. If we are not sure whether we can safely delete or terminate a sequence, do not delete or terminate it; the system automatically deletes sequences that have been inactive for 12 hours.
To see more detailed information about a particular sequence, click the Sequence identifier field. The Outbound sequences settings form is displayed. This detailed information includes addressing information to help identify the server at which the sequence is targeted. From this panel we can also display the following forms:
- The Outbound message collection form. (The messages on the outbound sequence. We can use this form to delete individual messages.)
- The Message settings form. (The contents of an individual message in the sequence.)
For more guidance on diagnosing problems with outbound sequences, see Diagnosing and recovering a WS-ReliableMessaging outbound sequence that is in retransmitting state.
Subtopics
- WS-ReliableMessaging sequence reallocation
In some situations, the WS-ReliableMessaging implementation can recover from a sequence-related fault, so your application can continue without having to process the fault itself. Your application must still process the fault if the recovery fails.
- Diagnosing the problem when a reliable messaging source cannot deliver its messages
If we know the sequence identifier, and the target URI for the messages, we can use the runtime administrative console panels to examine the sequence and determine why a reliable messaging source is not delivering its messages into the application server.
- Diagnosing and recovering a WS-ReliableMessaging outbound sequence that is in retransmitting state
You have to resolve a sequence in retransmitting state, because messages are backing up awaiting delivery to the target service. The retransmission might be due to a network failure, or a failure of the target server. The problem should resolve automatically, but you might want to investigate the cause to speed up recovery.
- Delete a failed WS-ReliableMessaging outbound sequence
You have to resolve an outbound sequence in failed state, so that messages can again be transmitted to the target service. A sequence in failed state shows an unrecoverable error. The sequence can no longer be used. If messages are being delivered in order, then the failed sequence must be resolved before a new sequence can be established.
- WS-ReliableMessaging troubleshooting tips
Tips for troubleshooting the WS-ReliableMessaging configuration.
Related tasks
- WS-ReliableMessaging
- Add assured delivery to web services through WS-ReliableMessaging
- Use High Performance Extensible Logging to troubleshoot applications
- Configure WS-SecureConversation to work with WS-ReliableMessaging
- WS-ReliableMessaging - requirements for interaction with other implementations
- WS-ReliableMessaging - administrative console panels
- WS-ReliableMessaging: supported specifications and standards