Messaging engine troubleshooting tips

Use this set of specific tips to help troubleshoot problems with service integration messaging engines.

Messaging engine start fails because runtime is not yet initialized

Messaging engine does not start up with a DB2 Universal JDBC type 2 driver

Messaging engine cannot start up because of a known error in the Informix JDBC Driver 3.00JC1

Problem determination for a data store

Messaging engines cause database contention messages

User ID not supported exception when connecting to a Network Attached Apache Derby Version 10.3 database

Possible causes of the XAResourceNotAvailableException exception and how to take appropriate action

Problems when you re-create a service integration bus

Problems communicating with foreign buses

Problems when attempting to communicate with a renamed foreign bus

Possible causes of a JMSException with a wrapped SILimitExceeded exception

Corruption problems on system restarts

Retrieve the status of messaging engines in the administrative console

Enable an application to be started before a required messaging engine has started

Channel framework messages appear during server startup

Messaging engine failover is not supported for mixed version clusters

Messaging engine start fails because runtime is not yet initialized

A messaging engine fails to start and the following error is displayed in the WAS administrative console:

The messaging engine <name> cannot be started as there is no runtime initialized for it yet, retry the operation once it has been initialized. If dynamic configuration reload is enabled for this bus, then the servers must be restarted.

Before you try to start the messaging engine again, make sure that you have restarted the server. The application server must be started for the runtime to initialize successfully.
To find out whether a startup problem is preventing the messaging engine runtime from initializing, check for error messages in the SystemOut.log file of the hosting server.
This topic references one or more of the application server log files. As a recommended alternative, you can configure the server to use the High Performance Extensible Logging (HPEL) log and trace infrastructure instead of using the SystemOut.log, SystemErr.log, trace.log, and activity.log files on distributed and IBM i systems. You can also use HPEL in conjunction with the native z/OS logging facilities. If you are using HPEL, you can access all of the log and trace information by using the LogViewer command-line tool from the server profile bin directory. For more information, see the information about using HPEL to troubleshoot applications.
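
As a rough aid for the log check described above, the following Python sketch filters SibMessage entries with severity E or W out of SystemOut.log-style lines. It is plain Python intended to be run outside wsadmin. The function name, the regular expression, and the assumed line layout (optional timestamp, thread ID, the component name SibMessage, a one-letter severity, then the message text) are illustrative assumptions based on the sample log entries in this topic; this is not an IBM-provided tool.

```python
import re

# Assumed layout, based on the sample entries in this topic:
#   [timestamp] <thread id> SibMessage <severity> <text>
# The leading timestamp is optional because some samples omit it.
_SIB_LINE = re.compile(
    r"^(?:\[(?P<timestamp>[^\]]*)\]\s+)?"
    r"(?P<thread>[0-9a-fA-F]{8})\s+SibMessage\s+"
    r"(?P<severity>[A-Z])\s+(?P<text>.*)$"
)


def find_sib_problems(lines, severities=("E", "W")):
    """Return (severity, text) pairs for SibMessage entries of interest."""
    problems = []
    for line in lines:
        match = _SIB_LINE.match(line.strip())
        if match and match.group("severity") in severities:
            problems.append((match.group("severity"), match.group("text")))
    return problems
```

To use the sketch, open the SystemOut.log file of the hosting server and pass the file object (or a list of lines) to find_sib_problems. Lines that do not follow the assumed layout are ignored.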

Messaging engine does not start up with a DB2 Universal JDBC type 2 driver

When attempting to use the DB2 Universal JDBC type 2 driver to store data on the z/OS platform, the messaging engine does not start up and "Storage Allocation Error" messages similar to the following message might appear in the WAS SystemOut.log file:
BBOO0220E: [SB6NLA1:SB6NLA1.server1-SB6NLA1] CWSIP0002E: An internal messaging error occurred in com.ibm.ws.sib.processor.impl.MessageProcessor, 1:1469:1.365, com.ibm.ws.sib.msgstore.MessageStoreRuntimeException: com.ibm.ws.sib.msgstore.PersistenceException: CWSIS1501E: The data source has produced an unexpected exception: com.ibm.db2.jcc.t2zos.y: [IBM/DB2][T2zos/2.5.48]T2zosPreparedStatement.readPrepareDescribeOutput_:processDescribeOutput:1563:Storage Allocation Error
  at com.ibm.ws.sib.msgstore.cache.links.AbstractItemLink.readDataFromPersistence(AbstractItemLink.java:2487)
  at com.ibm.ws.sib.msgstore.cache.links.AbstractItemLink._restoreItem(AbstractItemLink.java:639)
For the z/OS platform, you should use a DB2 Universal JDBC type 4 driver. If you use the DB2 Universal JDBC type 2 driver...

Use the administrative console to navigate to Resources -> JDBC -> Data sources -> data_source_name -> [Additional Properties] Custom properties

Set the JDBC driver custom property fullyMaterializeLobData to false.
The fullyMaterializeLobData custom property determines whether LOB data is fully materialized in the JDBC driver when a row is fetched, or is retrieved in pieces as needed. The actual behavior depends on whether the database server supports progressive streaming. Refer to the DB2 documentation for more information about this property. The default value is true.

Save the changes to the master configuration.

Restart the application server.

Messaging engine cannot start up because of a known error in the Informix JDBC Driver 3.00JC1

When attempting to use the Informix JDBC driver 3.00JC1 to store data, the messaging engine cannot start up and the following error message might appear in the WAS SystemOut.log file:
00000022 SibMessage E [RetireBus:retire_web.000-RetireBus] CWSIS0002E: The messaging engine encountered an exception while starting. Exception: com.ibm.ws.sib.msgstore.PersistenceException: CWSIS1501E: The data source has produced an unexpected exception: java.sql.BatchUpdateException: Unique constraint (informix.u114_62) violated.
00000022 SibMessage E [RetireBus:retire_web.000-RetireBus] CWSID0035E: Messaging engine retire_web.000-RetireBus cannot be started; detected error reported during com.ibm.ws.sib.msgstore.impl.MessageStoreImpl start()
00000022 SibMessage E [RetireBus:retire_web.000-RetireBus] CWSID0027I: Messaging engine retire_web.000-RetireBus cannot be restarted because a serious error has been reported.
00000022 SibMessage I [RetireBus:retire_web.000-RetireBus] CWSID0016I: Messaging engine retire_web.000-RetireBus is in state Stopped.
There is a known defect (PTS 172471) in the Informix JDBC Driver 3.00JC1. To avoid this error, upgrade the Informix JDBC Driver to 3.00JC2.

Problem determination for a data store
We can create a dump, in reduced form, of the data in the data store for a messaging engine. The output is intended for use by IBM Service personnel. Contact the support organization for information about how to run the command.
If there is a problem with the data in the data store, it can be hard to diagnose from the trace output. However, you can create a dump, in XML format, of the data in the data store. This makes diagnosis easier because the dump is a human-readable representation that can be transformed to other formats as required. To create a data store dump, enter the following command in wsadmin.sh:

Jython:
AdminControl.invoke(AdminControl.queryNames("type=SIBMessagingEngine,name=messagingenginename,*"), "dump", "com.ibm.ws.sib.msgstore.*")

Jacl:

$AdminControl invoke [$AdminControl queryNames type=SIBMessagingEngine,name=messagingenginename,*] dump com.ibm.ws.sib.msgstore.*

The dump is created as an XML file in the $WAS_HOME/logs/server1 directory. The file is named according to the format: messaging_engine_nameUUIDtimestamp.xml
The format of the file is illustrated in the following example:
<MessageStore>
    <itemStreams>
        <ItemStreamLink id="0" state="Available">
            <class>com.ibm.ws.sib.msgstore.ItemStream</class>
            <priority>5</priority>
            <canExpireSilently></canExpireSilently>
            <storageStrategy>STORE_NEVER</storageStrategy>
            <expiryTime>0</expiryTime>
            <sequence>0</sequence>
            <tranID>null</tranID>
            <tickValue>0</tickValue>
            <items>
                <ItemLink id="2" state="Available" refCount="3" refCountDecreasing="false">
                    <class>com.ibm.ws.sib.msgstore.Item</class>
                    <priority>5</priority>
                    <canExpireSilently></canExpireSilently>
                    <storageStrategy>STORE_NEVER</storageStrategy>
                    <expiryTime>0</expiryTime>
                    <sequence>1</sequence>
                    <tranID>null</tranID>
                    <tickValue>0</tickValue>
                </ItemLink>
            </items>
        </ItemStreamLink>
    </itemStreams>
</MessageStore>
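
Because the dump is XML, it can be summarized with standard tooling. The following Python sketch, run outside wsadmin, lists each link element in a dump together with its id, state, and class. It assumes only the element and attribute names that appear in the example above (ItemStreamLink, ItemLink, id, state, class); the function name is hypothetical, and real dump files might contain other element types that this sketch does not interpret.

```python
import xml.etree.ElementTree as ET


def summarize_dump(xml_text):
    """List (element kind, id, state, class) for each link in a dump."""
    root = ET.fromstring(xml_text)
    summary = []
    for element in root.iter():
        # Link elements in the example carry both "id" and "state" attributes.
        if "id" in element.attrib and "state" in element.attrib:
            summary.append((
                element.tag,
                element.get("id"),
                element.get("state"),
                element.findtext("class"),
            ))
    return summary
```

Elements are reported in document order, so an item stream is listed before the items that it contains.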
Messaging engines cause database contention messages

When a messaging engine uses a data store for the message store, if the same messaging engine is accidentally started twice, a database contention message is displayed:

CWSIS1546I: The messaging engine, ME_UUID={0}, INC_UUID={1}, has lost an existing lock or failed to gain an initial lock on the data store.

To resolve the problem:

Check for problems with the database, for example, the database is unavailable.

Check for problems with the network. For example, if the network is overloaded, two application servers might be able to connect to the database, but might not be able to connect to each other, which might cause resource coordination problems.

If we have a service integration configuration that provides high availability or workload sharing, check that the appropriate resources are configured correctly. For example, check the messaging engines, the core group policies for those messaging engines, and the match criteria that associate each core group policy with a messaging engine. See Configure high availability and workload sharing of service integration.

User ID not supported exception when connecting to a Network Attached Apache Derby Version 10.3 database

When you test the connection to the Network Attached Apache Derby Version 10.3 database, you get the following exception:

java.lang.Exception: java.sql.SQLException: null userid not supported DSRA0010E: SQL State = null, Error
When you create a new Network Attached Apache Derby data store, by default you get a blank authentication alias. If you use Apache Derby in Network Attached mode with the DB2 Universal JDBC Driver (that is, you use the "JDBC provider for Derby Network Server using the (DB2) Universal JDBC Driver"), you must specify an authentication alias. This requirement is documented in Data source minimum required settings for Apache Derby.
The need for an authentication alias only applies to the "JDBC provider for Derby Network Server using the (DB2) Universal JDBC Driver". This driver is deprecated and is replaced by the "JDBC provider for Derby Network Server using Derby Client", which does not need an authentication alias. See also Configure a JDBC data source for a messaging engine.

Possible causes of the XAResourceNotAvailableException exception and how to take appropriate action
When the deleteNode command is used for a node that hosts messaging engines, those messaging engines are deleted. When new messaging engines are re-created following the addNode command, they have different identifiers and so during transaction recovery it is not possible to connect to the old messaging engines. A message identifying the XAResourceNotAvailableException exception is generated in the SystemOut.log file for each server that hosts a messaging engine.
To solve this problem, follow the procedure described in Resolve indoubt transactions.
The XAResourceNotAvailableException exception can also be thrown when a server in a cluster bus member fails over. In this case, no operator intervention is required to recover and resolve transactions.

Problems when you re-create a service integration bus
If we delete a service integration bus, and later create a new bus with the same name, the messaging engine fails to start and messages such as the following are generated in SystemOut.log:
[8/11/04 21:55:01:439 CDT] 0000000f SibMessage    I    [LateBus:xyzsun15.server1-LateBus] isAlive: MessagingEngine suffered common mode error.  Correct error (see logs) and restart server.
[8/11/04 21:55:01:468 CDT] 0000000f SibMessage    I    [LateBus:xyzsun15.server1-LateBus] isAlive: MessagingEngine will be stopped  because of common mode error.  No failover will occur.
[8/11/04 21:55:01:493 CDT] 0000000f SibMessage    I    [LateBus:xyzsun15.server1-LateBus] Messaging Engine  xyzsun15.server1-LateBus not in state from which stop is valid: Starting
[8/11/04 21:55:01:513 CDT] 0000000f SibMessage    I    [LateBus:xyzsun15.server1-LateBus] isAlive: MessagingEngine stopped because  of common mode error. Correct error (see logs) and restart server.
[8/11/04 21:57:01:431 CDT] 0000000e SibMessage    I    [LateBus:xyzsun15.server1-LateBus] isAlive: MessagingEngine suffered  common mode error.  Correct error (see logs) and restart server.
The messaging engine fails to start because the database directory for the messaging engine still exists after the bus is deleted; you must remove it manually. To delete the Apache Derby database for a non-existent messaging engine, delete the database directory located in profile_root/databases/com.ibm.ws.sib, where profile_root is the directory in which profile-specific information is stored.
You must stop WebSphere Application Server before you delete the database files.
For other databases, you can either delete all of the rows from the data store tables or drop all of the data store tables. These tables are in the schema configured for the data store. For a list of the tables, refer to Data store tables.
For more information, see Data store life cycle.

Problems communicating with foreign buses

To enable communication between buses, a foreign bus and a service integration bus link must be created. On the first bus, the name of the foreign bus must match the name of the second bus, and on the second bus, the name of the foreign bus must match the name of the first bus. The service integration bus link must have the same name on both buses.
You may encounter the following type of error if the configuration is not correct, for example because the service integration bus links do not match:

SibMessage E [TechBus:TechCluster.000-TechBus] CWSIT0057E: The inter-bus connection BookstoreBus failed in the remote messaging engine on host aixp401.rchland.ibm.com with reason: CWSIT0067E: Inter-bus connection BookstoreBus in bus BookstoreBus is not available.
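
The naming rules above can be expressed as a simple consistency check. The following Python sketch compares two bus link definitions that you have written down by hand; it does not read the WebSphere configuration. The function name and the dictionary keys ("bus", "foreign_bus", "link") are hypothetical and exist only for this illustration.

```python
def check_bus_link(first, second):
    """Return a list of naming mismatches between two bus link definitions.

    Each argument is a dict with these hypothetical keys:
      "bus"          name of the bus that owns the definition
      "foreign_bus"  name of the foreign bus configured on that bus
      "link"         name of the service integration bus link on that bus
    """
    problems = []
    if first["foreign_bus"] != second["bus"]:
        problems.append("foreign bus on %s is %s, expected %s"
                        % (first["bus"], first["foreign_bus"], second["bus"]))
    if second["foreign_bus"] != first["bus"]:
        problems.append("foreign bus on %s is %s, expected %s"
                        % (second["bus"], second["foreign_bus"], first["bus"]))
    if first["link"] != second["link"]:
        problems.append("link names differ: %s and %s"
                        % (first["link"], second["link"]))
    return problems
```

An empty list means that the three names are mirrored as required; any entry points to a definition to correct before you look for other causes of the CWSIT0057E message.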

Problems when attempting to communicate with a renamed foreign bus

The administrative console panel used for configuring the properties of a service integration bus link can also be used to change the foreign bus name that the link is pointing to. However, you must not alter the name of the foreign bus once it has been configured. If you do, any messaging engines that already hold state information about the link will not be able to use the link until the foreign bus name is reset to its previous value.

Possible causes of a JMSException with a wrapped SILimitExceeded exception

When the number of messages held by a destination reaches its limiting threshold, any attempt to send a message to that destination fails with a JMSException with a wrapped SILimitExceeded exception. The destination continues to fail with this exception until the number of messages held by the destination is reduced below the limiting threshold.
To obtain an accurate count of the number of available messages, we can monitor the Available Message Count PMI statistic for queue and topicspace destinations. If the number of available messages increases, take action to balance the system. Consider stopping producers from sending new messages until the destination consumes the available messages.
Examine the following list for possible causes and solutions for this problem:

The high threshold for the destination is too low for the projected number of messages. The destination does not process some messages. The default value for the high threshold is 50000.
Solution: Increase the high threshold for the destination.

Applications are producing more messages than the destination can process.
The ideal balance is for the number of messages produced and the number of messages consumed to be equal over a period of time. If the system is unbalanced and the producing application sends more messages than the destination can consume, the producing application eventually throws a JMSException.
Solution: Aim for a balance between the number of messages produced and the number of messages consumed.
The default setting for the Object Request Broker (ORB) thread pool is 100 threads. For some applications, this might allow 100 applications to send messages to the same destination. Consider tuning the ORB thread pool to have a maximum of 10 threads. This lower setting reduces the number of producers that can send messages, which might increase the overall message throughput.

Applications are processing messages from the destination too slowly.
Solution: It might be necessary to increase the number of messages consumed by the client applications. A destination processes more messages when multiple consumers read from that destination.
Consider cloning the application across multiple servers in a non-clustered environment. By default, applications are cloned in a clustered server environment. To enable subscribers in a non-clustered environment, set the cloned flag in the TopicConnectionFactory JNDI setting for DurableSubscriptions.
Restriction: This solution is not suitable for applications that require total message ordering.

Messages have a quality of service attribute that is better than best effort nonpersistent.
Solution: Use messages for which the quality of service attribute is best effort nonpersistent. If there are too many messages in the system, the destination discards best effort nonpersistent messages.
Restriction: This solution is not suitable for applications that must receive all messages.
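
To make the balance between producers and consumers concrete, the following Python sketch estimates how long a destination can absorb an imbalance before it reaches its high threshold. The function name and parameters are hypothetical; the only value taken from this topic is the default high threshold of 50000. The rates and the current depth are figures that you would obtain yourself, for example from the Available Message Count PMI statistic sampled over time.

```python
def seconds_until_limit(current_depth, produce_rate, consume_rate,
                        high_threshold=50000):
    """Estimate seconds until a destination reaches its high threshold.

    Rates are in messages per second. Returns 0.0 if the threshold is
    already reached and None if the depth is not growing.
    """
    if current_depth >= high_threshold:
        return 0.0
    growth = produce_rate - consume_rate
    if growth <= 0:
        return None
    return (high_threshold - current_depth) / float(growth)
```

For example, with 20000 messages on the destination, 150 messages produced per second, and 100 consumed per second, the depth grows by 50 messages per second and the remaining 30000 messages of headroom last 600 seconds. This is a linear estimate only and assumes that both rates stay constant.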

Corruption problems on system restarts

It is possible, although rare, for a messaging engine, destination, or link to be corrupted after a restart of the system. If this corruption occurs, you see a message indicating the problem. If the problem lies with the messaging engine, the messaging engine will not start. If a destination or link is corrupted, the relevant messaging engine will start, but the destination or link will not be usable on that messaging engine.
If you do not know the cause of the problem, contact your IBM service representative to establish the cause before you attempt to resolve the situation.
If you know the cause of the problem, for example, you are aware of an issue with your database, resolve it by completing the following steps:

Use the administrative console to ensure that the configuration files are synchronized across the system, by navigating to System administration -> Nodes, then clicking Full Resynchronize. This operation can take several minutes to run.

If the problem still exists, complete one of the following tasks:

Delete the corrupted object and re-create it. Messages produced or received before the corruption occurred will be lost.

Restore the system from a backup. See Restore a data store and recovering its messaging engine. Messages produced or received since the backup was taken will be lost.

Retrieve the status of messaging engines in the administrative console

To retrieve the status of messaging engines, you must be logged in to the administrative console with at least monitor authority. If you do not have this authority, the messaging engine status is displayed as "Unavailable", even if the messaging engine has started.
If you are not logged in with the authority needed to retrieve the status of messaging engines, an error message such as the following is logged in the server SystemOut.log file:

[4/20/05 10:49:57:083 CDT] 0000004b RoleBasedAuth A SECJ0305I: The role-based authorization check failed for admin-authz operation SIBMessagingEngine:stateExtended. The user UNAUTHENTICATED (unique ID: unauthenticated) was not granted any of the following required roles: administrator, operator, configurator, monitor.
The user ID shown in the message is the user ID that you used to log in to the administrative console.
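
When many such entries are logged, it can help to extract the user and the missing roles mechanically. The following Python sketch parses a SECJ0305I line of the form shown above. The function name and the regular expression are assumptions derived from that single sample message; other product versions or locales might word the message differently, in which case the function returns None.

```python
import re

# Pattern derived from the sample SECJ0305I message in this topic.
_SECJ0305I = re.compile(
    r"SECJ0305I: .*? operation (?P<operation>\S+)\. "
    r"The user (?P<user>\S+) \(unique ID: (?P<unique_id>[^)]*)\) "
    r"was not granted any of the following required roles: "
    r"(?P<roles>[^.]+)\."
)


def parse_authorization_failure(line):
    """Return a dict describing a SECJ0305I entry, or None if no match."""
    match = _SECJ0305I.search(line)
    if match is None:
        return None
    return {
        "operation": match.group("operation"),
        "user": match.group("user"),
        "unique_id": match.group("unique_id"),
        "roles": [role.strip() for role in match.group("roles").split(",")],
    }
```

If "monitor" appears in the returned roles for the operation SIBMessagingEngine:stateExtended, the user named in the entry lacks the authority that this section describes.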

Enable an application to be started before a required messaging engine has started

If an application depends on a messaging engine being available, the messaging engine must be started before the application runs. If you want the application server to start an application automatically, develop the application to test that any required messaging engine has been started and, if needed, to wait for the messaging engine. If this technique is used in a startup bean, the startup bean method should perform the test-and-wait work in a separate thread (using the standard WorkManager methods), so that the application server startup is not delayed.
For an example of code to test and wait for a messaging engine, see Applications with a dependency on messaging engine availability.
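
Independently of that example, the test-and-wait pattern itself can be sketched in a few lines. The following Python sketch is only a language-neutral illustration of polling in a background thread so that the caller is not blocked; it is not the WorkManager-based Java code that a startup bean would use. The callables is_started and on_ready, and all parameter names, are hypothetical placeholders supplied by the application.

```python
import threading
import time


def wait_for_engine(is_started, on_ready,
                    poll_seconds=1.0, timeout_seconds=60.0):
    """Poll is_started() in a background thread, then call on_ready().

    is_started and on_ready are hypothetical callables supplied by the
    application. The calling thread returns immediately with the thread
    object; if the timeout expires first, on_ready is never called.
    """
    def worker():
        deadline = time.time() + timeout_seconds
        while time.time() < deadline:
            if is_started():
                on_ready()
                return
            time.sleep(poll_seconds)

    thread = threading.Thread(target=worker)
    thread.daemon = True  # do not keep the process alive just for polling
    thread.start()
    return thread
```

The important design point is the same as in the startup bean case: the test and the wait happen off the startup thread, so startup continues while the application waits for the messaging engine.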

Channel framework messages appear during server startup

When you start the server, you might see channel framework information messages displayed in the control region adjunct (CRA) process. These messages do not indicate that any errors have occurred and do not require you to take any action.
The following message is issued because the application containing the message-driven bean has started prior to starting the messaging engine.

CWSIV0759W: During activation of a message-driven bean, no suitable active messaging engines were found in the local server on the bus {0}.

When the messaging engines start, another information message confirms this and message handling can then occur.

The following message is issued because of the asynchronous way in which the z/OS TCP proxy channel starts.
CHFW0030E: Error starting chain {0} because of exception {1}

When the messaging engines start, another information message confirms this and message handling can then occur.
These messages appear only under certain circumstances; for example, they might appear if you change ports during migration.
The following message might be displayed several times in the control region adjunct process during server startup, even though the connection succeeds on subsequent retries. This message is issued because of the asynchronous way in which the z/OS TCP proxy channel starts and does not indicate that any error has occurred.
Trace: 2009/06/17 08:24:41.434 01 t=9C6B58 c=UNK key=P8 (00000011)
Description: Log Java Message  Message: CHFW0030E: Error starting chain _InboundTCPProxyBridgeService because of exception com.ibm.wsspi.channel.framework.exception.RetryableChannelException: An exception was thrown when attempting to start the TCPProxyChannel com.ibm.ws.channel.framework.impl.ChannelFrameworkImpl
These messages might be accompanied by First Failure Data Capture (FFDC) output similar to the following example:
Exception = com.ibm.wsspi.channel.framework.exception.RetryableChannelException
Source = com.ibm.ws.channel.framework.impl.ChannelFrameworkImpl.startChainInternal
probeid = 2577
Stack Dump = com.ibm.wsspi.channel.framework.exception.RetryableChannelException: An exception was thrown when attempting  to start the TCPProxyChannel
  at com.ibm.ws.tcpchannelproxy.jfap.impl.TCPProxyInboundChannel.start(TCPProxyInboundChannel.java:153)
  at com.ibm.ws.channel.framework.impl.ChannelFrameworkImpl.startChannelInChain(ChannelFrameworkImpl.java:1410)
  at com.ibm.ws.channel.framework.impl.ChannelFrameworkImpl.startChainInternal(ChannelFrameworkImpl.java:2863)
  at com.ibm.ws.channel.framework.impl.WSChannelFrameworkImpl.startChainInternal(WSChannelFrameworkImpl.java:960)
  at com.ibm.ws.channel.framework.impl.ChannelFrameworkImpl.startChainInternal(ChannelFrameworkImpl.java:2794)
  at com.ibm.ws.channel.framework.impl.ChannelFrameworkImpl.startChain(ChannelFrameworkImpl.java:2779)
  at com.ibm.ws.runtime.component.ChannelFrameworkServiceImpl.startChain(ChannelFrameworkServiceImpl.java:666)
  at com.ibm.ws.sib.jfapchannel.framework.impl.ChannelFrameworkReference$TCPProxyBridgeServiceInboundChainStartupRunnable.run(ChannelFrameworkReference.java:1641)
  at com.ibm.ws.util.ThreadPool$Worker.run(ThreadPool.java:1550)
Caused by: com.ibm.ws.tcpchannelproxy.jfap.NotYetInitializedException: Server is not yet initialized
  at com.ibm.ws.tcpchannelproxy.jfap.TCPProxyBridgeServicesImpl.startListening(TCPProxyBridgeServicesImpl.java:558)
  at com.ibm.ws.tcpchannelproxy.jfap.impl.TCPProxyInboundChannel.start(TCPProxyInboundChannel.java:131)
  ... 8 more
Eventually the following message should be displayed indicating that the z/OS TCP proxy channel has started up correctly:
Trace: 2009/06/17 08:24:51.449 01 t=9C6B58 c=UNK key=P8 (13007002)
   ThreadId: 00000003
   FunctionName: com.ibm.ws.channel.framework.impl.WSChannelFrameworkImpl
   SourceId: com.ibm.ws.channel.framework.impl.WSChannelFrameworkImpl
   Category: AUDIT
   ExtendedMessage: BBOO0222I: CHFW0019I: The Transport Channel Service has started  chain _InboundTCPProxyBridgeService.
Messaging engine failover is not supported for mixed version clusters

A messaging engine that is hosted on WebSphere Application Server v8.5 cannot fail over to a server that is hosted on a different WebSphere Application Server version. If you have a cluster bus member that consists of servers that are hosted on different WebSphere Application Server versions, you must ensure that the high availability policy is configured to prevent such failovers.
To prevent failover of a v8.5 messaging engine to a server that is hosted on a different version, configure the high availability policy for the messaging engine so that the cluster is effectively divided into two sets of servers: one set for v8.5 and another set for the servers that are hosted on different versions. The high availability configuration ensures that the v8.5 messaging engine is restricted to the v8.5 servers only. For more information, see Configure messaging engine failover for mixed version clusters.

Related concepts

Messaging engines
Service integration high availability and workload sharing configurations
Troubleshooting help from IBM

Related tasks

Listing the messaging engines defined for a server bus member
Listing the messaging engines for a cluster bus member
Add a messaging engine to a cluster
Remove a messaging engine from a cluster
Configure messaging engines
Listing the messaging engines in a bus
Add additional messaging engines to a cluster bus member
Remove a messaging engine from a bus
Configure messaging engine properties
Displaying the runtime properties of a messaging engine
Start a messaging engine
Stopping a messaging engine
Manage messaging engines with administrative commands
Create the database, schema and user ID for a messaging engine
Configure messaging engine failover for mixed version clusters
Configure high availability and workload sharing of service integration
Use High Performance Extensible Logging to troubleshoot applications