MQ: Looking at problems in more detail

Contents

Have you failed to receive a response from a PCF command?
Are some of your queues failing?
Does the problem affect only remote queues?
Is your application or system running slowly?

Messages that do not appear on the queue

If messages do not appear when you are expecting them, check for the following:

Has the message been put on the queue successfully?

Has the queue been defined correctly? For example, is MAXMSGL sufficiently large?
Is the queue enabled for putting?
Is the queue already full?
Has another application got exclusive access to the queue?

Are you able to get any messages from the queue?

Do you need to take a syncpoint?
If messages are being put or retrieved within syncpoint, they are not available to other tasks until the unit of recovery has been committed.
Is your wait interval long enough?
You can set the wait interval as an option for the MQGET call. Ensure that you are waiting long enough for a response.
Are you waiting for a specific message that is identified by a message or correlation identifier (MsgId or CorrelId)?
Check that you are waiting for a message with the correct MsgId or CorrelId. A successful MQGET call sets both of these values to those of the message retrieved, so you might need to reset them (for example, to MQMI_NONE and MQCI_NONE) before you can get another message successfully.
Also, check whether you can get other messages from the queue.
Can other applications get messages from the queue?
Was the message you are expecting defined as persistent?
If not, and WebSphere MQ has been restarted, the message has been lost.
Has another application got exclusive access to the queue?
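
The MsgId and CorrelId check above can be illustrated with a small in-memory model. This is only a sketch: ToyQueue and its methods are hypothetical stand-ins, not part of any WebSphere MQ API, and the model assumes that an all-zero identifier (as with MQMI_NONE and MQCI_NONE) means "match any message".

```python
MQMI_NONE = bytes(24)  # all-zero MsgId: match any message
MQCI_NONE = bytes(24)  # all-zero CorrelId: match any message


class ToyQueue:
    """Hypothetical in-memory stand-in for a queue (not an MQ API)."""

    def __init__(self):
        self._messages = []

    def put(self, msg_id, correl_id, body):
        self._messages.append((msg_id, correl_id, body))

    def get(self, md):
        """Return the first message matching md, or None if nothing matches.

        Like MQGET, a successful call overwrites md's MsgId and CorrelId
        with the identifiers of the message that was retrieved.
        """
        for index, (msg_id, correl_id, body) in enumerate(self._messages):
            id_ok = md["MsgId"] in (MQMI_NONE, msg_id)
            correl_ok = md["CorrelId"] in (MQCI_NONE, correl_id)
            if id_ok and correl_ok:
                del self._messages[index]
                md["MsgId"], md["CorrelId"] = msg_id, correl_id
                return body
        return None  # stands in for MQRC_NO_MSG_AVAILABLE


queue = ToyQueue()
queue.put(b"A".ljust(24, b"\0"), MQCI_NONE, "first")
queue.put(b"B".ljust(24, b"\0"), MQCI_NONE, "second")

md = {"MsgId": MQMI_NONE, "CorrelId": MQCI_NONE}
first = queue.get(md)   # succeeds and leaves md["MsgId"] set to message A's id
stale = queue.get(md)   # still asking for message A, so nothing matches

# Reset the identifiers before asking for the next message.
md["MsgId"], md["CorrelId"] = MQMI_NONE, MQCI_NONE
second = queue.get(md)
```

In this model the second call fails only because the descriptor still carries the identifier of the message already retrieved; resetting both fields makes the remaining message visible again.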

If you cannot find anything wrong with the queue, and WebSphere MQ is running, check the process that you expected to put the message onto the queue for the following:

Did the application start?
If it should have been triggered, check that the correct trigger options were specified.
Did the application stop?
Is a trigger monitor running?
Was the trigger process defined correctly?
Did the application complete correctly?
Look for evidence of an abnormal end in the job log.
Did the application commit its changes, or were they backed out?

If multiple transactions are serving the queue, they can conflict with one another. For example, suppose one transaction issues an MQGET call with a buffer length of zero to find out the length of the message, and then issues a specific MQGET call specifying the MsgId of that message. However, in the meantime, another transaction issues a successful MQGET call for that message, so the first application receives a reason code of MQRC_NO_MSG_AVAILABLE. Applications that are expected to run in a multiple server environment must be designed to cope with this situation.
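
One way to cope with this situation is to treat MQRC_NO_MSG_AVAILABLE on the second call as "another server took the message" and simply look for the next one. The sketch below models that idea in memory; ToyQueue, peek_length, get_by_id, and serve_one are hypothetical names used for illustration, not WebSphere MQ calls.

```python
MQRC_NONE = 0
MQRC_NO_MSG_AVAILABLE = 2033


class ToyQueue:
    """Hypothetical in-memory queue keyed by MsgId (not an MQ API)."""

    def __init__(self, messages):
        self._messages = dict(messages)  # msg_id -> body, in arrival order

    def peek_length(self):
        """Model of the zero-length MQGET: report (msg_id, length), remove nothing."""
        for msg_id, body in self._messages.items():
            return msg_id, len(body)
        return None

    def get_by_id(self, msg_id):
        """Model of the specific MQGET: remove and return the named message."""
        body = self._messages.pop(msg_id, None)
        if body is None:
            return MQRC_NO_MSG_AVAILABLE, None
        return MQRC_NONE, body


def serve_one(queue, between_calls=None):
    """Fetch one message, tolerating a competing server that gets there first."""
    while True:
        found = queue.peek_length()
        if found is None:
            return None  # queue is empty
        msg_id, _length = found
        if between_calls is not None:
            between_calls(queue, msg_id)  # simulate the other transaction
        reason, body = queue.get_by_id(msg_id)
        if reason == MQRC_NO_MSG_AVAILABLE:
            continue  # lost the race for this message; try the next one
        return body


stolen = []


def other_server(queue, msg_id):
    # Takes the message exactly once, between our two calls.
    if not stolen:
        stolen.append(queue.get_by_id(msg_id)[1])


queue = ToyQueue([("id-1", b"alpha"), ("id-2", b"beta")])
result = serve_one(queue, other_server)
```

Here the competing server removes the first message between the two calls, and serve_one recovers by moving on to the second message instead of failing.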
Consider that the message could have been received, but that your application failed to process it in some way. For example, did an error in the expected format of the message cause your program to reject it? If this is the case, refer to Messages that contain unexpected or corrupted information.

Messages that contain unexpected or corrupted information

If the information contained in the message is not what your application was expecting, or has been corrupted in some way, consider the following:

Has your application, or the application that put the message onto the queue, changed?
Ensure that all changes are simultaneously reflected on all systems that need to be aware of the change.
For example, the format of the message data might have been changed, in which case, both applications must be recompiled to pick up the changes. If one application has not been recompiled, the data will appear corrupt to the other.
Is an application sending messages to the wrong queue?
Check that the messages your application is receiving are not really intended for an application servicing a different queue. If necessary, change your security definitions to prevent unauthorized applications from putting messages on to the wrong queues.
If your application uses an alias queue, check that the alias points to the correct queue.
Has the trigger information been specified correctly for this queue?
Check whether your application should have been started, or whether a different application should have been started instead.

If these checks do not enable you to solve the problem, check your application logic, both for the program sending the message, and for the program receiving it.

Problems with incorrect output when using distributed queues

If your application uses distributed queues, consider the following points:

Has WebSphere MQ been correctly installed on both the sending and receiving systems, and correctly configured for distributed queuing?
Are the links available between the two systems?
Check that both systems are available, and connected to WebSphere MQ. Check that the connection between the two systems is active.

You can use the MQSC command PING against either the queue manager (PING QMGR) or the channel (PING CHANNEL) to verify that the link is operable.
Is triggering set on in the sending system?
Is the message for which you are waiting a reply message from a remote system?
Check that triggering is activated in the remote system.
Is the queue already full?
If so, check if the message has been put onto the dead-letter queue.

The dead-letter queue header contains a reason or feedback code explaining why the message could not be put onto the target queue.
Is there a mismatch between the sending and receiving queue managers?
For example, the message length could be longer than the receiving queue manager can handle.
Are the channel definitions of the sending and receiving channels compatible?
For example, a mismatch in sequence number wrap can stop the distributed queuing component. See WebSphere MQ Intercommunication for more information about distributed queuing.
Is data conversion involved?
If the data formats between the sending and receiving applications differ, data conversion is necessary. Automatic conversion occurs when the MQGET call is issued if the format is recognized as one of the built-in formats.
If the data format is not recognized for conversion, the data conversion exit is taken to allow you to perform the translation with your own routines.
Refer to the WebSphere MQ Application Programming Guide for further details of data conversion.
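
The reason code in the dead-letter queue header mentioned above can be read programmatically once the message data has been retrieved. The following is a minimal sketch that assumes the message body begins with an MQDLH structure, that integer fields are little-endian unless stated otherwise, and that character fields are ASCII. The function name parse_dlh, the short reason table, and the sample values are illustrative assumptions.

```python
import struct

# MQDLH fields in order: StrucId, Version, Reason, DestQName, DestQMgrName,
# Encoding, CodedCharSetId, Format, PutApplType, PutApplName, PutDate, PutTime.
MQDLH_LAYOUT = "4sii48s48sii8si28s8s8s"

# A few common reason codes; the table is deliberately incomplete.
REASON_NAMES = {
    2030: "MQRC_MSG_TOO_BIG_FOR_Q",
    2035: "MQRC_NOT_AUTHORIZED",
    2051: "MQRC_PUT_INHIBITED",
    2053: "MQRC_Q_FULL",
    2085: "MQRC_UNKNOWN_OBJECT_NAME",
}


def parse_dlh(data, big_endian=False):
    """Extract the reason and original destination from a dead-letter header."""
    layout = (">" if big_endian else "<") + MQDLH_LAYOUT
    fields = struct.unpack_from(layout, data)
    struc_id, _version, reason, dest_q, dest_qmgr = fields[:5]
    if struc_id != b"DLH ":
        raise ValueError("message data does not start with a dead-letter header")
    return {
        "reason": reason,
        "reason_name": REASON_NAMES.get(reason, "not in this table"),
        "dest_queue": dest_q.decode("ascii").rstrip(" \0"),
        "dest_qmgr": dest_qmgr.decode("ascii").rstrip(" \0"),
    }


# Hypothetical header built for demonstration only.
sample = struct.pack(
    "<" + MQDLH_LAYOUT,
    b"DLH ", 1, 2053,
    b"ORDERS".ljust(48), b"QM2".ljust(48),
    546, 437, b"MQSTR   ", 6,
    b"amqsput".ljust(28), b"20240101", b"12000000",
)
parsed = parse_dlh(sample + b"original message data")
```

For the sample header, the parsed result identifies a full target queue as the reason the message was diverted.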

Have you failed to receive a response from a PCF command?

If you have issued a command but have not received a response, consider the following:

Is the command server running?
Use the dspmqcsv command to check the status of the command server.

If the response to this command indicates that the command server is not running, use the strmqcsv command to start it.
If the response to the command indicates that the SYSTEM.ADMIN.COMMAND.QUEUE is not enabled for MQGET requests, enable the queue for MQGET requests.

Has a reply been sent to the dead-letter queue?
The dead-letter queue header structure contains a reason or feedback code describing the problem.
If the dead-letter queue contains messages, you can use the provided browse sample application (amqsbcg) to browse the messages using the MQGET call. The sample application steps through all the messages on a named queue for a named queue manager, displaying both the message descriptor and the message context fields for all the messages on the named queue.
Has a message been sent to the error log?
See Error logs for further information.
Are the queues enabled for put and get operations?
Is the WaitInterval long enough?
If your MQGET call has timed out, a completion code of MQCC_FAILED and a reason code of MQRC_NO_MSG_AVAILABLE are returned.
If you are using your own application program to put commands onto the SYSTEM.ADMIN.COMMAND.QUEUE, do you need to take a syncpoint?
Unless you have specifically excluded your request message from syncpoint, you need to take a syncpoint before receiving reply messages.
Are the MAXDEPTH and MAXMSGL attributes of your queues set sufficiently high?
Are you using the CorrelId and MsgId fields correctly?
Set the values of MsgId and CorrelId in your application (for example, to MQMI_NONE and MQCI_NONE) to ensure that you receive all messages from the queue.

Try stopping the command server and then restarting it, responding to any error messages that are produced.
If the system still does not respond, the problem could be with either a queue manager or the whole of the WebSphere MQ system. First, try stopping individual queue managers to isolate a failing queue manager. If this does not reveal the problem, try stopping and restarting WebSphere MQ, responding to any messages that are produced in the error log.
If the problem still occurs after restart, contact your IBM Support Center for help.

Are some of your queues failing?

If you suspect that the problem occurs with only a subset of queues, check the local queues that you think are having problems:

Display the information about each queue. You can use the MQSC command DISPLAY QUEUE to display the information.
Use the data displayed to do the following checks:

If CURDEPTH is at MAXDEPTH, the queue is full, which indicates that it is not being processed. Check that all applications are running normally.
If CURDEPTH is not at MAXDEPTH, check the following queue attributes to ensure that they are correct:

If triggering is being used:

Is the trigger monitor running?
Is the trigger depth too great? That is, does it generate a trigger event often enough?
Is the process name correct?
Is the process available and operational?

Can the queue be shared? If not, another application could already have it open for input.
Is the queue enabled appropriately for GET and PUT?

If there are no application processes getting messages from the queue, determine why this is so. It could be because the applications need to be started, a connection has been disrupted, or the MQOPEN call has failed for some reason.
Check the queue attributes IPPROCS and OPPROCS. These attributes indicate whether the queue has been opened for input and output. If a value is zero, it indicates that no operations of that type can occur. The values might have changed; the queue might have been open but is now closed.
You need to check the status at the time you expect to put or get a message.
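
The checks above can be applied mechanically to captured DISPLAY QUEUE output. The sketch below assumes the usual NAME(value) layout of MQSC responses; the sample text, the queue name, and the function names parse_display and diagnose are illustrative assumptions rather than anything defined by WebSphere MQ.

```python
import re

# Matches MQSC-style attribute pairs such as CURDEPTH(5000).
ATTRIBUTE = re.compile(r"(\w+)\(([^)]*)\)")


def parse_display(text):
    """Turn DISPLAY QUEUE output into a dict of attribute name -> value."""
    return {name: value for name, value in ATTRIBUTE.findall(text)}


def diagnose(attrs):
    """Return findings based on the queue checks described in the text."""
    findings = []
    if int(attrs["CURDEPTH"]) >= int(attrs["MAXDEPTH"]):
        findings.append("queue is full; check that consuming applications are running")
    if int(attrs["IPPROCS"]) == 0:
        findings.append("no application has the queue open for input")
    if int(attrs["OPPROCS"]) == 0:
        findings.append("no application has the queue open for output")
    if attrs.get("GET") == "DISABLED":
        findings.append("queue is not enabled for GET")
    if attrs.get("PUT") == "DISABLED":
        findings.append("queue is not enabled for PUT")
    return findings


# Hypothetical output captured from a DISPLAY QUEUE command.
SAMPLE = """\
AMQ8409: Display Queue details.
   QUEUE(ORDERS)                           TYPE(QLOCAL)
   CURDEPTH(5000)                          MAXDEPTH(5000)
   IPPROCS(0)                              OPPROCS(2)
   GET(ENABLED)                            PUT(DISABLED)
"""

attrs = parse_display(SAMPLE)
findings = diagnose(attrs)
```

Because IPPROCS and OPPROCS change as applications open and close the queue, the output should be captured at the time the put or get is expected to happen.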

If you are unable to solve the problem, contact your IBM Support Center for help.

Does the problem affect only remote queues?

If the problem affects only remote queues:

Check that the required channels have started and can be triggered, and that any required initiators are running.
Check that the programs that should be putting messages to the remote queues have not reported problems.
If you use triggering to start the distributed queuing process, check that the transmission queue has triggering set on. Also, check that the trigger monitor is running.
Check the error logs for messages indicating channel errors or problems.
If necessary, start the channel manually. See WebSphere MQ Intercommunication for information about starting channels.

Is your application or system running slowly?

If your application is running slowly, it might be in a loop or waiting for a resource that is not available.
This might also indicate a performance problem. Perhaps your system is operating near the limits of its capacity. This type of problem is probably worst at peak system load times, typically at mid-morning and mid-afternoon. (If your network extends across more than one time zone, peak system load might seem to occur at some other time.)
A performance problem might be caused by a limitation of your hardware.
If you find that performance degradation does not depend on system loading, but sometimes happens when the system is lightly loaded, a poorly designed application program is probably to blame. This could appear to be a problem that occurs only when certain queues are accessed.
The following symptoms might indicate that WebSphere MQ is running slowly:

Your system is slow to respond to MQSC commands.
Repeated displays of the queue depth indicate that the queue is being processed slowly for an application with which you would expect a large amount of queue activity.
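
The second symptom can be put into numbers by recording the queue depth at known times and computing the net change per second. This is a minimal sketch; the function name net_depth_rate and the sample readings are hypothetical, and the result reflects puts minus gets rather than the consumer's throughput alone.

```python
def net_depth_rate(samples):
    """Return the net change in queue depth per second.

    samples is a list of (seconds, depth) readings in time order.
    A negative result means the queue is draining; a value near zero
    on a busy queue suggests messages arrive as fast as they are processed.
    """
    (start_time, start_depth) = samples[0]
    (end_time, end_depth) = samples[-1]
    elapsed = end_time - start_time
    if elapsed <= 0:
        raise ValueError("samples must span a positive time interval")
    return (end_depth - start_depth) / elapsed


# Hypothetical CURDEPTH readings taken 30 seconds apart.
readings = [(0, 1000), (30, 940), (60, 880)]
rate = net_depth_rate(readings)  # -2.0 messages per second
```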

If the performance of your system is still degraded after reviewing the above possible causes, the problem might lie with WebSphere MQ itself. If you suspect this, contact your IBM Support Center for help.

Tuning performance for nonpersistent messages on AIX

If you are using AIX, consider setting the following tuning parameter to get full performance for nonpersistent messages. To set this tuning parameter, a root user must issue the command:
/usr/samples/kernel/vmtune -c 0

The effect of this command persists until the next reboot. An AIX administrator can add a line such as...
vmtune:2:once:/usr/samples/kernel/vmtune -c 0

...to the system /etc/inittab file to cause the command to be issued on every reboot. This command is made available by installing the bos.adt.samples fileset from the AIX installation CDs.
Normally, nonpersistent messages are kept only in memory, but there are circumstances where AIX can schedule nonpersistent messages to be written to disk. Messages scheduled to be written to disk are unavailable for MQGET until the disk write completes. The suggested tuning command varies this threshold; instead of scheduling messages to be written to disk when 16 kilobytes of data are queued, the write-to-disk occurs only when real storage on the machine becomes close to full.

WebSphere, IBM, and AIX are trademarks of the IBM Corporation in the United States, other countries, or both.