Resolving problem: Lost messages in an MQTT application

Resolve the problem of losing a message. Is the message non-persistent, sent to the wrong place, or never sent? A wrongly coded client program might lose messages.


Before starting

How certain are you that the message you sent was lost? Can you infer that a message is lost simply because it was not received? If the message is a publication, which message is lost: the message sent by the publisher, or the message sent to the subscriber? Or did the subscription get lost, so that the broker is not sending publications for that subscription to the subscriber?

If the solution involves distributed publish/subscribe, using clusters or publish/subscribe hierarchies, there are numerous configuration issues that might result in the appearance of a lost message.

If you sent a message with At least once or Exactly once quality of service, it is likely that the message you think is lost was not delivered in the way you expected. It is unlikely that the message was wrongly deleted from the system. Your application might have failed to create the publication or the subscription you expected.

The most important step in problem determination of lost messages is to confirm that the message is lost. Re-create the scenario and lose more messages. Use the At least once or Exactly once quality of service to eliminate all cases of the system discarding messages.
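A small model makes the difference between the qualities of service concrete. The following Python sketch is a simplification written for illustration, not the behavior of any particular client library; the names QoS, InFlight, and survivors_after_session_break are hypothetical. It uses the MQTT protocol numbering (0 = At most once, also called Fire and forget; 1 = At least once; 2 = Exactly once) and assumes that, as in MQTT 3.1.1, a session started with cleanSession set to true keeps no state for resending.

```python
from dataclasses import dataclass
from enum import IntEnum


class QoS(IntEnum):
    """MQTT quality-of-service levels, numbered as in the MQTT protocol."""
    FIRE_AND_FORGET = 0  # "At most once": sent once, never stored for resending
    AT_LEAST_ONCE = 1    # stored until acknowledged; might be delivered twice
    EXACTLY_ONCE = 2     # stored until the two-phase handshake completes


@dataclass
class InFlight:
    """Hypothetical record of a message sent but not yet acknowledged."""
    payload: str
    qos: QoS


def survivors_after_session_break(in_flight: list[InFlight], clean_session: bool) -> list[InFlight]:
    """Return the in-flight messages that can be sent again after the TCP/IP session breaks.

    Simplified model (an assumption, not a client library's code): Fire and forget
    messages are never kept for resending, and a clean session discards all of its
    state, so only QoS 1 and 2 messages in a resumable session survive.
    """
    if clean_session:
        return []
    return [m for m in in_flight if m.qos >= QoS.AT_LEAST_ONCE]


if __name__ == "__main__":
    pending = [
        InFlight("a", QoS.FIRE_AND_FORGET),
        InFlight("b", QoS.AT_LEAST_ONCE),
        InFlight("c", QoS.EXACTLY_ONCE),
    ]
    print([m.payload for m in survivors_after_session_break(pending, clean_session=False)])  # ['b', 'c']
    print(survivors_after_session_break(pending, clean_session=True))  # []
```

In this model, a message that disappears while sent with QoS 1 or 2 in a resumable session points away from the protocol and toward configuration or client programming errors.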


There are four aspects to diagnosing a lost message.

  1. Fire and forget messages working as designed. Fire and forget messages are sometimes discarded by the system.
  2. Configuration: setting up publish/subscribe with the correct authorities in a distributed environment is not straightforward.
  3. Client programming errors: message delivery does not depend solely on code written by IBM.
  4. If you have exhausted all these possibilities, you might decide to involve IBM Support.


Procedure

  1. If the lost message had the Fire and forget quality of service, set the At least once or Exactly once quality of service, and attempt to lose the message again.

    • Messages sent with Fire and forget quality of service are thrown away by IBM MQ in a number of circumstances:

      • Communications loss and channel stopped.
      • Queue manager shut down.
      • Excessive number of messages.

    • The delivery of Fire and forget messages depends on the reliability of TCP/IP. TCP/IP retransmits data packets until their delivery is acknowledged, but if the TCP/IP session is broken, messages sent with the Fire and forget quality of service are lost. The session might be broken by the client or server shutting down, a communications problem, or a firewall disconnecting the session.

  2. Check that the client is restarting the previous session, so that undelivered messages with At least once or Exactly once quality of service are sent again.
    1. If the client application is using the Java SE MQTT client, check that it sets MqttClient.CleanSession to false.
    2. If you are using a different client library, check that the session is being restarted correctly.

  3. Check that the client application is restarting the same session, and not starting a different session by mistake.

    To restart the same session, set cleanSession to false, and use the same MqttClient.clientIdentifier and MqttClient.serverURI as the previous session.

  4. If a session closes prematurely, check that the message is available in the persistence store at the client to send again.
    1. If the client application is using the Java SE MQTT client, check that the message is being saved in the persistence folder; see Client-side log files and client-side configuration files.
    2. If you are using a different client library, or you have implemented your own persistence mechanism, check that it is working correctly.

  5. Check that no one has deleted the message before it was delivered.

    Undelivered messages awaiting delivery to MQTT clients are stored in SYSTEM.MQTT.TRANSMIT.QUEUE. Messages awaiting delivery to the telemetry server are stored by the client persistence mechanism; see Message persistence in MQTT clients.

  6. Check that the client has a subscription for the publication it expects to receive.

    List subscriptions by using IBM MQ Explorer, runmqsc, or PCF commands. All MQTT client subscriptions are named, in the form ClientIdentifier:Topic name

  7. Check that the publisher has authority to publish to the publication topic, and that the subscriber has authority to subscribe to it.
    dspmqaut -m qMgr -n topicName -t topic -p userID

    In a clustered publish/subscribe system, the subscriber must be authorized to the topic on the queue manager to which the subscriber is connected. It is not necessary for the subscriber to be authorized to subscribe to the topic on the queue manager where the publication is published. The channels between the queue managers must be correctly authorized to pass on the proxy subscription and forward the publication.

    Create the same subscription and publish to it using IBM MQ Explorer. Simulate the application client publishing and subscribing by using the client utility. Start the utility from IBM MQ Explorer and change its user ID to match the one adopted by your client application.

  8. Check that the subscriber has permission to put the publication on the SYSTEM.MQTT.TRANSMIT.QUEUE.
    dspmqaut -m qMgr -n queueName -t queue -p userID

  9. Check that the IBM MQ point-to-point application has authority to put its message on the SYSTEM.MQTT.TRANSMIT.QUEUE.
    dspmqaut -m qMgr -n queueName -t queue -p userID

    See Send a message to a client directly.
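The session rule in step 3, the subscription naming in step 6, and the authority checks in steps 7 to 9 can be captured in a small Python sketch for your own diagnostic scripts. SessionSettings, resumes_previous_session, mqtt_subscription_name, and dspmqaut_args are hypothetical helpers, not part of any MQTT client library or IBM MQ API. resumes_previous_session also assumes MQTT 3.1.1 semantics, under which a session created with cleanSession set to true ends with its network connection and cannot be resumed.

```python
from dataclasses import dataclass

TRANSMIT_QUEUE = "SYSTEM.MQTT.TRANSMIT.QUEUE"


@dataclass(frozen=True)
class SessionSettings:
    """Hypothetical record of the settings a client connects with."""
    client_identifier: str
    server_uri: str
    clean_session: bool


def resumes_previous_session(previous: SessionSettings, current: SessionSettings) -> bool:
    """Return True if `current` restarts `previous` instead of starting a new session.

    Both connections must use cleanSession=false (assumption: MQTT 3.1.1, where a
    clean session ends with its connection), and the client identifier and the
    server URI must match exactly.
    """
    return (
        not previous.clean_session
        and not current.clean_session
        and current.client_identifier == previous.client_identifier
        and current.server_uri == previous.server_uri
    )


def mqtt_subscription_name(client_identifier: str, topic: str) -> str:
    """Build the ClientIdentifier:Topic name under which a subscription is listed."""
    return f"{client_identifier}:{topic}"


def dspmqaut_args(qmgr: str, name: str, objtype: str, user_id: str) -> list[str]:
    """Build a dspmqaut command line using only the flags shown in steps 7 to 9."""
    return ["dspmqaut", "-m", qmgr, "-n", name, "-t", objtype, "-p", user_id]


if __name__ == "__main__":
    before = SessionSettings("sensor01", "tcp://mqtt.example.com:1883", clean_session=False)
    after = SessionSettings("sensor-01", "tcp://mqtt.example.com:1883", clean_session=False)
    print(resumes_previous_session(before, after))  # False: the client identifier changed
    print(mqtt_subscription_name("sensor01", "plant/temperature"))  # sensor01:plant/temperature
    print(" ".join(dspmqaut_args("QM1", TRANSMIT_QUEUE, "queue", "mqttuser")))
```

On a host where the IBM MQ control commands are installed, you could pass the list returned by dspmqaut_args to subprocess.run. Keeping the arguments as a list, rather than one string, avoids shell quoting problems with topic names.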
