Resolving problem: MQTT client does not connect
Resolve the problem of an MQTT client program failing to connect to the telemetry (MQXR) service.
Before starting
Is the problem at the server, at the client, or with the connection? Have you written your own MQTT v3 protocol handling client, or an MQTT client application using the C or Java IBM MQTT clients?
See Verify the installation of MQ Telemetry for further information, and check that the telemetry channel and telemetry (MQXR) service are running correctly.
There are a number of reasons why an MQTT client might not connect, or why you might conclude that it has not connected, to the telemetry server.
Procedure
-
Consider what inferences can be drawn from the reason code that the telemetry (MQXR) service
returned to MqttClient.Connect. What type of connection failure is it?
Option: Description
REASON_CODE_INVALID_PROTOCOL_VERSION: Make sure that the socket address corresponds to a telemetry channel, and that you have not used the same socket address for another broker.
REASON_CODE_INVALID_CLIENT_ID: Check that the client identifier is no longer than 23 bytes, and contains only characters from the range: A-Z, a-z, 0-9, './_%
REASON_CODE_SERVER_CONNECT_ERROR: Check that the telemetry (MQXR) service and the queue manager are running normally. Use netstat to check that the socket address is not allocated to another application.
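The client identifier rule for REASON_CODE_INVALID_CLIENT_ID can be checked before the client ever tries to connect. The following is a minimal sketch that uses only the JDK; the class and method names are hypothetical, and the rejection of an empty identifier is an assumption based on the MQTT v3.1 requirement of 1 to 23 characters rather than something stated in this topic.

```java
import java.nio.charset.StandardCharsets;

final class ClientIdCheck {
    // Punctuation allowed in addition to A-Z, a-z and 0-9, as listed in the table.
    private static final String ALLOWED_PUNCTUATION = "'./_%";
    private static final int MAX_BYTES = 23;

    /** Returns null if the identifier passes both checks, otherwise a short reason. */
    static String findProblem(String clientId) {
        // Assumption: an empty identifier is rejected too (MQTT v3.1 asks for 1 to 23 characters).
        if (clientId == null || clientId.isEmpty()) {
            return "client identifier is empty";
        }
        // The limit is expressed in bytes, so measure the encoded length.
        int byteLength = clientId.getBytes(StandardCharsets.UTF_8).length;
        if (byteLength > MAX_BYTES) {
            return "client identifier is " + byteLength + " bytes; the limit is " + MAX_BYTES;
        }
        for (int i = 0; i < clientId.length(); i++) {
            char c = clientId.charAt(i);
            boolean allowed = (c >= 'A' && c <= 'Z')
                    || (c >= 'a' && c <= 'z')
                    || (c >= '0' && c <= '9')
                    || ALLOWED_PUNCTUATION.indexOf(c) >= 0;
            if (!allowed) {
                return "character '" + c + "' at index " + i + " is not allowed";
            }
        }
        return null;
    }
}
```

Calling ClientIdCheck.findProblem with the identifier that the application passes to the MQTT client gives an early hint, for example that a hyphen or a space is the cause of the rejection.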
If you have written an MQTT client library rather than use one of the libraries provided by MQ Telemetry, look at the CONNACK return code.
From these three errors you can infer that the client has connected to the telemetry (MQXR) service, but the service has found an error.
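If you handle the protocol yourself, the CONNACK return code is the fourth byte of the reply. The following sketch decodes it using the return code values defined in the MQTT v3.1 specification. The class name is hypothetical, and how each value maps to the reason codes of a particular client library is not shown here.

```java
final class ConnackDecoder {
    /** Describes the return code carried in a raw MQTT v3.1 CONNACK packet. */
    static String describe(byte[] packet) {
        // A CONNACK has message type 2 in the high nibble of byte 0
        // and a remaining length of exactly 2 in byte 1.
        if (packet == null || packet.length < 4
                || (packet[0] & 0xF0) != 0x20 || packet[1] != 0x02) {
            return "not a CONNACK packet";
        }
        // Byte 2 is unused in v3.1; byte 3 is the return code.
        int returnCode = packet[3] & 0xFF;
        switch (returnCode) {
            case 0: return "connection accepted";
            case 1: return "connection refused: unacceptable protocol version";
            case 2: return "connection refused: identifier rejected";
            case 3: return "connection refused: server unavailable";
            case 4: return "connection refused: bad user name or password";
            case 5: return "connection refused: not authorized";
            default: return "reserved return code " + returnCode;
        }
    }
}
```

Any CONNACK at all, even a refusal, confirms that the request reached a server that speaks MQTT, which is the inference drawn in this step.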
-
Consider what inferences can be drawn from the reason codes that the client produces when the
telemetry (MQXR) service does not respond:
Option: Description
REASON_CODE_CLIENT_EXCEPTION, REASON_CODE_CLIENT_TIMEOUT: Look for an FDC file at the server; see Server-side logs. When the telemetry (MQXR) service detects the client has timed out, it writes a first-failure data capture (FDC) file. It writes an FDC file whenever the connection is unexpectedly broken.
The telemetry (MQXR) service might not have responded to the client, so the timeout at the client expired. The MQ Telemetry Java client only hangs if the application has set an indefinite timeout. The client throws one of these exceptions when the timeout set for MqttClient.Connect expires with an undiagnosed connection problem.
Unless you find an FDC file that correlates with the connection failure, you cannot infer that the client tried to connect to the server:
-
Confirm that the client sent a connection request.
Check the TCP/IP request with a tool such as tcpmon, available from (for example) https://code.google.com/archive/p/tcpmon/
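When you look at captured bytes, it helps to know what a connection request looks like. The following sketch checks whether a captured byte sequence begins with an MQTT CONNECT packet, based on the packet layout in the MQTT specification: message type 1 in the first byte, a variable-length remaining length, then the protocol name ("MQIsdp" for v3.1, "MQTT" for v3.1.1). The class name is hypothetical.

```java
import java.nio.charset.StandardCharsets;

final class ConnectPacketCheck {
    /** Returns the protocol name if the bytes start with an MQTT CONNECT packet, else null. */
    static String connectProtocolName(byte[] data) {
        // Message type 1 (CONNECT) sits in the high nibble of the first byte.
        if (data == null || data.length < 2 || (data[0] & 0xF0) != 0x10) {
            return null;
        }
        // Skip the remaining-length field: 1 to 4 bytes, continuation flagged by bit 0x80.
        int pos = 1;
        int lengthBytes = 0;
        while (true) {
            if (pos >= data.length || lengthBytes == 4) {
                return null;
            }
            byte b = data[pos++];
            lengthBytes++;
            if ((b & 0x80) == 0) {
                break;
            }
        }
        // The variable header starts with a length-prefixed protocol name.
        if (pos + 2 > data.length) {
            return null;
        }
        int nameLength = ((data[pos] & 0xFF) << 8) | (data[pos + 1] & 0xFF);
        pos += 2;
        if (pos + nameLength > data.length) {
            return null;
        }
        String name = new String(data, pos, nameLength, StandardCharsets.UTF_8);
        return (name.equals("MQIsdp") || name.equals("MQTT")) ? name : null;
    }
}
```

If the capture shows no bytes of this shape leaving the client, the client did not send a connection request to that address.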
-
Does the remote socket address used by the client match the socket address defined for the
telemetry channel?
The default file persistence class in the Java SE MQTT client supplied with IBM MQ Telemetry creates a folder with the name: clientIdentifier-tcphostNameport or clientIdentifier-sslhostNameport in the client working directory. The folder name tells you the hostName and port used in the connection attempt; see Client-side log files and client-side configuration files.
- Can you ping the remote server address?
- Does netstat on the server show that the telemetry channel is running on the port the client is connecting to?
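The address and port checks above can also be run from the client machine with a plain TCP probe. The following is a minimal sketch using only the JDK; the class name is hypothetical. A successful probe shows only that something is listening at that socket address, not that the listener is a telemetry channel.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

final class PortProbe {
    /** Returns true if a TCP connection to host:port succeeds within the timeout. */
    static boolean canConnect(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            // connect() throws if the address is unreachable, the port is closed,
            // or the timeout elapses first.
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            return false;
        }
    }
}
```

For example, PortProbe.canConnect("mqhost.example.com", 1883, 3000) probes a placeholder host on port 1883, which is assumed here to be the port configured for the telemetry channel; substitute the socket address that your client actually uses.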
-
Check whether the telemetry (MQXR) service found a problem in the client request.
The telemetry (MQXR) service writes errors it detects into mqxr_n.log, and the queue manager writes errors into AMQERR01.LOG ; see
-
Attempt to isolate the problem by running another client.
See Verify the installation of MQ Telemetry for further information.
Run the sample programs on the server platform to eliminate uncertainties about the network connection, then run the samples on the client platform.
-
Other things to check:
-
Are tens of thousands of MQTT clients trying to connect at the same time?
Telemetry channels have a queue to buffer a backlog of incoming connections. Connections are processed at a rate in excess of 10,000 a second. The size of the backlog buffer is configurable using the telemetry channel wizard in IBM MQ Explorer. Its default size is 4096. Check that the backlog has not been configured to a low value.
- Are the telemetry (MQXR) service and queue manager still running?
- Has the client connected to a high availability queue manager that has switched its TCP/IP address?
- Is a firewall selectively filtering outbound or return data packets?
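To judge whether the backlog setting matters for your workload, the figures quoted for the backlog buffer can be turned into a rough estimate. The following sketch is a deliberately crude model, not a description of how the telemetry (MQXR) service behaves internally: it assumes that a burst arrives all at once and that connections are processed at a steady rate. The class and method names are hypothetical.

```java
final class BacklogEstimate {
    /** Seconds needed to work through a full backlog at a steady processing rate. */
    static double secondsToDrain(int backlogSize, double connectionsPerSecond) {
        if (connectionsPerSecond <= 0) {
            throw new IllegalArgumentException("rate must be positive");
        }
        return backlogSize / connectionsPerSecond;
    }

    /** Connections from an instantaneous burst that do not fit in the backlog. */
    static int overflow(int burstSize, int backlogSize) {
        return Math.max(0, burstSize - backlogSize);
    }
}
```

With the default backlog of 4096 and a rate of 10,000 connections a second, secondsToDrain gives about 0.41 seconds. Under the same crude model, an instantaneous burst of 50,000 connections leaves 45,904 that do not fit in the default backlog, which is why a low backlog value is worth checking when very large numbers of clients connect together.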