Required information for highly available agents in Managed File Transfer

There are various types of information that we need to know about standard or bridge MFT agents that are running in a high availability configuration. This information includes the different methods by which the agent starts, how to identify the instance of the agent in the log file, and status information for the agent.

Starting an agent

An instance of an agent is running in a non-HA mode elsewhere

If an attempt is made to start another instance of the agent that is not configured as an HA agent, a check is first made to see whether a lock can be acquired on the SYSTEM.FTE.HA.<agent name> queue.

Because the other instance was started in a non-HA mode, the lock on the SYSTEM.FTE.HA.<agent name> queue is acquired by this instance. The agent continues initialization, but fails at a later point because the command queue is opened exclusively by another instance.

In this case, the messages shown in the following example are logged to the output0.log file of the agent, and the agent continues to attempt to open the command queue every 30 seconds:

BFGMQ1045I: Agent's system queue 'SYSTEM.FTE.COMMAND.SRC' is configured as either NOSHARE or 
DEFSOPT(EXCL).

BFGAG0035W: The agent received MQI reason code 2042 when trying to open queue 
'SYSTEM.FTE.COMMAND.SRC' on the queue manager 'MFTHAQM' with connection name 'localhost(1414)' 
and channel 'MFT_HA_CHN'.  The agent will try the operation again every 30 seconds.

An instance of an agent is running in an HA mode elsewhere

If an attempt is made to start another instance of the agent that is not configured as an HA agent, a check is first made to see whether a lock can be acquired on the SYSTEM.FTE.HA.<agent name> queue.

Because the other instance has been running as an active instance, the attempt to acquire a lock fails. The instance fails to start, and the following error message is logged to the output0.log file of the agent:

BFGAG0194E: An instance of this agent is already running elsewhere. 
Hence this instance cannot continue and will end.
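
The two start-up cases above can be summarized as a small decision model. The following Python sketch is illustrative only: the names StartupOutcome and startup_outcome, and the two boolean parameters, are hypothetical and do not correspond to any MFT API. The message identifiers and the 30-second retry interval are taken from the log examples above. The sketch covers only these two cases, not the start-up of an HA-configured instance as a standby.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class StartupOutcome:
    starts: bool                   # True if the instance completes initialization
    message_id: Optional[str]      # message expected in output0.log, if any
    retry_seconds: Optional[int]   # retry interval for opening the command queue, if any


def startup_outcome(ha_lock_available: bool, command_queue_in_use: bool) -> StartupOutcome:
    """Model the checks described above (hypothetical helper, not an MFT API)."""
    if not ha_lock_available:
        # Another instance holds the lock on the HA queue: this instance ends.
        return StartupOutcome(False, "BFGAG0194E", None)
    if command_queue_in_use:
        # Lock acquired, but the command queue is opened exclusively elsewhere.
        return StartupOutcome(False, "BFGAG0035W", 30)
    # Neither condition applies: initialization can complete.
    return StartupOutcome(True, None, None)
```

For example, startup_outcome(ha_lock_available=True, command_queue_in_use=True) models the first case, where the agent logs BFGAG0035W and retries every 30 seconds.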

Starting the agent as a Windows service

On Windows, we can start an agent as a Windows service.

During startup, Windows starts the MFT agent in normal or HA mode. If the agent is configured to run in HA mode, the service runs as an active or standby instance, depending on which instance acquires the lock first.

Identifying the instance type of an agent in the log file

Information messages are written to the output0.log file of the agent to indicate the type of instance. When an agent instance starts as an active instance, the following message is written:

BFGAG0193I: The agent has successfully initialized as an active instance.

When an agent instance starts as a standby instance, the following message is written:

BFGAG0193I: The agent has successfully initialized as a standby instance.
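
As a rough illustration of how these messages could be used, the following Python sketch scans log text for the two messages shown above and reports the most recent instance type. The function name instance_type is hypothetical, and the pattern assumes the message text appears in the log exactly as quoted here.

```python
import re
from typing import Optional

# Pattern built from the two message texts quoted above.
_INSTANCE_RE = re.compile(
    r"BFGAG0193I: The agent has successfully initialized as an? (active|standby) instance"
)


def instance_type(log_text: str) -> Optional[str]:
    """Return 'active' or 'standby' for the most recent matching message, else None."""
    matches = _INSTANCE_RE.findall(log_text)
    return matches[-1] if matches else None
```

The function takes the contents of output0.log as a string; where that file is located depends on the agent's configuration and is not assumed here.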

Agent status updates

As there are two instances of the same agent running, we need to have the information about both instances in the agent status publication.

Note that the active instance is the one publishing the status of both instances.

Standby instance

While publishing agent status, the active instance checks the age of the standby instance publication.

There are two additional properties in the agent.properties file for this purpose:

standbyStatusExpiry is the expiry time for the standby status message that is put to the command queue of the agent. The message expires if the active instance of the agent does not process it within that period.
By default, the value of standbyStatusExpiry is 30 seconds. The message is also put as a low priority (9) message so that transfer requests are processed ahead of standby status messages.
standbyStatusPublishInterval sets the frequency at which the standby instance publishes its state.
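
The expiry rule can be pictured with a short Python sketch. This is a simplified model, not how the queue manager implements message expiry: the function name is_expired and the use of plain timestamps in seconds are assumptions, and treating a message as expired at exactly the expiry period is a choice made for illustration.

```python
DEFAULT_STANDBY_STATUS_EXPIRY_SECONDS = 30  # default value stated above


def is_expired(put_time: float, now: float,
               expiry_seconds: float = DEFAULT_STANDBY_STATUS_EXPIRY_SECONDS) -> bool:
    """True if the message has waited unprocessed for the expiry period or longer."""
    return (now - put_time) >= expiry_seconds
```

With the default, a standby status message put at time 0 and still unprocessed at time 31 would be treated as expired, while one picked up at time 10 would not.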

Active instance

The active instance does the following to process status updates from the standby instance:

Gets the message from the SYSTEM.FTE.COMMAND.<agent name> queue and delegates the message processing to a worker thread.
The worker thread retrieves the contents from the message body, updates the agent status object with standby instance information, and notifies the agent status publisher to publish the status.
The agent status publisher publishes the status.
Note that the standby status information is cached as an optimization. When a publish request is made, the agent status publisher compares the new status with the cached status, and publishes only if there is a difference.
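
The caching behavior in the last step can be sketched as follows. This is a minimal model under stated assumptions, not the MFT implementation: the class name StandbyStatusPublisher, the dictionary representation of the status, and the publish callback are all hypothetical.

```python
from typing import Callable, Optional


class StandbyStatusPublisher:
    """Hypothetical model of publish-on-change; not the MFT implementation."""

    def __init__(self, publish: Callable[[dict], None]) -> None:
        self._publish = publish               # callback that performs the publication
        self._cached: Optional[dict] = None   # last standby status that was published

    def on_standby_status(self, status: dict) -> bool:
        """Publish only if the status differs from the cached one.

        Returns True if a publication was made, False if it was skipped.
        """
        if status == self._cached:
            return False
        self._cached = dict(status)  # keep a copy so later mutation does not affect the cache
        self._publish(status)
        return True
```

In this model, repeated standby updates that carry the same status result in a single publication, and a new publication is made only when the status changes.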

The following diagram describes the flow the active or standby instances follow to publish the status of an agent:
