
Timeout option for file transfers in recovery

We can set the amount of time, in seconds, during which a source agent keeps trying to recover a stalled file transfer. If the transfer is not successful when the agent reaches the timeout for the retry interval, the transfer fails.

Before Version 9.0.1, the default behavior of a Managed File Transfer source agent was to keep trying to recover a stalled transfer until it was successful. Because the new parameter is optional, we do not have to set it; if it is not set, transfers follow the default behavior.

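From Version 9.0.1, we can set the recovery retry timeout either for all the transfers of one source agent or for individual transfers, as described in the following sections.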


Set the recovery retry timeout for all the transfers for one source agent

To set a recovery timeout that applies to all the transfers for a source agent, add the parameter and value pair to the agent.properties file.

Setting the transfer recovery timeout value to -1 means that the agent continues to try to recover the stalled transfer until it completes successfully:
transferRecoveryTimeout=-1
Setting the value to 0 means that the agent marks the transfer as failed immediately upon entering recovery:
transferRecoveryTimeout=0
Setting the value to 21600 means that the agent keeps retrying a stalled transfer for 6 hours before the transfer is marked as failed:
transferRecoveryTimeout=21600
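The three settings above can be summarized in a short Python sketch. This is an illustration only, not part of Managed File Transfer: the function name describe_recovery_timeout is hypothetical, and rejecting values below -1 is an assumption, because the text above does not say how such values are treated.

```python
def describe_recovery_timeout(value: int) -> str:
    """Describe how a source agent treats a transferRecoveryTimeout value.

    Hypothetical helper for illustration; it only restates the three
    cases documented above.
    """
    if value == -1:
        return "retry until the transfer succeeds"
    if value == 0:
        return "fail immediately on entering recovery"
    if value > 0:
        hours = value / 3600  # the timeout is expressed in seconds
        return f"retry for {value} seconds ({hours:g} hours), then fail"
    # Assumption: values below -1 are treated here as invalid.
    raise ValueError(f"unsupported transferRecoveryTimeout value: {value}")


for setting in (-1, 0, 21600):
    print(setting, "->", describe_recovery_timeout(setting))
```

The last line of output confirms the arithmetic used in the example: 21600 seconds is 6 hours.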


Set or override the recovery retry timeout for individual transfers

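We can set the recovery retry timeout parameter for an individual transfer when creating a transfer with the fteCreateTransfer command, creating a transfer template with the fteCreateTemplate command, configuring transfers in IBM MQ Explorer, or moving or copying files with the fte:filecopy or fte:filemove Ant tasks.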

Setting the -rt value to -1 for a transfer is equivalent to the default behavior: recovery continues until the transfer is successful or the user cancels it manually. For example:
fteCreateTransfer -sa AGENT1 -da AGENT2 -rt -1 -df C:\import\transferredfile.txt C:\export\originalfile.txt
For more information about using the transfer recovery timeout parameter while creating a new transfer, see fteCreateTransfer command.

Setting the -rt parameter value to 0 indicates that if a transfer initiated by using the template stalls, it fails immediately and no recovery is attempted. For example:
fteCreateTemplate -tn "payroll accounts monthly report template" -rt 0 -sa PAYROLL -sm QM_PAYROLL1 -da ACCOUNTS 
-dm QM_ACCOUNTS -df C:\payroll_reports\*.xls C:\out\*.xls
For more information about using the transfer recovery timeout parameter while creating a new transfer template, see fteCreateTemplate command.

We can use IBM MQ Explorer to set the recovery timeout parameter and value for transfers. For more information on using IBM MQ Explorer to configure transfers, see Starting a new file transfer and Creating a file transfer template using IBM MQ Explorer.

We can also set the recovery timeout when moving or copying files by using Ant tasks, by including the transferRecoveryTimeout option and value on the fte:filecopy or fte:filemove element. For example:
<fte:filecopy cmdqm="qm0@localhost@1414@SYSTEM.DEF.SVRCONN" 
              src="agent1@qm1" dst="agent2@qm2"
              rcproperty="copy.result" transferRecoveryTimeout="0">    
                
	<fte:filespec srcfilespec="/home/fteuser1/file.bin" dstfile="/home/fteuser2/file.bin"/>

</fte:filecopy>
When the file copy task is initiated and the transfer enters recovery, the transfer fails immediately without attempting recovery. Setting the transferRecoveryTimeout option with fte:filecopy or fte:filemove overrides the value set in the agent.properties file. If the transferRecoveryTimeout value is not set with fte:filecopy or fte:filemove, the value of the transferRecoveryTimeout parameter from the agent.properties file is used. For more information, see fte:filecopy Ant task and fte:filemove Ant task.


Handling recovery timeout precedence

The transfer recovery timeout value that is specified through the command line interface argument for the create transfer, template, or monitor commands (including setting the option in the IBM MQ Explorer wizard), or that is specified in the fte:filespec nested element, takes precedence over the value that is specified for the transferRecoveryTimeout parameter in the agent.properties file for the source agent. For example, for the command
fteCreateTransfer -sa AGENT1 -da AGENT2 -df C:\import\transferredfile.txt C:\export\originalfile.txt
that is started without the -rt parameter and value pair, the source agent AGENT1 checks the agent.properties file for a transferRecoveryTimeout value to determine the recovery timeout behavior.

If transferRecoveryTimeout is not set in the agent.properties file, or is set to -1, the agent follows the default behavior and tries to recover the transfer until it is successful.

When the recovery timeout option -rt is specified through the Managed File Transfer command line interface, for example with the fteCreateTransfer command, this value takes precedence over the value in the agent.properties file and is used as the setting for the transfer:
fteCreateTransfer -sa AGENT1 -da AGENT2 -rt 21600 -df C:\import\transferredfile.txt C:\export\originalfile.txt
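The precedence rules in this section can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the agent's actual implementation: the function name effective_recovery_timeout is hypothetical, the per-transfer value stands for whatever was supplied with -rt (or its IBM MQ Explorer or Ant equivalent), and agent.properties is represented as a plain dictionary of strings.

```python
from typing import Dict, Optional

# Default behavior: keep recovering until the transfer succeeds.
DEFAULT_RECOVERY_TIMEOUT = -1


def effective_recovery_timeout(
    transfer_rt: Optional[int], agent_properties: Dict[str, str]
) -> int:
    """Resolve the timeout that applies to one transfer (hypothetical helper).

    transfer_rt is the per-transfer value, or None when none was given.
    """
    if transfer_rt is not None:
        # A per-transfer value overrides agent.properties.
        return transfer_rt
    raw = agent_properties.get("transferRecoveryTimeout")
    if raw is None:
        # Nothing set anywhere: fall back to the default behavior.
        return DEFAULT_RECOVERY_TIMEOUT
    return int(raw)


agent_props = {"transferRecoveryTimeout": "0"}
print(effective_recovery_timeout(21600, agent_props))  # per-transfer value wins
print(effective_recovery_timeout(None, agent_props))   # agent.properties value
print(effective_recovery_timeout(None, {}))            # default behavior
```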


Handling recovery timeout counter

The recovery timeout counter starts when the transfer enters the recovering state. A transfer log message is published to the SYSTEM.FTE topic with the topic string Log/agent_name/transfer_ID to indicate that the transfer status changed to recovering, together with the source agent clock time at which the status changed. If the transfer resumes within the set retry interval, before the counter reaches the recovery timeout (counter<recovery timeout), the counter is reset to 0, ready to start again if the transfer enters recovery again.

If the counter reaches the maximum value set for the recovery timeout (counter==recovery timeout), recovery of the transfer stops and the source agent reports the transfer as failed. This type of failure, caused by the transfer reaching the recovery timeout, is indicated by a new message code, RECOVERY TIMEOUT (69). Another transfer log message is published to the SYSTEM.FTE topic, with a topic string of Log/agent_name/transfer_ID, to indicate that the transfer has failed; it includes a new message, the new return code, and the source agent's event log.

The source agent's event log is updated with a message when recovery-related events occur, for example when a transfer enters recovery, resumes after being in recovery, or is terminated because it timed out (see Traces and messages).

These log messages enable the users (subscribers and loggers) to identify the transfers that failed due to the transfer recovery timeout.
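The counter behavior described in this section can be modeled with a small Python state machine. It is a simplified sketch, not the agent's code: the class RecoveryTimer, its method names, and the state strings are hypothetical, and time is advanced manually through tick() rather than read from a clock.

```python
RECOVERY_TIMEOUT_CODE = 69  # return code named in the text above


class RecoveryTimer:
    """Toy model of the source agent's recovery timeout counter."""

    def __init__(self, timeout_seconds: int) -> None:
        self.timeout = timeout_seconds  # -1 means no timeout
        self.counter = 0
        self.state = "in_progress"

    def enter_recovery(self) -> None:
        # The counter starts when the transfer enters the recovering state.
        self.state = "recovering"
        self.counter = 0
        if self.timeout == 0:
            # A timeout of 0 fails the transfer immediately.
            self.state = "failed"

    def tick(self, seconds: int) -> None:
        # Advance the counter only while the transfer is recovering.
        if self.state != "recovering":
            return
        self.counter += seconds
        if self.timeout != -1 and self.counter >= self.timeout:
            self.state = "failed"

    def resume(self) -> None:
        # A successful resume resets the counter for the next recovery.
        if self.state == "recovering":
            self.counter = 0
            self.state = "in_progress"


timer = RecoveryTimer(timeout_seconds=10)
timer.enter_recovery()
timer.tick(4)
timer.resume()          # resumed in time: counter returns to 0
timer.enter_recovery()
timer.tick(10)          # counter reaches the timeout
print(timer.state, RECOVERY_TIMEOUT_CODE)
```

In this model, a transfer that resumes before the timeout starts its next recovery with a fresh counter, which matches the reset described above.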

The counter for the recovery timeout is always at the source agent. However, if the destination agent fails to receive information from the source agent in a timely manner, it can send a request to the source agent to put the transfer in recovery. For a transfer where the recovery timeout option is set, the source agent starts the recovery timeout counter when it receives the request from the destination agent.

Manual handling is still required for transfers that do not use the recovery timeout option, and for failed and partially complete transfers.

For transfer sets, where a single transfer request is issued for multiple files, if some of the files completed successfully but one completed only partially, the transfer is still marked as failed because it did not complete as expected. The source agent might have timed out while transferring the partially completed file.

Ensure that the destination agent and file server are ready and in a state to accept file transfers.

We have to issue the transfer request again for the entire set. To avoid problems caused by files that remain on the destination from the initial transfer attempt, we can issue the new request with the overwrite if existing option specified. This ensures that the incomplete set of files from the previous transfer attempt is cleaned up as a part of the new transfer, before the files are written to the destination again.
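As a sketch of reissuing a request with overwrite enabled, the following Python snippet assembles an fteCreateTransfer argument list. It rests on an assumption that should be checked against the fteCreateTransfer command reference: that the overwrite if existing option corresponds to the -de overwrite parameter. The helper name build_retry_command is hypothetical, and the agent names and paths are reused from the earlier examples. The snippet only builds and prints the command; it does not run it.

```python
from typing import List


def build_retry_command(
    source_agent: str,
    destination_agent: str,
    destination_file: str,
    source_file: str,
) -> List[str]:
    """Build an fteCreateTransfer argument list that overwrites existing files.

    Assumption: "-de overwrite" is the command-line form of the
    overwrite if existing option; verify it in the command reference.
    """
    return [
        "fteCreateTransfer",
        "-sa", source_agent,
        "-da", destination_agent,
        "-de", "overwrite",
        "-df", destination_file,
        source_file,
    ]


args = build_retry_command(
    "AGENT1",
    "AGENT2",
    r"C:\import\transferredfile.txt",
    r"C:\export\originalfile.txt",
)
print(" ".join(args))
```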


Traces and messages

Tracing points are included for diagnostic purposes. The recovery timeout value, the start of the retry interval, the start of the resume period and the counter reset, and whether the transfer timed out and failed, are all logged. In case of a problem or unexpected behavior, we can collect the source agent output log and trace files, and provide them when requested by IBM support, to help with troubleshooting.

Messages notify the user when a transfer enters recovery (BFGTR0081I), when it is terminated because it timed out from recovery (BFGSS0081E), and when it resumes after being in recovery (BFGTR0082I).
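When scanning collected agent logs, the three message identifiers can be mapped to recovery events with a small lookup, sketched here in Python. The function name classify_recovery_message and the sample log lines are hypothetical; only the message identifiers come from the text above, and real log lines might be formatted differently.

```python
from typing import Optional

# Message identifiers taken from the paragraph above.
RECOVERY_EVENTS = {
    "BFGTR0081I": "entered recovery",
    "BFGSS0081E": "terminated after recovery timeout",
    "BFGTR0082I": "resumed after recovery",
}


def classify_recovery_message(line: str) -> Optional[str]:
    """Return the recovery event named by a log line, or None."""
    for message_id, event in RECOVERY_EVENTS.items():
        if message_id in line:
            return event
    return None


# Hypothetical log lines, for illustration only.
sample_lines = [
    "BFGTR0081I: sample text for a transfer entering recovery",
    "BFGSS0081E: sample text for a transfer that timed out",
    "an unrelated log line",
]
for line in sample_lines:
    print(classify_recovery_message(line))
```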