Timeout option for file transfers in recovery
We can set the amount of time, in seconds, during which a source agent keeps trying to recover a stalled file transfer. If the transfer is not successful when the agent reaches the timeout for the retry interval, the transfer fails.
Before Version 9.0.1, the default behavior of a Managed File Transfer source agent is to keep trying to recover a stalled transfer until it is successful. The new parameter is optional. If we do not set it, transfers follow this default behavior.
From Version 9.0.1, we can specify one of the following options:
- -1: The agent continues to attempt to recover the stalled transfer until the transfer is successful. Using this option is the equivalent of the default behavior of the agent when the property is not set.
- 0: The agent stops the file transfer as soon as it enters recovery.
- >0: The agent continues to attempt to recover the stalled transfer for the amount of time, in seconds, set by the positive integer value specified. A value of 21600 indicates that the agent keeps trying to recover the transfer for 6 hours from when it enters recovery. The maximum value for this parameter is 999999999.
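The three cases above can be summarized in a short Python sketch. The function name describe_recovery_timeout is hypothetical and is not part of Managed File Transfer; it only mirrors the rules stated in this topic, including the maximum value of 999999999.

```python
MAX_RECOVERY_TIMEOUT = 999_999_999  # maximum value stated in this topic


def describe_recovery_timeout(value: int) -> str:
    """Return a plain-language description of a transferRecoveryTimeout value.

    Hypothetical helper for illustration only; it is not an MFT API.
    """
    if value < -1 or value > MAX_RECOVERY_TIMEOUT:
        raise ValueError(f"value must be between -1 and {MAX_RECOVERY_TIMEOUT}")
    if value == -1:
        # Same as leaving the property unset.
        return "retry until the transfer succeeds"
    if value == 0:
        return "fail as soon as the transfer enters recovery"
    hours = value / 3600
    return f"retry for {value} seconds ({hours:g} hours), then fail"


if __name__ == "__main__":
    for candidate in (-1, 0, 21600):
        print(candidate, "->", describe_recovery_timeout(candidate))
```

For 21600 the helper reports 6 hours, which matches the example value given above.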
Set the recovery retry timeout for all the transfers for one source agent
To set a recovery timeout that applies to all the transfers for a source agent, add the parameter and value pair to the agent.properties file.
Setting a transfer recovery timeout value of -1 means that the agent continues to try to recover the stalled transfer until it completes successfully:
transferRecoveryTimeout=-1
Setting a transfer recovery timeout value of 0 means that the agent marks the transfer as failed immediately upon entering recovery:
transferRecoveryTimeout=0
Setting a transfer recovery timeout value of 21600 means that the agent keeps retrying a stalled transfer for 6 hours before the transfer is marked as failed:
transferRecoveryTimeout=21600
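As a rough illustration of how the property might be read by a script that audits agent configuration, the following Python sketch extracts transferRecoveryTimeout from the text of a properties file and falls back to -1 when the property is absent. The function name read_recovery_timeout and the sample text are assumptions made for this example. The parser handles only simple key=value lines, not the full Java properties syntax.

```python
def read_recovery_timeout(properties_text: str) -> int:
    """Return the transferRecoveryTimeout value found in the given text.

    Hypothetical helper. Returns -1 (the default behavior) when the
    property is not set. Only simple key=value lines are understood.
    """
    result = -1
    for raw_line in properties_text.splitlines():
        line = raw_line.strip()
        # Skip blank lines and comment lines.
        if not line or line.startswith(("#", "!")):
            continue
        key, separator, value = line.partition("=")
        if separator and key.strip() == "transferRecoveryTimeout":
            # A later occurrence replaces an earlier one.
            result = int(value.strip())
    return result


if __name__ == "__main__":
    sample = "# sample text, not a real agent configuration\ntransferRecoveryTimeout=21600\n"
    print(read_recovery_timeout(sample))
```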
Set or override the recovery retry timeout for individual transfers
We can set the recovery retry timeout parameter for an individual transfer when:
- Creating a transfer by using the fteCreateTransfer command in the Managed File Transfer command line interface, or by using IBM MQ Explorer
- Creating a transfer template by using the fteCreateTemplate command in the Managed File Transfer command line interface, or by using IBM MQ Explorer
- Creating a monitor by using the fteCreateMonitor command in the Managed File Transfer command line interface, or by using IBM MQ Explorer
- Copying or moving files by using the fte:filecopy or fte:filemove Ant tasks
Setting the -rt value to -1 for a transfer is the equivalent of the default behavior: recovery continues until the transfer is successful or it is manually canceled by the user. For example:
fteCreateTransfer -sa AGENT1 -da AGENT2 -rt -1 -df C:\import\transferredfile.txt C:\export\originalfile.txt
For more information about using the transfer recovery timeout parameter while creating a new transfer, see fteCreateTransfer command.
Setting the -rt parameter value to 0 indicates that if the transfer initiated by using this template is stalled, it fails immediately and no recovery is attempted. For example:
fteCreateTemplate -tn "payroll accounts monthly report template" -rt 0 -sa PAYROLL -sm QM_PAYROLL1 -da ACCOUNTS -dm QM_ACCOUNTS -df C:\payroll_reports\*.xls C:\out\*.xls
For more information about using the transfer recovery timeout parameter while creating a new transfer template, see fteCreateTemplate command.
We can use IBM MQ Explorer to set the recovery timeout parameter and value for transfers. For more information on using IBM MQ Explorer to configure transfers, see Starting a new file transfer and Creating a file transfer template using IBM MQ Explorer.
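When transfers are submitted from a script, the -rt option can be added to the argument list only when an override is wanted. The following Python sketch builds the argument list for the fteCreateTransfer example shown in this section. The function name build_create_transfer_args is hypothetical, the sketch uses only the options that appear in this topic (-sa, -da, -rt, -df), and it does not run the command.

```python
from typing import List, Optional


def build_create_transfer_args(
    source_agent: str,
    destination_agent: str,
    destination_file: str,
    source_file: str,
    recovery_timeout: Optional[int] = None,
) -> List[str]:
    """Build an fteCreateTransfer argument list (hypothetical helper).

    When recovery_timeout is None, -rt is omitted so that the value in the
    source agent's agent.properties file, if any, applies.
    """
    args = ["fteCreateTransfer", "-sa", source_agent, "-da", destination_agent]
    if recovery_timeout is not None:
        args += ["-rt", str(recovery_timeout)]
    args += ["-df", destination_file, source_file]
    return args


if __name__ == "__main__":
    print(build_create_transfer_args(
        "AGENT1", "AGENT2",
        r"C:\import\transferredfile.txt", r"C:\export\originalfile.txt",
        recovery_timeout=-1,
    ))
```

Keeping the arguments as a list, rather than one joined string, avoids quoting problems with paths that contain spaces if the list is later passed to a process launcher.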
We can also set the recovery timeout by including the transferRecoveryTimeout option and value with the fte:filecopy or fte:filemove elements when moving or copying files by using Ant tasks. For example:
<fte:filecopy cmdqm="qm0@localhost@1414@SYSTEM.DEF.SVRCONN"
              src="agent1@qm1" dst="agent2@qm2"
              rcproperty="copy.result"
              transferRecoveryTimeout="0">
  <fte:filespec srcfilespec="/home/fteuser1/file.bin" dstfile="/home/fteuser2/file.bin"/>
</fte:filecopy>
When the file copy task is initiated and the transfer enters recovery, the transfer stops immediately without attempting recovery. Setting the transferRecoveryTimeout option with fte:filecopy or fte:filemove overrides the value set in the agent.properties file. If the transferRecoveryTimeout value is not set with fte:filecopy or fte:filemove, the value of the transferRecoveryTimeout parameter from the agent.properties file is used. For more information, see fte:filecopy Ant task and fte:filemove Ant task.
Handling recovery timeout precedence
The transfer recovery timeout value that is specified through the command line interface argument for the create transfer, template, or monitor commands (including setting the option in the IBM MQ Explorer wizard), or that is specified in the fte:filespec nested element, takes precedence over the value that is specified for the transferRecoveryTimeout parameter in the agent.properties file for the source agent. For example, the following command is started without the -rt parameter and value pair:
fteCreateTransfer -sa AGENT1 -da AGENT2 -df C:\import\transferredfile.txt C:\export\originalfile.txt
In this case, the source agent AGENT1 checks the agent.properties file for a transferRecoveryTimeout value to determine the recovery timeout behavior. If the transferRecoveryTimeout parameter is not set in the agent.properties file, or is set to -1, the agent follows the default behavior and tries to recover the transfer until it is successful.
When the recovery timeout option -rt is specified through the Managed File Transfer command line interface, for example, with the fteCreateTransfer command, this value takes precedence over the value in the agent.properties file and is used as the setting for the transfer:
fteCreateTransfer -sa AGENT1 -da AGENT2 -rt 21600 -df C:\import\transferredfile.txt C:\export\originalfile.txt
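The precedence rule can be expressed as a small decision function. The Python sketch below is only a model of the behavior described in this section; the name effective_recovery_timeout is hypothetical, and None is used here to mean "not specified".

```python
from typing import Optional


def effective_recovery_timeout(
    transfer_value: Optional[int],
    agent_properties_value: Optional[int],
) -> int:
    """Return the timeout that applies to a transfer (hypothetical model).

    transfer_value: value given for the individual transfer (for example
        with -rt), or None if the transfer request did not specify one.
    agent_properties_value: transferRecoveryTimeout from the source agent's
        agent.properties file, or None if the property is not set.
    """
    if transfer_value is not None:
        # A per-transfer value always wins.
        return transfer_value
    if agent_properties_value is not None:
        return agent_properties_value
    # Neither is set: default behavior, retry until successful.
    return -1


if __name__ == "__main__":
    print(effective_recovery_timeout(21600, 0))      # per-transfer value applies
    print(effective_recovery_timeout(None, 0))       # agent.properties value applies
    print(effective_recovery_timeout(None, None))    # default behavior
```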
Handling recovery timeout counter
The recovery timeout counter starts when the transfer enters the recovering state. A transfer log message is published to the SYSTEM.FTE topic with the topic string Log/agent_name/transfer_ID to indicate that the transfer status changed to recovering, and to record the source agent clock time at which the status changed. If the transfer is resumed within the set retry interval, before the counter reaches the recovery timeout (counter<recovery timeout), the counter is reset to 0, ready to start again if the transfer enters recovery.
If the counter reaches the maximum value set for the recovery timeout (counter==recovery timeout), the recovery of the transfer stops and the source agent reports the transfer as failed. This type of transfer failure, caused by the transfer reaching the recovery timeout, is indicated by a new message code, RECOVERY TIMEOUT (69). Another transfer log message is published to the SYSTEM.FTE topic, with a topic string of Log/agent_name/transfer_ID, to indicate that the transfer has failed; it includes a new message, the new return code, and the source agent's event log. These log messages enable users (subscribers and loggers) to identify the transfers that failed due to the transfer recovery timeout. The source agent's event log is updated with a message when any of the following events occur during recovery:
- When a transfer whose recovery timeout parameter is set to a value greater than -1 enters recovery, the source agent's event log is updated to indicate the start of the recovery timer for the TransferId and the amount of time the source agent waits before it initiates the recovery timeout processing.
- When the recovering transfer is resumed, the source agent's event log is updated with a new message to indicate that the TransferId that was in recovery is resumed.
- When a recovering transfer has timed out, the source agent's event log is updated to indicate the TransferId that failed while recovering, due to recovery timeout.
The counter for the recovery timeout is always at the source agent. However, if the destination agent fails to receive information from the source agent in a timely manner, it can send a request to the source agent to put the transfer in recovery. For a transfer where the recovery timeout option is set, the source agent starts the recovery timeout counter when it receives the request from the destination agent.
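The counter behavior described in this section (start on entering recovery, reset on resume, fail when the timeout is reached) can be modeled in a few lines of Python. This is a toy model for reasoning about the rules, not the agent's implementation. The class name RecoveryTimer and its methods are hypothetical, and times are passed in as plain seconds so the example is deterministic.

```python
from typing import Optional

RECOVERY_TIMEOUT_RC = 69  # RECOVERY TIMEOUT return code named in this topic


class RecoveryTimer:
    """Toy model of the source agent's recovery timeout counter."""

    def __init__(self, timeout_seconds: int) -> None:
        self.timeout_seconds = timeout_seconds
        self.started_at: Optional[float] = None  # None: not in recovery

    def enter_recovery(self, now: float) -> None:
        # The counter starts when the transfer enters the recovering state.
        if self.started_at is None:
            self.started_at = now

    def resume(self) -> None:
        # A resumed transfer resets the counter, ready for the next recovery.
        self.started_at = None

    def has_timed_out(self, now: float) -> bool:
        # A value of -1 means the agent keeps retrying indefinitely.
        if self.started_at is None or self.timeout_seconds == -1:
            return False
        return now - self.started_at >= self.timeout_seconds


if __name__ == "__main__":
    timer = RecoveryTimer(21600)
    timer.enter_recovery(now=0)
    print(timer.has_timed_out(now=21599))   # still recovering
    timer.resume()                          # counter reset
    timer.enter_recovery(now=30000)
    print(timer.has_timed_out(now=51600))   # 21600 seconds after re-entering recovery
```

With a timeout of 0 the model reports a timeout at the same instant the transfer enters recovery, which matches the behavior described for the value 0.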
Manual handling is still required for transfers that do not use the recovery timeout option, and for failed and partially complete transfers.
For transfer sets, where a single transfer request is issued for multiple files, and some of the files completed successfully but one completed only partially, the transfer is still marked as failed as it did not complete as expected. The source agent might have timed out while transferring the partially completed file.
Ensure that the destination agent and file server are ready and in a state to accept file transfers.
We have to issue the transfer request again for the entire set. To avoid problems caused by files that remain on the destination from the initial transfer attempt, we can issue the new request with the overwrite if existing option specified. This ensures that the incomplete set of files from the previous transfer attempt is cleaned up as a part of the new transfer, before the files are written to the destination again.
Traces and messages
Tracing points are included for diagnostic purposes. Recovery timeout value, start of the retry interval, start of the resume period and counter reset, and whether the transfer timed out and failed, are logged. In case of a problem or unexpected behavior, we can collect the source agent output log and trace files, and provide them when requested by IBM support, to help with troubleshooting.
Messages notify the user when a transfer enters recovery (BFGTR0081I), is terminated because it timed out from recovery (BFGSS0081E) and when it resumes after being in recovery (BFGTR0082I).
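When reviewing agent output, it can help to filter for the three message IDs named above. The Python sketch below scans lines of text for those IDs and labels each one. The function name recovery_events is hypothetical, and the sample lines are invented for this example and do not reproduce real agent log output.

```python
import re
from typing import Iterable, List, Tuple

# Message IDs and meanings as stated in this topic.
RECOVERY_MESSAGES = {
    "BFGTR0081I": "entered recovery",
    "BFGTR0082I": "resumed after recovery",
    "BFGSS0081E": "failed: recovery timeout",
}

# Matches identifiers shaped like BFGxxNNNNs, for example BFGTR0081I.
_MESSAGE_ID = re.compile(r"\bBFG[A-Z]{2}\d{4}[IWE]\b")


def recovery_events(log_lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Return (message_id, meaning) pairs for recovery-related messages."""
    events = []
    for line in log_lines:
        for message_id in _MESSAGE_ID.findall(line):
            if message_id in RECOVERY_MESSAGES:
                events.append((message_id, RECOVERY_MESSAGES[message_id]))
    return events


if __name__ == "__main__":
    # Invented sample lines; real message text differs.
    sample = [
        "BFGTR0081I: sample text for a transfer entering recovery",
        "BFGSS0081E: sample text for a transfer that timed out",
    ]
    for event in recovery_events(sample):
        print(event)
```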
- BFGTR0001 - BFGTR9999
- BFGTR0081I
- BFGSS0001 - BFGSS9999
- BFGSS0081E