Retry-step processing
Use retry-step processing to try job steps again when the processJobStep method encounters errors in a transactional batch job. Specify retry-step policies in the xJCL.
Each job step has its own retry-step policy configuration. You enable retry-step processing by specifying a non-zero value for the com.ibm.batch.step.retry.count job step property in the xJCL.
We can refine retry-step processing with two properties. Use the com.ibm.batch.step.retry.include.exception.class.<n> property to specify which exceptions can be tried again when a step fails. Use the com.ibm.batch.step.retry.exclude.exception.class.<n> property to specify which exceptions cannot be tried again when a step fails. The two properties are mutually exclusive.
The batch framework tracks retry-step processing on a per-step basis in the local job status database. At the end of step processing, a message is written to the job log. The message indicates the number of times that the step was tried again and the total clock time that the step used. The format of the clock time is HH:MM:SS:MMM, where HH is hours, MM is minutes, SS is seconds, and MMM is milliseconds.
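As an illustration of the HH:MM:SS:MMM clock-time format, the following minimal sketch converts an elapsed time in milliseconds to that layout. The StepClockTime class and its format method are hypothetical names introduced here. They are not part of the batch framework, and the way the product itself pads or truncates values (for example, beyond 99 hours) is not stated in this topic.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical helper, not part of the batch framework: renders an elapsed
// time in the HH:MM:SS:MMM layout described for the job log message.
final class StepClockTime {
    private StepClockTime() {}

    /** Formats elapsed milliseconds as HH:MM:SS:MMM. */
    static String format(long elapsedMillis) {
        if (elapsedMillis < 0) {
            throw new IllegalArgumentException("elapsedMillis must be non-negative");
        }
        long hours = TimeUnit.MILLISECONDS.toHours(elapsedMillis);
        long minutes = TimeUnit.MILLISECONDS.toMinutes(elapsedMillis) % 60;
        long seconds = TimeUnit.MILLISECONDS.toSeconds(elapsedMillis) % 60;
        long millis = elapsedMillis % 1000;
        // Assumption: hours are zero-padded to at least two digits and are
        // allowed to grow beyond two digits for very long steps.
        return String.format("%02d:%02d:%02d:%03d", hours, minutes, seconds, millis);
    }

    public static void main(String[] args) {
        // 1 hour, 2 minutes, 3 seconds, and 45 milliseconds
        System.out.println(format(3_723_045L)); // prints 01:02:03:045
    }
}
```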
The following list contains the retry-step properties followed by a description.
- com.ibm.batch.step.retry.count
Number of times a step can be tried again when the processJobStep method encounters an error during step processing. After the limit is reached, the step is not tried again for any further errors.
The BatchJobStepInterface.processJobStep method supports the throws java.lang.Exception clause. Any exception from the processJobStep method is eligible for retry-step processing.
Trying a step again is equivalent to restarting it. The BatchJobStepInterface.destroyJobStep method is called after the step error. The checkpoint transaction is rolled back before restarting the step. The BatchJobStepInterface.createJobStep method is called when a step is tried again. All batch data streams associated with the step are closed and reopened upon trying again.
If an error occurs for the step after the limit for trying a step again is reached, then the step fails and the job ends in the restartable state.
If we register a retry listener with the job step context, the retry listener receives control on every exception that can be tried again. The RetryListener.onError(Throwable t) method is called before the failed step enters the destroyJobStep method and before the checkpoint transaction is rolled back. The RetryListener.onRetry(Throwable t) method receives control when the step is tried again, but before the BatchJobStepInterface.createJobStep method is called.
The retry listener is unregistered immediately after the RetryListener.onRetry method is called. If we want the batch application to listen for further attempts to try the step again, reregister the retry listener.
The running count of the number of times that a step has been tried again is reset to zero at every checkpoint. This means that the retry limit is effectively a per-checkpoint limit.
Retry-step processing is disabled by default.
- com.ibm.batch.step.retry.delay.time
Number of milliseconds to wait before trying the step again. The delay occurs after the failed step goes through the destroyJobStep method and after the checkpoint transaction is rolled back. However, the delay occurs before calling the RetryListener.onRetry method.
- com.ibm.batch.step.retry.include.exception.class.<n>
List of exceptions that can be tried again when a step fails.
The <n> variable is an integer. Start the variable at 1 and increment it by one for each exception.
If we do not specify any exceptions, then the default is that all exceptions are included in the list.
The following example uses the property:
<job-step name="WCGStep1">
  <classname>com.ibm.ws.batch.sample.WCGSampleBDSBatchStep</classname>
  <checkpoint-algorithm-ref name="chkpt"/>
  <results-ref name="jobsum"/>
  <props>
    <prop name="com.ibm.batch.step.retry.count" value="1" />
    <prop name="com.ibm.batch.step.retry.delay.time" value="3000" />
    <prop name="com.ibm.batch.step.retry.include.exception.class.1" value="java.sql.SQLException" />
  </props>
  ...
</job-step>
The WCGStep1 job step tries a job step again for a Structured Query Language (SQL) exception.
- com.ibm.batch.step.retry.exclude.exception.class.<n>
List of exceptions that cannot be tried again when a step fails.
The <n> variable is an integer. Start the variable at 1 and increment it by one for each exception.
If we do not specify any exceptions, then the default is that no exceptions are excluded from the list.
The following example uses the property:
<job-step name="WCGStep1">
  <classname>com.ibm.ws.batch.sample.WCGSampleBDSBatchStep</classname>
  <checkpoint-algorithm-ref name="chkpt"/>
  <results-ref name="jobsum"/>
  <props>
    <prop name="com.ibm.batch.step.retry.count" value="1" />
    <prop name="com.ibm.batch.step.retry.delay.time" value="3000" />
    <prop name="com.ibm.batch.step.retry.exclude.exception.class.1" value="java.sql.SQLException" />
  </props>
  ...
</job-step>
The WCGStep1 job step does not try a job step again for a Structured Query Language (SQL) exception.
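The way the retry count, the include and exclude lists, and the per-checkpoint reset interact can be sketched as a small model. This is an illustration only, not the batch framework's implementation. The RetryPolicySketch class and its method names are hypothetical. Matching an exception against a listed class name by walking up its superclass chain is an assumption, because this topic does not say whether subclasses of a listed exception also match.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the retry-step properties; not the product's code.
final class RetryPolicySketch {
    private final int retryLimit;          // com.ibm.batch.step.retry.count
    private final List<String> includes;   // ...retry.include.exception.class.<n>
    private final List<String> excludes;   // ...retry.exclude.exception.class.<n>
    private int retriesSinceCheckpoint = 0;

    RetryPolicySketch(int retryLimit, List<String> includes, List<String> excludes) {
        if (!includes.isEmpty() && !excludes.isEmpty()) {
            // The two properties are mutually exclusive.
            throw new IllegalArgumentException("Specify include or exclude classes, not both");
        }
        this.retryLimit = retryLimit;
        this.includes = new ArrayList<>(includes);
        this.excludes = new ArrayList<>(excludes);
    }

    /** Returns true, and counts the retry, if the step would be tried again. */
    boolean shouldRetry(Throwable t) {
        if (retryLimit <= 0) {
            return false; // a count of zero leaves retry-step processing disabled
        }
        if (retriesSinceCheckpoint >= retryLimit || !isRetryable(t)) {
            return false; // the step fails and the job ends in the restartable state
        }
        retriesSinceCheckpoint++;
        return true;
    }

    /** The running count is reset to zero at every checkpoint. */
    void checkpointTaken() {
        retriesSinceCheckpoint = 0;
    }

    private boolean isRetryable(Throwable t) {
        if (!includes.isEmpty()) {
            return matches(t, includes);
        }
        if (!excludes.isEmpty()) {
            return !matches(t, excludes);
        }
        return true; // default: every exception can be tried again
    }

    // Assumption: a listed class name also matches its subclasses.
    private static boolean matches(Throwable t, List<String> classNames) {
        for (Class<?> c = t.getClass(); c != null; c = c.getSuperclass()) {
            if (classNames.contains(c.getName())) {
                return true;
            }
        }
        return false;
    }
}
```

With a limit of 1 and java.sql.SQLException in the include list, as in the first xJCL example, this model allows one retry for an SQL exception, refuses a second one before the next checkpoint, and allows one again after checkpointTaken is called.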
Retry listeners
We can register a retry listener with the job step context to listen for exceptions that can be tried again. The retry listener receives control when an exception that can be tried again occurs and again when the step is tried again.
Register the retry listener with the job step context by calling the addRetryListener method:
JobStepContextMgr.getContext().addRetryListener(new MyRetryListener());
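The following sketch models the order in which a retry listener receives control, and why the listener must be reregistered to observe later retries. It runs without the product libraries: the RetryListener interface and the StepContextStandIn class declared here are local stand-ins that mirror only the method names mentioned in this topic, and LoggingRetryListener and RetryListenerDemo are hypothetical. Reregistering from the createJobStep method is one possible choice and an assumption, as is the relative order of the destroyJobStep call and the rollback. Consult the product API documentation for the real package names and signatures.

```java
import java.util.ArrayList;
import java.util.List;

// Local stand-in that mirrors the two callbacks named in this topic.
interface RetryListener {
    void onError(Throwable t);
    void onRetry(Throwable t);
}

// Hypothetical stand-in for the job step context; holds one listener.
final class StepContextStandIn {
    private RetryListener listener;

    void addRetryListener(RetryListener l) {
        listener = l;
    }

    // Simulates one retry cycle in the order described in this topic.
    void simulateRetry(Throwable t, List<String> events, Runnable createJobStep) {
        RetryListener current = listener;
        if (current != null) current.onError(t);  // before destroy and rollback
        events.add("destroyJobStep");
        events.add("rollback");                   // checkpoint transaction rolled back
        events.add("delay");                      // retry.delay.time would elapse here
        if (current != null) current.onRetry(t);  // before createJobStep
        listener = null;                          // unregistered right after onRetry
        createJobStep.run();
    }
}

// Hypothetical listener that records each callback.
final class LoggingRetryListener implements RetryListener {
    private final List<String> events;

    LoggingRetryListener(List<String> events) {
        this.events = events;
    }

    @Override
    public void onError(Throwable t) {
        events.add("onError");
    }

    @Override
    public void onRetry(Throwable t) {
        events.add("onRetry");
    }
}

final class RetryListenerDemo {
    // Runs the given number of failing attempts and returns the event order.
    static List<String> run(int failures, boolean reregister) {
        List<String> events = new ArrayList<>();
        StepContextStandIn ctx = new StepContextStandIn();
        RetryListener listener = new LoggingRetryListener(events);
        Runnable createJobStep = () -> {
            events.add("createJobStep");
            if (reregister) ctx.addRetryListener(listener); // listen for the next retry
        };
        ctx.addRetryListener(listener); // initial registration
        createJobStep.run();
        for (int i = 0; i < failures; i++) {
            ctx.simulateRetry(new IllegalStateException("failure " + i), events, createJobStep);
        }
        return events;
    }

    public static void main(String[] args) {
        System.out.println(run(2, true));
        System.out.println(run(2, false));
    }
}
```

In this model, run(2, true) records onError and onRetry for both failures, while run(2, false) records them only for the first failure because the listener is no longer registered when the second failure occurs.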