
Batch job state table

As the job scheduler and grid endpoint process a batch job, the job state is updated in the job scheduler database. The diagram shows the relationship between states, and the following table lists the possible batch job states and the events that trigger transitions between them. You can view the current state of a batch job from the job management console, or retrieve it using the command line or EJB interface. If a failure occurs before a batch step initializes, the batch job goes into the execution_failed state. Otherwise, it goes into the restartable state.
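The failure rule in the last two sentences above can be written as a one-line decision. This is a minimal sketch, not product code: the function name `state_after_failure` and its `step_initialized` flag are hypothetical names chosen for illustration.

```python
def state_after_failure(step_initialized: bool) -> str:
    """Return the state a batch job enters after a failure.

    Hypothetical helper: a failure before any batch step initializes
    leaves the job in execution_failed; any later failure leaves it
    in restartable.
    """
    return "restartable" if step_initialized else "execution_failed"
```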

| Start state | Client command | System action | Special condition | Return code | End state |
|---|---|---|---|---|---|
| non-existent (delayed submit) | submit | | | | pending submit |
| non-existent | submit | | | | submitted |
| submitted | | dispatch | | 0 | executing |
| submitted | cancel | | | 0 | restartable |
| executing | stop | | | 0 | restartable |
| executing | cancel | | | 4 | cancel_pending |
| executing | | caught application error* | | 4 | restartable |
| executing | | | Infrastructure problem** | 4 | restartable/unknown |
| executing | suspend | | | 4 | suspend_pending |
| executing | | job completed | | 4 | ended |
| executing | | | Infrastructure problem in job setup*** | 4 | restartable |
| suspend_pending | | checkpoint | | 2 | suspended |
| suspend_pending | | | Infrastructure problem** | 2 | restartable/unknown |
| suspended | resume | | | 5 | resume_pending |
| suspended | cancel | | | 5 | cancel_pending |
| suspended | | | Infrastructure problem** | 5 | restartable/unknown |
| resume_pending | | job resumed | | 2 | executing |
| resume_pending | | | Infrastructure problem** | 2 | restartable/unknown |
| restartable | restart | | | 8 | submitted |
| cancel_pending | | job canceled | | 1 | restartable |
| cancel_pending | | | Infrastructure problem** | 1 | restartable/unknown |
| restartable | purge | | | 8 | non-existent |
| execution_failed | purge | | | 9 | non-existent |
| ended | purge | | | 7 | non-existent |
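
The client-command rows of the table can be held in a lookup keyed by start state and command. The sketch below is illustrative only: `Transition`, `CLIENT_COMMANDS`, and `apply_command` are hypothetical names, not part of any product API. The two submit rows are left out because the table lists no return code for them.

```python
from typing import NamedTuple, Optional


class Transition(NamedTuple):
    return_code: int
    end_state: str


# (start state, client command) -> (return code, end state), copied from
# the client-command rows of the table. Submit rows are omitted because
# the table gives no return code for them.
CLIENT_COMMANDS = {
    ("submitted", "cancel"): Transition(0, "restartable"),
    ("executing", "stop"): Transition(0, "restartable"),
    ("executing", "cancel"): Transition(4, "cancel_pending"),
    ("executing", "suspend"): Transition(4, "suspend_pending"),
    ("suspended", "resume"): Transition(5, "resume_pending"),
    ("suspended", "cancel"): Transition(5, "cancel_pending"),
    ("restartable", "restart"): Transition(8, "submitted"),
    ("restartable", "purge"): Transition(8, "non-existent"),
    ("execution_failed", "purge"): Transition(9, "non-existent"),
    ("ended", "purge"): Transition(7, "non-existent"),
}


def apply_command(state: str, command: str) -> Optional[Transition]:
    """Look up the transition for a client command.

    Returns None when the table lists no such command for the state.
    """
    return CLIENT_COMMANDS.get((state, command))
```

For example, `apply_command("executing", "cancel")` yields return code 4 and end state cancel_pending, while `apply_command("ended", "resume")` yields `None` because the table has no such row.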

| Note | Description |
|---|---|
| * Application error | The batch application failed at run time, and the grid endpoints detected the failure. |
| ** Infrastructure problem | An unexpected error occurred. See the following note for an infrastructure problem in job setup. |
| *** Infrastructure problem in job setup | An unexpected error occurred while the grid endpoints were setting up a batch job for the first time. For example, if there is an unexpected database failure, the job goes into the execution_failed state. |

  • In this condition, the batch job is running for the first time and no steps have been processed yet. Batch jobs go into the restartable state under most failure conditions so that they can restart from checkpointed positions if the failure condition can be overcome. In this case, however, the batch job goes into the execution_failed state and cannot be restarted. Because the failure occurs during job setup, before the batch job has processed any work, no batch work is lost.

  • If jobs are in a non-final state on the endpoint, the scheduler puts them into the unknown state under two conditions: the endpoint loses communication with the scheduler, or the endpoint goes down. When the endpoint comes back up, the scheduler synchronizes the job status with the endpoint. If the endpoint went down, all batch jobs are put into the restartable state and all compute-intensive jobs into the execution_failed state. If the endpoint only lost communication with the scheduler and the jobs continued to run, the scheduler updates its status to the state that the jobs have reached on the endpoint at that point.
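
The way the scheduler resolves an unknown state, as described in the last note, can be sketched as follows. The function name, the parameter names, and the job-type strings `"batch"` and `"compute_intensive"` are assumptions made for illustration. Keeping the job in the unknown state until the endpoint reports is also an assumption, not something the notes state.

```python
from typing import Optional


def resolve_unknown_state(
    job_type: str,
    endpoint_went_down: bool,
    endpoint_state: Optional[str] = None,
) -> str:
    """Pick the state the scheduler records for a job it marked unknown.

    Hypothetical helper. job_type is "batch" or "compute_intensive";
    endpoint_state is the state the endpoint reports once it can be
    reached again.
    """
    if endpoint_went_down:
        # Endpoint went down: batch jobs can restart from a checkpoint,
        # compute-intensive jobs cannot.
        if job_type == "batch":
            return "restartable"
        if job_type == "compute_intensive":
            return "execution_failed"
        raise ValueError(f"unrecognized job type: {job_type!r}")
    if endpoint_state is None:
        # Assumption: with no report from the endpoint yet, the job
        # stays in the unknown state.
        return "unknown"
    # Communication was lost but the job kept running: adopt the state
    # the job has reached on the endpoint.
    return endpoint_state
```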


Related:

  • xJCL elements
  • Developing batch applications