Parallel job manager (PJM)

The parallel job manager (PJM) provides a facility and framework for submitting and managing transactional batch jobs that run as a coordinated collection of independent parallel subordinate jobs.

PJM basics

The parallel job manager is in the batch container instead of in a separate system application.
Only a single xJCL file is required. The xJCL combines the contents of the top-level job xJCL with the contents of the subordinate job xJCLs.
You do not need to create a separate database.
Because the PJM is part of the batch container, you do not need to install or configure the PJM separately.
You package the PJM APIs in the batch application as a utility JAR. No shared library is required.
The contents of the xd.spi.properties file are part of the xJCL. No xd.spi.properties file is required.

The PJM operation and invocation of the APIs

The following two images depict the PJM architecture and the sequence of a parallel job.

First, the xJCL is submitted to the job scheduler. The job scheduler dispatches the xJCL to an endpoint that runs the application that the xJCL references. By inspecting the run property of the job in the xJCL, the batch container determines that the job is to run subordinate jobs in parallel, and it delegates execution to the PJM subcomponent.

The PJM invokes the parameterizer API and uses the information in the xJCL to help divide the job into subordinate jobs. The PJM then invokes the LogicalTX synchronization API to indicate the beginning of the logical transaction. The PJM generates the subordinate job xJCL and submits the subordinate jobs to the job scheduler. The job scheduler dispatches the subordinate jobs to the batch container endpoints, which run them.

When a subordinate job takes a checkpoint, the subordinate job collector API is invoked. This API collects relevant state information about the subordinate job. This data is sent to the subordinate job analyzer API for interpretation. After all subordinate jobs reach a final state, the beforeCompletion and afterCompletion synchronization APIs are invoked. The analyzer API is also invoked to calculate the return code of the job.
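The order of the API invocations described above can be sketched in plain Java. This is a minimal illustration only: the interfaces Parameterizer, LogicalTxSynchronization, Collector, and Analyzer, their method signatures, and the run driver are hypothetical stand-ins and are not the product interfaces. The sketch also runs the subordinate jobs one after another, whereas the real subordinate jobs run in parallel on batch container endpoints, and it assumes that the return code is requested after the completion callbacks.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified, hypothetical stand-ins for the PJM callbacks described above.
// The real product interfaces have different names and signatures.
final class PjmSequenceSketch {

    interface Parameterizer {
        int subordinateJobCount(String topLevelJobName);
    }

    interface LogicalTxSynchronization {
        void begin();
        void beforeCompletion();
        void afterCompletion();
    }

    interface Collector {
        String collect(String subJobName);
    }

    interface Analyzer {
        void analyze(String subJobName, String collectedData);
        int returnCode();
    }

    // Drives one parallel job sequentially and returns the job return code.
    static int run(String jobName, int checkpointsPerSubJob,
                   Parameterizer parameterizer, LogicalTxSynchronization sync,
                   Collector collector, Analyzer analyzer) {
        // 1. Divide the top-level job into subordinate jobs.
        int count = parameterizer.subordinateJobCount(jobName);

        // 2. Signal the beginning of the logical transaction.
        sync.begin();

        // 3. Generate the subordinate jobs (submission is only simulated here).
        List<String> subJobs = new ArrayList<>();
        for (int i = 1; i <= count; i++) {
            subJobs.add(jobName + ":" + i);
        }

        // 4. At each checkpoint, collect state and pass it to the analyzer.
        for (String subJob : subJobs) {
            for (int c = 0; c < checkpointsPerSubJob; c++) {
                analyzer.analyze(subJob, collector.collect(subJob));
            }
        }

        // 5. All subordinate jobs are final: end the logical transaction.
        sync.beforeCompletion();
        sync.afterCompletion();

        // 6. Ask the analyzer for the return code of the job.
        return analyzer.returnCode();
    }
}
```

A caller supplies an implementation of each interface. Collector and Parameterizer each have a single method, so a lambda is sufficient for them in this sketch.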

A logical transaction is a unit of work demarcation that spans the running of a parallel job. Its lifecycle corresponds to the combined lifecycle of the subordinate jobs of the parallel job. An extension mechanism lets you customize application-managed resources so that they are committed or rolled back within this unit of work scope.
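One way to picture an application-managed resource that follows the logical transaction is a resource that stages its writes and makes them visible only at commit. The following sketch is illustrative only: StagedOutput, its methods, and the complete helper are hypothetical names, not part of the PJM APIs, and the rule "commit only if all subordinate jobs succeeded" is an assumption made for the example.

```java
import java.util.ArrayList;
import java.util.List;

final class LogicalTxResourceSketch {

    // Hypothetical application-managed resource. Writes are staged until
    // the logical transaction ends, then committed or discarded as a unit.
    static final class StagedOutput {
        private final List<String> staged = new ArrayList<>();
        private final List<String> committed = new ArrayList<>();

        // Called when the logical transaction begins.
        void begin() {
            staged.clear();
        }

        // Called by the application while subordinate jobs run.
        void write(String record) {
            staged.add(record);
        }

        // Makes all staged records visible.
        void commit() {
            committed.addAll(staged);
            staged.clear();
        }

        // Discards all staged records.
        void rollback() {
            staged.clear();
        }

        // Returns a copy so that callers cannot modify committed state.
        List<String> committedRecords() {
            return new ArrayList<>(committed);
        }
    }

    // Ends the unit of work. Assumption for this sketch: commit only if
    // every subordinate job succeeded, otherwise roll back.
    static void complete(StagedOutput output, boolean allSubordinateJobsSucceeded) {
        if (allSubordinateJobsSucceeded) {
            output.commit();
        } else {
            output.rollback();
        }
    }
}
```

In this design, nothing written during a failed parallel job reaches the committed state, which mirrors the commit and rollback control that the logical transaction scope is meant to provide.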

PJM architecture and programming model

The following image summarizes the PJM architecture, which shows where the APIs are called:

Sequence of a parallel job

The following image shows the order of events in a parallel job:

PJM job management

The top-level job submits the subordinate jobs and monitors their completion. The top-level job end state is influenced by the outcome of the subordinate jobs as follows:

If all subordinate jobs complete in the ended state, that is, in a successful completion, then the top-level job completes in the ended state.
If any subordinate job completes in the restartable state and no subordinate job has ended in the failed state, then the top-level job completes in the restartable state.
If any subordinate job completes in the failed state, then the top-level job completes in the failed state.

If the top-level job and subordinate jobs are in the restartable state, restart only the top-level job. If any subordinate jobs are restarted manually, then the top-level job does not process the logical transaction properly.
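The end-state rules above amount to a precedence order: failed outranks restartable, and restartable outranks ended. A minimal sketch of that logic follows. The JobState enum and the resolve method are hypothetical names used only for illustration, the sketch models only the three states named in this topic, and returning ENDED for an empty list is an assumption.

```java
import java.util.List;

final class EndStateSketch {

    // Only the three final states named in this topic are modeled.
    enum JobState { ENDED, RESTARTABLE, FAILED }

    // Derives the top-level job end state from the subordinate job end states.
    static JobState resolve(List<JobState> subordinateStates) {
        // Any failed subordinate job fails the top-level job.
        if (subordinateStates.contains(JobState.FAILED)) {
            return JobState.FAILED;
        }
        // No failures, but at least one restartable subordinate job.
        if (subordinateStates.contains(JobState.RESTARTABLE)) {
            return JobState.RESTARTABLE;
        }
        // Every subordinate job ended successfully.
        // (An empty list also lands here; that choice is an assumption.)
        return JobState.ENDED;
    }
}
```

For example, a parallel job with subordinate end states ENDED, RESTARTABLE, and FAILED resolves to FAILED, because a single failed subordinate job takes precedence over the others.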

Related:

Batch jobs and their environment
Other considerations for the parallel job manager
Developing a parallel job management application
Parallel job manager APIs