Configure a Data Load utility scheduler job
You can use the WebSphere Commerce Administration Console to schedule a Data Load utility job for the site. With a scheduler job, you can configure the Data Load utility to load an input file routinely, for example, to load frequently updated data.
Before beginning
Ensure that you complete the following tasks:
- If you want your scheduler job to retrieve an input file from a file transfer site, you can configure an SFTP transport that the job uses to retrieve the file. For more information, see Configure an SFTP transport to retrieve external files for the Data Load utility.
Task info
After you configure this job to run more than once, you no longer need to run the Data Load utility manually to load a frequently updated input file. The job first runs at the start date and time that you configure, and then runs again each time the configured schedule interval elapses.
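The run pattern described above (first run at the start date and time, then one run per elapsed schedule interval) can be sketched as follows. This is a hypothetical helper that only models the timing arithmetic; it is not part of WebSphere Commerce, and the dates are illustrative. It also assumes that an omitted interval (represented here as None or 0) means a single run.

```python
from datetime import datetime, timedelta
from itertools import islice
from typing import Iterator, Optional


def run_times(start: datetime, interval_seconds: Optional[int]) -> Iterator[datetime]:
    """Yield the times at which a scheduled job is expected to run.

    Hypothetical helper: the first run is at the configured start date and
    time, and each later run follows after interval_seconds. If no interval
    is set (None or 0), the job runs only once.
    """
    yield start
    if not interval_seconds:
        return
    step = timedelta(seconds=interval_seconds)
    current = start
    while True:
        current += step
        yield current


# Example: a job that starts at 02:00 (24-hour clock) and repeats daily,
# because a day is 24 * 60 * 60 = 86400 seconds.
daily = list(islice(run_times(datetime(2024, 1, 1, 2, 0), 86400), 3))
for t in daily:
    print(t.isoformat())
```

The generator is unbounded for repeating jobs, so the example uses `islice` to take only the first few run times.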
Procedure
- Connect to the WebSphere Commerce database and run the following SQL statements. These statements register the scheduler command for the Data Load utility job in the SCHCMD and CHKARRANG database tables. The command for the scheduled job is registered in the struts configuration file for the site.
insert into schcmd (schcmd_id, storeent_id, pathinfo) values (-37, 0, 'DataLoad');
insert into chkarrang (chkcmd_id, schcmd_id) values (-1, -37);
- Open the Administration Console and select Site on the Administration Console Site/Store Selection page.
- Click Configuration > Scheduler. A list of jobs that are scheduled to run is displayed.
- On the Scheduler Status Display page, click New.
- From the New Scheduled Job page, select DataLoad in the Job command drop-down list.
- In the Job parameters field, specify the dataLoadMainConfigFilePath parameter to identify the Data Load utility main configuration file. The value can be an absolute path or a relative path to the file. If you use a relative path, define it from the directory of the currently running WebSphere Application Server, for example, WAS_installdir\bin. You can also configure the job to use other optional parameters, for example:
- transportId
- If you configured a transport that the job uses to retrieve an input file from an external source, include this parameter. Set its value to the ID of your transport, for example, "transportId=101".
- errorLogPath
- Specifies the directory in which the error log file is generated. By default, this is the directory that contains the file specified by dataLoadMainConfigFilePath.
- uploadType
- Specifies the value used to populate the UPLOADFILE.UPLOADTYPE column. This column is used in some Management Center tools to display file upload jobs.
Any other name-value pair parameters that you include are passed directly to the Data Load utility and must be supported by the utility.
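When you combine several of these parameters, it can help to assemble the Job parameters value in one place. The following is a minimal sketch with a hypothetical helper, `build_job_parameters`. It assumes that the field takes name=value pairs separated by "&", as the workspace parameter format in this section suggests. The configuration file path and transport ID in the example are placeholders, not defaults.

```python
from typing import Mapping


def build_job_parameters(params: Mapping[str, str]) -> str:
    """Join job parameters into a single name=value&name=value string.

    Hypothetical helper. It assumes the Job parameters field takes name=value
    pairs separated by '&', as in the workspace example in this section.
    """
    if "dataLoadMainConfigFilePath" not in params:
        raise ValueError("dataLoadMainConfigFilePath is required")
    pairs = []
    for name, value in params.items():
        # '&' and '=' are separators in this format, so they cannot appear
        # inside a name or value without breaking the pairs apart.
        if any(sep in name or sep in value for sep in "&="):
            raise ValueError(f"parameter {name!r} contains '&' or '='")
        pairs.append(f"{name}={value}")
    return "&".join(pairs)


# Example with a hypothetical configuration file path and transport ID.
print(build_job_parameters({
    "dataLoadMainConfigFilePath": "/opt/dataload/wc-dataload.xml",
    "transportId": "101",
}))
```

The helper does not percent-encode values. Nothing in this procedure indicates that the scheduler decodes them, so a path is passed through exactly as typed.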
If the site uses workspaces, you can configure the Data Load utility to load data into a workspace. To do so, identify the workspace, task group, and task name in the job parameters of the Data Load utility job. For example, the following parameter format identifies a workspace, task group, and task name:
workspaceIdentifier=xx&taskGroupName=yy&taskName=zz
- Set the remaining properties for the scheduled job:
- Start date and Start time
- Provide the date and time at which this job starts. Enter the time in 24-hour clock format.
- Associated user
- Type the user on whose behalf this job runs. By default, this field contains the user ID of the current user. Set the value of this field to match the value for the user in the LOGONID column of the USERREG database table.
- Allowed host
- Type the name of the host that runs this job, in the format name.domain. If you omit this parameter, the job can run on any host. You need this parameter only if the scheduler runs on multiple hosts and the job must be restricted to one of those hosts.
- Schedule interval
- Type the number of seconds between successive runs of this job. If you omit this parameter, the job runs only once.
- Job attempts and Seconds to retry
- For Job attempts, type the number of times that the scheduler retries the job if the job fails. For Seconds to retry, type the number of seconds that the scheduler waits before it tries to run a failed job again. You must enter a value in both fields for the scheduler to retry a failed job.
- Scheduler policy
- Specify the policy that the scheduler uses when the job fails to run. Select whether the job runs once and then schedules its next run in the future, or whether the job runs as many times as necessary to recover all missed runs.
- Job priority
- Type a number that represents the priority of this job. This value is inserted into the SCCPRIORITY column of the SCHCONFIG table. A greater number indicates a higher priority job.
- Application type
- Select the application schedule pool that the job belongs to. This field constrains resource-intensive jobs to a limited number of threads. The application types, and the rules that govern their access to resources, are defined by the user in the WebSphere Commerce Administration Console. The default application type is null.
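The Schedule interval and Scheduler policy fields interact when scheduled runs are missed. The following sketch is a simplified, hypothetical model of the two policy choices. It is not the WebSphere Commerce scheduler's implementation, and the function name and arguments are assumptions. It counts the scheduled slots that passed since the last run, then reports how many runs each policy would execute and when the next future slot falls.

```python
from datetime import datetime, timedelta
from typing import Tuple


def plan_after_missed_runs(
    start: datetime,
    interval_seconds: int,
    last_run: datetime,
    now: datetime,
    recover_all: bool,
) -> Tuple[int, datetime]:
    """Return (runs to execute now, next scheduled run after now).

    Simplified model of the two Scheduler policy choices, not the actual
    WebSphere Commerce scheduler code:
      recover_all=False: run once now, then continue at the next future slot.
      recover_all=True:  run once for every missed slot.
    """
    if interval_seconds <= 0:
        raise ValueError("a repeating job needs a positive schedule interval")
    step = timedelta(seconds=interval_seconds)
    missed = 0
    slot = start
    # Walk the schedule up to "now", counting slots after the last run.
    while slot <= now:
        if slot > last_run:
            missed += 1
        slot += step
    # After the loop, slot is the first scheduled time later than now.
    runs_now = missed if recover_all else min(missed, 1)
    return runs_now, slot


# Example: an hourly job that last ran at 01:00 and is checked again at 04:30.
start = datetime(2024, 1, 1, 0, 0)
last = datetime(2024, 1, 1, 1, 0)
now = datetime(2024, 1, 1, 4, 30)
print(plan_after_missed_runs(start, 3600, last, now, recover_all=False))
print(plan_after_missed_runs(start, 3600, last, now, recover_all=True))
```

In this example the 02:00, 03:00, and 04:00 runs were missed. Under the first policy the job runs once. Under the second it runs three times. In both cases the next regular run is at 05:00.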
For more information about the full set of parameters for scheduled jobs, see Scheduling a site-level job.
If you are configuring multiple Data Load utility jobs for the same store, consider whether the data that each job loads requires existing parent data. For example, categories must exist before you can load catalog entries into them. If your data does require existing parent data, consider loading the data within the same load operation, and configure the load order of the Data Load utility operation so that it loads the data in the correct sequence. Otherwise, set the start times of the scheduled jobs so that the job that loads the parent data finishes before the job that loads the child data starts.
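If you separate parent and child data into different jobs, you can check the planned start times before you create the jobs. The following sketch is a hypothetical planning aid, not a WebSphere Commerce feature. It assumes that you can estimate how long each load takes, for example from earlier runs. It flags any child job that starts before its parent job is expected to finish.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class ScheduledLoadJob:
    """Hypothetical description of one scheduled Data Load utility job."""
    name: str
    start: datetime
    estimated_duration: timedelta  # assumption: estimated from earlier runs
    parent: Optional[str] = None   # job that loads this job's parent data


def ordering_conflicts(jobs: List[ScheduledLoadJob]) -> List[str]:
    """List child jobs that start before their parent job is expected to finish."""
    by_name = {job.name: job for job in jobs}
    conflicts = []
    for job in jobs:
        if job.parent is None:
            continue
        parent = by_name.get(job.parent)
        if parent is None:
            conflicts.append(f"{job.name}: unknown parent job {job.parent!r}")
            continue
        parent_finish = parent.start + parent.estimated_duration
        if job.start < parent_finish:
            conflicts.append(
                f"{job.name} starts before {parent.name} is expected to finish"
            )
    return conflicts


# Categories must be loaded before catalog entries.
categories = ScheduledLoadJob("categories", datetime(2024, 1, 1, 1, 0), timedelta(minutes=30))
entries = ScheduledLoadJob("catalog entries", datetime(2024, 1, 1, 1, 15),
                           timedelta(minutes=45), parent="categories")
print(ordering_conflicts([categories, entries]))
```

The check covers only the first run of each job. If both jobs use the same Schedule interval, the same offset applies to later runs, but a parent load that takes longer than estimated can still overlap the child job.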
- Click OK. Your Data Load utility job is now listed on the Scheduler Status Display page and runs at the configured start date and time. Each time the interval that you set in the Schedule interval field elapses, the job runs again to retrieve an input file from the configured directory and load the data.