Customizing Extract, Transform and Load (ETL) processes
WebSphere Commerce Analyzer extracts, transforms and loads (ETL) data from a default set of tables and columns. If you add new tables or change existing ones, customize the ETL process in order to include the information from the new or changed tables in the data mart. Changing the ETL process involves writing new ETL scripts and editing existing ETL scripts.
ETL command tags and pre-processor
WebSphere Commerce Analyzer performs many ETL steps in the process of preparing the data mart. Most of these steps are processed by ETL implementation Java classes which point to SQL input files with XML tags describing how processing should occur. These XML tags are referred to as ETL command tags. To customize ETL to process new or modified tables, you customize the ETL implementation java classes or SQL input files that contain the ETL command tags.
The optional SQL input file is used as input to the ETL implementation java classes. The implementation Java classes analyze the command tags and attempts to ensure that the commands are passed to DB2 in an optimal way. For example, if the command tags insert several thousand records into the data mart, the implementation Java classes may decide to perform a commit every thousand records to improve performance. The implementation Java classes then passes the commands to DB2 as SQL statements to be executed against the data mart.
The ETL implementation Java classes:
- use DB2 batch or SQL input files with the INSERT, UPDATE, DELETE, DROP, CREATE TABLE, CREATE INDEX, COMMIT, RUNSTATS and REORG commands. Generally, any SQL statement that does not return a result set is supported. In addition to the SQL statements that are supported, the RUNSTATS and REORG DB2 commands are also supported.
- run SQL statements as is. If there is no tagging specified for an SQL statement, the ETL implementation Java classes run the statement exactly as if it were run from the DB2 command line.
- can rewrite queries to perform the CURSOR INSERT, CURSOR UPDATE, and BULK LOAD operations. This can greatly increase the throughput of queries.
- perform necessary ETL processing automatically. For example, it can update the source table to indicate rows are loaded, or update the last updated time stamp on the target table.
Supported command tags
The ETL implementation Java classes are controlled through a set of command tags. These command tags are embedded in DB2 batch file comments. There are two types of ETL command tags; global and local.
You can use any of the WebSphere Commerce Analyzer supported global commands or local commands in your own customized .SQL file. However, it is important that you do not change these commands. In general, do not change SQL files predefined in WebSphere Commerce Analyzer. If you do, you are changing the logic of ETL functions.
- Global commands
- Global command tags affect every SQL statement in the batch file, and can be located on any line in the batch file. The global command applies to all SQL statements that are defined in the specified .SQL file.
- Global command lines are prefixed with:
--G--- Local commands
- Local command tags affect only the first statement following the local tag. Insert the local command before the statement that it applies to. The SQL statement always ends with a semicolon (;) delimiter character. If you decide to comment out a locally tagged SQL statement, you should also comment out that statement's local ETL Driver tags or delete the tags altogether. Otherwise those tags are applied to the next uncommented SQL statement and can lead to unexpected results.
- You can perform these types of operations using the following local command tags:
- CursorInsert
- CursorUpdate
- LoadInsert
- Local command lines are prefixed with:
--L--For more information about ETL command tags refer to:
Error handling
The ETL implementation java classes pre-processors stops when an error occurs. See the ETL logs, located in WCA_installdir/log. for more information. These logs contain information generated by the extract, transform, and load processes during normal operation.
(C) Copyright IBM Corporation 1996, 2006. All Rights Reserved.