Verify the results of the data load

Verify that the Data Load utility ran successfully by checking the exit value in the summary report or in the wc-dataload.log file. If errors occur, analyze the data in the error reports as a first step in finding the root cause.


Before beginning

We must run the Data Load utility before verifying the results.


Task info

The wc-dataload.log logging/tracing file is generated each time the Data Load utility is run. Also, a separate error log file is generated for each Load item that consists of one or more errors. If no error occur, then no error logs are generated. The summary report is the information displayed on your console when you run the Data Load utility. For more information about the data load order configuration file schema, see Data Load utility order configuration file.


Procedure

  1. Check your summary report. The summary report contains the following information for each Load item:

    Information included within the data load summary report
    Load item information Description
    Business Object Configuration The name of the business object configuration used.
    Data loader mode The mode specified for the data load process.
    Batch size The batch size value specified in your configuration file.
    Commit count The commit count value specified in your configuration file.
    Error Tolerance Level The error tolerance level as specified in your configuration file.
    Error Count The number of errors generated.
    Amount of data processed If we are using the default CSV reader, this number indicates the number of lines that are processed in the CSV source file.
    Amount of business objects processed The number of data objects returned from the data reader. One data object returned might be constructed into multiple business objects. But in most cases, one data object returned is constructed into one business object.
    Amount of business objects committed The number of constructed business objects that are saved in the database or in the files. This value depends on how the data writer is configured.
    Data loader initialization time The initialization time of the data loader.
    Data loader execution began The start time of the data load process in seconds.
    Data loader execution ended The end time of the data load process in seconds.
    Data loader completed in The total running time of the data load in seconds.
    Total flush time The total time that it takes to call the JDBC batch execution. This information is only useful if the JDBC batch is enabled (that is, the batch size is greater than 1.) If the JDBC batch is enabled, then it is the total time spent to run all the SQL statements.
    Total commit time The total time that it takes to call the database commit. The time is typically small and not useful.
    Total ID resolver time The time that it takes for the ID resolver to resolve and check IDs.
    CSV file location The location of the source data.
    Table name, number of rows affected The names of the database tables that are updated, and the total number of rows for each table that are affected by the load process.The number of insert, update, and delete operations that are performed are included with the total number of affected rows for a table. The number of rows affected is the number of SQL operations that run against rows in a table. There can be a discrepancy between the number of operations that run against table rows and the actual number of rows affected. For instance, if you a delete operation runs and a cascade delete occurs, the summary report might identify only the rows that are deleted by the initial operation, not the resulting cascade delete.

    Note: If the site uses workspaces, the number of insert, update, and delete operations in the summary report are calculated based on the operations against the workspace tables. For instance, a delete operation within a workspace can cause an insert or update operation against a workspace table instead of a delete operation against a base table. The summary report includes only the initial insert or update operation not the delete operation that occurs after the load process completes and the workspace changes are approved.

    A sample summary report:

      ===============================================================================
      WebSphere Commerce Data Load
      ===============================================================================
      
      Load started at: Thu Sep 05 09:05:59 EDT 2013
      Initialization completed in 2.229 seconds.
      
      Processing CatalogEntry...
      
      ----------------------------------------------------------------------------------
      Load summary for load item: CatalogEntry.
      ----------------------------------------------------------------------------------
      Business Object Configuration: wc-loader-catalog-entry.xml
      Data loader mode: Replace.
      Batch size: 1.
      Commit count: 100.
      Error Tolerance Level: 1.
      Error Count: 0.
      Amount of business objects processed: 13.
      Amount of business objects committed: 13.
      Data loader initialization time: 0 seconds.
      Data loader execution began: Thu Sep 05 09:06:07 EDT 2013
      Data loader execution ended: Thu Sep 05 09:06:13 EDT 2013
      Data loader completed in 5.765 seconds. 
      Total flush time: 0 seconds.
      Total commit time: 0.003 seconds.
      Total ID resolver time: 1.138 seconds.
      CSV file location: C:\IBM\WCDE_ENT70\bin\..\samples\DataLoad\Catalog\CatalogEntries.csv.
      Affected tables (13):
      Table                |     Total |    Insert |    Update |    Delete 
      ---------------------|-----------|-----------|-----------|-----------
      BASEITEM             |         5 |         0 |         5 |         0 
      BASEITMDSC           |         5 |         0 |         5 |         0 
      STOREITEM            |         5 |         0 |         5 |         0 
      ITEMVERSN            |         5 |         0 |         5 |         0 
      DISTARRANG           |         5 |         0 |         5 |         0 
      CATENTRY             |        13 |         0 |        13 |         0 
      STORECENT            |        13 |         0 |        13 |         0 
      CATENTDESC           |        13 |         0 |        13 |         0 
      CATGPENREL           |        13 |         0 |        13 |         0 
      LISTPRICE            |        13 |         0 |        13 |         0 
      ITEMSPC              |         8 |         0 |         8 |         0 
      VERSIONSPC           |         8 |         0 |         8 |         0 
      CATENTREL            |         8 |         0 |         8 |         0 
      
      ----------------------------------------------------------------------------------
      
      Program exiting with exit code: 0. 
      Load completed successfully with no errors.
      
      Load ended at: Thu Sep 05 09:06:13 EDT 2013
      Load completed in 13.901 seconds.

  2. Open the wc-dataload.log file in the following directory:

    The system exit value displayed in your wc-dataload.log file or the summary report, indicates the following:

    1. If the Data Load utility finishes successfully without any errors, the system exit value is 0.

    2. If the Data Load utility finishes with some errors (fewer errors than the maxError), the system exit value is 1.

    3. If the Data Load utility terminates early because one of the DataLoader error numbers reaches the maxError, the system exit value is 2.

    4. If the Data Load utility terminates early due to some unrecoverable error such as DataLoadConfigException and DataLoadSystemException, the system exit is 3.

    Some exceptions that might be displayed in your report, are described as follows:

      DataLoadSystemException
      This error occurs when there are some system errors. One example of a system error is the database deadlock exception, and Out of Memory exception.

      DataLoadSQLException
      This error occurs when the data load executes some SQL statement and the database throws an exception.

      DataLoadConfigException
      This error occurs when the configuration file contains incomplete elements or the content of element is incorrect.

      DataLoadApplicationException
      This error occurs when the input data source file has erroneous records.

  3. Optional: If an error occurs, open the error log, in the same directory. The file name of the error log contains the name of the LoadItem, and the date and time that the error occurred. For example, if an error occurred for the LoadItem CatalogGroup, the file name might be CatalogGroup_ERROR_2009.08.26_14.54.21.406.log. The error logs contain the following information:

    • File path location of the error log.

    • The application and exception messages that occurred.

  4. Optional: If you configure the Data Load utility to run a file difference preprocessor, the information for the preprocess is included in the generated summary report. This information can help you verify that your file difference preprocess completed successfully. This information can also identify whether you save time by generating and loading a difference file instead of loading your full new file.

    Information included within the data load summary report information for the file difference preprocessor
    Load item information for file difference preprocessor Description
    The number of records in the new file The number of data objects returned from the data reader for the new file that we are running a file difference against.
    The number of records in the generated difference file The number of data objects returned from the file difference preprocess and included in the difference file. These records are the records from the new file that are changed from, or do not exist in, the file that the new file is compared against.
    The time for generating the difference file: The time that is necessary to generate the difference file in seconds.
    The time for loading only the difference file: The time that is necessary to load only the difference file in seconds.
    The total time for generating and loading the difference file: The total running time for generating and loading the difference file in seconds.
    The estimated loading time for loading the full new file The estimated time that is necessary to load the new file without running a file difference preprocess in seconds.
    The estimated percentage of loading time that is saved by running the file difference preprocessor and loading only the file differences: The estimated loading time that is saved by running the file difference preprocess and loading only the difference file instead of the full new file. The time that is saved is calculated as a percentage of the estimated loading time for loading the full new file with the following formula:

      ( (full_file_load_time - file_diff_time) / full_file_load_time) x 100%

    Where

    • file_diff_time is the value for the total time for generating and loading the difference file.

    • full_file_load_time is the value for the estimated loading time for loading the full new file.

    Your summary report with file difference information can resemble the following sample report:

      Load summary for load item: CatalogEntry
      -----------------------------------------------------------------------------
      --
      Business Object Configuration: wc-loader-catalog-entry.xml
      Data loader mode: Replace.
      Batch size: 1.
      Commit count: 100.
      Error Tolerance Level: 1.
      Error Count: 0.
      Amount of data processed: 8.
      Amount of business objects processed: 6.
      Amount of business objects committed: 6.
      Data loader initialization time: 0 seconds.
      Data loader execution began: Thu Mar 10 10:52:06 EDT 2013
      Data loader execution ended: Thu Mar 10 10:52:06 EDT 2013
      Data loader completed in 1.208 seconds.
      Total flush time: 0.002 seconds.
      Total commit time: 0.003 seconds.
      The number of records in the new file WebSphere/utilities_root/samples/DataLoad/Sample/CatalogEntriesNew.csv: 15 records
      The number of records in the generated difference file WebSphere/utilities_root/samples/DataLoad/Sample/CatalogEntriesNew_diff_2013.03.10_10.52.05.501.csv: 5 records
      The time for generating the difference file: 0.028 seconds.
      The time for loading only the difference file: 1.18 seconds.
      The total time for generating and loading the difference file: 1.208 seconds.
      The estimated loading time for loading the full new file WebSphere/utilities_root/samples/DataLoad/Sample/CatalogEntriesNew.csv: 3.54 seconds
      The estimated percentage of loading time that is saved by running the file difference preprocessor and loading only the file differences: 65%.
      CSV file location: WebSphere/utilities_root/samples/DataLoad/Sample/CatalogEntriesNeww_diff_2013.03.10_10.52.05.501.csv.
      Affected tables (14):
      Table name: BASEITEM, Affected number of rows: 4.
      Table name: BASEITMDSC, Affected number of rows: 4.
      Table name: STOREITEM, Affected number of rows: 4.
      Table name: ITEMVERSN, Affected number of rows: 4.
      Table name: DISTARRANG, Affected number of rows: 4.
      Table name: CATENTRY, Affected number of rows: 6.
      Table name: STORECENT, Affected number of rows: 4.
      Table name: CATENTDESC, Affected number of rows: 4.
      Table name: CATGPENREL, Affected number of rows: 4.
      Table name: LISTPRICE, Affected number of rows: 4.
      Table name: OFFER, Affected number of rows: 4.
      Table name: OFFERPRICE, Affected number of rows: 4.
      Table name: ITEMSPC, Affected number of rows: 2.
      Table name: TI_DELTA_CATENTRY, Affected number of rows: 5.
      -----------------------------------------------------------------------------
      --

  5. Optional: If you configured the Data Load utility to output information related to the ID resolver process, this information is included in the summary report. This information can help you determine how much time is necessary to resolve and check IDs during the load process. With this information, we can identify whether it would be more efficient to resolve and check IDs for a table against the ID resolver cache or directly against the database. To include the ID resolver information in the data load summary report, we must include the summaryReportLevel configuration property in the data load order configuration file. For more information about configuring this file and the available configuration properties, see

    If you identify that resolving IDs for a table against the cache is inefficient, we can exclude the IDs for that table from the cache. To exclude a table from being included within the ID resolver cache, we must configure IDResolver configuration property within the data load environment configuration file. See Configure the data load environment settings. If the data load order configuration file is configured to include the ID resolver information in the data load summary report, the following information is included in the report.

    Information included within the data load summary report for the ID resolver
    Load item information for ID resolver Description
    ResolveId from cache Information for the ID resolve process that is performed against the ID resolver cache.
    Load Time (sec) The time that is necessary to fetch and load the IDs into the ID resolver cache.
    Entries The number of entries within the cache for the table.
    Hits The number of hits to the cache to resolve IDs for the table.
    ResolveId from database Information for the ID resolve process that is performed directly against the database table.
    Time (sec) The time that is necessary to resolve IDs directly from the database for the table.
    Hits The number of hits to the database to resolve IDs for the table.

    The summary report includes an extra section that can resemble the following sample ID resolver summary.

      Table                |          ResolveId from cache          |  ResolveId from database   
                           |Load Time (sec) |   Entries |      Hits |     Time (sec) |      Hits 
      ---------------------|----------------|-----------|-----------|----------------|-----------
      ATTRDICT             |          0.011 |         2 |         1 |            0.0 |         0 
      BASEITEM             |          0.055 |      1241 |        13 |            0.0 |         0 
      BASEITMDSC           |          0.246 |     16008 |         5 |            0.0 |         0 
      CATENTDESC           |          0.226 |     84292 |        13 |            0.0 |         0 
      CATENTREL            |          0.028 |      5270 |         8 |            0.0 |         0 
      CATENTRY             |          0.092 |      6497 |        21 |            0.0 |         0 
      CATGPENREL           |          0.044 |      6496 |        13 |            0.0 |         0 
      CATGROUP             |            0.0 |       131 |        23 |            0.0 |         0 
      CATGRPDESC           |            0.0 |         0 |         6 |            0.0 |         0 
      CATGRPREL            |            0.0 |         0 |         4 |            0.0 |         0 
      CATTOGRP             |            0.0 |         0 |         2 |            0.0 |         0 
      DISTARRANG           |          0.057 |      1238 |         5 |            0.0 |         0 
      ITEMSPC              |          0.091 |      5259 |         8 |            0.0 |         0 
      ITEMVERSN            |           0.02 |      1241 |        13 |            0.0 |         0 
      LISTPRICE            |          0.066 |      6489 |        13 |            0.0 |         0 
      STORECENT            |          0.014 |      6496 |        13 |            0.0 |         0 
      STORECGRP            |            0.0 |         0 |         6 |            0.0 |         0 
      STOREITEM            |          0.047 |      1240 |         5 |            0.0 |         0 
      VERSIONSPC           |          0.097 |      5259 |         8 |            0.0 |         0 
      ---------------------|----------------|-----------|-----------|----------------|-----------
      Total                |          1.094 |    147159 |       180 |            0.0 |         0 

    Note: We can configure the Data Load utility to exclude specific tables from the ID resolver cache. Consider excluding a table when the table meets the following conditions:

    • The table is large

    • Loading the IDs from the table into the cache takes a lot of time

    • The cache does not receive many hits to resolve IDs for the table

    Excluding a table lets the ID resolver resolve IDs directly against the database table. Resolving IDs against the database table can save time for resolving the IDs or reduce the memory usage. To exclude a table, configure the data load environment configuration file. Specify the table that is to be excluded in a <_config:ExcludeCache> element within the <_config:IDResolver> element.

    Note: For more information about configuring this file, see Configure the data load environment settings.

Previous topic: Running the Data Load utility command syntax