Overview of the Data Extract utility

The Data Extract utility is a command-line utility we use to extract data from the WebSphere Commerce database into an output file.

We can run the utility in the staging and production environments, but it is best to run it in the environment that contains the information we need to extract. For example, the staging environment might not have inventory or pricing information for a catalog entry. In this case, run the utility on the production environment. This utility uses the Data Load utility framework and follows a similar interaction process:

  1. The configured data reader for the utility reads the data that is to be extracted from the database and returns the data to the business object builder.

  2. The business object builder populates a business object that is based on the data that is passed from the data reader. The business object builder passes the object to the business object mediator.

  3. The business object mediator transforms the business object into a list of map objects that is then passed to the data writer.

  4. The data writer then generates the configured output file and writes the list of CSV or XML objects into the output file.
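
The four steps above can be sketched as a small, self-contained Java program. This is only an illustration of the reader, builder, mediator, and writer hand-off. Every class, method, and column name in it (ExtractPipelineSketch, readRows, PartNumber, and so on) is hypothetical and is not part of the WebSphere Commerce API, and the reader returns hard-coded rows instead of querying a database.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the four-stage flow; these are NOT WebSphere Commerce classes.
public class ExtractPipelineSketch {

    // Stand-in for a business object, for example a catalog entry.
    static final class CatalogEntry {
        final String partNumber;
        final String name;

        CatalogEntry(String partNumber, String name) {
            this.partNumber = partNumber;
            this.name = name;
        }
    }

    // Step 1: the data reader returns the raw data (hard-coded here, assumed to come from the database).
    static List<String[]> readRows() {
        List<String[]> rows = new ArrayList<>();
        rows.add(new String[] {"SKU-1", "Coffee maker"});
        rows.add(new String[] {"SKU-2", "Kettle"});
        return rows;
    }

    // Step 2: the business object builder populates a business object from the reader's data.
    static CatalogEntry buildObject(String[] row) {
        return new CatalogEntry(row[0], row[1]);
    }

    // Step 3: the business object mediator transforms the business object into a map.
    static Map<String, String> mediate(CatalogEntry entry) {
        Map<String, String> map = new LinkedHashMap<>(); // keeps column order stable
        map.put("PartNumber", entry.partNumber);
        map.put("Name", entry.name);
        return map;
    }

    // Step 4: the data writer renders the list of maps as CSV text (no quoting, for brevity).
    static String writeCsv(List<Map<String, String>> maps) {
        if (maps.isEmpty()) {
            return "";
        }
        StringBuilder out = new StringBuilder();
        out.append(String.join(",", maps.get(0).keySet())).append("\n");
        for (Map<String, String> map : maps) {
            out.append(String.join(",", map.values())).append("\n");
        }
        return out.toString();
    }

    // Chains the four steps in the order described above.
    public static String extract() {
        List<Map<String, String>> maps = new ArrayList<>();
        for (String[] row : readRows()) {
            maps.add(mediate(buildObject(row)));
        }
        return writeCsv(maps);
    }

    public static void main(String[] args) {
        System.out.print(extract());
    }
}
```

In the real utility, each stage is a configured class rather than a static method, so the stages can be swapped through configuration instead of code changes.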

There are two methods for extracting data that we can use with the Data Extract utility: an SQL-based extraction and a business logic-based extraction. The extraction method that we configure the utility to use depends on the type of data to extract.


SQL-based extraction

The SQL-based extraction uses a direct database connection and SQL statements to extract data. Unless we are extracting data for use with IBM Product Recommendations, or extracting data that cannot be directly retrieved from the database, the SQL-based approach is recommended. Compared with the business logic-based extraction method, the SQL-based process improves the performance and flexibility of the utility. It can also reduce the implementation cost of customizing the utility to extract data that the utility does not support by default. By default, the utility supports extracting the following types of data with the SQL-based extraction process:

To configure the utility to use an SQL-based extract process instead of the business logic process, configure the utility to use the following classes:

For more information about configuring the Data Extract utility to use these classes and the SQL-based extraction process, see Configuring and running the Data Extract utility. When we configure the utility, it is best to copy and edit the provided sample configuration files so that we can configure and run the utility quickly.


Business logic-based extraction

This approach uses business logic to fetch the data, similar to the behavior of the existing web services. The configured data reader class for the utility uses the catalog web service to retrieve data in the catalog business object (noun) format. The business object builder class does not populate any data in this process. Instead, the builder class passes the noun objects from the data reader class to the business object mediator class. The mediator class then extracts the data from the business object to build a map object. The data writer then converts the map object into CSV-formatted output files, such as EPCMF and ECDF files for use with IBM Product Recommendations.
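
As a rough illustration of this flow, the following Java sketch shows a pass-through builder, a mediator that flattens a noun into a map, and a writer that emits one CSV line per map. All names here (BusinessLogicExtractSketch, CatalogEntryNoun, readNouns, and the field names) are hypothetical and are not WebSphere Commerce classes. The web service call is simulated with hard-coded nouns, and the comma quoting follows common CSV conventions rather than the actual EPCMF or ECDF layouts.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the business logic-based flow; NOT the WebSphere Commerce API.
public class BusinessLogicExtractSketch {

    // Stand-in for a catalog noun as a web service might return it.
    static final class CatalogEntryNoun {
        final String partNumber;
        final String name;
        final String offerPrice;

        CatalogEntryNoun(String partNumber, String name, String offerPrice) {
            this.partNumber = partNumber;
            this.name = name;
            this.offerPrice = offerPrice;
        }
    }

    // Data reader: simulates the web service call by returning ready-made nouns.
    static List<CatalogEntryNoun> readNouns() {
        List<CatalogEntryNoun> nouns = new ArrayList<>();
        nouns.add(new CatalogEntryNoun("SKU-1", "Mug, large", "4.50"));
        nouns.add(new CatalogEntryNoun("SKU-2", "Kettle", "29.99"));
        return nouns;
    }

    // Business object builder: populates nothing, only hands the noun on to the mediator.
    static CatalogEntryNoun build(CatalogEntryNoun noun) {
        return noun;
    }

    // Business object mediator: extracts the noun's data into a map object.
    static Map<String, String> mediate(CatalogEntryNoun noun) {
        Map<String, String> map = new LinkedHashMap<>(); // keeps field order stable
        map.put("PartNumber", noun.partNumber);
        map.put("Name", noun.name);
        map.put("OfferPrice", noun.offerPrice);
        return map;
    }

    // Quotes a field when it contains a comma, quote, or newline; embedded quotes are doubled.
    static String csvField(String value) {
        if (value.contains(",") || value.contains("\"") || value.contains("\n")) {
            return "\"" + value.replace("\"", "\"\"") + "\"";
        }
        return value;
    }

    // Data writer: converts one map object into one CSV line.
    static String writeCsvLine(Map<String, String> map) {
        List<String> fields = new ArrayList<>();
        for (String value : map.values()) {
            fields.add(csvField(value));
        }
        return String.join(",", fields);
    }

    // Runs reader, builder, mediator, and writer in sequence.
    public static List<String> extractLines() {
        List<String> lines = new ArrayList<>();
        for (CatalogEntryNoun noun : readNouns()) {
            lines.add(writeCsvLine(mediate(build(noun))));
        }
        return lines;
    }

    public static void main(String[] args) {
        for (String line : extractLines()) {
            System.out.println(line);
        }
    }
}
```

The builder is kept as an explicit step, even though it does nothing, to mirror the description above: the stage still exists in the chain, it just forwards the noun unchanged.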

This business logic approach is useful when data cannot be directly retrieved from the database, for example, when complicated business logic is needed to compute the data, as with pricing data that uses price rules. To extract this pricing data, logic is needed to apply the price rules before the catalog entry prices can be determined, extracted, and written to an output file. When complicated business logic is needed, we do not need to reimplement the logic that is used to load or create the data in order to extract it. This approach, however, has a few disadvantages:


Configuration files for the Data Extract utility

The Data Extract utility uses three types of configuration files. Samples of each type of file are provided, but we must update the sample files with configuration information specific to the environment. These configuration files are based on the Data Load utility configuration files, but include some extensions.


Best practices

When we use the Data Extract utility, following the general configuration recommendations helps ensure that we take advantage of the full capability of the utility. See Data Extract utility best practices.


Related concepts
Extract and load data