Overview of the Data Extract utility

The Data Extract utility is a command-line utility that we use to extract data from the WebSphere Commerce database into an output file.
We can run the utility in the staging and production environments, but we should run it in an environment that has the information that we need to extract. For example, the staging environment might not have inventory or pricing information for a catalog entry. In this case, run the utility in the production environment. This utility uses the Data Load utility framework and follows a similar interaction process:

The configured data reader for the utility reads the data that is to be extracted from the database and returns the data to the business object builder.
The business object builder populates a business object that is based on the data that is passed from the data reader. The business object builder passes the object to the business object mediator.
The business object mediator transforms the business object into a list of map objects that is then passed to the data writer.
The data writer then generates the configured output file and writes the list of CSV or XML objects into the output file.
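
The four stages above can be sketched as a small pipeline. This is only an illustration of the flow in Python, not the utility's actual classes: every name here (BusinessObject, read_rows, build_object, mediate, write_csv, extract) is hypothetical, and the data reader takes an in-memory list in place of a database.

```python
import csv
import io
from dataclasses import dataclass
from typing import Dict, Iterable, Iterator, List


@dataclass
class BusinessObject:
    """Hypothetical stand-in for a populated business object."""
    fields: Dict[str, str]


def read_rows(rows: Iterable[Dict[str, str]]) -> Iterator[Dict[str, str]]:
    """Data reader: yields the raw data (an in-memory list stands in for the database)."""
    yield from rows


def build_object(row: Dict[str, str]) -> BusinessObject:
    """Business object builder: populates a business object from the reader's data."""
    return BusinessObject(fields=dict(row))


def mediate(obj: BusinessObject, columns: List[str]) -> List[Dict[str, str]]:
    """Business object mediator: transforms the object into a list of map objects."""
    return [{column: obj.fields.get(column, "") for column in columns}]


def write_csv(maps: List[Dict[str, str]], columns: List[str], out: io.StringIO) -> None:
    """Data writer: writes the map objects to the output in the configured column order."""
    writer = csv.DictWriter(out, fieldnames=columns, lineterminator="\n")
    writer.writeheader()
    for record in maps:
        writer.writerow(record)


def extract(rows: Iterable[Dict[str, str]], columns: List[str]) -> str:
    """Runs reader -> builder -> mediator -> writer and returns the CSV text."""
    maps: List[Dict[str, str]] = []
    for row in read_rows(rows):
        maps.extend(mediate(build_object(row), columns))
    out = io.StringIO()
    write_csv(maps, columns, out)
    return out.getvalue()
```

In this sketch the mediator decides which fields reach the writer, so a field that is not listed in columns is dropped before any output is written.
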
There are two methods for extracting data with the Data Extract utility: an SQL-based extraction and a business logic-based extraction. The extraction method that we configure the utility to use depends on the type of data to extract.

To extract promotions, marketing, or Commerce Composer objects, we must use the SQL-based extraction.
To extract catalog data to generate Enterprise Product Report (EPR) data for use with IBM Product Recommendations, we must use the business logic-based extraction.

SQL-based extraction

The SQL-based extraction uses a direct database connection and SQL statements to extract data. Unless we are extracting data for use with IBM Product Recommendations, or data that cannot be directly retrieved from the database, we should use this SQL-based approach. Compared with the business logic-based extraction method, the SQL-based process improves the performance and flexibility of the utility. It can also reduce the implementation cost of customizing the utility to extract data that the utility does not support by default. By default, the utility supports extracting the following types of data with the SQL-based extraction process:

Promotions
Commerce Composer objects, such as widgets, layouts, layout templates, and pages
Marketing objects, such as activities, e-Marketing Spots, content, campaigns, attachments, and customer segments
To configure the utility to use an SQL-based extract process instead of the business logic process, configure the utility to use the following classes:

UniqueIdReader

This data reader class adds support for the utility to use SQL statements to retrieve the unique ID value for a business object. The data reader class can then send a map object for the business object to the business object builder.

AssociatedObjectMediator

This business object mediator adds support for the utility to use SQL statements to retrieve the detailed business object information for the map object. The mediator can then send an updated map object containing the detailed business object information to the configured data writer class.

CSVWriter

A data writer class that can convert the map objects that are sent by the business object mediator into a CSV formatted record. This writer class can then write the record into the configured output CSV file. Use either this data writer class or the XmlWriter data writer class.

XmlWriter

A data writer class that can convert the map objects that are sent by the business object mediator into an XML formatted element. This writer class can then write the element and any subelements into the configured output XML file. Use either this data writer class or the CSVWriter data writer class.

ValueHandler

This interface provides a customization point that we can use when the utility cannot retrieve data directly from the database. We can also use this interface when we need to modify data before the data writer class writes the data into the output file.
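
The division of work among these classes can be sketched as follows. This is a minimal Python model of the pattern, not the utility's API: the demo_object table, its columns, and the functions read_unique_ids, add_details, status_label, and extract_records are all hypothetical, and an in-memory SQLite database stands in for the WebSphere Commerce database. One query retrieves only the unique IDs, a second query per ID retrieves the details, and a handler function (playing the role that the ValueHandler customization point is described as having) modifies a value before it would be written.

```python
import sqlite3
from typing import Callable, Dict, Iterator, List


def demo_connection() -> sqlite3.Connection:
    """Creates a hypothetical table with two rows in an in-memory database."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE demo_object (id INTEGER PRIMARY KEY, name TEXT, status INTEGER)")
    conn.executemany(
        "INSERT INTO demo_object VALUES (?, ?, ?)",
        [(1, "Spring Sale", 1), (2, "Old Banner", 0)],
    )
    return conn


def read_unique_ids(conn: sqlite3.Connection) -> Iterator[Dict[str, object]]:
    """Reader step: retrieves only the unique ID of each object as a map."""
    for (object_id,) in conn.execute("SELECT id FROM demo_object ORDER BY id"):
        yield {"id": object_id}


def add_details(conn: sqlite3.Connection, record: Dict[str, object]) -> Dict[str, object]:
    """Mediator step: retrieves the detailed data for one ID and returns an updated map."""
    name, status = conn.execute(
        "SELECT name, status FROM demo_object WHERE id = ?", (record["id"],)
    ).fetchone()
    updated = dict(record)
    updated["name"] = name
    updated["status"] = status
    return updated


def status_label(value: object) -> object:
    """Handler step: replaces a numeric status code with a readable label."""
    return {0: "Inactive", 1: "Active"}.get(value, "Unknown")


def extract_records(
    conn: sqlite3.Connection, handlers: Dict[str, Callable[[object], object]]
) -> List[Dict[str, object]]:
    """Runs the ID query, the detail query, and any per-column handlers."""
    records = []
    for record in read_unique_ids(conn):
        detailed = add_details(conn, record)
        for column, handler in handlers.items():
            detailed[column] = handler(detailed[column])
        records.append(detailed)
    return records
```

Calling extract_records(demo_connection(), {"status": status_label}) returns one map per row, ready to hand to a CSV or XML writer.
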

For more information about configuring the Data Extract utility to use these classes and the SQL-based extraction process, see Configuring and running the Data Extract utility. When we configure the utility, we should copy and edit the provided sample configuration files so that we can quickly configure and run the utility.

Business logic-based extraction

This approach uses business logic to fetch the data, similar to the behavior of the existing web services. The configured data reader class for the utility uses the catalog web service to retrieve data in the catalog business object (noun) format. The business object builder class does not populate any data in this process. Instead, the builder class passes the noun objects from the data reader class to the business object mediator class. The mediator class is then used to extract the data from the business object to build a map object. The data writer then converts the map object into CSV formatted output files, such as EPCMF and ECDF files for use with IBM Product Recommendations.
This business logic approach is useful when data cannot be directly retrieved from the database, for example, when complicated business logic is needed to compute the data, as is the case for pricing data that uses price rules. To extract this pricing data, logic is needed to apply the price rules before the catalog entry prices can be determined, extracted, and written to an output file. Because the extraction reuses the existing business logic, we do not need to reimplement the logic that is used to load or create the data to support extracting the data. This approach, however, has a few disadvantages:

The approach can cause the performance of the extraction process to be slow. The logic-based services for retrieving data are intended to retrieve a single business object or a list of business objects. If any of the business objects are large, however, the performance can be slow.
Customizing the extraction process requires significant effort to retrieve custom data or data that is not supported for extracting by default. If we need to extract custom data or data that is not supported for extracting with the utility, we must implement our own custom services to extract the data.
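
The price rule case mentioned above can be illustrated with a toy example. This is a hedged Python sketch with hypothetical names (percent_off, extract_prices) and a deliberately simple rule. Real price rules are more involved, and this is not how the utility evaluates them. The point is only that the stored list price is not the value to extract until the rule logic has been applied.

```python
from decimal import Decimal
from typing import Callable, Dict, List

# A price rule is modeled as a function from a list price to a final price.
PriceRule = Callable[[Decimal], Decimal]


def percent_off(percent: str) -> PriceRule:
    """Builds a hypothetical rule that takes a percentage off the list price."""
    factor = (Decimal("100") - Decimal(percent)) / Decimal("100")
    return lambda price: (price * factor).quantize(Decimal("0.01"))


def extract_prices(
    list_prices: Dict[str, Decimal], rules: Dict[str, PriceRule]
) -> List[Dict[str, str]]:
    """Applies the rule for each catalog entry, if any, before building the output rows."""
    rows = []
    for entry, price in list_prices.items():
        rule = rules.get(entry)
        final_price = rule(price) if rule else price
        rows.append({"catalogEntry": entry, "price": str(final_price)})
    return rows
```

A plain SQL query against the stored prices would return the list price for every entry. Only the step that applies the rule yields the price that belongs in the output file.
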

Configuration files for the Data Extract utility
The Data Extract utility uses three types of configuration files. Samples of each type of file are provided, but we must update the sample files with configuration information specific to the environment. These configuration files are based on the Data Load utility configuration files, but include some extensions.

wc-dataextract.xml

This file is the order configuration file that we must point to when we run the Data Extract utility. This file specifies the paths to the environment configuration file and to the business object configuration file.

wc-dataextract-env.xml

The environment configuration file, which includes the environment variables for the WebSphere Commerce instance. These variables include the following information:

Business context variables, including the store identifier, catalog identifier, and the default language and currency for the store.
Database environment settings, including the database type, name, and schema.

wc-dataextract-business_object.xml

The business object configuration file, which configures how the utility identifies the data to extract for a specific business object. By default, sample business object configuration files are provided for extracting data for the following types of objects with the SQL-based extraction process:

Commerce Composer objects
Sample configuration files for extracting Commerce Composer widgets, layouts, templates, and pages. The files are configured to generate CSV files that can be used with the Data Load utility.
Promotions
The sample configuration files for extracting promotion data are configured to generate an XML file that can be used with the Data Load utility.
Marketing objects
Sample configuration files are provided for extracting marketing activities, campaigns, content, attachments, customer segments, and e-Marketing Spots. The files are configured to generate CSV files that can be used with the Data Load utility.
These files include the following information:

Business context information.
Data mappings required to transform WebSphere Commerce business objects to the data that can be written in the output file.
Definitions for the order in which the utility writes the data to the columns in the file.
Pointers to interfaces and implementation classes that the utility uses to extract and transform the data.

Note: Sample configuration files are also provided for extracting catalog entry data into an EPCMF file and category data into an ECDF file for use with IBM Product Recommendations. These sample configuration files configure the utility to use the business logic-based extraction method. For more information about configuring the utility to use these sample files, see Data extraction utility for dynamic recommendations in IBM Product Recommendations.

Best practices

When we use the Data Extract utility, we can follow general configuration recommendations to take advantage of the full capability of the utility. See Data Extract utility best practices.

Related concepts
Extract and load data