Configure the XML data reader

Configure the Extensible Markup Language (XML) data reader in the business object configuration file to modify the way that data is read from XML-formatted source files. You might want to change the default settings of the XML data reader to better match the format of your data.

Data that is read from an XML file can be mapped to a WebSphere Commerce business object by using a business object configuration file. In the configuration file, each element of data in the input XML file can be mapped directly to a property of a WebSphere Commerce business object. The XML handler of the data reader reads the input one record at a time, creates a name-value pair (NVP) mapping for each record, and passes each mapping to a business object builder.


Procedure

  1. Locate the wc-loader-object.xml business object configuration file for the business object type that you are loading. Open the configuration file for editing. Sample business object configuration files are in the following directory:

  2. Find the data reader configuration element:

      <_config:DataReader className="com.ibm.commerce.foundation.dataload.datareader.XmlReader" >
      </_config:DataReader>

  3. Optional: Add handler classes within the data reader configuration element to change how the Data Load utility handles loading your XML data. To add an XML handler class, specify the class in the following format: <_config:XmlHandler className=""/>. For example, the following configuration adds the NVPXmlHandler XML handler class to the data reader configuration:

      <_config:DataReader className="com.ibm.commerce.foundation.dataload.datareader.XmlReader" >
         <_config:XmlHandler className="com.ibm.commerce.foundation.dataload.xmlhandler.NVPXmlHandler" />
      </_config:DataReader>

    The following class is available by default:

      NVPXmlHandler
      This handler class is the default handler for the XmlReader and handles generic XML data that follows a specific CSV-like file format. The handler reads each second-level element as a separate object record. It parses your input file one object record at a time and generates a hash map for each record, which is then passed to the business object builder. Each key of this map is an element or attribute name from the record that you are loading. You can modify this default behavior by specifying the following parameters: xpathEnabled, qualifiedName, and nvpReMapping. See the detailed descriptions of these optional properties in the following step.

      If you do not specify an XML handler in your data reader configuration, this handler is used. All data load configuration files used for loading CSV input files can be used to load XML input files. The Data Load framework switches the data reader used automatically depending on the file type, either CSV or XML, of the input file.
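
      As a minimal sketch of this default behavior, consider the following input file. The root element name CatalogEntries and the second part number are hypothetical, chosen only for illustration. With no recordXpath property set, each second-level CatalogEntry element would be read as one record, so the handler would produce one mapping per record: PartNumber = productPartNumber-1, then PartNumber = productPartNumber-2.

      ```xml
      <!-- Hypothetical root element; only its children are treated as records -->
      <CatalogEntries>
        <!-- Record 1 -->
        <CatalogEntry>
          <PartNumber>productPartNumber-1</PartNumber>
        </CatalogEntry>
        <!-- Record 2 -->
        <CatalogEntry>
          <PartNumber>productPartNumber-2</PartNumber>
        </CatalogEntry>
      </CatalogEntries>
      ```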

  4. Optional: Add configuration properties within the data reader configuration element to meet your data loading requirements. To add a configuration property, specify the property in the following format: <_config:property name="" value=""/>. For example, the following configuration adds the recordXpath configuration property for a catalog entry to the data reader configuration:

      <_config:DataReader className="com.ibm.commerce.foundation.dataload.datareader.XmlReader" >
         <_config:property name="recordXpath" value="CatalogEntry" />
      </_config:DataReader>

    The following optional properties are available for use with the default NVPXmlHandler class:

      recordXpath
      If your input file has the object element nested deeply, you can set an XPath so that the handler treats the nested object element as the record element. When you specify this property, any XML element that matches the XPath value of this property is handled by the Data Load utility as a separate record. If you do not specify this property, only the second-level XML elements are handled as individual object records.

      Specify the value for this property as an XPath. The XPath can be absolute or relative. An XPath is absolute if it starts with the forward slash (/). A relative XPath is just a single element name. For example, you can specify the following absolute XPath:

        <_config:property name="recordXpath" value="/Object/ObjectType/CatalogEntry" /> 

      or the following relative XPath:

        <_config:property name="recordXpath" value="CatalogEntry" />

      This XPath ensures that the object <CatalogEntry> in the following sample is read as the record element:

        <Object>
          <ObjectType>
           <CatalogEntry>
             <PartNumber>productPartNumber-1</PartNumber>
           </CatalogEntry>
          </ObjectType>
        </Object>

      The other elements, <Object> and <ObjectType>, are ignored.

      xpathEnabled
      If your element names are not unique, you can use this property to create uniqueness in the NVP mapping by using the XPath. If you specify this property with a value of true, the key for mapping your data during the Data Load process is the XPath to the element. If this value is false, the key for mapping your data is the element name or attribute name. The XPath used is relative to your record element. The default value for this property is false.

      Note: If you set this property to true, you must also change the value for the mapping of your object in the data load business object configuration file. For example, if your input file contains the following catalog entry element:

        <CatalogEntry catalogEntryTypeCode="ProductBean" displaySequence="1.0">
          <PartNumber>productPartNumber-1</PartNumber>
          <Description>
            <Name>name-1</Name>
          </Description>
        </CatalogEntry>

      If you set the xpathEnabled property to true, the XML handler builds the following mapping:

        catalogEntryTypeCode = ProductBean
        displaySequence = 1.0
        PartNumber = productPartNumber-1
        Description/Name = name-1

      Each key in the mapping is an XPath that is always relative to the root of your record element, CatalogEntry, and does not start with the forward slash (/). An attribute is treated like an element in the XPath.
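
      To turn on this behavior, the property can be added to the data reader configuration by following the property format shown at the start of this step. The sketch below assumes that the default NVPXmlHandler is in use and that no other properties are needed:

      ```xml
      <_config:DataReader className="com.ibm.commerce.foundation.dataload.datareader.XmlReader" >
         <!-- Use the XPath relative to the record element as the key in the NVP mapping -->
         <_config:property name="xpathEnabled" value="true" />
      </_config:DataReader>
      ```

      With the property left at its default of false, the same sample record would instead be keyed by the bare element and attribute names, so name-1 would be stored under the key Name rather than Description/Name.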

      nvpReMapping
      This property controls how the NVP mapping for an object record is remapped before it is passed to the business object builder. The value of this property defines a list of remapping rules for your data. If the elements that contain information for your object have names that are not unique, you can use this configuration property to ensure uniqueness. For example, within a catalog entry, both a catalog entry element <name>shirt</name> and an attribute element <name>color</name> can exist. The XML handler reads the values for these elements as two values for a single name element and records these values as a list in the NVP mapping, name=[shirt, color]. By remapping these elements, you can ensure that the handler reads and maps these elements and values correctly.

      Your list of NVP remapping rules must have each rule separated by a '|' character. Each rule contains three tokens that are separated by a comma ',' character. The first token is for the new key in the remapping. The second token is for the new value in the remapping, and the third token is for the prefix in the remapping key. For example, if your input file contains the following catalog entry elements:

        <CatalogEntry>
          <CatalogEntryIdentifier>
            <ExternalIdentifier>
              <PartNumber>productPartNumber-1</PartNumber>
            </ExternalIdentifier>
          </CatalogEntryIdentifier>
          <Description>
            <Attributes name="auxDescription1">auxDesc1-1</Attributes>
            <Attributes name="auxDescription2">auxDesc2-1</Attributes>
            <Attributes name="published">1</Attributes>
          </Description>
        </CatalogEntry>

      By default, the handler class reads the description elements and maps them as two entries, each holding a list of values:

        name=[auxDescription1, auxDescription2, published]
        Attributes=[auxDesc1-1, auxDesc2-1, 1]

      If you set the remapping configuration property to be:

        <_config:property name="nvpReMapping" value="name, Attributes, " /> 

      The handler reads the elements as three separate elements and maps these elements as:

        auxDescription1 = auxDesc1-1
        auxDescription2 = auxDesc2-1
        published = 1

      If you specify the remapping rule containing the prefix:

        <_config:property name="nvpReMapping" value="name, Attributes, Description/Attributes/name/" />

      These elements are read and mapped as:

        Description/Attributes/name/auxDescription1 = auxDesc1-1
        Description/Attributes/name/auxDescription2 = auxDesc2-1
        Description/Attributes/name/published = 1

      Note: If you change the NVP mapping for an object, you must also change the value for the mapping of your object in the data load business object configuration file. For example, to map this data by using the remapping rules, your business object configuration mapping can be:

        <_config:mapping xpath="Description/Attributes/auxDescription1" value="Description/Attributes/name/auxDescription1"   />
        <_config:mapping xpath="Description/Attributes/auxDescription2" value="Description/Attributes/name/auxDescription2"   />
        <_config:mapping xpath="Description/Attributes/published" value="Description/Attributes/name/published"   />

      The value prefix Description/Attributes/name is optional. If you do not use the prefix, your mapping can resemble:

        <_config:mapping xpath="Description/Attributes/auxDescription1" value="auxDescription1"   />
        <_config:mapping xpath="Description/Attributes/auxDescription2" value="auxDescription2"   />
        <_config:mapping xpath="Description/Attributes/published" value="published"   />
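
      The remapping examples in this section each use a single rule. Because rules are separated by the '|' character, a property value that carries two rules might look like the following sketch. The second rule, including its names id and Values and its prefix Custom/, is hypothetical and is not taken from any shipped configuration. Whether whitespace around the '|' separator is tolerated is not stated here, so none is used.

      ```xml
      <!-- Rule 1: key taken from "name", value taken from "Attributes", no prefix -->
      <!-- Rule 2 (hypothetical): key taken from "id", value taken from "Values", prefix "Custom/" -->
      <_config:property name="nvpReMapping" value="name, Attributes, |id, Values, Custom/" />
      ```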

      qualifiedName
      The qualified name is used to ensure the uniqueness of the data elements that you are loading. This uniqueness is achieved by including the namespace as part of the name for your element data in the NVP mapping. Specify this property value as true to include the namespace as part of the key in the map that is passed to your business object builder. The default value is false.

      Note: If you set this property to true, you must also change the value for the mapping of your object in the data load business object configuration file.
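
      Following the property format described at the start of this step, enabling qualified names might look like the sketch below. The exact form of the resulting keys depends on the namespaces that are used in your input file and is not shown here.

      ```xml
      <_config:DataReader className="com.ibm.commerce.foundation.dataload.datareader.XmlReader" >
         <!-- Include the namespace in each key of the NVP mapping -->
         <_config:property name="qualifiedName" value="true" />
      </_config:DataReader>
      ```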

    Working with element and attribute values:

    You can use either elements or attributes to supply the data to be loaded. Typically, the data is loaded the same way with either method. The difference appears when the value is empty. By default, elements with empty values are treated as null, whereas attributes with empty values are treated as empty values. That is, the value is null in the database if you use an empty element for Name, and the value is empty in the database if you use an empty attribute for Name. This default behavior can be changed by using the following optional configuration properties.

      ignoreEmptyElementText
      If set to false, empty elements are treated as empty values. Default is true.

      ignoreEmptyAttributeValue
      If set to true, empty attribute values are treated as null. Default is false.

    Either property can be specified under the <DataReader> element, <LoadItem> element, or <LoadOrder> element. For example:

      <_config:property name="ignoreEmptyElementText" value="false" />
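
    To illustrate the difference between the two cases, the following hypothetical input supplies an empty Name once as an element and once as an attribute. The root element name and the part numbers are made up for this sketch. Under the defaults described above, the first Name would be loaded as null and the second as an empty value.

      ```xml
      <CatalogEntries>
        <!-- Empty element: treated as null by default (ignoreEmptyElementText is true) -->
        <CatalogEntry>
          <PartNumber>productPartNumber-1</PartNumber>
          <Name></Name>
        </CatalogEntry>

        <!-- Empty attribute: treated as an empty value by default (ignoreEmptyAttributeValue is false) -->
        <CatalogEntry PartNumber="productPartNumber-2" Name="" />
      </CatalogEntries>
      ```

    Setting ignoreEmptyAttributeValue to true, in the same property format, would make the second record load Name as null as well.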

    See Creating data in XML format.

  5. Optional: Configure your Data Load process to include a data reader preprocessor. To configure a preprocessor to run, specify the preprocessor class in the following format:

      <_config:DataReaderPreprocessor className="" />

    For example, the following configuration specifies that a file difference preprocessor is to run:

      <_config:DataReader className="com.ibm.commerce.foundation.dataload.datareader.XmlReader" >
        <_config:DataReaderPreprocessor className="com.ibm.commerce.foundation.dataload.datareader.XmlFileDiffPreprocessor" />
      </_config:DataReader>

    The following data reader preprocessor is available for use with the Data Load utility:

      com.ibm.commerce.foundation.dataload.datareader.XmlFileDiffPreprocessor
      This preprocessor compares a specified old and new input file and generates a new file that contains only the differences that exist in the new file. This preprocessor can improve the performance of routine Data Load operations by avoiding loading data that was loaded previously. For more information about this preprocessor, see Data Load file difference preprocessing. If you run this preprocessor, you can also include more configuration properties that are specific to it. For more information about configuring this preprocessor and the configuration properties that are available for it, see Configure the Data Load utility to run a file difference preprocess.

  6. Save and close your file.
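
    Putting the optional handler and property steps together, the edited data reader configuration might resemble the following sketch. It only combines elements that already appear in the steps above. The order of the child elements is an assumption and has not been checked against the configuration schema. The preprocessor is left out because it takes its own configuration properties.

      ```xml
      <_config:DataReader className="com.ibm.commerce.foundation.dataload.datareader.XmlReader" >
         <!-- Default handler, stated explicitly -->
         <_config:XmlHandler className="com.ibm.commerce.foundation.dataload.xmlhandler.NVPXmlHandler" />
         <!-- Treat each CatalogEntry element as one record -->
         <_config:property name="recordXpath" value="CatalogEntry" />
         <!-- Key the NVP mapping by the XPath relative to the record element -->
         <_config:property name="xpathEnabled" value="true" />
      </_config:DataReader>
      ```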
