Configure the CSV data reader

Configure the comma-separated values (CSV) data reader in the business object configuration file to change how data is read from CSV source files. You might want to change the default settings of the CSV data reader so that it works better with the format of your existing source data.


About this task

The CSV data reader reads and processes data from an input CSV file one record at a time until it reaches the end of the file. Each record in the CSV file must have the same data structure. A business object configuration file maps the data that is read from the CSV file to a WebSphere Commerce business object: each column of data in the input CSV file is mapped directly to a property of the business object. A CSV file can contain multiple data records, with each record spanning multiple columns. Each column value for a record is also known as a token. The CSV file must include delimiter characters to separate tokens within each record and to separate records. The CSV data reader uses these delimiter characters to identify each record and token.
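
The record-at-a-time reading and column-to-property mapping described above can be sketched in Python. This is only an illustration that uses Python's standard csv module, not the WebSphere Commerce reader itself; the property names and the `read_records` helper are hypothetical.

```python
import csv
import io

# Hypothetical mapping from 1-based CSV column number to a business object
# property name. These names are illustrative only.
COLUMN_TO_PROPERTY = {1: "partNumber", 2: "name", 3: "shortDescription"}


def read_records(source, column_to_property):
    """Yield one dict per CSV record, keyed by the mapped property name."""
    for tokens in csv.reader(source):
        yield {
            prop: tokens[number - 1]
            for number, prop in column_to_property.items()
        }


sample = io.StringIO(
    'SKU-001,Shirt,"Cotton, short sleeve"\n'
    'SKU-002,Tie,Silk\n'
)
records = list(read_records(sample, COLUMN_TO_PROPERTY))
print(records[0])
```

Because `read_records` is a generator, each record is processed as it is read instead of the whole file being loaded first.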


Procedure

  1. Open the wc-loader-<object>.xml configuration file in edit mode. A sample of this file is in the following directory:

  2. Find the <_config:DataReader> element.

  3. Add the following optional parameters inside the <_config:DataReader> tag:

      lineDelimiter
      Specifies the line separator character or record separator character. The default is the newline character. The lineDelimiter character cannot appear in the content of a token unless the token is enclosed within tokenValueDelimiter characters.

      Note: If you want records in a CSV file to span multiple lines, you can configure a custom lineDelimiter character to identify the end of a record. With a different delimiter character configured, CSV files can include newline characters within object records, and the data reader does not handle each newline character as the end of a record. For instance, you can configure the lineDelimiter to be a semicolon (;) instead of the newline character. With this lineDelimiter character configured, the following CSV file is considered to have a single object record instead of two records.

        Column1, Column2, Column3, Column4, Column5;
        Value1,Val
        ue2,Value3,Value4,Value5;

      The CSV data reader reads this object record as a single record with the value for Column2 spanning multiple lines.
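
      As a rough illustration of this behavior, the following Python sketch splits the sample above on a semicolon record delimiter. It is not the product's implementation: the `split_records` helper is hypothetical, quoting is ignored, and dropping the newline that follows each delimiter is an assumption.

```python
def split_records(text, line_delimiter=";", token_delimiter=","):
    """Split text into records on line_delimiter, then into tokens.

    Simplified sketch: token value delimiters (quoting) are not handled.
    """
    records = []
    for raw in text.split(line_delimiter):
        if not raw.strip():
            continue  # skip the empty remainder after the final delimiter
        # Assumption: the newline right after a delimiter is not record data.
        records.append(raw.lstrip("\n").split(token_delimiter))
    return records


data = (
    "Column1, Column2, Column3, Column4, Column5;\n"
    "Value1,Val\n"
    "ue2,Value3,Value4,Value5;\n"
)
header, record = split_records(data)
print(record)
```

      The second token of `record` keeps its embedded newline, so the three physical lines yield one header line and one data record.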

      tokenDelimiter
      Specifies the token separator character. The default is the comma character (,).
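
      To show the effect of a non-default token separator, the following sketch uses Python's csv module with a vertical bar (|) as the separator. The `read_tokens` helper and the sample data are hypothetical; the `delimiter` argument stands in for tokenDelimiter.

```python
import csv
import io


def read_tokens(text, token_delimiter=","):
    """Parse text into rows of tokens using the given token separator."""
    return list(csv.reader(io.StringIO(text), delimiter=token_delimiter))


# With "|" as the separator, a comma is ordinary token content.
rows = read_tokens("SKU-001|Shirt|Cotton, short sleeve\n", token_delimiter="|")
print(rows)
```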

      tokenValueDelimiter
      Specifies the string separator character. The tokenValueDelimiter is used to indicate the beginning and the end of a token. The default tokenValueDelimiter character is the double quotation mark ("). For instance, the following token, which contains commas, can be used for a catalog entry short description:

        "Men's fashions for business, casual, and formal occasions"

      Notes:

      • If you are editing your file with a plain text editor, use the tokenValueDelimiter when your token contains special characters, such as the tokenDelimiter character or the tokenValueDelimiter itself. To use the tokenValueDelimiter character within the token, you must use two tokenValueDelimiter characters. For instance, the following token, which contains commas and quotation marks, can be used for a catalog entry short description:

          "Men's fashions for ""business"", ""casual"", and ""formal"" occasions."

        The output can resemble the following string:

          Men's fashions for "business", "casual", and "formal" occasions.

        These uses of the tokenValueDelimiter apply only when you are using a plain text editor to edit your file.

      • To include column values that span multiple lines within your input file, enclose the column value within tokenValueDelimiter characters. By enclosing the value within these characters, you can include the newline character in the column value without causing the data reader to handle the newline character as the end of the object record.
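
      Python's csv module follows the same quoting conventions, so it can serve as a sketch of both notes: doubled quotation marks inside a quoted token, and a newline inside a quoted token. This illustrates the conventions only and is not the WebSphere Commerce reader; `quotechar` stands in for tokenValueDelimiter, and the second token is made-up sample data.

```python
import csv
import io

text = (
    '"Men\'s fashions for ""business"", ""casual"", and ""formal"" occasions.",'
    '"Line one\nLine two"\n'
)

# newline="" leaves line endings untouched so that the csv module decides
# which newlines end a record and which belong to a quoted token.
reader = csv.reader(
    io.StringIO(text, newline=""),
    delimiter=",",
    quotechar='"',
    doublequote=True,
)
rows = list(reader)
print(rows[0][0])
```

      The input spans two physical lines but parses as a single record with two tokens.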

      charset
      Specifies the character set of the CSV file. The default character set is UTF-8.
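
      The character set matters because the same bytes decode to different text under different encodings. The following Python sketch, with a hypothetical `read_csv_bytes` helper and made-up sample data, decodes the same UTF-8 bytes with the matching and with a mismatched character set.

```python
import csv
import io


def read_csv_bytes(raw, charset="utf-8"):
    """Decode raw bytes with the given character set and parse them as CSV."""
    text_stream = io.TextIOWrapper(io.BytesIO(raw), encoding=charset, newline="")
    return list(csv.reader(text_stream))


raw = "SKU-003,Café table\n".encode("utf-8")
rows_ok = read_csv_bytes(raw)                      # matching character set
rows_wrong = read_csv_bytes(raw, charset="latin-1")  # mismatched character set
print(rows_ok[0][1], "|", rows_wrong[0][1])
```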

      firstLineIsHeader
      Indicates that the first line in the CSV file is column header information. Use this header line for providing the column mappings in the <_config:Data> element in the wc-loader-<object>.xml configuration file. The default value is false.

      useHeaderAsColumnName
      Indicates that the first line in the CSV file is used as column information. The default value for useHeaderAsColumnName is false. There are four possible combinations of the firstLineIsHeader and useHeaderAsColumnName parameters:

      1. firstLineIsHeader = "false" and useHeaderAsColumnName = "false". In this case, the column mapping in the wc-loader-<object>.xml configuration file is mandatory.

      2. firstLineIsHeader = "false" and useHeaderAsColumnName = "true". In this case, the useHeaderAsColumnName flag is ignored and the column mapping is mandatory.

      3. firstLineIsHeader = "true" and useHeaderAsColumnName = "false". In this case, the column mapping configuration is optional. If the column mapping configuration is defined in the wc-loader-<object>.xml configuration file, it is used. If not, the CSV header is used for the column names.

      4. firstLineIsHeader = "true" and useHeaderAsColumnName = "true". In this case, the column mapping configuration is ignored and the CSV header is always used for the column names.

      Note: The DataReader element can contain nested elements. To add column mappings, you can use the following code as an example:

        <_config:DataReader firstLineIsHeader="false" useHeaderAsColumnName="false">
            <_config:Data>
                <_config:column number="1" name="FIRST" />
                <_config:column number="2" name="SECOND" />
            </_config:Data>
        </_config:DataReader>
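
      The four combinations of firstLineIsHeader and useHeaderAsColumnName can be summarized as a small decision function. This Python sketch restates the rules listed above and is not the product's code; the function name, its arguments, and the error raised for a missing mandatory mapping are assumptions.

```python
def resolve_column_names(first_line_is_header, use_header_as_column_name,
                         configured_columns, header_tokens):
    """Return the column names to use for one combination of the two flags.

    configured_columns: names from the column mapping, or an empty list.
    header_tokens: tokens from the first line of the CSV file.
    """
    if not first_line_is_header:
        # Cases 1 and 2: useHeaderAsColumnName is irrelevant and the
        # column mapping is mandatory.
        if not configured_columns:
            raise ValueError("column mapping is mandatory")
        return list(configured_columns)
    if use_header_as_column_name:
        # Case 4: the column mapping is ignored.
        return list(header_tokens)
    # Case 3: the column mapping is optional and wins when it is defined.
    return list(configured_columns) if configured_columns else list(header_tokens)


mapping = ["FIRST", "SECOND"]
header = ["PartNumber", "Name"]
print(resolve_column_names(True, False, mapping, header))
```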

  4. Save and close the file.


Example

The following code snippet demonstrates how to use the parameters. This code snippet uses all default values:
