IBM TDI, Version 7.1
The Comma Separated Values (CSV) Parser reads and writes data in
a CSV format.
In the Config Editor, the parameters are set in the Parser tab of the Connector. To use
TAB as the Field Separator, specify \t. When supplying Field Names, however, use the actual tab character
between the field names.
On output, only the first value of a multi-valued
attribute is written.
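The two points above can be illustrated with a short Python sketch. This is not TDI code: it uses Python's standard csv module, and the `write_rows` helper and the sample entry are hypothetical. It writes tab-separated output and keeps only the first value of each multi-valued attribute.

```python
import csv
import io


def write_rows(rows, field_names, separator="\t"):
    """Write rows as delimited text.

    Hypothetical helper: a list or tuple value stands in for a
    multi-valued attribute, and only its first value is written.
    """
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter=separator, lineterminator="\n")
    writer.writerow(field_names)  # header line
    for row in rows:
        out = []
        for name in field_names:
            value = row.get(name, "")
            if isinstance(value, (list, tuple)):
                # Multi-valued attribute: keep the first value only.
                value = value[0] if value else ""
            out.append(value)
        writer.writerow(out)
    return buf.getvalue()


entry = {"cn": "Jane Doe", "mail": ["jane@example.com", "jd@example.com"]}
text = write_rows([entry], ["cn", "mail"])
```

Here `text` contains a header line and one data line, with a real tab character between `cn` and `mail` and only the first `mail` value present.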
The Parser has the following parameters:
- Field Separator
- Specifies the character used to separate each column; typically
a comma or semicolon. If not specified, the parser attempts to guess
when reading, and uses a comma when writing. You can use backslash
( \ ) as the escape character to specify non-printable characters.
For example, ( \t ) denotes the TAB character.
- Sort Fields
- Check this option to write header fields in alphabetical (ascending)
order. The default is false, that is unchecked.
- Field Names
- Specifies the name for each column the parser must read or write.
If not specified, the parser reads the first line and uses the value
as field names. You can use the Field Separator between the field
names, or specify each name on a separate line.
- Enable Quoting
- On write, when this parameter is set to true (that
is, checked), a field is output with quotation marks around it under
the same conditions as in previous versions; however, quotation marks
inside a quoted field are now doubled.
If Enable Quoting is set to false, the field is output
as is, which can cause problems, for example when a value contains the column separator.
When reading, quotation marks around the field are stripped if this parameter is
set to true, and the parser is able to read
quoted attributes containing the column separator. If this parameter
is set to false, the parser returns unexpected
values when the input contains fields delimited by quotation marks.
- Quote all fields
- Quote all fields, regardless of whether they contain a quotation mark, separator, or newline.
- Write Header
- The default value for this parameter is true.
If Write Header is set, the first line output
by the parser contains all the field names separated by the column
separator.
- Log long lines
- Defines a maximum number of bytes for a line. The line numbers of
lines longer than this maximum are logged.
- Combine remainder in last field
- If checked, all extra fields from lines that exceed the
number of defined fields are combined into a new "Remainder" field.
The fields, and implicitly the number of fields, are defined either by the Field Names parameter or, in its absence,
by the first line of the file.
- Character Encoding
- Character Encoding to be used. Also see Character Encoding conversion.
- Detailed Log
- If this field is checked, additional log messages are generated.
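The quoting and remainder behavior described in the parameter list can be sketched in Python. This is an illustration only, built on Python's standard csv module rather than the TDI parser. The `write_line` and `read_line` helpers are hypothetical, and joining the extra fields with the separator to form the "Remainder" value is an assumption, since the text above does not say how the extra fields are combined.

```python
import csv
import io


def write_line(fields, separator=",", quote_all=False):
    """Write one line; quotation marks inside a quoted field are doubled."""
    buf = io.StringIO()
    # QUOTE_MINIMAL quotes only fields that need it; QUOTE_ALL mimics
    # the "Quote all fields" option.
    quoting = csv.QUOTE_ALL if quote_all else csv.QUOTE_MINIMAL
    writer = csv.writer(buf, delimiter=separator, quoting=quoting,
                        lineterminator="\n")
    writer.writerow(fields)
    return buf.getvalue()


def read_line(line, field_names, separator=","):
    """Read one line, stripping quotes and collecting extra fields."""
    values = next(csv.reader([line], delimiter=separator))
    entry = dict(zip(field_names, values))
    extra = values[len(field_names):]
    if extra:
        # Assumption: extra fields are re-joined with the separator.
        entry["Remainder"] = separator.join(extra)
    return entry


quoted = write_line(["a", 'say "hi"', "x,y"])
all_quoted = write_line(["a", "b"], quote_all=True)
parsed = read_line('1,"Doe, Jane",x,y', ["id", "name"])
```

In this sketch, `quoted` shows a doubled quotation mark inside a quoted field, `all_quoted` shows every field quoted, and `parsed` shows a quoted field that contains the separator being read as one value, with the two surplus fields collected under "Remainder".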
The schema that the CSV Parser provides to the Input/Output Connector
map is taken from the value of the Field Names configuration
parameter of the Parser. The parser will simply copy the fields from
the parameter to the Maps of the Connector. This saves you from copying
all the fields one by one from the Parser to the corresponding Connector