File splitting during outbound processing, WebSphere Adapter for Flat Files

IBM BPM, V8.0.1, All platforms
File splitting

To support files that contain multiple records, the adapter provides an optional file splitting feature. When you use this feature during the Retrieve operation, the adapter divides large files into smaller chunks, which are then retrieved separately.

Depending upon the type of content contained in the file, the file can be split by delimiter or by size.

When the content of the business object has a definite structure, for example, when it contains elements such as name, address, and city, the file is split by delimiter.
When the business object contains unstructured data, such as plain text or binary files, the file is split by size.
By default, the adapter splits files by size. However, the default value of the SplitCriteria property is zero, which means that no splitting is performed unless you change it. You can also leave the SplitCriteria and SplittingFunctionClassName properties empty if no splitting is required.
The splitting method is determined by the SplittingFunctionClassName property, which names the splitting class, together with the SplitCriteria property, which supplies the delimiter or the maximum chunk size in bytes.
You can optionally provide a custom file splitter class. To use it, set the SplittingFunctionClassName property to the name of the class.

File splitting by delimiter

When one or more characters (delimiters), such as a comma (,), semicolon (;), quotation mark (" or '), brace ({ or }), or slash (/ or \), separate the business objects in a file, the adapter can split the file into smaller chunks at each delimiter. Define the delimiter that separates the business objects in the SplitCriteria property.
To enable file splitting by delimiter, set the SplittingFunctionClassName property to com.ibm.j2ca.utils.filesplit.SplitByDelimiter.
The following rules apply to the use of delimiters:
All newlines in the delimiter are represented by platform-specific newline characters, which are shown in Table 1.

Table 1. Platform-specific newline characters
Platform            Newline character
Macintosh           \r
Microsoft Windows   \r\n
UNIX                \n

If there is more than one delimiter, separate the delimiters with a semicolon (;). The delimiters are matched in the order in which they are given. If a semicolon is part of a delimiter, escape it as \;.
For example, if the delimiter is ##\;##, it is processed as ##;##.
To skip content that lies between two parts of a delimiter, separate the parts with a double semicolon (;;). The adapter then treats the two parts, together with the content between them, as a single delimiter and skips that content.
For example, if the event file contains a business object in the following format and the delimiter is ##;;$$, the adapter treats everything from ## through $$ as the delimiter and skips the text "content skipped by the adapter":
Name=Smith
Company=IBM
##content skipped by the adapter$$
The delimiter can have any value; there are no restrictions on it. A delimiter can combine a string, the newline character (for example, \n), and, when there is more than one delimiter, the semicolon separator. A delimiter does not have to include a newline character or a semicolon; use the newline character only when newlines must be taken into account when splitting the contents of the file. Examples of valid delimiters include:

####;\n;\n
####;$$$$;\n;####
%%%%;$$$$$;#####
\n;\n;$$$$
####\;####;\n;$$$$$
\n;\n;\n
####;;$$$$
\r
\r\n
$$$$;\r\n

If the delimiter is at the end of the file, the SplitCriteria property uses END_OF_FILE to determine the physical end of the file.
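The following Java sketch illustrates two of the delimiter rules above: reading a SplitCriteria value into its parts, with \; treated as an escaped semicolon, and applying a ;; pair that skips the content between its two parts. It is a hypothetical helper, not the adapter's SplitByDelimiter class. The names DelimiterSketch, parseDelimiters, and splitWithSkipPair are invented for illustration, and reading \n and \r as escape sequences for the newline and carriage-return characters is an assumption.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch only: not the adapter's SplitByDelimiter implementation.
class DelimiterSketch {

    // Splits a SplitCriteria-style value into its parts.
    // Assumptions: an unescaped ';' separates parts, "\;" is a literal ';',
    // and "\n" / "\r" stand for newline / carriage return.
    // ";;" yields an empty part, which marks "skip the content here".
    static List<String> parseDelimiters(String criteria) {
        List<String> parts = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < criteria.length(); i++) {
            char c = criteria.charAt(i);
            if (c == '\\' && i + 1 < criteria.length()) {
                char next = criteria.charAt(++i);
                switch (next) {
                    case ';': current.append(';'); break;
                    case 'n': current.append('\n'); break;
                    case 'r': current.append('\r'); break;
                    default: current.append('\\').append(next); // unknown escape: keep as written
                }
            } else if (c == ';') {
                parts.add(current.toString()); // end of one part
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        parts.add(current.toString());
        return parts;
    }

    // Splits text wherever "start ... end" occurs, dropping both markers and
    // the content between them (the ";;" rule). Empty chunks are not returned.
    static List<String> splitWithSkipPair(String text, String start, String end) {
        List<String> chunks = new ArrayList<>();
        int from = 0;
        while (true) {
            int s = text.indexOf(start, from);
            if (s < 0) break;
            int e = text.indexOf(end, s + start.length());
            if (e < 0) break; // no closing marker: keep the rest as data
            if (s > from) chunks.add(text.substring(from, s));
            from = e + end.length();
        }
        if (from < text.length()) chunks.add(text.substring(from));
        return chunks;
    }

    public static void main(String[] args) {
        System.out.println(parseDelimiters("##\\;##"));    // [##;##]
        System.out.println(parseDelimiters("####;;$$$$")); // [####, , $$$$]
        String record = "Name=Smith\nCompany=IBM\n##content skipped by the adapter$$";
        System.out.println(splitWithSkipPair(record, "##", "$$").size()); // 1
    }
}
```

How the adapter matches several parts that are separated by single semicolons is not modeled here; the sketch covers only the escaping rule and the documented ##;;$$ example.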
Example of a common scenario and the recommended delimiter format:

Delimiter format for a scenario
Data binding: XML
BO content:
<?xml version="1.0" encoding="UTF-8"?>
<customer:Customer xsi:type="customer:Customer"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xmlns:customer="http://www.ibm.com/xmlns/prod/websphere/j2ca/flatfile/customer">
  <CustomerName>Deepa</CustomerName>
  <Address>IBM</Address>
  <City>Bangalore</City>
  <State>KA</State>
</customer:Customer>
Recommended delimiter format: \n

File splitting by size

The value specified in the SplittingFunctionClassName property determines whether a file is split by size. If the SplittingFunctionClassName property is set to com.ibm.j2ca.utils.filesplit.SplitBySize, the SplitCriteria property must contain a valid number that represents the maximum size of each chunk, in bytes. If the file is larger than the value specified in the SplitCriteria property, the file is split into chunks and each chunk is posted to the import separately. If the file is smaller than the SplitCriteria value, the entire file is posted to the import.
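A minimal in-memory Java sketch of the size rule described above, assuming the whole file fits in a byte array (the adapter itself may read the file differently). SplitBySizeSketch and split are hypothetical names, and the maxChunkBytes parameter stands in for the SplitCriteria value; the zero case follows the documented default of no splitting.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of split-by-size; not the adapter's SplitBySize class.
class SplitBySizeSketch {

    // Returns the chunks that a file with the given bytes would be split into.
    // maxChunkBytes plays the role of SplitCriteria: 0 (the default) means
    // "no splitting", and a file no larger than the limit stays whole.
    static List<byte[]> split(byte[] data, long maxChunkBytes) {
        List<byte[]> chunks = new ArrayList<>();
        if (maxChunkBytes <= 0 || data.length <= maxChunkBytes) {
            chunks.add(data); // posted as a single piece
            return chunks;
        }
        int size = (int) maxChunkBytes; // fits in an int: it is smaller than data.length
        int offset = 0;
        while (offset < data.length) {
            int end = offset + Math.min(size, data.length - offset);
            chunks.add(Arrays.copyOfRange(data, offset, end));
            offset = end;
        }
        return chunks;
    }

    public static void main(String[] args) {
        byte[] file = new byte[10];
        for (byte[] chunk : split(file, 4)) {
            System.out.println(chunk.length); // 4, then 4, then 2
        }
    }
}
```

Every chunk except the last has exactly the SplitCriteria size; the last chunk holds the remainder.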
When event files are split into chunks, each chunk becomes a business object. As a result, the value specified for the PollQuantity property and the number of business objects delivered to the import can differ. Although the adapter polls according to the PollQuantity value, it processes the business objects in each file one at a time.
For example, if an event file is chunked into three parts, one file is polled and the three business objects are delivered to the import (because each chunk creates a single business object).
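The arithmetic behind this difference can be sketched in Java for the split-by-size case. This assumes that PollQuantity counts the event files picked up in one poll, as the example above suggests, and that a file no larger than the limit yields one business object. PollArithmeticSketch and its methods are hypothetical names, and the file sizes in main are illustrative values.

```java
// Hypothetical helper: how many business objects one poll would deliver
// when files are split by size, given the sizes of the polled files.
class PollArithmeticSketch {

    // Chunks for one file: ceil(fileBytes / maxChunkBytes), never fewer than one,
    // and exactly one when splitting is disabled (maxChunkBytes <= 0).
    static long chunksFor(long fileBytes, long maxChunkBytes) {
        if (maxChunkBytes <= 0 || fileBytes <= maxChunkBytes) return 1;
        return (fileBytes + maxChunkBytes - 1) / maxChunkBytes;
    }

    // Each chunk becomes one business object, so sum the chunks of every polled file.
    static long businessObjectsPerPoll(long[] polledFileSizes, long maxChunkBytes) {
        long total = 0;
        for (long size : polledFileSizes) total += chunksFor(size, maxChunkBytes);
        return total;
    }

    public static void main(String[] args) {
        // Two polled files of 10 and 3 bytes with SplitCriteria = 4:
        // 3 + 1 = 4 business objects from a single poll.
        System.out.println(businessObjectsPerPoll(new long[] {10, 3}, 4)); // 4
    }
}
```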
At the import, the adapter does not reassemble the chunked data into a single file, but it provides information about the chunks to enable IBM BPM or WebSphere Enterprise Service Bus to reassemble them into a single file. The chunk information is included in the ChunkFileName property of the FlatFileInputStreamRecord record, and includes the chunk size in bytes and the event ID. The event ID of a chunk uses the following form: eventFileLocation_/_timestampStr_/_MofN, where M is the current chunk number and N is the total number of chunks. An event ID would look like the following example:
C:\flatfile\eventdir\eventfile.in_/_2005_01_10_10_17_49_864_/_3of5

Here, 3of5 identifies the third of five chunks. The timestampStr value has the format year_month_day_hour_minutes_seconds_milliseconds.
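Because the adapter leaves reassembly to the consuming application, a Java sketch of that step may help. It builds and parses event IDs in the documented form eventFileLocation_/_timestampStr_/_MofN and concatenates chunk payloads in M order. ChunkEventIds, ChunkId, parse, and reassemble are hypothetical names; the zero-padded yyyy_MM_dd_HH_mm_ss_SSS pattern is inferred from the example above; and reading the payloads and IDs out of FlatFileInputStreamRecord is not shown.

```java
import java.io.ByteArrayOutputStream;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helpers for the documented event ID form
// eventFileLocation_/_timestampStr_/_MofN; not part of the adapter's API.
class ChunkEventIds {
    static final String SEP = "_/_";
    // Zero-padded pattern inferred from the documented example (an assumption).
    static final DateTimeFormatter TIMESTAMP =
            DateTimeFormatter.ofPattern("yyyy_MM_dd_HH_mm_ss_SSS");

    static String eventId(String location, LocalDateTime time, int m, int n) {
        return location + SEP + TIMESTAMP.format(time) + SEP + m + "of" + n;
    }

    // Parsed pieces of an event ID.
    static final class ChunkId {
        final String file; // eventFileLocation_/_timestampStr: identifies the original file
        final int m;       // current chunk number
        final int n;       // total number of chunks
        ChunkId(String file, int m, int n) { this.file = file; this.m = m; this.n = n; }
    }

    static ChunkId parse(String eventId) {
        // Parse from the right so the file location may contain any characters.
        int last = eventId.lastIndexOf(SEP);
        if (last < 0 || eventId.lastIndexOf(SEP, last - 1) < 0) {
            throw new IllegalArgumentException("Not a chunk event ID: " + eventId);
        }
        String mOfN = eventId.substring(last + SEP.length());
        int of = mOfN.indexOf("of");
        if (of < 0) throw new IllegalArgumentException("Missing MofN: " + eventId);
        return new ChunkId(eventId.substring(0, last),
                Integer.parseInt(mOfN.substring(0, of)),
                Integer.parseInt(mOfN.substring(of + 2)));
    }

    // Reassembles chunk payloads keyed by event ID; fails if chunks are
    // missing or do not belong to the same original file.
    static byte[] reassemble(Map<String, byte[]> chunksById) {
        if (chunksById.isEmpty()) throw new IllegalArgumentException("No chunks");
        TreeMap<Integer, byte[]> ordered = new TreeMap<>(); // sorted by M
        String file = null;
        int total = 0;
        for (Map.Entry<String, byte[]> entry : chunksById.entrySet()) {
            ChunkId id = parse(entry.getKey());
            if (file != null && (!file.equals(id.file) || total != id.n)) {
                throw new IllegalArgumentException("Chunks do not belong to the same file");
            }
            file = id.file;
            total = id.n;
            ordered.put(id.m, entry.getValue());
        }
        if (ordered.size() != total || ordered.firstKey() != 1 || ordered.lastKey() != total) {
            throw new IllegalStateException("Missing chunks for " + file);
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (byte[] part : ordered.values()) out.write(part, 0, part.length);
        return out.toByteArray();
    }

    public static void main(String[] args) {
        String id = eventId("C:\\flatfile\\eventdir\\eventfile.in",
                LocalDateTime.of(2005, 1, 10, 10, 17, 49, 864_000_000), 3, 5);
        System.out.println(id); // C:\flatfile\eventdir\eventfile.in_/_2005_01_10_10_17_49_864_/_3of5
    }
}
```

Sorting by M with a TreeMap lets chunks arrive in any order, and the size and key checks catch a missing chunk before any bytes are concatenated.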
