Parsers are used in conjunction with a transport Connector to interpret or generate the content that travels over the Connector's byte stream. However, sometimes you may want to parse data that is presented in a very specific format; for this purpose you will need to implement your own Parser.
All TDI Parsers implement the com.ibm.di.parser.ParserInterface Java™ interface. This interface declares a number of methods that are common to all parsers. Usually the parsers that you write will not need to implement every method declared by the interface, only a subset of them. For this purpose you can use the com.ibm.di.parser.ParserImpl abstract class, which implements ParserInterface. The ParserImpl class contains the core Parser functionality, so you can subclass it when implementing your own Parser.
There are two types of parsers: ones that read from a stream and return an Entry; and others that take an Entry and write it to a stream.
Once the Parser is constructed, it has to be configured. This includes setting the input/output streams and configuring any additional parameters. This is usually done by the hosting component (for example, a Connector). The next step is initialization of the Parser, where resources for future needs are allocated and any other initialization takes place. Generally the hosting component takes care of both configuring the Parser and calling its initialization method. Next comes the most significant stage of using the Parser: writing or reading the entries. This is where the actual parsing happens. Finally, when the Connector has finished transporting the entries, the Parser must be closed. When closed, the Parser releases the resources that were used in the previous stages and closes the input and output streams.
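The order of these stages (configure, initialize, read, close) can be illustrated with a minimal plain-Java sketch. It deliberately does not use any TDI classes: LifecycleSketch and all of its method names are hypothetical stand-ins, a line of text stands in for an entry, and the static run(...) helper plays the role of the hosting component.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a reading Parser; not a TDI class.
class LifecycleSketch {
    private static final Charset UTF8 = Charset.forName("UTF-8");

    private InputStream input;
    private BufferedReader reader;
    // Records the order in which the stages ran, for illustration only.
    final List<String> steps = new ArrayList<String>();

    // Stage 1: the hosting component hands over the stream.
    void configure(InputStream in) {
        input = in;
        steps.add("configure");
    }

    // Stage 2: allocate resources; this relies on the stream already being set.
    void initialize() {
        if (input == null) {
            throw new IllegalStateException("configure() must run before initialize()");
        }
        reader = new BufferedReader(new InputStreamReader(input, UTF8));
        steps.add("initialize");
    }

    // Stage 3: return the next record (one line here), or null at end of input.
    String readRecord() throws IOException {
        if (reader == null) {
            throw new IllegalStateException("initialize() must run before readRecord()");
        }
        String line = reader.readLine();
        if (line != null) {
            steps.add("read");
        }
        return line;
    }

    // Stage 4: release resources; closing the reader also closes the stream.
    void close() throws IOException {
        if (reader != null) {
            reader.close();
            reader = null;
        }
        input = null;
        steps.add("close");
    }

    // Plays the role of the hosting component driving the Parser.
    static List<String> run(String data) {
        LifecycleSketch parser = new LifecycleSketch();
        try {
            parser.configure(new ByteArrayInputStream(data.getBytes("UTF-8")));
            parser.initialize();
            try {
                while (parser.readRecord() != null) {
                    // A real host would hand each record on for processing.
                }
            } finally {
                parser.close(); // always close, even if reading fails
            }
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
        return parser.steps;
    }
}
```

Calling LifecycleSketch.run("a\nb\n") walks through all four stages and returns the recorded order of the calls. The try/finally around the read loop reflects the requirement that the Parser is closed once the host has finished, whether or not reading succeeded.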
For an example of a Parser implementation, look at the ExampleParser.java Parser included in TDI. These are some of the important methods you will usually need to implement:
Note that when you open the input stream, it is your responsibility to close it. This is usually done in the closeParser() method. The com.ibm.di.parser.ParserImpl abstract class provides a default implementation for closing the Parser input and output streams.
Note that when you open the output stream, it is your responsibility to close it. This is usually done in the closeParser() method. The com.ibm.di.parser.ParserImpl abstract class provides a default implementation for closing the Parser input and output streams.
In this method you can allocate resources you may need later, and set any parameters or additional chained parsers. This method may not be required for all implemented parsers.
Here is an example of how you can access parameters. This code is part of the included example "ExampleParser.java".
str = getParam("attributeName");
if (str != null && str.trim().length() != 0) {
    attrName = str;
}
Note that this method is called after the input and output streams have been set.
Make sure you have initialized the input stream properly. To set the input stream, use the setInputStream(...) method. Use the getReader() method to get the reader object.
Generally input streams are initialized by the hosting component (for example, a Connector).
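As an illustration of the reading side, here is a minimal sketch that turns text from a reader into records. It is not TDI code: KeyValueReaderSketch is a hypothetical class, a Map stands in for a TDI Entry, and the "name=value lines separated by a blank line" format is invented for the example. The reader is passed in directly here; in a real Parser it would come from getReader(). Returning null at the end of input is a convention chosen for this sketch.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical reading parser for an invented "name=value" format; not a TDI class.
class KeyValueReaderSketch {
    private final BufferedReader reader;

    KeyValueReaderSketch(BufferedReader reader) {
        this.reader = reader;
    }

    // Reads lines up to the next blank line and returns them as one record.
    // Returns null once the input is exhausted.
    Map<String, String> readRecord() throws IOException {
        Map<String, String> record = new LinkedHashMap<String, String>();
        String line;
        while ((line = reader.readLine()) != null) {
            if (line.trim().length() == 0) {
                if (record.isEmpty()) {
                    continue; // skip blank lines before a record starts
                }
                return record; // a blank line ends the current record
            }
            int eq = line.indexOf('=');
            if (eq < 0) {
                throw new IOException("Expected name=value but got: " + line);
            }
            record.put(line.substring(0, eq).trim(), line.substring(eq + 1).trim());
        }
        // End of input: return the last record if it has content, else null.
        return record.isEmpty() ? null : record;
    }

    // Convenience helper for trying the sketch on an in-memory string.
    static List<Map<String, String>> parseAll(String text) {
        KeyValueReaderSketch parser =
                new KeyValueReaderSketch(new BufferedReader(new StringReader(text)));
        List<Map<String, String>> records = new ArrayList<Map<String, String>>();
        try {
            Map<String, String> record;
            while ((record = parser.readRecord()) != null) {
                records.add(record);
            }
        } catch (IOException e) {
            throw new IllegalArgumentException(e);
        }
        return records;
    }
}
```

For example, KeyValueReaderSketch.parseAll("cn=Ann\nmail=ann@example.com\n\ncn=Bob\n") yields two records, the first with the attributes cn and mail and the second with cn only.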
To get the writer, use the getWriter() method, which returns a "java.io.BufferedWriter".
Generally the output stream is initialized by the hosting component (for example, a Connector).
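The writing side can be sketched in the same spirit. Again this is not TDI code: KeyValueWriterSketch is a hypothetical class, a Map stands in for a TDI Entry, and the output format (name=value lines followed by a blank line) is invented for the example. The java.io.BufferedWriter is passed in directly; in a real Parser it would be the object returned by getWriter().

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.StringWriter;
import java.util.Map;

// Hypothetical writing parser for an invented "name=value" format; not a TDI class.
class KeyValueWriterSketch {
    private final BufferedWriter writer;

    KeyValueWriterSketch(BufferedWriter writer) {
        this.writer = writer;
    }

    // Writes one record as name=value lines, terminated by a blank line.
    // Values are assumed to be non-null in this sketch.
    void writeRecord(Map<String, String> record) throws IOException {
        for (Map.Entry<String, String> attribute : record.entrySet()) {
            writer.write(attribute.getKey());
            writer.write('=');
            writer.write(attribute.getValue());
            writer.write('\n');
        }
        writer.write('\n');
        // Flush so buffered text reaches the underlying stream after each record.
        writer.flush();
    }

    // Convenience helper for trying the sketch against an in-memory sink.
    static String render(Map<String, String> record) {
        StringWriter sink = new StringWriter();
        try {
            new KeyValueWriterSketch(new BufferedWriter(sink)).writeRecord(record);
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
        return sink.toString();
    }
}
```

The sketch writes '\n' explicitly rather than calling newLine(), so the output is the same on every platform; a Parser that must honour the platform line separator would choose differently.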
The com.ibm.di.parser.ParserImpl abstract class provides an implementation for this method, but if you implement the interface directly, you will have to write it yourself.
When building the source code of the Parser, include in your CLASSPATH the jar files from the "jars" folder of the IBM TDI installation. As a minimum, you would need to include "miserver.jar" and "miconfig.jar". Keep in mind that the source code must be compiled for Java 5 or older.
When integrating your Java code with TDI, pay attention to the collection of pre-existing components that comprise TDI, notably in the jars directory. If your code relies upon one of your own library components that overlaps or clashes with one or more that are part of the TDI installation, there will most likely be class loader problems during execution. In other words, be careful about possible conflicts with third-party libraries that are shipped with TDI: avoid creating a Parser that uses one version of a library when TDI uses another version of the same library.
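When a library clash is suspected, one generic way to investigate is to ask the JVM where a class was actually loaded from. The following sketch uses only standard Java APIs (Class.getProtectionDomain() and java.security.CodeSource); ClassOriginSketch is a hypothetical helper name and is not part of TDI. Classes loaded by the bootstrap class loader have no code source, so the helper reports those as unknown.

```java
import java.security.CodeSource;

// Hypothetical diagnostic helper; not part of TDI.
class ClassOriginSketch {

    // Returns the location (typically a jar or directory URL) a class was loaded from.
    static String originOf(Class<?> type) {
        CodeSource source = type.getProtectionDomain().getCodeSource();
        if (source == null || source.getLocation() == null) {
            return "(bootstrap or unknown)";
        }
        return source.getLocation().toString();
    }

    // Usage: java ClassOriginSketch fully.qualified.ClassName
    public static void main(String[] args) throws ClassNotFoundException {
        String name = args.length > 0 ? args[0] : "java.lang.String";
        System.out.println(name + " loaded from " + originOf(Class.forName(name)));
    }
}
```

Calling originOf(...) from inside the Parser for a class of the suspect library shows which jar won, which helps confirm whether the copy shipped with TDI or your own copy is in use.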
The Parser GUI is implemented in the same way as for a Connector. Use a "tdi.xml" file to describe the Parser configuration form by using the same syntax as used for Connectors.
Packaging and deploying a Parser is just like packaging and deploying a Connector:
You need a jar file that contains:
After creating the jar file of the new Parser, you only need to drop that jar file in the "jars" folder of the TDI installation. You can create your own folder and put the jar there, but the usual place where parsers are stored is the "jars/parsers" folder. The next time TDI is started, it will automatically load the new Parser and it will be ready for use.