The Simple XML Parser reads and writes XML documents; it handles XML data that is no more than two levels deep. The Parser uses the Apache Xerces and Xalan libraries. It gives access to the XML document through a script object called xmldom, which is an instance of the org.w3c.dom.Document interface. Refer to http://java.sun.com/xml/jaxp-1.0.1/docs/api/index.html for a complete description of this interface.
You can also use the XPathAPI (see http://xml.apache.org/xalan-j/apidocs/index.html), accessing its Java™ classes from your scripts, to search and select nodes from the XML document. selectNodeList, a convenience method in the system object, can be used to select a subset of the XML document.
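As a minimal sketch of working with the result of selectNodeList, the function below copies the selected nodes into a script array. It assumes that the returned object follows the org.w3c.dom.NodeList interface (getLength and item). collectNodes is a hypothetical helper name, and the system object is passed in as a parameter only so that the function can also be exercised outside IBM TDI.

```javascript
// Sketch only: collectNodes is a hypothetical helper, not part of IBM TDI.
// Assumes sys.selectNodeList(root, xpath) returns an org.w3c.dom.NodeList.
function collectNodes(sys, root, xpath) {
  var list = sys.selectNodeList(root, xpath);
  var nodes = [];
  for (var i = 0; i < list.getLength(); i++) {
    nodes.push(list.item(i)); // item(i) returns the i-th selected node
  }
  return nodes;
}
```

Inside a hook, a call might look like collectNodes(system, xmldom.getDocumentElement(), "//Entry").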
When the Connector is initialized, the Simple XML Parser tries to perform Document Type Definition (DTD) verification if a DTD tag is present.
Use the Connector's override functions to interpret or generate the XML document yourself. Create the necessary script in either the Override GetNext or the GetNext Successful hook in your AssemblyLine's hook definitions. If you do not override, the Parser reads or writes a very simple XML document that mimics the entry object model. The default Parser only permits you to read or write XML files two levels deep. It also reads multi-valued attributes, although only one of the values is shown when browsing the data in the Schema tab.
Note that certain methods, such as setAttribute, are available in both the IBM TDI entry and the objects returned by xmldom.createElement. These methods have the same name or signature. Do not confuse the xmldom objects with the IBM TDI objects.
Notes:
conn.setAttribute("dn", work.getAttribute("$dn"));
conn.removeAttribute("$dn");
The Parser has the following parameters:
If the output is to be processed by a program rather than read by a person, you will most likely want to deselect this parameter. That way, no unnecessary spaces or newlines are inserted in the output.
The default and recommended Character Encoding to use when deploying the Simple XML Parser is UTF-8. This will preserve data integrity of the XML data in most cases. When you are forced to use a different encoding, the Parser will handle the various encodings in the following way:
Override Add hook:
var root = xmldom.getDocumentElement();
var entry = xmldom.createElement ("entry");
var names = work.getAttributeNames();
for ( i = 0; i < names.length; i++ ) {
    xmlNode = xmldom.createElement ("attribute");
    xmlNode.setAttribute ( "name", names[i] );
    xmlNode.appendChild ( xmldom.createTextNode ( work.getString( names[i] ) ) );
    entry.appendChild ( xmlNode );
}
root.appendChild ( entry );
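As a rough illustration of the element structure that the hook above builds, the following standalone sketch assembles the same entry/attribute layout as a string. It runs outside IBM TDI, so a plain object stands in for the work entry, and entryToXml and escapeXml are hypothetical helper names rather than part of any TDI API. The manual escaping is needed only because the sketch builds text directly instead of going through DOM objects.

```javascript
// Sketch only: mirrors the <entry><attribute name="..."> layout of the hook,
// using plain strings instead of xmldom. Helper names are hypothetical.
function escapeXml(text) {
  return String(text)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

function entryToXml(attributes) {
  var parts = ["<entry>"];
  for (var name in attributes) {
    if (Object.prototype.hasOwnProperty.call(attributes, name)) {
      parts.push('<attribute name="' + escapeXml(name) + '">' +
                 escapeXml(attributes[name]) + "</attribute>");
    }
  }
  parts.push("</entry>");
  return parts.join("");
}

// A plain object standing in for the work entry (assumption for illustration).
var sample = entryToXml({ firstName: "John", lastName: "Doe" });
```

For the sample object, the resulting string is <entry><attribute name="firstName">John</attribute><attribute name="lastName">Doe</attribute></entry>.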
After Selection hook:
//
// Set up variables for "override getnext" hook
//
var root = xmldom.getDocumentElement();
var list = system.selectNodeList ( root, "//Entry" );
var counter = 0;
Override GetNext hook:
//
// Note that the Iterator hooks are NOT called when we override the getnext function
// Initialization done in After Selection hook
var nxt = list.item ( counter );
if ( nxt != null ) {
    var ch = nxt.getFirstChild();
    while ( ch != null ) {
        var child = ch.getFirstChild();
        while (child != null ) {
            // Use the grandchild's value if it exists, to be able to read multivalue attributes
            grandchild = child.getFirstChild();
            if (grandchild != null)
                nodeValue = grandchild.getNodeValue();
            else
                nodeValue = child.getNodeValue();
            // Ignore strings containing newlines, they are just fillers
            if (nodeValue != null && nodeValue.indexOf('\n') == -1) {
                work.addAttributeValue ( ch.getNodeName(), nodeValue );
            }
            child = child.getNextSibling();
        }
        ch = ch.getNextSibling();
    }
    result.setStatus (1); // Not end of input yet
    counter++;
} else {
    result.setStatus (0); // Signal end of input
}
The previous example parses files containing items that look like the following entries:
<DocRoot>
    <Entry>
        <firstName>John</firstName>
        <lastName>Doe</lastName>
        <title>Engineer</title>
    </Entry>
    <Entry>
        <firstName>Al</firstName>
        <lastName>Bundy</lastName>
        <title>Shoe salesman</title>
    </Entry>
</DocRoot>
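The Override GetNext hook above skips every text value that contains a newline, which also drops genuine values that span several lines. A possible alternative, sketched here as an assumption rather than as the Parser's own behavior, is to skip only values that consist entirely of whitespace. isFiller is a hypothetical helper name.

```javascript
// Sketch only: isFiller is a hypothetical helper, not part of IBM TDI.
// Returns true for null values and for text made up solely of whitespace,
// such as the indentation between elements in a formatted XML file.
function isFiller(nodeValue) {
  // String() also converts a Java string handed over by the script engine.
  return nodeValue == null || /^\s*$/.test(String(nodeValue));
}
```

With such a helper, the test in the hook could become if (!isFiller(nodeValue)), so that a value like "line one\nline two" is kept while the indentation between elements is still ignored.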
Suppose instead that the input looks like the following entries:
<DocRoot>
    <Entry>
        <field name="firstName">John</field>
        <field name="lastName">Doe</field>
        <field name="title">Engineer</field>
    </Entry>
    <Entry>
        <field name="firstName">Al</field>
        <field name="lastName">Bundy</field>
        <field name="title">Shoe salesman</field>
    </Entry>
</DocRoot>
Here the attribute names can be retrieved from attributes of the field node, and this code is used in the Override GetNext Hook:
var nxt = list.item ( counter );
if ( nxt != null ) {
    var ch = nxt.getFirstChild();
    while ( ch != null ) {
        if(String(ch.getNodeName()) == "field") {
            attrName = ch.getAttributes().item(0).getNodeValue();
            nodeValue = ch.getFirstChild().getNodeValue();
            work.addAttributeValue ( attrName, nodeValue );
        }
        ch = ch.getNextSibling();
    }
    result.setStatus (1); // Not end of input yet
    counter++;
} else {
    result.setStatus (0); // Signal end of input
}
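The hook above takes the first attribute of each field element with item(0), which works when name is the only attribute. Because the order of a NamedNodeMap is not guaranteed, a field element carrying more than one attribute is read more safely by looking the attribute up by name. The sketch below shows one way to do that with getNamedItem from the org.w3c.dom.NamedNodeMap interface; readFields is a hypothetical helper name, and returning a plain object of value arrays is a choice made for this illustration only.

```javascript
// Sketch only: readFields is a hypothetical helper, not part of IBM TDI.
// Collects <field name="...">value</field> children of one <Entry> node into
// an object that maps each name to an array of values.
function readFields(entryNode) {
  var values = {};
  var ch = entryNode.getFirstChild();
  while (ch != null) {
    if (String(ch.getNodeName()) == "field") {
      var attrs = ch.getAttributes();
      // Look the attribute up by name instead of relying on its position.
      var nameAttr = attrs != null ? attrs.getNamedItem("name") : null;
      var text = ch.getFirstChild();
      // Skip fields that have no name attribute or no text content.
      if (nameAttr != null && text != null) {
        var attrName = String(nameAttr.getNodeValue());
        if (!Object.prototype.hasOwnProperty.call(values, attrName)) {
          values[attrName] = [];
        }
        values[attrName].push(String(text.getNodeValue()));
      }
    }
    ch = ch.getNextSibling();
  }
  return values;
}
```

In an Override GetNext hook, each collected name and value could then be passed to work.addAttributeValue, as in the hook above.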
This example package demonstrates how the base Simple XML Parser functionality can be extended to read XML more than two levels deep, by using the Override GetNext and Override Add hooks.
Go to the root_directory/examples/simplexmlparser directory of the IBM TDI.
XML Parser,
XML SAX Parser,
XSL based XML Parser,
SOAP Parser,
DSML Parser.