Custom Taxonomy Support in the UDDI Registry

 

The IBM WebSphere UDDI Registry is supplied with six published taxonomies (or categorization schemes) in the taxonomy data. Of these six, four are checked. Taxonomies can be either checked or unchecked, and this is indicated via a keyedReference in the categoryBag of the tModel that represents a taxonomy (a "categorization tModel"). These keyedReferences have the tModel key for uddi-org:types and are added to the categoryBag to further describe the behavior of the categorization tModel, as follows:

 

checked

Marking a tModel with this classification asserts that it represents a categorization, identifier, or namespace tModel that has a validation service to check that category values are present in a specified value set.

 

unchecked

Marking a tModel with this classification asserts that it represents a categorization, identifier, or namespace tModel that does not have a validation service.

In the IBM WebSphere UDDI Registry (and also in the IBM UDDI Business Registry or UBR), the validation of categories in checked taxonomies is performed against locally managed taxonomy data. Several published taxonomies are provided:

Taxonomy name | Checked | Description | tModel key
ntis-gov:naics:1997 | Yes | Business Taxonomy: NAICS (1997 Release) | uuid:C0B9FE13-179F-413D-8A5B-5004DB8E5BB2
uddi-org:iso-ch:3166-1999 | Yes | ISO 3166-1:1997 and 3166-2:1998. Codes for names of countries and their subdivisions. Part 1: Country codes. Part 2: Country subdivision codes. Update newsletters include ISO 3166-1 V-1 (1998-02-05), V-2 (1999-10-01), ISO 3166-2 I-1 (1998) | uuid:4E49A8D6-D5A2-4FC2-93A0-0411D8D19E88
unspsc-org:unspsc | Yes | Product Taxonomy: UNSPSC | uuid:CD153257-086A-4237-B336-6BDCBDCC6634
unspsc-org:unspsc:3-1 | No | Product Taxonomy: UNSPSC (Version 3.1) | uuid:DB77450D-9FA8-45D4-A7BC-04411D14E384
uddi-org:types | Yes | UDDI Type Taxonomy | uuid:C1ACF26D-9672-4404-9D70-39B756E62AB4
uddi-org:general_keywords | No | Special taxonomy consisting of namespace identifiers and the keywords associated with the namespaces | uuid:A035A07C-F362-44DD-8F95-E2B134BF43B4


Taxonomy data is provided in the IBM WebSphere UDDI Registry for all the above taxonomies, apart from the general keywords taxonomy (which is unchecked). The UDDI User Console (GUI) provided with the IBM WebSphere UDDI Registry uses a shortened label for taxonomies when displayed in the taxonomy tree view, or in a pull-down list of available taxonomies as follows:

Taxonomy name (published) | Taxonomy name (GUI)
ntis-gov:naics:1997 | naics
uddi-org:iso-ch:3166-1999 | geo
unspsc-org:unspsc | unspsc7
unspsc-org:unspsc:3-1 | unspsc
uddi-org:types | udditype
uddi-org:general_keywords | other


This release of IBM WebSphere UDDI Registry (included with IBM WebSphere Application Server, Version 5.0.2) introduces the ability to add user-defined taxonomies, with available allowed values presented in the existing GUI taxonomy tree display. IBM WebSphere Studio Application Developer, Version 5.1 has a Web Services Explorer user interface which also allows addition and display of custom checked taxonomies. The publisher of a custom taxonomy's categorization tModel may specify a 'display name' for use in GUI implementations.

 

Procedure for adding a Custom taxonomy

Adding a custom taxonomy to the IBM WebSphere UDDI Registry requires two tasks: loading the custom taxonomy data and publishing a categorization tModel. A checked taxonomy is of practical use only when both tasks are complete, because taxonomy data must be present to validate values in a checked taxonomy.

Taxonomy data may also be used by GUIs for unchecked taxonomies, but it is not a requirement and is usually only used for presentation of deprecated taxonomies, such as unspsc-org:unspsc:3-1.

If the taxonomy is checked, then any publish requests that have a categoryBag containing keyedReferences with the new categorization tModel will be validated. If there is taxonomy data corresponding to the categorization tModel in the registry database then only valid values will be accepted. If there is no taxonomy data in the database then all values will be rejected, and the publish request will fail. If the categorization tModel is unchecked, all values will be allowed, regardless of whether there is corresponding taxonomy data present in the UDDI Registry database.
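The acceptance rules in the paragraph above can be summarized as a small decision function. This is only an illustrative sketch in Python, not the registry's implementation; the function name accept_key_value and the use of a set to stand in for the taxonomy data in the database are assumptions made for the example.

```python
def accept_key_value(checked, taxonomy_codes, key_value):
    """Return True if a keyedReference value would be accepted.

    checked        -- True if the categorization tModel is marked 'checked'
    taxonomy_codes -- set of codes loaded for the taxonomy (empty if none loaded)
    key_value      -- the keyValue supplied in the publish request
    """
    if not checked:
        # Unchecked: every value is allowed, with or without taxonomy data.
        return True
    # Checked: only values present in the loaded data pass. With no data
    # loaded the set is empty, so every value is rejected.
    return key_value in taxonomy_codes


food_codes = {"00", "10", "101", "20"}  # hypothetical loaded data
print(accept_key_value(True, food_codes, "101"))   # True
print(accept_key_value(True, food_codes, "999"))   # False
print(accept_key_value(True, set(), "101"))        # False
print(accept_key_value(False, set(), "anything"))  # True
```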

 

Suggested approach

The suggested way of introducing a new taxonomy is to:

  1. Load custom taxonomy data into the UDDI Registry database using the UDDITaxonomyTools.jar utility (described below)

  2. Publish the categorization tModel with a keyedReference in the general keywords taxonomy that has a keyName of 'urn:x-ibm:uddi:customTaxonomy:key' and a keyValue matching the taxonomy name in the taxonomy data file (also described below)

Note: the SOAP and EJB interfaces can make use of categorization tModels as soon as they are published. The UDDI Registry GUI, however, currently requires a restart of the UDDI application, because it gathers its list of categorizations for the taxonomy tree display only when the application starts.

 

Loading Custom Taxonomy Data

 

Custom Taxonomy Data File Format

Taxonomy data is identified by a common taxonomy name, a unique code value, an optional description and a parent code which specifies its relationship with other code values. Taxonomy data must adhere to this format:

Column name | Maximum length | Description of use
name | 8 | uniquely identifies the taxonomy within the registry
code | 32 | unique value within the taxonomy used for validation
description | 128 | typically used by GUIs and optionally in the keyedReference as the keyName value
parentcode | 32 | indicates which existing code is the logical parent of this one, and is used in tree displays


Typically columns are delimited in the taxonomy data file by '#' characters as in this example:

food#00#Food#00
food#10#Fruit#00
food#101#Apples#10
food#102#Oranges#10
food#103#Pears#10
food#1031#Anjou#103
food#1032#Conference#103
food#1033#Bosc#103
food#104#Pomegranates#10
food#20#Vegetables#00
food#201#Carrots#20
food#202#Potatoes#20
food#203#Peas#20
food#204#Sprouts#20

In the example, 'Food' is the description for the root node with child nodes of 'Fruit' and 'Vegetables' (both of these have parentcode values the same as the code value for 'Food').

The taxonomy data in the example file could then be rendered in a tree like this:

Food
  Fruit
    Apples
    Oranges
    Pears
      Anjou
      Conference
      Bosc
    Pomegranates
  Vegetables
    Carrots
    Potatoes
    Peas
    Sprouts
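How the parentcode column produces this tree can be sketched in a few lines of Python. This is an illustration only, not code from the UDDI Registry or its GUI; the function names are hypothetical, and treating a row whose parentcode equals its own code as a root node is an assumption drawn from the 'Food' row in the example.

```python
SAMPLE = """\
food#00#Food#00
food#10#Fruit#00
food#101#Apples#10
food#102#Oranges#10
food#103#Pears#10
food#1031#Anjou#103
food#1032#Conference#103
food#1033#Bosc#103
food#104#Pomegranates#10
food#20#Vegetables#00
food#201#Carrots#20
food#202#Potatoes#20
food#203#Peas#20
food#204#Sprouts#20
"""


def parse_rows(text, delimiter="#"):
    """Split each non-empty line into (name, code, description, parentcode)."""
    rows = []
    for line in text.splitlines():
        if line.strip():
            name, code, description, parentcode = line.split(delimiter)
            rows.append((name, code, description, parentcode))
    return rows


def render_tree(rows, indent="  "):
    """Return the descriptions as indented lines, children under parents."""
    descriptions = {}
    children = {}
    roots = []
    for _name, code, description, parentcode in rows:
        descriptions[code] = description
        if parentcode == code:
            roots.append(code)  # assumption: self-parented row is a root
        else:
            children.setdefault(parentcode, []).append(code)

    lines = []

    def walk(code, depth):
        lines.append(indent * depth + descriptions[code])
        for child in children.get(code, []):
            walk(child, depth + 1)

    for root in roots:
        walk(root, 0)
    return lines


print("\n".join(render_tree(parse_rows(SAMPLE))))
```

Children are listed in the order in which they appear in the file, which is why the output follows the same order as the example data.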

The file must be saved in UTF-8 format.

The following taxonomy names are reserved within the IBM WebSphere UDDI Registry and should not be used for custom taxonomy files: naics, geo, unspsc, unspsc7, other, udditype. Any attempts to publish a categorization tModel using these values for a customTaxonomy:key will be rejected. If these names are used in custom data files and the data is imported it will be indistinguishable from taxonomy data with the same name.
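Before loading a file, it can be useful to check it against the column limits and the reserved names listed above. The following Python sketch is a hypothetical pre-load check, not part of UDDITaxonomyTools.jar; the function name, the problem messages, and the case-insensitive comparison of reserved names are assumptions.

```python
MAX_LENGTHS = {"name": 8, "code": 32, "description": 128, "parentcode": 32}
RESERVED_NAMES = {"naics", "geo", "unspsc", "unspsc7", "other", "udditype"}
COLUMNS = ("name", "code", "description", "parentcode")


def check_rows(lines, delimiter="#"):
    """Return a list of (line_number, problem) tuples for the given lines."""
    problems = []
    seen_codes = set()
    for line_number, line in enumerate(lines, start=1):
        fields = line.split(delimiter)
        if len(fields) != len(COLUMNS):
            problems.append((line_number, "expected 4 columns"))
            continue
        row = dict(zip(COLUMNS, fields))
        for column in COLUMNS:
            if len(row[column]) > MAX_LENGTHS[column]:
                problems.append((line_number, column + " too long"))
        # Assumption: reserved names are compared without regard to case.
        if row["name"].lower() in RESERVED_NAMES:
            problems.append((line_number, "reserved taxonomy name"))
        # The code must be unique within the taxonomy.
        if row["code"] in seen_codes:
            problems.append((line_number, "duplicate code"))
        seen_codes.add(row["code"])
    return problems


print(check_rows(["food#00#Food#00", "food#10#Fruit#00"]))  # []
print(check_rows(["naics#1#Something#1"]))
```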

UDDITaxonomyTools.jar

A utility is provided to load taxonomy data into the IBM WebSphere UDDI Registry, rename existing taxonomy data and remove existing taxonomy data, for both IBM DB2 and Cloudscape databases. The usage for each database and platform is identical:

Usage: java -jar UDDITaxonomyTools.jar {function} [options]

function:
  -load <path>          Load taxonomy data from specified file
  -rename <old> <new>   Rename existing taxonomy
  -unload <name>        Unload existing taxonomy

options:
  -properties <path>    Specify location of configuration file

Note: Ensure that the command window from which the UDDITaxonomyTools.jar is run is using a suitable codepage and font for displaying the characters contained in the taxonomy name.

Use of an incorrect codepage/font may result in unclear messages on a successful load, and create difficulty using the -unload and -rename options.
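When the utility is driven from a script, the command line shown in the usage text can be assembled programmatically. The Python sketch below only builds the argument list and does not run anything; the function name build_command and the file paths are hypothetical, while the -load, -rename, -unload and -properties switches are those listed in the usage text.

```python
def build_command(function, *args, properties=None, jar="UDDITaxonomyTools.jar"):
    """Build the argument list for one UDDITaxonomyTools.jar invocation."""
    expected_args = {"-load": 1, "-rename": 2, "-unload": 1}
    if function not in expected_args:
        raise ValueError("unknown function: " + function)
    if len(args) != expected_args[function]:
        raise ValueError(function + " takes %d argument(s)" % expected_args[function])
    command = ["java", "-jar", jar, function]
    command.extend(args)
    if properties is not None:
        command.extend(["-properties", properties])
    return command


# Hypothetical paths, using forward slashes on every platform.
print(build_command("-load", "c:/tools/food.txt"))
print(build_command("-rename", "food", "goodfood", properties="c:/tools/my.properties"))
```

The resulting list could be passed to subprocess.run from the standard library; passing the arguments as a list avoids quoting problems when a path contains spaces.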

The following section explains in more detail how to use the utility's commands and parameters. The configuration file, if specified by the optional properties parameter, determines the database driver, authentication information and delimiters. The contents are as follows (typical data for DB2 installation shown):

Property and example data (for DB2) | Comments
classpath="c:/program files/sqllib/java12/db2java.zip; c:/tools/UDDITaxonomyTools.jar" | Classpath including database driver and the UDDITaxonomyTools.jar*
database.driver.className=COM.ibm.db2.jdbc.app.DB2Driver | Fully qualified classname of the database driver class
database.url=jdbc:db2:UDDI20 | JDBC URL of the database
database.userName=db2admin | Database userid (DB2 only)
database.password=db2admin | Database password (DB2 only)
column.delimiter=# | Column delimiter used in taxonomy data files
string.delimiter=\" | Field delimiter (must be different to the column.delimiter value)


* the classpath needs to be enclosed in quotes if the path includes space characters. Also, the UDDITaxonomyTools.jar filepath itself must be appended to the classpath (if the working directory is the same as the location of the UDDITaxonomyTools.jar then just the name is sufficient)

Filepath names should include the use of the forward-slash character (/) for all platforms.

For Cloudscape database users, the values of the following properties would be likely to be:

  • classpath=../WAS51/appserver/lib/db2j.jar; UDDITaxonomyTools.jar

  • database.driver.className=com.ibm.db2j.jdbc.DB2jDriver

  • database.url=jdbc:db2j:c:/websphere/appserver/bin/uddi20

The string.delimiter is typically used where a description value contains the same character as the column delimiter. For example, if column.delimiter is set to ',' (comma) and a taxonomy description has the value 'Fruits, citrus', you can include it in the taxonomy data file by setting string.delimiter to " (double quote) and enclosing the description in double quotes: "Fruits, citrus". Note that in the configuration file the quote character is escaped with a backslash (\") to indicate that the literal character is to be used.
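The effect of a string delimiter can be demonstrated with Python's standard csv module, which applies the same idea of a column delimiter plus a quoting character. This illustrates the file format only and is not the parser used by UDDITaxonomyTools.jar; the two data rows are hypothetical.

```python
import csv
import io

# Comma as the column delimiter, double quote as the string delimiter.
data = 'food,30,"Fruits, citrus",10\nfood,301,Lemons,30\n'

reader = csv.reader(io.StringIO(data), delimiter=",", quotechar='"')
rows = list(reader)

for row in rows:
    print(row)
# ['food', '30', 'Fruits, citrus', '10']
# ['food', '301', 'Lemons', '30']
```

The comma inside the quoted description is kept as part of the value instead of being treated as a column boundary, so each row still has four columns.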

If a properties parameter is not specified, the utility looks for and uses configuration data set in a file called customTaxonomy.properties.

Note: to make updates to taxonomy data in a Cloudscape database, the IBM WebSphere Application Server must be stopped to release the connection to the database.

Note: There is currently a limitation with UDDITaxonomyTools.jar when used with a DB2 UDDI database and multi-byte characters such as Chinese, Japanese and Korean. The maximum number of multi-byte characters is the maximum value specified earlier for name, code, description and parentcode divided by 3. For example, name can only contain values up to 8 characters in length so the maximum number of Korean characters is 2. If the taxonomy file is found to have values that exceed the limits, a message is displayed by the tool indicating the line number and column where the problem occurs. This limitation does not affect use with a Cloudscape UDDI database.
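The divide-by-3 rule in the note above works out as follows for each column. The Python sketch below is an illustration, not the tool's own check; applying the reduced limit to any value that contains a non-ASCII character is an assumption made for the example.

```python
MAX_LENGTHS = {"name": 8, "code": 32, "description": 128, "parentcode": 32}
COLUMNS = ("name", "code", "description", "parentcode")


def max_multibyte_chars(column):
    """Maximum number of multi-byte characters for a column on DB2."""
    return MAX_LENGTHS[column] // 3


def find_violations(line, line_number, delimiter="#"):
    """Return (line_number, column) pairs for values that exceed the limit."""
    violations = []
    for column, value in zip(COLUMNS, line.split(delimiter)):
        # Assumption: any non-ASCII value is held to the reduced limit.
        if value.isascii():
            limit = MAX_LENGTHS[column]
        else:
            limit = max_multibyte_chars(column)
        if len(value) > limit:
            violations.append((line_number, column))
    return violations


for column in COLUMNS:
    print(column, max_multibyte_chars(column))
# name 2
# code 10
# description 42
# parentcode 10
```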

 

Publishing a Checked Categorization tModel

This section describes how to publish a checked categorization tModel with the 'customTaxonomy' keyedReferences to specify which custom taxonomy data to use and a display name.

Note: to specify an unchecked categorization substitute the 'checked' keyValue with 'unchecked' or, more simply, omit the keyedReference.

Publish a tModel to the IBM WebSphere UDDI Registry with a categoryBag containing keyedReferences as follows:

Note | tModelKey | KeyName | KeyValue
1 | (uddi-org:types) | <optional> | categorization
2 | (uddi-org:types) | <optional> | checked
3 | (general keywords) | urn:x-ibm:uddi:customTaxonomy:key | <custom taxonomy key>
4 | (general keywords) | urn:x-ibm:uddi:customTaxonomy:displayName | <custom taxonomy display name>


  1. indicates this tModel is a categorization tModel (required)

  2. indicates use of the tModel will be checked against a list of valid data (required). (Omitting this keyedReference, or explicitly specifying a value of 'unchecked' will indicate this categorization is unchecked).

  3. this special use of the general keywords taxonomy, with a proprietary urn as the keyName value, defines the value used by the UDDI Registry to look up taxonomy data in its database. The value must be 1-8 (inclusive) characters long and corresponds directly with the name value in the custom taxonomy data file. Therefore, it must be unique within the registry.

  4. this special use of the general keywords taxonomy, with a proprietary urn as the keyName value, defines a name for the custom taxonomy that is intended for use in GUI implementations where the full tModel name might be too long*. The value can be 1-255 characters (inclusive) long. If this keyedReference is not supplied, the name of the tModel should be used by the GUI implementation.

* The displayName is intended to provide a way to label a taxonomy such that, when the UDDI GUI displays it in a taxonomy tree or in a pull-down list of available taxonomies, the meaning is clear to the user without being restricted to 8 characters and without needing to be the same as the published tModel name, which could be as long as 255 characters. An example is shown later in this section.

Uniqueness of the urn:x-ibm:uddi:customTaxonomy:key value is validated at the time a categorization tModel is published. If it is not unique, a UDDIInvalidValueException is returned. If using a GUI to publish the tModel, an appropriate message is displayed indicating the likely cause of the problem.

The urn:x-ibm:uddi:customTaxonomy:displayName should be unique if only to avoid confusion when displayed in GUIs but this is not validated.

As a further example, to display the label 'Delicious Victuals' in GUI displays, the categorization tModel would have a keyedReference like this:

type | keyName | keyValue
other | urn:x-ibm:uddi:customTaxonomy:displayName | Delicious Victuals


And to link a categorization tModel to a custom taxonomy datafile with a taxonomy name of 'goodfood' the tModel's categoryBag must have a keyedReference like this:

type | keyName | keyValue
other | urn:x-ibm:uddi:customTaxonomy:key | goodfood


To publish a new categorization tModel using SOAP, the message would be:

<save_tModel generic="2.0" xmlns="urn:uddi-org:api_v2">
  <authInfo></authInfo>
  <tModel tModelKey="">
    <name>Natural Foods tModel</name>
    <categoryBag>
      <keyedReference tModelKey="uuid:C1ACF26D-9672-4404-9D70-39B756E62AB4" keyValue="categorization"/>
      <keyedReference tModelKey="uuid:C1ACF26D-9672-4404-9D70-39B756E62AB4" keyValue="checked"/>
      <keyedReference tModelKey="uuid:A035A07C-F362-44DD-8F95-E2B134BF43B4" keyName="urn:x-ibm:uddi:customTaxonomy:key" keyValue="food"/>
      <keyedReference tModelKey="uuid:A035A07C-F362-44DD-8F95-E2B134BF43B4" keyName="urn:x-ibm:uddi:customTaxonomy:displayName" keyValue="Natural Foods"/>
    </categoryBag>
  </tModel>  
</save_tModel>

Note: 'uuid:C1ACF26D-9672-4404-9D70-39B756E62AB4' is the tModel key for uddi-org:types and 'uuid:A035A07C-F362-44DD-8F95-E2B134BF43B4' is the tModel key for uddi-org:general_keywords.
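A save_tModel message of this shape can also be generated rather than written by hand. The following Python sketch uses only the standard xml.etree.ElementTree module to build the message body; it does not wrap it in a SOAP envelope or send it, and the function name build_save_tmodel and its parameters are hypothetical.

```python
import xml.etree.ElementTree as ET

UDDI_V2_NS = "urn:uddi-org:api_v2"
TYPES_KEY = "uuid:C1ACF26D-9672-4404-9D70-39B756E62AB4"     # uddi-org:types
KEYWORDS_KEY = "uuid:A035A07C-F362-44DD-8F95-E2B134BF43B4"  # uddi-org:general_keywords
CUSTOM_PREFIX = "urn:x-ibm:uddi:customTaxonomy:"


def build_save_tmodel(name, taxonomy_key, display_name, auth_info="", checked=True):
    """Return a save_tModel message for a categorization tModel as a string."""
    ET.register_namespace("", UDDI_V2_NS)  # serialize with a default namespace

    def tag(local_name):
        return "{%s}%s" % (UDDI_V2_NS, local_name)

    root = ET.Element(tag("save_tModel"), {"generic": "2.0"})
    ET.SubElement(root, tag("authInfo")).text = auth_info
    tmodel = ET.SubElement(root, tag("tModel"), {"tModelKey": ""})
    ET.SubElement(tmodel, tag("name")).text = name
    bag = ET.SubElement(tmodel, tag("categoryBag"))

    references = [
        (TYPES_KEY, None, "categorization"),
        (TYPES_KEY, None, "checked" if checked else "unchecked"),
        (KEYWORDS_KEY, CUSTOM_PREFIX + "key", taxonomy_key),
        (KEYWORDS_KEY, CUSTOM_PREFIX + "displayName", display_name),
    ]
    for tmodel_key, key_name, key_value in references:
        attributes = {"tModelKey": tmodel_key}
        if key_name is not None:
            attributes["keyName"] = key_name
        attributes["keyValue"] = key_value
        ET.SubElement(bag, tag("keyedReference"), attributes)

    return ET.tostring(root, encoding="unicode")


print(build_save_tmodel("Natural Foods tModel", "food", "Natural Foods"))
```

The authInfo value is left empty here, as in the message above; a real request would carry the token obtained when the publisher authenticates.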

 

Validation and Error Handling

For a DB2-based IBM WebSphere UDDI Registry, custom taxonomy data can be loaded, removed and renamed using the provided utility without restarting the application (if you are using Cloudscape the appserver will need to be stopped to make database updates). Removing data for which there is a corresponding checked categorization tModel will cause any use of that categorization's data to be reported as invalid.

If an attempt is made to add data with a name that matches any of the 'internal' taxonomies, such as NAICS, GEO, etc, the request is rejected. If an attempt is made to rename or remove one of the internal taxonomies, a warning message is returned. Likewise if the user tries to rename a taxonomy to one of the reserved taxonomies, that is rejected.

The IBM WebSphere UDDI Registry user console will perform validation while a save tModel request is being built, that is, before the publish occurs. For example, if a categorization tModel with a customTaxonomy:key keyValue of 'food' already exists (in a published categorization tModel), and the user tries to add a keyedReference with the same value to the current list of keyedReferences, the following message is displayed:

Advice: The 'urn:x-ibm:uddi:customTaxonomy:key' value of 'food' is already in use by another categorization tModel. Enter a unique value

Similarly, only one customTaxonomy:key keyedReference and one customTaxonomy:displayName keyedReference are allowed per tModel. For example, if the user tries to add two customTaxonomy:displayName keyedReferences, the following message is displayed:

Advice: Only one 'urn:x-ibm:uddi:customTaxonomy:displayName' key name is allowed for the 'Other' taxonomy

If the customTaxonomy:key keyedReference is valid and unique at the time it is added to the save_tModel request, the keyedReference is further validated when the user makes the publish request, to ensure that another session has not successfully published a categorization tModel with the same customTaxonomy:key. In this case, the user is returned to the Publish Technical Model page.

If a keyedReference has a keyName value that starts with 'urn:x-ibm:uddi:customTaxonomy:' and is followed by anything other than 'key' or 'displayName', the following message is displayed:

Advice: Only key name values of 'urn:x-ibm:uddi:customTaxonomy:displayName' and 'urn:x-ibm:uddi:customTaxonomy:key' are supported.

For SOAP, UDDI4J, and EJB initiated requests where the save_tModel message may have multiple tModels, if any one of the tModels is a categorization tModel and it fails validation, the request fails with a UDDIInvalidValueException (plus additional information explaining the likely cause), and none of the tModels is published. For example, if a publish request includes a customTaxonomy:key keyedReference with a keyValue that matches the customTaxonomy:key keyValue of an existing categorization tModel, the following UDDIInvalidValueException is thrown, with message:

E_invalidValue (20200) A value that was passed in a keyValue attribute did not pass validation. This applies to checked categorizations, identifiers and other validated code lists. The error text will clearly indicate the key and value combination that failed validation. Invalid 'customTaxonomy:dbKey' keyValue [naics] in keyedReference. KeyValue already in use by tModelKey[UUID:C0B9FE13-179F-413D-8A5B-5004DB8E5BB2]

The customTaxonomy:key and customTaxonomy:displayName keyValue values are also validated. For example, if a request to publish a categorization tModel includes a keyedReference with a customTaxonomy:key of 'toolongdbkey', which exceeds the 8-character limit, the following UDDIInvalidValueException is thrown, with message:

E_invalidValue (20200) A value that was passed in a keyValue attribute did not pass validation. This applies to checked categorizations, identifiers and other validated code lists. The error text will clearly indicate the key and value combination that failed validation. Invalid 'customTaxonomy:key' keyValue [toolongdbkey] in keyedReference. tModelKey[]
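The checks described in this section (supported key names, one of each keyedReference, length limits, and uniqueness of the key) can be collected into one routine. The Python sketch below is an illustration of those rules only, not the registry's validation code; the function name, the error strings, and the keys_in_use set standing in for the registry's store of published categorization tModels are assumptions.

```python
PREFIX = "urn:x-ibm:uddi:customTaxonomy:"
LENGTH_LIMITS = {"key": 8, "displayName": 255}


def validate_custom_references(references, keys_in_use):
    """Check (keyName, keyValue) pairs from the general keywords taxonomy.

    references  -- list of (keyName, keyValue) tuples from one tModel
    keys_in_use -- customTaxonomy:key values of already published tModels
    """
    errors = []
    counts = {"key": 0, "displayName": 0}
    for key_name, key_value in references:
        if not key_name.startswith(PREFIX):
            continue  # ordinary general keywords reference, not checked here
        suffix = key_name[len(PREFIX):]
        if suffix not in counts:
            errors.append("unsupported key name: " + key_name)
            continue
        counts[suffix] += 1
        if counts[suffix] > 1:
            errors.append("only one '%s' key name is allowed" % key_name)
        if not 1 <= len(key_value) <= LENGTH_LIMITS[suffix]:
            errors.append("invalid length for '%s': %s" % (suffix, key_value))
        if suffix == "key" and key_value in keys_in_use:
            errors.append("key already in use: " + key_value)
    return errors


print(validate_custom_references([(PREFIX + "key", "toolongdbkey")], {"food"}))
print(validate_custom_references([(PREFIX + "key", "goodfood")], {"food"}))  # []
```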

If a categorization tModel is edited in the user console, or republished via SOAP, UDDI4J or EJB, such that it is no longer a categorization tModel (that is, the categorization keyedReference is removed), then that tModel is removed from the internal store of categorization tModels, and its customTaxonomy:key value, if present, becomes available for use by new categorization tModels.


IBM WebSphere UDDI Registry

 

WebSphere is a trademark of the IBM Corporation in the United States, other countries, or both.

 

IBM is a trademark of the IBM Corporation in the United States, other countries, or both.