Directory Server, Version 6.1

 

Appendix C. LDAP data interchange format (LDIF)

This appendix describes the LDAP Data Interchange Format (LDIF) as used by the idsldapmodify, idsldapsearch, and idsldapadd utilities. The LDIF specified here is also supported by the server utilities provided with the IBM® Directory.

LDIF is used to represent LDAP entries in text form. The basic form of an LDIF entry is:

      dn: <distinguished name>
      <attrtype>: <attrvalue>
      <attrtype>: <attrvalue>
      ...

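A line can be continued by starting the next line with a single space or tab character. That leading character is discarded when the lines are joined, so the rest of the continuation line is appended directly to the end of the previous line, for example: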

      dn: cn=John E Doe, o=University of High
       er Learning, c=US
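As a rough illustration of the continuation rule, the following Python sketch joins folded lines back into logical lines. The function name unfold_lines is hypothetical and is not part of the IBM Directory utilities; it assumes the input lines have already had their line endings removed.

```python
def unfold_lines(lines):
    """Join LDIF continuation lines into logical lines.

    A line that starts with a single space or tab continues the previous
    line: that one leading character is dropped and the remainder is
    appended directly, with no separator added.
    """
    logical = []
    for line in lines:
        if line[:1] in (" ", "\t") and logical:
            logical[-1] += line[1:]
        else:
            logical.append(line)
    return logical


folded = [
    "dn: cn=John E Doe, o=University of High",
    " er Learning, c=US",
    "cn: John E Doe",
]
print(unfold_lines(folded))
# ['dn: cn=John E Doe, o=University of Higher Learning, c=US', 'cn: John E Doe']
```

Because nothing is inserted at the join, breaking a value in the middle of a word (as above) avoids accidentally dropping the space between two words.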

Multiple attribute values are specified on separate lines, for example:

      cn: John E Doe
      cn: John Doe

If an <attrvalue> contains a non-US-ASCII character, or begins with a space or a colon ':', the <attrtype> is followed by a double colon and the value is encoded in base-64 notation. For example, the value " begins with a space" would be encoded like this:
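The following Python sketch shows how a writer might apply this base-64 rule when emitting one attribute line. The helper names needs_base64 and format_attr are hypothetical, and the sketch assumes the encoded bytes are UTF-8 (the LDAP V3 wire format). It also checks a few extra cases taken from RFC 2849, marked in the comments, that the rule above does not mention.

```python
import base64


def needs_base64(value: str) -> bool:
    # Rule from this appendix: non-US-ASCII text, or a leading space or colon.
    if any(ord(ch) > 127 for ch in value):
        return True
    if value.startswith((" ", ":")):
        return True
    # Extra cases from RFC 2849 (not stated in this appendix):
    # a leading "<", or NUL, CR, or LF anywhere in the value.
    if value.startswith("<") or any(ch in "\0\r\n" for ch in value):
        return True
    return False


def format_attr(attr_type: str, value: str) -> str:
    """Return one LDIF line, switching to "::" and base-64 when required."""
    if needs_base64(value):
        encoded = base64.b64encode(value.encode("utf-8")).decode("ascii")
        return f"{attr_type}:: {encoded}"
    return f"{attr_type}: {value}"


print(format_attr("cn", "John Doe"))              # cn: John Doe
print(format_attr("cn", " begins with a space"))  # cn:: IGJlZ2lucyB3aXRoIGEgc3BhY2U=
```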

      cn:: IGJlZ2lucyB3aXRoIGEgc3BhY2U=

Multiple entries within the same LDIF file are separated by a blank line. Multiple blank lines are considered a logical end-of-file.
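A minimal Python sketch of the entry-separation rule follows. It assumes the raw lines come from str.splitlines(), so a blank line is an empty string, and that entries are split before continuation lines are joined. The name split_entries and the sample entries are illustrative only.

```python
def split_entries(lines):
    """Group raw LDIF lines into entries.

    One blank line ends the current entry; two consecutive blank lines are
    treated as a logical end-of-file, so anything after them is ignored.
    """
    entries, current, blank_run = [], [], 0
    for line in lines:
        if line == "":
            blank_run += 1
            if blank_run >= 2:
                break  # multiple blank lines: logical end-of-file
            if current:
                entries.append(current)
                current = []
            continue
        blank_run = 0
        current.append(line)
    if current:
        entries.append(current)
    return entries


text = "dn: cn=a\ncn: a\n\ndn: cn=b\ncn: b\n\n\ndn: cn=c\ncn: c\n"
print(split_entries(text.splitlines()))
# [['dn: cn=a', 'cn: a'], ['dn: cn=b', 'cn: b']]
```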

 

LDIF example

Here is an example of an LDIF file containing three entries.

      dn: cn=John E Doe, o=University of High
       er Learning, c=US
      cn: John E Doe
      cn: John Doe
      objectclass: person
      sn: Doe
  
      dn: cn=Bjorn L Doe, o=University of High
       er Learning, c=US
      cn: Bjorn L Doe
      cn: Bjorn Doe
      objectclass: person
      sn: Doe
  
      dn: cn=Jennifer K. Doe, o=University of High
       er Learning, c=US
      cn: Jennifer K. Doe
      cn: Jennifer Doe
      objectclass: person
      sn: Doe
      jpegPhoto:: /9j/4AAQSkZJRgABAAAAAQABAAD/2wBDABALD
       A4MChAODQ4SERATGCgaGBYWGDEjJR0oOjM9PDkzODdASFxOQ
       ERXRTc4UG1RV19iZ2hnPk1xeXBkeFxlZ2P/2wBDARESEhgVG
      ...
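To show how these pieces fit together, here is a small Python sketch that parses the lines of one entry, such as the first entry above, into its DN and a dictionary of attribute values. It is an illustration under simplifying assumptions, not the parser used by the IBM utilities: parse_entry is a hypothetical name, "attr:: data" values are returned as decoded bytes, and file URLs (attr:<) are not handled.

```python
import base64


def parse_entry(lines):
    """Parse the raw lines of one LDIF entry into (dn, {attr: [values]}).

    "attr: text" yields a str; "attr:: data" yields the base-64 decoded
    bytes (binary data such as jpegPhoto, or UTF-8 text).
    """
    logical = []
    for line in lines:
        if line.startswith("#"):
            continue                      # comment line
        if line[:1] in (" ", "\t") and logical:
            logical[-1] += line[1:]       # continuation of the previous line
        else:
            logical.append(line)

    dn, attrs = None, {}
    for line in logical:
        attr, _, rest = line.partition(":")
        attr = attr.strip()
        if rest.startswith(":"):          # double colon: base-64 value
            value = base64.b64decode(rest[1:].strip())
        else:
            value = rest.lstrip(" ")
        if attr.lower() == "dn":
            dn = value.decode("utf-8") if isinstance(value, bytes) else value
        else:
            attrs.setdefault(attr, []).append(value)
    return dn, attrs


entry = [
    "dn: cn=John E Doe, o=University of High",
    " er Learning, c=US",
    "cn: John E Doe",
    "cn: John Doe",
    "objectclass: person",
    "sn: Doe",
]
dn, attrs = parse_entry(entry)
print(dn)           # cn=John E Doe, o=University of Higher Learning, c=US
print(attrs["cn"])  # ['John E Doe', 'John Doe']
```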

The jpegPhoto value in Jennifer Doe's entry is encoded in base-64. Textual attribute values can also be specified in base-64 format. In that case, however, the data that is base-64 encoded must be in the code page of the protocol's wire format: the IA5 character set for LDAP V2, or UTF-8 for LDAP V3.
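The following Python sketch illustrates why the code page matters: the same text produces different base-64 strings depending on which bytes were encoded. The value "José" is a hypothetical example; only the UTF-8 form is appropriate for LDAP V3.

```python
import base64

title = "José"  # hypothetical textual value with a non-US-ASCII character

# Correct for LDAP V3: encode the text as UTF-8 first, then base-64.
utf8_b64 = base64.b64encode(title.encode("utf-8")).decode("ascii")

# Wrong for LDAP V3: these bytes are ISO-8859-1, not UTF-8.
latin1_b64 = base64.b64encode(title.encode("iso-8859-1")).decode("ascii")

print(utf8_b64)    # Sm9zw6k=
print(latin1_b64)  # Sm9z6Q==
```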

 

Version 1 LDIF support

The client utilities (idsldapmodify and idsldapadd) have been enhanced to recognize version 1 of LDIF, which is identified by a "version: 1" line at the head of the file. Unlike the original version of LDIF, which is limited to US-ASCII, version 1 LDIF supports attribute values represented in UTF-8.

Creating an LDIF file that contains UTF-8 values by hand can be difficult, however. To simplify this process, a charset extension to the LDIF format is supported. This extension allows an IANA character set name to be specified in the header of the LDIF file, along with the version number. Only a limited set of IANA character sets is supported; see "IANA character sets supported by platform" for the charset values supported on each operating system platform.
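As an illustration of where these tags sit, the following Python sketch reads the optional version and charset lines that precede the first entry. read_header is a hypothetical helper; skipping comment and blank lines before the header is an assumption based on the sample files in this appendix, which place comments ahead of the "version: 1" line.

```python
def read_header(lines):
    """Return (version, charset, index of the first line after the header).

    Both tags are optional; comment lines (#) and blank lines are skipped.
    """
    version, charset = None, None
    i = 0
    while i < len(lines):
        line = lines[i]
        if line == "" or line.startswith("#"):
            i += 1
            continue
        key, sep, value = line.partition(":")
        key = key.strip().lower()
        if sep and key == "version":
            version = int(value.strip())
        elif sep and key == "charset":
            charset = value.strip()
        else:
            break  # first line of the first entry, for example "dn: ..."
        i += 1
    return version, charset, i


header = [
    "version: 1",
    "charset: ISO-8859-1",
    "",
    "dn: cn=Juan Griego, o=University of New Mexico, c=US",
]
print(read_header(header))  # (1, 'ISO-8859-1', 3)
```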

The version 1 LDIF format also supports file URLs. This provides a more flexible way to define a file specification. File URLs take the following form:

      attribute:< file:///path          (where path syntax depends on platform)

For example, the following are valid file URLs:

      jpegphoto:< file:///d:\temp\photos\myphoto.jpg    (DOS/Windows style paths)
      jpegphoto:< file:///etc/temp/photos/myphoto.jpg   (UNIX or Linux style paths)
Note:

The IBM Directory utilities support both the new file URL specification and the older style (for example, "jpegphoto: /etc/temp/myphoto"), regardless of the version specification. In other words, you can use the new file URL format without adding the version tag to your LDIF files.
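The following Python sketch converts a UNIX-style file URL from an "attr:< ..." line into a local path using the standard urllib.parse module. file_url_to_path is a hypothetical helper; DOS/Windows-style URLs and the older path-only style mentioned in the note are not handled.

```python
from urllib.parse import unquote, urlparse


def file_url_to_path(url: str) -> str:
    """Convert a UNIX-style file:/// URL to a local path (sketch only)."""
    parsed = urlparse(url)
    if parsed.scheme != "file":
        raise ValueError(f"not a file URL: {url!r}")
    return unquote(parsed.path)


line = "jpegphoto:< file:///etc/temp/photos/myphoto.jpg"
attr, _, rest = line.partition(":<")
print(attr, file_url_to_path(rest.strip()))
# jpegphoto /etc/temp/photos/myphoto.jpg
```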

 

Version 1 LDIF examples

You can use the optional charset tag to have the utilities automatically convert values from the specified character set to UTF-8, as in the following example:

 version: 1
 charset: ISO-8859-1

 dn: cn=Juan Griego, o=University of New Mexico, c=US
 cn: Juan Griego
 sn: Griego
 description:: V2hhdCBhIGNhcmVmdWwgcmVhZGVyIHlvd
 title: Associate Dean
 title: [title in Spanish]
 jpegPhoto:< file:///usr/local/photos/jgriego.jpg

In this example, all values that follow an attribute name and a single colon are translated from the ISO-8859-1 character set to UTF-8. Values that follow an attribute name and a double colon (such as description:: V2hhdCBhIGNhcm...) must be base-64 encoded, and are expected to be either binary data or UTF-8 character strings. Values read from a file, such as the jpegPhoto value specified by the file URL in the previous example, are also expected to be either binary or UTF-8. No translation from the specified charset to UTF-8 is done on those values.
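These translation rules can be sketched in Python as follows. The function works on raw bytes, because single-colon values are in the declared charset rather than UTF-8 until they are converted. value_as_utf8 is a hypothetical name; it relies on Python codec names accepting IANA names such as ISO-8859-1, and it assumes UNIX-style file:/// URLs.

```python
import base64
from pathlib import Path
from typing import Optional, Tuple


def value_as_utf8(raw_line: bytes, charset: Optional[str]) -> Tuple[str, bytes]:
    """Return (attribute name, value bytes) following the charset rules above."""
    name, _, rest = raw_line.partition(b":")
    attr = name.decode("ascii").strip()

    if rest.startswith(b":"):
        # Double colon: base-64 data, used as-is (binary or UTF-8); no translation.
        return attr, base64.b64decode(rest[1:].strip())

    if rest.startswith(b"<"):
        # File URL: file contents used as-is (binary or UTF-8); no translation.
        url = rest[1:].strip().decode("ascii")
        if not url.startswith("file://"):
            raise ValueError(f"not a file URL: {url!r}")
        return attr, Path(url[len("file://"):]).read_bytes()

    # Single colon: string content, translated from the charset to UTF-8.
    text = rest.lstrip(b" ")
    if charset is None:
        text.decode("utf-8")  # no charset tag: must already be UTF-8 (raises if not)
        return attr, text
    return attr, text.decode(charset).encode("utf-8")


print(value_as_utf8(b"title: Jos\xe9", "ISO-8859-1"))
# ('title', b'Jos\xc3\xa9')
```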

In the following example of an LDIF file without the charset tag, content is expected to be UTF-8, base-64 encoded UTF-8, or base-64 encoded binary data:

 # IBM Directory sample LDIF file
 #
 # The suffix "o=sample" should be defined before attempting to load
 # this data.

 version: 1

 dn: o=sample
 objectclass: top
 objectclass: organization
 o: IBM

 dn: ou=Austin, o=sample
 ou: Austin
 objectclass: organizationalUnit
 seealso: cn=Linda Carlesberg, ou=Austin, o=sample

The same file can be used without the "version: 1" header, as in previous releases of the IBM Directory:

 # IBM Directory sample LDIF file
 #
 # The suffix "o=sample" should be defined before attempting to load
 # this data.

 dn: o=sample
 objectclass: top
 objectclass: organization
 o: IBM

 dn: ou=Austin, o=sample
 ou: Austin
 objectclass: organizationalUnit
 seealso: cn=Linda Carlesberg, ou=Austin, o=sample
Note:

The textual attribute values can be specified in base-64 format.

 

IANA character sets supported by platform

The following table lists, for each platform, the IANA-defined character sets that can be assigned to the charset tag in a Version 1 LDIF file. The value in the left-most column is the text string to assign to the charset tag. An "X" indicates that conversion from the specified charset to UTF-8 is supported on the associated platform, and that all string content in the LDIF file is assumed to be represented in the specified charset. "n/a" indicates that the conversion is not supported on the associated platform.

String content is defined to be all attribute values that follow an attribute name and a single colon.

For more information about IANA-registered character sets, see IANA Character Sets at:

http://www.iana.org/assignments/character-sets
Table 31. IANA-defined character sets
                    Locale                                     DB2® code page
Character set name  HP-UX  Linux®, Linux_390 NT  AIX® Solaris  UNIX® NT
ISO-8859-1          X      X                 X   X    X        819   1252
ISO-8859-2          X      X                 X   X    X        912   1250
ISO-8859-5          X      X                 X   X    X        915   1251
ISO-8859-6          X      X                 X   X    X        1089  1256
ISO-8859-7          X      X                 X   X    X        813   1253
ISO-8859-8          X      X                 X   X    X        916   1255
ISO-8859-9          X      X                 X   X    X        920   1254
ISO-8859-15         X      n/a               X   X    X
IBM437              n/a    n/a               X   n/a  n/a      437   437
IBM850              n/a    n/a               X   X    n/a      850   850
IBM852              n/a    n/a               X   n/a  n/a      852   852
IBM857              n/a    n/a               X   n/a  n/a      857   857
IBM862              n/a    n/a               X   n/a  n/a      862   862
IBM864              n/a    n/a               X   n/a  n/a      864   864
IBM866              n/a    n/a               X   n/a  n/a      866   866
IBM869              n/a    n/a               X   n/a  n/a      869   869
IBM1250             n/a    n/a               X   n/a  n/a
IBM1251             n/a    n/a               X   n/a  n/a
IBM1253             n/a    n/a               X   n/a  n/a
IBM1254             n/a    n/a               X   n/a  n/a
IBM1255             n/a    n/a               X   n/a  n/a
IBM1256             n/a    n/a               X   n/a  n/a
TIS-620             n/a    n/a               X   X    n/a      874   874
EUC-JP              X      X                 n/a X    X        954   n/a
EUC-KR              n/a    n/a               n/a X    X*       970   n/a
EUC-CN              n/a    n/a               n/a X    X        1383  n/a
EUC-TW              X      n/a               n/a X    X        964   n/a
Shift-JIS           n/a    X                 X   X    X        932   943
KSC                 n/a    n/a               X   n/a  n/a      n/a   949
GBK                 n/a    n/a               X   X    n/a      1386  1386
Big5                X      n/a               X   X    X        950   950
GB18030             n/a    X                 X   X    X
HP15CN              X (with non-GB18030)

* Supported at Solaris 7.

Notes:

  1. The new Chinese character set standard (GB18030) is supported when the appropriate patches, available from www.sun.com and www.microsoft.com, are installed.

  2. On the Windows® 2000 operating system, set the environment variable zhCNGB18030=TRUE.



