Internet Draft Marc Blanchet draft-ietf-idn-idne-01.txt Viagenie July 8, 2000 Paul Hoffman Expires in six months IMC & VPNC Internationalized domain names using EDNS (IDNE) Status of this Memo This document is an Internet-Draft and is in full conformance with all provisions of Section 10 of RFC2026. Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts. Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress." The list of current Internet-Drafts can be accessed at http://www.ietf.org/ietf/1id-abstracts.txt The list of Internet-Draft Shadow Directories can be accessed at http://www.ietf.org/shadow.html. Abstract The current DNS infrastructure does not provide a way to use internationalized domain names (IDN). This document describes an extension mechanism based on EDNS which enables the use of IDN without causing harm to the current DNS. IDNE enables IDN host names with a as many characters as current ASCII-only host names. It fully supports UTF-8 and conforms to the IDN requirements. 1. Introduction Various proposals for IDN have tried to integrate IDN into the current limited ASCII DNS. However, the compatibility issues make too many constraints on the architecture. Many of these proposals require modifications to the applications or to the DNS protocol or to the servers. This proposal take a different approach: it uses the standardized extension mechanism for DNS (EDNS) and uses UTF-8 as the mandatory charset. It causes no harm to the current DNS because it uses the EDNS extension mechanism. The major drawback of this proposal is that all protocols, applications and DNS servers will have to be upgraded to support this proposal. 1.1 Terminology The key words "MUST", "SHALL", "REQUIRED", "SHOULD", "RECOMMENDED", and "MAY" in this document are to be interpreted as described in RFC 2119 [RFC2119]. Hexadecimal values are shown preceded with an "0x". For example, "0xa1b5" indicates two octets, 0xa1 followed by 0xb5. Binary values are shown preceded with an "0b". For example, a nine-bit value might be shown as "0b101101111". Examples in this document use the notation from the Unicode Standard [UNICODE3] as well as the ISO 10646 [ISO10646] names. For example, the letter "a" may be represented as either "U+0061" or "LATIN SMALL LETTER A". In the lists of prohibited characters, the "U+" is left off to make the lists easier to read. 1.2 IDN summary Using the terminology in [IDNCOMP], this protocol specifies an IDN architecture of arch-2 (send binary or ACE). The binary format is bin-1.1 (UTF-8), and the method for distinguishing binary from current names is bin-2.4 (mark binary with EDNS0). The transition period is not specified. 2. Functional Description DNS query and responses containing IDNE labels have the following properties: - The string in the label MUST be pre-processed as described in [NAMEPREP] before the query or response is prepared. - The characters in the label MUST be encoded using UTF-8 [RFC2279]. - The entire label MUST be encoded EDNS [RFC2671]. - The version of the IDN protocol MUST be identified. 3. Encoding An IDNE label uses the EDNS extended label type prefix (0b01), as described in [RFC2671]. (A normal label type always begin with 0b00). A new extended label type for IDNE is used to identify an IDNE label. This document uses 0b000010 as the extended label type; however, the label type will be assigned by IANA and it may not be 0b000010. 0 1 2 bits 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 . . . +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-//+-+-+-+-+-+-+ |0 1| ELT | Size | IDN label ... | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+//-+-+-+-+-+-+-+ ELT: The six-bit extended label type to be assigned by the IANA for an IDN label. In this document, the value 0b000010 is used, although that might be changed by IANA. Size: Size (in octets) of the IDN label following. This MUST NOT be zero. IDN label: Label, encoded in UTF-8 [RFC2279]. Note that this label might contain all ASCII characters, and thus can be used for host name labels that are legal in [STD13]. IDNE labels can be mixed with STD13 labels in a domain name. The compression scheme in section 4.1.4 of [STD13] is supported as is. Pointers can refer to either IDN labels or non-IDN labels. 3.1 Examples 3.1.1 Basic example The following example shows the label me.com where the "e" in "me" is replaced by a <LATIN CAPITAL LETTER E WITH ACUTE>, which is U+00C9. The decomposition and downcasing specified in [NAMEPREP] changes the second character to <LATIN SMALL LETTER E WITH ACUTE>, U+00E9. This string is then transformed using UTF-8 [RFC2279] to 0x6DC3A9. Ignoring the other fields of the message, the domain name portion of the datagram could look like: +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+ 20 | 0 1 0 0 0 0 1 0| 0 0 0 0 0 0 1 1| +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+ 22 | 0x6D (m) | 0xC3 (e'(1)) | +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+ 24 | 0xA9 (e'(2)) | 3 | +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+ 26 | 0x63 (c) | 0x6F (o) | +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+ 28 | 0x6D (m) | 0x00 | +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+ Octet 20 means EDNS extended label type (0b01) using the IDN label type (0b000010) Octet 21 means size of label is 3 octets following Octet 22-24 are the "m*" label encoded in UTF-8 Octet 25-28 are "com" encoded as a STD13 label Octet 29 is the root domain 3.1.2 Example with compression Using the previous labels, one datagram might contain "www.m*.com" and "m*.com" (where the "*" is <LATIN CAPITAL LETTER E WITH ACUTE>). Ignoring the other fields of the message, the domain name portions of the datagram could look like: +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+ 20 | 0 1 0 0 0 0 1 0| 0 0 0 0 0 0 1 1| +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+ 22 | 0x6D (m) | 0xC3 (e'(1)) | +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+ 24 | 0xA9 (e'(2)) | 3 | +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+ 26 | 0x63 (c) | 0x6F (o) | +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+ 28 | 0x6D (m) | 0x00 | +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+ . . . +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+ 40 | 3 | 0x77 (w) | +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+ 42 | 0x77 (w) | 0x77 (w) | +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+ 44 | 1 1| 20 | +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+ The domain name "m*.com" is shown at offset 20. The domain name "www.m*.com" is shown at offset 40; this definition uses a pointer to concatenate a label for www to the previously defined "m*.com". 4. Label Size In IDNE, the maximum length of a label is 255 octets, and the maximum size for a domain name is 1023 octets. The reason for using these values is so that IDNE labels can have the same number of characters as the ASCII-based labels in [STD13]. Because character encoding in UTF-8 is variable length, the maximum octet length for characters expected in the foreseeable future (that is, 4 octets for a single character) was used. Note that this extension allows some IDNE labels to be longer than 63 characters and some IDNE names to be longer than 255 octets. Software creating DNS queries or responses using IDNE MUST verify that, after IDN preparation and transformation to UTF8, that no labels are longer than 255 octets and that no names are longer than 1023 octets. If there is a user interface associated with the process creating the query or response, that interface SHOULD give the user an error message. Software MUST NOT transmit DNS queries or responses which contain labels that are longer than 255 octets or names that are longer than 1023 octets. Servers MUST NOT accept DNS queries or responses which contain labels that are longer than 255 octets or names that are longer than 1023 octets, and MUST send the NOTIMPL RCODE error message if such queries or responses are received. 5. UDP Packet Size IDNE-capable senders and receivers MUST support UDP packet sizes of 1220 octets, not including IP and UDP headers (note that the minimum MTU for IPv6 is 1280 [RFC2460]). A sender MUST announce its capability in the OPT pseudo-RR described in section 4.3 of [RFC2671] by having the CLASS sender's UDP payload size be greater than or equal to 1220. 6. Canonalization, Prohibited Characters, and Case Folding The string in the label MUST be pre-processed as described in [NAMEPREP] before the query or response is prepared. A query or response MUST NOT contain a label that does not conform to [NAMEPREP]. 7. Versions of IDNE The IDN protocol version number MUST be included in the OPT RR RDATA of EDNS (described in Section 4.4 of [RFC2671]). An OPTION-CODE will be assigned by IANA for storing the IDNE protocol version number; this document uses 0x0001 for the OPTION-CODE. The value (that is, the OPTION-DATA) is the version number coded in 8 bits. All requesters MUST send this information as part of the OPT RR included in the EDNS packet. 7.1 This version of IDNE This document describes version 1 of IDNE. This version is a combination of the protocol in this document and the rules as described in [NAMEPREP]. Note that [NAMEPREP] describes a single version of the list of canonicalization, case folding, and prohibited characters, and that this document is linked to that single version of [NAMEPREP]. The identifiers for this specification are: OPTION-CODE = 0x0001 (IDNE protocol version) OPTION-LENGTH = 0x0001 (1 octet following) OPTION-DATA = 0x01 (IDNE protocol version 1) 7.2 Creating new versions of IDNE A new version of IDNE is created by a standards-track RFC that specifies: - a normative reference to [NAMEPREP] or a successor document to [NAMEPREP] - an IDNE version number that is 1 greater than the highest IDNE version number at the time the RFC is published If there are any changes to the encoding or interpretation of the protocol, they must also be specified in the same standards-track RFC. 7.3 Prohibited characters and versions of IDNE If a server receives a request containing an illegal or unknown character (as described in the version number in the request), it MUST send a NOTIMPL RCODE to the client. For example, if a server that understands both version 1 and version 2 receives a request that is marked as version 1, but contains a label that includes a character that is prohibited in version 1 but allowed in version 2, that server must still send a NOTIMPL RCODE to the client. 8. API Specifications The current API for TCP/IP uses gethostbyname and gethostbyaddr for IPv4 and getnodeipbyname and getnodeipbyaddr (specified in [RFC 2671]) for both IPv4 and IPv6. These function calls returns hostent structs, where the h_name field contains a pointer to a char. In this context, receiving a UTF-8 string mean that the application should know that UTF-8 uses more than one octet per char. A new flag "IDN" (to appear in netdb.h) is defined to be passed in the flags argument of getnodeipbynode and getnodeipbyaddr. This flag tells the resolver to request an IDNE-encoded name. No new return code is defined since the returned codes in RFC 2671 are meaningful in the IDNE context. If one has not yet converted his code to IPv6 and still wants to enable IDNs with this API, one can do a macro of the getnodeipby* functions mapped to the IPv4 gethostby* ones, including the "IDN" flag, and then process differently based on the presence of the flag. 9. Transition and Deployment Deployment of this proposal means updating clients and servers, as well as applications and protocols, and therefore a transition strategy is proposed. Because many DNS servers do not yet handle IDNE and may take years or decades to do so, an ASCII-compatible encoding (ACE) format for IDN names is also needed as a transition to an all-IDNE DNS. Note that IDNE and an ACE are not related, and do not interact in the DNS. If the IETF chooses to have an ACE mechanism in use at the same time as IDNE, it would be wise to choose an ACE that allows as many characters as possible in the name parts and full names. IDNE allows names with as many characters as current names. This means that it is possible to create names in IDNE that are longer than those that can be created in the ACE protocols that have been described so far. Although not prohibited, it is unwise to create a name that can be legally represented in IDNE but not in the ACE, or a name that can be legally represented in the ACE but not in IDNE. The IETF should periodically evaluate the benefits and problems associated with having three different formats for names (STD13, IDNE, and ACE). If at some point it is decided that the problems outweigh the benefits, the IETF can state a time when one or more of the services should not be used on the Internet. 10. Root Server Considerations Because this specification uses EDNS, root servers should be prepared to receive EDNS requests. This specification handles IDN top-level domains in exactly the same fashion as it does every other domain. Considerations about IDN top-level domains are outside of this work, but the first IDN top-level domains would require all root servers to be ready for IDNE requests. 11. IANA Considerations [[ TBD. This section will have two parts. The first will request an EDNS option code. The second will specify how IDNE version numbers are allocated (namely, standards-track RFC only). ]] 12. Security Considerations Because IDNE uses EDNS, it inherits the same security considerations as EDNS. Much of the security of the Internet relies on the DNS. Thus, any change to the characteristics of the DNS can change the security of much of the Internet. Host names are used by users to connect to Internet servers. The security of the Internet would be compromised if a user entering a single internationalized name could be connected to different servers based on different interpretations of the internationalized host name. Because this document normatively refers to [NAMEPREP] and [RFC2671], it includes the security considerations from those documents as well. 13. References [IDNCOMP] Paul Hoffman, "Comparison of Internationalized Domain Name Proposals", draft-ietf-idn-compare. [ISO10646] ISO/IEC 10646-1:1993. International Standard -- Information technology -- Universal Multiple-Octet Coded Character Set (UCS) -- Part 1: Architecture and Basic Multilingual Plane. Five amendments and a technical corrigendum have been published up to now. UTF-16 is described in Annex Q, published as Amendment 1. 17 other amendments are currently at various stages of standardization. [[[ THIS REFERENCE NEEDS TO BE UPDATED AFTER DETERMINING ACCEPTABLE WORDING ]]] [NAMEPREP] Paul Hoffman & Marc Blanchet, "Preparation of Internationalized Host Names", draft-ietf-idn-nameprep. [RFC2119] Scott Bradner, "Key words for use in RFCs to Indicate Requirement Levels", March 1997, RFC 2119. [RFC2279] Francois Yergeau, "UTF-8, a transformation format of ISO 10646", January 1998, RFC 2279. [RFC2460] Steve Deering & Bob Hinden, "Internet Protocol, Version 6 (IPv6) Specification", December 1998, RFC 2460. [RFC2671] Paul Vixie, "Extension Mechanisms for DNS (EDNS0)", August 1999, RFC 2671. [STD13] Paul Mockapetris, "Domain names - implementation and specification", November 1987, STD 13 (RFC 1035). [UNICODE3] The Unicode Consortium, "The Unicode Standard -- Version 3.0", ISBN 0-201-61633-5. Described at <http://www.unicode.org/unicode/standard/versions/Unicode3.0.html>. A. Acknowledgements This document is the result of the thinking of many people. The following people made significant comments on the early drafts: Andre Cormier Andrew Draper Bill Sommerfeld Francois Yergeau B. Changes from -00 to -01 1.1: Added reference to Unicode names. 3: Clarified that a size of zero is not allowed. 3.1.1 and 3.1.2: Fixed two very serious errors in the examples. 6: Removed second paragraph, which was redundant with 7.3. 12: Beefed up the security considerations. 13: Added [ISO10646] and [UNICODE3]. Added Appendix A. Added Appendex B. C. Authors' Addresses Marc Blanchet Viagenie 2875 boul. Laurier, bureau 300 Sainte-Foy, QC G1V 2M2 Canada Marc.Blanchet@viagenie.qc.ca Paul Hoffman Internet Mail Consortium and VPN Consortium 127 Segre Place Santa Cruz, CA 95060 USA phoffman@imc.org