This document defines the exact nature of a Jabber Identifier (JID). Note well: this document is superseded by the XMPP Core memo defined by the IETF's XMPP Working Group.
WARNING: This document has been retracted by the author(s). Implementation of the protocol described herein is not recommended. Developers desiring similar functionality should implement the protocol that supersedes this one (if any).
Series: XEP
Number: 0029
Publisher: XMPP Standards Foundation
Status: Retracted
Type: Standards Track
Version: 1.1
Last Updated: 2003-10-03
Approving Body: XMPP Council
Dependencies: None
Supersedes: None
Superseded By: None
Short Name: N/A
Wiki Page: <http://wiki.jabber.org/index.php/Definition of Jabber Identifiers (JIDs) (XEP-0029)>
Email: craigk@jabber.com
JabberID: craigk@jabber.com
The preferred venue for discussion of this document is the Standards discussion list: <http://mail.jabber.org/mailman/listinfo/standards>.
Errata may be sent to <editor@xmpp.org>.
The Extensible Messaging and Presence Protocol (XMPP) is defined in the XMPP Core (RFC 3920) and XMPP IM (RFC 3921) specifications contributed by the XMPP Standards Foundation to the Internet Standards Process, which is managed by the Internet Engineering Task Force in accordance with RFC 2026. Any protocol defined in this document has been developed outside the Internet Standards Process and is to be understood as an extension to XMPP rather than as an evolution, development, or modification of XMPP itself.
The following keywords as used in this document are to be interpreted as described in RFC 2119: "MUST", "SHALL", "REQUIRED"; "MUST NOT", "SHALL NOT"; "SHOULD", "RECOMMENDED"; "SHOULD NOT", "NOT RECOMMENDED"; "MAY", "OPTIONAL".
1. Introduction
2. JIDs
2.1. Grammar
2.2. Domain Identifier
2.3. Node Identifier
2.4. Resource Identifier
2.5. Limited Resources
Notes
Revision History
Jabber Identifiers (JIDs) uniquely identify individual entities in the Jabber network. To date, their syntax has been defined by convention, existing implementations, and available documentation. Under the existing syntax, certain characters that are allowed in JIDs cause ambiguity, and the lack of a size limit on resources defies database schemas and forces some trivial JID operations to use dynamic memory allocation. This document seeks both to define and to improve the existing JID syntax. It does not explain the general usage or nature of JIDs; it focuses on syntax only.
JIDs consist of three main parts:
1. The node identifier (optional)
2. The domain identifier (required)
3. The resource identifier (optional)
JIDs are encoded as UTF-8. A grammar is presented first, followed by remarks that clarify and further restrict it.
<JID> ::= [<node>"@"]<domain>["/"<resource>]
<node> ::= <conforming-char>[<conforming-char>]*
<domain> ::= <hname>["."<hname>]*
<resource> ::= <any-char>[<any-char>]*
<hname> ::= <let>|<dig>[[<let>|<dig>|"-"]*<let>|<dig>]
<let> ::= [a-z] | [A-Z]
<dig> ::= [0-9]
<conforming-char> ::= #x21 | [#x23-#x25] | [#x28-#x2E] |
                      [#x30-#x39] | #x3B | #x3D | #x3F |
                      [#x41-#x7E] | [#x80-#xD7FF] |
                      [#xE000-#xFFFD] | [#x10000-#x10FFFF]
<any-char> ::= [#x20-#xD7FF] | [#xE000-#xFFFD] |
               [#x10000-#x10FFFF]
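As an illustration of the top-level <JID> production, the following minimal Python sketch splits a string into its three parts. The names JID and split_jid are hypothetical and not defined by this document; the sketch only separates the parts and does not check character sets or length limits.

```python
from typing import NamedTuple, Optional


class JID(NamedTuple):
    """Hypothetical container for the three parts of a JID."""
    node: Optional[str]
    domain: str
    resource: Optional[str]


def split_jid(jid: str) -> JID:
    """Split a JID into (node, domain, resource) following the <JID> production."""
    # Neither <node> nor <domain> may contain "/", so the FIRST "/" starts
    # the resource; the resource itself may contain "/" and "@".
    rest, slash, resource = jid.partition("/")
    # Neither <node> nor <domain> may contain "@", so the first "@" in the
    # remainder separates the node from the domain.
    node, at, domain = rest.partition("@")
    if not at:
        # No "@" present: the whole remainder is the domain.
        node, domain = None, rest
    elif not node:
        raise ValueError("empty node identifier before '@'")
    if not domain:
        raise ValueError("the domain identifier is required")
    if slash and not resource:
        raise ValueError("empty resource identifier after '/'")
    return JID(node, domain, resource if slash else None)
```

For example, split_jid("romeo@montague.net/orchard") yields the node "romeo", the domain "montague.net", and the resource "orchard", while split_jid("montague.net") yields only a domain.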
A domain identifier is a standard DNS hostname as specified in RFC 952 [1] and RFC 1123 [2]. It is case-insensitive 7-bit ASCII and limited to 255 bytes. It is the only required component of a JID.
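The domain rules can be sketched in Python as follows. The function names and the regular expression are illustrative assumptions; the pattern is a direct transcription of the <hname> and <domain> productions from the grammar, not a complete RFC 952 / RFC 1123 validator.

```python
import re

# <hname>: starts and ends with a letter or digit, with letters, digits,
# or "-" allowed in between.
_HNAME = r"[A-Za-z0-9](?:[A-Za-z0-9-]*[A-Za-z0-9])?"
# <domain>: one or more <hname> labels separated by ".".
_DOMAIN_RE = re.compile(rf"{_HNAME}(?:\.{_HNAME})*")

MAX_DOMAIN_BYTES = 255


def is_valid_domain(domain: str) -> bool:
    """Check a domain identifier against the grammar and the 255-byte limit."""
    if not domain.isascii():
        return False
    # The string is pure ASCII here, so characters and bytes coincide.
    if len(domain) > MAX_DOMAIN_BYTES:
        return False
    return _DOMAIN_RE.fullmatch(domain) is not None


def domains_equal(a: str, b: str) -> bool:
    """Compare two domain identifiers case-insensitively."""
    return a.lower() == b.lower()
```

Lower-casing is enough for the comparison because a valid domain identifier contains only 7-bit ASCII.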
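Node identifiers are restricted to 256 bytes. They may contain any Unicode character higher than #x20 with the exception of the following, as given by the <conforming-char> production above:
1. #x22 (")
2. #x26 (&)
3. #x27 (')
4. #x2F (/)
5. #x3A (:)
6. #x3C (<)
7. #x3E (>)
8. #x40 (@)
9. #x7F (DEL)
10. #xFFFE
11. #xFFFF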
Case is preserved, but comparisons will be made in case-normalized canonical form.
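A minimal Python sketch of the node rules follows. The code point ranges are copied from the <conforming-char> production. This document does not define "case-normalized canonical form" precisely, so the sketch assumes Unicode case folding followed by NFC normalization; that choice and all function names are assumptions made for illustration.

```python
import unicodedata

# Inclusive code point ranges copied from <conforming-char> in the grammar.
_CONFORMING_RANGES = (
    (0x21, 0x21), (0x23, 0x25), (0x28, 0x2E), (0x30, 0x39),
    (0x3B, 0x3B), (0x3D, 0x3D), (0x3F, 0x3F), (0x41, 0x7E),
    (0x80, 0xD7FF), (0xE000, 0xFFFD), (0x10000, 0x10FFFF),
)
MAX_NODE_BYTES = 256


def is_conforming_char(ch: str) -> bool:
    """Return True if the single character ch matches <conforming-char>."""
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in _CONFORMING_RANGES)


def is_valid_node(node: str) -> bool:
    """Check a node identifier: non-empty, conforming, at most 256 UTF-8 bytes."""
    if not node:
        return False
    if not all(is_conforming_char(ch) for ch in node):
        return False
    return len(node.encode("utf-8")) <= MAX_NODE_BYTES


def canonical_node(node: str) -> str:
    """ASSUMPTION: case folding plus NFC stands in for the canonical form."""
    return unicodedata.normalize("NFC", node.casefold())


def nodes_equal(a: str, b: str) -> bool:
    """Compare node identifiers in canonical form; stored case is untouched."""
    return canonical_node(a) == canonical_node(b)
```

The original string is kept as entered, which preserves case; only the comparison goes through canonical_node.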
Resource identifiers are case-sensitive and are limited to 256 bytes. They may include any Unicode character greater than #x20, except #xFFFE and #xFFFF.
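The resource rules can be sketched the same way. Note that the <any-char> production admits #x20 itself, while the prose above says "greater than #x20"; the sketch below follows the grammar, and that choice is an assumption. The function names are hypothetical.

```python
# Inclusive code point ranges copied from <any-char> in the grammar.
# ASSUMPTION: #x20 is accepted because the grammar lists it.
_ANY_CHAR_RANGES = ((0x20, 0xD7FF), (0xE000, 0xFFFD), (0x10000, 0x10FFFF))
MAX_RESOURCE_BYTES = 256


def is_valid_resource(resource: str) -> bool:
    """Check a resource identifier: non-empty, allowed chars, <= 256 UTF-8 bytes."""
    if not resource:
        return False
    for ch in resource:
        cp = ord(ch)
        if not any(lo <= cp <= hi for lo, hi in _ANY_CHAR_RANGES):
            return False
    return len(resource.encode("utf-8")) <= MAX_RESOURCE_BYTES


def resources_equal(a: str, b: str) -> bool:
    """Resources are case-sensitive, so comparison is exact."""
    return a == b
```

Because resources are case-sensitive, "Orchard" and "orchard" name different resources.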
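To date, resource identifiers have not had a fixed limit on their length. This document seeks to limit them to 256 bytes for the following reasons:
1. Without a limit, some trivial JID operations require dynamic memory allocation.
2. Without a limit, resources cannot be accommodated by database schemas that need a fixed field size.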
In a worst-case encoding, where every character takes four bytes in UTF-8 (as some Han ideographs do), 256 bytes will provide enough storage space for 256 / 4 = 64 code points. This provides a lower bound on the number of characters a node may have in its resource.
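The arithmetic behind the lower bound can be checked with a short Python sketch. The constant and function names are illustrative. The example characters are U+20000, a Han ideograph outside the Basic Multilingual Plane that takes four bytes in UTF-8, and U+4E2D, a Basic Multilingual Plane ideograph that takes three.

```python
MAX_BYTES = 256
# UTF-8 never uses more than four bytes for one code point.
MAX_UTF8_BYTES_PER_CODE_POINT = 4


def guaranteed_code_points(max_bytes: int = MAX_BYTES) -> int:
    """Lower bound on the number of code points that always fit in max_bytes."""
    return max_bytes // MAX_UTF8_BYTES_PER_CODE_POINT


def utf8_len(text: str) -> int:
    """Number of bytes text occupies when encoded as UTF-8."""
    return len(text.encode("utf-8"))


four_byte_han = "\U00020000"   # supplementary-plane Han ideograph
three_byte_han = "\u4e2d"      # Basic Multilingual Plane Han ideograph

worst_case = four_byte_han * guaranteed_code_points()
```

Here worst_case holds 64 code points and fills the 256-byte limit exactly; text made of shorter encodings fits more characters in the same space.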
Specifying limits in terms of bytes instead of characters is somewhat arbitrary once a lower bound for characters is established. This document proposes limits in terms of bytes mainly because doing so results in parsing efficiency; specifically, an implementation does not have to decode the UTF-8 string for the sole purpose of further restricting character sets that require fewer than four bytes per code point. It is sufficient to have a lower bound on characters and an upper bound on bytes.
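The efficiency argument can be illustrated with a small Python sketch that contrasts the two kinds of limit on an already-encoded buffer. The function names are hypothetical, and count_code_points assumes its input is valid UTF-8.

```python
def within_byte_limit(raw: bytes, limit: int = 256) -> bool:
    """A byte limit needs only the buffer length; no bytes are inspected."""
    return len(raw) <= limit


def count_code_points(raw: bytes) -> int:
    """A character limit would need a scan of the whole buffer.

    In valid UTF-8, every byte except a continuation byte (bit pattern
    10xxxxxx) starts a new code point, so counting those bytes counts
    the code points.
    """
    return sum(1 for b in raw if (b & 0xC0) != 0x80)
```

The byte check costs the same regardless of content, whereas the character count has to visit every byte of the identifier.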