Copyright 2006 Henri Sivonen
This specification defines a RELAX NG datatype library that allows precise attribute datatyping in RELAX NG schemas for (X)HTML5. The library does not cover every (X)HTML5 datatype. It includes only those datatypes that are impossible or inconvenient to describe using the built-in facilities of RELAX NG or the XSD datatype library.
This is a work in progress! In its current form, this document is intended to provide a way for the author to organize and communicate his thoughts. Even though this document is intended to develop into an implementable specification, you should not implement this draft spec. This spec has not been endorsed by the WHAT WG.
RELAX NG does not provide a built-in means for constraining the lexical space of attribute values (or the text content of elements) beyond enumerating permissible string literals (with or without whitespace trimming). However, RELAX NG provides extensibility via datatype libraries. RELAX NG validators are expected to provide an API for plugging in implementations of datatype libraries. This way, conformance to a datatype specification can be checked using a Turing-complete programming language.
Typically RELAX NG validators have a built-in implementation of the XSD datatype library. The XSD library provides the datatypes from W3C XML Schemas for use in RELAX NG schemas. Most notably, the XSD datatype library provides regular expressions for constraining the lexical space of a datatype to a regular language.
The XSD datatype library is not adequate for developing accurate RELAX NG schemas for (X)HTML5. Hence, the library described in this specification is needed.
A custom datatype is needed in the following situations:
dateTime datatype discards whitespace before testing the pattern facet, which makes it unsuitable for enforcing the format of WHAT WG dates.)
The datatypes defined herein do not check that the value contains only XML 1.0 characters. That task is left for another layer of software.
This datatype library uses the ID/IDREF/IDREFS feature defined in RELAX NG DTD Compatibility. With the exception of the types whose local name is ID, IDREF and IDREFS, the ID-type of the datatypes of this datatype library is null.
Checking for value equality is not needed for these datatypes in order to be able to write RELAX NG schemas for (X)HTML5. However, so that implementations of this datatype library behave consistently under equality tests, the datatypes of this datatype library shall implement the equality test as a strict code-point-for-code-point string equality test.
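The equality rule above can be illustrated with a minimal sketch. The function name `datatype_values_equal` is hypothetical and is not defined by this specification. The sketch assumes that values are held as Python `str` objects, whose `==` operator compares code point sequences without normalization, case folding or whitespace trimming.

```python
def datatype_values_equal(a: str, b: str) -> bool:
    """Hypothetical helper: strict code-point-for-code-point equality."""
    # Python compares str objects by their code point sequences.
    # No Unicode normalization, case folding or trimming is applied.
    return a == b


precomposed = "\u00e9"   # LATIN SMALL LETTER E WITH ACUTE, one code point
decomposed = "e\u0301"   # "e" followed by COMBINING ACUTE ACCENT

# Canonically equivalent strings are still different under this test.
print(datatype_values_equal(precomposed, decomposed))
# Case and surrounding whitespace matter as well.
print(datatype_values_equal("en", "EN"))
print(datatype_values_equal("en", "en "))
```

Under these assumptions, all three calls report inequality, because each pair differs in at least one code point.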
The datatypes of this datatype library are independent of the namespace mapping context.
Whitespace characters are U+0020, U+0009, U+000D and U+000A.
This specification states which values each datatype shall accept. The datatypes must reject values that they are not defined to accept.
In addition to matching the lexical format, an acceptable value for the date datatypes must be a valid date according to the proleptic Gregorian calendar. For example, 2006-02-29 is not a valid value for date, because 2006 is not a leap year. On the other hand, 1582-10-07 and 1752-09-07 must be treated as valid dates, even though they fall within the days skipped when historical calendars switched from Julian to Gregorian.
Leap seconds are not allowed in times.
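The calendar and leap second rules above can be sketched as follows. This is an illustration only: the function names are hypothetical, the inputs are assumed to have been parsed into integers already, and the lexical formats of Web Forms 2.0 (including any restrictions on the range of the year and any fractional seconds) are deliberately left out of scope.

```python
def is_leap_year(year: int) -> bool:
    # Gregorian rule, applied proleptically to every year.
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)


def days_in_month(year: int, month: int) -> int:
    if month == 2:
        return 29 if is_leap_year(year) else 28
    return 30 if month in (4, 6, 9, 11) else 31


def is_valid_gregorian_date(year: int, month: int, day: int) -> bool:
    """Hypothetical check of month and day only; year range is not checked."""
    if not 1 <= month <= 12:
        return False
    return 1 <= day <= days_in_month(year, month)


def is_valid_time_of_day(hour: int, minute: int, second: int) -> bool:
    """Hypothetical check; second 60 (a leap second) is rejected."""
    return 0 <= hour <= 23 and 0 <= minute <= 59 and 0 <= second <= 59


print(is_valid_gregorian_date(2006, 2, 29))   # 2006 is not a leap year
print(is_valid_gregorian_date(1582, 10, 7))   # no gap for the 1582 switch
print(is_valid_gregorian_date(1752, 9, 7))    # no gap for the 1752 switch
print(is_valid_time_of_day(23, 59, 60))       # leap second
```

Because the calendar is proleptic, the sketch contains no special case for the historical changeovers of 1582 and 1752. Those dates are validated like any others.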
date
This datatype shall accept strings that conform to the format specified for date inputs in Web Forms 2.0.
This datatype must not accept the empty string.
datetime
This datatype shall accept strings that conform to the format specified for datetime inputs in Web Forms 2.0.
This datatype must not accept the empty string.
datetime-local
This datatype shall accept strings that conform to the format specified for datetime-local inputs in Web Forms 2.0.
This datatype must not accept the empty string.
datetime-tz
This datatype shall accept strings that conform to the format specified for the datetime attribute of the ins and del elements in Web Applications 1.0.
This datatype must not accept the empty string.
ID
This datatype shall accept any string that consists of one or more characters and does not contain any whitespace characters.
The ID-type of this datatype is ID.
IDREF
This datatype shall accept any string that consists of one or more characters and does not contain any whitespace characters.
The ID-type of this datatype is IDREF.
IDREFS
This datatype shall accept any string that consists of one or more characters and contains at least one character that is not a whitespace character.
The ID-type of this datatype is IDREFS.
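The acceptance rules for ID, IDREF and IDREFS can be sketched as follows, using the four whitespace characters defined earlier in this specification. The function names are hypothetical. The sketch covers only the lexical check and does not implement the ID-type semantics (uniqueness of IDs and resolution of references), which belong to the RELAX NG DTD Compatibility layer.

```python
# The whitespace characters defined by this specification:
# U+0020 SPACE, U+0009 TAB, U+000D CARRIAGE RETURN, U+000A LINE FEED.
WHITESPACE = frozenset("\u0020\u0009\u000D\u000A")


def accepts_id(value: str) -> bool:
    """Hypothetical check: non-empty and free of whitespace characters."""
    return len(value) > 0 and not any(ch in WHITESPACE for ch in value)


# IDREF has the same lexical rule as ID.
accepts_idref = accepts_id


def accepts_idrefs(value: str) -> bool:
    """Hypothetical check: at least one non-whitespace character.

    Having such a character implies that the string is non-empty.
    """
    return any(ch not in WHITESPACE for ch in value)


print(accepts_id("main"))          # no whitespace
print(accepts_id("a b"))           # contains U+0020
print(accepts_idrefs(" a  b "))    # has non-whitespace characters
print(accepts_idrefs(" \t\r\n"))   # whitespace only
```

Note that characters such as U+00A0 NO-BREAK SPACE are not whitespace under this definition, so a sketch written this way accepts them inside an ID.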
language
This datatype shall accept strings that are conforming RFC 3066 language tags. When a subtag value is not reserved for private use, this datatype shall only accept values that were registered at the time the implementation of this datatype was developed.
Note that the allowed ALPHA letters are A–Z and a–z, so U+0130 and U+0131 must not be accepted as case-insensitive versions of i and I. Likewise, “oß” is not a conforming language tag for Ossetian.
This datatype must not accept the empty string.
Since registered language and country codes change over time, implementations should document when their internal snapshot of registered language and country codes was taken.
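The syntactic part of the language datatype can be sketched as follows. The sketch relies on the RFC 3066 grammar, in which the primary subtag is 1 to 8 ALPHA characters and each subsequent subtag is 1 to 8 ALPHA or DIGIT characters, separated by hyphens. The names `LANGUAGE_TAG` and `is_wellformed_language_tag` are hypothetical. The registry check required by this specification is not included, so a string passing this test is only a candidate for acceptance.

```python
import re

# ALPHA is restricted to A-Z and a-z. Both cases are listed explicitly
# instead of using a case-insensitive flag, so that no non-ASCII letter
# can be matched through Unicode case mapping.
LANGUAGE_TAG = re.compile(r"[A-Za-z]{1,8}(?:-[A-Za-z0-9]{1,8})*")


def is_wellformed_language_tag(value: str) -> bool:
    """Hypothetical syntax-only check; registered values are not verified."""
    return LANGUAGE_TAG.fullmatch(value) is not None


print(is_wellformed_language_tag("en-US"))     # ASCII letters only
print(is_wellformed_language_tag("o\u00df"))   # contains U+00DF
print(is_wellformed_language_tag("\u0130t"))   # contains U+0130
print(is_wellformed_language_tag(""))          # empty string
```

The explicit character classes are a deliberate choice. In Python, combining `[a-z]` with `re.IGNORECASE` on Unicode strings also matches a few non-ASCII letters, including U+0130 and U+0131, which this datatype must not accept.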
pattern
This datatype shall accept the strings that are allowed as the value of the Web Forms 2.0 pattern attribute.
week
This datatype shall accept strings that conform to the format specified for week inputs in Web Forms 2.0.
This datatype must not accept the empty string.