An Unofficial Q&A about the Discontinuation of the XHTML2 WG

So the W3C finally announced that the XHTML2 WG will be taken off life support at the end of 2009. I’m annoyed that Zeldman used the F-laden TLA “WTF” instead of “AFT” in the title of his post about the announcement. Moreover, many of the comments on Zeldman’s post indicate that there are people who are badly misinformed about the matters surrounding this announcement. To help remedy that, here’s some quick Q&A for getting informed.

First off, is there anything to disclose?
I’ve been working on HTML5—the main competition of XHTML2—for a long time. I get paid for it. I wrote this Q&A on my own time and it wasn’t vetted by anyone prior to publication. (And yes, this question is copied from Steve O’Grady.)
What was announced?
The W3C management has decided to allow the charter of the XHTML2 Working Group to expire at the end of 2009 and not to renew it.
What’s XHTML?
There are two meanings to XHTML: technical and marketing. The technical kind (XHTML served using the application/xhtml+xml MIME type) is a formulation of HTML as an XML vocabulary. The marketing kind (XHTML served using the text/html MIME type) is processed just like HTML by browsers but the authors attempt to observe slightly different syntax rules in order to make it seem that they are doing something newer and shinier compared to HTML. (Note added 2009-07-07: I apologize to authors who are using XHTML and got offended. The “in order to” part of the previous sentence was meant as a jibe at gurus who used XHTML as a marketing platform but gave pseudo-technical reasons e.g. about parsing and mobile clients—not at people who listened to them.)
What’s XHTML2?
XHTML2 was a new language similar to XHTML but incompatible with it. It wasn’t an XML formulation of any HTML spec.
Was XHTML2 being implemented?
Not in any of the top 5 browsers. There was an experimental implementation of an old draft years ago in a research browser.
What was the XHTML2 WG working on?
The XHTML2 WG was working on several things:
Did the W3C kill XHTML2?
No, XHTML2 was already dead for all practical purposes due to its failure to be backwards compatible and its failure to deliver compelling new features. The W3C just announced they will take it off life support.
What happens to XHTML 1.x specs?
If the XHTML2 WG completes its editorial revisions by the end of the year, those revisions may be published as new Editions of the previous Recommendations.
What happens to RDFa?
RDFa (in XHTML but not in HTML!) is a W3C Recommendation and, as such, doesn’t need any terminating action per the W3C Process. It’s unclear if another WG will develop further RDFa specs.
What happens to the role attribute?
Most likely an ARIA-only CURIEless incarnation of the role attribute will find its way into HTML 5 as the ARIA specs mature.
What happens to the other XHTML2 WG deliverables?
According to the FAQ published by the W3C, the XML Events spec will likely end up in the Forms Working Group (the group working on XForms), the Access module will likely end up in the HTML WG and the remaining deliverables will end up as Working Group Notes. Personally, I doubt that the Access module will be supported by consensus at the HTML WG.
What’s the HTML WG?
The HTML WG is another W3C working group that is working on HTML 5 together with the WHATWG.
What’s the WHATWG?
The WHATWG is a group that individuals from Apple, Mozilla and Opera founded outside the W3C in order to evolve HTML when the W3C told them that work wasn’t welcome within the W3C. Later, the W3C changed its mind, renamed the previous HTML WG to the XHTML2 WG and formed a new HTML WG.
If the remaining deliverables of the XHTML2 WG are going to be published as Notes, does it mean the W3C endorses them after all?
No. The W3C Process doesn’t allow a document to be simply abandoned once it has been published as a First Public Working Draft. The documents have to end up as either Recommendations or Notes. Groups that are still within charter can stall their abandoned deliverables indefinitely, but the upcoming expiration of the XHTML2 WG charter will force the adherence to the W3C Process on this point.
Is the W3C dropping work on XHTML?
No. The HTML WG is defining XHTML5 which is an XML serialization for HTML5.
I’ve published Web pages using XHTML 1.0 or XHTML 1.1. Do I need to rewrite them now?
No. They will continue to function as before.
What’s the upgrade path from XHTML 1.x?
For the technical kind of XHTML 1.x—that is, XHTML served as application/xhtml+xml—the upgrade path is to XHTML5. For the marketing kind of XHTML 1.x—that is, XHTML served as text/html—the upgrade path is to HTML5. Moreover, “HTML5” replaces “XHTML” (and “Ajax”!) as the coolest marketing buzzword.
What’s HTML5?
HTML5 is a new level of the Web’s most significant markup language. New features provide better support for Web applications, for video and audio and for expressing document structure. This language is defined in a specification called HTML 5. “HTML5” is also used as a marketing buzzword for all the new cool features in the browser platform—even for features that have never been in the HTML 5 spec or that have been spun off it.
Video? Wasn’t video removed from HTML5 recently?
No. That’s a bogus rumor. (What was removed was some placeholder text about codecs.)
Is HTML5 being implemented?
Yes. Firefox, Opera, Safari, Chrome and IE implement bits and pieces of HTML5—even more so in nightly builds than in releases. The future is already here. It just isn’t evenly distributed yet.
If I upgrade from XHTML-served-as-text/html to HTML5, do I need to revise all my empty tags?
No. HTML5 permits both the XHTML-style syntax (<br/>) and the HTML 4-style syntax (<br>) for void elements (elements that never take any content).
Is XHTML5 more semantic than HTML5?
No.
Can I serve XHTML5 as text/html?
You can’t. HTML5 and XHTML5 are defined in terms of MIME type, so text/html isn’t XHTML5 by definition.
Can a document be both HTML5 and XHTML5 if I serve it as text/html to IE and as application/xhtml+xml to other browsers?
It is possible to construct documents that are valid HTML5 when labeled as text/html and valid XHTML5 when labeled as application/xhtml+xml. Doing so is much harder than it first appears and is most often useless, so you’d probably spend your time better by not trying.
Can HTML5 be validated?
Yes. With an HTML5 validator.
Which one should I use: HTML5 or XHTML5?
In most cases, the answer is HTML5. XHTML5 doesn’t work in IE. (Just like technical XHTML has never worked in IE. Only the marketing kind of XHTML has worked in IE.)
What if I want to include SVG or MathML inline?
You will be able to use SVG and MathML inline in text/html once browsers upgrade their parsers. You can test this today by downloading a nightly build of Firefox, going to about:config and flipping the preference html5.enable to true. (Demo page.) However, for the time being, to use inline SVG or MathML with released versions of Firefox, Opera, Safari or Chrome you need to use application/xhtml+xml instead.
What’s the doctype for HTML5 documents?
Simply: <!DOCTYPE html>
What’s the doctype for XHTML5 documents?
application/xhtml+xml documents don’t need a doctype. XHTML5 can use any doctype (or none), because any other requirement would reach onto the XML layer and violate the clean layering of XHTML5 and XML. For simplicity, I suggest you use no doctype for XHTML5. (Yes, the XHTML 1.0 specification violates clean layering.)
If I can use any doctype for XHTML5, how can browsers tell XHTML 1.0 and XHTML5 apart?
They can’t and they don’t need to. By design, a user agent that implements XHTML5 will process inputs authored as XHTML 1.0 appropriately.
I’m using XML tools to consume content. What do I do with HTML5?
When your application receives content labeled as application/xhtml+xml, instantiate an XML parser. When your application receives content labeled as text/html, instantiate an HTML5 parser. There are now off-the-shelf HTML5 parsers (such as the Validator.nu HTML Parser) that expose an XML API so your application sees an infoset that looks just like the infoset from an XML parser parsing the equivalent XHTML5 document.
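The dispatch rule above can be sketched in a few lines of Python. This is only an illustration: the function name choose_parser and the labels "xml" and "html5" are made up for this example, and a real application would instantiate an actual XML parser or HTML5 parser at the point where the label is returned.

```python
def choose_parser(content_type: str) -> str:
    """Return which kind of parser to instantiate for a Content-Type value.

    The labels "xml" and "html5" are placeholders for this sketch.
    """
    # Drop parameters such as "; charset=utf-8" and normalize case,
    # because MIME type names are case-insensitive.
    mime = content_type.split(";", 1)[0].strip().lower()
    if mime == "application/xhtml+xml":
        return "xml"
    if mime == "text/html":
        return "html5"
    raise ValueError(f"unsupported content type: {content_type!r}")
```

The point of the sketch is that the decision depends only on the MIME type label, never on the doctype or on sniffing the content.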
I’m using XML tools to generate content. What should I do?
If you don’t care about IE, you can use an XML serializer and serialize to XHTML5 (application/xhtml+xml). However, if you do care about IE, you can use an HTML5 serializer and serialize from an XML pipeline to text/html. In this case, you must avoid constructs that aren’t supported in text/html (e.g. div as a child of p).
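As a rough illustration of the last point, here is a Python sketch that looks for one such construct (div as a child of p) in an XML tree before it is serialized to text/html. The helper name p_elements_with_div_children is hypothetical, and the check covers only this one example construct, not everything that text/html cannot express.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"


def p_elements_with_div_children(xhtml_source: str) -> int:
    """Count p elements that have a div as a direct child."""
    root = ET.fromstring(xhtml_source)
    count = 0
    # ElementTree spells namespaced names as "{namespace}localname".
    for p in root.iter(f"{{{XHTML_NS}}}p"):
        # find() with a bare tag name searches direct children only.
        if p.find(f"{{{XHTML_NS}}}div") is not None:
            count += 1
    return count
```

A nonzero result means the tree contains a structure that an HTML parser would not reproduce when reading the serialized text/html output, so the pipeline should restructure the content first.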
But XSLT and XPath don’t work with HTML!
Incorrect. As mentioned above, HTML5 parsers expose an infoset equivalent to an XML parser parsing XHTML5. The Validator.nu HTML Parser comes with a sample application for using the JDK XSLT engine with HTML5 inputs.
Do semantics round-trip in an HTML5 to XHTML5 to HTML5 conversion?
Yes, provided that the first HTML5 input is valid and you don’t ascribe semantics to characters that aren’t allowed in XML (such as form feed or U+FFFF). Note that RDFa isn’t valid in either HTML5 or XHTML5 as currently drafted.
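To make the character restriction concrete, here is a small Python sketch that reports characters falling outside the Char production of XML 1.0. The function names are made up for this example. The check covers only which characters XML permits, not the validity of the HTML5 input.

```python
def is_xml_char(ch: str) -> bool:
    """True if ch matches the Char production of XML 1.0."""
    cp = ord(ch)
    return (
        cp in (0x9, 0xA, 0xD)          # tab, line feed, carriage return
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD      # excludes U+FFFE and U+FFFF
        or 0x10000 <= cp <= 0x10FFFF
    )


def non_xml_chars(text: str) -> list:
    """List the code points in text that XML 1.0 does not allow."""
    return [f"U+{ord(ch):04X}" for ch in text if not is_xml_char(ch)]
```

For example, non_xml_chars("a\x0cb\uffff") reports the form feed and U+FFFF mentioned above as U+000C and U+FFFF.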
What about XHTML5 to HTML5 to XHTML5?
Not if namespace-based extensibility is used. However, in the common case, the conversion chain does round-trip provided that all of the following hold: the input is valid XHTML5 + SVG 1.1 + MathML 2.0 (this excludes RDFa); the input doesn’t use namespaces from outside those specs (it’s debatable whether the previous condition already covers this); xml:space on HTML elements is not considered to affect semantics; and relative URLs are rewritten so that xml:base attributes can be removed without breaking links. (Answer clarified/corrected 2009-07-07.)
What’s the namespace for HTML5?
HTML elements are in the http://www.w3.org/1999/xhtml namespace. You don’t need to declare this namespace in text/html. An HTML5 parser puts HTML elements in the namespace automatically.
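For comparison, this is what the namespace looks like from the XML side, sketched with Python’s standard ElementTree parser. The sample document is made up. In application/xhtml+xml the namespace has to be declared with xmlns, whereas an HTML5 parser would assign the same namespace to the elements without any declaration.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"

# A minimal made-up XHTML document that declares the namespace explicitly.
doc = f'<html xmlns="{XHTML_NS}"><body><p>Hello</p></body></html>'

root = ET.fromstring(doc)

# ElementTree reports each element name as "{namespace}localname".
tags = [el.tag for el in root.iter()]
print(tags)
```

Every name in the printed list carries the http://www.w3.org/1999/xhtml namespace, which is the same namespace that applications see from an HTML5 parser that exposes an XML API.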
So does HTML5 support namespaces?
There’s no syntax for declaring namespaces in text/html. Syntax that looks like a namespace declaration has no effect. However, the HTML5 parsing algorithm automatically assigns stuff to namespaces appropriately.
Does XHTML5 support namespaces?
Yes. XHTML5 is layered on top of XML plus Namespaces.
Are the semantics of HTML5 extensible?
Yes. With microdata.
Is it true that HTML5 has fewer accessibility features than XHTML 1.x?
No. HTML5 has more accessibility features, but it isn’t obvious that they are accessibility features: by design, they aren’t intended solely for accessibility but provide opportunities for enhanced accessibility as a side effect of something else.
Will Zeldman now just do s/XHTML/HTML5/ in all his books and republish?
No. He says that conjecture is “Wrong as prohibition”.