Henri Sivonen
<!DOCTYPE html> <html> <head> <title>Hello World!</title> </head> <body> <h1>Hello World!</h1> <p>Foo</p> </body> </html>
<html xmlns="http://www.w3.org/1999/xhtml"> <head> <title>Hello World!</title> </head> <body> <h1>Hello World!</h1> <p>Foo</p> </body> </html>
Vocabulary | HTML | |
---|---|---|
Serialization | HTML | XHTML |
Media Type | text/html | a…n/xhtml+xml |
Parser | HTML | XML |
Tree API | DOM |
“HTML 4 is an SGML application conforming to International Standard ISO 8879 -- Standard Generalized Markup Language SGML (defined in [ISO8879]).”
“SGML systems conforming to [ISO8879] are expected to recognize a number of features that aren’t widely supported by HTML user agents. We recommend that authors avoid using all of these features.”
Source: http://www.w3.org/TR/html401/
10.2.4.10 Tag name state
Consume the next input character:
- U+0009 CHARACTER TABULATION
- U+000A LINE FEED (LF)
- U+000C FORM FEED (FF)
- U+0020 SPACE
- Switch to the before attribute name state.
- U+002F SOLIDUS (/)
- Switch to the self-closing start tag state.
- U+003E GREATER-THAN SIGN (>)
- Switch to the data state. Emit the current tag token.
- U+0041 LATIN CAPITAL LETTER A through to U+005A LATIN CAPITAL LETTER Z
- Append the lowercase version of the current input character (add 0x0020 to the character's code point) to the current tag token's tag name.
- U+0000 NULL
- Parse error. Append a U+FFFD REPLACEMENT CHARACTER character to the current tag token's tag name.
- EOF
- Parse error. Reconsume the EOF character in the data state.
- Anything else
- Append the current input character to the current tag token's tag name.
Source: http://www.whatwg.org/specs/web-apps/current-work/
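The tag name state quoted above can be sketched as a small loop. This is only an illustration under stated assumptions: the function name `runTagNameState`, the shape of the returned object, and the error labels are hypothetical, and the caller is assumed to have already consumed the `<` and confirmed that a tag name follows.

```javascript
// Sketch of the "tag name state": consumes characters from `input`
// starting at `pos` until the state is left.
// Hypothetical helper; names and return shape are not from the spec.
function runTagNameState(input, pos) {
  const token = { tagName: "" };
  const errors = [];
  const leave = (nextState, emitted) => ({ token, nextState, emitted, pos, errors });

  while (true) {
    if (pos >= input.length) {
      // EOF: parse error; the EOF is reconsumed in the data state.
      errors.push("eof-in-tag-name"); // hypothetical label
      return leave("data", false);
    }
    const c = input[pos++];
    if (c === "\t" || c === "\n" || c === "\f" || c === " ") {
      return leave("before attribute name", false);
    } else if (c === "/") {
      return leave("self-closing start tag", false);
    } else if (c === ">") {
      // Emit the current tag token and go back to the data state.
      return leave("data", true);
    } else if (c >= "A" && c <= "Z") {
      // Lowercase by adding 0x0020 to the code point.
      token.tagName += String.fromCharCode(c.charCodeAt(0) + 0x20);
    } else if (c === "\u0000") {
      errors.push("null-in-tag-name"); // hypothetical label
      token.tagName += "\uFFFD";
    } else {
      // Anything else, including "<", is appended as-is.
      token.tagName += c;
    }
  }
}

console.log(runTagNameState("DIV>", 0).token.tagName);     // "div"
console.log(runTagNameState("foo<bar>", 0).token.tagName); // "foo<bar"
```

Because `<` falls under "Anything else", it simply becomes part of the tag name, which is why input such as `<foo<bar>` produces a single element with an odd name rather than two tags.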
<meta>
on the byte level
Content-Type: text/html; charset=utf-8
<meta charset=utf-8>
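As a rough illustration of the HTTP-level declaration shown above, the following sketch pulls the `charset` parameter out of a `Content-Type` header value. The function name `charsetFromContentType` is hypothetical, and the regular expression is a loose approximation rather than the algorithm the HTML specification defines.

```javascript
// Sketch: extract the charset parameter from a Content-Type header value.
// Assumption: a simple `; charset=value` or `; charset="value"` form.
function charsetFromContentType(headerValue) {
  const match = /;\s*charset\s*=\s*("?)([^";\s]+)\1/i.exec(headerValue);
  return match ? match[2].toLowerCase() : null;
}

console.log(charsetFromContentType("text/html; charset=utf-8")); // "utf-8"
console.log(charsetFromContentType("text/html"));                // null
```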
<?import >
foo=`bar`
<b><i></b></i>
11.2.5.4.4 The "in head" insertion mode
When the user agent is to apply the rules for the "in head" insertion mode, the user agent must handle the token as follows:
- A character token that is one of U+0009 CHARACTER TABULATION, U+000A LINE FEED (LF), U+000C FORM FEED (FF), U+000D CARRIAGE RETURN (CR), or U+0020 SPACE
Insert the character into the current node.
- A comment token
Append a Comment node to the current node with the data attribute set to the data given in the comment token.
- A DOCTYPE token
Parse error. Ignore the token.
- A start tag whose tag name is "html"
Process the token using the rules for the "in body" insertion mode.
- A start tag whose tag name is one of: "base", "basefont", "bgsound", "command", "link"
Insert an HTML element for the token. Immediately pop the current node off the stack of open elements.
Acknowledge the token's self-closing flag, if it is set.
- …
- …
- Anything else
Act as if an end tag token with the tag name "head" had been seen, and reprocess the current token.
Source: http://www.whatwg.org/specs/web-apps/current-work/
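The quoted "in head" rules can be read as a dispatcher over token types. The sketch below covers only the cases quoted above; the rules elided with "…" are deliberately not implemented, so tokens they would handle fall through to "Anything else" here. The token and parser object shapes are hypothetical, and the handling of the implied `</head>` (pop the head element, switch to "after head") is an assumption based on the spec's end-tag rule, which is not part of the excerpt.

```javascript
// Sketch of the "in head" insertion mode for the quoted cases only.
// Token/parser shapes are hypothetical, chosen for illustration.
const HEAD_VOID_TAGS = new Set(["base", "basefont", "bgsound", "command", "link"]);
const WHITESPACE = new Set(["\t", "\n", "\f", "\r", " "]);

function inHead(token, parser) {
  const current = parser.openElements[parser.openElements.length - 1];

  if (token.type === "character" && WHITESPACE.has(token.data)) {
    current.children.push({ kind: "text", data: token.data });
    return "done";
  }
  if (token.type === "comment") {
    current.children.push({ kind: "comment", data: token.data });
    return "done";
  }
  if (token.type === "doctype") {
    parser.errors.push("doctype in head"); // parse error; token ignored
    return "done";
  }
  if (token.type === "startTag" && token.name === "html") {
    return "use-in-body-rules";
  }
  if (token.type === "startTag" && HEAD_VOID_TAGS.has(token.name)) {
    const element = { kind: "element", name: token.name, children: [] };
    current.children.push(element);
    parser.openElements.push(element);
    parser.openElements.pop(); // immediately popped again
    if (token.selfClosing) token.selfClosingAcknowledged = true;
    return "done";
  }
  // (Elided rules from the excerpt are not covered by this sketch.)
  // Anything else: act as if </head> had been seen, then reprocess.
  parser.openElements.pop(); // assumed: pops the head element
  parser.mode = "after head"; // assumed: mode after an implied </head>
  return "reprocess";
}
```

Returning `"reprocess"` instead of recursing keeps the sketch small: a driving loop would feed the same token to whatever handler matches the new `parser.mode`.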
Beware of WebKit monoculture on mobile
<foo<bar>
<!--
…EOF
<title>
…EOF
<script src='foo.js' />
…EOF
Python Ruby Java JavaScript
“Now you have two problems”
script and style content
Here be product-specific stuff
<script src=foo.js></script> <img src=photo.jpg> <script src=bar.js></script>
document.write()
↰
document.write("<script src=foo.js></script>");
document.write("<script src=bar.js></script>");
document.write()
parsing
document.write()
Tail Prescan
document.write("<script src=a.js></script>" + "<script src=b.js></script>");
document.write("<script src=c.js></script>");
The solution for quadratic equations is x = (−b ± √(b² − 4ac)) / (2a).
Warning: Remember that ± means that there are two solutions!
<p>The solution for quadratic equations is <math> <!-- ... --> <mfrac> <mrow> <mo>−</mo> <!-- ... --> </math>.</p> <p><svg viewBox='5 9 90 86'> <path d='M 10,90 L 90,90 L 50,14 Z'/> <line x1=50 x2=50 y1=45 y2=75 /> </svg><b>Warning:</b> Remember that ± means that there are two solutions!</p>
Just Works
(ignoring degradation in old browsers)
- <svg>…</svg> becomes SVG
- <math>…</math> becomes MathML
- foreignObject, annotation-xml, etc.
- xmlns has absolutely no effect
- xlink:href
- <SVG VIEWBOX='0 0 10 10'> works
- svg: or math: not supported
- <foo/> is empty
- <![CDATA[…]]> works
- script and style tokenized as in XML
<circle fill=red/>
<circle fill=green />
<circle fill="green"/>
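The three `fill` spellings above tokenize differently because, in the HTML tokenizer, an unquoted attribute value ends only at whitespace or `>`, so a `/` written directly after the value becomes part of it. A minimal sketch, assuming a hypothetical helper `readUnquotedValue` and ignoring character references and the parse errors the spec defines for stray quotes and similar characters:

```javascript
// Sketch: read an unquoted attribute value starting at `pos`.
// Stops at whitespace or ">"; "/" is treated as an ordinary character.
function readUnquotedValue(input, pos) {
  let value = "";
  while (pos < input.length) {
    const c = input[pos];
    if (c === "\t" || c === "\n" || c === "\f" || c === " " || c === ">") break;
    value += c;
    pos++;
  }
  return { value, pos };
}

console.log(readUnquotedValue("red/>", 0).value);    // "red/"
console.log(readUnquotedValue("green />", 0).value); // "green"
```

With the space before the slash, or with quotes around the value, the `/` is no longer swallowed into the attribute value.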
<foo/>
in legacy browsers
<foo/>
<foo></foo>
instead
<text>Text to show</text>
<text><![CDATA[Text to hide]]></text>