Speculative HTML5 Parsing Landed

As mentioned earlier, there is an ongoing project to replace Gecko’s old HTML parser with an HTML5 parser. Today, a significant milestone landed: off-the-main-thread speculative HTML5 parsing.

This means that the HTML source arriving from the network is no longer parsed on the main thread. (Browsers have traditionally been single-threaded.) Also, while the main thread is waiting for a script to load or execute, the rest of the HTML file is parsed ahead speculatively. This doesn’t merely mean scanning the rest of the file for URLs. It means running the full HTML5 tokenization and tree building algorithms speculatively.

Bad use of document.write can cause speculation to fail and parsing work to be wasted. There is preliminary documentation for avoiding speculation failures.
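To make speculation and its failure mode more concrete, here is a toy model in Python. It is a minimal sketch, not Gecko’s implementation: the names ParserState, apply_tokens and parse_with_speculation are hypothetical, the “tree builder” only tracks a stack of open elements, and the work runs sequentially rather than on a separate thread. It also assumes that speculative results are kept only when the parser state after the script’s document.write output matches the state the speculation started from. Under that assumption, writing a balanced fragment keeps the speculation intact, while writing an unbalanced start tag forces the speculative work to be thrown away and redone.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ParserState:
    # Hypothetical stand-in for tokenizer and tree builder state:
    # only the stack of open element names is modeled.
    open_elements: tuple = ()


def apply_tokens(state, tokens):
    """Toy tree builder: a start tag pushes an element, a matching end tag
    pops it, and stray end tags are ignored. Returns the new state and the
    tree operations emitted."""
    stack = list(state.open_elements)
    ops = []
    for tok in tokens:
        if tok.startswith("</"):
            name = tok[2:-1]
            if stack and stack[-1] == name:
                stack.pop()
                ops.append(("close", name))
        else:
            name = tok[1:-1]
            stack.append(name)
            ops.append(("open", name))
    return ParserState(tuple(stack)), ops


def parse_with_speculation(before, written, after):
    """before:  tokens parsed before the parser blocks on a script
    written: tokens the script inserts via document.write
    after:   the rest of the network stream, parsed speculatively
    Returns (tree_ops, speculation_succeeded)."""
    state, ops = apply_tokens(ParserState(), before)
    snapshot = state  # the state the speculation starts from

    # While the script loads or executes, parse ahead from the snapshot.
    # (Sequential in this sketch; the real parser does this off the main thread.)
    _, speculative_ops = apply_tokens(snapshot, after)

    # The script runs; its document.write output is parsed on the main thread.
    state, written_ops = apply_tokens(state, written)
    ops += written_ops

    if state == snapshot:
        # Assumed rule: the parser is back where speculation began,
        # so the speculative results can be kept.
        return ops + speculative_ops, True

    # Speculation failed: discard its work and re-parse from the real state.
    _, reparsed_ops = apply_tokens(state, after)
    return ops + reparsed_ops, False


_, ok = parse_with_speculation(["<html>", "<body>"], ["<b>", "</b>"], ["<p>", "</p>"])
print(ok)  # True: the written <b> was closed again
_, ok = parse_with_speculation(["<html>", "<body>"], ["<b>"], ["<p>", "</p>"])
print(ok)  # False: the unbalanced <b> changed the parser state
```

The snapshot comparison is what lets balanced document.write output leave the speculation usable: the parser resumes exactly where the speculative parse assumed it would.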

The HTML5 parser remains off by default, so this landing shouldn’t disrupt your browsing with nightlies if you haven’t opted in to HTML5 parsing.

How Can I Try It?

First, this isn’t release-quality software. Testing the HTML5 parser carries all the same risks as testing a nightly build in general, and then some. It may crash, it may corrupt your Firefox profile, etc. If you aren’t comfortable with the risks of running nightly builds, you shouldn’t try the HTML5 parser.

If you are still comfortable with testing, download a trunk nightly build tomorrow, run it, navigate to about:config and set the preference named html5.enable to true. This makes Gecko use the HTML5 parser when loading pages into the content area and when setting innerHTML. The HTML5 parser is not yet used for HTML embedded in feeds, Netscape bookmark import, View Source, etc.

Changing the html5.enable preference doesn’t require a restart; it takes effect the next time you load a page.

There is also another preference called html5.offmainthread, which defaults to true. If you suspect a bug in how the threads cooperate, you can try setting this preference to false to make all parts of the HTML5 parser run on the main thread.

Known Problems

First and foremost, please refer to the list of known bugs. In particular, please be aware that there’s a known crash whose fix hasn’t landed yet: if document.write writes an external script followed by an unbalanced start tag, and the script calling document.write ends without writing the corresponding end tag, the browser crashes.

Note that the speculative parsing landing does not fix the Web compatibility bugs that have already been reported. The landing only changes the way the parser integrates into Gecko.

What’s the Performance Impact?

Talos, Mozilla’s performance testing framework, does not run to completion with the HTML5 parser enabled, so the impact is not yet known. Perceived performance is known to be bad at the moment, and it will get better.

Reporting Bugs

Please file bugs in the “Core” product under the “HTML: Parser” component, with “[HTML5] ” at the start of the summary.