Always Use UTF-8 & Always Label Your HTML Saying So

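To avoid having to deal with escapes (other than for <, >, &, and "), to avoid data loss in form submission, to avoid XSS when serving user-provided content, and to comply with the HTML Standard, always encode your HTML as UTF-8. Furthermore, in order to let browsers know that the document is UTF-8-encoded, always label it as such. To label your document, you need to do at least one of the following: put <meta charset="utf-8"> within the first 1024 bytes of the document, serve the document with the HTTP header Content-Type: text/html; charset=utf-8, or start the document with the UTF-8 BOM (the bytes 0xEF 0xBB 0xBF).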

Doing more than one of these is OK.

Answers to Questions

The above says the important bit. Here are answers to further questions:

Why Do I Need to Label UTF-8 in HTML?

Because HTML didn’t support UTF-8 in the very beginning and legacy content can’t be expected to opt out, you need to opt into UTF-8 just like you need to opt into the standards mode (via <!DOCTYPE html>) and into mobile-friendly layout (via <meta name="viewport" content="width=device-width, initial-scale=1">).

Which Method Should I Choose?

<meta charset="utf-8"> has the benefit of keeping the label within your document even if you move it around. The main risk is that someone forgets that it needs to be within the first 1024 bytes and puts comments, Facebook metadata, rel=preloads, stylesheets or scripts before it. Always put that other stuff after it.
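One way to guard against that risk is an automated check in a build or test step. The sketch below is a deliberately simplified approximation, not the HTML Standard's prescan algorithm: it only recognizes the plain <meta charset="utf-8"> form and checks that the tag ends within the first 1024 bytes. The name meta_charset_within_limit is hypothetical.

```python
import re

PRESCAN_LIMIT = 1024  # the byte limit mentioned above

# Matches <meta charset="utf-8"> with optional quotes, in any letter case.
# Assumption: only this simple attribute form is recognized.
_META_UTF8 = re.compile(
    rb"<meta\s+charset\s*=\s*[\"']?utf-8[\"']?\s*/?>", re.IGNORECASE
)


def meta_charset_within_limit(document: bytes) -> bool:
    """Return True if a UTF-8 meta charset tag ends within the first 1024 bytes."""
    match = _META_UTF8.search(document)
    return match is not None and match.end() <= PRESCAN_LIMIT
```

A document that places a long comment or a block of other metadata ahead of the tag would fail this check, which is the situation the paragraph above warns about.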

The HTTP header has the benefit that if you are setting up a new server that doesn’t have any old non-UTF-8 documents on it, you can configure the header once, and it works for all HTML documents on the server thereafter.
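As an illustration of setting the header in one place, here is a minimal sketch using Python's standard http.server module. The handler name Utf8HtmlHandler and the response body are made up for the example; a production server such as Apache or nginx would be configured through its own settings instead.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# The header value that labels an HTML response as UTF-8.
HTML_CONTENT_TYPE = "text/html; charset=utf-8"


class Utf8HtmlHandler(BaseHTTPRequestHandler):
    """Hypothetical handler that labels every HTML response as UTF-8."""

    def do_GET(self) -> None:
        body = "<!DOCTYPE html><title>Hello</title><p>Grüße</p>".encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", HTML_CONTENT_TYPE)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


if __name__ == "__main__":
    # Assumption: port 8000 on localhost is free for this demonstration.
    HTTPServer(("127.0.0.1", 8000), Utf8HtmlHandler).serve_forever()
```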

The BOM method has the problem that it’s too easy to edit the file in a text editor that removes the BOM and not notice that this has happened. However, if you are writing a serializer library and you neither control the HTTP header nor can inject a tag without interfering with what your users are doing, you can make the serializer always start with the UTF-8 BOM and know that things will be OK.
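For the serializer case, the idea can be sketched in a few lines. The function name serialize_with_bom is hypothetical, and stripping a leading U+FEFF from the input is an added assumption so that the output never carries two BOMs.

```python
import codecs


def serialize_with_bom(markup: str) -> bytes:
    """Encode markup as UTF-8, prefixed with exactly one UTF-8 BOM."""
    # Drop a leading U+FEFF so the output does not start with two BOMs.
    if markup.startswith("\ufeff"):
        markup = markup[1:]
    # codecs.BOM_UTF8 is the three bytes 0xEF 0xBB 0xBF.
    return codecs.BOM_UTF8 + markup.encode("utf-8")
```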

Can I Use UTF-16 Instead?

Don’t. If you serve user-provided content as UTF-16, it is possible to smuggle content that becomes executable when interpreted as other encodings. This is a cross-site scripting vulnerability if the user uses a browser that allows the user to manually override UTF-16 with another encoding.
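To make the smuggling risk concrete, the following sketch shows text that contains no markup characters as UTF-16LE but whose bytes spell out a script tag when decoded differently. The choice of windows-1252 as the override encoding is an assumption for illustration; any ASCII-compatible encoding gives the same result for these bytes.

```python
# The bytes an attacker ultimately wants the browser to see.
payload = b"<script>"

# The same eight bytes are four valid UTF-16LE code units,
# none of which is "<" or ">".
harmless_looking = payload.decode("utf-16-le")

# A filter that inspects the decoded text finds no markup characters.
looks_safe = "<" not in harmless_looking and ">" not in harmless_looking

# Served as UTF-16LE, the text turns back into the original bytes...
served_bytes = harmless_looking.encode("utf-16-le")

# ...and an ASCII-compatible decoder reads those bytes as markup.
reinterpreted = served_bytes.decode("windows-1252")
```

Here looks_safe is True while reinterpreted contains an opening script tag, which is why user-provided content should not be served as UTF-16.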

UTF-16 cannot be labeled via <meta charset>.

What about Plain Text?

The <meta charset="utf-8"> method is not available for plain text, but the other two are. In the case of plain text, the HTTP header is obviously Content-Type: text/plain; charset=utf-8 instead.

Why Does Unlabeled UTF-8 Plain Text or HTML Work in XHR/Fetch?

XMLHttpRequest and Fetch post-date UTF-8, so they had a chance to introduce new rules. While the rules being inconsistent with navigating to HTML or plain text is not great, defaulting to UTF-8 is a simple rule that avoids issues related to reloading content in ways that would be consistent with navigation.

What about JavaScript?

If you’ve labeled your HTML as UTF-8, you don’t need to label your UTF-8-encoded JavaScript files, since by default they inherit the encoding from the document that includes them. However, to make your JavaScript robust when referenced from non-UTF-8 HTML, you can use the UTF-8 BOM or the HTTP header, which is Content-Type: application/javascript; charset=utf-8 in the JavaScript case.

What about CSS?

If you’ve labeled your HTML as UTF-8, you don’t need to label your UTF-8-encoded CSS files, since by default they inherit the encoding from the document that includes them. However, to make your CSS robust when referenced from non-UTF-8 HTML, you can use the UTF-8 BOM or the HTTP header, which is Content-Type: text/css; charset=utf-8 in the CSS case, or you can put @charset "utf-8"; as the very first thing in the CSS file.
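If stylesheets are generated by a build step, the @charset option can be applied automatically. This is a minimal sketch with a hypothetical helper name, with_charset_rule; it assumes the input does not already begin with an @charset rule for a different encoding, which it makes no attempt to detect.

```python
CHARSET_RULE = '@charset "utf-8";'


def with_charset_rule(css: str) -> bytes:
    """Return UTF-8 bytes for css that begin with the @charset rule."""
    # The rule only counts when it is the very first thing in the file,
    # so it is prepended rather than inserted anywhere else.
    if not css.startswith(CHARSET_RULE):
        css = CHARSET_RULE + "\n" + css
    return css.encode("utf-8")
```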

What about XML (Including SVG)?

Unlabeled XML defaults to UTF-8, so you don’t need to label it.

What about JSON?

JSON must be UTF-8 and is processed as UTF-8, so there’s no labeling.
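On the producing side, this simply means encoding the serialized text as UTF-8 and decoding it as UTF-8 on the way back. A minimal sketch using Python's standard json module follows; the helper names are hypothetical.

```python
import json
from typing import Any


def to_json_bytes(value: Any) -> bytes:
    """Serialize value to JSON and encode it as UTF-8."""
    # ensure_ascii=False keeps non-ASCII characters as literal UTF-8
    # instead of \uXXXX escapes; both forms are valid JSON.
    return json.dumps(value, ensure_ascii=False).encode("utf-8")


def from_json_bytes(data: bytes) -> Any:
    """Decode data explicitly as UTF-8 and parse it as JSON."""
    return json.loads(data.decode("utf-8"))
```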

What about WebVTT?

WebVTT is always UTF-8, so there’s no labeling.