fi.iki.hsivonen.xml
Class HtmlSerializer

java.lang.Object
  extended by fi.iki.hsivonen.xml.HtmlSerializer
All Implemented Interfaces:
ContentHandler

public class HtmlSerializer
extends Object
implements ContentHandler

Serializes a sequence of SAX events representing an XHTML 1.0 Strict document to an OutputStream as a UTF-8-encoded HTML 4.01 Strict document. The SAX events must represent a valid XHTML 1.0 document, with two exceptions: namespace prefixes are not significant, and there may be startElement and endElement calls for elements from other namespaces. The startElement and endElement calls for non-XHTML elements are ignored. No validity checking is performed, so the emitter of the SAX events is responsible for making sure the events represent a document that meets the above requirements. The OutputStream is closed when the end of the document is seen.
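
The call order described above can be sketched with nothing but the JDK SAX classes. In this sketch a recording DefaultHandler stands in for HtmlSerializer so that the example runs on its own; the class name EventOrderSketch, the helper names, and the urn:example:other namespace are hypothetical. Passing an HtmlSerializer to emit(...) instead is the intended use, but that is not exercised here.

```java
import java.util.ArrayList;
import java.util.List;
import org.xml.sax.Attributes;
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.DefaultHandler;

final class EventOrderSketch {
    static final String XHTML_NS = "http://www.w3.org/1999/xhtml";

    // Sends a string as a single characters() event.
    static void text(ContentHandler h, String s) throws SAXException {
        char[] ch = s.toCharArray();
        h.characters(ch, 0, ch.length);
    }

    // Emits the events of a tiny XHTML document to any ContentHandler.
    static void emit(ContentHandler h) throws SAXException {
        Attributes none = new AttributesImpl();
        h.startDocument();                                   // must be called first
        h.startElement(XHTML_NS, "html", "html", none);
        h.startElement(XHTML_NS, "head", "head", none);
        h.startElement(XHTML_NS, "title", "title", none);
        text(h, "Demo");
        h.endElement(XHTML_NS, "title", "title");
        h.endElement(XHTML_NS, "head", "head");
        h.startElement(XHTML_NS, "body", "body", none);
        h.startElement(XHTML_NS, "p", "p", none);
        text(h, "Hello");
        // An element from another namespace; HtmlSerializer is documented to ignore it.
        h.startElement("urn:example:other", "note", "x:note", none);
        h.endElement("urn:example:other", "note", "x:note");
        h.endElement(XHTML_NS, "p", "p");
        h.endElement(XHTML_NS, "body", "body");
        h.endElement(XHTML_NS, "html", "html");
        h.endDocument();                                     // must be called last
    }

    // Runs emit() against a stand-in handler that only records event names.
    static List<String> record() {
        final List<String> log = new ArrayList<>();
        try {
            emit(new DefaultHandler() {
                @Override public void startDocument() { log.add("startDocument"); }
                @Override public void endDocument() { log.add("endDocument"); }
                @Override public void startElement(String ns, String local, String qName, Attributes a) {
                    log.add("start:" + local);
                }
                @Override public void endElement(String ns, String local, String qName) {
                    log.add("end:" + local);
                }
            });
        } catch (SAXException e) {
            throw new IllegalStateException(e);
        }
        return log;
    }

    public static void main(String[] args) {
        System.out.println(record());
    }
}
```

The emitter, not the serializer, is responsible for producing this sequence correctly, since no validity checking is performed.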

Version:
$Id: HtmlSerializer.java,v 1.18 2006/10/30 20:03:10 hsivonen Exp $
Author:
hsivonen, taavi

Field Summary
private static String[] booleanAttributes
          Minimized "boolean" HTML attributes
private  int doctype
           
static int DOCTYPE_HTML401_STRICT
           
static int DOCTYPE_HTML401_TRANSITIONAL
           
static int DOCTYPE_HTML5
           
private  boolean emitMeta
           
private static String[] emptyElements
          HTML 4.01 elements which don't have an end tag
private  String encoding
           
static int NO_DOCTYPE
           
protected  Writer writer
          The writer used for output
private static String XHTML_NS
          The XHTML namespace URI
 
Constructor Summary
HtmlSerializer(OutputStream out)
          Creates a new HtmlSerializer in HTML 4.01 doctype mode, with UTF-8 encoding and no charset meta element.
HtmlSerializer(OutputStream out, int doctype, boolean emitMeta)
           
HtmlSerializer(OutputStream out, int doctype, boolean emitMeta, String enc)
           
 
Method Summary
 void characters(char[] ch, int start, int length)
          Writes out characters.
 void endDocument()
          Must be called last, after all other events.
 void endElement(String namespaceURI, String localName, String qName)
          Writes an end tag if the element is an XHTML element and is not an empty element in HTML 4.01 Strict.
 void endPrefixMapping(String str)
          Does nothing.
 void ignorableWhitespace(char[] values, int param, int param2)
          Does nothing.
static void main(String[] args)
          Used for testing.
 void processingInstruction(String str, String str1)
          Does nothing.
 void setDocumentLocator(Locator locator)
          Does nothing.
 void skippedEntity(String str)
          Does nothing.
 void startDocument()
          Must be called first.
 void startElement(String namespaceURI, String localName, String qName, Attributes atts)
          Writes a start tag if the element is an XHTML element.
 void startPrefixMapping(String str, String str1)
          Does nothing.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

NO_DOCTYPE

public static final int NO_DOCTYPE
See Also:
Constant Field Values

DOCTYPE_HTML401_TRANSITIONAL

public static final int DOCTYPE_HTML401_TRANSITIONAL
See Also:
Constant Field Values

DOCTYPE_HTML401_STRICT

public static final int DOCTYPE_HTML401_STRICT
See Also:
Constant Field Values

DOCTYPE_HTML5

public static final int DOCTYPE_HTML5
See Also:
Constant Field Values

XHTML_NS

private static final String XHTML_NS
The XHTML namespace URI

See Also:
Constant Field Values

emptyElements

private static final String[] emptyElements
HTML 4.01 elements which don't have an end tag


booleanAttributes

private static final String[] booleanAttributes
Minimized "boolean" HTML attributes


writer

protected Writer writer
The writer used for output


doctype

private int doctype

encoding

private String encoding

emitMeta

private boolean emitMeta

Constructor Detail

HtmlSerializer

public HtmlSerializer(OutputStream out)
Creates a new HtmlSerializer in HTML 4.01 doctype mode, with UTF-8 encoding and no charset meta element.

Parameters:
out - the stream to which the output is written
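
To illustrate what a doctype mode and a charset meta could correspond to in the output, here is a minimal sketch. The doctype declarations are the standard ones for each HTML flavor. Everything else is an assumption: the integer values of the constants (the real ones are listed under Constant Field Values), the method names, and the exact form of the meta element. It does not show what HtmlSerializer itself writes.

```java
final class DoctypeSketch {
    // Hypothetical values; the real constants are defined by HtmlSerializer.
    static final int NO_DOCTYPE = 0;
    static final int DOCTYPE_HTML401_TRANSITIONAL = 1;
    static final int DOCTYPE_HTML401_STRICT = 2;
    static final int DOCTYPE_HTML5 = 3;

    // Maps a doctype mode to the standard doctype declaration for that flavor.
    static String doctypeDeclaration(int doctype) {
        switch (doctype) {
            case NO_DOCTYPE:
                return "";
            case DOCTYPE_HTML401_TRANSITIONAL:
                return "<!DOCTYPE HTML PUBLIC \"-//W3C//DTD HTML 4.01 Transitional//EN\" "
                        + "\"http://www.w3.org/TR/html4/loose.dtd\">";
            case DOCTYPE_HTML401_STRICT:
                return "<!DOCTYPE HTML PUBLIC \"-//W3C//DTD HTML 4.01//EN\" "
                        + "\"http://www.w3.org/TR/html4/strict.dtd\">";
            case DOCTYPE_HTML5:
                return "<!DOCTYPE html>";
            default:
                throw new IllegalArgumentException("Unknown doctype mode: " + doctype);
        }
    }

    // Assumed form of a charset meta element; empty when emitMeta is false.
    static String charsetMeta(boolean emitMeta, String enc) {
        if (!emitMeta) {
            return "";
        }
        return "<meta http-equiv=\"Content-Type\" content=\"text/html; charset=" + enc + "\">";
    }

    public static void main(String[] args) {
        System.out.println(doctypeDeclaration(DOCTYPE_HTML401_STRICT));
        System.out.println(charsetMeta(true, "UTF-8"));
    }
}
```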

HtmlSerializer

public HtmlSerializer(OutputStream out,
                      int doctype,
                      boolean emitMeta)

HtmlSerializer

public HtmlSerializer(OutputStream out,
                      int doctype,
                      boolean emitMeta,
                      String enc)

Method Detail

characters

public void characters(char[] ch,
                       int start,
                       int length)
                throws SAXException
Writes out characters.

Specified by:
characters in interface ContentHandler
Parameters:
ch - the source array
start - the index of the first character to be written
length - the number of characters to write
Throws:
SAXException - if there are IO problems
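
The start and length parameters select a slice of the source array, so only ch[start] through ch[start + length - 1] are written. The sketch below shows that slicing together with the escaping an HTML serializer typically applies to text content. The escaping rules and the name escapeText are assumptions for illustration; this page does not state how HtmlSerializer escapes characters.

```java
final class CharactersSketch {
    // Escapes the slice ch[start .. start + length - 1] for use as HTML text content.
    static String escapeText(char[] ch, int start, int length) {
        StringBuilder sb = new StringBuilder(length);
        for (int i = start; i < start + length; i++) {
            switch (ch[i]) {
                case '&': sb.append("&amp;"); break;
                case '<': sb.append("&lt;"); break;
                case '>': sb.append("&gt;"); break;
                default:  sb.append(ch[i]);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // SAX parsers often pass a large shared buffer; only the slice is relevant.
        char[] buffer = "[[a<b&c]]".toCharArray();
        System.out.println(escapeText(buffer, 2, 5));
    }
}
```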

endDocument

public void endDocument()
                 throws SAXException
Must be called last, after all other events.

Specified by:
endDocument in interface ContentHandler
Throws:
SAXException - if there are IO problems

endElement

public void endElement(String namespaceURI,
                       String localName,
                       String qName)
                throws SAXException
Writes an end tag if the element is an XHTML element and is not an empty element in HTML 4.01 Strict.

Specified by:
endElement in interface ContentHandler
Parameters:
namespaceURI - the XML namespace
localName - the element name in the namespace
qName - ignored
Throws:
SAXException - if there are IO problems
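
The end-tag rule above can be sketched as a pure function. The set below lists the elements that HTML 4.01 declares EMPTY; whether the private emptyElements array holds exactly these names is an assumption, as are the class and method names.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

final class EndTagSketch {
    static final String XHTML_NS = "http://www.w3.org/1999/xhtml";

    // Elements declared EMPTY in HTML 4.01 (assumed to mirror emptyElements).
    static final Set<String> EMPTY_ELEMENTS = new HashSet<>(Arrays.asList(
            "area", "base", "basefont", "br", "col", "frame", "hr",
            "img", "input", "isindex", "link", "meta", "param"));

    // Returns the end tag to write, or "" when no end tag should be written.
    static String endTag(String namespaceURI, String localName) {
        if (!XHTML_NS.equals(namespaceURI)) {
            return "";                      // non-XHTML elements are ignored
        }
        if (EMPTY_ELEMENTS.contains(localName)) {
            return "";                      // empty elements have no end tag in HTML
        }
        return "</" + localName + ">";
    }

    public static void main(String[] args) {
        System.out.println(endTag(XHTML_NS, "p"));
        System.out.println(endTag(XHTML_NS, "br").isEmpty());
    }
}
```

Note that qName plays no part in the decision, which matches the "ignored" note on that parameter.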

startDocument

public void startDocument()
                   throws SAXException
Must be called first.

Specified by:
startDocument in interface ContentHandler
Throws:
SAXException

startElement

public void startElement(String namespaceURI,
                         String localName,
                         String qName,
                         Attributes atts)
                  throws SAXException
Writes a start tag if the element is an XHTML element.

Specified by:
startElement in interface ContentHandler
Parameters:
namespaceURI - the XML namespace
localName - the element name in the namespace
qName - ignored
atts - the attribute list
Throws:
SAXException - if there are IO problems
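
A start tag for an XHTML element can be built along the following lines, including the minimization of "boolean" attributes mentioned in the field summary (checked="checked" becomes checked). This is a sketch under assumptions: the attribute set is the list of boolean attributes from HTML 4.01, which may differ from the private booleanAttributes array, and the escaping, the skipping of namespaced attributes, and all names are hypothetical.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.AttributesImpl;

final class StartTagSketch {
    static final String XHTML_NS = "http://www.w3.org/1999/xhtml";

    // Boolean attributes of HTML 4.01 (assumed to mirror booleanAttributes).
    static final Set<String> BOOLEAN_ATTRIBUTES = new HashSet<>(Arrays.asList(
            "checked", "compact", "declare", "defer", "disabled", "ismap", "multiple",
            "nohref", "noresize", "noshade", "nowrap", "readonly", "selected"));

    // Escapes a value for use inside a double-quoted HTML attribute.
    static String escapeAttribute(String value) {
        return value.replace("&", "&amp;").replace("<", "&lt;").replace("\"", "&quot;");
    }

    // Returns the start tag to write, or "" for elements outside the XHTML namespace.
    static String startTag(String namespaceURI, String localName, Attributes atts) {
        if (!XHTML_NS.equals(namespaceURI)) {
            return "";
        }
        StringBuilder sb = new StringBuilder("<").append(localName);
        for (int i = 0; i < atts.getLength(); i++) {
            if (!atts.getURI(i).isEmpty()) {
                continue;                   // assumption: skip namespaced attributes such as xml:lang
            }
            String name = atts.getLocalName(i);
            sb.append(' ').append(name);
            if (!BOOLEAN_ATTRIBUTES.contains(name)) {
                sb.append("=\"").append(escapeAttribute(atts.getValue(i))).append('"');
            }
        }
        return sb.append('>').toString();
    }

    // Builds the start tag for <input type="checkbox" checked="checked"/>.
    static String demo() {
        AttributesImpl atts = new AttributesImpl();
        atts.addAttribute("", "type", "type", "CDATA", "checkbox");
        atts.addAttribute("", "checked", "checked", "CDATA", "checked");
        return startTag(XHTML_NS, "input", atts);
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```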

main

public static void main(String[] args)
Used for testing. Pass a file:// URL as the command line argument.


endPrefixMapping

public void endPrefixMapping(String str)
                      throws SAXException
Does nothing.

Specified by:
endPrefixMapping in interface ContentHandler
Throws:
SAXException

ignorableWhitespace

public void ignorableWhitespace(char[] values,
                                int param,
                                int param2)
                         throws SAXException
Does nothing.

Specified by:
ignorableWhitespace in interface ContentHandler
Throws:
SAXException

processingInstruction

public void processingInstruction(String str,
                                  String str1)
                           throws SAXException
Does nothing.

Specified by:
processingInstruction in interface ContentHandler
Throws:
SAXException

setDocumentLocator

public void setDocumentLocator(Locator locator)
Does nothing.

Specified by:
setDocumentLocator in interface ContentHandler

skippedEntity

public void skippedEntity(String str)
                   throws SAXException
Does nothing.

Specified by:
skippedEntity in interface ContentHandler
Throws:
SAXException

startPrefixMapping

public void startPrefixMapping(String str,
                               String str1)
                        throws SAXException
Does nothing.

Specified by:
startPrefixMapping in interface ContentHandler
Throws:
SAXException