/* Copyright 2002-2006 Elliotte Rusty Harold
   
   This library is free software; you can redistribute it and/or modify
   it under the terms of version 2.1 of the GNU Lesser General Public 
   License as published by the Free Software Foundation.
   
   This library is distributed in the hope that it will be useful,
   but WITHOUT ANY WARRANTY; without even the implied warranty of
   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the 
   GNU Lesser General Public License for more details.
   
   You should have received a copy of the GNU Lesser General Public
   License along with this library; if not, write to the 
   Free Software Foundation, Inc., 59 Temple Place, Suite 330, 
   Boston, MA 02111-1307  USA
   
   You can contact Elliotte Rusty Harold by sending e-mail to
   elharo@ibiblio.org. Please include the word "XOM" in the
   subject line. The XOM home page is located at http://www.xom.nu/
*/

package nu.xom;

import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.UnsupportedEncodingException;
import java.io.Writer;
import java.util.Locale;
import org.xml.sax.helpers.NamespaceSupport;

/**
 * <p>
 * Outputs a <code>Document</code> object in a specific encoding using
 * various options for controlling white space, normalization,
 * indenting, line breaking, and base URIs. In general, these
 * options do affect the document's infoset. In particular, if you set
 * either the maximum line length or the indent size to a positive
 * value, then the serializer will not respect input white space. It
 * may trim leading and trailing space, condense runs of white
 * space to a single space, convert carriage returns and linefeeds
 * to spaces, add extra space where none was present before,
 * and otherwise muck with the document's white space.
 * The defaults, however, preserve all significant white space
 * including ignorable white space and boundary white space.
 * </p>
 *
 * @author Elliotte Rusty Harold
 * @version 1.2d1
 *
 */
public class Serializer {

    private TextWriter escaper;
    private boolean preserveBaseURI = false;

    // ???? reset when exception is thrown?
    private NamespaceSupport namespaces = new NamespaceSupport();

    /**
     * <p>
     * Create a new serializer that uses the UTF-8 encoding.
     * </p>
     *
     * @param out the output stream to write the document on
     *
     * @throws NullPointerException if <code>out</code> is null
     */
    public Serializer(OutputStream out) {

        try {
            this.setOutputStream(out, "UTF-8");
        }
        catch (UnsupportedEncodingException ex) {
            throw new RuntimeException(
              "The VM is broken. It does not understand UTF-8.");
        }

    }
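
    /*
     * Usage sketch (illustrative only, not part of this class): build a
     * small document and serialize it as UTF-8 with the default options,
     * which preserve the document's white space. The element name and
     * text are hypothetical. write() throws IOException, so the caller
     * must catch or declare it.
     *
     *   Element root = new Element("greeting");
     *   root.appendChild("Hello, world");
     *   Serializer serializer = new Serializer(System.out);
     *   serializer.write(new Document(root));
     */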

    /**
     * <p>
     * Create a new serializer that uses the specified encoding.
     * The encoding must be recognized by the Java virtual machine. If
     * you attempt to use an encoding that the local Java virtual
     * machine does not support, the constructor will throw an
     * <code>UnsupportedEncodingException</code>.
     * Currently the following encodings are recognized by XOM:
     * </p>
     *
     * <ul>
     *   <li>UTF-8</li>
     *   <li>UTF-16</li>
     *   <li>UTF-16BE</li>
     *   <li>UTF-16LE</li>
     *   <li>ISO-10646-UCS-2</li>
     *   <li>ISO-8859-1</li>
     *   <li>ISO-8859-2</li>
     *   <li>ISO-8859-3</li>
     *   <li>ISO-8859-4</li>
     *   <li>ISO-8859-5</li>
     *   <li>ISO-8859-6</li>
     *   <li>ISO-8859-7</li>
     *   <li>ISO-8859-8</li>
     *   <li>ISO-8859-9</li>
     *   <li>ISO-8859-10</li>
     *   <li>ISO-8859-11 (a.k.a. TIS-620)</li>
     *   <li>ISO-8859-13</li>
     *   <li>ISO-8859-14</li>
     *   <li>ISO-8859-15</li>
     *   <li>ISO-8859-16</li>
     *   <li>IBM037 (a.k.a. CP037, EBCDIC-CP-US, EBCDIC-CP-CA,
     *       EBCDIC-CP-WA, EBCDIC-CP-NL, and CSIBM037)</li>
     *   <li>GB18030</li>
     * </ul>
     *
     * <p>
     * You can use encodings not in this list if the virtual
     * machine supports them. However, they may be
     * significantly slower than the encodings in this list.
     * </p>
     *
     * <p>
     * I've noticed Java has significant bugs in its handling of some
     * encodings. In some cases, such as 0x80 in Big5, XOM
     * will escape a character that should not need to be escaped
     * because Java can't output that character in the specified
     * encoding, even though the output character set does contain it.
     * :-(
     * </p>
     *
     * @param out the output stream to write the document on
     * @param encoding the character encoding for the serialization
     *
     * @throws NullPointerException if <code>out</code>
     *     or <code>encoding</code> is null
     * @throws UnsupportedEncodingException if the VM does not
     *     support the requested encoding
     *
     */
    public Serializer(OutputStream out, String encoding)
      throws UnsupportedEncodingException {

        if (encoding == null) {
            throw new NullPointerException("Null encoding");
        }
        this.setOutputStream(out, encoding);

    }
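
    /*
     * Usage sketch (illustrative only): request ISO-8859-1 and fall
     * back to UTF-8 if this VM does not support it. The fallback policy
     * is an assumption of the example, not something this class does.
     * With ISO-8859-1, characters outside Latin-1 in text content are
     * written as character references. In element names, comments, and
     * processing instructions they cause an
     * UnavailableCharacterException instead.
     *
     *   Serializer serializer;
     *   try {
     *       serializer = new Serializer(out, "ISO-8859-1");
     *   }
     *   catch (UnsupportedEncodingException ex) {
     *       serializer = new Serializer(out); // always UTF-8
     *   }
     */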

    /**
     * <p>
     * Flushes the previous output stream and
     * redirects further output to the new output stream.
     * </p>
     *
     *
     * @param out the output stream to write the document on
     *
     * @throws NullPointerException if <code>out</code> is null
     * @throws IOException if the previous output stream
     *     encounters an I/O error when flushed
     *
     */
    public void setOutputStream(OutputStream out)
      throws IOException {

        // flush any data onto the old output stream
        this.flush();
        int maxLength = getMaxLength();
        int indent = this.getIndent();
        String lineSeparator = getLineSeparator();
        boolean nfc = getUnicodeNormalizationFormC();
        String encoding = escaper.getEncoding();
        boolean lineSeparatorSet = escaper.lineSeparatorSet;
        setOutputStream(out, encoding);
        setIndent(indent);
        setMaxLength(maxLength);
        setUnicodeNormalizationFormC(nfc);
        if (lineSeparatorSet) setLineSeparator(lineSeparator);

    }


    private void setOutputStream(OutputStream out, String encoding)
      throws UnsupportedEncodingException {

        if (out == null) {
            throw new NullPointerException("Null OutputStream");
        }
        Writer writer;
        String encodingUpperCase = encoding.toUpperCase(Locale.ENGLISH);
        if (encodingUpperCase.equals("UTF-8")) {
            writer = new OutputStreamWriter(out, "UTF-8");
        }
        else if (encodingUpperCase.equals("UTF-16")
          || encodingUpperCase.equals("ISO-10646-UCS-2")) {
            // For compatibility with Java 1.2 and earlier
            writer = new OutputStreamWriter(out, "UnicodeBig");
        }
        // Java's Cp037 encoding is broken, so we have to
        // provide our own.
        else if (encodingUpperCase.equals("IBM037")
          || encodingUpperCase.equals("CP037")
          || encodingUpperCase.equals("EBCDIC-CP-US")
          || encodingUpperCase.equals("EBCDIC-CP-CA")
          || encodingUpperCase.equals("EBCDIC-CP-WA")
          || encodingUpperCase.equals("EBCDIC-CP-NL")
          || encodingUpperCase.equals("CSIBM037")) {
            writer = new EBCDICWriter(out);
        }
        else if (encodingUpperCase.equals("ISO-8859-11")
          || encodingUpperCase.equals("TIS-620")) {
            // Java doesn't recognize the name ISO-8859-11 and
            // Java 1.3 and earlier don't recognize TIS-620
            writer = new OutputStreamWriter(out, "TIS620");
        }
        else writer = new OutputStreamWriter(out, encoding);

        writer = new UnsynchronizedBufferedWriter(writer);
        this.escaper = TextWriterFactory.getTextWriter(writer, encoding);

    }
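
    /*
     * Usage sketch (illustrative only): one serializer writing the same
     * document to two streams. Going by the code above, the public
     * setOutputStream flushes the old stream and carries over the
     * encoding, indent, maximum line length, Unicode normalization
     * setting, and any explicitly set line separator. The stream names
     * are hypothetical.
     *
     *   Serializer serializer = new Serializer(firstOut, "UTF-8");
     *   serializer.setIndent(2);
     *   serializer.write(doc);
     *   serializer.setOutputStream(secondOut); // flushes firstOut
     *   serializer.write(doc);                 // still UTF-8, indent 2
     */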

    /**
     * <p>
     * Serializes a document onto the output
     * stream using the current options.
     * </p>
     *
     * @param doc the <code>Document</code> to serialize
     *
     * @throws IOException if the underlying output stream
     *     encounters an I/O error
     * @throws NullPointerException if <code>doc</code> is null
     * @throws UnavailableCharacterException if the document contains
     *     an unescapable character (e.g. in an element name) that is
     *     not available in the current encoding
     */
    public void write(Document doc) throws IOException {

        escaper.reset();
        namespaces.reset();
        namespaces.declarePrefix("", "");
        // The OutputStreamWriter automatically inserts
        // the byte order mark if necessary.
        writeXMLDeclaration();
        int childCount = doc.getChildCount();
        for (int i = 0; i < childCount; i++) {
            writeChild(doc.getChild(i));

            // Might want to remove this line break in a
            // non-XML serializer where it's not guaranteed to be
            // OK to add extra line breaks in the prolog
            escaper.breakLine();
        }
        escaper.flush();

    }
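
    /*
     * Usage sketch (illustrative only): serialize into memory with
     * indenting and a maximum line length, then decode the bytes.
     * write(Document) flushes the stream but does not close it. As the
     * class comment warns, positive indent and line-length values let
     * the serializer change the document's white space.
     *
     *   ByteArrayOutputStream bytes = new ByteArrayOutputStream();
     *   Serializer serializer = new Serializer(bytes, "UTF-8");
     *   serializer.setIndent(4);
     *   serializer.setMaxLength(72);
     *   serializer.write(doc);
     *   String xml = bytes.toString("UTF-8");
     */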

    /**
     * <p>
     * Writes the XML declaration onto the output stream,
     * followed by a line break.
     * </p>
     *
     * @throws IOException if the underlying output stream
     *     encounters an I/O error
     */
    protected void writeXMLDeclaration() throws IOException {
        escaper.writeUncheckedMarkup("<?xml version=\"1.0\" encoding=\"");
        escaper.writeUncheckedMarkup(escaper.getEncoding());
        escaper.writeUncheckedMarkup("\"?>");
        escaper.breakLine();
    }
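
    /*
     * Subclass sketch (hypothetical class name): omit the XML
     * declaration by overriding this hook with an empty body. Without
     * external encoding information, XML 1.0 lets the declaration be
     * omitted only for UTF-8 and UTF-16, so this sketch fixes the
     * encoding to UTF-8. This method also writes the line break after
     * the declaration, so the output then starts directly with the
     * first top-level node.
     *
     *   class NoDeclarationSerializer extends Serializer {
     *       NoDeclarationSerializer(OutputStream out) {
     *           super(out); // UTF-8
     *       }
     *       protected void writeXMLDeclaration() {
     *           // intentionally write nothing
     *       }
     *   }
     */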

    /**
     * <p>
     * Serializes an element onto the output stream using the current
     * options. The result is guaranteed to be well-formed.
     * </p>
     *
     * <p>
     * If the element is empty, this method invokes
     * <code>writeEmptyElementTag</code>. If the element is not
     * empty, then:
     * </p>
     *
     * <ol>
     *   <li>It calls <code>writeStartTag</code>.</li>
     *   <li>It passes each of the element's children to
     *       <code>writeChild</code> in order.</li>
     *   <li>It calls <code>writeEndTag</code>.</li>
     * </ol>
     *
     * <p>
     * It may break lines or add white space if the serializer has
     * been configured to indent or use a maximum line length.
     * </p>
     *
     * @param element the <code>Element</code> to serialize
     *
     * @throws IOException if the underlying output stream
     *     encounters an I/O error
     * @throws UnavailableCharacterException if the element name
     *     contains a character that is not available in the
     *     current encoding
     */
    protected void write(Element element) throws IOException {

        // workaround for case where only children are empty text nodes
        boolean hasRealChildren = false;
        int childCount = element.getChildCount();
        for (int i = 0; i < childCount; i++) {
            Node child = element.getChild(i);
            if (child.isText()) {
                Text t = (Text) child;
                if (t.isEmpty()) continue;
            }
            hasRealChildren = true;
            break;
        }

        if (hasRealChildren) {
            boolean wasPreservingWhiteSpace = escaper.isPreserveSpace();
            writeStartTag(element);
            // children
            for (int i = 0; i < childCount; i++) {
                Node child = element.getChild(i);
                // need to work around a very tricky case here where
                // denormalized characters cross boundaries of
                // consecutive text nodes
                if (escaper.getNFC() && child.isText()) {
                    Text t = (Text) child;
                    while (i < childCount-1) { // not the last node
                        Node next = element.getChild(i+1);
                        if (next.isText()) {
                            t = new Text(t.getValue() + next.getValue());
                            i++;
                        }
                        else break;
                    }
                    writeChild(t);
                }
                else {
                    writeChild(child);
                }
            }
            writeEndTag(element);
            // restore parent value
            escaper.setPreserveSpace(wasPreservingWhiteSpace);
        }
        else {
            writeEmptyElementTag(element);
        }

    }


    private boolean hasNonTextChildren(Element element) {

        int childCount = element.getChildCount();
        for (int i = 0; i < childCount; i++) {
            if (! element.getChild(i).isText()) return true;
        }
        return false;

    }


    // writeEndTag should not normally throw UnavailableCharacterException
    // because that would already have been thrown for the
    // corresponding start-tag.

    /**
     * <p>
     * Writes the end-tag for an element in the form
     * <code>&lt;/<i>name</i>&gt;</code>.
     * </p>
     *
     * @param element the element whose end-tag is written
     *
     * @throws IOException if the underlying output stream
     *     encounters an I/O error
     */
    protected void writeEndTag(Element element) throws IOException {

        escaper.decrementIndent();
        if (escaper.getIndent() > 0 && !escaper.isPreserveSpace()) {
            if (hasNonTextChildren(element)) {
                escaper.breakLine();
            }
        }
        escaper.write('<');
        escaper.write('/');
        escaper.writeName(element.getQualifiedName());
        escaper.write('>');
        namespaces.popContext();

    }

    /**
     *
     * <p>
     * Writes the start-tag for the element including
     * all its namespace declarations and attributes.
     * </p>
     *
     * <p>
     * The <code>writeAttributes</code> method is called to write
     * all the non-namespace-declaration attributes.
     * The <code>writeNamespaceDeclarations</code> method
     * is called to write all the namespace declaration attributes.
     * </p>
     *
     * @param element the element whose start-tag is written
     *
     * @throws IOException if the underlying output stream
     *     encounters an I/O error
     * @throws UnavailableCharacterException if the name of the element
     *     or the name of any of its attributes contains a character
     *     that is not available in the current encoding
     */
    protected void writeStartTag(Element element) throws IOException {

        writeTagBeginning(element);
        escaper.write('>');
        escaper.incrementIndent();
        String xmlSpaceValue = element.getAttributeValue(
          "space", "http://www.w3.org/XML/1998/namespace");
        if (xmlSpaceValue != null) {
            if ("preserve".equals(xmlSpaceValue)) {
                escaper.setPreserveSpace(true);
            }
            else if ("default".equals(xmlSpaceValue)) {
                escaper.setPreserveSpace(false);
            }
        }

    }

    /**
     *
     * <p>
     * Writes an empty-element tag for the element
     * including all its namespace declarations and attributes.
     * </p>
     *
     * <p>
     * The <code>writeAttributes</code> method is called to write
     * all the non-namespace-declaration attributes.
     * The <code>writeNamespaceDeclarations</code> method
     * is called to write all the namespace declaration attributes.
     * </p>
     *
     * <p>
     * If subclasses don't wish empty-element tags to be used,
     * they can override this method to simply invoke
     * <code>writeStartTag</code> followed by
     * <code>writeEndTag</code>.
     * </p>
     *
     * @param element the element whose empty-element tag is written
     *
     * @throws IOException if the underlying output stream
     *     encounters an I/O error
     * @throws UnavailableCharacterException if the name of the element
     *     or the name of any of its attributes contains a character
     *     that is not available in the current encoding
     */
    protected void writeEmptyElementTag(Element element)
      throws IOException {
        writeTagBeginning(element);
        escaper.write('/');
        escaper.write('>');
        namespaces.popContext();
    }


    // This just extracts the commonality between writeStartTag
    // and writeEmptyElementTag
    private void writeTagBeginning(Element element)
      throws IOException {

        namespaces.pushContext();
        if (escaper.isIndenting()
          && !escaper.isPreserveSpace()
          && !escaper.justBroke()) {
            escaper.breakLine();
        }
        escaper.write('<');
        escaper.writeName(element.getQualifiedName());
        writeAttributes(element);
        writeNamespaceDeclarations(element);

    }
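
    /*
     * Subclass sketch (hypothetical class name) of the override that
     * the comment on writeEmptyElementTag suggests: always write a
     * start-tag and an end-tag, so <a/> becomes <a></a>. write(Element)
     * also routes elements whose only children are empty Text nodes
     * here. One caveat from reading the code: writeStartTag applies
     * any xml:space attribute, but write(Element) restores the previous
     * xml:space state only for elements with real children. Under this
     * override, an empty element carrying xml:space may therefore leak
     * that setting into the rest of its parent's content.
     *
     *   class ExpandedTagSerializer extends Serializer {
     *       ExpandedTagSerializer(OutputStream out) {
     *           super(out);
     *       }
     *       protected void writeEmptyElementTag(Element element)
     *         throws IOException {
     *           writeStartTag(element);
     *           writeEndTag(element);
     *       }
     *   }
     */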

    /**
     * <p>
     * Writes all the attributes of the specified
     * element onto the output stream, one at a time, separated
     * by white space. If <code>preserveBaseURI</code> is true, and it is
     * necessary to add an <code>xml:base</code> attribute
     * to the element in order to preserve the base URI, then
     * that attribute is also written here.
     * Each individual attribute is written by invoking
     * <code>write(Attribute)</code>.
     * </p>
     *
     * @param element the <code>Element</code> whose attributes are
     *     written
     * @throws IOException if the underlying output stream
     *     encounters an I/O error
     * @throws UnavailableCharacterException if the name of any of
     *     the element's attributes contains a character that is not
     *     available in the current encoding
     */
    protected void writeAttributes(Element element)
      throws IOException {

        // check to see if we need an xml:base attribute
        if (preserveBaseURI) {
            ParentNode parent = element.getParent();
            if (element.getAttribute("base",
              "http://www.w3.org/XML/1998/namespace") == null) {
                String baseValue = element.getBaseURI();
                if (parent == null
                  || parent.isDocument()
                  || !baseValue.equals(parent.getBaseURI())) {

                    escaper.write(' ');
                    Attribute baseAttribute = new Attribute(
                      "xml:base",
                      "http://www.w3.org/XML/1998/namespace",
                      baseValue);
                    write(baseAttribute);
                }
            }
        }

        int attributeCount = element.getAttributeCount();
        for (int i = 0; i < attributeCount; i++) {
            Attribute attribute = element.getAttribute(i);
            escaper.write(' ');
            write(attribute);
        }

    }
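
    /*
     * Usage sketch (illustrative only; the URI is hypothetical): with
     * base-URI preservation switched on, a root element that has no
     * xml:base attribute of its own is given one, because its parent
     * is the Document.
     *
     *   Element root = new Element("root");
     *   root.setBaseURI("http://www.example.com/docs/");
     *   Serializer serializer = new Serializer(System.out);
     *   serializer.setPreserveBaseURI(true);
     *   serializer.write(new Document(root));
     *   // the root start-tag should then include
     *   // xml:base="http://www.example.com/docs/"
     */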

    /**
     * <p>
     * Writes all the namespace declaration
     * attributes of the specified element onto the output stream,
     * one at a time, separated by white space. Each individual
     * declaration is written by invoking
     * <code>writeNamespaceDeclaration</code>.
     * </p>
     *
     * @param element the <code>Element</code> whose namespace
     *     declarations are written
     * @throws IOException if the underlying output stream
     *     encounters an I/O error
     * @throws UnavailableCharacterException if any of the element's
     *     namespace prefixes contains a character that is not
     *     available in the current encoding
     */
    protected void writeNamespaceDeclarations(Element element)
      throws IOException {

        String prefix = element.getNamespacePrefix();
        if (!("xml".equals(prefix))) {
            writeNamespaceDeclarationIfNecessary(prefix,
              element.getNamespaceURI());
        }

        // write attribute namespaces
        int attCount = element.getAttributeCount();
        for (int i = 0; i < attCount; i++) {
            Attribute att = element.getAttribute(i);
            String attPrefix = att.getNamespacePrefix();
            if (attPrefix.length() != 0 && !("xml".equals(attPrefix))) {
                writeNamespaceDeclarationIfNecessary(attPrefix,
                  att.getNamespaceURI());
            }
        }

        // write additional namespaces
        Namespaces namespaces = element.namespaces;
        if (namespaces == null) return;
        int namespaceCount = namespaces.size();
        for (int i = 0; i < namespaceCount; i++) {
            String additionalPrefix = namespaces.getPrefix(i);
            String uri = namespaces.getURI(additionalPrefix);
            writeNamespaceDeclarationIfNecessary(additionalPrefix, uri);
        }

    }


    private void writeNamespaceDeclarationIfNecessary(String prefix,
      String uri) throws IOException {

        String currentValue = namespaces.getURI(prefix);
        // NamespaceSupport returns null for no namespace, not the
        // empty string like XOM does
        if (currentValue == null && "".equals(uri)) {
            return;
        }
        else if (uri.equals(currentValue)) {
            return;
        }
        escaper.write(' ');
        writeNamespaceDeclaration(prefix, uri);

    }

    /**
     * <p>
     * Writes a namespace declaration in the form
     * <code>xmlns:<i>prefix</i>="<i>uri</i>"</code> or
     * <code>xmlns="<i>uri</i>"</code>. It does not write
     * the spaces on either side of the namespace declaration.
     * These are written by <code>writeNamespaceDeclarations</code>.
     * </p>
     *
     * @param prefix the namespace prefix; the empty string for the
     *     default namespace
     * @param uri the namespace URI
     *
     * @throws IOException if the underlying output stream
     *     encounters an I/O error
     * @throws UnavailableCharacterException if the namespace prefix
     *     contains a character that is not available in the current
     *     encoding
     */
    protected void writeNamespaceDeclaration(String prefix, String uri)
      throws IOException {

        namespaces.declarePrefix(prefix, uri);
        if ("".equals(prefix)) {
            escaper.writeUncheckedMarkup("xmlns");
        }
        else {
            escaper.writeUncheckedMarkup("xmlns:");
            escaper.writeName(prefix);
        }
        escaper.write('=');
        escaper.write('"');
        escaper.writePCDATA(uri);
        escaper.write('"');

    }

    /**
     * <p>
     * Writes an attribute in the form
     * <code><i>name</i>="<i>value</i>"</code>.
     * Characters in the attribute value are escaped as necessary.
     * </p>
     *
     * @param attribute the <code>Attribute</code> to write
     *
     * @throws IOException if the underlying output stream
     *     encounters an I/O error
     * @throws UnavailableCharacterException if the attribute name
     *     contains a character that is not available in the current
     *     encoding
     *
     */
    protected void write(Attribute attribute) throws IOException {
        escaper.writeName(attribute.getQualifiedName());
        escaper.write('=');
        escaper.write('"');
        escaper.writeAttributeValue(attribute.getValue());
        escaper.write('"');
    }

    /**
     * <p>
     * Writes a comment onto the output stream using the current
     * options. Since character and entity references are not resolved
     * in comments, comments can only be serialized when all
     * characters they contain are available in the current
     * encoding.
     * </p>
     *
     * @param comment the <code>Comment</code> to serialize
     *
     * @throws IOException if the underlying output stream
     *     encounters an I/O error
     * @throws UnavailableCharacterException if the comment contains a
     *     character that is not available in the current encoding
     */
    protected void write(Comment comment) throws IOException {
        if (escaper.isIndenting()) escaper.breakLine();
        escaper.writeUncheckedMarkup("<!--");
        escaper.writeMarkup(comment.getValue());
        escaper.writeUncheckedMarkup("-->");
    }
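
    /*
     * Usage sketch (illustrative only): a comment containing a
     * character that the chosen encoding cannot represent cannot be
     * escaped, so serialization fails partway through. Output already
     * written is not rolled back, which is why this sketch retries into
     * a fresh buffer. The comment text is hypothetical, and checked
     * IOExceptions are left to the caller.
     *
     *   Document doc = new Document(new Element("root"));
     *   doc.appendChild(new Comment("\u03A9 is not in Latin-1"));
     *   ByteArrayOutputStream bytes = new ByteArrayOutputStream();
     *   try {
     *       new Serializer(bytes, "ISO-8859-1").write(doc);
     *   }
     *   catch (UnavailableCharacterException ex) {
     *       bytes = new ByteArrayOutputStream();
     *       new Serializer(bytes).write(doc); // UTF-8 covers every character
     *   }
     */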

    /**
     * <p>
     * Writes a processing instruction
     * onto the output stream using the current options.
     * Since character and entity references are not resolved
     * in processing instructions, processing instructions
     * can only be serialized when all
     * characters they contain are available in the current
     * encoding.
     * </p>
     *
     * @param instruction the <code>ProcessingInstruction</code>
     *     to serialize
     *
     * @throws IOException if the underlying output stream
     *     encounters an I/O error
     * @throws UnavailableCharacterException if the processing
     *     instruction contains a character that is not available
     *     in the current encoding
     */
    protected void write(ProcessingInstruction instruction)
      throws IOException {

        if (escaper.isIndenting()) escaper.breakLine();
        escaper.writeUncheckedMarkup("<?");
        escaper.writeName(instruction.getTarget());
        String value = instruction.getValue();
        // for canonical XML, only output a space after the target
        // if there is a value
        if (!"".equals(value)) {
            escaper.write(' ');
            escaper.writeMarkup(value);
        }
        escaper.writeUncheckedMarkup("?>");

    }

    /**
     * <p>
     * Writes a <code>Text</code> object
     * onto the output stream using the current options.
     * Reserved characters such as &lt;, &gt; and "
     * are escaped using the standard entity references
     * such as <code>&amp;lt;</code>, <code>&amp;gt;</code>,
     * and <code>&amp;quot;</code>.
     * </p>
     *
     * <p>
     * Characters which cannot be encoded in the current character set
     * (for example, &Omega; in ISO-8859-1) are encoded using
     * character references.
     * </p>
     *
     * @param text the <code>Text</code> to serialize
     *
     * @throws IOException if the underlying output stream
     *     encounters an I/O error
     */
    protected void write(Text text) throws IOException {

        // XXX Is there a shortcut that takes advantage of the
        // data being stored in UTF-8 here? perhaps even if only
        // when serializing to UTF-8?
        String value = text.getValue();
        if (text.isCDATASection()
          && value.indexOf("]]>") == -1) {
            if (!(escaper instanceof UnicodeWriter)) {
                int length = value.length();
                for (int i = 0; i < length; i++) {
                    if (escaper.needsEscaping(value.charAt(i))) {
                        // can't use CDATA section
                        escaper.writePCDATA(value);
                        return;
                    }
                }
            }
            escaper.writeUncheckedMarkup("<![CDATA[");
            escaper.writeMarkup(value);
            escaper.writeUncheckedMarkup("]]>");
        }
        // is this boundary whitespace we can ignore?
        else if (isBoundaryWhitespace(text, value)) {
            return; // without writing node
        }
        else {
            escaper.writePCDATA(value);
        }

    }


    private boolean isBoundaryWhitespace(Text text, String value) {

        if (getIndent() <= 0) return false;

        ParentNode parent = text.getParent();
        if (parent == null) {
            return "".equals(value.trim());
        }

        // ???? cutting next line only breaks a few tests; and what it does
        // break might be better off if the breakage is accepted as correct behavior
        int childCount = parent.getChildCount();
        if (childCount == 1) return false;

        if (! "".equals(value.trim())) return false;

        // ???? This is a huge Hotspot. maybe 12% of serialization time
        // when indenting. Is there any way to eliminate this?
        // We only actually need to test a couple of positions, 0 and
        // parent.getChildCount()-1
        // Instead of getting position we could get those two elements and compare
        // to the text. But you still need the previous and next
        int position = parent.indexOf(text);

        Node previous = null;
        Node next = null;
        if (position != 0) previous = parent.getChild(position-1);
        if (position != childCount-1) {
            next = parent.getChild(position+1);
        }
        if (previous == null || !previous.isText()) {
            if (next == null || !next.isText()) {
                return true;
            }
        }

        return false;

    }
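
    /*
     * Usage sketch (illustrative only): reserved characters in text
     * content are escaped rather than rejected.
     *
     *   Element root = new Element("p");
     *   root.appendChild("AT&T < 1");
     *   new Serializer(System.out).write(new Document(root));
     *   // the text content should appear as AT&amp;T &lt; 1
     *
     * Consider a white-space-only Text node that is not its parent's
     * only child and has no adjacent Text nodes. When the indent is
     * positive, isBoundaryWhitespace treats such a node as boundary
     * white space, and write(Text) drops it without output.
     */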

    /**
     * <p>
     * Writes a <code>DocType</code> object
     * onto the output stream using the current options.
     * </p>
     *
     * @param doctype the document type declaration to serialize
     *
     * @throws IOException if the underlying
     *     output stream encounters an I/O error
     * @throws UnavailableCharacterException if the document type
     *     declaration contains a character that is not available
     *     in the current encoding
     */
    protected void write(DocType doctype) throws IOException {

        escaper.writeUncheckedMarkup("<!DOCTYPE ");
        escaper.writeName(doctype.getRootElementName());
        if (doctype.getPublicID() != null) {
            escaper.writeMarkup(" PUBLIC \"" + doctype.getPublicID()
              + "\" \"" + doctype.getSystemID() + "\"");
        }
        else if (doctype.getSystemID() != null) {
            escaper.writeMarkup(
              " SYSTEM \"" + doctype.getSystemID() + "\"");
        }

        String internalDTDSubset = doctype.getInternalDTDSubset();
        if (!internalDTDSubset.equals("")) {
            escaper.writeUncheckedMarkup(" [");
            escaper.breakLine();
            escaper.setInDocType(true);
            escaper.writeMarkup(internalDTDSubset);
            escaper.setInDocType(false);
            escaper.write(']');
        }

        escaper.write('>');

    }
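
    /*
     * Usage sketch (illustrative only): a document type declaration
     * with both a public and a system identifier and no internal
     * subset.
     *
     *   Document doc = new Document(new Element("html"));
     *   doc.setDocType(new DocType("html",
     *     "-//W3C//DTD XHTML 1.0 Strict//EN",
     *     "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"));
     *   new Serializer(System.out).write(doc);
     *
     * Following write(DocType) above, that declaration is written on a
     * single line as <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0
     * Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
     * (wrapped here only to fit the comment).
     */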

    /**
     * <p>
     * Writes a child node onto the output stream using the
     * current options. It is invoked when walking the tree to
     * serialize the entire document. It is not called, and indeed
     * should not be called, for either the <code>Document</code>
     * node or for attributes.
     * </p>
     *
     * @param node the <code>Node</code> to serialize
     *
     * @throws IOException if the underlying output stream
     * encounters an I/O error
     * @throws XMLException if an <code>Attribute</code>, a
     * <code>Document</code>, or a <code>Namespace</code>
     * is passed to this method
     */
    protected void writeChild(Node node) throws IOException {

        if (node.isElement()) {
            write((Element) node);
        }
        else if (node.isText()) {
            write((Text) node);
        }
        else if (node.isComment()) {
            write((Comment) node);
        }
        else if (node.isProcessingInstruction()) {
            write((ProcessingInstruction) node);
        }
        else if (node.isDocType()) {
            write((DocType) node);
        }
        else {
            throw new XMLException("Cannot write a "
              + node.getClass().getName()
              + " from the writeChild() method");
        }

    }

    /**
     * <p>
     * Writes a string onto the underlying output stream.
     * Non-ASCII characters that are not available in the
     * current character set are encoded with numeric character
     * references. The three reserved characters &lt;, &gt;, and &amp;
     * are escaped using the standard entity references
     * <code>&amp;lt;</code>, <code>&amp;gt;</code>,
     * and <code>&amp;amp;</code>.
     * Double and single quotes are not escaped.
     * </p>
     *
     * @param text the parsed character data to serialize
     *
     * @throws IOException if the underlying output stream
     * encounters an I/O error
     */
    protected final void writeEscaped(String text) throws IOException {
        escaper.writePCDATA(text);
    }
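A minimal standalone sketch of the rules `writeEscaped` documents, assuming a `java.nio.charset.Charset` stands in for the serializer's encoding. The class name `PCDataEscaper` is hypothetical, and the choice of hexadecimal references for unencodable characters belongs to this sketch; the documentation only says "numeric character references".

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

public class PCDataEscaper {

    public static String escapePCDATA(String text, Charset charset) {
        CharsetEncoder encoder = charset.newEncoder();
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);        // whole code point, so surrogate pairs stay together
            i += Character.charCount(cp);
            switch (cp) {
                case '<': out.append("&lt;");  break;
                case '>': out.append("&gt;");  break;
                case '&': out.append("&amp;"); break;
                default:
                    String ch = new String(Character.toChars(cp));
                    if (encoder.canEncode(ch)) {
                        out.append(ch);          // quotes and encodable characters pass through
                    } else {
                        out.append("&#x")
                           .append(Integer.toHexString(cp).toUpperCase(Locale.ROOT))
                           .append(';');
                    }
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // prints: Tom &amp; "Jerry" &lt;caf&#xE9;&gt;
        System.out.println(escapePCDATA("Tom & \"Jerry\" <caf\u00e9>", StandardCharsets.US_ASCII));
    }
}
```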

    /**
     * <p>
     * Writes a string onto the underlying output stream.
     * Non-ASCII characters that are not available in the
     * current character set are escaped using hexadecimal numeric
     * character references. Carriage returns, line feeds, and tabs
     * are also escaped using hexadecimal numeric character
     * references in order to ensure their preservation on a round
     * trip. The four reserved characters &lt;, &gt;, &amp;,
     * and &quot; are escaped using the standard entity references
     * <code>&amp;lt;</code>, <code>&amp;gt;</code>,
     * <code>&amp;amp;</code>, and <code>&amp;quot;</code>.
     * The single quote is not escaped.
     * </p>
     *
     * @param value the attribute value to serialize
     *
     * @throws IOException if the underlying output stream
     * encounters an I/O error
     */
    protected final void writeAttributeValue(String value)
      throws IOException {
        escaper.writeAttributeValue(value);
    }
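The attribute rules differ from the PCDATA rules in two ways: the double quote is escaped, and CR, LF, and tab are always written as references. The reason for the latter is XML attribute-value normalization (XML 1.0, section 3.3.3), under which a parser replaces each literal white space character in an attribute value with a space, while a character reference survives. The hypothetical `AttributeEscaper` below sketches these rules; the two-digit forms such as `&#x0A;` are an assumption about formatting, not a guarantee of XOM's exact output.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

public class AttributeEscaper {

    public static String escapeAttribute(String value, Charset charset) {
        CharsetEncoder encoder = charset.newEncoder();
        StringBuilder out = new StringBuilder(value.length());
        for (int i = 0; i < value.length(); ) {
            int cp = value.codePointAt(i);
            i += Character.charCount(cp);
            switch (cp) {
                case '<':  out.append("&lt;");   break;
                case '>':  out.append("&gt;");   break;
                case '&':  out.append("&amp;");  break;
                case '"':  out.append("&quot;"); break;  // value assumed to be double-quoted
                // literal CR, LF, and tab would be normalized to spaces by a parser
                case '\r': out.append("&#x0D;"); break;
                case '\n': out.append("&#x0A;"); break;
                case '\t': out.append("&#x09;"); break;
                default:
                    String ch = new String(Character.toChars(cp));
                    if (encoder.canEncode(ch)) {
                        out.append(ch);                  // includes the single quote
                    } else {
                        out.append("&#x")
                           .append(Integer.toHexString(cp).toUpperCase(Locale.ROOT))
                           .append(';');
                    }
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // prints: it's &quot;a&quot;&#x09;&lt;b&gt;&#x0A;
        System.out.println(escapeAttribute("it's \"a\"\t<b>\n", StandardCharsets.US_ASCII));
    }
}
```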

    /**
     * <p>
     * Writes a string onto the underlying output stream
     * without escaping any characters.
     * Non-ASCII characters that are not available in the
     * current character set cause an <code>IOException</code>.
     * </p>
     *
     * @param text the <code>String</code> to serialize
     *
     * @throws IOException if the underlying output stream
     * encounters an I/O error or <code>text</code> contains
     * characters not available in the current character set
     */
    protected final void writeRaw(String text) throws IOException {
        escaper.writeMarkup(text);
    }
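Because `writeRaw` does no escaping, a character the encoding cannot represent has nowhere to go. The JDK's own encoder behaves the same way when it is configured to report errors, as this hypothetical sketch shows: `UnmappableCharacterException` is a subclass of `IOException`. The sketch encodes to a byte array rather than writing to the serializer's stream.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class RawWriteSketch {

    /** Encodes text exactly as given; unencodable characters raise an IOException subtype. */
    public static byte[] encodeRaw(String text, Charset charset) throws IOException {
        CharsetEncoder encoder = charset.newEncoder()
            .onMalformedInput(CodingErrorAction.REPORT)
            .onUnmappableCharacter(CodingErrorAction.REPORT);
        ByteBuffer buffer = encoder.encode(CharBuffer.wrap(text));  // returned ready for reading
        byte[] bytes = new byte[buffer.remaining()];
        buffer.get(bytes);
        return bytes;
    }

    public static void main(String[] args) throws IOException {
        // markup passes through untouched: prints <b>&amp;</b>
        byte[] raw = encodeRaw("<b>&amp;</b>", StandardCharsets.US_ASCII);
        System.out.println(new String(raw, StandardCharsets.US_ASCII));
        try {
            encodeRaw("caf\u00e9", StandardCharsets.US_ASCII);
        } catch (IOException e) {
            System.out.println("rejected: " + e);
        }
    }
}
```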

    /**
     * <p>
     * Writes the current line break string
     * onto the underlying output stream and indents
     * as specified by the current level and the indent property.
     * </p>
     *
     * @throws IOException if the underlying output stream
     * encounters an I/O error
     */
    protected final void breakLine() throws IOException {
        escaper.breakLine();
    }

    /**
     * <p>
     * Flushes the data onto the output stream.
     * It is not enough to flush the output stream.
     * You must flush the serializer object itself because it
     * uses some internal buffering.
     * The serializer will flush the underlying output stream.
     * </p>
     *
     * @throws IOException if the underlying
     * output stream encounters an I/O error
     */
    public void flush() throws IOException {
        escaper.flush();
    }
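The buffering issue is not specific to XOM. With plain `java.io` classes, an `OutputStreamWriter` (which this file imports) holds encoded bytes in its own buffer, so flushing only the stream beneath it leaves them unwritten. The hypothetical `FlushDemo` below demonstrates this with an in-memory stream; it stands in for, but does not use, the serializer.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

public class FlushDemo {

    /** Returns {bytes visible before writer.flush(), bytes visible after}. */
    public static int[] bytesBeforeAndAfterFlush(String text) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        Writer writer = new OutputStreamWriter(sink, StandardCharsets.UTF_8);
        writer.write(text);
        sink.flush();              // flushing only the stream: the writer's buffer is untouched
        int before = sink.size();
        writer.flush();            // pushes the writer's buffered bytes into the stream
        int after = sink.size();
        return new int[] {before, after};
    }

    public static void main(String[] args) throws IOException {
        int[] counts = bytesBeforeAndAfterFlush("hello");
        System.out.println(counts[0] + " then " + counts[1]);  // 0 then 5
    }
}
```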

    /**
     * <p>
     * Returns the number of spaces this serializer indents.
     * </p>
     *
     * @return the number of spaces this serializer indents
     * each successive level beyond the previous one
     */
    public int getIndent() {
        return escaper.getIndent();
    }

    /**
     * <p>
     * Sets the number of additional spaces to add to each successive
     * level in the hierarchy. Use 0 for no extra indenting. The
     * maximum indentation is limited to approximately half the
     * maximum line length. The serializer will not indent further
     * than that no matter how many levels deep the hierarchy is.
     * </p>
     *
     * <p>
     * When this variable is set to a value greater than 0,
     * the serializer does not preserve white space. Spaces,
     * tabs, carriage returns, and line feeds can all be
     * interchanged at the serializer's discretion, and additional
     * white space may be added before and after tags.
     * Carriage returns, line feeds, and tabs will not be
     * escaped with numeric character references.
     * </p>
     *
     * <p>
     * Inside elements with an <code>xml:space="preserve"</code>
     * attribute, white space is preserved and no indenting
     * takes place, regardless of the setting of the indent
     * property, unless, of course, an
     * <code>xml:space="default"</code> attribute overrides the
     * <code>xml:space="preserve"</code> attribute.
     * </p>
     *
     * <p>
     * The default value for indent is 0; that is, the default is
     * not to add or subtract any white space from the source
     * document.
     * </p>
     *
     * @param indent the number of spaces to indent
     * each successive level of the hierarchy
     *
     * @throws IllegalArgumentException if indent is less than zero
     */
    public void setIndent(int indent) {
        if (indent < 0) {
            throw new IllegalArgumentException(
              "Indent cannot be negative"
            );
        }
        escaper.setIndent(indent);
    }
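One way to picture the cap on indentation is the hypothetical helper below. It assumes the cap is exactly `maxLength / 2` when a maximum line length is set and that there is no cap otherwise; the documentation only promises "approximately half", so the real threshold may differ.

```java
public class IndentSketch {

    /**
     * Leading spaces for an element at the given depth.
     * Assumption: capped at maxLength / 2 when maxLength > 0, uncapped otherwise.
     */
    public static int leadingSpaces(int depth, int indent, int maxLength) {
        if (indent < 0) {
            throw new IllegalArgumentException("Indent cannot be negative");
        }
        int spaces = depth * indent;
        if (maxLength > 0) {
            spaces = Math.min(spaces, maxLength / 2);
        }
        return spaces;
    }

    /** Renders a start tag preceded by the computed indentation. */
    public static String indentedTag(String name, int depth, int indent, int maxLength) {
        int spaces = leadingSpaces(depth, indent, maxLength);
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < spaces; i++) {
            sb.append(' ');
        }
        return sb.append('<').append(name).append('>').toString();
    }

    public static void main(String[] args) {
        System.out.println(indentedTag("b", 2, 2, 0) + "|");  // "    <b>|"
        System.out.println(leadingSpaces(30, 4, 72));          // 36, not 120
    }
}
```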

    /**
     * <p>
     * Returns the string used as a line separator.
     * This is always <code>"\n"</code>, <code>"\r"</code>,
     * or <code>"\r\n"</code>.
     * </p>
     *
     * @return the line separator
     */
    public String getLineSeparator() {
        return escaper.getLineSeparator();
    }

    /**
     * <p>
     * Sets the line separator. This can only be one of the
     * three strings <code>"\n"</code>, <code>"\r"</code>,
     * or <code>"\r\n"</code>. All other values are forbidden.
     * If this method is invoked, then
     * line separators in the character data will be changed to this
     * string. Line separators in attribute values will be changed
     * to the hexadecimal numeric character references corresponding
     * to this string.
     * </p>
     *
     * <p>
     * The default line separator is <code>"\r\n"</code>. However,
     * line separators in character data and attribute values are not
     * changed to this string unless this method is called first.
     * </p>
     *
     * @param lineSeparator the line separator to set
     *
     * @throws IllegalArgumentException if you attempt to use any line
     * separator other than <code>"\n"</code>, <code>"\r"</code>,
     * or <code>"\r\n"</code>.
     */
    public void setLineSeparator(String lineSeparator) {
        escaper.setLineSeparator(lineSeparator);
    }
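A sketch of the rules described for `setLineSeparator`, under the assumption that CR LF, a lone CR, and a lone LF in the input each count as one line break. The class and method names are hypothetical, and the two-digit reference forms are an illustrative choice.

```java
public class LineSeparatorSketch {

    /** Rejects anything other than the three permitted separators. */
    public static String checkSeparator(String sep) {
        if (!"\n".equals(sep) && !"\r".equals(sep) && !"\r\n".equals(sep)) {
            throw new IllegalArgumentException("Illegal line separator");
        }
        return sep;
    }

    /** Character data: every CR LF, CR, or LF becomes sep. */
    public static String normalizeText(String text, String sep) {
        checkSeparator(sep);
        String lf = text.replace("\r\n", "\n").replace('\r', '\n');  // unify on LF first
        return lf.replace("\n", sep);
    }

    /** Attribute values: every line break becomes the references for sep. */
    public static String normalizeAttribute(String value, String sep) {
        String refs = checkSeparator(sep).replace("\r", "&#x0D;").replace("\n", "&#x0A;");
        String lf = value.replace("\r\n", "\n").replace('\r', '\n');
        return lf.replace("\n", refs);
    }

    public static void main(String[] args) {
        System.out.println(normalizeText("a\r\nb\rc\nd", "\n").equals("a\nb\nc\nd")); // true
        System.out.println(normalizeAttribute("x\ny", "\r\n"));                       // x&#x0D;&#x0A;y
    }
}
```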

    /**
     * <p>
     * Returns the preferred maximum line length.
     * </p>
     *
     * @return the preferred maximum line length.
     */
    public int getMaxLength() {
        return escaper.getMaxLength();
    }

    /**
     * <p>
     * Sets the suggested maximum line length for this serializer.
     * Setting this to 0 indicates that no automatic wrapping is to be
     * performed. When a line approaches this length, the serializer
     * begins looking for opportunities to break the line. Generally
     * it will break on any ASCII white space character (tab, carriage
     * return, linefeed, and space). In some circumstances the
     * serializer may not be able to break the line before the maximum
     * length is reached. For instance, if an element name is longer
     * than the maximum line length the only way to correctly
     * serialize it is to exceed the maximum line length. In this case,
     * the serializer will exceed the maximum line length.
     * </p>
     *
     * <p>
     * The default value for maximum line length is 0, which is
     * interpreted as no maximum line length.
     * Setting this to a negative value just sets it to 0.
     * </p>
     *
     * <p>
     * When this variable is set to a value greater than 0,
     * the serializer does not preserve white space. Spaces,
     * tabs, carriage returns, and line feeds can all be
     * interchanged at the serializer's discretion.
     * Carriage returns, line feeds, and tabs will not be
     * escaped with numeric character references.
     * </p>
     *
     * <p>
     * Inside elements with an <code>xml:space="preserve"</code>
     * attribute, the maximum line length is not enforced,
     * regardless of the setting of this property, unless,
     * of course, an <code>xml:space="default"</code> attribute
     * overrides the <code>xml:space="preserve"</code> attribute.
     * </p>
     *
     * @param maxLength the preferred maximum line length
     */
    public void setMaxLength(int maxLength) {
        escaper.setMaxLength(maxLength);
    }
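The wrapping behaviour described above resembles a greedy word wrap that breaks only at ASCII white space. The hypothetical `LineWrapSketch` below shows that idea on plain text: it collapses each run of white space to a single space, which the documentation permits once a maximum length is set, and it lets an overlong token exceed the limit rather than split it. It ignores markup and `xml:space` entirely.

```java
import java.util.ArrayList;
import java.util.List;

public class LineWrapSketch {

    /** Greedy wrap on space, tab, CR, and LF; maxLength <= 0 means no wrapping. */
    public static List<String> wrap(String text, int maxLength) {
        List<String> lines = new ArrayList<>();
        if (maxLength <= 0) {
            lines.add(text);
            return lines;
        }
        StringBuilder line = new StringBuilder();
        for (String token : text.split("[ \t\r\n]+")) {
            if (token.isEmpty()) continue;               // leading white space yields an empty token
            if (line.length() > 0 && line.length() + 1 + token.length() > maxLength) {
                lines.add(line.toString());              // break before the token that would overflow
                line.setLength(0);
            }
            if (line.length() > 0) line.append(' ');
            line.append(token);                          // an overlong token still goes on one line
        }
        if (line.length() > 0) lines.add(line.toString());
        return lines;
    }

    public static void main(String[] args) {
        System.out.println(wrap("the quick brown fox", 10));      // [the quick, brown fox]
        System.out.println(wrap("a supercalifragilistic b", 5));  // [a, supercalifragilistic, b]
    }
}
```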

    /**
     * <p>
     * Returns true if this serializer preserves the original
     * base URIs by inserting extra <code>xml:base</code> attributes.
     * </p>
     *
     * @return true if this <code>Serializer</code> inserts
     * extra <code>xml:base</code> attributes to attempt to
     * preserve base URI information from the document.
     */
    public boolean getPreserveBaseURI() {
        return preserveBaseURI;
    }

    /**
     * <p>
     * Determines whether this serializer inserts
     * extra <code>xml:base</code> attributes to attempt to
     * preserve base URI information from the document.
     * The default is false; that is, base URI information is
     * not preserved.
     * <code>xml:base</code> attributes that have been explicitly
     * added to an element are always output. This property only
     * determines whether or not extra <code>xml:base</code>
     * attributes are added.
     * </p>
     *
     * @param preserve true if <code>xml:base</code>
     * attributes should be added as necessary
     * to preserve base URI information
     */
    public void setPreserveBaseURI(boolean preserve) {
        this.preserveBaseURI = preserve;
    }
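To make "extra `xml:base` attributes" concrete, the hypothetical sketch below writes a tiny tree of empty elements and adds `xml:base` only where an element's base URI differs from the one it would otherwise inherit. It assumes absolute URIs are written verbatim, and it ignores explicitly present `xml:base` attributes and attribute escaping; XOM's actual algorithm may differ, for example by writing relative URIs.

```java
import java.util.Arrays;
import java.util.List;

public class BaseURISketch {

    /** Hypothetical minimal element: a name, its base URI, and child elements. */
    public static final class El {
        final String name;
        final String baseURI;
        final List<El> children;

        public El(String name, String baseURI, El... children) {
            this.name = name;
            this.baseURI = baseURI;
            this.children = Arrays.asList(children);
        }
    }

    /** documentBase is the base URI the root element would inherit. */
    public static String serialize(El root, String documentBase, boolean preserveBaseURI) {
        StringBuilder out = new StringBuilder();
        write(root, documentBase, preserveBaseURI, out);
        return out.toString();
    }

    private static void write(El e, String inheritedBase, boolean preserve, StringBuilder out) {
        out.append('<').append(e.name);
        // add xml:base only where the element's base URI would otherwise be lost
        if (preserve && e.baseURI != null && !e.baseURI.equals(inheritedBase)) {
            out.append(" xml:base=\"").append(e.baseURI).append('"');
        }
        if (e.children.isEmpty()) {
            out.append("/>");
            return;
        }
        out.append('>');
        for (El child : e.children) {
            write(child, e.baseURI, preserve, out);
        }
        out.append("</").append(e.name).append('>');
    }

    public static void main(String[] args) {
        // e.g. a chapter whose content came from another file
        El book = new El("book", "http://example.com/book.xml",
                         new El("chapter", "http://example.com/ch1.xml"));
        System.out.println(serialize(book, "http://example.com/book.xml", true));
        // <book><chapter xml:base="http://example.com/ch1.xml"/></book>
        System.out.println(serialize(book, "http://example.com/book.xml", false));
        // <book><chapter/></book>
    }
}
```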

    /**
     * <p>
     * Returns the name of the character encoding used by
     * this serializer.
     * </p>
     *
     * @return the encoding used for the output document
     */
    public String getEncoding() {
        return escaper.getEncoding();
    }

    /**
     * <p>
     * If true, this property indicates serialization will
     * perform Unicode normalization on all data using normalization
     * form C (NFC). Performing Unicode normalization may change the
     * document's infoset. The default is false; do not normalize.
     * This version is based on Unicode 4.0.
     * </p>
     *
     * <p>
     * This feature has not yet been benchmarked or optimized.
     * It may result in substantially slower code.
     * </p>
     *
     * <p>
     * If all your data is in the first 256 code points of Unicode
     * (i.e. the ISO-8859-1, Latin-1 character set), then it's
     * already in normalization form C and normalizing won't change
     * anything.
     * </p>
     *
     * @param normalize true if normalization is performed;
     * false if it isn't
     */
    public void setUnicodeNormalizationFormC(boolean normalize) {
        escaper.setNFC(normalize);
    }
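The effect of NFC can be seen with the JDK's `java.text.Normalizer`, which is not necessarily the implementation XOM uses and whose Unicode version depends on the JDK. The hypothetical `NFCDemo` composes `e` plus a combining acute accent into the single character `é`, and it checks the claim that text limited to the first 256 code points is already in NFC.

```java
import java.text.Normalizer;

public class NFCDemo {

    /** Normalizes to Unicode normalization form C using the JDK. */
    public static String toNFC(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFC);
    }

    /** True if a string holding every code point U+0000..U+00FF is already NFC. */
    public static boolean latin1IsAlreadyNFC() {
        StringBuilder sb = new StringBuilder(256);
        for (char c = 0; c < 256; c++) {
            sb.append(c);
        }
        return Normalizer.isNormalized(sb, Normalizer.Form.NFC);
    }

    public static void main(String[] args) {
        String decomposed = "e\u0301";        // 'e' + COMBINING ACUTE ACCENT
        String composed = toNFC(decomposed);
        System.out.println(decomposed.length() + " -> " + composed.length()); // 2 -> 1
        System.out.println(composed.equals("\u00e9"));                         // true
        System.out.println(latin1IsAlreadyNFC());                              // true
    }
}
```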

    /**
     * <p>
     * Indicates whether serialization will
     * perform Unicode normalization on all data using normalization
     * form C (NFC). The default is false; do not normalize.
     * </p>
     *
     * @return true if this serializer performs Unicode
     * normalization; false if it doesn't
     */
    public boolean getUnicodeNormalizationFormC() {
        return escaper.getNFC();
    }

    /**
     * <p>
     * Returns the current column number of the output stream. This
     * method is useful for subclasses that implement their own pretty
     * printing strategies by inserting white space and line breaks
     * at appropriate points.
     * </p>
     *
     * <p>
     * Columns are counted based on Unicode characters, not Java
     * chars. A surrogate pair counts as one character in this
     * context, not two. However, a character followed by a
     * combining character (e.g. e followed by a combining acute
     * accent) counts as two characters. This latter choice
     * (treating combining characters like regular characters)
     * is under review, and may change in the future if it's not
     * too big a performance hit.
     * </p>
     *
     * @return the current column number
     */
    protected final int getColumnNumber() {
        return escaper.getColumnNumber();
    }


}
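Counting columns by code points rather than `char`s corresponds to `String.codePointCount`. The hypothetical helper below returns the number of characters written since the last line break. Whether XOM's own column count starts at 0 or 1 is not stated here, so treat the exact numbering as an assumption.

```java
public class ColumnSketch {

    /**
     * Number of Unicode characters after the last CR or LF in output.
     * A surrogate pair counts once; a combining mark counts as its own character.
     */
    public static int columnAfter(String output) {
        int lastBreak = Math.max(output.lastIndexOf('\n'), output.lastIndexOf('\r'));
        return output.codePointCount(lastBreak + 1, output.length());
    }

    public static void main(String[] args) {
        System.out.println(columnAfter("<a>\r\n  <b/>"));  // 6
        System.out.println(columnAfter("\uD835\uDC00"));   // 1 (U+1D400 is one surrogate pair)
        System.out.println(columnAfter("e\u0301"));        // 2 (base letter + combining mark)
    }
}
```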