XML Space Stripper
Usage
To use this program, select input and output files using the appropriate buttons.
Start the program by clicking the Go button. If all goes well,
the window should display some statistics. Otherwise, it will show you a
list of errors, as generated by the Xerces parser.
The meaning of the various controls is as follows:
- Strip whitespace according to DTD or Schema
If this box is
checked, whitespace content of element-only content models will be
stripped. In other words, if the content model of an element only
allows child elements, intervening whitespace will be stripped.
However, if the attribute xml:space="preserve" was
specified on the element, all whitespace is preserved.
- Strip whitespace according to Convention
If this box is
checked, whitespace directly following an opening tag and whitespace
directly preceding a closing tag is stripped, even in elements with
mixed content models. However, if the attribute
xml:space="preserve" was specified on the
element, all whitespace is preserved. Note that non-breaking spaces
do not count as whitespace.
As an example, the following
XML sample:
<p> aaaa <b> bbbb </b> cccc </p>
will
be transformed into:
<p>aaaa <b>bbbb</b> cccc</p>
- Suppress parser messages for warnings/errors
Use these boxes
to suppress warning and error output from the Xerces parser.
- Output character encoding
This listbox specifies the encoding
for the output file. Note that ISO-8859-12 was omitted on purpose.
There is no UTF-16 yet, due to lack of knowledge on how to deal with
big-endian and little-endian byte order marks (BOMs).
- Character references in Decimal, Hexadecimal or HEXADECIMAL
This listbox specifies the digits used for character references. For
example, character number 255 can be written as
&#255;, &#xff; or
&#xFF; respectively.
- Character references only if needed, for characters > 126, for
characters > 255
This listbox specifies when
to use normal characters and when to use character references
instead. Note that this setting will be silently adjusted if the
encoding requires it.
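For readers who want to see the effect of "Strip whitespace according to DTD or Schema" outside this program, the following is a minimal sketch using the standard JAXP API. It is not the code of this program. The class name DtdStripSketch and the sample document are made up for illustration, and the sketch does not look at xml:space at all.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Illustrative sketch only; class name and sample document are hypothetical.
class DtdStripSketch {

    // "list" has an element-only content model, "item" allows text.
    static final String SAMPLE =
        "<!DOCTYPE list [\n"
      + "  <!ELEMENT list (item*)>\n"
      + "  <!ELEMENT item (#PCDATA)>\n"
      + "]>\n"
      + "<list>\n"
      + "  <item> a </item>\n"
      + "  <item>b</item>\n"
      + "</list>";

    // Parses with DTD validation; optionally drops whitespace-only text
    // inside elements whose content model allows child elements only.
    static Document parse(String xml, boolean stripElementContentWhitespace) {
        try {
            DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
            f.setValidating(true); // the parser needs the DTD to know the content models
            f.setIgnoringElementContentWhitespace(stripElementContentWhitespace);
            return f.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        } catch (Exception ex) {
            throw new IllegalStateException(ex);
        }
    }

    public static void main(String[] args) {
        Element kept = parse(SAMPLE, false).getDocumentElement();
        Element stripped = parse(SAMPLE, true).getDocumentElement();
        // Without stripping, the whitespace between the items is kept as text nodes.
        System.out.println(kept.getChildNodes().getLength());     // 5
        // With stripping, only the two item elements remain.
        System.out.println(stripped.getChildNodes().getLength()); // 2
        // Whitespace inside "item" is untouched, because "item" allows text.
        System.out.println("[" + stripped.getFirstChild().getTextContent() + "]"); // [ a ]
    }
}
```

Whitespace inside item survives because its content model allows text; only the line breaks and indentation directly inside list are dropped.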
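The rule behind "Strip whitespace according to Convention" can also be sketched on a DOM tree. Again, this is only an illustration under stated assumptions, not the code of this program: the class and method names are hypothetical, whitespace is taken to be space, tab, carriage return and line feed, and xml:space is assumed to be inherited by descendant elements, as the XML Recommendation describes.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Illustrative sketch only; names are hypothetical.
class ConventionStripSketch {

    // Strips whitespace after the opening tag and before the closing tag of
    // every element, unless xml:space="preserve" is in effect (assumed inherited).
    static void strip(Element e, boolean preserve) {
        String space = e.getAttribute("xml:space");
        if (space.equals("preserve")) preserve = true;
        else if (space.equals("default")) preserve = false;

        if (!preserve) {
            Node first = e.getFirstChild();
            if (first != null && first.getNodeType() == Node.TEXT_NODE) {
                // Non-breaking spaces (U+00A0) are not in the class, so they stay.
                first.setNodeValue(first.getNodeValue().replaceFirst("^[ \\t\\r\\n]+", ""));
            }
            Node last = e.getLastChild();
            if (last != null && last.getNodeType() == Node.TEXT_NODE) {
                last.setNodeValue(last.getNodeValue().replaceFirst("[ \\t\\r\\n]+\\z", ""));
            }
        }
        for (Node c = e.getFirstChild(); c != null; c = c.getNextSibling()) {
            if (c instanceof Element) strip((Element) c, preserve);
        }
    }

    // Parses the XML text, applies the rule and serializes the result.
    static String stripXml(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            strip(doc.getDocumentElement(), false);
            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            t.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception ex) {
            throw new IllegalStateException(ex);
        }
    }

    public static void main(String[] args) {
        // The sample from the description of the option.
        System.out.println(stripXml("<p> aaaa <b> bbbb </b> cccc </p>"));
        // prints: <p>aaaa <b>bbbb</b> cccc</p>
    }
}
```

Note that the space before the b element and the space after it are kept, because they neither follow an opening tag nor precede a closing tag directly.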
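The two character reference settings can be illustrated with a short sketch as well. It shows one plausible way to combine the digit style, the threshold and the output encoding; how this program does it internally is not documented here. The names CharRefSketch, Digits, ONLY_IF_NEEDED and encodeText are hypothetical, and the sketch does not escape markup characters such as < and &.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.StandardCharsets;

// Illustrative sketch only; names are hypothetical.
class CharRefSketch {

    enum Digits { DECIMAL, HEX_LOWER, HEX_UPPER }

    // Threshold that no character exceeds: references only if the encoding needs them.
    static final int ONLY_IF_NEEDED = Integer.MAX_VALUE;

    // Formats one character reference in the requested digit style.
    static String charRef(int codePoint, Digits digits) {
        switch (digits) {
            case DECIMAL:   return "&#" + codePoint + ";";
            case HEX_LOWER: return "&#x" + Integer.toHexString(codePoint) + ";";
            default:        return "&#x" + Integer.toHexString(codePoint).toUpperCase() + ";";
        }
    }

    // Writes a reference for every character above the threshold, and also for
    // every character the output encoding cannot represent (the assumed
    // meaning of "silently adjusted if the encoding requires it").
    static String encodeText(String text, Charset charset, int threshold, Digits digits) {
        CharsetEncoder encoder = charset.newEncoder();
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            i += Character.charCount(cp);
            String ch = new String(Character.toChars(cp));
            if (cp > threshold || !encoder.canEncode(ch)) {
                out.append(charRef(cp, digits));
            } else {
                out.append(ch);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(charRef(255, Digits.DECIMAL));   // &#255;
        System.out.println(charRef(255, Digits.HEX_LOWER)); // &#xff;
        System.out.println(charRef(255, Digits.HEX_UPPER)); // &#xFF;
        // Character 255 followed by the euro sign, written as US-ASCII:
        // neither fits the encoding, so both become references.
        System.out.println(encodeText("\u00ff\u20ac", StandardCharsets.US_ASCII,
                ONLY_IF_NEEDED, Digits.DECIMAL));            // &#255;&#8364;
    }
}
```

With a threshold of 126 every character outside the printable ASCII range is written as a reference, which keeps the output readable in any editor regardless of the chosen encoding.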
Acknowledgements
This tiny program was brought to you by Pieter Masereeuw (http://www.masereeuw.nl).
The parser used by this program is the Java Xerces parser
(http://xerces.apache.org/xerces-j/).