<?xml version="1.0" standalone="no"?>
<!DOCTYPE document PUBLIC "-//APACHE//DTD Documentation V1.1//EN"
"http://cvs.apache.org/viewcvs.cgi/*checkout*/xml-forrest/src/resources/schema/dtd/document-v11.dtd">
<document>
<header>
<title>FOP Design: Input Parsing</title>
</header>
<body>
<section id="intro">
<title>Introduction</title>
<p>Parsing is the process of reading the XSL-FO input and making the information in it available to FOP.</p>
</section>
<section id="input">
<title>SAX for Input</title>
<p>The two standard ways of dealing with XML input are SAX and DOM.
SAX emits events as it parses an XML document serially; a program that uses SAX and stores nothing itself sees only a small window of the document at any point in time, and can never look ahead in the document.
DOM creates and stores a tree representation of the document, allowing a view of the entire document as an integrated whole.
One point that may seem counter-intuitive to new FOP developers, and which has at times been contentious, is that FOP uses SAX for input.
(DOM can be used as input as well, but it is converted into SAX events before entering FOP, which negates its advantages.)</p>
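<p>The conversion of a DOM tree into SAX events can be illustrated with standard JAXP classes. The following is a minimal sketch, not FOP's own code: the class names <code>DomToSaxSketch</code> and <code>RecordingHandler</code> are hypothetical, and the handler only stands in for whatever SAX handler actually receives the events. It assumes that an identity <code>Transformer</code> is an acceptable way to replay a DOM as SAX events.</p>
<source><![CDATA[
```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.sax.SAXResult;
import org.w3c.dom.Document;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class DomToSaxSketch {

    /** Hypothetical stand-in for the real SAX handler: records each element start. */
    static class RecordingHandler extends DefaultHandler {
        final List<String> started = new ArrayList<>();

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            started.add(qName);
        }
    }

    /** Builds a DOM from the given XML, then replays it as SAX events. */
    static List<String> replay(String xml) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true);
        Document dom = dbf.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));

        RecordingHandler handler = new RecordingHandler();
        // An identity transformer copies its source to its result unchanged;
        // with a SAXResult, "copying" means firing SAX events at the handler.
        Transformer identity = TransformerFactory.newInstance().newTransformer();
        identity.transform(new DOMSource(dom), new SAXResult(handler));
        return handler.started;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<fo:root xmlns:fo=\"http://www.w3.org/1999/XSL/Format\">"
                + "<fo:layout-master-set/><fo:page-sequence/></fo:root>";
        System.out.println(replay(xml));
    }
}
```
]]></source>
<p>In this sketch the whole DOM exists in memory before the first SAX event fires, so the handler gains nothing from the tree having been built.</p>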
<p>Since FOP essentially needs a tree representation of the FO input, at first glance it seems to make sense to use DOM.
Instead, FOP takes SAX events and builds its own tree-like structure. Why?</p>
<ul>
<li>DOM has a relatively large memory footprint. FOP's FO Tree is a lighter-weight structure.</li>
<li>DOM contains an entire document. FOP is able to process individual fo:page-sequence objects discretely, without the need to have the entire document in memory. For documents that have only one fo:page-sequence object, FOP's approach offers no advantage, but in other cases the saving is substantial. A 500-page book that is broken into 100 5-page chapters, each in its own fo:page-sequence, needs at any one time only about 1% of the document memory that would be required if using DOM as input.</li>
</ul>
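<p>The second point above can be sketched as a SAX handler that keeps a small tree only for the fo:page-sequence currently being read and drops it once that page-sequence ends. This is an illustration under stated assumptions, not FOP's FO Tree implementation: <code>PageSequenceSketch</code>, <code>Node</code>, and the counters are hypothetical names, and the hand-off to layout is left as a comment.</p>
<source><![CDATA[
```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class PageSequenceSketch {
    static final String FO_NS = "http://www.w3.org/1999/XSL/Format";

    /** Hypothetical lightweight node; not FOP's actual FO Tree class. */
    static class Node {
        final String name;
        final List<Node> children = new ArrayList<>();
        Node(String name) { this.name = name; }
    }

    static class Handler extends DefaultHandler {
        private final Deque<Node> stack = new ArrayDeque<>();
        private int liveNodes = 0;
        int peakLiveNodes = 0;       // most nodes held at any one time
        int sequencesProcessed = 0;  // completed fo:page-sequence objects

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            boolean insideSequence = !stack.isEmpty();
            boolean startsSequence = FO_NS.equals(uri) && "page-sequence".equals(localName);
            if (!insideSequence && !startsSequence) {
                return; // this sketch ignores everything outside a page-sequence
            }
            Node node = new Node(localName);
            if (insideSequence) {
                stack.peek().children.add(node);
            }
            stack.push(node);
            liveNodes++;
            peakLiveNodes = Math.max(peakLiveNodes, liveNodes);
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            if (stack.isEmpty()) {
                return; // end of an element outside any page-sequence
            }
            Node finished = stack.pop();
            if (stack.isEmpty()) {
                // 'finished' is a complete page-sequence subtree. A real processor
                // would lay it out here; afterwards nothing references it any more,
                // so it becomes eligible for garbage collection.
                sequencesProcessed++;
                liveNodes = 0;
            }
        }
    }

    static Handler run(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        Handler handler = new Handler();
        factory.newSAXParser().parse(new InputSource(new StringReader(xml)), handler);
        return handler;
    }
}
```
]]></source>
<p>With this design the peak number of nodes held depends on the largest single fo:page-sequence rather than on the length of the whole document.</p>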
<p>See the <link href="../embedding.html#input">Input Section of the User Embedding Document</link> for a discussion of input usage patterns and some implementation details.</p>
<p>FOP's <link href="fotree.html">FO Tree Mechanism</link> is responsible for catching the SAX events and processing them.</p>
</section>
<section id="validation">
<title>Validation</title>
<p>If the input XML is not well-formed, the parser reports the error.</p>
<p>There is no DTD for XSL-FO, so no formal validation is possible at the parser level.</p>
<p>The SAX handler will report an error for unrecognized <link href="#namespaces">namespaces</link>.</p>
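<p>The two kinds of error described above can be told apart in a plain SAX setup. The following is a minimal sketch and not FOP's handler: <code>ValidationSketch</code>, its <code>KNOWN</code> set, and the returned status strings are hypothetical, and a real list of recognized namespaces would also include any registered extension namespaces.</p>
<source><![CDATA[
```java
import java.io.StringReader;
import java.util.Set;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

class ValidationSketch {
    /** Hypothetical set of recognized namespaces; here only XSL-FO itself. */
    static final Set<String> KNOWN = Set.of("http://www.w3.org/1999/XSL/Format");

    static class Handler extends DefaultHandler {
        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts)
                throws SAXException {
            if (!KNOWN.contains(uri)) {
                throw new SAXException("Unrecognized namespace '" + uri + "' on element " + qName);
            }
        }
    }

    /** Returns "ok", "not well-formed", or "unrecognized namespace". */
    static String check(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        try {
            factory.newSAXParser().parse(new InputSource(new StringReader(xml)), new Handler());
            return "ok";
        } catch (SAXParseException e) {
            // Raised by the parser itself, before the handler can judge the content.
            return "not well-formed";
        } catch (SAXException e) {
            // Raised by the handler above for an element in an unknown namespace.
            return "unrecognized namespace";
        }
    }

    public static void main(String[] args) throws Exception {
        String fo = "xmlns:fo=\"http://www.w3.org/1999/XSL/Format\"";
        System.out.println(check("<fo:root " + fo + "><fo:block/></fo:root>"));
        System.out.println(check("<fo:root " + fo + "><fo:block></fo:root>"));
        System.out.println(check("<fo:root " + fo + "><x:thing xmlns:x=\"urn:example\"/></fo:root>"));
    }
}
```
]]></source>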
</section>
<section id="namespaces">
<title>Namespaces</title>
<p>To allow for extensions to the XSL-FO language, FOP provides a mechanism for handling foreign namespaces.</p>
<p>See <link href="../extensions.html">User Extensions</link> for a discussion of standard extensions shipped with FOP, and their related namespaces.</p>
<p>See <link href="../dev/extenstions.html">Developer Extensions</link> for a discussion of the mechanisms in place to allow developers to add their own extensions, including how to tell FOP about the foreign namespace.</p>
</section>
</body>
</document>