path: root/src/documentation/content/xdocs/dev/design/parsing.xml
Diffstat (limited to 'src/documentation/content/xdocs/dev/design/parsing.xml')
-rw-r--r-- src/documentation/content/xdocs/dev/design/parsing.xml | 76
1 files changed, 76 insertions, 0 deletions
diff --git a/src/documentation/content/xdocs/dev/design/parsing.xml b/src/documentation/content/xdocs/dev/design/parsing.xml
new file mode 100644
index 000000000..23f5e5df8
--- /dev/null
+++ b/src/documentation/content/xdocs/dev/design/parsing.xml
@@ -0,0 +1,76 @@
+<?xml version="1.0" standalone="no"?>
+<!--
+ Copyright 1999-2004 The Apache Software Foundation
+
+ Licensed under the Apache License, Version 2.0 (the "License");
+ you may not use this file except in compliance with the License.
+ You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+ Unless required by applicable law or agreed to in writing, software
+ distributed under the License is distributed on an "AS IS" BASIS,
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ See the License for the specific language governing permissions and
+ limitations under the License.
+-->
+<!-- $Id$ -->
+<!DOCTYPE document PUBLIC "-//APACHE//DTD Documentation V1.1//EN"
+ "http://cvs.apache.org/viewcvs.cgi/*checkout*/xml-forrest/src/core/context/resources/schema/dtd/document-v12.dtd">
+<document>
+ <header>
+ <title>FOP Design: Input Parsing</title>
+ <version>$Revision$</version>
+ </header>
+ <body>
+ <section id="intro">
+ <title>Introduction</title>
+ <p>Parsing is the process of reading the XSL-FO input and making the information in it available to FOP.</p>
+ </section>
+ <section id="input">
+ <title>SAX for Input</title>
+ <p>The two standard ways of dealing with XML input are SAX and DOM.
+SAX emits events as it parses an XML document serially; a program using SAX (and not storing anything itself) sees only a small window of the document at any point in time, and can never look ahead in the document.
+DOM creates and stores a tree representation of the document, allowing a view of the entire document as an integrated whole.
+One issue that may seem counter-intuitive to some new FOP developers, and which has from time to time been contentious, is that FOP uses SAX for input.
+(DOM can be used as input as well, but it is converted into SAX events before entering FOP, effectively negating its advantages.)</p>
+ <p>Since FOP essentially needs a tree representation of the FO input, at first glance it seems to make sense to use DOM.
+Instead, FOP takes SAX events and builds its own tree-like structure. Why?</p>
+ <ul>
+ <li>DOM has a relatively large memory footprint. FOP's FO Tree is a lighter-weight structure.</li>
+ <li>DOM contains an entire document. FOP is able to process individual fo:page-sequence objects discretely, without needing the entire document in memory. For documents that have only one fo:page-sequence object, FOP's approach offers no advantage, but in other cases the savings are large. A 500-page book that is broken into 100 five-page chapters, each in its own fo:page-sequence, needs roughly 1% of the document memory that would be required if using DOM as input, because only one chapter is held at a time.</li>
+ </ul>
+ <p>See the <link href="../embedding.html#input">Input Section of the User Embedding Document</link> for a discussion of input usage patterns and some implementation details.</p>
+ <p>FOP's <link href="fotree.html">FO Tree Mechanism</link> is responsible for catching the SAX events and processing them.</p>
+ </section>
+ <section id="validation">
+ <title>Validation</title>
+ <p>If the input XML is not well-formed, the parser reports the error.</p>
+ <p>There is no DTD for XSL-FO, so no formal validation is possible at the parser level.</p>
+ <p>The SAX handler will report an error for unrecognized <link href="#namespaces">namespaces</link>.</p>
+ </section>
+ <section id="namespaces">
+ <title>Namespaces</title>
+ <p>To allow for extensions to the XSL-FO language, FOP provides a mechanism for handling foreign namespaces.</p>
+ <p>See <link href="../extensions.html">User Extensions</link> for a discussion of standard extensions shipped with FOP, and their related namespaces.</p>
+ <p>See <link href="../dev/extensions.html">Developer Extensions</link> for a discussion of the mechanisms in place to allow developers to add their own extensions, including how to tell FOP about the foreign namespace.</p>
+ </section>
+ <section id="status">
+ <title>Status</title>
+ <section id="status-todo">
+ <title>To Do</title>
+ </section>
+ <section id="status-wip">
+ <title>Work In Progress</title>
+ </section>
+ <section id="status-complete">
+ <title>Completed</title>
+ <ul>
+ <li>Better handling of unknown XML and XML from an unknown namespace</li>
+ <li>Changed extensions to allow for external XML</li>
+ <li>Can have a default element mapping for extensions</li>
+ </ul>
+ </section>
+ </section>
+ </body>
+</document>
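
Illustration (not part of the committed file): the memory argument in the "SAX for Input" section, together with the namespace check described under "Validation", can be sketched with the JDK's own SAX API. This is a minimal sketch, not FOP's FO Tree code. The class name PageSequenceSketch, its counters, and the urn:example:ext namespace are hypothetical, and a real handler would build FO objects rather than count elements.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical handler for illustration only; it is not a FOP class. */
class PageSequenceSketch extends DefaultHandler {
    static final String FO_NS = "http://www.w3.org/1999/XSL/Format";

    int held = 0;       // elements currently kept for the open page-sequence
    int peakHeld = 0;   // largest number of elements kept at any one time
    int total = 0;      // elements a whole-document tree would have to keep
    int sequences = 0;  // completed fo:page-sequence objects
    boolean inSequence = false;
    final List<String> unknownNamespaces = new ArrayList<>();

    @Override
    public void startElement(String uri, String local, String qName, Attributes atts) {
        total++;
        // Assumption: anything outside the XSL-FO namespace is "unrecognized" here.
        if (!FO_NS.equals(uri) && !unknownNamespaces.contains(uri)) {
            unknownNamespaces.add(uri);
        }
        if (FO_NS.equals(uri) && "page-sequence".equals(local)) {
            inSequence = true;
        }
        if (inSequence) {
            held++;
            peakHeld = Math.max(peakHeld, held);
        }
    }

    @Override
    public void endElement(String uri, String local, String qName) {
        if (FO_NS.equals(uri) && "page-sequence".equals(local)) {
            // The sequence is complete: it could be laid out now, then released.
            sequences++;
            held = 0;
            inSequence = false;
        }
    }

    static PageSequenceSketch parse(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true); // required to receive namespace URIs
        SAXParser parser = factory.newSAXParser();
        PageSequenceSketch handler = new PageSequenceSketch();
        parser.parse(new InputSource(new StringReader(xml)), handler);
        return handler;
    }

    public static void main(String[] args) throws Exception {
        String xml =
            "<fo:root xmlns:fo='http://www.w3.org/1999/XSL/Format' xmlns:x='urn:example:ext'>"
          + "<fo:page-sequence><fo:flow><fo:block>One</fo:block></fo:flow></fo:page-sequence>"
          + "<fo:page-sequence><fo:flow><fo:block>Two</fo:block><x:note/></fo:flow></fo:page-sequence>"
          + "</fo:root>";
        PageSequenceSketch h = parse(xml);
        System.out.println("sequences=" + h.sequences
            + " peakHeld=" + h.peakHeld + " total=" + h.total
            + " unknown=" + h.unknownNamespaces);
    }
}
```

For the sample input in main, the handler sees two page sequences and never holds more than 4 of the document's 8 elements at once, which is the effect the 500-page book example describes at a larger scale.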