diff options
Diffstat (limited to 'src/documentation/content/xdocs/dev/design/parsing.xml')
-rw-r--r-- | src/documentation/content/xdocs/dev/design/parsing.xml | 76 |
1 files changed, 76 insertions, 0 deletions
diff --git a/src/documentation/content/xdocs/dev/design/parsing.xml b/src/documentation/content/xdocs/dev/design/parsing.xml new file mode 100644 index 000000000..23f5e5df8 --- /dev/null +++ b/src/documentation/content/xdocs/dev/design/parsing.xml @@ -0,0 +1,76 @@ +<?xml version="1.0" standalone="no"?> +<!-- + Copyright 1999-2004 The Apache Software Foundation + + Licensed under the Apache License, Version 2.0 (the "License"); + you may not use this file except in compliance with the License. + You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + + Unless required by applicable law or agreed to in writing, software + distributed under the License is distributed on an "AS IS" BASIS, + WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + See the License for the specific language governing permissions and + limitations under the License. +--> +<!-- $Id$ --> +<!DOCTYPE document PUBLIC "-//APACHE//DTD Documentation V1.1//EN" + "http://cvs.apache.org/viewcvs.cgi/*checkout*/xml-forrest/src/core/context/resources/schema/dtd/document-v12.dtd"> +<document> + <header> + <title>FOP Design: Input Parsing</title> + <version>$Revision$</version> + </header> + <body> + <section id="intro"> + <title>Introduction</title> + <p>Parsing is the process of reading the XSL-FO input and making the information in it available to FOP.</p> + </section> + <section id="input"> + <title>SAX for Input</title> + <p>The two standard ways of dealing with XML input are SAX and DOM. +SAX basically creates events as it parses an XML document in a serial fashion; a program using SAX (and not storing anything internally) will only see a small window of the document at any point in time, and can never look forward in the document. +DOM creates and stores a tree representation of the document, allowing a view of the entire document as an integrated whole. +One issue that may seem counter-intuitive to some new FOP developers, and which has from time to time been contentious, is that FOP uses SAX for input. +(DOM can be used as input as well, but it is converted into SAX events before entering FOP, effectively negating its advantages).</p> + <p>Since FOP essentially needs a tree representation of the FO input, at first glance it seems to make sense to use DOM. +Instead, FOP takes SAX events and builds its own tree-like structure. Why?</p> + <ul> + <li>DOM has a relatively large memory footprint. FOP's FO Tree is a lighter-weight structure.</li> + <li>DOM contains an entire document. FOP is able to process individual fo:page-sequence objects discretely, without the need to have the entire document in memory. For documents that have only one fo:page-sequence object, FOP's approach is no advantage, but in other cases it is a huge advantage. A 500-page book that is broken into 100 5-page chapters, each in its own fo:page-sequence, essentially needs only 1% of the document memory that would be required if using DOM as input.</li> + </ul> + <p>See the <link href="../embedding.html#input">Input Section of the User Embedding Document</link> for a discussion of input usage patterns and some implementation details.</p> + <p>FOP's <link href="fotree.html">FO Tree Mechanism</link> is responsible for catching the SAX events and processing them.</p> + </section> + <section id="validation"> + <title>Validation</title> + <p>If the input XML is not well-formed, that will be reported.</p> + <p>There is no DTD for XSL-FO, so no formal validation is possible at the parser level.</p> + <p>The SAX handler will report an error for unrecognized <link href="#namespaces">namespaces</link>.</p> + </section> + <section id="namespaces"> + <title>Namespaces</title> + <p>To allow for extensions to the XSL-FO language, FOP provides a mechanism for handling foreign namespaces.</p> + <p>See <link href="../extensions.html">User Extensions</link> for a discussion of standard extensions shipped with FOP, and their related namespaces.</p> + <p>See <link href="../dev/extensions.html">Developer Extensions</link> for a discussion of the mechanisms in place to allow developers to add their own extensions, including how to tell FOP about the foreign namespace.</p> + </section> + <section id="status"> + <title>Status</title> + <section id="status-todo"> + <title>To Do</title> + </section> + <section id="status-wip"> + <title>Work In Progress</title> + </section> + <section id="status-complete"> + <title>Completed</title> + <ul> + <li>better handling of unknown xml and xml from an unknown namespace</li> + <li>Changed extensions to allow for external xml</li> + <li>Can have a default element mapping for extensions</li> + </ul> + </section> + </section> + </body> +</document> |