- <?xml version="1.0" encoding="UTF-8" standalone="no"?>
- <!--
- Licensed to the Apache Software Foundation (ASF) under one or more
- contributor license agreements. See the NOTICE file distributed with
- this work for additional information regarding copyright ownership.
- The ASF licenses this file to You under the Apache License, Version 2.0
- (the "License"); you may not use this file except in compliance with
- the License. You may obtain a copy of the License at
-
- http://www.apache.org/licenses/LICENSE-2.0
-
- Unless required by applicable law or agreed to in writing, software
- distributed under the License is distributed on an "AS IS" BASIS,
- WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
- See the License for the specific language governing permissions and
- limitations under the License.
- -->
- <!-- $Id$ -->
- <!DOCTYPE document PUBLIC "-//APACHE//DTD Documentation V1.3//EN" "http://forrest.apache.org/dtd/document-v13.dtd">
- <document>
- <header>
- <title>Apache™ FOP Design: Input Parsing</title>
- <version>$Revision$</version>
- </header>
- <body>
- <section id="intro">
- <title>Introduction</title>
- <p>Parsing is the process of reading the XSL-FO input and making the information in it available to Apache™ FOP.</p>
- </section>
- <section id="input">
- <title>SAX for Input</title>
- <p>The two standard ways of dealing with XML input are SAX and DOM.
- SAX emits events as it parses an XML document serially; a program using SAX (and not storing anything itself) sees only a small window of the document at any point in time and can never look ahead in the document.
- DOM creates and stores a tree representation of the document, giving a view of the entire document as an integrated whole.
- One point that may seem counter-intuitive to new FOP developers, and that has from time to time been contentious, is that FOP uses SAX for input.
- (DOM can also be used as input, but it is converted into SAX events before entering FOP, which effectively negates its advantages.)</p>
- <p>Since FOP essentially needs a tree representation of the FO input, at first glance it seems to make sense to use DOM.
- Instead, FOP takes SAX events and builds its own tree-like structure. Why?</p>
- <ul>
- <li>DOM has a relatively large memory footprint. FOP's FO Tree is a lighter-weight structure.</li>
- <li>DOM holds an entire document. FOP can process individual fo:page-sequence objects discretely, without needing the entire document in memory. For documents that have only one fo:page-sequence object, FOP's approach offers no advantage, but in other cases the advantage is large. A 500-page book that is broken into 100 five-page chapters, each in its own fo:page-sequence, needs at any one time only about 1% of the document memory that would be required if using DOM as input.</li>
- </ul>
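- <p>As a minimal sketch of the second point, the skeleton below shows an XSL-FO document split into two fo:page-sequence objects. The master name "chapter-page", the page dimensions, and the chapter text are placeholders chosen for illustration. The assumption, taken from the discussion above, is that each fo:page-sequence can be handled on its own once its end tag has been seen, so the first chapter need not stay in memory while the second is read.</p>
- <source><![CDATA[
- <fo:root xmlns:fo="http://www.w3.org/1999/XSL/Format">
-   <fo:layout-master-set>
-     <!-- "chapter-page" is an illustrative master name -->
-     <fo:simple-page-master master-name="chapter-page"
-                            page-height="29.7cm" page-width="21cm">
-       <fo:region-body/>
-     </fo:simple-page-master>
-   </fo:layout-master-set>
-   <!-- Chapter 1: a self-contained unit of work -->
-   <fo:page-sequence master-reference="chapter-page">
-     <fo:flow flow-name="xsl-region-body">
-       <fo:block>Text of chapter 1</fo:block>
-     </fo:flow>
-   </fo:page-sequence>
-   <!-- Chapter 2: read only after chapter 1 has ended -->
-   <fo:page-sequence master-reference="chapter-page">
-     <fo:flow flow-name="xsl-region-body">
-       <fo:block>Text of chapter 2</fo:block>
-     </fo:flow>
-   </fo:page-sequence>
- </fo:root>
- ]]></source>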
- <p>See the <link href="../../trunk/embedding.html#input">Input Section of the User Embedding Document</link> for a discussion of input usage patterns and some implementation details.</p>
- <p>FOP's <link href="fotree.html">FO Tree Mechanism</link> is responsible for catching the SAX events and processing them.</p>
- </section>
- <section id="validation">
- <title>Validation</title>
- <p>If the input XML is not well-formed, the parser reports the error.</p>
- <p>There is no DTD for XSL-FO, so no formal validation is possible at the parser level.</p>
- <p>The SAX handler will report an error for unrecognized <link href="#namespaces">namespaces</link>.</p>
- </section>
- <section id="namespaces">
- <title>Namespaces</title>
- <p>To allow for extensions to the XSL-FO language, FOP provides a mechanism for handling foreign namespaces.</p>
- <p>See <link href="../../trunk/extensions.html">User Extensions</link> for a discussion of standard extensions shipped with FOP, and their related namespaces.</p>
- <p>See <link href="../extensions.html">Developer Extensions</link> for a discussion of the mechanisms in place to allow developers to add their own extensions, including how to tell FOP about the foreign namespace.</p>
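- <p>For orientation, the fragment below sketches what a foreign namespace looks like in the input. The prefix "ext", the namespace URI "http://example.org/my-extension", and the element "ext:note" are all hypothetical and are not shipped with FOP. Unless FOP has been told about such a namespace through the mechanisms described in the documents linked above, it would be treated as unrecognized.</p>
- <source><![CDATA[
- <fo:root xmlns:fo="http://www.w3.org/1999/XSL/Format"
-          xmlns:ext="http://example.org/my-extension">
-   <!-- layout-master-set and page-sequence wrappers omitted for brevity -->
-   <fo:block>
-     Ordinary XSL-FO content
-     <!-- hypothetical extension element from the foreign namespace -->
-     <ext:note>handled by an extension, if one is registered</ext:note>
-   </fo:block>
- </fo:root>
- ]]></source>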
- </section>
- <section id="status">
- <title>Status</title>
- <section id="status-todo">
- <title>To Do</title>
- </section>
- <section id="status-wip">
- <title>Work In Progress</title>
- </section>
- <section id="status-complete">
- <title>Completed</title>
- <ul>
- <li>Better handling of unknown XML and XML from an unknown namespace</li>
- <li>Changed extensions to allow for external XML</li>
- <li>Can have a default element mapping for extensions</li>
- </ul>
- </section>
- </section>
- </body>
- </document>