XML Parsing

- XML Input -

The xml document is always handled internally as SAX. The SAX events - are used to read the elements, attributes and text data of the FO document. - After the manipulation of the data the renderer writes out the pages in the - appropriate format. It may write as it goes, a page at a time or the whole - document at once. Once finished the document should contain all the data in the - chosen format ready for whatever use.

FOP can take the input XML in a number of ways:

+ Introduction +

Parsing is the process of reading the XSL-FO input and making the information in it available to FOP.

+ SAX for Input +

The two standard ways of dealing with XML input are SAX and DOM. +SAX basically creates events as it parses an XML document in a serial fashion; a program using SAX (and not storing anything internally) will only see a small window of the document at any point in time, and can never look forward in the document. +DOM creates and stores a tree representation of the document, allowing a view of the entire document as an integrated whole. +One issue that may seem counter-intuitive to some new FOP developers, and which has from time to time been contentious, is that FOP uses SAX for input. +(DOM can be used as input as well, but it is converted into SAX events before entering FOP, effectively negating its advantages).
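The DOM-to-SAX conversion mentioned above can be illustrated with the JDK alone. This is a minimal sketch, not FOP code: the class name `DomToSaxSketch` and its `Recorder` handler are hypothetical stand-ins for whatever consumes the events, and the identity transform is just one way to replay a DOM tree as SAX events.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.sax.SAXResult;
import org.w3c.dom.Document;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical illustration; not part of FOP.
class DomToSaxSketch {

    /** Stand-in for a SAX consumer: it only ever sees one event at a time. */
    static class Recorder extends DefaultHandler {
        final List<String> seen = new ArrayList<>();

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            // Prefer the qualified name; fall back to the local name if it is empty.
            seen.add(qName == null || qName.isEmpty() ? localName : qName);
        }
    }

    /** Builds a full DOM tree, then replays it as SAX events via an identity transform. */
    static List<String> replay(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        Document dom = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));

        Recorder recorder = new Recorder();
        TransformerFactory.newInstance().newTransformer()
                .transform(new DOMSource(dom), new SAXResult(recorder));
        return recorder.seen;
    }

    public static void main(String[] args) throws Exception {
        String fo = "<fo:root xmlns:fo=\"http://www.w3.org/1999/XSL/Format\">"
                + "<fo:layout-master-set/><fo:page-sequence/></fo:root>";
        // Prints the element names in the order the SAX events arrived.
        System.out.println(replay(fo));
    }
}
```

By the time the handler runs, the whole tree has already been built in memory, which is why feeding a DOM through SAX gives up the usual DOM advantages without saving anything.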

Since FOP essentially needs a tree representation of the FO input, at first glance it seems to make sense to use DOM. +Instead, FOP takes SAX events and builds its own tree-like structure. Why?

SAX Events through SAX Handler: FOTreeBuilder is the SAX Handler which is obtained through getContentHandler on Driver.
DOM (which is converted into SAX Events): The conversion of a DOM tree is done via the render(Document) method on Driver.
Data Source (which is parsed and converted into SAX Events): The Driver can take an InputSource as input. -This can use a Stream, String etc.
XML+XSLT Transformation (which is transformed using an XSLT Processor and the result is fired as SAX Events): XSLTInputHandler is used as an InputSource in the render(XMLReader, InputSource) method on Driver.
DOM has a relatively large memory footprint. FOP's FO Tree is a lighter-weight structure.
DOM contains an entire document. FOP is able to process individual fo:page-sequence objects discretely, without the need to have the entire document in memory. For documents that have only one fo:page-sequence object, FOP's approach offers no advantage, but in other cases the savings are substantial. A 500-page book that is broken into 100 five-page chapters, each in its own fo:page-sequence, needs roughly 1% of the document memory that would be required if using DOM as input, since only one chapter is held at a time.
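The memory argument can be made concrete with a small SAX handler that keeps nodes only while it is inside a fo:page-sequence and releases them when the sequence ends. This is a sketch under stated assumptions: `PageSequenceWindowSketch`, its `Handler`, and the `book` helper are hypothetical, the node list is a crude stand-in for FOP's FO Tree, and the generated input is deliberately simplified (it is well-formed but not a complete XSL-FO document).

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical illustration; FOP's real tree building is more involved.
class PageSequenceWindowSketch {
    static final String FO_NS = "http://www.w3.org/1999/XSL/Format";

    static class Handler extends DefaultHandler {
        private List<String> current;   // nodes of the page-sequence being built, else null
        int total;                      // nodes seen inside all page-sequences
        int peakHeld;                   // largest number of nodes held at once
        int sequencesProcessed;

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            if (FO_NS.equals(uri) && "page-sequence".equals(localName)) {
                current = new ArrayList<>();
            }
            if (current != null) {
                current.add(localName);
                total++;
                peakHeld = Math.max(peakHeld, current.size());
            }
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            if (FO_NS.equals(uri) && "page-sequence".equals(localName)) {
                // A real processor would lay out and render the sequence here.
                sequencesProcessed++;
                current = null;         // release the subtree before the next sequence
            }
        }
    }

    /** Generates a simplified document: one page-sequence per chapter. */
    static String book(int chapters, int blocksPerChapter) {
        StringBuilder sb = new StringBuilder("<fo:root xmlns:fo=\"" + FO_NS + "\">");
        for (int c = 0; c < chapters; c++) {
            sb.append("<fo:page-sequence>");
            for (int b = 0; b < blocksPerChapter; b++) {
                sb.append("<fo:block/>");
            }
            sb.append("</fo:page-sequence>");
        }
        return sb.append("</fo:root>").toString();
    }

    static Handler parse(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        Handler handler = new Handler();
        factory.newSAXParser().parse(new InputSource(new StringReader(xml)), handler);
        return handler;
    }

    public static void main(String[] args) throws Exception {
        Handler h = parse(book(100, 5));
        System.out.println("total=" + h.total + " peakHeld=" + h.peakHeld
                + " sequences=" + h.sequencesProcessed);
    }
}
```

With 100 chapters of equal size, the peak number of nodes held is one chapter's worth, that is, one hundredth of the total, which is the 1% figure quoted above.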

The SAX Events which are fired on the SAX Handler, class FOTreeBuilder, must represent an XSL-FO document. -If not there will be an error. -Any problems with the XML being well-formed are also handled here.

See the Input Section of the User Embedding Document for a discussion of input usage patterns and some implementation details.

- Element Mappings +

+ Validation +

If the input XML is not well-formed, that will be reported.

There is no DTD for XSL-FO, so no formal validation is possible at the parser level.

The SAX handler will report an error for unrecognized namespaces.
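One way a SAX handler can report unrecognized namespaces is to check each element's namespace URI against a set of known ones and raise a SAXException otherwise. The sketch below is not FOTreeBuilder: `NamespaceCheckSketch` and `Checker` are hypothetical names, and the set of accepted namespaces (XSL-FO plus SVG) is an assumption made for illustration.

```java
import java.io.StringReader;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical illustration; not the actual FOP handler.
class NamespaceCheckSketch {
    // Assumed list of accepted namespaces, for illustration only.
    static final Set<String> KNOWN = new HashSet<>(Arrays.asList(
            "http://www.w3.org/1999/XSL/Format",
            "http://www.w3.org/2000/svg"));

    static class Checker extends DefaultHandler {
        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts)
                throws SAXException {
            if (!KNOWN.contains(uri)) {
                throw new SAXException(
                        "Unrecognized namespace '" + uri + "' on element " + qName);
            }
        }
    }

    /** Returns null if the input is accepted, otherwise the error message. */
    static String check(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        try {
            factory.newSAXParser().parse(new InputSource(new StringReader(xml)), new Checker());
            return null;
        } catch (SAXException e) {
            // Covers both the namespace error and well-formedness errors from the parser.
            return e.getMessage();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(check(
                "<fo:root xmlns:fo=\"http://www.w3.org/1999/XSL/Format\"/>"));
        System.out.println(check("<x:root xmlns:x=\"urn:example:unknown\"/>"));
    }
}
```

Input that is not well-formed surfaces through the same SAXException path, since the parser reports it before the handler sees the offending element.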

+ Namespaces

The element mapping is a hashmap of all the elements in a particular namespace. This makes it easy to create a different object for each element. Element mappings are static to save on memory.
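The idea of a static per-namespace hashmap can be sketched as follows. All names here (`ElementMappingSketch`, `Node`, `Block`, `PageSequence`, `create`) are hypothetical and chosen for illustration; FOP's actual element mapping classes are not reproduced.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical illustration of a static element mapping for one namespace.
class ElementMappingSketch {

    interface Node {
        String describe();
    }

    static final class Block implements Node {
        public String describe() { return "fo:block"; }
    }

    static final class PageSequence implements Node {
        public String describe() { return "fo:page-sequence"; }
    }

    // Static, so the table is built once and shared by every document processed.
    private static final Map<String, Supplier<Node>> FO_ELEMENTS = new HashMap<>();

    static {
        FO_ELEMENTS.put("block", Block::new);
        FO_ELEMENTS.put("page-sequence", PageSequence::new);
    }

    /** Looks up the maker for an element name and creates a fresh object. */
    static Node create(String localName) {
        Supplier<Node> maker = FO_ELEMENTS.get(localName);
        if (maker == null) {
            throw new IllegalArgumentException("No mapping for element: " + localName);
        }
        return maker.get();
    }

    public static void main(String[] args) {
        System.out.println(create("block").describe());
        System.out.println(create("page-sequence").describe());
    }
}
```

Each lookup returns a new object of the class registered for that element name, while the table itself exists only once.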

diff --git a/src/documentation/content/xdocs/embedding.xml b/src/documentation/content/xdocs/embedding.xml index 56f51aa40..a71431e0c 100644 --- a/src/documentation/content/xdocs/embedding.xml +++ b/src/documentation/content/xdocs/embedding.xml @@ -149,28 +149,28 @@ issues should be fixed in the upcoming JDK 1.4

If you want FOP to be totally silent you can also set an org.apache.avalon.framework.logger.NullLogger instance.

If you want to use yet another logging facility you simply have to create a class that implements org.apache.avalon.framework.logger.Logger and set it on the Driver object. See the existing implementations in Avalon Framework for examples.

- Hints

- XML/XSL/DOM Inputs -

-You can supply your input to FOP from a variety of data sources. + Input Sources +

The input XSL-FO document is always handled internally as SAX (see the Parsing Design Document for the rationale). +However, the input itself can be provided in a variety of ways to FOP, which normalizes the input (if necessary) into SAX events:

SAX Events through SAX Handler: FOTreeBuilder is the SAX Handler which is obtained through getContentHandler on Driver.
DOM (which is converted into SAX Events): The conversion of a DOM tree is done via the render(Document) method on Driver.
Data Source (which is parsed and converted into SAX Events): The Driver can take an InputSource as input. +This can use a Stream, String etc.
XML+XSLT Transformation (which is transformed using an XSLT Processor and the result is fired as SAX Events): XSLTInputHandler is used as an InputSource in the render(XMLReader, InputSource) method on Driver.
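The XML+XSLT route can be sketched with the standard JAXP classes: the transformer writes into a SAXResult, so its output arrives as SAX events rather than as serialized text. The sketch uses only the JDK. `XsltToSaxSketch`, `Recorder`, and the tiny stylesheet are hypothetical, and the `Recorder` merely stands in for the content handler that would be obtained from the Driver in a real embedding.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXResult;
import javax.xml.transform.stream.StreamSource;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical illustration; the Recorder stands in for FOP's content handler.
class XsltToSaxSketch {

    // Assumed stylesheet, for illustration: wraps the greeting text in fo:root/fo:block.
    static final String STYLESHEET =
            "<xsl:stylesheet version=\"1.0\""
            + " xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\""
            + " xmlns:fo=\"http://www.w3.org/1999/XSL/Format\">"
            + "<xsl:template match=\"/greeting\">"
            + "<fo:root><fo:block><xsl:value-of select=\".\"/></fo:block></fo:root>"
            + "</xsl:template>"
            + "</xsl:stylesheet>";

    static class Recorder extends DefaultHandler {
        final List<String> elements = new ArrayList<>();
        final StringBuilder text = new StringBuilder();

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            String name = localName;
            if (name == null || name.isEmpty()) {
                // Fall back to the qualified name with any prefix stripped.
                name = qName.substring(qName.indexOf(':') + 1);
            }
            elements.add(name);
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            // Text may arrive in several chunks, so accumulate it.
            text.append(ch, start, length);
        }
    }

    /** Runs the stylesheet over the XML and fires the result as SAX events. */
    static Recorder transform(String xml, String xslt) throws Exception {
        Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xslt)));
        Recorder recorder = new Recorder();
        transformer.transform(new StreamSource(new StringReader(xml)), new SAXResult(recorder));
        return recorder;
    }

    public static void main(String[] args) throws Exception {
        Recorder r = transform("<greeting>Hello</greeting>", STYLESHEET);
        System.out.println(r.elements + " " + r.text);
    }
}
```

Because the result is never serialized and re-parsed, the XSL-FO produced by the stylesheet reaches the handler directly as events.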

There are a variety of upstream data manipulations possible. For example, you may have a DOM and an XSL stylesheet; or you may want to set variables in the stylesheet. - -Xalan Basic Usage Patterns provides some interface documentation and cookbook solutions for such situations. -

+Interface documentation and some cookbook solutions to these situations are provided in Xalan Basic Usage Patterns.

-You can use the content handler from the driver to create a SAXResult. -The transformer then can fire SAX events on the content handler which -will in turn create the rendered output. -

-Examples showing this can be found at the bott +See the Examples for some variations on input.

+ Hints

Object reuse