XML Parsing

All you wanted to know about XML Parsing!

Since everyone knows the basics, we can get into the various stages, starting with XML handling.

FOP can take the input XML in a number of ways:

  • SAX events through a SAX handler
    • FOTreeBuilder is the SAX handler; it is obtained through getContentHandler() on Driver.
  • DOM, which is converted into SAX events
    • The conversion of a DOM tree is done via the render(Document) method on Driver.
  • Data source, which is parsed and converted into SAX events
    • The Driver can take an InputSource as input. An InputSource can wrap a byte stream, a character stream (for example a reader over a String) or a system ID.
  • XML+XSLT, which is transformed using an XSLT processor, with the result fired as SAX events
    • XSLTInputHandler is used as an InputSource in the render(XMLReader, InputSource) method on Driver.
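Driver wires these routes up internally. As a rough illustration of the underlying JAXP/SAX mechanics only (this is not FOP's API), the sketch below feeds one handler from two of the routes: a text data source parsed by an XMLReader, and an existing DOM tree replayed as SAX events through an identity Transformer writing to a SAXResult. CountingHandler is a hypothetical stand-in for FOTreeBuilder that merely counts start-element events.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.sax.SAXResult;
import org.w3c.dom.Document;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

public class InputRoutes {

    /** Hypothetical stand-in for FOTreeBuilder: it only counts start-element events. */
    public static class CountingHandler extends DefaultHandler {
        public int elements;

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            elements++;
        }
    }

    /** Data-source route: text wrapped in an InputSource and parsed into SAX events. */
    public static int fromInputSource(String xml) throws Exception {
        SAXParserFactory spf = SAXParserFactory.newInstance();
        spf.setNamespaceAware(true);
        XMLReader reader = spf.newSAXParser().getXMLReader();
        CountingHandler handler = new CountingHandler();
        reader.setContentHandler(handler);
        reader.parse(new InputSource(new StringReader(xml)));
        return handler.elements;
    }

    /** DOM route: an existing tree replayed as SAX events by an identity transform. */
    public static int fromDom(Document doc) throws Exception {
        CountingHandler handler = new CountingHandler();
        TransformerFactory.newInstance().newTransformer()
                .transform(new DOMSource(doc), new SAXResult(handler));
        return handler.elements;
    }

    /** Helper that builds a namespace-aware DOM tree from text. */
    public static Document parseDom(String xml) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true);
        return dbf.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        String fo = "<fo:root xmlns:fo=\"http://www.w3.org/1999/XSL/Format\">"
                + "<fo:layout-master-set/></fo:root>";
        System.out.println(fromInputSource(fo)); // 2
        System.out.println(fromDom(parseDom(fo))); // 2
    }
}
```

The XML+XSLT route follows the same transform() call, with a Transformer built from a stylesheet source in place of the identity one.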

The SAX events fired on the SAX handler (class FOTreeBuilder) must represent an XSL-FO document; if they do not, an error is raised. Any problems with the XML not being well formed are also handled here.
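A sketch, assuming a standard JAXP parser, of how the two kinds of failure surface: the parser itself raises a SAXParseException for input that is not well formed, while a handler can reject well-formed XML that is not XSL-FO by throwing a SAXException. FoInputCheck, RootCheckingHandler and check() are hypothetical names; the only XSL-FO fact relied on is the namespace URI http://www.w3.org/1999/XSL/Format.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

public class FoInputCheck {
    static final String FO_NS = "http://www.w3.org/1999/XSL/Format";

    /** Hypothetical handler: rejects documents whose root element is not fo:root. */
    static class RootCheckingHandler extends DefaultHandler {
        private boolean seenRoot;

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts)
                throws SAXException {
            if (!seenRoot) {
                seenRoot = true;
                if (!FO_NS.equals(uri) || !"root".equals(localName)) {
                    throw new SAXException("Root element must be fo:root, found {" + uri + "}" + localName);
                }
            }
        }
    }

    /** Returns "ok", or a short description of which kind of error occurred. */
    public static String check(String xml) {
        try {
            SAXParserFactory spf = SAXParserFactory.newInstance();
            spf.setNamespaceAware(true);
            XMLReader reader = spf.newSAXParser().getXMLReader();
            RootCheckingHandler handler = new RootCheckingHandler();
            reader.setContentHandler(handler);
            // DefaultHandler.fatalError rethrows the SAXParseException it receives.
            reader.setErrorHandler(handler);
            reader.parse(new InputSource(new StringReader(xml)));
            return "ok";
        } catch (SAXParseException e) {
            // Raised by the parser: the input is not well-formed XML.
            return "not well-formed (line " + e.getLineNumber() + ")";
        } catch (SAXException e) {
            // Raised by our handler: well-formed XML, but not an XSL-FO document.
            return "not XSL-FO: " + e.getMessage();
        } catch (Exception e) {
            return "other error: " + e;
        }
    }
}
```

The catch order matters: SAXParseException is a subclass of SAXException, so it has to be caught first.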

The element mapping is a hashmap of all the elements in a particular namespace, which makes it easy to create a different object for each element. Element mappings are static to save memory.
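A minimal sketch of such a per-namespace table, assuming it maps local names to factories. FONode, Root, Block and create() are hypothetical names, not FOP's classes. Holding factories in a static map keeps one shared table per namespace while still producing a fresh object for every element encountered.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

public class FoElementMappingSketch {
    /** Hypothetical base type for objects created from FO elements. */
    public abstract static class FONode {
        public abstract String name();
    }

    public static class Root extends FONode {
        public String name() { return "root"; }
    }

    public static class Block extends FONode {
        public String name() { return "block"; }
    }

    public static final String URI = "http://www.w3.org/1999/XSL/Format";

    // One static table for this namespace, shared by every parse: local name -> factory.
    private static final Map<String, Supplier<FONode>> FACTORIES = new HashMap<>();

    static {
        FACTORIES.put("root", Root::new);
        FACTORIES.put("block", Block::new);
    }

    /** Returns a new node for the local name, or null if the namespace has no such element. */
    public static FONode create(String localName) {
        Supplier<FONode> factory = FACTORIES.get(localName);
        return factory == null ? null : factory.get();
    }
}
```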

To add an extension, a developer can put a jar on the classpath that contains the file /META-INF/services/org.apache.fop.fo.ElementMapping. This file must contain a line with the fully qualified name of a class that implements the org.apache.fop.fo.ElementMapping interface. The mapping is then loaded automatically at startup. The internal mappings are FO, SVG and Extension (PDF bookmarks).
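The services file uses the same format that java.util.ServiceLoader reads: UTF-8 text with one fully qualified class name per line, where '#' starts a comment. The sketch below only collects those names; MappingDiscovery, parseServiceFile() and discover() are hypothetical helpers, and a real loader would go on to instantiate each class and check that it implements org.apache.fop.fo.ElementMapping. Note that ClassLoader.getResources() expects the path without the leading slash.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;

public class MappingDiscovery {
    // ClassLoader resource names have no leading slash.
    public static final String SERVICE_FILE =
            "META-INF/services/org.apache.fop.fo.ElementMapping";

    /** Extracts class names from one services file: one per line, '#' starts a comment. */
    public static List<String> parseServiceFile(Reader in) throws IOException {
        List<String> names = new ArrayList<>();
        BufferedReader reader = new BufferedReader(in);
        String line;
        while ((line = reader.readLine()) != null) {
            int hash = line.indexOf('#');
            if (hash >= 0) {
                line = line.substring(0, hash);
            }
            line = line.trim();
            if (!line.isEmpty()) {
                names.add(line);
            }
        }
        return names;
    }

    /** Collects the class names listed in every matching services file on the classpath. */
    public static List<String> discover(ClassLoader loader) throws IOException {
        List<String> names = new ArrayList<>();
        Enumeration<URL> urls = loader.getResources(SERVICE_FILE);
        while (urls.hasMoreElements()) {
            try (Reader r = new InputStreamReader(urls.nextElement().openStream(), StandardCharsets.UTF_8)) {
                names.addAll(parseServiceFile(r));
            }
        }
        return names;
    }
}
```

Because getResources() returns every match, several extension jars can each contribute their own mapping.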

The SAX events carry all the information for the document: start element, end element, text data, etc. This information is used to build up a representation of the FO document. To do this, each namespace has a set of element mappings. When a mapping is found for the element and its namespace, an object is created for that element. If the element is not found, a dummy object is created instead, or a generic DOM for unknown namespaces.
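The lookup with its two fallbacks can be sketched as follows. ElementDispatch, Node and the string-tagged placeholder nodes are hypothetical; real code would create a dummy FO object or build an actual DOM fragment where this sketch only returns a labelled node.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

public class ElementDispatch {
    /** Hypothetical node type produced while building the tree. */
    public interface Node {
        String kind();
    }

    // Namespace URI -> (local name -> factory); each inner map plays the role of one element mapping.
    private final Map<String, Map<String, Supplier<Node>>> mappings = new HashMap<>();

    public void register(String namespace, Map<String, Supplier<Node>> elements) {
        mappings.put(namespace, elements);
    }

    /** Resolves an (namespace, local name) pair to a new node, falling back when unmapped. */
    public Node create(String uri, String localName) {
        Map<String, Supplier<Node>> elements = mappings.get(uri);
        if (elements == null) {
            // Unknown namespace: placeholder for a generic DOM fragment.
            return () -> "generic-dom:" + localName;
        }
        Supplier<Node> factory = elements.get(localName);
        if (factory == null) {
            // Known namespace but unknown element: placeholder for a dummy object.
            return () -> "dummy:" + localName;
        }
        return factory.get();
    }
}
```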

The object is then set up and given the attributes for the element. For the FO Tree, the attributes are converted into properties: the FO objects use a property list mapping to convert the attributes into a list of properties for the element. For other XML, for example SVG, a DOM of the XML is constructed, which can then be passed through to the renderer. Other element mappings can be used in different ways, for example to create elements that generate areas during the layout process, or to set up information for the renderer.
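A sketch of turning SAX attributes into a property list, assuming a hypothetical table of property makers keyed by attribute name. The two makers shown (a pt-only length parser for font-size and a trimmed keyword for text-align) are deliberately simplified illustrations, not FOP's actual property parsing.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;
import org.xml.sax.Attributes;

public class PropertyListSketch {
    // Hypothetical property makers: attribute name -> converter from the raw string value.
    private static final Map<String, Function<String, Object>> MAKERS = new HashMap<>();

    static {
        MAKERS.put("font-size", PropertyListSketch::parsePoints);
        MAKERS.put("text-align", String::trim);
    }

    /** Deliberately tiny length parser: accepts only values such as "12pt". */
    static Object parsePoints(String value) {
        String v = value.trim();
        if (!v.endsWith("pt")) {
            throw new IllegalArgumentException("Only pt lengths are handled in this sketch: " + value);
        }
        return Double.parseDouble(v.substring(0, v.length() - 2));
    }

    /** Converts SAX attributes into typed properties; unknown names are collected separately. */
    public static Map<String, Object> toProperties(Attributes atts, Map<String, String> unknown) {
        Map<String, Object> props = new LinkedHashMap<>();
        for (int i = 0; i < atts.getLength(); i++) {
            String name = atts.getLocalName(i);
            if (name == null || name.isEmpty()) {
                name = atts.getQName(i); // parser not namespace-aware
            }
            Function<String, Object> maker = MAKERS.get(name);
            if (maker == null) {
                unknown.put(name, atts.getValue(i));
            } else {
                props.put(name, maker.apply(atts.getValue(i)));
            }
        }
        return props;
    }
}
```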

While tree building is mainly about creating the FO Tree, some stages can propagate to the renderer. At the end of a page sequence we know that all pages in that page sequence can be laid out without being affected by any further XML. The significance of this is that the FO Tree for the page sequence may then be disposed of. The end of the XML document also tells us that we can finalise the output document. (The layout managers lay out individual pages one page at a time; that is, they do not need to wait for the end of the page sequence. A page may not yet be complete, however: it may still contain forward page-number references, for example.)
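As a hypothetical sketch of these structural signals, the handler below reports each completed fo:page-sequence and the end of the document to a listener standing in for the renderer side. PageSequenceEvents, StructureListener and the log strings are invented for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

public class PageSequenceEvents {
    static final String FO_NS = "http://www.w3.org/1999/XSL/Format";

    /** Hypothetical listener standing in for the renderer side. */
    public interface StructureListener {
        void pageSequenceEnded(int index); // subtree is complete: lay it out, then it may be released
        void documentEnded();              // output document can be finalised
    }

    static class Builder extends DefaultHandler {
        private final StructureListener listener;
        private int sequences;

        Builder(StructureListener listener) {
            this.listener = listener;
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            if (FO_NS.equals(uri) && "page-sequence".equals(localName)) {
                listener.pageSequenceEnded(sequences++);
            }
        }

        @Override
        public void endDocument() {
            listener.documentEnded();
        }
    }

    /** Parses an FO document and returns the structural notifications in order. */
    public static List<String> run(String fo) throws Exception {
        List<String> log = new ArrayList<>();
        SAXParserFactory spf = SAXParserFactory.newInstance();
        spf.setNamespaceAware(true);
        XMLReader reader = spf.newSAXParser().getXMLReader();
        reader.setContentHandler(new Builder(new StructureListener() {
            public void pageSequenceEnded(int index) { log.add("page-sequence " + index + " done"); }
            public void documentEnded() { log.add("document done"); }
        }));
        reader.parse(new InputSource(new StringReader(fo)));
        return log;
    }
}
```

Because endElement fires only after the whole page-sequence subtree has been seen, that callback is a natural point to hand the subtree to layout and then drop references to it.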

  • Error handling for XML that is not well formed.
  • Error handling for other XML parsing errors.
  • Developer info for adding namespace handlers.