From: Peter Bernard West Date: Wed, 12 Mar 2003 14:39:46 +0000 (+0000) Subject: Replaced with xml-parsing.ehtml X-Git-Tag: Alt-Design-integration-base~26 X-Git-Url: https://source.dussan.org/?a=commitdiff_plain;h=4e8caab7e205de78f0986be5e4ab9d9397c0df12;p=xmlgraphics-fop.git Replaced with xml-parsing.ehtml git-svn-id: https://svn.apache.org/repos/asf/xmlgraphics/fop/trunk@196084 13f79535-47bb-0310-9956-ffa450edef68 --- diff --git a/src/documentation/content/xdocs/design/alt.design/xml-parsing.xml b/src/documentation/content/xdocs/design/alt.design/xml-parsing.xml deleted file mode 100644 index 4c44b0f84..000000000 --- a/src/documentation/content/xdocs/design/alt.design/xml-parsing.xml +++ /dev/null @@ -1,231 +0,0 @@ - - - - -
- Integrating XML Parsing - - - -
- -
- An alternative parser integration -

- This note proposes an alternative method of integrating the - output of the SAX parsing of the Flow Object (FO) tree into - FOP processing. The purpose of the proposed changes is to - provide for better decomposition of the process of analysing - and rendering an FO tree such as is represented in the output - from initial (XSLT) processing of an XML source document. -

-
- Structure of SAX parsing -

- Figure 1 is a schematic representation of the process of SAX - parsing of an input source. SAX parsing involves the - registration, with an object implementing the - XMLReader interface, of a - ContentHandler which contains a callback - routine for each of the event types encountered by the - parser, e.g., startDocument(), - startElement(), characters(), - endElement() and endDocument(). - Parsing is initiated by a call to the parse() - method of the XMLReader. Note that the call to - parse() and the calls to individual callback - methods are synchronous: parse() will only - return when the last callback method returns, and each - callback must complete before the next is called.

- Figure 1 -

-
-

- In the process of parsing, the hierarchical structure of the - original FO tree is flattened into a number of streams of - events of the same type which are reported in the sequence - in which they are encountered. Apart from that, the API - imposes no structure or constraint which expresses the - relationship between, e.g., a startElement event and the - endElement event for the same element. To the extent that - such relationship information is required, it must be - managed by the callback routines. -
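The point about relationship information can be illustrated with a minimal sketch. It uses only the standard JAXP and SAX classes; the class name SaxDepthDemo, the TracingHandler and its trace format are hypothetical and are not part of FOP. The handler's own stack is the only thing that ties an endElement event back to its startElement event.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

public class SaxDepthDemo {
    /** Hypothetical handler: SAX itself keeps no link between start and end events. */
    static class TracingHandler extends DefaultHandler {
        final Deque<String> open = new ArrayDeque<>();   // elements currently open
        final List<String> trace = new ArrayList<>();    // one entry per callback

        @Override
        public void startElement(String uri, String local, String qName, Attributes atts) {
            open.push(qName);
            trace.add("start " + qName + " depth=" + open.size());
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            // A parser may deliver one text node in several chunks.
            String text = new String(ch, start, length).trim();
            if (!text.isEmpty()) {
                trace.add("chars " + text);
            }
        }

        @Override
        public void endElement(String uri, String local, String qName) {
            String expected = open.pop();
            trace.add("end " + qName + " matches=" + expected.equals(qName));
        }
    }

    static List<String> parse(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        XMLReader reader = factory.newSAXParser().getXMLReader();
        TracingHandler handler = new TracingHandler();
        reader.setContentHandler(handler);
        // Synchronous: returns only after the last callback has returned.
        reader.parse(new InputSource(new StringReader(xml)));
        return handler.trace;
    }

    public static void main(String[] args) throws Exception {
        for (String line : parse("<root><block>hi</block></root>")) {
            System.out.println(line);
        }
    }
}
```

Everything beyond the stack (validation, tree building) would have to be added to the same callbacks, which is the pressure the following sections describe.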

-

- The most direct approach here is to build the tree - "invisibly"; to bury within the callback routines the - necessary code to construct the tree. In the simplest case, - the whole of the FO tree is built within the call to - parse(), and that in-memory tree is subsequently - processed to (a) validate the FO structure, and (b) - construct the Area tree. The problem with this approach is - the potential size of the FO tree in memory. FOP has - suffered from this problem in the past. -

-
-
- Cluttered callbacks -

- On the other hand, the callback code may become increasingly - complex as tree validation and the triggering of the Area - tree processing and subsequent rendering are moved into the - callbacks, typically the endElement() method. - In order to overcome acute memory problems, the FOP code was - recently modified in this way, to trigger Area tree building - and rendering in the endElement() method, when - the end of a page-sequence was detected. -

-

- The drawback with such a method is that it becomes difficult - to determine the order of events and the circumstances in - which any particular processing events are triggered. When - the processing events are inherently self-contained, this is - irrelevant. But the more complex and context-dependent the - relationships are among the processing elements, the more - obscurity is engendered in the code by such "side-effect" - processing. -

-
-
- From passive to active parsing -

- In order to solve the simultaneous problems of exposing the - structure of the processing and minimising in-memory - requirements, the experimental code separates the parsing of - the input source from the building of the FO tree and all - downstream processing. The callback routines become - minimal, consisting of the creation and buffering of - XMLEvent objects as a producer. All - of these objects are effectively merged into a single event - stream, in strict event order, for subsequent access by the - FO tree building process, acting as a - consumer. In itself, this does not reduce the - memory footprint. That reduction comes only when the approach is generalised to - modularise FOP processing.

- Figure 2 -

-
-

- The most useful change that this brings about is the switch - from passive to active XML element - processing. The process of parsing now becomes visible to - the controlling process. All local validation requirements, - and all object and data structure building, are initiated by the - process(es) getting from the queue (in the case - above, the FO tree builder). -
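A minimal sketch of this producer/consumer split follows. It is not the experimental FOP code: the Event class is a hypothetical stand-in for XMLEvent, and java.util.concurrent.ArrayBlockingQueue is assumed here in place of the SyncedCircularBuffer named in the method signatures. The callbacks do nothing except create and buffer events; the consumer pulls them in strict event order.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

public class EventQueueDemo {
    enum Type { STARTDOCUMENT, STARTELEMENT, CHARACTERS, ENDELEMENT, ENDDOCUMENT }

    /** Hypothetical stand-in for XMLEvent: an event type plus a name or text. */
    static final class Event {
        final Type type;
        final String value;
        Event(Type type, String value) { this.type = type; this.value = value; }
        @Override public String toString() { return type + (value == null ? "" : ":" + value); }
    }

    /** Producer: each callback only creates an event and buffers it. */
    static final class QueueingHandler extends DefaultHandler {
        private final BlockingQueue<Event> events;
        QueueingHandler(BlockingQueue<Event> events) { this.events = events; }

        private void put(Type type, String value) throws SAXException {
            try {
                events.put(new Event(type, value));   // blocks while the buffer is full
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new SAXException(e);
            }
        }
        @Override public void startDocument() throws SAXException { put(Type.STARTDOCUMENT, null); }
        @Override public void startElement(String u, String l, String q, Attributes a)
                throws SAXException { put(Type.STARTELEMENT, q); }
        @Override public void characters(char[] ch, int s, int n)
                throws SAXException { put(Type.CHARACTERS, new String(ch, s, n)); }
        @Override public void endElement(String u, String l, String q)
                throws SAXException { put(Type.ENDELEMENT, q); }
        @Override public void endDocument() throws SAXException { put(Type.ENDDOCUMENT, null); }
    }

    /** Consumer: actively gets events from the queue until ENDDOCUMENT. */
    static List<String> consume(String xml) throws Exception {
        BlockingQueue<Event> events = new ArrayBlockingQueue<>(4);   // deliberately small
        Thread parser = new Thread(() -> {
            try {
                XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
                reader.setContentHandler(new QueueingHandler(events));
                reader.parse(new InputSource(new StringReader(xml)));
            } catch (Exception e) {
                throw new RuntimeException(e);
            }
        });
        parser.setDaemon(true);
        parser.start();
        List<String> seen = new ArrayList<>();
        Event ev;
        do {
            ev = events.poll(5, TimeUnit.SECONDS);    // give up if the producer died
            if (ev == null) throw new IllegalStateException("no event from parser");
            seen.add(ev.toString());
        } while (ev.type != Type.ENDDOCUMENT);
        parser.join();
        return seen;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(consume("<a><b>x</b></a>"));
    }
}
```

The bounded queue is a design choice in this sketch: the parser thread stalls whenever the consumer falls behind, so the two rates stay coupled without either side calling the other.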

-
-
- XMLEvent methods - -

- The experimental code uses a class XMLEvent - to provide the objects which are placed in the queue. - XMLEvent includes a variety of methods to access - elements in the queue. Namespace URIs encountered in - parsing are maintained in a static - HashMap where they are associated with a unique - integer index. This integer value is used in the signature - of some of the access methods. -

-
-
XMLEvent getEvent(SyncedCircularBuffer events)
-
- This is the basis of all of the queue access methods. It - returns the next element from the queue, which may be a - pushback element. -
-
XMLEvent getEndDocument(events)
-
- Get and discard elements from the queue - until an ENDDOCUMENT element is found and returned. -
-
XMLEvent expectEndDocument(events)
-
- If the next element on the queue is an ENDDOCUMENT event, - return it. Otherwise, push the element back and throw an - exception. Each of the get methods (except - getEvent() itself) has a corresponding - expect method. -
-
XMLEvent get/expectStartElement(events)
-
Return the next STARTELEMENT event from the queue.
-
XMLEvent get/expectStartElement(events, String - qName)
-
- Return the next STARTELEMENT with a QName matching - qName. -
-
- XMLEvent get/expectStartElement(events, int uriIndex, - String localName) -
-
- Return the next STARTELEMENT with a URI indicated by the - uriIndex and a local name matching localName. -
-
- XMLEvent get/expectStartElement(events, LinkedList list) -
-
- list contains instances of the nested class - UriLocalName, which hold a - uriIndex and a localName. Return - the next STARTELEMENT with a URI indicated by the - uriIndex and a local name matching - localName from any element of - list. -
-
XMLEvent get/expectEndElement(events)
-
Return the next ENDELEMENT.
-
XMLEvent get/expectEndElement(events, qName)
-
Return the next ENDELEMENT with a QName matching - qName.
-
XMLEvent get/expectEndElement(events, uriIndex, localName)
-
- Return the next ENDELEMENT with a URI indicated by the - uriIndex and a local name matching - localName. -
-
- XMLEvent get/expectEndElement(events, XMLEvent event) -
-
- Return the next ENDELEMENT whose - uriIndex and localName - match those in the event argument. This - is intended as a quick way to find the ENDELEMENT matching - a previously returned STARTELEMENT. -
-
XMLEvent get/expectCharacters(events)
-
Return the next CHARACTERS event.
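The difference between the get and expect families can be sketched as follows. This is an illustration of the semantics described above, not the XMLEvent implementation: the class names, the Event fields and the plain in-memory Deque are all hypothetical, and only the QName-matching variants are shown.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

public class GetExpectDemo {
    enum Type { STARTELEMENT, CHARACTERS, ENDELEMENT, ENDDOCUMENT }

    /** Hypothetical event: a type plus a QName (null for character data). */
    static final class Event {
        final Type type;
        final String qName;
        Event(Type type, String qName) { this.type = type; this.qName = qName; }
    }

    /** Hypothetical reader over an in-memory queue with pushback at the head. */
    static final class EventReader {
        private final Deque<Event> queue;
        EventReader(Event... events) { queue = new ArrayDeque<>(Arrays.asList(events)); }

        Event getEvent() { return queue.removeFirst(); }   // throws if the queue is empty
        void pushBack(Event e) { queue.addFirst(e); }
        int remaining() { return queue.size(); }

        private static boolean matches(Event e, Type type, String qName) {
            return e.type == type && qName.equals(e.qName);
        }

        /** get: discard events until a matching STARTELEMENT is found. */
        Event getStartElement(String qName) {
            while (true) {
                Event e = getEvent();
                if (matches(e, Type.STARTELEMENT, qName)) return e;
            }
        }

        /** expect: the very next event must match, else push it back and throw. */
        Event expectStartElement(String qName) {
            Event e = getEvent();
            if (matches(e, Type.STARTELEMENT, qName)) return e;
            pushBack(e);
            throw new IllegalStateException("expected start of " + qName);
        }

        /** get: discard events until the ENDELEMENT named like the given start event. */
        Event getEndElement(Event start) {
            while (true) {
                Event e = getEvent();
                if (matches(e, Type.ENDELEMENT, start.qName)) return e;
            }
        }
    }

    static List<String> demo() {
        EventReader in = new EventReader(
            new Event(Type.STARTELEMENT, "fo:root"),
            new Event(Type.CHARACTERS, null),
            new Event(Type.STARTELEMENT, "fo:block"),
            new Event(Type.CHARACTERS, null),
            new Event(Type.ENDELEMENT, "fo:block"),
            new Event(Type.ENDELEMENT, "fo:root"));
        List<String> log = new ArrayList<>();
        try {
            in.expectStartElement("fo:block");             // next event is fo:root
        } catch (IllegalStateException e) {
            log.add("rejected fo:block");
        }
        log.add("got " + in.expectStartElement("fo:root").qName);   // pushback preserved it
        Event block = in.getStartElement("fo:block");      // skips the character data
        log.add("got " + block.qName);
        log.add("end " + in.getEndElement(block).qName);
        log.add("remaining " + in.remaining());
        return log;
    }

    public static void main(String[] args) {
        demo().forEach(System.out::println);
    }
}
```

Note that this sketch matches end events by name only, so it would stop at the first same-named ENDELEMENT inside a nested element; whether the experimental code tracks nesting depth is not stated in the text.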
-
-
-
- FOP modularisation -

- This same principle can be extended to the other major - sub-systems of FOP processing. In each case, while it is - possible to hold a complete intermediate result in memory, - the memory costs of that approach are too high. The - sub-systems (XML parsing, FO tree construction, Area tree - construction and rendering) must run in parallel if the - footprint is to be kept manageable. By creating a series of - producer-consumer pairs linked by synchronized buffers, - logical isolation can be achieved while rates of processing - remain coupled. By introducing feedback loops conveying - information about the completion of processing of the - elements, sub-systems can dispose of or precis those - elements without having to be tightly coupled to downstream - processes.

- Figure 3 -
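The chaining of producer-consumer pairs might look like the following sketch. The stage names, the string payloads and the END sentinel are hypothetical, ArrayBlockingQueue is assumed as the synchronized buffer, and the feedback loops mentioned above are not modelled. Each stage knows only its input and output buffers, and the small capacities keep any stage from running far ahead of the next.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class PipelineDemo {
    private static final String END = "<END>";   // hypothetical end-of-stream marker

    static List<String> run(int count) throws InterruptedException {
        // Two bounded buffers link three stages: parser -> FO builder -> area builder.
        BlockingQueue<String> parsed = new ArrayBlockingQueue<>(2);
        BlockingQueue<String> foNodes = new ArrayBlockingQueue<>(2);

        Thread parser = new Thread(() -> {
            try {
                for (int i = 1; i <= count; i++) {
                    parsed.put("event-" + i);          // blocks when downstream lags
                }
                parsed.put(END);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        Thread foBuilder = new Thread(() -> {
            try {
                for (String s = parsed.take(); !s.equals(END); s = parsed.take()) {
                    foNodes.put("fo(" + s + ")");
                }
                foNodes.put(END);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        parser.start();
        foBuilder.start();

        // The calling thread plays the area-building stage.
        List<String> areas = new ArrayList<>();
        for (String s = foNodes.take(); !s.equals(END); s = foNodes.take()) {
            areas.add("area(" + s + ")");
        }
        parser.join();
        foBuilder.join();
        return areas;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(run(3));
    }
}
```

At no point does more than a handful of intermediate items exist, regardless of how large count is; that is the footprint argument in miniature.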

-
-
-
- -
-