Process FO Elements ASAP

Primary Design Goals

-A discussion of project design properly begins with a list of the goals of the project. Out of these goals will flow the design issues and details, and eventually, the implementation.
+A discussion of project design properly begins with a list of the goals of the project.
+Out of these goals will flow the design issues and details, and eventually, the implementation.

Conformance to the XSL-FO Specification

The current design goal is to reach the "basic" level of conformance, and to have enough flexibility in the design to reach "complete" conformance without major rewriting.
-After "basic" conformance is achieved, it is probably that higher levels of conformance will be sought.
+After "basic" conformance is achieved, it is probable that higher levels of conformance will be sought.

Process Files of Arbitrary Size
@@ -48,7 +52,7 @@ After "basic" conformance is achieved, it is probably that higher levels of conf

Secondary Design Goals

-Keep Memory Minimal
+Minimize Memory Use

Many FOP design decisions revolve around trying to minimize the use of memory. The primary purpose here is to reduce the amount of data that must be serialized to storage during processing. Since our primary design goals include the ability to process files of arbitrary size, there is no way to avoid the need to serialize.
@@ -59,13 +63,13 @@ To the extent that it can be done so without jeopardizing the primary design goa

The Big Picture View

-With our design goals outlines, we'll now open the Black Box and look at the major processes inside.
+With our design goals outlined, we'll now open the Black Box and look at the major processes inside.
FOP has adopted the basic structure of the XSL-FO standard itself as a convenient model for the major processes in FOP. The Result in each row is the input for the next.

@@ -74,90 +78,33 @@ FOP has adopted the basic structure of the XSL-FO standard itself as a convenien

FOP from a Big Picture Standpoint
Process	Result	Process	Result/Input for Next	Notes
.
SAX Handler	FO Tree	Parsing	FO Tree	.
Refinement	Refined FO Tree	Refinement	Refined FO Tree	.
Layout	Area Tree	Layout	Area Tree	Layout and Area Tree are not needed or used for the structural outputs (MIF and RTF), as they are not paginated.
Renderer	Renderer	Output: PDF, Postscript, Print, etc.	.

-In general, each piece of data will go be processed in the same way.
+In general, each piece of data will be processed in the same way.
However, some information may be used more than once, and some may be used out of order. To reduce memory, one process may start before the previous process is completed.

-The FO Tree is constructed from the xml document. It is an internal
-representation of the xml document and it is like a DOM with some differences.
-The Layout Managers use the FO Tree do their layout stuff and create an Area
-Tree. The Area Tree is a representation of the final result. It is a
-representation of a set of pages containing the text and other graphics. The
-Area Tree is then given to a Renderer. The Renderer can read the Area Tree and
-convert the information into the render format. For example the PDF Renderer
-creates a PDF Document. For each page in the Area Tree the renderer creates a
-PDF Page and places the contents of the page into the PDF Page. Once a PDF Page
-is complete then it can be written to the output stream.

-Design Issues

-As with any significant programming project, we need to first understand the big problem, then break it into smaller solvable problems.
-To achieve our design goals, we have identified and attempted to resolve some design issues.
-Since they are in support of the primary and secondary goals, they are not necessarily written in stone.
+For a detailed discussion of the design of any component, follow its link in the table above.
+Each component outlines the design issues which have already been addressed.
+The resolutions of these design issues are in support of the primary and secondary goals, so they are not necessarily written in stone.
However, most of them have been discussed at length among the developers, and are reasonably well settled.

- Use SAX as Input -

The two standard ways of dealing with XML input are SAX and DOM. SAX basically creates events as it parses an XML document in a serial fashion; a program using SAX (and not storing anything internally) will only see a small window of the document at any point in time, and can never look forward in the document. DOM creates and stores a tree representation of the document, allowing a view of the entire document as an integrated whole. One issue that may seem counter-intuitive to some new FOP developers, and which has from time to time been contentious, is that FOP uses SAX for input. (DOM can be used as input as well, but it is converted into SAX events before entering FOP, effectively negating its advantages).

Since FOP essentially needs a tree representation of the FO input, at first glance it seems to make sense to use DOM. Instead, FOP takes SAX events and builds its own tree-like structure. Why?

DOM has a relatively large memory footprint. FOP's FO Tree is a lighter-weight structure.
DOM contains an entire document. FOP is able to process individual fo:page-sequence objects discretely, without the need to have the entire document in memory. For documents that have only one fo:page-sequence object, FOP's approach is no advantage, but in other cases it is a huge advantage. A 500-page book that is broken into 100 5-page chapters, each in its own fo:page-sequence, essentially needs only 1% of the document memory that would be required if using DOM as input.

- Process FO Elements ASAP -

The issue here is that we wish to recycle FO Tree memory as much as possible. There are at least three possible places that FO Tree fragments can be passed to the Layout process, and their memory recycled:

- fo:block It might be tempting to start laying out pages as soon as the first fo:block object is finished. However, there are many downstream things that can affect the placement of that block on a page, such as graphics and footnotes. So, in order to maintain conformance to the XSL-FO specification, and create high-quality output, we must see more of the document.
- fo:root The other extreme is to wait until the entire document is read in before processing any of it. This essentially means that there is no memory recycling. Processing the document correctly is more important than saving memory, so this option would be used if there were no better alternative.
- fo:page-sequence The page-sequence object provides a nice clean break in the document. Content from one page-sequence will never interfere with nor affect the placement of the content of another. FOP uses this option as the optimum way to maintain compliance with the standard and to minimize memory consumption.
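The fo:page-sequence option can be sketched with a small SAX handler. This is only an illustration under stated assumptions, not FOP's code: it uses Python's standard `xml.sax` module, and the names `PageSequenceHandler`, `count_nodes`, and the `layout` callback are hypothetical stand-ins for the FO Tree builder and the Layout process. The handler builds a tree fragment only while inside a page-sequence and drops its reference as soon as the fragment has been handed off.

```python
import xml.sax


class PageSequenceHandler(xml.sax.ContentHandler):
    """Builds one small tree per fo:page-sequence and releases it when it closes."""

    def __init__(self, layout):
        super().__init__()
        self.layout = layout  # hypothetical callback standing in for the Layout process
        self.stack = []       # open elements of the current page-sequence only

    def startElement(self, name, attrs):
        # Start a fragment at fo:page-sequence; keep building while inside one.
        if name == "fo:page-sequence" or self.stack:
            node = {"name": name, "children": []}
            if self.stack:
                self.stack[-1]["children"].append(node)
            self.stack.append(node)

    def endElement(self, name):
        if not self.stack:
            return  # outside any page-sequence (for example fo:root)
        node = self.stack.pop()
        if not self.stack:
            # The page-sequence itself just closed: hand the fragment to layout.
            # The handler keeps no reference afterwards, so the memory can be recycled.
            self.layout(node)


def count_nodes(node):
    """Counts the elements in a tree fragment."""
    return 1 + sum(count_nodes(child) for child in node["children"])


FO = b"""<fo:root xmlns:fo="http://www.w3.org/1999/XSL/Format">
<fo:page-sequence><fo:flow><fo:block/></fo:flow></fo:page-sequence>
<fo:page-sequence><fo:flow><fo:block/><fo:block/></fo:flow></fo:page-sequence>
</fo:root>"""

sizes = []
xml.sax.parseString(FO, PageSequenceHandler(lambda seq: sizes.append(count_nodes(seq))))
print(sizes)  # [3, 4]
```

At no point does the handler hold more than one page-sequence, which is the property the 500-page, 100-chapter example above relies on.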

- Serialize FO Tree as Necessary -

This issue is implied by the requirement to process documents of arbitrary size. Unless some arbitrary limit is placed on the size of page-sequence objects, FOP must be able to serialize FO tree fragments as necessary.

- Keep Layouts Simple -

Layout should handle floats, footnotes and keeps in a simple, straightforward way.

- Keep ID References Simple -

- Render Pages ASAP -

The issue here is that we wish to recycle the Area Tree memory as much as possible. The problem is that forward references prevent pages from being resolved until the forward references are resolved. If memory is insufficient to store unresolved pages, Area Tree fragments must be serialized until resolved.
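The effect of forward references on page disposal can be illustrated with a short sketch. It is not FOP's implementation: the `PageQueue` class, its method names, and the rule that pages are emitted strictly in order are assumptions made for the example. A page waits in the queue until every id it refers to has been seen, and it also blocks the pages behind it.

```python
from collections import deque


class PageQueue:
    """Holds laid-out pages until their forward references resolve, then renders in order."""

    def __init__(self, render):
        self.render = render     # hypothetical callback standing in for a renderer
        self.pending = deque()   # (page number, ids the page refers to)
        self.known_ids = set()   # ids defined on pages seen so far

    def add_page(self, number, defines=(), refers_to=()):
        self.known_ids.update(defines)
        self.pending.append((number, set(refers_to)))
        self._flush()

    def _flush(self):
        # Render from the front of the queue while the first page is fully resolved.
        while self.pending:
            number, refs = self.pending[0]
            if refs - self.known_ids:
                break  # unresolved forward reference: this page and all later ones wait
            self.pending.popleft()
            self.render(number)


rendered = []
queue = PageQueue(rendered.append)
queue.add_page(1, refers_to=["ch2"])  # page 1 cites an id that is not yet defined
queue.add_page(2)                     # resolved, but queued behind page 1
queue.add_page(3, defines=["ch2"])    # defining the id releases all three pages
print(rendered)  # [1, 2, 3]
```

The `pending` queue is the part that grows without bound when a reference resolves late, which is why the text above calls for serializing unresolved Area Tree fragments when memory runs short.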

FOP developers have discussed adding the capability of using an Area Tree to render to more than one output target in the same run, which would be a complicating factor in disposal of pages as they are rendered.

- Renderers are Responsible -

Each renderer is totally responsible for its output format.

- Send Output to a Stream -

diff --git a/src/documentation/content/xdocs/design/layout.xml b/src/documentation/content/xdocs/design/layout.xml
index abaecf78c..85105f801 100644
--- a/src/documentation/content/xdocs/design/layout.xml
+++ b/src/documentation/content/xdocs/design/layout.xml
@@ -27,6 +27,18 @@ Note: it may be possible to start immediately after a block formatting object ha
It is also possible to layout all pages in a page sequence after each page sequence has been added from the xml.

The layout process is handled by a set of layout managers. The block level layout managers are used to create the block areas which are added to the region area of a page.

+ Keep Layouts Simple +

Layout should handle floats, footnotes and keeps in a simple, straightforward way.

+ Keep ID References Simple +

+ Render Pages ASAP +

Layout Managers

The layout managers are set up from the hierarchy of the formatting object tree.
diff --git a/src/documentation/content/xdocs/design/parsing.xml b/src/documentation/content/xdocs/design/parsing.xml
index 9dc2646c3..326280de6 100644
--- a/src/documentation/content/xdocs/design/parsing.xml
+++ b/src/documentation/content/xdocs/design/parsing.xml
@@ -6,6 +6,15 @@ XML Parsing

+ Use SAX as Input +

Since FOP essentially needs a tree representation of the FO input, at first glance it seems to make sense to use DOM. Instead, FOP takes SAX events and builds its own tree-like structure. Why?

DOM has a relatively large memory footprint. FOP's FO Tree is a lighter-weight structure.
DOM contains an entire document. FOP is able to process individual fo:page-sequence objects discretely, without the need to have the entire document in memory. For documents that have only one fo:page-sequence object, FOP's approach is no advantage, but in other cases it is a huge advantage. A 500-page book that is broken into 100 5-page chapters, each in its own fo:page-sequence, essentially needs only 1% of the document memory that would be required if using DOM as input.

XML Input

The xml document is always handled internally as SAX. The SAX events
diff --git a/src/documentation/content/xdocs/design/properties.xml b/src/documentation/content/xdocs/design/properties.xml
index 426cbd4dd..6947cc163 100644
--- a/src/documentation/content/xdocs/design/properties.xml
+++ b/src/documentation/content/xdocs/design/properties.xml
@@ -322,5 +322,11 @@
In either case, the result is a Property object, and the actual value may be accessed (in this example) by using the "getLength()" accessor.

+ Refinement +

+ Refined FO Tree +

diff --git a/src/documentation/content/xdocs/design/renderers.xml b/src/documentation/content/xdocs/design/renderers.xml
index 6de480243..bc7d2ded7 100644
--- a/src/documentation/content/xdocs/design/renderers.xml
+++ b/src/documentation/content/xdocs/design/renderers.xml
@@ -10,6 +10,13 @@

+ Renderers are Responsible +

Each renderer is totally responsible for its output format.

+ Send Output to a Stream +

Introduction

A renderer is primarily designed to convert a given area tree into the output