|
- <?xml version="1.0" encoding="UTF-8"?>
- <!--
- Licensed to the Apache Software Foundation (ASF) under one or more
- contributor license agreements. See the NOTICE file distributed with
- this work for additional information regarding copyright ownership.
- The ASF licenses this file to You under the Apache License, Version 2.0
- (the "License"); you may not use this file except in compliance with
- the License. You may obtain a copy of the License at
-
- http://www.apache.org/licenses/LICENSE-2.0
-
- Unless required by applicable law or agreed to in writing, software
- distributed under the License is distributed on an "AS IS" BASIS,
- WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
- See the License for the specific language governing permissions and
- limitations under the License.
- -->
- <!-- $Id$ -->
- <!DOCTYPE document PUBLIC "-//APACHE//DTD Documentation V2.0//EN" "http://forrest.apache.org/dtd/document-v20.dtd">
- <document>
- <header>
- <title>Apache™ FOP: Intermediate Format</title>
- <version>$Revision$</version>
- </header>
- <body>
- <note>
- Please note that the intermediate formats described here are
- <strong>advanced features</strong> and can be ignored by most users of Apache FOP.
- </note>
- <section id="introduction">
- <title>Introduction</title>
- <p>
- Apache™ FOP now provides two different so-called intermediate formats. The first one
- (let's call it the area tree XML format) is basically a 1:1 XML representation of FOP's
- area tree as generated by the layout engine. The area tree is conceptually defined in the
- <a href="http://www.w3.org/TR/2001/REC-xsl-20011015/slice1.html#section-N742-Formatting">XSL-FO specification in chapter 1.1.2</a>.
- Even though the area tree is mentioned in the XSL-FO specification, this part is not
- standardized. Therefore, the area tree XML format is a FOP-proprietary XML file format.
- The area tree XML can be generated through the area tree XML Renderer (the XMLRenderer).
- </p>
- <p>
- The second intermediate format (which we shall simply call the intermediate format)
- is a recent addition that aims to meet a slightly different set of goals. It is highly
- optimized for speed.
- </p>
- <p>
- The intermediate format can be used to generate intermediate documents that are modified
- before they are finally rendered to their ultimate output format. Modifications include
- adjusting and changing trait values, adding or modifying area objects, inserting prefabricated
- pages, overlays, imposition (n-up, rotation, scaling etc.). Multiple IF files can be combined
- into a single output file.
- </p>
- </section>
- <section id="which-if">
- <title>Which Intermediate Format to choose?</title>
- <p>
- Both formats have their use cases, so the right choice depends on your
- particular situation. Here is a list of strengths and use cases for both formats:
- </p>
- <section id="strengths-at">
- <title>Area Tree XML (AT XML)</title>
- <ul>
- <li>1:1 representation of FOP's area tree in XML.</li>
- <li>Contains more structure information than the new intermediate format.</li>
- <li>Used in FOP's layout engine test suite for regression testing.</li>
- </ul>
- </section>
- <section id="strengths-if">
- <title>Intermediate Format (IF)</title>
- <ul>
- <li>Highly optimized for speed.</li>
- <li>Smaller XML files.</li>
- <li>Easier to post-process.</li>
- <li>XML Schema is available.</li>
- <li>
- Recommended for use cases where documents are formatted concurrently and later
- concatenated to a single print job.
- </li>
- </ul>
- </section>
- <p>
- More technical information about the two formats can be found on the
- <a href="http://wiki.apache.org/xmlgraphics-fop/AreaTreeIntermediateXml/NewDesign">FOP Wiki</a>.
- </p>
- </section>
- <section id="architecture">
- <title>Architectural Overview</title>
- <figure src="images/if-architecture-overview.png"
- alt="Diagram with an architectural overview over the intermediate formats"/>
- </section>
- <section id="usage">
- <title>Usage of the Area Tree XML format (AT XML)</title>
- <p>
- As already mentioned, the area tree XML format is generated by using the
- <strong>XMLRenderer</strong> (MIME type: <strong>application/X-fop-areatree</strong>).
- In other words, you set the right MIME type for the output format and process your FO files
- as if you were creating a PDF file.
- </p>
- <p>
- However, there is an important detail to consider: The
- various Renderers don't all use the same font sources. To be able to create the right
- area tree for the ultimate output format, you need to create the area tree XML file using
- the right font setup. This is achieved by telling the XMLRenderer to mimic another
- renderer. This is done by calling the XMLRenderer's mimicRenderer() method with an
- instance of the ultimate target renderer as the single parameter. This has a consequence:
- An area tree XML file rendered with the Java2DRenderer may not look as expected when it
- was actually generated for the PDF renderer. For renderers that use the same font setup,
- this restriction does not apply (PDF and PS, for example). Generating the area tree XML
- format file is the first step.
- </p>
- <p>
- The second step is to reparse the file using the <strong>AreaTreeParser</strong> which is
- found in the org.apache.fop.area package. The pages retrieved from the area tree XML file
- are added to an AreaTreeModel instance from where they are normally rendered using one of
- the available Renderer implementations. You can find examples for the area tree XML
- processing in the
- <a href="http://svn.apache.org/viewvc/xmlgraphics/fop/trunk/examples/embedding/java/embedding/intermediate/"><code>examples/embedding</code></a>
- directory in the FOP distribution.
- </p>
- <p>
- The basic pattern to parse the area tree XML format looks like this:
- </p>
- <source><![CDATA[
- FopFactory fopFactory = FopFactory.newInstance();
-
- // Setup output
- OutputStream out = new java.io.FileOutputStream(pdffile);
- out = new java.io.BufferedOutputStream(out);
- try {
- //Setup fonts and user agent
- FontInfo fontInfo = new FontInfo();
- FOUserAgent userAgent = fopFactory.newFOUserAgent();
-
- //Construct the AreaTreeModel that will receive the individual pages
- AreaTreeModel treeModel = new RenderPagesModel(userAgent,
- MimeConstants.MIME_PDF, fontInfo, out);
-
- //Parse the area tree file into the area tree
- AreaTreeParser parser = new AreaTreeParser();
- Source src = new StreamSource(myIFFile);
- parser.parse(src, treeModel, userAgent);
-
- //Signal the end of the processing. The renderer can finalize the target document.
- treeModel.endDocument();
- } finally {
- out.close();
- }]]></source>
- <p>
- This example simply reads an area tree file and renders it to a PDF file. Please note that in normal
- FOP operation you're shielded from having to instantiate the FontInfo object yourself. This
- is normally a task of the AreaTreeHandler which is not present in this scenario. The same
- applies to the AreaTreeModel instance, in this case an instance of a subclass called
- RenderPagesModel. RenderPagesModel is ideal in this case as it has very little overhead
- processing the individual pages. An important line in the example is the call to
- <code>endDocument()</code> on the AreaTreeModel. This lets the Renderer know that the processing
- is now finished.
- </p>
- <p>
- The area tree XML format can also be used from the <a href="running.html#standalone-start">command-line</a>
- by using the "-atin" parameter for specifying the area tree XML as input file. You can also
- specify a "mimic renderer" by inserting a MIME type between "-at" and the output file.
- </p>
- <section id="concat">
- <title>Concatenating Documents</title>
- <p>
- This initial example is obviously not very useful. It would be faster to create the PDF file
- directly. As the <a href="http://svn.apache.org/repos/asf/xmlgraphics/fop/trunk/examples/embedding/java/embedding/atxml/ExampleConcat.java">ExampleConcat.java</a>
- example shows, you can easily parse multiple area tree files in a row and add the parsed pages to the
- same AreaTreeModel instance, which essentially concatenates all the input documents into a single
- output document.
- </p>
- </section>
- <section id="modifying">
- <title>Modifying Documents</title>
- <p>
- One of the most important use cases for this format is modifying the area
- tree XML before finally rendering it to the target format. You can use XSLT to process
- the AT XML file according to your needs. Please note that the area tree XML format is
- currently not formally described. You need to have a good understanding of its structure so you don't
- create any non-parseable files. We may add an XML Schema and more detailed documentation at a
- later time. You're invited to help us with that.
- </p>
- <note>
- The area tree XML format is sensitive to changes in whitespace. If you're not careful,
- the modified file may not render correctly.
- </note>
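- <p>
- As a minimal sketch of such an XSLT-based modification, the following class applies an
- identity transform that copies every node unchanged (text and whitespace included) and
- overrides a single attribute. It uses only the JDK's javax.xml.transform API. The class name,
- the sample document and the "block"/"color" names are assumptions made for illustration and
- are not taken from the actual area tree XML vocabulary. Indentation is switched off so that
- the serializer does not introduce additional whitespace.
- </p>
- <source><![CDATA[
```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

/** Sketch: change one attribute value with an XSLT identity transform. */
public class AreaTreeXsltSketch {

    // The first template copies all nodes and attributes as they are.
    // The second template replaces one attribute value.
    // "block" and "color" are hypothetical names used only for illustration.
    static final String STYLESHEET =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:template match='@*|node()'>"
      + "<xsl:copy><xsl:apply-templates select='@*|node()'/></xsl:copy>"
      + "</xsl:template>"
      + "<xsl:template match='block/@color'>"
      + "<xsl:attribute name='color'>#ff0000</xsl:attribute>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    /** Applies the stylesheet to the given XML string and returns the result. */
    public static String transform(String xml) throws TransformerException {
        Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(STYLESHEET)));
        // Do not let the serializer add indentation (extra whitespace).
        transformer.setOutputProperty(OutputKeys.INDENT, "no");
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws TransformerException {
        // Hypothetical miniature input, not a real area tree file.
        String in = "<areaTree><block color=\"#000000\"><text>a  b</text></block></areaTree>";
        System.out.println(transform(in));
    }
}
```
- ]]></source>
- <p>
- For real files you would pass file-based StreamSource and StreamResult instances instead of
- strings and match the element and trait names actually present in your area tree XML.
- </p>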
- </section>
- <section id="advanced">
- <title>Advanced Use</title>
- <p>
- The generation of the area tree format as well as its parsing process have been designed to allow
- for maximum flexibility and optimization. Please note that you can call <code>setTransformerHandler()</code> on
- XMLRenderer to give the XMLRenderer your own TransformerHandler instance in case you would like to
- do custom serialization (to a W3C DOM, for example) and/or to directly modify the area tree using
- XSLT. The AreaTreeParser, on the other side, allows you to retrieve a ContentHandler instance to which
- you can manually send SAX events to start the parsing process (see <code>getContentHandler()</code>).
- </p>
- </section>
- </section>
- <section id="usage-if">
- <title>Usage of the Intermediate Format (IF)</title>
- <p>
- The Intermediate Format (IF) is generated by the <strong>IFSerializer</strong>
- (MIME type: <strong>application/X-fop-intermediate-format</strong>).
- In other words, you set the right MIME type for the output format and process your FO files
- as if you were creating a PDF file.
- </p>
- <p>
- The IFSerializer is an implementation of the <strong>IFDocumentHandler</strong> and
- <strong>IFPainter</strong> interfaces. The <strong>IFRenderer</strong> class is responsible
- for converting FOP's area tree into calls against these two interfaces.
- </p>
- <ul>
- <li>
- IFDocumentHandler: This interface is used on the document-level and defines the
- overall structure of the Intermediate Format.
- </li>
- <li>
- IFPainter: This interface is used to generate graphical page content like text, images
- and borders.
- </li>
- </ul>
- <p>
- As with the AT XML, there is an important detail to consider: The various output
- implementations don't all use the same font sources. To be able
- to create the right IF for the ultimate output file, you need to create the IF file using
- the right font setup. This is achieved by telling the IFRenderer (responsible for
- converting the area tree into calls to the IFDocumentHandler and IFPainter interfaces)
- to mimic another renderer. This is done by calling the IFSerializer's
- mimicDocumentHandler() method with an instance of the ultimate target document handler
- as the single parameter. This has a consequence: An IF file rendered with the
- Java2DDocumentHandler may not look as expected when it was actually generated for the PDF
- implementation. For implementations that use the same font setup,
- this restriction does not apply (PDF and PS, for example). Generating the Intermediate
- Format file is the first step.
- </p>
- <p>
- The second step is to reparse the file using the <strong>IFParser</strong> which is
- found in the org.apache.fop.render.intermediate package. The IFParser simply takes an
- IFDocumentHandler instance against which it generates the appropriate calls. The IFParser
- is implemented as a SAX ContentHandler so you're free to choose the method for
- post-processing the IF file(s). You can use XSLT or write SAX- or DOM-based code to
- manipulate the contents. You can find examples for the Intermediate Format
- processing in the
- <a href="http://svn.apache.org/viewvc/xmlgraphics/fop/trunk/examples/embedding/java/embedding/intermediate/"><code>examples/embedding</code></a>
- directory in the FOP distribution.
- </p>
- <p>
- The basic pattern to parse the intermediate format looks like this:
- </p>
- <source><![CDATA[
- FopFactory fopFactory = FopFactory.newInstance();
-
- // Setup output
- OutputStream out = new java.io.FileOutputStream(pdffile);
- out = new java.io.BufferedOutputStream(out);
- try {
- //Setup user agent
- FOUserAgent userAgent = fopFactory.newFOUserAgent();
-
- //Create IFDocumentHandler instance
- IFDocumentHandler targetHandler;
- String mime = MimeConstants.MIME_PDF;
- targetHandler = fopFactory.getRendererFactory().createDocumentHandler(
- userAgent, mime);
-
- //Setup fonts
- IFUtil.setupFonts(targetHandler);
-
- //Tell the target handler to write the PDF to the buffered output stream opened above
- targetHandler.setResult(new StreamResult(out));
-
- //Parse the IF file
- IFParser parser = new IFParser();
- Source src = new StreamSource(myIFFile);
- parser.parse(src, targetHandler, userAgent);
-
- } finally {
- out.close();
- }]]></source>
- <p>
- This example simply reads an intermediate file and renders it to a PDF file. Here
- IFParser.parse() is used, but you can also just get a SAX ContentHandler by using the
- IFParser.getContentHandler() method.
- </p>
- <section id="concat-if">
- <title>Concatenating Documents</title>
- <p>
- This initial example is obviously not very useful. It would be faster to create the PDF file
- directly (without the intermediate step). As the
- <a href="http://svn.apache.org/repos/asf/xmlgraphics/fop/trunk/examples/embedding/java/embedding/intermediate/ExampleConcat.java">ExampleConcat.java</a>
- example shows, you can easily parse multiple intermediate files in a row and use the
- IFConcatenator class to concatenate page sequences from multiple source files into a single
- output file. This particular example does the concatenation on the level of the
- IFDocumentHandler interface. You could also do this in XSLT or using SAX on the XML level,
- whichever suits your process best.
- </p>
- </section>
- <section id="modifying-if">
- <title>Modifying Documents</title>
- <p>
- One of the most important use cases for this format is obviously modifying the
- intermediate format before finally rendering it to the target format. You can easily use
- XSLT to process the IF file according to your needs.
- </p>
- <p>
- There is an XML Schema (located under
- <a href="http://svn.apache.org/viewvc/xmlgraphics/fop/trunk/src/documentation/intermediate-format-ng/">src/documentation/intermediate-format-ng</a>)
- that helps you verify that your modified content is correct.
- </p>
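- <p>
- A check of this kind could be sketched with the JDK's javax.xml.validation API as shown
- below. The class name, the method name and the command-line arguments are assumptions made
- for illustration. The schema and the IF file are supplied by the caller, and no particular
- schema file name is implied.
- </p>
- <source><![CDATA[
```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.transform.Source;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

/** Sketch: validate a (modified) XML file against a W3C XML Schema. */
public class IFSchemaCheckSketch {

    /** Returns the list of validation problems; an empty list means the document is valid. */
    public static List<String> validate(Source schemaSource, Source document)
            throws SAXException, IOException {
        SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        Schema schema = factory.newSchema(schemaSource);
        Validator validator = schema.newValidator();
        final List<String> problems = new ArrayList<>();
        // Collect warnings and errors instead of stopping at the first one.
        validator.setErrorHandler(new ErrorHandler() {
            @Override
            public void warning(SAXParseException e) {
                problems.add("warning: " + e.getMessage());
            }
            @Override
            public void error(SAXParseException e) {
                problems.add("error: " + e.getMessage());
            }
            @Override
            public void fatalError(SAXParseException e) throws SAXException {
                throw e; // not well-formed: nothing sensible left to validate
            }
        });
        validator.validate(document);
        return problems;
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 2) {
            System.out.println("Usage: IFSchemaCheckSketch <schema.xsd> <document.xml>");
            return;
        }
        // File-based sources carry a system ID, so relative schema imports can be resolved.
        List<String> problems = validate(
                new StreamSource(new File(args[0])), new StreamSource(new File(args[1])));
        for (String problem : problems) {
            System.out.println(problem);
        }
        System.out.println(problems.isEmpty() ? "valid" : "invalid");
    }
}
```
- ]]></source>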
- <p>
- For certain output formats there's a caveat: Formats like AFP and PCL do not support
- arbitrary transformations on the IF's "viewport" and "g" elements. Only rotations
- in 90-degree steps and translations are possible.
- </p>
- </section>
- <section id="advanced-if">
- <title>Advanced Use</title>
- <p>
- The generation of the intermediate format as well as its parsing process have been
- designed to allow for maximum flexibility and optimization. So rather than just passing
- in a StreamResult to IFSerializer's setResult() method, you can also use a SAXResult
- or a DOMResult. And as you've already seen, the IFParser on the other side allows you
- to retrieve a ContentHandler instance to which you can manually send SAX events to
- start the parsing process (see <code>getContentHandler()</code>).
- </p>
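- <p>
- Sending SAX events to a ContentHandler through a filter might be sketched as follows, using
- only the JDK's SAX classes. The <code>target</code> parameter stands in for a ContentHandler
- such as the one obtained from the IFParser. The class names, the drop-an-element filter and
- the element names in the sample are assumptions made for illustration and do not reflect
- the actual IF vocabulary.
- </p>
- <source><![CDATA[
```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.ContentHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLFilterImpl;

/** Sketch: filter SAX events on their way to a downstream ContentHandler. */
public class SaxPipelineSketch {

    /** Drops every element with the given local name, together with its content. */
    static class DropElementFilter extends XMLFilterImpl {
        private final String localName;
        private int depth; // greater than 0 while inside a dropped element

        DropElementFilter(String localName) {
            this.localName = localName;
        }

        @Override
        public void startElement(String uri, String local, String qName, Attributes atts)
                throws SAXException {
            if (depth > 0 || local.equals(localName)) {
                depth++;
                return;
            }
            super.startElement(uri, local, qName, atts);
        }

        @Override
        public void endElement(String uri, String local, String qName) throws SAXException {
            if (depth > 0) {
                depth--;
                return;
            }
            super.endElement(uri, local, qName);
        }

        @Override
        public void characters(char[] ch, int start, int length) throws SAXException {
            if (depth == 0) {
                super.characters(ch, start, length);
            }
        }
    }

    /** Parses the input and forwards the filtered SAX events to the target handler. */
    public static void pipe(InputSource input, String dropLocalName, ContentHandler target)
            throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        XMLReader reader = factory.newSAXParser().getXMLReader();
        DropElementFilter filter = new DropElementFilter(dropLocalName);
        filter.setParent(reader);
        filter.setContentHandler(target);
        filter.parse(input);
    }

    /** Demo target: records the local names of the elements that reach the handler. */
    public static List<String> elementNames(String xml, String dropLocalName) throws Exception {
        final List<String> names = new ArrayList<>();
        pipe(new InputSource(new StringReader(xml)), dropLocalName, new DefaultHandler() {
            @Override
            public void startElement(String uri, String local, String qName, Attributes atts) {
                names.add(local);
            }
        });
        return names;
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical miniature input, not a real IF file.
        System.out.println(elementNames("<doc><page><g/></page><extra><x/></extra></doc>", "extra"));
    }
}
```
- ]]></source>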
- </section>
- </section>
- </body>
- </document>
|