<?xml version="1.0"?>
<document> 
  <header> 
	 <title>XML Parsing</title> 
	 <subtitle>All you wanted to know about XML Parsing!</subtitle> 
	 <authors> <person name="Keiron Liddle" email="keiron@aftexsw.com"/> 
	 </authors> 
  </header> 
  <body>
  
<s1 title="XML Parsing"><p>Since everyone knows the basics, we can get
                  into the various stages, starting with XML handling.</p> 
                <s2 title="XML Input"><p>FOP can take the input XML in a number of ways:
                         </p>
        <ul>
          <li>SAX Events sent directly to a SAX Handler
            <ul>
              <li>
                <code>FOTreeBuilder</code> is the SAX Handler, which is
                obtained by calling <code>getContentHandler()</code> on
                <code>Driver</code>.
              </li>
            </ul>
          </li>
          <li>
            A DOM, which is converted into SAX Events
            <ul>
              <li>
                The conversion of a DOM tree is done via the
                <code>render(Document)</code> method on
                <code>Driver</code>.
              </li>
            </ul>
          </li>  
          <li>
            A data source, which is parsed and converted into SAX Events
            <ul>
              <li>
                The <code>Driver</code> can take an
                <code>InputSource</code> as input. An
                <code>InputSource</code> can wrap a byte stream
                (<code>InputStream</code>), a character stream
                (<code>Reader</code>) or a system ID (URI).
              </li>
            </ul>
          </li> 
          <li>
            XML plus an XSLT stylesheet, which is transformed by an XSLT
            processor; the result is fired as SAX Events
            <ul>
              <li>
                <code>XSLTInputHandler</code> is used as an
                <code>InputSource</code> in the
                <code>render(XMLReader, InputSource)</code> method on
                <code>Driver</code>.
              </li>
            </ul>
          </li>
        </ul>
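        <p>The routes above all end the same way: a stream of SAX events
          delivered to one <code>ContentHandler</code>. The following is a minimal
          sketch using only the standard JAXP APIs, not FOP's own classes. The
          hypothetical <code>CountingHandler</code> merely stands in for
          <code>FOTreeBuilder</code> and counts start-element events. The sketch
          shows an <code>InputSource</code> parsed by an <code>XMLReader</code>, a
          DOM replayed as SAX events through an identity <code>Transformer</code>
          with a <code>SAXResult</code>, and an XSLT transformation whose output
          goes straight to the handler.</p>
        <source><![CDATA[
```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.sax.SAXResult;
import javax.xml.transform.stream.StreamSource;
import org.w3c.dom.Document;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical stand-in for FOTreeBuilder: it only counts start-element events.
class CountingHandler extends DefaultHandler {
    int elements = 0;

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        elements++;
    }
}

class InputRoutes {
    // Data source route: parse a String wrapped in an InputSource.
    static int fromInputSource(String xml) {
        try {
            SAXParserFactory spf = SAXParserFactory.newInstance();
            spf.setNamespaceAware(true);
            XMLReader reader = spf.newSAXParser().getXMLReader();
            CountingHandler handler = new CountingHandler();
            reader.setContentHandler(handler);
            reader.parse(new InputSource(new StringReader(xml)));
            return handler.elements;
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    // DOM route: replay an in-memory DOM as SAX events via an identity transform.
    static int fromDom(Document doc) {
        try {
            CountingHandler handler = new CountingHandler();
            Transformer identity = TransformerFactory.newInstance().newTransformer();
            identity.transform(new DOMSource(doc), new SAXResult(handler));
            return handler.elements;
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    // XML+XSLT route: the transformation result is fired as SAX events.
    static int fromXslt(String xml, String xslt) {
        try {
            CountingHandler handler = new CountingHandler();
            Transformer t = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(xslt)));
            t.transform(new StreamSource(new StringReader(xml)), new SAXResult(handler));
            return handler.elements;
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    // Helper for the DOM route: build a namespace-aware DOM from a String.
    static Document parseDom(String xml) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true);
            return dbf.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }
}
```
]]></source>
        <p>Because every route drives the same handler type, the tree builder
          does not need to know where its events came from.</p>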
                                
                  <p>The SAX Events fired on the SAX Handler, class
                         <code>FOTreeBuilder</code>, must represent an XSL-FO document; otherwise an
                         error is raised. Any problems with the XML not being well formed are also handled here.</p></s2> 
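        <p>Well-formedness errors in the XML Input stage are reported by the XML
          parser itself, before any XSL-FO checking happens. As a minimal JAXP
          sketch (not FOP's actual error handling), a SAX <code>ErrorHandler</code>
          can turn a fatal parse error into a line and column report. The class
          name <code>WellFormedCheck</code> is hypothetical.</p>
        <source><![CDATA[
```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: reports where (if anywhere) a document stops being well formed.
class WellFormedCheck {
    // Returns null for a well-formed document, otherwise "line:column: message".
    static String check(String xml) {
        try {
            SAXParserFactory spf = SAXParserFactory.newInstance();
            spf.setNamespaceAware(true);
            XMLReader reader = spf.newSAXParser().getXMLReader();
            // DefaultHandler.fatalError rethrows the exception, which aborts parsing;
            // installing it explicitly makes that policy visible and avoids console output.
            DefaultHandler handler = new DefaultHandler();
            reader.setContentHandler(handler);
            reader.setErrorHandler(handler);
            reader.parse(new InputSource(new StringReader(xml)));
            return null;
        } catch (SAXParseException e) {
            return e.getLineNumber() + ":" + e.getColumnNumber() + ": " + e.getMessage();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }
}
```
]]></source>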
                <s2 title="Element Mappings"><p>An element mapping is a hash map of all
                         the elements in a particular namespace. This makes it easy to create a
                         different object for each element. Element mappings are static to save
                         memory.</p><p>To add an extension, a developer can put a jar on the classpath
                         that contains the file <code>/META-INF/services/org.apache.fop.fo.ElementMapping</code>.
                         This file must contain a line with the fully qualified name of a class that
                         implements the <em>org.apache.fop.fo.ElementMapping</em> interface; that class is then
                         loaded automatically at startup. The internal mappings are FO, SVG and Extension
                         (PDF bookmarks).</p></s2> 
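        <p>The extension discovery described under Element Mappings follows the
          JAR service-provider file format: one fully qualified class name per
          line, with <code>#</code> starting a comment. The sketch below parses
          such a file and instantiates each listed class by reflection. All names
          in it are hypothetical; FOP's own loader and the real
          <code>org.apache.fop.fo.ElementMapping</code> interface are not
          reproduced. On Java 6 and later, <code>java.util.ServiceLoader</code>
          performs the same lookup.</p>
        <source><![CDATA[
```java
import java.util.ArrayList;
import java.util.List;

class ServiceFileDemo {
    // Hypothetical stand-in for the org.apache.fop.fo.ElementMapping contract.
    interface Mapping {
        String namespaceURI();
    }

    // Example provider, as an extension jar might ship it (hypothetical).
    public static class BookmarkMapping implements Mapping {
        public String namespaceURI() {
            return "http://example.org/hypothetical/extensions";
        }
    }

    // Parse service-file content: strip "#" comments and whitespace, skip blank lines.
    static List<String> parse(String content) {
        List<String> names = new ArrayList<>();
        for (String line : content.split("\r?\n")) {
            int hash = line.indexOf('#');
            String name = (hash >= 0 ? line.substring(0, hash) : line).trim();
            if (!name.isEmpty()) {
                names.add(name);
            }
        }
        return names;
    }

    // Instantiate each listed class reflectively; each must implement Mapping.
    static List<Mapping> load(String content) {
        List<Mapping> mappings = new ArrayList<>();
        for (String name : parse(content)) {
            try {
                Class<?> cls = Class.forName(name);
                mappings.add((Mapping) cls.getDeclaredConstructor().newInstance());
            } catch (ReflectiveOperationException e) {
                throw new IllegalStateException("Cannot load mapping " + name, e);
            }
        }
        return mappings;
    }
}
```
]]></source>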
                <s2 title="Tree Building"><p>The SAX Events deliver all the information
                         in the document as start-element, end-element, character-data and other events.
                         This information is used to build up a representation of the FO document. To do
                         this, each namespace has a set of element mappings. When a mapping is found for the
                         element and its namespace, an object is created for that element. If the element
                         is not found, a dummy object is created instead; for unknown namespaces, a
                         generic DOM is built.</p> 
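        <p>A minimal sketch of this lookup, assuming a two-level map from
          namespace URI to local name to a node factory. The <code>Node</code>
          class and the registry are hypothetical; FOP's real node hierarchy is
          richer, and a real implementation would build an actual DOM for unknown
          namespaces rather than the placeholder node used here.</p>
        <source><![CDATA[
```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

class TreeBuilderSketch {
    static final String FO_NS = "http://www.w3.org/1999/XSL/Format";

    // Hypothetical node type: records what kind of object was created.
    static class Node {
        final String kind;
        final String name;
        Node(String kind, String name) { this.kind = kind; this.name = name; }
    }

    // namespace URI -> (local name -> factory), i.e. one element mapping per namespace.
    private final Map<String, Map<String, Supplier<Node>>> mappings = new HashMap<>();

    void register(String ns, String localName, Supplier<Node> factory) {
        mappings.computeIfAbsent(ns, k -> new HashMap<>()).put(localName, factory);
    }

    Node create(String ns, String localName) {
        Map<String, Supplier<Node>> forNs = mappings.get(ns);
        if (forNs == null) {
            // Unknown namespace: keep the XML generically (a DOM in the text above).
            return new Node("generic-dom", localName);
        }
        Supplier<Node> factory = forNs.get(localName);
        if (factory == null) {
            // Known namespace but unknown element: fall back to a dummy object.
            return new Node("dummy", localName);
        }
        return factory.get();
    }
}
```
]]></source>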
                  <p>The object is then set up and given the attributes of the element.
                         For the FO Tree, the attributes are converted into properties: the FO objects
                         use a property list mapping to convert the attributes into a list of properties
                         for the element. For other XML, such as SVG, a DOM of the XML is
                         constructed, which can then be passed through to the renderer. Other element
                         mappings can be used in different ways, for example to create elements that
                         generate areas during the layout process or that set up information for the
                         renderer.</p> 
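        <p>Attribute-to-property conversion can be sketched as a table of
          property makers keyed by attribute name. The maker table, the choice of
          millipoints as the length unit and the tiny parser that accepts only
          <code>pt</code> values are assumptions for illustration. FOP's actual
          property list mapping covers the full XSL-FO property set, including
          inheritance and expressions.</p>
        <source><![CDATA[
```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;
import org.xml.sax.Attributes;

class PropertyListSketch {
    // Hypothetical maker table: attribute name -> converter from string to typed value.
    static final Map<String, Function<String, Object>> MAKERS = new HashMap<>();
    static {
        MAKERS.put("font-size", PropertyListSketch::parsePoints);
        MAKERS.put("space-before", PropertyListSketch::parsePoints);
        MAKERS.put("font-weight", v -> v); // kept as a keyword string
    }

    // Convert e.g. "12pt" to 12000 millipoints; other units are out of scope here.
    static Object parsePoints(String value) {
        String v = value.trim();
        if (!v.endsWith("pt")) {
            throw new IllegalArgumentException("Only pt lengths supported in this sketch: " + value);
        }
        double points = Double.parseDouble(v.substring(0, v.length() - 2).trim());
        return (int) Math.round(points * 1000);
    }

    // Build the property list for one element; attributes without a maker are skipped.
    static Map<String, Object> toProperties(Attributes atts) {
        Map<String, Object> props = new HashMap<>();
        for (int i = 0; i < atts.getLength(); i++) {
            Function<String, Object> maker = MAKERS.get(atts.getLocalName(i));
            if (maker != null) {
                props.put(atts.getLocalName(i), maker.apply(atts.getValue(i)));
            }
        }
        return props;
    }
}
```
]]></source>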
        <p>
          While tree building is mainly about creating the FO Tree,
          some stages can propagate to the renderer. At the end of a
          page sequence we know that all pages in that page sequence
          can be laid out without being affected by any further XML,
          which means the FO Tree for the page sequence can
          potentially be disposed of. Likewise, the end of the XML
          document tells us that the output document can be
          finalised. (The layout managers lay out individual pages
          one page at a time; i.e. they do not need to wait for the
          end of the page sequence. A page may not yet be complete,
          however; it may contain forward page-number references,
          for example.)
        </p>
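        <p>Both boundaries can be detected directly from the SAX stream. The
          minimal sketch below reacts to the end of each
          <code>fo:page-sequence</code> element in the XSL-FO namespace and to
          <code>endDocument</code>. The callbacks, here simply recorded in a list,
          are hypothetical placeholders for handing a finished page sequence on
          and for finalising the output; they are not FOP's actual interfaces.</p>
        <source><![CDATA[
```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class BoundaryHandler extends DefaultHandler {
    static final String FO_NS = "http://www.w3.org/1999/XSL/Format";

    // Record of hypothetical callbacks, in the order they fired.
    final List<String> events = new ArrayList<>();
    private int sequenceCount = 0;

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (FO_NS.equals(uri) && "page-sequence".equals(localName)) {
            sequenceCount++;
            // No later XML can affect this page sequence, so its FO subtree
            // could be handed on and then discarded at this point.
            events.add("page-sequence-complete:" + sequenceCount);
        }
    }

    @Override
    public void endDocument() {
        // End of input: the output document can now be finalised.
        events.add("finalise-output");
    }

    static List<String> run(String fo) {
        try {
            SAXParserFactory spf = SAXParserFactory.newInstance();
            spf.setNamespaceAware(true); // needed so uri and localName are populated
            BoundaryHandler h = new BoundaryHandler();
            spf.newSAXParser().parse(new InputSource(new StringReader(fo)), h);
            return h.events;
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }
}
```
]]></source>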
      </s2> 
                <s2 title="Associated Tasks"> 
                  <ul><li>Error handling for XML that is not well formed.</li> 
                         <li>Error handling for other XML parsing errors.</li><li>Developer
                                information on adding namespace handlers.</li></ul></s2></s1>   
  </body></document>