<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!--
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
this work for additional information regarding copyright ownership.
The ASF licenses this file to You under the Apache License, Version 2.0
(the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
<!-- $Id$ -->
<!DOCTYPE document PUBLIC "-//APACHE//DTD Documentation V1.3//EN" "http://forrest.apache.org/dtd/document-v13.dtd">
<document>
<header>
<title>Apache™ FOP Design: Layout Managers</title>
<subtitle>Break Possibility Proposal</subtitle>
<version>$Revision$</version>
<authors>
<person name="Karen Lease" email="klease@club-internet.fr"/>
</authors>
</header>
<body>
<section id="intro">
<title>Introduction</title>
<p>
As explained in <link href="layout.html">Layout</link>,
the hierarchy of Layout Managers is responsible for building and placing
areas. Each Layout Manager is responsible for creating and filling
areas of a particular type, either inline or block. This document
explains one potential algorithm for this process. It is based on
the generation of <em>break possibilities</em> (BP for short). The
Layout Managers (LM for short) will generate one or more BP and
choose the best one. The BP is then used to generate the corresponding
areas.
</p>
</section>
<section>
<title>Anatomy of a Break Possibility</title>
<p>A break possibility is represented by the BreakPoss class. A
BreakPoss contains size information in the stacking direction and in
the non-stacking direction (inline areas, at least, must have both).
It also carries flags indicating various conditions (ISFIRST, ISLAST,
CAN_BREAK_AFTER, FORCE_BREAK_AFTER, ANCHORS, etc.) and a reference to
the top-level LayoutManager which generated it.
</p>
<p>A BreakPoss contains an object implementing
the BreakPoss.Position interface. This object is specific to the layout
manager which created the BreakPoss. It should indicate where the
break occurs and allow the LM to
create an area corresponding to the BP. The Position of a higher-level LM
must somehow reference or wrap the Position returned by its child LM in its
BreakPoss object. The layout manager modifies the flags and dimension
information in the BP to reflect its own requirements. For example, an
inline FO layout manager might add space-start, space-end, border and
padding values to the stacking or non-stacking dimensions. It might also
modify the flags based on its keep properties.</p>
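<p>The following Java sketch shows one possible shape for these pieces. It is
an illustration under stated assumptions, not FOP's actual code: the names
BreakPoss, BreakPoss.Position, MinOptMax and the flag constants come from this
document, while the bit values, the field names and the addInlineDecoration
method are hypothetical.</p>
<source><![CDATA[
```java
// Hypothetical sketch: bit values, field names and addInlineDecoration
// are assumptions made for illustration, not FOP's actual code.

/** A size range: minimum, optimum and maximum (e.g. in millipoints). */
final class MinOptMax {
    final int min;
    final int opt;
    final int max;

    MinOptMax(int min, int opt, int max) {
        this.min = min;
        this.opt = opt;
        this.max = max;
    }

    /** Component-wise sum of two ranges. */
    MinOptMax plus(MinOptMax other) {
        return new MinOptMax(min + other.min, opt + other.opt, max + other.max);
    }
}

/** Stand-in for a layout manager; only its identity matters here. */
interface LayoutManager { }

final class BreakPoss {
    // Flag names come from the text; the bit assignments are invented.
    static final int ISFIRST = 1;
    static final int ISLAST = 2;
    static final int CAN_BREAK_AFTER = 4;
    static final int FORCE_BREAK_AFTER = 8;
    static final int ANCHORS = 16;

    /** Opaque, LM-specific description of where the break occurs. */
    interface Position { }

    final LayoutManager layoutManager; // the LM which generated this BP
    final Position position;
    MinOptMax stackSize;               // size in the stacking direction
    MinOptMax nonStackSize;            // size in the non-stacking direction, if known
    int flags;

    BreakPoss(LayoutManager layoutManager, Position position,
              MinOptMax stackSize, int flags) {
        this.layoutManager = layoutManager;
        this.position = position;
        this.stackSize = stackSize;
        this.flags = flags;
    }

    boolean isSet(int flag) {
        return (flags & flag) != 0;
    }

    /** An inline LM adding its own space, border and padding to the BP. */
    void addInlineDecoration(MinOptMax spaceBorderPadding) {
        stackSize = stackSize.plus(spaceBorderPadding);
    }
}
```
]]></source>
<p>In this sketch a parent LM adjusts a child's BP in place. Creating a new
BP that wraps the child's would serve equally well.</p>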
</section>
<section>
<title>Turning Break Possibilities into Areas</title>
<p>Once break possibilities have been generated, the galley-level
layout manager selects the best one
and passes it back to the LayoutManager which generated it to create
the area. A LayoutManager is responsible for
storing enough information in its Position objects to be able to
create the corresponding areas.</p>
</section>
<section>
<title>A walk-through</title>
<p>Layout Managers are created from the top down. First the
page sequence creates a PageLM and a FlowLM. The PageLM will
find the right page model (with help from the PageSequenceMaster)
and manage the balancing act between before-floats, footnotes and
the normal text flow. The FlowLM will
manage the normal content in the main flow. We can think of it as a
<em>galley</em> manager.
</p>
<p>In general, each LM asks its child LMs to return successive
break possibilities. It passes some
information to the child in a flags object and it gets back
a break possibility which contains the size in
the stacking direction as well as information about such things as
anchors, break conditions and span conditions which can change the
reference area environment. This process continues down to the lowest
level of the layout manager hierarchy, which corresponds to atomic
inline-level FOs such as characters or graphics.
</p>
<p>
Each layout manager will repeatedly call getNextBreakPoss on its current
child LM until the child returns a BP with the ISLAST
flag set. Then the layout manager moves on to its next child LM (i.e.,
it asks the next child FO to generate a layout manager). The galley-level
layout managers, Line and Flow, will return to their parent
layout managers either when they have finished their content or when
they encounter a BP which will fill one of their areas.
</p>
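<p>The iteration over child LMs could be sketched as follows. Only the method
name getNextBreakPoss and the ISLAST condition are taken from this document.
SimpleBP, ChildLM, FixedChildLM and ParentLM are hypothetical names, sizes are
reduced to a single integer, and the flags object passed to the child is
omitted.</p>
<source><![CDATA[
```java
import java.util.Iterator;
import java.util.List;

// Hypothetical sketch: all type names here are invented for illustration.

/** Minimal break possibility: a size plus the ISLAST condition. */
final class SimpleBP {
    final int size;
    final boolean isLast;

    SimpleBP(int size, boolean isLast) {
        this.size = size;
        this.isLast = isLast;
    }
}

interface ChildLM {
    /** Returns the next BP; the final one has isLast set. */
    SimpleBP getNextBreakPoss();
}

/** Stand-in child that hands out a fixed, non-empty list of sizes. */
final class FixedChildLM implements ChildLM {
    private final int[] sizes;
    private int next = 0;

    FixedChildLM(int... sizes) {
        this.sizes = sizes;
    }

    @Override
    public SimpleBP getNextBreakPoss() {
        int i = next++;
        return new SimpleBP(sizes[i], i == sizes.length - 1);
    }
}

final class ParentLM {
    private final Iterator<ChildLM> children;
    private ChildLM current;

    ParentLM(List<ChildLM> children) {
        this.children = children.iterator();
    }

    /** Returns the next BP, or null once every child has finished. */
    SimpleBP getNextBreakPoss() {
        if (current == null) {
            if (!children.hasNext()) {
                return null;
            }
            current = children.next();
        }
        SimpleBP bp = current.getNextBreakPoss();
        boolean childDone = bp.isLast;
        if (childDone) {
            current = null; // the following call moves on to the next child
        }
        // ISLAST is passed upward only when the last child has finished too.
        return new SimpleBP(bp.size, childDone && !children.hasNext());
    }
}
```
]]></source>
<p>With two children producing the sizes (10, 20) and (5), successive calls on
the parent yield 10, 20 and 5, and only the third BP has isLast set.</p>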
<p>The break possibilities are generated from the bottom up.
All inline content must first be broken into
lines which are then stacked into block areas. This is done by the
LineLayoutManager, which creates line areas.
The LineLM asks its child LM to generate a break possibility, which
represents a place where the line can end. This
initially means each potential line-end (primarily spaces or forced
linefeeds and a few other potential line-end characters such as hard
hyphens). The text LM returns an object which stores the size in the
stacking direction as a MinOptMax triplet
and a <em>cost</em>, which is based on how well this break
would satisfy the constraints. The text LM keeps track of its position in
the text content and returns the total size of the text area it would
create if it were to break at a given point. The returned BP
object also contains information about whether the break is forced
(linefeed) or whether this is the last area which can be generated by
the LM (ISLAST flag). If a text FO ends on a non-break character, the
ISLAST flag is set, but the CAN_BREAK_AFTER flag isn't, since we don't
know whether any text follows in another inline object, for
example.
</p>
<p>Variable size content is taken into account from
the bottom up. Each LM returns a range of sizes in the stacking
direction, based on property values. For text, this comes from
variable word-space values or letter-space values. For other inline
objects, it may include variable space-start and space-end values
(after calculation of the entire sequence of space specifiers at a
particular break possibility.)</p>
<p>The main constraint for laying out
lines is the available inline-progression-dimension (IPD) for the line
area to be created. This
depends on the IPD of the reference area ancestor, on the indents of the
containing fo:block, and on any side-floats which may be intruding on
this line.</p>
<note>See below <link href="#getRefIPD">Getting the Reference
IPD</link>
for discussion of how the reference area IPD is
transmitted to the Line LM.</note>
<p>For now, let's assume that only the LineLM knows about the IPD
available to it. Therefore only it can make a decision about which BP
is the best one; the lower level inline layout managers can only
return potential break points.</p>
<note>There are certainly optimizations to this model which can be
examined later.</note>
<p>So the Line LM will ask its child LM(s) for break possibilities until
it gets back a BP whose stacking dimension <em>could</em> fill the
line. This means that BP.stackdim.max &gt;= LineIPD.min. It can look
for further BP, perhaps one whose stackdim.opt is closer to
LineIPD.opt. If it isn't happy with the choice of break possibilities,
it can go past the end of the line to the next one, and then try to
find a hyphenation point between the last one which fits and the first
one which doesn't. If no possibility is found whose min/max values
enclose the available IPD, some constraint will be violated (and
reported in the log). The actual strategy is up to the Line LM and
should be easily replaceable without changing the architecture
(Strategy pattern).
</p>
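<p>As an illustration of the Strategy pattern mentioned above, the sketch
below picks, among the feasible candidates, the one whose optimum is closest
to the optimum line IPD. The interface LineBreakStrategy and the classes
Range, ClosestOptStrategy and LineBreakDemo are hypothetical. The first test
mirrors the condition BP.stackdim.max &gt;= LineIPD.min, and the second, that
the BP minimum must not exceed the line maximum, is an assumption about what
"enclose the available IPD" means.</p>
<source><![CDATA[
```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: all names are invented; costs and hyphenation are ignored.

/** A min/opt/max size range. */
final class Range {
    final int min;
    final int opt;
    final int max;

    Range(int min, int opt, int max) {
        this.min = min;
        this.opt = opt;
        this.max = max;
    }
}

/** Replaceable policy for choosing a line break. */
interface LineBreakStrategy {
    /** Returns the index of the chosen candidate, or -1 if none is feasible. */
    int choose(List<Range> candidates, Range lineIpd);
}

/** Chooses the feasible candidate whose optimum is closest to the line's optimum. */
final class ClosestOptStrategy implements LineBreakStrategy {
    @Override
    public int choose(List<Range> candidates, Range lineIpd) {
        int best = -1;
        int bestDistance = Integer.MAX_VALUE;
        for (int i = 0; i < candidates.size(); i++) {
            Range bp = candidates.get(i);
            boolean couldFill = bp.max >= lineIpd.min; // can stretch to fill the line
            boolean fits = lineIpd.max >= bp.min;      // can shrink to fit the line
            if (!couldFill || !fits) {
                continue;
            }
            int distance = Math.abs(bp.opt - lineIpd.opt);
            if (distance < bestDistance) {
                best = i;
                bestDistance = distance;
            }
        }
        return best;
    }
}

final class LineBreakDemo {
    static int demo() {
        Range line = new Range(200, 200, 200);
        List<Range> candidates = Arrays.asList(
                new Range(150, 160, 170),  // cannot stretch to 200
                new Range(185, 195, 210),  // feasible, optimum is 5 away
                new Range(199, 210, 225),  // feasible, optimum is 10 away
                new Range(230, 240, 250)); // cannot shrink to 200
        return new ClosestOptStrategy().choose(candidates, line);
    }
}
```
]]></source>
<p>A more elaborate strategy, for example one that weighs the cost stored in
each BP, could implement the same interface without affecting the Line LM's
callers.</p>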
<p>The definition of a good break possibility depends on the
properties at the block and inline level which govern things such as
wrapping behavior and justification mode. For example, if lines are
not to be wrapped, only an explicit linefeed can serve as a BP. If
lines are wrapped but not justified then there is no requirement to
completely fill the IPD on each line, but a sophisticated layout
manager will try to achieve "aesthetic rag".
</p>
<p>Note that no areas have actually been created yet. Once the LineLM
has found a potential break point for the inline content, it can
calculate the total size of the line area which would be created. The
size in the IPD is determined by the Line LM based on the chosen BP.
The size of the line area in the block-progression-dimension
depends on the size of the text (or other inline content). These
values are set by the inline-level LMs
in their returned BP (in terms of ascender and descender heights with
respect to the baseline). The LineLM adds spacing implied by the
current line-stacking strategy and line-height property values. It
stores a reference to the chosen inline BP and "wraps" that in its own
Position object which it stores in the BP it returns to its parent LM
(the block layout manager).
</p><p>The block LM now has a potential break position after its
first line. It assigns that possibility a cost, based on widow, orphan
and keep properties. It can also calculate the total size of the block
area it would create, were it to end the area after this line. It does
this by adding any padding and border (taking into account
conditionality). It also calculates space-before and space-after
values, or contributes to building up a sequence of such values.
With this information, the block LM creates a new BP (or
updates the existing one). It stores a Position object in this
BP which wraps the returned BP from its child Line LM.
It returns the new BP to its parent and so on, back up to the
FlowLM.</p>
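<p>The wrapping of positions on the way up can be pictured with the sketch
below. Every name in it is hypothetical, sizes are reduced to a single
optimum value, and cost, conditionality and space specifiers are left out.
It only shows how each level keeps a reference to the BP of the level below,
so that the chain can be followed back down when areas are created.</p>
<source><![CDATA[
```java
// Hypothetical sketch: all class and method names are invented for illustration.

interface Position { }

/** Leaf position, e.g. a character offset chosen by a text LM. */
final class TextPosition implements Position {
    final int offset;

    TextPosition(int offset) {
        this.offset = offset;
    }
}

/** A BP reduced to its creator's name, a position and one size. */
final class Bp {
    final String owner;
    final Position position;
    final int bpd; // optimum size in the block-progression direction

    Bp(String owner, Position position, int bpd) {
        this.owner = owner;
        this.position = position;
        this.bpd = bpd;
    }
}

/** Position used by a parent LM: it remembers the child's BP. */
final class WrappingPosition implements Position {
    final Bp childBp;

    WrappingPosition(Bp childBp) {
        this.childBp = childBp;
    }
}

final class Wrapping {
    /** Line LM: wrap the chosen inline BP and add half-leading above and below. */
    static Bp wrapAsLine(Bp inlineBp, int halfLeading) {
        return new Bp("LineLM", new WrappingPosition(inlineBp),
                inlineBp.bpd + 2 * halfLeading);
    }

    /** Block LM: wrap the line BP and add border and padding before and after. */
    static Bp wrapAsBlock(Bp lineBp, int before, int after) {
        return new Bp("BlockLM", new WrappingPosition(lineBp),
                lineBp.bpd + before + after);
    }

    /** Follows the chain of wrapped BPs from the outermost LM down to the leaf. */
    static String describe(Bp bp) {
        StringBuilder sb = new StringBuilder(bp.owner);
        Position p = bp.position;
        while (p instanceof WrappingPosition) {
            Bp child = ((WrappingPosition) p).childBp;
            sb.append(" > ").append(child.owner);
            p = child.position;
        }
        return sb.toString();
    }
}
```
]]></source>
<p>Wrapping a text BP of size 12 as a line with a half-leading of 1 and then
as a block with 3 units before and after gives a block BP of size 20 whose
chain reads "BlockLM &gt; LineLM &gt; TextLM".</p>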
<p>Obviously there is more complicated logic involved when dealing
with lists and tables. These cases need to be walked through in detail.</p>
<p>The FlowLM sees if the returned stacking dimension will still
fit in its available block-progression-dimension (BPD). It repeatedly calls
getNextBreakPoss on its
child LMs until it reaches the maximum BPD for the flow reference area
or until there is no more content to lay out. If one child LM is
finished, it moves on to the next until the last child LM has returned
a BP with the ISLAST flag set. If any child LM returns a
BP with a FORCE_BREAK_BEFORE or SPAN flag set, the FlowLM will
force layout of any pending break possibilities and return to its
parent (the PageLM) in order to handle the break or span condition.</p>
<p>If the returned BP has any new before-float or footnote anchors in
it (ANCHOR flag in the
BP), the FlowLM will also return to the PageLM. The PageLM must then
try to find space to place the floats, possibly asking the FlowLM for
help if the body contains multiple columns.</p>
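<p>A much simplified sketch of the FlowLM loop follows. The names FlowBP,
FlowSketch and countAccepted are hypothetical. The sketch assumes that each BP
reports the cumulative size of its child's area if the area ended at that BP,
and it folds the FORCE_BREAK_BEFORE, SPAN and ANCHOR conditions into a single
boolean.</p>
<source><![CDATA[
```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: all names are invented; sizes are single integers.

final class FlowBP {
    final int cumulativeBpd;   // size of the child's area if it ended at this BP
    final boolean isLast;      // ISLAST: the child has no more content
    final boolean needsPageLM; // FORCE_BREAK_BEFORE, SPAN or ANCHOR is set

    FlowBP(int cumulativeBpd, boolean isLast, boolean needsPageLM) {
        this.cumulativeBpd = cumulativeBpd;
        this.isLast = isLast;
        this.needsPageLM = needsPageLM;
    }
}

final class FlowSketch {
    /**
     * Returns how many BPs, taken in order, fit into maxBpd. Stops early at
     * a BP which must be handled by the PageLM.
     */
    static int countAccepted(List<FlowBP> bps, int maxBpd) {
        int finished = 0; // total BPD of the children completed so far
        int accepted = 0;
        for (FlowBP bp : bps) {
            if (bp.needsPageLM) {
                break; // lay out what is pending and return to the PageLM
            }
            if (finished + bp.cumulativeBpd > maxBpd) {
                break; // the flow reference area is full
            }
            accepted++;
            if (bp.isLast) {
                finished += bp.cumulativeBpd;
            }
        }
        return accepted;
    }

    static int demo() {
        List<FlowBP> bps = Arrays.asList(
                new FlowBP(10, false, false),
                new FlowBP(20, true, false),  // first child ends at 20
                new FlowBP(15, false, false), // second child: 20 + 15 = 35
                new FlowBP(30, true, false)); // 20 + 30 = 50 exceeds 40
        return countAccepted(bps, 40);
    }
}
```
]]></source>
<p>In the demo, the first three BPs are accepted and the fourth would overflow
a flow area of 40 units.</p>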
</section>
<section>
<title>Some issues</title>
<p>Following are a few remarks on specific issues.</p>
<section>
<title>Where Line Layout Managers are created</title>
<p>If the first child FO in a block FO is an inline-level FO
such as text, the block LM creates an intermediate level LineLM
to lay out the
sequence of inline content into Lines. Note that the whole sequence of
inline FOs is managed by a single instance of LineLM. The LineLM
becomes the parent to the various inline-level LM created by each
individual inline FO.
Since an fo:block can have both block and inline content, its LM
may create a sequence of intermixed BlockLM and LineLM.</p>
</section>
<section id="getRefIPD">
<title>Getting the reference IPD</title>
<p>When the layout process starts, with the FlowLM asking its first
child LM for a break possibility, the IPD isn't known, since we don't
know whether
the first FO might be spanning, or on which page it might start. (Of
course, if all page masters in the sequence have the same region-body IPD
and all have only a single column, the IPD will never change
and could already be calculated before starting layout.)
The FlowLM gets its
first child LM and calls its getNextBreakPoss method. That is a child LM for
some block-level FO. For now, suppose it's an fo:block. The BlockLM
will create its first child LM, which may be another block-level LM in
the case of nested blocks or a LineLM as explained above. (Question:
do we need a START flag for layout status?)
</p>
<p>We keep calling getNextBreakPoss on lower level layout managers until we
get down to the inline level or to a level which cannot have break-before
properties, such as a list-item-label. At that point, we assume we are
going to have to lay out some actual content. But we can't do that yet
since we don't know the inline-progression-dimension. So we return a
BP object which has 0 size in the stacking dimension, but which
has flags set to signal to
higher-level layout managers what needs to be done. If the LM has a break-before
property or a span property, it stores these in the BP. If
no reference IPD is yet defined, it sets a flag to get that. It then
returns to its parent. The parent LM will inspect the BP object
returned. In general, it "wraps" it with information about its own
needs. If the returned BP is not actually returning any potential
areas, the LM can still add information about its own break or span
requirements. This return path continues back up to the PageLM. It
will then check break and span requirements and create a new page
if necessary using the appropriate page-master. At that point, the
reference IPD for the main
flow is known and is set in the flags object used for
the next getNextBreakPoss call to the lower level LM.
</p><p>Using this information, the BlockLM parent can now calculate
the available IPD for its LineLM child, based on its indents.
(If there are any
side-floats, information about the intrusion must be passed down by the
FlowLM to lower level managers.) The LineLM can now generate a series
of BreakPoss objects, which it passes back to its parent LM.
</p>
</section>
<section>
<title>Hyphenation</title>
<p>
The LineLM is responsible for initiating hyphenation if it is allowed
by the properties and if no satisfactory BP can be found without
hyphenating. The hyphenation manager is passed two break
possibilities, one whose IPD is less than the desired line area IPD
and one whose IPD is greater. These break possibilities might have
been generated by different inline-level layout managers (text + a
wrapper with a color change for example), though
frequently they represent two positions in a single text run.
If hyphenation is successful, a new BP is
returned. The LineLM may look for several intermediate BP
based on the "cost" of the returned possibilities. If no intermediate
BP is found, the line will be "short", the white-space stretch will be
exceeded, or perhaps the content will be overflowed or clipped,
depending on various property settings.</p>
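<p>For the common case of two positions in a single text run, the search for
an intermediate BP might look like the sketch below. It is a toy model under
strong assumptions: every character has the same width, offsets are measured
from the start of the line, and the candidate hyphenation offsets are given.
The class HyphenationSketch and the method bestHyphenPoint are hypothetical
names.</p>
<source><![CDATA[
```java
// Hypothetical sketch: fixed-width characters and all names are assumptions.
final class HyphenationSketch {
    /**
     * Returns the largest hyphenation offset strictly between the two break
     * offsets whose text width plus hyphen still fits, or -1 if none does.
     */
    static int bestHyphenPoint(int[] hyphenOffsets, int fitsOffset, int tooLongOffset,
                               int charWidth, int hyphenWidth, int availableIpd) {
        int best = -1;
        for (int offset : hyphenOffsets) {
            boolean between = offset > fitsOffset && tooLongOffset > offset;
            int width = offset * charWidth + hyphenWidth;
            if (between && availableIpd >= width && offset > best) {
                best = offset;
            }
        }
        return best;
    }
}
```
]]></source>
<p>A result of -1 corresponds to the case where no intermediate BP is found
and the line is left short or overflows.</p>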
</section>
<section>
<title>Optimizing</title>
<p>It seems inefficient to go down to the lowest level
LM and back up to the FlowLM for every possible line-break
decision. It seems it would be possible to optimize by letting
the lower level layout managers run until they had exceeded the
current limit in
the stacking direction. They would then return control to the "galley"
level (LineLM or FlowLM) which would fine-tune the break decision by
asking the lower level LM to find a previous BP which would fit. At
the inline level, this means hyphenation as described above.</p>
<p>Another interesting question is at what point pending break
possibilities can be turned into areas. The idea is to wait until we
are sure we won't have to redo the breaking. This depends on the
sophistication of the layout strategy. For example, if a
line break can be considered final once the line is full and there are no
anchors on the line, we could create the LineArea at that point. But
if we are willing to change a previous line-end decision to get a
better overall composition of a whole group of lines (to prevent multiple
hyphens for example), we might wait until the LineLM had finished
laying out all its material and then make all the Lines at once.</p>
</section>
</section>
</body>
</document>