diff options
author | Vincent Hennebert <vhennebert@apache.org> | 2008-10-22 11:49:30 +0000 |
---|---|---|
committer | Vincent Hennebert <vhennebert@apache.org> | 2008-10-22 11:49:30 +0000 |
commit | 6fd33a0ca40a436f66a03fe0a05bec465e20d3fc (patch) | |
tree | 72166a0c4f0faa8073b225c315acd581a21312c4 | |
parent | e847663d9605155c44ff2a923588d77f05cc73c9 (diff) | |
download | xmlgraphics-fop-6fd33a0ca40a436f66a03fe0a05bec465e20d3fc.tar.gz xmlgraphics-fop-6fd33a0ca40a436f66a03fe0a05bec465e20d3fc.zip |
Completed section about breaking tables over pages
git-svn-id: https://svn.apache.org/repos/asf/xmlgraphics/fop/branches/Temp_Interleaved_Page_Line_Breaking@707044 13f79535-47bb-0310-9956-ffa450edef68
-rw-r--r-- | prototype/doc/src/docbook/prototype.xml | 218 |
1 files changed, 183 insertions, 35 deletions
diff --git a/prototype/doc/src/docbook/prototype.xml b/prototype/doc/src/docbook/prototype.xml index 62e77045f..ba578f6bb 100644 --- a/prototype/doc/src/docbook/prototype.xml +++ b/prototype/doc/src/docbook/prototype.xml @@ -71,7 +71,7 @@ her face. box w=12</literallayout> <titleabbrev>Same paragraph on slightly wider page.</titleabbrev> <literallayout class="monospaced">In olden times when wishing still helped one, there lived a king -whose daughters were all beautiful, +whose daughters were all beautiful, but the youngest was so beautiful that the sun itself, which has seen so much, was astonished whenever @@ -106,7 +106,7 @@ it shone in her face.</literallayout> <itemizedlist> <listitem> <para>how to deal with paragraphs? We need to know the width(s) of the page(s) they will - by typesetted on before breaking them. That means that the list of boxes and penalties + be typeset on before breaking them. That means that the list of boxes and penalties representing the lines like we saw above cannot be determined before we start page breaking. But if it’s not available, how to determine if the bottom of a page will be reached while we are inside a paragraph, and if we must consider to continue it on the @@ -180,16 +180,10 @@ it shone in her face.</literallayout> <section> <title>Beyond the Box/Glue/Penalty Model?</title> <para>As said above the box/glue/penalty model shows its limits when it comes to implementing - tables. Another representation may be necessary to deal with the certain degree of - desynchronization that can happen between columns. Even if we consider that the lists of - elements for the columns’ contents are available, there is no accurate way to merge them - into one list: either the resulting elements will fail to represent all the flexibility - offered by the column layouts; or the demerits won’t match the sum of the individual column - demerits. Moreover, it’s not possible to represent the column desynchronization by using - only one list of merged elements (see the <ulink - url="http://wiki.apache.org/xmlgraphics-fop/TableLayout/KnownProblems/">wiki page</ulink> - about known issues in the current approach). Several lists must be computed depending on the - breaks chosen for the preceding pages, which is cumbersome.</para> + tables. Still, this model has proved to be quite flexible and powerful. Let’s have a look at + how it behaves in tables.</para> + <section> + <title>The Spring Paradigm</title> <para>Glues can be seen as springs that can be both compressed and stretched (in the general case). Only, instead of having a <ulink url="http://en.wikipedia.org/wiki/Spring_(device)">spring constant</ulink> @@ -200,7 +194,7 @@ it shone in her face.</literallayout> was designed in a way that small displacements would be considered as almost equivalent to the natural length, while big displacements would quickly become unacceptable; <xref linkend="fig_demerits"/> shows an example where the natural length is 10, and the content - can be shrinked down to 8 and stretched up to 14.</para> + can be shrunk down to 8 and stretched up to 14.</para> <figure id="fig_demerits"> <title>Demerits, functions of the size of the part (line, page, etc.)</title> <titleabbrev>Demerits function</titleabbrev> @@ -210,13 +204,13 @@ it shone in her face.</literallayout> </imageobject> </mediaobject> </figure> - <para>However, the spring image still applies; we just have to imagine that it will be even - more difficult than usual to compress/extend the spring to its maximum. Then we can see - pages or tables as frames to which we attach springs representing the body or the columns. - In the case of a table we can imagine to start with the first column, and successively - attach a spring for every column. Depending on what the column contains we will have to - compress or extend the spring, which will have an effect on the other columns as well as the - possible text preceding the table on the page (see <xref linkend="fig_springs"/>).</para> + <para>The spring image still applies though; we just have to imagine that it will be even more + difficult than usual to compress/extend the spring to its maximum. Then we can see pages or + tables as frames to which we attach springs representing the region body or the columns. In + the case of a table we can imagine to start with the first column, and successively attach a + spring for every column. Depending on what the column contains we will have to compress or + extend the spring, which will have an effect on the other columns as well as the possible + text preceding the table on the page (see <xref linkend="fig_springs"/>).</para> <figure id="fig_springs"> <title>Seeing content as springs</title> <mediaobject> @@ -227,17 +221,20 @@ it shone in her face.</literallayout> </figure> <para>By using this representation it is relatively easy to ‘feel’ what happens: the springs will adjust such that the total force applied on them and the frame will be minimal. From an - analytical point of vue the adjustment ratio for every spring will be computed such that the - sum of the demerits will be minimal. This is not easy with the demerits function as it is - used currently, but it is possible to choose another function that will behave similarly to - Knuth’s original one, yet will simplify the computation of the minimum.</para> + analytical point of view the adjustment ratio for every spring will be computed such that + the sum of the demerits will be minimal. This is not easy with the demerits function as it + is used currently, but it is possible to choose another function that will behave similarly + to Knuth’s original one, yet will simplify the computation of the minimum.</para> + </section> + <section> + <title>Challenges</title> <para>The problem is that there may be many ways to break every column, corresponding to more or less content on the first page. In other words, there may be many different springs to play with for each column, and trying every combination of them obviously leads to - combinatorial explosion. Some combinations are obviously non-sensical and should not even be + combinatorial explosion. Some combinations are obviously nonsensical and should not even be tested; for example, when there is only one line of content in the first column, three lines in the second and it’s obvious that the first column may also be filled with three lines - (see <xref linkend="fig_table_silly-combination"/>)</para> + (see <xref linkend="fig_table_silly-combination"/>).</para> <figure id="fig_table_silly-combination"> <title>An obviously wrong combination of table columns</title> <mediaobject> @@ -270,17 +267,28 @@ it shone in her face.</literallayout> </imageobject> </mediaobject> </figure> - <para>All that said, and back to the title of this section, maybe that the box/glue/penalty - model simply needs to be dropped, and that another model can be found that will be able to - represent tables in an elegant way and allow to solve the problem easily. Needless to say - that such a model has yet to be discovered…</para> + <para>Another problem is to create a list of elements that will represent all the columns. In + other words, replace a set of springs with a single one that will behave the same way. + That’s probably not possible: either the resulting spring will fail to represent all the + flexibility offered by the individual springs; or the demerits won’t match the sum of the + individual column demerits. On top of that there is the desynchronization that can happen + inside a row (see the <ulink + url="http://wiki.apache.org/xmlgraphics-fop/TableLayout/KnownProblems/">wiki page</ulink> + about known issues in the current approach). Several lists will probably have to be computed + depending on the breaks chosen for the preceding pages.</para> + <para>So maybe that the box/glue/penalty model simply needs to be dropped, and that another + model can be found that will be able to represent tables in an elegant way and allow to + solve the problem easily. Needless to say that such a model has yet to be discovered…</para> + </section> </section> <section> - <title>Breaking Paragraphs Before Merging Columns</title> + <title>Breaking Paragraphs and Merging Columns</title> + <section> + <title>Multiple Layouts for a Paragraph</title> <para>With the new approach it is possible to take into account several ways of typesetting a paragraph, differing by the number of lines that it occupies. For example, the optimal way will be five lines, but there are also a four-line possibility and a six-line possibility. - In the four-line version spaces are likely to be shrinked a lot, while on the six-line they + In the four-line version spaces are likely to be shrunk a lot, while on the six-line they will be much stretched. While those possibilities look less good than the five-line version, they may be preferred in the final layout because they would allow to avoid a widow or orphan line somewhere else, leading to less global demerits; or even lead to a feasible @@ -301,7 +309,7 @@ it shone in her face.</literallayout> an unfinished page (there is not enough elasticity to stretch the content up to the end of the page); maybe the six-line version makes the content perfectly fit in the page (no shrinking or stretching). Then it would be chosen over the five-line version, because the - absence of stretching at the page level would compensate for the sligthly sub-optimal layout + absence of stretching at the page level would compensate for the slightly sub-optimal layout of that paragraph.</para> <figure id="fig_paragraph_different-numbers-of-lines"> <title>Three different paragraph layouts to consider</title> @@ -323,12 +331,152 @@ it shone in her face.</literallayout> 5 4 1.5+0−1 1 - ──────── ────────── + ──────── ────────── Total: 11+1−1 11.5+1−2</literallayout> <para>The two possibilities are overlapping, and would both fit in a page with a height of 12; how to order them? By the total optimum value? By the minimum value? This may be a problem for the merging algorithm.</para> - <para><emphasis>To be continued…</emphasis></para> + <para>Maybe there’s a possibility to find a merging process for which this wouldn’t be a + problem, but a simpler way to go is to simply disable this feature when we are inside + tables, and always work with the optimal paragraph layout. Along with restriction 1 of + Knuth’s algorithm (the value <inlineequation><mathphrase>total length − total + shrink</mathphrase></inlineequation> increases when iterating over the elements) this + gives a total ordering of the layouts. Then it’s probably possible to re-use the merging + method currently used in FOP.</para> + </section> + <section> + <title>Adapting the Current Merging Algorithm</title> + <para>As a short and simplified reminder, the merging algorithm works the following way: + <orderedlist> + <listitem> + <para>Compute the smallest step for every column (the length of the content before the + first legal break).</para> + </listitem> + <listitem> + <para>Select the maximum of those steps (it has been agreed that every column should + contain some content before a break).</para> + </listitem> + <listitem> + <para>In the other columns, progress along the legal breaks to get as near as this + maximum step without overtaking it. That gives the first legal break in the + row.</para> + </listitem> + <listitem> + <para>From there on, go to the smallest next legal break.</para> + </listitem> + <listitem> + <para>Select every column that has such a break; they make part of the next legal break + inside the table.</para> + </listitem> + <listitem> + <para>Go on like this, until the end of the row is reached…</para> + </listitem> + </orderedlist> + </para> + <para>Again, this approach works because the elements for the paragraphs are known in advance. + Plus it doesn’t take elastic spaces into account; while this might look like a reasonable + trade-off for tables, the same algorithm is also applied to lists where it becomes an + annoying limitation.</para> + <para>When introducing elastic spaces, the legal breaks are no longer at a given length but + may cover an interval. Then a layout from one column may be compatible with several layouts + from another column. The difference will be in the amount of content after the break, which + means that the row may end at different distances from the top of the following page (<xref + linkend="fig_table_distances-from-top"/>). To respect the dynamic programming approach, + all of the layouts must be kept, because the layout that will be part of the optimal + solution may not be the optimal one at the table level (the optimal layout, in the + <emphasis>table context</emphasis>, may not be part of the optimal solution, in the + <emphasis>whole document context</emphasis>).</para> + <figure id="fig_table_distances-from-top"> + <title>Combining one layout on the first column with two different layouts on the second + one, making the table end at different positions</title> + <titleabbrev>Multiple column combinations leading to multiple table layouts</titleabbrev> + <mediaobject> + <imageobject> + <imagedata fileref="table_distances-from-top"/> + </imageobject> + </mediaobject> + </figure> + <para>So two different sets will be available: layouts that will be said + <emphasis>compatible</emphasis>; that is, it’s possible to play with their elastic spaces so + that all of the columns reach the bottom of the page before the break; and the other ones, + for which at least one column is <quote>underfull</quote>: either it doesn’t provide + stretchability and its height is inferior to the height of the row, or it doesn’t have + enough of stretchability to reach the bottom of the page. Compatible layouts are perfect; + incompatible ones are ok, provided that this is not a situation like in <xref + linkend="fig_table_silly-combination"/>, where a better alternative is available. Let’s + call those latter <emphasis>unacceptable</emphasis>. The whole challenge is to make the + difference between layouts that are incompatible but acceptable, and those that are both + incompatible <emphasis>and</emphasis> unacceptable. Open question: if compatible layouts are + available, should incompatible ones even be considered?</para> + <para>Since we want to keep only the best layout for a given paragraph, the column combination + process can’t be started before the ends of the column contents are reached. Indeed, when we + are in the middle of the paragraph, some of the currently active layouts may be specific to + a non-optimal version of the paragraph. If they are selected as parts of a certain column + combination, and then removed because they aren’t part of the optimal paragraph layout, that + will invalidate the corresponding column combination.</para> + <para>So we would wait that the paragraphs are typeset before merging the columns. In fact, + the column contents would be typeset separately up to the end, and then the merging process + would be started. The process would be similar to the usual page-breaking mechanism: every + column would be considered as a page from the point of view of the column content. Only, in + the general case the first <quote>page</quote> would not have a set height, because of the + content preceding the table that may have elastic spaces (see the springs from <xref + linkend="fig_springs"/>). This is not really a problem, though, simply a generalization of + the test: check that two intervals intersect instead of that a point belongs to an interval + (actually a point may be seen as a singleton interval). Note that the point of view taken + here is the opposite of the usual one: instead of testing if a layout fits in a page, i.e., + that it can be shrunk or stretched enough so that its actual dimension matches the page + height, we check that the page height belongs to the interval [min, max] formed by the + layout. In the end, this is the same test. In the generalized case the <quote>page</quote> + height would be an interval instead of a single point.</para> + <para>Another modification is that underfull layouts must be considered: indeed, a column is + allowed to be underfull if there’s another one that fills the page. The only thing to pay + attention to is to not build a column combination out of underfull layouts only: at least + one of them must be complete. Underfull layouts will therefore have to be properly + flagged.</para> + <para>A potential issue with that approach is that too much work may be performed. In general + several layouts will be available for every column, and it’s possible that some layouts will + not be parts of any final combination. For example, if paragraphs in the first column have + an orphan setting of 1, and 2 in all other columns. See <xref + linkend="fig_table-unused-column-layout"/>: using the first possible layout for column 1 + will lead to an unacceptable combination.</para> + <figure id="fig_table-unused-column-layout"> + <title>One of the layouts in the first column will never be used in any final + combination</title> + <titleabbrev>Unused column layouts</titleabbrev> + <mediaobject> + <imageobject> + <imagedata fileref="table_unused-column-layout"/> + </imageobject> + </mediaobject> + </figure> + <para>So it may be better to start the combining process as soon as possible, but as seen + above paragraphs must be typeset first. Or maybe not? The problem was to define an ordering + on the layouts. When a page break can occur in the middle of a paragraph, is it really a + problem if that paragraph is not finished yet? As long as the previous paragraphs have their + optimal layouts, the total order should be available. All that can happen inside the + paragraph is that there may be a few lines more or less on the page; that means a few more + boxes added to the set of layouts that’s available at the beginning of the paragraph. If + that set was totally ordered, it should remain so when adding the boxes corresponding to the + lines of the current paragraph. That said, it’s not totally clear what happens when not all + the lines have the same height. Let’s imagine that a paragraph contains two inline images + whose heights are 1.5 times the normal line height. Let’s assume there’s one layout in which + both images are on the same line; and another one where spaces are slightly more shrunk so + that the first image can be placed on the preceding line (<xref + linkend="fig_paragraph-with-images"/>). The first layout will be 4.5 units high while the + second one will be 5 units high. Maybe that won’t be a problem, but…</para> + <figure id="fig_paragraph-with-images"> + <title>Layouts of a paragraph containing two inline images</title> + <mediaobject> + <imageobject> + <imagedata fileref="paragraph-with-images"/> + </imageobject> + </mediaobject> + </figure> + <para>Another thing to keep in mind that may complicate things is that not all columns may end + on the same page; some with fewer content may end on earlier pages. They must be considered + in the combination, but they won’t participate in the computation of the resulting + min-opt-max.</para> + </section> </section> </section> |