aboutsummaryrefslogtreecommitdiffstats
path: root/src/documentation/content/xdocs/1.1rc1/hyphenation.xml
diff options
context:
space:
mode:
authorGlenn Adams <gadams@apache.org>2012-07-21 03:22:26 +0000
committerGlenn Adams <gadams@apache.org>2012-07-21 03:22:26 +0000
commit56517e87d298dea311c231fc446072ea5a58cb21 (patch)
tree553ce8673b2f689a712d9eb5ca5690a0e6f9b10c /src/documentation/content/xdocs/1.1rc1/hyphenation.xml
parentadbd5aac98fc107156ee108092ab25cf502c1491 (diff)
downloadxmlgraphics-fop-56517e87d298dea311c231fc446072ea5a58cb21.tar.gz
xmlgraphics-fop-56517e87d298dea311c231fc446072ea5a58cb21.zip
Merge from ^/branches/fop-1_1.
git-svn-id: https://svn.apache.org/repos/asf/xmlgraphics/fop/trunk@1364036 13f79535-47bb-0310-9956-ffa450edef68
Diffstat (limited to 'src/documentation/content/xdocs/1.1rc1/hyphenation.xml')
-rw-r--r--src/documentation/content/xdocs/1.1rc1/hyphenation.xml237
1 files changed, 237 insertions, 0 deletions
diff --git a/src/documentation/content/xdocs/1.1rc1/hyphenation.xml b/src/documentation/content/xdocs/1.1rc1/hyphenation.xml
new file mode 100644
index 000000000..e6f666826
--- /dev/null
+++ b/src/documentation/content/xdocs/1.1rc1/hyphenation.xml
@@ -0,0 +1,237 @@
+<?xml version="1.0" encoding="UTF-8" standalone="no"?>
+<!--
+ Licensed to the Apache Software Foundation (ASF) under one or more
+ contributor license agreements. See the NOTICE file distributed with
+ this work for additional information regarding copyright ownership.
+ The ASF licenses this file to You under the Apache License, Version 2.0
+ (the "License"); you may not use this file except in compliance with
+ the License. You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+ Unless required by applicable law or agreed to in writing, software
+ distributed under the License is distributed on an "AS IS" BASIS,
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ See the License for the specific language governing permissions and
+ limitations under the License.
+-->
+<!-- $Id$ -->
+<!DOCTYPE document PUBLIC "-//APACHE//DTD Documentation V2.0//EN" "http://forrest.apache.org/dtd/document-v20.dtd">
+<document>
+ <header>
+ <title>Apache™ FOP: Hyphenation</title>
+ <version>$Revision$</version>
+ </header>
+ <body>
+ <section id="support">
+ <title>Hyphenation Support</title>
+ <section id="intro">
+ <title>Introduction</title>
+ <p>Apache™ FOP uses Liang's hyphenation algorithm, well known from TeX. It needs
+ language specific pattern and other data for operation.</p>
+ <p>Because of <a href="#license-issues">licensing issues</a> (and for
+ convenience), all hyphenation patterns for FOP are made available through
+ the <a class="fork" href="http://offo.sourceforge.net/hyphenation/index.html">Objects For
+ Formatting Objects</a> project.</p>
+ <note>If you have made improvements to an existing Apache™ FOP hyphenation pattern,
+ or if you have created one from scratch, please consider contributing these
+ to OFFO so that they can benefit other FOP users as well.
+ Please inquire on the <a href="../maillist.html#fop-user">FOP User
+ mailing list</a>.</note>
+ </section>
+ <section id="license-issues">
+ <title>License Issues</title>
+ <p>Many of the hyphenation files distributed with TeX and its offspring are
+ licenced under the <a class="fork" href="http://www.latex-project.org/lppl.html">LaTeX
+ Project Public License (LPPL)</a>, which prevents them from being
+ distributed with Apache software. The LPPL puts restrictions on file names
+ in redistributed derived works which we feel can't guarantee. Some
+ hyphenation pattern files have other or additional restrictions, for
+ example against use for commercial purposes.</p>
+ <p>Although Apache FOP cannot redistribute hyphenation pattern files that do
+ not conform with its license scheme, that does not necessarily prevent users
+ from using such hyphenation patterns with FOP. However, it does place on
+ the user the responsibility for determining whether the user can rightly use
+ such hyphenation patterns under the hyphenation pattern license.</p>
+ <warning>The user is responsible to settle license issues for hyphenation
+ pattern files that are obtained from non-Apache sources.</warning>
+ </section>
+ <section id="sources">
+ <title>Sources of Custom Hyphenation Pattern Files</title>
+ <p>The most important source of hyphenation pattern files is the
+ <a class="fork" href="http://www.ctan.org/tex-archive/language/hyphenation/">CTAN TeX
+ Archive</a>.</p>
+ </section>
+ <section id="install">
+ <title>Installing Custom Hyphenation Patterns</title>
+ <p>To install a custom hyphenation pattern for use with FOP:</p>
+ <ol>
+ <li>Convert the TeX hyphenation pattern file to the FOP format. The FOP
+ format is an xml file conforming to the DTD found at
+ <code>{fop-dir}/hyph/hyphenation.dtd</code>.</li>
+ <li>Name this new file following this schema:
+ <code>languageCode_countryCode.xml</code>. The country code is
+ optional, and should be used only if needed. For example:
+ <ul>
+ <li><code>en_US.xml</code> would be the file name for American
+ English hyphenation patterns.</li>
+ <li><code>it.xml</code> would be the file name for Italian
+ hyphenation patterns.</li>
+ </ul>
+ The language and country codes must match the XSL-FO input, which
+ follows <a href="http://www.ics.uci.edu/pub/ietf/http/related/iso639.txt">ISO
+ 639</a> (languages) and <a href="http://www.ics.uci.edu/pub/ietf/http/related/iso3166.txt">ISO
+ 3166</a> (countries). NOTE: The ISO 639/ISO 3166 convention is that
+ language names are written in lower case, while country codes are written
+ in upper case. FOP does not check whether the language and country specified
+ in the FO source are actually from the current standard, but it relies
+ on it being two letter strings in a few places. So you can make up your
+ own codes for custom hyphenation patterns, but they should be two
+ letter strings too (patches for proper handling extensions are welcome)</li>
+ <li>There are basically three ways to make the FOP-compatible hyphenation pattern
+ file(s) accessible to FOP:
+ <ul>
+ <li>Download the precompiled JAR from <a class="fork" href="http://offo.sourceforge.net/hyphenation/index.html">OFFO
+ </a> and place it either in the <code>{fop-dir}/lib</code> directory, or
+ in a directory of your choice (and append the full path to the JAR to
+ the environment variable <code>FOP_HYPHENATION_PATH</code>).</li>
+ <li>Download the desired FOP-compatible hyphenation pattern file(s) from
+ <a class="fork" href="http://offo.sourceforge.net/hyphenation/index.html">OFFO</a>,
+ and/or take your self created hyphenation pattern file(s),
+ <ul>
+ <li>place them in the directory <code>{fop-dir}/hyph</code>, </li>
+ <li>or place them in a directory of your choice and set the Ant variable
+ <code>user.hyph.dir</code> to point to that directory (in
+ <code>build-local.properties</code>),</li>
+ </ul>
+ and run Ant with build target
+ <code>jar-hyphenation</code>. This will create a JAR containing the
+ compiled patterns in <code>{fop-dir}/build</code> that will be added to the
+ classpath on the next run.
+ (When FOP is built from scratch, and there are pattern source file(s)
+ present in the directory pointed to by the
+ <code>user.hyph.dir</code> variable, this JAR will automatically
+ be created from the supplied pattern(s)).</li>
+ <li>Put the pattern source file(s) into a directory of your choice and
+ configure FOP to look for custom patterns in this directory, by setting the
+ <a href="configuration.html">&lt;hyphenation-base&gt;</a>
+ configuration option.</li>
+ </ul>
+ </li>
+ </ol>
+ <warning>
+ Either of these three options will ensure hyphenation is working when using
+ FOP from the command-line. If FOP is being embedded, remember to add the location(s)
+ of the hyphenation JAR(s) to the CLASSPATH (option 1 and 2) or to set the
+ <a href="configuration.html#hyphenation-dir">&lt;hyphenation-dir&gt;</a>
+ configuration option programmatically (option 3).
+ </warning>
+ </section>
+ </section>
+ <section id="patterns">
+ <title>Hyphenation Patterns</title>
+ <p>If you would like to build your own hyphenation pattern files, or modify
+ existing ones, this section will help you understand how to do so. Even
+ when creating a pattern file from scratch, it may be beneficial to start
+ with an existing file and modify it. See <a class="fork" href="http://offo.sourceforge.net/hyphenation/index.html">
+ OFFO's Hyphenation page</a> for examples.
+ Here is a brief explanation of the contents of FOP's hyphenation patterns:</p>
+ <warning>The remaining content of this section should be considered "draft"
+ quality. It was drafted from theoretical literature, and has not been
+ tested against actual FOP behavior. It may contain errors or omissions.
+ Do not rely on these instructions without testing everything stated here.
+ If you use these instructions, please provide feedback on the
+ <a href="../maillist.html#fop-user">FOP User mailing list</a>, either
+ confirming their accuracy, or raising specific problems that we can
+ address.</warning>
+ <ul>
+ <li>The root of the pattern file is the &lt;hyphenation-info&gt; element.</li>
+ <li>&lt;hyphen-char&gt;: its attribute "value" contains the character signalling
+ a hyphen in the &lt;exceptions&gt; section. It has nothing to do with the
+ hyphenation character used in FOP, use the XSLFO hyphenation-character
+ property for defining the hyphenation character there. At some points
+ a dash U+002D is hardwired in the code, so you'd better use this too
+ (patches to rectify the situation are welcome). There is no default,
+ if you declare exceptions with hyphenations, you must declare the
+ hyphen-char too.</li>
+ <li>&lt;hyphen-min&gt; contains two attributes:
+ <ul>
+ <li>before: the minimum number of characters in a word allowed to exist
+ on a line immediately preceding a hyphenated word-break.</li>
+ <li>after: the minimum number of characters in a word allowed to exist
+ on a line immediately after a hyphenated word-break.</li>
+ </ul>
+ This element is unused and not even read. It should be considered a
+ documentation for parameters used during pattern generation.
+ </li>
+ <li>&lt;classes&gt; contains whitespace-separated character sets. The members
+ of each set should be treated as equivalent for purposes of hyphenation,
+ usually upper and lower case of the same character. The first character
+ of the set is the canonical character, the patterns and exceptions
+ should only contain these canonical representation characters (except
+ digits for weight, the period (.) as word delimiter in the patterns and
+ the hyphen char in exceptions, of course).</li>
+ <li>&lt;exceptions&gt; contains whitespace-separated words, each of which
+ has either explicit hyphen characters to denote acceptable breakage
+ points, or no hyphen characters, to indicate that this word should
+ never be hyphenated, or contain explicit &lt;hyp&gt; elements for specifying
+ changes of spelling due to hyphenation (like backen -&gt; bak-ken or
+ Stoffarbe -&gt; Stoff-farbe in the old german spelling). Exceptions override
+ the patterns described below. Explicit &lt;hyp&gt; declarations don't work
+ yet (patches welcome). Exceptions are generally a bit brittle, test
+ carefully.</li>
+ <li>&lt;patterns&gt; includes whitespace-separated patterns, which are what
+ drive most hyphenation decisions. The characters in these patterns are
+ explained as follows:
+ <ul>
+ <li>non-numeric characters represent characters in a sub-word to be
+ evaluated</li>
+ <li>the period character (.) represents a word boundary, i.e. either
+ the beginning or ending of a word</li>
+ <li>numeric characters represent a scoring system for indicating the
+ acceptability of a hyphen in this location. Odd numbers represent an
+ acceptable location for a hyphen, with higher values overriding lower
+ inhibiting values. Even numbers indicate an unacceptable location, with
+ higher values overriding lower values indicating an acceptable position.
+ A value of zero (inhibiting) is implied when there is no number present.
+ Generally patterns are constructed so that valuse greater than 4 are rare.
+ Due to a bug currently patterns with values of 8 and greater don't
+ have an effect, so don't wonder.</li>
+ </ul>
+ Here are some examples from the English patterns file:
+ <ul>
+ <li>Knuth (<em>The TeXBook</em>, Appendix H) uses the example <strong>hach4</strong>, which indicates that it is extremely undesirable to place a hyphen after the substring "hach", for example in the word "toothach-es".</li>
+ <li><strong>.leg5e</strong> indicates that "leg-e", when it occurs at the beginning of a word, is a very good place to place a hyphen, if one is needed. Words like "leg-end" and "leg-er-de-main" fit this pattern.</li>
+ </ul>
+ Note that the algorithm that uses this data searches for each of the word's substrings in the patterns, and chooses the <em>highest</em> value found for letter combination.
+ </li>
+ </ul>
+ <p>If you want to convert a TeX hyphenation pattern file, you have to undo
+ the TeX encoding for non-ASCII text. FOP uses Unicode, and the patterns
+ must be proper Unicode too. You should be aware of the XML encoding issues,
+ preferably use a good Unicode editor.</p>
+ <p>Note that FOP does not do Unicode character normalization. If you use
+ combining chars for accents and other character decorations, you must
+ declare character classes for them, and use the same sequence of base character
+ and combining marks in the XSLFO source, otherwise the pattern wouldn't match.
+ Fortunately, Unicode provides precomposed characters for all important cases
+ in common languages, until now nobody run seriously into this issue. Some dead
+ languages and dialects, especially ancient ones, may pose a real problem
+ though.</p>
+ <p>If you want to generate your own patterns, an open-source utility called
+ patgen can be used to assist in creating pattern files from dictionaries.
+ It is available in many Unix/Linux distributions and every TeX distribution.
+ Pattern creation for languages like english or german is an art. Read
+ Frank Liang's original paper <a class="fork" href="http://www.tug.org/docs/liang/">"Word
+ Hy-phen-a-tion by Com-pu-ter"</a> (yes, with hyphens) for details.
+ The original patgen.web source, included in the TeX source distributions,
+ contains valuable comments, unfortunately technical details often obscure the
+ high level issues. Another important source of information is
+ <a class="fork" href="http://mirrors.ctan.org/systems/knuth/dist/tex/texbook.tex">The
+ TeX Book</a>, appendix H (either read the TeX source, or run it through
+ TeX to typeset it). Secondary articles, for example the works by Petr Sojka,
+ may also give some much needed insight into problems arising in automated
+ hyphenation.</p>
+ </section>
+ </body>
+</document> \ No newline at end of file