@@ -28,7 +28,7 @@
<section id="intro">
<title>Introduction</title>
<p>FOP uses Liang's hyphenation algorithm, well known from TeX. It needs
language-specific patterns and other data for operation.</p>
<p>Because of <link href="#license-issues">licensing issues</link> (and for
convenience), all hyphenation patterns for FOP are made available through
the <fork href="http://offo.sourceforge.net/hyphenation/index.html">Objects For
@@ -39,6 +39,79 @@
Please inquire on the <link href="../maillist.html#fop-user">FOP User
mailing list</link>.</note>
</section>
<section id="using">
<title>Using Hyphenation</title>
<p>
To get words hyphenated, hyphenation has to be enabled
explicitly (set the property <code>hyphenate="true"</code>) and a
language has to be specified (e.g. <code>language="en"</code>).
Optionally, a country can be specified as well (e.g.
<code>country="GB"</code>).
</p>
<p>
If hyphenation is requested, FOP first looks up a serialized instance
containing precompiled hyphenation patterns in the classpath. If only a
language is specified, a resource named
<code>hyph/&lt;language&gt;.hyp</code> is loaded. If both language and
country are specified, the resource
<code>hyph/&lt;language&gt;_&lt;country&gt;.hyp</code> is looked up
first, and if this fails, the loader falls back to
<code>hyph/&lt;language&gt;.hyp</code>.
</p>
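<p>The lookup order described above can be sketched in Java as follows. This is a minimal illustration, not FOP's actual loader: the class <code>HyphLookupSketch</code> and its methods are hypothetical names, and the sketch assumes the resources are resolved through an ordinary <code>ClassLoader.getResource</code> call.</p>
<source><![CDATA[
```java
import java.net.URL;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the precompiled-pattern lookup order.
// Not FOP's implementation; names are illustrative only.
class HyphLookupSketch {

    /** Returns the classpath resource names to try, most specific first. */
    static List<String> candidateResources(String language, String country) {
        List<String> names = new ArrayList<>();
        if (country != null && !country.isEmpty()) {
            names.add("hyph/" + language + "_" + country + ".hyp");
        }
        // Language-only resource: the only candidate without a country,
        // and the fallback when the country-specific one is missing.
        names.add("hyph/" + language + ".hyp");
        return names;
    }

    /** Returns the URL of the first candidate found on the classpath, or null. */
    static URL findPrecompiled(String language, String country) {
        ClassLoader loader = HyphLookupSketch.class.getClassLoader();
        for (String name : candidateResources(language, country)) {
            URL url = loader.getResource(name);
            if (url != null) {
                return url;
            }
        }
        return null; // at this point FOP would try the raw XML patterns
    }

    public static void main(String[] args) {
        System.out.println(candidateResources("en", "GB")); // prints [hyph/en_GB.hyp, hyph/en.hyp]
        System.out.println(findPrecompiled("en", "GB"));
    }
}
```
]]></source>
<p>Trying the most specific name first is what makes the language-only file act as a fallback when no country-specific patterns are installed.</p>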
<p>
If no precompiled patterns are found, FOP tries to load raw
patterns from an XML file named
<code>/hyph/&lt;language&gt;.xml</code> or, if a country is specified,
<code>/hyph/&lt;language&gt;_&lt;country&gt;.xml</code>. The
<code>/hyph</code> prefix is hardcoded and can't be configured. Note
that this usually constitutes an absolute file path. FOP can't load
raw patterns from sources other than files.
</p>
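<p>A sketch of how the raw-pattern file name is formed, assuming the plain string concatenation implied above; the class and method names are hypothetical and not part of FOP. It also shows why the path is only "usually" absolute: <code>java.io.File.isAbsolute()</code> returns true for <code>/hyph/en_GB.xml</code> on Unix-like systems, while on Windows such a path has no drive letter and is resolved against the root of the current drive.</p>
<source><![CDATA[
```java
import java.io.File;

// Hypothetical sketch of the raw-pattern file path; not FOP's code.
class RawPatternPathSketch {

    // Mirrors the hardcoded "/hyph" prefix described above.
    static final String PREFIX = "/hyph/";

    static String rawPatternPath(String language, String country) {
        if (country != null && !country.isEmpty()) {
            return PREFIX + language + "_" + country + ".xml";
        }
        return PREFIX + language + ".xml";
    }

    public static void main(String[] args) {
        String path = rawPatternPath("en", "GB");
        File file = new File(path);
        // Resolved against the filesystem root, not the working
        // directory and not the classpath.
        System.out.println(path + " absolute=" + file.isAbsolute()
                + " exists=" + file.exists());
    }
}
```
]]></source>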
<p>
If you think hyphenation is enabled but words aren't
hyphenated, check whether FOP finds the relevant hyphenation
patterns:
</p>
<ol>
<li>Did you download and install the hyphenation patterns
properly? If you downloaded the files from OFFO, check
whether you downloaded the patterns for the correct FOP
version (0.20.5 or the development version), and whether
you followed the installation instructions.</li>
<li>Check whether you have spelled the language code and,
if used, the country code correctly. By convention, country
codes are uppercase (e.g. "GB", not "gb"), and case matters.</li>
</ol>
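<p>Both checks can be partly automated. The hypothetical helper below (not part of FOP) warns about a country code that is not uppercase and reports whether the precompiled pattern resources are visible on the classpath. It is only meaningful when run with the same classpath FOP uses, e.g. including the JAR that contains the installed patterns, and it does not check the raw <code>/hyph</code> XML files.</p>
<source><![CDATA[
```java
import java.net.URL;
import java.util.Locale;

// Hypothetical diagnostic; not part of FOP.
// Usage: java HyphenationCheckSketch <language> [<country>]
class HyphenationCheckSketch {

    /** True if a country code is given but is not all uppercase (e.g. "gb"). */
    static boolean countryCaseLooksWrong(String country) {
        return country != null && !country.equals(country.toUpperCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        String language = args.length > 0 ? args[0] : "en";
        String country = args.length > 1 ? args[1] : null;

        if (countryCaseLooksWrong(country)) {
            System.out.println("Warning: country code '" + country + "' is not uppercase");
        }

        // Same candidate order as the precompiled-pattern lookup.
        String[] names = country == null
                ? new String[] {"hyph/" + language + ".hyp"}
                : new String[] {"hyph/" + language + "_" + country + ".hyp",
                                "hyph/" + language + ".hyp"};

        ClassLoader loader = HyphenationCheckSketch.class.getClassLoader();
        for (String name : names) {
            URL url = loader.getResource(name);
            System.out.println(name + " -> " + (url == null ? "NOT FOUND" : url));
        }
    }
}
```
]]></source>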
<p>
If hyphenation works in general, but specific words aren't
hyphenated, or aren't hyphenated as expected, you may have one
of the following problems:
</p>
<ol>
<li>The patterns contain a bug, or simply don't behave as you
expect. To reduce the number of patterns, pattern sets usually
tolerate some imprecision.</li>
<li>The patterns may be for an unexpected, unofficial or
outdated dialect of the language. For example, the Turkish
patterns were (and may still be) made for 17th-century Ottoman
Turkish rather than modern Turkish.</li>
<li>The word may contain invisible characters which prevent it
from being parsed properly from the content stream, or from
being matched properly. Examples of such characters are the
soft hyphen (U+00AD) and the zero width joiner (U+200D). You
have to remove them in order to get the words hyphenated
properly. On the other hand, you can use them to prevent
certain (unwanted, spurious or incorrect) hyphenations.</li>
<li>If the word contains characters which can be composed from
other Unicode characters, or vice versa (e.g. U+00E4 "latin
small letter a with diaeresis" versus U+0061 U+0308 "latin small
letter a" followed by "combining diaeresis"), the patterns may
use the opposite form. FOP doesn't run <link
href="http://www.unicode.org/reports/tr15/">Unicode
normalization</link> on either the content or the
patterns. You have no choice but to check which form the
patterns use and adapt your FO source accordingly.</li>
</ol>
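<p>The last two problems can be checked in Java before the text goes into the FO source. The sketch below uses the standard <code>java.text.Normalizer</code> class; the class and helper names are hypothetical. Converting to the composed form (NFC) assumes your patterns use composed characters, so check the pattern file first and use <code>Normalizer.Form.NFD</code> instead if they do not.</p>
<source><![CDATA[
```java
import java.text.Normalizer;

// Hypothetical input checks; not part of FOP.
class HyphenationInputCheckSketch {

    /** True if the word contains a soft hyphen (U+00AD) or a zero width joiner (U+200D). */
    static boolean hasInvisibleBlockers(String word) {
        return word.indexOf('\u00AD') >= 0 || word.indexOf('\u200D') >= 0;
    }

    /** Returns the word in composed (NFC) form, e.g. "a" + U+0308 becomes U+00E4. */
    static String toComposed(String word) {
        return Normalizer.normalize(word, Normalizer.Form.NFC);
    }

    public static void main(String[] args) {
        String composed = "\u00E4";     // latin small letter a with diaeresis
        String decomposed = "a\u0308";  // latin small letter a + combining diaeresis
        // Both render identically, but the code points differ, so a pattern
        // written in one form does not match text in the other form.
        System.out.println(composed.equals(decomposed));             // false
        System.out.println(composed.equals(toComposed(decomposed))); // true
        System.out.println(hasInvisibleBlockers("hy\u00ADphen"));    // true
    }
}
```
]]></source>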
</section>
<section id="license-issues">
<title>License Issues</title>
<p>Many of the hyphenation files distributed with TeX and its offspring are