Apache™ FOP: Complex Scripts
Overview

This page describes the complex scripts features of Apache™ FOP.

Disabling complex scripts

Complex script features are enabled by default. If an application of FOP does not require this support, it can be disabled in any of three ways:

  1. Command line: The command line option -nocs turns off complex script features:
    fop -nocs -fo mydocument.fo -pdf mydocument.pdf
  2. Embedding: userAgent.setComplexScriptFeaturesEnabled(false);
  3. Optional setting in fop.xconf file:
    <fop version="1.0">
      <complex-scripts disabled="true"/>
      ...
    </fop>

When complex scripts features are enabled, additional information is added to the internal, parsed representation of the XSL-FO tree and its corresponding formatted area tree. This information covers bidirectional level resolution, the association between characters and glyphs, and glyph position adjustments. It somewhat increases the memory required to process documents that use these features.

A document author need not make explicit use of any complex scripts feature in order for this additional information to be created. For example, if the author makes use of a font that contains OpenType GSUB and/or GPOS tables, then those tables will be automatically used unless complex scripts features are disabled.

Changes to your XSL-FO input files

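In most circumstances, XSL-FO content does not need to change in order to make use of complex scripts features; however, in certain contexts, fully automatic processing is not sufficient. In these cases, an author may make use of the following XSL-FO constructs:

  1. The script property
  2. The language property
  3. The writing-mode property
  4. Number conversion properties
  5. The fo:bidi-override element
  6. Bidi control characters
  7. Join control characters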

Authoring Details

The following sub-sections describe the complex-scripts effects of these XSL-FO constructs in more detail.

Script Property

In order to apply font-specific complex script features, FOP must know the script of the text undergoing layout processing. This script is determined using the following algorithm:

  1. If the FO element that governs the text specifies a script property and its value is not the empty string or "auto", then that script is used.
  2. Otherwise, the dominant script of the text is determined automatically by finding the script whose constituent characters appear most frequently in the text.

If the automatic algorithm does not produce the desired results, an author may explicitly specify a script property with the desired script. If specified, it must be one of the four-letter script codes specified in the ISO 15924 code list or in the Extended Script Codes table. Script codes are compared case-insensitively, so any letter case may be used when specifying them in an XSL-FO document.
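
The two-step determination described above can be sketched in Java as follows. This is only an illustration under stated assumptions, not FOP's implementation: the class ScriptSketch and its methods are hypothetical, the sketch relies on the JDK's Character.UnicodeScript classification, and it assumes that characters without a specific script (spaces, digits, punctuation, combining marks) are ignored when counting. It returns a JDK script constant rather than a four-letter code.

```java
import java.lang.Character.UnicodeScript;
import java.util.EnumMap;
import java.util.Map;

// Hypothetical helper for illustration; not part of the FOP API.
final class ScriptSketch {

    // Step 1: a script property is explicit unless it is missing, empty, or "auto".
    static boolean isExplicit(String scriptProperty) {
        return scriptProperty != null
                && !scriptProperty.isEmpty()
                && !scriptProperty.equals("auto");
    }

    // Step 2: return the script whose characters occur most often in the text.
    static UnicodeScript dominantScript(String text) {
        Map<UnicodeScript, Integer> counts = new EnumMap<>(UnicodeScript.class);
        text.codePoints().forEach(cp -> {
            UnicodeScript script = UnicodeScript.of(cp);
            // Assumption of this sketch: COMMON, INHERITED, and UNKNOWN
            // code points do not count toward any script.
            if (script != UnicodeScript.COMMON
                    && script != UnicodeScript.INHERITED
                    && script != UnicodeScript.UNKNOWN) {
                counts.merge(script, 1, Integer::sum);
            }
        });
        UnicodeScript best = UnicodeScript.COMMON; // returned when nothing was counted
        int bestCount = 0;
        // Ties go to the script that comes first in enum order (arbitrary choice).
        for (Map.Entry<UnicodeScript, Integer> entry : counts.entrySet()) {
            if (entry.getValue() > bestCount) {
                best = entry.getKey();
                bestCount = entry.getValue();
            }
        }
        return best;
    }
}
```

A caller would first check isExplicit on the governing script property and fall back to dominantScript only when that check fails.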

Standard Script Codes

The following table enumerates the standard ISO 15924 4-letter codes recognized by FOP.

Code Script
arab Arabic
beng Bengali
bopo Bopomofo
cyrl Cyrillic
deva Devanagari
ethi Ethiopic
geor Georgian
grek Greek
gujr Gujarati
guru Gurmukhi
hang Hangul
hani Han
hebr Hebrew
hira Hiragana
kana Katakana
knda Kannada
khmr Khmer
laoo Lao
latn Latin
mlym Malayalam
mymr Burmese
mong Mongolian
orya Oriya
sinh Sinhalese
taml Tamil
telu Telugu
thai Thai
tibt Tibetan
zmth Math
zsym Symbol
zyyy Undetermined
zzzz Uncoded

Extended Script Codes

The following table enumerates a number of non-standard extended script codes recognized by FOP.

Code  Script      Comments
bng2  Bengali     OpenType Indic Version 2 (May 2008 and following) behavior.
dev2  Devanagari  OpenType Indic Version 2 (May 2008 and following) behavior.
gur2  Gurmukhi    OpenType Indic Version 2 (May 2008 and following) behavior.
gjr2  Gujarati    OpenType Indic Version 2 (May 2008 and following) behavior.
knd2  Kannada     OpenType Indic Version 2 (May 2008 and following) behavior.
mlm2  Malayalam   OpenType Indic Version 2 (May 2008 and following) behavior.
ory2  Oriya       OpenType Indic Version 2 (May 2008 and following) behavior.
tml2  Tamil       OpenType Indic Version 2 (May 2008 and following) behavior.
tel2  Telugu      OpenType Indic Version 2 (May 2008 and following) behavior.

Explicit use of one of the above extended script codes is not portable, and should be limited to use with FOP only. When performing automatic script determination, FOP selects the OpenType Indic Version 2 script codes by default. If the author requires Version 1 behavior, then an explicit, non-extension script code should be specified in a governing script property.
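
As a rough illustration of this default, the following Java sketch picks the script code to apply. The class IndicScriptCodes and the method effectiveCode are hypothetical names, not part of FOP. The mapping is copied from the two tables above; how FOP implements the selection internally is not shown in this document.

```java
import java.util.Locale;
import java.util.Map;

// Hypothetical helper for illustration; not part of the FOP API.
final class IndicScriptCodes {

    // Standard ISO 15924 code -> extended Version 2 code, taken from the tables above.
    private static final Map<String, String> VERSION_2 = Map.of(
            "beng", "bng2", "deva", "dev2", "guru", "gur2",
            "gujr", "gjr2", "knda", "knd2", "mlym", "mlm2",
            "orya", "ory2", "taml", "tml2", "telu", "tel2");

    // explicitScript: value of the governing script property (may be null, "", or "auto").
    // detectedScript: four-letter code produced by automatic script determination.
    static String effectiveCode(String explicitScript, String detectedScript) {
        if (explicitScript != null
                && !explicitScript.isEmpty()
                && !explicitScript.equals("auto")) {
            // An explicit code is used as given (case-insensitively),
            // so "deva" keeps Version 1 behavior.
            return explicitScript.toLowerCase(Locale.ROOT);
        }
        String detected = detectedScript.toLowerCase(Locale.ROOT);
        // Automatic determination prefers the Version 2 code when one exists.
        return VERSION_2.getOrDefault(detected, detected);
    }
}
```

With this sketch, automatically detected Devanagari text resolves to dev2, while an explicit script value of deva stays deva.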

Language Property

Certain fonts that support complex script features can use language information to apply language-specific processing rules. For example, a font designed for the Arabic script may support typographic variations according to whether the written language is Arabic, Farsi (Persian), Sindhi, Urdu, or another language written with the Arabic script. To apply these language-specific features, the author may explicitly mark the text with a language property.

When specifying the language property, the value must be either an ISO 639-2 three-letter code or an ISO 639-1 two-letter code. Language codes are compared case-insensitively, so any letter case may be used when specifying them in an XSL-FO document.
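
The shape rule and the case-insensitive comparison can be sketched in Java as follows. The class LanguageCodeSketch and its methods are hypothetical and not part of FOP. The sketch checks only that a value has two or three ASCII letters; it does not verify that the code is actually registered in ISO 639, and it does not treat a two-letter code and its three-letter counterpart as equal, since this document does not say whether FOP does.

```java
import java.util.Locale;

// Hypothetical helper for illustration; not part of the FOP API.
final class LanguageCodeSketch {

    // True when the value has the shape of a 2-letter or 3-letter language code.
    static boolean hasCodeShape(String value) {
        if (value == null || value.length() < 2 || value.length() > 3) {
            return false;
        }
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            boolean asciiLetter = (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
            if (!asciiLetter) {
                return false;
            }
        }
        return true;
    }

    // Case-insensitive comparison: both codes are lowercased before comparing.
    static boolean sameCode(String a, String b) {
        return a.toLowerCase(Locale.ROOT).equals(b.toLowerCase(Locale.ROOT));
    }
}
```

For example, sameCode("URD", "urd") holds, while hasCodeShape("arabic") does not.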

Writing Mode Property
Number Conversion Properties
Bidi Override Element
Bidi Control Characters
Join Control Characters

Supported Scripts

Support for specific complex scripts is enumerated in the following table. Scripts marked as unsupported are expected to gain support in future revisions.

Script      Support  Tested   Comments
Arabic      full     full
Bengali     none     none
Burmese     none     none
Devanagari  partial  partial  join controls (ZWJ, ZWNJ) not yet supported
Khmer       none     none
Gujarati    partial  none     pre-alpha
Gurmukhi    partial  none     pre-alpha
Hebrew      full     partial
Kannada     none     none
Lao         none     none
Malayalam   none     none
Mongolian   none     none
Oriya       none     none
Tamil       none     none
Telugu      none     none
Tibetan     none     none
Thai        none     none

Supported Fonts

Support for specific fonts is enumerated in the following sub-sections. If a given font is not listed, then it has not been tested with these complex scripts features.

Arabic Fonts
Font                Version  Glyphs  Comments
Arial Unicode MS    1.01     50377   limited GPOS support
Lateef              1.0      1147    language features for Kurdish (KUR), Sindhi (SND), Urdu (URD)
Scheherazade        1.0      1197    language features for Kurdish (KUR), Sindhi (SND), Urdu (URD)
Simplified Arabic   1.01             contains invalid, out of order coverage table entries
Simplified Arabic   5.00     414     lacks GPOS support
Simplified Arabic   5.92     473     includes GPOS for advanced position adjustment
Traditional Arabic  1.01     530     lacks GPOS support
Traditional Arabic  5.00     530     lacks GPOS support
Traditional Arabic  5.92     589     includes GPOS for advanced position adjustment

Devanagari Fonts
Font       Version  Glyphs  Comments
Aparajita  1.00     706
Kokila     1.00     706
Mangal     5.01     885     designed for use in user interfaces
Utsaah     1.00     706

Other Limitations

Complex scripts support in Apache FOP is relatively new, so there are certain limitations. Please help us identify and close any gaps.