This page describes the complex scripts features of Apache™ FOP.
Complex script features are enabled by default. If an application of FOP does not require this support, it can be disabled in three ways:

- On the command line, the `-nocs` option turns off complex script features: `fop -nocs -fo mydocument.fo -pdf mydocument.pdf`
- In embedding code: `userAgent.setComplexScriptFeaturesEnabled(false);`
- In the configuration file: `<fop version="1.0"> <complex-scripts disabled="true"/> ... </fop>`
When complex scripts features are enabled, additional information related to bidirectional level resolution, the association between characters and glyphs, and glyph position adjustments is added to the internal, parsed representation of the XSL-FO tree and its corresponding formatted area tree. This additional information somewhat increases the memory required to process documents that use these features.
In most circumstances, XSL-FO content does not need to change in order to make use of complex scripts features; however, in certain contexts, fully automatic processing is not sufficient. In these cases, an author may make use of the following XSL-FO constructs:

- The script property.
- The language property.
- The writing-mode property.
- The number to string conversion properties: format, grouping-separator, grouping-size, letter-value, and fox:number-conversion-features.
- The fo:bidi-override element.

The complex scripts related effects of the constructs enumerated above are more fully described in the following sub-sections.
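
As a rough illustration of two of the constructs listed above, the fragment below sets writing-mode on a container and wraps a run of text in fo:bidi-override. It is a sketch rather than an excerpt from the FOP documentation: the property values (`rl-tb`, `direction="ltr"`, `unicode-bidi="bidi-override"`) are those defined by the XSL-FO specification, while the sample text, the part number, and the choice of fo:block-container as the carrier of writing-mode are assumptions made for illustration.

```xml
<!-- Illustrative fragment; place inside an fo:flow of a complete FO document. -->
<fo:block-container xmlns:fo="http://www.w3.org/1999/XSL/Format"
                    writing-mode="rl-tb">
  <!-- Blocks in this container use a right-to-left, top-to-bottom writing mode. -->
  <fo:block>
    שלום עולם
    <!-- Hypothetical part number: force this run to be laid out left to right,
         overriding the automatically resolved bidirectional levels. -->
    <fo:bidi-override direction="ltr" unicode-bidi="bidi-override">ABC-123</fo:bidi-override>
  </fo:block>
</fo:block-container>
```

An explicit override of this kind is only needed when automatic bidirectional processing does not give the intended order; ordinary mixed-direction text usually needs no markup.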
In order to apply font specific complex script features, it is necessary to know the script that applies to the text undergoing layout processing. This script is determined using the following algorithm:

1. If the element that governs the text specifies a script property and its value is not the empty string or "auto", then that script is used.
2. Otherwise, the script is determined automatically from the text.

In case the automatic algorithm does not produce the desired results, an author may explicitly specify a script property with the desired script. If specified, it must be one of the four-letter script codes specified in the ISO 15924 Code List or in the Extended Script Codes table. Comparison of script codes is performed in a case-insensitive manner, so it does not matter what case is used when specifying these codes in an XSL-FO document.
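
A minimal sketch of an explicit script property follows. The font names and sample text are assumptions chosen for illustration (both fonts would have to be made available to FOP separately); the script codes are taken from the tables on this page.

```xml
<!-- Illustrative fragment; place inside an fo:flow of a complete FO document. -->
<fo:block-container xmlns:fo="http://www.w3.org/1999/XSL/Format">
  <!-- Standard ISO 15924 code: treat this text as Arabic script. -->
  <fo:block font-family="Scheherazade" script="arab">مرحبا بالعالم</fo:block>
  <!-- Non-standard extended code: Devanagari with OpenType Indic Version 2 behavior. -->
  <fo:block font-family="Mangal" script="dev2">नमस्ते</fo:block>
</fo:block-container>
```

Because script codes are compared case-insensitively, `script="Arab"` or `script="ARAB"` would select the same script as `script="arab"`.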
The following table enumerates the standard ISO 15924 4-letter codes recognized by FOP.
Code | Script |
---|---|
arab | Arabic |
beng | Bengali |
bopo | Bopomofo |
cyrl | Cyrillic |
deva | Devanagari |
ethi | Ethiopic |
geor | Georgian |
grek | Greek |
gujr | Gujarati |
guru | Gurmukhi |
hang | Hangul |
hani | Han |
hebr | Hebrew |
hira | Hiragana |
kana | Katakana |
knda | Kannada |
khmr | Khmer |
laoo | Lao |
latn | Latin |
mlym | Malayalam |
mymr | Burmese |
mong | Mongolian |
orya | Oriya |
sinh | Sinhalese |
taml | Tamil |
telu | Telugu |
thai | Thai |
tibt | Tibetan |
zmth | Math |
zsym | Symbol |
zyyy | Undetermined |
zzzz | Uncoded |
The following table enumerates a number of non-standard extended script codes recognized by FOP.
Code | Script | Comments |
---|---|---|
bng2 | Bengali | OpenType Indic Version 2 (May 2008 and following) behavior. |
dev2 | Devanagari | OpenType Indic Version 2 (May 2008 and following) behavior. |
gur2 | Gurmukhi | OpenType Indic Version 2 (May 2008 and following) behavior. |
gjr2 | Gujarati | OpenType Indic Version 2 (May 2008 and following) behavior. |
knd2 | Kannada | OpenType Indic Version 2 (May 2008 and following) behavior. |
mlm2 | Malayalam | OpenType Indic Version 2 (May 2008 and following) behavior. |
ory2 | Oriya | OpenType Indic Version 2 (May 2008 and following) behavior. |
tml2 | Tamil | OpenType Indic Version 2 (May 2008 and following) behavior. |
tel2 | Telugu | OpenType Indic Version 2 (May 2008 and following) behavior. |
script
property.
Certain fonts that support complex script features can make use of language information in order for language specific processing rules to be applied. For example, a font designed for the Arabic script may support typographic variations according to whether the written language is Arabic, Farsi (Persian), Sindhi, Urdu, or another language written with the Arabic script. In order to apply these language specific features, the author may explicitly mark the text with a language property.

When specifying the language property, the value of the property must be either an ISO 639-2 3-letter code or an ISO 639-1 2-letter code. Comparison of language codes is performed in a case-insensitive manner, so it does not matter what case is used when specifying these codes in an XSL-FO document.
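
For instance, Urdu text set in a font with Urdu-specific features might be marked up as in the sketch below. The font choice and sample text are assumptions for illustration, and the font would have to be made available to FOP separately; `urd` is the ISO 639-2 code for Urdu.

```xml
<!-- Illustrative fragment; place inside an fo:flow of a complete FO document. -->
<fo:block xmlns:fo="http://www.w3.org/1999/XSL/Format"
          font-family="Lateef"
          script="arab"
          language="urd">
  <!-- The language property lets the font apply its Urdu-specific rules, if it has any. -->
  اردو زبان
</fo:block>
```

The 2-letter ISO 639-1 form, `language="ur"`, is an equivalent way to write the same language.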
Support for specific complex scripts is enumerated in the following table. Support for those marked as not being supported is expected to be added in future revisions.
Script | Support | Tested | Comments |
---|---|---|---|
Arabic | full | full | |
Bengali | none | none | |
Burmese | none | none | |
Devanagari | partial | partial | join controls (ZWJ, ZWNJ) not yet supported |
Khmer | none | none | |
Gujarati | partial | none | pre-alpha |
Gurmukhi | partial | none | pre-alpha |
Hebrew | full | partial | |
Kannada | none | none | |
Lao | none | none | |
Malayalam | none | none | |
Mongolian | none | none | |
Oriya | none | none | |
Tamil | none | none | |
Telugu | none | none | |
Tibetan | none | none | |
Thai | none | none | |
Support for specific fonts is enumerated in the following sub-sections. If a given font is not listed, then it has not been tested with these complex scripts features.
Fonts tested with Arabic script:

Font | Version | Glyphs | Comments |
---|---|---|---|
Arial Unicode MS | 1.01 | 50377 | limited GPOS support |
Lateef | 1.0 | 1147 | language features for Kurdish (KUR), Sindhi (SND), Urdu (URD) |
Scheherazade | 1.0 | 1197 | language features for Kurdish (KUR), Sindhi (SND), Urdu (URD) |
Simplified Arabic | 1.01 | | contains invalid, out of order coverage table entries |
Simplified Arabic | 5.00 | 414 | lacks GPOS support |
Simplified Arabic | 5.92 | 473 | includes GPOS for advanced position adjustment |
Traditional Arabic | 1.01 | 530 | lacks GPOS support |
Traditional Arabic | 5.00 | 530 | lacks GPOS support |
Traditional Arabic | 5.92 | 589 | includes GPOS for advanced position adjustment |

Fonts tested with Devanagari script:

Font | Version | Glyphs | Comments |
---|---|---|---|
Aparajita | 1.00 | 706 | |
Kokila | 1.00 | 706 | |
Mangal | 5.01 | 885 | designed for use in user interfaces |
Utsaah | 1.00 | 706 | |
Complex scripts support in Apache FOP is relatively new, so there are certain limitations. Please help us identify and close any gaps.

One known limitation: fo:character, fo:inline or fo:wrapper cannot currently be used to colorize individual Arabic letters without affecting shaping behavior across the element boundary.

In addition to the XSL-FO specification, a number of external resources provide guidance about authoring documents that employ complex scripts and the features described above: