Property Expression Parsing

by Peter B. West

The parsing of property value expressions is handled by two closely related classes, org.apache.fop.fo.expr.PropertyTokenizer and its subclass org.apache.fop.fo.expr.PropertyParser, together with the refineParsing(int, FONode, PropertyValue) methods in the individual property classes. PropertyTokenizer, as its name suggests, tokenizes the expression and hands the tokens back to its subclass, PropertyParser. PropertyParser in turn returns a PropertyValueList, a list of PropertyValues.

The tokenizer and parser rely in turn on the datatype definitions from the org.apache.fop.datatypes package, which include the PropertyValue datatype constant definitions.
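
To illustrate the division of labour between the tokenizer and its parser subclass, here is a much-reduced sketch. It is not the FOP code: the class names SketchTokenizer and SketchParser, the next() method and the currentToken/currentTokenValue fields are hypothetical, the token codes are borrowed from the tokenizer's constant list, and the grammar is cut down to integers, parentheses, unary minus, +, -, *, div and mod. Where the real PropertyParser builds a PropertyValueList, this sketch simply evaluates the expression to a double.

```java
// Hypothetical, much-reduced sketch of the PropertyTokenizer/PropertyParser split.
// The tokenizer produces tokens; the parser subclass consumes them.
class SketchTokenizer {
    // Token codes borrowed from the PropertyTokenizer constant list.
    static final int EOF = 0, NCNAME = 1, MULTIPLY = 2, LPAR = 3, RPAR = 4,
            PLUS = 7, MINUS = 8, MOD = 9, DIV = 10, INTEGER = 15;

    private final String expr;
    private int pos = 0;
    protected int currentToken;
    protected String currentTokenValue;

    SketchTokenizer(String expr) { this.expr = expr; }

    /** Advances to the next token, setting currentToken and currentTokenValue. */
    protected void next() {
        while (pos < expr.length() && Character.isWhitespace(expr.charAt(pos))) pos++;
        if (pos >= expr.length()) { currentToken = EOF; currentTokenValue = ""; return; }
        int start = pos;
        char c = expr.charAt(pos);
        if (Character.isDigit(c)) {
            while (pos < expr.length() && Character.isDigit(expr.charAt(pos))) pos++;
            currentToken = INTEGER;
        } else if (Character.isLetter(c)) {
            while (pos < expr.length() && Character.isLetter(expr.charAt(pos))) pos++;
            String name = expr.substring(start, pos);
            // In XSL expressions "div" and "mod" are operator names, not NCNames.
            currentToken = name.equals("div") ? DIV : name.equals("mod") ? MOD : NCNAME;
        } else {
            pos++;
            switch (c) {
                case '+': currentToken = PLUS; break;
                case '-': currentToken = MINUS; break;
                case '*': currentToken = MULTIPLY; break;
                case '(': currentToken = LPAR; break;
                case ')': currentToken = RPAR; break;
                default: throw new IllegalArgumentException("Unexpected character: " + c);
            }
        }
        currentTokenValue = expr.substring(start, pos);
    }
}

class SketchParser extends SketchTokenizer {
    SketchParser(String expr) {
        super(expr);
        next(); // prime the first token
    }

    /** Parses the whole expression, rejecting trailing tokens. */
    double parse() {
        double value = additive();
        if (currentToken != EOF) throw new IllegalArgumentException("Unexpected: " + currentTokenValue);
        return value;
    }

    // AdditiveExpr ::= MultiplicativeExpr (('+' | '-') MultiplicativeExpr)*
    private double additive() {
        double value = multiplicative();
        while (currentToken == PLUS || currentToken == MINUS) {
            int op = currentToken;
            next();
            double rhs = multiplicative();
            value = (op == PLUS) ? value + rhs : value - rhs;
        }
        return value;
    }

    // MultiplicativeExpr ::= UnaryExpr (('*' | 'div' | 'mod') UnaryExpr)*
    private double multiplicative() {
        double value = unary();
        while (currentToken == MULTIPLY || currentToken == DIV || currentToken == MOD) {
            int op = currentToken;
            next();
            double rhs = unary();
            if (op == MULTIPLY) value *= rhs;
            else if (op == DIV) value /= rhs;
            else value %= rhs; // Java's % on doubles is a truncating remainder
        }
        return value;
    }

    // UnaryExpr ::= '-' UnaryExpr | PrimaryExpr (integers and parentheses only here)
    private double unary() {
        if (currentToken == MINUS) { next(); return -unary(); }
        if (currentToken == INTEGER) {
            double v = Double.parseDouble(currentTokenValue);
            next();
            return v;
        }
        if (currentToken == LPAR) {
            next();
            double v = additive();
            if (currentToken != RPAR) throw new IllegalArgumentException("Expected ')'");
            next();
            return v;
        }
        throw new IllegalArgumentException("Unexpected: " + currentTokenValue);
    }
}
```

With these classes, new SketchParser("2 + 3 * 4").parse() returns 14.0 and new SketchParser("-6 div 4").parse() returns -1.5. The parser never reads characters itself; it only asks its superclass for the next token.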

Data types

The data types currently defined in org.apache.fop.datatypes include:

Numbers and lengths
Numeric: The fundamental length data type. Numerics of various types are constructed by the classes listed below.
Constructor classes for Numeric:
Ems: Relative length in ems.
IntegerType
Length: In centimetres (cm), millimetres (mm), inches (in), points (pt), picas (pc) or pixels (px).
Percentage
Other Numeric: Other numeric values that do not interact with the lengths represented by Numeric values.
Angle: In degrees (deg), gradians (grad) or radians (rad).
Frequency: In hertz (Hz) or kilohertz (kHz).
Time: In seconds (s) or milliseconds (ms).
Strings
StringType: Base class for data types which result in a String.
Literal: A subclass of StringType for literals which do not meet the constraints of an NCName.
MimeType: A subclass of StringType for literals which represent a MIME type.
UriType: A subclass of StringType for literals which represent a URI, as specified by the argument to url().
NCName: A subclass of StringType for literals which meet the constraints of an NCName.
Country: An RFC 3066/ISO 3166 country code.
Language: An RFC 3066/ISO 639 language code.
Script: An ISO 15924 script code.
Enumerated types
EnumType: An integer representing one of the tokens in a set of enumeration values.
MappedEnumType: A subclass of EnumType which also maintains the String to which the associated "raw" enumeration token maps. For example, the font-size enumeration value "medium" maps to the String "12pt".
Colors
ColorType: Maintains a four-element array of float, derived from the name of a standard colour, the name returned by a call to system-color(), or an RGB specification.
Fonts
FontFamilySet: Maintains an array of Strings containing a prioritized list of font family names, some of which may be generic family names.
Pseudo-types
A variety of pseudo-types have been defined as convenience types for frequently occurring enumeration token values, or for other special purposes.
Inherit: For the value inherit.
Auto: For the value auto.
None: For the value none.
Bool: For the values true and false.
FromNearestSpecified: Created to ensure that, when associated with a shorthand, the from-nearest-specified-value() core function is the sole component of the expression.
FromParent: Created to ensure that, when associated with a shorthand, the from-parent() core function is the sole component of the expression.
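
Of the string types above, the choice between NCName and Literal turns on whether a string satisfies the NCName production of XML Namespaces. The following is a minimal sketch under stated assumptions: StringType.of() is a hypothetical factory, the character test is simplified (Character.isLetter and Character.isLetterOrDigit stand in for the specification's full character classes), and MimeType, UriType and the code types are left out.

```java
// Hypothetical sketch: choosing between NCName and Literal for a raw string.
abstract class StringType {
    final String value;

    StringType(String value) { this.value = value; }

    /** Returns an NCName if the string meets the (approximated) NCName rules, else a Literal. */
    static StringType of(String s) {
        return isNCName(s) ? new NCName(s) : new Literal(s);
    }

    // Approximation of the NCName production: a letter or '_' followed by
    // letters, digits, '.', '-' or '_'. Colons, spaces and '/' are not allowed.
    static boolean isNCName(String s) {
        if (s.isEmpty()) return false;
        char first = s.charAt(0);
        if (!(Character.isLetter(first) || first == '_')) return false;
        for (int i = 1; i < s.length(); i++) {
            char c = s.charAt(i);
            if (!(Character.isLetterOrDigit(c) || c == '.' || c == '-' || c == '_')) return false;
        }
        return true;
    }
}

class NCName extends StringType {
    NCName(String v) { super(v); }
}

class Literal extends StringType {
    Literal(String v) { super(v); }
}
```

Under these assumptions "serif" becomes an NCName, while "Times New Roman" (which contains spaces) becomes a Literal.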

Tokenizer

The tokenizer returns one of the following token values:

          static final int
          EOF = 0
          ,NCNAME = 1
          ,MULTIPLY = 2
          ,LPAR = 3
          ,RPAR = 4
          ,LITERAL = 5
          ,FUNCTION_LPAR = 6
          ,PLUS = 7
          ,MINUS = 8
          ,MOD = 9
          ,DIV = 10
          ,COMMA = 11
          ,PERCENT = 12
          ,COLORSPEC = 13
          ,FLOAT = 14
          ,INTEGER = 15
          ,ABSOLUTE_LENGTH = 16
          ,RELATIVE_LENGTH = 17
          ,TIME = 18
          ,FREQ = 19
          ,ANGLE = 20
          ,INHERIT = 21
          ,AUTO = 22
          ,NONE = 23
          ,BOOL = 24
          ,URI = 25
          ,MIMETYPE = 26
          // NO_UNIT is a transient token for internal use only.  It is
          // never set as the end result of parsing a token.
          ,NO_UNIT = 27
          ;
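
Several of the tokens above (PERCENT, FLOAT, INTEGER, ABSOLUTE_LENGTH, RELATIVE_LENGTH, TIME, FREQ and ANGLE) differ only in the unit, if any, that follows a number. The sketch below shows one way that classification might be done, using the units listed under Data types and treating NO_UNIT as the transient state the comment describes. The class and method names are hypothetical, unit matching is case-sensitive here, and reporting px as an ABSOLUTE_LENGTH and em as a RELATIVE_LENGTH are assumptions rather than a description of the FOP tokenizer.

```java
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: classifying a numeric token by its unit suffix.
class NumericTokenSketch {
    // Token codes borrowed from the constant list above.
    static final int PERCENT = 12, FLOAT = 14, INTEGER = 15, ABSOLUTE_LENGTH = 16,
            RELATIVE_LENGTH = 17, TIME = 18, FREQ = 19, ANGLE = 20, NO_UNIT = 27;

    // A number ("12", "1.5", ".5") followed by an optional "%" or alphabetic unit.
    private static final Pattern NUMBER =
            Pattern.compile("(\\d+(?:\\.\\d*)?|\\.\\d+)(%|[A-Za-z]*)");

    // Assumption: px is grouped with the physical units, as it is under Length.
    private static final Set<String> ABSOLUTE_UNITS = Set.of("cm", "mm", "in", "pt", "pc", "px");
    private static final Set<String> ANGLE_UNITS = Set.of("deg", "grad", "rad");
    private static final Set<String> TIME_UNITS = Set.of("s", "ms");
    private static final Set<String> FREQ_UNITS = Set.of("Hz", "kHz");

    static int classify(String token) {
        Matcher m = NUMBER.matcher(token);
        if (!m.matches()) throw new IllegalArgumentException("Not a numeric token: " + token);
        String number = m.group(1);
        String unit = m.group(2);

        int kind = unit.isEmpty() ? NO_UNIT : unitKind(unit);
        // NO_UNIT is transient: it is always resolved to INTEGER or FLOAT before returning.
        if (kind == NO_UNIT) {
            return number.contains(".") ? FLOAT : INTEGER;
        }
        return kind;
    }

    private static int unitKind(String unit) {
        if (unit.equals("%")) return PERCENT;
        if (unit.equals("em")) return RELATIVE_LENGTH;
        if (ABSOLUTE_UNITS.contains(unit)) return ABSOLUTE_LENGTH;
        if (ANGLE_UNITS.contains(unit)) return ANGLE;
        if (TIME_UNITS.contains(unit)) return TIME;
        if (FREQ_UNITS.contains(unit)) return FREQ;
        throw new IllegalArgumentException("Unknown unit: " + unit);
    }
}
```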
      

Most of these tokens are self-explanatory, but a few need further comment.

AUTO
Because it occurs frequently, and because it is always the initial value for any property which supports it, AUTO has been promoted to a pseudo-type with its own datatype class. It is therefore also reported as a token.
NONE
Similarly to AUTO, NONE has been promoted to a pseudo-type because of its frequency.
BOOL
Many properties have a de facto boolean type buried in their enumeration values. In this code it has been specified as a type in its own right.
MIMETYPE
The content-type property introduces a complication. Its value can take either of two forms: content-type:mime-type (e.g. content-type="content-type:xml/svg") or namespace-prefix:prefix (e.g. content-type="namespace-prefix:svg"). The experimental code reduces each form to its payload: an NCName in the case of a namespace prefix, and a MIMETYPE in the case of a content-type specification. A separate MIMETYPE token is needed because a MIME type contains a "/", which an NCName cannot contain.
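
A minimal sketch of that reduction, assuming the value arrives as a single string and using a hypothetical ContentTypeSketch.reduce() method and Token class: the content-type: form yields a MIMETYPE token carrying the MIME type, and the namespace-prefix: form yields an NCNAME token carrying the prefix. Any other value is simply rejected here.

```java
// Hypothetical sketch of reducing a content-type value to its payload.
class ContentTypeSketch {
    // Token codes borrowed from the PropertyTokenizer constant list.
    static final int NCNAME = 1, MIMETYPE = 26;

    static final class Token {
        final int type;
        final String value;

        Token(int type, String value) {
            this.type = type;
            this.value = value;
        }
    }

    static Token reduce(String value) {
        final String ctPrefix = "content-type:";
        final String nsPrefix = "namespace-prefix:";
        if (value.startsWith(ctPrefix)) {
            // The payload is a MIME type such as "xml/svg".
            return new Token(MIMETYPE, value.substring(ctPrefix.length()));
        }
        if (value.startsWith(nsPrefix)) {
            String prefix = value.substring(nsPrefix.length());
            // A namespace prefix must be an NCName, so it cannot contain '/' or ':'.
            if (prefix.isEmpty() || prefix.indexOf('/') >= 0 || prefix.indexOf(':') >= 0) {
                throw new IllegalArgumentException("Not an NCName: " + prefix);
            }
            return new Token(NCNAME, prefix);
        }
        throw new IllegalArgumentException("Unrecognized content-type value: " + value);
    }
}
```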

Parser

The parser returns a PropertyValueList because the expressions of some properties may yield a list of PropertyValue elements rather than a single value.

PropertyValueLists may contain PropertyValues or other PropertyValueLists. The latter provision is needed for the peculiar case of text-shadow, whose value may consist of whitespace-separated sublists of two or three elements, separated from one another by commas. To accommodate this, comma-separated elements are added to the top-level list, while whitespace-separated values are always collected into sublists, which are then added to the top-level list.
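
The list structure for text-shadow can be sketched at the string level. This is a hypothetical helper, ShadowListSketch.split(), not the parser itself, which works on tokens and builds PropertyValueLists: a comma at the top level starts a new element, whitespace separates values within an element, and each element's values are gathered into a sublist. Commas and spaces inside parentheses, as in an rgb() call, are not treated as separators. Treating a single-value element as a one-element sublist is an assumption.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: commas separate top-level elements, whitespace separates
// values within an element, and each element's values are gathered into a sublist.
class ShadowListSketch {
    static List<List<String>> split(String expr) {
        List<List<String>> top = new ArrayList<>();
        List<String> current = new ArrayList<>();
        StringBuilder value = new StringBuilder();
        int depth = 0; // parenthesis depth, so separators inside rgb(...) are kept
        for (int i = 0; i < expr.length(); i++) {
            char c = expr.charAt(i);
            if (c == '(') depth++;
            if (c == ')') depth--;
            if (depth == 0 && (c == ',' || Character.isWhitespace(c))) {
                flushValue(value, current);
                if (c == ',') {
                    top.add(current); // close this comma-separated element
                    current = new ArrayList<>();
                }
            } else {
                value.append(c);
            }
        }
        flushValue(value, current);
        top.add(current);
        return top;
    }

    // Moves a completed whitespace-separated value into the current sublist.
    private static void flushValue(StringBuilder value, List<String> current) {
        if (value.length() > 0) {
            current.add(value.toString());
            value.setLength(0);
        }
    }
}
```

For "1pt 2pt red, 3pt 4pt rgb(0, 0, 255)" the result is [[1pt, 2pt, red], [3pt, 4pt, rgb(0, 0, 255)]]; without any commas, as in "2pt 2pt", the values still end up in a single sublist.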

Other special cases include the processing of the core functions from-parent() and from-nearest-specified-value() when these function calls are assigned to a shorthand property, or are used with a shorthand property name as an argument. In these cases the function call must be the sole component of the expression. The pseudo-type classes FromParent and FromNearestSpecified are generated in these circumstances so that an exception will be thrown if they are involved in expression evaluation with any other component. (See Rec. Section 5.10.4 Property Value Functions.)
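
One way to make such a pseudo-type fail when combined with other components is to have every arithmetic operation reject it. In the sketch below PValue, LengthValue, ExprSketch and the choice of IllegalStateException are hypothetical, and only addition of lengths is modelled; the roles of FromParent and FromNearestSpecified come from the description above.

```java
// Hypothetical sketch: pseudo-type values refuse to take part in arithmetic.
interface PValue {}

final class LengthValue implements PValue {
    final double points;
    LengthValue(double points) { this.points = points; }
}

final class FromParent implements PValue {
    final String propertyName;
    FromParent(String propertyName) { this.propertyName = propertyName; }
}

final class FromNearestSpecified implements PValue {
    final String propertyName;
    FromNearestSpecified(String propertyName) { this.propertyName = propertyName; }
}

final class ExprSketch {
    /** Adds two lengths; a pseudo-type operand means the call was not the sole component. */
    static PValue plus(PValue a, PValue b) {
        rejectPseudoType(a);
        rejectPseudoType(b);
        return new LengthValue(((LengthValue) a).points + ((LengthValue) b).points);
    }

    private static void rejectPseudoType(PValue v) {
        if (v instanceof FromParent || v instanceof FromNearestSpecified) {
            throw new IllegalStateException(
                "from-parent()/from-nearest-specified-value() on a shorthand "
                + "must be the sole component of the expression");
        }
    }
}
```

Adding two LengthValues succeeds, whereas ExprSketch.plus(new FromParent("border"), new LengthValue(1)) throws, so an expression such as from-parent(border) + 1pt cannot be evaluated.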

The experimental code is a simple extension of the existing parser code, which itself borrowed heavily from James Clark's XT processor.