Property Expression Parsing
by Peter B. West
Property expression parsing
The parsing of property value expressions is handled by two closely related classes: org.apache.fop.fo.expr.PropertyTokenizer and its subclass, org.apache.fop.fo.expr.PropertyParser, and by refineParsing(int, FONode, PropertyValue) methods in the individual property classes. PropertyTokenizer, as the name suggests, handles the tokenizing of the expression, handing tokens back to its subclass, PropertyParser. PropertyParser, in turn, returns a PropertyValueList, a list of PropertyValues.
The tokenizer and parser rely in turn on the datatype definitions from the org.apache.fop.datatypes package, which include the PropertyValue datatype constant definitions.
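The tokenizer/parser split described above can be sketched as follows. This is a minimal illustration, not the FOP source: the class names TokenizerSketch and ParserSketch, the fields currentToken and currentTokenValue, and the methods next() and parse() are all assumptions made for the example. Only three token kinds are recognised, and the parser returns a plain list of strings rather than a PropertyValueList.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical tokenizer: recognises only names, integers and commas. */
class TokenizerSketch {
    // Values copied from the token constants listed in the Tokenizer section.
    static final int EOF = 0, NCNAME = 1, COMMA = 11, INTEGER = 15;

    private final String expr;
    private int pos = 0;
    protected int currentToken = EOF;
    protected String currentTokenValue = null;

    TokenizerSketch(String expr) {
        this.expr = expr;
    }

    /** Advances to the next token, setting currentToken and currentTokenValue. */
    protected void next() {
        while (pos < expr.length() && Character.isWhitespace(expr.charAt(pos))) {
            pos++;
        }
        currentTokenValue = null;
        if (pos >= expr.length()) {
            currentToken = EOF;
            return;
        }
        int start = pos;
        char c = expr.charAt(pos);
        if (c == ',') {
            pos++;
            currentToken = COMMA;
        } else if (Character.isDigit(c)) {
            while (pos < expr.length() && Character.isDigit(expr.charAt(pos))) {
                pos++;
            }
            currentToken = INTEGER;
        } else if (Character.isLetter(c)) {
            while (pos < expr.length()
                    && (Character.isLetterOrDigit(expr.charAt(pos)) || expr.charAt(pos) == '-')) {
                pos++;
            }
            currentToken = NCNAME;
        } else {
            throw new IllegalArgumentException("Unexpected character '" + c + "' at " + pos);
        }
        currentTokenValue = expr.substring(start, pos);
    }
}

/** Hypothetical parser: a subclass that pulls tokens from its superclass. */
final class ParserSketch extends TokenizerSketch {
    ParserSketch(String expr) {
        super(expr);
    }

    /** Collects "tokenType:text" entries for every token except commas. */
    List<String> parse() {
        List<String> values = new ArrayList<>();
        next();
        while (currentToken != EOF) {
            if (currentToken != COMMA) {
                values.add(currentToken + ":" + currentTokenValue);
            }
            next();
        }
        return values;
    }
}
```

Making the parser a subclass lets it read the tokenizer's current token state directly, which is one plausible reason for the inheritance relationship described above.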
Data types
The data types currently defined in org.apache.fop.datatypes include:
Numbers and lengths | |||
---|---|---|---|
Numeric | The fundamental length data type. Numerics of various types are constructed by the classes listed below. | ||
Constructor classes for Numeric | |||
Ems | Relative length in ems | ||
IntegerType | |||
Length | In centimetres(cm), millimetres(mm), inches(in), points(pt), picas(pc) or pixels(px) | ||
Percentage | |||
Other Numeric | Other numeric values which do not interact with the lengths represented by Numeric values. | ||
Angle | In degrees(deg), gradians(grad) or radians(rad) | ||
Frequency | In hertz(Hz) or kilohertz(kHz) | ||
Time | In seconds(s) or milliseconds(ms) | ||
Strings | |||
StringType | Base class for data types which result in a String. | ||
Literal | A subclass of StringType for literals which exceed the constraints of an NCName. | ||
MimeType | A subclass of StringType for literals which represent a mime type. | ||
UriType | A subclass of StringType for literals which represent a URI, as specified by the argument to url(). | ||
NCName | A subclass of StringType for literals which meet the constraints of an NCName. | ||
Country | An RFC 3066/ISO 3166 country code. | ||
Language | An RFC 3066/ISO 639 language code. | ||
Script | An ISO 15924 script code. | ||
Enumerated types | |||
EnumType | An integer representing one of the tokens in a set of enumeration values. | ||
MappedEnumType | A subclass of EnumType. Maintains a String with the value to which the associated "raw" enumeration token maps. E.g., the font-size enumeration value "medium" maps to the String "12pt". | ||
Colors | |||
ColorType | Maintains a four-element array of float, derived from the name of a standard colour, the name returned by a call to system-color(), or an RGB specification. | ||
Fonts | |||
FontFamilySet | Maintains an array of Strings containing a prioritized list of possibly generic font family names. | ||
Pseudo-types | |||
A variety of pseudo-types have been defined as convenience types for frequently appearing enumeration token values, or for other special purposes. | |||
Inherit | For values of inherit. | ||
Auto | For values of auto. | ||
None | For values of none. | ||
Bool | For values of true/false. | ||
FromNearestSpecified | Created to ensure that, when associated with a shorthand, the from-nearest-specified-value() core function is the sole component of the expression. | ||
FromParent | Created to ensure that, when associated with a shorthand, the from-parent() core function is the sole component of the expression. |
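
As an illustration of the MappedEnumType entry in the table above, the following sketch pairs each enumeration token with an integer index and a mapped String. The class name MappedEnumSketch and its methods are hypothetical and do not reflect the actual org.apache.fop.datatypes API.

```java
/** Hypothetical sketch of an enumeration whose tokens map to String values. */
final class MappedEnumSketch {
    private final String[] tokens;
    private final String[] mappedValues;

    MappedEnumSketch(String[] tokens, String[] mappedValues) {
        if (tokens.length != mappedValues.length) {
            throw new IllegalArgumentException("tokens and mapped values differ in length");
        }
        this.tokens = tokens.clone();
        this.mappedValues = mappedValues.clone();
    }

    /** Returns the integer standing for the token (the EnumType part). */
    int enumIndex(String token) {
        for (int i = 0; i < tokens.length; i++) {
            if (tokens[i].equals(token)) {
                return i;
            }
        }
        throw new IllegalArgumentException("Unknown enumeration token: " + token);
    }

    /** Returns the String to which the "raw" token maps (the MappedEnumType part). */
    String mappedValue(String token) {
        return mappedValues[enumIndex(token)];
    }
}
```

For example, constructing the sketch with tokens {"small", "medium", "large"} and values {"10pt", "12pt", "14pt"} makes mappedValue("medium") return "12pt", matching the table's example. The "small" and "large" mappings here are invented for illustration only.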
Tokenizer
The tokenizer returns one of the following token values:
    static final int
        EOF = 0,
        NCNAME = 1,
        MULTIPLY = 2,
        LPAR = 3,
        RPAR = 4,
        LITERAL = 5,
        FUNCTION_LPAR = 6,
        PLUS = 7,
        MINUS = 8,
        MOD = 9,
        DIV = 10,
        COMMA = 11,
        PERCENT = 12,
        COLORSPEC = 13,
        FLOAT = 14,
        INTEGER = 15,
        ABSOLUTE_LENGTH = 16,
        RELATIVE_LENGTH = 17,
        TIME = 18,
        FREQ = 19,
        ANGLE = 20,
        INHERIT = 21,
        AUTO = 22,
        NONE = 23,
        BOOL = 24,
        URI = 25,
        MIMETYPE = 26,
        // NO_UNIT is a transient token for internal use only. It is
        // never set as the end result of parsing a token.
        NO_UNIT = 27;
Most of these tokens are self-explanatory, but a few need further comment.
- AUTO
- Because of its frequency of occurrence, and the fact that it is always the initial value for any property which supports it, AUTO has been promoted into a pseudo-type with its own datatype class. Therefore, it is also reported as a token.
- NONE
- Similarly to AUTO, NONE has been promoted to a pseudo-type because of its frequency.
- BOOL
- There is a de facto boolean type buried in the enumeration types for many of the properties. It has been specified as a type in its own right in this code.
- MIMETYPE
- The content-type property introduces this complication. Its value takes one of two forms: content-type:mime-type (e.g. content-type="content-type:xml/svg") or namespace-prefix:prefix (e.g. content-type="namespace-prefix:svg"). The experimental code reduces each form to its payload: an NCName in the case of a namespace prefix, and a MIMETYPE in the case of a content-type specification. A separate MIMETYPE token is needed because an NCName cannot contain a "/".
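The reduction of a content-type value to its payload might look roughly like the sketch below. It is an assumption-laden illustration, not the experimental code itself: the class ContentTypeSketch, the Token holder and the reduce() method are hypothetical names, and the payload is not validated as an NCName or as a MIME type.

```java
/** Hypothetical sketch of reducing a content-type value to its payload. */
final class ContentTypeSketch {
    // Token values copied from the constant list above.
    static final int NCNAME = 1;
    static final int MIMETYPE = 26;

    static final String CONTENT_TYPE_PREFIX = "content-type:";
    static final String NAMESPACE_PREFIX = "namespace-prefix:";

    /** Illustrative holder pairing a token type with the payload text. */
    static final class Token {
        final int type;
        final String value;

        Token(int type, String value) {
            this.type = type;
            this.value = value;
        }
    }

    /** Strips the leading keyword and reports the payload with its token type. */
    static Token reduce(String expr) {
        String s = expr.trim();
        if (s.startsWith(CONTENT_TYPE_PREFIX)) {
            return new Token(MIMETYPE, s.substring(CONTENT_TYPE_PREFIX.length()));
        }
        if (s.startsWith(NAMESPACE_PREFIX)) {
            return new Token(NCNAME, s.substring(NAMESPACE_PREFIX.length()));
        }
        throw new IllegalArgumentException("Unrecognized content-type value: " + expr);
    }
}
```

With this sketch, reduce("content-type:xml/svg") yields a MIMETYPE token carrying "xml/svg", and reduce("namespace-prefix:svg") yields an NCNAME token carrying "svg".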
Parser
The parser returns a PropertyValueList, necessary because of the possibility that a list of PropertyValue elements may be returned from the expressions of some properties.
PropertyValueLists may contain PropertyValues or other PropertyValueLists. This latter provision is necessitated by the peculiar case of text-shadow, which may contain whitespace-separated sublists of either two or three elements, separated from one another by commas. To accommodate this peculiarity, comma-separated elements are added to the top-level list, while whitespace-separated values are always collected into sublists to be added to the top-level list.
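The grouping rule can be sketched on plain strings as follows. This is a simplification under stated assumptions: the real parser works on tokens and builds PropertyValueList objects, whereas the hypothetical ValueListSketch.group() below splits raw text, uses java.util.List in place of PropertyValueList, and does not handle commas inside function calls such as rgb().

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch of the comma/whitespace grouping rule. */
final class ValueListSketch {
    /**
     * Comma-separated elements go directly into the top-level list;
     * a run of whitespace-separated values becomes a sublist.
     */
    static List<Object> group(String expr) {
        List<Object> top = new ArrayList<>();
        for (String segment : expr.split(",")) {
            String trimmed = segment.trim();
            if (trimmed.isEmpty()) {
                continue; // this sketch silently skips empty segments
            }
            String[] parts = trimmed.split("\\s+");
            if (parts.length == 1) {
                top.add(parts[0]);
            } else {
                top.add(new ArrayList<String>(Arrays.asList(parts)));
            }
        }
        return top;
    }
}
```

For a text-shadow style value such as "1pt 2pt red, 3pt 4pt", the sketch produces a top-level list holding two sublists, one of three elements and one of two.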
Other special cases include the processing of the core functions from-parent() and from-nearest-specified-value() when these function calls are assigned to a shorthand property, or used with a shorthand property name as an argument. In these cases, the function call must be the sole component of the expression. The pseudo-type classes FromParent and FromNearestSpecified are generated in these circumstances so that an exception will be thrown if they are involved in expression evaluation with other components. (See Rec. Section 5.10.4 Property Value Functions.)
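One way to picture how a dedicated pseudo-type enforces the sole-component rule: an arithmetic operator that accepts only numeric operands will reject the marker object as soon as it appears alongside anything else. The sketch below is hypothetical; SoleComponentSketch, FromParentMarker and add() are illustrative names, and the exception type is an assumption rather than what FOP throws.

```java
/** Hypothetical sketch: a marker type that cannot take part in arithmetic. */
final class SoleComponentSketch {
    /** Stand-in for the FromParent pseudo-type; it carries no numeric value. */
    static final class FromParentMarker {
        final String propertyName;

        FromParentMarker(String propertyName) {
            this.propertyName = propertyName;
        }
    }

    /** Adds two operands; any operand that is not a Double is rejected. */
    static double add(Object left, Object right) {
        return numeric(left) + numeric(right);
    }

    private static double numeric(Object operand) {
        if (operand instanceof Double) {
            return (Double) operand;
        }
        throw new IllegalArgumentException(
                "Not a numeric operand: " + operand.getClass().getSimpleName());
    }
}
```

In this sketch, add(1.0, 2.0) succeeds, while add(new FromParentMarker("font"), 2.0) throws, because the marker can only stand alone as the whole value of the expression.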
The experimental code is a simple extension of the existing parser code, which itself borrowed heavily from James Clark's XT processor.