This document is for developers wishing to contribute to the FormulaEvaluator API functionality.
When evaluating workbooks you may encounter an org.apache.poi.ss.formula.eval.NotImplementedException
which indicates that a function is not (yet) supported by POI. Is there a workaround?
Yes, the POI framework makes it easy to add implementations of new functions. Prior to POI-3.8
you had to check out the source code from svn and make a custom build with your function implementation.
Since POI-3.8 you can register new functions at run-time.
Currently, contributions are wanted for implementing the standard MS
Excel functions. Placeholder classes for these have been created;
contributors only need to supply the implementation of the
individual evaluate() methods that do the actual evaluation.
Briefly, a formula string (along with the sheet and workbook that
form the context in which the formula is evaluated) is first parsed
into Reverse Polish Notation (RPN) tokens using the FormulaParser
class.
(If you don't know what RPN tokens are, now is a good time to
read
Anthony Stone's description of RPN.)
RPN tokens are mapped to Eval classes. (The class hierarchy for the Evals
is best understood if you view it in a class diagram viewer.) Depending
on the type of RPN token (also called Ptgs henceforth, since that is what
the FormulaParser calls the classes), a specific type of Eval wrapper is
constructed to wrap the RPN token and is pushed on the stack, unless the
Ptg is an OperationPtg. If it is an OperationPtg, an OperationEval
instance is created for the specific type of OperationPtg. Depending on
how many operands it takes, that many Evals are popped off the stack and
passed in an array to the OperationEval instance's evaluate method, which
returns an Eval of subtype ValueEval. Thus an operation in the formula is
evaluated.
An Eval is either a ValueEval or an OperationEval; both are subinterfaces
of Eval. Operands are always ValueEvals, and operations are always
OperationEvals. OperationEval.evaluate(Eval[]) returns an Eval which is
supposed to be an instance of one of the implementations of ValueEval.
The ValueEval resulting from evaluate() is pushed on the stack and the
next RPN token is evaluated. This continues until eventually there are
no more RPN tokens, at which point, if the formula string was correctly
parsed, there should be just one Eval on the stack, which contains the
result of evaluating the formula.
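The push/pop cycle described above can be illustrated without any POI classes. The following is a minimal sketch under simplifying assumptions: tokens are plain strings, operands are doubles, and every operation takes exactly two operands. The class name RpnSketch and its methods are hypothetical and are not part of POI, which works with Ptg and Eval objects instead.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;

/** Simplified model of stack-based RPN evaluation; this is not POI code. */
final class RpnSketch {

    /** Evaluates RPN tokens such as ["3", "4", "+", "2", "*"]. */
    static double evaluate(List<String> tokens) {
        Deque<Double> stack = new ArrayDeque<>();
        for (String token : tokens) {
            switch (token) {
                case "+": case "-": case "*": case "/": {
                    // An operation: pop as many operands as it takes (two here),
                    // evaluate, and push the resulting value back on the stack.
                    double right = stack.pop();
                    double left = stack.pop();
                    stack.push(apply(token, left, right));
                    break;
                }
                default:
                    // An operand: wrap it as a value and push it.
                    stack.push(Double.parseDouble(token));
            }
        }
        // A correctly parsed formula leaves exactly one value: the result.
        if (stack.size() != 1) {
            throw new IllegalStateException("malformed RPN: " + tokens);
        }
        return stack.pop();
    }

    private static double apply(String op, double left, double right) {
        switch (op) {
            case "+": return left + right;
            case "-": return left - right;
            case "*": return left * right;
            default:  return left / right;
        }
    }

    public static void main(String[] args) {
        // (3 + 4) * 2 written in RPN
        System.out.println(evaluate(List.of("3", "4", "+", "2", "*"))); // prints 14.0
    }
}
```

In POI the same loop runs over Ptgs, and the values on the stack are ValueEvals rather than doubles.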
Two special Ptgs, AreaPtg and RefPtg, are handled a little differently,
but the code should be self-explanatory for that. Very briefly, the cells
included in the AreaPtg and RefPtg are examined and their values are
populated in individual ValueEval objects, which are set into the
implementations of AreaEval and RefEval.
OperationEvals for the standard operators have been implemented and tested.
As of release 5.2.0, POI implements 202 built-in functions; see Appendix A for the list of supported functions with an implementation. You can programmatically list supported and unsupported functions with the static helper methods WorkbookEvaluator.getSupportedFunctionNames() and WorkbookEvaluator.getNotSupportedFunctionNames().
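As a minimal sketch of how the two WorkbookEvaluator helper methods might be used, the following program prints how many function names fall into each group. It assumes the POI jar is on the classpath; the class name ListFunctions is hypothetical.

```java
import java.util.Collection;

import org.apache.poi.ss.formula.WorkbookEvaluator;

/** Prints how many Excel functions POI can and cannot evaluate. */
final class ListFunctions {
    public static void main(String[] args) {
        // Names of functions that POI can evaluate
        Collection<String> supported = WorkbookEvaluator.getSupportedFunctionNames();
        // Names of functions that POI knows about but cannot evaluate yet
        Collection<String> unsupported = WorkbookEvaluator.getNotSupportedFunctionNames();

        System.out.println("Supported: " + supported.size());
        System.out.println("Not supported: " + unsupported.size());
        System.out.println("SUM supported? " + supported.contains("SUM"));
    }
}
```

The counts depend on the POI version in use, so no expected output is shown.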
If you need a function that POI doesn't currently support, you have two options. You can create the function yourself, and have your program add it to POI at run-time. Doing this will help you get the function you need as soon as possible. The other option is to create the function yourself, and build it into the POI library, possibly contributing the code to the POI project. Doing this will help you get the function you need, but you'll have to build POI from source yourself. And if you contribute the code, you'll help others who need the function in the future, because it will already be supported in the next release of POI. The two options require almost identical code, but the process of deploying the function is different. If your function is a User Defined Function, you'll always take the run-time option, as POI doesn't distribute UDFs.
In the sections ahead, we'll implement the Excel SQRTPI() function, first
at run-time, and then we'll show how to change it to a library-based implementation.
All Excel formula function classes implement either the
org.apache.poi.ss.formula.functions.Function
or the
org.apache.poi.ss.formula.functions.FreeRefFunction
interface.
Function is a common interface for the functions defined in the Binary Excel File Format (BIFF8): these are "classic" Excel functions like SUM, COUNT, LOOKUP, etc.
FreeRefFunction is a common interface for the functions from the Excel Analysis ToolPak, for User Defined Functions that you create,
and for Excel built-in functions that have been defined since BIFF8 was defined.
In the future these two interfaces are expected to be unified into one, but for now you have to start your implementation from two slightly different roots.
You are about to implement a function and don't know which interface to start from: Function or FreeRefFunction.
You should use Function if the function is part of the Excel BIFF8
definition, and FreeRefFunction for a function that is part of the Excel Analysis ToolPak, was added to Excel after BIFF8, or that you are creating yourself.
You can check the list of Analysis ToolPak functions defined in org.apache.poi.ss.formula.atp.AnalysisToolPak.createFunctionsMap()
to see if the function is part of the Analysis ToolPak.
The list of BIFF8 functions is defined as a text file, in the
poi/src/main/resources/org/apache/poi/ss/formula/function/functionMetadata.txt
file.
If the function is not a User Defined Function (UDFs must implement FreeRefFunction), you can also check programmatically which interface it should implement. If AnalysisToolPak.isATPFunction() returns false for the name and FunctionMetadataRegistry.getFunctionByName() returns metadata for it, the function is a BIFF8 function and should implement Function. Otherwise it should implement FreeRefFunction.
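The check described above could be sketched as follows. The class WhichInterface and its method are hypothetical helpers; the two POI lookups it calls are AnalysisToolPak.isATPFunction() and FunctionMetadataRegistry.getFunctionByName(). The sketch assumes the POI jar is on the classpath and that the function name is given in upper case.

```java
import org.apache.poi.ss.formula.atp.AnalysisToolPak;
import org.apache.poi.ss.formula.function.FunctionMetadata;
import org.apache.poi.ss.formula.function.FunctionMetadataRegistry;

/** Hypothetical helper: decides which interface a new function should implement. */
final class WhichInterface {

    /** Returns "Function" for BIFF8 built-ins and "FreeRefFunction" otherwise. */
    static String baseInterfaceFor(String functionName) {
        if (AnalysisToolPak.isATPFunction(functionName)) {
            // Analysis ToolPak functions implement FreeRefFunction
            return "FreeRefFunction";
        }
        FunctionMetadata metaData = FunctionMetadataRegistry.getFunctionByName(functionName);
        // Metadata exists only for functions listed in functionMetadata.txt (BIFF8)
        return metaData != null ? "Function" : "FreeRefFunction";
    }

    public static void main(String[] args) {
        System.out.println("SUM    -> " + baseInterfaceFor("SUM"));
        System.out.println("SQRTPI -> " + baseInterfaceFor("SQRTPI"));
    }
}
```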
Here is the fun part: let's walk through the implementation of the Excel function SQRTPI(),
which POI does not currently support.
AnalysisToolPak.isATPFunction("SQRTPI") returns true, so this is an Analysis ToolPak function.
Thus the base interface must be FreeRefFunction. The same would be true if we were implementing
a UDF.
Because we're taking the run-time deployment option, we'll create this new function in a source
file in our own program. Our function will return an Eval that is either
its proper result, or an ErrorEval that describes the error. All that work
is done in the function's evaluate() method.
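A minimal sketch of such a class is shown below. It assumes the POI jar is on the classpath and that SQRTPI(x) is computed as the square root of x times pi. The argument-count check and the explicit NaN/infinity check are choices made for this sketch, not necessarily what an official POI implementation would do.

```java
import org.apache.poi.ss.formula.OperationEvaluationContext;
import org.apache.poi.ss.formula.eval.ErrorEval;
import org.apache.poi.ss.formula.eval.EvaluationException;
import org.apache.poi.ss.formula.eval.NumberEval;
import org.apache.poi.ss.formula.eval.OperandResolver;
import org.apache.poi.ss.formula.eval.ValueEval;
import org.apache.poi.ss.formula.functions.FreeRefFunction;

/** Sketch of SQRTPI(number) = sqrt(number * pi) as a FreeRefFunction. */
final class SqrtPi implements FreeRefFunction {

    @Override
    public ValueEval evaluate(ValueEval[] args, OperationEvaluationContext ec) {
        if (args.length != 1) {
            return ErrorEval.VALUE_INVALID; // SQRTPI takes exactly one argument
        }
        try {
            // Resolve a cell reference or area to a single value, then to a double
            ValueEval value = OperandResolver.getSingleValue(
                    args[0], ec.getRowIndex(), ec.getColumnIndex());
            double number = OperandResolver.coerceValueToDouble(value);

            double result = Math.sqrt(number * Math.PI);
            // Excel has no NaN or infinity; report a #NUM! error instead
            if (Double.isNaN(result) || Double.isInfinite(result)) {
                return ErrorEval.NUM_ERROR;
            }
            return new NumberEval(result);
        } catch (EvaluationException e) {
            // Propagate errors found while resolving the argument
            return e.getErrorEval();
        }
    }
}
```

A negative argument makes Math.sqrt return NaN, so it ends up as a #NUM! error through the same check.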
If our function had been one of the BIFF8 Excel built-ins, it would have been based on
the Function interface instead.
There are helper base classes for Function implementations that make life easier when
implementing numeric functions or functions with a small, fixed number of arguments:
org.apache.poi.ss.formula.functions.NumericFunction
org.apache.poi.ss.formula.functions.Fixed0ArgFunction
org.apache.poi.ss.formula.functions.Fixed1ArgFunction
org.apache.poi.ss.formula.functions.Fixed2ArgFunction
org.apache.poi.ss.formula.functions.Fixed3ArgFunction
org.apache.poi.ss.formula.functions.Fixed4ArgFunction
Since SQRTPI() takes exactly one argument, we would start our implementation from
Fixed1ArgFunction. The differences for a BIFF8 Fixed1ArgFunction are pretty small:
the class extends Fixed1ArgFunction and implements the single-argument
evaluate(int, int, ValueEval) method instead.
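For comparison, here is a sketch of the same computation written against Fixed1ArgFunction. It is purely illustrative, since SQRTPI is an Analysis ToolPak function and would not really be implemented this way. The class name SqrtPiFixed is hypothetical, and the POI jar is assumed to be on the classpath.

```java
import org.apache.poi.ss.formula.eval.ErrorEval;
import org.apache.poi.ss.formula.eval.EvaluationException;
import org.apache.poi.ss.formula.eval.NumberEval;
import org.apache.poi.ss.formula.eval.OperandResolver;
import org.apache.poi.ss.formula.eval.ValueEval;
import org.apache.poi.ss.formula.functions.Fixed1ArgFunction;

/** Illustrative only: sqrt(number * pi) written as a one-argument Function. */
final class SqrtPiFixed extends Fixed1ArgFunction {

    @Override
    public ValueEval evaluate(int srcRowIndex, int srcColumnIndex, ValueEval arg0) {
        // The base class has already checked that exactly one argument was passed
        try {
            ValueEval value = OperandResolver.getSingleValue(arg0, srcRowIndex, srcColumnIndex);
            double number = OperandResolver.coerceValueToDouble(value);

            double result = Math.sqrt(number * Math.PI);
            if (Double.isNaN(result) || Double.isInfinite(result)) {
                return ErrorEval.NUM_ERROR;
            }
            return new NumberEval(result);
        } catch (EvaluationException e) {
            return e.getErrorEval();
        }
    }
}
```

The row and column of the formula cell arrive as plain int parameters here, rather than through an OperationEvaluationContext.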
Now that the implementation is ready, we need to register it with the formula evaluator. This is the same no matter which kind of function we're creating. In the program that is using POI, we simply call WorkbookEvaluator.registerFunction() with the function name "SQRTPI" and an instance of our class.
Voila! The formula evaluator now recognizes SQRTPI()!
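A sketch of a slightly defensive registration is shown below. WorkbookEvaluator.registerFunction() throws an IllegalArgumentException when POI already implements the function, or when the name is neither an Analysis ToolPak function nor a BIFF8 function. The hypothetical helper registerIfMissing guards against the first case only. The POI jar is assumed to be on the classpath.

```java
import org.apache.poi.ss.formula.WorkbookEvaluator;
import org.apache.poi.ss.formula.functions.FreeRefFunction;

/** Hypothetical helper around WorkbookEvaluator.registerFunction(). */
final class RegisterAtRuntime {

    /**
     * Registers impl under the given upper-case name unless POI already
     * evaluates that function. Returns true if the registration happened.
     */
    static boolean registerIfMissing(String name, FreeRefFunction impl) {
        if (WorkbookEvaluator.getSupportedFunctionNames().contains(name)) {
            return false; // POI refuses to override its own implementations
        }
        // Still throws IllegalArgumentException for names POI does not know at all
        WorkbookEvaluator.registerFunction(name, impl);
        return true;
    }
}
```

With a FreeRefFunction implementation class named SqrtPi, the call would be RegisterAtRuntime.registerIfMissing("SQRTPI", new SqrtPi()). On a POI version that already ships SQRTPI, the helper returns false rather than throwing.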
If we choose instead to implement our function as part of the POI
library, the code is nearly identical. All POI functions
are part of one of two Java packages: org.apache.poi.ss.formula.functions
for BIFF8 Excel built-in functions, and org.apache.poi.ss.formula.atp
for Analysis ToolPak functions. The function still needs to implement the
appropriate interface, just as before. To implement our SQRTPI()
function in the POI library, we need to move the source code to
poi/src/main/java/org/apache/poi/ss/formula/atp/SqrtPi.java
in the POI source code, change the package statement to
org.apache.poi.ss.formula.atp, and add a singleton instance.
If our function had been one of the BIFF8 Excel built-ins, we would instead have moved
the source code to
poi/src/main/java/org/apache/poi/ss/formula/functions/SqrtPi.java
in the POI source code, and changed the package statement to
org.apache.poi.ss.formula.functions.
POI library functions are registered differently from run-time-deployed functions.
Again, the techniques differ for the two types of library functions (remembering
that POI never releases the third type, UDFs).
For our Analysis ToolPak function, we have to update the list of functions in
org.apache.poi.ss.formula.atp.AnalysisToolPak.createFunctionsMap().
If our function had been one of the BIFF8 Excel built-ins,
the registration instead would require updating an entry in the formula-function table,
poi/src/main/resources/org/apache/poi/ss/formula/function/functionMetadata.txt,
and also updating the list of function implementations in
org.apache.poi.ss.formula.eval.FunctionEval.produceFunctions().
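Excel stores numbers as IEEE 754 double-precision floating point values, with two exceptions: it cannot represent positive or negative infinity, and it cannot represent NaN (Not-a-Number). Where IEEE 754 arithmetic would yield one of these, Excel produces an error value such as #DIV/0! or #NUM! instead.
Be aware of these two cases when saving results of your scientific calculations in Excel: “where are my Infinities and NaNs? They are gone!”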
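One way to guard against these two cases before writing a computed double to a cell is sketched below. The mapping used here (infinity to #DIV/0!, NaN to #NUM!) is an assumption for illustration; Excel's own choice of error depends on the operation involved. The class name ExcelDoubles is hypothetical, and the sketch uses only the Java standard library.

```java
/** Illustrative check for doubles that Excel cannot store as numbers. */
final class ExcelDoubles {

    /** Returns an Excel error literal for a non-finite value, or null if it is storable. */
    static String excelErrorFor(double value) {
        if (Double.isInfinite(value)) {
            return "#DIV/0!"; // e.g. the result of dividing by zero
        }
        if (Double.isNaN(value)) {
            return "#NUM!";   // e.g. the square root of a negative number
        }
        return null;          // an ordinary finite number
    }

    public static void main(String[] args) {
        System.out.println(excelErrorFor(1.0 / 0.0));       // prints #DIV/0!
        System.out.println(excelErrorFor(Math.sqrt(-1.0))); // prints #NUM!
        System.out.println(excelErrorFor(Math.sqrt(2.0)));  // prints null
    }
}
```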
Automated testing of the implemented Function is easy.
The source code for this is in the file org.apache.poi.hssf.record.formula.GenericFormulaTestCase.java.
This class has a reference to the test xls file (not a test xls, the test xls :) ),
which may need to be changed for your environment. Once you do that, locate the entry
in the test xls for the function that you have implemented and enter different tests
in cells in the FORMULA row. Then copy the value of each formula into the
cell just below it (this is easily done in Excel as:
[copy the formula cell] > [go to cell below] > Edit > Paste Special > Values > "ok").
You can enter multiple such formulas and paste their values in the cells below, and the
test framework will automatically check that each formula evaluation matches the expected
value. (This is hard to put in words, so please take the time to look at the code
and at the tests currently entered in the patch attachment "FormulaEvalTestData.xls".)
Functions supported by POI (as of v5.2.0 release)