Diffstat (limited to 'src/documentation/content/xdocs/components/spreadsheet/eval-devguide.xml')
-rw-r--r-- | src/documentation/content/xdocs/components/spreadsheet/eval-devguide.xml | 591 |
1 files changed, 591 insertions, 0 deletions
diff --git a/src/documentation/content/xdocs/components/spreadsheet/eval-devguide.xml b/src/documentation/content/xdocs/components/spreadsheet/eval-devguide.xml
new file mode 100644
index 0000000000..2d49b0aa09
--- /dev/null
+++ b/src/documentation/content/xdocs/components/spreadsheet/eval-devguide.xml
@@ -0,0 +1,591 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<!--
+   ====================================================================
+   Licensed to the Apache Software Foundation (ASF) under one or more
+   contributor license agreements. See the NOTICE file distributed with
+   this work for additional information regarding copyright ownership.
+   The ASF licenses this file to You under the Apache License, Version 2.0
+   (the "License"); you may not use this file except in compliance with
+   the License. You may obtain a copy of the License at
+
+       http://www.apache.org/licenses/LICENSE-2.0
+
+   Unless required by applicable law or agreed to in writing, software
+   distributed under the License is distributed on an "AS IS" BASIS,
+   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+   See the License for the specific language governing permissions and
+   limitations under the License.
+   ====================================================================
+-->
+<!DOCTYPE document PUBLIC "-//APACHE//DTD Documentation V2.0//EN" "document-v20.dtd">
+
+<document>
+    <header>
+        <title>Developing Formula Evaluation</title>
+        <authors>
+            <person email="amoweb@yahoo.com" name="Amol Deshmukh" id="AD"/>
+            <person email="yegor@apache.org" name="Yegor Kozlov" id="YK"/>
+        </authors>
+    </header>
+    <body>
+    <section><title>Introduction</title>
+        <p>
+            This document is for developers wishing to contribute to the
+            FormulaEvaluator API functionality.
+        </p>
+        <p>
+            When evaluating workbooks you may encounter an <code>org.apache.poi.ss.formula.eval.NotImplementedException</code>
+            which indicates that a function is not (yet) supported by POI. Is there a workaround?
+            Yes, the POI framework makes it easy to add implementations of new functions. Prior to POI-3.8
+            you had to check out the source code from svn and make a custom build with your function implementation.
+            Since POI-3.8 you can register new functions at run-time.
+        </p>
+        <p>
+            Currently, contribution is desired for implementing the standard MS
+            Excel functions. Placeholder classes for these have been created;
+            contributors only need to insert implementations for the
+            individual <code>evaluate()</code> methods that do the actual evaluation.
+        </p>
+    </section>
+    <section><title>Overview of FormulaEvaluator</title>
+        <p>
+            Briefly, a formula string (along with the sheet and workbook that
+            form the context in which the formula is evaluated) is first parsed
+            into Reverse Polish Notation (RPN) tokens using the <code>FormulaParser</code> class.
+            (If you don't know what RPN tokens are, now is a good time to
+            read <a href="http://www-stone.ch.cam.ac.uk/documentation/rrf/rpn.html">
+            Anthony Stone's description of RPN</a>.)
+        </p>
+        <section><title>The big picture</title>
+            <p>
+                RPN tokens are mapped to <code>Eval</code> classes. (The class hierarchy for the <code>Eval</code>s
+                is best understood if you view it in a class diagram
+                viewer.) Depending on the type of RPN token (also called <code>Ptg</code>s
+                henceforth since that is what the <code>FormulaParser</code> calls the classes), a
+                specific type of <code>Eval</code> wrapper is constructed to wrap the RPN token and
+                is pushed on the stack, unless the <code>Ptg</code> is an <code>OperationPtg</code>. If it is an
+                <code>OperationPtg</code>, an <code>OperationEval</code> instance is created for the specific
+                type of <code>OperationPtg</code>. Depending on how many operands it takes,
+                that many <code>Eval</code>s are popped off the stack and passed in an array to
+                the <code>OperationEval</code> instance's evaluate method, which returns an <code>Eval</code>
+                of subtype <code>ValueEval</code>. Thus an operation in the formula is evaluated.
+            </p>
+            <note>An <code>Eval</code> is of subinterface <code>ValueEval</code> or <code>OperationEval</code>.
+                Operands are always <code>ValueEval</code>s, and operations are always <code>OperationEval</code>s.</note>
+            <p>
+                <code>OperationEval.evaluate(Eval[])</code> returns an <code>Eval</code> which is supposed
+                to be an instance of one of the implementations of
+                <code>ValueEval</code>. The <code>ValueEval</code> resulting from <code>evaluate()</code> is pushed on the
+                stack and the next RPN token is evaluated. This continues until
+                eventually there are no more RPN tokens, at which point, if the formula
+                string was correctly parsed, there should be just one <code>Eval</code> on the
+                stack, which contains the result of evaluating the formula.
+            </p>
+            <p>
+                Two special <code>Ptg</code>s, <code>AreaPtg</code> and <code>RefPtg</code>,
+                are handled a little differently, but the code should be
+                self-explanatory for that. Very briefly, the cells included in <code>AreaPtg</code> and
+                <code>RefPtg</code> are examined and their values are populated in individual
+                <code>ValueEval</code> objects which are set into the implementations of
+                <code>AreaEval</code> and <code>RefEval</code>.
+            </p>
+            <p>
+                <code>OperationEval</code>s for the standard operators have been implemented and tested.
+            </p>
+        </section>
+    </section>
+    <section><title>What functions are supported?</title>
+        <p>
+            As of release 5.2.0, POI implements 202 built-in functions;
+            see <a href="#appendixA">Appendix A</a> for the list of supported functions.
+            You can programmatically list supported / unsupported functions using the following helper methods:
+        </p>
+<source>import org.apache.poi.ss.formula.WorkbookEvaluator;
+
+// list of functions that POI can evaluate
+Collection&lt;String&gt; supportedFuncs = WorkbookEvaluator.getSupportedFunctionNames();
+
+// list of functions that are not supported by POI
+Collection&lt;String&gt; unsupportedFuncs = WorkbookEvaluator.getNotSupportedFunctionNames();
+</source>
+        <section><title>I need a function that isn't supported!</title>
+            <p>
+                If you need a function that POI doesn't currently support, you have two options.
+                You can create the function yourself, and have your program add it to POI at
+                run-time. Doing this will help you get the function you need as soon as possible.
+                The other option is to create the function yourself, and build it into the POI library,
+                possibly contributing the code to the POI project. Doing this will help you get the
+                function you need, but you'll have to build POI from source yourself. And if you
+                contribute the code, you'll help others who need the function in the future, because
+                it will already be supported in the next release of POI. The two options require
+                almost identical code, but the process of deploying the function is different.
+                If your function is a User Defined Function, you'll always take the run-time option,
+                as POI doesn't distribute UDFs.
+            </p>
+            <p>
+                In the sections ahead, we'll implement the Excel <code>SQRTPI()</code> function, first
+                at run-time, and then we'll show how to change it to a library-based implementation.
+            </p>
+        </section>
+    </section>
+    <section><title>Two base interfaces to start your implementation</title>
+        <p>
+            All Excel formula function classes implement either the
+            <code>org.apache.poi.ss.formula.functions.Function</code> or the
+            <code>org.apache.poi.ss.formula.functions.FreeRefFunction</code> interface.
+            <code>Function</code> is a common interface for the functions defined in the Binary Excel File Format (BIFF8): these are "classic" Excel functions like <code>SUM</code>, <code>COUNT</code>, <code>LOOKUP</code>, <em>etc</em>.
+            <code>FreeRefFunction</code> is a common interface for the functions from the Excel Analysis ToolPak, for User Defined Functions that you create,
+            and for Excel built-in functions that were added after BIFF8 was defined.
+            In the future these two interfaces are expected to be unified into one, but for now you have to start your implementation from one of two slightly different roots.
+        </p>
+
+        <section><title>Which interface to start from?</title>
+            <p>
+                You are about to implement a function and don't know which interface to start from: <code>Function</code> or <code>FreeRefFunction</code>.
+                You should use <code>Function</code> if the function is part of the Excel BIFF8
+                definition, and <code>FreeRefFunction</code> for a function that is part of the Excel Analysis ToolPak, was added to Excel after BIFF8, or that you are creating yourself.
+            </p>
+            <p>
+                You can check the list of Analysis ToolPak functions defined in <code>org.apache.poi.ss.formula.atp.AnalysisToolPak.createFunctionsMap()</code>
+                to see if the function is part of the Analysis ToolPak.
+                The list of BIFF8 functions is defined as a text file, in the
+                <code>poi/src/main/resources/org/apache/poi/ss/formula/function/functionMetadata.txt</code> file.
+            </p>
+            <p>
+                You can also use the following code to check which base interface your function should implement, if it is not a User Defined Function (UDFs must implement <code>FreeRefFunction</code>):
+            </p>
+<source>import org.apache.poi.ss.formula.atp.AnalysisToolPak;
+
+if (!AnalysisToolPak.isATPFunction(functionName)){
+    // the function must implement org.apache.poi.ss.formula.functions.Function
+} else {
+    // the function must implement org.apache.poi.ss.formula.functions.FreeRefFunction
+}
+</source>
+        </section>
+    </section>
+    <section><title>Implementing a function</title>
+        <p>
+            Here is the fun part: let's walk through the implementation of the Excel function <code>SQRTPI()</code>,
+            which POI does not currently support.
+        </p>
+        <p>
+            <code>AnalysisToolPak.isATPFunction("SQRTPI")</code> returns true, so this is an Analysis ToolPak function.
+            Thus the base interface must be <code>FreeRefFunction</code>. The same would be true if we were implementing
+            a UDF.
+        </p>
+        <p>
+            Because we're taking the run-time deployment option, we'll create this new function in a source
+            file in our own program. Our function will return an <code>Eval</code> that is either
+            its proper result, or an <code>ErrorEval</code> that describes the error. All that work
+            is done in the function's <code>evaluate()</code> method:
+        </p>
+<source>package ...;
+import org.apache.poi.ss.formula.OperationEvaluationContext;
+import org.apache.poi.ss.formula.eval.EvaluationException;
+import org.apache.poi.ss.formula.eval.ErrorEval;
+import org.apache.poi.ss.formula.eval.NumberEval;
+import org.apache.poi.ss.formula.eval.OperandResolver;
+import org.apache.poi.ss.formula.eval.ValueEval;
+import org.apache.poi.ss.formula.functions.FreeRefFunction;
+
+public final class SqrtPi implements FreeRefFunction {
+    public ValueEval evaluate(ValueEval[] args, OperationEvaluationContext ec) {
+        ValueEval arg0 = args[0];
+        int srcRowIndex = ec.getRowIndex();
+        int srcColumnIndex = ec.getColumnIndex();
+        try {
+            // Retrieves a single value from a variety of different argument types according to standard
+            // Excel rules. Does not perform any type conversion.
+            ValueEval ve = OperandResolver.getSingleValue(arg0, srcRowIndex, srcColumnIndex);
+
+            // Applies some conversion rules if the supplied value is not already a number.
+            // Throws EvaluationException(#VALUE!) if the supplied parameter is not a number
+            double arg = OperandResolver.coerceValueToDouble(ve);
+
+            // this is where all the heavy lifting happens
+            double result = Math.sqrt(arg*Math.PI);
+
+            // Excel uses the error code #NUM! instead of IEEE NaN and Infinity,
+            // so when a numeric function evaluates to Double.NaN or an infinite value,
+            // be sure to translate the result to the appropriate error code
+            if (Double.isNaN(result) || Double.isInfinite(result)) {
+                throw new EvaluationException(ErrorEval.NUM_ERROR);
+            }
+
+            return new NumberEval(result);
+        } catch (EvaluationException e){
+            return e.getErrorEval();
+        }
+    }
+}
+</source>
+        <p>
+            If our function had been one of the BIFF8 Excel built-ins, it would have been based on
+            the <code>Function</code> interface instead.
+            There are base classes built on <code>Function</code> that make life easier when implementing numeric functions
+            or functions
+            with a small, fixed number of arguments:
+        </p>
+        <ul>
+            <li><code>org.apache.poi.ss.formula.functions.NumericFunction</code></li>
+            <li><code>org.apache.poi.ss.formula.functions.Fixed0ArgFunction</code></li>
+            <li><code>org.apache.poi.ss.formula.functions.Fixed1ArgFunction</code></li>
+            <li><code>org.apache.poi.ss.formula.functions.Fixed2ArgFunction</code></li>
+            <li><code>org.apache.poi.ss.formula.functions.Fixed3ArgFunction</code></li>
+            <li><code>org.apache.poi.ss.formula.functions.Fixed4ArgFunction</code></li>
+        </ul>
+        <p>
+            Since <code>SQRTPI()</code> takes exactly one argument, we would start our implementation from
+            <code>Fixed1ArgFunction</code>. The differences for a BIFF8 <code>Fixed1ArgFunction</code>
+            are pretty small:
+        </p>
+<source>package ...;
+import org.apache.poi.ss.formula.eval.EvaluationException;
+import org.apache.poi.ss.formula.eval.ErrorEval;
+import org.apache.poi.ss.formula.eval.NumberEval;
+import org.apache.poi.ss.formula.eval.OperandResolver;
+import org.apache.poi.ss.formula.eval.ValueEval;
+import org.apache.poi.ss.formula.functions.Fixed1ArgFunction;
+
+public final class SqrtPi extends Fixed1ArgFunction {
+
+    public ValueEval evaluate(int srcRowIndex, int srcColumnIndex, ValueEval arg0) {
+        try {
+            ...
+    }
+}
+</source>
+        <p>
+            Now that the implementation is ready, we need to register it with the formula evaluator.
+            This is the same no matter which kind of function we're creating. We simply add the
+            following line to the program that is using POI:
+        </p>
+<source>WorkbookEvaluator.registerFunction("SQRTPI", new SqrtPi());
+</source>
+        <p>
+            Voila! The formula evaluator now recognizes <code>SQRTPI()</code>!
+        </p>
+        <section><title>Moving the function into the library</title>
+            <p>
+                If we choose instead to implement our function as part of the POI
+                library, the code is nearly identical. All POI functions
+                are part of one of two Java packages: <code>org.apache.poi.ss.formula.functions</code>
+                for BIFF8 Excel built-in functions, and <code>org.apache.poi.ss.formula.atp</code>
+                for Analysis ToolPak functions. The function still needs to implement the
+                appropriate base interface, just as before. To implement our <code>SQRTPI()</code>
+                function in the POI library, we need to move the source code to
+                <code>poi/src/main/java/org/apache/poi/ss/formula/atp/SqrtPi.java</code> in
+                the POI source code, change the <code>package</code> statement, and add a
+                singleton instance:
+            </p>
+<source>package org.apache.poi.ss.formula.atp;
+...
+public final class SqrtPi implements FreeRefFunction {
+
+    public static final FreeRefFunction instance = new SqrtPi();
+
+    private SqrtPi() {
+        // Enforce singleton
+    }
+    ...
+}
+</source>
+            <p>
+                If our function had been one of the BIFF8 Excel built-ins, we would instead have moved
+                the source code to
+                <code>poi/src/main/java/org/apache/poi/ss/formula/functions/SqrtPi.java</code> in
+                the POI source code, and changed the <code>package</code> statement to:
+            </p>
+<source>package org.apache.poi.ss.formula.functions;
+</source>
+            <p>
+                POI library functions are registered differently from run-time-deployed functions.
+                Again, the techniques differ for the two types of library functions (remembering
+                that POI never releases the third type, UDFs).
+                For our Analysis ToolPak function, we have to update the list of functions in
+                <code>org.apache.poi.ss.formula.atp.AnalysisToolPak.createFunctionsMap()</code>:
+            </p>
+<source>...
+private Map&lt;String, FreeRefFunction&gt; createFunctionsMap() {
+    Map&lt;String, FreeRefFunction&gt; m = new HashMap&lt;&gt;(114);
+    ...
+    r(m, "SQRTPI", SqrtPi.instance);
+    ...
+}
+...
+</source>
+            <p>
+                If our function had been one of the BIFF8 Excel built-ins,
+                the registration instead would require updating an entry in the formula-function table,
+                <code>poi/src/main/resources/org/apache/poi/ss/formula/function/functionMetadata.txt</code>:
+            </p>
+<source>...
+#Columns: (index, name, minParams, maxParams, returnClass, paramClasses, isVolatile, hasFootnote )
+...
+359 SQRTPI 1 1 V V
+...
+</source>
+            <p>
+                and also updating the list of function implementations in
+                <code>org.apache.poi.ss.formula.eval.FunctionEval.produceFunctions()</code>:
+            </p>
+<source>...
+private static Function[] produceFunctions() {
+    ...
+    retval[359] = new SqrtPi();
+    ...
+}
+...
+</source>
+        </section>
+        <section><title>Floating Point Arithmetic in Excel</title>
+            <p>
+                Excel uses the IEEE Standard for Double Precision Floating Point numbers
+                except in two cases where it does not adhere to IEEE 754:
+            </p>
+            <ol>
+                <li>Positive and Negative Infinities: Infinities occur when you divide by 0.
+                    Excel does not support infinities; instead, it gives a #DIV/0! error in these cases.
+                </li>
+                <li>Not-a-Number (NaN): NaN is used to represent invalid operations
+                    (such as infinity/infinity, infinity-infinity, or the square root of -1).
+                    NaNs allow a program to continue past an invalid operation.
+                    Excel instead immediately generates an error such as #NUM! or #DIV/0!.
+                </li>
+            </ol>
+            <p>
+                Be aware of these two cases when saving results of your scientific calculations in Excel:
+                “where are my Infinities and NaNs? They are gone!”
+            </p>
+        </section>
+        <section><title>Testing Framework</title>
+            <p>
+                Automated testing of the implemented Function is easy.
+                The source code for this is in the file: <code>org.apache.poi.hssf.record.formula.GenericFormulaTestCase.java</code>.
+                This class has a reference to the test xls file (not <em>a</em> test xls, <em>the</em> test xls :) )
+                which may need to be changed for your environment. Once you do that, in the test xls,
+                locate the entry for the function that you have implemented and enter different tests
+                in a cell in the FORMULA row. Then paste the value of the formula that you entered into the
+                cell just below it (this is easily done in Excel as:
+                [copy the formula cell] > [go to cell below] > Edit > Paste Special > Values > "ok").
+                You can enter multiple such formulas and paste their values in the cell below and the
+                test framework will automatically test if the formula evaluation matches the expected
+                value (again, this is hard to put into words, so please take the time to look
+                at the code and the currently entered tests in the patch attachment "FormulaEvalTestData.xls"
+                file).
+            </p>
+            <note>This style of testing appears to have been abandoned. This section needs to be completely rewritten.</note>
+        </section>
+    </section>
+    <anchor id="appendixA"/>
+    <section>
+        <title>Appendix A: Functions supported by POI</title>
+        <p>
+            Functions supported by POI (as of v5.2.0 release)
+        </p>
+<source>ABS
+ACOS
+ACOSH
+ADDRESS
+AND
+AREAS
+ASIN
+ASINH
+ATAN
+ATAN2
+ATANH
+AVEDEV
+AVERAGE
+AVERAGEIFS
+BIN2DEC
+CEILING
+CHAR
+CHOOSE
+CLEAN
+CODE
+COLUMN
+COLUMNS
+COMBIN
+COMPLEX
+CONCAT
+CONCATENATE
+COS
+COSH
+COUNT
+COUNTA
+COUNTBLANK
+COUNTIF
+COUNTIFS
+DATE
+DATEVALUE
+DAY
+DAYS360
+DEC2BIN
+DEC2HEX
+DEGREES
+DELTA
+DEVSQ
+DGET
+DMAX
+DMIN
+DOLLAR
+DSUM
+EDATE
+EOMONTH
+ERROR.TYPE
+EVEN
+EXACT
+EXP
+FACT
+FACTDOUBLE
+FALSE
+FIND
+FIXED
+FLOOR
+FREQUENCY
+FV
+GEOMEAN
+HEX2DEC
+HLOOKUP
+HOUR
+HYPERLINK
+IF
+IFERROR
+IFNA
+IFS
+IMAGINARY
+IMREAL
+INDEX
+INDIRECT
+INT
+INTERCEPT
+IPMT
+IRR
+ISBLANK
+ISERR
+ISERROR
+ISEVEN
+ISLOGICAL
+ISNA
+ISNONTEXT
+ISNUMBER
+ISODD
+ISREF
+ISTEXT
+LARGE
+LEFT
+LEN
+LN
+LOG
+LOG10
+LOOKUP
+LOWER
+MATCH
+MAX
+MAXA
+MAXIFS
+MDETERM
+MEDIAN
+MID
+MIN
+MINA
+MINIFS
+MINUTE
+MINVERSE
+MIRR
+MMULT
+MOD
+MODE
+MONTH
+MROUND
+NA
+NETWORKDAYS
+NOT
+NOW
+NPER
+NPV
+OCT2DEC
+ODD
+OFFSET
+OR
+PERCENTILE
+PERCENTRANK
+PERCENTRANK.EXC
+PERCENTRANK.INC
+PI
+PMT
+POISSON
+POWER
+PPMT
+PRODUCT
+PROPER
+PV
+QUOTIENT
+RADIANS
+RAND
+RANDBETWEEN
+RANK
+RATE
+REPLACE
+REPT
+RIGHT
+ROMAN
+ROUND
+ROUNDDOWN
+ROUNDUP
+ROW
+ROWS
+SEARCH
+SECOND
+SIGN
+SIN
+SINGLE
+SINH
+SLOPE
+SMALL
+SQRT
+STDEV
+SUBSTITUTE
+SUBTOTAL
+SUM
+SUMIF
+SUMIFS
+SUMPRODUCT
+SUMSQ
+SUMX2MY2
+SUMX2PY2
+SUMXMY2
+SWITCH
+T
+T.DIST
+T.DIST.2T
+T.DIST.RT
+TAN
+TANH
+TDIST
+TEXT
+TEXTJOIN
+TIME
+TIMEVALUE
+TODAY
+TRANSPOSE
+TREND
+TRIM
+TRUE
+TRUNC
+UPPER
+VALUE
+VAR
+VARP
+VLOOKUP
+WEEKDAY
+WEEKNUM
+WORKDAY
+XLOOKUP
+XMATCH
+YEAR
+YEARFRAC</source>
+    </section>
+    </body>
+</document>
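The stack discipline described in "The big picture" section of the guide (operands are pushed, an operation pops as many operands as it takes and pushes its result, and exactly one value remains at the end) can be sketched without any POI classes. This is a toy illustration under stated assumptions, not POI's evaluator: `RpnSketch`, `evaluate` and `apply` are hypothetical names, tokens are plain strings rather than `Ptg`s, and values are `double`s rather than `ValueEval`s.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Toy RPN evaluator; hypothetical names, not part of the POI API. */
class RpnSketch {

    /** Evaluates space-separated RPN tokens such as "3 4 + 2 *". */
    static double evaluate(String rpn) {
        Deque<Double> stack = new ArrayDeque<>();
        for (String token : rpn.trim().split("\\s+")) {
            switch (token) {
                case "+":
                case "-":
                case "*":
                case "/": {
                    // An operation pops its operands (right operand first) ...
                    double right = stack.pop();
                    double left = stack.pop();
                    // ... and its result is pushed back for later tokens.
                    stack.push(apply(token, left, right));
                    break;
                }
                default:
                    // Anything else is treated as a numeric operand and pushed.
                    stack.push(Double.parseDouble(token));
            }
        }
        // A well-formed formula leaves exactly one value: the result.
        if (stack.size() != 1) {
            throw new IllegalStateException("malformed RPN: " + rpn);
        }
        return stack.pop();
    }

    /** Applies one binary operator to two operands. */
    static double apply(String op, double left, double right) {
        switch (op) {
            case "+": return left + right;
            case "-": return left - right;
            case "*": return left * right;
            case "/": return left / right;
            default: throw new IllegalArgumentException("unknown operator: " + op);
        }
    }

    public static void main(String[] args) {
        // (3 + 4) * 2 written in RPN
        System.out.println(evaluate("3 4 + 2 *")); // prints 14.0
    }
}
```

In this sketch a division by zero simply yields an IEEE infinity or NaN; as the guide's floating-point section notes, an Excel-compatible implementation would translate such results into an error value instead.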