git-svn-id: https://svn.apache.org/repos/asf/jakarta/poi/trunk@353679 13f79535-47bb-0310-9956-ffa450edef68

19 years ago · 14d7002a0a
--- a/src/documentation/content/xdocs/hssf/book.xml
+++ b/src/documentation/content/xdocs/hssf/book.xml
@@ -15,6 +15,8 @@
        <menu-item label="Quick Guide" href="quick-guide.html"/>
        <menu-item label="HOWTO" href="how-to.html"/>
        <menu-item label="Formula Support" href="formula.html" />
        <menu-item label="Formula Evaluation" href="eval.html" />
 		<menu-item label="Eval Dev Guide" href="eval-devguide.html" />
        <menu-item label="Use Case" href="use-case.html"/>
        <menu-item label="Pictorial Docs" href="diagrams.html"/>
        <menu-item label="Alternatives" href="alternatives.html"/>
--- a/src/documentation/content/xdocs/hssf/eval-devguide.xml
+++ b/src/documentation/content/xdocs/hssf/eval-devguide.xml
@@ -0,0 +1,184 @@
 <?xml version="1.0" encoding="UTF-8"?>
 <!-- Copyright (C) 2005 The Apache Software Foundation. All rights reserved. -->
 <!DOCTYPE document PUBLIC "-//APACHE//DTD Documentation V1.1//EN" "../dtd/document-v11.dtd">

 <document>
    <header>
        <title>Developing Formula Evaluation</title>
        <authors>
 			<person email="amoweb@yahoo.com" name="Amol Deshmukh" id="AD"/>
        </authors>
    </header>
    <body>
 	<section><title>Introduction</title>
 		<p>This document is for developers wishing to contribute to the 
 			FormulaEvaluator API functionality.</p>
 		<p>Currently, contribution is desired for implementing the standard MS 
 			excel functions. Place holder classes for these have been created, 
 			contributors only need to insert implementation for the 
 			individual "evaluate()" methods that do the actual evaluation.</p>
 	</section>
 	<section><title>Overview of FormulaEvaluator </title>
 		<p>Briefly, a formula string (along with the sheet and workbook that 
 			form the context in which the formula is evaluated) is first parsed 
 			into RPN tokens using the FormulaParser class in POI-HSSF main. 
 			(If you dont know what RPN tokens are, now is a good time to 
 			read <link href="http://www-stone.ch.cam.ac.uk/documentation/rrf/rpn.html">
 			this</link>.) 
 		</p>
 		<section><title> The big picture</title>
 			<p>RPN tokens are mapped to Eval classes. (Class hierarchy for the Evals 
 				is best understood if you view the class diagram in a class diagram 
 				viewer.) Depending on the type of RPN token (also called as Ptgs 
 				henceforth since that is what the FormulaParser calls the classes) a 
 				specific type of Eval wrapper is constructed to wrap the RPN token and 
 				is pushed on the stack.... UNLESS the Ptg is an OperationPtg. If it is an 
 				OperationPtg, an OperationEval instance is created for the specific 
 				type of OperationPtg. And depending on how many operands it takes, 
 				that many Evals are popped of the stack and passed in an array to 
 				the OperationEval instance's evaluate method which returns an Eval 
 				of subtype ValueEval.Thus an operation in the formula is evaluated. </p>
 				<note> An Eval is of subinterface ValueEval or OperationEval. 
 				Operands are always ValueEvals, Operations are always OperationEvals.</note> 
 				<p><code>OperationEval.evaluate(Eval[])</code> returns an Eval which is supposed 
 				to be of type ValueEval (actually since ValueEval is an interface, 
 				the return value is instance of one of the implementations of 
 				ValueEval). The valueEval resulting from evaluate() is pushed on the 
 				stack and the next RPN token is evaluated.... this continues till 
 				eventually there are no more RPN tokens at which point, if the formula 
 				string was correctly parsed, there should be just one Eval on the 
 				stack - which contains the result of evaluating the formula.</p>
 			<p>Ofcourse I glossed over the details of how AreaPtg and ReferencePtg 
 				are handled a little differently, but the code should be self 
 				explanatory for that. Very briefly, the cells included in AreaPtg and 
 				RefPtg are examined and their values are populated in individual 
 				ValueEval objects which are set into the AreaEval and RefEval (ok, 
 				since AreaEval and RefEval are interfaces, the implementations of 
 				AreaEval and RefEval - but you'll figure all that out from the code)</p>
 			<p>OperationEvals for the standard operators have been implemented and 
 				basic testing has been done </p>
 		</section>
 		<section><title> FunctionEval and FuncVarEval</title>
 			<p>FunctionEval is an abstract super class of FuncVarEval. The reason for this is that in the FormulaParser Ptg classes, there are two Ptgs, FuncPtg and FuncVarPtg. In my tests, I did not see FuncPtg being used so there is no corresponding FuncEval right now. But in case the need arises for a FuncVal class, FuncEval and FuncVarEval need to be isolated with a common interface/abstract class, hence FunctionEval.</p>
 			<p>FunctionEval also contains the mapping of which function class maps to which function index. This mapping has been done for all the functions, so all you really have to do is implement the evaluate method in the function class that has not already been implemented. The Function indexes are defined in AbstractFunctionPtg class in POI main.</p>
 		</section>
 	</section>
 	<section><title>Walkthrough of an "evaluate()" implementation.</title>
 		<p>So here is the fun part - lets walk through the implementation of the excel 
 			function... <strong>AVERAGE()</strong> </p>
 		<section><title>The Code</title>
 		<source>
 public Eval evaluate(Eval[] operands) {
    double d = 0;
    int count = 0;
    ValueEval retval = null;
    for (int i = 0, iSize = operands.length; i &lt; iSize; i++) {
        if (operands[i] == null) continue;
        if (operands[i] instanceof AreaEval) {
            AreaEval ap = (AreaEval) operands[i];
            Object[] values = ap.getValues();
            for (int j = 0, jSize = values.length; j &lt; jSize; j++) {
                if (values[j] == null) continue;
                if (values[j] instanceof NumberEval) {
                //inside areas, ignore bools
                    d += ((NumberEval) values[j]).getNumberValue();
                    count++;
                }
                else if (values[j] instanceof RefEval) {
                    RefEval re = (RefEval) values[j];
                    ValueEval ve = re.getInnerValueEval();
                    if (ve != null &amp;&amp; ve instanceof NumberEval) {
                        d += ((NumberEval) ve).getNumberValue();
                        count++;
                    }
                }
            }
        }
        else if (operands[i] instanceof NumericValueEval) { 
            // for direct operands evaluate bools
            NumericValueEval np = (NumericValueEval) operands[i];
            d += np.getNumberValue();
            count++;
        }
        else if (operands[i] instanceof RefEval) {
            RefEval re = (RefEval) operands[i];
            ValueEval ve = re.getInnerValueEval();
            if (ve instanceof NumberEval) { 
                //if it is a reference, ignore bools
                NumberEval ne = (NumberEval) ve;
                d += ne.getNumberValue();
                count++;
            }
        }
    }

    if (retval == null) {
        retval = (Double.isNaN(d)) ? 
          (ValueEval) ErrorEval.ERROR_503 : new NumberEval(d/count);
    }
    return retval;
 }
 	</source>
 		</section>
 		<section><title>Implementation Details</title>
 		<ul>
 			<li>The implementation of the AVERAGE function lives in package 
 				o.a.p.hssf.record.formula.functions named  Average.java. 
 				(Every excel function has a corresponding java source file 
 				in the above package) </li>
 			<li>If you open the file for a function thats not yet implemented, you will see one un-implemented method: 
 				<code>public Eval evaluate(Eval[] operands) {}</code> </li>
 			<li>Since the excel Average() function can take 1 or more operands, we iterate over all operands that are passed in the evaluate method:
 				<code>for (int i=0, iSize=operands.length; i&lt;iSize; i++) {...}</code></li>
 			<li>inside the loop, you will do the following
 				<ol>
 					<li>Do a null check: <code>if (operands[i] == null) continue;</code></li>
 					<li>Figure out the actual subtype of ValueEval that the operands 
 						implements. The possible types that you will encounter in an 
 						evaluate() are: NumberEval, BoolEval, StringEval, ErrorEval, 
 						AreaEval, RefEval, BlankEval.</li>
 					<li>Implement the function. See the next section for some
 						caveats on implementing the Excel semantics. </li>
 				</ol>
 			</li>
 			<li>Finally before returning the NumberEval wrapping the double value that 
 				you computed, do one final check to see if the double is a NaN, 
 				if it is return ErrorEval.ERROR_503 (see the javadoc in ErrorEval.java 
 				for description of error codes - it is html so you might as well 
 				generate the javadocs)</li>
 			
 		</ul>
 		</section>
 		<section><title>Modelling Excel Semantics</title>
 			<p>Strings are ignored. Booleans are ignored!!! (damn Oo.o!  I was almost misled here - nevermind). Actually here's the info on Bools: 
 				if you have formula: "=TRUE+1", it evaluates to 2. 
 				So also, when you use TRUE like this: "=SUM(1,TRUE)", you see the result is: 2. 
 				So TRUE means 1 when doing numeric calculations, right? 
 				Wrong!
 				Because when you use TRUE in referenced cells with arithmetic functions, it evaluates to blank - meaning it is not evaluated - as if it was string or a blank cell. 
 				eg. "=SUM(1,A1)" when A1 is TRUE evaluates to 1.
 				So you have to do this kind of check for every possible data type as a function argument for any function before you understand the behaviour of the function. The operands can be entered in excel as comma separated or as a region specified like: A2:D4. Regions are treated as a single token by the parser hence we have AreaEval which stores the ValueEval at each cell in a region in a 1D array. So in our function if the operand is of type AreaEval we need to get the array of ValueEvals in the region of the AreaEval and iterate over each of them as if each of them were individual operands to the AVERAGE function. 
 				</p>
 			<p>Thus, since sometimes, Excel treats 
 						Booleans as the numbers 0 and 1 (for F and T respectively). 
 						Hence BoolEval and NumberEval both implement a common interface: 
 						NumericValueEval (since numbers and bools are also valid string 
 						values, they also implement StringValueEval interface which is 
 						also implemented by StringEval).</p>
 			<p>
 			The ValueEval inside an AreaEval can be one of: 
 				NumberEval, BoolEval, StringEval, ErrorEval, BlankEval. 
 			So you must handle each of these cases. 
 			Similarly, RefEvals have a property: innerValueEval that returns the ValueEval at the referenced cell. The ValueEval inside a RefEval can be one of: NumberEval, BoolEval, StringEval, ErrorEval, BlankEval. So you must handle each of these cases  - see how excel treats each one of them.
 			</p>				

 		</section>
 	</section>
 	<section><title>Testing Framework</title>
 	<fixme author="AD">TODO! FormulaEval comes with a testing framework, where you add 
 		formula's and their expected values to an Excel sheet, and the test code
 		automatically validates them. Since this is still in flux, the docs
 		will be put online once the system is stable </fixme>	
 	</section>
 	</body>
 </document>
--- a/src/documentation/content/xdocs/hssf/eval.xml
+++ b/src/documentation/content/xdocs/hssf/eval.xml
@@ -0,0 +1,151 @@
 <?xml version="1.0" encoding="UTF-8"?>
 <!-- Copyright (C) 2005 The Apache Software Foundation. All rights reserved. -->
 <!DOCTYPE document PUBLIC "-//APACHE//DTD Documentation V1.1//EN" "../dtd/document-v11.dtd">

 <document>
    <header>
        <title>Formula Evaluation</title>
        <authors>
 			<person email="amoweb@yahoo.com" name="Amol Deshmukh" id="AD"/>
        </authors>
    </header>
    <body>
 		<section><title>Introduction</title>
 			<p>The POI formula evaluation code enables you to calculate the result of 
 				formulas in Excels sheets read-in, or created in POI. This document explains
 				how to use the API to evaluate your formulas. 
 			</p>
 			<warning> This code currently lives in Bugzilla as 
 				<link href="http://issues.apache.org/bugzilla/show_bug.cgi?id=34828"> 
 				bug 34828 </link>. It is expected to land in POI CVS in the scratchpad 
 				area soon.
 			</warning>
 		</section>
 		<section><title>Status</title>
 			<p>	The code currently provides implementations for all the arithmatic operators.
 				It also provides implementations for about 30 built in 
 				functions in Excel. The framework however makes is easy to add 
 			 	implementation of new functions. See the <link href="eval-devguide.html"> Formula
 				evaluation development guide</link> for details. </p>
 			<p> Note that user-defined functions are not supported, and is not likely to done
 				any time soon... at least, not till there is a VB implementation in Java!
 			</p>
 		</section>
 		<section><title>User API How-TO</title>
 			<p>The following code demonstrates how to use the HSSFFormulaEvaluator 
 				in the context of other POI excel reading code.
 			</p>
 			<p>There are two ways in which you can use the HSSFFormulaEvalutator API.</p>
 			<section><title>Using HSSFFormulaEvaluator.<strong>evaluate</strong>(HSSFCell cell)</title>
 				<source>
 FileInputStream fis = new FileInputStream("c:/temp/test.xls");
 HSSFWorkbook wb = new HSSFWorkbook(fis);
 HSSFSheet sheet = wb.getSheetAt(0);
 HSSFFormulaEvaluator evaluator = new HSSFFormulaEvaluator(sheet, wb);

 // suppose your formula is in B3
 CellReference cellReference = new CellReference("B3"); 
 HSSFRow row = sheet.getRow(cellReference.getRow());
 HSSFCell cell = row.getCell(cellReference.getCol()); 
 String formulaString = c.getCellFormula();
 HSSFFormulaEvaluator.CellValue cellValue = 
        evaluator.evaluate(formulaString);

 switch (cellValue.getCellType()) {
 	case HSSFCell.CELL_TYPE_BOOLEAN:
    	System.out.println(cellValue.getBooleanCellValue());
    	break;
 	case HSSFCell.CELL_TYPE_NUMERIC:
    	System.out.println(cellValue.getNumberCellValue());
    	break;
 	case HSSFCell.CELL_TYPE_STRING:
    	System.out.println(cellValue.getStringCellValue());
    	break;
 	case HSSFCell.CELL_TYPE_BLANK:
    	break;
 	case HSSFCell.CELL_TYPE_ERROR:
    	break;

 	// CELL_TYPE_FORMULA will never happen
 	case HSSFCell.CELL_TYPE_FORMULA: 
    	break;
 }				
 				</source>
 				<p>Thus using the retrieved value (of type 
 					HSSFFormulaEvaluator.CellValue - a nested class) returned 
 					by HSSFFormulaEvaluator is similar to using a HSSFCell object 
 					containing the value of the formula evaluation. CellValue is 
 					a simple value object and does not maintain reference 
 					to the original cell.
 				</p>
 				
 			</section>
 			<section><title>Using HSSFFormulaEvaluator.<strong>evaluateInCell</strong>(HSSFCell cell)
 				</title>
 				<source>
 FileInputStream fis = new FileInputStream("c:/temp/test.xls");
 HSSFWorkbook wb = new HSSFWorkbook(fis);
 HSSFSheet sheet = wb.getSheetAt(0);
 HSSFFormulaEvaluator evaluator = new HSSFFormulaEvaluator(sheet, wb);

 // suppose your formula is in B3
 CellReference cellReference = new CellReference("B3"); 
 HSSFRow row = sheet.getRow(cellReference.getRow());
 HSSFCell cell = row.getCell(cellReference.getCol()); 
 String formulaString = c.getCellFormula();

 if (cell!=null) {
 	switch (<strong>evaluator.evaluateInCell</strong>(cell).getCellType()) {
 		case HSSFCell.CELL_TYPE_BOOLEAN:
 		    System.out.println(cell.getBooleanCellValue());
 		    break;
 		case HSSFCell.CELL_TYPE_NUMERIC:
 		    System.out.println(cell.getNumberCellValue());
 		    break;
 		case HSSFCell.CELL_TYPE_STRING:
 		    System.out.println(cell.getStringCellValue());
 		    break;
 		case HSSFCell.CELL_TYPE_BLANK:
 		    break;
 		case HSSFCell.CELL_TYPE_ERROR:
 		    System.out.println(cell.getErrorCellValue());
 		    break;
 		
 		// CELL_TYPE_FORMULA will never occur
 		case HSSFCell.CELL_TYPE_FORMULA: 
 		    break;
 	}
 }
 					</source>

 				</section>
 		</section>
 		
 		<section><title></title>
 			
 		</section>
 		
 		<section><title>Performance Notes</title>
 			<ul>
 				<li>Generally you should have to create only one HSSFFormulaEvaluator 
 					instance per sheet, but there really is no overhead in creating 
 					multiple HSSFFormulaEvaluators per sheet other than that of the 
 					HSSFFormulaEvaluator object creation. 
 				</li>
 				<li>Also note that HSSFFormulaEvaluator maintains a reference to 
 					the sheet and workbook, so ensure that the evaluator instance 
 					is available for garbage collection when you are done with it 
 					(in other words don't maintain long lived reference to 
 					HSSFFormulaEvaluator if you don't really need to - unless 
 					all references to the sheet and workbook are removed, these 
 					don't get garbage collected and continue to occupy potentially 
 					large amounts of memory). 
 				</li>	
 				<li>CellValue instances however do not maintain reference to the 
 					HSSFCell or the sheet or workbook, so these can be long-lived 
 					objects without any adverse effect on performance.
 				</li>
 			</ul>
 		</section>
 	</body>
 </document>