--- /dev/null
+<?xml version="1.0" encoding="UTF-8"?>
+<!DOCTYPE document PUBLIC "-//APACHE//DTD Documentation V1.0//EN" "dtd/document-v10.dtd">
+
+<document>
+ <header>
+ <title>POI 2.0 Vision Document</title>
+ <authors>
+ <person name="Andrew C. Oliver" email="acoliver2@users.sourceforge.net"/>
+ <person name="Marcus W. Johnson" email="mjohnson@apache.org"/>
+ <person name="Glen Stampoultzis" email="gstamp@iprimus.com.au"/>
+ <person name="Nicola Ken Barozzi" email="barozzi@nicolaken.com"/>
+ </authors>
+ </header>
+
+ <body>
+
+ <s1 title="Preface">
+ <p>
+ This is the POI 2.0 cycle vision document. Although the vision
+ has not changed and this document is certainly not out of date,
+ the structure of the project has changed a bit. We're not going
+ to change the vision document to reflect this (however proper
+ that may be) because it would only involve deletion. There is no
+ purpose in providing less information, provided we give
+ clarification.
+ </p>
+ <p>
+ This document was created before the POI components for
+ <link href="http://xml.apache.org/cocoon">Apache Cocoon</link>
+ were accepted into the Cocoon project itself. It was also
+ written before POI was accepted into Jakarta. So while the
+ vision hasn't changed, some of the components are now part of
+ other projects. We'll still be working on them on roughly the
+ same timeline (minus the overhead of coordination with other
+ groups), but they are no longer technically part of the POI
+ project itself.
+ </p>
+ </s1>
+
+ <s1 title="1. Introduction">
+ <s2 title="1.1 Purpose of this document">
+ <p>
+ The purpose of this document is to
+ collect, analyze and define high-level requirements, user needs,
+ and features of the second release of the POI project software.
+ The POI project currently consists of the following components:
+ the HSSF Serializer, the HSSF library and the POIFS library.
+ </p>
+ <ul>
+ <li>
+ The HSSF Serializer is a set of Java classes whose main
+ class supports the Serializer interface from the Cocoon
+ 2 project and outputs the serialized data in a format
+ compatible with the spreadsheet program Microsoft Excel
+ 97.
+ </li>
+ <li>
+ The HSSF library is a set of classes for reading and
+ writing the Microsoft Excel 97 file format using pure Java.
+ </li>
+ <li>
+ The POIFS library is a set of classes for reading and
+ writing Microsoft's OLE 2 Compound Document format using
+ pure Java.
+ </li>
+ </ul>
+ <p>By the completion of this release cycle the POI project will also
+ include the HSSF Generator and the HDF library.
+ </p>
+ <ul>
+ <li>The HSSF Generator will be responsible for using HSSF to read
+ in the XLS (Excel 97) file format and create SAX events. The HSSF
+ Generator will support the applicable interfaces specified by the
+ Apache Cocoon 2 project.
+ </li>
+ <li>The HDF library will provide a set of high-level interfaces
+ for reading and writing the Microsoft Word 97 file format using
+ pure Java.</li>
+ </ul>
+
+ </s2>
+
+
+ <s2 title="1.2 Project Overview">
+ <p>
+ The first release of the POI project
+ was an astounding success. This release seeks to build on that
+ success by:
+ </p>
+ <ul>
+ <li>
+ Refactoring POIFS into input and
+ output classes and adding an event-driven API for reading.
+ </li>
+ <li>
+ Refactoring HSSF for greater
+ performance and adding an event-driven API for reading.
+ </li>
+ <li>
+ Extending HSSF by adding the ability to read and write formulas.
+ </li>
+ <li>
+ Extending HSSF by adding the ability to read and write
+ user-defined styles.
+ </li>
+ <li>
+ Creating a Cocoon 2 Generator for HSSF using the same tags
+ as the HSSF Serializer.
+ </li>
+ <li>
+ Creating a new library (HDF) for reading and writing
+ the Microsoft Word DOC format.
+ </li>
+ <li>
+ Refactoring the HSSFSerializer into a separate extensible
+ POIFSSerializer and HSSFSerializer.
+ </li>
+ <li>
+ Providing the ability to create Excel charts (write only).
+ </li>
+ </ul>
+ </s2>
+ </s1>
+ <s1 title="2. User Description">
+ <s2 title="2.1 User/Market Demographics">
+ <p>
+ There are a number of enthusiastic
+ users of XML, UNIX and Java technology. Furthermore, the Microsoft
+ solution for outputting Office Document formats often involves
+ actually manipulating the software as an OLE Server. This method
+ provides extremely low performance, extremely high overhead and is
+ only capable of handling one document at a time.
+ </p>
+ <ol>
+ <li>
+ Our intended audience for the HSSF
+ Serializer portion of this project is developers writing reports or
+ data extracts in XML format.
+ </li>
+ <li>
+ Our intended audience for the HSSF
+ library portion of this project is ourselves as we are developing
+ the HSSF Serializer and anyone who needs to read and write Excel
+ spreadsheets in a non-XML Java environment, or who has specific
+ needs not addressed by the Serializer.
+ </li>
+ <li>
+ Our intended audience for the
+ POIFS library is ourselves as we are developing the HSSF and HDF
+ libraries and anyone wishing to provide other libraries for
+ reading/writing other file formats utilizing the OLE 2 Compound
+ Document Format in Java.
+ </li>
+ <li>
+ Our intended audience for the HSSF
+ Generator is developers who need to export Excel spreadsheets to
+ XML in a non-proprietary environment.
+ </li>
+ <li>
+ Our intended audience for the HDF
+ library is ourselves, as we will be developing an HDF Serializer in a
+ later release, and anyone wishing to add .DOC file processing and
+ creation to their projects.
+ </li>
+ </ol>
+ </s2>
+ <s2 title="2.2. User environment">
+ <p>
+ The users of this software shall be
+ developers in a Java environment on any operating system, or power
+ users who are capable of XML document generation/deployment.
+ </p>
+ </s2>
+ <s2 title="2.3. Key User Needs">
+ <p>
+ The HSSF library currently requires a
+ full object representation to be created before reading values. This
+ results in very high memory utilization. We need to reduce this
+ substantially for reading. It would be preferable to do this for
+ writing, but it may not be possible due to the constraints imposed by
+ the file format itself. Memory utilization during read is our top
+ user complaint.
+ </p>
+ <p>
+ The POIFS library currently requires a
+ full object representation to be created before reading values. This
+ results in very high memory utilization. We need to reduce this
+ substantially for reading.
+ </p>
+ <p>
+ The HSSF library currently ignores
+ formula cells and identifies them as "UnknownRecord" at the
+ lower level of the API. We must provide a way to read and write
+ formulas. This is now the top requested feature.
+ </p>
+ <p>
+ The HSSF library currently does not support
+ charts. This is a key requirement of some users who wish to use HSSF
+ in a reporting engine.
+ </p>
+ <p>
+ The HSSF Serializer currently does not
+ provide serialization for cell styling. Users will want stylish
+ spreadsheets to result from their XML.
+ </p>
+ <p>
+ There is currently no way to generate
+ the XML from an XLS that is consistent with the format used by the
+ HSSF Serializer.
+ </p>
+ <p>
+ There should be a way to read and write
+ the DOC file format using pure Java.
+ </p>
+
+ </s2>
+ <s2 title="2.4. Alternatives and Competition">
+ <p>
+ Alternatives to using HSSF to manipulate Excel files include:
+ </p>
+ <ol>
+ <li>Buy the $10,000 Formula 1 library
+ (<link href="http://www.f1j.com/">www.tidestone.com</link>)
+ now owned by Actuate and accept its crude API and limitations.
+ </li>
+ <li>Give up XML, Java, and operating system independence, and
+ write Visual Basic code in a Microsoft Windows based environment.
+ </li>
+ <li>Try writing output in Microsoft's poorly documented XHTML
+ for Office format.
+ </li>
+ </ol>
+ <p>
+ There is also a decent library for
+ reading Excel documents, called xlReader and written by Andy Khan
+ (<link href="http://www.sourceforge.net/projects/xlrd">http://www.sourceforge.net/projects/xlrd</link>).
+ It does not provide the ability to write.
+ </p>
+ <p>
+ There are a number of Perl and C alternatives.
+ None are consistent.
+ </p>
+ </s2>
+ </s1>
+ <s1 title="3. Project Overview">
+ <s2 title="3.1. Project Perspective">
+ <p>
+ The produced code shall be licensed under
+ the Apache License as used by the Cocoon 2 project (APL 1.1) and
+ maintained at <link href="http://poi.sourceforge.net/">http://poi.sourceforge.net</link>
+ and <link href="http://sourceforge.net/projects/poi">http://sourceforge.net/projects/poi</link>.
+ It is our hope to at some point integrate with the various Apache
+ projects (xml.apache.org and jakarta.apache.org), at which point we'd
+ turn the copyright over to them.
+ </p>
+ </s2>
+ <s2 title="3.2. Project Position Statement">
+ <p>
+ For developers in a Java and/or XML
+ environment this project will provide all the tools necessary for
+ outputting XML data in the Microsoft Excel format. This project seeks
+ to make the use of Microsoft Windows based servers unnecessary for
+ file format considerations and to fully document the OLE 2 Compound
+ Document format. The project aims not only to provide the tools for
+ serializing XML to Excel and Word file formats and the tools for
+ writing to those file formats from Java, but also to provide the
+ tools for later projects to convert other OLE 2 Compound Document
+ formats to pure Java APIs.
+ </p>
+ </s2>
+ <s2 title="3.3. Summary of Capabilities">
+ <p>
+ HSSF Serializer for Apache Cocoon 2
+ </p>
+ <table>
+ <tr>
+ <td>
+ Benefit
+ </td>
+ <td>
+ Supporting Features
+ </td>
+ </tr>
+ <tr>
+ <td>
+ Ability to serialize styles from XML spreadsheets.
+ </td>
+ <td>
+ HSSFSerializer will support styles.
+ </td>
+ </tr>
+ <tr>
+ <td>
+ Ability to read and write formulas in XLS files.
+ </td>
+ <td>
+ HSSF will support reading/writing formulas.
+ </td>
+ </tr>
+ <tr>
+ <td>
+ Ability to output in MS Word on any platform using Java.
+ </td>
+ <td>
+ The project will develop an API that outputs in Word format
+ using pure Java.
+ </td>
+ </tr>
+ <tr>
+ <td>
+ Enhance performance for reading and writing XLS files.
+ </td>
+ <td>
+ HSSF will undergo a number of performance enhancements. HSSF
+ will include a new event-based API for reading XLS files. POIFS
+ will support a new event-based API for reading OLE2 CDF files.
+ </td>
+ </tr>
+ <tr>
+ <td>
+ Ability to generate XML from XLS files
+ </td>
+ <td>
+ The project will develop an HSSF Generator.
+ </td>
+ </tr>
+ <tr>
+ <td>
+ The ability to generate charts
+ </td>
+ <td>
+ HSSF will provide low level support for chart records as well
+ as high level API support for generating charts. The ability
+ to read chart information will not initially be provided.
+ </td>
+ </tr>
+
+ </table>
+ </s2>
+ <s2 title="3.4. Assumptions and Dependencies">
+ <ul>
+ <li>
+ The HSSF Serializer and Generator
+ will support the Gnumeric 1.0 XML tag language.
+ </li>
+ <li>
+ The HSSF Generator and HSSF
+ Serializer will be mutually validating. It should be possible to
+ have an XLS file created by the Serializer run through the Generator
+ and the output back through the Serializer (via the Cocoon pipeline)
+ and get the same file or a reasonable facsimile (no one cares if it
+ differs by the order of the binary records in some minor but
+ non-visually recognizable manner).
+ </li>
+ <li>
+ The HSSF Generator will run on any
+ Java 2 supporting platform with Apache Cocoon 2 installed along with
+ the HSSF and POIFS APIs.
+ </li>
+ <li>
+ The HSSF Serializer will run on
+ any Java 2 supporting platform with Apache Cocoon 2 installed along
+ with the HSSF and POIFS APIs.
+ </li>
+ <li>
+ The HDF API requires a Java 2
+ implementation and the POIFS API.
+ </li>
+ <li>
+ The HSSF API requires a Java 2
+ implementation and the POIFS API.
+ </li>
+ <li>
+ The POIFS API requires a Java 2
+ implementation.
+ </li>
+
+ </ul>
+ </s2>
+ </s1>
+ <s1 title="4. Project Features">
+ <p>
+ Enhancements to the POIFS API will
+ include:
+ </p>
+ <ul>
+ <li>
+ An event-driven API for reading
+ POIFS Filesystems.
+ </li>
+ <li>
+ A low-level API for
+ creating/manipulating POI filesystems.
+ </li>
+ <li>
+ Code improvements supporting
+ greater separation between read and write structures.
+ </li>
+ </ul>
+ <p>
+ Enhancements to the HSSF API will
+ include:
+ </p>
+ <ul>
+ <li>
+ An event-driven API for reading
+ XLS files.
+ </li>
+ <li>
+ Performance improvements.
+ </li>
+ <li>
+ Formula support (read/write).
+ </li>
+ <li>
+ Support for user-defined data
+ formats.
+ </li>
+ <li>
+ Better documentation of the file
+ format and structure.
+ </li>
+ <li>
+ An API for creation of charts.
+ </li>
+ </ul>
+ <p>
+ The HSSF Generator will include:
+ </p>
+ <ul>
+ <li>
+ A set of classes supporting the
+ Cocoon 2 Generator interfaces providing a method for reading XLS
+ files and outputting SAX events.
+ </li>
+ <li>
+ The same tag format used by the
+ HSSFSerializer in any given release.
+ </li>
+ </ul>
+ <p>
+ The HDF API will include:
+ </p>
+ <ul>
+ <li>
+ An event-driven API for reading
+ DOC files.
+ </li>
+ <li>
+ A set of high- and low-level APIs
+ for reading and writing DOC files.
+ </li>
+ <li>
+ Documentation of the DOC file
+ format or enhancements to existing documentation.
+ </li>
+ </ul>
+ </s1>
+ <s1 title="5. Other Product Requirements">
+ <s2 title="5.1. Applicable Standards">
+ <p>
+ All Java code will be 100% pure Java.
+ </p>
+ </s2>
+ <s2 title="5.2. System Requirements">
+ <p>
+ The minimum system requirements for the POIFS API are:
+ </p>
+ <ul>
+ <li>64 Mbytes memory</li>
+ <li>Java 2 environment</li>
+ <li>Pentium or better processor (or equivalent on other platforms)</li>
+ </ul>
+ <p>
+ The minimum system requirements for the HSSF API are:
+ </p>
+ <ul>
+ <li>64 Mbytes memory</li>
+ <li>Java 2 environment</li>
+ <li>Pentium or better processor (or equivalent on other platforms)</li>
+ <li>POIFS API</li>
+ </ul>
+ <p>
+ The minimum system requirements for the HDF API are:
+ </p>
+ <ul>
+ <li>64 Mbytes memory</li>
+ <li>Java 2 environment</li>
+ <li>Pentium or better processor (or equivalent on other platforms)</li>
+ <li>POIFS API</li>
+ </ul>
+
+ <p>
+ The minimum system requirements for the HSSF Serializer are:
+ </p>
+ <ul>
+ <li>64 Mbytes memory</li>
+ <li>Java 2 environment</li>
+ <li>Pentium or better processor (or equivalent on other platforms)</li>
+ <li>Cocoon 2</li>
+ <li>HSSF API</li>
+ <li>POI API</li>
+ </ul>
+ </s2>
+ <s2 title="5.3. Performance Requirements">
+ <p>
+ All components must perform well enough
+ to be practical for use in a web server environment (especially
+ the "killer trio": Cocoon2/Tomcat/Apache combo).
+ </p>
+ </s2>
+ <s2 title="5.4. Environmental Requirements">
+ <p>
+ The software will run primarily in
+ developer environments. We should make some allowances for
+ not-highly-technical users to write XML documents for the HSSF
+ Serializer. All other components will assume intermediate Java 2
+ knowledge. No XML knowledge will be required except for using the
+ HSSF Serializer. As much documentation as is practical shall be
+ required for all components as XML is relatively new, and the
+ concepts introduced for writing spreadsheets and to POI filesystems
+ will be brand new to Java and many Java developers.
+ </p>
+ </s2>
+ </s1>
+ <s1 title="6. Documentation Requirements">
+ <s2 title="6.1 POI Filesystem">
+ <p>
+ The filesystem as read and written by
+ POI shall be fully documented and explained so that the average Java
+ developer can understand it.
+ </p>
+ </s2>
+ <s2 title="6.2. POI API">
+ <p>
+ The POI API will be fully documented
+ through Javadoc. A walkthrough of using the high-level POI API shall
+ be provided. No documentation outside of the Javadoc shall be
+ provided for the low-level POI APIs.
+ </p>
+ </s2>
+ <s2 title="6.3. HSSF File Format">
+ <p>
+ The HSSF File Format as implemented by
+ the HSSF API will be fully documented. No documentation will be
+ provided for features that are supported by the Excel 97 File
+ Format but not by the HSSF API. Care will be taken not to
+ infringe on any "legal stuff". Additionally, we are
+ collaborating with the fine folks at OpenOffice.org on
+ *free* documentation of the format.
+ </p>
+ </s2>
+ <s2 title="6.4. HSSF API">
+ <p>
+ The HSSF API will be documented by
+ Javadoc. A walkthrough of using the high-level HSSF API shall be
+ provided. No documentation outside of the Javadoc shall be provided
+ for the low-level HSSF APIs.
+ </p>
+ </s2>
+ <s2 title="6.5 HDF API">
+ <p>
+ The HDF API will be documented by
+ Javadoc. A walkthrough of using the high-level HDF API shall be
+ provided. No documentation outside of the Javadoc shall be provided
+ for the low-level HDF APIs.
+ </p>
+ </s2>
+ <s2 title="6.6 HSSF Serializer">
+ <p>
+ The HSSF Serializer will be documented
+ by Javadoc.
+ </p>
+ </s2>
+ <s2 title="6.7 HSSF Generator">
+ <p>
+ The HSSF Generator will be documented
+ by Javadoc.
+ </p>
+ </s2>
+ <s2 title="6.8 HSSF Serializer Tag language">
+ <p>
+ The XML tag language along with
+ function and usage shall be fully documented. Examples will be
+ provided as well.
+ </p>
+ </s2>
+ </s1>
+ <s1 title="7. Terminology">
+ <s2 title="7.1 Filesystem">
+ <p>
+ The term "filesystem" shall refer only to the POI formatted archive.
+ </p>
+ </s2>
+ <s2 title="7.2 File">
+ <p>
+ The term "file" shall refer to the embedded data stream within a
+ POI filesystem. This will be the actual embedded document.
+ </p>
+ </s2>
+ </s1>
+</body>
+</document>
+