Diffstat (limited to 'src/documentation/content/xdocs/components')
-rw-r--r-- src/documentation/content/xdocs/components/configuration.xml | 232
-rw-r--r-- src/documentation/content/xdocs/components/diagram/index.xml | 107
-rw-r--r-- src/documentation/content/xdocs/components/document/docoverview.xml | 113
-rw-r--r-- src/documentation/content/xdocs/components/document/index.xml | 235
-rw-r--r-- src/documentation/content/xdocs/components/document/projectplan.xml | 392
-rw-r--r-- src/documentation/content/xdocs/components/document/quick-guide-xwpf.xml | 89
-rw-r--r-- src/documentation/content/xdocs/components/document/quick-guide.xml | 88
-rw-r--r-- src/documentation/content/xdocs/components/hmef/index.xml | 216
-rw-r--r-- src/documentation/content/xdocs/components/hpbf/file-format.xml | 197
-rw-r--r-- src/documentation/content/xdocs/components/hpbf/index.xml | 77
-rw-r--r-- src/documentation/content/xdocs/components/hpsf/how-to.xml | 1477
-rw-r--r-- src/documentation/content/xdocs/components/hpsf/index.xml | 73
-rw-r--r-- src/documentation/content/xdocs/components/hpsf/internals.xml | 1079
-rw-r--r-- src/documentation/content/xdocs/components/hpsf/thumbnails.xml | 198
-rw-r--r-- src/documentation/content/xdocs/components/hpsf/todo.xml | 77
-rw-r--r-- src/documentation/content/xdocs/components/hsmf/index.xml | 65
-rw-r--r-- src/documentation/content/xdocs/components/index.xml | 423
-rw-r--r-- src/documentation/content/xdocs/components/logging.xml | 290
-rw-r--r-- src/documentation/content/xdocs/components/oxml4j/index.xml | 45
-rw-r--r-- src/documentation/content/xdocs/components/poi-jvm-languages.xml | 351
-rw-r--r-- src/documentation/content/xdocs/components/poi-ruby.xml | 151
-rw-r--r-- src/documentation/content/xdocs/components/poifs/design.xml | 1099
-rw-r--r-- src/documentation/content/xdocs/components/poifs/embeded.xml | 95
-rw-r--r-- src/documentation/content/xdocs/components/poifs/fileformat.xml | 703
-rw-r--r-- src/documentation/content/xdocs/components/poifs/how-to.xml | 649
-rw-r--r-- src/documentation/content/xdocs/components/poifs/index.xml | 58
-rw-r--r-- src/documentation/content/xdocs/components/poifs/usecases.xml | 653
-rw-r--r-- src/documentation/content/xdocs/components/slideshow/how-to-shapes.xml | 642
-rw-r--r-- src/documentation/content/xdocs/components/slideshow/index.xml | 72
-rw-r--r-- src/documentation/content/xdocs/components/slideshow/ppt-file-format.xml | 367
-rw-r--r-- src/documentation/content/xdocs/components/slideshow/ppt-wmf-emf-renderer.xml | 209
-rw-r--r-- src/documentation/content/xdocs/components/slideshow/quick-guide.xml | 133
-rw-r--r-- src/documentation/content/xdocs/components/slideshow/xslf-cookbook.xml | 304
-rw-r--r-- src/documentation/content/xdocs/components/spreadsheet/chart.xml | 1532
-rw-r--r-- src/documentation/content/xdocs/components/spreadsheet/converting.xml | 232
-rw-r--r-- src/documentation/content/xdocs/components/spreadsheet/diagram1.xml | 40
-rw-r--r-- src/documentation/content/xdocs/components/spreadsheet/diagrams.xml | 56
-rw-r--r-- src/documentation/content/xdocs/components/spreadsheet/eval-devguide.xml | 591
-rw-r--r-- src/documentation/content/xdocs/components/spreadsheet/eval.xml | 410
-rw-r--r-- src/documentation/content/xdocs/components/spreadsheet/examples.xml | 274
-rw-r--r-- src/documentation/content/xdocs/components/spreadsheet/excelant.xml | 317
-rw-r--r-- src/documentation/content/xdocs/components/spreadsheet/formula.xml | 120
-rw-r--r-- src/documentation/content/xdocs/components/spreadsheet/hacking-hssf.xml | 89
-rw-r--r-- src/documentation/content/xdocs/components/spreadsheet/how-to.xml | 884
-rw-r--r-- src/documentation/content/xdocs/components/spreadsheet/index.xml | 119
-rw-r--r-- src/documentation/content/xdocs/components/spreadsheet/limitations.xml | 99
-rw-r--r-- src/documentation/content/xdocs/components/spreadsheet/quick-guide.xml | 2455
-rw-r--r-- src/documentation/content/xdocs/components/spreadsheet/record-generator.xml | 212
-rw-r--r-- src/documentation/content/xdocs/components/spreadsheet/use-case.xml | 200
-rw-r--r-- src/documentation/content/xdocs/components/spreadsheet/user-defined-functions.xml | 414
50 files changed, 19003 insertions, 0 deletions
diff --git a/src/documentation/content/xdocs/components/configuration.xml b/src/documentation/content/xdocs/components/configuration.xml
new file mode 100644
index 0000000000..71a1557164
--- /dev/null
+++ b/src/documentation/content/xdocs/components/configuration.xml
@@ -0,0 +1,232 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<!--
+ ====================================================================
+ Licensed to the Apache Software Foundation (ASF) under one or more
+ contributor license agreements. See the NOTICE file distributed with
+ this work for additional information regarding copyright ownership.
+ The ASF licenses this file to You under the Apache License, Version 2.0
+ (the "License"); you may not use this file except in compliance with
+ the License. You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+ Unless required by applicable law or agreed to in writing, software
+ distributed under the License is distributed on an "AS IS" BASIS,
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ See the License for the specific language governing permissions and
+ limitations under the License.
+ ====================================================================
+-->
+<!DOCTYPE document PUBLIC "-//APACHE//DTD Documentation V2.0//EN" "document-v20.dtd">
+
+<document>
+ <header>
+ <title>Apache POI™ - Configuration</title>
+ <authors>
+ <person id="POI" name="POI Developers" email="dev@poi.apache.org"/>
+ </authors>
+ </header>
+
+ <body>
+ <section><title>Overview</title>
+ <p>The best way to learn about using Apache POI is to read through the <a href="index.html">feature documentation</a>
+ and other examples available online.
+ </p>
+ <p>To keep the feature documentation focused on the APIs, it says little about some configuration
+ settings that may prove useful to users who have to handle very large documents or very
+ high throughput.
+ </p>
+ </section>
+ <section><title>Configuration via Java-code when calling Apache POI</title>
+ <p>These API methods allow you to configure the behavior of Apache POI for special needs, e.g. when processing
+ excessively large files.
+ </p>
+ <table>
+ <tr>
+ <th>Configuration Setting</th>
+ <th>Description</th>
+ </tr>
+
+ <tr>
+ <td>org.apache.poi.ooxml.POIXMLTypeLoader.DEFAULT_XML_OPTIONS</td>
+ <td>POI's OOXML APIs (XSSF, XWPF, XSLF, ...) rely heavily on <a href="https://xmlbeans.apache.org">XMLBeans</a>.
+ This instance can be <a href="https://xmlbeans.apache.org/docs/5.0.0/org/apache/xmlbeans/XmlOptions.html">configured</a>.
+ Take care if you change any of its settings.
+ Since POI 5.1.0, Doc Type parsing in the XML files embedded in xlsx/docx/pptx/etc. files is disallowed by default.
+ Calling DEFAULT_XML_OPTIONS.setDisallowDocTypeDeclaration(false) will undo this change.
+ </td>
+ </tr>
+
+ <tr>
+ <td><a href="https://poi.apache.org/apidocs/5.0/org/apache/poi/util/IOUtils.html#setByteArrayMaxOverride-int-">
+ org.apache.poi.util.IOUtils.setByteArrayMaxOverride(int maxOverride)</a>
+ </td>
+ <td>If this value is set to a value greater than 0, IOUtils.safelyAllocate(long, int) will ignore the maximum record length parameter.
+ This is designed to allow users to bypass the hard-coded maximum record lengths if they are willing to accept the risk of allocating memory up to the size specified.
+ It can also be used to impose a lower limit than the built-in one on very memory-constrained systems.
+ <p>
+ <strong>Note</strong>: This is a per-allocation limit and does not allow you to limit the overall sum of allocations! Use -1 to use the limits specified per record type.
+ </p>
+ </td>
+ </tr>
+
+ <tr>
+ <td><a href="https://poi.apache.org/apidocs/5.0/org/apache/poi/openxml4j/util/ZipSecureFile.html#setMinInflateRatio-double-">
+ org.apache.poi.openxml4j.util.ZipSecureFile.setMinInflateRatio(double ratio)</a>
+ </td>
+ <td>Sets the minimum ratio between compressed and uncompressed bytes that is used to detect zip bombs.
+ It defaults to 1% (= 0.01d), i.e. when the compressed size of any package part being read is less than 1% of its uncompressed size, parsing fails, indicating a zip bomb.
+ </td>
+ </tr>
+
+ <tr>
+ <td><a href="https://poi.apache.org/apidocs/5.0/org/apache/poi/openxml4j/util/ZipSecureFile.html#setMaxEntrySize-long-">
+ org.apache.poi.openxml4j.util.ZipSecureFile.setMaxEntrySize(long maxEntrySize)</a>
+ </td>
+ <td>Sets the maximum file size of a single zip entry. It defaults to 4GB, i.e. the 32-bit zip format maximum.
+ This can be used to limit memory consumption and protect against security vulnerabilities when documents are provided by users.
+ POI 5.1.0 removes the previous limit of 4GB on this setting.
+ </td>
+ </tr>
+
+ <tr>
+ <td><a href="https://poi.apache.org/apidocs/5.0/org/apache/poi/openxml4j/util/ZipSecureFile.html#setMaxTextSize-long-">
+ org.apache.poi.openxml4j.util.ZipSecureFile.setMaxTextSize(long maxTextSize)</a>
+ </td>
+ <td>Sets the maximum number of characters of text that are extracted before an exception is thrown when extracting text from documents.
+ This can be used to limit memory consumption and protect against security vulnerabilities when documents are provided by users.
+ The default is approximately 10 million characters. Prior to POI 5.1.0, the maximum allowed was approximately 4 billion characters.
+ </td>
+ </tr>
+
+ <tr>
+ <td>org.apache.poi.openxml4j.util.ZipInputStreamZipEntrySource.setThresholdBytesForTempFiles(int thresholdBytes)
+ </td>
+ <td><strong>Added in POI 5.1.0.</strong>
+ Number of bytes at which a zip entry is regarded as too large to hold in memory,
+ so that its data is put in a temp file instead. The default is -1, meaning that temp files are not used
+ and that zip entries with more than 2GB of data after decompression will fail; 0 means that all
+ zip entries are stored in temp files. A threshold such as 50000000 (approximately 50MB) is recommended.
+ </td>
+ </tr>
+
+ <tr>
+ <td>org.apache.poi.openxml4j.util.ZipInputStreamZipEntrySource.setEncryptTempFiles(boolean encrypt)
+ </td>
+ <td><strong>Added in POI 5.1.0.</strong>
+ Whether temp files should be encrypted (default false). Only affects temp files related to zip entries.
+ </td>
+ </tr>
+
+ <tr>
+ <td>org.apache.poi.openxml4j.opc.ZipPackage.setUseTempFilePackageParts(boolean tempFilePackageParts)
+ </td>
+ <td><strong>Added in POI 5.1.0.</strong>
+ Whether to save package part data in temp files to save memory (default=false).
+ </td>
+ </tr>
+
+ <tr>
+ <td>org.apache.poi.openxml4j.opc.ZipPackage.setEncryptTempFilePackageParts(boolean encryptTempFiles)
+ </td>
+ <td><strong>Added in POI 5.1.0.</strong>
+ Whether to encrypt package part temp files (default=false).
+ </td>
+ </tr>
+
+ <tr>
+ <td>org.apache.poi.extractor.ExtractorFactory.setThreadPrefersEventExtractors(boolean preferEventExtractors) and
+ org.apache.poi.extractor.ExtractorFactory.setAllThreadsPreferEventExtractors(Boolean preferEventExtractors)
+ </td>
+ <td>
+ When creating text extractors for documents, allows you to choose a different type of extractor, which parses documents
+ via an event-based parser.
+ </td>
+ </tr>
+
+ <tr>
+ <td>Various classes: setMaxRecordLength(int length)
+ </td>
+ <td>
+ Allows you to override the default maximum record length for various classes that
+ parse input data, e.g. XMLSlideShow, XSSFBParser, HSLFSlideShow, HWPFDocument,
+ HSSFWorkbook, EmbeddedExtractor, StringUtil, ...
+ <br/>
+ This may be useful if you try to process very large files that would otherwise trigger
+ the excessive-memory-allocation prevention in Apache POI.
+ </td>
+ </tr>
+
+ <tr>
+ <td>org.apache.poi.xslf.usermodel.XSLFPictureData.setMaxImageSize(int length)
+ </td>
+ <td>
+ Allows you to override the default maximum image size allowed for XSLF pictures.
+ </td>
+ </tr>
+
+ <tr>
+ <td>org.apache.poi.xssf.usermodel.XSSFPictureData.setMaxImageSize(int length)
+ </td>
+ <td>
+ Allows you to override the default maximum image size allowed for XSSF pictures.
+ </td>
+ </tr>
+
+ <tr>
+ <td>org.apache.poi.xwpf.usermodel.XWPFPictureData.setMaxImageSize(int length)
+ </td>
+ <td>
+ Allows you to override the default maximum image size allowed for XWPF pictures.
+ </td>
+ </tr>
+
+ </table>
+ </section>
+ <section><title>Observed Java System Properties</title>
+ <p>Apache POI observes the following Java system properties.
+ </p>
+ <table>
+ <tr>
+ <th>System property</th>
+ <th>Description</th>
+ </tr>
+
+ <tr>
+ <td>java.io.tmpdir</td>
+ <td>
+ Apache POI uses the default mechanism of the JDK for specifying the location of
+ temporary files.
+ </td>
+ </tr>
+
+ <tr>
+ <td>org.apache.poi.hwpf.preserveBinTables and org.apache.poi.hwpf.preserveTextTable</td>
+ <td>
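+ Adjust how HWPF handles its internal tables when parsing Word documents: when set to true, the original
+ bin tables and text (piece) table are preserved instead of being rebuilt.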
+ </td>
+ </tr>
+
+ <tr>
+ <td>org.apache.poi.ss.ignoreMissingFontSystem</td>
+ <td><strong>Added in POI 5.2.3.</strong>
+ Instructs Apache POI to ignore some errors caused by missing fonts, which allows
+ more functionality to work even when no fonts are installed.
+ <br/>
+ Note: Some functionality will still not be possible because it cannot fall back to default values, e.g. rendering
+ slides, drawing, ...
+ </td>
+ </tr>
+ </table>
+ </section>
+ </body>
+
+ <footer>
+ <legal>
+ Copyright (c) @year@ The Apache Software Foundation. All rights reserved.
+ <br />
+ Apache POI, POI, Apache, the Apache feather logo, and the Apache
+ POI project logo are trademarks of The Apache Software Foundation.
+ </legal>
+ </footer>
+</document>
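The zip-bomb protection configured through ZipSecureFile.setMinInflateRatio(double), documented above as defaulting to 1%, compares the number of compressed bytes with the number of uncompressed bytes. The following is a minimal, JDK-only sketch of that arithmetic, assuming a simplified check over a whole in-memory buffer. It is not POI's implementation, which inspects package parts while they are being read and may apply further conditions; the class InflateRatioCheck and its methods are hypothetical. In an application that uses POI, the threshold itself would be changed by calling ZipSecureFile.setMinInflateRatio(...) before any documents are opened.

```java
import java.util.Random;
import java.util.zip.Deflater;

/** Hypothetical illustration of a minimum-inflate-ratio check (not POI code). */
public class InflateRatioCheck {

    /** Same value as the documented ZipSecureFile default: 1% (0.01d). */
    static final double MIN_INFLATE_RATIO = 0.01d;

    /** Deflates the given bytes and returns the size of the compressed output. */
    static long compressedSize(byte[] data) {
        Deflater deflater = new Deflater();
        deflater.setInput(data);
        deflater.finish();
        byte[] buf = new byte[8192];
        long total = 0;
        while (!deflater.finished()) {
            total += deflater.deflate(buf);
        }
        deflater.end();
        return total;
    }

    /** True if compressed/uncompressed falls below the minimum ratio. */
    static boolean looksLikeZipBomb(long compressedBytes, long uncompressedBytes) {
        if (uncompressedBytes == 0) {
            return false; // nothing was inflated, so there is nothing to judge
        }
        double ratio = (double) compressedBytes / (double) uncompressedBytes;
        return ratio < MIN_INFLATE_RATIO;
    }

    /** Pseudo-random, effectively incompressible test data. */
    static byte[] randomBytes(int length, long seed) {
        byte[] data = new byte[length];
        new Random(seed).nextBytes(data);
        return data;
    }

    public static void main(String[] args) {
        byte[] zeros = new byte[1_000_000];           // extremely compressible
        byte[] noise = randomBytes(1_000_000, 42L);   // barely compressible
        System.out.println("zeros flagged: "
                + looksLikeZipBomb(compressedSize(zeros), zeros.length));
        System.out.println("noise flagged: "
                + looksLikeZipBomb(compressedSize(noise), noise.length));
    }
}
```

Running main should flag the zero-filled buffer, since deflate shrinks a megabyte of zeros to around a kilobyte (well under 1%), but not the random buffer, which deflate cannot shrink.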
diff --git a/src/documentation/content/xdocs/components/diagram/index.xml b/src/documentation/content/xdocs/components/diagram/index.xml
new file mode 100644
index 0000000000..5f060e8318
--- /dev/null
+++ b/src/documentation/content/xdocs/components/diagram/index.xml
@@ -0,0 +1,107 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<!--
+ ====================================================================
+ Licensed to the Apache Software Foundation (ASF) under one or more
+ contributor license agreements. See the NOTICE file distributed with
+ this work for additional information regarding copyright ownership.
+ The ASF licenses this file to You under the Apache License, Version 2.0
+ (the "License"); you may not use this file except in compliance with
+ the License. You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+ Unless required by applicable law or agreed to in writing, software
+ distributed under the License is distributed on an "AS IS" BASIS,
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ See the License for the specific language governing permissions and
+ limitations under the License.
+ ====================================================================
+-->
+<!DOCTYPE document PUBLIC "-//APACHE//DTD Documentation V2.0//EN" "document-v20.dtd">
+
+<document>
+ <header>
+ <title>Apache POI™ - HDGF and XDGF - Java API To Access Microsoft Visio Format Files</title>
+ <subtitle>Overview</subtitle>
+ <authors>
+ <person id="pd" name="POI Developers" email="dev@poi.apache.org"/>
+ </authors>
+ </header>
+
+ <body>
+ <section>
+ <title>Overview</title>
+
+ <p>HDGF is the POI Project's pure Java implementation of the
+ Visio binary (VSD) file format. XDGF is the POI Project's
+ pure Java implementation of the Visio XML (VSDX) file format.</p>
+ <!-- TODO More about XDGF here! -->
+ <p>Currently, HDGF provides a low-level, read-only API for
+ accessing Visio documents. It also provides a
+ <a href="https://github.com/apache/poi/tree/trunk/poi-scratchpad/src/main/java/org/apache/poi/hdgf/extractor/">way</a>
+ to extract the textual content from a file.
+ </p>
+ <p>At this time, there is no <em>usermodel</em> API or similar,
+ only low level access to the streams, chunks and chunk commands.
+ Users are advised to check the unit tests to see how everything
+ works. They are also well advised to read the documentation
+ supplied with
+ <a href="https://web.archive.org/web/20071212220759/https://www.gnome.ru/projects/vsdump_en.html">vsdump</a>
+ to get a feel for how Visio files are structured.</p>
+ <p>To get a feel for the contents of a file, and to track down
+ where data of interest is stored, HDGF comes with
+ <a href="https://github.com/apache/poi/tree/trunk/poi-scratchpad/src/main/java/org/apache/poi/hdgf/dev/">VSDDumper</a>
+ to print out the contents of the file. Users should also make
+ use of
+ <a href="https://web.archive.org/web/20071212220759/https://www.gnome.ru/projects/vsdump_en.html">vsdump</a>
+ to probe the structure of files.</p>
+
+ <note>
+ This code currently lives in the
+ <a href="https://github.com/apache/poi/tree/trunk/poi-scratchpad/">scratchpad area</a>
+ of the POI Git repository. To use this component, ensure
+ you have the Scratchpad Jar on your classpath, or a dependency
+ defined on the <em>poi-scratchpad</em> artifact - the main POI
+ jar is not enough! See the
+ <a href="site:components">POI Components Map</a>
+ for more details.
+ </note>
+
+ <section>
+ <title>Steps required for write support</title>
+ <p>Currently, HDGF is only able to read Visio files; it is
+ not able to write them back out again. We believe the
+ following are the steps that would need to be taken to
+ implement it.</p>
+ <ol>
+ <li>Re-write the decompression support in LZW4HDGF as
+ HDGFLZW, which will be much better documented, and also
+ under the ASL. <strong>Completed October 2007</strong></li>
+ <li>Add compression support to HDGFLZW.
+ <strong>In progress - works for small streams but encoding
+ goes wrong on larger ones</strong></li>
+ <li>Have HDGF just write back the raw bytes it read in, and
+ have a test to ensure the file is un-changed.</li>
+ <li>Have HDGF generate the bytes to write out from the
+ Stream stores, using the compressed data as appropriate,
+ without re-compressing. Plus test to ensure file is
+ un-changed.</li>
+ <li>Have HDGF generate the bytes to write out from the
+ Stream stores, re-compressing any streams that were
+ decompressed. Plus test to ensure file is un-changed.</li>
+ <li>Have HDGF re-generate the offsets in pointers for the
+ locations of the streams. Plus test to ensure file is
+ un-changed.</li>
+ <li>Have HDGF re-generate the bytes for all the chunks, from
+ the chunk commands. Tests to ensure the chunks are
+ serialized properly, and then that the file is un-changed.</li>
+ <li>Alter the data of one command, but keep it the same
+ length, and check that Visio can open the file when written
+ out.</li>
+ <li>Alter the data of one command, to a new length, and
+ check that Visio can open the file when written out.</li>
+ </ol>
+ </section>
+ </section>
+ </body>
+</document>
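Several of the write-support steps above end with a test that the written file is unchanged. When such a round-trip comparison fails, it helps to know where the bytes first diverge. The following is a minimal sketch of a hypothetical test helper, RoundTripCheck, which is not part of HDGF; it assumes you already have the original and re-written file contents as byte arrays, and it relies only on the JDK's Arrays.mismatch (Java 9 or later).

```java
import java.util.Arrays;

/** Hypothetical helper for round-trip ("file is un-changed") tests. */
public class RoundTripCheck {

    /**
     * Returns -1 if both arrays are identical, otherwise the index of the
     * first differing byte (the shorter length if one array is a prefix).
     */
    static int firstDifference(byte[] original, byte[] written) {
        return Arrays.mismatch(original, written);
    }

    /** Human-readable summary suitable for a test failure message. */
    static String describe(byte[] original, byte[] written) {
        int i = firstDifference(original, written);
        if (i < 0) {
            return "unchanged (" + original.length + " bytes)";
        }
        return String.format("first difference at offset %d (0x%X): original=%s, written=%s",
                i, i, byteAt(original, i), byteAt(written, i));
    }

    /** Formats one byte as hex, or marks that the array ended before index i. */
    private static String byteAt(byte[] data, int i) {
        return i < data.length ? String.format("0x%02X", data[i] & 0xFF) : "<end>";
    }

    public static void main(String[] args) {
        byte[] original = {0x01, 0x02, 0x03, 0x04};
        System.out.println(describe(original, original.clone()));
        System.out.println(describe(original, new byte[] {0x01, 0x02, 0x7F, 0x04}));
        System.out.println(describe(original, new byte[] {0x01, 0x02}));
    }
}
```

In a unit test, one would typically assert that describe(...) reports the file as unchanged, so that any failure message points straight at the offending offset.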
diff --git a/src/documentation/content/xdocs/components/document/docoverview.xml b/src/documentation/content/xdocs/components/document/docoverview.xml
new file mode 100644
index 0000000000..621e8e9309
--- /dev/null
+++ b/src/documentation/content/xdocs/components/document/docoverview.xml
@@ -0,0 +1,113 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<!--
+ ====================================================================
+ Licensed to the Apache Software Foundation (ASF) under one or more
+ contributor license agreements. See the NOTICE file distributed with
+ this work for additional information regarding copyright ownership.
+ The ASF licenses this file to You under the Apache License, Version 2.0
+ (the "License"); you may not use this file except in compliance with
+ the License. You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+ Unless required by applicable law or agreed to in writing, software
+ distributed under the License is distributed on an "AS IS" BASIS,
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ See the License for the specific language governing permissions and
+ limitations under the License.
+ ====================================================================
+-->
+<!DOCTYPE document PUBLIC "-//APACHE//DTD Documentation V2.0//EN" "document-v20.dtd">
+
+<document>
+ <header>
+ <title>Apache POI™ - HWPF - Java API to Handle Microsoft Word Files</title>
+ <subtitle>Word File Format</subtitle>
+ <authors>
+ <person name="S. Ryan Ackley" email="sackley@cfl.rr.com"/>
+ </authors>
+ </header>
+
+ <body>
+ <section><title>The Word 97 File Format in semi-plain English</title>
+
+ <p>The purpose of this document is to give a brief, high-level overview of the
+ Word document format as handled by HWPF. This document does not go into in-depth technical
+ detail and is only meant as a supplement to the Microsoft Word 97-2007
+ Binary File Format freely available from
+ <a href="https://msdn.microsoft.com/en-us/library/cc313153%28v=office.12%29.aspx">Microsoft</a>.</p>
+ <p>The OLE file format is not discussed in this document. It is assumed that
+ the reader has a working knowledge of the POIFS API. </p>
+
+ <section><title>Word file structure</title>
+ <p>A Word file is made up of the document text and data structures
+ containing formatting information about the text. Of course, this is a
+ very simplified illustration. There are fields and macros and other
+ things that have not been considered. At this stage, HWPF is mainly
+ concerned with formatted text.</p>
+ </section>
+ <section><title>Reading Word files</title>
+ <p>The entry point for HWPF's reading of a Word file is the File Information
+ Block (FIB). This structure records the locations and sizes
+ of a document's text and data structures. The FIB is located at the
+ beginning of the main stream.</p>
+ <section><title>Text</title>
+ <p>The document's text is also located in the main stream. Its starting
+ location is given by FIB.fcMin and its length in characters is given by
+ FIB.ccpText. These two values are not very useful for getting the text,
+ because Unicode text may be intermingled with 8-bit (ASCII)
+ text. That brings us to the piece table.</p>
+ <p>The piece table is used to divide the text into non-Unicode and Unicode
+ pieces. Its offset and size are given by FIB.fcClx and FIB.lcbClx
+ respectively. The piece table may contain Property Modifiers (prm).
+ These are for complex (fast-saved) files and are skipped. Each text piece
+ records the offset in the main stream of the text for that piece.
+ If the piece does not use Unicode (i.e. it is stored as compressed 8-bit text),
+ the file offset has bit 0x40000000 set. In that case you have to clear that bit
+ and divide by 2 to get the real file offset.</p>
+ </section>
+ <section><title>Text Formatting</title>
+ <section><title>Stylesheet</title>
+ <p>All text formatting is based on styles contained in the StyleSheet.
+ The StyleSheet is a data structure containing, among other things, style
+ descriptions. Each style description can contain a paragraph style and
+ a character style, or simply a character style. Each style description
+ is stored in compressed form in the file; basically, these are deltas
+ from another style.</p>
+ <p>Eventually, you have to chain back to the nil style, which is an
+ imaginary style with certain implied values.</p>
+ </section>
+ <section><title>Paragraph and Character styles</title>
+ <p>Paragraph and Character formatting properties for a document's text are
+ stored on file as deltas from some base style in the Stylesheet. The
+ deltas are used to create a complete uncompressed style in memory.</p>
+ <p>Uncompressed paragraph styles are represented by the Paragraph
+ Properties (PAP) data structure. Uncompressed character styles are
+ represented by the Character Properties (CHP) data structure. The styles
+ for the document text are stored in compressed format in the
+ corresponding Formatted Disk Pages (FKP). A compressed PAP is referred
+ to as a PAPX and a compressed CHP is a CHPX. The FKP locations are
+ stored in the bin table. There are separate bin tables for CHPXs and
+ PAPXs. The bin tables' locations and sizes are stored in the FIB.</p>
+ <p>An FKP is a 512-byte OLE page. It contains the offsets of the beginning
+ and end of each paragraph/character run in the main stream and the
+ compressed properties for that interval. The compressed PAPX is based on
+ its base style in the StyleSheet. The compressed CHPX is based on the
+ enclosing paragraph's base style in the Stylesheet.</p>
+ </section>
+ <section><title>Uncompressing styles and other data structures</title>
+ <p>All compressed properties (CHPX, PAPX, SEPX) contain a grpprl. A grpprl
+ is an array of sprms. A sprm defines a delta from some base property.
+ There is a table of possible sprms in the Word 97 spec. Each sprm is a
+ two-byte operation code followed by an operand. The operand size depends on
+ the sprm. Each sprm describes an operation that should be performed on
+ the base style. After every sprm in the grpprl has been applied to the base
+ style, you will have the style for the paragraph, character run,
+ section, etc.</p>
+ </section>
+ </section>
+ </section>
+ </section>
+ </body>
+</document>
+
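The piece-table and grpprl descriptions above involve some bit-level decoding. The following minimal sketch shows two of those steps in isolation, assuming the Word 97 layout described in Microsoft's binary format documentation: in a piece descriptor's fc value, bit 30 (0x40000000) marks compressed 8-bit text whose real offset is obtained by clearing that bit and dividing by 2; and in a sprm's two-byte operation code, the top three bits (the spra field) encode the operand size. The class Word97Decoding and its methods are hypothetical illustrations, not HWPF's API (HWPF's own handling lives in its model and sprm packages).

```java
/** Hypothetical sketch of two Word 97 decoding steps (not HWPF code). */
public class Word97Decoding {

    /** Bit 30 of a piece descriptor's fc: set for compressed 8-bit text. */
    static final int FC_COMPRESSED = 0x40000000;

    /** True if the piece is stored as Unicode (2 bytes per character). */
    static boolean isUnicode(int fc) {
        return (fc & FC_COMPRESSED) == 0;
    }

    /** Real offset of the piece's text in the main stream. */
    static int realOffset(int fc) {
        if (isUnicode(fc)) {
            return fc;                        // Unicode pieces: used as-is
        }
        return (fc & ~FC_COMPRESSED) / 2;     // clear the bit, then divide by 2
    }

    /**
     * Total size in bytes (2-byte opcode plus operand) of a sprm, derived from
     * the spra field in bits 13-15 of the opcode. Returns -1 for spra 6, whose
     * operand length has to be read from the grpprl itself.
     */
    static int sprmSize(int opcode) {
        int spra = (opcode >>> 13) & 0x7;
        switch (spra) {
            case 0: case 1: return 2 + 1;
            case 2: case 4: case 5: return 2 + 2;
            case 3: return 2 + 4;
            case 7: return 2 + 3;
            default: return -1;               // spra 6: variable-length operand
        }
    }

    public static void main(String[] args) {
        System.out.println(realOffset(FC_COMPRESSED | 0x800)); // 1024
        System.out.println(realOffset(0x400));                 // 1024
        System.out.println(sprmSize(0x0835));                  // 3 (sprmCFBold)
        System.out.println(sprmSize(0x4A43));                  // 4 (sprmCHps)
    }
}
```

Walking a grpprl then amounts to reading each opcode, computing its size this way (or reading the length byte for spra 6), applying the operation to the base style, and advancing by that size.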
diff --git a/src/documentation/content/xdocs/components/document/index.xml b/src/documentation/content/xdocs/components/document/index.xml
new file mode 100644
index 0000000000..f5dd91c61c
--- /dev/null
+++ b/src/documentation/content/xdocs/components/document/index.xml
@@ -0,0 +1,235 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<!--
+ ====================================================================
+ Licensed to the Apache Software Foundation (ASF) under one or more
+ contributor license agreements. See the NOTICE file distributed with
+ this work for additional information regarding copyright ownership.
+ The ASF licenses this file to You under the Apache License, Version 2.0
+ (the "License"); you may not use this file except in compliance with
+ the License. You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+ Unless required by applicable law or agreed to in writing, software
+ distributed under the License is distributed on an "AS IS" BASIS,
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ See the License for the specific language governing permissions and
+ limitations under the License.
+ ====================================================================
+-->
+<!DOCTYPE document PUBLIC "-//APACHE//DTD Documentation V2.0//EN" "document-v20.dtd">
+
+<document>
+ <header>
+ <title>Apache POI™ - HWPF and XWPF - Java API to Handle Microsoft Word Files</title>
+ <subtitle>Overview</subtitle>
+ <authors>
+ <person name="Nicola Ken Barozzi" email="barozzi@nicolaken.com"/>
+ <person name="Andrew C. Oliver" email="acoliver@apache.org"/>
+ <person name="Ryan Ackley" email="sackley@apache.org"/>
+ <person name="Rainer Klute" email="klute@apache.org"/>
+ </authors>
+ </header>
+
+ <body>
+ <section><title>Overview</title>
+
+ <p>HWPF is the name of our port of the Microsoft Word 97(-2007) file format
+ to pure Java. It also provides limited read-only support for the older
+ Word 6 and Word 95 file formats.</p>
+
+ <p>The partner to HWPF for the new Word 2007 .docx format is <em>XWPF</em>.
+ Whilst HWPF and XWPF provide similar features, there is not a common
+ interface across the two of them at this time.</p>
+
+ <p>Both HWPF and XWPF could be described as "moderately functional". For some
+ use cases, especially around text extraction, support is very strong. For
+ others, support may be limited or incomplete, and it may be necessary to
+ dig down into low-level code. Error checking may be missing in places,
+ so it may be possible to accidentally generate invalid files. Enhancements
+ to fix such things are generally very well received!</p>
+
+ <p>As detailed in the <a href="site:components">Components
+ Page</a>, HWPF is contained within the poi-scratchpad-XXX.jar, while XWPF
+ is in the poi-ooxml-XXX.jar. You will need to ensure you include the appropriate
+ jars (and their dependencies!) in your classpath to use HWPF or XWPF.</p>
+
+ <p>Please note that in version 3.12, due to a bug, you might need to include
+ poi-scratchpad-XXX.jar when using XWPF. This was fixed in the next
+ release, as there should not be such a dependency.</p>
+
+ </section>
+ <section>
+ <title>An overview of the code</title>
+ <p>
+ Source in the <em>org.apache.poi.hwpf.model</em> tree is the Java representation of
+ the internal Word format structure. This code is "internal" and should not
+ be used by your code. Code in the <em>org.apache.poi.hwpf.usermodel</em>
+ package is the actual public and (as far as possible) user-friendly API for accessing document
+ parts. Source code in the
+ <em>org.apache.poi.hwpf.extractor</em>
+ tree is a wrapper around this to facilitate easy extraction of interesting things (e.g. the text),
+ and the
+ <em>org.apache.poi.hwpf.converter</em>
+ package contains Word-to-HTML and Word-to-FO converters (the latter can be used to generate PDF
+ from Word files when used with
+ <a href="https://xmlgraphics.apache.org/fop/">Apache FOP</a>). There is also a small file-structure-dumping utility in the
+ <em>org.apache.poi.hwpf.dev</em>
+ package, primarily for development purposes.
+ </p>
+
+ <p>
+ The main entry point to HWPF is HWPFDocument. Currently it has a lot of references both to
+ internal interfaces (the <em>org.apache.poi.hwpf.model</em> package) and to the public API
+ (the <em>org.apache.poi.hwpf.usermodel</em> package). It is possible that it will be split into two
+ different interfaces (like WordFile and WordDocument) in later versions.
+ </p>
+
+ <p>
+ The main entry point to XWPF is XWPFDocument. From there, you can get the
+ paragraphs, pictures, tables, sections, headers etc.
+ </p>
+ <p>
+ Currently, there are only a handful of example programs using HWPF and XWPF
+ available. They can be found in the examples section of the source repository, under
+ <a href="https://github.com/apache/poi/tree/trunk/poi-examples/src/main/java/org/apache/poi/examples/hwpf">HWPF</a>
+ and
+ <a href="https://github.com/apache/poi/tree/trunk/poi-examples/src/main/java/org/apache/poi/examples/xwpf">XWPF</a>.
+ Both HWPF and XWPF have fairly high levels of unit test coverage, which
+ provides examples of using the various areas of functionality of both
+ modules. These can be found in the source repository, under
+ <a href="https://github.com/apache/poi/tree/trunk/poi-scratchpad/src/test/java/org/apache/poi/hwpf">HWPF</a>
+ and
+ <a href="https://github.com/apache/poi/tree/trunk/poi-ooxml/src/test/java/org/apache/poi/xwpf">XWPF</a>.
+ Contributions of more examples, whether inspired by the unit tests or
+ not, would be most welcome!
+ </p>
+
+ </section>
+ <section>
+ <title>HWPF Notes</title>
+
+ <p>A .doc Word document, as handled by HWPF, can be considered as a very long single
+ text buffer. The HWPF API provides "pointers"
+ to document parts, like sections, paragraphs and character runs. Usually the user will iterate
+ over the sections of the main document part, the paragraphs of each section and the character runs
+ of each paragraph. Each such interface is a pointer to a subrange of the document text along with additional
+ properties (and they all extend the same Range parent class). There are additional Range
+ implementations like Table, TableRow, TableCell, etc. Some structures like Bookmark or Field
+ can also provide subrange pointers.
+ </p>
+
+ <p>Changing file content usually requires a lot of synchronized changes in those structures, like
+ updating property boundaries, position handlers, etc. Because of that, the HWPF API should be
+ considered not thread safe. In addition, there is a "one pointer" rule for changing
+ content. It means you should not use two different Range instances at the same time. More
+ precisely, if you are changing file content using some range pointer, all other range
+ pointers except those of its parent ranges become invalid. For example, if you obtain the overall range (1),
+ a paragraph range (2) from the overall range and a character run range (3) from the paragraph range, and
+ then change the text of the paragraph, the character run range is now invalid and should not be used, but
+ the overall range pointer is still valid. Each time you obtain a range (pointer), a new instance is
+ created. This means that if you obtain two range pointers and change the document text using the first
+ range pointer, the second one becomes invalid.
+ </p>
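+ <p>A minimal sketch of the rule, using the same numbering as the example above. It assumes
+ that the first command line argument names an existing .doc file whose first paragraph has
+ at least one character run; the class name is hypothetical. After the edit, only the
+ overall range is reused, and the child ranges are obtained again.</p>
+ <source><![CDATA[
+import java.io.FileInputStream;
+import java.io.IOException;
+import java.io.InputStream;
+
+import org.apache.poi.hwpf.HWPFDocument;
+import org.apache.poi.hwpf.usermodel.CharacterRun;
+import org.apache.poi.hwpf.usermodel.Paragraph;
+import org.apache.poi.hwpf.usermodel.Range;
+
+// Hypothetical example class; args[0] is assumed to be the path of a .doc file
+public class OnePointerRule {
+    public static void main(String[] args) throws IOException {
+        try (InputStream in = new FileInputStream(args[0]);
+             HWPFDocument doc = new HWPFDocument(in)) {
+            Range overall = doc.getRange();             // (1) overall range
+            Paragraph para = overall.getParagraph(0);   // (2) obtained from (1)
+            CharacterRun run = para.getCharacterRun(0); // (3) obtained from (2)
+            System.out.println("Run before the edit: " + run.text());
+
+            // Change the text through the paragraph pointer (2)
+            para.insertBefore("Edited: ");
+
+            // From here on, (3) is invalid and must not be used. (1) is a
+            // parent of (2) and stays valid; child ranges are re-obtained.
+            System.out.println("Overall text length: " + overall.text().length());
+            CharacterRun freshRun = doc.getRange().getParagraph(0).getCharacterRun(0);
+            System.out.println("Run after the edit: " + freshRun.text());
+        }
+    }
+}
+]]></source>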
+
+ </section>
+ <section>
+ <title>XWPF Patches Required!</title>
+
+ <p>At the moment, XWPF covers many common use cases for reading and writing
+ .docx files. Whilst this is a great thing, it does mean that XWPF does
+ everything that the current POI committers need it to do, and so none of
+ the committers are actively adding new features.</p>
+
+ <p>If you come across a feature in XWPF that you need but that isn't currently
+ there, please do send in a patch to add the extra functionality! More details
+ on contributing patches are available on the <a
+ href="site:guidelines">"Contribution to POI" page</a>.</p>
+ </section>
+
+ <section>
+ <title>HWPF Patches Required!</title>
+
+ <p>At the moment we unfortunately do not have anyone taking care of HWPF
+ and fostering its development. What we need is someone to stand up, take
+ this thing under their wing and push it forward. Ryan Ackley,
+ who put a lot of effort into HWPF, is no longer on board, so HWPF is an
+ orphan child waiting to be adopted.</p>
+
+ <p>If <strong>you</strong> are interested in becoming the new HWPF
+ point person, you should look into the Microsoft Word internals. A good
+ starting point seems to be Ryan Ackley's <a
+ href="site:docformat">overview</a>. An introduction to the binary
+ file formats is <a
+ href="https://msdn.microsoft.com/en-us/library/cc998577%28v=office.12%29.aspx">available
+ from Microsoft</a>, which has some good references and links. After that,
+ the full details on the Word format are available from
+ <a href="https://msdn.microsoft.com/en-us/library/cc313153%28v=office.12%29.aspx">Microsoft</a>,
+ but the documentation can be a little hard to get into at first... Try reading the
+ <a href="site:docformat">overview</a> first, and looking at the existing
+ code, then finally look up the documentation for specific missing features.</p>
+
+ <p>As a first step you should familiarize yourself with the source code,
+ examples, test cases, and the HWPF patches available at <a
+ href="https://issues.apache.org/">Bugzilla</a> (if any). Then you
+ should compile an overview of</p>
+
+ <ul>
+ <li>the current HWPF status,</li>
+ <li>the patches in <a
+ href="https://issues.apache.org/bugzilla/">Bugzilla</a> to be checked
+ in (and those that should better be ditched),</li>
+ <li>the available test cases and the test cases still to be written,</li>
+ <li>the available documentation and the docs to be written,</li>
+ <li>anything else that seems reasonable</li>
+ </ul>
+
+ <p>When you start coding, you will not yet have write access to the
+ SVN repository. Please submit your patches to <a
+ href="https://issues.apache.org/">Bugzilla</a> and nag <a
+ href="mailto:dev@poi.apache.org">the dev list</a> until someone commits
+ them. Besides the actual checking in of HWPF patches, current POI
+ committers will also do some minor reviews now and then of your source code
+ patches, test cases and documentation to help ensure software quality. But
+ most of the time you will be on your own. However, anyone offering useful
+ contributions over a period of time will be offered committership!</p>
+
+ <p>Please do not forget to write <a
+ href="https://www.junit.org/">JUnit</a> test cases and documentation!
+ We won't accept code that doesn't come with test cases. And please
+ consider that other contributors should be able to understand your source
+ code easily. If you need any help getting started with JUnit test cases
+ for HWPF, please ask on the developers' mailing list! If you show that you
+ are prepared to stick at it you will most likely be given SVN commit
+ access. See <a href="site:guidelines">"Contribution to POI" page</a>
+ for more details and help getting started.</p>
+
+ <p>Of course we will help you as best we can. However, presently there
+ is no committer who is really familiar with the Word format, so you'll be
+ mostly on your own. We look forward to hearing from you and to your contributions!
+ Honor and glory of becoming a POI committer are waiting!</p>
+ </section>
+ </body>
+</document>
+
+<!-- Keep this comment at the end of the file
+Local variables:
+mode: xml
+sgml-omittag:nil
+sgml-shorttag:nil
+sgml-namecase-general:nil
+sgml-general-insert-case:lower
+sgml-minimize-attributes:nil
+sgml-always-quote-attributes:t
+sgml-indent-step:1
+sgml-indent-data:t
+sgml-parent-document:nil
+sgml-exposed-tags:nil
+sgml-local-catalogs:nil
+sgml-local-ecat-files:nil
+End:
+-->
diff --git a/src/documentation/content/xdocs/components/document/projectplan.xml b/src/documentation/content/xdocs/components/document/projectplan.xml
new file mode 100644
index 0000000000..f43c2a1b60
--- /dev/null
+++ b/src/documentation/content/xdocs/components/document/projectplan.xml
@@ -0,0 +1,392 @@
+<?xml version="1.0"?>
+<!--
+ ====================================================================
+ Licensed to the Apache Software Foundation (ASF) under one or more
+ contributor license agreements. See the NOTICE file distributed with
+ this work for additional information regarding copyright ownership.
+ The ASF licenses this file to You under the Apache License, Version 2.0
+ (the "License"); you may not use this file except in compliance with
+ the License. You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+ Unless required by applicable law or agreed to in writing, software
+ distributed under the License is distributed on an "AS IS" BASIS,
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ See the License for the specific language governing permissions and
+ limitations under the License.
+ ====================================================================
+-->
+<!-- edited with XMLSPY v5 rel. 4 U (http://www.xmlspy.com) by Ryan Ackley (Myself) -->
+<!DOCTYPE document PUBLIC "-//APACHE//DTD Documentation V2.0//EN" "document-v20.dtd">
+<document>
+ <header>
+ <title>Apache POI™ - HWPF - Java API to Handle Microsoft Word Files</title>
+ <subtitle>Project Plan</subtitle>
+ <authors>
+ <person name="Ryan Ackley" email="sackley@apache.org"/>
+ </authors>
+ </header>
+ <body>
+ <p>HWPF Milestones</p>
+ <table>
+ <tr>
+ <th>
+ Milestones
+ </th>
+ <th>
+ Target Date
+ </th>
+ <th>
+ Owner
+ </th>
+ </tr>
+ <tr>
+ <td>
+ Read in a Word document
+with minimum formatting
+(no lists, tables, footnotes,
+endnotes, headers, footers)
+and write it back out with the
+result viewable in Word
+97/2000
+ </td>
+ <td>
+ 07/11/2003
+ </td>
+ <td>
+ Ryan
+ </td>
+ </tr>
+ <tr>
+ <td>
+ Add support for Lists and
+Tables
+ </td>
+ <td>
+ 8/15/2003
+ </td>
+ <td>
+ &#160;
+ </td>
+ </tr>
+ <tr>
+ <td>
+ HWPF 1.0-alpha release with
+documentation and examples
+ </td>
+ <td>
+ 8/18/2003
+ </td>
+ <td>
+ Praveen/Ryan
+ </td>
+ </tr>
+ <tr>
+ <td>
+ Add support for Headers,
+Footers, endnotes, and
+footnotes
+ </td>
+ <td>
+ 8/31/2003
+ </td>
+ <td>
+ ?
+ </td>
+ </tr>
+ <tr>
+ <td>
+ Add support for forms and
+mail merge
+ </td>
+ <td>
+ September/October 2003
+ </td>
+ <td>
+ ?
+ </td>
+ </tr>
+ </table>
+ <p>HWPF Task Lists</p>
+ <p>Read in a Word document with minimum formatting (no lists, tables, footnotes,
+endnotes, headers, footers) and write it back out with the result viewable in Word 97/2000</p>
+ <table>
+ <tr>
+ <th>
+ Task
+ </th>
+ <th>
+ Target Date
+ </th>
+ <th>
+ Owner
+ </th>
+ </tr>
+ <tr>
+ <td>
+ Create classes to read and
+write low level data
+structures with test cases
+ </td>
+ <td>
+ 7/10/2003
+ </td>
+ <td>
+ Ryan
+ </td>
+ </tr>
+ <tr>
+ <td>
+ Create classes to read and
+write FontTable and Font
+names with test case
+ </td>
+ <td>
+ 7/10/2003
+ </td>
+ <td>
+ Praveen
+ </td>
+ </tr>
+ <tr>
+ <td>
+ Final test
+ </td>
+ <td>
+ 7/11/2003
+ </td>
+ <td>
+ Ryan
+ </td>
+ </tr>
+ </table>
+ <p>Develop a user-friendly API so that it is fun and easy to read and write Word documents
+with Java.</p>
+ <table>
+ <tr>
+ <th>
+ Task
+ </th>
+ <th>
+ Target Date
+ </th>
+ <th>
+ Owner
+ </th>
+ </tr>
+ <tr>
+ <td>
+ Develop a way for SPRMs to
+be compressed and
+uncompressed
+ </td>
+ <td>
+
+ </td>
+ <td>
+
+ </td>
+ </tr>
+ <tr>
+ <td>
+ Override CHPAbstractType
+with a concrete class that
+exposes attributes with
+human readable names
+ </td>
+ <td>
+
+ </td>
+ <td>
+
+ </td>
+ </tr>
+ <tr>
+ <td>
+ Override PAPAbstractType
+with a concrete class that
+exposes attributes with
+human readable names
+ </td>
+ <td>
+
+ </td>
+ <td>
+
+ </td>
+ </tr>
+ <tr>
+ <td>
+ Override SEPAbstractType
+with a concrete class that
+exposes attributes with
+human readable names
+ </td>
+ <td>
+
+ </td>
+ <td>
+
+ </td>
+ </tr>
+ <tr>
+ <td>
+ Override DOPAbstractType
+with a concrete class that
+exposes attributes with
+human readable names
+ </td>
+ <td>
+
+ </td>
+ <td>
+
+ </td>
+ </tr>
+ <tr>
+ <td>
+ Override TAPAbstractType
+with a concrete class that
+exposes attributes with
+human readable names
+ </td>
+ <td>
+
+ </td>
+ <td>
+
+ </td>
+ </tr>
+ <tr>
+ <td>
+ Override TCAbstractType
+with a concrete class that
+exposes attributes with
+human readable names
+ </td>
+ <td>
+
+ </td>
+ <td>
+
+ </td>
+ </tr>
+ <tr>
+ <td>
+ Develop a VerifyIntegrity
+class for testing so it is easy
+to determine if a Word
+Document is well-formed.
+ </td>
+ <td>
+
+ </td>
+ <td>
+
+ </td>
+ </tr>
+ <tr>
+ <td>
+ Develop general intuitive
+API to tie everything together
+ </td>
+ <td>
+
+ </td>
+ <td>
+
+ </td>
+ </tr>
+ </table>
+ <p>Add support for lists and tables</p>
+ <table>
+ <tr>
+ <th>
+ Task
+ </th>
+ <th>
+ Target Date
+ </th>
+ <th>
+ Owner
+ </th>
+ </tr>
+ <tr>
+ <td>
+ Add data structures for
+reading and writing list data
+with test cases.
+ </td>
+ <td>
+
+ </td>
+ <td>
+
+ </td>
+ </tr>
+ <tr>
+ <td>
+ Add data structures for
+reading and writing tables
+with test cases.
+ </td>
+ <td>
+
+ </td>
+ <td>
+
+ </td>
+ </tr>
+ </table>
+ <p>HWPF 1.0-alpha release with documentation and examples</p>
+ <table>
+ <tr>
+ <th>
+ Task
+ </th>
+ <th>
+ Target Date
+ </th>
+ <th>
+ Owner
+ </th>
+ </tr>
+ <tr>
+ <td>
+ Document the user model
+API
+ </td>
+ <td>
+
+ </td>
+ <td>
+
+ </td>
+ </tr>
+ <tr>
+ <td>
+ Document the low level
+classes
+ </td>
+ <td>
+
+ </td>
+ <td>
+
+ </td>
+ </tr>
+ <tr>
+ <td>
+ Come up with detailed How-To&#8217;s
+ </td>
+ <td>
+
+ </td>
+ <td>
+
+ </td>
+ </tr>
+ </table>
+ </body>
+</document>
diff --git a/src/documentation/content/xdocs/components/document/quick-guide-xwpf.xml b/src/documentation/content/xdocs/components/document/quick-guide-xwpf.xml
new file mode 100644
index 0000000000..a321821957
--- /dev/null
+++ b/src/documentation/content/xdocs/components/document/quick-guide-xwpf.xml
@@ -0,0 +1,89 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<!--
+ ====================================================================
+ Licensed to the Apache Software Foundation (ASF) under one or more
+ contributor license agreements. See the NOTICE file distributed with
+ this work for additional information regarding copyright ownership.
+ The ASF licenses this file to You under the Apache License, Version 2.0
+ (the "License"); you may not use this file except in compliance with
+ the License. You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+ Unless required by applicable law or agreed to in writing, software
+ distributed under the License is distributed on an "AS IS" BASIS,
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ See the License for the specific language governing permissions and
+ limitations under the License.
+ ====================================================================
+-->
+<!DOCTYPE document PUBLIC "-//APACHE//DTD Documentation V2.0//EN" "document-v20.dtd">
+
+<document>
+ <header>
+ <title>POI-XWPF - A Quick Guide</title>
+ <subtitle>Overview</subtitle>
+ <authors>
+ <person name="Nick Burch" email="nick at torchbox dot com"/>
+ </authors>
+ </header>
+
+ <body>
+ <p>XWPF has a fairly stable core API, providing read and write access
+ to the main parts of a Word .docx file, but it isn't complete. For
+ some things, it may be necessary to dive down into the low level XMLBeans
+ objects to manipulate the OOXML structure. If you find yourself having
+ to do this, please consider sending in a patch to enhance the API; see the
+ <a href="site:guidelines">"Contribution to POI" page</a>.</p>
+
+ <section><title>Basic Text Extraction</title>
+ <p>For basic text extraction, make use of
+<code>org.apache.poi.xwpf.extractor.XWPFWordExtractor</code>. It is constructed
+from an <code>XWPFDocument</code> (or an <code>OPCPackage</code>). The <code>getText()</code>
+method can be used to
+get the text from all the paragraphs, along with tables, headers etc.
+ </p>
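+ <p>A minimal sketch of basic extraction, not an official sample. The class name
+ <code>DocxText</code> and the use of the first command line argument as the .docx path are
+ illustrative assumptions. The stream is first parsed into an <code>XWPFDocument</code>,
+ which is then handed to the extractor.</p>
+ <source><![CDATA[
+import java.io.FileInputStream;
+import java.io.IOException;
+import java.io.InputStream;
+
+import org.apache.poi.xwpf.extractor.XWPFWordExtractor;
+import org.apache.poi.xwpf.usermodel.XWPFDocument;
+
+// Hypothetical example class; args[0] is assumed to be the path of a .docx file
+public class DocxText {
+    /** Returns the text of the paragraphs, tables, headers etc. of a .docx stream. */
+    public static String extract(InputStream in) throws IOException {
+        // try-with-resources closes the extractor when done
+        try (XWPFWordExtractor extractor = new XWPFWordExtractor(new XWPFDocument(in))) {
+            return extractor.getText();
+        }
+    }
+
+    public static void main(String[] args) throws IOException {
+        try (InputStream in = new FileInputStream(args[0])) {
+            System.out.println(extract(in));
+        }
+    }
+}
+]]></source>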
+ </section>
+
+ <section><title>Specific Text Extraction</title>
+ <p>To get specific bits of text, first create a
+<code>org.apache.poi.xwpf.usermodel.XWPFDocument</code>. Select the <code>IBodyElement</code>
+of interest (Table, Paragraph etc.), and from there get an <code>XWPFRun</code>
+(for a table, by going through its rows, cells and their paragraphs).
+Finally fetch the text and properties from that.
+ </p>
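+ <p>One way to do this is shown in the sketch below. The class and method names are
+ hypothetical, and the .docx path is assumed to be the first command line argument. The sketch
+ visits the body elements in document order, descends into table cells, and describes each
+ <code>XWPFRun</code> by its text and its bold and italic flags. Other body element types,
+ such as content controls, and nested tables are skipped for brevity.</p>
+ <source><![CDATA[
+import java.io.FileInputStream;
+import java.io.IOException;
+import java.io.InputStream;
+import java.util.ArrayList;
+import java.util.List;
+
+import org.apache.poi.xwpf.usermodel.IBodyElement;
+import org.apache.poi.xwpf.usermodel.XWPFDocument;
+import org.apache.poi.xwpf.usermodel.XWPFParagraph;
+import org.apache.poi.xwpf.usermodel.XWPFRun;
+import org.apache.poi.xwpf.usermodel.XWPFTable;
+import org.apache.poi.xwpf.usermodel.XWPFTableCell;
+import org.apache.poi.xwpf.usermodel.XWPFTableRow;
+
+// Hypothetical example class; args[0] is assumed to be the path of a .docx file
+public class DocxWalk {
+    /** Describes every run in document order, including runs inside tables. */
+    public static List<String> describeRuns(XWPFDocument doc) {
+        List<String> out = new ArrayList<>();
+        for (IBodyElement element : doc.getBodyElements()) {
+            if (element instanceof XWPFParagraph) {
+                addRuns((XWPFParagraph) element, out);
+            } else if (element instanceof XWPFTable) {
+                // Tables hold rows, rows hold cells, cells hold paragraphs
+                for (XWPFTableRow row : ((XWPFTable) element).getRows()) {
+                    for (XWPFTableCell cell : row.getTableCells()) {
+                        for (XWPFParagraph para : cell.getParagraphs()) {
+                            addRuns(para, out);
+                        }
+                    }
+                }
+            }
+            // Other element types (e.g. content controls) are skipped here
+        }
+        return out;
+    }
+
+    private static void addRuns(XWPFParagraph para, List<String> out) {
+        for (XWPFRun run : para.getRuns()) {
+            out.add(String.format("[%s] bold=%b italic=%b",
+                    run.text(), run.isBold(), run.isItalic()));
+        }
+    }
+
+    public static void main(String[] args) throws IOException {
+        try (InputStream in = new FileInputStream(args[0]);
+             XWPFDocument doc = new XWPFDocument(in)) {
+            describeRuns(doc).forEach(System.out::println);
+        }
+    }
+}
+]]></source>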
+ </section>
+
+ <section><title>Headers and Footers</title>
+ <p>To get at the headers and footers of a Word document, first create a
+<code>org.apache.poi.xwpf.usermodel.XWPFDocument</code>. Next, you need to create a
+<code>org.apache.poi.xwpf.model.XWPFHeaderFooterPolicy</code>, passing it your
+XWPFDocument. Finally, the XWPFHeaderFooterPolicy gives you access to the headers and
+footers, including first / even / odd page ones if defined in your
+document.</p>
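+ <p>A minimal sketch using <code>org.apache.poi.xwpf.model.XWPFHeaderFooterPolicy</code>,
+ under the assumption that the first command line argument is a .docx path; the class and
+ method names are hypothetical. The policy's getters return <code>null</code> for header or
+ footer variants the document does not define, so those are skipped.</p>
+ <source><![CDATA[
+import java.io.FileInputStream;
+import java.io.InputStream;
+
+import org.apache.poi.xwpf.model.XWPFHeaderFooterPolicy;
+import org.apache.poi.xwpf.usermodel.XWPFDocument;
+import org.apache.poi.xwpf.usermodel.XWPFHeaderFooter;
+
+// Hypothetical example class; args[0] is assumed to be the path of a .docx file
+public class DocxHeaders {
+    /** Lists the text of each header/footer variant the document defines. */
+    public static String describe(XWPFHeaderFooterPolicy policy) {
+        StringBuilder sb = new StringBuilder();
+        append(sb, "Default header", policy.getDefaultHeader());
+        append(sb, "First page header", policy.getFirstPageHeader());
+        append(sb, "Even page header", policy.getEvenPageHeader());
+        append(sb, "Default footer", policy.getDefaultFooter());
+        append(sb, "First page footer", policy.getFirstPageFooter());
+        append(sb, "Even page footer", policy.getEvenPageFooter());
+        return sb.toString();
+    }
+
+    private static void append(StringBuilder sb, String label, XWPFHeaderFooter hf) {
+        // The getters return null for variants the document does not define
+        if (hf != null) {
+            sb.append(label).append(": ").append(hf.getText().trim()).append('\n');
+        }
+    }
+
+    public static void main(String[] args) throws Exception {
+        try (InputStream in = new FileInputStream(args[0]);
+             XWPFDocument doc = new XWPFDocument(in)) {
+            System.out.print(describe(new XWPFHeaderFooterPolicy(doc)));
+        }
+    }
+}
+]]></source>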
+ </section>
+
+ <section><title>Changing Text</title>
+ <p>From an <code>XWPFParagraph</code>, it is possible to fetch the existing
+ <code>XWPFRun</code> elements that make up the text. To add new text,
+ the <code>createRun()</code> method will add a new <code>XWPFRun</code>
+ to the end of the list. <code>insertNewRun(int)</code> can instead be
+ used to add a new <code>XWPFRun</code> at a specific point in the
+ paragraph.
+ </p>
+ <p>Once you have an <code>XWPFRun</code>, you can use the
+ <code>setText(String)</code> method to make changes to the text. To add
+ whitespace elements such as tabs and line breaks, it is necessary to use
+ methods like <code>addTab()</code> and <code>addCarriageReturn()</code>.
+ </p>
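+ <p>The following sketch builds a single paragraph in a new document to show these calls
+ together. It is an illustration only: the class name is hypothetical and the output path is
+ assumed to be the first command line argument.</p>
+ <source><![CDATA[
+import java.io.FileOutputStream;
+import java.io.IOException;
+import java.io.OutputStream;
+
+import org.apache.poi.xwpf.usermodel.XWPFDocument;
+import org.apache.poi.xwpf.usermodel.XWPFParagraph;
+import org.apache.poi.xwpf.usermodel.XWPFRun;
+
+// Hypothetical example class; args[0] is assumed to be the output .docx path
+public class DocxEdit {
+    /** Builds one paragraph using createRun(), insertNewRun() and whitespace helpers. */
+    public static XWPFParagraph fill(XWPFDocument doc) {
+        XWPFParagraph para = doc.createParagraph();
+
+        XWPFRun value = para.createRun();     // appended at the end of the run list
+        value.setText("Alice");
+        value.addCarriageReturn();            // line break inside the paragraph
+        value.setText("Second line");         // setText(String) appends more text
+
+        XWPFRun label = para.insertNewRun(0); // inserted before the "Alice" run
+        label.setText("Name:");
+        label.setBold(true);
+        label.addTab();                       // a real tab element, not a literal '\t'
+        return para;
+    }
+
+    public static void main(String[] args) throws IOException {
+        try (XWPFDocument doc = new XWPFDocument();
+             OutputStream out = new FileOutputStream(args[0])) {
+            fill(doc);
+            doc.write(out);
+        }
+    }
+}
+]]></source>
+ <p>Because <code>insertNewRun(0)</code> is called after <code>createRun()</code>, the bold
+ "Name:" run ends up first in the paragraph.</p>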
+ </section>
+
+ <section><title>Further Examples</title>
+ <p>For now, there are a limited number of XWPF examples in the
+ <a href="https://github.com/apache/poi/tree/trunk/poi-examples/src/main/java/org/apache/poi/examples/xwpf">Examples Package</a>.
+ Beyond those, the best source of additional examples is in the unit
+ tests. <a href="https://github.com/apache/poi/tree/trunk/poi-ooxml/src/test/java/org/apache/poi/xwpf/">
+ Browse the XWPF unit tests.</a>
+ </p>
+ </section>
+ </body>
+</document>
diff --git a/src/documentation/content/xdocs/components/document/quick-guide.xml b/src/documentation/content/xdocs/components/document/quick-guide.xml
new file mode 100644
index 0000000000..a15f2009a5
--- /dev/null
+++ b/src/documentation/content/xdocs/components/document/quick-guide.xml
@@ -0,0 +1,88 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<!--
+ ====================================================================
+ Licensed to the Apache Software Foundation (ASF) under one or more
+ contributor license agreements. See the NOTICE file distributed with
+ this work for additional information regarding copyright ownership.
+ The ASF licenses this file to You under the Apache License, Version 2.0
+ (the "License"); you may not use this file except in compliance with
+ the License. You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+ Unless required by applicable law or agreed to in writing, software
+ distributed under the License is distributed on an "AS IS" BASIS,
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ See the License for the specific language governing permissions and
+ limitations under the License.
+ ====================================================================
+-->
+<!DOCTYPE document PUBLIC "-//APACHE//DTD Documentation V2.0//EN" "document-v20.dtd">
+
+<document>
+ <header>
+ <title>POI-HWPF - A Quick Guide</title>
+ <subtitle>Overview</subtitle>
+ <authors>
+ <person name="Nick Burch" email="nick at torchbox dot com"/>
+ </authors>
+ </header>
+
+ <body>
+ <p>HWPF is still in early development. It lives in the <a
+ href="https://github.com/apache/poi/tree/trunk/poi-scratchpad/">
+ scratchpad section of the source repository.</a> You will need to ensure that you
+ either have a recent checkout or a recent nightly build
+ (including the scratchpad jar!)</p>
+
+ <section><title>Basic Text Extraction</title>
+ <p>For basic text extraction, make use of
+<code>org.apache.poi.hwpf.extractor.WordExtractor</code>. It accepts an input
+stream or a <code>HWPFDocument</code>. The <code>getText()</code>
+method can be used to
+get the text from all the paragraphs, or <code>getParagraphText()</code>
+can be used to fetch the text from each paragraph in turn. The other
+option is <code>getTextFromPieces()</code>, which is very fast, but
+tends to return things that aren't text from the page. YMMV.
+ </p>
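+ <p>A minimal sketch of the two most common calls, assuming the first command line argument
+ is the path of a .doc file (the class name is hypothetical). <code>getTextFromPieces()</code>
+ is left out because of the caveat above.</p>
+ <source><![CDATA[
+import java.io.FileInputStream;
+import java.io.IOException;
+import java.io.InputStream;
+
+import org.apache.poi.hwpf.extractor.WordExtractor;
+
+// Hypothetical example class; args[0] is assumed to be the path of a .doc file
+public class DocText {
+    public static void main(String[] args) throws IOException {
+        try (InputStream in = new FileInputStream(args[0]);
+             WordExtractor extractor = new WordExtractor(in)) {
+            // All the text in one string
+            System.out.println(extractor.getText());
+
+            // Or one entry per paragraph
+            String[] paragraphs = extractor.getParagraphText();
+            for (int i = 0; i < paragraphs.length; i++) {
+                System.out.println(i + ": " + paragraphs[i].trim());
+            }
+        }
+    }
+}
+]]></source>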
+ </section>
+
+ <section><title>Specific Text Extraction</title>
+ <p>To get specific bits of text, first create a
+<code>org.apache.poi.hwpf.HWPFDocument</code>. Fetch the range
+with <code>getRange()</code>, then get paragraphs from that. You
+can then get text and other properties.
+ </p>
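+ <p>The sketch below shows one way to do this: it prints each paragraph's text, whether it
+ sits in a table, and the font of its first character run. The class name is hypothetical and
+ the .doc path is assumed to be the first command line argument.</p>
+ <source><![CDATA[
+import java.io.FileInputStream;
+import java.io.IOException;
+import java.io.InputStream;
+
+import org.apache.poi.hwpf.HWPFDocument;
+import org.apache.poi.hwpf.usermodel.CharacterRun;
+import org.apache.poi.hwpf.usermodel.Paragraph;
+import org.apache.poi.hwpf.usermodel.Range;
+
+// Hypothetical example class; args[0] is assumed to be the path of a .doc file
+public class DocParagraphs {
+    public static void main(String[] args) throws IOException {
+        try (InputStream in = new FileInputStream(args[0]);
+             HWPFDocument doc = new HWPFDocument(in)) {
+            Range range = doc.getRange();
+            for (int i = 0; i < range.numParagraphs(); i++) {
+                Paragraph para = range.getParagraph(i);
+                // text() includes the trailing paragraph mark, so trim it
+                StringBuilder line = new StringBuilder();
+                line.append(i).append(para.isInTable() ? " [table]" : "")
+                    .append(": ").append(para.text().trim());
+                if (para.numCharacterRuns() > 0) {
+                    CharacterRun first = para.getCharacterRun(0);
+                    // getFontSize() is in half-points, so 24 means 12pt
+                    line.append(" (").append(first.getFontName()).append(", ")
+                        .append(first.getFontSize() / 2.0).append("pt)");
+                }
+                System.out.println(line);
+            }
+        }
+    }
+}
+]]></source>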
+ </section>
+
+ <section><title>Headers and Footers</title>
+ <p>To get at the headers and footers of a Word document, first create a
+<code>org.apache.poi.hwpf.HWPFDocument</code>. Next, you need to create a
+<code>org.apache.poi.hwpf.usermodel.HeaderStories</code>, passing it your
+HWPFDocument. Finally, the HeaderStories gives you access to the headers and
+footers, including first / even / odd page ones if defined in your
+document. Additionally, HeaderStories provides a way of stripping
+any fields (macros) from the text, which is helpful as many headers and footers
+do end up with macros in them.</p>
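+ <p>A minimal sketch using <code>org.apache.poi.hwpf.usermodel.HeaderStories</code>, assuming
+ the first command line argument is a .doc path (the class name is hypothetical). The
+ field-stripping switch used here, <code>setAreFieldsStripped(true)</code>, reflects our
+ reading of the HeaderStories API; please check the Javadoc of your POI version.</p>
+ <source><![CDATA[
+import java.io.FileInputStream;
+import java.io.IOException;
+import java.io.InputStream;
+
+import org.apache.poi.hwpf.HWPFDocument;
+import org.apache.poi.hwpf.usermodel.HeaderStories;
+
+// Hypothetical example class; args[0] is assumed to be the path of a .doc file
+public class DocHeaders {
+    public static void main(String[] args) throws IOException {
+        try (InputStream in = new FileInputStream(args[0]);
+             HWPFDocument doc = new HWPFDocument(in)) {
+            HeaderStories stories = new HeaderStories(doc);
+            // Assumed API: strip fields (e.g. page number macros) from returned text
+            stories.setAreFieldsStripped(true);
+
+            System.out.println("First header: " + stories.getFirstHeader());
+            System.out.println("Odd header:   " + stories.getOddHeader());
+            System.out.println("Even header:  " + stories.getEvenHeader());
+            System.out.println("First footer: " + stories.getFirstFooter());
+            System.out.println("Odd footer:   " + stories.getOddFooter());
+            System.out.println("Even footer:  " + stories.getEvenFooter());
+        }
+    }
+}
+]]></source>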
+ </section>
+
+ <section><title>Changing Text</title>
+ <p>It is possible to change the text via
+ <code>insertBefore()</code> and <code>insertAfter()</code>
+ on a <code>Range</code> object (either a <code>Range</code>,
+ <code>Paragraph</code> or <code>CharacterRun</code>).
+ It is also possible to delete a <code>Range</code>.
+ This works in many, but not all, cases, and patches to
+ improve it are gratefully received!
+ </p>
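+ <p>A minimal sketch of an edit, with hypothetical class name and arguments (an input .doc
+ path and an output path). Because earlier child ranges become invalid after a change, a
+ fresh range is obtained before the second edit. The delete is guarded so that at least one
+ paragraph remains after the one removed.</p>
+ <source><![CDATA[
+import java.io.FileInputStream;
+import java.io.FileOutputStream;
+import java.io.IOException;
+import java.io.InputStream;
+import java.io.OutputStream;
+
+import org.apache.poi.hwpf.HWPFDocument;
+import org.apache.poi.hwpf.usermodel.Range;
+
+// Hypothetical example class; args[0] is an input .doc, args[1] the output path
+public class DocEdit {
+    public static void main(String[] args) throws IOException {
+        try (InputStream in = new FileInputStream(args[0]);
+             HWPFDocument doc = new HWPFDocument(in)) {
+            // Insert text at the very start of the main document text
+            doc.getRange().insertBefore("DRAFT: ");
+
+            // Earlier child ranges are invalid after a change, so fetch a
+            // fresh range before the next edit
+            Range range = doc.getRange();
+            if (range.numParagraphs() > 2) {
+                // Delete the second paragraph
+                range.getParagraph(1).delete();
+            }
+
+            try (OutputStream out = new FileOutputStream(args[1])) {
+                doc.write(out);
+            }
+        }
+    }
+}
+]]></source>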
+ </section>
+
+ <section><title>Further Examples</title>
+ <p>For now, the best source of additional examples is in the unit
+ tests. <a
+ href="https://github.com/apache/poi/tree/trunk/poi-scratchpad/src/test/java/org/apache/poi/hwpf/">
+ Browse the HWPF unit tests.</a>
+ </p>
+ </section>
+ </body>
+</document>
diff --git a/src/documentation/content/xdocs/components/hmef/index.xml b/src/documentation/content/xdocs/components/hmef/index.xml
new file mode 100644
index 0000000000..168ba65408
--- /dev/null
+++ b/src/documentation/content/xdocs/components/hmef/index.xml
@@ -0,0 +1,216 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<!--
+ ====================================================================
+ Licensed to the Apache Software Foundation (ASF) under one or more
+ contributor license agreements. See the NOTICE file distributed with
+ this work for additional information regarding copyright ownership.
+ The ASF licenses this file to You under the Apache License, Version 2.0
+ (the "License"); you may not use this file except in compliance with
+ the License. You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+ Unless required by applicable law or agreed to in writing, software
+ distributed under the License is distributed on an "AS IS" BASIS,
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ See the License for the specific language governing permissions and
+ limitations under the License.
+ ====================================================================
+-->
+<!DOCTYPE document PUBLIC "-//APACHE//DTD Documentation V2.0//EN" "document-v20.dtd">
+
+<document>
+ <header>
+ <title>POI-HMEF - Java API To Access Microsoft Transport Neutral Encoding Files (TNEF)</title>
+ <subtitle>Overview</subtitle>
+ <authors>
+ <person name="Nick Burch" email="nick at apache dot org"/>
+ </authors>
+ </header>
+
+ <body>
+ <section>
+ <title>Overview</title>
+
+ <p>HMEF is the POI Project's pure Java implementation of Microsoft's
+ TNEF (Transport Neutral Encoding Format), aka winmail.dat,
+ which is used by Outlook and Exchange in some situations.</p>
+ <p>Currently, HMEF provides a read-only API for accessing common
+ message and attachment attributes, including the message body
+ and attachment files. In addition, it's possible to have
+ read-only access to all of the underlying TNEF and MAPI
+ attributes of the message and attachments.</p>
+ <p>HMEF also provides a command line tool for extracting out
+ the message body and attachment files from a TNEF (winmail.dat)
+ file.</p>
+ <p>Write support, both for saving changes and for creating new
+ files, is currently unavailable. Anyone interested in working
+ on these areas is advised to read the
+ <a href="site:guidelines">Contribution Guidelines</a> then
+ <a href="site:mailinglists">join the dev list</a>!</p>
+
+ <note>
+ This code currently lives in the
+ <a href="https://github.com/apache/poi/tree/trunk/poi-scratchpad/">scratchpad area</a>
+ of the POI Git repository. To use this component, ensure
+ you have the Scratchpad Jar on your classpath, or a dependency
+ defined on the <em>poi-scratchpad</em> artifact - the main POI
+ jar is not enough! See the
+ <a href="site:components">POI Components Map</a>
+ for more details.
+ </note>
+ </section>
+
+ <section>
+ <title>Using HMEF to access TNEF (winmail.dat) files</title>
+
+ <section>
+ <title>Easy extraction of message body and attachment files</title>
+
+ <p>The class <em>org.apache.poi.hmef.extractor.HMEFContentsExtractor</em>
+ provides both command line and Java extraction. It allows the
+ saving of the message body (an RTF file), and all of the
+ attachment files, to a single directory as specified.</p>
+
+ <p>From the command line, simply call the class specifying the
+ TNEF file to extract, and the directory to place the extracted
+ files into, e.g.:</p>
+ <source>
+ java -classpath poi-5.4.1.jar:poi-scratchpad-5.4.1.jar org.apache.poi.hmef.extractor.HMEFContentsExtractor winmail.dat /tmp/extracted/
+ </source>
+
+ <p>From Java, there are two method calls on the class, one to
+ extract the message body RTF to a file, and the other to extract
+ all the attachments to a directory. A typical use would be:</p>
+ <source>
+import java.io.File;
+import java.io.FileNotFoundException;
+
+import org.apache.poi.hmef.extractor.HMEFContentsExtractor;
+
+public void extract(String winmailFilename, String directoryName) throws Exception {
+    HMEFContentsExtractor ext = new HMEFContentsExtractor(new File(winmailFilename));
+
+    // Both the RTF body and the attachment files are written into this directory
+    File dir = new File(directoryName);
+    if (!dir.isDirectory()) {
+        throw new FileNotFoundException("Output directory " + dir.getPath() + " not found");
+    }
+    File rtf = new File(dir, "message.rtf");
+
+    System.out.println("Extracting...");
+    ext.extractMessageBody(rtf);
+    ext.extractAttachments(dir);
+    System.out.println("Extraction completed");
+}
+ </source>
+ </section>
+
+ <section>
+ <title>Attachment attributes and contents</title>
+
+ <p>To get at your attachments, simply call the
+ <em>getAttachments()</em> method on a <em>HMEFMessage</em>
+ instance, and you'll receive a list of all the attachments.</p>
+ <p>When you have a <em>org.apache.poi.hmef.Attachment</em> object,
+ there are several helper methods available. These will all
+ return the value of the appropriate underlying attachment
+ attributes, or null if for some reason the attribute isn't
+ present in your file.</p>
+ <ul>
+ <li><em>getFilename()</em> - returns the name of the attachment
+ file, possibly in 8.3 format</li>
+ <li><em>getLongFilename()</em> - returns the full name of the
+ attachment file</li>
+ <li><em>getExtension()</em> - returns the extension of the
+ attachment file, including the "."</li>
+ <li><em>getModifiedDate()</em> - returns the date that the
+ attachment file was last edited on</li>
+ <li><em>getContents()</em> - returns a byte array of the contents
+ of the attached file</li>
+ <li><em>getRenderedMetaFile()</em> - returns a byte array of
+ a Windows metafile representation of the attached file</li>
+ </ul>
+ </section>
+
+ <section>
+ <title>Message attributes and message body</title>
+
+ <p>A <em>org.apache.poi.hmef.HMEFMessage</em> instance is created
+ from an <em>InputStream</em> of the underlying TNEF (winmail.dat)
+ file.</p>
+ <p>From a <em>HMEFMessage</em>, there are three main methods of
+ interest to call:</p>
+ <ul>
+ <li><em>getBody()</em> - returns a String containing the RTF
+ contents of the message body. </li>
+ <li><em>getSubject()</em> - returns the message subject</li>
+ <li><em>getAttachments()</em> - returns the list of
+ <em>Attachment</em> objects for the message</li>
+ </ul>
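+ <p><em>HMEFContentsExtractor</em> already saves the body and attachments for you. The sketch
+ below instead shows the underlying message and attachment calls directly. The class name is
+ hypothetical, and the TNEF file and an existing output directory are assumed to be the first
+ and second command line arguments. Only the final name component of each attachment is kept,
+ so a crafted name cannot write outside the output directory.</p>
+ <source><![CDATA[
+import java.io.FileInputStream;
+import java.io.IOException;
+import java.io.InputStream;
+import java.nio.file.Files;
+import java.nio.file.Path;
+import java.nio.file.Paths;
+
+import org.apache.poi.hmef.Attachment;
+import org.apache.poi.hmef.HMEFMessage;
+
+// Hypothetical example class; args[0] is a winmail.dat file, args[1] an existing directory
+public class WinmailDump {
+    public static void main(String[] args) throws IOException {
+        Path outDir = Paths.get(args[1]);
+        try (InputStream in = new FileInputStream(args[0])) {
+            HMEFMessage msg = new HMEFMessage(in);
+            System.out.println("Subject: " + msg.getSubject());
+
+            // The body is RTF; guard against null in case the attribute is missing
+            String rtf = msg.getBody();
+            System.out.println("Body length: " + (rtf == null ? 0 : rtf.length()));
+
+            for (Attachment attach : msg.getAttachments()) {
+                // Prefer the long name, fall back to the (possibly 8.3) short one
+                String name = attach.getLongFilename() != null
+                        ? attach.getLongFilename() : attach.getFilename();
+                byte[] data = attach.getContents();
+                if (name == null || data == null) {
+                    continue;
+                }
+                // Keep only the file name part so the file stays inside outDir
+                Path target = outDir.resolve(Paths.get(name).getFileName());
+                Files.write(target, data);
+                System.out.println("Wrote " + target + " (" + data.length + " bytes)");
+            }
+        }
+    }
+}
+]]></source>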
+ </section>
+
+ <section>
+ <title>Low level attribute access</title>
+
+ <p>Both Messages and Attachments contain two kinds of attributes.
+ These are <em>TNEFAttribute</em> and <em>MAPIAttribute</em>.</p>
+ <p>TNEFAttribute is specific to TNEF files in terms of the
+ available types and properties. In general, Attachments have a
+ few more useful ones of these than Messages.</p>
+ <p>MAPIAttributes hold standard MAPI properties and values, and
+ work in a similar way to <a href="../hsmf/">HSMF
+ (Outlook)</a>. There are typically many of these on both
+ Messages and Attachments. <em>Note - see limitations</em></p>
+ <p>Both <em>HMEFMessage</em> and <em>Attachment</em>
+ support two different ways of getting to attributes of interest.
+ Firstly, they support list getters, to return all attributes
+ (either TNEF or MAPI). Secondly, they support specific getters by
+ TNEF or MAPI property.</p>
+ <source>
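+import java.io.FileInputStream;
+import java.io.InputStream;
+
+import org.apache.poi.hmef.Attachment;
+import org.apache.poi.hmef.HMEFMessage;
+import org.apache.poi.hmef.attribute.MAPIAttribute;
+import org.apache.poi.hmef.attribute.TNEFAttribute;
+import org.apache.poi.hmef.attribute.TNEFProperty;
+import org.apache.poi.hsmf.datatypes.MAPIProperty;
+
+// "file" is the TNEF (winmail.dat) file to inspect, as a File or a path String
+try (InputStream in = new FileInputStream(file)) {
+    HMEFMessage msg = new HMEFMessage(in);
+    // List getters: all TNEF attributes, then all MAPI attributes
+    for (TNEFAttribute attr : msg.getMessageAttributes()) {
+        System.out.println("TNEF : " + attr);
+    }
+    for (MAPIAttribute attr : msg.getMessageMAPIAttributes()) {
+        System.out.println("MAPI : " + attr);
+    }
+    // Specific getter by MAPI property
+    System.out.println("Subject is " + msg.getMessageMAPIAttribute(MAPIProperty.CONVERSATION_TOPIC));
+
+    // The same two styles of access work on each attachment
+    for (Attachment attach : msg.getAttachments()) {
+        for (TNEFAttribute attr : attach.getAttributes()) {
+            System.out.println("A.TNEF : " + attr);
+        }
+        for (MAPIAttribute attr : attach.getMAPIAttributes()) {
+            System.out.println("A.MAPI : " + attr);
+        }
+        System.out.println("Filename is " + attach.getAttribute(TNEFProperty.ID_ATTACHTITLE));
+        System.out.println("Extension is " + attach.getMAPIAttribute(MAPIProperty.ATTACH_EXTENSION));
+    }
+}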
+ </source>
+ </section>
+ </section>
+
+ <section>
+ <title>Investigating a TNEF file</title>
+
+ <p>To get a feel for the contents of a file, and to track down
+ where data of interest is stored, HMEF comes with
+ <a href="https://github.com/apache/poi/tree/trunk/poi-scratchpad/src/main/java/org/apache/poi/hmef/dev/">HMEFDumper</a>
+ to print out the contents of the file.</p>
+ </section>
+
+ <section>
+ <title>Limitations</title>
+
+ <p>HMEF is currently a work-in-progress, and not everything
+ works yet. The current limitations are:</p>
+ <ul>
+ <li>Non-standard MAPI properties from the range 0x8000 to 0x8fff
+ may not be turned into attributes quite correctly.
+ The values show up, but the name and type may not always
+ be correct.</li>
+ <li>All testing so far has been performed on a small number of
+ English documents. We think we're correctly turning bytes into
+ Java unicode strings, but we need a few non-English sample
+ files in the test suite to verify this!</li>
+ <li>There is no support for saving changes, nor for creating new
+ files</li>
+ </ul>
+ </section>
+ </body>
+</document>
diff --git a/src/documentation/content/xdocs/components/hpbf/file-format.xml b/src/documentation/content/xdocs/components/hpbf/file-format.xml
new file mode 100644
index 0000000000..e23df62cc2
--- /dev/null
+++ b/src/documentation/content/xdocs/components/hpbf/file-format.xml
@@ -0,0 +1,197 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<!--
+ ====================================================================
+ Licensed to the Apache Software Foundation (ASF) under one or more
+ contributor license agreements. See the NOTICE file distributed with
+ this work for additional information regarding copyright ownership.
+ The ASF licenses this file to You under the Apache License, Version 2.0
+ (the "License"); you may not use this file except in compliance with
+ the License. You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+ Unless required by applicable law or agreed to in writing, software
+ distributed under the License is distributed on an "AS IS" BASIS,
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ See the License for the specific language governing permissions and
+ limitations under the License.
+ ====================================================================
+-->
+<!DOCTYPE document PUBLIC "-//APACHE//DTD Documentation V2.0//EN" "document-v20.dtd">
+
+<document>
+ <header>
+ <title>POI-HPBF - A Guide to the Publisher File Format</title>
+ <subtitle>Overview</subtitle>
+ <authors>
+ <person name="Nick Burch" email="nick at torchbox dot com"/>
+ </authors>
+ </header>
+
+ <body>
+ <section><title>Document Streams</title>
+ <p>
+ The file is made up of a number of POIFS streams. A typical
+ file will be made up as follows:
+ </p>
+<source>
+Root Entry -
+ Objects -
+ (no children)
+ SummaryInformation &lt;(0x05)SummaryInformation&gt;
+ DocumentSummaryInformation &lt;(0x05)DocumentSummaryInformation&gt;
+ Escher -
+ EscherStm
+ EscherDelayStm
+ Quill -
+ QuillSub -
+ CONTENTS
+ CompObj &lt;(0x01)CompObj&gt;
+ Envelope
+ Contents
+ Internal &lt;(0x03)Internal&gt;
+ CompObj &lt;(0x01)CompObj&gt;
+ VBA -
+ (no children)
+</source>
+ </section>
+ <section><title>Changing Text</title>
+ <p>If you make a change to the text of a file, but do not change
+ how much text there is, then the <em>CONTENTS</em> stream
+ will undergo a small change, and the <em>Contents</em> stream
+ will undergo a large change.</p>
+ <p>If you make a change to the text of a file, and change the
+ amount of text there is, then both the <em>Contents</em> and
+ the <em>CONTENTS</em> streams change.</p>
+ </section>
+ <section><title>Changing Shapes</title>
+ <p>If you alter the size of a textbox, but make no text changes,
+ then both <em>Contents</em> and <em>CONTENTS</em> streams
+ change. There are no changes to the Escher streams.</p>
+ <p>If you set the background colour of a textbox, but make
+ no changes to the text, (to finish off)</p>
+ </section>
+ <section><title>Structure of CONTENTS</title>
+ <p>First we have "CHNKINK ", followed by 24 bytes.</p>
+ <p>Next we have 20 sequences of 24 bytes each. If the first two bytes
+ are <code>18 00</code>, then that sequence entry exists, but if they are
+ <code>00 00</code> then the entry doesn't exist. If it does exist, we then have 4 bytes of
+ upper case ASCII text, followed by three little endian shorts.
+ The first of these seems to be the count of that type, the second is
+ usually 1, the third is usually zero. Then we have another 4 bytes of
+ upper case ASCII text, normally but not always the same as the first
+ text. Finally, we have an unsigned little endian 32 bit offset to
+ the start of the data for this entry, then an unsigned little endian
+ 32 bit value giving the length of this section.</p>
+ <p>Normally, the first sequence entry is for TEXT, and the text data
+ will start at 0x200. After that there are normally two or three STSH entries
+ (so the first short has values 0, then 1, then 2). After that it
+ seems to vary.</p>
+ <p>At 0x200 we have the text, stored as little endian 16 bit Unicode (UTF-16LE).</p>
+ <p>After the text comes all sorts of other stuff, presumably as
+ described by the sequences.</p>
+ <p>For a CONTENTS stream of length 7168 / 0x1c00 bytes, the start
+ looks something like:</p>
+<source>
+CHNKINK // "CHNKINK "
+04 00 07 00 // Normally 04 00 07 00
+13 00 00 03 // Normally ## 00 00 03
+00 02 00 00 // Normally 00 ## 00 00
+00 1c 00 00 // Normally length of the stream
+f8 01 13 00 // Normally f8 01 11/13 00
+ff ff ff ff // Normally seems to be ffffffff
+
+18 00
+TEXT 00 00 01 00 00 00 // TEXT 0 1 0
+TEXT 00 02 00 00 d0 03 00 00 // TEXT from: 200 (512), len: 3d0 (976)
+18 00
+STSH 00 00 01 00 00 00 // STSH 0 1 0
+STSH d0 05 00 00 1e 00 00 00 // STSH from: 5d0 (1488), len: 1e (30)
+18 00
+STSH 01 00 01 00 00 00 // STSH 1 1 0
+STSH ee 05 00 00 b8 01 00 00 // STSH from: 5ee (1518), len: 1b8 (440)
+18 00
+STSH 02 00 01 00 00 00 // STSH 2 1 0
+STSH a6 07 00 00 3c 00 00 00 // STSH from: 7a6 (1958), len: 3c (60)
+18 00
+FDPP 00 00 01 00 00 00 // FDPP 0 1 0
+FDPP 00 08 00 00 00 02 00 00 // FDPP from: 800 (2048), len: 200 (512)
+18 00
+FDPC 00 00 01 00 00 00 // FDPC 0 1 0
+FDPC 00 0a 00 00 00 02 00 00 // FDPC from: a00 (2560), len: 200 (512)
+18 00
+FDPC 01 00 01 00 00 00 // FDPC 1 1 0
+FDPC 00 0c 00 00 00 02 00 00 // FDPC from: c00 (3072), len: 200 (512)
+18 00
+SYID 00 00 01 00 00 00 // SYID 0 1 0
+SYID 00 0e 00 00 20 00 00 00 // SYID from: e00 (3584), len: 20 (32)
+18 00
+SGP 00 00 01 00 00 00 // SGP 0 1 0
+SGP 20 0e 00 00 0a 00 00 00 // SGP from: e20 (3616), len: a (10)
+18 00
+INK 00 00 01 00 00 00 // INK 0 1 0
+INK 2a 0e 00 00 04 00 00 00 // INK from: e2a (3626), len: 4 (4)
+18 00
+BTEP 00 00 01 00 00 00 // BTEP 0 1 0
+PLC 2e 0e 00 00 18 00 00 00 // PLC from: e2e (3630), len: 18 (24)
+18 00
+BTEC 00 00 01 00 00 00 // BTEC 0 1 0
+PLC 46 0e 00 00 20 00 00 00 // PLC from: e46 (3654), len: 20 (32)
+18 00
+FONT 00 00 01 00 00 00 // FONT 0 1 0
+FONT 66 0e 00 00 48 03 00 00 // FONT from: e66 (3686), len: 348 (840)
+18 00
+TCD 03 00 01 00 00 00 // TCD 3 1 0
+PLC ae 11 00 00 24 00 00 00 // PLC from: 11ae (4526), len: 24 (36)
+18 00
+TOKN 04 00 01 00 00 00 // TOKN 4 1 0
+PLC d2 11 00 00 0a 01 00 00 // PLC from: 11d2 (4562), len: 10a (266)
+18 00
+TOKN 05 00 01 00 00 00 // TOKN 5 1 0
+PLC dc 12 00 00 2a 01 00 00 // PLC from: 12dc (4828), len: 12a (298)
+18 00
+STRS 00 00 01 00 00 00 // STRS 0 1 0
+PLC 06 14 00 00 46 00 00 00 // PLC from: 1406 (5126), len: 46 (70)
+18 00
+MCLD 00 00 01 00 00 00 // MCLD 0 1 0
+MCLD 4c 14 00 00 16 06 00 00 // MCLD from: 144c (5196), len: 616 (1558)
+18 00
+PL 00 00 01 00 00 00 // PL 0 1 0
+PL 62 1a 00 00 48 00 00 00 // PL from: 1a62 (6754), len: 48 (72)
+00 00 // Blank entry follows
+00 00 00 00 00 00
+00 00 00 00 00 00 00 00
+00 00 00 00 00 00 00 00
+
+(the text will then start)
+</source>
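+ <p>Reading the offsets and lengths in this dump, each block appears to
+ start exactly where the previous one ends, with one exception: the last
+ STSH block ends at 0x7e2, but FDPP starts at 0x800. Whether that gap is
+ padding is only a guess. The hypothetical Java helper below (not part of
+ POI) makes this check mechanical, which may help when comparing other
+ files.</p>

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of POI: reports where an indexed block does
// not start exactly where the previous block ends.
final class QuillLayoutCheck {
    /** @param blocks {offset, length} pairs, in index order */
    static List<String> describeGaps(long[][] blocks) {
        List<String> notes = new ArrayList<>();
        for (int i = 1; i < blocks.length; i++) {
            long previousEnd = blocks[i - 1][0] + blocks[i - 1][1];
            long start = blocks[i][0];
            if (start > previousEnd) {
                notes.add(String.format("gap 0x%x-0x%x before block %d", previousEnd, start, i));
            } else if (start < previousEnd) {
                notes.add(String.format("overlap at 0x%x in block %d", start, i));
            }
        }
        return notes;
    }
}
```

+ <p>Given the 19 <code>{offset, length}</code> pairs from the dump above,
+ in index order, it returns the single note
+ <code>gap 0x7e2-0x800 before block 4</code>.</p>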
+ <p>We think that the first 4 bytes of text describe the
+ function of the data at the offset. The first short is
+ then the count of that type, e.g. the second entry of a given type
+ has 1. We think that the second 4 bytes of text describe the format
+ of the data block at the offset. The format of the text block
+ is easy, but we're still trying to figure out the others.</p>
+
+ <section><title>Structure of TEXT bit</title>
+ <p>This is very simple. All the text for the document is
+ stored in a single bit of the Quill CONTENTS. The text
+ is stored as little endian 16 bit unicode strings.</p>
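+ <p>Given the offset and length from the TEXT index entry, decoding the
+ text is short work. The sketch below is a hypothetical helper, not POI's
+ code; it assumes the block holds whole 2 byte code units and decodes
+ them as UTF-16LE.</p>

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of POI: decodes the TEXT block of a Quill CONTENTS stream.
final class QuillTextSketch {
    /**
     * @param offset start of the TEXT block, from its index entry (0x200 in the dump above)
     * @param length length of the TEXT block in bytes (0x3d0 in the dump above)
     */
    static String readText(byte[] contents, int offset, int length) {
        if (offset < 0 || length < 0 || length % 2 != 0 || offset + length > contents.length) {
            throw new IllegalArgumentException("TEXT block does not fit the stream");
        }
        // Little endian 16 bit unicode, so decode as UTF-16LE.
        return new String(contents, offset, length, StandardCharsets.UTF_16LE);
    }
}
```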
+ </section>
+ <section><title>Structure of PLC bit</title>
+ <p>The first four bytes seem to hold the count of the
+ entries in the bit, and the second four bytes seem to hold
+ the type. There is then some pre-data, and then data for
+ each of the entries, the exact format dependent on the type.</p>
+ <p>Type 0 has four 2 byte unsigned ints, then a pair of 2 byte
+ unsigned ints for each entry.</p>
+ <p>Type 4 has four 2 byte unsigned ints, then a pair of 4 byte
+ unsigned ints for each entry.</p>
+ <p>Type 8 has seven 2 byte unsigned ints, then a pair of 4 byte
+ unsigned ints for each entry.</p>
+ <p>Type 12 holds hyperlinks, and is much more complex.
+ See <a href="https://github.com/apache/poi/tree/trunk/poi-scratchpad/src/main/java/org/apache/poi/hpbf/model/qcbits/QCPLCBit.java?view=markup"><code>org.apache.poi.hpbf.model.qcbits.QCPLCBit</code></a>
+ for our best guess as to how the contents match up.</p>
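+ <p>The PLC layouts for types 0, 4 and 8 described above could be read
+ roughly as follows. This is a hypothetical Java sketch written against
+ our current guesses, not the <code>QCPLCBit</code> implementation; it
+ deliberately rejects type 12 and any other type, and treats all values
+ as unsigned little endian.</p>

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical sketch, not POI's QCPLCBit: parses PLC types 0, 4 and 8 as guessed above.
final class QuillPlcSketch {
    final long count;     // first four bytes: number of entries
    final long type;      // second four bytes: PLC type
    final int[] preData;  // the 2 byte unsigned "pre-data" values
    final long[][] pairs; // one pair of values per entry

    private QuillPlcSketch(long count, long type, int[] preData, long[][] pairs) {
        this.count = count;
        this.type = type;
        this.preData = preData;
        this.pairs = pairs;
    }

    static QuillPlcSketch parse(byte[] block) {
        ByteBuffer buf = ByteBuffer.wrap(block).order(ByteOrder.LITTLE_ENDIAN);
        long count = Integer.toUnsignedLong(buf.getInt());
        long type = Integer.toUnsignedLong(buf.getInt());

        int preDataShorts;
        boolean fourBytePairs;
        if (type == 0) {
            preDataShorts = 4;
            fourBytePairs = false;
        } else if (type == 4) {
            preDataShorts = 4;
            fourBytePairs = true;
        } else if (type == 8) {
            preDataShorts = 7;
            fourBytePairs = true;
        } else {
            // Type 12 (hyperlinks) and anything unknown is out of scope here.
            throw new UnsupportedOperationException("PLC type " + type + " not handled");
        }

        int[] preData = new int[preDataShorts];
        for (int i = 0; i < preDataShorts; i++) {
            preData[i] = Short.toUnsignedInt(buf.getShort());
        }

        // Refuse counts that cannot fit in the rest of the block.
        long bytesPerEntry = fourBytePairs ? 8 : 4;
        if (count * bytesPerEntry > buf.remaining()) {
            throw new IllegalArgumentException("PLC claims " + count + " entries but the block is too short");
        }
        long[][] pairs = new long[(int) count][2];
        for (long[] pair : pairs) {
            for (int j = 0; j < 2; j++) {
                pair[j] = fourBytePairs
                        ? Integer.toUnsignedLong(buf.getInt())
                        : Short.toUnsignedInt(buf.getShort());
            }
        }
        return new QuillPlcSketch(count, type, preData, pairs);
    }
}
```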
+ </section>
+ </section>
+ </body>
+</document>
diff --git a/src/documentation/content/xdocs/components/hpbf/index.xml b/src/documentation/content/xdocs/components/hpbf/index.xml
new file mode 100644
index 0000000000..92e4693d3e
--- /dev/null
+++ b/src/documentation/content/xdocs/components/hpbf/index.xml
@@ -0,0 +1,77 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<!--
+ ====================================================================
+ Licensed to the Apache Software Foundation (ASF) under one or more
+ contributor license agreements. See the NOTICE file distributed with
+ this work for additional information regarding copyright ownership.
+ The ASF licenses this file to You under the Apache License, Version 2.0
+ (the "License"); you may not use this file except in compliance with
+ the License. You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+ Unless required by applicable law or agreed to in writing, software
+ distributed under the License is distributed on an "AS IS" BASIS,
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ See the License for the specific language governing permissions and
+ limitations under the License.
+ ====================================================================
+-->
+<!DOCTYPE document PUBLIC "-//APACHE//DTD Documentation V2.0//EN" "document-v20.dtd">
+
+<document>
+ <header>
+ <title>POI-HPBF - Java API To Access Microsoft Publisher Format Files</title>
+ <subtitle>Overview</subtitle>
+ <authors>
+ <person name="Nick Burch" email="nick at apache dot org"/>
+ </authors>
+ </header>
+
+ <body>
+ <section>
+ <title>Overview</title>
+
+ <p>HPBF is the POI Project's pure Java implementation of the
+ Publisher file format.</p>
+ <p>Currently, HPBF is in an early stage, whilst we try to
+ figure out the file format. So far, we have basic text
+ extraction support, and are able to read some parts within
+ the file. Writing is not yet supported, as we are unable
+ to make sense of the Contents stream, which we think has
+ lots of offsets to other parts of the file.</p>
+ <p>Our initial aim is to produce a text extractor for the format
+ (now done), and to be able to extract hyperlinks from within
+ the document (partly supported). Additional low level
+ code to process the file format may follow, if demand and
+ developer interest warrant it.</p>
+ <p>Text extraction is available via the
+ <em>org.apache.poi.hpbf.extractor.PublisherTextExtractor</em>
+ class.</p>
+ <p>At this time, there is no <em>usermodel</em> API or similar.
+ There is only low level support for certain parts of
+ the file, but by no means all of it.</p>
+ <p>Our current understanding of the file format is documented
+ <a href="site:hpbformat">here</a>.</p>
+ <p>As of 2017, we are unaware of a public format specification for
+ Microsoft Publisher .pub files. This format was not included in
+ the Microsoft Open Specifications Promise with the rest of the
+ Microsoft Office file formats.
+ As of <a href="https://social.msdn.microsoft.com/Forums/en-US/63dc6c4e-d6b2-4873-97dd-139ddb304e24/what-about-publisher-file-format?forum=os_binaryfile">2009</a> and <a href="https://social.msdn.microsoft.com/Forums/en-US/a5f55c72-5378-4dc9-944a-9973a12bfaa7/reading-viso-vsdfiles-and-publisher-pubfiles-without-office?forum=os_binaryfile">2016</a>, Microsoft had no plans to document the .pub file format.
+ If this changes in the future, perhaps we will see a spec published
+ on the <a href="https://msdn.microsoft.com/en-us/library/cc313105(v=office.12).aspx">Microsoft Office File Format Open Specification Technical Documentation</a>.
+ </p>
+
+ <note>
+ This code currently lives in the
+ <a href="https://github.com/apache/poi/tree/trunk/poi-scratchpad/">scratchpad area</a>
+ of the POI Git repository. To use this component, ensure
+ you have the Scratchpad Jar on your classpath, or a dependency
+ defined on the <em>poi-scratchpad</em> artifact - the main POI
+ jar is not enough! See the
+ <a href="site:components">POI Components Map</a>
+ for more details.
+ </note>
+ </section>
+ </body>
+</document>
diff --git a/src/documentation/content/xdocs/components/hpsf/how-to.xml b/src/documentation/content/xdocs/components/hpsf/how-to.xml
new file mode 100644
index 0000000000..a2c7c61acf
--- /dev/null
+++ b/src/documentation/content/xdocs/components/hpsf/how-to.xml
@@ -0,0 +1,1477 @@
+<?xml version="1.0" encoding="iso-8859-1"?>
+<!--
+ ====================================================================
+ Licensed to the Apache Software Foundation (ASF) under one or more
+ contributor license agreements. See the NOTICE file distributed with
+ this work for additional information regarding copyright ownership.
+ The ASF licenses this file to You under the Apache License, Version 2.0
+ (the "License"); you may not use this file except in compliance with
+ the License. You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+ Unless required by applicable law or agreed to in writing, software
+ distributed under the License is distributed on an "AS IS" BASIS,
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ See the License for the specific language governing permissions and
+ limitations under the License.
+ ====================================================================
+-->
+<!DOCTYPE document PUBLIC "-//APACHE//DTD Documentation V2.0//EN" "document-v20.dtd">
+
+<document>
+ <header>
+ <title>HPSF HOW-TO</title>
+ <authors>
+ <person name="Rainer Klute" email="klute@apache.org"/>
+ </authors>
+ </header>
+ <body>
+ <section><title>How To Use the HPSF API</title>
+
+ <p>This HOW-TO is organized in five sections. You should read them
+ sequentially because the later sections build upon the earlier ones.</p>
+
+ <ol>
+ <li>
+ The <a href="#sec1">first section</a> explains how to <strong>read
+ the most important standard properties</strong> of a Microsoft Office
+ document. Standard properties are things like title, author, creation
+ date etc. It is quite likely that you will find here what you need and
+ don't have to read the other sections.
+ </li>
+
+ <li>
+ The <a href="#sec2">second section</a> goes a small step
+ further and focuses on <strong>reading additional standard
+ properties</strong>. It also talks about <strong>exceptions</strong> that
+ may be thrown when dealing with HPSF and shows how you can <strong>read
+ properties of embedded objects</strong>.
+ </li>
+
+ <li>
+ The <a href="#sec3">third section</a> explains how to <strong>write
+ standard properties</strong>. HPSF provides some high-level classes and
+ methods which make writing of standard properties easy. They are based on
+ the low-level writing functions explained in the <a href="#sec5">fifth
+ section</a>.
+ </li>
+
+ <li>
+ The <a href="#sec4">fourth section</a> tells how to <strong>read
+ non-standard properties</strong>. Non-standard properties are
+ application-specific triples consisting of an ID, a type, and a value.
+ </li>
+
+ <li>
+ The <a href="#sec5">fifth section</a> tells you how to <strong>write
+ property set streams</strong> using HPSF's low-level methods. You have to
+ understand the <a href="#sec4">fourth section</a> before you think
+ about writing properties at a low level. Check the Javadoc API
+ documentation to find out about the details!
+ </li>
+ </ol>
+
+ <note><strong>Please note:</strong> HPSF's writing functionality is
+ <strong>not</strong> present in POI releases up to and including 2.5. In
+ order to write properties you have to download a 3.0.x POI release,
+ or retrieve the POI development version from the <a
+ href="site:git">Git repository</a>.</note>
+
+
+
+ <anchor id="sec1"/>
+ <section><title>Reading Standard Properties</title>
+
+ <note>This section explains how to read the most important standard
+ properties of a Microsoft Office document. Standard properties are things
+ like title, author, creation date etc. This section introduces the
+ <strong>summary information stream</strong> which is used to keep these
+ properties. Chances are that you will find here what you need and don't
+ have to read the other sections.</note>
+
+ <p>If all you are interested in is getting the textual content of
+ all the document properties, such as for full text indexing, then
+ take a look at
+ <code>org.apache.poi.hpsf.extractor.HPSFPropertiesExtractor</code>. However,
+ if you want full access to the properties, please read on!</p>
+
+ <p>The first thing you should understand is that a Microsoft Office file is
+ not one large bunch of bytes but has an internal filesystem structure with
+ files and directories. You can access these files and directories using
+ the API that the <a href="../poifs/index.html">POI filesystem (POIFS)</a>
+ component provides. A file or document in a POI filesystem is also called a
+ <strong>stream</strong>. The properties of, say, an Excel document are
+ stored apart from the actual spreadsheet data in separate streams. The good
+ news is that this separation makes the properties independent of the
+ concrete Microsoft Office file. In the following text we will always say
+ "POI filesystem" instead of "Microsoft Office file" because a POI
+ filesystem is not necessarily created by or for a Microsoft Office
+ application, because it is shorter, and because we want to avoid the name
+ of That Redmond Company.</p>
+
+ <p>The following example shows how to read the "title" property. Reading
+ other properties is similar. Consult the API documentation of the class
+ <code>org.apache.poi.hpsf.SummaryInformation</code> to learn which methods
+ are available.</p>
+
+ <p>The standard properties this section focuses on can be found in a
+ document called <em>\005SummaryInformation</em> located in the root of the
+ POI filesystem. The notation <em>\005</em> in the document's name means
+ the character with a decimal value of 5. In order to read the "title"
+ property, an application has to perform the following steps:</p>
+
+ <ol>
+ <li>
+ Open the document <em>\005SummaryInformation</em> located in the root
+ of the POI filesystem.
+ </li>
+ <li>
+ Create an instance of the class <code>SummaryInformation</code> from
+ that document.
+ </li>
+ <li>
+ Call the <code>SummaryInformation</code> instance's
+ <code>getTitle()</code> method.
+ </li>
+ </ol>
+
+ <p>Sounds easy, doesn't it? Here are the steps in detail.</p>
+
+
+ <section><title>Open the document \005SummaryInformation in the root of the
+ POI filesystem</title>
+
+ <p>An application that wants to open a document in a POI filesystem
+ (POIFS) proceeds as shown by the following code fragment. The full
+ source code of the sample application is available in the
+ <em>examples</em> section of the POI source tree as
+ <em>ReadTitle.java</em>.</p>
+
+ <source>
+import java.io.*;
+import org.apache.poi.hpsf.*;
+import org.apache.poi.poifs.eventfilesystem.*;
+
+// ...
+
+public static void main(String[] args)
+ throws IOException
+{
+ final String filename = args[0];
+ POIFSReader r = new POIFSReader();
+ r.registerListener(new MyPOIFSReaderListener(),
+ "\005SummaryInformation");
+ r.read(new FileInputStream(filename));
+}</source>
+
+ <p>The first interesting statement is</p>
+
+ <source>POIFSReader r = new POIFSReader();</source>
+
+ <p>It creates a
+ <code>org.apache.poi.poifs.eventfilesystem.POIFSReader</code> instance
+ which we shall need to read the POI filesystem. Before the application
+ actually opens the POI filesystem we have to tell the
+ <code>POIFSReader</code> which documents we are interested in. In this
+ case the application should do something with the document
+ <em>\005SummaryInformation</em>.</p>
+
+ <source>
+r.registerListener(new MyPOIFSReaderListener(),
+ "\005SummaryInformation");</source>
+
+ <p>This method call registers a
+ <code>org.apache.poi.poifs.eventfilesystem.POIFSReaderListener</code>
+ with the <code>POIFSReader</code>. The <code>POIFSReaderListener</code>
+ interface specifies the method <code>processPOIFSReaderEvent()</code>
+ which processes a document. The class
+ <code>MyPOIFSReaderListener</code> implements the
+ <code>POIFSReaderListener</code> interface and thus the
+ <code>processPOIFSReaderEvent()</code> method. The eventing POI
+ filesystem calls this method when it finds the
+ <em>\005SummaryInformation</em> document. In the sample application
+ <code>MyPOIFSReaderListener</code> is a static nested class in the
+ <em>ReadTitle.java</em> source file.</p>
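+ <p>The stream name in the <code>registerListener()</code> call is an
+ ordinary Java string literal in which <code>\005</code> is an octal
+ escape. The following sketch (class and constant names are our own, not
+ POI's; POI itself offers constants such as
+ <code>DocumentSummaryInformation.DEFAULT_STREAM_NAME</code>) shows that the
+ escape yields a single character with value 5, and how such a name could
+ be printed readably:</p>

```java
// Illustrates the "\005" notation. The names below are hypothetical, not POI API.
final class StreamNames {
    // In Java source "\005" is an octal escape for the single character with
    // value 5, so this name is 19 characters long.
    static final String SUMMARY_INFORMATION = "\005SummaryInformation";
    static final String DOCUMENT_SUMMARY_INFORMATION = "\005DocumentSummaryInformation";

    /** Renders control characters as octal escapes, e.g. for log output. */
    static String printable(String streamName) {
        StringBuilder sb = new StringBuilder();
        for (char c : streamName.toCharArray()) {
            if (c < 0x20) {
                sb.append(String.format("\\%03o", (int) c));
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }
}
```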
+
+ <p>Now everything is prepared and reading the POI filesystem can
+ start:</p>
+
+ <source>r.read(new FileInputStream(filename));</source>
+
+ <p>The following source code fragment shows the
+ <code>MyPOIFSReaderListener</code> class and how it retrieves the
+ title.</p>
+
+ <source>
+static class MyPOIFSReaderListener implements POIFSReaderListener
+{
+ public void processPOIFSReaderEvent(POIFSReaderEvent event)
+ {
+ SummaryInformation si = null;
+ try
+ {
+ si = (SummaryInformation)
+ PropertySetFactory.create(event.getStream());
+ }
+ catch (Exception ex)
+ {
+ throw new RuntimeException
+ ("Property set stream \"" +
+ event.getPath() + event.getName() + "\": " + ex);
+ }
+ final String title = si.getTitle();
+ if (title != null)
+ System.out.println("Title: \"" + title + "\"");
+ else
+ System.out.println("Document has no title.");
+ }
+}
+</source>
+
+ <p>The line</p>
+
+ <source>SummaryInformation si = null;</source>
+
+ <p>declares a <code>SummaryInformation</code> variable and initializes it
+ with <code>null</code>. We need an instance of this class to access the
+ title. The instance is created in a <code>try</code> block:</p>
+
+ <source>si = (SummaryInformation)
+ PropertySetFactory.create(event.getStream());</source>
+
+ <p>The expression <code>event.getStream()</code> returns the input stream
+ containing the bytes of the property set stream named
+ <em>\005SummaryInformation</em>. This stream is passed into the
+ <code>create</code> method of the factory class
+ <code>org.apache.poi.hpsf.PropertySetFactory</code> which returns
+ an <code>org.apache.poi.hpsf.PropertySet</code> instance. It is more or
+ less safe to cast this result to <code>SummaryInformation</code>, a
+ convenience class with methods like <code>getTitle()</code>,
+ <code>getAuthor()</code> etc.</p>
+
+ <p>The <code>PropertySetFactory.create()</code> method may throw all sorts
+ of exceptions. We'll deal with them in the next sections. For now we just
+ catch all exceptions and throw a <code>RuntimeException</code>
+ containing the message text of the original exception.</p>
+
+ <p>If all goes well, the sample application retrieves the title and prints
+ it to the standard output. As you can see you must be prepared for the
+ case that the POI filesystem does not have a title.</p>
+
+ <source>final String title = si.getTitle();
+if (title != null)
+ System.out.println("Title: \"" + title + "\"");
+else
+ System.out.println("Document has no title.");</source>
+
+ <p>Please note that a POI filesystem does not necessarily contain the
+ <em>\005SummaryInformation</em> stream. The documents created by the
+ Microsoft Office suite have one, as far as I know. However, an Excel
+ spreadsheet exported from StarOffice 5.2 won't have a
+ <em>\005SummaryInformation</em> stream. In this case the application
+ won't throw an exception; the <code>POIFSReader</code> simply does not call
+ the <code>processPOIFSReaderEvent</code> method. You have been warned!</p>
+ </section>
+ </section>
+
+ <anchor id="sec2"/>
+ <section><title>Additional Standard Properties, Exceptions And Embedded
+ Objects</title>
+
+ <note>This section focusses on reading additional standard properties which
+ are kept in the <strong>document summary information</strong> stream. It
+ also talks about exceptions that may be thrown when dealing with HPSF and
+ shows how you can read properties of embedded objects.</note>
+
+ <p>A couple of <strong>additional standard properties</strong> are not
+ contained in the <em>\005SummaryInformation</em> stream explained
+ above. Examples of such properties are a document's category or the
+ number of multimedia clips in a PowerPoint presentation. Microsoft has
+ invented an additional stream named
+ <em>\005DocumentSummaryInformation</em> to hold these properties. With two
+ minor exceptions you can proceed exactly as described above to read the
+ properties stored in <em>\005DocumentSummaryInformation</em>:</p>
+
+ <ul>
+ <li>Instead of <em>\005SummaryInformation</em> use
+ <em>\005DocumentSummaryInformation</em> as the stream's name.</li>
+ <li>Replace all occurrences of the class
+ <code>SummaryInformation</code> by
+ <code>DocumentSummaryInformation</code>.</li>
+ </ul>
+
+ <p>And of course you cannot call <code>getTitle()</code> because
+ <code>DocumentSummaryInformation</code> has different query methods,
+ e.g. <code>getCategory()</code>. See the Javadoc API documentation for the
+ details.</p>
+
+ <p>In the previous section the application simply caught all
+ <strong>exceptions</strong> and was in no way interested in any
+ details. However, a real application will likely want to know what went
+ wrong and act appropriately. Besides any I/O exceptions there are three
+ HPSF- or POI-specific exceptions you should know about:</p>
+
+ <dl>
+ <dt><code>NoPropertySetStreamException</code></dt>
+ <dd>
+ This exception is thrown if the application tries to create a
+ <code>PropertySet</code> instance from a stream that is not a
+ property set stream. (<code>SummaryInformation</code> and
+ <code>DocumentSummaryInformation</code> are subclasses of
+ <code>PropertySet</code>.) A faulty property set stream counts as not
+ being a property set stream at all. An application should be prepared to
+ deal with this case even if it opens streams named
+ <em>\005SummaryInformation</em> or
+ <em>\005DocumentSummaryInformation</em>. These are just names. A
+ stream's name by itself does not ensure that the stream contains the
+ expected contents and that these contents are correct.
+ </dd>
+
+ <dt><code>UnexpectedPropertySetTypeException</code></dt>
+ <dd>This exception is thrown if a certain type of property set is
+ expected somewhere (e.g. a <code>SummaryInformation</code> or
+ <code>DocumentSummaryInformation</code>) but the provided property
+ set is not of that type.</dd>
+
+ <dt><code>MarkUnsupportedException</code></dt>
+ <dd>This exception is thrown if an input stream that is to be parsed
+ into a property set does not support the
+ <code>InputStream.mark(int)</code> operation. The POI filesystem uses
+ the <code>DocumentInputStream</code> class which does support this
+ operation, so you are safe here. However, if you read a property set
+ stream from another kind of input stream things may be
+ different.</dd>
+ </dl>
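+ <p>If you do need to parse a property set from a stream that lacks
+ <code>mark(int)</code> support, one common workaround is to wrap it in a
+ <code>java.io.BufferedInputStream</code>, which always supports mark and
+ reset. A minimal sketch, with a hypothetical helper name:</p>

```java
import java.io.BufferedInputStream;
import java.io.InputStream;

// Hypothetical helper, not part of POI: guarantees mark(int)/reset() support
// before a stream is parsed as a property set.
final class MarkSupport {
    static InputStream ensureMarkSupported(InputStream in) {
        // BufferedInputStream always supports mark/reset, so wrap only when needed.
        return in.markSupported() ? in : new BufferedInputStream(in);
    }
}
```

+ <p>The wrapped stream can then be handed to
+ <code>PropertySetFactory.create()</code>. Recent POI versions may already
+ take care of this internally, so treat the helper as a defensive measure
+ rather than a requirement.</p>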
+
+ <p>Many Microsoft Office documents contain <strong>embedded
+ objects</strong>, for example an Excel sheet within a Word
+ document. Embedded objects may have property sets of their own. An
+ application can open these property set streams as described above. The
+ only difference is that they are not located in the POI filesystem's root
+ but in a <strong>nested directory</strong> instead. Just register a
+ <code>POIFSReaderListener</code> for the property set streams you are
+ interested in.</p>
+ </section>
+
+
+
+ <anchor id="sec3"/>
+ <section><title>Writing Standard Properties</title>
+
+ <note>This section explains how to <strong>write standard
+ properties</strong>. HPSF provides some high-level classes and methods
+ which make writing of standard properties easy. They are based on the
+ low-level writing functions explained in <a href="#sec5">another
+ section</a>.</note>
+
+ <p>As explained above, standard properties are located in the summary
+ information and document summary information streams of typical POI
+ filesystems. You have already learned about the classes
+ <code>SummaryInformation</code> and
+ <code>DocumentSummaryInformation</code> and their <code>get...()</code>
+ methods for reading standard properties. These classes also provide
+ <code>set...()</code> methods for writing properties.</p>
+
+ <p>After setting properties in <code>SummaryInformation</code> or
+ <code>DocumentSummaryInformation</code> you have to write them to a disk
+ file. The following sample program shows how you can</p>
+
+ <ol>
+ <li>read a disk file into a POI filesystem,</li>
+ <li>read the document summary information from the POI filesystem,</li>
+ <li>set a property to a new value,</li>
+ <li>write the modified document summary information back to the POI
+ filesystem, and</li>
+ <li>write the POI filesystem to a disk file.</li>
+ </ol>
+
+ <p>The complete source code of this program is available as
+ <em>ModifyDocumentSummaryInformation.java</em> in the <em>examples</em>
+ section of the POI source tree.</p>
+
+ <note>Dealing with the summary information stream is analogous to handling
+ the document summary information and therefore does not need to be
+ explained here in detail. See the HPSF API documentation to learn about
+ the <code>set...()</code> methods of the class
+ <code>SummaryInformation</code>.</note>
+
+ <p>The first step is to read the POI filesystem into memory:</p>
+
+ <source>InputStream is = new FileInputStream(poiFilesystem);
+POIFSFileSystem poifs = new POIFSFileSystem(is);
+is.close();</source>
+
+ <p>The code snippet above assumes that the variable
+ <code>poiFilesystem</code> holds the name of a disk file. It reads the
+ file from an input stream and creates a <code>POIFSFileSystem</code>
+ object in memory. After having read the file, the input stream should be
+ closed as shown.</p>
+
+ <p>In order to read the document summary information stream the application
+ must open the element <em>\005DocumentSummaryInformation</em> in the POI
+ filesystem's root directory. However, the POI filesystem does not
+ necessarily contain a document summary information stream, and the
+ application should be able to deal with that situation. The following
+ code does so by creating a new <code>DocumentSummaryInformation</code> if
+ there is none in the POI filesystem:</p>
+
+ <source>DirectoryEntry dir = poifs.getRoot();
+DocumentSummaryInformation dsi;
+try
+{
+ DocumentEntry dsiEntry = (DocumentEntry)
+ dir.getEntry(DocumentSummaryInformation.DEFAULT_STREAM_NAME);
+ DocumentInputStream dis = new DocumentInputStream(dsiEntry);
+ PropertySet ps = new PropertySet(dis);
+ dis.close();
+ dsi = new DocumentSummaryInformation(ps);
+}
+catch (FileNotFoundException ex)
+{
+ /* There is no document summary information. We have to create a
+ * new one. */
+ dsi = PropertySetFactory.newDocumentSummaryInformation();
+}
+ </source>
+
+ <p>In the source code above the statement</p>
+
+ <source>DirectoryEntry dir = poifs.getRoot();</source>
+
+ <p>gets hold of the POI filesystem's root directory as a
+ <code>DirectoryEntry</code>. The <code>getEntry()</code> method of this
+ class is used to access a file or directory entry in a directory. However,
+ if the file to be opened does not exist, a
+ <code>FileNotFoundException</code> will be thrown. Therefore opening the
+ document summary information entry should be done in a <code>try</code>
+ block:</p>
+
+ <source> DocumentEntry dsiEntry = (DocumentEntry)
+ dir.getEntry(DocumentSummaryInformation.DEFAULT_STREAM_NAME);</source>
+
+ <p><code>DocumentSummaryInformation.DEFAULT_STREAM_NAME</code> represents
+ the string "\005DocumentSummaryInformation", i.e. the standard name of a
+ document summary information stream. If this stream exists, the
+ <code>getEntry()</code> method returns a <code>DocumentEntry</code>. To
+ read the <code>DocumentEntry</code>'s contents, create a
+ <code>DocumentInputStream</code>:</p>
+
+ <source> DocumentInputStream dis = new DocumentInputStream(dsiEntry);</source>
+
+ <p>Up to this point we have used POI's <a
+ href="../poifs/index.html">POIFS component</a>. Now HPSF enters the
+ stage. A property set is created from the input stream's data:</p>
+
+ <source> PropertySet ps = new PropertySet(dis);
+ dis.close();
+ dsi = new DocumentSummaryInformation(ps); </source>
+
+ <p>If the data really constitutes a property set, a
+ <code>PropertySet</code> object is created. Otherwise a
+ <code>NoPropertySetStreamException</code> is thrown. After having read the
+ data from the input stream the latter should be closed.</p>
+
+ <p>Since we know - or at least hope - that the stream named
+ "\005DocumentSummaryInformation" is not just any property set but really
+ contains the document summary information, we try to create a new
+ <code>DocumentSummaryInformation</code> from the property set. If the
+ stream is not a document summary information stream, the sample application
+ fails with an <code>UnexpectedPropertySetTypeException</code>.</p>
+
+ <p>If the POI document does not contain a document summary information
+ stream, we can create a new one in the <code>catch</code> clause. The
+ <code>PropertySetFactory</code>'s method
+ <code>newDocumentSummaryInformation()</code> establishes a new and empty
+ <code>DocumentSummaryInformation</code> instance:</p>
+
+ <source> dsi = PropertySetFactory.newDocumentSummaryInformation();</source>
+
+ <p>Whether we read the document summary information from the POI filesystem
+ or created it from scratch, in either case we now have a
+ <code>DocumentSummaryInformation</code> instance we can write to. Writing
+ is quite simple, as the following line of code shows:</p>
+
+ <source>dsi.setCategory("POI example");</source>
+
+ <p>This statement sets the "category" property to "POI example". Any
+ former "category" value will be lost. If there hasn't been a "category"
+ property yet, a new one will be created.</p>
+
+ <p><code>DocumentSummaryInformation</code> of course has methods to set the
+ other standard properties, too - look into the API documentation to see
+ all of them.</p>
+
+ <p>Once all properties are set as needed, they should be stored into the
+ file on disk. The first step is to write the
+ <code>DocumentSummaryInformation</code> into the POI filesystem:</p>
+
+ <source>dsi.write(dir, DocumentSummaryInformation.DEFAULT_STREAM_NAME);</source>
+
+ <p>The <code>DocumentSummaryInformation</code>'s <code>write()</code>
+ method takes two parameters: The first is the <code>DirectoryEntry</code>
+ in the POI filesystem, the second is the name of the stream to create in
+ the directory. If this stream already exists, it will be overwritten.</p>
+
+ <note>If you modified not only the document summary information but also
+ the summary information, you have to write both of them to the POI
+ filesystem.</note>
+
+ <p>The POI filesystem is still a data structure in memory only and must be
+ written to a disk file to make it permanent. The following lines write
+ back the POI filesystem to the file it was read from before. Please note
+ that in production-quality code you should never write directly to the
+ origin file, because in case of an error everything would be lost. Here it
+ is done this way to keep the example short.</p>
+
+ <source>OutputStream out = new FileOutputStream(poiFilesystem);
+poifs.writeFilesystem(out);
+out.close();</source>
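+ <p>For production code, one way to avoid losing the original file is to
+ write to a temporary file in the same directory and only replace the
+ original once writing has succeeded. The following Java sketch shows that
+ pattern using <code>java.nio.file</code>; the <code>SafeWrite</code> class
+ and its <code>ContentWriter</code> callback are hypothetical names, not
+ POI API.</p>

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper, not part of POI: writes to a temporary file in the
// target's directory and only replaces the target once writing succeeded.
final class SafeWrite {
    /** Writes the new content; poifs::writeFilesystem fits this shape. */
    interface ContentWriter {
        void writeTo(OutputStream out) throws IOException;
    }

    static void replaceFile(Path target, ContentWriter writer) throws IOException {
        Path dir = target.toAbsolutePath().getParent();
        Path tmp = Files.createTempFile(dir, "poi-", ".tmp");
        try {
            try (OutputStream out = Files.newOutputStream(tmp)) {
                writer.writeTo(out);
            }
            // The original file is only touched after the new one is complete.
            Files.move(tmp, target, StandardCopyOption.REPLACE_EXISTING);
        } finally {
            Files.deleteIfExists(tmp); // no-op if the move succeeded
        }
    }
}
```

+ <p>With the variables from the example above, the three lines could
+ become <code>SafeWrite.replaceFile(java.nio.file.Paths.get(poiFilesystem),
+ poifs::writeFilesystem);</code>, since
+ <code>writeFilesystem(OutputStream)</code> matches the callback's
+ shape.</p>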
+
+ <section><title>User-Defined Properties</title>
+
+ <p>If you compare the source code excerpts above with the file containing
+ the full source code, you will notice that I left out the following
+ lines of code. They deal with the special topic of custom
+ properties.</p>
+
+ <source>DocumentSummaryInformation dsi = ...
+...
+CustomProperties customProperties = dsi.getCustomProperties();
+if (customProperties == null)
+ customProperties = new CustomProperties();
+
+/* Insert some custom properties into the container. */
+customProperties.put("Key 1", "Value 1");
+customProperties.put("Schlüssel 2", "Wert 2");
+customProperties.put("Sample Number", Integer.valueOf(12345));
+customProperties.put("Sample Boolean", Boolean.TRUE);
+customProperties.put("Sample Date", new Date());
+
+/* Read a custom property. */
+Object value = customProperties.get("Sample Number");
+
+/* Write the custom properties back to the document summary
+ * information. */
+dsi.setCustomProperties(customProperties);</source>
+
+ <p>Custom properties are properties that users define themselves. In
+ Microsoft Word, for example, a user can define such extra properties and
+ give each of them a <strong>name</strong>, a <strong>type</strong> and a
+ <strong>value</strong>. The custom properties are stored in the document
+ summary information along with the standard properties.</p>
+
+ <p>The source code example shows how to retrieve the custom properties
+ as a whole from a <code>DocumentSummaryInformation</code> instance using
+ the <code>getCustomProperties()</code> method. The result is a
+ <code>CustomProperties</code> instance or <code>null</code> if no
+ user-defined properties exist.</p>
+
+ <p>Since <code>CustomProperties</code> implements the <code>Map</code>
+ interface you can read and write properties with the usual
+ <code>Map</code> methods. However, <code>CustomProperties</code> imposes
+ some restrictions on the types of keys and values.</p>
+
+ <ul>
+ <li>The <strong>key</strong> is a string.</li>
+ <li>The <strong>value</strong> is one of <code>String</code>,
+ <code>Boolean</code>, <code>Long</code>, <code>Integer</code>,
+ <code>Short</code>, or <code>java.util.Date</code>.</li>
+ </ul>
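+
+ <p>An application can check these restrictions before calling
+ <code>put()</code>. The following sketch is hypothetical: the helper
+ <code>isAllowedCustomValue()</code> is not part of HPSF, it just mirrors
+ the list of value types above.</p>
+
+ <source>import java.util.Date;
+
+public class CustomValueCheck
+{
+    /* True if the value has one of the types listed for custom property
+     * values: String, Boolean, Long, Integer, Short, or Date. */
+    static boolean isAllowedCustomValue(Object value)
+    {
+        return value instanceof String
+            || value instanceof Boolean
+            || value instanceof Long
+            || value instanceof Integer
+            || value instanceof Short
+            || value instanceof Date;
+    }
+
+    public static void main(String[] args)
+    {
+        System.out.println(isAllowedCustomValue(Integer.valueOf(12345)));      // prints true
+        System.out.println(isAllowedCustomValue(new StringBuilder("text")));   // prints false
+    }
+}</source>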
+
+ <p>The <code>CustomProperties</code> class has been designed for easy
+ access using just keys and values. The underlying Microsoft-specific
+ custom properties data structure is more complicated, but its extra
+ flexibility provides no noteworthy benefits: it allows multiple
+ properties with the same name, or properties without any name at all.
+ When reading custom properties from a document summary information
+ stream, the <code>CustomProperties</code> class ignores properties
+ without a name and keeps only the "last" (whatever that means) of those
+ properties having the same name. You can find out whether a
+ <code>CustomProperties</code> instance dropped any properties with the
+ <code>isPure()</code> method.</p>
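+
+ <p>The reading behaviour just described can be illustrated with a small,
+ self-contained sketch. It does not use HPSF: the <code>RawProperty</code>
+ class and the <code>merge()</code> method are hypothetical stand-ins.
+ Unnamed entries are skipped, a later entry with the same name replaces an
+ earlier one, and the returned flag plays the role of
+ <code>isPure()</code>.</p>
+
+ <source>import java.util.LinkedHashMap;
+import java.util.Map;
+
+public class CustomPropertyMerge
+{
+    /* Hypothetical raw entry as it might be read from the stream. */
+    static final class RawProperty
+    {
+        final String name;   // may be null
+        final Object value;
+        RawProperty(String name, Object value) { this.name = name; this.value = value; }
+    }
+
+    /* Copies named entries into target (expected to be empty); returns
+     * false if any entry was dropped or replaced. */
+    static boolean merge(RawProperty[] raw, Map&lt;String, Object&gt; target)
+    {
+        boolean pure = true;
+        for (RawProperty p : raw)
+        {
+            if (p.name == null)
+            {
+                pure = false;              // unnamed: ignored
+                continue;
+            }
+            if (target.containsKey(p.name))
+                pure = false;              // earlier entry with this name is replaced
+            target.put(p.name, p.value);
+        }
+        return pure;
+    }
+
+    public static void main(String[] args)
+    {
+        RawProperty[] raw = {
+            new RawProperty("Key 1", "Value 1"),
+            new RawProperty(null, "no name"),
+            new RawProperty("Key 1", "Value 1b")
+        };
+        Map&lt;String, Object&gt; props = new LinkedHashMap&lt;&gt;();
+        boolean pure = merge(raw, props);
+        System.out.println(props + " pure=" + pure); // prints {Key 1=Value 1b} pure=false
+    }
+}</source>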
+
+ <p>You can read and write the full spectrum of custom properties with
+ HPSF's low-level methods. They are explained in the <a
+ href="#sec4">next section</a>.</p>
+ </section>
+ </section>
+
+
+
+ <anchor id="sec4"/>
+ <section><title>Reading Non-Standard Properties</title>
+
+ <note>This section tells how to read non-standard properties. Non-standard
+ properties are application-specific ID/type/value triples.</note>
+
+ <section><title>Overview</title>
+ <p>Now comes the real hardcore stuff. As mentioned above,
+ <code>SummaryInformation</code> and
+ <code>DocumentSummaryInformation</code> are just special cases of the
+ general concept of a property set. This concept says that a
+ <strong>property set</strong> consists of properties and that each
+ <strong>property</strong> is an entity with an <strong>ID</strong>, a
+ <strong>type</strong>, and a <strong>value</strong>.</p>
+
+ <p>Okay, that was still rather easy. However, to make things more
+ complicated, Microsoft in its infinite wisdom decided that a property set
+ shalt be broken into one or more <strong>sections</strong>. Each section
+ holds a bunch of properties. But since that's still not complicated
+ enough, a section may have an optional <strong>dictionary</strong> that
+ maps property IDs to <strong>property names</strong> - we'll explain
+ later what that means.</p>
+
+ <p>The procedure to get to the properties is the following:</p>
+
+ <ol>
+ <li>Use the <strong><code>PropertySetFactory</code></strong> class to
+ create a <code>PropertySet</code> object from a property set stream. If
+ you don't know whether an input stream is a property set stream, just
+ try to call <code>PropertySetFactory.create(java.io.InputStream)</code>:
+ You'll either get a <code>PropertySet</code> instance returned or an
+ exception is thrown.</li>
+
+ <li>Call the <code>PropertySet</code>'s method <code>getSections()</code>
+ to get the sections contained in the property set. Each section is
+ an instance of the <code>Section</code> class.</li>
+
+ <li>Each section has a format ID. The format ID of the first section in a
+ property set determines the property set's type. For example, the first
+ (and only) section of the summary information property set has a format
+ ID of <code>F29F85E0-4FF9-1068-AB91-08002B27B3D9</code>. You can
+ get the format ID with <code>Section.getFormatID()</code>.</li>
+
+ <li>The properties contained in a <code>Section</code> can be retrieved
+ with <code>Section.getProperties()</code>. The result is an array of
+ <code>Property</code> instances.</li>
+
+ <li>A property has an ID, a type, and a value. The <code>Property</code>
+ class has methods to retrieve them.</li>
+ </ol>
+ </section>
+
+ <section><title>A Sample Application</title>
+ <p>Let's have a look at a sample Java application that dumps all property
+ set streams contained in a POI file system. The full source code of this
+ program can be found as <em>ReadCustomPropertySets.java</em> in the
+ <em>examples</em> area of the POI source code tree. Here are the key
+ sections:</p>
+
+ <source>import java.io.*;
+import java.util.*;
+import org.apache.poi.hpsf.*;
+import org.apache.poi.poifs.eventfilesystem.*;
+import org.apache.poi.util.HexDump;</source>
+
+ <p>The most important package the application needs is
+ <code>org.apache.poi.hpsf.*</code>. This package contains the HPSF
+ classes. Most classes named below are from the HPSF package. Of course we
+ also need the POIFS event file system's classes and <code>java.io.*</code>
+ since we are dealing with POI I/O. From the <code>java.util</code> package
+ we use the <code>List</code> and <code>Iterator</code> classes. The class
+ <code>org.apache.poi.util.HexDump</code> provides methods to dump byte
+ arrays as nicely formatted strings.</p>
+
+ <source>public static void main(String[] args)
+ throws IOException
+{
+ final String filename = args[0];
+ POIFSReader r = new POIFSReader();
+
+ /* Register a listener for *all* documents. */
+ r.registerListener(new MyPOIFSReaderListener());
+ r.read(new FileInputStream(filename));
+}</source>
+
+ <p>The <code>POIFSReader</code> is set up in a way that the listener
+ <code>MyPOIFSReaderListener</code> is called for every document (i.e.
+ every stream) in the POI file system.</p>
+ </section>
+
+ <section><title>The Property Set</title>
+ <p>The listener class tries to create a <code>PropertySet</code> from each
+ stream using the <code>PropertySetFactory.create()</code> method:</p>
+
+ <source>static class MyPOIFSReaderListener implements POIFSReaderListener
+{
+ public void processPOIFSReaderEvent(POIFSReaderEvent event)
+ {
+ PropertySet ps = null;
+ try
+ {
+ ps = PropertySetFactory.create(event.getStream());
+ }
+ catch (NoPropertySetStreamException ex)
+ {
+ out("No property set stream: \"" + event.getPath() +
+ event.getName() + "\"");
+ return;
+ }
+ catch (Exception ex)
+ {
+ throw new RuntimeException
+ ("Property set stream \"" +
+ event.getPath() + event.getName() + "\": " + ex);
+ }
+
+ /* Print the name of the property set stream: */
+ out("Property set stream \"" + event.getPath() +
+ event.getName() + "\":");</source>
+
+ <p>Creating the <code>PropertySet</code> is done in a <code>try</code>
+ block, because not every stream in the POI file system contains a
+ property set. If it is some other kind of stream,
+ <code>PropertySetFactory.create()</code> throws a
+ <code>NoPropertySetStreamException</code>, which is caught and
+ logged. Then the program continues with the next stream. However, all
+ other types of exceptions cause the program to terminate by throwing a
+ runtime exception. If all went well, we can print the name of the property
+ set stream.</p>
+ </section>
+
+ <section><title>The Sections</title>
+ <p>The next step is to print the number of sections followed by the
+ sections themselves:</p>
+
+ <source>/* Print the number of sections: */
+final long sectionCount = ps.getSectionCount();
+out(" No. of sections: " + sectionCount);
+
+/* Print the list of sections: */
+List sections = ps.getSections();
+int nr = 0;
+for (Iterator i = sections.iterator(); i.hasNext();)
+{
+ /* Print a single section: */
+ Section sec = (Section) i.next();
+
+ // See below for the complete loop body.
+}</source>
+
+ <p>The <code>PropertySet</code>'s method <code>getSectionCount()</code>
+ returns the number of sections.</p>
+
+ <p>To retrieve the sections, use the <code>getSections()</code>
+ method. This method returns a <code>java.util.List</code> containing
+ instances of the <code>Section</code> class in their proper order.</p>
+
+ <p>The sample code shows a loop that retrieves the <code>Section</code>
+ objects one by one and prints some information about each one. Here is
+ the complete body of the loop:</p>
+
+ <source>/* Print a single section: */
+Section sec = (Section) i.next();
+out(" Section " + nr++ + ":");
+String s = hex(sec.getFormatID().getBytes());
+s = s.substring(0, s.length() - 1);
+out(" Format ID: " + s);
+
+/* Print the number of properties in this section. */
+int propertyCount = sec.getPropertyCount();
+out(" No. of properties: " + propertyCount);
+
+/* Print the properties: */
+Property[] properties = sec.getProperties();
+for (int i2 = 0; i2 &lt; properties.length; i2++)
+{
+ /* Print a single property: */
+ Property p = properties[i2];
+ int id = p.getID();
+ long type = p.getType();
+ Object value = p.getValue();
+ out(" Property ID: " + id + ", type: " + type +
+ ", value: " + value);
+}</source>
+ </section>
+
+ <section><title>The Section's Format ID</title>
+ <p>The first method called on the <code>Section</code> instance is
+ <code>getFormatID()</code>. As explained above, the format ID of the
+ first section in a property set determines the type of the property
+ set. Its type is <code>ClassID</code>, which is essentially a sequence of
+ 16 bytes. A real application using its own type of custom property set
+ should define a unique format ID and, when reading a property set
+ stream, should check that the format ID equals that unique format ID. The
+ sample program just prints the format ID it finds in a section:</p>
+
+ <source>String s = hex(sec.getFormatID().getBytes());
+s = s.substring(0, s.length() - 1);
+out(" Format ID: " + s);</source>
+
+ <p>As you can see, the <code>getFormatID()</code> method returns a
+ <code>ClassID</code> object. An array containing the bytes can be
+ retrieved with <code>ClassID.getBytes()</code>. In order to get a nicely
+ formatted printout, the sample program uses the <code>hex()</code> helper
+ method which in turn uses the POI utility class <code>HexDump</code> in
+ the <code>org.apache.poi.util</code> package. Another helper method is
+ <code>out()</code> which just saves typing
+ <code>System.out.println()</code>.</p>
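+
+ <p>The format ID check recommended above boils down to comparing 16
+ bytes. The following JDK-only sketch assumes the bytes come in the order
+ the sample program prints them (for the summary information: F2 9F 85 E0
+ and so on); it formats them in the usual GUID notation and compares them
+ with <code>Arrays.equals()</code>. The class and method names are
+ hypothetical; with HPSF the bytes would come from
+ <code>sec.getFormatID().getBytes()</code>.</p>
+
+ <source>import java.util.Arrays;
+
+public class FormatIdCheck
+{
+    /* Summary information format ID, in the byte order the sample program prints. */
+    static final byte[] SUMMARY_INFORMATION_ID = {
+        (byte) 0xF2, (byte) 0x9F, (byte) 0x85, (byte) 0xE0, 0x4F, (byte) 0xF9, 0x10, 0x68,
+        (byte) 0xAB, (byte) 0x91, 0x08, 0x00, 0x2B, 0x27, (byte) 0xB3, (byte) 0xD9
+    };
+
+    /* Formats 16 bytes as {XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX}. */
+    static String toGuidString(byte[] id)
+    {
+        if (id.length != 16)
+            throw new IllegalArgumentException("A format ID has 16 bytes");
+        StringBuilder sb = new StringBuilder("{");
+        for (int i = 0; i &lt; id.length; i++)
+        {
+            if (i == 4 || i == 6 || i == 8 || i == 10)
+                sb.append('-');
+            sb.append(String.format("%02X", Byte.toUnsignedInt(id[i])));
+        }
+        return sb.append('}').toString();
+    }
+
+    public static void main(String[] args)
+    {
+        byte[] found = SUMMARY_INFORMATION_ID.clone(); // stand-in for getFormatID().getBytes()
+        System.out.println(toGuidString(found));
+        // prints {F29F85E0-4FF9-1068-AB91-08002B27B3D9}
+        System.out.println(Arrays.equals(found, SUMMARY_INFORMATION_ID)); // prints true
+    }
+}</source>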
+ </section>
+
+ <section><title>The Properties</title>
+ <p>Before getting the properties, it is possible to find out how many
+ properties are available in the section via the
+ <code>Section.getPropertyCount()</code> method. The sample application
+ uses it to print the number of properties to the standard output:</p>
+
+ <source>int propertyCount = sec.getPropertyCount();
+out(" No. of properties: " + propertyCount);</source>
+
+ <p>Now it's time to get to the properties themselves. You can retrieve a
+ section's properties with the method
+ <code>Section.getProperties()</code>:</p>
+
+ <source>Property[] properties = sec.getProperties();</source>
+
+ <p>As you can see, the result is an array of <code>Property</code>
+ objects. This class has three methods to retrieve a property's ID, its
+ type, and its value. The following code snippet shows how to call
+ them:</p>
+
+ <source>for (int i2 = 0; i2 &lt; properties.length; i2++)
+{
+ /* Print a single property: */
+ Property p = properties[i2];
+ int id = p.getID();
+ long type = p.getType();
+ Object value = p.getValue();
+ out(" Property ID: " + id + ", type: " + type +
+ ", value: " + value);
+}</source>
+ </section>
+
+ <section><title>Sample Output</title>
+ <p>The output of the sample program might look like the following. It
+ shows the summary information and the document summary information
+ property sets of a Microsoft Word document. However, unlike the first and
+ second sections of this HOW-TO, the application does not have any code
+ which is specific to the <code>SummaryInformation</code> and
+ <code>DocumentSummaryInformation</code> classes.</p>
+
+ <source>Property set stream "/SummaryInformation":
+ No. of sections: 1
+ Section 0:
+ Format ID: 00000000 F2 9F 85 E0 4F F9 10 68 AB 91 08 00 2B 27 B3 D9 ....O..h....+'..
+ No. of properties: 17
+ Property ID: 1, type: 2, value: 1252
+ Property ID: 2, type: 30, value: Titel
+ Property ID: 3, type: 30, value: Thema
+ Property ID: 4, type: 30, value: Rainer Klute (Autor)
+ Property ID: 5, type: 30, value: Test (Stichwörter)
+ Property ID: 6, type: 30, value: This is a document for testing HPSF
+ Property ID: 7, type: 30, value: Normal.dot
+ Property ID: 8, type: 30, value: Unknown User
+ Property ID: 9, type: 30, value: 3
+ Property ID: 18, type: 30, value: Microsoft Word 9.0
+ Property ID: 12, type: 64, value: Mon Jan 01 00:59:25 CET 1601
+ Property ID: 13, type: 64, value: Thu Jul 18 16:22:00 CEST 2002
+ Property ID: 14, type: 3, value: 1
+ Property ID: 15, type: 3, value: 20
+ Property ID: 16, type: 3, value: 93
+ Property ID: 19, type: 3, value: 0
+ Property ID: 17, type: 71, value: [B@13582d
+Property set stream "/DocumentSummaryInformation":
+ No. of sections: 2
+ Section 0:
+ Format ID: 00000000 D5 CD D5 02 2E 9C 10 1B 93 97 08 00 2B 2C F9 AE ............+,..
+ No. of properties: 14
+ Property ID: 1, type: 2, value: 1252
+ Property ID: 2, type: 30, value: Test
+ Property ID: 14, type: 30, value: Rainer Klute (Manager)
+ Property ID: 15, type: 30, value: Rainer Klute IT-Consulting GmbH
+ Property ID: 5, type: 3, value: 3
+ Property ID: 6, type: 3, value: 2
+ Property ID: 17, type: 3, value: 111
+ Property ID: 23, type: 3, value: 592636
+ Property ID: 11, type: 11, value: false
+ Property ID: 16, type: 11, value: false
+ Property ID: 19, type: 11, value: false
+ Property ID: 22, type: 11, value: false
+ Property ID: 13, type: 4126, value: [B@56a499
+ Property ID: 12, type: 4108, value: [B@506411
+ Section 1:
+ Format ID: 00000000 D5 CD D5 05 2E 9C 10 1B 93 97 08 00 2B 2C F9 AE ............+,..
+ No. of properties: 7
+ Property ID: 0, type: 0, value: {6=Test-JaNein, 5=Test-Zahl, 4=Test-Datum, 3=Test-Text, 2=_PID_LINKBASE}
+ Property ID: 1, type: 2, value: 1252
+ Property ID: 2, type: 65, value: [B@c9ba38
+ Property ID: 3, type: 30, value: This is some text.
+ Property ID: 4, type: 64, value: Wed Jul 17 00:00:00 CEST 2002
+ Property ID: 5, type: 3, value: 27
+ Property ID: 6, type: 11, value: true
+No property set stream: "/WordDocument"
+No property set stream: "/CompObj"
+No property set stream: "/1Table"</source>
+
+ <p>There are some interesting items to note:</p>
+
+ <ul>
+ <li>The first property set (summary information) consists of a single
+ section, the second property set (document summary information) consists
+ of two sections.</li>
+
+ <li>Each section type (identified by its format ID) has its own domain of
+ property IDs. For example, in the second property set the properties with
+ ID 2 have different meanings in the two sections. By the way, the format
+ IDs of these sections are <strong>not</strong> equal, but you have to
+ look hard to find the difference.</li>
+
+ <li>The properties are not in any particular order in the section,
+ although they tend to be roughly sorted by their IDs.</li>
+ </ul>
+ </section>
+
+ <section><title>Property IDs</title>
+ <p>Properties in the same section are distinguished by their IDs. This is
+ similar to variables in a programming language like Java, which are
+ distinguished by their names. But unlike variable names, property IDs are
+ simple integral numbers. There is another similarity, however. Just like
+ a Java variable has a certain scope (e.g. a member variable in a class),
+ a property ID also has its scope of validity: the section.</p>
+
+ <p>Properties in sections with different format IDs don't have the same
+ meaning even if their IDs are equal. For
+ example, ID 4 in the first (and only) section of a summary
+ information property set denotes the document's author, while ID 4 in the
+ first section of the document summary information property set means the
+ document's byte count. The sample output above does not show a property
+ with an ID of 4 in the first section of the document summary information
+ property set. That means that the document does not have a byte
+ count. However, there is a property with an ID of 4 in the
+ <em>second</em> section: This is a user-defined property ID - we'll get
+ to that topic in a minute.</p>
+
+ <p>So, how can you find out the meaning of a certain property ID in the
+ summary information and the document summary information property sets?
+ The standard property sets as such don't have any hints about the
+ <strong>meanings of their property IDs</strong>. For example, the summary
+ information property set does not tell you that the property ID 4 stands
+ for the document's author. This is external knowledge. Microsoft defined
+ standard meanings for some of the property IDs in the summary information
+ and the document summary information property sets. As a help to the Java
+ and POI programmer, the class <code>PropertyIDMap</code> in the
+ <code>org.apache.poi.hpsf.wellknown</code> package defines constants
+ for the "well-known" property IDs. For example, there is the
+ definition</p>
+
+ <source>public final static int PID_AUTHOR = 4;</source>
+
+ <p>These definitions allow you to use symbolic names instead of
+ numbers.</p>
+
+ <p>To support the other direction as well, i.e. to map property IDs to
+ property names, the class <code>PropertyIDMap</code>
+ defines two static methods:
+ <code>getSummaryInformationProperties()</code> and
+ <code>getDocumentSummaryInformationProperties()</code>. Both return
+ <code>java.util.Map</code> objects which map property IDs to
+ strings. Such a string gives a hint about the property's meaning. For
+ example,
+ <code>PropertyIDMap.getSummaryInformationProperties().get(4)</code>
+ returns the string "PID_AUTHOR". An application could use this string as
+ a key to a localized string which is displayed to the user, e.g. "Author"
+ in English or "Verfasser" in German. HPSF might provide such
+ language-dependent ("localized") mappings in a later release.</p>
+
+ <p>Usually you won't have to deal with those two maps. Instead you should
+ call the <code>Section.getPIDString(int)</code> method. It returns the
+ string associated with the specified property ID in the context of the
+ <code>Section</code> object.</p>
+
+ <p>Above you learned that property IDs have a meaning in the scope of a
+ section only. However, there are two exceptions to the rule: The property
+ IDs 0 and 1 have a fixed meaning in <strong>all</strong> sections:</p>
+
+ <table>
+ <tr>
+ <th>Property ID</th>
+ <th>Meaning</th>
+ </tr>
+
+ <tr>
+ <td>0</td>
+ <td>The property's value is a <strong>dictionary</strong>, i.e. a
+ mapping from property IDs to strings.</td>
+ </tr>
+
+ <tr>
+ <td>1</td>
+ <td>The property's value is the number of a <strong>codepage</strong>,
+ i.e. a mapping from character codes to characters. All strings in the
+ section containing this property must be interpreted using this
+ codepage. Typical property values are 1252 (8-bit "western" characters,
+ similar to ISO-8859-1), 1200 (16-bit Unicode characters, UTF-16), or
+ 65001 (Unicode characters encoded as UTF-8).</td>
+ </tr>
+ </table>
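+
+ <p>The scoping rules can be summed up as a small lookup: IDs 0 and 1 mean
+ the same in every section, all other IDs are interpreted relative to the
+ section's format ID. The sketch below is hypothetical and uses only the
+ JDK; an enum stands in for real format IDs, and it knows only the few
+ IDs mentioned in this section. In HPSF, <code>Section.getPIDString()</code>
+ does this job.</p>
+
+ <source>public class PropertyIdNames
+{
+    /* Hypothetical stand-in for a section's format ID. */
+    enum SectionKind { SUMMARY_INFORMATION, DOCUMENT_SUMMARY_INFORMATION }
+
+    static String pidName(SectionKind kind, long id)
+    {
+        /* IDs 0 and 1 have the same meaning in all sections. */
+        if (id == 0) return "PID_DICTIONARY";
+        if (id == 1) return "PID_CODEPAGE";
+        /* All other IDs depend on the section's format ID. */
+        switch (kind)
+        {
+            case SUMMARY_INFORMATION:
+                if (id == 2) return "PID_TITLE";
+                if (id == 4) return "PID_AUTHOR";
+                break;
+            case DOCUMENT_SUMMARY_INFORMATION:
+                if (id == 2) return "PID_CATEGORY";
+                if (id == 4) return "PID_BYTECOUNT";
+                break;
+        }
+        return "unknown";
+    }
+
+    public static void main(String[] args)
+    {
+        System.out.println(pidName(SectionKind.SUMMARY_INFORMATION, 4));          // PID_AUTHOR
+        System.out.println(pidName(SectionKind.DOCUMENT_SUMMARY_INFORMATION, 4)); // PID_BYTECOUNT
+        System.out.println(pidName(SectionKind.DOCUMENT_SUMMARY_INFORMATION, 1)); // PID_CODEPAGE
+    }
+}</source>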
+ </section>
+
+ <section><title>Property types</title>
+ <p>A property is nothing without its value. It is stored in a property set
+ stream as a sequence of bytes. You must know the property's
+ <strong>type</strong> in order to properly interpret those bytes and
+ reasonably handle the value. A property's type is one of the so-called
+ Microsoft-defined <strong>"variant types"</strong>. When you call
+ <code>Property.getType()</code> you'll get a <code>long</code> value
+ denoting the property's variant type. The class
+ <code>Variant</code> in the <code>org.apache.poi.hpsf</code> package
+ holds most of those <code>long</code> values as named constants. For
+ example, the constant <code>VT_I4 = 3</code> means a signed integer value
+ of four bytes. Examples of other types are <code>VT_LPSTR = 30</code>
+ meaning a null-terminated string of 8-bit characters, <code>VT_LPWSTR =
+ 31</code> which means a null-terminated Unicode string, or <code>VT_BOOL
+ = 11</code> denoting a boolean value.</p>
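+
+ <p>The type numbers in the sample output above can be decoded this way.
+ A type with the bit <code>0x1000</code> (<code>VT_VECTOR</code>) set is a
+ vector of the base type in the remaining bits, so 4126 is a vector of
+ <code>VT_LPSTR</code> (30) and 4108 a vector of <code>VT_VARIANT</code>
+ (12). The sketch below is hypothetical, not HPSF's <code>Variant</code>
+ class, and knows only a handful of types.</p>
+
+ <source>public class VariantTypeNames
+{
+    static final long VT_VECTOR = 0x1000;
+
+    static String baseName(long type)
+    {
+        switch ((int) type)
+        {
+            case 0:  return "VT_EMPTY";
+            case 2:  return "VT_I2";
+            case 3:  return "VT_I4";
+            case 11: return "VT_BOOL";
+            case 12: return "VT_VARIANT";
+            case 30: return "VT_LPSTR";
+            case 31: return "VT_LPWSTR";
+            case 64: return "VT_FILETIME";
+            case 65: return "VT_BLOB";
+            case 71: return "VT_CF";
+            default: return "type " + type;
+        }
+    }
+
+    /* Decodes a variant type, including the VT_VECTOR flag. */
+    static String name(long type)
+    {
+        if ((type &amp; VT_VECTOR) != 0)
+            return "VT_VECTOR | " + baseName(type &amp; ~VT_VECTOR);
+        return baseName(type);
+    }
+
+    public static void main(String[] args)
+    {
+        System.out.println(name(3));    // VT_I4
+        System.out.println(name(4126)); // VT_VECTOR | VT_LPSTR
+        System.out.println(name(4108)); // VT_VECTOR | VT_VARIANT
+    }
+}</source>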
+
+ <p>In most cases you won't need a property's type because HPSF does all
+ the work for you.</p>
+ </section>
+
+ <section><title>Property values</title>
+ <p>When an application wants to retrieve a property's value and calls
+ <code>Property.getValue()</code>, HPSF has to interpret the bytes making
+ up the value according to the property's type. The type determines how
+ many bytes the value consists of and what
+ to do with them. For example, if the type is <code>VT_I4</code>, HPSF
+ knows that the value is four bytes long and that these bytes
+ comprise a signed integer value in the little-endian format. This is
+ quite different from e.g. a type of <code>VT_LPWSTR</code>. In this case
+ HPSF has to scan the value bytes for a Unicode null character and collect
+ everything from the beginning to that null character as a Unicode
+ string.</p>
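+
+ <p>Both cases can be sketched with the JDK alone. This is not HPSF's
+ implementation and the helper names are hypothetical. It assumes the
+ value bytes have already been located and that the
+ <code>VT_LPWSTR</code> characters are UTF-16 little-endian starting at
+ <code>offset</code>; in a real stream the characters are preceded by a
+ four-byte character count.</p>
+
+ <source>import java.nio.ByteBuffer;
+import java.nio.ByteOrder;
+import java.nio.charset.StandardCharsets;
+
+public class ValueDecoding
+{
+    /* VT_I4: four bytes, signed, little-endian. */
+    static int decodeI4(byte[] b, int offset)
+    {
+        return ByteBuffer.wrap(b, offset, 4).order(ByteOrder.LITTLE_ENDIAN).getInt();
+    }
+
+    /* VT_LPWSTR: UTF-16LE characters up to, not including, a 16-bit null. */
+    static String decodeLpwstr(byte[] b, int offset)
+    {
+        int end = offset;
+        while (end + 1 &lt; b.length &amp;&amp; (b[end] != 0 || b[end + 1] != 0))
+            end += 2;
+        return new String(b, offset, end - offset, StandardCharsets.UTF_16LE);
+    }
+
+    public static void main(String[] args)
+    {
+        byte[] i4 = {0x39, 0x30, 0x00, 0x00};   // 12345 = 0x3039, low byte first
+        System.out.println(decodeI4(i4, 0));    // prints 12345
+        byte[] s = {'H', 0, 'i', 0, 0, 0};      // "Hi" followed by a null character
+        System.out.println(decodeLpwstr(s, 0)); // prints Hi
+    }
+}</source>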
+
+ <p>The good news is that HPSF does another job for you, too: It maps the
+ variant type to an adequate Java type.</p>
+
+ <table>
+ <tr>
+ <th>Variant type:</th>
+ <th>Java type:</th>
+ </tr>
+
+ <tr>
+ <td>VT_I2</td>
+ <td>java.lang.Integer</td>
+ </tr>
+
+ <tr>
+ <td>VT_I4</td>
+ <td>java.lang.Long</td>
+ </tr>
+
+ <tr>
+ <td>VT_FILETIME</td>
+ <td>java.util.Date</td>
+ </tr>
+
+ <tr>
+ <td>VT_LPSTR</td>
+ <td>java.lang.String</td>
+ </tr>
+
+ <tr>
+ <td>VT_LPWSTR</td>
+ <td>java.lang.String</td>
+ </tr>
+
+ <tr>
+ <td>VT_CF</td>
+ <td>byte[]</td>
+ </tr>
+
+ <tr>
+ <td>VT_BOOL</td>
+ <td>java.lang.Boolean</td>
+ </tr>
+
+ </table>
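+
+ <p>For <code>VT_FILETIME</code> the conversion is plain arithmetic: a
+ FILETIME counts 100-nanosecond intervals since 1601-01-01 00:00 UTC,
+ while <code>java.util.Date</code> counts milliseconds since 1970-01-01
+ 00:00 UTC, which is 11644473600 seconds later. The following is a
+ hypothetical JDK-only helper, not HPSF's code. Values close to zero
+ therefore show up as dates in the year 1601.</p>
+
+ <source>import java.util.Date;
+
+public class FileTimeConversion
+{
+    /* Milliseconds between 1601-01-01 and 1970-01-01, both UTC. */
+    static final long EPOCH_DIFF_MILLIS = 11644473600000L;
+
+    /* FILETIME (100-ns ticks since 1601) to Date. */
+    static Date fileTimeToDate(long fileTime)
+    {
+        return new Date(fileTime / 10000 - EPOCH_DIFF_MILLIS);
+    }
+
+    /* Date to FILETIME, with millisecond precision. */
+    static long dateToFileTime(Date date)
+    {
+        return (date.getTime() + EPOCH_DIFF_MILLIS) * 10000;
+    }
+
+    public static void main(String[] args)
+    {
+        long ft = 116444736000000000L; // 1970-01-01T00:00:00Z as a FILETIME
+        System.out.println(fileTimeToDate(ft).toInstant()); // 1970-01-01T00:00:00Z
+        System.out.println(fileTimeToDate(0).toInstant());  // 1601-01-01T00:00:00Z
+    }
+}</source>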
+
+ <p>The bad news is that there are still a couple of variant types HPSF
+ does not yet support. If it encounters one of these types it
+ returns the property's value as a byte array and leaves it to be
+ interpreted by the application.</p>
+
+ <p>An application retrieves a property's value by calling the
+ <code>Property.getValue()</code> method. This method's return type is
+ <code>Object</code>. The <code>getValue()</code> method
+ looks up the property's variant type, reads the property's value bytes,
+ creates an instance of an adequate Java type, assigns it the property's
+ value and returns it. Primitive types like <code>int</code> or
+ <code>long</code> will be returned as the corresponding class,
+ e.g. <code>Integer</code> or <code>Long</code>.</p>
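+
+ <p>Since the value arrives as an <code>Object</code>, an application
+ usually checks its runtime type before using it. In the sketch below,
+ <code>describe()</code> is a hypothetical stand-in for application code
+ that receives the result of <code>p.getValue()</code>. The
+ <code>byte[]</code> branch covers unsupported variant types. Printing
+ such an array directly only yields its identity, such as
+ <code>[B@13582d</code> in the sample output above.</p>
+
+ <source>import java.util.Arrays;
+import java.util.Date;
+
+public class ValueDescriber
+{
+    /* Describes a value as returned by a call like Property.getValue(). */
+    static String describe(Object value)
+    {
+        if (value == null)
+            return "null";
+        if (value instanceof String)
+            return "string \"" + value + "\"";
+        if (value instanceof Integer || value instanceof Long)
+            return "integer " + value;
+        if (value instanceof Boolean)
+            return "boolean " + value;
+        if (value instanceof Date)
+            return "date " + ((Date) value).toInstant();
+        if (value instanceof byte[])
+            /* Unsupported variant type: raw bytes for the application to interpret. */
+            return "bytes " + Arrays.toString((byte[]) value);
+        return value.getClass().getName() + " " + value;
+    }
+
+    public static void main(String[] args)
+    {
+        System.out.println(describe("Titel"));             // string "Titel"
+        System.out.println(describe(Long.valueOf(1252)));  // integer 1252
+        System.out.println(describe(new byte[] {1, 2}));   // bytes [1, 2]
+    }
+}</source>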
+ </section>
+
+
+ <section><title>Dictionaries</title>
+ <p>The property with ID 0 has a very special meaning: It is a
+ <strong>dictionary</strong> mapping property IDs to property names. We
+ have seen already that the meanings of standard properties in the
+ summary information and the document summary information property sets
+ have been defined by Microsoft. The advantage is that the labels of
+ properties like "Author" or "Title" don't have to be stored in the
+ property set. However, a user can define custom fields in, say, Microsoft
+ Word. For each field the user has to specify a name, a type, and a
+ value.</p>
+
+ <p>The names of the custom-defined fields (i.e. the property names) are
+ stored in the <strong>dictionary</strong> of the document summary
+ information's second section. The dictionary is a map which associates
+ property IDs with property names.</p>
+
+ <p>The method <code>Section.getPIDString(int)</code> returns not only
+ the well-known property names of the summary information and document
+ summary information property sets, but also the names of user-defined
+ properties. It should also work with user-defined properties in
+ user-defined sections.</p>
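+
+ <p>How a dictionary turns user-defined property IDs into labels can be
+ sketched without HPSF. The data is a subset of the second section in the
+ sample output above (IDs 3, 5, and 6). Representing the dictionary and
+ the values as <code>Map</code>s keyed by <code>Long</code>, and the
+ <code>label()</code> helper, are assumptions made for illustration.</p>
+
+ <source>import java.util.LinkedHashMap;
+import java.util.Map;
+
+public class DictionaryLookup
+{
+    /* Returns the name the dictionary assigns to an ID, or a fallback. */
+    static String label(Map&lt;Long, String&gt; dictionary, long id)
+    {
+        String name = dictionary.get(id);
+        return name != null ? name : "ID " + id;
+    }
+
+    public static void main(String[] args)
+    {
+        /* Dictionary of the sample's second section: property ID to name. */
+        Map&lt;Long, String&gt; dictionary = new LinkedHashMap&lt;&gt;();
+        dictionary.put(3L, "Test-Text");
+        dictionary.put(5L, "Test-Zahl");
+        dictionary.put(6L, "Test-JaNein");
+
+        /* Values of the same section, keyed by property ID. */
+        Map&lt;Long, Object&gt; values = new LinkedHashMap&lt;&gt;();
+        values.put(3L, "This is some text.");
+        values.put(5L, 27);
+        values.put(6L, true);
+
+        for (Map.Entry&lt;Long, Object&gt; e : values.entrySet())
+            System.out.println(label(dictionary, e.getKey()) + " = " + e.getValue());
+        // Test-Text = This is some text.
+        // Test-Zahl = 27
+        // Test-JaNein = true
+    }
+}</source>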
+ </section>
+
+ <section><title>Codepage support</title>
+
+ <p>The property with ID 1 holds the number of the codepage which was used
+ to encode the strings in this section. If this property is not available
+ in a section, the platform's default character encoding will be
+ used. This works fine as long as the document being read has been written
+ on a platform with the same default character encoding. However, if you
+ receive a document from another region of the world and the codepage is
+ undefined, you are in trouble.</p>
+
+ <p>HPSF's codepage support is only as good as the character encoding
+ support of the Java Virtual Machine (JVM) the application runs on. If
+ HPSF encounters a codepage number it assumes that the JVM has a character
+ encoding with a corresponding name. For example, if the codepage is 1252,
+ HPSF uses the character encoding "cp1252" to read or write strings. If
+ the JVM does not have that character encoding installed or if the
+ codepage number is illegal, an UnsupportedEncodingException will be
+ thrown. This works quite well with Java 2 Standard Edition (J2SE)
+ versions since 1.4. However, under J2SE 1.3 or lower you are out of
+ luck. You should install a newer J2SE version to process codepages with
+ HPSF.</p>
+
+ <p>There are some exceptions to the rule saying that a character
+ encoding's name is derived from the codepage number by prepending the
+ string "cp" to it. In these cases the codepage number is mapped to a
+ well-known character encoding name. Here are a few examples:</p>
+
+ <dl>
+ <dt>Codepage 932</dt>
+ <dd>is mapped to the character encoding "SJIS".</dd>
+ <dt>Codepage 1200</dt>
+ <dd>is mapped to the character encoding "UTF-16".</dd>
+ <dt>Codepage 65001</dt>
+ <dd>is mapped to the character encoding "UTF-8".</dd>
+ </dl>
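+
+ <p>The naming rule (special cases first, otherwise "cp" plus the number)
+ can be sketched as follows. The three special cases are the ones listed
+ above; the class is hypothetical and is not the code in
+ <code>Constants</code> or <code>VariantSupport</code>. Instead of waiting
+ for an <code>UnsupportedEncodingException</code>, it asks
+ <code>Charset.isSupported()</code> whether the running JVM knows the
+ encoding.</p>
+
+ <source>import java.nio.charset.Charset;
+
+public class CodepageMapping
+{
+    /* Maps a codepage number to a Java character encoding name. */
+    static String encodingName(int codepage)
+    {
+        switch (codepage)
+        {
+            case 932:   return "SJIS";
+            case 1200:  return "UTF-16";
+            case 65001: return "UTF-8";
+            default:    return "cp" + codepage;
+        }
+    }
+
+    public static void main(String[] args)
+    {
+        for (int cp : new int[] {1252, 932, 1200, 65001, 99999})
+        {
+            String name = encodingName(cp);
+            /* The result depends on the encodings installed in the JVM. */
+            System.out.println(cp + " -> " + name + ", supported: " + Charset.isSupported(name));
+        }
+    }
+}</source>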
+
+ <p>More of these mappings between codepage and character encoding name are
+ hard-coded in the classes <code>org.apache.poi.hpsf.Constants</code> and
+ <code>org.apache.poi.hpsf.VariantSupport</code>. Probably there will be a
+ need to add more mappings. The HPSF author will appreciate any hints.</p>
+ </section>
+ </section>
+
+ <anchor id="sec5"/>
+ <section><title>Writing Properties</title>
+
+ <note>This section describes how to write properties.</note>
+
+ <section><title>Overview of Writing Properties</title>
+ <p>Writing properties is possible at a high level and at a low level:</p>
+
+ <ul>
+
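+ <li>At the high level, most users will want to create or change entries
+ in the summary information or document summary information streams using
+ the <code>SummaryInformation</code> and
+ <code>DocumentSummaryInformation</code> classes shown above.</li>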
+
+ <li>On the low level, there are no convenience classes or methods. You
+ have to deal with things like property IDs and variant types to write
+ properties. Therefore you should have read <a href="#sec4">section
+ 4</a> to understand the description of the low-level writing
+ functions.</li>
+ </ul>
+
+ <p>HPSF's writing capabilities come with the classes
+ <code>PropertySet</code>, <code>Section</code>,
+ <code>Property</code>, and some helper classes.</p>
+ </section>
+
+
+ <section><title>Low-Level Writing: An Overview</title>
+ <p>When you are going to write a property set stream your application has
+ to perform the following steps:</p>
+
+ <ol>
+ <li>Create a <code>PropertySet</code> instance.</li>
+
+ <li>Get hold of a <code>Section</code>. You can either retrieve
+ the one that is always present in a new <code>PropertySet</code>,
+ or you have to create a new <code>Section</code> and add it to
+ the <code>PropertySet</code>.
+ </li>
+
+ <li>Set any <code>Section</code> fields as you like.</li>
+
+ <li>Create as many <code>Property</code> objects as you need. Set
+ each property's ID, type, and value. Add the
+ <code>Property</code> objects to the <code>Section</code>.
+ </li>
+
+ <li>Create further <code>Section</code>s if you need them.</li>
+
+ <li>Finally, retrieve the property set as a byte stream using
+ <code>PropertySet.toInputStream()</code> and write it to a POIFS
+ document.</li>
+ </ol>
+ </section>
+
+ <section><title>Low-Level Writing Functions in Detail</title>
+ <p>Writing properties is introduced by an artificial but simple example: a
+ program creating a new document (aka POI file system) which contains only
+ a single document: a summary information property set stream. The latter
+ will hold the document's title only. This is artificial in that it does
+ not contain any Word, Excel or other kind of useful application document
+ data. A document containing just a property set is without any practical
+ use. However, it is perfectly fine for an example because it makes it very
+ simple and easy to understand, and you will get used to writing
+ properties in real applications quickly.</p>
+
+ <p>The application expects the name of the POI file system to be written
+ on the command line. The title property it writes is "Sample title".</p>
+
+ <p>Here's the application's source code. You can also find it in the
+ "examples" section of the POI source code distribution. Explanations
+ follow below.</p>
+
+ <source>package org.apache.poi.hpsf.examples;
+
+import java.io.FileOutputStream;
+import java.io.IOException;
+import java.io.InputStream;
+
+import org.apache.poi.hpsf.Property;
+import org.apache.poi.hpsf.PropertySet;
+import org.apache.poi.hpsf.Section;
+import org.apache.poi.hpsf.SummaryInformation;
+import org.apache.poi.hpsf.Variant;
+import org.apache.poi.hpsf.WritingNotSupportedException;
+import org.apache.poi.hpsf.wellknown.PropertyIDMap;
+import org.apache.poi.hpsf.wellknown.SectionIDMap;
+import org.apache.poi.poifs.filesystem.POIFSFileSystem;
+
+/**
+ * &lt;p&gt;This class is a simple sample application showing how to create a property
+ * set and write it to disk.&lt;/p&gt;
+ *
+ * @author Rainer Klute
+ * @since 2003-09-12
+ */
+public class WriteTitle
+{
+ /**
+ * &lt;p&gt;Runs the example program.&lt;/p&gt;
+ *
+ * @param args Command-line arguments. The first and only command-line
+ * argument is the name of the POI file system to create.
+ * @throws IOException if any I/O exception occurs.
+ * @throws WritingNotSupportedException if HPSF does not (yet) support
+ * writing a certain property type.
+ */
+ public static void main(final String[] args)
+ throws WritingNotSupportedException, IOException
+ {
+ /* Check whether we have exactly one command-line argument. */
+ if (args.length != 1)
+ {
+ System.err.println("Usage: " + WriteTitle.class.getName() +
+ " destinationPOIFS");
+ System.exit(1);
+ }
+
+ final String fileName = args[0];
+
+ /* Create a mutable property set. Initially it contains a single section
+ * with no properties. */
+ final PropertySet mps = new PropertySet();
+
+ /* Retrieve the section the property set already contains. */
+ final Section ms = mps.getSections().get(0);
+
+ /* Turn the property set into a summary information property set. This is
+ * done by setting the format ID of its first section to
+ * SectionIDMap.SUMMARY_INFORMATION_ID. */
+ ms.setFormatID(SectionIDMap.SUMMARY_INFORMATION_ID);
+
+ /* Create an empty property. */
+ final Property p = new Property();
+
+ /* Fill the property with appropriate settings so that it specifies the
+ * document's title. */
+ p.setID(PropertyIDMap.PID_TITLE);
+ p.setType(Variant.VT_LPWSTR);
+ p.setValue("Sample title");
+
+ /* Place the property into the section. */
+ ms.setProperty(p);
+
+ /* Create the POI file system the property set is to be written to. */
+ final POIFSFileSystem poiFs = new POIFSFileSystem();
+
+ /* For writing the property set into a POI file system it has to be
+ * handed over to the POIFSFileSystem.createDocument() method as an
+ * input stream which produces the bytes making up the property set
+ * stream. */
+ final InputStream is = mps.toInputStream();
+
+ /* Create the summary information property set in the POI file
+ * system. It is given the default name most (if not all) summary
+ * information property sets have. */
+ poiFs.createDocument(is, SummaryInformation.DEFAULT_STREAM_NAME);
+
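+ /* Write the whole POI file system to a disk file. The
+ * try-with-resources statement closes the output stream afterwards. */
+ try (FileOutputStream out = new FileOutputStream(fileName))
+ {
+ poiFs.writeFilesystem(out);
+ }
+ poiFs.close();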
+ }
+
+}</source>
+
+ <p>The application first checks that there is exactly one argument on the
+ command line: the name of the file to write. If this argument is present,
+ the application stores it in the <code>fileName</code> variable, which is
+ used at the end when the POI file system is written to a disk file.</p>
+
+ <source>if (args.length != 1)
+{
+ System.err.println("Usage: " + WriteTitle.class.getName() +
+ " destinationPOIFS");
+ System.exit(1);
+}
+final String fileName = args[0];</source>
+
+ <p>Let's create a property set now. The sample application calls the
+ no-args constructor of the <code>PropertySet</code> class in order to
+ establish an empty property set:</p>
+
+ <source>final PropertySet mps = new PropertySet();</source>
+
+ <p>As said, we have an empty property set now. Later we will put some
+ contents into it.</p>
+
+ <p>The <code>PropertySet</code> created by the no-args constructor
+ is not really empty: it contains a single section without properties. We
+ can either retrieve that section and fill it with properties, or we can
+ replace it with another section. We can also add further sections to the
+ property set. The sample application retrieves the section that is
+ already there:</p>
+
+ <source>final Section ms = mps.getSections().get(0);</source>
+
+ <p>The <code>getSections()</code> method returns the property set's
+ sections as a list, i.e. an instance of
+ <code>java.util.List</code>. Calling <code>get(0)</code> returns the
+ list's first (or zeroth, if you prefer) element.</p>
+
+ <p>The alternative to retrieving the existing <code>Section</code>
+ would have been to create a new <code>Section</code> like this:</p>
+
+ <source>Section s = new Section();</source>
+
+ <p>The <code>Section</code> the sample application retrieved from
+ the <code>PropertySet</code> is still empty. It contains no
+ properties and does not have a format ID. As you have read <a
+ href="#sec3">above</a> the format ID of the first section in a
+ property set determines the property set's type. Since our property set
+ should become a SummaryInformation property set we have to set the format
+ ID of its first (and only) section to
+ <code>F29F85E0-4FF9-1068-AB-91-08-00-2B-27-B3-D9</code>. However, you
+ won't have to remember that ID: HPSF has it defined as the well-known
+ constant <code>SectionIDMap.SUMMARY_INFORMATION_ID</code>. The sample
+ application writes it to the section using the
+ <code>setFormatID(byte[])</code> method:</p>
+
+ <source>ms.setFormatID(SectionIDMap.SUMMARY_INFORMATION_ID);</source>
+
+ <p>Next, the sample application creates an empty property:</p>
+
+ <source>final Property p = new Property();</source>
+
+ <p>A <code>Property</code> object must have an ID, a type, and a
+ value (see <a href="#sec3">above</a> for details). The class
+ provides methods to set these attributes:</p>
+
+ <source>p.setID(PropertyIDMap.PID_TITLE);
+p.setType(Variant.VT_LPWSTR);
+p.setValue("Sample title");</source>
+
+ <p>The <code>Property</code> class has a constructor which you can
+ use to pass in all three attributes in a single call. See the Javadoc API
+ documentation for details!</p>
+
+ <p>The sample property set is complete now. We have a
+ <code>PropertySet</code> containing a <code>Section</code>
+ containing a <code>Property</code>. Of course we could have added
+ more sections to the property set and more properties to the sections,
+ but we wanted to keep things simple.</p>
+
+ <p>The property set has to be written to a POI file system. The following
+ statement creates an empty POI file system:</p>
+
+ <source>final POIFSFileSystem poiFs = new POIFSFileSystem();</source>
+
+ <p>Writing the property set includes the step of converting it into a
+ sequence of bytes. The <code>PropertySet</code> class has the
+ method <code>toInputStream()</code> for this purpose. It returns the
+ bytes making up the property set stream as an
+ <code>InputStream</code>:</p>
+
+ <source>final InputStream is = mps.toInputStream();</source>
+
+ <p>If you read from this input stream, you would receive all the property
+ set's bytes. However, you will most likely never do that. Instead, you
+ pass the input stream to the
+ <code>POIFSFileSystem.createDocument()</code> method, like this:</p>
+
+ <source>poiFs.createDocument(is, SummaryInformation.DEFAULT_STREAM_NAME);</source>
+
+ <p>Besides the <code>InputStream</code>, <code>createDocument()</code>
+ takes a second parameter: the name of the document to be created. For a
+ SummaryInformation property set stream the default name is available as
+ the constant <code>SummaryInformation.DEFAULT_STREAM_NAME</code>.</p>
+
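+ <p>The last step is to write the POI file system to a disk file. The
+ try-with-resources statement makes sure the output stream is closed
+ afterwards; the POI file system is closed explicitly:</p>
+
+ <source>try (FileOutputStream out = new FileOutputStream(fileName))
+{
+ poiFs.writeFilesystem(out);
+}
+poiFs.close();</source>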
+ </section>
+ </section>
+
+
+
+ <section><title>Further Reading</title>
+ <p>There are still some aspects of HPSF left which are not covered by this
+ HOW-TO. You should dig into the Javadoc API documentation to learn
+ further details. Since you've struggled through this document up to this
+ point, you are well prepared.</p>
+ </section>
+
+ </section>
+ </body>
+</document>
+
+<!-- Keep this comment at the end of the file
+Local variables:
+mode: xml
+sgml-omittag:nil
+sgml-shorttag:nil
+sgml-namecase-general:nil
+sgml-general-insert-case:lower
+sgml-minimize-attributes:nil
+sgml-always-quote-attributes:t
+sgml-indent-step:1
+sgml-indent-data:t
+sgml-parent-document:nil
+sgml-exposed-tags:nil
+sgml-local-catalogs:nil
+sgml-local-ecat-files:nil
+End:
+-->
diff --git a/src/documentation/content/xdocs/components/hpsf/index.xml b/src/documentation/content/xdocs/components/hpsf/index.xml
new file mode 100644
index 0000000000..88c4f306c5
--- /dev/null
+++ b/src/documentation/content/xdocs/components/hpsf/index.xml
@@ -0,0 +1,73 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<!--
+ ====================================================================
+ Licensed to the Apache Software Foundation (ASF) under one or more
+ contributor license agreements. See the NOTICE file distributed with
+ this work for additional information regarding copyright ownership.
+ The ASF licenses this file to You under the Apache License, Version 2.0
+ (the "License"); you may not use this file except in compliance with
+ the License. You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+ Unless required by applicable law or agreed to in writing, software
+ distributed under the License is distributed on an "AS IS" BASIS,
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ See the License for the specific language governing permissions and
+ limitations under the License.
+ ====================================================================
+-->
+<!DOCTYPE document PUBLIC "-//APACHE//DTD Documentation V2.0//EN" "document-v20.dtd">
+
+<document>
+ <header>
+ <title>Apache POI™ - HPSF - Java API for Microsoft Format Document
+ Properties</title>
+ <subtitle>Overview</subtitle>
+ <authors>
+ <person name="Rainer Klute" email="klute@apache.org"/>
+ </authors>
+ </header>
+ <body>
+ <section><title>Overview</title>
+
+ <p>Microsoft applications like "Word", "Excel" or "PowerPoint" let the user
+ describe a document by properties like "title", "category" and so on. The
+ application itself adds further information: last author, creation date
+ etc. These document properties are stored in <strong>property set
+ streams</strong>. A property set stream is a separate document within a
+ <a href="../poifs/index.html">POI filesystem</a>. HPSF is POI's pure-Java
+ implementation to read and write property sets.</p>
+
+ <p>The <a href="how-to.html">HPSF HOWTO</a> describes what a Java
+ application should do to read a property set using HPSF, how to retrieve
+ the information it needs, and how to write properties into the
+ document.</p>
+
+ <p>HPSF supports OLE2 property set streams in general, and is not limited to
+ the special case of document properties in the Microsoft Office files
+ mentioned above. The <a href="internals.html">HPSF description</a>
+ describes the internal structure of property set streams. A separate
+ document explains the internals of <a href="thumbnails.html">thumbnail
+ images</a>.</p>
+ </section>
+ </body>
+</document>
+
+<!-- Keep this comment at the end of the file
+Local variables:
+mode: xml
+sgml-omittag:nil
+sgml-shorttag:nil
+sgml-namecase-general:nil
+sgml-general-insert-case:lower
+sgml-minimize-attributes:nil
+sgml-always-quote-attributes:t
+sgml-indent-step:1
+sgml-indent-data:t
+sgml-parent-document:nil
+sgml-exposed-tags:nil
+sgml-local-catalogs:nil
+sgml-local-ecat-files:nil
+End:
+-->
diff --git a/src/documentation/content/xdocs/components/hpsf/internals.xml b/src/documentation/content/xdocs/components/hpsf/internals.xml
new file mode 100644
index 0000000000..4046f8859c
--- /dev/null
+++ b/src/documentation/content/xdocs/components/hpsf/internals.xml
@@ -0,0 +1,1079 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<!--
+ ====================================================================
+ Licensed to the Apache Software Foundation (ASF) under one or more
+ contributor license agreements. See the NOTICE file distributed with
+ this work for additional information regarding copyright ownership.
+ The ASF licenses this file to You under the Apache License, Version 2.0
+ (the "License"); you may not use this file except in compliance with
+ the License. You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+ Unless required by applicable law or agreed to in writing, software
+ distributed under the License is distributed on an "AS IS" BASIS,
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ See the License for the specific language governing permissions and
+ limitations under the License.
+ ====================================================================
+-->
+<!DOCTYPE document PUBLIC "-//APACHE//DTD Documentation V2.0//EN" "document-v20.dtd">
+
+<document>
+ <header>
+ <title>Apache POI™ - HPSF Internals</title>
+ <authors>
+ <person name="Rainer Klute" email="klute@rainer-klute.de"/>
+ </authors>
+ </header>
+ <body>
+ <section><title>HPSF Internals</title>
+
+ <section><title>Introduction</title>
+
+ <p>A Microsoft Office document is internally organized like a filesystem
+ with directories and files. Microsoft calls these files
+ <strong>streams</strong>. A document can have properties attached to it,
+ like author, title, number of words etc. These metadata are not stored in
+ the main stream of, say, a Word document, but instead in a dedicated
+ stream with a special format. Usually this stream's name is
+ <code>\005SummaryInformation</code>, where <code>\005</code> represents
+ the character with a decimal value of 5.</p>
+
+ <p>A single piece of information in the stream is called a
+ <strong>property</strong>, for example the document title. Each property
+ has an integral <strong>ID</strong> (e.g. 2 for title), a
+ <strong>type</strong> (telling that the title is a string of bytes) and a
+ <strong>value</strong> (what this is should be obvious). A stream
+ containing properties is called a
+ <strong>property set stream</strong>.</p>
+
+ <p>This document describes the internal structure of a property set stream,
+ i.e. the <strong>HPSF</strong>. It does
+ not describe how a Microsoft Office document is organized internally and
+ how to retrieve a stream from it. See the <a
+ href="../poifs/index.html">POIFS documentation</a> for that kind of
+ stuff.</p>
+
+ <p>The HPSF is not only used in the Summary
+ Information stream in the top-level document of a Microsoft Office
+ document. Often there is also a property set stream named
+ <code>\005DocumentSummaryInformation</code> with additional properties.
+ Embedded documents may have their own property set streams. You cannot
+ tell by a stream's name whether it is a property set stream or not.
+ Instead you have to open the stream and look at its bytes.</p>
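+ <p>Both stream names mentioned above start with the character with decimal
+ value 5. In Java source code, that character can be written with the octal
+ escape <code>\005</code>. The following minimal sketch uses a hypothetical
+ class <code>StreamNames</code>. It only illustrates the names and does not
+ check whether a stream really is a property set stream:</p>

```java
class StreamNames {
    // "\005" is a Java octal escape for the character with decimal value 5.
    static final String SUMMARY_INFORMATION = "\005SummaryInformation";
    static final String DOCUMENT_SUMMARY_INFORMATION = "\005DocumentSummaryInformation";

    public static void main(String[] args) {
        System.out.println((int) SUMMARY_INFORMATION.charAt(0)); // prints 5
        System.out.println(SUMMARY_INFORMATION.substring(1));    // prints SummaryInformation
    }
}
```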
+ </section>
+
+
+
+ <section><title>Data Types</title>
+
+ <p>Before delving into the details of the property set stream format we
+ have to have a short look at data types. Integral values are stored in the
+ so-called <strong>little endian</strong> format. In this format the bytes
+ that make up an integral value are stored in the "wrong" order. For
+ example, the decimal value 4660 is 0x1234 in the hexadecimal notation. If
+ you think this should be represented by a byte 0x12 followed by another
+ byte 0x34, you are right. This is called the <strong>big endian</strong>
+ format. In the little endian format, however, this order is reversed and
+ the low-order byte comes first: 0x34 followed by 0x12, written as
+ 0x3412.</p>
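+ <p>As a minimal sketch using only the Java standard library (not the HPSF
+ API), <code>java.nio.ByteBuffer</code> can decode little endian values once
+ its byte order is set to <code>ByteOrder.LITTLE_ENDIAN</code>. The class
+ <code>EndianDemo</code> and its methods are hypothetical names chosen for
+ this example:</p>

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

class EndianDemo {
    /** Reads an unsigned 2-byte Word at the given offset, little endian. */
    static int readWord(byte[] data, int offset) {
        return Short.toUnsignedInt(
                ByteBuffer.wrap(data).order(ByteOrder.LITTLE_ENDIAN).getShort(offset));
    }

    /** Reads an unsigned 4-byte DWord at the given offset, little endian. */
    static long readDWord(byte[] data, int offset) {
        return Integer.toUnsignedLong(
                ByteBuffer.wrap(data).order(ByteOrder.LITTLE_ENDIAN).getInt(offset));
    }

    public static void main(String[] args) {
        // 4660 (0x1234) is stored as the byte 0x34 followed by the byte 0x12.
        byte[] word = {0x34, 0x12};
        System.out.println(readWord(word, 0));              // prints 4660

        // 0x12345678 is stored as 0x78 0x56 0x34 0x12.
        byte[] dword = {0x78, 0x56, 0x34, 0x12};
        System.out.printf("0x%08X%n", readDWord(dword, 0)); // prints 0x12345678
    }
}
```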
+
+ <p>The following table gives an overview about some important data
+ types:</p>
+
+ <table>
+
+ <tr>
+ <th>Name</th>
+ <th>Length</th>
+ <th>Example (Big Endian)</th>
+ <th>Example (Little Endian)</th>
+ </tr>
+
+ <tr>
+ <td><strong>Byte</strong></td>
+ <td>1 byte</td>
+ <td><code>0x12</code></td>
+ <td><code>0x12</code></td>
+ </tr>
+
+ <tr>
+ <td><strong>Word</strong></td>
+ <td>2 bytes</td>
+ <td><code>0x1234</code></td>
+ <td><code>0x3412</code></td>
+ </tr>
+
+ <tr>
+ <td><strong>DWord</strong></td>
+ <td>4 bytes</td>
+ <td><code>0x12345678</code></td>
+ <td><code>0x78563412</code></td>
+ </tr>
+
+ <tr>
+ <td><strong>ClassID</strong><br/>
+ A sequence of one DWord, two Words and eight Bytes</td>
+
+ <td>16 bytes</td>
+
+ <td><code>0xE0859FF2F94F6810AB9108002B27B3D9</code> or
+ <code>E0859FF2-F94F-6810-AB-91-08-00-2B-27-B3-D9</code></td>
+
+ <td><code>0xF29F85E04FF91068AB9108002B27B3D9</code> or
+ <code>F29F85E0-4FF9-1068-AB-91-08-00-2B-27-B3-D9</code></td>
+ </tr>
+
+ <tr>
+ <td></td>
+ <td></td>
+ <td>The ClassID examples are given here in two different notations. The
+ second notation without the "0x" at the beginning and with dashes
+ inside shows the internal grouping into one DWord, two Words and eight
+ Bytes.</td>
+ <td><em>Watch out:</em> Microsoft documentation and tools show class IDs
+ slightly differently, like
+ <code>F29F85E0-4FF9-1068-AB91-08002B27B3D9</code>.
+ However, that representation is (intentionally?) misleading with
+ respect to endianness.</td>
+ </tr>
+ </table>
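+ <p>The following sketch shows how 16 stored bytes could be turned into the
+ Microsoft notation mentioned in the table. It assumes the common Microsoft
+ GUID layout: the DWord and the two Words are decoded as little endian, and
+ the trailing eight bytes are kept in their stored order.
+ <code>ClassIdFormatter</code> is a hypothetical helper, not part of HPSF:</p>

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

class ClassIdFormatter {
    /**
     * Formats the 16 bytes at the given offset in the Microsoft notation
     * XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX. The DWord and the two Words are
     * decoded as little endian; the eight trailing bytes keep their stored order.
     */
    static String format(byte[] data, int offset) {
        ByteBuffer buf = ByteBuffer.wrap(data, offset, 16).order(ByteOrder.LITTLE_ENDIAN);
        // Arguments are evaluated left to right: DWord, then Word, then Word.
        StringBuilder sb = new StringBuilder(String.format("%08X-%04X-%04X-",
                Integer.toUnsignedLong(buf.getInt()),
                Short.toUnsignedInt(buf.getShort()),
                Short.toUnsignedInt(buf.getShort())));
        for (int i = 0; i != 8; i++) {
            if (i == 2) {
                sb.append('-'); // Microsoft groups the eight bytes as 2 + 6
            }
            sb.append(String.format("%02X", Byte.toUnsignedInt(buf.get())));
        }
        return sb.toString();
    }

    /** Converts a hex string such as "E0859FF2..." into bytes. */
    static byte[] fromHex(String hex) {
        byte[] out = new byte[hex.length() / 2];
        for (int i = 0; i != out.length; i++) {
            out[i] = (byte) Integer.parseInt(hex.substring(2 * i, 2 * i + 2), 16);
        }
        return out;
    }

    public static void main(String[] args) {
        byte[] stored = fromHex("E0859FF2F94F6810AB9108002B27B3D9");
        System.out.println(format(stored, 0)); // prints F29F85E0-4FF9-1068-AB91-08002B27B3D9
    }
}
```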
+ </section>
+
+
+
+ <section><title>HPSF Overview</title>
+
+ <p>A property set stream consists of three main parts:</p>
+
+ <ol>
+ <li>The <strong>header</strong>,</li>
+ <li>the <strong>section list</strong>, and</li>
+ <li>the <strong>section(s)</strong> containing the properties.</li>
+ </ol>
+ </section>
+
+
+
+ <section><title>The Header</title>
+
+ <p>The first bytes in a property set stream make up the <strong>header</strong>.
+ It has a fixed length of 28 bytes and looks like this:</p>
+
+ <table>
+ <tr>
+ <th>Offset</th>
+ <th>Type</th>
+ <th>Contents</th>
+ <th>Remarks</th>
+ </tr>
+
+ <tr>
+ <td>0</td>
+ <td>Word</td>
+ <td><code>0xFFFE</code></td>
+ <td>If the first four bytes of a stream do not contain these values, the
+ stream is not a property set stream.</td>
+ </tr>
+
+ <tr>
+ <td>2</td>
+ <td>Word</td>
+ <td><code>0x0000</code></td>
+ <td></td>
+ </tr>
+
+ <tr>
+ <td>4</td>
+ <td>DWord</td>
+ <td>Denotes the operating system and the OS version under which this
+ stream was created. The operating system ID is in the DWord's higher
+ word (after little endian decoding): <code>0x0000</code> for Win16,
+ <code>0x0001</code> for Macintosh and <code>0x0002</code> for Win32 -
+ that's all. The reader is most likely aware of the fact that there are
+ some more operating systems. However, Microsoft does not seem to
+ know.</td>
+ <td></td>
+ </tr>
+
+ <tr>
+ <td>8</td>
+ <td>ClassID</td>
+ <td><code>0x00000000000000000000000000000000</code></td>
+ <td>Most property set streams have this value but this is not
+ required.</td>
+ </tr>
+
+ <tr>
+ <td>24</td>
+ <td>DWord</td>
+ <td><code>0x01000000</code> or greater</td>
+ <td>Section count. This field's value should be equal to 1 or greater.
+ Microsoft claims that this is a "reserved" field, but it seems to tell
+ how many sections (see below) follow in the stream. This makes sense,
+ because otherwise you could not know where and how far you should read
+ section data.</td>
+ </tr>
+ </table>
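+ <p>Put together, the header fields can be read with a sketch like the
+ following. It relies only on <code>java.nio.ByteBuffer</code>. The class
+ <code>PropertySetHeader</code> and the sample OS version value
+ <code>0x00020005</code> are assumptions made for illustration:</p>

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

class PropertySetHeader {
    static final int HEADER_LENGTH = 28;

    final int osVersion;     // DWord at offset 4
    final byte[] classId;    // ClassID at offset 8, usually all zeros
    final long sectionCount; // DWord at offset 24

    private PropertySetHeader(int osVersion, byte[] classId, long sectionCount) {
        this.osVersion = osVersion;
        this.classId = classId;
        this.sectionCount = sectionCount;
    }

    /** Parses the fixed-length header and rejects streams that are not property sets. */
    static PropertySetHeader parse(byte[] stream) {
        if (HEADER_LENGTH > stream.length) {
            throw new IllegalArgumentException("Stream too short for a property set header");
        }
        ByteBuffer buf = ByteBuffer.wrap(stream).order(ByteOrder.LITTLE_ENDIAN);
        // Offset 0 must hold the Word 0xFFFE and offset 2 the Word 0x0000.
        if (Short.toUnsignedInt(buf.getShort(0)) != 0xFFFE || buf.getShort(2) != 0) {
            throw new IllegalArgumentException("Not a property set stream");
        }
        int osVersion = buf.getInt(4);
        byte[] classId = new byte[16];
        buf.position(8);
        buf.get(classId);
        long sectionCount = Integer.toUnsignedLong(buf.getInt(24));
        return new PropertySetHeader(osVersion, classId, sectionCount);
    }

    /** The operating system ID is the DWord's higher Word: 0 Win16, 1 Macintosh, 2 Win32. */
    int osId() {
        return osVersion >>> 16;
    }

    /** Builds a sample header: Win32, an assumed OS version, one section. */
    static byte[] sampleHeader() {
        ByteBuffer buf = ByteBuffer.allocate(HEADER_LENGTH).order(ByteOrder.LITTLE_ENDIAN);
        buf.putShort((short) 0xFFFE); // stored as the bytes FE FF
        buf.putShort((short) 0);
        buf.putInt(0x00020005);       // higher Word 0x0002 means Win32
        buf.position(24);             // the ClassID at offset 8 stays all zeros
        buf.putInt(1);                // section count
        return buf.array();
    }

    public static void main(String[] args) {
        PropertySetHeader h = parse(sampleHeader());
        System.out.println(h.osId() + " " + h.sectionCount); // prints 2 1
    }
}
```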
+ </section>
+
+
+
+ <section><title>Section List</title>
+
+ <p>Following the header is the section list. This is an array of pairs,
+ each consisting of a section format ID and an offset. The array has as
+ many pairs of ClassID and DWord fields as the section count field in the
+ header says. The Summary Information stream contains a single section; the
+ Document Summary Information stream contains two.</p>
+
+ <table>
+ <tr>
+ <th>Type</th>
+ <th>Contents</th>
+ <th>Remarks</th>
+ </tr>
+
+ <tr>
+ <td>ClassID</td>
+ <td>Section format ID</td>
+ <td><code>0xF29F85E04FF91068AB9108002B27B3D9</code> for the single section
+ in the Summary Information stream.<br/><br/>
+
+ <code>0xD5CDD5022E9C101B939708002B2CF9AE</code> for the first
+ section in the Document Summary Information stream.</td>
+ </tr>
+
+ <tr>
+ <td>DWord</td>
+ <td>Offset</td>
+ <td>The number of bytes between the beginning of the stream and the
+ beginning of the section within the stream.</td>
+ </tr>
+
+ <tr>
+ <td>ClassID</td>
+ <td>Section format ID</td>
+ <td>...</td>
+ </tr>
+
+ <tr>
+ <td>DWord</td>
+ <td>Offset</td>
+ <td>...</td>
+ </tr>
+
+ <tr>
+ <td>...</td>
+ <td>...</td>
+ <td>...</td>
+ </tr>
+ </table>
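+ <p>Here is a minimal sketch of reading the section list. It assumes a
+ well-formed stream whose header occupies the first 28 bytes. The class
+ <code>SectionList</code> and the sample format IDs and offsets are made up
+ for illustration:</p>

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Arrays;

class SectionList {
    /** One entry of the section list: a format ID and an offset into the stream. */
    static final class Entry {
        final byte[] formatId; // 16-byte ClassID, in stored byte order
        final long offset;     // bytes from the beginning of the stream

        Entry(byte[] formatId, long offset) {
            this.formatId = formatId;
            this.offset = offset;
        }
    }

    /** Reads the section list that follows the 28-byte header. */
    static Entry[] read(byte[] stream) {
        ByteBuffer buf = ByteBuffer.wrap(stream).order(ByteOrder.LITTLE_ENDIAN);
        int count = buf.getInt(24); // section count from the header
        Entry[] entries = new Entry[count];
        buf.position(28);           // the first pair starts right after the header
        for (int i = 0; i != count; i++) {
            byte[] formatId = new byte[16];
            buf.get(formatId);
            entries[i] = new Entry(formatId, Integer.toUnsignedLong(buf.getInt()));
        }
        return entries;
    }

    /** Builds a header plus a two-entry section list with made-up IDs and offsets. */
    static byte[] sample() {
        ByteBuffer buf = ByteBuffer.allocate(28 + 2 * 20).order(ByteOrder.LITTLE_ENDIAN);
        buf.putShort((short) 0xFFFE).putShort((short) 0);
        buf.position(24);
        buf.putInt(2);            // section count
        byte[] id = new byte[16];
        Arrays.fill(id, (byte) 0x11);
        buf.put(id).putInt(68);   // first section directly after the list
        Arrays.fill(id, (byte) 0x22);
        buf.put(id).putInt(200);  // second section further on (arbitrary)
        return buf.array();
    }

    public static void main(String[] args) {
        for (Entry e : read(sample())) {
            System.out.println(e.offset); // prints 68, then 200
        }
    }
}
```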
+ </section>
+
+
+
+ <section><title>Section</title>
+
+ <p>A section is divided into three parts: the section header (with the
+ section length and the number of properties in the section), the
+ properties list (with the ID and offset of each property), and the
+ properties themselves (each with its type and value). Here are the
+ details:</p>
+
+ <table>
+ <tr>
+ <th>&nbsp;</th>
+ <th>Type</th>
+ <th>Contents</th>
+ <th>Remarks</th>
+ </tr>
+
+ <tr>
+ <td>Section header</td>
+
+ <td>DWord</td>
+ <td>Length</td>
+ <td>The length of the section in bytes.</td>
+ </tr>
+
+ <tr>
+ <td></td>
+ <td>DWord</td>
+ <td>Property count</td>
+ <td>The number of properties in the section.</td>
+ </tr>
+
+ <tr>
+
+ <td>Properties list</td>
+
+ <td>DWord</td>
+ <td>Property ID</td>
+ <td>The property ID tells what the property means. For example, an ID of
+ <code>0x0002</code> in the Summary Information stands for the document's
+ title. See the <a href="#property_ids">Property IDs</a>
+ chapter below for more details.</td>
+ </tr>
+
+ <tr>
+ <td></td>
+ <td>DWord</td>
+ <td>Offset</td>
+ <td>The number of bytes between the beginning of the section and the
+ property.</td>
+ </tr>
+
+ <tr>
+ <td></td>
+ <td>...</td>
+ <td>...</td>
+ <td>...</td>
+ </tr>
+
+ <tr>
+ <td>Properties</td>
+
+ <td>DWord</td>
+ <td>Property type ("variant")</td>
+ <td>This is the property's data type, e.g. an integer value, a byte
+ string or a Unicode string. See the
+ <a href="#property_types"><em>Property Types</em></a> chapter
+ for details!</td>
+ </tr>
+
+ <tr>
+ <td></td>
+ <td><em>Field length depends on the property type
+ ("variant")</em></td>
+ <td>Property value</td>
+ <td>This field's length depends on the property's type. These are the
+ bytes that make out the DWord, the byte string or some other data of
+ fixed or variable length.<br/><br/>
+
+ The property value is always stored in an area whose length is a
+ multiple of 4. If the value is shorter, e.g. a byte string of 13
+ bytes, the remaining bytes are padded with <code>0x00</code>
+ bytes.</td>
+ </tr>
+
+ <tr>
+ <td></td>
+ <td>...</td>
+ <td>...</td>
+ <td>...</td>
+ </tr>
+ </table>
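+ <p>The following sketch walks the properties list of a single section and
+ reads each property's type field. It also shows how a value's length is
+ rounded up to a multiple of 4. <code>SectionReader</code> and the sample
+ section are hypothetical. Real property values, for example strings, would
+ need type-specific decoding that is not shown here:</p>

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

class SectionReader {
    /** Rounds a value length up to the next multiple of 4, e.g. 13 becomes 16. */
    static int paddedLength(int length) {
        return ((length + 3) / 4) * 4;
    }

    /**
     * Lists "id:type" for every property of the section starting at
     * sectionOffset. Property offsets are relative to the section start.
     */
    static String[] describe(byte[] stream, int sectionOffset) {
        ByteBuffer buf = ByteBuffer.wrap(stream).order(ByteOrder.LITTLE_ENDIAN);
        // The section length is at sectionOffset; the property count follows it.
        int count = buf.getInt(sectionOffset + 4);
        String[] result = new String[count];
        for (int i = 0; i != count; i++) {
            int entry = sectionOffset + 8 + 8 * i; // ID DWord + offset DWord per entry
            long id = Integer.toUnsignedLong(buf.getInt(entry));
            int propertyOffset = buf.getInt(entry + 4);
            long type = Integer.toUnsignedLong(buf.getInt(sectionOffset + propertyOffset));
            result[i] = id + ":" + type;
        }
        return result;
    }

    /** A one-property section: PID 14 (page count) of type VT_I4 (3), value 42. */
    static byte[] sampleSection() {
        ByteBuffer buf = ByteBuffer.allocate(24).order(ByteOrder.LITTLE_ENDIAN);
        buf.putInt(24); // section length in bytes
        buf.putInt(1);  // property count
        buf.putInt(14); // property ID
        buf.putInt(16); // property offset, relative to the section start
        buf.putInt(3);  // property type VT_I4
        buf.putInt(42); // property value: 4 bytes, so no padding needed
        return buf.array();
    }

    public static void main(String[] args) {
        System.out.println(describe(sampleSection(), 0)[0]); // prints 14:3
        System.out.println(paddedLength(13));                // prints 16
    }
}
```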
+ </section>
+
+
+
+ <section><title>Property IDs</title>
+ <anchor id="property_ids"/>
+
+ <p>As seen above, a section holds a property list: an array with property
+ IDs and offsets. The property ID gives each property a meaning. For
+ example, in the Summary Information stream the property ID 2 says that
+ this property is the document's title.</p>
+
+ <p>If you want to know a property ID's meaning, it is not sufficient to
+ know the ID itself. You must also know the
+ <strong>section format ID</strong>. For example, in the Document Summary
+ Information stream the property ID 2 means not the document's title but
+ its category. Due to Microsoft's infinite wisdom the section format ID is
+ not part of the section. Thus if you have only a section without the
+ stream it is in, you cannot make any sense of the properties because you
+ do not know what they mean.</p>
+
+ <p>So each section format ID has its own name space of property IDs.
+ Microsoft defined some "well-known" property IDs for the Summary
+ Information and the Document Summary Information streams. You can extend
+ them with your own additional IDs. This is described below.</p>
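+ <p>Because a property ID only has a meaning together with its section
+ format ID, a lookup needs both keys. This sketch uses a hypothetical
+ <code>PropertyNames</code> class and contains only a handful of the IDs
+ listed in the tables below:</p>

```java
class PropertyNames {
    // Section format IDs in the Microsoft notation.
    static final String SUMMARY_INFORMATION = "F29F85E0-4FF9-1068-AB91-08002B27B3D9";
    static final String DOCUMENT_SUMMARY_INFORMATION = "D5CDD502-2E9C-101B-9397-08002B2CF9AE";

    /**
     * Returns the name of a property, or null if it is unknown. The lookup
     * needs both the section format ID and the property ID.
     */
    static String name(String formatId, int pid) {
        if (SUMMARY_INFORMATION.equals(formatId)) {
            switch (pid) {
                case 2: return "Title";
                case 3: return "Subject";
                case 4: return "Author";
                default: return null;
            }
        }
        if (DOCUMENT_SUMMARY_INFORMATION.equals(formatId)) {
            switch (pid) {
                case 2: return "Category";
                case 14: return "Manager";
                case 15: return "Company";
                default: return null;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(name(SUMMARY_INFORMATION, 2));          // prints Title
        System.out.println(name(DOCUMENT_SUMMARY_INFORMATION, 2)); // prints Category
    }
}
```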
+
+ <section><title>Property IDs in The Summary Information Stream</title>
+
+ <p>The Summary Information stream has a single section with a section
+ format ID of <code>0xF29F85E04FF91068AB9108002B27B3D9</code>. The following
+ table defines the meaning of its property IDs. Each row associates a
+ property ID with a <em>name</em> and an <em>ID string</em>. (The property
+ <em>type</em> is given here for informational purposes only. As we have
+ seen above, the type is always stored along with the value.)</p>
+
+ <p>The property <em>name</em> is a readable string which could be
+ displayed to the user. However, this string is useful only for users who
+ understand English. The property name does not help with other
+ languages.</p>
+
+ <p>The property <em>ID string</em> serves a similar purpose but is more
+ technical and nothing a user should bother with. You could take the ID
+ string and map it to an appropriate display string in a particular
+ language. Of course you could do that with the property ID as well and
+ with less overhead, but people (including software developers) tend to be
+ better at remembering symbolic constants than numbers.</p>
+
+ <table>
+ <tr>
+ <th>Property ID</th>
+ <th>Property Name</th>
+ <th>Property ID String</th>
+ <th>Property Type</th>
+ </tr>
+ <tr>
+ <td>2</td>
+ <td>Title</td>
+ <td>PID_TITLE</td>
+ <td>VT_LPSTR</td>
+ </tr>
+ <tr>
+ <td>3</td>
+ <td>Subject</td>
+ <td>PID_SUBJECT</td>
+ <td>VT_LPSTR</td>
+ </tr>
+ <tr>
+ <td>4</td>
+ <td>Author</td>
+ <td>PID_AUTHOR</td>
+ <td>VT_LPSTR</td>
+ </tr>
+ <tr>
+ <td>5</td>
+ <td>Keywords</td>
+ <td>PID_KEYWORDS</td>
+ <td>VT_LPSTR</td>
+ </tr>
+ <tr>
+ <td>6</td>
+ <td>Comments</td>
+ <td>PID_COMMENTS</td>
+ <td>VT_LPSTR</td>
+ </tr>
+ <tr>
+ <td>7</td>
+ <td>Template</td>
+ <td>PID_TEMPLATE</td>
+ <td>VT_LPSTR</td>
+ </tr>
+ <tr>
+ <td>8</td>
+ <td>Last Saved By</td>
+ <td>PID_LASTAUTHOR</td>
+ <td>VT_LPSTR</td>
+ </tr>
+ <tr>
+ <td>9</td>
+ <td>Revision Number</td>
+ <td>PID_REVNUMBER</td>
+ <td>VT_LPSTR</td>
+ </tr>
+ <tr>
+ <td>10</td>
+ <td>Total Editing Time</td>
+ <td>PID_EDITTIME</td>
+ <td>VT_FILETIME</td>
+ </tr>
+ <tr>
+ <td>11</td>
+ <td>Last Printed</td>
+ <td>PID_LASTPRINTED</td>
+ <td>VT_FILETIME</td>
+ </tr>
+ <tr>
+ <td>12</td>
+ <td>Create Time/Date</td>
+ <td>PID_CREATE_DTM</td>
+ <td>VT_FILETIME</td>
+ </tr>
+ <tr>
+ <td>13</td>
+ <td>Last Saved Time/Date</td>
+ <td>PID_LASTSAVE_DTM</td>
+ <td>VT_FILETIME</td>
+ </tr>
+ <tr>
+ <td>14</td>
+ <td>Number of Pages</td>
+ <td>PID_PAGECOUNT</td>
+ <td>VT_I4</td>
+ </tr>
+ <tr>
+ <td>15</td>
+ <td>Number of Words</td>
+ <td>PID_WORDCOUNT</td>
+ <td>VT_I4</td>
+ </tr>
+ <tr>
+ <td>16</td>
+ <td>Number of Characters</td>
+ <td>PID_CHARCOUNT</td>
+ <td>VT_I4</td>
+ </tr>
+ <tr>
+ <td>17</td>
+ <td>Thumbnail</td>
+ <td>PID_THUMBNAIL</td>
+ <td>VT_CF</td>
+ </tr>
+ <tr>
+ <td>18</td>
+ <td>Name of Creating Application</td>
+ <td>PID_APPNAME</td>
+ <td>VT_LPSTR</td>
+ </tr>
+ <tr>
+ <td>19</td>
+ <td>Security</td>
+ <td>PID_SECURITY</td>
+ <td>VT_I4</td>
+ </tr>
+ </table>
+ </section>
+
+
+
+ <section><title>Property IDs in The Document Summary Information Stream</title>
+
+ <p>The Document Summary Information stream has two sections with a section
+ format ID of <code>0xD5CDD5022E9C101B939708002B2CF9AE</code> for the first
+ one. The following table defines the meaning of the property IDs in the
+ first section. See the preceding section for interpreting the table.</p>
+
+ <table>
+ <tr>
+ <th>Property ID</th>
+ <th>Property Name</th>
+ <th>Property ID String</th>
+ <th>Property Type</th>
+ </tr>
+
+ <tr>
+ <td>0</td>
+ <td>Dictionary</td>
+ <td>PID_DICTIONARY</td>
+ <td>[Special format]</td>
+ </tr>
+ <tr>
+ <td>1</td>
+ <td>Code page</td>
+ <td>PID_CODEPAGE</td>
+ <td>VT_I2</td>
+ </tr>
+ <tr>
+ <td>2</td>
+ <td>Category</td>
+ <td>PID_CATEGORY</td>
+ <td>VT_LPSTR</td>
+ </tr>
+ <tr>
+ <td>3</td>
+ <td>PresentationTarget</td>
+ <td>PID_PRESFORMAT</td>
+ <td>VT_LPSTR</td>
+ </tr>
+ <tr>
+ <td>4</td>
+ <td>Bytes</td>
+ <td>PID_BYTECOUNT</td>
+ <td>VT_I4</td>
+ </tr>
+ <tr>
+ <td>5</td>
+ <td>Lines</td>
+ <td>PID_LINECOUNT</td>
+ <td>VT_I4</td>
+ </tr>
+ <tr>
+ <td>6</td>
+ <td>Paragraphs</td>
+ <td>PID_PARCOUNT</td>
+ <td>VT_I4</td>
+ </tr>
+ <tr>
+ <td>7</td>
+ <td>Slides</td>
+ <td>PID_SLIDECOUNT</td>
+ <td>VT_I4</td>
+ </tr>
+ <tr>
+ <td>8</td>
+ <td>Notes</td>
+ <td>PID_NOTECOUNT</td>
+ <td>VT_I4</td>
+ </tr>
+ <tr>
+ <td>9</td>
+ <td>HiddenSlides</td>
+ <td>PID_HIDDENCOUNT</td>
+ <td>VT_I4</td>
+ </tr>
+ <tr>
+ <td>10</td>
+ <td>MMClips</td>
+ <td>PID_MMCLIPCOUNT</td>
+ <td>VT_I4</td>
+ </tr>
+ <tr>
+ <td>11</td>
+ <td>ScaleCrop</td>
+ <td>PID_SCALE</td>
+ <td>VT_BOOL</td>
+ </tr>
+ <tr>
+ <td>12</td>
+ <td>HeadingPairs</td>
+ <td>PID_HEADINGPAIR</td>
+ <td>VT_VARIANT | VT_VECTOR</td>
+ </tr>
+ <tr>
+ <td>13</td>
+ <td>TitlesofParts</td>
+ <td>PID_DOCPARTS</td>
+ <td>VT_LPSTR | VT_VECTOR</td>
+ </tr>
+ <tr>
+ <td>14</td>
+ <td>Manager</td>
+ <td>PID_MANAGER</td>
+ <td>VT_LPSTR</td>
+ </tr>
+ <tr>
+ <td>15</td>
+ <td>Company</td>
+ <td>PID_COMPANY</td>
+ <td>VT_LPSTR</td>
+ </tr>
+ <tr>
+ <td>16</td>
+ <td>LinksUpToDate</td>
+ <td>PID_LINKSDIRTY</td>
+ <td>VT_BOOL</td>
+ </tr>
+ </table>
+ </section>
+ </section>
+
+
+
+ <section><title>Property Types</title>
+ <anchor id="property_types"/>
+
+ <p>A property consists of a DWord <em>type field</em> followed by the
+ property value. The property type is an integer value and tells how the
+ data bytes following it are to be interpreted. In the Microsoft world it is
+ also known as the <em>variant</em>.</p>
+
+ <p>The <em>Usage</em> column says where a variant type may occur. Only the
+ types marked with [P] are allowed in a property set.
+ <strong>[V]</strong> - may appear in a VARIANT, <strong>[T]</strong> - may
+ appear in a TYPEDESC, <strong>[P]</strong> - may appear in an OLE property
+ set, <strong>[S]</strong> - may appear in a Safe Array.</p>
+
+ <table>
+ <tr>
+ <th>Variant ID</th>
+ <th>Variant Type</th>
+ <th>Usage</th>
+ <th>Description</th>
+ </tr>
+ <tr>
+ <td>0</td>
+ <td>VT_EMPTY</td>
+ <td>[V] [P]</td>
+ <td>nothing</td>
+ </tr>
+ <tr>
+ <td>1</td>
+ <td>VT_NULL</td>
+ <td>[V] [P]</td>
+ <td>SQL style Null</td>
+ </tr>
+ <tr>
+ <td>2</td>
+ <td>VT_I2</td>
+ <td>[V] [T] [P] [S]</td>
+ <td>2 byte signed int</td>
+ </tr>
+ <tr>
+ <td>3</td>
+ <td>VT_I4</td>
+ <td>[V] [T] [P] [S]</td>
+ <td>4 byte signed int</td>
+ </tr>
+ <tr>
+ <td>4</td>
+ <td>VT_R4</td>
+ <td>[V] [T] [P] [S]</td>
+ <td>4 byte real</td>
+ </tr>
+ <tr>
+ <td>5</td>
+ <td>VT_R8</td>
+ <td>[V] [T] [P] [S]</td>
+ <td>8 byte real</td>
+ </tr>
+ <tr>
+ <td>6</td>
+ <td>VT_CY</td>
+ <td>[V] [T] [P] [S]</td>
+ <td>currency</td>
+ </tr>
+ <tr>
+ <td>7</td>
+ <td>VT_DATE</td>
+ <td>[V] [T] [P] [S]</td>
+ <td>date</td>
+ </tr>
+ <tr>
+ <td>8</td>
+ <td>VT_BSTR</td>
+ <td>[V] [T] [P] [S]</td>
+ <td>OLE Automation string</td>
+ </tr>
+ <tr>
+ <td>9</td>
+ <td>VT_DISPATCH</td>
+ <td>[V] [T] [P] [S]</td>
+ <td>IDispatch *</td>
+ </tr>
+ <tr>
+ <td>10</td>
+ <td>VT_ERROR</td>
+ <td>[V] [T] [S]</td>
+ <td>SCODE</td>
+ </tr>
+ <tr>
+ <td>11</td>
+ <td>VT_BOOL</td>
+ <td>[V] [T] [P] [S]</td>
+ <td>True=-1, False=0</td>
+ </tr>
+ <tr>
+ <td>12</td>
+ <td>VT_VARIANT</td>
+ <td>[V] [T] [P] [S]</td>
+ <td>VARIANT *</td>
+ </tr>
+ <tr>
+ <td>13</td>
+ <td>VT_UNKNOWN</td>
+ <td>[V] [T] [S]</td>
+ <td>IUnknown *</td>
+ </tr>
+ <tr>
+ <td>14</td>
+ <td>VT_DECIMAL</td>
+ <td>[V] [T] [S]</td>
+ <td>16 byte fixed point</td>
+ </tr>
+ <tr>
+ <td>16</td>
+ <td>VT_I1</td>
+ <td>[T]</td>
+ <td>signed char</td>
+ </tr>
+ <tr>
+ <td>17</td>
+ <td>VT_UI1</td>
+ <td>[V] [T] [P] [S]</td>
+ <td>unsigned char</td>
+ </tr>
+ <tr>
+ <td>18</td>
+ <td>VT_UI2</td>
+ <td>[T] [P]</td>
+ <td>unsigned short</td>
+ </tr>
+ <tr>
+ <td>19</td>
+ <td>VT_UI4</td>
+ <td>[T] [P]</td>
+ <td>unsigned long</td>
+ </tr>
+ <tr>
+ <td>20</td>
+ <td>VT_I8</td>
+ <td>[T] [P]</td>
+ <td>signed 64-bit int</td>
+ </tr>
+ <tr>
+ <td>21</td>
+ <td>VT_UI8</td>
+ <td>[T] [P]</td>
+ <td>unsigned 64-bit int</td>
+ </tr>
+ <tr>
+ <td>22</td>
+ <td>VT_INT</td>
+ <td>[T]</td>
+ <td>signed machine int</td>
+ </tr>
+ <tr>
+ <td>23</td>
+ <td>VT_UINT</td>
+ <td>[T]</td>
+ <td>unsigned machine int</td>
+ </tr>
+ <tr>
+ <td>24</td>
+ <td>VT_VOID</td>
+ <td>[T]</td>
+ <td>C style void</td>
+ </tr>
+ <tr>
+ <td>25</td>
+ <td>VT_HRESULT</td>
+ <td>[T]</td>
+ <td>Standard return type</td>
+ </tr>
+ <tr>
+ <td>26</td>
+ <td>VT_PTR</td>
+ <td>[T]</td>
+ <td>pointer type</td>
+ </tr>
+ <tr>
+ <td>27</td>
+ <td>VT_SAFEARRAY</td>
+ <td>[T]</td>
+ <td>(use VT_ARRAY in VARIANT)</td>
+ </tr>
+ <tr>
+ <td>28</td>
+ <td>VT_CARRAY</td>
+ <td>[T]</td>
+ <td>C style array</td>
+ </tr>
+ <tr>
+ <td>29</td>
+ <td>VT_USERDEFINED</td>
+ <td>[T]</td>
+ <td>user defined type</td>
+ </tr>
+ <tr>
+ <td>30</td>
+ <td>VT_LPSTR</td>
+ <td>[T] [P]</td>
+ <td>null terminated string</td>
+ </tr>
+ <tr>
+ <td>31</td>
+ <td>VT_LPWSTR</td>
+ <td>[T] [P]</td>
+ <td>wide null terminated string</td>
+ </tr>
+ <tr>
+ <td>64</td>
+ <td>VT_FILETIME</td>
+ <td>[P]</td>
+ <td>FILETIME</td>
+ </tr>
+ <tr>
+ <td>65</td>
+ <td>VT_BLOB</td>
+ <td>[P]</td>
+ <td>Length prefixed bytes</td>
+ </tr>
+ <tr>
+ <td>66</td>
+ <td>VT_STREAM</td>
+ <td>[P]</td>
+ <td>Name of the stream follows</td>
+ </tr>
+ <tr>
+ <td>67</td>
+ <td>VT_STORAGE</td>
+ <td>[P]</td>
+ <td>Name of the storage follows</td>
+ </tr>
+ <tr>
+ <td>68</td>
+ <td>VT_STREAMED_OBJECT</td>
+ <td>[P]</td>
+ <td>Stream contains an object</td>
+ </tr>
+ <tr>
+ <td>69</td>
+ <td>VT_STORED_OBJECT</td>
+ <td>[P]</td>
+ <td>Storage contains an object</td>
+ </tr>
+ <tr>
+ <td>70</td>
+ <td>VT_BLOB_OBJECT</td>
+ <td>[P]</td>
+ <td>Blob contains an object</td>
+ </tr>
+ <tr>
+ <td>71</td>
+ <td>VT_CF</td>
+ <td>[P]</td>
+ <td>Clipboard format</td>
+ </tr>
+ <tr>
+ <td>72</td>
+ <td>VT_CLSID</td>
+ <td>[P]</td>
+ <td>A Class ID</td>
+ </tr>
+ <tr>
+ <td>0x1000</td>
+ <td>VT_VECTOR</td>
+ <td>[P]</td>
+ <td>simple counted array</td>
+ </tr>
+ <tr>
+ <td>0x2000</td>
+ <td>VT_ARRAY</td>
+ <td>[V]</td>
+ <td>SAFEARRAY*</td>
+ </tr>
+ <tr>
+ <td>0x4000</td>
+ <td>VT_BYREF</td>
+ <td>[V]</td>
+ <td>void* for local use</td>
+ </tr>
+ <tr>
+ <td>0x8000</td>
+ <td>VT_RESERVED</td>
+ <td><br/></td>
+ <td><br/></td>
+ </tr>
+ <tr>
+ <td>0xFFFF</td>
+ <td>VT_ILLEGAL</td>
+ <td><br/></td>
+ <td><br/></td>
+ </tr>
+ <tr>
+ <td>0xFFF</td>
+ <td>VT_ILLEGALMASKED</td>
+ <td><br/></td>
+ <td><br/></td>
+ </tr>
+ <tr>
+ <td>0xFFF</td>
+ <td>VT_TYPEMASK</td>
+ <td><br/></td>
+ <td><br/></td>
+ </tr>
+ </table>
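+ <p>The values <code>VT_VECTOR</code>, <code>VT_ARRAY</code> and
+ <code>VT_BYREF</code> are modifier flags. They are combined with a base type
+ by a bitwise OR, and <code>VT_TYPEMASK</code> strips them off again. For
+ example, a counted array of null terminated strings has the type
+ <code>VT_VECTOR | VT_LPSTR</code> = 0x101E. The following Java snippet is
+ only a sketch of that decoding. The class and method names are
+ hypothetical and not part of the HPSF API.</p>

```java
/** Hypothetical helper (not part of HPSF) that splits a variant type into base type and flags. */
final class VariantTypeSketch {
    // Constant values as listed in the VT_ table above.
    static final int VT_LPSTR = 30;
    static final int VT_VECTOR = 0x1000;
    static final int VT_ARRAY = 0x2000;
    static final int VT_BYREF = 0x4000;
    static final int VT_TYPEMASK = 0x0FFF;

    /** Returns the base type with all modifier flags masked off. */
    static int baseType(int vt) {
        return vt & VT_TYPEMASK;
    }

    /** True if the VT_VECTOR flag is set, i.e. the value is a counted array. */
    static boolean isVector(int vt) {
        return (vt & VT_VECTOR) != 0;
    }

    /** Builds a short human-readable description such as "30 VECTOR". */
    static String describe(int vt) {
        StringBuilder sb = new StringBuilder(Integer.toString(baseType(vt)));
        if (isVector(vt)) {
            sb.append(" VECTOR");
        }
        if ((vt & VT_ARRAY) != 0) {
            sb.append(" ARRAY");
        }
        if ((vt & VT_BYREF) != 0) {
            sb.append(" BYREF");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        int vt = VT_VECTOR | VT_LPSTR; // 0x101E
        System.out.println(describe(vt)); // prints "30 VECTOR"
    }
}
```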
+ </section>
+
+
+
+ <section>
+ <title>The Dictionary</title>
+
+ <p>What a dictionary is good for is explained in the <a
+ href="how-to.html">HPSF HOW-TO</a>. This chapter explains how it is
+ organized internally.</p>
+
+ <p>The dictionary has a simple header consisting of a single UInt value. It
+ tells how many entries the dictionary comprises:</p>
+
+ <table>
+ <tr>
+ <th>Name</th>
+ <th>Data type</th>
+ <th>Description</th>
+ </tr>
+ <tr>
+ <td>nrEntries</td>
+ <td>UInt</td>
+ <td>Number of dictionary entries</td>
+ </tr>
+ </table>
+
+ <p>The dictionary entries follow the header. Each one looks like this:</p>
+
+ <table>
+ <tr>
+ <th>Name</th>
+ <th>Data type</th>
+ <th>Description</th>
+ </tr>
+ <tr>
+ <td>key</td>
+ <td>UInt</td>
+ <td>The unique number of this property, i.e. the PID</td>
+ </tr>
+ <tr>
+ <td>length</td>
+ <td>UInt</td>
+ <td>The length of the property name associated with the key</td>
+ </tr>
+ <tr>
+ <td>value</td>
+ <td>String</td>
+ <td>The property's name, terminated with a 0x00 character</td>
+ </tr>
+ </table>
+
+ <p>The entries are not aligned, i.e. each one follows its predecessor
+ without any gap or fill characters.</p>
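+ <p>As an illustration, the layout above could be read with
+ <code>java.nio.ByteBuffer</code> roughly as follows. This is a minimal
+ sketch, not HPSF's implementation. It assumes little-endian byte order (as
+ used throughout property sets), a single-byte codepage, that
+ <em>length</em> counts bytes including the terminating 0x00, and that the
+ caller knows the charset of the names. The class name is hypothetical.</p>

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical reader (not part of HPSF) for the dictionary layout described above. */
final class DictionarySketch {

    /** Returns the entries as PID -> property name, in file order. */
    static Map<Long, String> parse(byte[] data, Charset charset) {
        ByteBuffer buf = ByteBuffer.wrap(data).order(ByteOrder.LITTLE_ENDIAN);
        long nrEntries = Integer.toUnsignedLong(buf.getInt()); // header: number of entries
        Map<Long, String> dictionary = new LinkedHashMap<>();
        for (long i = 0; i < nrEntries; i++) {
            long key = Integer.toUnsignedLong(buf.getInt()); // the PID
            int length = buf.getInt();                       // assumed: bytes incl. 0x00
            if (length < 0 || length > buf.remaining()) {
                throw new IllegalArgumentException("Bad name length: " + length);
            }
            byte[] raw = new byte[length];
            buf.get(raw); // entries are unaligned, so the next one starts right here
            // Drop the terminating 0x00, if present.
            int end = (length > 0 && raw[length - 1] == 0) ? length - 1 : length;
            dictionary.put(key, new String(raw, 0, end, charset));
        }
        return dictionary;
    }

    public static void main(String[] args) {
        // Hypothetical dictionary with a single entry: PID 2 -> "Project".
        byte[] name = "Project\0".getBytes(StandardCharsets.ISO_8859_1);
        ByteBuffer buf = ByteBuffer.allocate(12 + name.length).order(ByteOrder.LITTLE_ENDIAN);
        buf.putInt(1).putInt(2).putInt(name.length).put(name);
        System.out.println(parse(buf.array(), StandardCharsets.ISO_8859_1)); // {2=Project}
    }
}
```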
+ </section>
+
+
+
+ <section><title>References</title>
+
+ <p>To assemble the HPSF description I used only information that is
+ publicly available on the Internet. The references given below have been very
+ helpful. If you have any amendments or corrections, please let us know!
+ Thank you!</p>
+
+ <ol>
+
+ <li>In
+ <a href="https://www.kyler.com/pubs/ddj9894.html"><em>Understanding OLE
+ documents</em></a>, Ken Kyler gives an introduction to OLE2
+ documents and especially to property sets. He lists the property names,
+ types, and IDs of the Summary Information and Document Summary
+ Information streams.</li>
+
+ <li>The <a href="https://www.dwam.net/docs/oleref/"><em>ActiveX
+ Programmer's Reference</em></a> at <a
+ href="https://www.dwam.net/docs/oleref/">https://www.dwam.net/docs/oleref/</a>
+ seems a little outdated, but that's what I have found.</li>
+
+ <li>An overview of the <code>VT_</code> types is in
+ <a href="https://www.marin.clara.net/COM/variant_type_definitions.htm"><em>Variant
+ Type Definitions</em></a>.</li>
+
+ <li>What is a <code>FILETIME</code>? The answer can be found
+ at <a
+ href="https://msdn.microsoft.com/library/default.asp?url=/library/en-us/sysinfo/base/filetime_str.asp">https://msdn.microsoft.com/library/default.asp?url=/library/en-us/sysinfo/base/filetime_str.asp</a>, <a href="https://www.vbapi.com/ref/f/filetime.html">https://www.vbapi.com/ref/f/filetime.html</a> or
+ <a href="https://www.cs.rpi.edu/courses/fall01/os/FILETIME.html">https://www.cs.rpi.edu/courses/fall01/os/FILETIME.html</a>.
+ In short: <em>The FILETIME structure holds a date and time associated
+ with a file. The structure identifies a 64-bit integer specifying the
+ number of 100-nanosecond intervals which have passed since January 1,
+ 1601. This 64-bit value is split into the two dwords stored in the
+ structure.</em></li>
+
+ <li>Microsoft provides some public information in the <a
+ href="https://msdn.microsoft.com/library/default.asp">MSDN
+ Library</a>. Use its search function to find what you are
+ looking for, e.g. "codepage" or "document summary information".</li>
+ </ol>
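+ <p>Put differently, a <code>FILETIME</code> can be turned into a
+ <code>java.time.Instant</code> by joining the two dwords into one 64-bit
+ tick count and moving the origin from 1601 to the Java epoch
+ (1970-01-01, which is 11,644,473,600 seconds later). The snippet below is a
+ minimal sketch based on the definition quoted above. The class name is
+ hypothetical, and the code is not taken from HPSF.</p>

```java
import java.time.Instant;

/** Hypothetical helper (not part of HPSF) converting a FILETIME to an Instant. */
final class FileTimeSketch {
    /** Seconds from 1601-01-01T00:00:00Z to 1970-01-01T00:00:00Z. */
    static final long EPOCH_DIFF_SECONDS = 11_644_473_600L;
    /** Number of 100-nanosecond intervals per second. */
    static final long TICKS_PER_SECOND = 10_000_000L;

    /** Joins the low and high dwords of the structure into one 64-bit tick count. */
    static long ticks(int lowDateTime, int highDateTime) {
        return ((long) highDateTime << 32) | Integer.toUnsignedLong(lowDateTime);
    }

    /** Converts a tick count (treated as unsigned) to an Instant. */
    static Instant toInstant(long ticks) {
        long seconds = Long.divideUnsigned(ticks, TICKS_PER_SECOND) - EPOCH_DIFF_SECONDS;
        long nanos = Long.remainderUnsigned(ticks, TICKS_PER_SECOND) * 100;
        return Instant.ofEpochSecond(seconds, nanos);
    }

    public static void main(String[] args) {
        // 0x019DB1DE_D53E8000 is the FILETIME of 1970-01-01T00:00:00Z.
        System.out.println(toInstant(ticks(0xD53E8000, 0x019DB1DE))); // 1970-01-01T00:00:00Z
    }
}
```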
+ </section>
+ </section>
+ </body>
+</document>
+
+<!-- Keep this comment at the end of the file
+Local variables:
+mode: xml
+sgml-omittag:nil
+sgml-shorttag:nil
+sgml-namecase-general:nil
+sgml-general-insert-case:lower
+sgml-minimize-attributes:nil
+sgml-always-quote-attributes:t
+sgml-indent-step:1
+sgml-indent-data:t
+sgml-parent-document:nil
+sgml-exposed-tags:nil
+sgml-local-catalogs:nil
+sgml-local-ecat-files:nil
+End:
+-->
diff --git a/src/documentation/content/xdocs/components/hpsf/thumbnails.xml b/src/documentation/content/xdocs/components/hpsf/thumbnails.xml
new file mode 100644
index 0000000000..3d109fc520
--- /dev/null
+++ b/src/documentation/content/xdocs/components/hpsf/thumbnails.xml
@@ -0,0 +1,198 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<!--
+ ====================================================================
+ Licensed to the Apache Software Foundation (ASF) under one or more
+ contributor license agreements. See the NOTICE file distributed with
+ this work for additional information regarding copyright ownership.
+ The ASF licenses this file to You under the Apache License, Version 2.0
+ (the "License"); you may not use this file except in compliance with
+ the License. You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+ Unless required by applicable law or agreed to in writing, software
+ distributed under the License is distributed on an "AS IS" BASIS,
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ See the License for the specific language governing permissions and
+ limitations under the License.
+ ====================================================================
+-->
+<!DOCTYPE document PUBLIC "-//APACHE//DTD Documentation V2.0//EN" "document-v20.dtd">
+
+<document>
+ <header>
+ <title>HPSF THUMBNAIL HOW-TO</title>
+ <authors>
+ <person name="Drew Varner" email="Drew.Varner@-deleteThis-sc.edu" />
+ </authors>
+ </header>
+ <body>
+ <section><title>The VT_CF Format</title>
+
+ <p>Thumbnail information is stored as a <code>VT_CF</code>, or Thumbnail
+ Variant. The Thumbnail Variant is used to store various types of information
+ in a clipboard format. A <code>VT_CF</code> can hold data in either
+ Macintosh or Windows clipboard formats.</p>
+
+ <p>There are many types of data that can be copied to the clipboard, but the
+ only types of information needed for thumbnail manipulation are the image
+ formats.</p>
+
+ <p>The <code>VT_CF</code> structure looks like this:</p>
+
+ <table>
+ <tr>
+ <th>Element:</th>
+ <td>Clipboard Size</td>
+ <td>Clipboard Format Tag</td>
+ <td>Clipboard Data</td>
+ </tr>
+ <tr>
+ <th>Size:</th>
+ <td>32 bit unsigned integer (DWord)</td>
+ <td>32 bit signed integer (DWord)</td>
+ <td>variable length (byte array)</td>
+ </tr>
+ </table>
+
+ <p>The Clipboard Size is the size (in bytes) of the Clipboard Data
+ (variable length) plus the Clipboard Format Tag (four bytes).</p>
+
+ <p>The Clipboard Format Tag has four possible values:</p>
+
+ <table>
+ <tr>
+ <th>Value</th>
+ <th>Identifier</th>
+ <th>Description</th>
+ </tr>
+ <tr>
+ <td><code>-1L</code></td>
+ <td><code>CFTAG_WINDOWS</code></td>
+ <td>a built-in Windows&copy; clipboard format value</td>
+ </tr>
+ <tr>
+ <td><code>-2L</code></td>
+ <td><code>CFTAG_MACINTOSH</code></td>
+ <td>a Macintosh clipboard format value</td>
+ </tr>
+ <tr>
+ <td><code>-3L</code></td>
+ <td><code>CFTAG_FMTID</code></td>
+ <td>a format identifier (FMTID). This is rarely used.</td>
+ </tr>
+ <tr>
+ <td><code>0L</code></td>
+ <td><code>CFTAG_NODATA</code></td>
+ <td>No data. This is rarely used.</td>
+ </tr>
+ </table>
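+ <p>A minimal sketch of reading this structure in Java follows. It assumes
+ little-endian byte order, as in the rest of a property set, and a buffer
+ positioned at the Clipboard Size field. The class name is hypothetical and
+ not part of HPSF.</p>

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/** Hypothetical holder (not part of HPSF) for a parsed VT_CF value. */
final class ClipboardDataSketch {
    static final int CFTAG_WINDOWS = -1;
    static final int CFTAG_MACINTOSH = -2;
    static final int CFTAG_FMTID = -3;
    static final int CFTAG_NODATA = 0;

    final int formatTag;
    final byte[] data;

    private ClipboardDataSketch(int formatTag, byte[] data) {
        this.formatTag = formatTag;
        this.data = data;
    }

    /** Reads Clipboard Size, Clipboard Format Tag and (size - 4) bytes of Clipboard Data. */
    static ClipboardDataSketch read(ByteBuffer buf) {
        buf.order(ByteOrder.LITTLE_ENDIAN);
        long size = Integer.toUnsignedLong(buf.getInt());
        // The size covers the 4-byte tag plus the data, so it must fit in what is left.
        if (size < 4 || size > buf.remaining()) {
            throw new IllegalArgumentException("Invalid Clipboard Size: " + size);
        }
        int formatTag = buf.getInt();
        byte[] data = new byte[(int) (size - 4)];
        buf.get(data);
        return new ClipboardDataSketch(formatTag, data);
    }

    /** Maps a tag to the identifiers from the table above. */
    static String tagName(int tag) {
        switch (tag) {
            case CFTAG_WINDOWS: return "CFTAG_WINDOWS";
            case CFTAG_MACINTOSH: return "CFTAG_MACINTOSH";
            case CFTAG_FMTID: return "CFTAG_FMTID";
            case CFTAG_NODATA: return "CFTAG_NODATA";
            default: return "unknown (" + tag + ")";
        }
    }

    public static void main(String[] args) {
        // Hypothetical VT_CF value: size 8, Windows tag, then 4 data bytes.
        ByteBuffer buf = ByteBuffer.allocate(12).order(ByteOrder.LITTLE_ENDIAN);
        buf.putInt(8).putInt(CFTAG_WINDOWS).putInt(3);
        buf.flip();
        ClipboardDataSketch cf = read(buf);
        System.out.println(tagName(cf.formatTag) + ", " + cf.data.length + " data bytes");
    }
}
```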
+ </section>
+
+
+
+ <section><title>Windows Clipboard Data</title>
+
+ <p>Windows clipboard data has four image formats for thumbnails:</p>
+
+ <table>
+ <tr>
+ <th>Value</th>
+ <th>Identifier</th>
+ <th>Description</th>
+ </tr>
+ <tr>
+ <td>3</td>
+ <td><code>CF_METAFILEPICT</code></td>
+ <td>Windows metafile format - recommended</td>
+ </tr>
+ <tr>
+ <td>8</td>
+ <td><code>CF_DIB</code></td>
+ <td>Device Independent Bitmap</td>
+ </tr>
+ <tr>
+ <td>14</td>
+ <td><code>CF_ENHMETAFILE</code></td>
+ <td>Enhanced Windows metafile format</td>
+ </tr>
+ <tr>
+ <td>2</td>
+ <td><code>CF_BITMAP</code></td>
+ <td>Bitmap - Obsolete - Use <code>CF_DIB</code> instead</td>
+ </tr>
+ </table>
+ </section>
+
+ <section><title>Windows Metafile Format</title>
+
+ <p>The most common format for thumbnails on the Windows platform is the
+ Windows metafile format. The clipboard places an extra header in front of
+ the standard Windows Metafile Format data.</p>
+
+ <p>When an image is stored in the Windows Clipboard WMF format, the
+ Clipboard Data byte array looks like this:</p>
+
+ <table>
+ <tr>
+ <th>Identifier</th>
+ <td>CF_METAFILEPICT</td>
+ <td>mm</td>
+ <td>width</td>
+ <td>height</td>
+ <td>handle</td>
+ <td>WMF data</td>
+ </tr>
+ <tr>
+ <th>Size</th>
+ <td>32 bit unsigned int</td>
+ <td>16 bit unsigned(?) int</td>
+ <td>16 bit unsigned(?) int</td>
+ <td>16 bit unsigned(?) int</td>
+ <td>16 bit unsigned(?) int</td>
+ <td>byte array - variable length</td>
+ </tr>
+ <tr>
+ <th>Description</th>
+ <td>Clipboard WMF</td>
+ <td>Mapping Mode</td>
+ <td>Image Width</td>
+ <td>Image Height</td>
+ <td>handle to the WMF data array in memory, or 0</td>
+ <td>standard WMF byte stream</td>
+ </tr>
+ </table>
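+ <p>The header can be separated from the WMF byte stream with a few reads.
+ The sketch below assumes little-endian byte order and, following the table,
+ treats the 16 bit fields as unsigned, which has not been confirmed. The
+ class name is hypothetical and not part of HPSF.</p>

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Arrays;

/** Hypothetical parser (not part of HPSF) for Clipboard Data in the Clipboard WMF format. */
final class MetafilePictSketch {
    static final int CF_METAFILEPICT = 3;
    /** 32 bit format identifier followed by four 16 bit fields. */
    static final int HEADER_SIZE = 4 + 4 * 2;

    final int mappingMode;
    final int width;
    final int height;
    final int handle;
    final byte[] wmfData;

    private MetafilePictSketch(int mappingMode, int width, int height, int handle, byte[] wmfData) {
        this.mappingMode = mappingMode;
        this.width = width;
        this.height = height;
        this.handle = handle;
        this.wmfData = wmfData;
    }

    static MetafilePictSketch parse(byte[] clipboardData) {
        if (clipboardData.length < HEADER_SIZE) {
            throw new IllegalArgumentException("Clipboard Data shorter than the header");
        }
        ByteBuffer buf = ByteBuffer.wrap(clipboardData).order(ByteOrder.LITTLE_ENDIAN);
        int format = buf.getInt();
        if (format != CF_METAFILEPICT) {
            throw new IllegalArgumentException("Not CF_METAFILEPICT: " + format);
        }
        // Assumption: the 16 bit fields are unsigned (the table marks this with "(?)").
        int mm = Short.toUnsignedInt(buf.getShort());
        int width = Short.toUnsignedInt(buf.getShort());
        int height = Short.toUnsignedInt(buf.getShort());
        int handle = Short.toUnsignedInt(buf.getShort());
        // Everything after the header is the standard WMF byte stream.
        byte[] wmf = Arrays.copyOfRange(clipboardData, HEADER_SIZE, clipboardData.length);
        return new MetafilePictSketch(mm, width, height, handle, wmf);
    }

    public static void main(String[] args) {
        // Hypothetical Clipboard Data: header plus three placeholder "WMF" bytes.
        ByteBuffer buf = ByteBuffer.allocate(HEADER_SIZE + 3).order(ByteOrder.LITTLE_ENDIAN);
        buf.putInt(CF_METAFILEPICT).putShort((short) 8).putShort((short) 100)
           .putShort((short) 50).putShort((short) 0).put(new byte[] {1, 2, 3});
        MetafilePictSketch pict = parse(buf.array());
        System.out.println(pict.width + "x" + pict.height + ", " + pict.wmfData.length + " WMF bytes");
    }
}
```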
+ </section>
+
+
+ <section><title>Device Independent Bitmap</title>
+ <p><strong>FIXME:</strong> Describe the Device Independent Bitmap
+ format!</p>
+ </section>
+
+
+
+ <section><title>Macintosh Clipboard Data</title>
+ <p><strong>FIXME:</strong> Describe the Macintosh clipboard formats!</p>
+ </section>
+
+ </body>
+</document>
+
+<!-- Keep this comment at the end of the file
+Local variables:
+mode: xml
+sgml-omittag:nil
+sgml-shorttag:nil
+sgml-namecase-general:nil
+sgml-general-insert-case:lower
+sgml-minimize-attributes:nil
+sgml-always-quote-attributes:t
+sgml-indent-step:1
+sgml-indent-data:t
+sgml-parent-document:nil
+sgml-exposed-tags:nil
+sgml-local-catalogs:nil
+sgml-local-ecat-files:nil
+End:
+-->
diff --git a/src/documentation/content/xdocs/components/hpsf/todo.xml b/src/documentation/content/xdocs/components/hpsf/todo.xml
new file mode 100644
index 0000000000..d18e9b1e9a
--- /dev/null
+++ b/src/documentation/content/xdocs/components/hpsf/todo.xml
@@ -0,0 +1,77 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<!--
+ ====================================================================
+ Licensed to the Apache Software Foundation (ASF) under one or more
+ contributor license agreements. See the NOTICE file distributed with
+ this work for additional information regarding copyright ownership.
+ The ASF licenses this file to You under the Apache License, Version 2.0
+ (the "License"); you may not use this file except in compliance with
+ the License. You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+ Unless required by applicable law or agreed to in writing, software
+ distributed under the License is distributed on an "AS IS" BASIS,
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ See the License for the specific language governing permissions and
+ limitations under the License.
+ ====================================================================
+-->
+<!DOCTYPE document PUBLIC "-//APACHE//DTD Documentation V2.0//EN" "document-v20.dtd">
+
+<document>
+ <header>
+ <title>To Do</title>
+ <authors>
+ <person name="Rainer Klute" email="klute@rainer-klute.de"/>
+ </authors>
+ </header>
+ <body>
+ <section><title>To Do</title>
+
+ <p>The following functionality should be added to HPSF:</p>
+
+ <ol>
+ <li>
+ Improve writing support! We need convenience classes and methods for
+ easily writing summary information streams and document summary
+ information streams.
+ </li>
+ <li>
+ Add resource bundles to
+ <code>org.apache.poi.hpsf.wellknown</code> to ease
+ localizations. This would be useful for mapping standard property IDs to
+ localized strings. Example: The property ID 4 could be mapped to "Author"
+ in English or "Verfasser" in German.
+ </li>
+ <li>
+ Implement reading functionality for those property types that are not
+ yet supported. HPSF should return proper Java types instead of just byte
+ arrays.
+ </li>
+ <li>
+ Add WMF to <code>java.awt.Image</code> example code in the <a
+ href="thumbnails.html">Thumbnail HOW-TO</a>.
+ </li>
+ </ol>
+ </section>
+ </body>
+</document>
+
+<!-- Keep this comment at the end of the file
+Local variables:
+mode: xml
+sgml-omittag:nil
+sgml-shorttag:nil
+sgml-namecase-general:nil
+sgml-general-insert-case:lower
+sgml-minimize-attributes:nil
+sgml-always-quote-attributes:t
+sgml-indent-step:1
+sgml-indent-data:t
+sgml-parent-document:nil
+sgml-exposed-tags:nil
+sgml-local-catalogs:nil
+sgml-local-ecat-files:nil
+End:
+-->
diff --git a/src/documentation/content/xdocs/components/hsmf/index.xml b/src/documentation/content/xdocs/components/hsmf/index.xml
new file mode 100644
index 0000000000..a678b0bd06
--- /dev/null
+++ b/src/documentation/content/xdocs/components/hsmf/index.xml
@@ -0,0 +1,65 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<!--
+ ====================================================================
+ Licensed to the Apache Software Foundation (ASF) under one or more
+ contributor license agreements. See the NOTICE file distributed with
+ this work for additional information regarding copyright ownership.
+ The ASF licenses this file to You under the Apache License, Version 2.0
+ (the "License"); you may not use this file except in compliance with
+ the License. You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+ Unless required by applicable law or agreed to in writing, software
+ distributed under the License is distributed on an "AS IS" BASIS,
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ See the License for the specific language governing permissions and
+ limitations under the License.
+ ====================================================================
+-->
+<!DOCTYPE document PUBLIC "-//APACHE//DTD Documentation V2.0//EN" "document-v20.dtd">
+
+<document>
+ <header>
+ <title>POI-HSMF - Java API To Access Microsoft Outlook MSG Files</title>
+ <subtitle>Overview</subtitle>
+ <authors>
+ <person name="Nick Burch" email="nick at apache dot org"/>
+ <person name="Travis Ferguson" email="uniformstupidity at gmail dot com"/>
+ </authors>
+ </header>
+
+ <body>
+ <section>
+ <title>Overview</title>
+
+ <p>HSMF is the POI Project's pure Java implementation of the Outlook MSG format.</p>
+ <p>At this time, it provides low-level read access to the whole file,
+ along with a user-facing way to get at the common textual content of
+ MSG files.</p>
+ <p>There is an example MSG textual renderer, which shows how to access the
+ common parts such as sender, subject, message body and attachments. This is
+ in the
+ <a href="https://github.com/apache/poi/tree/trunk/poi-examples/src/main/java/org/apache/poi/examples/hsmf/">HSMF examples area</a>
+ of the POI Git repository. You may also wish to look at the unit tests for more usage examples.</p>
+
+ <note>
+ This code currently lives in the
+ <a href="https://github.com/apache/poi/tree/trunk/poi-scratchpad/src/main/java">scratchpad area</a>
+ of the POI Git repository. To use this component, ensure
+ you have the Scratchpad Jar on your classpath, or a dependency
+ defined on the <em>poi-scratchpad</em> artifact - the main POI
+ jar is not enough! See the
+ <a href="site:components">POI Components Map</a>
+ for more details.
+ </note>
+ <note>
+ This code is subject to change between versions, and being
+ "scratchpad", doesn't maintain the usual Apache POI backwards
+ compatibility guarantees. In particular, the way that property
+ values are fetched is expected to change soon, as part of the
+ work to improve fixed-length property support.
+ </note>
+ </section>
+ </body>
+</document>
diff --git a/src/documentation/content/xdocs/components/index.xml b/src/documentation/content/xdocs/components/index.xml
new file mode 100644
index 0000000000..7264361237
--- /dev/null
+++ b/src/documentation/content/xdocs/components/index.xml
@@ -0,0 +1,423 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<!--
+ ====================================================================
+ Licensed to the Apache Software Foundation (ASF) under one or more
+ contributor license agreements. See the NOTICE file distributed with
+ this work for additional information regarding copyright ownership.
+ The ASF licenses this file to You under the Apache License, Version 2.0
+ (the "License"); you may not use this file except in compliance with
+ the License. You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+ Unless required by applicable law or agreed to in writing, software
+ distributed under the License is distributed on an "AS IS" BASIS,
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ See the License for the specific language governing permissions and
+ limitations under the License.
+ ====================================================================
+-->
+<!DOCTYPE document PUBLIC "-//APACHE//DTD Documentation V2.0//EN" "document-v20.dtd">
+
+<document>
+ <header>
+ <title>Apache POI™ - Component Overview</title>
+ <authors>
+ <person id="AO" name="Andrew C. Oliver" email="acoliver@apache.org"/>
+ <person id="RK" name="Rainer Klute" email="klute@apache.org"/>
+ <person id="DF" name="David Fisher" email="dfisher@jmlafferty.com"/>
+ </authors>
+ </header>
+ <body>
+ <section><title>Apache POI Project Components</title>
+ <p>The Apache POI project is the master project for developing pure
+ Java ports of file formats based on Microsoft's OLE 2 Compound
+ Document Format. OLE 2 Compound Document Format is used by
+ Microsoft Office Documents, as well as by programs using MFC
+ property sets to serialize their document objects.
+ </p>
+ <p>Apache POI is also the master project for developing pure
+ Java ports of file formats based on Office Open XML (OOXML).
+ OOXML is part of an ECMA / ISO standardisation effort. The
+ <a href="https://ecma-international.org/publications-and-standards/standards/ecma-376/">ECMA-376 standard</a>
+ is quite large, but you can normally find the bit you need without
+ too much effort! It is also available under the
+ <a href="https://msdn.microsoft.com/en-us/openspecifications/default">Microsoft OSP</a>.
+ </p>
+
+
+ <section><title>POIFS for OLE 2 Documents</title>
+ <p>
+ POIFS is the oldest and most stable part of POI. It is our port of the OLE 2 Compound Document Format to
+ pure Java. It supports both read and write functionality. All of our components for the binary (non-XML)
+ Microsoft Office formats ultimately rely on it by
+ definition. Please see <a href="./poifs/index.html">the POIFS project page</a> for more information.
+ </p>
+ </section>
+ <section><title>HSSF and XSSF for Excel Documents</title>
+ <p>
+ HSSF is our port of the Microsoft Excel 97 (-2003) file format (BIFF8) to pure
+ Java. XSSF is our port of the Microsoft Excel XML (2007+) file format (OOXML) to
+ pure Java. SS is a package that provides common support for both formats with a common API.
+ They both support read and write capability. Please see
+ <a href="site:spreadsheet">the HSSF+XSSF project page</a> for more
+ information.
+ </p>
+ </section>
+ <section><title>HWPF and XWPF for Word Documents</title>
+ <p>
+ HWPF is our port of the Microsoft Word 97 (-2003) file format to pure
+ Java. It supports reading and limited writing. It also provides
+ simple text extraction support for the older Word 6 and Word 95 formats.
+ Please see <a href="site:document">the HWPF project page for more
+ information</a>. This component remains in the early stages of
+ development, but it can already read and write simple files.
+ </p>
+ <p>
+ We are also working on XWPF for the WordprocessingML (2007+) format from the
+ OOXML specification. This provides read and write support for simpler
+ files, along with text extraction capabilities.
+ </p>
+ </section>
+ <section><title>HSLF and XSLF for PowerPoint Documents</title>
+ <p>
+ HSLF is our port of the Microsoft PowerPoint 97(-2003) file format to pure
+ Java. It supports read and write capabilities. Please see <a
+ href="site:slideshow">the HSLF project page for more
+ information</a>.
+ </p>
+ <p>
+ We are also working on XSLF for the PresentationML (2007+) format from the
+ OOXML specification.
+ </p>
+ </section>
+ <section><title>HPSF for OLE 2 Document Properties</title>
+ <p>
+ HPSF is our port of the OLE 2 property set format to pure
+ Java. Property sets are mostly used to store a document's properties
+ (title, author, date of last modification etc.), but they can be used
+ for application-specific purposes as well.
+ </p>
+ <p>
+ HPSF supports both reading and writing of properties.
+ </p>
+ <p>
+ Please see <a href="./hpsf/index.html">the HPSF project
+ page</a> for more information.
+ </p>
+ </section>
+ <section><title>HDGF and XDGF for Visio Documents</title>
+ <p>
+ HDGF is our port of the Microsoft Visio 97(-2003) file format to pure
+ Java. It currently only supports reading at a very low level, and
+ simple text extraction. Please see <a
+ href="./diagram/index.html">the HDGF / Diagram project page for more
+ information</a>.
+ </p>
+ <p>
+ XDGF is our port of the Microsoft Visio XML (.vsdx) file format to pure
+ Java. It has slightly more support than HDGF. Please see <a
+ href="./diagram/index.html">the XDGF / Diagram project page for more
+ information</a>.
+ </p>
+ </section>
+ <section><title>HPBF for Publisher Documents</title>
+ <p>
+ HPBF is our port of the Microsoft Publisher 98(-2007) file format to pure
+ Java. It currently only supports reading at a low level for around
+ half of the file parts, and simple text extraction. Please see <a
+ href="./hpbf/index.html">the HPBF project page for more
+ information</a>.
+ </p>
+ </section>
+ <section><title>HMEF for TNEF (winmail.dat) Outlook Attachments</title>
+ <p>
+ HMEF is our port of the Microsoft TNEF (Transport Neutral Encoding
+ Format) file format to pure Java. TNEF is sometimes used by Outlook
+ for encoding the message, and will typically come through as
+ winmail.dat. HMEF currently only supports reading at a low level, but
+ we hope to add text and attachment extraction. Please see <a
+ href="./hmef/index.html">the HMEF project page for more
+ information</a>.
+ </p>
+ </section>
+ <section><title>HSMF for Outlook Messages</title>
+ <p>
+ HSMF is our port of the Microsoft Outlook message file format to pure
+ Java. It currently supports only some of the textual content of MSG files, and
+ some attachments. Further support and documentation is coming in slowly.
+ For now, users are advised to consult the unit tests for example use.
+ Please see <a href="./hsmf/index.html">the HSMF project page for more
+ information</a>.
+ </p>
+ <p>
+ Microsoft has recently added the Outlook file format to its OSP. More information
+ is now available, which makes implementing this API an easier task.
+ </p>
+ </section>
+ </section>
+ <section id="components"><title>Component Map</title>
+ <p>
+ The Apache POI distribution consists of support for many document file formats. This support is provided
+ in several Jar files. Not all of the Jars are needed for every format. The following tables
+ show the relationships between POI components, Maven repository tags, and the project's Jar files.
+ </p>
+ <table>
+ <tr>
+ <th>Component</th>
+ <th>Application type</th>
+ <th>Maven artifactId</th>
+ <th>Notes</th>
+ </tr>
+ <tr>
+ <td><a href="./poifs/index.html">POIFS</a></td>
+ <td>OLE2 Filesystem</td>
+ <td><em>poi</em></td>
+ <td>Required to work with OLE2 / POIFS based files</td>
+ </tr>
+ <tr>
+ <td><a href="./hpsf/index.html">HPSF</a></td>
+ <td>OLE2 Property Sets</td>
+ <td><em>poi</em></td>
+ <td>&nbsp;</td>
+ </tr>
+ <tr>
+ <td><a href="site:spreadsheet">HSSF</a></td>
+ <td>Excel XLS</td>
+ <td><em>poi</em></td>
+ <td>For HSSF only, if common SS is needed see below</td>
+ </tr>
+ <tr>
+ <td><a href="site:slideshow">HSLF</a></td>
+ <td>PowerPoint PPT</td>
+ <td><em>poi-scratchpad</em></td>
+ <td>&nbsp;</td>
+ </tr>
+ <tr>
+ <td><a href="site:document">HWPF</a></td>
+ <td>Word DOC</td>
+ <td><em>poi-scratchpad</em></td>
+ <td>&nbsp;</td>
+ </tr>
+ <tr>
+ <td><a href="./diagram/index.html">HDGF</a></td>
+ <td>Visio VSD</td>
+ <td><em>poi-scratchpad</em></td>
+ <td>&nbsp;</td>
+ </tr>
+ <tr>
+ <td><a href="./hpbf/index.html">HPBF</a></td>
+ <td>Publisher PUB</td>
+ <td><em>poi-scratchpad</em></td>
+ <td>&nbsp;</td>
+ </tr>
+ <tr>
+ <td><a href="./hsmf/index.html">HSMF</a></td>
+ <td>Outlook MSG</td>
+ <td><em>poi-scratchpad</em></td>
+ <td>&nbsp;</td>
+ </tr>
+ <tr>
+ <td>DDF</td>
+ <td>Escher common drawings</td>
+ <td><em>poi</em></td>
+ <td>&nbsp;</td>
+ </tr>
+ <tr>
+ <td>HWMF</td>
+ <td>WMF drawings</td>
+ <td><em>poi-scratchpad</em></td>
+ <td>&nbsp;</td>
+ </tr>
+ <tr>
+ <td><a href="./oxml4j/index.html">OpenXML4J</a></td>
+ <td>OOXML</td>
+ <td><em>poi-ooxml</em> plus either <em>poi-ooxml-lite</em> or<br/>
+ <em>poi-ooxml-full</em></td>
+ <td>See notes below for differences between these options</td>
+ </tr>
+ <tr>
+ <td><a href="site:spreadsheet">XSSF</a></td>
+ <td>Excel XLSX</td>
+ <td><em>poi-ooxml</em></td>
+ <td>&nbsp;</td>
+ </tr>
+ <tr>
+ <td><a href="site:slideshow">XSLF</a></td>
+ <td>PowerPoint PPTX</td>
+ <td><em>poi-ooxml</em></td>
+ <td>&nbsp;</td>
+ </tr>
+ <tr>
+ <td><a href="site:document">XWPF</a></td>
+ <td>Word DOCX</td>
+ <td><em>poi-ooxml</em></td>
+ <td>&nbsp;</td>
+ </tr>
+ <tr>
+ <td><a href="./diagram/index.html">XDGF</a></td>
+ <td>Visio VSDX</td>
+ <td><em>poi-ooxml</em></td>
+ <td>&nbsp;</td>
+ </tr>
+ <tr>
+ <td><a href="./slideshow/index.html">Common SL</a></td>
+ <td>PowerPoint PPT and PPTX</td>
+ <td><em>poi-scratchpad</em> and <em>poi-ooxml</em></td>
+ <td>SL code is in the core POI jar, but implementations are in poi-scratchpad
+ and poi-ooxml.</td>
+ </tr>
+ <tr>
+ <td><a href="site:spreadsheet">Common SS</a></td>
+ <td>Excel XLS and XLSX</td>
+ <td><em>poi-ooxml</em></td>
+ <td>WorkbookFactory and friends all require poi-ooxml, not just core poi</td>
+ </tr>
+ </table>
+
+ <p><br /></p>
+
+ <p>
+ This table maps artifacts into the jar file name. "version-yyyymmdd" is
+ the POI version stamp. You can see what the latest stamp is on the
+ <a href="site:download">downloads page</a>.
+ </p>
+ <table>
+ <tr>
+ <th>Maven artifactId</th>
+ <th>Prerequisites</th>
+ <th>JAR</th>
+ </tr>
+ <tr>
+ <td>poi</td>
+ <td><a href="https://search.maven.org/#artifactdetails|org.apache.logging.log4j|log4j-api|2.24.3|jar">log4j 2.x</a>,
+ <a href="https://search.maven.org/#artifactdetails|commons-codec|commons-codec|1.17.1|jar">commons-codec</a>,
+ <a href="https://search.maven.org/#artifactdetails|org.apache.commons|commons-collections4|4.4|jar">commons-collections</a>,
+ <a href="https://search.maven.org/#artifactdetails|org.apache.commons|commons-math3|3.6.1|jar">commons-math3</a>,
+ <a href="https://search.maven.org/#artifactdetails|commons-io|commons-io|2.20.0|jar">commons-io</a>
+ </td>
+ <td>poi-version-yyyymmdd.jar</td>
+ </tr>
+ <tr>
+ <td>poi-scratchpad</td>
+ <td><a href="https://search.maven.org/#search|gav|1|g:org.apache.poi AND a:poi">poi</a></td>
+ <td>poi-scratchpad-version-yyyymmdd.jar</td>
+ </tr>
+ <tr>
+ <td>poi-ooxml</td>
+ <td><a href="https://search.maven.org/#search|gav|1|g:org.apache.poi AND a:poi">poi</a>,
+ <a href="https://search.maven.org/#search|gav|1|g:org.apache.poi AND a:poi-ooxml-lite">poi-ooxml-lite</a>,
+ <a href="https://search.maven.org/#artifactdetails|org.apache.commons|commons-compress|1.23.0|jar">commons-compress</a>,
+ <a href="https://search.maven.org/#artifactdetails|com.zaxxer|SparseBitSet|1.2|jar">SparseBitSet</a><br/>
+ For SVG support:
+ <a href="https://search.maven.org/#search|gav|1|g:org.apache.xmlgraphics AND a:batik-all">batik-all</a>,
+ <a href="https://search.maven.org/#search|gav|1|g:xml-apis AND a:xml-apis-ext">xml-apis-ext</a>,
+ <a href="https://search.maven.org/#search|gav|1|g:org.apache.xmlgraphics AND a:xmlgraphics-commons">xmlgraphics-commons</a><br/>
+ For PDF support:
+ <a href="https://search.maven.org/#search|gav|1|g:org.apache.pdfbox AND a:pdfbox">pdfbox</a>,
+ <a href="https://search.maven.org/#search|gav|1|g:org.apache.pdfbox AND a:fontbox">fontbox</a>,
+ <a href="https://search.maven.org/#search|gav|1|g:de.rototor.pdfbox AND a:graphics2d">rototor graphics2d</a>
+ </td>
+ <td>poi-ooxml-version-yyyymmdd.jar</td>
+ </tr>
+ <tr>
+ <td>poi-ooxml-lite</td>
+ <td><a href="https://search.maven.org/#artifactdetails|org.apache.xmlbeans|xmlbeans|5.3.0|jar">xmlbeans</a></td>
+ <td>poi-ooxml-lite-version-yyyymmdd.jar</td>
+ </tr>
+ <tr>
+ <td>poi-examples</td>
+ <td><a href="https://search.maven.org/#search|gav|1|g:org.apache.poi AND a:poi">poi</a>,
+ <a href="https://search.maven.org/#search|gav|1|g:org.apache.poi AND a:poi-scratchpad">poi-scratchpad</a>,
+ <a href="https://search.maven.org/#search|gav|1|g:org.apache.poi AND a:poi-ooxml">poi-ooxml</a>
+ </td>
+ <td>poi-examples-version-yyyymmdd.jar</td>
+ </tr>
+ <tr>
+ <td>poi-ooxml-full (known as ooxml-schemas)</td>
+ <td><a href="https://search.maven.org/#artifactdetails|org.apache.xmlbeans|xmlbeans|5.3.0|jar">xmlbeans</a><br/>
+ For signing:
+ <a href="https://search.maven.org/#artifactdetails|org.bouncycastle|bcpkix-jdk18on|1.81|jar">bcpkix-jdk18on</a>,
+ <a href="https://search.maven.org/#artifactdetails|org.bouncycastle|bcutil-jdk18on|1.81|jar">bcprov-jdk18on</a>,
+ <a href="https://search.maven.org/#artifactdetails|org.apache.santuario|xmlsec|3.0.6|bundle">xmlsec</a>,
+ <a href="https://search.maven.org/#artifactdetails|org.slf4j|slf4j-api|2.0.17|jar">slf4j-api</a>
+ </td>
+ <td>poi-ooxml-full-version-yyyymmdd.jar</td>
+ </tr>
+ </table>
+
+ <p>&nbsp;</p>
+ <note>
+ Apache commons-math3 and commons-compress were added as dependencies in POI 4.0.0.<br/>
+ Zaxxer SparseBitSet was added as a dependency in POI 4.1.2.<br/>
+ Apache commons-io was added as a dependency in POI 5.1.0.
+ </note>
+ <p>
+ poi-ooxml requires poi-ooxml-lite. This is a substantially smaller
+ version of the poi-ooxml-full jar (ooxml-schemas-1.4.jar for POI 4.0.0,
+ ooxml-schemas-1.3.jar for POI 3.14 up to POI 3.17,
+ ooxml-schemas-1.1.jar for POI 3.7 up to POI 3.13, ooxml-schemas-1.0.jar
+ for POI 3.5 and 3.6).
+ The larger poi-ooxml-full (formerly, ooxml-schemas) jar is <a href="../help/index.html#faq-N10025">normally</a>
+ only required for features that are not fully implemented in poi-ooxml.
+ There used to also be an ooxml-security jar, which contained
+ all of the classes relating to encryption and signing. POI 5 no longer needs this jar.
+ The equivalent classes are now in poi-ooxml-full and poi-ooxml-lite.
+ This JAR was ooxml-security-1.1.jar for POI 3.14 and POI 4. ooxml-security-1.0.jar
+ was used prior to that.
+ </p>
+ <p>
+ The OOXML jars require a stax implementation, but now that Apache
+ POI requires Java 8, that dependency is provided by the JRE and no additional
+ stax jars are required. The OOXML jars used to require DOM4J, but
+ the code has now been changed to use JAXP and no additional dom4j
+ jars are required. If you have problems when using a non-Oracle JDK,
+ see this <a href="../help/index.html#faq-N1017E">FAQ</a>.
+ </p>
+ <p>
+ The ooxml schemas jars are compiled with <a href="https://xmlbeans.apache.org/">Apache XMLBeans</a>.
+ It is recommended that you use the XMLBeans version that was used to build the POI OOXML schemas.
+ It may be possible to use newer XMLBeans jars but there are no guarantees, especially if the XMLBeans version
+ numbers differ a lot.
+ </p>
+ </section>
+ <section><title>Examples</title>
+ <p>
+ Small sample programs using the POI API are available in the
+ <a href="https://github.com/apache/poi/tree/trunk/poi-examples/src/main/java/org/apache/poi/examples">poi-examples</a>
+ directory of the source distribution.
+ </p>
+ <p>
+ All of the examples are included in POI distributions as a poi-examples artifact.
+ </p>
+ </section>
+ <section><title>Running POI on other JVM languages</title>
+ <p>
+ POI can be run on most languages that run on the JVM. For code examples,
+ see <a href="poi-jvm-languages.html">Running POI on other JVM languages</a>.
+ </p>
+ </section>
+ <section><title>Contributed Software</title>
+ <p>
+ Besides the "official" components outlined above there is some further
+ software distributed with POI. This is called "contributed" software. It
+ is not explicitly recommended or even maintained by the POI team, but
+ it might still be useful to you.
+ </p>
+ <p>
+ See <a href="poi-ruby.html">POI Ruby Bindings</a> and other code in the
+ <a href="https://github.com/apache/poi/tree/trunk/src/contrib/">poi-contrib module</a>.
+ </p>
+ </section>
+ </body>
+ <footer>
+ <legal>
+ Copyright (c) @year@ The Apache Software Foundation. All rights reserved.
+ <br />
+ Apache POI, POI, Apache, the Apache feather logo, and the Apache
+ POI project logo are trademarks of The Apache Software Foundation.
+ </legal>
+ </footer>
+</document>
diff --git a/src/documentation/content/xdocs/components/logging.xml b/src/documentation/content/xdocs/components/logging.xml
new file mode 100644
index 0000000000..10518acca8
--- /dev/null
+++ b/src/documentation/content/xdocs/components/logging.xml
@@ -0,0 +1,290 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<!--
+ ====================================================================
+ Licensed to the Apache Software Foundation (ASF) under one or more
+ contributor license agreements. See the NOTICE file distributed with
+ this work for additional information regarding copyright ownership.
+ The ASF licenses this file to You under the Apache License, Version 2.0
+ (the "License"); you may not use this file except in compliance with
+ the License. You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+ Unless required by applicable law or agreed to in writing, software
+ distributed under the License is distributed on an "AS IS" BASIS,
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ See the License for the specific language governing permissions and
+ limitations under the License.
+ ====================================================================
+-->
+<!DOCTYPE document PUBLIC "-//APACHE//DTD Documentation V2.0//EN" "document-v20.dtd">
+
+<document>
+ <header>
+ <title>Apache POI™ - Logging Framework</title>
+ <authors>
+ <person id="DS" name="Dominik Stadler" email="centic@apache.org"/>
+ <person id="MV" name="Marius Volkhart" email="mariusvolkhart@apache.org"/>
+ </authors>
+ </header>
+
+ <body>
+ <section>
+ <title>Introduction</title>
+ <p>
+ Logging in POI is used primarily as a debugging mechanism, not a normal runtime
+ logging system. Logging at levels noisier than WARN is ONLY for autopsy type debugging, and should
+ NEVER be enabled on a production system.
+ </p>
+ </section>
+ <section>
+ <title>POI 5.1.0 and above</title>
+ <p>
+ Since version 5.1.0 Apache POI uses <a href="https://logging.apache.org/log4j/2.x/">Apache Log4j v2</a> directly.
+ </p>
+ <p>
+ Apache POI depends only on log4j-api, which lets you choose the logging implementation. log4j-core is
+ just one of many options.
+ If you want to continue to use another SLF4J-compatible logging framework, you can deploy the
+ <a href="https://logging.apache.org/log4j/log4j-2.2/log4j-to-slf4j/index.html">log4j-to-slf4j</a> jar to
+ route POI's log output to it.
+ </p>
+ <p>
+ POI tries to name loggers after the canonical name of the containing class. For example,
+ <code>org.apache.poi.poifs.filesystem.POIFSFileSystem</code>. Use your logging framework's typical
+ mechanisms for activating and deactivating logging for specific loggers.
+ </p>
+ <p>
+ All loggers are named <code>org.apache.poi.*</code>, so rules applied to <code>org.apache.poi</code>
+ will affect all POI loggers.
+ </p>
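+ <p>
+ Both Log4j 2 and Logback resolve a logger's level from its closest configured ancestor in the dot-separated
+ name hierarchy, which is why a single rule for <code>org.apache.poi</code> covers every POI logger. The Java
+ sketch below only illustrates that lookup; it is not either framework's actual code, and storing the levels in
+ <code>java.util.Properties</code> is an assumption made to keep it self-contained.
+ </p>

```java
import java.util.Properties;

// A minimal sketch of hierarchical logger-level lookup, NOT Log4j's or Logback's real code.
// Configured levels are kept in java.util.Properties purely for illustration.
class LoggerHierarchySketch {

    /**
     * Returns the level configured for the closest ancestor of loggerName, walking
     * "org.apache.poi.poifs.filesystem" -> "org.apache.poi.poifs" -> ... -> "org",
     * and falling back to rootLevel when no ancestor is configured.
     */
    static String effectiveLevel(String loggerName, Properties configured, String rootLevel) {
        String name = loggerName;
        while (true) {
            String level = configured.getProperty(name);
            if (level != null) {
                return level;
            }
            int dot = name.lastIndexOf('.');
            if (dot == -1) {
                return rootLevel; // no more ancestors: use the root logger's level
            }
            name = name.substring(0, dot); // drop the last name segment
        }
    }

    public static void main(String[] args) {
        Properties configured = new Properties();
        configured.setProperty("org.apache.poi", "WARN");
        configured.setProperty("com.example.myapplication", "INFO");

        System.out.println(effectiveLevel("org.apache.poi.poifs.filesystem.POIFSFileSystem", configured, "OFF")); // WARN
        System.out.println(effectiveLevel("org.apache.poix.Other", configured, "OFF")); // OFF
    }
}
```

+ <p>
+ The lookup works on whole name segments, so a logger such as <code>org.apache.poix.Other</code> is not
+ affected by a rule for <code>org.apache.poi</code>.
+ </p>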
+ </section>
+ <section>
+ <title>Logging with Log4j 2 Core</title>
+ <p>
+ Capturing POI logs using Log4j 2 Core is as simple as including the
+ <a href="https://logging.apache.org/log4j/2.x/maven-artifacts.html"><code>log4j-core</code></a> JAR in
+ your project. POI also has dependencies on libraries that make use of the SLF4J and Apache Commons
+ Logging APIs. Gather logs from these dependencies by adding the
+ <a href="https://logging.apache.org/log4j/2.x/log4j-jcl/index.html">Commons Logging Bridge</a> and the
+ <a href="https://logging.apache.org/log4j/2.x/log4j-slf4j-impl/index.html">SLF4J Binding</a> to your
+ project.
+ </p>
+ <p>
+ The simplest configuration is to capture all POI logs at the same level as your application. For example,
+ you might collect all messages at <code>INFO</code> and higher and be fine with capturing POI messages as well.
+ </p>
+ <source>
+ &lt;Configuration ...&gt;
+ &lt;Loggers&gt;
+ &lt;Root level="INFO"&gt;
+ ...
+ &lt;/Root&gt;
+ &lt;/Loggers&gt;
+ &lt;/Configuration&gt;
+ </source>
+
+ <p>
+ A better approach is to capture only messages from loggers you opt in to. For example,
+ you might want to capture all messages from <code>com.example.myapplication</code> at <code>INFO</code>
+ but only POI messages at <code>WARN</code> or more severe.
+ </p>
+ <source>
+ &lt;Configuration ...&gt;
+ &lt;Loggers&gt;
+ &lt;Logger name="com.example.myapplication" level="INFO" /&gt;
+ &lt;Logger name="org.apache.poi" level="WARN" /&gt;
+
+ &lt;Root level="OFF"&gt;
+ ...
+ &lt;/Root&gt;
+ &lt;/Loggers&gt;
+ &lt;/Configuration&gt;
+ </source>
+
+ <p>Another strategy you may decide to use is to capture all messages except those coming from POI.</p>
+ <source>
+ &lt;Configuration ...&gt;
+ &lt;Loggers&gt;
+ &lt;Logger name="org.apache.poi" level="OFF" /&gt;
+
+ &lt;Root level="INFO"&gt;
+ ...
+ &lt;/Root&gt;
+ &lt;/Loggers&gt;
+ &lt;/Configuration&gt;
+ </source>
+ </section>
+ <section>
+ <title>Log4J SimpleLogger</title>
+ <p>
+ If your main aim is just to get rid of the alarming Log4j status message
+ 'ERROR StatusLogger Log4j2 could not find a logging implementation.', one option is to
+ enable the SimpleLogger using a system property.
+ </p>
+ <source>
+ -Dlog4j2.loggerContextFactory=org.apache.logging.log4j.simple.SimpleLoggerContextFactory
+ </source>
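+ <p>
+ If you cannot change the command line, the same property can be set from code. The sketch below assumes it
+ runs before any POI or Log4j class is used, because Log4j reads this property when its <code>LogManager</code>
+ is first initialized. The class and method names are hypothetical.
+ </p>

```java
// A minimal sketch: select Log4j's SimpleLogger programmatically instead of passing -D on the command line.
// EnableSimpleLogger and useSimpleLoggerIfUnset are hypothetical names.
class EnableSimpleLogger {
    static final String FACTORY_PROPERTY = "log4j2.loggerContextFactory";
    static final String SIMPLE_FACTORY = "org.apache.logging.log4j.simple.SimpleLoggerContextFactory";

    /** Sets the factory only if none was chosen already, so a -D flag still takes precedence. */
    static void useSimpleLoggerIfUnset() {
        if (System.getProperty(FACTORY_PROPERTY) == null) {
            System.setProperty(FACTORY_PROPERTY, SIMPLE_FACTORY);
        }
    }

    public static void main(String[] args) {
        // Run this first: Log4j reads the property once, when its LogManager is initialized,
        // which typically happens as soon as the first POI class that declares a logger is loaded.
        useSimpleLoggerIfUnset();
        System.out.println(System.getProperty(FACTORY_PROPERTY));
        // ... use POI from here on ...
    }
}
```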
+ </section>
+ <section>
+ <title>Logging with SLF4J</title>
+ <p>
+ If you want to continue to use another SLF4J-compatible logging framework, you can deploy the
+ <a href="https://logging.apache.org/log4j/log4j-2.2/log4j-to-slf4j/index.html">log4j-to-slf4j</a> jar
+ together with whichever SLF4J bindings and bridges your framework requires.
+ </p>
+ <p>
+ See <a href="https://www.slf4j.org/">https://www.slf4j.org/</a> for more details about using SLF4J.
+ </p>
+ </section>
+ <section>
+ <title>Logging with Logback</title>
+ <p>
+ Capturing POI logs using Logback requires adding the
+ <a href="https://logging.apache.org/log4j/2.x/log4j-to-slf4j/index.html">Log4j to SLF4J Adapter</a> to
+ your project, along with the standard Logback dependencies. POI also has dependencies on libraries that
+ make use of the SLF4J and Apache Commons Logging APIs. Gather logs from these dependencies by adding the
+ <a href="https://www.slf4j.org/legacy.html#jcl-over-slf4j">Commons Logging Bridge</a> to your project.
+ </p>
+
+ <p>
+ The simplest configuration is to capture all POI logs at the same level as your application. For example,
+ you might collect all messages at <code>INFO</code> and higher and be fine with capturing POI messages as well.
+ </p>
+ <source>
+ &lt;configuration ...&gt;
+ &lt;root level="INFO"&gt;
+ ...
+ &lt;/root&gt;
+ &lt;/configuration&gt;
+ </source>
+
+ <p>
+ A better approach is to capture only messages from loggers you opt in to. For example,
+ you might want to capture all messages from <code>com.example.myapplication</code> at <code>INFO</code>
+ but only POI messages at <code>WARN</code> or more severe.
+ </p>
+ <source>
+ &lt;configuration ...&gt;
+ &lt;logger name="com.example.myapplication" level="INFO" /&gt;
+ &lt;logger name="org.apache.poi" level="WARN" /&gt;
+
+ &lt;root level="OFF"&gt;
+ ...
+ &lt;/root&gt;
+ &lt;/configuration&gt;
+ </source>
+
+ <p>Another strategy you may decide to use is to capture all messages except those coming from POI.</p>
+ <source>
+ &lt;configuration ...&gt;
+ &lt;logger name="org.apache.poi" level="OFF" /&gt;
+
+ &lt;root level="INFO"&gt;
+ ...
+ &lt;/root&gt;
+ &lt;/configuration&gt;
+ </source>
+ </section>
+ <section>
+ <title>POI 5.0.0</title>
+ <p>
+ POI 5.0.0 switched to using <a href="https://www.slf4j.org/">SLF4J</a> for logging. If you want
+ to enable logging, please read up on the various SLF4J compatible logging frameworks.
+ <a href="https://logging.apache.org/log4j/2.x/">Apache Log4j v2</a> is a good choice.
+ <a href="https://logback.qos.ch/">Logback</a> is also widely used.
+ </p>
+ </section>
+ <section>
+ <title>Legacy POI Logging Framework (no longer supported in POI 5.0.0 and above)</title>
+ <p>
+ Prior to POI 5.0.0, POI used a custom logging framework that allowed you to configure where logs are sent.
+ </p>
+ <p>
+ Logging in POI 3 and 4 is used only as a debugging mechanism, not as a normal runtime
+ logging system. Logging at level debug/info is ONLY for debugging, and should
+ NEVER be enabled on a production system.
+ </p>
+ <p>
+ The framework is extensible so that you can send log messages to any logging framework
+ that your application uses.
+ </p>
+ <p>
+ A number of default logging implementations are supported by POI out-of-the-box and can be selected via a
+ system property.
+ </p>
+ </section>
+ <section><title>POI 4.x and before: Enable Legacy POI Logging Framework</title>
+ <p>
+ By default, logging is disabled in POI 3 and 4. Sometimes, it might be useful
+ to enable logging to see some debug messages printed out which can
+ help in analyzing problems.
+ </p>
+ <p>
+ You can select the logging framework by setting the system property <em>org.apache.poi.util.POILogger</em> during application startup or by calling System.setProperty():
+ </p>
+ <source>
+ System.setProperty("org.apache.poi.util.POILogger", "org.apache.poi.util.CommonsLogger");
+ </source>
+ <p>
+ Note: You need to call <em>setProperty()</em> before any POI functionality is invoked, as the logger is only initialized during startup.
+ </p>
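+ <p>
+ The mechanism described above, where a class name in a system property selects the logger implementation,
+ can be sketched as follows. This is a hypothetical illustration, not the code of POI's <code>POILogFactory</code>:
+ <code>SimpleLog</code>, <code>Silent</code> and <code>Console</code> are stand-ins for <code>POILogger</code>,
+ <code>NullLogger</code> and <code>SystemOutLogger</code>, and the property name <code>demo.logger</code> is made up.
+ </p>

```java
// A hypothetical sketch of selecting a logger implementation through a system property,
// in the spirit of the legacy org.apache.poi.util.POILogger property. SimpleLog, Silent and
// Console are illustrative stand-ins, not POI classes.
class LoggerSelectionSketch {

    public interface SimpleLog {
        void log(String message);
    }

    /** Default when nothing is configured: discards everything, like POI's NullLogger. */
    public static final class Silent implements SimpleLog {
        public void log(String message) { }
    }

    /** Prints to the console, like POI's SystemOutLogger. */
    public static final class Console implements SimpleLog {
        public void log(String message) { System.out.println(message); }
    }

    /** Instantiates the class named by the property, falling back to Silent on any problem. */
    static SimpleLog create(String propertyName) {
        String className = System.getProperty(propertyName);
        if (className == null) {
            return new Silent();
        }
        try {
            Object instance = Class.forName(className).getDeclaredConstructor().newInstance();
            return (SimpleLog) instance;
        } catch (ReflectiveOperationException | ClassCastException e) {
            // Unknown class, no usable constructor, or not a SimpleLog: stay silent.
            return new Silent();
        }
    }

    public static void main(String[] args) {
        System.setProperty("demo.logger", Console.class.getName());
        create("demo.logger").log("hello from the Console logger");
    }
}
```

+ <p>
+ Falling back to a silent logger mirrors the documented default: unless the property names a usable class, nothing is logged.
+ </p>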
+ </section>
+ <section><title>POI 4.x and before: Available Legacy POI Logging Framework implementations</title>
+ <p>
+ The following logger implementations are provided by POI 3 and 4:
+ </p>
+ <table>
+ <tr>
+ <th>Class</th>
+ <th>Type</th>
+ </tr>
+ <tr>
+ <td>org.apache.poi.util.SystemOutLogger</td>
+ <td>Sends log output to the system console</td>
+ </tr>
+ <tr>
+ <td>org.apache.poi.util.NullLogger</td>
+ <td>Default logger, does not log anything</td>
+ </tr>
+ <tr>
+ <td>org.apache.poi.util.CommonsLogger</td>
+ <td>Uses <a href="https://commons.apache.org/proper/commons-logging/">Apache Commons Logging</a> for logging, which in turn can use JDK 1.4 logging,
+ log4j, logkit, etc. The log4j dependency was removed in POI 5.0.0, so you will need to include this dependency yourself if you need it.</td>
+ </tr>
+ <tr>
+ <td>org.apache.poi.util.DummyPOILogger</td>
+ <td>Simple logger that keeps all log lines in memory for later analysis (this class is not in the jar, only in the test sources).
+ Used primarily for testing. Note: this may cause a memory leak if used in a production application!</td>
+ </tr>
+ </table>
+ </section>
+ <section><title>POI 4.x and before: Sending logs to a different log framework</title>
+ <p>
+ You can send logs to other logging frameworks by implementing the interface <em>org.apache.poi.util.POILogger</em>.
+ </p>
+ </section>
+ <section><title>POI 4.x and before: Implementation details</title>
+ <p>
+ Every class uses a <code>POILogger</code> to log, and obtains it through a static method
+ of <code>POILogFactory</code>.
+ </p>
+ <p>
+ Each class in POI can log using a <code>POILogger</code>, which is an abstract class.
+ We decided to make our own logging facade because:</p>
+ <ol>
+ <li>we need to log many values, so the class provides many convenience methods that spare
+ programmers from writing string concatenations;</li>
+ <li>we need to be able to use POI without any logger package present.</li>
+ </ol>
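+ <p>
+ The two goals above can be illustrated with a small, hypothetical facade. The names and level constants in this
+ sketch are assumptions and do not reproduce the real <code>POILogger</code> signatures: callers pass message
+ pieces that are only concatenated when the level is enabled, and a do-nothing implementation works without any
+ logging library on the classpath.
+ </p>

```java
// A hypothetical sketch of the two design goals; names and level values are illustrative
// and do not reproduce the real POILogger API.
class LoggingFacadeSketch {

    public interface Log {
        int DEBUG = 1;
        int WARN = 5;

        boolean check(int level);

        void write(int level, String message);

        /** Goal 1: callers pass the pieces; they are concatenated only when the level is enabled. */
        default void log(int level, Object... parts) {
            if (!check(level)) {
                return;
            }
            StringBuilder sb = new StringBuilder();
            for (Object part : parts) {
                sb.append(part);
            }
            write(level, sb.toString());
        }
    }

    /** Goal 2: a do-nothing implementation needs no logging library on the classpath. */
    public static final class NullLog implements Log {
        public boolean check(int level) { return false; }
        public void write(int level, String message) { }
    }

    /** Collects enabled messages in memory, one per line. */
    public static final class MemoryLog implements Log {
        private final int threshold;
        private final StringBuilder lines = new StringBuilder();

        public MemoryLog(int threshold) { this.threshold = threshold; }

        public boolean check(int level) { return level >= threshold; }

        public void write(int level, String message) { lines.append(message).append('\n'); }

        public String contents() { return lines.toString(); }
    }

    public static void main(String[] args) {
        MemoryLog log = new MemoryLog(Log.WARN);
        log.log(Log.DEBUG, "row ", 7, " skipped");  // below the threshold: nothing is concatenated
        log.log(Log.WARN, "row ", 7, " is empty");
        System.out.print(log.contents());            // prints "row 7 is empty"
    }
}
```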
+ </section>
+ </body>
+ <footer>
+ <legal>
+ Copyright (c) @year@ The Apache Software Foundation. All rights reserved.
+ <br />
+ Apache POI, POI, Apache, the Apache feather logo, and the Apache
+ POI project logo are trademarks of The Apache Software Foundation.
+ </legal>
+ </footer>
+</document>
diff --git a/src/documentation/content/xdocs/components/oxml4j/index.xml b/src/documentation/content/xdocs/components/oxml4j/index.xml
new file mode 100644
index 0000000000..06cf00d20a
--- /dev/null
+++ b/src/documentation/content/xdocs/components/oxml4j/index.xml
@@ -0,0 +1,45 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<!--
+ ====================================================================
+ Licensed to the Apache Software Foundation (ASF) under one or more
+ contributor license agreements. See the NOTICE file distributed with
+ this work for additional information regarding copyright ownership.
+ The ASF licenses this file to You under the Apache License, Version 2.0
+ (the "License"); you may not use this file except in compliance with
+ the License. You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+ Unless required by applicable law or agreed to in writing, software
+ distributed under the License is distributed on an "AS IS" BASIS,
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ See the License for the specific language governing permissions and
+ limitations under the License.
+ ====================================================================
+-->
+<!DOCTYPE document PUBLIC "-//APACHE//DTD Documentation V2.0//EN" "document-v20.dtd">
+
+<document>
+ <header>
+ <title>POI-OpenXML4J - Java API To Access Office Open XML documents</title>
+ <subtitle>Overview</subtitle>
+ </header>
+
+ <body>
+ <section>
+ <title>Overview</title>
+ <p>OpenXML4J is the POI Project's pure Java implementation of the Open Packaging Conventions (OPC) defined in
+ <a href="https://ecma-international.org/publications-and-standards/standards/ecma-376/">ECMA-376</a>.</p>
+ <p>Every OpenXML file comprises a collection of byte streams called parts, combined into a container called a package.
+ POI OpenXML4J provides a physical implementation of the OPC that uses the Zip file format.</p>
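+ <p>
+ Because the package is a Zip file, its parts can be seen with nothing more than <code>java.util.zip</code>.
+ The sketch below is a stdlib-only illustration and does not use OpenXML4J; POI's own entry point for packages
+ is <code>org.apache.poi.openxml4j.opc.OPCPackage</code>. The entry names in <code>main</code> are illustrative placeholders.
+ </p>

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.StringJoiner;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// A minimal, stdlib-only sketch (no OpenXML4J): an OPC package is a Zip file, so its
// parts show up as Zip entries. The names used in main() are illustrative.
class PackageEntriesSketch {

    /** Returns the Zip entry names of a package, one per line, in stored order. */
    static String entryNames(byte[] packageBytes) throws IOException {
        StringJoiner names = new StringJoiner("\n");
        try (ZipInputStream zin = new ZipInputStream(new ByteArrayInputStream(packageBytes))) {
            ZipEntry entry;
            while ((entry = zin.getNextEntry()) != null) {
                names.add(entry.getName());
            }
        }
        return names.toString();
    }

    /** Builds a tiny Zip with placeholder content, just to have something to list. */
    static byte[] buildZip(String... names) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zout = new ZipOutputStream(bytes)) {
            for (String name : names) {
                zout.putNextEntry(new ZipEntry(name));
                zout.write("placeholder".getBytes(StandardCharsets.UTF_8));
                zout.closeEntry();
            }
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] pkg = buildZip("[Content_Types].xml", "_rels/.rels", "xl/workbook.xml");
        System.out.println(entryNames(pkg));
    }
}
```

+ <p>
+ To inspect a real document, pass the bytes of a file such as <code>SampleSS.xlsx</code> (for example from
+ <code>java.nio.file.Files.readAllBytes</code>). Note that <code>[Content_Types].xml</code> is stored as a Zip
+ entry but is not itself a part in the OPC sense.
+ </p>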
+ </section>
+ <section>
+ <title>History</title>
+ <p>OpenXML4J was originally developed by
+ <a href="https://web.archive.org/web/20090611063015/https://www.openxml4j.org/">openxml4j.org</a>,
+ and was contributed to Apache POI in 2008. The original code is available at
+ <a href="https://sourceforge.net/projects/openxml4j/">https://sourceforge.net/projects/openxml4j/</a>.
+ Thanks go to Julien Chable for his support and guidance.</p>
+ </section>
+ </body>
+</document>
diff --git a/src/documentation/content/xdocs/components/poi-jvm-languages.xml b/src/documentation/content/xdocs/components/poi-jvm-languages.xml
new file mode 100644
index 0000000000..26ab85094d
--- /dev/null
+++ b/src/documentation/content/xdocs/components/poi-jvm-languages.xml
@@ -0,0 +1,351 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<!--
+ ====================================================================
+ Licensed to the Apache Software Foundation (ASF) under one or more
+ contributor license agreements. See the NOTICE file distributed with
+ this work for additional information regarding copyright ownership.
+ The ASF licenses this file to You under the Apache License, Version 2.0
+ (the "License"); you may not use this file except in compliance with
+ the License. You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+ Unless required by applicable law or agreed to in writing, software
+ distributed under the License is distributed on an "AS IS" BASIS,
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ See the License for the specific language governing permissions and
+ limitations under the License.
+ ====================================================================
+-->
+<!DOCTYPE document PUBLIC "-//APACHE//DTD Documentation V2.0//EN" "document-v20.dtd">
+
+<document>
+ <header>
+ <title>JVM languages</title>
+ <authors>
+ <person id="JO" name="Javen O'Neal" email="onealj@apache.org"/>
+ </authors>
+ </header>
+
+ <body>
+ <section><title>Intro</title>
+ <p>
+ Apache POI can be used with any
+ <a href="https://en.wikipedia.org/wiki/List_of_JVM_languages">JVM language</a>
+ that can import Java jar files, such as Jython, Groovy, Scala, Kotlin, and JRuby.
+ </p>
+ <ul>
+ <li><a href="#Jython+example">Jython</a></li>
+ <li><a href="#Scala+example">Scala</a></li>
+ <li><a href="#Groovy+example">Groovy</a></li>
+ <li><a href="#Clojure+example">Clojure</a></li>
+ </ul>
+ </section>
+
+
+ <section><title>Tested Environments</title>
+ <ul>
+ <li><a href="https://www.jython.org/">Jython</a> 2.5+ (older versions probably work, but are untested)</li>
+ <li><a href="https://www.scala-lang.org/">Scala</a> 2.x</li>
+ <li><a href="https://groovy-lang.org/">Groovy</a> 2.4 (anything from 1.6 onwards ought to work, but only the latest 2.4 releases have been tested by us)</li>
+ <li><a href="https://clojure.org/">Clojure</a> 1.5.1+</li>
+ </ul>
+ <p>If you use POI from a different language (Kotlin, JRuby, ...) and would like to share a <em>Hello POI!</em> example,
+ please send it to us.</p>
+ <p>Please <a href="site:mailinglists">let us know</a> if you use POI in an environment not listed here.</p>
+ </section>
+
+ <!-- FIXME: Need to make each language section expandable/collapseable so that users can compare their language to Java on one screen. See https://jsfiddle.net/eJX8z/ for an example implementation. -->
+ <section><title>Java code</title>
+ <section><title>POILanguageExample.java</title>
+ <source> <!-- lang="java" -->
+ // include poi-{version}-{yyyymmdd}.jar, poi-ooxml-{version}-{yyyymmdd}.jar,
+ // and poi-ooxml-lite-{version}-{yyyymmdd}.jar on Java classpath
+
+ // Import the POI classes
+ import java.io.File;
+ import java.io.FileOutputStream;
+ import java.io.OutputStream;
+ import org.apache.poi.ss.usermodel.Cell;
+ import org.apache.poi.ss.usermodel.Row;
+ import org.apache.poi.ss.usermodel.Sheet;
+ import org.apache.poi.ss.usermodel.Workbook;
+ import org.apache.poi.ss.usermodel.WorkbookFactory;
+ import org.apache.poi.ss.usermodel.DataFormatter;
+
+ // Read the contents of the workbook
+ File f = new File("SampleSS.xlsx");
+ Workbook wb = WorkbookFactory.create(f);
+ DataFormatter formatter = new DataFormatter();
+ int i = 1;
+ int numberOfSheets = wb.getNumberOfSheets();
+ for (Sheet sheet : wb) {
+ System.out.println("Sheet " + i++ + " of " + numberOfSheets + ": " + sheet.getSheetName());
+ for ( Row row : sheet ) {
+ System.out.println("\tRow " + row.getRowNum());
+ for ( Cell cell : row ) {
+ System.out.println("\t\t"+ cell.getAddress().formatAsString() + ": " + formatter.formatCellValue(cell));
+ }
+ }
+ }
+
+ // Modify the workbook
+ Sheet sh = wb.createSheet("new sheet");
+ Row row = sh.createRow(7);
+ Cell cell = row.createCell(42);
+ cell.setAsActiveCell();
+ cell.setCellValue("The answer to life, the universe, and everything");
+
+ // Save the workbook to a new file
+ try (OutputStream fos = new FileOutputStream("SampleSS-updated.xlsx")) {
+ wb.write(fos);
+ }
+ </source>
+ </section> <!-- POILanguageExample.java -->
+ </section> <!-- Java code -->
+
+ <section><title>Jython example</title>
+ <source> <!-- lang="python" -->
+ # Add <a href="site:components">poi jars</a> onto the python classpath or add them at run time
+ import sys
+ for jar in ('poi', 'poi-ooxml', 'poi-ooxml-lite'):
+ sys.path.append('/path/to/%s-5.4.1.jar' % jar)
+
+ from java.io import File, FileOutputStream
+ from contextlib import closing
+
+ # Import the POI classes
+ from org.apache.poi.ss.usermodel import <a href="../apidocs/dev/org/apache/poi/ss/usermodel/WorkbookFactory.html">WorkbookFactory</a>, <a href="../apidocs/dev/org/apache/poi/ss/usermodel/DataFormatter.html">DataFormatter</a>
+
+ # Read the contents of the workbook
+ wb = WorkbookFactory.create(File('<a href="https://github.com/apache/poi/tree/trunk/test-data/spreadsheet/SampleSS.xlsx">SampleSS.xlsx</a>'))
+ formatter = DataFormatter()
+ for i, sheet in enumerate(wb, start=1):
+ print('Sheet %d of %d: %s' % (i, wb.numberOfSheets, sheet.sheetName))
+ for row in sheet:
+ print('\tRow %i' % row.rowNum)
+ for cell in row:
+ print('\t\t%s: %s' % (cell.address, formatter.formatCellValue(cell)))
+
+ # Modify the workbook
+ sh = wb.createSheet('new sheet')
+ row = sh.createRow(7)
+ cell = row.createCell(42)
+ cell.activeCell = True
+ cell.cellValue = 'The answer to life, the universe, and everything'
+
+ # Save and close the workbook
+ with closing(FileOutputStream('SampleSS-updated.xlsx')) as fos:
+ wb.write(fos)
+ wb.close()
+ </source>
+ <p>There are several websites that have examples of using Apache POI in Jython projects:
+ <a href="https://wiki.python.org/jython/PoiExample">python.org</a>,
+ <a href="https://www.jython.org/jythonbook/en/1.0/appendixB.html#working-with-spreadsheets">jython.org</a>, and many others.
+ </p>
+ </section>
+
+ <section><title>Scala example</title>
+ <section><title>build.sbt</title>
+ <source> <!-- lang="scala" -->
+ // Add the POI core and OOXML support dependencies into your build.sbt
+ libraryDependencies ++= Seq(
+ "org.apache.poi" % "poi" % "5.4.1",
+ "org.apache.poi" % "poi-ooxml" % "5.4.1",
+ "org.apache.poi" % "poi-ooxml-lite" % "5.4.1"
+ )
+ </source>
+ </section>
+ <section><title>XSSFMain.scala</title>
+ <source> <!-- lang="scala" -->
+ // Import the required classes
+ import org.apache.poi.ss.usermodel.{<a href="../apidocs/dev/org/apache/poi/ss/usermodel/WorkbookFactory.html">WorkbookFactory</a>, <a href="../apidocs/dev/org/apache/poi/ss/usermodel/DataFormatter.html">DataFormatter</a>}
+ import java.io.{File, FileOutputStream}
+
+ object XSSFMain extends App {
+
+ // Automatically convert Java collections to Scala equivalents
+ import scala.collection.JavaConversions._
+
+ // Read the contents of the workbook
+ val workbook = WorkbookFactory.create(new File("<a href="https://github.com/apache/poi/tree/trunk/test-data/spreadsheet/SampleSS.xlsx">SampleSS.xlsx</a>"))
+ val formatter = new DataFormatter()
+ for {
+ // Iterate and print the sheets
+ (sheet, i) &lt;- workbook.zipWithIndex
+ _ = println(s"Sheet ${i + 1} of ${workbook.getNumberOfSheets}: ${sheet.getSheetName}")
+
+ // Iterate and print the rows
+ row &lt;- sheet
+ _ = println(s"\tRow ${row.getRowNum}")
+
+ // Iterate and print the cells
+ cell &lt;- row
+ } {
+ println(s"\t\t${cell.getCellAddress}: ${formatter.formatCellValue(cell)}")
+ }
+
+ // Add a sheet to the workbook
+ val sheet = workbook.createSheet("new sheet")
+ val row = sheet.createRow(7)
+ val cell = row.createCell(42)
+ cell.setAsActiveCell()
+ cell.setCellValue("The answer to life, the universe, and everything")
+
+ // Save the updated workbook as a new file
+ val fos = new FileOutputStream("SampleSS-updated.xlsx")
+ workbook.write(fos)
+ fos.close()
+ workbook.close()
+ }
+ </source>
+ </section>
+ </section>
+
+ <section><title>Groovy example</title>
+ <section><title>build.gradle</title>
+ <source> <!-- lang="groovy" -->
+// Add the POI core and OOXML support dependencies into your gradle build,
+// along with all of Groovy so it can run as a standalone script
+repositories {
+ mavenCentral()
+}
+dependencies {
+ implementation 'org.codehaus.groovy:groovy-all:2.5.15'
+ implementation 'org.apache.poi:poi:5.4.1'
+ implementation 'org.apache.poi:poi-ooxml:5.4.1'
+}
+ </source>
+ </section>
+ <section><title>SpreadSheetDemo.groovy</title>
+ <source> <!-- lang="groovy" -->
+import org.apache.poi.ss.usermodel.*
+import org.apache.poi.ss.util.*
+import java.io.File
+
+if (args.length == 0) {
+ println "Use:"
+ println " SpreadSheetDemo &lt;excel-file&gt; [output-file]"
+ return 1
+}
+
+File f = new File(args[0])
+DataFormatter formatter = new DataFormatter()
+WorkbookFactory.create(f,null,true).withCloseable { workbook ->
+ println "Has ${workbook.getNumberOfSheets()} sheets"
+
+ // Dump the contents of the spreadsheet
+ (0..&lt;workbook.getNumberOfSheets()).each { sheetNum ->
+ println "Sheet ${sheetNum} is called ${workbook.getSheetName(sheetNum)}"
+
+ def sheet = workbook.getSheetAt(sheetNum)
+ sheet.each { row ->
+ def nonEmptyCells = row.grep { c -> c.getCellType() != CellType.BLANK }
+ println " Row ${row.getRowNum()} has ${nonEmptyCells.size()} non-empty cells:"
+ nonEmptyCells.each { c ->
+ def cRef = [c] as CellReference
+ println " * ${cRef.formatAsString()} = ${formatter.formatCellValue(c)}"
+ }
+ }
+ }
+
+ // Add two new sheets and populate
+ CellStyle headerStyle = makeHeaderStyle(workbook)
+ Sheet ns1 = workbook.createSheet("Generated 1")
+ exportHeader(ns1, headerStyle, null, ["ID","Title","Num"] as String[])
+ ns1.createRow(1).createCell(0).setCellValue("TODO - Populate with data")
+
+ Sheet ns2 = workbook.createSheet("Generated 2")
+ exportHeader(ns2, headerStyle, "This is a demo sheet",
+ ["ID","Title","Date","Author","Num"] as String[])
+ ns2.createRow(2).createCell(0).setCellValue(1)
+ ns2.createRow(3).createCell(0).setCellValue(4)
+ ns2.createRow(4).createCell(0).setCellValue(1)
+
+ // Save
+ File output = File.createTempFile("output-", (f.getName() =~ /(\.\w+$)/)[0][0])
+ output.withOutputStream { os -> workbook.write(os) }
+ println "Saved as ${output}"
+}
+
+CellStyle makeHeaderStyle(Workbook wb) {
+ int HEADER_HEIGHT = 18
+ CellStyle style = wb.createCellStyle()
+
+ style.setFillForegroundColor(IndexedColors.AQUA.getIndex())
+ style.setFillPattern(FillPatternType.SOLID_FOREGROUND)
+
+ Font font = wb.createFont()
+ font.setFontHeightInPoints((short)HEADER_HEIGHT)
+ font.setBold(true)
+ style.setFont(font)
+
+ return style
+}
+void exportHeader(Sheet s, CellStyle headerStyle, String info, String[] headers) {
+ Row r
+ int rn = 0
+ int HEADER_HEIGHT = 18
+ // Do they want an info row at the top?
+ if (info != null &amp;&amp; !info.isEmpty()) {
+ r = s.createRow(rn)
+ r.setHeightInPoints(HEADER_HEIGHT+1)
+ rn++
+
+ Cell c = r.createCell(0)
+ c.setCellValue(info)
+ c.setCellStyle(headerStyle)
+ s.addMergedRegion(new CellRangeAddress(0,0,0,headers.length-1))
+ }
+ // Create the header row, of the right size
+ r = s.createRow(rn)
+ r.setHeightInPoints(HEADER_HEIGHT+1)
+ // Add the column headings
+ headers.eachWithIndex { col, idx ->
+ Cell c = r.createCell(idx)
+ c.setCellValue(col)
+ c.setCellStyle(headerStyle)
+ s.autoSizeColumn(idx)
+ }
+ // Make all the columns filterable
+ s.setAutoFilter(new CellRangeAddress(rn, rn, 0, headers.length-1))
+}
+ </source>
+ </section>
+ </section>
+
+ <section><title>Clojure example</title>
+ <section><title>SpreadSheetDemo.clj</title>
+ <!-- code example provided by Blake Watson -->
+ <source> <!-- lang="clojure" -->
+(ns poi.core
+ (:gen-class)
+ (:use [clojure.java.io :only [input-stream]])
+ (:import [org.apache.poi.ss.usermodel WorkbookFactory DataFormatter]))
+
+
+(defn sheets [wb] (map #(.getSheetAt wb %1) (range 0 (.getNumberOfSheets wb))))
+
+(defn print-all [wb]
+ (let [df (DataFormatter.)]
+ (doseq [sheet (sheets wb)]
+ (doseq [row (seq sheet)]
+ (doseq [cell (seq row)]
+ (println (.formatAsString (.getAddress cell)) ": " (.formatCellValue df cell)))))))
+
+(defn -main [&amp; args]
+ (when-let [name (first args)]
+ (let [wb (WorkbookFactory/create (input-stream name))]
+ (print-all wb))))
+ </source>
+ </section>
+ </section>
+ </body>
+ <footer>
+ <legal>
+ Copyright (c) @year@ The Apache Software Foundation. All rights reserved.
+ <br />
+ Apache POI, POI, Apache, the Apache feather logo, and the Apache
+ POI project logo are trademarks of The Apache Software Foundation.
+ </legal>
+ </footer>
+</document>
diff --git a/src/documentation/content/xdocs/components/poi-ruby.xml b/src/documentation/content/xdocs/components/poi-ruby.xml
new file mode 100644
index 0000000000..ba7ceb586e
--- /dev/null
+++ b/src/documentation/content/xdocs/components/poi-ruby.xml
@@ -0,0 +1,151 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<!--
+ ====================================================================
+ Licensed to the Apache Software Foundation (ASF) under one or more
+ contributor license agreements. See the NOTICE file distributed with
+ this work for additional information regarding copyright ownership.
+ The ASF licenses this file to You under the Apache License, Version 2.0
+ (the "License"); you may not use this file except in compliance with
+ the License. You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+ Unless required by applicable law or agreed to in writing, software
+ distributed under the License is distributed on an "AS IS" BASIS,
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ See the License for the specific language governing permissions and
+ limitations under the License.
+ ====================================================================
+-->
+<!DOCTYPE document PUBLIC "-//APACHE//DTD Documentation V2.0//EN" "document-v20.dtd">
+
+<document>
+ <header>
+ <title>POI Ruby Bindings</title>
+ <authors>
+ <person id="AS" name="Avik Sengupta" email="avik@apache.org"/>
+ </authors>
+ </header>
+
+ <body>
+ <section><title>Intro</title>
+ <p>The POI library can now be compiled as a Ruby extension, allowing the API to be called from
+ Ruby programs. Ruby users can therefore easily read and write OLE2 documents, such as Excel files.
+ </p>
+ <p>The bindings are generated by compiling POI with <a href="https://gcc.gnu.org/java/">gcj</a>,
+ and generating the Ruby wrapper using <a href="https://www.swig.org">SWIG</a>. The aim is to keep
+ the POI API as-is. However, where Java standard library objects are used, an effort is made to transform them smoothly
+ into Ruby objects. Therefore, where the POI API takes an OutputStream, you can pass an IO object, and where the POI API works with a
+ java.util.Date or java.util.Calendar object, you can work with a Ruby Time object. </p>
+ </section>
+
+
+ <section><title>Getting Started</title>
+ <section><title>Pre-Requisites</title>
+ <p>The bindings have been developed with GCC 3.4.3 and Ruby 1.8.2. You are unlikely to get correct results with
+ versions of GCC prior to 3.4 or versions of Ruby prior to 1.8. To compile the Ruby extension, you must have
+ GCC (compiled with java language support), Ruby development headers, and SWIG. To run, you will need Ruby (obviously!) and
+ <em>libgcj</em>, presumably from the same version of GCC with which you compiled.
+ </p>
+ </section>
+ <section><title>Source Repository</title>
+ <p>
+ The POI-Ruby module lives in the POI <a href="https://github.com/apache/poi/tree/trunk/src/contrib/poi-ruby/">Git repository</a>.
+ Running <em>make</em> inside that directory will create a loadable Ruby extension <em>poi4r.so</em> in the release subdirectory. Tests
+ are in the <em>tests/</em> subdirectory, and should be run from the <em>poi-ruby</em> directory. Please read the tests to figure out the usage.
+ </p>
+ <p>Note that the makefile, though designed to work across Linux/OS X/Cygwin, has been tested only on Linux.
+ There are likely to be issues on other platforms; fixes gratefully accepted! </p>
+ </section>
+ <section><title>Binary</title>
+ <p>A version of poi4r.so is available <a href="https://www.apache.org/~avik/dist/poi4r.so">here</a> (broken link). It has been compiled on a Linux box
+ with GCC 3.4.3 and Ruby 1.8.2. It dynamically links to libgcj. No guarantees about working on any other box. </p>
+ </section>
+ </section>
+
+
+
+
+ <section>
+ <title>Usage</title>
+ <p>The following Ruby code shows some of the things you can do with POI in Ruby:</p>
+ <source>
+ h=Poi4r::HSSFWorkbook.new
+ #Test Sheet Creation
+ s=h.createSheet("Sheet1")
+
+ #Test setting cell values
+ s=h.getSheetAt(0)
+ r=s.createRow(0)
+ c=r.createCell(0)
+ c.setCellValue(1.5)
+
+ c=r.createCell(1)
+ c.setCellValue("Ruby")
+
+ #Test styles
+ st = h.createCellStyle()
+ c=r.createCell(2)
+ st.setAlignment(Poi4r::HSSFCellStyle.ALIGN_CENTER)
+ c.setCellStyle(st)
+ c.setCellValue("centr'd")
+
+ #Date handling
+ c=r.createCell(3)
+ t1=Time.now
+ c.setCellValue(t1)
+ #Read the value back as a Ruby Time, converted to UTC
+ t2=c.getDateCellValue().gmtime
+
+ st=h.createCellStyle()
+ st.setDataFormat(Poi4r::HSSFDataFormat.getBuiltinFormat("m/d/yy h:mm"))
+ c.setCellStyle(st)
+
+ #Formulas
+ c=r.createCell(4)
+ c.setCellFormula("A1*2")
+ c.getCellFormula()
+
+ #Writing: open the file in binary mode; the block closes (and flushes) it
+ File.open("test.xls","wb") { |f| h.write(f) }
+ </source>
+ <p> The <em>tc_base_tests.rb</em> file in the <em>tests</em> subdirectory of the source distribution
+ contains examples of simple uses of the API. The <a href="spreadsheet/quick-guide.html">quick guide</a> is the best
+ place to learn HSSF API use. (Note, however, that none of the drawing features are implemented in the Ruby binding.)
+ See also the <a href="site:javadocs">POI API documentation</a> for more details.
+ </p>
+ </section>
+
+ <section>
+ <title>Future Directions</title>
+ <section><title>TODOs</title>
+ <ul>
+ <li>Implement support for reading Excel files (easy)</li>
+ <li>Expose POIFS API to read raw OLE2 files from Ruby</li>
+ <li>Expose HPSF API to read property streams </li>
+ <li>Tests... Tests... Tests...</li>
+ </ul>
+ </section>
+ <section><title>Limitations</title>
+ <ul>
+ <li>Check operations on 64-bit machines. Java primitive types have fixed sizes irrespective of machine type, unlike C/C++ types. The wrapping code
+ that converts C/C++ primitive types to/from Java types makes assumptions about type sizes that MAY be incorrect on wide architectures. </li>
+ <li>The current implementation is based on the POI 2.0 release. The 2.5 release adds support for Excel drawing primitives, and
+ thus has a dependency on Java AWT. Since AWT is not very mature in gcj, leaving it out seemed to be the safer option.</li>
+ <li>Packaging: the current makefile makes no effort to install the extension into the standard Ruby directories. This should probably be
+ packaged as a <a href="https://www.rubygems.org">gem</a>.</li>
+ </ul>
+ </section>
+
+ </section>
+
+ </body>
+ <footer>
+ <legal>
+ Copyright (c) @year@ The Apache Software Foundation. All rights reserved.
+ <br />
+ Apache POI, POI, Apache, the Apache feather logo, and the Apache
+ POI project logo are trademarks of The Apache Software Foundation.
+ </legal>
+ </footer>
+</document>
diff --git a/src/documentation/content/xdocs/components/poifs/design.xml b/src/documentation/content/xdocs/components/poifs/design.xml
new file mode 100644
index 0000000000..b4ab1d54f9
--- /dev/null
+++ b/src/documentation/content/xdocs/components/poifs/design.xml
@@ -0,0 +1,1099 @@
+<!--
+ ====================================================================
+ Licensed to the Apache Software Foundation (ASF) under one or more
+ contributor license agreements. See the NOTICE file distributed with
+ this work for additional information regarding copyright ownership.
+ The ASF licenses this file to You under the Apache License, Version 2.0
+ (the "License"); you may not use this file except in compliance with
+ the License. You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+ Unless required by applicable law or agreed to in writing, software
+ distributed under the License is distributed on an "AS IS" BASIS,
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ See the License for the specific language governing permissions and
+ limitations under the License.
+ ====================================================================
+-->
+
+<!DOCTYPE document PUBLIC "-//APACHE//DTD Documentation V2.0//EN" "document-v20.dtd">
+
+<document>
+ <header>
+ <title>Apache POI™ - POIFS - Design Document</title>
+ </header>
+ <body>
+ <section>
+ <title>POIFS Design Document</title>
+ <p>This document describes the design of the POIFS system. It is organized as follows:</p>
+ <ul>
+ <li>
+ <a href="#Scope">Scope:</a>
+ A description of the limitations of this document.
+ </li>
+ <li>
+ <a href="#Assumptions">Assumptions:</a>
+ The assumptions on which this design is based.
+ </li>
+ <li>
+ <a href="#Considerations">Design Considerations:</a>
+ The constraints and goals applied to the design.
+ </li>
+ <li>
+ <a href="#Design">Design:</a>
+ The design of the POIFS system.
+ </li>
+ </ul>
+ </section>
+ <section id="Scope">
+ <title>Scope</title>
+ <p>This document is written as part of an iterative process. As that process is not yet complete, neither is
+ this document.
+ </p>
+ </section>
+ <section id="Assumptions">
+ <title>Assumptions</title>
+ <p>The design of POIFS is not dependent on the code written for the proof-of-concept prototype POIFS
+ package.
+ </p>
+ </section>
+ <section id="Considerations">
+ <title>Design Considerations</title>
+ <p>As usual, the primary considerations in the design of POIFS involve the classic space-time
+ tradeoff. In this case, the main consideration has to be minimizing the memory footprint of POIFS.
+ POIFS may be called upon to create relatively large documents; in a web application server, it may be
+ called upon to create several documents simultaneously, and it will likely co-exist with other
+ Serializer systems, competing with those other systems for space on the server.
+ </p>
+ <p>We've addressed the risk of being too slow through a proof-of-concept prototype. This prototype for POIFS
+ involved reading an existing file, decomposing it into its constituent documents, composing a new POIFS
+ from the constituent documents, writing the POIFS file back to disk, and verifying that the output
+ file, while not necessarily a byte-for-byte image of the input file, could be read by the application
+ that generated the input file. This prototype proved to be quite fast, reading, decomposing, and
+ re-generating a large (300 KB) file in 2 to 2.5 seconds.
+ </p>
+ <p>While the POIFS format allows great flexibility in laying out the documents and the other internal data
+ structures, the layout of the filesystem will be kept as simple as possible.
+ </p>
+ </section>
+ <section id="Design">
+ <title>Design</title>
+ <p>The design of the POIFS is broken down into two parts: <a href="#Classes">discussion of the classes and
+ interfaces</a>, and <a href="#Scenarios">discussion of how these classes and interfaces will be used to
+ convert an appropriate Java InputStream (such as an XML stream) to a POIFS output stream containing an
+ HSSF document</a>.
+ </p>
+ <p>
+ <strong id="Classes">Classes and Interfaces</strong>
+ </p>
+ <p>The classes and interfaces used in the POIFS are broken down as follows:</p>
+ <table>
+ <tr>
+ <th>Package</th>
+ <th>Contents</th>
+ </tr>
+ <tr>
+ <td>
+ <a href="#BlockClasses">net.sourceforge.poi.poifs.storage</a>
+ </td>
+ <td>Block classes and interfaces</td>
+ </tr>
+ <tr>
+ <td>
+ <a href="#PropertyClasses">net.sourceforge.poi.poifs.property</a>
+ </td>
+ <td>Property classes and interfaces</td>
+ </tr>
+ <tr>
+ <td>
+ <a href="#FilesystemClasses">net.sourceforge.poi.poifs.filesystem</a>
+ </td>
+ <td>Filesystem classes and interfaces</td>
+ </tr>
+ <tr>
+ <td>
+ <a href="#UtilityClasses">net.sourceforge.poi.util</a>
+ </td>
+ <td>Utility classes and interfaces</td>
+ </tr>
+ </table>
+
+ <section id="BlockClasses">
+ <title>Block Classes and Interfaces</title>
+ <p>The block classes and interfaces are shown in the following class diagram.</p>
+ <p>
+ <img src="images/BlockClassDiagram.gif" alt="Block Classes and Interfaces"/>
+ </p>
+ <table>
+ <tr>
+ <th>Class/Interface</th>
+ <th>Description</th>
+ </tr>
+ <tr>
+ <th id="BATBlock">BATBlock</th>
+ <td>The <strong>BATBlock</strong> class represents a single big block containing 128
+ <a href="fileformat.html#BAT">BAT entries</a>.<br/>Its <code>_fields</code> array is used to
+ read and write the BAT entries into the <code>_data</code> array.
+ <br/>Its <code>createBATBlocks</code> method is used to create an array of BATBlock
+ instances from an array of int BAT entries.
+ <br/>
+ Its <code>calculateStorageRequirements</code> method calculates the number of BAT blocks
+ necessary to hold the specified number of BAT entries.
+ </td>
+ </tr>
+ <tr>
+ <th id="BigBlock">BigBlock</th>
+ <td>The <strong>BigBlock</strong> class is an abstract class representing the common big block
+ of 512 bytes. It implements <a href="#BlockWritable">BlockWritable</a>, trivially delegating
+ the <code>writeBlocks</code> method of BlockWritable to its own abstract <code>writeData</code>
+ method.
+ </td>
+ </tr>
+ <tr>
+ <th id="BlockWritable">BlockWritable</th>
+ <td>The <strong>BlockWritable</strong> interface defines a single method,
+ <code>writeBlocks</code>, that is used to write an implementation's block data to an <code>
+ OutputStream</code>.
+ </td>
+ </tr>
+ <tr>
+ <th id="DocumentBlock">DocumentBlock</th>
+ <td>The <strong>DocumentBlock</strong> class is used by a <a href="#Document">Document</a> to hold
+ its raw data. It also retains the number of bytes read, as this is used by the
+ Document class to determine the total size of the data, and is also used internally to
+ determine whether the block was filled by the <code>InputStream</code> or not.
+ <br/>
+ The <code>DocumentBlock</code> constructor is passed an <code>InputStream</code> from which
+ to fill its <code>_data</code> array.
+ <br/>
+ The <code>size</code> method returns the number of bytes read (<code>_bytes_read</code>)
+ when the instance was constructed.
+ <br/>
+ The <code>partiallyRead</code> method returns true if the <code>_data</code> array was not
+ completely filled, which the Document may interpret as having reached the end of the
+ stream.<br/>Typical use of the DocumentBlock class is like this:
+ <br/>
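+ <source><![CDATA[
+// stream is the InputStream supplying the document's data
+List<DocumentBlock> blocks = new ArrayList<>();
+int size = 0; // total number of bytes read
+while (true) {
+ DocumentBlock block = new DocumentBlock(stream);
+ blocks.add(block);
+ size += block.size();
+ // a block that could not be filled means the stream is exhausted; if the
+ // data length is an exact multiple of 512, this final block is empty
+ if (block.partiallyRead()) {
+ break;
+ }
+}]]></source>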
+ </td>
+ </tr>
+ <tr>
+ <th id="HeaderBlock">HeaderBlock</th>
+ <td>The <strong>HeaderBlock</strong> class is used to contain the data found in a POIFS header.
+ <br/>
+ Its <a href="#IntegerField">IntegerField</a> members are used to read and write the
+ appropriate entries into the <code>_data</code> array.<br/>Its <code>setBATBlocks</code>,
+ <code>setPropertyStart</code>, and <code>setXBATStart</code> methods are used to set the
+ appropriate fields in the <code>_data</code> array.<br/>The
+ <code>calculateXBATStorageRequirements</code> method is used to determine how many XBAT blocks
+ are necessary to accommodate the specified number of BAT blocks.
+ </td>
+ </tr>
+ <tr>
+ <th id="PropertyBlock">PropertyBlock</th>
+ <td>The <strong>PropertyBlock</strong> class is used to contain <a href="#Property">Property</a>
+ instances for the <a href="#PropertyTable">PropertyTable</a> class.<br/>It contains an array,
+ <code>_properties</code>, of 4 Property instances, which together comprise the 512 bytes of a
+ <a href="#BigBlock">BigBlock</a>.
+ <br/>
+ The <code>createPropertyBlockArray</code> method is used to convert a <code>List</code> of
+ Property instances into an array of PropertyBlock instances. The number of Property
+ instances is rounded up to a multiple of 4 by creating empty anonymous inner class
+ extensions of Property.
+ </td>
+ </tr>
+ </table>
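+ <p>The block arithmetic described above can be illustrated with a short, self-contained Java sketch. It is not the POIFS code: <code>SketchDocumentBlock</code> and <code>BlockMath</code> are hypothetical names, and the sketch assumes 512-byte big blocks and 4-byte BAT entries (so 128 entries per BAT block, as stated for BATBlock). It follows the DocumentBlock reading loop, with one difference: it skips the empty trailing block that the loop produces when the data length is an exact multiple of 512 bytes.</p>
+ <source><![CDATA[
```java
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for DocumentBlock: one 512-byte big block filled from a stream. */
class SketchDocumentBlock {
    static final int BIG_BLOCK_SIZE = 512;
    final byte[] data = new byte[BIG_BLOCK_SIZE];
    final int bytesRead;

    SketchDocumentBlock(InputStream stream) throws IOException {
        int total = 0;
        // read() may return fewer bytes than requested, so keep reading
        // until the block is full or the stream is exhausted
        while (total < BIG_BLOCK_SIZE) {
            int n = stream.read(data, total, BIG_BLOCK_SIZE - total);
            if (n < 0) {
                break;
            }
            total += n;
        }
        bytesRead = total;
    }

    int size() { return bytesRead; }

    boolean partiallyRead() { return bytesRead < BIG_BLOCK_SIZE; }
}

/** Hypothetical helper methods; not part of POIFS. */
class BlockMath {
    static final int BAT_ENTRIES_PER_BLOCK = 128; // 512 bytes / 4-byte entries

    /** Reads a whole stream into blocks, dropping an empty final block. */
    static List<SketchDocumentBlock> readBlocks(InputStream stream) throws IOException {
        List<SketchDocumentBlock> blocks = new ArrayList<>();
        while (true) {
            SketchDocumentBlock block = new SketchDocumentBlock(stream);
            if (block.size() > 0) {
                blocks.add(block);
            }
            if (block.partiallyRead()) {
                break;
            }
        }
        return blocks;
    }

    /** Number of BAT blocks needed to hold entryCount BAT entries (ceiling division). */
    static int batBlocksNeeded(int entryCount) {
        return (entryCount + BAT_ENTRIES_PER_BLOCK - 1) / BAT_ENTRIES_PER_BLOCK;
    }
}
```
+]]></source>
+ <p>For example, a 1000-byte stream yields two blocks, the second holding 488 bytes. Similarly, 129 BAT entries need two BAT blocks.</p>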
+ </section>
+
+ <section id="PropertyClasses">
+ <title>Property Classes and Interfaces</title>
+
+ <p>The property classes and interfaces are shown in the following class diagram.
+ </p>
+ <p>
+ <img src="images/PropertyTableClassDiagram.gif" alt="Property Classes and Interfaces"/>
+ </p>
+ <table>
+ <tr>
+ <th>Class/Interface</th>
+ <th>Description</th>
+ </tr>
+ <tr>
+ <th id="Directory">Directory</th>
+ <td>The <strong>Directory</strong> interface is implemented by the
+ <a href="#RootProperty">RootProperty</a> class. It is not strictly necessary for the initial
+ POIFS implementation. However, when POIFS supports <a href="fileformat.html#directoryEntry">directory
+ elements</a>, this interface will be more widely implemented, so it is included in the design
+ now to ease the eventual support of directory elements.<br/>Its methods are a
+ getter/setter pair: <code>getChildren</code>, returning an <code>Iterator</code> of
+ <a href="#Property">Property</a> instances; and <code>addChild</code>, which allows the caller
+ to add another Property instance to the Directory's children.
+ </td>
+ </tr>
+ <tr>
+ <th id="DocumentProperty">DocumentProperty</th>
+ <td>The <strong>DocumentProperty</strong> class is a trivial extension of <a href="#Property">
+ Property
+ </a> and is used by <a href="#Document">Document</a> to keep track of its associated entry in
+ the
+ <a href="#PropertyTable">PropertyTable</a>.<br/>Its constructor takes a name and the
+ document size, on the assumption that the Document will not create a DocumentProperty until
+ after it has created the storage for the document data and therefore knows how much data
+ there is.
+ </td>
+ </tr>
+ <tr>
+ <th id="File">File</th>
+ <td>The <strong>File</strong> interface specifies the behavior of reading and writing the next
+ and previous child fields of a <a href="#Property">Property</a>.
+ </td>
+ </tr>
+ <tr>
+ <th id="Property">Property</th>
+ <td>The <strong>Property</strong> class is an abstract class that defines the basic data
+ structure of an element of the <a href="fileformat.html#PropertyTable">Property Table</a>.<br/>Its
+ <a href="#ByteField">ByteField</a>, <a href="#ShortField">ShortField</a>, and
+ <a href="#IntegerField">IntegerField</a> members are used to read and write data into the
+ appropriate locations in the <code>_raw_data</code> array.<br/>The <code>_index</code> member
+ holds a Property instance's index in the <code>List</code> of Property instances
+ maintained by <a href="#PropertyTable">PropertyTable</a>. This index is used to populate the child
+ property of parent <a href="#Directory">Directory</a> properties, and the next and
+ previous properties of sibling <a href="#File">File</a> properties.<br/>The <code>_name</code>,
+ <code>_next_file</code>, and <code>_previous_file</code> members are used to help fill the
+ appropriate fields of the <code>_raw_data</code> array.<br/>Setters are provided for some of
+ the fields (name, property type, node color, child property, size, index, start block), as well
+ as a few getters (index, child property).<br/>The <code>preWrite</code> method is abstract and
+ is used by the owning PropertyTable to iterate through its Property instances and prepare each
+ for writing.<br/>The <code>shouldUseSmallBlocks</code> method returns true if the Property's
+ size is sufficiently small - how small is none of the caller's business.
+ </td>
+ <tr>
+ <th>PropertyBlock</th>
+ <td>See the description in <a href="#PropertyBlock">PropertyBlock</a>.
+ </td>
+ </tr>
+ <tr>
+ <th id="PropertyTable">PropertyTable</th>
+ <td>The <strong>PropertyTable</strong> class holds all of the
+ <a href="#DocumentProperty">DocumentProperty</a> instances and the
+ <a href="#RootProperty">RootProperty</a> instance for a <a href="#Filesystem">Filesystem</a>
+ instance.<br/>It maintains a <code>List</code> of its <a href="#Property">Property</a>
+ instances (<code>_properties</code>). When <code>preWrite</code> is called to prepare it for
+ writing, it gets and holds an array of <a href="#PropertyBlock">PropertyBlock</a>
+ instances (<code>_blocks</code>).<br/>It also maintains its start block in its
+ <code>_start_block</code> member.<br/>It has three methods: <code>getRoot</code>, which returns
+ the RootProperty as an implementation of <a href="#Directory">Directory</a>;
+ <code>addProperty</code>, which adds a Property; and <code>getStartBlock</code>, which returns
+ its start block.
+ </td>
+ </tr>
+ <tr>
+ <th id="RootProperty">RootProperty</th>
+ <td>The <strong>RootProperty</strong> class acts as the <a href="#Directory">Directory</a> for
+ all of the <a href="#DocumentProperty">DocumentProperty</a> instances. As such, it is more of a
+ pure <a href="fileformat.html#directoryEntry">directory entry</a> than a proper
+ <a href="fileformat.html#RootEntry">root entry</a> in the
+ <a href="fileformat.html#PropertyTable">Property Table</a>. The initial POIFS
+ implementation does not warrant the additional complexity of a full-blown root entry, so
+ one is not modeled in this design.<br/>It maintains a <code>List</code> of its children,
+ <code>_children</code>, in order to perform its directory-oriented duties.
+ </td>
+ </tr>
+ </table>
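+ <p>The relationship between the property list and its blocks can be sketched as follows. This is an illustration under stated assumptions, not POIFS code. <code>PropertyBlockSketch</code> and its members are hypothetical. Each property is assumed to occupy 128 bytes, so that 4 properties fill a 512-byte block, as described for PropertyBlock. Padding entries are modeled as properties with a <code>null</code> name rather than as anonymous inner classes.</p>
+ <source><![CDATA[
```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of how a property list maps onto 512-byte property blocks. */
class PropertyBlockSketch {
    static final int BIG_BLOCK_SIZE = 512;
    static final int PROPERTIES_PER_BLOCK = 4;
    static final int PROPERTY_SIZE = BIG_BLOCK_SIZE / PROPERTIES_PER_BLOCK; // 128 bytes

    /** Stand-in for Property; a null name marks an empty padding entry. */
    static final class SketchProperty {
        final String name;
        int index = -1;

        SketchProperty(String name) { this.name = name; }
    }

    /**
     * Assigns each property its index in the list (the value used for child,
     * next and previous links) and groups the list into blocks of four,
     * padding the last block with empty properties.
     */
    static List<List<SketchProperty>> toBlocks(List<SketchProperty> properties) {
        List<SketchProperty> padded = new ArrayList<>(properties);
        while (padded.size() % PROPERTIES_PER_BLOCK != 0) {
            padded.add(new SketchProperty(null));
        }
        List<List<SketchProperty>> blocks = new ArrayList<>();
        for (int i = 0; i < padded.size(); i++) {
            padded.get(i).index = i;
            if (i % PROPERTIES_PER_BLOCK == 0) {
                blocks.add(new ArrayList<>());
            }
            blocks.get(blocks.size() - 1).add(padded.get(i));
        }
        return blocks;
    }

    /** Byte offset of a property inside its 512-byte block. */
    static int offsetInBlock(int index) {
        return (index % PROPERTIES_PER_BLOCK) * PROPERTY_SIZE;
    }
}
```
+]]></source>
+ <p>With five properties, for example, the sketch produces two blocks. The second block holds the fifth property followed by three empty padding entries. A property at index 5 would start 128 bytes into its block.</p>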
+ </section>
+
+ <section id="FilesystemClasses">
+ <title>Filesystem Classes and Interfaces</title>
+ <p>The filesystem classes and interfaces are shown in the following class diagram.
+ </p>
+ <p>
+ <img src="images/POIFSClassDiagram.gif" alt="Filesystem Classes and Interfaces"/>
+ </p>
+ <table>
+ <tr>
+ <th>Class/Interface</th>
+ <th>Description</th>
+ </tr>
+ <tr>
+ <th id="Filesystem">Filesystem</th>
+ <td>The <strong>Filesystem</strong> class is the top-level class that manages the creation of a
+ POIFS document.<br/>It maintains a <a href="#PropertyTable">PropertyTable</a> instance in its
+ <code>_property_table</code> member, a <a href="#HeaderBlock">HeaderBlock</a> instance in its
+ <code>_header_block</code> member, and a <code>List</code> of its
+ <a href="#Document">Document</a> instances in its <code>_documents</code> member.<br/>It
+ provides a method for a client to create a document (<code>createDocument</code>), and a method
+ to write the Filesystem to an <code>OutputStream</code> (<code>writeFilesystem</code>).
+ </td>
+ </tr>
+ <tr>
+ <th>BATBlock</th>
+ <td>See the description in
+ <a href="#BATBlock">BATBlock</a>
+ </td>
+ </tr>
+ <tr>
+ <th id="BATManaged">BATManaged</th>
+ <td>The <strong>BATManaged</strong> interface defines common behavior for objects whose location
+ in the written file is managed by the <a href="fileformat.html#BAT">Block Allocation
+ Table</a>.<br/>It defines methods to get a count of the implementation's
+ <a href="#BigBlock">BigBlock</a> instances (<code>countBlocks</code>), and to set an
+ implementation's start block (<code>setStartBlock</code>).
+ </td>
+ </tr>
+ <tr>
+ <th id="BlockAllocationTable">BlockAllocationTable</th>
+ <td>The <strong>BlockAllocationTable</strong> is an implementation of the
+ POIFS <a href="fileformat.html#BAT">Block Allocation Table</a>. It is only created when the
+ <a href="#Filesystem">Filesystem</a> is about to be written to an
+ <code>OutputStream</code>.<br/>It contains an <a href="#IntList">IntList</a>,
+ <code>_entries</code>, holding block numbers for all of the
+ <a href="#BATManaged">BATManaged</a> implementations owned by the Filesystem. This list is
+ filled by calls to <code>allocateSpace</code>.<br/>It fills its array, <code>_blocks</code>, of
+ <a href="#BATBlock">BATBlock</a> instances when its <code>createBATBlocks</code> method is
+ called. This method has to take into account its own storage requirements, as well as those
+ of the XBAT blocks. It therefore calls <code>BATBlock.calculateStorageRequirements</code> and
+ <code>HeaderBlock.calculateXBATStorageRequirements</code> repeatedly until the counts returned
+ by those methods stabilize.<br/>The <code>countBlocks</code> method returns the number of
+ BATBlock instances created by the preceding call to <code>createBATBlocks</code>.
+ </td>
+ </tr>
+ <tr>
+ <th>BlockWritable</th>
+ <td>See the description in
+ <a href="#BlockWritable">BlockWritable</a>
+ </td>
+ </tr>
+ <tr>
+ <th id="Document">Document</th>
+ <td>The <strong>Document</strong> class is used to contain a document, such as an HSSF workbook.
+ <br/>It has its own <a href="#DocumentProperty">DocumentProperty</a>
+ (<code>_property</code>) and stores its data in a collection of
+ <a href="#DocumentBlock">DocumentBlock</a> instances (<code>_blocks</code>).<br/>It has a
+ method, <code>getDocumentProperty</code>, to get its DocumentProperty.
+ </td>
+ </tr>
+ <tr>
+ <th>DocumentBlock</th>
+ <td>See the description in
+ <a href="#DocumentBlock">DocumentBlock</a>
+ </td>
+ </tr>
+ <tr>
+ <th>DocumentProperty</th>
+ <td>See the description in
+ <a href="#DocumentProperty">DocumentProperty</a>
+ </td>
+ </tr>
+ <tr>
+ <th>HeaderBlock</th>
+ <td>See the description in
+ <a href="#HeaderBlock">HeaderBlock</a>
+ </td>
+ </tr>
+ <tr>
+ <th>PropertyTable</th>
+ <td>See the description in
+ <a href="#PropertyTable">PropertyTable</a>
+ </td>
+ </tr>
+ </table>
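+ <p>The BlockAllocationTable bookkeeping can be sketched in Java as follows. This is a hypothetical sketch, not POIFS code. <code>BatSketch</code> is an invented name, and the entries are kept in a boxed <code>List</code> instead of an IntList. The constants come from the OLE2 compound file format rather than from this design:</p>
+ <ul>
+ <li>an end-of-chain marker of -2;</li>
+ <li>room for 109 BAT block numbers in the header;</li>
+ <li>127 BAT block numbers per XBAT block, whose last slot links to the next XBAT block.</li>
+ </ul>
+ <p>In the sketch, <code>allocateSpace</code> chains consecutive blocks, and <code>batAndXbatCounts</code> repeats the storage calculation until the counts stabilize.</p>
+ <source><![CDATA[
```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of BlockAllocationTable-style bookkeeping. */
class BatSketch {
    static final int END_OF_CHAIN = -2;              // OLE2 marker for the last block of a chain
    static final int ENTRIES_PER_BAT_BLOCK = 128;    // 512 bytes / 4-byte entries
    static final int BAT_SLOTS_IN_HEADER = 109;      // BAT block numbers the header can hold
    static final int BAT_SLOTS_PER_XBAT_BLOCK = 127; // 128 ints minus the link to the next XBAT block

    // Boxed for brevity; the design's IntList exists to avoid this boxing.
    private final List<Integer> entries = new ArrayList<>();

    /** Chains blockCount consecutive blocks together and returns the first block number. */
    int allocateSpace(int blockCount) {
        if (blockCount < 1) {
            throw new IllegalArgumentException("blockCount must be positive");
        }
        int start = entries.size();
        for (int i = 0; i < blockCount; i++) {
            // each block points at the next one; the last block ends the chain
            entries.add(i == blockCount - 1 ? END_OF_CHAIN : start + i + 1);
        }
        return start;
    }

    List<Integer> entries() { return entries; }

    /**
     * Returns {batBlocks, xbatBlocks} for dataBlocks blocks of documents and
     * property table. BAT and XBAT blocks also occupy blocks that need BAT
     * entries, so the counts are recomputed until they stop changing.
     */
    static int[] batAndXbatCounts(int dataBlocks) {
        int bat = 0;
        int xbat = 0;
        while (true) {
            int newBat = ceilDiv(dataBlocks + bat + xbat, ENTRIES_PER_BAT_BLOCK);
            int newXbat = newBat <= BAT_SLOTS_IN_HEADER
                    ? 0
                    : ceilDiv(newBat - BAT_SLOTS_IN_HEADER, BAT_SLOTS_PER_XBAT_BLOCK);
            if (newBat == bat && newXbat == xbat) {
                return new int[] {bat, xbat};
            }
            bat = newBat;
            xbat = newXbat;
        }
    }

    private static int ceilDiv(int a, int b) {
        return (a + b - 1) / b;
    }
}
```
+]]></source>
+ <p>Under these assumptions:</p>
+ <ul>
+ <li>13,843 data blocks need exactly 109 BAT blocks and no XBAT blocks;</li>
+ <li>13,844 data blocks need 110 BAT blocks and one XBAT block.</li>
+ </ul>
+ <p>109 BAT blocks map 109 × 128 × 512 = 7,143,424 bytes. This agrees with the roughly 7MB threshold for XBAT blocks given in the writing scenario.</p>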
+ </section>
+
+ <section id="UtilityClasses">
+ <title>Utility Classes and Interfaces</title>
+ <p>The utility classes and interfaces are shown in the following class diagram.
+ </p>
+ <p>
+ <img src="images/utilClasses.gif" alt="Utility Classes and Interfaces"/>
+ </p>
+ <table>
+ <tr>
+ <th>Class/Interface</th>
+ <th>Description</th>
+ </tr>
+ <tr>
+ <th id="BitField">BitField</th>
+ <td>The <strong>BitField</strong> class is used primarily by HSSF code to manage bit-mapped
+ fields of HSSF records. It is not likely to be used in the POIFS code itself and is only
+ included here for the sake of complete documentation of the POI utility classes.
+ </td>
+ </tr>
+ <tr>
+ <th id="ByteField">ByteField</th>
+ <td>The <strong>ByteField</strong> class is an implementation of <a href="#FixedField">
+ FixedField
+ </a> for the purpose of managing reading and writing to a byte-wide field in an array of <code>
+ bytes</code>.
+ </td>
+ </tr>
+ <tr>
+ <th id="FixedField">FixedField</th>
+ <td>The <strong>FixedField</strong> interface defines a set of methods for reading a field from
+ an array of
+ <code>bytes</code>
+ or from an
+ <code>InputStream</code>, and for writing a field to an array of
+ <code>bytes</code>. Implementations typically require an offset in their constructors that,
+ for the purposes of reading and writing to an array of
+ <code>bytes</code>, makes sure that the correct <code>bytes</code> in the array are read or
+ written.
+ </td>
+ </tr>
+ <tr>
+ <th id="HexDump">HexDump</th>
+ <td>The <strong>HexDump</strong> class is a debugging class that can be used to dump an array of <code>
+ bytes
+ </code> to an <code>OutputStream</code>. The static method
+ <code>dump</code>
+ takes an array of <code>bytes</code>, a <code>long</code> offset that is used to label the
+ output, an open
+ <code>OutputStream</code>, and an
+ <code>int</code>
+ index that specifies the starting index within the array of
+ <code>bytes</code>.<br/>The data is displayed 16 bytes per line, with each byte displayed in
+ hexadecimal format and again in printable form, if possible (a byte is considered printable
+ if its value is in the range of 32 ... 126).<br/>Here is an example of a small array of
+ <code>bytes</code>
+ with an offset of 0x110:
+ <br/>
+ <source><![CDATA[
+00000110 C8 00 00 00 FF 7F 90 01 00 00 00 00 00 00 05 01 ................
+00000120 41 00 72 00 69 00 61 00 6C 00 A.r.i.a.l.
+ ]]></source>
+ </td>
+ </tr>
+ <tr>
+ <th id="IntegerField">IntegerField</th>
+ <td>The <strong>IntegerField</strong> class is an implementation of <a href="#FixedField">
+ FixedField
+ </a> for the purpose of managing reading and writing to an integer-wide field in an array
+ of <code>bytes</code>.
+ </td>
+ </tr>
+ <tr>
+ <th id="IntList">IntList</th>
+ <td>The <strong>IntList</strong> class is a work-around for functionality missing in Java (see
+ <a href="https://developer.java.sun.com/developer/bugParade/bugs/4487555.html">
+ https://developer.java.sun.com/developer/bugParade/bugs/4487555.html
+ </a>
+ for details); it is a simple growable array of <code>ints</code> that gets around the
+ requirement of wrapping and unwrapping <code>ints</code> in
+ <code>Integer</code>
+ instances in order to use the
+ <code>java.util.List</code>
+ interface.
+ <br/>
+ <strong>IntList</strong>
+ mimics the functionality of the
+ <code>java.util.List</code>
+ interface as much as possible.
+ </td>
+ </tr>
+ <tr>
+ <th id="LittleEndian">LittleEndian</th>
+ <td>The <strong>LittleEndian</strong> class provides a set of static methods for reading and
+ writing
+ <code>shorts</code>,
+ <code>ints</code>, <code>longs</code>, and <code>doubles</code> in and out of
+ <code>byte</code>
+ arrays, and out of
+ <code>InputStreams</code>, preserving the Intel byte ordering and encoding of these values.
+ </td>
+ </tr>
+ <tr>
+ <th id="LittleEndianConsts">LittleEndianConsts</th>
+ <td>The
+ <strong>LittleEndianConsts</strong>
+ interface defines the width of a
+ <code>short</code>, <code>int</code>,
+ <code>long</code>, and
+ <code>double</code>
+ as stored by Intel processors.
+ </td>
+ </tr>
+ <tr>
+ <th id="LongField">LongField</th>
+ <td>The <strong>LongField</strong> class is an implementation of <a href="#FixedField">
+ FixedField
+ </a> for the purpose of managing reading and writing to a long-wide field in an array of <code>
+ bytes</code>.
+ </td>
+ </tr>
+ <tr>
+ <th id="ShortField">ShortField</th>
+ <td>The <strong>ShortField</strong> class is an implementation of <a href="#FixedField">
+ FixedField
+ </a> for the purpose of managing reading and writing to a short-wide field in an array of <code>
+ bytes</code>.
+ </td>
+ </tr>
+ <tr>
+ <th id="ShortList">ShortList</th>
+ <td>The <strong>ShortList</strong> class is a work-around for functionality missing in Java (see
+ <a href="https://developer.java.sun.com/developer/bugParade/bugs/4487555.html">
+ https://developer.java.sun.com/developer/bugParade/bugs/4487555.html
+ </a>
+ for details); it is a simple growable array of <code>shorts</code> that gets around the
+ requirement of wrapping and unwrapping <code>shorts</code> in
+ <code>Short</code>
+ instances in order to use the
+ <code>java.util.List</code>
+ interface.
+ <br/>
+ <strong>ShortList</strong>
+ mimics the functionality of the
+ <code>java.util.List</code>
+ interface as much as possible.
+ </td>
+ </tr>
+ <tr>
+ <th id="StringUtil">StringUtil</th>
+ <td>The <strong>StringUtil</strong> class manages the processing of Unicode strings.
+ </td>
+ </tr>
+ </table>
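+ <p>The byte ordering handled by LittleEndian and the fixed-offset fields can be sketched as follows. This is a hypothetical illustration, not the POI utility classes. <code>LittleEndianSketch</code> and its nested <code>IntField</code> are invented names. The same conversions could also be done with <code>java.nio.ByteBuffer</code> set to <code>ByteOrder.LITTLE_ENDIAN</code>.</p>
+ <source><![CDATA[
```java
/** Hypothetical sketch of LittleEndian helpers and an IntegerField-style fixed field. */
class LittleEndianSketch {
    /** Reads a 2-byte little-endian short starting at offset. */
    static short getShort(byte[] data, int offset) {
        return (short) ((data[offset] & 0xFF) | (data[offset + 1] & 0xFF) << 8);
    }

    /** Reads a 4-byte little-endian int starting at offset. */
    static int getInt(byte[] data, int offset) {
        return (data[offset] & 0xFF)
                | (data[offset + 1] & 0xFF) << 8
                | (data[offset + 2] & 0xFF) << 16
                | (data[offset + 3] & 0xFF) << 24;
    }

    /** Writes value as 4 little-endian bytes starting at offset. */
    static void putInt(byte[] data, int offset, int value) {
        data[offset] = (byte) value;
        data[offset + 1] = (byte) (value >>> 8);
        data[offset + 2] = (byte) (value >>> 16);
        data[offset + 3] = (byte) (value >>> 24);
    }

    /** IntegerField-style wrapper: remembers its offset so callers never pass it again. */
    static final class IntField {
        private final int offset;
        private int value;

        IntField(int offset) { this.offset = offset; }

        void readFromBytes(byte[] data) { value = getInt(data, offset); }

        void set(int newValue, byte[] data) {
            value = newValue;
            putInt(data, offset, newValue);
        }

        int get() { return value; }
    }
}
```
+]]></source>
+ <p>Applied to the HexDump sample above, <code>getShort</code> reads the bytes <code>C8 00</code> as 200 and <code>FF 7F</code> as 32767.</p>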
+ </section>
+ </section>
+
+ <section id="Scenarios">
+ <title>Scenarios</title>
+ <p>This section describes the scenarios of how the POIFS classes and interfaces will be used to convert an
+ appropriate XML stream to a POIFS output stream containing an HSSF document.
+ </p>
+ <p>It is broken down as suggested by the following scenario diagram:
+ </p>
+ <p>
+ <img src="images/POIFSLifeCycle.gif" alt="POIFS LifeCycle"/>
+ </p>
+ <table>
+ <tr>
+ <th>Step</th>
+ <th>Description</th>
+ </tr>
+ <tr>
+ <th>1</th>
+ <td>
+ <a href="#Initialization">The Filesystem is created by the client application.
+ </a>
+ </td>
+ </tr>
+ <tr>
+ <th>2</th>
+ <td><a href="#CreateDocument">The client application tells the Filesystem to create a document</a>,
+ providing an
+ <code>InputStream</code>
+ and the name of the document. This may be repeated several times.
+ </td>
+ </tr>
+ <tr>
+ <th>3</th>
+ <td>
+ <a href="#WriteFilesystem">The client application asks the Filesystem to write its data to
+ an <code>OutputStream</code>.
+ </a>
+ </td>
+ </tr>
+ </table>
+
+ <section id="Initialization">
+ <title>Initialization</title>
+ <p>Initialization of the POIFS system is shown in the following scenario diagram:
+ </p>
+ <p>
+ <img src="images/POIFSInitialization.gif" alt="Initialization"/>
+ </p>
+ <table>
+ <tr>
+ <th>Step</th>
+ <th>Description</th>
+ </tr>
+ <tr>
+ <th>1</th>
+ <td>The
+ <a href="#Filesystem">Filesystem</a>
+ object, which is created for each request to convert an appropriate XML stream to a POIFS
+ output stream containing an HSSF document, creates its <a href="#PropertyTable">
+ PropertyTable</a>.
+ </td>
+ </tr>
+ <tr>
+ <th>2</th>
+ <td>The
+ <a href="#PropertyTable">PropertyTable</a>
+ creates its
+ <a href="#RootProperty">RootProperty</a>
+ instance, making the RootProperty the first
+ <a href="#Property">Property</a>
+ in its <code>List</code> of Property instances.
+ </td>
+ </tr>
+ <tr>
+ <th>3</th>
+ <td>The
+ <a href="#Filesystem">Filesystem</a>
+ creates its
+ <a href="#HeaderBlock">HeaderBlock</a>
+ instance. It should be noted that the decision to create the HeaderBlock at Filesystem
+ initialization is arbitrary; creation of the HeaderBlock could easily and harmlessly be
+ postponed to the appropriate moment in
+ <a href="#WriteFilesystem">writing the filesystem</a>.
+ </td>
+ </tr>
+ </table>
+ </section>
+
+ <section id="CreateDocument">
+ <title>Creating a Document</title>
+ <p>Creating and adding a document to a POIFS system is shown in the following scenario diagram:
+ </p>
+ <p>
+ <img src="images/POIFSAddDocument.gif" alt="Add Document"/>
+ </p>
+ <table>
+ <tr>
+ <th>Step</th>
+ <th>Description</th>
+ </tr>
+ <tr>
+ <th>1</th>
+ <td>The
+ <a href="#Filesystem">Filesystem</a>
+ instance creates a new
+ <a href="#Document">Document</a>
+ instance. It will store the newly created Document in a
+ <code>List</code>
+ of
+ <a href="#BATManaged">BATManaged</a>
+ instances.
+ </td>
+ </tr>
+ <tr>
+ <th>2</th>
+ <td>The <a href="#Document">Document</a> reads data from the provided
+ <code>InputStream</code>, storing the data in
+ <a href="#DocumentBlock">DocumentBlock</a>
+ instances. It keeps track of the byte count as it reads the data.
+ </td>
+ </tr>
+ <tr>
+ <th>3</th>
+ <td>The <a href="#Document">Document</a> creates a
+ <a href="#DocumentProperty">DocumentProperty</a>
+ to keep track of its property data. The byte count is stored in the newly created
+ DocumentProperty instance.
+ </td>
+ </tr>
+ <tr>
+ <th>4</th>
+ <td>The
+ <a href="#Filesystem">Filesystem</a>
+ requests the newly created
+ <a href="#DocumentProperty">DocumentProperty</a>
+ from the newly created
+ <a href="#Document">Document</a>
+ instance.
+ </td>
+ </tr>
+ <tr>
+ <th>5</th>
+ <td>The
+ <a href="#Filesystem">Filesystem</a>
+ sends the newly created
+ <a href="#DocumentProperty">DocumentProperty</a>
+ to the Filesystem's
+ <a href="#PropertyTable">PropertyTable</a>
+ so that the PropertyTable can add the DocumentProperty to its
+ <code>List</code>
+ of
+ <a href="#Property">Property</a>
+ instances.
+ </td>
+ </tr>
+ <tr>
+ <th>6</th>
+ <td>The <a href="#Filesystem">Filesystem</a> gets the
+ <a href="#RootProperty">RootProperty</a>
+ from its <a href="#PropertyTable">PropertyTable</a>.
+ </td>
+ </tr>
+ <tr>
+ <th>7</th>
+ <td>The <a href="#Filesystem">Filesystem</a> adds the newly created
+ <a href="#DocumentProperty">DocumentProperty</a>
+ to the <a href="#RootProperty">RootProperty</a>.
+ </td>
+ </tr>
+ </table>
+ <p>Although typical deployment of the POIFS system will only entail adding a single <a href="#Document">
+ Document
+ </a> (the workbook) to the <a href="#Filesystem">Filesystem</a>, there is nothing in the design to
+ prevent multiple Documents from being added to the Filesystem. This flexibility can be employed to
+ write summary information document(s) in addition to the workbook.
+ </p>
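+ <p>The steps in the table above can be traced in a heavily simplified Java sketch. It is not the POIFS code. <code>FilesystemSketch</code> and its nested classes are hypothetical stand-ins that reduce the PropertyTable and RootProperty to plain lists. The root name is chosen only for illustration, and the sketch relies on <code>InputStream.readNBytes</code> (Java 9 or later). The comments map each statement to the step numbers in the table.</p>
+ <source><![CDATA[
```java
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical, heavily simplified model of the "Creating a Document" scenario. */
class FilesystemSketch {
    static final int BIG_BLOCK_SIZE = 512;

    /** Stand-in for DocumentProperty: a name and a byte count. */
    static final class SketchProperty {
        final String name;
        final int size;

        SketchProperty(String name, int size) { this.name = name; this.size = size; }
    }

    /** Stand-in for Document: data blocks plus its own property. */
    static final class SketchDocument {
        final List<byte[]> blocks = new ArrayList<>();
        final SketchProperty property;

        SketchDocument(String name, InputStream stream) throws IOException {
            int size = 0;
            while (true) { // step 2: read the data into 512-byte blocks, counting bytes
                byte[] block = new byte[BIG_BLOCK_SIZE];
                int filled = stream.readNBytes(block, 0, BIG_BLOCK_SIZE);
                if (filled > 0) {
                    blocks.add(block);
                    size += filled;
                }
                if (filled < BIG_BLOCK_SIZE) {
                    break;
                }
            }
            property = new SketchProperty(name, size); // step 3: record the byte count
        }
    }

    final List<SketchDocument> documents = new ArrayList<>(); // the BATManaged list in the design
    final List<SketchProperty> propertyTable = new ArrayList<>();
    final List<SketchProperty> rootChildren = new ArrayList<>();

    FilesystemSketch() {
        propertyTable.add(new SketchProperty("Root Entry", 0)); // root is the first property
    }

    SketchDocument createDocument(String name, InputStream stream) throws IOException {
        SketchDocument document = new SketchDocument(name, stream); // steps 1-3
        documents.add(document);
        SketchProperty property = document.property;                 // step 4
        propertyTable.add(property);                                  // step 5
        rootChildren.add(property);                                   // steps 6-7
        return document;
    }
}
```
+]]></source>
+ <p>Calling <code>createDocument</code> a second time, for example for a summary information stream, simply appends another property to both lists. This reflects the flexibility described above.</p>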
+ </section>
+
+ <section id="WriteFilesystem">
+ <title>Writing the Filesystem</title>
+ <p>Writing the filesystem is shown in the following scenario diagram:
+ </p>
+ <p>
+ <img src="images/POIFSWriteFilesystem.gif" alt="Writing the Filesystem"/>
+ </p>
+ <table>
+ <tr>
+ <th>Step</th>
+ <th colspan="2">Description</th>
+ </tr>
+ <tr>
+ <th>1</th>
+ <td colspan="2">The <a href="#Filesystem">Filesystem</a> adds the
+ <a href="#PropertyTable">PropertyTable</a>
+ to its <code>List</code> of
+ <a href="#BATManaged">BATManaged</a>
+ instances and calls the PropertyTable's
+ <code>preWrite</code>
+ method. The action taken by the PropertyTable is shown in
+ the <a href="#PropertyTablePreWrite">PropertyTable preWrite scenario diagram</a>.
+ </td>
+ </tr>
+ <tr>
+ <th>2</th>
+ <td colspan="2">The
+ <a href="#Filesystem">Filesystem</a>
+ creates the <a href="#BlockAllocationTable">BlockAllocationTable</a>.
+ </td>
+ </tr>
+ <tr>
+ <th>3</th>
+ <td>The <a href="#Filesystem">Filesystem</a> gets the block count from the
+ <a href="#BATManaged">BATManaged</a>
+ instance.
+ </td>
+ <td rowspan="3">These three steps are repeated for each
+ <a href="#BATManaged">BATManaged</a>
+ instance in the <a href="#Filesystem">Filesystem</a>'s
+ <code>List</code>
+ of BATManaged instances (i.e., the <a href="#Document">Documents</a>, in order of their
+ addition to the Filesystem, followed by the <a href="#PropertyTable">PropertyTable</a>).
+ </td>
+ </tr>
+ <tr>
+ <th>4</th>
+ <td>The
+ <a href="#Filesystem">Filesystem</a>
+ sends the block count to the <a href="#BlockAllocationTable">BlockAllocationTable</a>,
+ which adds the appropriate entries to its <a href="#IntList">IntList</a> of entries
+ and returns the starting block of the newly added entries.
+ </td>
+ </tr>
+ <tr>
+ <th>5</th>
+ <td>The
+ <a href="#Filesystem">Filesystem</a>
+ gives the start block number to the
+ <a href="#BATManaged">BATManaged</a>
+ instance. If the BATManaged instance is a <a href="#Document">Document</a>, it sets the
+ start block field in its
+ <a href="#DocumentProperty">DocumentProperty</a>.
+ </td>
+ </tr>
+ <tr>
+ <th>6</th>
+ <td colspan="2">The
+ <a href="#Filesystem">Filesystem</a>
+ tells the
+ <a href="#BlockAllocationTable">BlockAllocationTable</a>
+ to create its <a href="#BATBlock">BATBlock</a> instances.
+ </td>
+ </tr>
+ <tr>
+ <th>7</th>
+ <td colspan="2">The
+ <a href="#Filesystem">Filesystem</a>
+ gives the BAT information to the <a href="#HeaderBlock">HeaderBlock</a> so that it can set
+ its BAT fields and, if necessary, create XBAT blocks.
+ </td>
+ </tr>
+ <tr>
+ <th>8</th>
+ <td colspan="2">If the filesystem is unusually large (over <strong>7MB</strong>), the
+ <a href="#HeaderBlock">HeaderBlock</a>
+ will create XBAT blocks to contain the BAT data that it cannot hold directly. In this case,
+ the
+ <a href="#Filesystem">Filesystem</a>
+ tells the HeaderBlock where those additional blocks will be stored.
+ </td>
+ </tr>
+ <tr>
+ <th>9</th>
+ <td colspan="2">The
+ <a href="#Filesystem">Filesystem</a>
+ gives the
+ <a href="#PropertyTable">PropertyTable</a>
+ start block to the <a href="#HeaderBlock">HeaderBlock</a>.
+ </td>
+ </tr>
+ <tr>
+ <th>10</th>
+ <td colspan="2">The
+ <a href="#Filesystem">Filesystem</a>
+ tells the
+ <a href="#BlockWritable">BlockWritable</a>
+ instance to write its blocks to the provided
+ <code>OutputStream</code>.<br/>This step is repeated for each BlockWritable instance, in
+ this order:
+ <br/>
+ <ol>
+ <li>
+ The <a href="#HeaderBlock">HeaderBlock</a>.
+ </li>
+ <li>
+ Each <a href="#Document">Document</a>, in the order in which it was added to
+ the <a href="#Filesystem">Filesystem</a>.
+ </li>
+ <li>
+ The <a href="#PropertyTable">PropertyTable</a>.
+ </li>
+ <li>
+ The
+ <a href="#BlockAllocationTable">BlockAllocationTable</a>.
+ </li>
+ <li>
+ The XBAT blocks created by the
+ <a href="#HeaderBlock">HeaderBlock</a>, if any.
+ </li>
+ </ol>
+ </td>
+ </tr>
+ </table>
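+ <p>The bookkeeping in steps 3 through 5 can be sketched in Java as follows. This is a
+ minimal illustration rather than POI's own code: the <code>BatSketch</code> class and its
+ <code>allocateSpace</code> method are hypothetical, and the sketch assumes each
+ BATManaged instance occupies contiguous blocks, so every entry points at the following
+ block and the last entry holds the end-of-chain marker, -2.</p>
```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the BAT bookkeeping described above; not POI's API.
class BatSketch {
    static final int END_OF_CHAIN = -2;

    // entries.get(i) holds the index of the block that follows block i.
    private final List<Integer> entries = new ArrayList<>();

    /** Reserves blockCount contiguous blocks and returns the first one's index. */
    int allocateSpace(int blockCount) {
        int startBlock = entries.size();
        for (int k = 1; k < blockCount; k++) {
            entries.add(startBlock + k); // each block points at the next one
        }
        if (blockCount > 0) {
            entries.add(END_OF_CHAIN);   // the last block ends the chain
        }
        return startBlock;
    }

    List<Integer> entries() {
        return entries;
    }
}
```
+ <p>For example, allocating three blocks for a Document and then two for the PropertyTable
+ returns start blocks 0 and 3 and leaves the entries [1, 2, -2, 4, -2].</p>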
+ </section>
+
+ <section id="PropertyTablePreWrite">
+ <title>PropertyTable preWrite scenario diagram</title>
+ <p>
+ <img src="images/POIFSPropertyTablePreWrite.gif" alt="PropertyTable preWrite scenario diagram"/>
+ </p>
+ <table>
+ <tr>
+ <th>Step</th>
+ <th>Description</th>
+ </tr>
+ <tr>
+ <th>1</th>
+ <td>The
+ <a href="#PropertyTable">PropertyTable</a>
+ calls
+ <code>setIndex</code>
+ for each of its
+ <a href="#Property">Property</a>
+ instances, so that each Property knows its index within the PropertyTable's
+ <code>List</code> of Property instances.
+ </td>
+ </tr>
+ <tr>
+ <th>2</th>
+ <td>The
+ <a href="#PropertyTable">PropertyTable</a>
+ requests the
+ <a href="#PropertyBlock">PropertyBlock</a>
+ class to create an array of
+ <a href="#PropertyBlock">PropertyBlock</a>
+ instances.
+ </td>
+ </tr>
+ <tr>
+ <th>3</th>
+
+ <td>The
+ <a href="#PropertyBlock">PropertyBlock</a>
+ calculates the number of empty
+ <a href="#Property">Property</a>
+ instances it needs to create and creates them. The algorithm for the number to create is:
+ <br/>
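+ <source><![CDATA[
+// Each 512-byte PropertyBlock holds four 128-byte Property records,
+// so round the property count up to a whole number of blocks.
+int blockCount = (properties.size() + 3) / 4;
+int emptyPropertiesNeeded = (blockCount * 4) - properties.size();]]></source>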
+ </td>
+ </tr>
+ <tr>
+ <th>4</th>
+ <td>The
+ <a href="#PropertyBlock">PropertyBlock</a>
+ creates the required number of
+ <a href="#PropertyBlock">PropertyBlock</a>
+ instances from the
+ <code>List</code>
+ of
+ <a href="#Property">Property</a>
+ instances, including the newly created empty
+ <a href="#Property">Property</a>
+ instances.
+ </td>
+ </tr>
+ <tr>
+ <th>5</th>
+ <td>The
+ <a href="#PropertyTable">PropertyTable</a>
+ calls
+ <code>preWrite</code>
+ on each of its
+ <a href="#Property">Property</a>
+ instances. For
+ <a href="#DocumentProperty">DocumentProperty</a>
+ instances, this call is a no-op. For the <a href="#RootProperty">RootProperty</a>, the
+ action taken is shown in the <a href="#RootPropertyPreWrite">RootProperty preWrite scenario
+ diagram</a>.
+ </td>
+ </tr>
+ </table>
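+ <p>Steps 3 and 4 amount to padding the property list to a multiple of four and slicing
+ it into groups of four, one group per PropertyBlock. The sketch below shows that
+ grouping generically; the <code>PropertyBlockSketch</code> class is hypothetical, and
+ the caller-supplied <code>emptyProperty</code> value stands in for the empty Property
+ instances that POI creates.</p>
```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: groups properties four to a block, padding with empties.
class PropertyBlockSketch {
    static final int PROPERTIES_PER_BLOCK = 4; // 512-byte block / 128-byte property

    static <T> List<List<T>> toBlocks(List<T> properties, T emptyProperty) {
        int blockCount = (properties.size() + PROPERTIES_PER_BLOCK - 1) / PROPERTIES_PER_BLOCK;
        int emptyPropertiesNeeded = blockCount * PROPERTIES_PER_BLOCK - properties.size();

        // Pad the list so its size is an exact multiple of four.
        List<T> padded = new ArrayList<>(properties);
        for (int k = 0; k < emptyPropertiesNeeded; k++) {
            padded.add(emptyProperty);
        }

        // Slice the padded list into consecutive groups of four.
        List<List<T>> blocks = new ArrayList<>();
        for (int b = 0; b < blockCount; b++) {
            int from = b * PROPERTIES_PER_BLOCK;
            blocks.add(new ArrayList<>(padded.subList(from, from + PROPERTIES_PER_BLOCK)));
        }
        return blocks;
    }
}
```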
+ </section>
+
+ <section id="RootPropertyPreWrite">
+ <title>RootProperty preWrite scenario diagram</title>
+ <p>
+ <img src="images/POIFSRootPropertyPreWrite.gif" alt="RootProperty preWrite scenario diagram"/>
+ </p>
+ <table>
+ <tr>
+ <th>Step</th>
+ <th colspan="2">Description</th>
+ </tr>
+ <tr>
+ <th>1</th>
+ <td colspan="2">The
+ <a href="#RootProperty">RootProperty</a>
+ sets its child property with the index of the child <a href="#Property">Property</a> that is
+ first in its <code>List</code> of children.
+ </td>
+ </tr>
+ <tr>
+ <th>2</th>
+ <td>The
+ <a href="#RootProperty">RootProperty</a>
+ sets its child's next property field with the index of the child's next sibling in the
+ RootProperty's
+ <code>List</code>
+ of children. If the child is the last in the
+ <code>List</code>, its next property field is set to <code>-1</code>.
+ </td>
+ <td rowspan="2">These two steps are repeated for each <a href="#File">File</a> in
+ the <a href="#RootProperty">
+ RootProperty</a>'s
+ <code>List</code>
+ of children.
+ </td>
+ </tr>
+ <tr>
+ <th>3</th>
+ <td>The
+ <a href="#RootProperty">RootProperty</a>
+ sets its child's previous property field with a value of
+ <code>-1</code>.
+ </td>
+ </tr>
+ </table>
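+ <p>Taken together, these steps chain the RootProperty's children into a simple list.
+ A minimal Java sketch, using a hypothetical <code>PropertySketch</code> class with plain
+ <code>int</code> fields in place of POI's Property classes:</p>
```java
import java.util.List;

// Hypothetical stand-in for a Property; only the fields touched by preWrite.
class PropertySketch {
    final int index;           // position in the PropertyTable, set by setIndex
    int childProperty = -1;
    int nextProperty = -1;
    int previousProperty = -1;

    PropertySketch(int index) {
        this.index = index;
    }

    // Mirrors steps 1-3 of the RootProperty preWrite scenario.
    static void linkChildren(PropertySketch root, List<PropertySketch> children) {
        if (children.isEmpty()) {
            return; // no children: the child field stays unused (-1)
        }
        root.childProperty = children.get(0).index;                        // step 1
        for (int k = 0; k < children.size(); k++) {
            PropertySketch child = children.get(k);
            boolean last = (k == children.size() - 1);
            child.nextProperty = last ? -1 : children.get(k + 1).index;     // step 2
            child.previousProperty = -1;                                    // step 3
        }
    }
}
```
+ <p>Because every previous property field is -1, the children form a chain linked only
+ through their next property fields rather than a balanced tree.</p>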
+ </section>
+ </section>
+ </body>
+</document> \ No newline at end of file
diff --git a/src/documentation/content/xdocs/components/poifs/embeded.xml b/src/documentation/content/xdocs/components/poifs/embeded.xml
new file mode 100644
index 0000000000..30852e199d
--- /dev/null
+++ b/src/documentation/content/xdocs/components/poifs/embeded.xml
@@ -0,0 +1,95 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<!--
+ ====================================================================
+ Licensed to the Apache Software Foundation (ASF) under one or more
+ contributor license agreements. See the NOTICE file distributed with
+ this work for additional information regarding copyright ownership.
+ The ASF licenses this file to You under the Apache License, Version 2.0
+ (the "License"); you may not use this file except in compliance with
+ the License. You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+ Unless required by applicable law or agreed to in writing, software
+ distributed under the License is distributed on an "AS IS" BASIS,
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ See the License for the specific language governing permissions and
+ limitations under the License.
+ ====================================================================
+-->
+<!DOCTYPE document PUBLIC "-//APACHE//DTD Documentation V2.0//EN" "document-v20.dtd">
+<document>
+ <header>
+ <title>Apache POI™ - POIFS - Documents embedded in other documents</title>
+ <subtitle>Overview</subtitle>
+ <authors>
+ <person name="Nick Burch" email="nick@apache.org"/>
+ <person name="Yegor Kozlov" email="yegor@apache.org"/>
+ </authors>
+ </header>
+ <body>
+ <section><title>Overview</title>
+ <p>It is possible for one OLE 2 based document to have other
+ OLE 2 documents embedded in it. For example, an Excel file
+ may have a Word document and a PowerPoint slideshow
+ embedded as part of it.</p>
+ <p>Normally, these other documents are stored in subdirectories
+ of the OLE 2 (POIFS) filesystem. The exact location of the
+ embedded documents will vary depending on the type of the
+ master document, and the exact directory names will differ
+ each time. To figure out exactly which directory to look
+ in, you will either need to process the appropriate OLE 2
+ linking entry in the master document, or simply iterate
+ over all the directories in the filesystem.</p>
+ <p>As a general rule, an embedded document's subdirectory
+ contains the same OLE 2 entries that you would find at the
+ root of the filesystem if that document were not embedded.</p>
+
+ <section><title>Files embedded in Excel</title>
+ <p>Excel normally stores embedded files in subdirectories
+ of the filesystem root. Typically these subdirectories
+ are named starting with MBD, with 8 hex characters following.</p>
+ </section>
+
+ <section><title>Files embedded in Word</title>
+ <p>Word normally stores embedded files in subdirectories
+ of the ObjectPool directory, itself a subdirectory of the
+ filesystem root. Typically these subdirectories are named
+ starting with an underscore, followed by 10 digits.</p>
+ </section>
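+ <p>When iterating over the directories, the naming conventions above can serve as a
+ rough first filter. The sketch below expresses them as regular expressions; the patterns
+ are assumptions derived from the descriptions above, the <code>EmbeddedDirNames</code>
+ class is hypothetical, and since the names are only typical, a non-match should be
+ treated as inconclusive.</p>
```java
import java.util.regex.Pattern;

// Hypothetical heuristics based on the typical directory names described above.
class EmbeddedDirNames {
    // Excel: "MBD" followed by 8 hex characters, e.g. "MBD0001A2B3"
    private static final Pattern EXCEL = Pattern.compile("MBD[0-9A-Fa-f]{8}");
    // Word (inside the ObjectPool directory): "_" followed by 10 digits
    private static final Pattern WORD = Pattern.compile("_[0-9]{10}");

    static boolean looksLikeExcelEmbed(String dirName) {
        return EXCEL.matcher(dirName).matches();
    }

    static boolean looksLikeWordEmbed(String dirName) {
        return WORD.matcher(dirName).matches();
    }
}
```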
+
+ <section><title>Files embedded in PowerPoint</title>
+ <p>PowerPoint does not normally store embedded files
+ in the OLE2 layer. Instead, they are held within records
+ of the main PowerPoint file.
+ <br/>See the <a href="./../slideshow/how-to-shapes.html#OLE">HSLF Tutorial</a>
+ for how to retrieve embedded OLE objects from a presentation.</p>
+ </section>
+ </section>
+
+ <section><title>Listing POIFS contents</title>
+ <p>POIFS provides a simple tool for listing the contents of
+ OLE2 files. This lets you see what your POIFS file
+ contains, and hence whether it has any embedded documents
+ and where they are.</p>
+ <p>The tool to use is <em>org.apache.poi.poifs.dev.POIFSLister</em>.
+ This tool may be run from the command line, and takes a filename
+ as its parameter. It will print out all the directories and
+ files contained within the POIFS file.</p>
+ </section>
+
+ <section><title>Opening embedded files</title>
+ <p>All of the POIDocument classes (HSSFWorkbook, HSLFSlideShow,
+ HWPFDocument and HDGFDiagram) can either be opened from
+ a POIFSFileSystem, or from a specific directory within a
+ POIFSFileSystem. So, to open embedded files, simply locate the
+ appropriate DirectoryNode that represents the subdirectory
+ of interest, and pass it, along with the overall POIFSFileSystem,
+ to the constructor.</p>
+ <p>If you want to extract the textual contents of the embedded file,
+ open the appropriate POIDocument and pass it to the extractor
+ class, instead of simply passing the POIFSFileSystem to the
+ extractor.</p>
+ </section>
+ </body>
+</document>
diff --git a/src/documentation/content/xdocs/components/poifs/fileformat.xml b/src/documentation/content/xdocs/components/poifs/fileformat.xml
new file mode 100644
index 0000000000..0452c9adc3
--- /dev/null
+++ b/src/documentation/content/xdocs/components/poifs/fileformat.xml
@@ -0,0 +1,703 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<!--
+ ====================================================================
+ Licensed to the Apache Software Foundation (ASF) under one or more
+ contributor license agreements. See the NOTICE file distributed with
+ this work for additional information regarding copyright ownership.
+ The ASF licenses this file to You under the Apache License, Version 2.0
+ (the "License"); you may not use this file except in compliance with
+ the License. You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+ Unless required by applicable law or agreed to in writing, software
+ distributed under the License is distributed on an "AS IS" BASIS,
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ See the License for the specific language governing permissions and
+ limitations under the License.
+ ====================================================================
+-->
+<!DOCTYPE document PUBLIC "-//APACHE//DTD Documentation V2.0//EN" "document-v20.dtd">
+<document>
+ <header>
+ <title>POIFS File System Internals</title>
+ <authors>
+ <person email="mjohnson@apache.org" name="Marc Johnson" id="MJ"/>
+ </authors>
+ </header>
+ <body>
+ <section><title>POIFS File System Internals</title>
+ <section><title>Introduction</title>
+ <p>POIFS file systems are essentially normal files stored on a
+ Java-compatible platform's native file system. They are
+ typically identified by names ending in a four character
+ extension noting what type of data they contain. For
+ example, a file ending in &quot;.xls&quot; would likely
+ contain spreadsheet data, and a file ending in
+ &quot;.doc&quot; would probably contain a word processing
+ document. POIFS file systems are called &quot;file
+ systems&quot; because they contain multiple embedded files
+ in a manner similar to traditional file systems. Along
+ functional lines, it would be more accurate to call these
+ POIFS archives. For the remainder of this document it is
+ referred to as a file system in order to avoid confusion
+ with the &quot;files&quot; it contains.</p>
+ <p>POIFS file systems are compatible with those document
+ formats used by a well-known software company's popular
+ office productivity suite and programs outputting
+ compatible data. Because the POIFS file system does not
+ provide compression, encryption or any other worthwhile
+ feature, it's not a good choice unless you require
+ interoperability with these programs.</p>
+ <p>The POIFS file system does not encode the documents
+ themselves. For example, if you had a word processor file
+ with the extension &quot;.doc&quot;, you would actually
+ have a POIFS file system with a document file archived
+ inside of that file system.</p>
+ <p>Note - this document is a good overview and explanation of
+ the file format, but for the very nitty-gritty details,
+ you should refer to
+ <a href="https://msdn.microsoft.com/en-us/library/dd942138%28v=prot.13%29.aspx">[MS-CFB].pdf</a>
+ in the (now public) Microsoft Documentation.</p>
+ </section>
+ <section><title>Document Conventions</title>
+ <p>This document utilizes the numeric types as described by
+ the Java Language Specification, which can be found at
+ <a href="https://java.sun.com">https://java.sun.com</a>. In
+ short:</p>
+ <ul>
+ <li>A <em>byte</em> is an 8 bit signed integer ranging from
+ -128 to 127.</li>
+ <li>A <em>short</em> is a 16 bit signed integer ranging from
+ -32768 to 32767</li>
+ <li>An <em>int</em> is a 32 bit signed integer ranging from
+ -2147483648 to 2147483647</li>
+ <li>A <em>long</em> is a 64 bit signed integer ranging from
+ -9.22E18 to 9.22E18.</li>
+ </ul>
+ <p>The Java Language Specification spells out a number of
+ other types that are not referred to by this document.</p>
+ <p>Where this document makes references to &quot;endian
+ conversion&quot; it is referring to the byte order of
+ stored numbers. Numbers in &quot;little-endian order&quot;
+ are stored with the <em>least</em> significant byte first. In
+ order to properly read a short, for example, you'd read two
+ bytes, shift the second byte 8 bits to the left, and then
+ combine it with the first byte (masked to avoid sign
+ extension) using a bitwise <code>or</code>. The following
+ code illustrates this method:</p>
+ <source>
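+/** Reads a little-endian signed short from the first two bytes of rec. */
+public static short getShort(byte[] rec)
+{
+    // Mask the low byte so its sign bit is not extended, then OR in
+    // the high byte shifted into bits 8-15.
+    return (short) ((rec[1] &lt;&lt; 8) | (rec[0] &amp; 0x00ff));
+}</source>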
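+ <p>The same approach extends to ints and longs: mask each byte with <code>0xFF</code>
+ and shift it into place, least significant byte first. Alternatively,
+ <code>java.nio.ByteBuffer</code> performs the conversion once its byte order is set to
+ little-endian. The hypothetical <code>LittleEndianSketch</code> class below shows both
+ approaches; POI's own <code>org.apache.poi.util.LittleEndian</code> class serves the
+ same purpose.</p>
```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical helper: reads little-endian values at an arbitrary offset.
class LittleEndianSketch {
    // Manual version: assemble the int one byte at a time, low byte first.
    static int getInt(byte[] data, int offset) {
        return (data[offset] & 0xFF)
                | (data[offset + 1] & 0xFF) << 8
                | (data[offset + 2] & 0xFF) << 16
                | (data[offset + 3] & 0xFF) << 24;
    }

    // ByteBuffer version: let java.nio handle the byte order.
    static long getLong(byte[] data, int offset) {
        return ByteBuffer.wrap(data).order(ByteOrder.LITTLE_ENDIAN).getLong(offset);
    }
}
```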
+ </section>
+ <section><title>File System Walkthrough</title>
+ <p>This is a walkthrough of a POIFS file system and how it is
+ put together. It is not intended to be a precise
+ specification but to give a &quot;big picture&quot; of the
+ general structure and how it's interpreted.</p>
+ <p>A POIFS file system begins with a header. This header
+ identifies locations in the file by function and provides a
+ sanity check identifying a file as a POIFS file system.</p>
+ <p>The first 64 bits of the header compose a <em>magic number
+ identifier.</em> This identifier tells the client software
+ that this is indeed a POIFS file system and that it should
+ be treated as such. This is a &quot;sanity check&quot; to
+ make sure this is a POIFS file system and not some other
+ format. The header also contains an <em>array of block
+ numbers</em>. These block numbers refer to blocks in the
+ file. When these blocks are read together they form the
+ <em>Block Allocation Table</em>. The header also contains a
+ pointer to the first element in the <em>property table</em>,
+ also known as the <em>root element</em>, and a pointer to the
+ <em>small Block Allocation Table (SBAT)</em>.</p>
+ <p>The <em>block allocation table</em> or <em>BAT</em>, along with
+ the <em>property table</em>, specify which blocks in the file
+ system belong to which files. After the header block, the
+ file system is divided into identically sized blocks of
+ data, numbered from 0 to however many blocks there are in
+ the file system. For each file in the file system, its
+ entry in the property table includes the index of the first
+ block in the array of blocks. Each block's index into the
+ array of blocks is also its index into the BAT, and the
+ integer value stored at that index in the BAT gives the
+ index of the next block in the array (and thus the index of
+ the next BAT value). A special value is stored in the BAT
+ to indicate &quot;end of file&quot;.</p>
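+ <p>Since the header occupies the first block-sized region of the file, block
+ <em>n</em> begins immediately after it. A minimal sketch of that arithmetic, assuming
+ the 512-byte blocks described in this document (<code>BlockOffsetSketch</code> is a
+ hypothetical name):</p>
```java
// Hypothetical helper: file offset of big block n, assuming the header
// occupies exactly one 512-byte block at the start of the file.
class BlockOffsetSketch {
    static final int BIG_BLOCK_SIZE = 512;

    static long blockOffset(int blockIndex) {
        // +1 skips the header block; long arithmetic avoids int overflow
        return (blockIndex + 1L) * BIG_BLOCK_SIZE;
    }
}
```
+ <p>For example, block 0 starts at byte 512 and block 2 at byte 1536.</p>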
+ <p>The <em>property table</em> is essentially the directory
+ storage for the file system. It consists of the name of the
+ file or directory, its <em>start block</em> in both the file
+ system and <em>BAT</em>, and its actual size. The first
+ property in the property table is the <em>root
+ element</em>. It has two purposes: to be a directory entry
+ (the root of the directory tree, to be specific), and to
+ hold the start block for the <em>small block data</em>.</p>
+ <p>Small block data is a special file that contains the data
+ for small files (less than 4K bytes). It subdivides its
+ blocks into smaller blocks and there is a special small
+ block allocation table that, like the main BAT for larger
+ files, is used to map a small file to its small blocks.</p>
+ </section>
+ <section><title>Header Block</title>
+ <p>The POIFS file system begins with a <em>header
+ block</em>. The first 64 bits of the header form a long
+ <em>file type id</em> or <em>magic number identifier</em> of
+ <code>0xE11AB1A1E011CFD0L</code>. This is basically a
+ sanity check. If this isn't the first thing in the header
+ (and consequently the file system) then this is not a
+ POIFS file system and should be read with some other
+ library.</p>
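+ <p>In code, the check amounts to reading the first eight bytes as a little-endian long
+ and comparing the result with the constant. An illustrative sketch, not POI's own
+ validation code (<code>MagicCheck</code> is a hypothetical name):</p>
```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical check for the POIFS header signature.
class MagicCheck {
    static final long POIFS_SIGNATURE = 0xE11AB1A1E011CFD0L;

    static boolean isPoifs(byte[] header) {
        if (header.length < 8) {
            return false; // too short to hold the signature
        }
        long magic = ByteBuffer.wrap(header, 0, 8).order(ByteOrder.LITTLE_ENDIAN).getLong();
        return magic == POIFS_SIGNATURE;
    }
}
```
+ <p>On disk, the signature therefore appears as the byte sequence
+ <code>D0 CF 11 E0 A1 B1 1A E1</code>.</p>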
+ <p>The most important parts of the header are discussed in
+ the rest of this section.</p>
+ <section><title>BATs</title>
+ <p>At offset <em>0x2C</em> is an int specifying the number
+ of elements in the <em>BAT array</em>. The array at
+ <em>0x4C</em> is an array of ints containing the
+ indices of every block in the Block Allocation
+ Table.</p>
+ </section>
+ <section><title>XBATs</title>
+ <p>Very large POIFS archives may have more blocks than can
+ be addressed by the BAT blocks enumerated in the header
+ block. How large? Well, the BAT array in the header can
+ contain up to 109 BAT block indices; each BAT block
+ references up to 128 blocks, and each block is 512
+ bytes, so we're talking about 109 * 128 * 512 =
+ 6.8MB. That's a pretty respectable document! But, you
+ could have much more data than that, and in today's
+ world of cheap gigabyte drives, why not? So, the BAT
+ may be extended in that event. The integer value at
+ offset <em>0x44</em> of the header is the index of the
+ first <em>extended BAT (XBAT) block</em>. At offset
+ <em>0x48</em> of the header, there is an int value that
+ specifies how many XBAT blocks there are. The XBAT
+ blocks begin at the specified index into the array of
+ blocks making up the POIFS file system, and are chained
+ for the specified count of XBAT blocks.</p>
+ <p>Each XBAT block contains the indices of up to 127 BAT
+ blocks, so the document size can be expanded by another
+ ~8MB for each XBAT block. The BAT blocks indexed by an
+ XBAT block are appended to the end of the list of BAT
+ blocks enumerated in the header block. Thus the BAT
+ blocks enumerated in the header block are BAT blocks 0
+ through 108, the BAT blocks enumerated in the first
+ XBAT block are BAT blocks 109 through 235, the BAT
+ blocks enumerated in the second XBAT block are BAT
+ blocks 236 through 362, and so on.</p>
+ <p>While a normal BAT block holds 128 entries, each XBAT
+ only references 127 BAT blocks. The last, 128th entry
+ in an XBAT is the block index of the next XBAT block in the
+ chain (or -1 if this is the last XBAT).</p>
+ <p>Through the use of XBAT blocks, the limit on the
+ overall document size is that imposed by the 4-byte
+ block indices; if the indices are unsigned ints, the
+ maximum file size is 2 terabytes, 1 terabyte if the
+ indices are treated as signed ints. Either way, this is
+ far larger than any document you are likely to encounter.</p>
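+ <p>The numbering above can be checked with a little arithmetic. The hypothetical
+ <code>BatLocation</code> helper below reports where the index of a given BAT block is
+ recorded (the header's BAT array for BAT blocks 0 through 108, otherwise a slot in one
+ of the chained XBAT blocks) and how many bytes can be addressed for a given number of
+ XBAT blocks, assuming 512-byte blocks throughout.</p>
```java
// Hypothetical helper: where is the index of the k-th BAT block recorded?
class BatLocation {
    static final int HEADER_BAT_SLOTS = 109; // BAT array in the header
    static final int XBAT_BAT_SLOTS = 127;   // 128 entries minus the chain link

    /** Returns a description such as "header[5]" or "xbat#1[0]". */
    static String locate(int batBlockNumber) {
        if (batBlockNumber < HEADER_BAT_SLOTS) {
            return "header[" + batBlockNumber + "]";
        }
        int rest = batBlockNumber - HEADER_BAT_SLOTS;
        return "xbat#" + (rest / XBAT_BAT_SLOTS) + "[" + (rest % XBAT_BAT_SLOTS) + "]";
    }

    /** Bytes addressable when the header and xbatCount XBAT blocks are full. */
    static long capacityBytes(int xbatCount) {
        long batBlocks = HEADER_BAT_SLOTS + (long) XBAT_BAT_SLOTS * xbatCount;
        return batBlocks * 128 * 512; // 128 entries per BAT block, 512-byte blocks
    }
}
```
+ <p>With no XBAT blocks, <code>capacityBytes(0)</code> gives 109 * 128 * 512 = 7,143,424
+ bytes (the 6.8MB figure above); each XBAT block adds 127 * 128 * 512 = 8,323,072
+ bytes.</p>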
+ </section>
+ <section><title>SBATs</title>
+ <p>If a file contained in a POIFS archive is smaller than
+ 4096 bytes, it is stored in small blocks. Small blocks
+ are 64 bytes in length and are contained within big
+ blocks, up to 8 to a big block. As the main BAT is used
+ to navigate the array of big blocks, so the <em>small
+ block allocation table</em> is used to navigate the
+ array of small blocks. The SBAT's start block index is
+ found at offset <em>0x3C</em> of the header block, and
+ remaining blocks constituting the SBAT are found by
+ walking the main BAT as if it were an ordinary file in
+ the POIFS file system (this process is described
+ below).</p>
+ </section>
+ <section><title>Property Table Start Index</title>
+ <p>An integer at address <em>0x30</em> specifies the start
+ index of the property table. This integer is specified
+ as a <em>&quot;block index&quot;</em>. The Property Table
+ is stored, as is almost everything in a POIFS file
+ system, in big blocks and walked via the BAT. The
+ Property Table is described below.</p>
+ </section>
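+ <p>Putting these offsets together, the key header fields can be read from the first 512
+ bytes of the file with a little-endian <code>ByteBuffer</code>. This is a minimal sketch
+ based on the offsets given in this section; the <code>HeaderSketch</code> class and its
+ field names are hypothetical, and BAT indices stored in XBAT blocks are not
+ followed.</p>
```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical reader for the header fields described in this section.
class HeaderSketch {
    static final int HEADER_BAT_SLOTS = 109; // (0x200 - 0x4C) / 4

    final int batCount;        // 0x2C: number of BAT blocks
    final int propertiesStart; // 0x30: first block of the property table
    final int sbatStart;       // 0x3C: first block of the SBAT
    final int xbatStart;       // 0x44: first XBAT block
    final int xbatCount;       // 0x48: number of XBAT blocks
    final int[] batArray;      // 0x4C onward: BAT block indices held in the header

    HeaderSketch(byte[] header) {
        ByteBuffer buf = ByteBuffer.wrap(header).order(ByteOrder.LITTLE_ENDIAN);
        batCount = buf.getInt(0x2C);
        propertiesStart = buf.getInt(0x30);
        sbatStart = buf.getInt(0x3C);
        xbatStart = buf.getInt(0x44);
        xbatCount = buf.getInt(0x48);
        // Only the first 109 BAT indices live in the header; the rest are in XBAT blocks.
        int inHeader = Math.max(0, Math.min(batCount, HEADER_BAT_SLOTS));
        batArray = new int[inHeader];
        for (int k = 0; k < inHeader; k++) {
            batArray[k] = buf.getInt(0x4C + 4 * k);
        }
    }
}
```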
+ </section>
+ <section><title>Property Table</title>
+ <p>The property table is essentially nothing more than the
+ directory system. Properties are 128 byte records
+ contained within the 512 byte blocks. The first property
+ is always the Root Entry. The following applies to
+ individual properties within a property table:</p>
+ <ul>
+ <li>At offset <em>0x00</em> in the property is the
+ &quot;<em>name</em>&quot;. This is stored as an
+ uncompressed 16 bit unicode string. In short every
+ other byte corresponds to an &quot;ASCII&quot;
+ character. The size of this string is stored at offset
+ <em>0x40</em> (<em>string size</em>) as a short.</li>
+ <li>At offset <em>0x42</em> is the <em>property type</em>
+ (byte). The type is 1 for directory, 2 for file or 5
+ for the Root Entry.</li>
+ <li>At offset <em>0x43</em> is the <em>node color</em>
+ (byte). The color is either 1, (black), or 0,
+ (red). Properties are apparently meant to be arranged
+ in a red-black binary tree, subject to the following
+ rules:
+ <ol>
+ <li>The root of the tree is always black</li>
+ <li>Two consecutive nodes cannot both be red</li>
+ <li>A property is less than another property if its
+ name length is less than the other property's name
+ length</li>
+ <li>If two properties have the same name length, the
+ sort order is determined by the sort order of the
+ properties' names.</li>
+ </ol></li>
+ <li>At offset <em>0x44</em> is the index (int) of the
+ <em>previous property</em>.</li>
+ <li>At offset <em>0x48</em> is the index (int) of the
+ <em>next property</em>.</li>
+ <li>At offset <em>0x4C</em> is the index (int) of the
+ <em>first directory entry</em>. This is used by
+ directory entries.</li>
+ <li>At offset <em>0x74</em> is an integer giving the
+ <em>start block</em> for the file described by this
+ property. This index corresponds to an index in the
+ array of indices that is the Block Allocation Table
+ (or the Small Block Allocation Table) as well as the
+ index of the first block in the file. This is used by
+ files and the root entry.</li>
+ <li>At offset <em>0x78</em> is an integer giving the total
+ <em>actual size</em> of the file pointed at by this
+ property. If the file size is less than 4096, the file
+ is stored in small blocks and the SBAT is used to walk
+ the small blocks making up the file. If the file size
+ is 4096 or larger, the file is stored in big blocks
+ and the main BAT is used to walk the big blocks making
+ up the file. The exception to this rule is the <em>Root
+ Entry</em>, which, regardless of its size, is
+ <em>always</em> stored in big blocks and the main BAT is
+ used to walk the big blocks making up this special
+ file.</li>
+ </ul>
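+ <p>The field layout above translates directly into code. The sketch below decodes one
+ 128-byte property record; it is an illustration rather than POI's implementation, the
+ <code>PropertyRecord</code> class is hypothetical, and it assumes, as [MS-CFB] specifies,
+ that the name size at <em>0x40</em> is a byte count including the two-byte null
+ terminator. <code>usesSmallBlocks</code> applies the 4096-byte rule, with the Root
+ Entry as the exception.</p>
```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

// Hypothetical decoder for one 128-byte property record.
class PropertyRecord {
    static final int TYPE_DIRECTORY = 1;
    static final int TYPE_FILE = 2;
    static final int TYPE_ROOT = 5;

    final String name;
    final int type, color, previous, next, child, startBlock, size;

    PropertyRecord(byte[] data, int offset) {
        ByteBuffer buf = ByteBuffer.wrap(data).order(ByteOrder.LITTLE_ENDIAN);
        int nameSize = buf.getShort(offset + 0x40);   // bytes, incl. null terminator
        int nameLength = Math.max(0, nameSize - 2);    // drop the 2-byte terminator
        name = new String(data, offset, nameLength, StandardCharsets.UTF_16LE);
        type = data[offset + 0x42];
        color = data[offset + 0x43];                   // 0 = red, 1 = black
        previous = buf.getInt(offset + 0x44);
        next = buf.getInt(offset + 0x48);
        child = buf.getInt(offset + 0x4C);
        startBlock = buf.getInt(offset + 0x74);
        size = buf.getInt(offset + 0x78);
    }

    /** Files under 4096 bytes use small blocks; the Root Entry never does. */
    boolean usesSmallBlocks() {
        return type != TYPE_ROOT && size < 4096;
    }
}
```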
+ </section>
+ <section><title>Root Entry</title>
+ <p>The <em>Root Entry</em> in the <em>Property Table</em>
+ contains the information necessary to read and write
+ small files, which are files less than 4096 bytes
+ long. The start block field of the Root Entry is the
+ start index of the <em>Small Block Array</em>, which is
+ read like any other file in the POIFS file system. Since
+ the SBAT cannot be used without the Small Block Array,
+ the Root Entry MUST be read or written using the <em>Block
+ Allocation Table</em>. The blocks making up the Small
+ Block Array are divided into 64-byte small blocks, up to
+ the size indicated in the Root Entry (which should always
+ be a multiple of 64).</p>
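+ <p>With 64-byte small blocks and 512-byte big blocks, eight small blocks fit in each big
+ block of the Small Block Array, so locating a small block is simple division. In this
+ hypothetical sketch, the returned position counts big blocks along the Root Entry's
+ chain; the actual block index still has to be found by walking that chain through the
+ BAT.</p>
```java
// Hypothetical helper for mapping a small block index into the Small Block Array.
class SmallBlockSketch {
    static final int SMALL_BLOCK_SIZE = 64;
    static final int BIG_BLOCK_SIZE = 512;
    static final int SMALL_PER_BIG = BIG_BLOCK_SIZE / SMALL_BLOCK_SIZE; // 8

    // 0-based position, along the Root Entry's chain, of the big block holding small block n.
    static int bigBlockPosition(int smallBlockIndex) {
        return smallBlockIndex / SMALL_PER_BIG;
    }

    // Byte offset of small block n within that big block.
    static int offsetInBigBlock(int smallBlockIndex) {
        return (smallBlockIndex % SMALL_PER_BIG) * SMALL_BLOCK_SIZE;
    }
}
```
+ <p>For example, small block 9 lies 64 bytes into the second big block of the Small
+ Block Array.</p>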
+ </section>
+ <section><title>Walking the Nodes of the Property Table</title>
+ <p>The individual properties form a directory tree, with the
+ <em>Root Entry</em> as the directory tree's root, as shown
+ in the accompanying drawing. Note the numbers in
+ parentheses in each node; they represent the node's index
+ in the array of properties. The <em>NEXT_PROP</em>,
+ <em>PREVIOUS_PROP</em>, and <em>CHILD_PROP</em> fields hold
+ these indices, and are used to navigate the tree.</p>
+ <p><img alt="property set" src="images/PropertySet.jpg" /></p>
+ <p>Each directory entry (i.e., a property whose type is
+ <em>directory</em> or <em>root entry</em>) uses its
+ <em>CHILD_PROP</em> field to point to one of its
+ subordinate (child) properties. It doesn't seem to matter
+ which of its children it points to. Thus in the previous
+ drawing, the Root Entry's CHILD_PROP field may contain 1,
+ 4, or the index of one of its other children. Similarly,
+ the directory node (index 1) may have, in its CHILD_PROP
+ field, 2, 3, or the index of one of its other
+ children.</p>
+ <p>The children of a given directory property point to each
+ other in a similar fashion by using their
+ <em>NEXT_PROP</em> and <em>PREVIOUS_PROP</em> fields.</p>
+ <p>Unused <em>NEXT_PROP</em>, <em>PREVIOUS_PROP</em>, and
+ <em>CHILD_PROP</em> fields contain the marker value of
+ -1. All file properties have a value of -1 for their
+ CHILD_PROP fields for example.</p>
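+ <p>Collecting a directory's children therefore means starting at its CHILD_PROP and
+ following both PREVIOUS_PROP and NEXT_PROP until reaching -1. The sketch below keeps the
+ three fields in parallel arrays indexed by property number; the <code>ChildWalker</code>
+ class is hypothetical, the visit order (previous, node, next) is one reasonable choice,
+ and the recursion does not guard against cycles in a corrupt file.</p>
```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical traversal of a directory's children using the sibling links.
class ChildWalker {
    static List<Integer> children(int dir, int[] child, int[] previous, int[] next) {
        List<Integer> out = new ArrayList<>();
        collect(child[dir], previous, next, out);
        return out;
    }

    private static void collect(int node, int[] previous, int[] next, List<Integer> out) {
        if (node == -1) {
            return; // unused link
        }
        collect(previous[node], previous, next, out); // siblings reached via PREVIOUS_PROP
        out.add(node);
        collect(next[node], previous, next, out);     // siblings reached via NEXT_PROP
    }
}
```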
+ </section>
+ <section><title>Block Allocation Table</title>
+ <p>The <em>BAT blocks</em> are pointed at by the bat array
+ contained in the header and supplemented, if necessary,
+ by the <em>XBAT blocks</em>. These blocks form a large
+ table of integers. These integers are block numbers. The
+ <em>Block Allocation Table</em> holds chains of integers.
+ These chains are terminated with -2. The elements in
+ these chains refer to blocks in the files. The starting
+ block of a file is NOT specified in the BAT. It is
+ specified by the <em>property</em> for a given file. The
+ elements in this BAT are both the block number (within
+ the file minus the header) <em>and</em> the number of the
+ next BAT element in the chain. This can be thought of as
+ a linked list of blocks. The BAT array contains the links
+ from one block to the next, including the end of chain
+ marker.</p>
+ <p>Here's an example: Let's assume that the BAT begins as
+ follows:</p>
+ <source>
+BAT[ 0 ] = 2
+BAT[ 1 ] = 5
+BAT[ 2 ] = 3
+BAT[ 3 ] = 4
+BAT[ 4 ] = 6
+BAT[ 5 ] = -2
+BAT[ 6 ] = 7
+BAT[ 7 ] = -2
+...</source>
+ <p>Now, if we have a file whose Property Table entry says it
+ begins with index 0, we walk the BAT array and see that
+ the file consists of blocks 0 (because the start block is
+ 0), 2 (because BAT[ 0 ] is 2), 3 (BAT[ 2 ] is 3), 4 (BAT[
+ 3 ] is 4), 6 (BAT[ 4 ] is 6), and 7 (BAT[ 6 ] is 7). It
+ ends at block 7 because BAT[ 7 ] is -2, which is the end
+ of chain marker.</p>
+ <p>Similarly, a file beginning at index 1 consists of
+ blocks 1 and 5.</p>
+ <p>Other special numbers in a BAT array are:</p>
+ <ul>
+ <li>-1, which indicates an unused block</li>
+ <li>-3, which indicates a &quot;special&quot; block, such
+ as a block used to make up the Small Block Array, the
+ Property Table, the main BAT, or the SBAT</li>
+ </ul>
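+ <p>Walking a chain is a simple loop: start at the block named in the property and follow
+ BAT entries until reaching the end-of-chain marker, -2. A minimal sketch
+ (<code>BatWalker</code> is a hypothetical name) that reproduces the example above and
+ rejects out-of-range indices, special markers, and loops:</p>
```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: follows a chain of BAT entries from a start block.
class BatWalker {
    static final int END_OF_CHAIN = -2;

    static List<Integer> chain(int[] bat, int startBlock) {
        List<Integer> blocks = new ArrayList<>();
        int current = startBlock;
        while (current != END_OF_CHAIN) {
            // Negative markers (-1, -3), bad indices, or a chain longer than the BAT mean corruption.
            if (current < 0 || current >= bat.length || blocks.size() > bat.length) {
                throw new IllegalStateException("corrupt chain at block " + current);
            }
            blocks.add(current);
            current = bat[current]; // this block's entry names the next block
        }
        return blocks;
    }
}
```
+ <p>Applied to the example BAT, <code>chain(bat, 0)</code> returns [0, 2, 3, 4, 6, 7] and
+ <code>chain(bat, 1)</code> returns [1, 5].</p>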
+ </section>
+ <section><title>File System Structures</title>
+ <p>The following outlines the basic file system structures.</p>
+ <section><title>Header (block 1) -- 512 (0x200) bytes</title>
+ <table>
+ <tr>
+ <td><em>Field</em></td>
+ <td><em>Description</em></td>
+ <td><em>Offset</em></td>
+ <td><em>Length</em></td>
+ <td><em>Default value or const</em></td>
+ </tr>
+ <tr>
+ <td>FILETYPE</td>
+ <td>Magic number identifying this as a POIFS file
+ system.</td>
+ <td>0x0000</td>
+ <td>Long</td>
+ <td>0xE11AB1A1E011CFD0</td>
+ </tr>
+ <tr>
+ <td>UK1</td>
+ <td>Unknown constant</td>
+ <td>0x0008</td>
+ <td>Integer</td>
+ <td>0</td>
+ </tr>
+ <tr>
+ <td>UK2</td>
+ <td>Unknown Constant</td>
+ <td>0x000C</td>
+ <td>Integer</td>
+ <td>0</td>
+ </tr>
+ <tr>
+ <td>UK3</td>
+ <td>Unknown Constant</td>
+ <td>0x0014</td>
+ <td>Integer</td>
+ <td>0</td>
+ </tr>
+ <tr>
+ <td>UK4</td>
+ <td>Unknown Constant (revision?)</td>
+ <td>0x0018</td>
+ <td>Short</td>
+ <td>0x003B</td>
+ </tr>
+ <tr>
+ <td>UK5</td>
+ <td>Unknown Constant (version?)</td>
+ <td>0x001A</td>
+ <td>Short</td>
+ <td>0x0003</td>
+ </tr>
+ <tr>
+ <td>UK6</td>
+ <td>Unknown Constant</td>
+ <td>0x001C</td>
+ <td>Short</td>
+ <td>-2</td>
+ </tr>
+ <tr>
+ <td>LOG_2_BIG_BLOCK_SIZE</td>
+ <td>Log, base 2, of the big block size</td>
+ <td>0x001E</td>
+ <td>Short</td>
+ <td>9 (2 ^ 9 = 512 bytes)</td>
+ </tr>
+ <tr>
+ <td>LOG_2_SMALL_BLOCK_SIZE</td>
+ <td>Log, base 2, of the small block size</td>
+ <td>0x0020</td>
+ <td>Integer</td>
+ <td>6 (2 ^ 6 = 64 bytes)</td>
+ </tr>
+ <tr>
+ <td>UK7</td>
+ <td>Unknown Constant</td>
+ <td>0x0024</td>
+ <td>Integer</td>
+ <td>0</td>
+ </tr>
+ <tr>
+ <td>UK8</td>
+ <td>Unknown Constant</td>
+ <td>0x0028</td>
+ <td>Integer</td>
+ <td>0</td>
+ </tr>
+ <tr>
+ <td>BAT_COUNT</td>
+ <td>Number of elements in the BAT array</td>
+ <td>0x002C</td>
+ <td>Integer</td>
+ <td>required</td>
+ </tr>
+ <tr>
+ <td>PROPERTIES_START</td>
+ <td>Block index of the first block of the property
+ table</td>
+ <td>0x0030</td>
+ <td>Integer</td>
+ <td>required</td>
+ </tr>
+ <tr>
+ <td>UK9</td>
+ <td>Unknown Constant</td>
+ <td>0x0034</td>
+ <td>Integer</td>
+ <td>0</td>
+ </tr>
+ <tr>
+ <td>UK10</td>
+ <td>Unknown Constant</td>
+ <td>0x0038</td>
+ <td>Integer</td>
+ <td>0x00001000</td>
+ </tr>
+ <tr>
+ <td>SBAT_START</td>
+ <td>Block index of first big block containing the small
+ block allocation table (SBAT)</td>
+ <td>0x003C</td>
+ <td>Integer</td>
+ <td>-2</td>
+ </tr>
+ <tr>
+ <td>SBAT_Block_Count</td>
+ <td>Number of big blocks holding the SBAT</td>
+ <td>0x0040</td>
+ <td>Integer</td>
+ <td>1</td>
+ </tr>
+ <tr>
+ <td>XBAT_START</td>
+ <td>Block index of the first block in the Extended Block
+ Allocation Table (XBAT)</td>
+ <td>0x0044</td>
+ <td>Integer</td>
+ <td>-2</td>
+ </tr>
+ <tr>
+ <td>XBAT_COUNT</td>
+ <td>Number of elements in the Extended Block Allocation
+ Table (to be added to the BAT)</td>
+ <td>0x0048</td>
+ <td>Integer</td>
+ <td>0</td>
+ </tr>
+ <tr>
+ <td>BAT_ARRAY</td>
+ <td>Array of block indices constituting the Block
+ Allocation Table (BAT)</td>
+ <td>0x004C, 0x0050, 0x0054 ... 0x01FC</td>
+ <td>Integer[]</td>
+ <td>-1 for unused elements; at least the first element
+ must be filled.</td>
+ </tr>
+ <tr>
+ <td>N/A</td>
+ <td>Header block data not otherwise described in this
+ table</td>
+ <td>N/A</td>
+ <td>N/A</td>
+ <td>-1</td>
+ </tr>
+ </table>
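+ <p>The header fields above can be decoded with nothing more than a
+ <code>java.nio.ByteBuffer</code>. The following is a minimal sketch,
+ not part of the POI API: the class name <code>HeaderSketch</code> is
+ hypothetical, and it assumes (as is the case for OLE 2 files) that all
+ multi-byte fields are stored in little-endian byte order.</p>
+ <source><![CDATA[
+ import java.nio.ByteBuffer;
+ import java.nio.ByteOrder;
+
+ // Hypothetical helper (not a POI class): decodes selected header fields
+ // using the offsets from the table above. Assumes little-endian storage.
+ final class HeaderSketch {
+     static final long MAGIC = 0xE11AB1A1E011CFD0L;
+
+     final int bigBlockSize;
+     final int smallBlockSize;
+     final int batCount;
+     final int propertiesStart;
+     final int sbatStart;
+     final int xbatStart;
+     final int firstBatBlock;
+
+     HeaderSketch(byte[] header) {
+         if (header.length < 512) {
+             throw new IllegalArgumentException("header block must be 512 bytes");
+         }
+         ByteBuffer buf = ByteBuffer.wrap(header).order(ByteOrder.LITTLE_ENDIAN);
+         if (buf.getLong(0x0000) != MAGIC) {
+             throw new IllegalArgumentException("not a POIFS file system");
+         }
+         bigBlockSize = 1 << buf.getShort(0x001E);   // LOG_2_BIG_BLOCK_SIZE
+         smallBlockSize = 1 << buf.getInt(0x0020);   // LOG_2_SMALL_BLOCK_SIZE
+         batCount = buf.getInt(0x002C);              // BAT_COUNT
+         propertiesStart = buf.getInt(0x0030);       // PROPERTIES_START
+         sbatStart = buf.getInt(0x003C);             // SBAT_START
+         xbatStart = buf.getInt(0x0044);             // XBAT_START
+         firstBatBlock = buf.getInt(0x004C);         // BAT_ARRAY[0]
+     }
+ }
+ ]]></source>
+ <p>Stored little-endian, the magic number appears on disk as the byte
+ sequence D0 CF 11 E0 A1 B1 1A E1, which is a quick way to recognize a
+ POIFS file in a hex dump.</p>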
+ </section>
+ <section>
+ <title>Block Allocation Table Block -- 512 (0x200) bytes</title>
+ <table>
+ <tr>
+ <td>
+ <em>Field</em>
+ </td>
+ <td>
+ <em>Description</em>
+ </td>
+ <td>
+ <em>Offset</em>
+ </td>
+ <td>
+ <em>Length</em>
+ </td>
+ <td>
+ <em>Default value or const</em>
+ </td>
+ </tr>
+ <tr>
+ <td>BAT_ELEMENT</td>
+ <td>Any given element in the BAT block</td>
+ <td>0x0000, 0x0004, 0x0008, ... 0x01FC</td>
+ <td>Integer</td>
+ <td>
+ -1 = unused<br/>
+ -2 = end of chain<br/>
+ -3 = special (e.g., BAT block)<br/>
+ Any other value is the index of the next block in the
+ chain, that is, the next block composing the file.
+ </td>
+ </tr>
+ </table>
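+ <p>Reading a file's blocks therefore means following a chain through the
+ BAT until the end-of-chain marker (-2) is reached. The sketch below is
+ hypothetical (not a POI class). It also assumes that the 512-byte header
+ block occupies the start of the file, so that block <em>n</em> begins at
+ byte offset (n + 1) * 512; that layout detail is not spelled out in the
+ tables here.</p>
+ <source><![CDATA[
+ import java.util.ArrayList;
+ import java.util.List;
+
+ // Hypothetical helper (not a POI class): walks a block chain in the BAT.
+ final class BatChainSketch {
+     static final int END_OF_CHAIN = -2;
+
+     // Returns the block indices of the chain that starts at startBlock.
+     static List<Integer> chain(int[] bat, int startBlock) {
+         List<Integer> blocks = new ArrayList<>();
+         int current = startBlock;
+         while (current != END_OF_CHAIN) {
+             // -1 (unused), -3 (special) or out-of-range values are not data blocks
+             if (current < 0 || current >= bat.length) {
+                 throw new IllegalStateException("invalid block index " + current);
+             }
+             // A valid chain visits each block at most once
+             if (blocks.size() >= bat.length) {
+                 throw new IllegalStateException("cycle in BAT chain");
+             }
+             blocks.add(current);
+             current = bat[current];
+         }
+         return blocks;
+     }
+
+     // Assumes the header block precedes block 0.
+     static long offsetOf(int blockIndex, int bigBlockSize) {
+         return (long) (blockIndex + 1) * bigBlockSize;
+     }
+ }
+ ]]></source>
+ <p>For example, with the BAT <code>{ -3, 2, 4, -2, -2, -1 }</code>, a file
+ whose first block is 1 occupies blocks 1, 2 and 4, while block 0 is the BAT
+ block itself.</p>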
+ </section>
+ <section><title>Property Block -- 512 (0x200) byte block</title>
+ <table>
+ <tr>
+ <td><em>Field</em></td>
+ <td><em>Description</em></td>
+ <td><em>Offset</em></td>
+ <td><em>Length</em></td>
+ <td><em>Default value or const</em></td>
+ </tr>
+ <tr>
+ <td>Properties[]</td>
+ <td>This block contains the properties.</td>
+ <td>0x0000, 0x0080, 0x0100, 0x0180</td>
+ <td>128 bytes</td>
+ <td>All unused space is set to -1.</td>
+ </tr>
+ </table>
+ </section>
+ <section><title>Property -- 128 (0x80) byte block</title>
+ <table>
+ <tr>
+ <td><em>Field</em></td>
+ <td><em>Description</em></td>
+ <td><em>Offset</em></td>
+ <td><em>Length</em></td>
+ <td><em>Default value or const</em></td>
+ </tr>
+ <tr>
+ <td>NAME</td>
+ <td>A null-terminated, uncompressed 16-bit Unicode string
+ containing the name of the property (for ASCII names,
+ the high byte of each character is zero).</td>
+ <td>0x00, 0x02, 0x04, ... 0x3E</td>
+ <td>Short[]</td>
+ <td>0x0000 for unused elements; field required; 32
+ elements (64, or 0x40, bytes) max</td>
+ </tr>
+ <tr>
+ <td>NAME_SIZE</td>
+ <td>Length of the NAME field in bytes, including the terminating null character (2 * (characters + 1))</td>
+ <td>0x40</td>
+ <td>Short</td>
+ <td>Required</td>
+ </tr>
+ <tr>
+ <td>PROPERTY_TYPE</td>
+ <td>Property type (directory, file, or root)</td>
+ <td>0x42</td>
+ <td>Byte</td>
+ <td>1 (directory), 2 (file), or 5 (root entry)</td>
+ </tr>
+ <tr>
+ <td>NODE_COLOR</td>
+ <td>Node color</td>
+ <td>0x43</td>
+ <td>Byte</td>
+ <td>0 (red) or 1 (black)</td>
+ </tr>
+ <tr>
+ <td>PREVIOUS_PROP</td>
+ <td>Previous property index</td>
+ <td>0x44</td>
+ <td>Integer</td>
+ <td>-1</td>
+ </tr>
+ <tr>
+ <td>NEXT_PROP</td>
+ <td>Next property index</td>
+ <td>0x48</td>
+ <td>Integer</td>
+ <td>-1</td>
+ </tr>
+ <tr>
+ <td>CHILD_PROP</td>
+ <td>First child property index</td>
+ <td>0x4C</td>
+ <td>Integer</td>
+ <td>-1</td>
+ </tr>
+ <tr>
+ <td>SECONDS_1</td>
+ <td>Seconds component of the created timestamp?</td>
+ <td>0x64</td>
+ <td>Integer</td>
+ <td>0</td>
+ </tr>
+ <tr>
+ <td>DAYS_1</td>
+ <td>Days component of the created timestamp?</td>
+ <td>0x68</td>
+ <td>Integer</td>
+ <td>0</td>
+ </tr>
+ <tr>
+ <td>SECONDS_2</td>
+ <td>Seconds component of the modified timestamp?</td>
+ <td>0x6C</td>
+ <td>Integer</td>
+ <td>0</td>
+ </tr>
+ <tr>
+ <td>DAYS_2</td>
+ <td>Days component of the modified timestamp?</td>
+ <td>0x70</td>
+ <td>Integer</td>
+ <td>0</td>
+ </tr>
+ <tr>
+ <td>START_BLOCK</td>
+ <td>Index of the first block of the file; the BAT entry
+ for each block in the chain gives the index of the
+ next block</td>
+ <td>0x74</td>
+ <td>Integer</td>
+ <td>Required</td>
+ </tr>
+ <tr>
+ <td>SIZE</td>
+ <td>Actual size, in bytes, of the file this property
+ points to; used to truncate the last block to the
+ real size.</td>
+ <td>0x78</td>
+ <td>Integer</td>
+ <td>0</td>
+ </tr>
+ </table>
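+ <p>Since a property block holds four 128-byte properties, a single property
+ can be decoded by offsetting into the block. The following minimal sketch is
+ hypothetical (not a POI class); it assumes little-endian storage and that
+ NAME_SIZE is a byte count including the terminating null, as in Microsoft's
+ compound file specification.</p>
+ <source><![CDATA[
+ import java.nio.ByteBuffer;
+ import java.nio.ByteOrder;
+ import java.nio.charset.StandardCharsets;
+
+ // Hypothetical helper (not a POI class): decodes one property from a
+ // 512-byte property block. Assumes NAME_SIZE counts bytes, including the null.
+ final class PropertySketch {
+     static final int PROPERTY_SIZE = 128;
+
+     final String name;
+     final int type;        // 1 = directory, 2 = file, 5 = root entry
+     final int startBlock;  // START_BLOCK
+     final int size;        // SIZE
+
+     PropertySketch(byte[] block, int index) {
+         int base = index * PROPERTY_SIZE;
+         // View of just this property, so offsets match the table above
+         ByteBuffer buf = ByteBuffer.wrap(block, base, PROPERTY_SIZE)
+                 .slice()
+                 .order(ByteOrder.LITTLE_ENDIAN);
+         int nameBytes = Math.min(buf.getShort(0x40), 64);   // capped at the 64-byte field
+         name = nameBytes >= 2
+                 ? new String(block, base, nameBytes - 2, StandardCharsets.UTF_16LE) // drop the null
+                 : "";
+         type = buf.get(0x42);
+         startBlock = buf.getInt(0x74);
+         size = buf.getInt(0x78);
+     }
+
+     // Blocks of the given size needed to hold SIZE bytes; the last one may be partly used.
+     static int blocksNeeded(int size, int blockSize) {
+         return (size + blockSize - 1) / blockSize;
+     }
+ }
+ ]]></source>
+ <p>For instance, a 1000-byte file stored in 512-byte blocks needs two blocks,
+ and SIZE tells the reader to ignore the last 24 bytes of the second one.</p>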
+ </section>
+ </section>
+ </section>
+ </body>
+</document>
diff --git a/src/documentation/content/xdocs/components/poifs/how-to.xml b/src/documentation/content/xdocs/components/poifs/how-to.xml
new file mode 100644
index 0000000000..234838608e
--- /dev/null
+++ b/src/documentation/content/xdocs/components/poifs/how-to.xml
@@ -0,0 +1,649 @@
+<?xml version="1.0" encoding="UTF-8"?><!--
+ ====================================================================
+ Licensed to the Apache Software Foundation (ASF) under one or more
+ contributor license agreements. See the NOTICE file distributed with
+ this work for additional information regarding copyright ownership.
+ The ASF licenses this file to You under the Apache License, Version 2.0
+ (the "License"); you may not use this file except in compliance with
+ the License. You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+ Unless required by applicable law or agreed to in writing, software
+ distributed under the License is distributed on an "AS IS" BASIS,
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ See the License for the specific language governing permissions and
+ limitations under the License.
+ ====================================================================
+-->
+<!DOCTYPE document PUBLIC "-//APACHE//DTD Documentation V2.0//EN" "document-v20.dtd">
+<document>
+ <header>
+ <title>How To Use the POIFS APIs</title>
+ <authors>
+ <person email="mjohnson@apache.org" name="Marc Johnson" id="MJ"/>
+ </authors>
+ </header>
+ <body>
+ <section>
+ <title>How To Use the POIFS APIs</title>
+ <p>This document describes how to use the POIFS APIs to read, write, and modify files that employ a
+ POIFS-compatible data structure to organize their content.
+ </p>
+ <section>
+ <title>Target Audience</title>
+ <p>This document is intended for Java developers who need to use the POIFS APIs to read, write, or
+ modify files that employ a POIFS-compatible data structure to organize their content. It is not
+ necessary for developers to understand the POIFS data structures, and an explanation of those data
+ structures is beyond the scope of this document. It is expected that the members of the target
+ audience will understand the rudiments of a hierarchical file system, and familiarity with the event
+ pattern employed by Java APIs such as AWT would be helpful.
+ </p>
+ </section>
+ <section>
+ <title>Glossary</title>
+ <p>This document attempts to be consistent in its terminology, which is defined here:</p>
+ <dl>
+ <dt>Directory</dt>
+ <dd>A special file that may contain other directories and documents.</dd>
+ <dt>DirectoryEntry</dt>
+ <dd>Representation of a directory within another directory.</dd>
+ <dt>Document</dt>
+ <dd>A file containing data, such as word processing data or a spreadsheet workbook.</dd>
+ <dt>DocumentEntry</dt>
+ <dd>Representation of a document within a directory.</dd>
+ <dt>Entry</dt>
+ <dd>Representation of a file in a directory.</dd>
+ <dt>File</dt>
+ <dd>A named entity, managed and contained by the file system.</dd>
+ <dt>File System</dt>
+ <dd>The POIFS data structures, plus the contained directories and documents, which are maintained in
+ a hierarchical directory structure.
+ </dd>
+ <dt>Root Directory</dt>
+ <dd>The directory at the base of a file system. All file systems have a root directory. The POIFS
+ APIs will not allow the root directory to be removed or renamed, but it can be accessed for the
+ purpose of reading its contents or adding files (directories and documents) to it.
+ </dd>
+ </dl>
+ </section>
+ </section>
+
+ <section>
+ <title>The different ways of working with POIFS</title>
+ <p>The POIFS API provides ways to read, modify and write files and streams that employ a POIFS-compatible
+ data structure to organize their content. The following use cases are covered:
+ </p>
+ <ul>
+ <li>
+ <a href="#reading">Reading a File System</a>
+ </li>
+ <li>
+ <a href="#reading_poifsfilesystem">Conventional Reading with POIFSFileSystem</a>
+ </li>
+ <li>
+ <a href="#reading_event">Event-Driven Reading</a>
+ </li>
+ <li>
+ <a href="#writing">Writing a File System</a>
+ </li>
+ <li>
+ <a href="#modifying">Modifying a File System</a>
+ </li>
+ </ul>
+ </section>
+
+ <section>
+ <title>Reading a File System</title>
+ <anchor id="reading"/>
+ <p>This section covers reading a file system. There are two ways to read a file system; these techniques are
+ summarized in the following list, and then explained in greater depth in the sections that follow.
+ </p>
+ <dl>
+ <dt>Conventional Reading with POIFSFileSystem</dt>
+ <dd>
+ <ul>
+ <li class="pro">Simpler API similar to reading a conventional file system.</li>
+ <li class="pro">Can read documents in any order.</li>
+ <li class="pro">Well-tested read and write support.</li>
+ <li class="con">If created from an InputStream, all files are resident in memory. (If created
+ from a File, only certain key structures are.)
+ </li>
+ </ul>
+ </dd>
+ <dt>Event-Driven Reading</dt>
+ <dd>
+ <ul>
+ <li class="pro">Reduced footprint -- only the documents you care about are processed.</li>
+ <li class="pro">Improved performance -- no time is wasted reading the documents you're not
+ interested in.
+ </li>
+ <li class="con">More complicated API.</li>
+ <li class="con">Need to know in advance which documents you want to read.</li>
+ <li class="con">No control over the order in which the documents are read.</li>
+ <li class="con">No way to go back and get additional documents except to re-read the file
+ system, which may not be possible, e.g., if the file system is being read from an input
+ stream that lacks random access support.
+ </li>
+ </ul>
+ </dd>
+ </dl>
+
+ <section>
+ <title>Conventional Reading with POIFSFileSystem</title>
+ <anchor id="reading_poifsfilesystem"/>
+ <p>In this technique for reading, certain key structures are loaded into memory, and the entire
+ directory tree can be walked by the application, reading specific documents at leisure.
+ </p>
+ <p>If you create a POIFSFileSystem instance from a File, the memory footprint is very small. However, if
+ you create a POIFSFileSystem instance from an input stream, then the whole contents must be
+ buffered into memory to allow random access. As such, you should budget for memory use of up to 20%
+ of the file size when using a File, or up to 120% of the file size when using an InputStream.
+ </p>
+
+ <section>
+ <title>Preparation</title>
+ <p>Before an application can read a file from the file system, the file system needs to be opened
+ and core parts processed. This is done using the
+ <code>org.apache.poi.poifs.filesystem.POIFSFileSystem</code>
+ class. Once the file system has been loaded into memory, the application may need the root
+ directory. The following code fragment will accomplish this preparation stage:
+ </p>
+ <source><![CDATA[
+ // This is the most memory efficient way to open the FileSystem
+ try (POIFSFileSystem fs = new POIFSFileSystem(new File(filename))) {
+ DirectoryEntry root = fs.getRoot();
+ } catch (IOException e) {
+ // an I/O error occurred, or the File did not provide a compatible
+ // POIFS data structure
+ }
+
+ // Using an InputStream requires more memory than using a File
+ try (POIFSFileSystem fs = new POIFSFileSystem(inputStream)) {
+ DirectoryEntry root = fs.getRoot();
+ } catch (IOException e) {
+ // an I/O error occurred, or the InputStream did not provide
+ // a compatible POIFS data structure
+ }
+ ]]></source>
+ <p>Assuming no exception was thrown, the file system can then be read.</p>
+ </section>
+ <section>
+ <title>Reading the Directory Tree</title>
+ <p>Once the file system has been loaded into memory and the root directory has been obtained, the
+ root directory can be read. The following code fragment shows how to read the entries in an
+ <code>org.apache.poi.poifs.filesystem.DirectoryEntry</code> instance:
+ </p>
+ <source><![CDATA[
+ // dir is an instance of DirectoryEntry ...
+ for (Entry entry : dir) {
+ System.out.println("found entry: " + entry.getName());
+ if (entry instanceof DirectoryEntry) {
+ // .. recurse into this directory
+ } else if (entry instanceof DocumentEntry) {
+ // entry is a document, which you can read
+ } else {
+ // currently, either an Entry is a DirectoryEntry or a DocumentEntry,
+ // but in the future, there may be other entry subinterfaces.
+ // The internal data structure certainly allows for a lot more entry types.
+ }
+ }
+ ]]></source>
+ </section>
+ <section>
+ <title>Reading a Specific Document</title>
+ <p>There are a couple of ways to read a document, depending on whether the document resides in the
+ root directory or in another directory. Either way, you will obtain an <code>
+ org.apache.poi.poifs.filesystem.DocumentInputStream
+ </code> instance.
+ </p>
+ <section>
+ <title>DocumentInputStream</title>
+ <p>The DocumentInputStream class is a simple implementation of InputStream that makes a few
+ guarantees worth noting:
+ </p>
+ <ul>
+ <li>
+ <code>available()</code>
+ always returns the number of bytes remaining in the document from your current position.
+ </li>
+ <li>
+ <code>markSupported()</code>
+ returns <code>true</code>.
+ </li>
+ <li>
+ <code>mark(int limit)</code>
+ ignores the limit parameter; basically the method marks the current position in the
+ document.
+ </li>
+ <li>
+ <code>reset()</code>
+ takes you back to the position when <code>mark()</code> was last called, or to the
+ beginning of the document if <code>mark()</code> has not been called.
+ </li>
+ <li>
+ <code>skip(long n)</code>
+ will take you to your current position + n (but not past the end of the document).
+ </li>
+ </ul>
+ <p>The behavior of <code>available</code> means you can read in a document in a single read call
+ like this:
+ </p>
+ <source><![CDATA[
+ byte[] content = new byte[ stream.available() ];
+ stream.read(content);
+ stream.close();
+ ]]></source>
+ <p>The combination of <code>mark</code>, <code>reset</code>, and <code>skip</code> provides the
+ basic mechanisms needed for random access to the document contents.
+ </p>
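+ <p>These guarantees match those of <code>java.io.ByteArrayInputStream</code>, so the random-access
+ pattern can be sketched against a plain <code>InputStream</code> and tried out without a POIFS file. The
+ helper below is hypothetical, not part of POI; it assumes <code>mark()</code> has not been called on the
+ stream, so that <code>reset()</code> returns to the beginning of the document.
+ </p>
+ <source><![CDATA[
+ import java.io.IOException;
+ import java.io.InputStream;
+
+ // Hypothetical helper (not part of POI): reads len bytes at an absolute
+ // offset, using reset() to return to the start of the document and skip()
+ // to move forward. Assumes mark() has not been called on the stream.
+ final class RandomAccessSketch {
+     static byte[] readAt(InputStream in, long offset, int len) throws IOException {
+         in.reset();                        // back to the beginning of the document
+         long skipped = in.skip(offset);    // skip() never moves past the end
+         if (skipped != offset) {
+             throw new IOException("offset " + offset + " is past the end of the document");
+         }
+         byte[] out = new byte[len];
+         int read = in.read(out);           // relies on read() filling the buffer when enough bytes remain
+         if (len > 0 && read != len) {
+             throw new IOException("only " + read + " of " + len + " bytes available");
+         }
+         return out;
+     }
+ }
+ ]]></source>
+ <p>For example, <code>readAt(stream, 8, 4)</code> returns the four bytes at offsets 8 to 11 of the
+ document, and can be followed by reads at earlier offsets without reopening the stream.
+ </p>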
+ </section>
+ <section>
+ <title>Reading a Document From the Root Directory</title>
+ <p>If the document resides in the root directory, you can obtain a <code>DocumentInputStream
+ </code> like this:
+ </p>
+ <source><![CDATA[
+ // load file system
+ try (DocumentInputStream stream = filesystem.createDocumentInputStream(documentName)) {
+ // process data from stream
+ } catch (IOException e) {
+ // no such document, or the Entry represented by documentName is not a DocumentEntry
+ }
+ ]]></source>
+ </section>
+ <section>
+ <title>Reading a Document From an Arbitrary Directory</title>
+ <p>A more generic technique for reading a document is to obtain an
+ <code>org.apache.poi.poifs.filesystem.DirectoryEntry</code> instance for the directory containing the
+ desired document (recall that you can use <code>getRoot()</code> to obtain the root directory from its
+ file system). From that DirectoryEntry, you can then obtain a <code>DocumentInputStream</code> like this:
+ </p>
+ <source><![CDATA[
+ DocumentEntry document = (DocumentEntry)directory.getEntry(documentName);
+ try (DocumentInputStream stream = new DocumentInputStream(document)) {
+ // process data from stream
+ }
+ ]]></source>
+ </section>
+ </section>
+ </section>
+
+ <section>
+ <title>Event-Driven Reading</title>
+ <anchor id="reading_event"/>
+ <p>The event-driven API for reading documents is a little more complicated and requires that your
+ application know, in advance, which files it wants to read. The benefit of using this API is that
+ each document is in memory just long enough for your application to read it, and documents that you
+ never read are never loaded into memory. When you have finished reading the documents you wanted,
+ the file system has no data structures associated with it and can be discarded.
+ </p>
+ <section>
+ <title>Preparation</title>
+ <p>The preparation phase involves creating an instance of
+ <code>org.apache.poi.poifs.eventfilesystem.POIFSReader</code> and then registering one or more
+ <code>org.apache.poi.poifs.eventfilesystem.POIFSReaderListener</code> instances with the
+ <code>POIFSReader</code>.
+ </p>
+ <source><![CDATA[
+ POIFSReader reader = new POIFSReader();
+ // register for everything
+ reader.registerListener(myOmnivorousListener);
+ // register for selective files
+ reader.registerListener(myPickyListener, "foo");
+ reader.registerListener(myPickyListener, "bar");
+ // register for selective files
+ reader.registerListener(myOtherPickyListener, new POIFSDocumentPath(), "fubar");
+ reader.registerListener(myOtherPickyListener, new POIFSDocumentPath(new String[] { "usr", "bin" }), "fubar");
+ ]]></source>
+ </section>
+ <section>
+ <title>POIFSReaderListener</title>
+ <p>
+ <code>org.apache.poi.poifs.eventfilesystem.POIFSReaderListener</code>
+ is an interface used to register for documents. When a matching document is read by the <code>
+ org.apache.poi.poifs.eventfilesystem.POIFSReader</code>, the <code>POIFSReaderListener</code> instance
+ receives an <code>org.apache.poi.poifs.eventfilesystem.POIFSReaderEvent</code> instance, which
+ contains an open <code>DocumentInputStream</code> and information about the document.
+ </p>
+ <p>A <code>POIFSReaderListener</code> instance can register for individual documents, or it can
+ register for all documents; once it has registered for all documents, subsequent (and previous!)
+ registration requests for individual documents are ignored. There is no way to unregister
+ a <code>POIFSReaderListener</code>.
+ </p>
+ <p>Thus, it is possible to register a single <code>POIFSReaderListener</code> for one, some, or all
+ documents. It is guaranteed that a single <code>POIFSReaderListener</code> will receive exactly one
+ notification per registered document. There is no guarantee as to the order in which it will receive
+ notification of its documents, as future implementations of <code>POIFSReader</code> are free to change
+ the algorithm for walking the file system's directory structure.
+ </p>
+ <p>It is also permitted to register more than one <code>POIFSReaderListener</code> for the same
+ document. There is no guarantee of ordering for notification of <code>POIFSReaderListener</code> instances
+ that have registered for the same document when <code>POIFSReader</code> processes that
+ document.
+ </p>
+ <p>It is guaranteed that all notifications occur in the same thread. A future enhancement may be
+ made to provide multi-threaded notifications, but such an enhancement would very probably be
+ made in a new reader class, a <code>ThreadedPOIFSReader</code> perhaps.
+ </p>
+ <p>The following describes the three ways to register a <code>POIFSReaderListener</code> for a
+ document or set of documents:
+ </p>
+ <dl>
+ <dt>registerListener(POIFSReaderListener <em>listener</em>)</dt>
+ <dd>Registers <em>listener</em> for all documents.</dd>
+ <dt>registerListener(POIFSReaderListener <em>listener</em>, String <em>name</em>)</dt>
+ <dd>Registers <em>listener</em> for a document with the specified <em>name</em> in the root
+ directory.
+ </dd>
+ <dt>registerListener(POIFSReaderListener <em>listener</em>, POIFSDocumentPath <em>path</em>,
+ String <em>name</em>)
+ </dt>
+ <dd>Registers <em>listener</em> for a document with the specified <em>name</em> in the directory
+ described by <em>path</em>.
+ </dd>
+ </dl>
+ </section>
+ <section>
+ <title>POIFSDocumentPath</title>
+ <p>The <code>org.apache.poi.poifs.filesystem.POIFSDocumentPath</code> class is used to describe a
+ directory in a POIFS file system. Since there are no reserved characters in the name of a file
+ in a POIFS file system, a more traditional string-based solution for describing a directory,
+ with special characters delimiting the components of the directory name, is not feasible. The
+ constructors for the class are used as follows:
+ </p>
+ <table>
+ <tr>
+ <td>
+ <em>Constructor example</em>
+ </td>
+ <td>
+ <em>Directory described</em>
+ </td>
+ </tr>
+ <tr>
+ <td>new POIFSDocumentPath()</td>
+ <td>The root directory.</td>
+ </tr>
+ <tr>
+ <td>new POIFSDocumentPath(null)</td>
+ <td>The root directory.</td>
+ </tr>
+ <tr>
+ <td>new POIFSDocumentPath(new String[ 0 ])</td>
+ <td>The root directory.</td>
+ </tr>
+ <tr>
+ <td>new POIFSDocumentPath(new String[] { "foo", "bar" })</td>
+ <td>In Unix terminology, "/foo/bar".</td>
+ </tr>
+ <tr>
+ <td>new POIFSDocumentPath(new POIFSDocumentPath(new String[] { "foo" }), new String[] { "fu", "bar" })</td>
+ <td>In Unix terminology, "/foo/fu/bar".</td>
+ </tr>
+ </table>
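+ <p>To see why a delimiter-based string would not work, consider that a single directory name may itself
+ contain "/". The hypothetical sketch below (plain Java, not POI code) joins path components Unix-style
+ and shows that two different directory paths can collapse to the same string, which an array of
+ components, as taken by the <code>POIFSDocumentPath</code> constructors, avoids.
+ </p>
+ <source><![CDATA[
+ import java.util.Arrays;
+
+ // Hypothetical illustration (not POI code): joining components with a
+ // delimiter loses information when names may contain that delimiter.
+ final class PathAmbiguitySketch {
+     static String join(String[] components) {
+         return "/" + String.join("/", components);
+     }
+
+     // True if two different component arrays produce the same joined string.
+     static boolean ambiguous(String[] a, String[] b) {
+         return !Arrays.equals(a, b) && join(a).equals(join(b));
+     }
+ }
+ ]]></source>
+ <p>Here <code>{ "foo", "bar" }</code> (two nested directories) and <code>{ "foo/bar" }</code> (one
+ directory whose name contains a slash) both join to "/foo/bar".
+ </p>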
+ </section>
+ <section>
+ <title>Processing POIFSReaderEvent Events</title>
+ <p>Processing <code>org.apache.poi.poifs.eventfilesystem.POIFSReaderEvent</code> events is
+ relatively easy. After all of the <code>POIFSReaderListener</code> instances have been
+ registered with <code>POIFSReader</code>, the <code>POIFSReader.read(InputStream stream)</code> method
+ is called.
+ </p>
+ <p>Assuming that there are no problems with the data, as the <code>POIFSReader</code> processes the
+ documents in the specified <code>InputStream</code>'s data, it calls the
+ <code>processPOIFSReaderEvent</code> method of each <code>POIFSReaderListener</code> registered for a
+ document, passing a <code>POIFSReaderEvent</code> instance.
+ </p>
+ <p>The <code>POIFSReaderEvent</code> instance contains information to identify the document (a
+ <code>POIFSDocumentPath</code> object to identify the directory that the document is in, and the
+ document name), and an open <code>DocumentInputStream</code> instance from which to read the document.
+ </p>
+ </section>
+ </section>
+ </section>
+
+ <section>
+ <title>Writing a File System</title>
+ <anchor id="writing"/>
+ <p>Writing a file system is very much like reading a file system in that there are multiple ways to do so.
+ You can load an existing file system into memory and modify it (removing files, renaming files) and/or
+ add new files to it, and write it, or you can start with a new, empty file system:
+ </p>
+ <source>
+ POIFSFileSystem fs = new POIFSFileSystem();
+ </source>
+ <section>
+ <title>The Naming of Names</title>
+ <p>There are two restrictions on the names of files in a file system that must be considered when
+ creating files:
+ </p>
+ <ol>
+ <li>The name of the file must not exceed 31 characters. If it does, the POIFS API will silently
+ truncate the name to fit.
+ </li>
+ <li>The name of the file must be unique within its containing directory. This seems pretty obvious,
+ but if it isn't spelled out, there'll be hell to pay, to be sure. Uniqueness, of course, is
+ determined <em>after</em> the name has been truncated, if the original name was too long to
+ begin with.
+ </li>
+ </ol>
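+ <p>Because truncation happens silently, it can be worth checking names before creating files. The
+ following minimal sketch is hypothetical (not a POI method): it applies the 31-character rule and
+ reports whether a set of names is still unique afterwards. It assumes names are compared exactly, as
+ the rules above do not say whether the comparison is case-sensitive.
+ </p>
+ <source><![CDATA[
+ import java.util.HashSet;
+ import java.util.List;
+ import java.util.Set;
+
+ // Hypothetical helper (not a POI method): mirrors the silent truncation rule.
+ final class NameCheckSketch {
+     static final int MAX_NAME_LENGTH = 31;
+
+     static String truncate(String name) {
+         return name.length() > MAX_NAME_LENGTH ? name.substring(0, MAX_NAME_LENGTH) : name;
+     }
+
+     // True if every name is still unique after truncation (exact comparison).
+     static boolean uniqueAfterTruncation(List<String> names) {
+         Set<String> seen = new HashSet<>();
+         for (String name : names) {
+             if (!seen.add(truncate(name))) {
+                 return false;
+             }
+         }
+         return true;
+     }
+ }
+ ]]></source>
+ <p>Two 32-character names that differ only in their last character pass a naive uniqueness check, but
+ fail this one, because both truncate to the same 31-character name.
+ </p>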
+ </section>
+ <section>
+ <title>Creating a Document</title>
+ <p>A document can be created by acquiring a <code>DirectoryEntry</code> and calling one of the two
+ <code>createDocument</code> methods:
+ </p>
+
+ <dl>
+ <dt>createDocument(String name, InputStream stream)</dt>
+ <dd>
+ <ul>
+ <li class="pro">Simple API</li>
+ <li class="con">Increased memory footprint (document is in memory until file system is
+ written).
+ </li>
+ </ul>
+ </dd>
+ <dt>createDocument(String name, int size, POIFSWriterListener writer)</dt>
+ <dd>
+ <ul>
+ <li class="pro">Decreased memory footprint (only very small documents are held in memory,
+ and then only for a short time).
+ </li>
+ <li class="con">More complex API.</li>
+ <li class="con">Determining document size in advance may be difficult.</li>
+ <li class="con">No control over when the document is written.</li>
+ </ul>
+ </dd>
+ </dl>
+
+ <p>Unlike reading, you don't have to choose between the in-memory and event-driven writing models; both
+ can co-exist in the same file system.
+ </p>
+ <p>Writing is initiated when the <code>POIFSFileSystem</code> instance's <code>writeFilesystem()</code> method
+ is called with an <code>OutputStream</code> to write to.
+ </p>
+ <p>The event-driven model is quite similar to the event-driven model for reading, in that the file
+ system calls your <code>org.apache.poi.poifs.filesystem.POIFSWriterListener</code> when it's time to
+ write your document, just as the <code>POIFSReader</code> calls your <code>POIFSReaderListener</code>
+ when it's time to read your document. Internally, when <code>writeFilesystem()</code> is called, the
+ final POIFS data structures are created and are written to the specified <code>OutputStream</code>.
+ When the file system needs to write a document out that was created with the event-driven model, it
+ calls the <code>POIFSWriterListener</code> back, calling its <code>processPOIFSWriterEvent()</code>
+ method, passing an <code>org.apache.poi.poifs.filesystem.POIFSWriterEvent</code> instance. This object
+ contains the <code>POIFSDocumentPath</code> and name of the document, its size, and an open
+ <code>org.apache.poi.poifs.filesystem.DocumentOutputStream</code> to which to write. A
+ <code>DocumentOutputStream</code> is a wrapper over the <code>OutputStream</code> that was provided to
+ the <code>POIFSFileSystem</code> to write to, and has the responsibility of making sure that the
+ document your application writes fits within the size you specified for it.
+ </p>
+ <p>If you are using a <code>POIFSFileSystem</code> loaded from a <code>File</code> with
+ <code>readOnly</code> set to false, it is also possible to do an in-place write. Simply call
+ <code>writeFilesystem()</code> to have the (limited) in-memory structures synced with the disk, then
+ <code>close()</code> to finish.
+ </p>
+ </section>
+ <section>
+ <title>Creating a Directory</title>
+ <p>Creating a directory is similar to creating a document, except that there's only one way to do so:
+ </p>
+ <source>
+ DirectoryEntry createdDir = existingDir.createDirectory(name);
+ </source>
+ </section>
+ <section>
+ <title>Using POIFSFileSystem Directly To Create a Document Or Directory</title>
+ <p>As with reading documents, it is possible to create a new document or directory in the root directory
+ by using convenience methods of POIFSFileSystem.
+ </p>
+ <table>
+ <tr>
+ <td>
+ <em>DirectoryEntry Method Signature</em>
+ </td>
+ <td>
+ <em>POIFSFileSystem Method Signature</em>
+ </td>
+ </tr>
+ <tr>
+ <td>createDocument(String name, InputStream stream)</td>
+ <td>createDocument(InputStream stream, String name)</td>
+ </tr>
+ <tr>
+ <td>createDocument(String name, int size, POIFSWriterListener writer)</td>
+ <td>createDocument(String name, int size, POIFSWriterListener writer)</td>
+ </tr>
+ <tr>
+ <td>createDirectory(String name)</td>
+ <td>createDirectory(String name)</td>
+ </tr>
+ </table>
+ </section>
+ </section>
+
+ <section>
+ <title>Modifying a File System</title>
+ <anchor id="modifying"/>
+ <p>It is possible to modify an existing POIFS file system, whether it's one your application has loaded into
+ memory, or one which you are creating on the fly.
+ </p>
+ <section>
+ <title>Removing a Document</title>
+ <p>Removing a document is simple: you get the <code>Entry</code> corresponding to the document and call
+ its <code>delete()</code> method. This is a boolean method, but should always return <code>
+ true</code>, indicating that the operation succeeded.
+ </p>
+ </section>
+ <section>
+ <title>Removing a Directory</title>
+ <p>Removing a directory is also simple: you get the <code>Entry</code> corresponding to the directory
+ and call its <code>delete()</code> method. This is a boolean method, but, unlike deleting a
+ document, it may return <code>false</code>, indicating that the operation failed. Here are
+ the reasons why the operation may fail:
+ </p>
+ <ul>
+ <li>The directory still has files in it (to check, call <code>isEmpty()</code> on its
+ DirectoryEntry; a return value of <code>false</code> means the directory is not empty).
+ </li>
+ <li>The directory is the root directory. You cannot remove the root directory.</li>
+ </ul>
+ </section>
+ <section>
+ <title>Changing a File's contents</title>
+ <p>There are two ways to change the contents of an existing file within a POIFS file system:
+ one is using a <code>DocumentOutputStream</code>, the other is with
+ <code>POIFSDocument.replaceContents</code>.
+ </p>
+ <p>If you have an <code>InputStream</code> from which to read the new file contents, then the
+ easiest way is via
+ <code>POIFSDocument.replaceContents</code>. You would do something like:
+ </p>
+ <source><![CDATA[
+ // Get the input stream from somewhere
+ InputStream inp = db.getContentStream();
+
+ // Open the POIFS File System, and get the entry of interest
+
+ POIFSFileSystem fs = new POIFSFileSystem(new File(filename), false);
+ DirectoryEntry root = fs.getRoot();
+ DocumentEntry myDocE = (DocumentEntry)root.getEntry("ToChange");
+
+ // Replace the contents (POIFSDocument is constructed from a DocumentNode)
+ POIFSDocument myDoc = new POIFSDocument((DocumentNode)myDocE);
+ myDoc.replaceContents(inp);
+
+ // Save the changes to the file in-place
+ fs.writeFilesystem();
+ fs.close();
+ ]]></source>
+ <p>Alternatively, if you either have a byte array, or you need to write as you go along, then the
+ OutputStream interface provided by
+ <code>DocumentOutputStream</code>
+ will likely be a better bet. Your code would want to look somewhat like:
+ </p>
+ <source><![CDATA[
+ // Open the POIFS File System, and get the entry of interest
+ try (POIFSFileSystem fs = new POIFSFileSystem(new File(filename))) {
+ DirectoryEntry root = fs.getRoot();
+ DocumentEntry myDoc = (DocumentEntry)root.getEntry("ToChange");
+
+ // Replace the content with a Write; newContent is the byte[] holding the new data
+ try (DocumentOutputStream docOut = new DocumentOutputStream(myDoc)) {
+ docOut.write(newContent);
+ }
+
+ // Save the changes to a new file
+ try (FileOutputStream out = new FileOutputStream("NewFile.ole2")) {
+ fs.writeFilesystem(out);
+ }
+ }
+ ]]></source>
+ <p>For an example of an in-place change to one stream within a file, you can see the example
+ <a href="https://github.com/apache/poi/tree/trunk/poi-examples/src/main/java/org/apache/poi/examples/hpsf/ModifyDocumentSummaryInformation.java">
+ org/apache/poi/examples/hpsf/ModifyDocumentSummaryInformation.java
+ </a>
+ </p>
+ </section>
+ <section>
+ <title>Renaming a File</title>
+ <p>Regardless of whether the file is a directory or a document, it can be renamed, with one exception:
+ the root directory has a special name that is expected by the components of a major software
+ vendor's office suite, and the POIFS API will not let that name be changed. Renaming is done by
+ acquiring the file's corresponding <code>Entry</code> instance and calling its <code>renameTo</code> method,
+ passing in the new name.
+ </p>
+ <p>Like <code>delete</code>, <code>renameTo</code> returns <code>true</code> if the operation succeeded,
+ otherwise <code>false</code>. Reasons for failure include these:
+ </p>
+ <ul>
+ <li>The new name is the same as the name of another file in the same directory. And don't forget: if
+ the new name is longer than 31 characters, it <em>will</em> be silently truncated. At its original
+ length the new name may have been unique, but once truncated to 31 characters it may no longer be
+ unique.
+ </li>
+ <li>You tried to rename the root directory.</li>
+ </ul>
+ </section>
+ </section>
+ </body>
+</document>
diff --git a/src/documentation/content/xdocs/components/poifs/index.xml b/src/documentation/content/xdocs/components/poifs/index.xml
new file mode 100644
index 0000000000..986419d6f9
--- /dev/null
+++ b/src/documentation/content/xdocs/components/poifs/index.xml
@@ -0,0 +1,58 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<!--
+ ====================================================================
+ Licensed to the Apache Software Foundation (ASF) under one or more
+ contributor license agreements. See the NOTICE file distributed with
+ this work for additional information regarding copyright ownership.
+ The ASF licenses this file to You under the Apache License, Version 2.0
+ (the "License"); you may not use this file except in compliance with
+ the License. You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+ Unless required by applicable law or agreed to in writing, software
+ distributed under the License is distributed on an "AS IS" BASIS,
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ See the License for the specific language governing permissions and
+ limitations under the License.
+ ====================================================================
+-->
+<!DOCTYPE document PUBLIC "-//APACHE//DTD Documentation V2.0//EN" "document-v20.dtd">
+<document>
+ <header>
+ <title>Apache POI™ - POIFS - Java implementation of the OLE 2 Compound Document format</title>
+ <subtitle>Overview</subtitle>
+ <authors>
+ <person name="Andrew C. Oliver" email="acoliver@apache.org"/>
+ <person name="Nicola Ken Barozzi" email="barozzi@nicolaken.com"/>
+ </authors>
+ </header>
+ <body>
+ <section><title>Overview</title>
+ <p>POIFS is a pure Java implementation of the OLE 2 Compound
+ Document format.</p>
+ <p>By definition, all of the POI project's APIs for the OLE 2 based
+ binary formats (such as HSSF for XLS and HWPF for DOC) are built on the POIFS API.</p>
+ <p>A common source of confusion is what exactly POIFS buys you, and
+ what the OLE 2 Compound Document format actually is. POIFS alone does
+ not give you DOC or XLS support, but it is necessary to read or write
+ DOC or XLS files. All file formats based on the OLE 2 Compound
+ Document format share a common structure, because that format is
+ essentially a convoluted archive format. Think of POIFS as a "zip"
+ library: once you can get the data out of a zip file, you still need
+ to interpret it. As a general rule, while all of our formats
+ <em>use</em> POIFS, most of them attempt to shield you from it. There
+ are some circumstances where this is not possible, but as a
+ general rule this is true.</p>
+ <p>If you're an end user just looking to generate XLS
+ files, then you'd be looking for HSSF, not POIFS; however, if
+ you have legacy code that uses MFC property sets, POIFS is
+ for you! Regardless, you may or may not need to know how to
+ use POIFS, but ultimately if you use technologies that come
+ from the POI project, you're using POIFS underneath. Perhaps
+ we should have a branding campaign "POIFS Inside!". ;-)</p>
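+ <p>The archive analogy can be made concrete. A ZIP archive is recognised by its leading
+ <code>PK</code> bytes, and in the same way an OLE 2 Compound Document is recognised by an eight-byte
+ signature at the start of the file. The sketch below is an illustrative helper with hypothetical
+ names; it is not POI API. It classifies a header as OLE 2, ZIP or unknown. Knowing the container
+ type still tells you nothing about whether the contents are a spreadsheet or a document.</p>
+ <source><![CDATA[
+ import java.util.Arrays;
+
+ /** Hypothetical helper: tell an OLE 2 container from a ZIP container by its first bytes. */
+ public class ContainerSniffer {
+     // OLE 2 signature as stored on disk (the little-endian long 0xE11AB1A1E011CFD0)
+     static final byte[] OLE2 = {(byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
+                                 (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1};
+     // local file header signature of a ZIP archive: "PK" 0x03 0x04
+     static final byte[] ZIP = {0x50, 0x4B, 0x03, 0x04};
+
+     static boolean startsWith(byte[] data, byte[] magic) {
+         return data.length >= magic.length
+                 && Arrays.equals(Arrays.copyOf(data, magic.length), magic);
+     }
+
+     /** Returns "OLE2", "ZIP" or "UNKNOWN" for the given leading bytes. */
+     static String containerType(byte[] header) {
+         if (startsWith(header, OLE2)) return "OLE2";
+         if (startsWith(header, ZIP)) return "ZIP";
+         return "UNKNOWN";
+     }
+ }
+ ]]></source>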
+
+ </section>
+ </body>
+</document>
diff --git a/src/documentation/content/xdocs/components/poifs/usecases.xml b/src/documentation/content/xdocs/components/poifs/usecases.xml
new file mode 100644
index 0000000000..597f77ef4b
--- /dev/null
+++ b/src/documentation/content/xdocs/components/poifs/usecases.xml
@@ -0,0 +1,653 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<!--
+ ====================================================================
+ Licensed to the Apache Software Foundation (ASF) under one or more
+ contributor license agreements. See the NOTICE file distributed with
+ this work for additional information regarding copyright ownership.
+ The ASF licenses this file to You under the Apache License, Version 2.0
+ (the "License"); you may not use this file except in compliance with
+ the License. You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+ Unless required by applicable law or agreed to in writing, software
+ distributed under the License is distributed on an "AS IS" BASIS,
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ See the License for the specific language governing permissions and
+ limitations under the License.
+ ====================================================================
+-->
+<!DOCTYPE document PUBLIC "-//APACHE//DTD Documentation V2.0//EN" "document-v20.dtd">
+<document>
+ <header>
+ <title>POIFS Use Cases</title>
+ <authors>
+ <person email="mjohnson@apache.org" name="Marc Johnson" id="MJ"/>
+ </authors>
+ </header>
+ <body>
+ <section><title>POIFS Use Cases</title>
+ <section><title>Use Case 1: Read existing file system</title>
+ <table>
+ <tr>
+ <td><em>Primary Actor:</em></td>
+ <td>POIFS client</td>
+ </tr>
+ <tr>
+ <td><em>Scope:</em></td>
+ <td>POIFS</td>
+ </tr>
+ <tr>
+ <td><em>Level:</em></td>
+ <td>Summary</td>
+ </tr>
+ <tr>
+ <td><em>Stakeholders and Interests:</em></td>
+ <td>
+ POIFS client - wants to read the content of a file
+ system<br/>
+ POIFS - understands POIFS file system
+ </td>
+ </tr>
+ <tr>
+ <td><em>Precondition:</em></td>
+ <td>None</td>
+ </tr>
+ <tr>
+ <td><em>Minimal Guarantee:</em></td>
+ <td>None</td>
+ </tr>
+ <tr>
+ <td><em>Main Success Guarantee:</em></td>
+ <td>
+ 1. POIFS client requests POIFS to read a POIFS file
+ system, providing an
+ <code>InputStream</code>
+ containing the POIFS file system in question.<br/>
+ 2. POIFS reads from the
+ <code>InputStream</code> in
+ 512-byte blocks.<br/>
+ 3. POIFS verifies that the first block begins with
+ the well-known signature
+ (<code>0xE11AB1A1E011CFD0</code>).<br/>
+ 4. POIFS reads the Block Allocation Table from the
+ first block and, if necessary, from the XBAT
+ blocks.<br/>
+ 5. POIFS obtains the start block of the Property
+ Table and reads the Property Table (use case 9,
+ read file).<br/>
+ 6. POIFS reads the individual entries in the Property
+ Table.<br/>
+ 7. POIFS obtains the start block of the Small Block
+ Allocation Table and reads the Small Block
+ Allocation Table (use case 9, read file).<br/>
+ 8. POIFS obtains the start block of the Small Block
+ store from the first entry in the Property Table
+ and reads the Small Block Array (use case 9, read
+ file).<br/>
+ </td>
+ </tr>
+ <tr>
+ <td><em>Extensions:</em></td>
+ <td>
+ 2a. If the last block read is not a full 512-byte
+ block, the
+ <code>InputStream</code> is not that of
+ a POIFS file system, and POIFS throws an
+ appropriate exception.
+ <br/>
+ 3a. If the signature is incorrect, the
+ <code>InputStream</code> is not that of a POIFS
+ file system, and POIFS throws an appropriate
+ exception.<br/>
+ </td>
+ </tr>
+ </table>
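+ <p>Steps 2 and 3, together with extensions 2a and 3a, can be sketched using only the standard
+ library. This is an illustrative model, not POI's reader. The class <code>HeaderCheck</code> is
+ hypothetical, and the sketch assumes Java 11 or later for <code>InputStream.readNBytes</code>. The
+ signature is stored little-endian, so the file actually begins with the bytes
+ D0 CF 11 E0 A1 B1 1A E1.</p>
+ <source><![CDATA[
+ import java.io.IOException;
+ import java.io.InputStream;
+ import java.nio.ByteBuffer;
+ import java.nio.ByteOrder;
+
+ /** Hypothetical sketch of use case 1, steps 2-3 and extensions 2a/3a. */
+ public class HeaderCheck {
+     static final int BIG_BLOCK_SIZE = 512;
+     static final long SIGNATURE = 0xE11AB1A1E011CFD0L;
+
+     /** Reads the first 512-byte block and verifies the OLE 2 signature. */
+     static byte[] readHeaderBlock(InputStream in) throws IOException {
+         byte[] block = in.readNBytes(BIG_BLOCK_SIZE);
+         if (block.length != BIG_BLOCK_SIZE) {
+             // extension 2a: the stream ended before a full block was read
+             throw new IOException("Short block: " + block.length + " bytes");
+         }
+         long signature = ByteBuffer.wrap(block).order(ByteOrder.LITTLE_ENDIAN).getLong(0);
+         if (signature != SIGNATURE) {
+             // extension 3a: not an OLE 2 file system
+             throw new IOException("Not an OLE 2 file system: bad signature");
+         }
+         return block;
+     }
+ }
+ ]]></source>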
+ </section>
+ <section><title>Use Case 2: Write file system</title>
+ <table>
+ <tr>
+ <th>Primary Actor:</th>
+ <td>POIFS client</td>
+ </tr>
+ <tr>
+ <th>Scope:</th>
+ <td>POIFS</td>
+ </tr>
+ <tr>
+ <th>Level:</th>
+ <td>Summary</td>
+ </tr>
+ <tr>
+ <th>Stakeholders and Interests:</th>
+ <td>
+ POIFS client - wants to write the file system out.<br/>
+ POIFS - knows how to write file system out.
+ </td>
+ </tr>
+ <tr>
+ <th>Precondition:</th>
+ <td>
+ File system has been read (use case 1, read
+ existing file system) and subsequently modified
+ (use case 4, replace file in file system; use case
+ 5, delete file from file system; or use case 6,
+ write new file to file system; in any
+ combination)
+ <br/>or<br/>
+ File system has been created (use case 3, create
+ new file system)
+ </td>
+ </tr>
+ <tr>
+ <th>Minimal Guarantee:</th>
+ <td>None</td>
+ </tr>
+ <tr>
+ <th>Main Success Guarantee:</th>
+ <td>
+ 1. POIFS client provides an
+ <code>OutputStream</code>
+ to write the file system to.
+ <br/>
+ 2. POIFS gets the sizes of the Property Table and
+ each file in the file system.<br/>
+ 3. If any files in the file system require storage
+ in a Small Block Array, POIFS creates a Small
+ Block Array of sufficient size to hold all of the
+ small files.<br/>
+ 4. POIFS calculates the number of big blocks needed
+ to hold all of the large files, the Property
+ Table, and, if necessary, the Small Block Array
+ and the Small Block Allocation Table.<br/>
+ 5. POIFS creates a set of big blocks sufficient to
+ store the Block Allocation Table.<br/>
+ 6. POIFS creates and writes the header block.<br/>
+ 7. POIFS writes out the XBAT blocks, if needed.<br/>
+ 8. POIFS writes out the Small Block Array, if
+ needed.<br/>
+ 9. POIFS writes out the Small Block Allocation Table,
+ if needed.<br/>
+ 10. POIFS writes out the Property Table.<br/>
+ 11. POIFS writes out the large files, if needed.<br/>
+ 12. POIFS closes the <code>OutputStream</code>.
+ </td>
+ </tr>
+ <tr>
+ <th>Extensions:</th>
+ <td>
+ 6a. Exceptions writing to the
+ <code>OutputStream</code> will be propagated back
+ to the POIFS client.
+ <br/>
+ 7a. Exceptions writing to the
+ <code>OutputStream</code> will be propagated back
+ to the POIFS client.
+ <br/>
+ 8a. Exceptions writing to the
+ <code>OutputStream</code> will be propagated back
+ to the POIFS client.
+ <br/>
+ 9a. Exceptions writing to the
+ <code>OutputStream</code> will be propagated back
+ to the POIFS client.
+ <br/>
+ 10a. Exceptions writing to the
+ <code>OutputStream</code> will be propagated back
+ to the POIFS client.
+ <br/>
+ 11a. Exceptions writing to the
+ <code>OutputStream</code> will be propagated back
+ to the POIFS client.
+ <br/>
+ 12a. Exceptions closing the
+ <code>OutputStream</code> will be propagated back
+ to the POIFS client.
+ <br/>
+ </td>
+ </tr>
+ </table>
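+ <p>Steps 3 and 4 amount to simple arithmetic. The sketch below is a hypothetical helper that
+ assumes the standard OLE 2 parameters: 512-byte big blocks, 64-byte small blocks, files smaller
+ than 4096 bytes stored in small blocks, 128-byte Property Table entries and 4-byte Small Block
+ Allocation Table entries. It also assumes a flat file system with no subdirectories, and it leaves
+ out the header block and the Block Allocation Table blocks of step 5.</p>
+ <source><![CDATA[
+ /** Hypothetical sketch of the block counting in use case 2, steps 3-4. */
+ public class BlockMath {
+     static final int BIG_BLOCK = 512;
+     static final int SMALL_BLOCK = 64;
+     static final int SMALL_FILE_THRESHOLD = 4096; // files below this size use small blocks
+     static final int PROPERTY_SIZE = 128;         // bytes per Property Table entry
+     static final int SBAT_ENTRY_SIZE = 4;         // bytes per Small Block Allocation Table entry
+
+     static int ceilDiv(long value, int divisor) {
+         return (int) ((value + divisor - 1) / divisor);
+     }
+
+     /** Big blocks needed for the large files, Property Table, Small Block Array and SBAT. */
+     static int bigBlocksNeeded(long[] fileSizes) {
+         int bigBlocks = 0;
+         int smallBlocks = 0;
+         for (long size : fileSizes) {
+             if (size < SMALL_FILE_THRESHOLD) {
+                 smallBlocks += ceilDiv(size, SMALL_BLOCK);
+             } else {
+                 bigBlocks += ceilDiv(size, BIG_BLOCK);
+             }
+         }
+         int properties = fileSizes.length + 1; // one entry per file plus the root entry
+         bigBlocks += ceilDiv((long) properties * PROPERTY_SIZE, BIG_BLOCK);
+         if (smallBlocks > 0) {
+             bigBlocks += ceilDiv((long) smallBlocks * SMALL_BLOCK, BIG_BLOCK);     // Small Block Array
+             bigBlocks += ceilDiv((long) smallBlocks * SBAT_ENTRY_SIZE, BIG_BLOCK); // SBAT
+         }
+         return bigBlocks;
+     }
+ }
+ ]]></source>
+ <p>For example, a 100-byte file and a 5,000-byte file need 10 big blocks for the large file, plus one
+ block each for the Property Table, the Small Block Array and the Small Block Allocation Table, for
+ 13 in total.</p>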
+ </section>
+ <section><title>Use Case 3: Create new file system</title>
+ <table>
+ <tr>
+ <th>Primary Actor:</th>
+ <td>POIFS client</td>
+ </tr>
+ <tr>
+ <th>Scope:</th>
+ <td>POIFS</td>
+ </tr>
+ <tr>
+ <th>Level:</th>
+ <td>Summary</td>
+ </tr>
+ <tr>
+ <th>Stakeholders and Interests:</th>
+ <td>
+ POIFS client - wants to create a new file
+ system<br/>
+ POIFS - knows how to create a new file system
+ </td>
+ </tr>
+ <tr>
+ <th>Precondition:</th>
+ <td>None</td>
+ </tr>
+ <tr>
+ <th>Minimal Guarantee:</th>
+ <td>None</td>
+ </tr>
+ <tr>
+ <th>Main Success Guarantee:</th>
+ <td>
+ POIFS creates an empty Property Table.
+ </td>
+ </tr>
+ <tr>
+ <th>Extensions:</th>
+ <td>None</td>
+ </tr>
+ </table>
+ </section>
+ <section><title>Use Case 4: Replace file in file system</title>
+ <table>
+ <tr>
+ <td><em>Primary Actor:</em></td>
+ <td>POIFS client</td>
+ </tr>
+ <tr>
+ <td><em>Scope:</em></td>
+ <td>POIFS</td>
+ </tr>
+ <tr>
+ <td><em>Level:</em></td>
+ <td>Summary</td>
+ </tr>
+ <tr>
+ <td><em>Stakeholders and Interests:</em></td>
+ <td>
+ * POIFS client - wants to replace an existing file in
+ the file system<br/>
+ * POIFS - knows how to manage the file system
+ </td>
+ </tr>
+ <tr>
+ <td><em>Precondition:</em></td>
+ <td>
+ Either
+ <br/><br/>
+ The file system has been read (use case 1, read
+ existing file system) and a file has been
+ extracted from the file system (use case 7, read
+ existing file from file system)
+ <br/><br/>or<br/><br/>
+ The file system has been created (use case 3,
+ create new file system) and a file has been
+ written to the file system (use case 6, write new
+ file to file system)
+ </td>
+ </tr>
+ <tr>
+ <td><em>Minimal Guarantee:</em></td>
+ <td>None</td>
+ </tr>
+ <tr>
+ <td><em>Main Success Guarantee:</em></td>
+ <td>
+ 1. POIFS discards storage of the existing file.<br/>
+ 2. POIFS updates the existing file's entry in the
+ Property Table<br/>
+ 3. POIFS stores the new file's data
+ </td>
+ </tr>
+ <tr>
+ <td><em>Extensions:</em></td>
+ <td>
+ 1a. POIFS throws an exception if the file does not
+ exist.
+ </td>
+ </tr>
+ </table>
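+ <p>The three steps of the main success scenario, together with extension 1a, can be illustrated
+ with an in-memory model. The class <code>ReplaceModel</code> and its maps are hypothetical stand-ins
+ for the Property Table and block storage. POIFS itself works with blocks, not maps.</p>
+ <source><![CDATA[
+ import java.io.FileNotFoundException;
+ import java.util.HashMap;
+ import java.util.Map;
+
+ /** Hypothetical in-memory model of use case 4 (not the POIFS API). */
+ public class ReplaceModel {
+     /** A Property Table entry: the file name and its recorded size. */
+     static final class Entry {
+         final String name;
+         int size;
+
+         Entry(String name, int size) {
+             this.name = name;
+             this.size = size;
+         }
+     }
+
+     final Map<String, Entry> propertyTable = new HashMap<>();
+     final Map<String, byte[]> storage = new HashMap<>();
+
+     void replace(String name, byte[] newData) throws FileNotFoundException {
+         Entry entry = propertyTable.get(name);
+         if (entry == null) {
+             throw new FileNotFoundException(name); // extension 1a
+         }
+         storage.remove(name);               // step 1: discard the existing storage
+         entry.size = newData.length;        // step 2: update the Property Table entry
+         storage.put(name, newData.clone()); // step 3: store the new data
+     }
+ }
+ ]]></source>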
+ </section>
+ <section><title>Use Case 5: Delete file from file system</title>
+ <table>
+ <tr>
+ <td><em>Primary Actor:</em></td>
+ <td>POIFS client</td>
+ </tr>
+ <tr>
+ <td><em>Scope:</em></td>
+ <td>POIFS</td>
+ </tr>
+ <tr>
+ <td><em>Level:</em></td>
+ <td>Summary</td>
+ </tr>
+ <tr>
+ <td><em>Stakeholders and Interests:</em></td>
+ <td>
+ * POIFS client - wants to remove a file from a file
+ system<br/>
+ * POIFS - knows how to manage the file system
+ </td>
+ </tr>
+ <tr>
+ <td><em>Precondition:</em></td>
+ <td>
+ Either<br/><br/>
+ The file system has been read (use case 1, read
+ existing file system) and a file has been
+ extracted from the file system (use case 7, read
+ existing file from file system)<br/>
+ <br/>
+ or<br/>
+ <br/>
+ The file system has been created (use case 3,
+ create new file system) and a file has been
+ written to the file system (use case 6, write new
+ file to file system)
+ </td>
+ </tr>
+ <tr>
+ <td><em>Minimal Guarantee:</em></td>
+ <td>None</td>
+ </tr>
+ <tr>
+ <td><em>Main Success Guarantee:</em></td>
+ <td>
+ 1. POIFS discards the specified file's storage.<br/>
+ 2. POIFS discards the file's Property Table
+ entry.
+ </td>
+ </tr>
+ <tr>
+ <td><em>Extensions:</em></td>
+ <td>
+ 1a. POIFS throws an exception if the file does not
+ exist.
+ </td>
+ </tr>
+ </table>
+ </section>
+ <section><title>Use Case 6: Write new file to file system</title>
+ <table>
+ <tr>
+ <td><em>Primary Actor:</em></td>
+ <td>POIFS client</td>
+ </tr>
+ <tr>
+ <td><em>Scope:</em></td>
+ <td>POIFS</td>
+ </tr>
+ <tr>
+ <td><em>Level:</em></td>
+ <td>Summary</td>
+ </tr>
+ <tr>
+ <td><em>Stakeholders and Interests:</em></td>
+ <td>
+ * POIFS client - wants to add a new file to the file
+ system<br/>
+ * POIFS - knows how to manage the file system
+ </td>
+ </tr>
+ <tr>
+ <td><em>Precondition:</em></td>
+ <td>The specified file does not yet exist in the file
+ system</td>
+ </tr>
+ <tr>
+ <td><em>Minimal Guarantee:</em></td>
+ <td>None</td>
+ </tr>
+ <tr>
+ <td><em>Main Success Guarantee:</em></td>
+ <td>
+ 1. The POIFS client provides a file name<br/>
+ 2. POIFS creates a new Property Table entry for the
+ new file<br/>
+ 3. POIFS provides the POIFS client with an
+ <code>OutputStream</code> to write to.<br/>
+ 4. The POIFS client writes data to the provided
+ <code>OutputStream</code>.<br/>
+ 5. The POIFS client closes the provided
+ <code>OutputStream</code><br/>
+ 6. POIFS updates the Property Table entry with the
+ new file's size
+ </td>
+ </tr>
+ <tr>
+ <td><em>Extensions:</em></td>
+ <td>
+ 1a. POIFS throws an exception if a file with the
+ specified name already exists in the file
+ system.<br/>
+ 1b. POIFS throws an exception if the file name is
+ too long. The limit on file name length is 31
+ characters.
+ </td>
+ </tr>
+ </table>
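+ <p>In this interaction, the Property Table entry is created first, and its size is filled in only
+ when the client closes the stream. That sequence can be sketched as follows.
+ <code>WriteNewFileModel</code> is a hypothetical in-memory model, not POIFS. A recorded size of -1
+ marks an entry whose stream has not been closed yet.</p>
+ <source><![CDATA[
+ import java.io.ByteArrayOutputStream;
+ import java.io.IOException;
+ import java.io.OutputStream;
+ import java.util.LinkedHashMap;
+ import java.util.Map;
+
+ /** Hypothetical in-memory model of use case 6 (not the POIFS API). */
+ public class WriteNewFileModel {
+     static final int MAX_NAME_LENGTH = 31;
+
+     /** Property Table: file name to recorded size (-1 until the stream is closed). */
+     final Map<String, Integer> propertyTable = new LinkedHashMap<>();
+     final Map<String, byte[]> storage = new LinkedHashMap<>();
+
+     OutputStream createFile(String name) throws IOException {
+         if (propertyTable.containsKey(name)) {
+             throw new IOException("File already exists: " + name);          // extension 1a
+         }
+         if (name.length() > MAX_NAME_LENGTH) {
+             throw new IOException("Name longer than 31 characters: " + name); // extension 1b
+         }
+         propertyTable.put(name, -1);                                          // step 2
+         return new ByteArrayOutputStream() {                                  // step 3
+             private boolean closed;
+
+             @Override
+             public void close() {                                             // step 5
+                 if (closed) {
+                     return;
+                 }
+                 closed = true;
+                 byte[] data = toByteArray();
+                 storage.put(name, data);
+                 propertyTable.put(name, data.length);                         // step 6
+             }
+         };
+     }
+ }
+ ]]></source>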
+ </section>
+ <section><title>Use Case 7: Read existing file from file system</title>
+ <table>
+ <tr>
+ <td><em>Primary Actor:</em></td>
+ <td>POIFS client</td>
+ </tr>
+ <tr>
+ <td><em>Scope:</em></td>
+ <td>POIFS</td>
+ </tr>
+ <tr>
+ <td><em>Level:</em></td>
+ <td>Summary</td>
+ </tr>
+ <tr>
+ <td><em>Stakeholders and Interests:</em></td>
+ <td>
+ * POIFS client - wants to read a file from the file
+ system<br/>
+ * POIFS - knows how to manage the file system
+ </td>
+ </tr>
+ <tr>
+ <td><em>Precondition:</em></td>
+ <td>
+ * The file system has been read (use case 1, read
+ existing file system) or has been created and
+ written to (use case 3, create new file system;
+ use case 6, write new file to file system).<br/>
+ * The specified file exists in the file system.
+ </td>
+ </tr>
+ <tr>
+ <td><em>Minimal Guarantee:</em></td>
+ <td>None</td>
+ </tr>
+ <tr>
+ <td><em>Main Success Guarantee:</em></td>
+ <td>
+ 1. The POIFS client provides the name of a file to be read.<br/>
+ 2. POIFS provides an <code>InputStream</code> to read from.<br/>
+ 3. The POIFS client reads from the <code>InputStream</code>.<br/>
+ 4. The POIFS client closes the <code>InputStream</code>.
+ </td>
+ </tr>
+ <tr>
+ <td><em>Extensions:</em></td>
+ <td>1a. POIFS throws an exception if no file with the
+ specified name exists.</td>
+ </tr>
+ </table>
+ </section>
+ <section><title>Use Case 8: Read file system directory</title>
+ <table>
+ <tr>
+ <td><em>Primary Actor:</em></td>
+ <td>POIFS client</td>
+ </tr>
+ <tr>
+ <td><em>Scope:</em></td>
+ <td>POIFS</td>
+ </tr>
+ <tr>
+ <td><em>Level:</em></td>
+ <td>Summary</td>
+ </tr>
+ <tr>
+ <td><em>Stakeholders and Interests:</em></td>
+ <td>
+ * POIFS client - wants to know what files exist in
+ the file system<br/>
+ * POIFS - knows how to manage the file system
+ </td>
+ </tr>
+ <tr>
+ <td><em>Precondition:</em></td>
+ <td>The file system has been read (use case 1, read
+ existing file system) or created (use case 3, create
+ new file system)</td>
+ </tr>
+ <tr>
+ <td><em>Minimal Guarantee:</em></td>
+ <td>None</td>
+ </tr>
+ <tr>
+ <td><em>Main Success Guarantee:</em></td>
+ <td>
+ 1. The POIFS client requests the file system
+ directory.<br/>
+ 2. POIFS returns an <code>Iterator</code>. The
+ <code>Iterator</code> will not include the root
+ entry in the Property Table, and may be an
+ <code>Iterator</code> over an empty
+ <code>Collection</code>.
+ </td>
+ </tr>
+ <tr>
+ <td><em>Extensions:</em></td>
+ <td>None</td>
+ </tr>
+ </table>
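+ <p>The directory request amounts to iterating over the Property Table while skipping the root
+ entry. The sketch below assumes, as step 8 of use case 1 implies, that the root entry is the first
+ entry in the table. <code>DirectoryListing</code> is hypothetical, and the entry names in the
+ example are only illustrative.</p>
+ <source><![CDATA[
+ import java.util.Iterator;
+ import java.util.List;
+
+ /** Hypothetical sketch of use case 8 (not the POIFS API). */
+ public class DirectoryListing {
+     /** Returns an iterator over the entry names, skipping the root entry at index 0. */
+     static Iterator<String> directory(List<String> propertyTable) {
+         if (propertyTable.isEmpty()) {
+             throw new IllegalArgumentException("Property Table has no root entry");
+         }
+         // may be an iterator over an empty collection if only the root entry exists
+         return propertyTable.subList(1, propertyTable.size()).iterator();
+     }
+ }
+ ]]></source>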
+ </section>
+ <section><title>Use Case 9: Read file</title>
+ <table>
+ <tr>
+ <td><em>Primary Actor:</em></td>
+ <td>POIFS</td>
+ </tr>
+ <tr>
+ <td><em>Scope:</em></td>
+ <td>POIFS</td>
+ </tr>
+ <tr>
+ <td><em>Level:</em></td>
+ <td>Summary</td>
+ </tr>
+ <tr>
+ <td><em>Stakeholders and Interests:</em></td>
+ <td>
+ POIFS - POIFS needs to read a file, or something
+ resembling a file (i.e., the Property Table, the
+ Small Block Array, or the Small Block Allocation
+ Table)
+ </td>
+ </tr>
+ <tr>
+ <td><em>Precondition:</em></td>
+ <td>None</td>
+ </tr>
+ <tr>
+ <td><em>Minimal Guarantee:</em></td>
+ <td>None</td>
+ </tr>
+ <tr>
+ <td><em>Main Success Guarantee:</em></td>
+ <td>
+ 1. POIFS begins with a start block, a file size, and
+ a flag indicating whether to use the Big Block
+ Allocation Table or the Small Block Allocation
+ Table<br/>
+ 2. POIFS returns an <code>InputStream</code>.<br/>
+ 3. Reads from the <code>InputStream</code> are
+ performed by walking the specified Block
+ Allocation Table and reading the blocks
+ indicated.<br/>
+ 4. POIFS closes the <code>InputStream</code> when
+ finished reading the file, or its client wants to
+ close the <code>InputStream</code>.
+ </td>
+ </tr>
+ <tr>
+ <td><em>Extensions:</em></td>
+ <td>3a. An exception will be thrown if the specified Block
+ Allocation Table is corrupt, as evidenced by an index
+ pointing to a non-existent block, or by a chain
+ extending past the known size of the file.</td>
+ </tr>
+ </table>
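+ <p>Step 3 and extension 3a describe a chain walk. Each allocation table entry holds the index of
+ the next block, and the walk ends at an end-of-chain marker. The sketch below is a hypothetical
+ model that works on in-memory arrays. It assumes the OLE 2 end-of-chain value 0xFFFFFFFE (-2 as a
+ signed int) and that every block has the same non-zero size. It treats two conditions as
+ corruption: a pointer to a non-existent block, and a chain that continues after the file's bytes
+ have all been read.</p>
+ <source><![CDATA[
+ import java.io.ByteArrayOutputStream;
+ import java.io.IOException;
+
+ /** Hypothetical sketch of use case 9: follow a block chain through an allocation table. */
+ public class ChainReader {
+     static final int END_OF_CHAIN = -2; // 0xFFFFFFFE in the on-disk format
+
+     /**
+      * @param blocks     all blocks of the store (big or small blocks), equally sized
+      * @param table      allocation table: table[i] is the block that follows block i
+      * @param startBlock first block of the file
+      * @param size       file size in bytes
+      */
+     static byte[] readFile(byte[][] blocks, int[] table, int startBlock, int size) throws IOException {
+         ByteArrayOutputStream out = new ByteArrayOutputStream(size);
+         int block = startBlock;
+         while (out.size() < size) {
+             if (block < 0 || block >= blocks.length || block >= table.length) {
+                 throw new IOException("Chain points to non-existent block " + block);
+             }
+             int remaining = size - out.size();
+             out.write(blocks[block], 0, Math.min(remaining, blocks[block].length));
+             block = table[block];
+         }
+         if (block != END_OF_CHAIN) {
+             throw new IOException("Chain extends past the known size of the file");
+         }
+         return out.toByteArray();
+     }
+ }
+ ]]></source>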
+ </section>
+ <section><title>Use Case 10: Rename existing file in the file system</title>
+ <table>
+ <tr>
+ <td><em>Primary Actor:</em></td>
+ <td>POIFS client</td>
+ </tr>
+ <tr>
+ <td><em>Scope:</em></td>
+ <td>POIFS</td>
+ </tr>
+ <tr>
+ <td><em>Level:</em></td>
+ <td>Summary</td>
+ </tr>
+ <tr>
+ <td><em>Stakeholders and Interests:</em></td>
+ <td>
+ * POIFS client - wants to rename an existing file in
+ the file system.<br/>
+ * POIFS - knows how to manage the file system.
+ </td>
+ </tr>
+ <tr>
+ <td><em>Precondition:</em></td>
+ <td>
+ * The file system has been read (use case 1, read
+ existing file system) or has been created and
+ written to (use case 3, create new file system;
+ use case 6, write new file to file system).<br/>
+ * The specified file exists in the file system.<br/>
+ * The new name for the file does not duplicate
+ another file in the file system.
+ </td>
+ </tr>
+ <tr>
+ <td><em>Minimal Guarantee:</em></td>
+ <td>None</td>
+ </tr>
+ <tr>
+ <td><em>Main Success Guarantee:</em></td>
+ <td>
+ 1. POIFS updates the Property Table entry for the
+ specified file with its new name.
+ </td>
+ </tr>
+ <tr>
+ <td><em>Extensions:</em></td>
+ <td>
+ 1a. If the old file name is not in the file
+ system, POIFS throws an exception.<br/>
+ 1b. If the new file name already exists in the
+ file system, POIFS throws an exception.<br/>
+ 1c. If the new file name is too long (the limit is
+ 31 characters), POIFS throws an exception.
+ </td>
+ </tr>
+ </table>
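+ <p>The single step of this use case and its three extensions map onto a small model.
+ <code>RenameModel</code> is hypothetical, and its Property Table is simply a map from name to file
+ size. It throws <code>IllegalArgumentException</code> wherever the use case says POIFS throws an
+ exception; the use case does not specify the actual exception type.</p>
+ <source><![CDATA[
+ import java.util.LinkedHashMap;
+ import java.util.Map;
+
+ /** Hypothetical in-memory model of use case 10 (not the POIFS API). */
+ public class RenameModel {
+     static final int MAX_NAME_LENGTH = 31;
+
+     /** Property Table: file name to file size. */
+     final Map<String, Integer> propertyTable = new LinkedHashMap<>();
+
+     void rename(String oldName, String newName) {
+         if (!propertyTable.containsKey(oldName)) {
+             throw new IllegalArgumentException("No such file: " + oldName);          // 1a
+         }
+         if (propertyTable.containsKey(newName)) {
+             throw new IllegalArgumentException("Name already in use: " + newName);   // 1b
+         }
+         if (newName.length() > MAX_NAME_LENGTH) {
+             throw new IllegalArgumentException("Name longer than 31 characters");    // 1c
+         }
+         // step 1: move the entry's data under its new name
+         propertyTable.put(newName, propertyTable.remove(oldName));
+     }
+ }
+ ]]></source>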
+ </section>
+ </section>
+ </body>
+</document>
diff --git a/src/documentation/content/xdocs/components/slideshow/how-to-shapes.xml b/src/documentation/content/xdocs/components/slideshow/how-to-shapes.xml
new file mode 100644
index 0000000000..f1183c357d
--- /dev/null
+++ b/src/documentation/content/xdocs/components/slideshow/how-to-shapes.xml
@@ -0,0 +1,642 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<!--
+ ====================================================================
+ Licensed to the Apache Software Foundation (ASF) under one or more
+ contributor license agreements. See the NOTICE file distributed with
+ this work for additional information regarding copyright ownership.
+ The ASF licenses this file to You under the Apache License, Version 2.0
+ (the "License"); you may not use this file except in compliance with
+ the License. You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+ Unless required by applicable law or agreed to in writing, software
+ distributed under the License is distributed on an "AS IS" BASIS,
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ See the License for the specific language governing permissions and
+ limitations under the License.
+ ====================================================================
+-->
+<!DOCTYPE document PUBLIC "-//APACHE//DTD Documentation V2.0//EN" "document-v20.dtd">
+
+<document>
+ <header>
+ <title>Busy Developers' Guide to HSLF drawing layer</title>
+ <authors>
+ <person email="yegor@dinom.ru" name="Yegor Kozlov" id="CO"/>
+ </authors>
+ </header>
+ <body>
+ <section><title>Busy Developers' Guide to HSLF drawing layer</title>
+ <section><title>Index of Features</title>
+ <ul>
+ <li><a href="#NewPresentation">How to create a new presentation and add new slides to it</a></li>
+ <li><a href="#PageSize">How to retrieve or change slide size</a></li>
+ <li><a href="#GetShapes">How to get shapes contained in a particular slide</a></li>
+ <li><a href="#Shapes">Drawing a shape on a slide</a></li>
+ <li><a href="#Pictures">How to work with pictures</a></li>
+ <li><a href="#SlideTitle">How to set slide title</a></li>
+ <li><a href="#Fill">How to work with slide/shape background</a></li>
+ <li><a href="#Bullets">How to create bulleted lists</a></li>
+ <li><a href="#Hyperlinks">Hyperlinks</a></li>
+ <li><a href="#Tables">Tables</a></li>
+ <li><a href="#RemoveShape">How to remove shapes</a></li>
+ <li><a href="#OLE">How to retrieve embedded OLE objects</a></li>
+ <li><a href="#Sound">How to retrieve embedded sounds</a></li>
+ <li><a href="#Freeform">How to create shapes of arbitrary geometry</a></li>
+ <li><a href="#Graphics2D">Shapes and Graphics2D</a></li>
+ <li><a href="#Render">How to convert slides into images</a></li>
+ <li><a href="#HeadersFooters">Headers / Footers</a></li>
+ </ul>
+ </section>
+ <section><title>Features</title>
+ <anchor id="NewPresentation"/>
+ <section><title>New Presentation</title>
+ <source>
+ //create a new empty slide show
+ HSLFSlideShow ppt = new HSLFSlideShow();
+
+ //add first slide
+ HSLFSlide s1 = ppt.createSlide();
+
+ //add second slide
+ HSLFSlide s2 = ppt.createSlide();
+
+ //save changes in a file
+ FileOutputStream out = new FileOutputStream("slideshow.ppt");
+ ppt.write(out);
+ out.close();
+ </source>
+ </section>
+ <anchor id="PageSize"/>
+ <section><title>How to retrieve or change slide size</title>
+ <source>
+ HSLFSlideShow ppt = new HSLFSlideShow(new HSLFSlideShowImpl("slideshow.ppt"));
+ //retrieve page size. Coordinates are expressed in points (72 dpi)
+ java.awt.Dimension pgsize = ppt.getPageSize();
+ int pgx = pgsize.width; //slide width
+ int pgy = pgsize.height; //slide height
+
+ //set new page size
+ ppt.setPageSize(new java.awt.Dimension(1024, 768));
+ //save changes
+ FileOutputStream out = new FileOutputStream("slideshow.ppt");
+ ppt.write(out);
+ out.close();
+ </source>
+ </section>
+ <anchor id="GetShapes"/>
+ <section><title>How to get shapes contained in a particular slide</title>
+ <p>
+ The following code demonstrates how to iterate over shapes for each slide.
+ </p>
+ <source>
+ HSLFSlideShow ppt = new HSLFSlideShow(new HSLFSlideShowImpl("slideshow.ppt"));
+ // get slides
+ for (HSLFSlide slide : ppt.getSlides()) {
+ for (HSLFShape sh : slide.getShapes()) {
+ // name of the shape
+ String name = sh.getShapeName();
+
+ // shape's anchor, which defines the position of this shape in the slide
+ java.awt.geom.Rectangle2D anchor = sh.getAnchor();
+
+ if (sh instanceof HSLFLine) {
+ HSLFLine line = (HSLFLine) sh;
+ // work with Line
+ } else if (sh instanceof HSLFAutoShape) {
+ HSLFAutoShape shape = (HSLFAutoShape) sh;
+ // work with AutoShape
+ } else if (sh instanceof HSLFTextBox) {
+ HSLFTextBox shape = (HSLFTextBox) sh;
+ // work with TextBox
+ } else if (sh instanceof HSLFPictureShape) {
+ HSLFPictureShape shape = (HSLFPictureShape) sh;
+ // work with Picture
+ }
+ }
+ }
+ </source>
+ </section>
+ <anchor id="Shapes"/>
+ <section><title>Drawing a shape on a slide</title>
+ <warning>
+ To work with graphic objects, HSLF uses Java2D classes
+ that may throw exceptions if a graphical environment is not available. If no graphical environment
+ is available, you must tell Java that you are running in headless mode by
+ setting the system property <code>java.awt.headless=true</code>
+ (either via the <code>-Djava.awt.headless=true</code> startup parameter or via <code>System.setProperty("java.awt.headless", "true")</code>).
+ </warning>
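+ <p>If you cannot control the JVM command line, you can also set the property programmatically, as
+ long as this happens before any AWT or Java2D class is used. The helper below is a minimal sketch
+ with a hypothetical class name. It sets the property and then asks
+ <code>java.awt.GraphicsEnvironment</code> whether headless mode is in effect.</p>
+ <source><![CDATA[
+ import java.awt.GraphicsEnvironment;
+
+ /** Hypothetical helper: switch to headless mode before HSLF touches Java2D. */
+ public class HeadlessSetup {
+     /** Must run before any AWT/Java2D class is initialised, otherwise it has no effect. */
+     static boolean enableHeadless() {
+         System.setProperty("java.awt.headless", "true");
+         return GraphicsEnvironment.isHeadless();
+     }
+ }
+ ]]></source>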
+ <p>
+ When you add a shape, you usually specify the dimensions of the shape and the position
+ of the upper left corner of the bounding box for the shape relative to the upper left
+ corner of the slide. Distances in the drawing layer are measured in points (72 points = 1 inch).
+ </p>
+ <source>
+ HSLFSlideShow ppt = new HSLFSlideShow();
+
+ HSLFSlide slide = ppt.createSlide();
+
+ //Line shape
+ HSLFLine line = new HSLFLine();
+ line.setAnchor(new java.awt.Rectangle(50, 50, 100, 20));
+ line.setLineColor(new Color(0, 128, 0));
+ line.setLineCompound(LineCompound.DOUBLE);
+ slide.addShape(line);
+
+ //TextBox
+ HSLFTextBox txt = new HSLFTextBox();
+ txt.setText("Hello, World!");
+ txt.setAnchor(new java.awt.Rectangle(300, 100, 300, 50));
+
+ // use the text paragraph and its text runs to work with the text format
+ HSLFTextParagraph tp = txt.getTextParagraphs().get(0);
+ tp.setAlignment(TextAlign.RIGHT);
+ HSLFTextRun rt = tp.getTextRuns().get(0);
+ rt.setFontSize(32.);
+ rt.setFontFamily("Arial");
+ rt.setBold(true);
+ rt.setItalic(true);
+ rt.setUnderlined(true);
+ rt.setFontColor(Color.red);
+
+ slide.addShape(txt);
+
+ // Autoshape
+ // 32-point star
+ HSLFAutoShape sh1 = new HSLFAutoShape(ShapeType.STAR_32);
+ sh1.setAnchor(new java.awt.Rectangle(50, 50, 100, 200));
+ sh1.setFillColor(Color.red);
+ slide.addShape(sh1);
+
+ //Trapezoid
+ HSLFAutoShape sh2 = new HSLFAutoShape(ShapeType.TRAPEZOID);
+ sh2.setAnchor(new java.awt.Rectangle(150, 150, 100, 200));
+ sh2.setFillColor(Color.blue);
+ slide.addShape(sh2);
+
+ FileOutputStream out = new FileOutputStream("slideshow.ppt");
+ ppt.write(out);
+ out.close();
+ </source>
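+ <p>Anchors are expressed in points (72 points = 1 inch), so positions given in inches or
+ centimetres must be converted first. The helper below is a hypothetical convenience and is not
+ part of HSLF. It rounds to whole points, so the result can be used wherever the examples above
+ build a <code>java.awt.Rectangle</code>.</p>
+ <source><![CDATA[
+ import java.awt.Rectangle;
+
+ /** Hypothetical unit helpers for the drawing layer, which measures in points. */
+ public class DrawingUnits {
+     static final double POINTS_PER_INCH = 72.0;
+     static final double CM_PER_INCH = 2.54;
+
+     static double inchesToPoints(double inches) {
+         return inches * POINTS_PER_INCH;
+     }
+
+     static double cmToPoints(double cm) {
+         return cm / CM_PER_INCH * POINTS_PER_INCH;
+     }
+
+     static double pointsToInches(double points) {
+         return points / POINTS_PER_INCH;
+     }
+
+     /** Anchor rectangle, in whole points, for a box given in inches. */
+     static Rectangle anchorInInches(double x, double y, double width, double height) {
+         return new Rectangle(
+                 (int) Math.round(inchesToPoints(x)), (int) Math.round(inchesToPoints(y)),
+                 (int) Math.round(inchesToPoints(width)), (int) Math.round(inchesToPoints(height)));
+     }
+ }
+ ]]></source>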
+ </section>
+ <anchor id="Pictures"/>
+ <section><title>How to work with pictures</title>
+
+ <p>
+ Currently, the HSLF API supports the following types of pictures:
+ </p>
+ <ul>
+ <li>Windows Metafiles (WMF)</li>
+ <li>Enhanced Metafiles (EMF)</li>
+ <li>JPEG Interchange Format</li>
+ <li>Portable Network Graphics (PNG)</li>
+ <li>Macintosh PICT</li>
+ </ul>
+
+ <source>
+ HSLFSlideShow ppt = new HSLFSlideShow(new HSLFSlideShowImpl("slideshow.ppt"));
+
+ // extract all pictures contained in the presentation
+ int idx = 1;
+ for (HSLFPictureData pict : ppt.getPictureData()) {
+ // picture data
+ byte[] data = pict.getData();
+
+ PictureData.PictureType type = pict.getType();
+ String ext = type.extension;
+ FileOutputStream out = new FileOutputStream("pict_" + idx + ext);
+ out.write(data);
+ out.close();
+ idx++;
+ }
+
+ // add a new picture to this slideshow and insert it in a new slide
+ HSLFPictureData pd = ppt.addPicture(new File("clock.jpg"), PictureData.PictureType.JPEG);
+
+ HSLFPictureShape pictNew = new HSLFPictureShape(pd);
+
+ // set image position in the slide
+ pictNew.setAnchor(new java.awt.Rectangle(100, 100, 300, 200));
+
+ HSLFSlide slide = ppt.createSlide();
+ slide.addShape(pictNew);
+
+ // now retrieve the pictures contained in the first slide and save them on disk
+ idx = 1;
+ slide = ppt.getSlides().get(0);
+ for (HSLFShape sh : slide.getShapes()) {
+ if (sh instanceof HSLFPictureShape) {
+ HSLFPictureShape pict = (HSLFPictureShape) sh;
+ HSLFPictureData pictData = pict.getPictureData();
+ byte[] data = pictData.getData();
+ PictureData.PictureType type = pictData.getType();
+ FileOutputStream out = new FileOutputStream("slide0_" + idx + type.extension);
+ out.write(data);
+ out.close();
+ idx++;
+ }
+ }
+
+ FileOutputStream out = new FileOutputStream("slideshow.ppt");
+ ppt.write(out);
+ out.close();
+ </source>
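+ <p><code>addPicture</code> must be told the picture type explicitly. When you do not know a file's
+ format in advance, you can guess it from the file's leading bytes. The sketch below is a
+ hypothetical helper covering the formats that have a fixed signature: PNG, JPEG, placeable WMF and
+ EMF. It returns names that match the <code>PictureData.PictureType</code> constants used above. For
+ anything else it returns null, and that includes PICT, which has no reliable signature at the start
+ of the data.</p>
+ <source><![CDATA[
+ /** Hypothetical helper: guess a picture's type from its leading bytes. */
+ public class PictureSniffer {
+     static boolean startsWith(byte[] data, int... magic) {
+         if (data.length < magic.length) {
+             return false;
+         }
+         for (int i = 0; i < magic.length; i++) {
+             if ((data[i] & 0xFF) != magic[i]) {
+                 return false;
+             }
+         }
+         return true;
+     }
+
+     /** Returns "PNG", "JPEG", "WMF" or "EMF", or null if the format is not recognised. */
+     static String guessType(byte[] data) {
+         if (startsWith(data, 0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A)) return "PNG";
+         if (startsWith(data, 0xFF, 0xD8, 0xFF)) return "JPEG";
+         if (startsWith(data, 0xD7, 0xCD, 0xC6, 0x9A)) return "WMF"; // placeable WMF header
+         // EMF: the first record is EMR_HEADER (type 1) with " EMF" at byte offset 40
+         if (startsWith(data, 0x01, 0x00, 0x00, 0x00) && data.length >= 44
+                 && data[40] == 0x20 && data[41] == 0x45 && data[42] == 0x4D && data[43] == 0x46) {
+             return "EMF";
+         }
+         return null;
+     }
+ }
+ ]]></source>
+ <p>A non-null result could be converted with <code>PictureData.PictureType.valueOf(name)</code>. An
+ unrecognised file is better rejected than added with a guessed type.</p>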
+ </section>
+ <anchor id="SlideTitle"/>
+ <section><title>How to set slide title</title>
+ <source>
+ HSLFSlideShow ppt = new HSLFSlideShow();
+ HSLFSlide slide = ppt.createSlide();
+ HSLFTextBox title = slide.addTitle();
+ title.setText("Hello, World!");
+
+ // save changes
+ FileOutputStream out = new FileOutputStream("slideshow.ppt");
+ ppt.write(out);
+ out.close();
+ </source>
+ <p>
+ Below is the equivalent code in PowerPoint VBA:
+ </p>
+ <source>
+ Set myDocument = ActivePresentation.Slides(1)
+ myDocument.Shapes.AddTitle.TextFrame.TextRange.Text = "Hello, World!"
+ </source>
+ </section>
+ <anchor id="Fill"/>
+ <section><title>How to modify the background of a slide master</title>
+ <source>
+ HSLFSlideShow ppt = new HSLFSlideShow();
+ HSLFSlideMaster master = ppt.getSlideMasters().get(0);
+
+ HSLFFill fill = master.getBackground().getFill();
+ HSLFPictureData pd = ppt.addPicture(new File("background.png"), PictureData.PictureType.PNG);
+ fill.setFillType(HSLFFill.FILL_PICTURE);
+ fill.setPictureData(pd);
+ </source>
+ </section>
+ <section><title>How to modify the background of a slide</title>
+ <source>
+ HSLFSlideShow ppt = new HSLFSlideShow();
+ HSLFSlide slide = ppt.createSlide();
+
+ // This slide has its own background.
+ // Without this line it will use master's background.
+ slide.setFollowMasterBackground(false);
+ HSLFFill fill = slide.getBackground().getFill();
+ HSLFPictureData pd = ppt.addPicture(new File("background.png"), PictureData.PictureType.PNG);
+ fill.setFillType(HSLFFill.FILL_PATTERN);
+ fill.setPictureData(pd);
+ </source>
+ </section>
+ <section><title>How to modify the background of a shape</title>
+ <source>
+ HSLFSlideShow ppt = new HSLFSlideShow();
+ HSLFSlide slide = ppt.createSlide();
+
+ HSLFShape shape = new HSLFAutoShape(ShapeType.RECT);
+ shape.setAnchor(new java.awt.Rectangle(100, 100, 200, 200));
+ HSLFFill fill = shape.getFill();
+ fill.setFillType(HSLFFill.FILL_SHADE);
+ fill.setBackgroundColor(Color.red);
+ fill.setForegroundColor(Color.green);
+
+ slide.addShape(shape);
+ </source>
+ </section>
+ <anchor id="Bullets"/>
+ <section><title>How to create bulleted lists</title>
+ <source>
+ HSLFSlideShow ppt = new HSLFSlideShow();
+
+ HSLFSlide slide = ppt.createSlide();
+
+ HSLFTextBox shape = new HSLFTextBox();
+ HSLFTextParagraph tp = shape.getTextParagraphs().get(0);
+ tp.setBullet(true);
+ tp.setBulletChar('\u263A'); //bullet character
+ tp.setIndent(0.); //bullet offset
+ tp.setLeftMargin(50.); //text offset (should be greater than bullet offset)
+ HSLFTextRun rt = tp.getTextRuns().get(0);
+ shape.setText(
+ "January\r" +
+ "February\r" +
+ "March\r" +
+ "April");
+ rt.setFontSize(42.);
+
+ shape.setAnchor(new java.awt.Rectangle(50, 50, 500, 300)); //position of the text box in the slide
+ slide.addShape(shape);
+
+ FileOutputStream out = new FileOutputStream("bullets.ppt");
+ ppt.write(out);
+ out.close();
+ </source>
+ </section>
+ <anchor id="Hyperlinks"/>
+ <section><title>How to read hyperlinks from a slide show</title>
+ <source>
+ FileInputStream is = new FileInputStream("slideshow.ppt");
+ HSLFSlideShow ppt = new HSLFSlideShow(is);
+ is.close();
+
+ for (HSLFSlide slide : ppt.getSlides()) {
+ //read hyperlinks from the text runs
+ for (List&lt;HSLFTextParagraph&gt; txt : slide.getTextParagraphs()) {
+ for (HSLFTextParagraph para : txt) {
+ for (HSLFTextRun run : para) {
+ HSLFHyperlink link = run.getHyperlink();
+ if (link != null) {
+ String title = link.getLabel();
+ String address = link.getAddress();
+ String text = run.getRawText();
+ }
+ }
+ }
+ }
+
+ //in PowerPoint you can assign a hyperlink to a shape without text,
+ //for example to a Line object. The code below demonstrates how to
+ //read such hyperlinks
+ for (HSLFShape sh : slide.getShapes()) {
+ if (sh instanceof HSLFSimpleShape) {
+ HSLFHyperlink link = ((HSLFSimpleShape)sh).getHyperlink();
+ if(link != null) {
+ String title = link.getLabel();
+ String address = link.getAddress();
+ }
+ }
+ }
+ }
+ </source>
+ </section>
+ <anchor id="Tables"/>
+ <section><title>How to create tables</title>
+ <source>
+ //table data
+ String[][] data = {
+ {"INPUT FILE", "NUMBER OF RECORDS"},
+ {"Item File", "11,559"},
+ {"Vendor File", "300"},
+ {"Purchase History File", "10,000"},
+ {"Total # of requisitions", "10,200,038"}
+ };
+
+ HSLFSlideShow ppt = new HSLFSlideShow();
+
+ HSLFSlide slide = ppt.createSlide();
+ //create a table of 5 rows and 2 columns
+ HSLFTable table = new HSLFTable(5, 2);
+ for (int i = 0; i &lt; data.length; i++) {
+ for (int j = 0; j &lt; data[i].length; j++) {
+ HSLFTableCell cell = table.getCell(i, j);
+ cell.setText(data[i][j]);
+
+ HSLFTextRun rt = cell.getTextParagraphs().get(0).getTextRuns().get(0);
+ rt.setFontFamily("Arial");
+ rt.setFontSize(10.);
+
+ cell.setVerticalAlignment(VerticalAlignment.MIDDLE);
+ cell.setHorizontalCentered(true);
+ }
+ }
+
+ //set table borders
+ Line border = table.createBorder();
+ border.setLineColor(Color.black);
+ border.setLineWidth(1.0);
+ table.setAllBorders(border);
+
+ //set width of the 1st column
+ table.setColumnWidth(0, 300);
+ //set width of the 2nd column
+ table.setColumnWidth(1, 150);
+
+ slide.addShape(table);
+ table.moveTo(100, 100);
+
+ try (FileOutputStream out = new FileOutputStream("hslf-table.ppt")) {
+ ppt.write(out);
+ }
+ </source>
+ </section>
+
+ <anchor id="RemoveShape"/>
+ <section><title>How to remove shapes from a slide</title>
+ <source>
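+ // iterate over a copy of the shape list, so removing shapes cannot interfere with the loop
+ for (HSLFShape shape : new ArrayList&lt;&gt;(slide.getShapes())) {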
+ // remove the shape
+ boolean ok = slide.removeShape(shape);
+ if (ok) {
+ // the shape was removed. Do something.
+ }
+ }
+ </source>
+ </section>
+ <anchor id="OLE"/>
+ <section><title>How to retrieve embedded OLE objects</title>
+ <source>
+ for (HSLFShape shape : slide.getShapes()) {
+ if (shape instanceof HSLFObjectShape) {
+ HSLFObjectShape ole = (HSLFObjectShape) shape;
+ HSLFObjectData data = ole.getObjectData();
+ String name = ole.getInstanceName();
+ if ("Worksheet".equals(name)) {
+ HSSFWorkbook wb = new HSSFWorkbook(data.getInputStream());
+ } else if ("Document".equals(name)) {
+ HWPFDocument doc = new HWPFDocument(data.getInputStream());
+ }
+ }
+ }
+ </source>
+ </section>
+
+ <anchor id="Sound"/>
+ <section><title>How to retrieve embedded sounds</title>
+ <source>
+ FileInputStream is = new FileInputStream(args[0]);
+ HSLFSlideShow ppt = new HSLFSlideShow(is);
+ is.close();
+
+ for (HSLFSoundData sound : ppt.getSoundData()) {
+ // save WAV sounds to disk
+ if (".WAV".equals(sound.getSoundType())) {
+ try (FileOutputStream out = new FileOutputStream(sound.getSoundName())) {
+ out.write(sound.getData());
+ }
+ }
+ }
+ </source>
+ </section>
+
+ <anchor id="Freeform"/>
+ <section><title>How to create shapes of arbitrary geometry</title>
+ <source>
+ HSLFSlideShow ppt = new HSLFSlideShow();
+ HSLFSlide slide = ppt.createSlide();
+
+ java.awt.geom.GeneralPath path = new java.awt.geom.GeneralPath();
+ path.moveTo(100, 100);
+ path.lineTo(200, 100);
+ path.curveTo(50, 45, 134, 22, 78, 133);
+ path.curveTo(10, 45, 134, 56, 78, 100);
+ path.lineTo(100, 200);
+ path.closePath();
+
+ HSLFFreeformShape shape = new HSLFFreeformShape();
+ shape.setPath(path);
+ slide.addShape(shape);
+ </source>
+ </section>
+
+ <anchor id="Graphics2D"/>
+ <section><title>How to draw into a slide using Graphics2D</title>
+ <warning>
+ The current implementation of the PowerPoint Graphics2D driver is not fully compliant with the java.awt.Graphics2D specification.
+ Some features, such as clipping and drawing images, are not yet supported.
+ </warning>
+ <source>
+ HSLFSlideShow ppt = new HSLFSlideShow();
+ HSLFSlide slide = ppt.createSlide();
+
+ // draw a simple bar graph
+ // bar chart data.
+ // The first value is the bar color,
+ // the second is the width
+ Object[] def = new Object[]{
+ Color.yellow, Integer.valueOf(100),
+ Color.green, Integer.valueOf(150),
+ Color.gray, Integer.valueOf(75),
+ Color.red, Integer.valueOf(200),
+ };
+
+ // all objects are drawn into a shape group so we need to create one
+
+ HSLFGroupShape group = new HSLFGroupShape();
+ // define position of the drawing in the slide
+ Rectangle bounds = new java.awt.Rectangle(200, 100, 350, 300);
+ // if you want to draw in the entire slide area then define the anchor
+ // as follows:
+ // Dimension pgsize = ppt.getPageSize();
+ // java.awt.Rectangle bounds = new java.awt.Rectangle(0, 0,
+ // pgsize.width, pgsize.height);
+
+ group.setAnchor(bounds);
+ slide.addShape(group);
+
+ // draw a simple bar chart
+ Graphics2D graphics = new PPGraphics2D(group);
+ int x = bounds.x + 50, y = bounds.y + 50;
+ graphics.setFont(new Font("Arial", Font.BOLD, 10));
+ for (int i = 0, idx = 1; i &lt; def.length; i += 2, idx++) {
+ graphics.setColor(Color.black);
+ int width = ((Integer) def[i + 1]).intValue();
+ graphics.drawString("Q" + idx, x - 20, y + 20);
+ graphics.drawString(width + "%", x + width + 10, y + 20);
+ graphics.setColor((Color) def[i]);
+ graphics.fill(new Rectangle(x, y, width, 30));
+ y += 40;
+ }
+ graphics.setColor(Color.black);
+ graphics.setFont(new Font("Arial", Font.BOLD, 14));
+ graphics.draw(bounds);
+ graphics.drawString("Performance", x + 70, y + 40);
+
+ try (FileOutputStream out = new FileOutputStream("hslf-graphics2d.ppt")) {
+ ppt.write(out);
+ }
+ </source>
+ </section>
+
+ <anchor id="Render"/>
+ <section><title>Export PowerPoint slides into java.awt.Graphics2D</title>
+ <p>
+ HSLF provides a way to export slides as images. You can render a slide into any java.awt.Graphics2D object, for example
+ one obtained from a BufferedImage, and then save the result in PNG or JPEG format. Please note that although HSLF attempts to
+ render slides as close to PowerPoint as possible, the output may look different from PowerPoint for the following reasons:
+ </p>
+ <ul>
+ <li>Java2D renders fonts differently from PowerPoint; there are always some differences in the way font glyphs are painted.</li>
+ <li>HSLF uses java.awt.font.LineBreakMeasurer to break text into lines. PowerPoint may do it in a different way.</li>
+ <li>If a font from the presentation is not available, then the JDK default font will be used.</li>
+ </ul>
+ <p>
+ Current Limitations:
+ </p>
+ <ul>
+ <li>Some types of shapes are not yet supported (WordArt, complex auto-shapes)</li>
+ <li>Only Bitmap images (PNG, JPEG, DIB) can be rendered in Java</li>
+ </ul>
+ <source>
+ FileInputStream is = new FileInputStream("slideshow.ppt");
+ HSLFSlideShow ppt = new HSLFSlideShow(is);
+ is.close();
+
+ Dimension pgsize = ppt.getPageSize();
+
+ int idx = 1;
+ for (HSLFSlide slide : ppt.getSlides()) {
+
+ BufferedImage img = new BufferedImage(pgsize.width, pgsize.height, BufferedImage.TYPE_INT_RGB);
+ Graphics2D graphics = img.createGraphics();
+ // clear the drawing area
+ graphics.setPaint(Color.white);
+ graphics.fill(new Rectangle2D.Float(0, 0, pgsize.width, pgsize.height));
+
+ // render
+ slide.draw(graphics);
+ graphics.dispose();
+
+ // save the output
+ try (FileOutputStream out = new FileOutputStream("slide-" + idx + ".png")) {
+ javax.imageio.ImageIO.write(img, "png", out);
+ }
+
+ idx++;
+ }
+ </source>
+ </section>
+
+ </section>
+ <anchor id="HeadersFooters"/>
+ <section><title>How to extract Headers / Footers from an existing presentation</title>
+ <source>
+ FileInputStream is = new FileInputStream("slideshow.ppt");
+ HSLFSlideShow ppt = new HSLFSlideShow(is);
+ is.close();
+
+ // presentation-scope headers / footers
+ HeadersFooters hdd = ppt.getSlideHeadersFooters();
+ if (hdd.isFooterVisible()) {
+ String footerText = hdd.getFooterText();
+ }
+
+ // per-slide headers / footers
+ for (HSLFSlide slide : ppt.getSlides()) {
+ HeadersFooters hdd2 = slide.getHeadersFooters();
+ if (hdd2.isFooterVisible()) {
+ String footerText = hdd2.getFooterText();
+ }
+ if (hdd2.isUserDateVisible()) {
+ String customDate = hdd2.getDateTimeText();
+ }
+ if (hdd2.isSlideNumberVisible()) {
+ int slideNum = slide.getSlideNumber();
+ }
+ }
+ </source>
+ </section>
+ <section><title>How to set Headers / Footers</title>
+ <source>
+ HSLFSlideShow ppt = new HSLFSlideShow();
+
+ // presentation-scope headers / footers
+ HeadersFooters hdd = ppt.getSlideHeadersFooters();
+ hdd.setSlideNumberVisible(true);
+ hdd.setFooterVisible(true);
+ hdd.setFootersText("Created by POI-HSLF");
+ </source>
+ </section>
+ </section>
+ </body>
+</document>
diff --git a/src/documentation/content/xdocs/components/slideshow/index.xml b/src/documentation/content/xdocs/components/slideshow/index.xml
new file mode 100644
index 0000000000..b963d928a7
--- /dev/null
+++ b/src/documentation/content/xdocs/components/slideshow/index.xml
@@ -0,0 +1,72 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<!--
+ ====================================================================
+ Licensed to the Apache Software Foundation (ASF) under one or more
+ contributor license agreements. See the NOTICE file distributed with
+ this work for additional information regarding copyright ownership.
+ The ASF licenses this file to You under the Apache License, Version 2.0
+ (the "License"); you may not use this file except in compliance with
+ the License. You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+ Unless required by applicable law or agreed to in writing, software
+ distributed under the License is distributed on an "AS IS" BASIS,
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ See the License for the specific language governing permissions and
+ limitations under the License.
+ ====================================================================
+-->
+<!DOCTYPE document PUBLIC "-//APACHE//DTD Documentation V2.0//EN" "document-v20.dtd">
+
+<document>
+ <header>
+ <title>POI-HSLF and POI-XSLF - Java API To Access Microsoft PowerPoint Format Files</title>
+ <subtitle>Overview</subtitle>
+ <authors>
+ <person name="Avik Sengupta" email="avik at apache dot org"/>
+ <person name="Nick Burch" email="nick at apache dot org"/>
+ <person name="Yegor Kozlov" email="yegor at apache dot org"/>
+ </authors>
+ </header>
+
+ <body>
+ <section>
+ <title>POI-HSLF</title>
+
+ <p>HSLF is the POI Project's pure Java implementation of the PowerPoint '97(-2007) file format.</p>
+ <p>HSLF provides a way to read, create or modify PowerPoint presentations. In particular, it provides:
+ </p>
+ <ul>
+ <li>an API for data extraction (text, pictures, embedded objects, sounds)</li>
+ <li>a usermodel API for creating, reading and modifying .ppt files</li>
+ </ul>
+ <note>
+ This code currently lives in the
+ <a href="https://github.com/apache/poi/tree/trunk/poi-scratchpad/">scratchpad area</a>
+ of the POI Git repository. To use this component, ensure
+ you have the Scratchpad Jar on your classpath, or a dependency
+ defined on the <em>poi-scratchpad</em> artifact - the main POI
+ jar is not enough! See the
+ <a href="site:components">POI Components Map</a>
+ for more details.
+ </note>
+ <p>The <a href="./quick-guide.html">quick guide</a> documentation provides
+ information on using this API. Comments and fixes are gratefully accepted on the POI
+ dev mailing list.</p>
+ </section>
+ <section>
+ <title>POI-XSLF</title>
+ <p>
+ XSLF is the POI Project's pure Java implementation of the PowerPoint 2007 OOXML (.pptx) file format.
+ Whilst HSLF and XSLF provide similar features, there is not a common interface across the two of them at this time.
+ </p>
+ <p>
+ Please note that XSLF is still in early development and is subject to incompatible changes in the future.
+ </p>
+ <p>
+ A quick guide is available in the <a href="./xslf-cookbook.html">XSLF Cookbook</a>.
+ </p>
+ </section>
+ </body>
+</document>
diff --git a/src/documentation/content/xdocs/components/slideshow/ppt-file-format.xml b/src/documentation/content/xdocs/components/slideshow/ppt-file-format.xml
new file mode 100644
index 0000000000..202df1d436
--- /dev/null
+++ b/src/documentation/content/xdocs/components/slideshow/ppt-file-format.xml
@@ -0,0 +1,367 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<!--
+ ====================================================================
+ Licensed to the Apache Software Foundation (ASF) under one or more
+ contributor license agreements. See the NOTICE file distributed with
+ this work for additional information regarding copyright ownership.
+ The ASF licenses this file to You under the Apache License, Version 2.0
+ (the "License"); you may not use this file except in compliance with
+ the License. You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+ Unless required by applicable law or agreed to in writing, software
+ distributed under the License is distributed on an "AS IS" BASIS,
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ See the License for the specific language governing permissions and
+ limitations under the License.
+ ====================================================================
+-->
+<!DOCTYPE document PUBLIC "-//APACHE//DTD Documentation V2.0//EN" "document-v20.dtd">
+
+<document>
+ <header>
+ <title>POI-HSLF - A Guide to the PowerPoint File Format</title>
+ <subtitle>Overview</subtitle>
+ <authors>
+ <person name="Nick Burch" email="nick at torchbox dot com"/>
+ <person name="Yegor Kozlov" email="yegor at dinom dot ru"/>
+ </authors>
+ </header>
+
+ <body>
+ <section><title>Records, Containers and Atoms</title>
+ <p>
+ PowerPoint documents are made up of a tree of records. A record may
+ contain either other records (in which case it is a Container),
+ or data (in which case it's an Atom). A record can't hold both.
+ </p>
+ <p>
+ PowerPoint documents don't have one overall container record. Instead,
+ there are a number of different container records to be found at
+ the top level.
+ </p>
+ <p>
+ Any numbers or strings stored in the records are always stored in
+ Little Endian format (least significant byte first). This is the case
+ no matter what platform the file was written on - be that a
+ Little Endian or a Big Endian system.
+ </p>
+ <p>
+ PowerPoint may have Escher (DDF) records embedded in it. These
+ are always held as the children of a PPDrawing record (record
+ type 1036). Escher records have the same format as PowerPoint
+ records.
+ </p>
+ </section>
+
+ <section><title>Record Headers</title>
+ <p>
+ All records, be they containers or atoms, have the same standard
+ 8 byte header. It is:
+ </p>
+ <ul><li>1/2 byte container flag</li>
+ <li>1.5 byte option field</li>
+ <li>2 byte record type</li>
+ <li>4 byte record length</li></ul>
+ <p>
+ If the first byte of the header, bitwise ANDed with 0x0f, is 0x0f,
+ then the record is a container. Otherwise, it's an atom. The rest
+ of the first two bytes are used to store the "options" for the
+ record. Most commonly, this is used to indicate the version of
+ the record, but the exact usage is record specific.
+ </p>
+ <p>
+ The record type is a little endian number, which tells you what
+ kind of record you're dealing with. Each different kind of record
+ has its own value that gets stored here. PowerPoint records have
+ a type that's normally less than 6000 (decimal). Escher records
+ normally have a type between 0xF000 and 0xF1FF.
+ </p>
+ <p>
+ The record length is another little endian number. For an atom,
+ it's the size of the data part of the record, i.e. the length
+ of the record <em>less</em> its 8 byte record header. For a
+ container, it's the size of all the records that are children of
+ this record. That means that the size of a container record is the
+ length, plus 8 bytes for its record header.
+ </p>
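+ <p>
+ As a rough illustration, the header layout above can be decoded with java.nio.ByteBuffer in
+ little endian mode. This is a minimal sketch, not HSLF's own record parser: the class and method
+ names are made up for this example, and the Escher check simply applies the 0xF000 to 0xF1FF
+ rule of thumb mentioned above.
+ </p>

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical helper: decodes the 8 byte header that starts every record.
class RecordHeader {
    final boolean container; // low nibble of the first byte is 0x0F
    final int options;       // remaining 12 bits of the first two bytes
    final int type;          // unsigned 16 bit record type
    final long length;       // unsigned 32 bit length, header not included

    private RecordHeader(boolean container, int options, int type, long length) {
        this.container = container;
        this.options = options;
        this.type = type;
        this.length = length;
    }

    static RecordHeader read(byte[] data, int offset) {
        // Only the 8 header bytes starting at offset are visible to the buffer
        ByteBuffer buf = ByteBuffer.wrap(data, offset, 8).order(ByteOrder.LITTLE_ENDIAN);
        int flagsAndOptions = buf.getShort() & 0xFFFF;
        boolean container = (flagsAndOptions & 0x0F) == 0x0F;
        int options = flagsAndOptions >>> 4;
        int type = buf.getShort() & 0xFFFF;
        long length = buf.getInt() & 0xFFFFFFFFL;
        return new RecordHeader(container, options, type, length);
    }

    // Rule of thumb from the text: Escher records normally use 0xF000-0xF1FF.
    boolean looksLikeEscher() {
        return type >= 0xF000 && type <= 0xF1FF;
    }

    // Bytes occupied on disk: the 8 byte header plus the data (or children).
    long totalSize() {
        return 8 + length;
    }
}
```

+ <p>
+ For example, the header bytes 00 00 72 17 20 00 00 00 of the PersistPtrIncrementalBlock dump
+ further down decode to an atom of type 6002 carrying 32 bytes of data, 40 bytes in total.
+ </p>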
+ </section>
+
+ <section><title>CurrentUserAtom, UserEditAtom and PersistPtrIncrementalBlock</title>
+ <p><strong>aka Records that care about the byte level position of other records</strong></p>
+ <p>
+ A small number of records contain byte level position offsets to other
+ records. If you change the position of any records in the file, then
+ there's a good chance that you will need to update some of these
+ special records.
+ </p>
+ <p>
+ First up, CurrentUserAtom. This is actually stored in a different
+ OLE2 (POIFS) stream to the main PowerPoint document. It contains
+ a few bits of information on who last edited the file. Most
+ importantly, at byte 8 of its contents, it stores (as a 32 bit
+ little endian number) the offset in the main stream to the most
+ recent UserEditAtom.
+ </p>
+ <p>
+ The UserEditAtom contains two byte level offsets (again as 32 bit
+ little endian numbers). At byte 12 is the offset to the
+ PersistPtrIncrementalBlock associated with this UserEditAtom
+ (each UserEditAtom has one and only one PersistPtrIncrementalBlock).
+ At byte 8, there's the offset to the previous UserEditAtom. If this
+ is 0, then you're at the first one.
+ </p>
+ <p>
+ Every time you do a non-full save in PowerPoint, it tacks on another
+ UserEditAtom and another PersistPtrIncrementalBlock. The
+ CurrentUserAtom is updated to point to this new UserEditAtom, and the
+ new UserEditAtom points back to the previous UserEditAtom. You then
+ end up with a chain, starting from the CurrentUserAtom, linking
+ back through all the UserEditAtoms, until you reach the first one
+ from a full save.
+ </p>
+<source>
+/-------------------------------\
+| CurrentUserAtom (own stream) |
+| OffsetToCurrentEdit = 10562 |==\
+\-------------------------------/ |
+ |
+/==================================/
+| /-----------------------------------\
+| | PersistPtrIncrementalBlock @ 6144 |
+| \-----------------------------------/
+| /---------------------------------\ |
+| | UserEditAtom @ 6176 | |
+| | LastUserEditAtomOffset = 0 | |
+| | PersistPointersOffset = 6144 |==================/
+| \---------------------------------/
+| | /-----------------------------------\
+| \====================\ | PersistPtrIncrementalBlock @ 8646 |
+| | \-----------------------------------/
+| /---------------------------------\ | |
+| | UserEditAtom @ 8674 | | |
+| | LastUserEditAtomOffset = 6176 |=/ |
+| | PersistPointersOffset = 8646 |==================/
+| \---------------------------------/
+| | /------------------------------------\
+| \====================\ | PersistPtrIncrementalBlock @ 10538 |
+| | \------------------------------------/
+| /---------------------------------\ | |
+\==| UserEditAtom @ 10562 | | |
+ | LastUserEditAtomOffset = 8674 |=/ |
+ | PersistPointersOffset = 10538 |==================/
+ \---------------------------------/
+</source>
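+ <p>
+ A minimal sketch of following the chain shown above, assuming the whole main stream has been
+ read into a byte array and that "byte 8" and "byte 12" are counted from the start of the
+ UserEditAtom's data, i.e. just after its 8 byte header. UserEditChain and its methods are
+ hypothetical names; the visited set is only there to stop on corrupt files whose offsets loop.
+ </p>

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper walking UserEditAtoms from newest to oldest.
class UserEditChain {
    static final int HEADER_SIZE = 8; // every record starts with an 8 byte header

    // Little endian 32 bit value at a position counted from the start of a record's data.
    static int dataInt(byte[] stream, int recordOffset, int dataPos) {
        return ByteBuffer.wrap(stream).order(ByteOrder.LITTLE_ENDIAN)
                .getInt(recordOffset + HEADER_SIZE + dataPos);
    }

    // currentEditOffset is the value stored at byte 8 of the CurrentUserAtom.
    // Returns the PersistPtrIncrementalBlock offsets, newest edit first.
    static List<Integer> persistPtrOffsets(byte[] stream, int currentEditOffset) {
        List<Integer> result = new ArrayList<>();
        Set<Integer> visited = new HashSet<>();
        int edit = currentEditOffset;
        while (visited.add(edit)) {
            result.add(dataInt(stream, edit, 12));   // PersistPtrIncrementalBlock offset
            int previous = dataInt(stream, edit, 8); // previous UserEditAtom, 0 = first one
            if (previous == 0) {
                return result;
            }
            edit = previous;
        }
        throw new IllegalStateException("UserEditAtom chain loops back to offset " + edit);
    }
}
```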
+ <p>
+ The PersistPtrIncrementalBlock contains byte offsets to all the
+ Slides, Notes, Documents and MasterSlides in the file. The first
+ PersistPtrIncrementalBlock will point to all the ones that
+ were present the first time the file was saved. Subsequent
+ PersistPtrIncrementalBlocks will contain pointers to all the ones
+ that were changed in that edit. To find the offset to a given
+ sheet in the latest version, start with the most recent
+ PersistPtrIncrementalBlock. If this knows about the sheet, use the
+ offset it has. If it doesn't, then work back through older
+ PersistPtrIncrementalBlocks until you find one which does, and
+ use that.
+ </p>
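+ <p>
+ The newest-first lookup can be sketched as follows, assuming each PersistPtrIncrementalBlock has
+ already been decoded into a map from sheet number to byte offset, and the maps are ordered from
+ the most recent edit to the oldest. PersistLookup is a hypothetical helper, not part of HSLF.
+ </p>

```java
import java.util.List;
import java.util.Map;
import java.util.OptionalInt;

// Hypothetical helper: finds the latest offset of a sheet across incremental saves.
class PersistLookup {
    static OptionalInt offsetOf(List<Map<Integer, Integer>> blocksNewestFirst, int sheet) {
        for (Map<Integer, Integer> block : blocksNewestFirst) {
            Integer offset = block.get(sheet);
            if (offset != null) {
                return OptionalInt.of(offset); // the newest block that knows the sheet wins
            }
        }
        return OptionalInt.empty(); // no block mentions this sheet
    }
}
```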
+ <p>
+ Each PersistPtrIncrementalBlock can contain a number of entry
+ blocks. Each block holds information on a sequence of sheets.
+ Each block starts with a 32 bit little endian integer. Once read
+ into memory, the lower 20 bits contain the starting number for the
+ sequence of sheets to be described. The higher 12 bits contain
+ the number of sheets described. Following that is
+ one 32 bit little endian integer for each sheet in the sequence,
+ the value being the offset to that sheet. If there is any data
+ left after parsing a block, then it corresponds to the next block.
+ </p>
+<source>
+hex on disk decimal description
+----------- ------- -----------
+0000 0 No options
+7217 6002 Record type is 6002
+2000 0000 32 Length of data is 32 bytes
+0100 5000 5242881 Count is 5 (12 highest bits)
+ Starting number is 1 (20 lowest bits)
+0000 0000 0 Sheet (1+0)=1 starts at offset 0
+900D 0000 3472 Sheet (1+1)=2 starts at offset 3472
+E403 0000 996 Sheet (1+2)=3 starts at offset 996
+9213 0000 5010 Sheet (1+3)=4 starts at offset 5010
+BE15 0000 5566 Sheet (1+4)=5 starts at offset 5566
+0900 1000 1048585 Count is 1 (12 highest bits)
+ Starting number is 9 (20 lowest bits)
+4418 0000 6212 Sheet (9+0)=9 starts at offset 6212
+</source>
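+ <p>
+ A sketch of decoding the data part of a PersistPtrIncrementalBlock (record header already
+ skipped) into sheet number and offset pairs, using the 20/12 bit split described above. Fed the
+ example bytes, it should yield offset 6212 (0x1844) for sheet 9. PersistPtrDecoder is a
+ hypothetical name; a truncated block makes ByteBuffer throw a BufferUnderflowException.
+ </p>

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical decoder for the entry blocks of a PersistPtrIncrementalBlock.
class PersistPtrDecoder {
    static Map<Integer, Integer> decode(byte[] data) {
        ByteBuffer buf = ByteBuffer.wrap(data).order(ByteOrder.LITTLE_ENDIAN);
        Map<Integer, Integer> offsets = new LinkedHashMap<>();
        while (buf.remaining() >= 4) {
            int info = buf.getInt();
            int start = info & 0xFFFFF; // lower 20 bits: first sheet number
            int count = info >>> 20;    // upper 12 bits: number of sheets that follow
            for (int i = 0; i < count; i++) {
                offsets.put(start + i, buf.getInt()); // one offset per sheet
            }
        }
        return offsets;
    }
}
```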
+ </section>
+
+ <section><title>Paragraph and Text Styling</title>
+ <p>
+ There are quite a number of records that affect the styling
+ of text, and a smaller number that are responsible for the
+ styling of paragraphs.
+ </p>
+ <p>
+ By default, a given set of text will inherit paragraph and text
+ stylings from the appropriate master sheet. If anything differs
+ from the master sheet, then appropriate styling records will
+ follow the text record.
+ </p>
+ <p>
+ <em>(We don't currently know enough about master sheet styling
+ to write about it)</em>
+ </p>
+ <p>
+ Normally, PowerPoint will have one text record (TextBytesAtom
+ or TextCharsAtom) for every paragraph, with a preceding
+ TextHeaderAtom to describe what sort of paragraph it is.
+ If any of the stylings differ from the master's, then a
+ StyleTextPropAtom will follow the text record. This contains
+ the paragraph style information, and the styling information
+ for each section of the text which has a different style.
+ (More on StyleTextPropAtom later)
+ </p>
+ <p>
+ For every font used, a FontEntityAtom must exist for that font.
+ The FontEntityAtoms live inside a FontCollection record, and
+ there's one of those inside the Environment record, which in turn lives inside the
+ Document record. <em>(More on Fonts to be discovered)</em>
+ </p>
+ </section>
+
+ <section><title>StyleTextPropAtom</title>
+ <p>
+ If the text or paragraph stylings for a given text record
+ differ from those of the appropriate master, then there will
+ be one of these records.
+ </p>
+ <p>
+ This record is made up of two lists of lists. Firstly,
+ there's a list of paragraph stylings - each made up of the
+ number of characters it applies to, followed by the matching
+ styling elements. Following that is the equivalent for
+ character stylings.
+ </p>
+ <p>
+ Each styling entry (in either list) starts with the number
+ of characters it applies to, stored in a 4 byte little
+ endian number. If it is a paragraph styling, it will be
+ followed by a 2 byte number (of unknown use). After this is
+ a four byte number, which is a mask indicating which stylings
+ will follow. You then have an entry for each of the stylings
+ indicated in the mask. Finally, you move onto the next set
+ of stylings.
+ </p>
+ <p>
+ Each styling has a specific mask flag to indicate its
+ presence. (The list may be found towards the top of
+ org.apache.poi.hslf.record.StyleTextPropAtom.java, and is
+ too long to sensibly include here). Each styling entry
+ occurs in the order of its mask value (so the one with mask
+ 1 comes first, followed by the next highest mask value).
+ Depending on the styling, it is either made up of a 2 byte
+ or 4 byte numeric value. The meaning of the value will
+ depend on the styling (e.g. for font.size, it is the font
+ size in points).
+ </p>
+ <p>
+ Some stylings are actually mask stylings. For these, the
+ value will be a 4 byte number. This is then processed as
+ a mask, to indicate a number of different sub-stylings.
+ The styling for bold/italic/underline is one such example.
+ </p>
+<source>
+hex on disk decimal description
+----------- ------- -----------
+
+0000 0 No options
+A10F 4001 Record type is 4001
+8000 0000 128 Length of data is 128 bytes
+1E00 0000 30 The paragraph styling applies to 30 characters
+0000 0 Paragraph options are 0
+0018 0000 6144 0x0800=Text Alignment, 0x1000=Line Spacing
+0000 0 Text Alignment = Left
+5000 80 Line Spacing = 80
+
+1C00 0000 28 The paragraph styling applies to 28 characters
+0000 0 Paragraph options are 0
+0010 0000 4096 0x1000=Line Spacing
+5000 80 Line Spacing = 80
+
+1900 0000 25 The paragraph styling applies to 25 characters
+0000 0 Paragraph options are 0
+0018 0000 6144 0x0800=Text Alignment, 0x1000=Line Spacing
+0200 2 Text Alignment = Right
+5000 80 Line Spacing = 80
+
+6100 0000 97 The paragraph styling applies to 97 characters
+ (includes final CR)
+0000 0 Paragraph options are 0
+0018 0000 6144 0x0800=Text Alignment, 0x1000=Line Spacing
+0000 0 Text Alignment = Left
+5000 80 Line Spacing = 80
+
+1E00 0000 30 The character styling applies to 30 characters
+0100 0200 131073 0x0001=Char Props Mask, 0x20000=Font Size
+0100 1 Char Props 0x0001=Bold
+1400 20 Font Size = 20
+
+1C00 0000 28 The character styling applies to 28 characters
+0200 0600 393218 0x0002=Char Props Mask, 0x20000=Font Size, 0x40000=Font Color
+0200 2 Char Props 0x0002=Italic
+1400 20 Font Size = 20
+0000 0005 83886080 Blue
+
+1900 0000 25 The character styling applies to 25 characters
+0000 0600 393216 0x20000=Font Size, 0x40000=Font Color
+1400 20 Font Size = 20
+FF33 00FE 4261426175 Red
+
+6000 0000 96 The character styling applies to 96 characters
+0400 0300 196612 0x0004=Char Props Mask, 0x10000=Font Index, 0x20000=Font Size
+0400 4 Char Props 0x0004=Underlined
+0100 1 Font Index = 1 (2nd Font in table)
+1800 24 Font Size = 24
+</source>
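+ <p>
+ The following sketch decodes the character styling list from the dump above, reading each
+ character count as 4 bytes as the dump shows. It is deliberately limited to the mask bits that
+ appear in that example: font index, font size, font color, and the shared Char Props field,
+ which it assumes is present whenever any of the low 16 mask bits are set. Anything else is
+ rejected; the full list of mask values is in StyleTextPropAtom.java. CharRunSketch is a
+ hypothetical name.
+ </p>

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified decoder for StyleTextPropAtom character runs.
class CharRunSketch {
    // Only the mask bits used in the example dump; real files use many more.
    static final int CHAR_PROPS = 0x0000FFFF; // bold/italic/underline... share one 2 byte field
    static final int FONT_INDEX = 0x00010000; // 2 bytes
    static final int FONT_SIZE = 0x00020000;  // 2 bytes
    static final int FONT_COLOR = 0x00040000; // 4 bytes
    static final int HANDLED = CHAR_PROPS | FONT_INDEX | FONT_SIZE | FONT_COLOR;

    static final class CharRun {
        int length;
        int charProps = -1; // -1 means "not present in this run"
        int fontIndex = -1;
        int fontSize = -1;
        long color = -1;
    }

    // charRuns: the character styling list only (no record header, no paragraph runs).
    static List<CharRun> decode(byte[] charRuns) {
        ByteBuffer buf = ByteBuffer.wrap(charRuns).order(ByteOrder.LITTLE_ENDIAN);
        List<CharRun> runs = new ArrayList<>();
        while (buf.remaining() >= 8) {
            CharRun run = new CharRun();
            run.length = buf.getInt(); // characters covered by this run
            int mask = buf.getInt();   // which entries follow
            if ((mask & ~HANDLED) != 0) {
                throw new IllegalArgumentException(
                        "mask bits not handled by this sketch: 0x" + Integer.toHexString(mask));
            }
            // entries appear in increasing mask order
            if ((mask & CHAR_PROPS) != 0) run.charProps = buf.getShort() & 0xFFFF;
            if ((mask & FONT_INDEX) != 0) run.fontIndex = buf.getShort() & 0xFFFF;
            if ((mask & FONT_SIZE) != 0) run.fontSize = buf.getShort() & 0xFFFF;
            if ((mask & FONT_COLOR) != 0) run.color = buf.getInt() & 0xFFFFFFFFL;
            runs.add(run);
        }
        return runs;
    }
}
```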
+ </section>
+
+ <section><title>Fonts in PowerPoint</title>
+ <p>
+ PowerPoint stores information about the fonts used in FontEntityAtoms,
+ which live inside Document.Environment.FontCollection. For every different
+ font used, a FontEntityAtom must exist for that font. There is always at
+ least one FontEntityAtom in Document.Environment.FontCollection,
+ which describes the default font.
+ </p>
+ </section>
+
+ <section><title>FontEntityAtom</title>
+ <p>
+ The instance field of the record header contains the zero based index of the
+ font. Font index entries in StyleTextPropAtoms will refer to their required
+ font via this index.
+ </p>
+ <p>
+ The length of FontEntityAtoms is always 68 bytes. The first 64 bytes of
+ it hold the typeface name of the font to be used. This is stored as
+ a null-terminated string, encoded as little endian Unicode (UTF-16LE). (The
+ length of the string must not exceed 32 characters including the null
+ termination, so the typeface name cannot exceed 31 characters).
+ </p>
+
+ <p>
+ After the typeface name there are 4 bytes of bitmask flags. The details of these
+ can be found in the Windows API, under the LOGFONT structure.
+ The 65th byte is the output precision, which defines how closely the system chosen
+ font must match the requested font, in terms of height, width, pitch etc.
+ The 66th byte is the clipping precision, which defines how to clip characters
+ that occur partly outside the clipping region.
+ The 67th byte is the output quality, which defines how closely the system
+ must match the logical font's attributes to those of the physical font used.
+ The 68th (and final) byte is the pitch and family, which is used by the
+ system when matching fonts.
+ </p>
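+ <p>
+ A minimal sketch of reading the typeface name from the 68 byte data part of a FontEntityAtom
+ (record header excluded). It scans whole 16 bit code units for the null terminator and returns
+ the four trailing bytes raw, leaving their interpretation to the description above.
+ FontEntitySketch is a hypothetical name, not an HSLF class.
+ </p>

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical reader for the fixed-size FontEntityAtom data.
class FontEntitySketch {
    static final int DATA_SIZE = 68;  // FontEntityAtom data length, header excluded
    static final int NAME_BYTES = 64; // 32 UTF-16LE code units, terminator included

    static String typeface(byte[] data) {
        checkSize(data);
        int end = 0;
        // stop at the first 16 bit null, looking only at whole code units
        while (end < NAME_BYTES && (data[end] != 0 || data[end + 1] != 0)) {
            end += 2;
        }
        return new String(data, 0, end, StandardCharsets.UTF_16LE);
    }

    // The 65th to 68th bytes, returned uninterpreted.
    static byte[] trailingBytes(byte[] data) {
        checkSize(data);
        return Arrays.copyOfRange(data, NAME_BYTES, DATA_SIZE);
    }

    private static void checkSize(byte[] data) {
        if (data.length != DATA_SIZE) {
            throw new IllegalArgumentException("expected " + DATA_SIZE + " bytes, got " + data.length);
        }
    }
}
```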
+ </section>
+ </body>
+</document>
diff --git a/src/documentation/content/xdocs/components/slideshow/ppt-wmf-emf-renderer.xml b/src/documentation/content/xdocs/components/slideshow/ppt-wmf-emf-renderer.xml
new file mode 100644
index 0000000000..7421db5733
--- /dev/null
+++ b/src/documentation/content/xdocs/components/slideshow/ppt-wmf-emf-renderer.xml
@@ -0,0 +1,209 @@
+<?xml version="1.0" encoding="UTF-8"?><!--
+ ====================================================================
+ Licensed to the Apache Software Foundation (ASF) under one or more
+ contributor license agreements. See the NOTICE file distributed with
+ this work for additional information regarding copyright ownership.
+ The ASF licenses this file to You under the Apache License, Version 2.0
+ (the "License"); you may not use this file except in compliance with
+ the License. You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+ Unless required by applicable law or agreed to in writing, software
+ distributed under the License is distributed on an "AS IS" BASIS,
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ See the License for the specific language governing permissions and
+ limitations under the License.
+ ====================================================================
+-->
+<!DOCTYPE document PUBLIC "-//APACHE//DTD Documentation V2.0//EN" "document-v20.dtd">
+
+<document>
+ <header>
+ <title>Rendering slideshows, WMF, EMF and EMF+</title>
+ </header>
+ <body>
+ <note>Please be aware that the documentation on this page reflects the current development, which might not
+ have been released. If you rely on an unreleased feature, either use a
+ <a href="site:download">nightly development build</a> or feel free to ask on the
+ <a href="site:mailinglists">mailing list</a> for the release schedule.</note>
+ <section>
+ <title>Rendering slideshows, WMF, EMF and EMF+</title>
+ <p>
+ For rendering slideshows (HSLF/XSLF) and WMF, EMF and EMF+ pictures, POI provides a utility class
+ <a href="https://github.com/apache/poi/tree/trunk/poi-ooxml/src/main/java/org/apache/poi/xslf/util/PPTX2PNG.java?view=markup">
+ PPTX2PNG</a>:
+ </p>
+
+ <source><![CDATA[
+ Usage: PPTX2PNG [options] <.ppt/.pptx/.emf/.wmf file or 'stdin'>
+
+ Options:
+ -scale <float> scale factor
+ -fixSide <side> specify side (long,short,width,height) to fix - use <scale> as amount of pixels
+ -slide <integer> 1-based index of a slide to render
+ -format <type> png,gif,jpg,svg,pdf (log,null for testing)
+ -outdir <dir> output directory, defaults to origin of the ppt/pptx file
+ -outfile <file> output filename, defaults to "${basename}-${slideno}.${format}"
+ -outpat <pattern> output filename pattern, defaults to "${basename}-${slideno}.${format}"
+ patterns: basename, slideno, format, ext
+ -dump <file> dump the annotated records to a file
+ -quiet do not write to console (for normal processing)
+ -ignoreParse ignore parsing error and continue with the records read until the error
+ -extractEmbedded extract embedded parts
+ -inputType <type> default input file type (OLE2,WMF,EMF), default is OLE2 = Powerpoint
+ some files (usually wmf) don't have a header, i.e. an identifiable file magic
+ -textAsShapes text elements are saved as shapes in SVG, necessary for variable spacing
+ often found in math formulas
+ -charset <cs> sets the default charset to be used, defaults to Windows-1252
+ -emfHeaderBounds force the usage of the emf header bounds to calculate the bounding box
+
+ -fontdir <dir> (PDF only) font directories separated by ";" - use $HOME for current users home dir
+ defaults to the usual plattform directories
+ -fontTtf <regex> (PDF only) regex to match the .ttf filenames
+ -fontMap <map> ";"-separated list of font mappings <typeface from>:<typeface to>
+ ]]>
+ </source>
+
+ <section>
+ <title>Instructions to run</title>
+ <p>
+ Download the <a href="https://ci-builds.apache.org/job/POI/job/POI-DSL-1.8/lastSuccessfulBuild/artifact/build/dist/">current nightly</a>
+ and for SVG/PDF the <a href="site:components/index/batikpdf">additional dependencies</a>.</p>
+ <p>Execute the java command (Unix paths need to be adapted for Windows; use "-charset" for non-western WMF/EMFs):</p>
+ <source>
+ java -cp poi-5.4.1.jar:poi-ooxml-5.4.1.jar:poi-ooxml-lite-5.4.1.jar:poi-scratchpad-5.4.1.jar:lib/*:ooxml-lib/*:auxiliary/* org.apache.poi.xslf.util.PPTX2PNG -format png -fixside long -scale 1000 -charset GBK file.pptx
+ </source>
+ <p>
+ If you want to use the renderer on the module path (JPMS), there are currently a few more steps necessary:
+ </p>
+ <ul>
+ <li>Create a build project using Maven, Gradle or your favorite build tool.</li>
+ <li>Alternatively, download the jars from https://repo1.maven.org/maven2/org/apache/poi/</li>
+ <li>Exclude poi-ooxml-full-5.4.1.jar, poi-javadoc-5.4.1.jar and auxiliary/xml-apis-1.4.01.jar (Java 11+) by moving them into a new subdirectory "unused"</li>
+ <li>Move all other jars in the current directory into a new subdirectory "poi"</li>
+ <li>Invoke PPTX2PNG:
+ <source>
+ java --module-path poi:lib:auxiliary:ooxml-lib --module org.apache.poi.ooxml/org.apache.poi.xslf.util.PPTX2PNG -format png -fixside long -scale 1000 file.pptx
+ </source>
+ </li>
+ </ul>
+ <note>
+ By default, JDK 1.8 uses the PiscesRenderingEngine and is affected by
+ <a href="https://github.com/AdoptOpenJDK/openjdk-build/issues/716">busy loop hangs</a>.
+ To work around this, use the MarlinRenderingEngine, which is provided experimentally starting with
+ <a href="https://bugs.java.com/bugdatabase/view_bug.do?bug_id=8143849">openjdk8u252 (JDK-8143849)</a>,
+ via <code>-Dsun.java2d.renderer=sun.java2d.marlin.MarlinRenderingEngine</code>, or, for older JDK builds,
+ <a href="https://github.com/bourgesl/marlin-renderer/wiki/How-to-use">preload the marlin jar</a>.
+ </note>
+ </section>
+
+ </section>
+ <section>
+ <title>Integrate rendering in your code</title>
+ <section>
+ <title>#1 - Use PPTX2PNG via file or stdin</title>
+ <p>For file system access, first save your slideshow/WMF/EMF/EMF+ file to disk and then call
+ <code>PPTX2PNG.main()</code> with the corresponding parameters.
+ </p>
+
+ <p>For stdin access, redirect <code>System.in</code> beforehand:
+ </p>
+ <source><![CDATA[
+ /* the file content */
+ InputStream is = ...;
+ /* Save and set System.in */
+ InputStream oldIn = System.in;
+ try {
+ System.setIn(is);
+
+ String[] args = {
+ "-format", "png", // png,gif,jpg,svg or null for test
+ "-outdir", new File("out/").getCanonicalPath(),
+ "-outfile", "export.png",
+ "-fixside", "long",
+ "-scale", "800",
+ "-ignoreParse",
+ "stdin"
+ };
+ PPTX2PNG.main(args);
+
+ } finally {
+ System.setIn(oldIn);
+ }
+ ]]></source>
+ </section>
+ <section>
+ <title>#2 - Render WMF / EMF / EMF+ via the *Picture classes</title>
+ <source><![CDATA[
+ File f = new File("santa.wmf");
+ try (FileInputStream fis = new FileInputStream(f)) {
+ // for WMF
+ HwmfPicture pic = new HwmfPicture(fis);
+ // for EMF / EMF+, use this instead (the stream can only be read once):
+ // HemfPicture pic = new HemfPicture(fis);
+
+ // picture size in points
+ Dimension2D dim = pic.getSize();
+ int width = Units.pointsToPixel(dim.getWidth());
+ int height = Units.pointsToPixel(dim.getHeight());
+ // limit the longer side to 1500 pixels, keeping the aspect ratio
+ double max = Math.max(width, height);
+ if (max > 1500) {
+ width *= 1500/max;
+ height *= 1500/max;
+ }
+
+ BufferedImage bufImg = new BufferedImage(width, height, BufferedImage.TYPE_INT_ARGB);
+ Graphics2D g = bufImg.createGraphics();
+ g.setRenderingHint(RenderingHints.KEY_ANTIALIASING, RenderingHints.VALUE_ANTIALIAS_ON);
+ g.setRenderingHint(RenderingHints.KEY_RENDERING, RenderingHints.VALUE_RENDER_QUALITY);
+ g.setRenderingHint(RenderingHints.KEY_INTERPOLATION, RenderingHints.VALUE_INTERPOLATION_BICUBIC);
+ g.setRenderingHint(RenderingHints.KEY_FRACTIONALMETRICS, RenderingHints.VALUE_FRACTIONALMETRICS_ON);
+
+ pic.draw(g, new Rectangle2D.Double(0,0,width,height));
+
+ g.dispose();
+
+ ImageIO.write(bufImg, "PNG", new File("santa.png"));
+ }
+ ]]>
+ </source>
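The size computation in the example above (points converted to pixels, then the longer side capped at 1500 pixels) can be checked on its own. The following is a plain-JDK sketch with a hypothetical class name; the 96 pixels / 72 points per inch factor and rounding to the nearest pixel are assumptions and may differ slightly from POI's `Units.pointsToPixel`.

```java
/**
 * Hypothetical helper isolating the sizing step used before drawing a WMF/EMF picture.
 * Assumes 72 points and 96 pixels per inch; rounding may differ from POI's Units class.
 */
public final class PictureSizing {
    static final double POINTS_PER_INCH = 72.0;
    static final double PIXELS_PER_INCH = 96.0;

    /** Converts a length in points to whole pixels, rounded to the nearest pixel. */
    static int pointsToPixels(double points) {
        return (int) Math.round(points * PIXELS_PER_INCH / POINTS_PER_INCH);
    }

    /** Returns {width, height} in pixels, scaled down so that neither side exceeds maxSide. */
    static int[] fit(double widthPt, double heightPt, int maxSide) {
        int width = pointsToPixels(widthPt);
        int height = pointsToPixels(heightPt);
        double max = Math.max(width, height);
        if (max > maxSide) {
            // the same factor for both sides keeps the aspect ratio
            double factor = maxSide / max;
            width = (int) (width * factor);
            height = (int) (height * factor);
        }
        return new int[] {width, height};
    }

    public static void main(String[] args) {
        int[] size = fit(1800, 900, 1500);
        System.out.println(size[0] + "x" + size[1]);
    }
}
```

Here a 1800 x 900 pt picture becomes 2400 x 1200 px and is then scaled by 0.625 to 1500 x 750 px; smaller pictures are left unscaled.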
+ </section>
+ <section>
+ <title>#3 - Render slideshows directly</title>
+ <source><![CDATA[
+ File file = new File("example.pptx");
+ double scale = 1.5;
+ try (SlideShow<?, ?> ss = SlideShowFactory.create(file, null, true)) {
+ Dimension pgsize = ss.getPageSize();
+ int width = (int) (pgsize.width * scale);
+ int height = (int) (pgsize.height * scale);
+
+ for (Slide<?, ?> slide : ss.getSlides()) {
+ BufferedImage img = new BufferedImage(width, height, BufferedImage.TYPE_INT_ARGB);
+ Graphics2D graphics = img.createGraphics();
+
+ // default rendering options
+ graphics.setRenderingHint(RenderingHints.KEY_ANTIALIASING, RenderingHints.VALUE_ANTIALIAS_ON);
+ graphics.setRenderingHint(RenderingHints.KEY_RENDERING, RenderingHints.VALUE_RENDER_QUALITY);
+ graphics.setRenderingHint(RenderingHints.KEY_INTERPOLATION, RenderingHints.VALUE_INTERPOLATION_BICUBIC);
+ graphics.setRenderingHint(RenderingHints.KEY_FRACTIONALMETRICS, RenderingHints.VALUE_FRACTIONALMETRICS_ON);
+ graphics.setRenderingHint(Drawable.BUFFERED_IMAGE, new WeakReference<>(img));
+
+ graphics.scale(scale, scale);
+
+ // draw stuff
+ slide.draw(graphics);
+
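+ graphics.dispose();
+ // one file per slide, otherwise each slide would overwrite the previous image
+ ImageIO.write(img, "PNG", new File("output-" + slide.getSlideNumber() + ".png"));
+ img.flush();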
+ }
+ }
+ ]]></source>
+ </section>
+ </section>
+ </body>
+</document> \ No newline at end of file
diff --git a/src/documentation/content/xdocs/components/slideshow/quick-guide.xml b/src/documentation/content/xdocs/components/slideshow/quick-guide.xml
new file mode 100644
index 0000000000..88d85d877c
--- /dev/null
+++ b/src/documentation/content/xdocs/components/slideshow/quick-guide.xml
@@ -0,0 +1,133 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<!--
+ ====================================================================
+ Licensed to the Apache Software Foundation (ASF) under one or more
+ contributor license agreements. See the NOTICE file distributed with
+ this work for additional information regarding copyright ownership.
+ The ASF licenses this file to You under the Apache License, Version 2.0
+ (the "License"); you may not use this file except in compliance with
+ the License. You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+ Unless required by applicable law or agreed to in writing, software
+ distributed under the License is distributed on an "AS IS" BASIS,
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ See the License for the specific language governing permissions and
+ limitations under the License.
+ ====================================================================
+-->
+<!DOCTYPE document PUBLIC "-//APACHE//DTD Documentation V2.0//EN" "document-v20.dtd">
+
+<document>
+ <header>
+ <title>POI-HSLF - A Quick Guide</title>
+ <subtitle>Overview</subtitle>
+ <authors>
+ <person name="Nick Burch" email="nick at torchbox dot com"/>
+ </authors>
+ </header>
+
+ <body>
+ <section><title>Basic Text Extraction</title>
+ <p>For basic text extraction, use
+ <code>org.apache.poi.sl.extractor.SlideShowExtractor</code>.
+ It accepts a slideshow, which can be created from a file or stream via <code>org.apache.poi.sl.usermodel.SlideShowFactory</code>.
+ Its <code>getText()</code> method returns the text of the slides.
+ </p>
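A minimal sketch of that flow, assuming poi, poi-scratchpad (for .ppt) and poi-ooxml (for .pptx) are on the classpath; the class and method names are illustrative only. The extractor is left to the slideshow's try-with-resources block rather than closed separately.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

import org.apache.poi.sl.extractor.SlideShowExtractor;
import org.apache.poi.sl.usermodel.SlideShow;
import org.apache.poi.sl.usermodel.SlideShowFactory;

public final class BasicTextExtraction {

    /** Opens a .ppt or .pptx stream and returns the text of all slides. */
    static String extractText(InputStream in) throws IOException {
        // SlideShowFactory detects the file format from the stream content
        try (SlideShow<?, ?> slideShow = SlideShowFactory.create(in)) {
            SlideShowExtractor<?, ?> extractor = new SlideShowExtractor<>(slideShow);
            return extractor.getText();
        }
    }

    public static void main(String[] args) throws IOException {
        try (InputStream in = new FileInputStream(args[0])) {
            System.out.println(extractText(in));
        }
    }
}
```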
+ </section>
+
+ <section><title>Specific Text Extraction</title>
+ <p>To get specific bits of text, first create a <code>org.apache.poi.hslf.usermodel.HSLFSlideShow</code>
+(from a <code>org.apache.poi.hslf.usermodel.HSLFSlideShowImpl</code>, which accepts a file or an input
+stream). Use <code>getSlides()</code> and <code>getNotes()</code> to get the slides and notes.
+These can be queried for their page ID (though they should already be returned
+in the right order).</p>
+ <p>You can then call <code>getTextParagraphs()</code> on these to get
+their blocks of text. (A list of <code>HSLFTextParagraph</code>s normally holds all the text in a
+given area of the page, e.g. in the title bar or in a box.)
+From the <code>HSLFTextParagraph</code>, you can extract the text and check
+what type of text it is (e.g. Body, Title). You can also call
+<code>getTextRuns()</code>, which returns the
+<code>HSLFTextRun</code>s that make up the <code>HSLFTextParagraph</code>. An
+<code>HSLFTextRun</code> is a text fragment with uniform character formatting.
+The paragraph formatting is defined in the parent <code>HSLFTextParagraph</code>.
+ </p>
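The walk described above (slides, then blocks of <code>HSLFTextParagraph</code>s, then <code>HSLFTextRun</code>s) might look like the following sketch. It assumes poi-scratchpad on the classpath; the class and method names are hypothetical, notes can be walked the same way via <code>getNotes()</code>, and the run type is printed as its raw numeric constant.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;

import org.apache.poi.hslf.usermodel.HSLFSlide;
import org.apache.poi.hslf.usermodel.HSLFSlideShow;
import org.apache.poi.hslf.usermodel.HSLFTextParagraph;
import org.apache.poi.hslf.usermodel.HSLFTextRun;

public final class SpecificTextExtraction {

    /** Returns one line per text block: slide number, run type and the block's text. */
    static List<String> describeSlides(HSLFSlideShow ppt) {
        List<String> lines = new ArrayList<>();
        for (HSLFSlide slide : ppt.getSlides()) {
            // each inner list is one block of text, e.g. a title or a text box
            for (List<HSLFTextParagraph> block : slide.getTextParagraphs()) {
                if (block.isEmpty()) {
                    continue;
                }
                // numeric run type of the block (e.g. title or body)
                int runType = block.get(0).getRunType();
                lines.add(slide.getSlideNumber() + " [" + runType + "] "
                        + HSLFTextParagraph.getText(block));
            }
        }
        return lines;
    }

    /** Counts bold runs, showing how character formatting is read per HSLFTextRun. */
    static int countBoldRuns(HSLFSlideShow ppt) {
        int bold = 0;
        for (HSLFSlide slide : ppt.getSlides()) {
            for (List<HSLFTextParagraph> block : slide.getTextParagraphs()) {
                for (HSLFTextParagraph paragraph : block) {
                    for (HSLFTextRun run : paragraph.getTextRuns()) {
                        if (run.isBold()) {
                            bold++;
                        }
                    }
                }
            }
        }
        return bold;
    }

    public static void main(String[] args) throws IOException {
        try (InputStream in = new FileInputStream(args[0]);
             HSLFSlideShow ppt = new HSLFSlideShow(in)) {
            describeSlides(ppt).forEach(System.out::println);
            System.out.println("bold runs: " + countBoldRuns(ppt));
        }
    }
}
```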
+ </section>
+
+ <section><title>Poor Quality Text Extraction</title>
+ <p>If speed is the most important thing for you, you don't care
+ about getting duplicate blocks of text, you don't care about
+ getting text from master sheets, and you don't care about getting
+ old text, then
+ <code>org.apache.poi.hslf.extractor.QuickButCruddyTextExtractor</code>
+ might be of use.</p>
+ <p>QuickButCruddyTextExtractor doesn't use the normal record
+ parsing code, instead it uses a tree structure blind search
+ method to get all text holding records. You will get all the text,
+ including lots of text you normally wouldn't ever want. However,
+ you will get it back very quickly!</p>
+ <p>There are two ways of getting the text back.
+ <code>getTextAsString()</code> will return a single string with all
+ the text in it. <code>getTextAsVector()</code> will return a
+ vector of strings, one for each text record found in the file.
+ </p>
+ </section>
+
+ <section><title>Changing Text</title>
+ <p>It is possible to change the text via
+ <code>HSLFTextParagraph.setText(List&lt;HSLFTextParagraph&gt;,String)</code> or
+ <code>HSLFTextRun.setText(String)</code>. Additional text runs can be added
+ with <code>HSLFTextParagraph.appendText(List&lt;HSLFTextParagraph&gt;,String,boolean)</code>
+ or <code>HSLFTextParagraph.addTextRun(HSLFTextRun)</code>.</p>
+ <p>When calling <code>HSLFTextParagraph.setText(List&lt;HSLFTextParagraph&gt;,String)</code>, all
+ the text will end up with the same formatting. When calling
+ <code>HSLFTextRun.setText(String)</code>, the text will retain
+ the old formatting of that <code>HSLFTextRun</code>.
+ </p>
+ </section>
+
+ <section><title>Adding Slides</title>
+ <p>You may add new slides by calling
+ <code>HSLFSlideShow.createSlide()</code>, which will add a new slide
+ to the end of the SlideShow. It is possible to re-order slides with <code>HSLFSlideShow.reorderSlide(...)</code>.
+ </p>
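A minimal sketch of adding a slide and moving it to the front, assuming poi-scratchpad on the classpath. The helper name is hypothetical, and the slide numbers passed to <code>reorderSlide</code> are assumed to be 1-based; check the HSLF javadoc for how the remaining slides are shifted when more than two slides are involved.

```java
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;

import org.apache.poi.hslf.usermodel.HSLFSlide;
import org.apache.poi.hslf.usermodel.HSLFSlideShow;

public final class AddAndReorderSlides {

    /** Appends a new slide, brings it to position 1 and returns it. */
    static HSLFSlide addSlideAtFront(HSLFSlideShow ppt) {
        // createSlide() appends the slide at the end of the slideshow
        HSLFSlide slide = ppt.createSlide();
        int last = ppt.getSlides().size();
        if (last > 1) {
            // assumed 1-based: move the last slide to position 1
            ppt.reorderSlide(last, 1);
        }
        return slide;
    }

    public static void main(String[] args) throws IOException {
        try (HSLFSlideShow ppt = new HSLFSlideShow()) {
            ppt.createSlide();      // becomes slide 1
            addSlideAtFront(ppt);   // the new slide takes position 1
            try (OutputStream out = new FileOutputStream("slides.ppt")) {
                ppt.write(out);
            }
        }
    }
}
```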
+ </section>
+
+ <section><title>Guide to key classes</title>
+ <ul>
+ <li><code>org.apache.poi.hslf.usermodel.HSLFSlideShowImpl</code>
+ Handles reading in and writing out files. Calls
+ <code>org.apache.poi.hslf.record.Record</code> to build a tree
+ of all the records in the file, which it allows access to.
+ </li>
+ <li><code>org.apache.poi.hslf.record.Record</code>
+ Base class of all records. Also provides the main record generation
+ code, which will build up a tree of records for a file.
+ </li>
+ <li><code>org.apache.poi.hslf.usermodel.HSLFSlideShow</code>
+ Builds up model entries from the records, and presents a user facing
+ view of the file
+ </li>
+ <li><code>org.apache.poi.hslf.usermodel.HSLFSlide</code>
+ A user facing view of a Slide in a slideshow. Allows you to get at the
+ Text of the slide, and at any drawing objects on it.
+ </li>
+ <li><code>org.apache.poi.hslf.usermodel.HSLFTextParagraph</code>
+ A list of <code>HSLFTextParagraph</code>s holds all the text in a given area of the Slide, and will
+ contain one or more <code>HSLFTextRun</code>s.
+ </li>
+ <li><code>org.apache.poi.hslf.usermodel.HSLFTextRun</code>
+ Holds a run of text, all having the same character styling. Both the text and its styling can be modified.
+ </li>
+ <li><code>org.apache.poi.sl.extractor.SlideShowExtractor</code>
+ Uses the model code to allow extraction of text from files
+ </li>
+ <li><code>org.apache.poi.hslf.extractor.QuickButCruddyTextExtractor</code>
+ Uses the record code to extract all the text from files very fast,
+ but also returns deleted text (and other bits of crud).
+ </li>
+ </ul>
+ </section>
+ </body>
+</document>
diff --git a/src/documentation/content/xdocs/components/slideshow/xslf-cookbook.xml b/src/documentation/content/xdocs/components/slideshow/xslf-cookbook.xml
new file mode 100644
index 0000000000..4f72295b5f
--- /dev/null
+++ b/src/documentation/content/xdocs/components/slideshow/xslf-cookbook.xml
@@ -0,0 +1,304 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<!--
+ ====================================================================
+ Licensed to the Apache Software Foundation (ASF) under one or more
+ contributor license agreements. See the NOTICE file distributed with
+ this work for additional information regarding copyright ownership.
+ The ASF licenses this file to You under the Apache License, Version 2.0
+ (the "License"); you may not use this file except in compliance with
+ the License. You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+ Unless required by applicable law or agreed to in writing, software
+ distributed under the License is distributed on an "AS IS" BASIS,
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ See the License for the specific language governing permissions and
+ limitations under the License.
+ ====================================================================
+-->
+<!DOCTYPE document PUBLIC "-//APACHE//DTD Documentation V2.0//EN" "document-v20.dtd">
+
+<document>
+ <header>
+ <title>XSLF Cookbook</title>
+ <authors>
+ <person email="yegor@apache.org" name="Yegor Kozlov" id="YK"/>
+ </authors>
+ </header>
+ <body>
+ <section><title>XSLF Cookbook</title>
+ <p>
+ This page offers a short introduction into the XSLF API. More examples can be found in the
+ <a href="https://github.com/apache/poi/tree/trunk/poi-examples/src/main/java/org/apache/poi/examples/xslf/">XSLF Examples</a>
+ in the POI Git repository.
+ </p>
+ <note>
+ Please note that XSLF is still in early development and is subject to incompatible changes in future releases.
+ </note>
+ <section><title>Index of Features</title>
+ <ul>
+ <li><a href="#NewPresentation">Create a new presentation</a></li>
+ <li><a href="#ReadPresentation">Read an existing presentation</a></li>
+ <li><a href="#SlideLayout">Create a slide with a predefined layout</a></li>
+ <li><a href="#DeleteSlide">Delete slide</a></li>
+ <li><a href="#MoveSlide">Re-order slides</a></li>
+ <li><a href="#SlideSize">Change slide size</a></li>
+ <li><a href="#GetShapes">Read shapes</a></li>
+ <li><a href="#AddImage">Add image</a></li>
+ <li><a href="#ReadImages">Read images contained in a presentation</a></li>
+ <li><a href="#Text">Format text</a></li>
+ <li><a href="#Hyperlinks">Hyperlinks</a></li>
+ <li><a href="#PPTX2PNG">Convert .pptx slides into images</a></li>
+ <li><a href="#Merge">Merge multiple presentations together</a></li>
+ </ul>
+ </section>
+ <section><title>Cookbook</title>
+ <anchor id="NewPresentation"/>
+ <section><title>New Presentation</title>
+ <p>
+ The following code creates a new .pptx slide show and adds a blank slide to it:
+ </p>
+ <source>
+ //create a new empty slide show
+ XMLSlideShow ppt = new XMLSlideShow();
+
+ //add first slide
+ XSLFSlide blankSlide = ppt.createSlide();
+ </source>
+ </section>
+ <anchor id="ReadPresentation"/>
+ <section><title>Read an existing presentation and append a slide to it</title>
+ <source>
+ XMLSlideShow ppt = new XMLSlideShow(new FileInputStream("slideshow.pptx"));
+
+ //append a new slide to the end
+ XSLFSlide blankSlide = ppt.createSlide();
+ </source>
+ </section>
+
+ <anchor id="SlideLayout"/>
+ <section><title>Create a new slide from a predefined slide layout</title>
+ <source>
+ XMLSlideShow ppt = new XMLSlideShow(new FileInputStream("slideshow.pptx"));
+
+ // first see what slide layouts are available :
+ System.out.println("Available slide layouts:");
+ for(XSLFSlideMaster master : ppt.getSlideMasters()){
+ for(XSLFSlideLayout layout : master.getSlideLayouts()){
+ System.out.println(layout.getType());
+ }
+ }
+
+ // blank slide
+ XSLFSlide blankSlide = ppt.createSlide();
+
+ // there can be multiple masters each referencing a number of layouts
+ // for demonstration purposes we use the first (default) slide master
+ XSLFSlideMaster defaultMaster = ppt.getSlideMasters().get(0);
+
+ // title slide
+ XSLFSlideLayout titleLayout = defaultMaster.getLayout(SlideLayout.TITLE);
+ // fill the placeholders
+ XSLFSlide slide1 = ppt.createSlide(titleLayout);
+ XSLFTextShape title1 = slide1.getPlaceholder(0);
+ title1.setText("First Title");
+
+ // title and content
+ XSLFSlideLayout titleBodyLayout = defaultMaster.getLayout(SlideLayout.TITLE_AND_CONTENT);
+ XSLFSlide slide2 = ppt.createSlide(titleBodyLayout);
+
+ XSLFTextShape title2 = slide2.getPlaceholder(0);
+ title2.setText("Second Title");
+
+ XSLFTextShape body2 = slide2.getPlaceholder(1);
+ body2.clearText(); // unset any existing text
+ body2.addNewTextParagraph().addNewTextRun().setText("First paragraph");
+ body2.addNewTextParagraph().addNewTextRun().setText("Second paragraph");
+ body2.addNewTextParagraph().addNewTextRun().setText("Third paragraph");
+ </source>
+ </section>
+
+ <anchor id="DeleteSlide"/>
+ <section><title>Delete slide</title>
+ <source>
+ XMLSlideShow ppt = new XMLSlideShow(new FileInputStream("slideshow.pptx"));
+
+ ppt.removeSlide(0); // 0-based index of a slide to be removed
+ </source>
+ </section>
+
+ <anchor id="MoveSlide"/>
+ <section><title>Re-order slides</title>
+ <source>
+ XMLSlideShow ppt = new XMLSlideShow(new FileInputStream("slideshow.pptx"));
+ List&lt;XSLFSlide&gt; slides = ppt.getSlides();
+
+ XSLFSlide thirdSlide = slides.get(2);
+ ppt.setSlideOrder(thirdSlide, 0); // move the third slide to the beginning
+ </source>
+ </section>
+
+ <anchor id="SlideSize"/>
+ <section><title>How to retrieve or change slide size</title>
+ <source>
+ XMLSlideShow ppt = new XMLSlideShow();
+ //retrieve page size. Coordinates are expressed in points (72 dpi)
+ java.awt.Dimension pgsize = ppt.getPageSize();
+ int pgx = pgsize.width; //slide width in points
+ int pgy = pgsize.height; //slide height in points
+
+ //set new page size
+ ppt.setPageSize(new java.awt.Dimension(1024, 768));
+ </source>
+ </section>
+ <anchor id="GetShapes"/>
+ <section><title>How to read shapes contained in a particular slide</title>
+ <p>
+ The following code demonstrates how to iterate over shapes for each slide.
+ </p>
+ <source>
+ XMLSlideShow ppt = new XMLSlideShow(new FileInputStream("slideshow.pptx"));
+ // get slides
+ for (XSLFSlide slide : ppt.getSlides()) {
+ for (XSLFShape sh : slide.getShapes()) {
+ // name of the shape
+ String name = sh.getShapeName();
+
+ // shapes's anchor which defines the position of this shape in the slide
+ if (sh instanceof PlaceableShape) {
+ java.awt.geom.Rectangle2D anchor = ((PlaceableShape)sh).getAnchor();
+ }
+
+ if (sh instanceof XSLFConnectorShape) {
+ XSLFConnectorShape line = (XSLFConnectorShape) sh;
+ // work with Line
+ } else if (sh instanceof XSLFTextShape) {
+ XSLFTextShape shape = (XSLFTextShape) sh;
+ // work with a shape that can hold text
+ } else if (sh instanceof XSLFPictureShape) {
+ XSLFPictureShape shape = (XSLFPictureShape) sh;
+ // work with Picture
+ }
+ }
+ }
+ </source>
+ </section>
+ <anchor id="AddImage"/>
+ <section><title>Add Image to Slide</title>
+ <source>
+ XMLSlideShow ppt = new XMLSlideShow();
+ XSLFSlide slide = ppt.createSlide();
+
+ byte[] pictureData = IOUtils.toByteArray(new FileInputStream("image.png"));
+
+ XSLFPictureData pd = ppt.addPicture(pictureData, PictureData.PictureType.PNG);
+ XSLFPictureShape pic = slide.createPicture(pd);
+ </source>
+ </section>
+
+ <anchor id="ReadImages"/>
+ <section><title>Read Images contained within a presentation</title>
+ <source>
+ XMLSlideShow ppt = new XMLSlideShow(new FileInputStream("slideshow.pptx"));
+ for(XSLFPictureData data : ppt.getAllPictures()){
+ byte[] bytes = data.getData();
+ String fileName = data.getFileName();
+ // process the picture bytes, e.g. write them to a file named fileName
+ }
+ </source>
+ </section>
+
+ <anchor id="Text"/>
+ <section><title>Basic text formatting</title>
+ <source>
+ XMLSlideShow ppt = new XMLSlideShow();
+ XSLFSlide slide = ppt.createSlide();
+
+ XSLFTextBox shape = slide.createTextBox();
+ XSLFTextParagraph p = shape.addNewTextParagraph();
+
+ XSLFTextRun r1 = p.addNewTextRun();
+ r1.setText("The");
+ r1.setFontColor(Color.blue);
+ r1.setFontSize(24.);
+
+ XSLFTextRun r2 = p.addNewTextRun();
+ r2.setText(" quick");
+ r2.setFontColor(Color.red);
+ r2.setBold(true);
+
+ XSLFTextRun r3 = p.addNewTextRun();
+ r3.setText(" brown");
+ r3.setFontSize(12.);
+ r3.setItalic(true);
+ r3.setStrikethrough(true);
+
+ XSLFTextRun r4 = p.addNewTextRun();
+ r4.setText(" fox");
+ r4.setUnderline(true);
+ </source>
+ </section>
+ <anchor id="Hyperlinks"/>
+ <section><title>How to create a hyperlink</title>
+ <source>
+ XMLSlideShow ppt = new XMLSlideShow();
+ XSLFSlide slide = ppt.createSlide();
+
+ // assign a hyperlink to a text run
+ XSLFTextBox shape = slide.createTextBox();
+ XSLFTextRun r = shape.addNewTextParagraph().addNewTextRun();
+ r.setText("Apache POI");
+ XSLFHyperlink link = r.createHyperlink();
+ link.setAddress("https://poi.apache.org");
+ </source>
+ </section>
+ <anchor id="PPTX2PNG"/>
+ <section><title>PPTX2PNG is an application that converts each slide of a .pptx slideshow into a PNG image</title>
+ <source>
+Usage: PPTX2PNG [options] &lt;pptx file&gt;
+Options:
+ -scale &lt;float&gt; scale factor (default is 1.0)
+ -slide &lt;integer&gt; 1-based index of a slide to render. Default is to render all slides.
+ </source>
+ <p>How it works:</p>
+ <p>
+ The XSLFSlide object implements a draw(Graphics2D graphics) method that recursively paints all shapes
+ in the slide into the supplied graphics canvas:
+ </p>
+ <source>
+ slide.draw(graphics);
+ </source>
+ <p>
+ where graphics is a class implementing java.awt.Graphics2D. In PPTX2PNG the graphics canvas is derived from
+ java.awt.image.BufferedImage, i.e. the destination is an image in memory, but in the general case you can pass
+ any compliant implementation of java.awt.Graphics2D.
+ More information, e.g. on how to render SVG images, can be found on the dedicated <a href="site:slrender">render page</a>.
+ </p>
+ </section>
+ <anchor id="Merge"/>
+ <section>
+ <title>Merge multiple presentations together</title>
+ <source>
+ XMLSlideShow ppt = new XMLSlideShow();
+ String[] inputs = {"presentation1.pptx", "presentation2.pptx"};
+ for (String arg : inputs) {
+ // close the input stream and the source slideshow once its slides are imported
+ try (FileInputStream is = new FileInputStream(arg);
+ XMLSlideShow src = new XMLSlideShow(is)) {
+ for (XSLFSlide srcSlide : src.getSlides()) {
+ ppt.createSlide().importContent(srcSlide);
+ }
+ }
+ }
+
+ try (FileOutputStream out = new FileOutputStream("merged.pptx")) {
+ ppt.write(out);
+ }
+ </source>
+ </section>
+
+ </section>
+ </section>
+ </body>
+</document>
diff --git a/src/documentation/content/xdocs/components/spreadsheet/chart.xml b/src/documentation/content/xdocs/components/spreadsheet/chart.xml
new file mode 100644
index 0000000000..8e4194af9a
--- /dev/null
+++ b/src/documentation/content/xdocs/components/spreadsheet/chart.xml
@@ -0,0 +1,1532 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<!--
+ ====================================================================
+ Licensed to the Apache Software Foundation (ASF) under one or more
+ contributor license agreements. See the NOTICE file distributed with
+ this work for additional information regarding copyright ownership.
+ The ASF licenses this file to You under the Apache License, Version 2.0
+ (the "License"); you may not use this file except in compliance with
+ the License. You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+ Unless required by applicable law or agreed to in writing, software
+ distributed under the License is distributed on an "AS IS" BASIS,
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ See the License for the specific language governing permissions and
+ limitations under the License.
+ ====================================================================
+-->
+<!DOCTYPE document PUBLIC "-//APACHE//DTD Documentation V2.0//EN" "document-v20.dtd">
+
+<document>
+ <header>
+ <title>Chart record information</title>
+ <authors>
+ <person email="user@poi.apache.org" name="Glen Stampoultzis" id="GS"/>
+ </authors>
+ </header>
+ <body>
+ <section><title>Introduction</title>
+ <p>
+ This document is intended as a work in progress for describing
+ our current understanding of how the chart records are
+ written to produce a valid chart.
+ </p>
+ </section>
+ <section><title>Bar chart</title>
+ <p>
+ The following records detail the records written for a
+ 'simple' bar chart.
+ </p>
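Each record in the dumps below is listed with its record type (<code>rectype</code>), payload size (<code>recsize</code>) and payload bytes only. Assuming the usual BIFF8 layout, the payload in the file is preceded by a 4-byte little-endian header (2 bytes type, 2 bytes size). Several chart records hold 32-bit values that read naturally as 16.16 fixed point; for example, PLOTGROWTH's 0x00010000 would be 1.0. The following plain-JDK sketch (hypothetical class and method names, not POI API) decodes the CHART record shown below. Treating its coordinates as 16.16 fixed-point values in points follows Microsoft's [MS-XLS] description and is an assumption here, not something this page states.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public final class BiffRecordSketch {

    /** Reads the 4-byte record header at offset: {record type, payload size}. */
    static int[] readHeader(byte[] data, int offset) {
        ByteBuffer buf = ByteBuffer.wrap(data, offset, 4).order(ByteOrder.LITTLE_ENDIAN);
        int type = buf.getShort() & 0xFFFF; // unsigned 16-bit record id
        int size = buf.getShort() & 0xFFFF; // unsigned 16-bit payload length
        return new int[] {type, size};
    }

    /** Reads a little-endian signed 32-bit integer at offset. */
    static int readIntLE(byte[] data, int offset) {
        return ByteBuffer.wrap(data, offset, 4).order(ByteOrder.LITTLE_ENDIAN).getInt();
    }

    /** Interprets a 32-bit value as 16.16 fixed point (integer part in the high word). */
    static double fixed16_16(int raw) {
        return raw / 65536.0;
    }

    public static void main(String[] args) {
        // CHART record (0x1002): 4-byte header followed by the 16-byte payload from the dump
        byte[] chart = {
            0x02, 0x10, 0x10, 0x00,
            0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
            0x58, 0x66, (byte) 0xD0, 0x01, 0x40, 0x66, 0x22, 0x01
        };
        int[] header = readHeader(chart, 0);
        System.out.printf("type=0x%04X size=%d%n", header[0], header[1]);
        // width and height are the third and fourth 32-bit fields of the payload
        double width = fixed16_16(readIntLE(chart, 4 + 8));
        double height = fixed16_16(readIntLE(chart, 4 + 12));
        System.out.printf("width=%.2f, height=%.2f%n", width, height);
    }
}
```

Under that reading, the width 0x01D06658 of the CHART record below comes to about 464.40 and the height 0x01226640 to about 290.40.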
+ <source>
+
+ ============================================
+ rectype = 0xec, recsize = 0xc8
+ -BEGIN DUMP---------------------------------
+ 00000000 0F 00 02 F0 C0 00 00 00 10 00 08 F0 08 00 00 00 ................
+ 00000010 02 00 00 00 02 04 00 00 0F 00 03 F0 A8 00 00 00 ................
+ 00000020 0F 00 04 F0 28 00 00 00 01 00 09 F0 10 00 00 00 ....(...........
+ 00000030 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
+ 00000040 02 00 0A F0 08 00 00 00 00 04 00 00 05 00 00 00 ................
+ 00000050 0F 00 04 F0 70 00 00 00 92 0C 0A F0 08 00 00 00 ....p...........
+ 00000060 02 04 00 00 00 0A 00 00 93 00 0B F0 36 00 00 00 ............6...
+ 00000070 7F 00 04 01 04 01 BF 00 08 00 08 00 81 01 4E 00 ..............N.
+ 00000080 00 08 83 01 4D 00 00 08 BF 01 10 00 11 00 C0 01 ....M...........
+ 00000090 4D 00 00 08 FF 01 08 00 08 00 3F 02 00 00 02 00 M.........?.....
+ 000000A0 BF 03 00 00 08 00 00 00 10 F0 12 00 00 00 00 00 ................
+ 000000B0 04 00 C0 02 0A 00 F4 00 0E 00 66 01 20 00 E9 00 ..........f. ...
+ 000000C0 00 00 11 F0 00 00 00 00 ........
+ -END DUMP-----------------------------------
+ recordid = 0xec, size =200
+ [UNKNOWN RECORD:ec]
+ .id = ec
+ [/UNKNOWN RECORD]
+
+ ============================================
+ rectype = 0x5d, recsize = 0x1a
+ -BEGIN DUMP---------------------------------
+ 00000000 15 00 12 00 05 00 02 00 11 60 00 00 00 00 B8 03 .........`......
+ 00000010 87 03 00 00 00 00 00 00 00 00 ..........
+ -END DUMP-----------------------------------
+ recordid = 0x5d, size =26
+ [UNKNOWN RECORD:5d]
+ .id = 5d
+ [/UNKNOWN RECORD]
+
+ ============================================
+ rectype = 0x809, recsize = 0x10
+ -BEGIN DUMP---------------------------------
+ 00000000 00 06 20 00 FE 1C CD 07 C9 40 00 00 06 01 00 00 .. ......@......
+ -END DUMP-----------------------------------
+ recordid = 0x809, size =16
+ [BOF RECORD]
+ .version = 600
+ .type = 20
+ .build = 1cfe
+ .buildyear = 1997
+ .history = 40c9
+ .requiredversion = 106
+ [/BOF RECORD]
+
+ ============================================
+ rectype = 0x14, recsize = 0x0
+ -BEGIN DUMP---------------------------------
+ **NO RECORD DATA**
+ -END DUMP-----------------------------------
+ recordid = 0x14, size =0
+ [HEADER]
+ .length = 0
+ .header = null
+ [/HEADER]
+
+ ============================================
+ rectype = 0x15, recsize = 0x0
+ -BEGIN DUMP---------------------------------
+ **NO RECORD DATA**
+ -END DUMP-----------------------------------
+ recordid = 0x15, size =0
+ [FOOTER]
+ .footerlen = 0
+ .footer = null
+ [/FOOTER]
+
+ ============================================
+ rectype = 0x83, recsize = 0x2
+ -BEGIN DUMP---------------------------------
+ 00000000 00 00 ..
+ -END DUMP-----------------------------------
+ recordid = 0x83, size =2
+ [HCENTER]
+ .hcenter = false
+ [/HCENTER]
+
+ ============================================
+ rectype = 0x84, recsize = 0x2
+ -BEGIN DUMP---------------------------------
+ 00000000 00 00 ..
+ -END DUMP-----------------------------------
+ recordid = 0x84, size =2
+ [VCENTER]
+ .vcenter = false
+ [/VCENTER]
+
+ ============================================
+ rectype = 0xa1, recsize = 0x22
+ -BEGIN DUMP---------------------------------
+ 00000000 00 00 12 00 01 00 01 00 01 00 04 00 00 00 B8 03 ................
+ 00000010 00 00 00 00 00 00 E0 3F 00 00 00 00 00 00 E0 3F .......?.......?
+ 00000020 0F 00 ..
+ -END DUMP-----------------------------------
+ recordid = 0xa1, size =34
+ [PRINTSETUP]
+ .papersize = 0
+ .scale = 18
+ .pagestart = 1
+ .fitwidth = 1
+ .fitheight = 1
+ .options = 4
+ .ltor = false
+ .landscape = false
+ .valid = true
+ .mono = false
+ .draft = false
+ .notes = false
+ .noOrientat = false
+ .usepage = false
+ .hresolution = 0
+ .vresolution = 952
+ .headermargin = 0.5
+ .footermargin = 0.5
+ .copies = 15
+ [/PRINTSETUP]
+
+ <!-- Comment to avoid forrest bug -->
+ ============================================
+ rectype = 0x33, recsize = 0x2
+ -BEGIN DUMP---------------------------------
+ 00000000 03 00 ..
+ -END DUMP-----------------------------------
+ recordid = 0x33, size =2
+ [UNKNOWN RECORD:33]
+ .id = 33
+ [/UNKNOWN RECORD]
+
+ ============================================
+ rectype = 0x1060, recsize = 0xa
+ -BEGIN DUMP---------------------------------
+ 00000000 A0 23 08 16 C8 00 00 00 05 00 .#........
+ -END DUMP-----------------------------------
+ recordid = 0x1060, size =10
+ [FBI]
+ .xBasis = 0x23A0 (9120 )
+ .yBasis = 0x1608 (5640 )
+ .heightBasis = 0x00C8 (200 )
+ .scale = 0x0000 (0 )
+ .indexToFontTable = 0x0005 (5 )
+ [/FBI]
+
+ ============================================
+ rectype = 0x1060, recsize = 0xa
+ -BEGIN DUMP---------------------------------
+ 00000000 A0 23 08 16 C8 00 01 00 06 00 .#........
+ -END DUMP-----------------------------------
+ recordid = 0x1060, size =10
+ [FBI]
+ .xBasis = 0x23A0 (9120 )
+ .yBasis = 0x1608 (5640 )
+ .heightBasis = 0x00C8 (200 )
+ .scale = 0x0001 (1 )
+ .indexToFontTable = 0x0006 (6 )
+ [/FBI]
+
+ ============================================
+ rectype = 0x12, recsize = 0x2
+ -BEGIN DUMP---------------------------------
+ 00000000 00 00 ..
+ -END DUMP-----------------------------------
+ recordid = 0x12, size =2
+ [PROTECT]
+ .rowheight = 0
+ [/PROTECT]
+
+ ============================================
+ Offset 0xf22 (3874)
+ rectype = 0x1001, recsize = 0x2
+ -BEGIN DUMP---------------------------------
+ 00000000 00 00 ..
+ -END DUMP-----------------------------------
+ recordid = 0x1001, size =2
+ [UNITS]
+ .units = 0x0000 (0 )
+ [/UNITS]
+
+ ============================================
+ Offset 0xf28 (3880)
+ rectype = 0x1002, recsize = 0x10
+ -BEGIN DUMP---------------------------------
+ 00000000 00 00 00 00 00 00 00 00 58 66 D0 01 40 66 22 01 ........Xf..@f".
+ -END DUMP-----------------------------------
+ recordid = 0x1002, size =16
+ [CHART]
+ .x = 0x00000000 (0 )
+ .y = 0x00000000 (0 )
+ .width = 0x01D06658 (30434904 )
+ .height = 0x01226640 (19031616 )
+ [/CHART]
+
+ ============================================
+ Offset 0xf3c (3900)
+ rectype = 0x1033, recsize = 0x0
+ -BEGIN DUMP---------------------------------
+ **NO RECORD DATA**
+ -END DUMP-----------------------------------
+ recordid = 0x1033, size =0
+ [BEGIN]
+ [/BEGIN]
+
+ ============================================
+ Offset 0xf40 (3904)
+ rectype = 0xa0, recsize = 0x4
+ -BEGIN DUMP---------------------------------
+ 00000000 01 00 01 00 ....
+ -END DUMP-----------------------------------
+ recordid = 0xa0, size =4
+ [SCL]
+ .numerator = 0x0001 (1 )
+ .denominator = 0x0001 (1 )
+ [/SCL]
+
+ <!-- Comment to avoid forrest bug -->
+ ============================================
+ Offset 0xf48 (3912)
+ rectype = 0x1064, recsize = 0x8
+ -BEGIN DUMP---------------------------------
+ 00000000 00 00 01 00 00 00 01 00 ........
+ -END DUMP-----------------------------------
+ recordid = 0x1064, size =8
+ [PLOTGROWTH]
+ .horizontalScale = 0x00010000 (65536 )
+ .verticalScale = 0x00010000 (65536 )
+ [/PLOTGROWTH]
+
+ ============================================
+ Offset 0xf54 (3924)
+ rectype = 0x1032, recsize = 0x4
+ -BEGIN DUMP---------------------------------
+ 00000000 00 00 02 00 ....
+ -END DUMP-----------------------------------
+ recordid = 0x1032, size =4
+ [FRAME]
+ .borderType = 0x0000 (0 )
+ .options = 0x0002 (2 )
+ .autoSize = false
+ .autoPosition = true
+ [/FRAME]
+
+ ============================================
+ Offset 0xf5c (3932)
+ rectype = 0x1033, recsize = 0x0
+ -BEGIN DUMP---------------------------------
+ **NO RECORD DATA**
+ -END DUMP-----------------------------------
+ recordid = 0x1033, size =0
+ [BEGIN]
+ [/BEGIN]
+
+ ============================================
+ Offset 0xf60 (3936)
+ rectype = 0x1007, recsize = 0xc
+ -BEGIN DUMP---------------------------------
+ 00000000 00 00 00 00 00 00 FF FF 09 00 4D 00 ..........M.
+ -END DUMP-----------------------------------
+ recordid = 0x1007, size =12
+ [LINEFORMAT]
+ .lineColor = 0x00000000 (0 )
+ .linePattern = 0x0000 (0 )
+ .weight = 0xFFFF (-1 )
+ .format = 0x0009 (9 )
+ .auto = true
+ .drawTicks = false
+ .unknown = false
+ .colourPaletteIndex = 0x004D (77 )
+ [/LINEFORMAT]
+
+ ============================================
+ Offset 0xf70 (3952)
+ rectype = 0x100a, recsize = 0x10
+ -BEGIN DUMP---------------------------------
+ 00000000 FF FF FF 00 00 00 00 00 01 00 01 00 4E 00 4D 00 ............N.M.
+ -END DUMP-----------------------------------
+ recordid = 0x100a, size =16
+ [AREAFORMAT]
+ .foregroundColor = 0x00FFFFFF (16777215 )
+ .backgroundColor = 0x00000000 (0 )
+ .pattern = 0x0001 (1 )
+ .formatFlags = 0x0001 (1 )
+ .automatic = true
+ .invert = false
+ .forecolorIndex = 0x004E (78 )
+ .backcolorIndex = 0x004D (77 )
+ [/AREAFORMAT]
+
+ ============================================
+ Offset 0xf84 (3972)
+ rectype = 0x1034, recsize = 0x0
+ -BEGIN DUMP---------------------------------
+ **NO RECORD DATA**
+ -END DUMP-----------------------------------
+ recordid = 0x1034, size =0
+ [END]
+ [/END]
+
+ ============================================
+ Offset 0xf88 (3976)
+ rectype = 0x1003, recsize = 0xc
+ -BEGIN DUMP---------------------------------
+ 00000000 01 00 01 00 20 00 1F 00 01 00 00 00 .... .......
+ -END DUMP-----------------------------------
+ recordid = 0x1003, size =12
+ [SERIES]
+ .categoryDataType = 0x0001 (1 )
+ .valuesDataType = 0x0001 (1 )
+ .numCategories = 0x0020 (32 )
+ .numValues = 0x001F (31 )
+ .bubbleSeriesType = 0x0001 (1 )
+ .numBubbleValues = 0x0000 (0 )
+ [/SERIES]
+
+ ============================================
+ Offset 0xf98 (3992)
+ rectype = 0x1033, recsize = 0x0
+ -BEGIN DUMP---------------------------------
+ **NO RECORD DATA**
+ -END DUMP-----------------------------------
+ recordid = 0x1033, size =0
+ [BEGIN]
+ [/BEGIN]
+
+ <!-- Comment to avoid forrest bug -->
+ ============================================
+ Offset 0xf9c (3996)
+ rectype = 0x1051, recsize = 0x8
+ -BEGIN DUMP---------------------------------
+ 00000000 00 01 00 00 00 00 00 00 ........
+ -END DUMP-----------------------------------
+ recordid = 0x1051, size =8
+ [AI]
+ .linkType = 0x00 (0 )
+ .referenceType = 0x01 (1 )
+ .options = 0x0000 (0 )
+ .customNumberFormat = false
+ .indexNumberFmtRecord = 0x0000 (0 )
+ .formulaOfLink = (org.apache.poi.hssf.record.LinkedDataFormulaField@1ee3914 )
+ [/AI]
+
+ ============================================
+ Offset 0xfa8 (4008)
+ rectype = 0x1051, recsize = 0x13
+ -BEGIN DUMP---------------------------------
+ 00000000 01 02 00 00 00 00 0B 00 3B 00 00 00 00 1E 00 01 ........;.......
+ 00000010 00 01 00 ...
+ -END DUMP-----------------------------------
+ recordid = 0x1051, size =19
+ [AI]
+ .linkType = 0x01 (1 )
+ .referenceType = 0x02 (2 )
+ .options = 0x0000 (0 )
+ .customNumberFormat = false
+ .indexNumberFmtRecord = 0x0000 (0 )
+ .formulaOfLink = (org.apache.poi.hssf.record.LinkedDataFormulaField@e5855a )
+ [/AI]
+
+ ============================================
+ Offset 0xfbf (4031)
+ rectype = 0x1051, recsize = 0x13
+ -BEGIN DUMP---------------------------------
+ 00000000 02 02 00 00 69 01 0B 00 3B 00 00 00 00 1F 00 00 ....i...;.......
+ 00000010 00 00 00 ...
+ -END DUMP-----------------------------------
+ recordid = 0x1051, size =19
+ [AI]
+ .linkType = 0x02 (2 )
+ .referenceType = 0x02 (2 )
+ .options = 0x0000 (0 )
+ .customNumberFormat = false
+ .indexNumberFmtRecord = 0x0169 (361 )
+ .formulaOfLink = (org.apache.poi.hssf.record.LinkedDataFormulaField@95fd19 )
+ [/AI]
+
+ ============================================
+ Offset 0xfd6 (4054)
+ rectype = 0x1051, recsize = 0x8
+ -BEGIN DUMP---------------------------------
+ 00000000 03 01 00 00 00 00 00 00 ........
+ -END DUMP-----------------------------------
+ recordid = 0x1051, size =8
+ [AI]
+ .linkType = 0x03 (3 )
+ .referenceType = 0x01 (1 )
+ .options = 0x0000 (0 )
+ .customNumberFormat = false
+ .indexNumberFmtRecord = 0x0000 (0 )
+ .formulaOfLink = (org.apache.poi.hssf.record.LinkedDataFormulaField@11b9fb1 )
+ [/AI]
+
+ ============================================
+ Offset 0xfe2 (4066)
+ rectype = 0x1006, recsize = 0x8
+ -BEGIN DUMP---------------------------------
+ 00000000 FF FF 00 00 00 00 00 00 ........
+ -END DUMP-----------------------------------
+ recordid = 0x1006, size =8
+ [DATAFORMAT]
+ .pointNumber = 0xFFFF (-1 )
+ .seriesIndex = 0x0000 (0 )
+ .seriesNumber = 0x0000 (0 )
+ .formatFlags = 0x0000 (0 )
+ .useExcel4Colors = false
+ [/DATAFORMAT]
+
+ ============================================
+ Offset 0xfee (4078)
+ rectype = 0x1033, recsize = 0x0
+ -BEGIN DUMP---------------------------------
+ **NO RECORD DATA**
+ -END DUMP-----------------------------------
+ recordid = 0x1033, size =0
+ [BEGIN]
+ [/BEGIN]
+
+ ============================================
+ Offset 0xff2 (4082)
+ rectype = 0x105f, recsize = 0x2
+ -BEGIN DUMP---------------------------------
+ 00000000 00 00 ..
+ -END DUMP-----------------------------------
+ recordid = 0x105f, size =2
+ [UNKNOWN RECORD]
+ .id = 105f
+ [/UNKNOWN RECORD]
+
+ ============================================
+ Offset 0xff8 (4088)
+ rectype = 0x1034, recsize = 0x0
+ -BEGIN DUMP---------------------------------
+ **NO RECORD DATA**
+ -END DUMP-----------------------------------
+ recordid = 0x1034, size =0
+ [END]
+ [/END]
+
+ ============================================
+ Offset 0xffc (4092)
+ rectype = 0x1045, recsize = 0x2
+ -BEGIN DUMP---------------------------------
+ 00000000 00 00 ..
+ -END DUMP-----------------------------------
+ recordid = 0x1045, size =2
+ [SeriesToChartGroup]
+ .chartGroupIndex = 0x0000 (0 )
+ [/SeriesToChartGroup]
+
+ ============================================
+ Offset 0x1002 (4098)
+ rectype = 0x1034, recsize = 0x0
+ -BEGIN DUMP---------------------------------
+ **NO RECORD DATA**
+ -END DUMP-----------------------------------
+ recordid = 0x1034, size =0
+ [END]
+ [/END]
+
+ ============================================
+ Offset 0x1006 (4102)
+ rectype = 0x1044, recsize = 0x4
+ -BEGIN DUMP---------------------------------
+ 00000000 0A 00 00 00 ....
+ -END DUMP-----------------------------------
+ recordid = 0x1044, size =4
+ [SHTPROPS]
+ .flags = 0x000A (10 )
+ .chartTypeManuallyFormatted = false
+ .plotVisibleOnly = true
+ .doNotSizeWithWindow = false
+ .defaultPlotDimensions = true
+ .autoPlotArea = false
+ .empty = 0x00 (0 )
+ [/SHTPROPS]
+
+ ============================================
+ Offset 0x100e (4110)
+ rectype = 0x1024, recsize = 0x2
+ -BEGIN DUMP---------------------------------
+ 00000000 02 00 ..
+ -END DUMP-----------------------------------
+ recordid = 0x1024, size =2
+ [DEFAULTTEXT]
+ .categoryDataType = 0x0002 (2 )
+ [/DEFAULTTEXT]
+
+ ============================================
+ Offset 0x1014 (4116)
+ rectype = 0x1025, recsize = 0x20
+ -BEGIN DUMP---------------------------------
+ 00000000 02 02 01 00 00 00 00 00 DB FF FF FF C4 FF FF FF ................
+ 00000010 00 00 00 00 00 00 00 00 B1 00 4D 00 50 2B 00 00 ..........M.P+..
+ -END DUMP-----------------------------------
+ recordid = 0x1025, size =32
+ [TEXT]
+ .horizontalAlignment = 0x02 (2 )
+ .verticalAlignment = 0x02 (2 )
+ .displayMode = 0x0001 (1 )
+ .rgbColor = 0x00000000 (0 )
+ .x = 0xFFFFFFDB (-37 )
+ .y = 0xFFFFFFC4 (-60 )
+ .width = 0x00000000 (0 )
+ .height = 0x00000000 (0 )
+ .options1 = 0x00B1 (177 )
+ .autoColor = true
+ .showKey = false
+ .showValue = false
+ .vertical = false
+ .autoGeneratedText = true
+ .generated = true
+ .autoLabelDeleted = false
+ .autoBackground = true
+ .rotation = 0
+ .showCategoryLabelAsPercentage = false
+ .showValueAsPercentage = false
+ .showBubbleSizes = false
+ .showLabel = false
+ .indexOfColorValue = 0x004D (77 )
+ .options2 = 0x2B50 (11088 )
+ .dataLabelPlacement = 0
+ .textRotation = 0x0000 (0 )
+ [/TEXT]
+
+ ============================================
+ Offset 0x1038 (4152)
+ rectype = 0x1033, recsize = 0x0
+ -BEGIN DUMP---------------------------------
+ **NO RECORD DATA**
+ -END DUMP-----------------------------------
+ recordid = 0x1033, size =0
+ [BEGIN]
+ [/BEGIN]
+
+ <!-- Comment to avoid forrest bug -->
+ ============================================
+ Offset 0x103c (4156)
+ rectype = 0x104f, recsize = 0x14
+ -BEGIN DUMP---------------------------------
+ 00000000 02 00 02 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
+ 00000010 00 00 00 00 ....
+ -END DUMP-----------------------------------
+ recordid = 0x104f, size =20
+ [UNKNOWN RECORD]
+ .id = 104f
+ [/UNKNOWN RECORD]
+
+ ============================================
+ Offset 0x1054 (4180)
+ rectype = 0x1026, recsize = 0x2
+ -BEGIN DUMP---------------------------------
+ 00000000 05 00 ..
+ -END DUMP-----------------------------------
+ recordid = 0x1026, size =2
+ [FONTX]
+ .fontIndex = 0x0005 (5 )
+ [/FONTX]
+
+ ============================================
+ Offset 0x105a (4186)
+ rectype = 0x1051, recsize = 0x8
+ -BEGIN DUMP---------------------------------
+ 00000000 00 01 00 00 00 00 00 00 ........
+ -END DUMP-----------------------------------
+ recordid = 0x1051, size =8
+ [AI]
+ .linkType = 0x00 (0 )
+ .referenceType = 0x01 (1 )
+ .options = 0x0000 (0 )
+ .customNumberFormat = false
+ .indexNumberFmtRecord = 0x0000 (0 )
+ .formulaOfLink = (org.apache.poi.hssf.record.LinkedDataFormulaField@913fe2 )
+ [/AI]
+
+ ============================================
+ Offset 0x1066 (4198)
+ rectype = 0x1034, recsize = 0x0
+ -BEGIN DUMP---------------------------------
+ **NO RECORD DATA**
+ -END DUMP-----------------------------------
+ recordid = 0x1034, size =0
+ [END]
+ [/END]
+
+ ============================================
+ Offset 0x106a (4202)
+ rectype = 0x1024, recsize = 0x2
+ -BEGIN DUMP---------------------------------
+ 00000000 03 00 ..
+ -END DUMP-----------------------------------
+ recordid = 0x1024, size =2
+ [DEFAULTTEXT]
+ .categoryDataType = 0x0003 (3 )
+ [/DEFAULTTEXT]
+
+ ============================================
+ Offset 0x1070 (4208)
+ rectype = 0x1025, recsize = 0x20
+ -BEGIN DUMP---------------------------------
+ 00000000 02 02 01 00 00 00 00 00 DB FF FF FF C4 FF FF FF ................
+ 00000010 00 00 00 00 00 00 00 00 B1 00 4D 00 50 2B 00 00 ..........M.P+..
+ -END DUMP-----------------------------------
+ recordid = 0x1025, size =32
+ [TEXT]
+ .horizontalAlignment = 0x02 (2 )
+ .verticalAlignment = 0x02 (2 )
+ .displayMode = 0x0001 (1 )
+ .rgbColor = 0x00000000 (0 )
+ .x = 0xFFFFFFDB (-37 )
+ .y = 0xFFFFFFC4 (-60 )
+ .width = 0x00000000 (0 )
+ .height = 0x00000000 (0 )
+ .options1 = 0x00B1 (177 )
+ .autoColor = true
+ .showKey = false
+ .showValue = false
+ .vertical = false
+ .autoGeneratedText = true
+ .generated = true
+ .autoLabelDeleted = false
+ .autoBackground = true
+ .rotation = 0
+ .showCategoryLabelAsPercentage = false
+ .showValueAsPercentage = false
+ .showBubbleSizes = false
+ .showLabel = false
+ .indexOfColorValue = 0x004D (77 )
+ .options2 = 0x2B50 (11088 )
+ .dataLabelPlacement = 0
+ .textRotation = 0x0000 (0 )
+ [/TEXT]
+
+ ============================================
+ Offset 0x1094 (4244)
+ rectype = 0x1033, recsize = 0x0
+ -BEGIN DUMP---------------------------------
+ **NO RECORD DATA**
+ -END DUMP-----------------------------------
+ recordid = 0x1033, size =0
+ [BEGIN]
+ [/BEGIN]
+
+ ============================================
+ Offset 0x1098 (4248)
+ rectype = 0x104f, recsize = 0x14
+ -BEGIN DUMP---------------------------------
+ 00000000 02 00 02 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
+ 00000010 00 00 00 00 ....
+ -END DUMP-----------------------------------
+ recordid = 0x104f, size =20
+ [UNKNOWN RECORD]
+ .id = 104f
+ [/UNKNOWN RECORD]
+
+ ============================================
+ Offset 0x10b0 (4272)
+ rectype = 0x1026, recsize = 0x2
+ -BEGIN DUMP---------------------------------
+ 00000000 06 00 ..
+ -END DUMP-----------------------------------
+ recordid = 0x1026, size =2
+ [FONTX]
+ .fontIndex = 0x0006 (6 )
+ [/FONTX]
+
+ ============================================
+ Offset 0x10b6 (4278)
+ rectype = 0x1051, recsize = 0x8
+ -BEGIN DUMP---------------------------------
+ 00000000 00 01 00 00 00 00 00 00 ........
+ -END DUMP-----------------------------------
+ recordid = 0x1051, size =8
+ [AI]
+ .linkType = 0x00 (0 )
+ .referenceType = 0x01 (1 )
+ .options = 0x0000 (0 )
+ .customNumberFormat = false
+ .indexNumberFmtRecord = 0x0000 (0 )
+ .formulaOfLink = (org.apache.poi.hssf.record.LinkedDataFormulaField@1f934ad )
+ [/AI]
+
+ ============================================
+ Offset 0x10c2 (4290)
+ rectype = 0x1034, recsize = 0x0
+ -BEGIN DUMP---------------------------------
+ **NO RECORD DATA**
+ -END DUMP-----------------------------------
+ recordid = 0x1034, size =0
+ [END]
+ [/END]
+
+ ============================================
+ Offset 0x10c6 (4294)
+ rectype = 0x1046, recsize = 0x2
+ -BEGIN DUMP---------------------------------
+ 00000000 01 00 ..
+ -END DUMP-----------------------------------
+ recordid = 0x1046, size =2
+ [AXISUSED]
+ .numAxis = 0x0001 (1 )
+ [/AXISUSED]
+
+ ============================================
+ Offset 0x10cc (4300)
+ rectype = 0x1041, recsize = 0x12
+ -BEGIN DUMP---------------------------------
+ 00000000 00 00 DF 01 00 00 DD 00 00 00 B3 0B 00 00 56 0B ..............V.
+ 00000010 00 00 ..
+ -END DUMP-----------------------------------
+ recordid = 0x1041, size =18
+ [AXISPARENT]
+ .axisType = 0x0000 (0 )
+ .x = 0x000001DF (479 )
+ .y = 0x000000DD (221 )
+ .width = 0x00000BB3 (2995 )
+ .height = 0x00000B56 (2902 )
+ [/AXISPARENT]
+
+ ============================================
+ Offset 0x10e2 (4322)
+ rectype = 0x1033, recsize = 0x0
+ -BEGIN DUMP---------------------------------
+ **NO RECORD DATA**
+ -END DUMP-----------------------------------
+ recordid = 0x1033, size =0
+ [BEGIN]
+ [/BEGIN]
+
+ ============================================
+ Offset 0x10e6 (4326)
+ rectype = 0x104f, recsize = 0x14
+ -BEGIN DUMP---------------------------------
+ 00000000 02 00 02 00 3A 00 00 00 5E 00 00 00 58 0D 00 00 ....:...^...X...
+ 00000010 E5 0E 00 00 ....
+ -END DUMP-----------------------------------
+ recordid = 0x104f, size =20
+ [UNKNOWN RECORD]
+ .id = 104f
+ [/UNKNOWN RECORD]
+
+ ============================================
+ Offset 0x10fe (4350)
+ rectype = 0x101d, recsize = 0x12
+ -BEGIN DUMP---------------------------------
+ 00000000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
+ 00000010 00 00 ..
+ -END DUMP-----------------------------------
+ recordid = 0x101d, size =18
+ [AXIS]
+ .axisType = 0x0000 (0 )
+ .reserved1 = 0x00000000 (0 )
+ .reserved2 = 0x00000000 (0 )
+ .reserved3 = 0x00000000 (0 )
+ .reserved4 = 0x00000000 (0 )
+ [/AXIS]
+
+ ============================================
+ Offset 0x1114 (4372)
+ rectype = 0x1033, recsize = 0x0
+ -BEGIN DUMP---------------------------------
+ **NO RECORD DATA**
+ -END DUMP-----------------------------------
+ recordid = 0x1033, size =0
+ [BEGIN]
+ [/BEGIN]
+
+ ============================================
+ Offset 0x1118 (4376)
+ rectype = 0x1020, recsize = 0x8
+ -BEGIN DUMP---------------------------------
+ 00000000 01 00 01 00 01 00 01 00 ........
+ -END DUMP-----------------------------------
+ recordid = 0x1020, size =8
+ [CATSERRANGE]
+ .crossingPoint = 0x0001 (1 )
+ .labelFrequency = 0x0001 (1 )
+ .tickMarkFrequency = 0x0001 (1 )
+ .options = 0x0001 (1 )
+ .valueAxisCrossing = true
+ .crossesFarRight = false
+ .reversed = false
+ [/CATSERRANGE]
+
+ ============================================
+ Offset 0x1124 (4388)
+ rectype = 0x1062, recsize = 0x12
+ -BEGIN DUMP---------------------------------
+ 00000000 1C 90 39 90 02 00 00 00 01 00 00 00 00 00 1C 90 ..9.............
+ 00000010 FF 00 ..
+ -END DUMP-----------------------------------
+ recordid = 0x1062, size =18
+ [AXCEXT]
+ .minimumCategory = 0x901C (-28644 )
+ .maximumCategory = 0x9039 (-28615 )
+ .majorUnitValue = 0x0002 (2 )
+ .majorUnit = 0x0000 (0 )
+ .minorUnitValue = 0x0001 (1 )
+ .minorUnit = 0x0000 (0 )
+ .baseUnit = 0x0000 (0 )
+ .crossingPoint = 0x901C (-28644 )
+ .options = 0x00FF (255 )
+ .defaultMinimum = true
+ .defaultMaximum = true
+ .defaultMajor = true
+ .defaultMinorUnit = true
+ .isDate = true
+ .defaultBase = true
+ .defaultCross = true
+ .defaultDateSettings = true
+ [/AXCEXT]
+
+ ============================================
+ Offset 0x113a (4410)
+ rectype = 0x101e, recsize = 0x1e
+ -BEGIN DUMP---------------------------------
+ 00000000 02 00 03 01 00 00 00 00 00 00 00 00 00 00 00 00 ................
+ 00000010 00 00 00 00 00 00 00 00 23 00 4D 00 2D 00 ........#.M.-.
+ -END DUMP-----------------------------------
+ recordid = 0x101e, size =30
+ [TICK]
+ .majorTickType = 0x02 (2 )
+ .minorTickType = 0x00 (0 )
+ .labelPosition = 0x03 (3 )
+ .background = 0x01 (1 )
+ .labelColorRgb = 0x00000000 (0 )
+ .zero1 = 0x0000 (0 )
+ .zero2 = 0x0000 (0 )
+ .options = 0x0023 (35 )
+ .autoTextColor = true
+ .autoTextBackground = true
+ .rotation = 0
+ .autorotate = true
+ .tickColor = 0x004D (77 )
+ .zero3 = 0x002D (45 )
+ [/TICK]
+
+ ============================================
+ Offset 0x115c (4444)
+ rectype = 0x1034, recsize = 0x0
+ -BEGIN DUMP---------------------------------
+ **NO RECORD DATA**
+ -END DUMP-----------------------------------
+ recordid = 0x1034, size =0
+ [END]
+ [/END]
+
+ ============================================
+ Offset 0x1160 (4448)
+ rectype = 0x101d, recsize = 0x12
+ -BEGIN DUMP---------------------------------
+ 00000000 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
+ 00000010 00 00 ..
+ -END DUMP-----------------------------------
+ recordid = 0x101d, size =18
+ [AXIS]
+ .axisType = 0x0001 (1 )
+ .reserved1 = 0x00000000 (0 )
+ .reserved2 = 0x00000000 (0 )
+ .reserved3 = 0x00000000 (0 )
+ .reserved4 = 0x00000000 (0 )
+ [/AXIS]
+
+ <!-- Comment to avoid forrest bug -->
+ ============================================
+ Offset 0x1176 (4470)
+ rectype = 0x1033, recsize = 0x0
+ -BEGIN DUMP---------------------------------
+ **NO RECORD DATA**
+ -END DUMP-----------------------------------
+ recordid = 0x1033, size =0
+ [BEGIN]
+ [/BEGIN]
+
+ ============================================
+ Offset 0x117a (4474)
+ rectype = 0x101f, recsize = 0x2a
+ -BEGIN DUMP---------------------------------
+ 00000000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
+ 00000010 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
+ 00000020 00 00 00 00 00 00 00 00 1F 01 ..........
+ -END DUMP-----------------------------------
+ recordid = 0x101f, size =42
+ [VALUERANGE]
+ .minimumAxisValue = (0.0 )
+ .maximumAxisValue = (0.0 )
+ .majorIncrement = (0.0 )
+ .minorIncrement = (0.0 )
+ .categoryAxisCross = (0.0 )
+ .options = 0x011F (287 )
+ .automaticMinimum = true
+ .automaticMaximum = true
+ .automaticMajor = true
+ .automaticMinor = true
+ .automaticCategoryCrossing = true
+ .logarithmicScale = false
+ .valuesInReverse = false
+ .crossCategoryAxisAtMaximum = false
+ .reserved = true
+ [/VALUERANGE]
+
+ ============================================
+ Offset 0x11a8 (4520)
+ rectype = 0x101e, recsize = 0x1e
+ -BEGIN DUMP---------------------------------
+ 00000000 02 00 03 01 00 00 00 00 00 00 00 00 00 00 00 00 ................
+ 00000010 00 00 00 00 00 00 00 00 23 00 4D 00 00 00 ........#.M...
+ -END DUMP-----------------------------------
+ recordid = 0x101e, size =30
+ [TICK]
+ .majorTickType = 0x02 (2 )
+ .minorTickType = 0x00 (0 )
+ .labelPosition = 0x03 (3 )
+ .background = 0x01 (1 )
+ .labelColorRgb = 0x00000000 (0 )
+ .zero1 = 0x0000 (0 )
+ .zero2 = 0x0000 (0 )
+ .options = 0x0023 (35 )
+ .autoTextColor = true
+ .autoTextBackground = true
+ .rotation = 0
+ .autorotate = true
+ .tickColor = 0x004D (77 )
+ .zero3 = 0x0000 (0 )
+ [/TICK]
+
+ ============================================
+ Offset 0x11ca (4554)
+ rectype = 0x1021, recsize = 0x2
+ -BEGIN DUMP---------------------------------
+ 00000000 01 00 ..
+ -END DUMP-----------------------------------
+ recordid = 0x1021, size =2
+ [AXISLINEFORMAT]
+ .axisType = 0x0001 (1 )
+ [/AXISLINEFORMAT]
+
+ ============================================
+ Offset 0x11d0 (4560)
+ rectype = 0x1007, recsize = 0xc
+ -BEGIN DUMP---------------------------------
+ 00000000 00 00 00 00 00 00 FF FF 09 00 4D 00 ..........M.
+ -END DUMP-----------------------------------
+ recordid = 0x1007, size =12
+ [LINEFORMAT]
+ .lineColor = 0x00000000 (0 )
+ .linePattern = 0x0000 (0 )
+ .weight = 0xFFFF (-1 )
+ .format = 0x0009 (9 )
+ .auto = true
+ .drawTicks = false
+ .unknown = false
+ .colourPaletteIndex = 0x004D (77 )
+ [/LINEFORMAT]
+
+ ============================================
+ Offset 0x11e0 (4576)
+ rectype = 0x1034, recsize = 0x0
+ -BEGIN DUMP---------------------------------
+ **NO RECORD DATA**
+ -END DUMP-----------------------------------
+ recordid = 0x1034, size =0
+ [END]
+ [/END]
+
+ ============================================
+ Offset 0x11e4 (4580)
+ rectype = 0x1035, recsize = 0x0
+ -BEGIN DUMP---------------------------------
+ **NO RECORD DATA**
+ -END DUMP-----------------------------------
+ recordid = 0x1035, size =0
+ [PLOTAREA]
+ [/PLOTAREA]
+
+ ============================================
+ Offset 0x11e8 (4584)
+ rectype = 0x1032, recsize = 0x4
+ -BEGIN DUMP---------------------------------
+ 00000000 00 00 03 00 ....
+ -END DUMP-----------------------------------
+ recordid = 0x1032, size =4
+ [FRAME]
+ .borderType = 0x0000 (0 )
+ .options = 0x0003 (3 )
+ .autoSize = true
+ .autoPosition = true
+ [/FRAME]
+
+ ============================================
+ Offset 0x11f0 (4592)
+ rectype = 0x1033, recsize = 0x0
+ -BEGIN DUMP---------------------------------
+ **NO RECORD DATA**
+ -END DUMP-----------------------------------
+ recordid = 0x1033, size =0
+ [BEGIN]
+ [/BEGIN]
+
+ ============================================
+ Offset 0x11f4 (4596)
+ rectype = 0x1007, recsize = 0xc
+ -BEGIN DUMP---------------------------------
+ 00000000 80 80 80 00 00 00 00 00 00 00 17 00 ............
+ -END DUMP-----------------------------------
+ recordid = 0x1007, size =12
+ [LINEFORMAT]
+ .lineColor = 0x00808080 (8421504 )
+ .linePattern = 0x0000 (0 )
+ .weight = 0x0000 (0 )
+ .format = 0x0000 (0 )
+ .auto = false
+ .drawTicks = false
+ .unknown = false
+ .colourPaletteIndex = 0x0017 (23 )
+ [/LINEFORMAT]
+
+ ============================================
+ Offset 0x1204 (4612)
+ rectype = 0x100a, recsize = 0x10
+ -BEGIN DUMP---------------------------------
+ 00000000 C0 C0 C0 00 00 00 00 00 01 00 00 00 16 00 4F 00 ..............O.
+ -END DUMP-----------------------------------
+ recordid = 0x100a, size =16
+ [AREAFORMAT]
+ .foregroundColor = 0x00C0C0C0 (12632256 )
+ .backgroundColor = 0x00000000 (0 )
+ .pattern = 0x0001 (1 )
+ .formatFlags = 0x0000 (0 )
+ .automatic = false
+ .invert = false
+ .forecolorIndex = 0x0016 (22 )
+ .backcolorIndex = 0x004F (79 )
+ [/AREAFORMAT]
+
+ ============================================
+ Offset 0x1218 (4632)
+ rectype = 0x1034, recsize = 0x0
+ -BEGIN DUMP---------------------------------
+ **NO RECORD DATA**
+ -END DUMP-----------------------------------
+ recordid = 0x1034, size =0
+ [END]
+ [/END]
+
+ ============================================
+ Offset 0x121c (4636)
+ rectype = 0x1014, recsize = 0x14
+ -BEGIN DUMP---------------------------------
+ 00000000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
+ 00000010 00 00 00 00 ....
+ -END DUMP-----------------------------------
+ recordid = 0x1014, size =20
+ [CHARTFORMAT]
+ .xPosition = 0
+ .yPosition = 0
+ .width = 0
+ .height = 0
+ .grBit = 0
+ [/CHARTFORMAT]
+
+ ============================================
+ Offset 0x1234 (4660)
+ rectype = 0x1033, recsize = 0x0
+ -BEGIN DUMP---------------------------------
+ **NO RECORD DATA**
+ -END DUMP-----------------------------------
+ recordid = 0x1033, size =0
+ [BEGIN]
+ [/BEGIN]
+
+ ============================================
+ Offset 0x1238 (4664)
+ rectype = 0x1017, recsize = 0x6
+ -BEGIN DUMP---------------------------------
+ 00000000 00 00 96 00 00 00 ......
+ -END DUMP-----------------------------------
+ recordid = 0x1017, size =6
+ [BAR]
+ .barSpace = 0x0000 (0 )
+ .categorySpace = 0x0096 (150 )
+ .formatFlags = 0x0000 (0 )
+ .horizontal = false
+ .stacked = false
+ .displayAsPercentage = false
+ .shadow = false
+ [/BAR]
+
+ ============================================
+ Offset 0x1242 (4674)
+ rectype = 0x1022, recsize = 0xa
+ -BEGIN DUMP---------------------------------
+ 00000000 00 00 00 00 00 00 00 00 0F 00 ..........
+ -END DUMP-----------------------------------
+ recordid = 0x1022, size =10
+ [UNKNOWN RECORD]
+ .id = 1022
+ [/UNKNOWN RECORD]
+
+ ============================================
+ Offset 0x1250 (4688)
+ rectype = 0x1015, recsize = 0x14
+ -BEGIN DUMP---------------------------------
+ 00000000 D6 0D 00 00 1E 06 00 00 B5 01 00 00 D5 00 00 00 ................
+ 00000010 03 01 1F 00 ....
+ -END DUMP-----------------------------------
+ recordid = 0x1015, size =20
+ [LEGEND]
+ .xAxisUpperLeft = 0x00000DD6 (3542 )
+ .yAxisUpperLeft = 0x0000061E (1566 )
+ .xSize = 0x000001B5 (437 )
+ .ySize = 0x000000D5 (213 )
+ .type = 0x03 (3 )
+ .spacing = 0x01 (1 )
+ .options = 0x001F (31 )
+ .autoPosition = true
+ .autoSeries = true
+ .autoXPositioning = true
+ .autoYPositioning = true
+ .vertical = true
+ .dataTable = false
+ [/LEGEND]
+
+ ============================================
+ Offset 0x1268 (4712)
+ rectype = 0x1033, recsize = 0x0
+ -BEGIN DUMP---------------------------------
+ **NO RECORD DATA**
+ -END DUMP-----------------------------------
+ recordid = 0x1033, size =0
+ [BEGIN]
+ [/BEGIN]
+
+ ============================================
+ Offset 0x126c (4716)
+ rectype = 0x104f, recsize = 0x14
+ -BEGIN DUMP---------------------------------
+ 00000000 05 00 02 00 D6 0D 00 00 1E 06 00 00 00 00 00 00 ................
+ 00000010 00 00 00 00 ....
+ -END DUMP-----------------------------------
+ recordid = 0x104f, size =20
+ [UNKNOWN RECORD]
+ .id = 104f
+ [/UNKNOWN RECORD]
+
+ ============================================
+ Offset 0x1284 (4740)
+ rectype = 0x1025, recsize = 0x20
+ -BEGIN DUMP---------------------------------
+ 00000000 02 02 01 00 00 00 00 00 DB FF FF FF C4 FF FF FF ................
+ 00000010 00 00 00 00 00 00 00 00 B1 00 4D 00 70 37 00 00 ..........M.p7..
+ -END DUMP-----------------------------------
+ recordid = 0x1025, size =32
+ [TEXT]
+ .horizontalAlignment = 0x02 (2 )
+ .verticalAlignment = 0x02 (2 )
+ .displayMode = 0x0001 (1 )
+ .rgbColor = 0x00000000 (0 )
+ .x = 0xFFFFFFDB (-37 )
+ .y = 0xFFFFFFC4 (-60 )
+ .width = 0x00000000 (0 )
+ .height = 0x00000000 (0 )
+ .options1 = 0x00B1 (177 )
+ .autoColor = true
+ .showKey = false
+ .showValue = false
+ .vertical = false
+ .autoGeneratedText = true
+ .generated = true
+ .autoLabelDeleted = false
+ .autoBackground = true
+ .rotation = 0
+ .showCategoryLabelAsPercentage = false
+ .showValueAsPercentage = false
+ .showBubbleSizes = false
+ .showLabel = false
+ .indexOfColorValue = 0x004D (77 )
+ .options2 = 0x3770 (14192 )
+ .dataLabelPlacement = 0
+ .textRotation = 0x0000 (0 )
+ [/TEXT]
+
+ ============================================
+ Offset 0x12a8 (4776)
+ rectype = 0x1033, recsize = 0x0
+ -BEGIN DUMP---------------------------------
+ **NO RECORD DATA**
+ -END DUMP-----------------------------------
+ recordid = 0x1033, size =0
+ [BEGIN]
+ [/BEGIN]
+
+ ============================================
+ Offset 0x12ac (4780)
+ rectype = 0x104f, recsize = 0x14
+ -BEGIN DUMP---------------------------------
+ 00000000 02 00 02 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
+ 00000010 00 00 00 00 ....
+ -END DUMP-----------------------------------
+ recordid = 0x104f, size =20
+ [UNKNOWN RECORD]
+ .id = 104f
+ [/UNKNOWN RECORD]
+
+ ============================================
+ Offset 0x12c4 (4804)
+ rectype = 0x1051, recsize = 0x8
+ -BEGIN DUMP---------------------------------
+ 00000000 00 01 00 00 00 00 00 00 ........
+ -END DUMP-----------------------------------
+ recordid = 0x1051, size =8
+ [AI]
+ .linkType = 0x00 (0 )
+ .referenceType = 0x01 (1 )
+ .options = 0x0000 (0 )
+ .customNumberFormat = false
+ .indexNumberFmtRecord = 0x0000 (0 )
+ .formulaOfLink = (org.apache.poi.hssf.record.LinkedDataFormulaField@1d05c81 )
+ [/AI]
+
+ ============================================
+ Offset 0x12d0 (4816)
+ rectype = 0x1034, recsize = 0x0
+ -BEGIN DUMP---------------------------------
+ **NO RECORD DATA**
+ -END DUMP-----------------------------------
+ recordid = 0x1034, size =0
+ [END]
+ [/END]
+
+ <!-- Comment to avoid forrest bug -->
+ ============================================
+ Offset 0x12d4 (4820)
+ rectype = 0x1034, recsize = 0x0
+ -BEGIN DUMP---------------------------------
+ **NO RECORD DATA**
+ -END DUMP-----------------------------------
+ recordid = 0x1034, size =0
+ [END]
+ [/END]
+
+ ============================================
+ Offset 0x12d8 (4824)
+ rectype = 0x1034, recsize = 0x0
+ -BEGIN DUMP---------------------------------
+ **NO RECORD DATA**
+ -END DUMP-----------------------------------
+ recordid = 0x1034, size =0
+ [END]
+ [/END]
+
+ ============================================
+ Offset 0x12dc (4828)
+ rectype = 0x1034, recsize = 0x0
+ -BEGIN DUMP---------------------------------
+ **NO RECORD DATA**
+ -END DUMP-----------------------------------
+ recordid = 0x1034, size =0
+ [END]
+ [/END]
+
+ ============================================
+ Offset 0x12e0 (4832)
+ rectype = 0x1034, recsize = 0x0
+ -BEGIN DUMP---------------------------------
+ **NO RECORD DATA**
+ -END DUMP-----------------------------------
+ recordid = 0x1034, size =0
+ [END]
+ [/END]
+
+ ============================================
+ rectype = 0x200, recsize = 0xe
+ -BEGIN DUMP---------------------------------
+ 00000000 00 00 00 00 1F 00 00 00 00 00 01 00 00 00 ..............
+ -END DUMP-----------------------------------
+ recordid = 0x200, size =14
+ [DIMENSIONS]
+ .firstrow = 0
+ .lastrow = 1f
+ .firstcol = 0
+ .lastcol = 1
+ .zero = 0
+ [/DIMENSIONS]
+
+ ============================================
+ rectype = 0x1065, recsize = 0x2
+ -BEGIN DUMP---------------------------------
+ 00000000 02 00 ..
+ -END DUMP-----------------------------------
+ recordid = 0x1065, size =2
+ [SINDEX]
+ .index = 0x0002 (2 )
+ [/SINDEX]
+
+ ============================================
+ rectype = 0x1065, recsize = 0x2
+ -BEGIN DUMP---------------------------------
+ 00000000 01 00 ..
+ -END DUMP-----------------------------------
+ recordid = 0x1065, size =2
+ [SINDEX]
+ .index = 0x0001 (1 )
+ [/SINDEX]
+
+ ============================================
+ rectype = 0x1065, recsize = 0x2
+ -BEGIN DUMP---------------------------------
+ 00000000 03 00 ..
+ -END DUMP-----------------------------------
+ recordid = 0x1065, size =2
+ [SINDEX]
+ .index = 0x0003 (3 )
+ [/SINDEX]
+
+ ============================================
+ rectype = 0xa, recsize = 0x0
+ -BEGIN DUMP---------------------------------
+ **NO RECORD DATA**
+ -END DUMP-----------------------------------
+ recordid = 0xa, size =0
+ [EOF]
+ [/EOF]
+
+
+ </source>
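+ <p>
+ The following sketch decodes a few of the fields in the hex dump above by hand,
+ to show how the raw bytes map to the decoded values. It relies on what the dump
+ itself shows: each record starts with a 2-byte type and a 2-byte length, both
+ little-endian, followed by the record body. It also assumes that the CHART and
+ PLOTGROWTH values are 16.16 fixed-point numbers with the integer part in the
+ high 16 bits. This appears to be how the Excel binary format documentation
+ describes them, and it matches PLOTGROWTH's 0x00010000 meaning a scale of 1.
+ ChartRecordDecode and its methods are hypothetical illustrations, not part of
+ the POI API.
+ </p>
+ <source>
+import java.nio.ByteBuffer;
+import java.nio.ByteOrder;
+import java.util.Locale;
+
+// Hypothetical helper for reading the dump above; not part of the POI API.
+class ChartRecordDecode {
+    static final int SID_BEGIN = 0x1033;
+    static final int SID_END = 0x1034;
+
+    // Assumption: CHART and PLOTGROWTH fields are 16.16 fixed point,
+    // i.e. the integer part is in the high 16 bits of a 32-bit value.
+    static double fixed1616(int raw) {
+        return raw / 65536.0;
+    }
+
+    // Walks a run of records (2-byte type, 2-byte length, little-endian)
+    // and indents everything between a BEGIN and its matching END.
+    static String outline(byte[] stream) {
+        ByteBuffer buf = ByteBuffer.wrap(stream).order(ByteOrder.LITTLE_ENDIAN);
+        StringBuilder out = new StringBuilder();
+        int depth = 0;
+        while (buf.remaining() >= 4) {
+            int sid = Short.toUnsignedInt(buf.getShort());
+            int size = Short.toUnsignedInt(buf.getShort());
+            if (sid == SID_END) {
+                depth = Math.max(0, depth - 1); // END closes the current level
+            }
+            out.append("  ".repeat(depth))
+               .append(String.format(Locale.ROOT, "0x%04X", sid))
+               .append('\n');
+            if (sid == SID_BEGIN) {
+                depth++; // records after BEGIN are children of the preceding record
+            }
+            buf.position(buf.position() + size); // skip the record body
+        }
+        return out.toString();
+    }
+
+    public static void main(String[] args) {
+        // Body of the CHART record (0x1002) copied from the dump above.
+        byte[] chart = {0, 0, 0, 0, 0, 0, 0, 0,
+                0x58, 0x66, (byte) 0xD0, 0x01, 0x40, 0x66, 0x22, 0x01};
+        ByteBuffer body = ByteBuffer.wrap(chart).order(ByteOrder.LITTLE_ENDIAN);
+        double width = fixed1616(body.getInt(8));   // 0x01D06658
+        double height = fixed1616(body.getInt(12)); // 0x01226640
+        System.out.println(String.format(Locale.ROOT,
+                "width = %.2f pt, height = %.2f pt", width, height));
+
+        // SCL, BEGIN, FONTX, END: headers and bodies taken from the dump.
+        byte[] sample = {
+            (byte) 0xA0, 0x00, 0x04, 0x00, 0x01, 0x00, 0x01, 0x00, // SCL 1/1
+            0x33, 0x10, 0x00, 0x00,                                 // BEGIN
+            0x26, 0x10, 0x02, 0x00, 0x05, 0x00,                     // FONTX 5
+            0x34, 0x10, 0x00, 0x00                                  // END
+        };
+        System.out.print(outline(sample));
+    }
+}
+ </source>
+ <p>
+ Running main prints the chart size as about 464.40 by 290.40 points. It then
+ prints an outline in which the FONTX record (0x1026) is indented one level
+ inside its BEGIN/END pair. This is the same nesting the annotated listing
+ below uses.
+ </p>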
+ <p>
+ The next listing breaks those records down into an easier-to-read
+ format:
+ </p>
+ <source>
+[UNKNOWN RECORD:ec]
+[UNKNOWN RECORD:5d]
+[BOF RECORD]
+ [HEADER]
+ [FOOTER]
+ [HCENTER]
+ [VCENTER]
+ [PRINTSETUP]
+ [UNKNOWN RECORD:33]
+ [FBI]
+ [FBI]
+ [PROTECT]
+ [UNITS]
+ [CHART]
+ [BEGIN]
+ [SCL] // zoom magnification
+ [PLOTGROWTH] // font scaling
+ [FRAME] // border around the chart area
+ [BEGIN] // default line and area format
+ [LINEFORMAT]
+ [AREAFORMAT]
+ [END]
+ [SERIES] // start of series
+ [BEGIN]
+ [AI] // LINK_TYPE_TITLE_OR_TEXT
+ [AI] // LINK_TYPE_VALUES
+ [AI] // LINK_TYPE_CATEGORIES
+ [AI] // ??
+ [DATAFORMAT] // Formatting applies to series?
+ [BEGIN] // ??
+ [UNKNOWN RECORD]
+ [END]
+ [SeriesToChartGroup] // Used to support > 1 chart?
+ [END]
+ [SHTPROPS] // Some defaults for how chart is displayed.
+ [DEFAULTTEXT] // Describes the characteristics of the next
+ // record
+ [TEXT] // Details of the text that follows in the
+ // next section
+ [BEGIN]
+ [UNKNOWN RECORD] // POS record... looks like I missed this one.
+ // docs seem to indicate it's better to use
+ // defaults...
+ [FONTX] // index to font record.
+ [AI] // link to text? seems to be linking to nothing
+ [END]
+ [DEFAULTTEXT] // contains a category type of 3 which is not
+ // documented (sigh).
+ [TEXT] // defines position, color etc for text on chart.
+ [BEGIN]
+ [UNKNOWN RECORD] // Another pos record
+ [FONTX] // font
+ [AI] // reference type is DIRECT (not sure what this
+ // means)
+ [END]
+ [AXISUSED] // number of axes on the chart.
+ [AXISPARENT] // axis size and location
+ [BEGIN] // beginning of axis details
+ [UNKNOWN RECORD] // Another pos record.
+ [AXIS] // Category axis
+ [BEGIN]
+ [CATSERRANGE] // defines tick marks and other stuff
+ [AXCEXT] // unit information
+ [TICK] // tick formatting characteristics
+ [END]
+ [AXIS] // Value axis
+ [BEGIN]
+ [VALUERANGE] // defines tick marks and other stuff
+ [TICK] // tick formatting characteristics
+ [AXISLINEFORMAT] // major grid line axis format
+ [LINEFORMAT] // what do the lines look like?
+ [END]
+ [PLOTAREA] // marks that the frame following belongs
+ // to the plot area.
+ [FRAME] // border
+ [BEGIN]
+ [LINEFORMAT] // border line
+ [AREAFORMAT] // border area
+ [END]
+ [CHARTFORMAT] // marks a chart group
+ [BEGIN]
+ [BAR] // indicates a bar chart
+ [UNKNOWN RECORD] // apparently this record is ignorable
+ [LEGEND] // positioning for the legend
+ [BEGIN]
+ [UNKNOWN RECORD] // another position record.
+ [TEXT] // details of the text that follows
+ // in the next section
+ [BEGIN]
+ [UNKNOWN RECORD] // yet another pos record
+ [AI] // another link (of type direct)
+ [END]
+ [END]
+ [END]
+ [END]
+ [END]
+ [DIMENSIONS]
+ [SINDEX]
+ [SINDEX]
+ [SINDEX]
+[EOF]
+ </source>
+ <p>
+ Just a quick note on some of the unknown records:
+ </p>
+ <ul>
+ <li>EC: MSODRAWING - A Microsoft drawing record. (Need to
+ track down where this is documented).</li>
+ <li>5D: OBJ: Description of a drawing object. (This is going to
+ be a PITA to implement).</li>
+ <li>33: Not documented. :-(</li>
+ <li>105f: Not documented. :-(</li>
+ <li>104f: POS: Position record (should be able to safely leave this out).</li>
+ <li>1022: CHARTFORMATLINK: Can be left out.</li>
+ </ul>
+ <p>
+ It is currently suspected that many of those records could be
+ left out when generating a bar chart from scratch. The way
+ we will be proceeding with this is to write code that generates
+ most of these records and then start removing them to see
+ how this affects the chart in Excel.
+ </p>
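+ <p>
+ Every record in the dumps above is framed the same way: a 16-bit record
+ type (the <code>rectype</code>, or sid) and a 16-bit payload length
+ (<code>recsize</code>), both little-endian, followed by the payload. When
+ experimenting with which records can be dropped, it may help to walk a raw
+ stream directly. The sketch below is plain JDK code, not POI's own record
+ reader; the class and method names are hypothetical, it assumes the standard
+ four-byte BIFF8 record header, and it does not merge CONTINUE records.
+ </p>

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: splits a raw BIFF8 byte stream into (sid, payload) records. */
final class BiffRecordSketch {

    /** One record: its 16-bit type (sid) and its payload bytes. */
    static final class RawRecord {
        final int sid;
        final byte[] data;

        RawRecord(int sid, byte[] data) {
            this.sid = sid;
            this.data = data;
        }
    }

    /** Reads records until the buffer is exhausted; CONTINUE records are NOT merged. */
    static List<RawRecord> readRecords(byte[] stream) {
        ByteBuffer buf = ByteBuffer.wrap(stream).order(ByteOrder.LITTLE_ENDIAN);
        List<RawRecord> records = new ArrayList<>();
        while (buf.remaining() >= 4) {
            int sid = buf.getShort() & 0xFFFF;   // record type, e.g. 0x1065 = SINDEX
            int size = buf.getShort() & 0xFFFF;  // payload length in bytes
            if (size > buf.remaining()) {
                throw new IllegalArgumentException("Truncated record 0x" + Integer.toHexString(sid));
            }
            byte[] data = new byte[size];
            buf.get(data);
            records.add(new RawRecord(sid, data));
        }
        return records;
    }

    /** Decodes a SINDEX (0x1065) payload: a single little-endian 16-bit index. */
    static int sindexIndex(RawRecord r) {
        if (r.sid != 0x1065 || r.data.length != 2) {
            throw new IllegalArgumentException("Not a SINDEX record");
        }
        return (r.data[0] & 0xFF) | ((r.data[1] & 0xFF) << 8);
    }

    public static void main(String[] args) {
        // The three SINDEX records and the EOF record from the dump, with headers re-added.
        byte[] stream = {
            0x65, 0x10, 0x02, 0x00, 0x02, 0x00,
            0x65, 0x10, 0x02, 0x00, 0x01, 0x00,
            0x65, 0x10, 0x02, 0x00, 0x03, 0x00,
            0x0A, 0x00, 0x00, 0x00
        };
        for (RawRecord r : readRecords(stream)) {
            String desc = r.sid == 0x1065 ? "SINDEX index=" + sindexIndex(r) : "size=" + r.data.length;
            System.out.println("sid=0x" + Integer.toHexString(r.sid) + " " + desc);
        }
    }
}
```

+ <p>Running <code>main</code> prints one line per record: the SINDEX indexes 2, 1 and 3 in dump order, followed by the empty EOF record (<code>sid=0xa size=0</code>).</p>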
+ </section>
+ <section><title>Inserting the Chart into the Workbook</title>
+ <ul>
+ <li>
+ Unknown record (sid=00eb) is inserted before the SST
+ record.
+ </li>
+ </ul>
+ <source>
+ ============================================
+ rectype = 0xeb, recsize = 0x5a
+ -BEGIN DUMP---------------------------------
+ 00000000 0F 00 00 F0 52 00 00 00 00 00 06 F0 18 00 00 00 ....R...........
+ 00000010 01 08 00 00 02 00 00 00 02 00 00 00 01 00 00 00 ................
+ 00000020 01 00 00 00 03 00 00 00 33 00 0B F0 12 00 00 00 ........3.......
+ 00000030 BF 00 08 00 08 00 81 01 09 00 00 08 C0 01 40 00 ..............@.
+ 00000040 00 08 40 00 1E F1 10 00 00 00 0D 00 00 08 0C 00 ..@.............
+ 00000050 00 08 17 00 00 08 F7 00 00 10 ..........
+ -END DUMP-----------------------------------
+ recordid = 0xeb, size =90
+ [UNKNOWN RECORD:eb]
+ .id = eb
+ [/UNKNOWN RECORD]
+
+ ============================================
+ </source>
+ <ul>
+ <li>
+ Any extra font records are inserted as needed
+ </li>
+ <li>
+ Chart records are inserted after the DBCell records.
+ </li>
+ </ul>
+ </section>
+ </body>
+</document>
diff --git a/src/documentation/content/xdocs/components/spreadsheet/converting.xml b/src/documentation/content/xdocs/components/spreadsheet/converting.xml
new file mode 100644
index 0000000000..0700cc4e43
--- /dev/null
+++ b/src/documentation/content/xdocs/components/spreadsheet/converting.xml
@@ -0,0 +1,232 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<!--
+ ====================================================================
+ Licensed to the Apache Software Foundation (ASF) under one or more
+ contributor license agreements. See the NOTICE file distributed with
+ this work for additional information regarding copyright ownership.
+ The ASF licenses this file to You under the Apache License, Version 2.0
+ (the "License"); you may not use this file except in compliance with
+ the License. You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+ Unless required by applicable law or agreed to in writing, software
+ distributed under the License is distributed on an "AS IS" BASIS,
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ See the License for the specific language governing permissions and
+ limitations under the License.
+ ====================================================================
+-->
+<!DOCTYPE document PUBLIC "-//APACHE//DTD Documentation V2.0//EN" "document-v20.dtd">
+
+<document>
+ <header>
+ <title>Upgrading to POI 3.5, including converting existing HSSF Usermodel code to SS Usermodel (for XSSF and HSSF)</title>
+ <authors>
+ <person email="nick@apache.org" name="Nick Burch" id="NB"/>
+ </authors>
+ </header>
+ <body>
+<section><title>Things that have to be changed when upgrading to POI 3.5</title>
+ <p>Wherever possible, we have tried to ensure that you can use your
+ existing POI code with POI 3.5 without requiring any changes. However,
+ Java doesn't always make that easy, and unfortunately there are a
+ few changes that may be required for some users.</p>
+ <section><title>org.apache.poi.hssf.usermodel.HSSFFormulaEvaluator.CellValue</title>
+ <p>Annoyingly, Java will not let you access a static inner class via
+ a child of the parent one. So, all references to
+ <em>org.apache.poi.hssf.usermodel.HSSFFormulaEvaluator.CellValue</em>
+ will need to be changed to
+ <em>org.apache.poi.ss.usermodel.FormulaEvaluator.CellValue</em>
+ </p>
+ </section>
+ <section><title>org.apache.poi.hssf.usermodel.HSSFRow.MissingCellPolicy</title>
+ <p>Annoyingly, Java will not let you access a static inner class via
+ a child of the parent one. So, all references to
+ <em>org.apache.poi.hssf.usermodel.HSSFRow.MissingCellPolicy</em>
+ will need to be changed to
+ <em>org.apache.poi.ss.usermodel.Row.MissingCellPolicy</em>
+ </p>
+ </section>
+ <section><title>DDF and org.apache.poi.hssf.record.RecordFormatException</title>
+ <p>Previously, record-level errors within DDF would throw an
+ exception from the HSSF class hierarchy. Now, record-level errors
+ within DDF will throw a more general RecordFormatException,
+ <em>org.apache.poi.util.RecordFormatException</em></p>
+ <p>In addition, org.apache.poi.hssf.record.RecordFormatException
+ has been changed to inherit from the new
+ <em>org.apache.poi.util.RecordFormatException</em>, so you may
+ wish to change catches of the HSSF version to the new util version.
+ </p>
+ </section>
+ </section>
+ <section><title>Converting existing HSSF Usermodel code to SS Usermodel (for XSSF and HSSF)</title>
+
+ <section><title>Why change?</title>
+ <p>If you have existing HSSF usermodel code that works just
+ fine, and you don't want to use the new OOXML XSSF support,
+ then you probably don't need to. Your existing HSSF only code
+ will continue to work just fine.</p>
+ <p>However, if you want to be able to work with both HSSF for
+ your .xls files, and also XSSF for .xlsx files, then you will
+ need to make some slight tweaks to your code.</p>
+ </section>
+ <section><title>org.apache.poi.ss.usermodel</title>
+ <p>The new SS usermodel (org.apache.poi.ss.usermodel) is very
+ heavily based on the old HSSF usermodel
+ (org.apache.poi.hssf.usermodel). The main difference is that
+ the package name and class names have been tweaked to remove
+ HSSF from them. Otherwise, the new SS Usermodel interfaces
+ should provide the same functionality.</p>
+ </section>
+ <section><title>Constructors</title>
+ <p>Calling the empty HSSFWorkbook constructor remains the way to
+ create a new, empty Workbook object. To open an existing
+ Workbook, you should now call WorkbookFactory.create(inp).</p>
+ <p>For all other cases when you would have called a
+ Usermodel constructor, such as 'new HSSFRichTextString()' or
+ 'new HSSFDataFormat', you should instead use a CreationHelper.
+ There's a method on the Workbook to get a CreationHelper, and
+ the CreationHelper will then handle constructing new objects
+ for you.</p>
+ </section>
+ <section><title>Other Code</title>
+ <p>For all other code, generally change a reference from
+ org.apache.poi.hssf.usermodel.HSSFFoo to a reference to
+ org.apache.poi.ss.usermodel.Foo. Method signatures should
+ otherwise remain the same, and it should all then work for
+ both XSSF and HSSF.</p>
+ </section>
+ </section>
+ <section><title>Worked Examples</title>
+ <section><title>Old HSSF Code</title>
+<source><![CDATA[
+// import org.apache.poi.hssf.usermodel.*;
+// import org.apache.poi.hssf.util.HSSFColor;
+// import java.io.FileOutputStream;
+
+HSSFWorkbook wb = new HSSFWorkbook();
+// create a new sheet
+HSSFSheet s = wb.createSheet();
+// declare a row object reference
+HSSFRow r = null;
+// declare a cell object reference
+HSSFCell c = null;
+// create 2 cell styles
+HSSFCellStyle cs = wb.createCellStyle();
+HSSFCellStyle cs2 = wb.createCellStyle();
+HSSFDataFormat df = wb.createDataFormat();
+
+// create 2 font objects
+HSSFFont f = wb.createFont();
+HSSFFont f2 = wb.createFont();
+
+// Set font 1 to 12 point type, blue and bold
+f.setFontHeightInPoints((short) 12);
+f.setColor( HSSFColor.BLUE.index );
+f.setBoldweight(HSSFFont.BOLDWEIGHT_BOLD);
+
+// Set font 2 to 10 point type, red and bold
+f2.setFontHeightInPoints((short) 10);
+f2.setColor( HSSFColor.RED.index );
+f2.setBoldweight(HSSFFont.BOLDWEIGHT_BOLD);
+
+// Set cell style and formatting
+cs.setFont(f);
+cs.setDataFormat(df.getFormat("#,##0.0"));
+
+// Set the other cell style and formatting
+cs2.setBorderBottom(HSSFCellStyle.BORDER_THIN);
+cs2.setDataFormat(HSSFDataFormat.getBuiltinFormat("text"));
+cs2.setFont(f2);
+
+
+// Define a few rows
+for(short rownum = (short)0; rownum < 30; rownum++) {
+  // reuse the references declared above rather than redeclaring them
+  r = s.createRow(rownum);
+  for(short cellnum = (short)0; cellnum < 10; cellnum += 2) {
+    c = r.createCell(cellnum);
+    HSSFCell c2 = r.createCell(cellnum+1);
+
+    // divide by 10.0, not 10, so the fraction is not lost to integer division
+    c.setCellValue((double)rownum + (cellnum/10.0));
+    c2.setCellValue(new HSSFRichTextString("Hello! " + cellnum));
+  }
+}
+
+// Save
+FileOutputStream out = new FileOutputStream("workbook.xls");
+wb.write(out);
+out.close();
+ ]]></source>
+ </section>
+ <section><title>New, generic SS Usermodel Code</title>
+<source><![CDATA[
+// import org.apache.poi.ss.usermodel.*;
+// import org.apache.poi.hssf.usermodel.HSSFWorkbook;
+// import org.apache.poi.xssf.usermodel.XSSFWorkbook;
+// import java.io.FileOutputStream;
+
+Workbook[] wbs = new Workbook[] { new HSSFWorkbook(), new XSSFWorkbook() };
+for(int i=0; i<wbs.length; i++) {
+  Workbook wb = wbs[i];
+  CreationHelper createHelper = wb.getCreationHelper();
+
+  // create a new sheet
+  Sheet s = wb.createSheet();
+  // declare a row object reference
+  Row r = null;
+  // declare a cell object reference
+  Cell c = null;
+  // create 2 cell styles
+  CellStyle cs = wb.createCellStyle();
+  CellStyle cs2 = wb.createCellStyle();
+  DataFormat df = wb.createDataFormat();
+
+  // create 2 font objects
+  Font f = wb.createFont();
+  Font f2 = wb.createFont();
+
+  // Set font 1 to 12 point type, blue and bold
+  f.setFontHeightInPoints((short) 12);
+  f.setColor( IndexedColors.BLUE.getIndex() );
+  f.setBoldweight(Font.BOLDWEIGHT_BOLD);
+
+  // Set font 2 to 10 point type, red and bold
+  f2.setFontHeightInPoints((short) 10);
+  f2.setColor( IndexedColors.RED.getIndex() );
+  f2.setBoldweight(Font.BOLDWEIGHT_BOLD);
+
+  // Set cell style and formatting
+  cs.setFont(f);
+  cs.setDataFormat(df.getFormat("#,##0.0"));
+
+  // Set the other cell style and formatting
+  cs2.setBorderBottom(CellStyle.BORDER_THIN);
+  cs2.setDataFormat(df.getFormat("text"));
+  cs2.setFont(f2);
+
+
+  // Define a few rows
+  for(int rownum = 0; rownum < 30; rownum++) {
+    // reuse the references declared above rather than redeclaring them
+    r = s.createRow(rownum);
+    for(int cellnum = 0; cellnum < 10; cellnum += 2) {
+      c = r.createCell(cellnum);
+      Cell c2 = r.createCell(cellnum+1);
+
+      // divide by 10.0, not 10, so the fraction is not lost to integer division
+      c.setCellValue((double)rownum + (cellnum/10.0));
+      c2.setCellValue(
+        createHelper.createRichTextString("Hello! " + cellnum)
+      );
+    }
+  }
+
+  // Save: .xls for HSSF, .xlsx for XSSF
+  String filename = "workbook.xls";
+  if(wb instanceof XSSFWorkbook) {
+    filename = filename + "x";
+  }
+
+  FileOutputStream out = new FileOutputStream(filename);
+  wb.write(out);
+  out.close();
+}
+ ]]></source>
+ </section>
+</section>
+</body>
+</document>
diff --git a/src/documentation/content/xdocs/components/spreadsheet/diagram1.xml b/src/documentation/content/xdocs/components/spreadsheet/diagram1.xml
new file mode 100644
index 0000000000..438da0e7c8
--- /dev/null
+++ b/src/documentation/content/xdocs/components/spreadsheet/diagram1.xml
@@ -0,0 +1,40 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<!--
+ ====================================================================
+ Licensed to the Apache Software Foundation (ASF) under one or more
+ contributor license agreements. See the NOTICE file distributed with
+ this work for additional information regarding copyright ownership.
+ The ASF licenses this file to You under the Apache License, Version 2.0
+ (the "License"); you may not use this file except in compliance with
+ the License. You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+ Unless required by applicable law or agreed to in writing, software
+ distributed under the License is distributed on an "AS IS" BASIS,
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ See the License for the specific language governing permissions and
+ limitations under the License.
+ ====================================================================
+-->
+<!DOCTYPE document PUBLIC "-//APACHE//DTD Documentation V2.0//EN" "document-v20.dtd">
+
+<document>
+ <header>
+ <title>HSSF</title>
+ <subtitle>Overview</subtitle>
+ <authors>
+ <person name="Andrew C. Oliver" email="acoliver@apache.org"/>
+ <person name="Nicola Ken Barozzi" email="barozzi@nicolaken.com"/>
+ </authors>
+ </header>
+
+ <body>
+ <section>
+ <title>Usermodel Class Diagram by Matthew Young</title>
+ <p>
+ <img src="images/usermodel.gif" alt="Usermodel"/>
+ </p>
+ </section>
+ </body>
+</document>
diff --git a/src/documentation/content/xdocs/components/spreadsheet/diagrams.xml b/src/documentation/content/xdocs/components/spreadsheet/diagrams.xml
new file mode 100644
index 0000000000..208cabfa6f
--- /dev/null
+++ b/src/documentation/content/xdocs/components/spreadsheet/diagrams.xml
@@ -0,0 +1,56 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<!--
+ ====================================================================
+ Licensed to the Apache Software Foundation (ASF) under one or more
+ contributor license agreements. See the NOTICE file distributed with
+ this work for additional information regarding copyright ownership.
+ The ASF licenses this file to You under the Apache License, Version 2.0
+ (the "License"); you may not use this file except in compliance with
+ the License. You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+ Unless required by applicable law or agreed to in writing, software
+ distributed under the License is distributed on an "AS IS" BASIS,
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ See the License for the specific language governing permissions and
+ limitations under the License.
+ ====================================================================
+-->
+<!DOCTYPE document PUBLIC "-//APACHE//DTD Documentation V2.0//EN" "document-v20.dtd">
+
+<document>
+ <header>
+ <title>HSSF</title>
+ <subtitle>Overview</subtitle>
+ <authors>
+ <person name="Andrew C. Oliver" email="acoliver@apache.org"/>
+ <person name="Nicola Ken Barozzi" email="barozzi@nicolaken.com"/>
+ </authors>
+ </header>
+
+ <body>
+ <section><title>Overview</title>
+ <p>
+ This section is intended for diagrams (UML/etc) that help
+ explain HSSF.
+ </p>
+ <ul>
+ <li>
+ <a href="diagram1.html">HSSF usermodel class diagram</a> -
+ by Matthew Young (myoung at westernasset dot com)
+ </li>
+ </ul>
+ <p>
+ Have more? Add a new &quot;bug&quot; to the bug database with [DOCUMENTATION]
+ prefacing the description and a link to the file on an http server
+ somewhere. If you don't have your own webserver, then you can email it
+ to (acoliver at apache dot org) provided it's &lt; 5MB. Diagrams should be
+ in some format that can be read at least on Linux and Windows. Diagrams
+ that can be edited are preferable, but let's face it, there aren't too
+ many good affordable UML tools yet! And no they don't HAVE to be UML...
+ just useful.
+ </p>
+ </section>
+ </body>
+</document>
diff --git a/src/documentation/content/xdocs/components/spreadsheet/eval-devguide.xml b/src/documentation/content/xdocs/components/spreadsheet/eval-devguide.xml
new file mode 100644
index 0000000000..2d49b0aa09
--- /dev/null
+++ b/src/documentation/content/xdocs/components/spreadsheet/eval-devguide.xml
@@ -0,0 +1,591 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<!--
+ ====================================================================
+ Licensed to the Apache Software Foundation (ASF) under one or more
+ contributor license agreements. See the NOTICE file distributed with
+ this work for additional information regarding copyright ownership.
+ The ASF licenses this file to You under the Apache License, Version 2.0
+ (the "License"); you may not use this file except in compliance with
+ the License. You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+ Unless required by applicable law or agreed to in writing, software
+ distributed under the License is distributed on an "AS IS" BASIS,
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ See the License for the specific language governing permissions and
+ limitations under the License.
+ ====================================================================
+-->
+<!DOCTYPE document PUBLIC "-//APACHE//DTD Documentation V2.0//EN" "document-v20.dtd">
+
+<document>
+ <header>
+ <title>Developing Formula Evaluation</title>
+ <authors>
+ <person email="amoweb@yahoo.com" name="Amol Deshmukh" id="AD"/>
+ <person email="yegor@apache.org" name="Yegor Kozlov" id="YK"/>
+ </authors>
+ </header>
+ <body>
+ <section><title>Introduction</title>
+ <p>
+ This document is for developers wishing to contribute to the
+ FormulaEvaluator API functionality.
+ </p>
+ <p>
+ When evaluating workbooks you may encounter an <code>org.apache.poi.ss.formula.eval.NotImplementedException</code>
+ which indicates that a function is not (yet) supported by POI. Is there a workaround?
+ Yes: the POI framework makes it easy to add implementations of new functions. Prior to POI-3.8
+ you had to check out the source code from svn and make a custom build with your function implementation.
+ Since POI-3.8 you can register new functions at run-time.
+ </p>
+ <p>
+ Currently, contribution is desired for implementing the standard MS
+ Excel functions. Placeholder classes for these have been created;
+ contributors only need to insert implementations for the
+ individual <code>evaluate()</code> methods that do the actual evaluation.
+ </p>
+ </section>
+ <section><title>Overview of FormulaEvaluator </title>
+ <p>
+ Briefly, a formula string (along with the sheet and workbook that
+ form the context in which the formula is evaluated) is first parsed
+ into Reverse Polish Notation (RPN) tokens using the <code>FormulaParser</code> class.
+ (If you don't know what RPN tokens are, now is a good time to
+ read <a href="http://www-stone.ch.cam.ac.uk/documentation/rrf/rpn.html">
+ Anthony Stone's description of RPN</a>.)
+ </p>
+ <section><title> The big picture</title>
+ <p>
+ RPN tokens are mapped to <code>Eval</code> classes. (The class hierarchy for the <code>Eval</code>s
+ is best understood if you view it in a class diagram
+ viewer.) Depending on the type of RPN token (also called <code>Ptg</code>s
+ henceforth since that is what the <code>FormulaParser</code> calls the classes), a
+ specific type of <code>Eval</code> wrapper is constructed to wrap the RPN token and
+ is pushed on the stack, unless the <code>Ptg</code> is an <code>OperationPtg</code>. If it is an
+ <code>OperationPtg</code>, an <code>OperationEval</code> instance is created for the specific
+ type of <code>OperationPtg</code>. And depending on how many operands it takes,
+ that many <code>Eval</code>s are popped off the stack and passed in an array to
+ the <code>OperationEval</code> instance's evaluate method which returns an <code>Eval</code>
+ of subtype <code>ValueEval</code>. Thus an operation in the formula is evaluated.
+ </p>
+ <note> An <code>Eval</code> is of subinterface <code>ValueEval</code> or <code>OperationEval</code>.
+ Operands are always <code>ValueEval</code>s, and operations are always <code>OperationEval</code>s.</note>
+ <p>
+ <code>OperationEval.evaluate(Eval[])</code> returns an <code>Eval</code> which is supposed
+ to be an instance of one of the implementations of
+ <code>ValueEval</code>. The <code>ValueEval</code> resulting from <code>evaluate()</code> is pushed on the
+ stack and the next RPN token is evaluated. This continues until
+ eventually there are no more RPN tokens, at which point, if the formula
+ string was correctly parsed, there should be just one <code>Eval</code> on the
+ stack &mdash; which contains the result of evaluating the formula.
+ </p>
+ <p>
+ Two special <code>Ptg</code>s, <code>AreaPtg</code> and <code>RefPtg</code>,
+ are handled a little differently, but the code should be
+ self-explanatory. Very briefly, the cells included in <code>AreaPtg</code> and
+ <code>RefPtg</code> are examined and their values are populated in individual
+ <code>ValueEval</code> objects which are set into the implementations of
+ <code>AreaEval</code> and <code>RefEval</code>.
+ </p>
+ <p>
+ <code>OperationEval</code>s for the standard operators have been implemented and tested.
+ </p>
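+ <p>
+ The stack discipline described above can be illustrated without any POI
+ classes. In this hypothetical sketch, <code>Num</code> stands in for an
+ operand <code>Ptg</code> wrapped as a <code>ValueEval</code>, and <code>Op</code>
+ stands in for an <code>OperationPtg</code> with its <code>OperationEval</code>.
+ Real evaluation also has to deal with references, areas, strings and error
+ values, all of which this sketch leaves out.
+ </p>

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

/** Hypothetical sketch of RPN evaluation; these are NOT POI's Ptg/Eval classes. */
final class RpnSketch {

    /** A token is either an operand or an operation. */
    interface Token {}

    /** Stands in for an operand Ptg wrapped as a ValueEval. */
    static final class Num implements Token {
        final double value;
        Num(double value) { this.value = value; }
    }

    /** Stands in for an OperationPtg: pops a fixed number of operands, pushes one result. */
    enum Op implements Token {
        ADD(2), SUB(2), MUL(2), DIV(2), NEG(1);

        final int operandCount;
        Op(int operandCount) { this.operandCount = operandCount; }

        double apply(double[] a) {
            switch (this) {
                case ADD: return a[0] + a[1];
                case SUB: return a[0] - a[1];
                case MUL: return a[0] * a[1];
                case DIV: return a[0] / a[1];
                default:  return -a[0];      // NEG
            }
        }
    }

    static double evaluate(List<Token> rpn) {
        Deque<Double> stack = new ArrayDeque<>();
        for (Token t : rpn) {
            if (t instanceof Num) {
                stack.push(((Num) t).value);            // operand: just push it
            } else {
                Op op = (Op) t;
                double[] args = new double[op.operandCount];
                for (int i = op.operandCount - 1; i >= 0; i--) {
                    args[i] = stack.pop();              // pop in reverse so args[0] is leftmost
                }
                stack.push(op.apply(args));             // push the operation's result
            }
        }
        // A correctly parsed formula leaves exactly one value: the result.
        if (stack.size() != 1) {
            throw new IllegalStateException("Malformed RPN: " + stack.size() + " values left");
        }
        return stack.pop();
    }

    public static void main(String[] args) {
        // =(1+2)*4 in RPN is: 1 2 + 4 *
        List<Token> rpn = Arrays.<Token>asList(new Num(1), new Num(2), Op.ADD, new Num(4), Op.MUL);
        System.out.println(evaluate(rpn)); // prints 12.0
    }
}
```

+ <p>For example, <code>=(1+2)*4</code> corresponds to the RPN sequence <code>1 2 + 4 *</code>, which leaves the single value 12 on the stack.</p>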
+ </section>
+ </section>
+ <section><title>What functions are supported?</title>
+ <p>
+ As of release 5.2.0, POI implements 202 built-in functions,
+ see <a href="#appendixA">Appendix A</a> for the list of supported functions with an implementation.
+ You can programmatically list supported / unsupported functions using the following helper methods:
+ </p>
+<source>import java.util.Collection;
+import org.apache.poi.ss.formula.WorkbookEvaluator;
+
+// list of functions that POI can evaluate
+Collection&lt;String&gt; supportedFuncs = WorkbookEvaluator.getSupportedFunctionNames();
+
+// list of functions that are not supported by POI
+Collection&lt;String&gt; unsupportedFuncs = WorkbookEvaluator.getNotSupportedFunctionNames();
+</source>
+ <section><title>I need a function that isn't supported!</title>
+ <p>
+ If you need a function that POI doesn't currently support, you have two options.
+ You can create the function yourself, and have your program add it to POI at
+ run-time. Doing this will help you get the function you need as soon as possible.
+ The other option is to create the function yourself, and build it into the POI library,
+ possibly contributing the code to the POI project. Doing this will help you get the
+ function you need, but you'll have to build POI from source yourself. And if you
+ contribute the code, you'll help others who need the function in the future, because
+ it will already be supported in the next release of POI. The two options require
+ almost identical code, but the process of deploying the function is different.
+ If your function is a User Defined Function, you'll always take the run-time option,
+ as POI doesn't distribute UDFs.
+ </p>
+ <p>
+ In the sections ahead, we'll implement the Excel <code>SQRTPI()</code> function, first
+ at run-time, and then we'll show how to change it to a library-based implementation.
+ </p>
+ </section>
+ </section>
+ <section><title>Two base interfaces to start your implementation</title>
+ <p>
+ All Excel formula function classes implement either the
+ <code>org.apache.poi.ss.formula.functions.Function</code> or the
+ <code>org.apache.poi.ss.formula.functions.FreeRefFunction</code> interface.
+ <code>Function</code> is a common interface for the functions defined in the Binary Excel File Format (BIFF8): these are "classic" Excel functions like <code>SUM</code>, <code>COUNT</code>, <code>LOOKUP</code>, <em>etc</em>.
+ <code>FreeRefFunction</code> is a common interface for the functions from the Excel Analysis ToolPak, for User Defined Functions that you create,
+ and for Excel built-in functions that have been defined since BIFF8 was defined.
+ In the future these two interfaces are expected to be unified into one, but for now you have to start your implementation from two slightly different roots.
+ </p>
+
+ <section><title>Which interface to start from?</title>
+ <p>
+ You are about to implement a function and don't know which interface to start from: <code>Function</code> or <code>FreeRefFunction</code>.
+ You should use <code>Function</code> if the function is part of the Excel BIFF8
+ definition, and <code>FreeRefFunction</code> for a function that is part of the Excel Analysis ToolPak, was added to Excel after BIFF8, or that you are creating yourself.
+ </p>
+ <p>
+ You can check the list of Analysis ToolPak functions defined in <code>org.apache.poi.ss.formula.atp.AnalysisToolPak.createFunctionsMap()</code>
+ to see if the function is part of the Analysis ToolPak.
+ The list of BIFF8 functions is defined as a text file, in the
+ <code>poi/src/main/resources/org/apache/poi/ss/formula/function/functionMetadata.txt</code> file.
+ </p>
+ <p>
+ You can also use the following code to check which base class your function should implement, if it is not a User Defined function (UDFs must implement <code>FreeRefFunction</code>):
+ </p>
+<source>import org.apache.poi.ss.formula.atp.AnalysisToolPak;
+
+String functionName = "SQRTPI"; // the Excel function you want to implement
+if (!AnalysisToolPak.isATPFunction(functionName)){
+ // the function must implement org.apache.poi.ss.formula.functions.Function
+} else {
+ // the function must implement org.apache.poi.ss.formula.functions.FreeRefFunction
+}
+</source>
+ </section>
+ </section>
+ <section><title>Implementing a function.</title>
+ <p>
+ Here is the fun part: let's walk through the implementation of the Excel function <code>SQRTPI()</code>,
+ which POI does not currently support.
+ </p>
+ <p>
+ <code>AnalysisToolPak.isATPFunction("SQRTPI")</code> returns true, so this is an Analysis ToolPak function.
+ Thus the base interface must be <code>FreeRefFunction</code>. The same would be true if we were implementing
+ a UDF.
+ </p>
+ <p>
+ Because we're taking the run-time deployment option, we'll create this new function in a source
+ file in our own program. Our function will return a <code>ValueEval</code> that is either
+ its proper result, or an <code>ErrorEval</code> that describes the error. All that work
+ is done in the function's <code>evaluate()</code> method:
+ </p>
+<source>package ...;
+import org.apache.poi.ss.formula.OperationEvaluationContext;
+import org.apache.poi.ss.formula.eval.EvaluationException;
+import org.apache.poi.ss.formula.eval.ErrorEval;
+import org.apache.poi.ss.formula.eval.NumberEval;
+import org.apache.poi.ss.formula.eval.OperandResolver;
+import org.apache.poi.ss.formula.eval.ValueEval;
+import org.apache.poi.ss.formula.functions.FreeRefFunction;
+
+public final class SqrtPi implements FreeRefFunction {
+
+    @Override
+    public ValueEval evaluate(ValueEval[] args, OperationEvaluationContext ec) {
+        // SQRTPI takes exactly one argument; report #VALUE! otherwise
+        if (args.length != 1) {
+            return ErrorEval.VALUE_INVALID;
+        }
+        ValueEval arg0 = args[0];
+        int srcRowIndex = ec.getRowIndex();
+        int srcColumnIndex = ec.getColumnIndex();
+        try {
+            // Retrieves a single value from a variety of different argument types according to standard
+            // Excel rules. Does not perform any type conversion.
+            ValueEval ve = OperandResolver.getSingleValue(arg0, srcRowIndex, srcColumnIndex);
+
+            // Applies some conversion rules if the supplied value is not already a number.
+            // Throws EvaluationException(#VALUE!) if the supplied parameter is not a number
+            double arg = OperandResolver.coerceValueToDouble(ve);
+
+            // this is where all the heavy lifting happens
+            double result = Math.sqrt(arg*Math.PI);
+
+            // Excel uses the error code #NUM! instead of IEEE NaN and Infinity,
+            // so when a numeric function evaluates to Double.NaN or an infinity,
+            // be sure to translate the result to the appropriate error code
+            if (Double.isNaN(result) || Double.isInfinite(result)) {
+                throw new EvaluationException(ErrorEval.NUM_ERROR);
+            }
+
+            return new NumberEval(result);
+        } catch (EvaluationException e){
+            return e.getErrorEval();
+        }
+    }
+}
+</source>
+ <p>
+ If our function had been one of the BIFF8 Excel built-ins, it would have been based on
+ the <code>Function</code> interface instead.
+ There are abstract base classes implementing <code>Function</code> that make life easier when implementing numeric functions
+ or functions with a small, fixed number of arguments:
+ </p>
+ <ul>
+ <li><code>org.apache.poi.ss.formula.functions.NumericFunction</code></li>
+ <li><code>org.apache.poi.ss.formula.functions.Fixed0ArgFunction</code></li>
+ <li><code>org.apache.poi.ss.formula.functions.Fixed1ArgFunction</code></li>
+ <li><code>org.apache.poi.ss.formula.functions.Fixed2ArgFunction</code></li>
+ <li><code>org.apache.poi.ss.formula.functions.Fixed3ArgFunction</code></li>
+ <li><code>org.apache.poi.ss.formula.functions.Fixed4ArgFunction</code></li>
+ </ul>
+ <p>
+ Since <code>SQRTPI()</code> takes exactly one argument, we would start our implementation from
+ <code>Fixed1ArgFunction</code>. The differences for a BIFF8 <code>Fixed1ArgFunction</code>
+ are pretty small:
+ </p>
+<source>package ...;
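+import org.apache.poi.ss.formula.eval.EvaluationException;
+import org.apache.poi.ss.formula.eval.ErrorEval;
+import org.apache.poi.ss.formula.eval.NumberEval;
+import org.apache.poi.ss.formula.eval.OperandResolver;
+import org.apache.poi.ss.formula.eval.ValueEval;
+import org.apache.poi.ss.formula.functions.Fixed1ArgFunction;
+
+public final class SqrtPi extends Fixed1ArgFunction {
+
+    @Override
+    public ValueEval evaluate(int srcRowIndex, int srcColumnIndex, ValueEval arg0) {
+        try {
+            // Same logic as the FreeRefFunction version; Fixed1ArgFunction has
+            // already checked that exactly one argument was supplied.
+            ValueEval ve = OperandResolver.getSingleValue(arg0, srcRowIndex, srcColumnIndex);
+            double arg = OperandResolver.coerceValueToDouble(ve);
+            double result = Math.sqrt(arg * Math.PI);
+            if (Double.isNaN(result) || Double.isInfinite(result)) {
+                throw new EvaluationException(ErrorEval.NUM_ERROR);
+            }
+            return new NumberEval(result);
+        } catch (EvaluationException e) {
+            return e.getErrorEval();
+        }
+    }
+}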
+</source>
+ <p>
+ Now that the implementation is ready, we need to register it with the formula evaluator.
+ This is the same no matter which kind of function we're creating. We simply add the
+ following line to the program that is using POI:
+ </p>
+<source>WorkbookEvaluator.registerFunction("SQRTPI", new SqrtPi());
+</source>
+ <p>
+ Voila! The formula evaluator now recognizes <code>SQRTPI()</code>!
+ </p>
+ <section><title>Moving the function into the library</title>
+ <p>
+ If we choose instead to implement our function as part of the POI
+ library, the code is nearly identical. All POI functions
+ are part of one of two Java packages: <code>org.apache.poi.ss.formula.functions</code>
+ for BIFF8 Excel built-in functions, and <code>org.apache.poi.ss.formula.atp</code>
+ for Analysis ToolPak functions. The function still needs to implement the
+ appropriate base class, just as before. To implement our <code>SQRTPI()</code>
+ function in the POI library, we need to move the source code to
+ <code>poi/src/main/java/org/apache/poi/ss/formula/atp/SqrtPi.java</code> in
+ the POI source code, change the <code>package</code> statement, and add a
+ singleton instance:
+ </p>
+<source>package org.apache.poi.ss.formula.atp;
+...
+public final class SqrtPi implements FreeRefFunction {
+
+ public static final FreeRefFunction instance = new SqrtPi();
+
+ private SqrtPi() {
+ // Enforce singleton
+ }
+ ...
+}
+</source>
+ <p>
+ If our function had been one of the BIFF8 Excel built-ins, we would instead have moved
+ the source code to
+ <code>poi/src/main/java/org/apache/poi/ss/formula/functions/SqrtPi.java</code> in
+ the POI source code, and changed the <code>package</code> statement to:
+ </p>
+<source>package org.apache.poi.ss.formula.functions;
+</source>
+ <p>
+ POI library functions are registered differently from run-time-deployed functions.
+ Again, the techniques differ for the two types of library functions (remembering
+ that POI never releases the third type, UDFs).
+ For our Analysis ToolPak function, we have to update the list of functions in
+ <code>org.apache.poi.ss.formula.atp.AnalysisToolPak.createFunctionsMap()</code>:
+ </p>
+<source>...
+private Map&lt;String, FreeRefFunction&gt; createFunctionsMap() {
+ Map&lt;String, FreeRefFunction&gt; m = new HashMap&lt;&gt;(114);
+ ...
+ r(m, "SQRTPI", SqrtPi.instance);
+ ...
+}
+...
+</source>
+ <p>
+ If our function had been one of the BIFF8 Excel built-ins,
+ the registration instead would require updating an entry in the formula-function table,
+ <code>poi/src/main/resources/org/apache/poi/ss/formula/function/functionMetadata.txt</code>:
+ </p>
+<source>...
+#Columns: (index, name, minParams, maxParams, returnClass, paramClasses, isVolatile, hasFootnote )
+...
+359 SQRTPI 1 1 V V
+...
+</source>
+ <p>
+ and also updating the function implementation list in
+ <code>org.apache.poi.ss.formula.eval.FunctionEval.produceFunctions()</code>:
+ </p>
+<source>...
+private static Function[] produceFunctions() {
+ ...
+ retval[359] = new SqrtPi();
+ ...
+}
+...
+</source>
+ </section>
+ <section><title>Floating Point Arithmetic in Excel</title>
+ <p>
+ Excel uses IEEE 754 double-precision floating-point numbers, except in
+ two cases where it does not adhere to the standard:
+ </p>
+ <ol>
+ <li>Positive and Negative Infinities: Infinities occur when you divide by 0.
+ Excel does not support infinities; instead, it gives a #DIV/0! error in these cases.
+ </li>
+ <li>Not-a-Number (NaN): NaN is used to represent invalid operations
+ (such as infinity/infinity, infinity-infinity, or the square root of -1).
+ NaNs allow a program to continue past an invalid operation.
+ Excel instead immediately generates an error such as #NUM! or #DIV/0!.
+ </li>
+ </ol>
+ <p>
+ Keep these two cases in mind when saving the results of scientific calculations to Excel;
+ otherwise you may end up asking: “Where are my infinities and NaNs? They are gone!”
+ </p>
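<p>
The two cases above are easy to reproduce from Java, which follows IEEE 754 throughout. The following minimal sketch shows the special values Java produces where Excel would show an error instead. The class name and the <code>isRepresentableInExcel</code> helper are hypothetical and are not part of POI.
</p>

```java
// Hypothetical helper class, not part of POI: Java keeps IEEE 754 special
// values where Excel would show an error, so check results before saving them.
final class IeeeVsExcel {
    private IeeeVsExcel() {}

    /** Returns false for +Infinity, -Infinity and NaN, which Excel cannot store as numbers. */
    static boolean isRepresentableInExcel(double value) {
        return Double.isFinite(value);
    }

    public static void main(String[] args) {
        double infinity = 1.0 / 0.0;          // no exception in Java; Excel's =1/0 gives #DIV/0!
        double notANumber = Math.sqrt(-1.0);  // NaN in Java; Excel's =SQRT(-1) gives #NUM!
        System.out.println(infinity + " " + isRepresentableInExcel(infinity));
        System.out.println(notANumber + " " + isRepresentableInExcel(notANumber));
        System.out.println(42.0 + " " + isRepresentableInExcel(42.0));
    }
}
```

<p>Running <code>main</code> prints <code>Infinity false</code>, <code>NaN false</code> and <code>42.0 true</code>.</p>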
+ </section>
+ <section><title>Testing Framework</title>
+ <p>
+ Automated testing of an implemented function is easy.
+ The source code for this is in the file: <code>org.apache.poi.hssf.record.formula.GenericFormulaTestCase.java</code>.
+ This class has a reference to the test xls file (not <em>a</em> test xls, <em>the</em> test xls :) ),
+ whose location may need to be changed for your environment. Once you have done that, locate the
+ entry for your function in the test xls and enter test formulas in cells of the FORMULA row.
+ Then copy the values of those formulas into the cells just below them (in Excel:
+ [copy the formula cell] > [go to cell below] > Edit > Paste Special > Values > "OK").
+ You can enter multiple such formulas and paste their values in the cells below, and the
+ test framework will automatically check whether each formula evaluation matches the expected
+ value. (This is hard to describe in words, so please take a moment to look at the code
+ and at the tests already entered in the "FormulaEvalTestData.xls" file attached to the patch.)
+ </p>
+ <note>This style of testing appears to have been abandoned. This section needs to be completely rewritten.</note>
+ </section>
+ </section>
+ <anchor id="appendixA"/>
+ <section>
+ <title>Appendix A &mdash; Functions supported by POI</title>
+ <p>
+ Functions supported by POI (as of v5.2.0 release)
+ </p>
+<source>ABS
+ACOS
+ACOSH
+ADDRESS
+AND
+AREAS
+ASIN
+ASINH
+ATAN
+ATAN2
+ATANH
+AVEDEV
+AVERAGE
+AVERAGEIFS
+BIN2DEC
+CEILING
+CHAR
+CHOOSE
+CLEAN
+CODE
+COLUMN
+COLUMNS
+COMBIN
+COMPLEX
+CONCAT
+CONCATENATE
+COS
+COSH
+COUNT
+COUNTA
+COUNTBLANK
+COUNTIF
+COUNTIFS
+DATE
+DATEVALUE
+DAY
+DAYS360
+DEC2BIN
+DEC2HEX
+DEGREES
+DELTA
+DEVSQ
+DGET
+DMAX
+DMIN
+DOLLAR
+DSUM
+EDATE
+EOMONTH
+ERROR.TYPE
+EVEN
+EXACT
+EXP
+FACT
+FACTDOUBLE
+FALSE
+FIND
+FIXED
+FLOOR
+FREQUENCY
+FV
+GEOMEAN
+HEX2DEC
+HLOOKUP
+HOUR
+HYPERLINK
+IF
+IFERROR
+IFNA
+IFS
+IMAGINARY
+IMREAL
+INDEX
+INDIRECT
+INT
+INTERCEPT
+IPMT
+IRR
+ISBLANK
+ISERR
+ISERROR
+ISEVEN
+ISLOGICAL
+ISNA
+ISNONTEXT
+ISNUMBER
+ISODD
+ISREF
+ISTEXT
+LARGE
+LEFT
+LEN
+LN
+LOG
+LOG10
+LOOKUP
+LOWER
+MATCH
+MAX
+MAXA
+MAXIFS
+MDETERM
+MEDIAN
+MID
+MIN
+MINA
+MINIFS
+MINUTE
+MINVERSE
+MIRR
+MMULT
+MOD
+MODE
+MONTH
+MROUND
+NA
+NETWORKDAYS
+NOT
+NOW
+NPER
+NPV
+OCT2DEC
+ODD
+OFFSET
+OR
+PERCENTILE
+PERCENTRANK
+PERCENTRANK.EXC
+PERCENTRANK.INC
+PI
+PMT
+POISSON
+POWER
+PPMT
+PRODUCT
+PROPER
+PV
+QUOTIENT
+RADIANS
+RAND
+RANDBETWEEN
+RANK
+RATE
+REPLACE
+REPT
+RIGHT
+ROMAN
+ROUND
+ROUNDDOWN
+ROUNDUP
+ROW
+ROWS
+SEARCH
+SECOND
+SIGN
+SIN
+SINGLE
+SINH
+SLOPE
+SMALL
+SQRT
+STDEV
+SUBSTITUTE
+SUBTOTAL
+SUM
+SUMIF
+SUMIFS
+SUMPRODUCT
+SUMSQ
+SUMX2MY2
+SUMX2PY2
+SUMXMY2
+SWITCH
+T
+T.DIST
+T.DIST.2T
+T.DIST.RT
+TAN
+TANH
+TDIST
+TEXT
+TEXTJOIN
+TIME
+TIMEVALUE
+TODAY
+TRANSPOSE
+TREND
+TRIM
+TRUE
+TRUNC
+UPPER
+VALUE
+VAR
+VARP
+VLOOKUP
+WEEKDAY
+WEEKNUM
+WORKDAY
+XLOOKUP
+XMATCH
+YEAR
+YEARFRAC</source>
+ </section>
+ </body>
+</document>
diff --git a/src/documentation/content/xdocs/components/spreadsheet/eval.xml b/src/documentation/content/xdocs/components/spreadsheet/eval.xml
new file mode 100644
index 0000000000..aee0c38008
--- /dev/null
+++ b/src/documentation/content/xdocs/components/spreadsheet/eval.xml
@@ -0,0 +1,410 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<!--
+ ====================================================================
+ Licensed to the Apache Software Foundation (ASF) under one or more
+ contributor license agreements. See the NOTICE file distributed with
+ this work for additional information regarding copyright ownership.
+ The ASF licenses this file to You under the Apache License, Version 2.0
+ (the "License"); you may not use this file except in compliance with
+ the License. You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+ Unless required by applicable law or agreed to in writing, software
+ distributed under the License is distributed on an "AS IS" BASIS,
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ See the License for the specific language governing permissions and
+ limitations under the License.
+ ====================================================================
+-->
+<!DOCTYPE document PUBLIC "-//APACHE//DTD Documentation V2.0//EN" "document-v20.dtd">
+
+<document>
+ <header>
+ <title>Formula Evaluation</title>
+ </header>
+ <body>
+ <section><title>Introduction</title>
+ <p>The POI formula evaluation code enables you to calculate the results of
+ formulas in Excel sheets, whether they were read in or created with POI. This document explains
+ how to use the API to evaluate your formulas.
+ </p>
+ </section>
+
+ <anchor id="WhyEvaluate"/>
+ <section><title>Why do I need to evaluate formulas?</title>
+ <p>The Excel file format (both .xls and .xlsx) stores a "cached" result for
+ every formula along with the formula itself. This means that when the file
+ is opened, it can be quickly displayed, without needing to spend a long
+ time calculating all of the formula results. It also means that when reading
+ a file through Apache POI, the result is quickly available to you too!
+ </p>
+ <p>After making changes with Apache POI to either Formula Cells themselves,
+ or those that they depend on, you should normally perform a Formula
+ Evaluation to have these "cached" results updated. This is normally done
+ after all changes have been performed, but before you write the file out.
+ If you don't do this, there's a good chance that when you open the file in
+ Excel, until you go to the cell and hit enter or F9, you will either see
+ the old value or '#VALUE!' for the cell. (Sometimes Excel will notice this
+ itself and trigger a recalculation on load, but unless you know you are
+ using volatile functions, it's generally best to trigger a <a href="#recalculation">recalculation</a>
+ through POI.)
+ </p>
+ </section>
+
+ <anchor id="Status"/>
+ <section><title>Status</title>
+ <p>The code currently provides implementations for all the arithmetic operators.
+ It also provides implementations for many of Excel's built-in functions
+ (see <a href="eval-devguide.html#appendixA">Appendix A</a> of the development guide for the list),
+ and the framework makes it easy to add implementations of new functions. See the <a href="eval-devguide.html"> Formula
+ evaluation development guide</a> and <a href="../../apidocs/dev/org/apache/poi/ss/formula/functions/package-summary.html">javadocs</a>
+ for details. </p>
+ <p> Both HSSFWorkbook and XSSFWorkbook are supported, so you can
+ evaluate formulas on both .xls and .xlsx files.</p>
+ <p> User-defined functions are <a href="user-defined-functions.html">supported</a>,
+ but must be rewritten in Java and registered with the macro-enabled workbook in order to be evaluated.
+ </p>
+ </section>
+ <section><title>User API How-TO</title>
+ <p>The following code demonstrates how to use the FormulaEvaluator
+ in the context of other POI Excel-reading code.
+ </p>
+ <p>There are several ways in which you can use the FormulaEvaluator API.</p>
+
+ <anchor id="Evaluate"/>
+ <section><title>Using FormulaEvaluator.<strong>evaluate</strong>(Cell cell)</title>
+ <p>This evaluates a given cell and returns the new value,
+ without modifying the cell.</p>
+ <source>
+FileInputStream fis = new FileInputStream("c:/temp/test.xls");
+Workbook wb = new HSSFWorkbook(fis); //or new XSSFWorkbook("c:/temp/test.xlsx")
+Sheet sheet = wb.getSheetAt(0);
+FormulaEvaluator evaluator = wb.getCreationHelper().createFormulaEvaluator();
+
+// suppose your formula is in B3
+CellReference cellReference = new CellReference("B3");
+Row row = sheet.getRow(cellReference.getRow());
+Cell cell = row.getCell(cellReference.getCol());
+
+CellValue cellValue = evaluator.evaluate(cell);
+
+// getCellType() returns the CellType enum
+switch (cellValue.getCellType()) {
+    case BOOLEAN:
+        System.out.println(cellValue.getBooleanValue());
+        break;
+    case NUMERIC:
+        System.out.println(cellValue.getNumberValue());
+        break;
+    case STRING:
+        System.out.println(cellValue.getStringValue());
+        break;
+    case BLANK:
+        break;
+    case ERROR:
+        break;
+
+    // CellType.FORMULA will never happen
+    case FORMULA:
+        break;
+}
+ </source>
+ <p>Using the value returned by the FormulaEvaluator (of type
+ <code>CellValue</code>) is therefore similar to using a Cell object
+ containing the result of the formula evaluation. CellValue is
+ a simple value object and does not maintain a reference
+ to the original cell.
+ </p>
+ </section>
+
+ <anchor id="EvaluateFormulaCell"/>
+ <section><title>Using FormulaEvaluator.<strong>evaluateFormulaCell</strong>(Cell cell)</title>
+ <p><strong>evaluateFormulaCell</strong>(Cell cell)
+ will check to see if the supplied cell is a formula cell.
+ If it isn't, then no changes will be made to it. If it is,
+ then the formula is evaluated, and the result is saved
+ alongside it as the cached value that Excel displays. The
+ formula remains in the cell, just with a new value.</p>
+ <p>The method returns the type of the formula result, such as
+ <code>CellType.BOOLEAN</code>; for a cell that is not a formula cell it returns
+ <code>CellType._NONE</code>.</p>
+ <source>
+FileInputStream fis = new FileInputStream("/somepath/test.xls");
+Workbook wb = new HSSFWorkbook(fis); //or new XSSFWorkbook("/somepath/test.xlsx")
+Sheet sheet = wb.getSheetAt(0);
+FormulaEvaluator evaluator = wb.getCreationHelper().createFormulaEvaluator();
+
+// suppose your formula is in B3
+CellReference cellReference = new CellReference("B3");
+Row row = sheet.getRow(cellReference.getRow());
+Cell cell = row.getCell(cellReference.getCol());
+
+if (cell != null) {
+    // evaluateFormulaCell returns the CellType of the formula result
+    switch (evaluator.evaluateFormulaCell(cell)) {
+        case BOOLEAN:
+            System.out.println(cell.getBooleanCellValue());
+            break;
+        case NUMERIC:
+            System.out.println(cell.getNumericCellValue());
+            break;
+        case STRING:
+            System.out.println(cell.getStringCellValue());
+            break;
+        case BLANK:
+            break;
+        case ERROR:
+            System.out.println(cell.getErrorCellValue());
+            break;
+
+        // CellType.FORMULA will never occur
+        case FORMULA:
+            break;
+    }
+}
+ </source>
+ </section>
+
+ <anchor id="EvaluateInCell"/>
+ <section><title>Using FormulaEvaluator.<strong>evaluateInCell</strong>(Cell cell)</title>
+ <p><strong>evaluateInCell</strong>(Cell cell) will check to
+ see if the supplied cell is a formula cell. If it isn't,
+ then no changes will be made to it. If it is, then the
+ formula is evaluated, and the new value saved into the cell,
+ in place of the old formula.</p>
+ <source>
+FileInputStream fis = new FileInputStream("/somepath/test.xls");
+Workbook wb = new HSSFWorkbook(fis); //or new XSSFWorkbook("/somepath/test.xlsx")
+Sheet sheet = wb.getSheetAt(0);
+FormulaEvaluator evaluator = wb.getCreationHelper().createFormulaEvaluator();
+
+// suppose your formula is in B3
+CellReference cellReference = new CellReference("B3");
+Row row = sheet.getRow(cellReference.getRow());
+Cell cell = row.getCell(cellReference.getCol());
+
+if (cell != null) {
+    // evaluateInCell replaces the formula with its result and returns the same cell
+    switch (evaluator.evaluateInCell(cell).getCellType()) {
+        case BOOLEAN:
+            System.out.println(cell.getBooleanCellValue());
+            break;
+        case NUMERIC:
+            System.out.println(cell.getNumericCellValue());
+            break;
+        case STRING:
+            System.out.println(cell.getStringCellValue());
+            break;
+        case BLANK:
+            break;
+        case ERROR:
+            System.out.println(cell.getErrorCellValue());
+            break;
+
+        // CellType.FORMULA will never occur
+        case FORMULA:
+            break;
+    }
+}
+
+ </source>
+ </section>
+
+ <anchor id="EvaluateAll"/>
+ <section><title>Re-calculating all formulas in a Workbook</title>
+ <source>
+FileInputStream fis = new FileInputStream("/somepath/test.xls");
+Workbook wb = new HSSFWorkbook(fis); //or new XSSFWorkbook("/somepath/test.xlsx")
+FormulaEvaluator evaluator = wb.getCreationHelper().createFormulaEvaluator();
+for (Sheet sheet : wb) {
+    for (Row r : sheet) {
+        for (Cell c : r) {
+            if (c.getCellType() == CellType.FORMULA) {
+                evaluator.evaluateFormulaCell(c);
+            }
+        }
+    }
+}
+ </source>
+
+ <p>Alternatively, calling <strong>evaluateAll</strong>() on the FormulaEvaluator does the
+ same for any workbook type. If you know which of HSSF or XSSF you're working
+ with, you can also call the static
+ <strong>evaluateAllFormulaCells</strong> method on the appropriate
+ HSSFFormulaEvaluator or XSSFFormulaEvaluator class.</p>
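<p>
A minimal sketch of the static HSSF variant, assuming a small .xls workbook built in memory. The class and method names are hypothetical; only the usermodel calls and <code>HSSFFormulaEvaluator.evaluateAllFormulaCells</code> come from POI.
</p>

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import org.apache.poi.hssf.usermodel.HSSFFormulaEvaluator;
import org.apache.poi.hssf.usermodel.HSSFWorkbook;
import org.apache.poi.ss.usermodel.Row;
import org.apache.poi.ss.usermodel.Sheet;

// Hypothetical demo class: builds A1, B1 and C1 = A1+B1, then refreshes all
// cached formula results with the static HSSF helper. For .xlsx workbooks the
// analogous call is XSSFFormulaEvaluator.evaluateAllFormulaCells(xssfWorkbook).
final class EvaluateAllDemo {
    private EvaluateAllDemo() {}

    static double sumAfterRecalculation(double a, double b) {
        try (HSSFWorkbook wb = new HSSFWorkbook()) {
            Sheet sheet = wb.createSheet("Data");
            Row row = sheet.createRow(0);
            row.createCell(0).setCellValue(a);          // A1
            row.createCell(1).setCellValue(b);          // B1
            row.createCell(2).setCellFormula("A1+B1");  // C1, cached result not computed yet

            // Evaluates every formula cell in every sheet and stores the results
            HSSFFormulaEvaluator.evaluateAllFormulaCells(wb);

            // For a formula cell this returns the cached (now up-to-date) result
            return row.getCell(2).getNumericCellValue();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```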
+ </section>
+ </section>
+
+ <anchor id="recalculation"/>
+ <section><title>Recalculation of Formulas</title>
+ <p>
+ In certain cases you may want to force Excel to re-calculate formulas when the workbook is opened.
+ Consider the following example:
+ </p>
+ <p>
+ Open Excel and create a new workbook. On the first sheet set A1=1, B1=1, C1=A1+B1.
+ Excel automatically calculates formulas and the value in C1 is 2. So far so good.
+ </p>
+ <p>
+ Now modify the workbook with POI:
+ </p>
+ <source>
+ Workbook wb = WorkbookFactory.create(new FileInputStream("workbook.xls"));
+
+ Sheet sh = wb.getSheetAt(0);
+ sh.getRow(0).getCell(0).setCellValue(2); // set A1=2
+
+ FileOutputStream out = new FileOutputStream("workbook2.xls");
+ wb.write(out);
+ out.close();
+ </source>
+ <p>
+ Now open workbook2.xls in Excel and the value in C1 is still 2 while you expected 3. Wrong? No!
+ The point is that Excel caches previously calculated results, and you need to trigger a recalculation to update them.
+ This is not an issue when you create new workbooks from scratch, but it is important to remember when you modify
+ existing workbooks that contain formulas. A recalculation can be triggered in two ways:
+ </p>
+ <p>
+ 1. Re-evaluate formulas with POI's FormulaEvaluator:
+ </p>
+ <source>
+ Workbook wb = WorkbookFactory.create(new FileInputStream("workbook.xls"));
+
+ Sheet sh = wb.getSheetAt(0);
+ sh.getRow(0).getCell(0).setCellValue(2); // set A1=2
+
+ wb.getCreationHelper().createFormulaEvaluator().evaluateAll();
+ </source>
+ <p>
+ 2. Delegate re-calculation to Excel. The application will perform a full recalculation when the workbook is opened:
+ </p>
+ <source>
+ Workbook wb = WorkbookFactory.create(new FileInputStream("workbook.xls"));
+
+ Sheet sh = wb.getSheetAt(0);
+ sh.getRow(0).getCell(0).setCellValue(2); // set A1=2
+
+ wb.setForceFormulaRecalculation(true);
+ </source>
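<p>
Whichever option you choose, the workbook still has to be written out afterwards, and the snippets above leave that step out. The following sketch puts the steps in order: edit, then recalculate (or flag the workbook for recalculation), then write. It assumes the first sheet already contains row 1. <code>EditThenRecalculate</code>, <code>setA1</code>, <code>save</code> and the <code>letExcelRecalculate</code> flag are hypothetical names used only for illustration.
</p>

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import org.apache.poi.ss.usermodel.Cell;
import org.apache.poi.ss.usermodel.Row;
import org.apache.poi.ss.usermodel.Workbook;

// Hypothetical helper illustrating the order: edit, recalculate, then write.
final class EditThenRecalculate {
    private EditThenRecalculate() {}

    /** Sets A1 on the first sheet (row 1 must already exist) and handles recalculation. */
    static void setA1(Workbook wb, double value, boolean letExcelRecalculate) {
        Row firstRow = wb.getSheetAt(0).getRow(0);
        Cell a1 = firstRow.getCell(0, Row.MissingCellPolicy.CREATE_NULL_AS_BLANK);
        a1.setCellValue(value);

        if (letExcelRecalculate) {
            // Option 2: leave the cached results as they are; Excel recalculates on open
            wb.setForceFormulaRecalculation(true);
        } else {
            // Option 1: refresh every cached formula result now, with POI
            wb.getCreationHelper().createFormulaEvaluator().evaluateAll();
        }
    }

    /** Writes the workbook once all edits and the recalculation step are done. */
    static void save(Workbook wb, OutputStream out) {
        try {
            wb.write(out);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```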
+ </section>
+
+ <anchor id="external"/>
+ <section><title>External (Cross-Workbook) references</title>
+ <p>It is possible for a formula in an Excel spreadsheet to
+ refer to a Named Range or Cell in a different workbook.
+ These cross-workbook references are normally called <em>External
+ References</em>. These are formulas which look something like:</p>
+ <source>
+ =SUM([Finances.xlsx]Numbers!D10:D25)
+ =SUM('C:\Data\[Finances.xlsx]Numbers'!D10:D25)
+ =SUM([Finances.xlsx]Range20)
+ </source>
+ <p>If you don't have access to these other workbooks, then you
+ should call
+ <a href="../../apidocs/dev/org/apache/poi/ss/usermodel/FormulaEvaluator.html#setIgnoreMissingWorkbooks(boolean)">setIgnoreMissingWorkbooks(true)</a>
+ to tell the Formula Evaluator to skip evaluating any external
+ references it can't look up.</p>
+ <p>In order for POI to be able to evaluate external references, it
+ needs access to the workbooks in question. As these don't necessarily
+ have the same names on your system as in the workbook, you need to
+ give POI a map of external references to open workbooks, through
+ the
+ <a href="../../apidocs/dev/org/apache/poi/ss/usermodel/FormulaEvaluator.html#setupReferencedWorkbooks(java.util.Map)">setupReferencedWorkbooks(java.util.Map&lt;java.lang.String,FormulaEvaluator&gt; workbooks)</a>
+ method. You should normally do something like:</p>
+ <source>
+// Create a FormulaEvaluator to use
+FormulaEvaluator mainWorkbookEvaluator = workbook.getCreationHelper().createFormulaEvaluator();
+
+// Track the workbook references
+Map&lt;String,FormulaEvaluator> workbooks = new HashMap&lt;String, FormulaEvaluator>();
+// Add this workbook
+workbooks.put("report.xlsx", mainWorkbookEvaluator);
+// Add two others
+workbooks.put("input.xls", WorkbookFactory.create(new File("C:\\temp\\input22.xls")).getCreationHelper().createFormulaEvaluator());
+workbooks.put("lookups.xlsx", WorkbookFactory.create(new File("/home/poi/data/tmp-lookups.xlsx")).getCreationHelper().createFormulaEvaluator());
+
+// Attach them
+mainWorkbookEvaluator.setupReferencedWorkbooks(workbooks);
+
+// Evaluate
+mainWorkbookEvaluator.evaluateAll();
+ </source>
+ </section>
+
+ <anchor id="Performance"/>
+ <section><title>Performance Notes</title>
+ <ul>
+ <li>Generally, you should only need to create one FormulaEvaluator
+ instance per Workbook. The FormulaEvaluator will cache
+ evaluations of dependent cells, so if you have multiple
+ formulas all depending on a cell then subsequent evaluations
+ will be faster.
+ </li>
+ <li>You should normally perform all of your updates to cells
+ before triggering the evaluation, rather than doing one
+ cell at a time. By waiting until all the updates/sets are
+ performed, you'll be able to take best advantage of the caching
+ for complex formulas.
+ </li>
+ <li>If you do end up making changes to cells part way through
+ evaluation, you should call <em>notifySetFormula</em> or
+ <em>notifyUpdateCell</em> to trigger suitable cache clearance.
+ Alternatively, you could instantiate a new FormulaEvaluator,
+ which will start with empty caches.
+ </li>
+ <li>Also note that FormulaEvaluator maintains a reference to
+ the sheet and workbook, so make sure the evaluator instance
+ becomes eligible for garbage collection when you are done with it.
+ In other words, don't keep a long-lived reference to a
+ FormulaEvaluator unless you really need to: as long as it is
+ reachable, the sheet and workbook it refers to cannot be garbage
+ collected and may continue to occupy large amounts of memory.
+ </li>
+ <li>CellValue instances, however, do not maintain a reference to the
+ Cell, sheet or workbook, so these can be long-lived
+ objects without any adverse effect on performance.
+ </li>
+ </ul>
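<p>
The caching behaviour described above can be sketched with a single evaluator that is reused across an edit. This assumes a small .xls workbook built in memory, with B1 = A1*2. The class and method names are hypothetical; <code>evaluate</code> and <code>notifyUpdateCell</code> are FormulaEvaluator methods.
</p>

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import org.apache.poi.hssf.usermodel.HSSFWorkbook;
import org.apache.poi.ss.usermodel.Cell;
import org.apache.poi.ss.usermodel.FormulaEvaluator;
import org.apache.poi.ss.usermodel.Row;
import org.apache.poi.ss.usermodel.Workbook;

// Hypothetical demo: one evaluator, one edit part way through, and an explicit
// cache notification so that dependent results are recomputed.
final class CacheAwareEvaluation {
    private CacheAwareEvaluation() {}

    /** Returns {B1 before the edit, B1 after the edit}, where B1 = A1*2. */
    static double[] evaluateBeforeAndAfterEdit() {
        try (Workbook wb = new HSSFWorkbook()) {
            Row row = wb.createSheet().createRow(0);
            Cell a1 = row.createCell(0);
            a1.setCellValue(1.0);
            Cell b1 = row.createCell(1);
            b1.setCellFormula("A1*2");

            FormulaEvaluator evaluator = wb.getCreationHelper().createFormulaEvaluator();
            double before = evaluator.evaluate(b1).getNumberValue(); // B1's result is now cached

            a1.setCellValue(5.0);
            evaluator.notifyUpdateCell(a1); // clear cached results that depend on A1
            double after = evaluator.evaluate(b1).getNumberValue();

            return new double[] {before, after};
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```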
+ </section>
+ <section><title>Formula Evaluation Debugging</title>
+ <p>POI is not perfect, and you may run into formula evaluation problems (Java exceptions,
+ or simply results that differ from Excel's) in your particular use case. To make detailed analysis easier, POI
+ can log a full trace of an evaluation.</p>
+ <p>POI 5.1.0 and above uses <a href="https://logging.apache.org/log4j/2.x/">Log4J 2.x</a> as a logging framework. Set up a logging
+ configuration that lets you see INFO and WARN log messages.</p>
+ <p>Example use:</p>
+ <source>
+ // open your file
+ Workbook wb = new HSSFWorkbook(new FileInputStream("foobar.xls"));
+ FormulaEvaluator evaluator = wb.getCreationHelper().createFormulaEvaluator();
+
+ // get your cell
+ Cell cell = wb.getSheetAt(0).getRow(0).getCell(0); // just a dummy example
+
+ // perform debug output for the next evaluate-call only
+ evaluator.setDebugEvaluationOutputForNextEval(true);
+ evaluator.evaluateFormulaCell(cell);
+ evaluator.evaluateFormulaCell(cell); // no logging performed for this next evaluate-call
+ </source>
+ <p>This output goes to a dedicated logger named "POI.FormulaEval", so it can be enabled on its own in a detailed
+ logging configuration. It logs at the WARN and INFO levels, including detailed parameter information and results; these
+ levels are high enough that the trace is not drowned out by the many DEBUG entries logged by other classes.</p>
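<p>
If you would rather raise this logger's level from code than in a configuration file, something like the following sketch may work. It assumes log4j-core is the Log4J 2.x implementation on your classpath (<code>Configurator</code> belongs to log4j-core, not to POI). The class and method names are hypothetical.
</p>

```java
import org.apache.logging.log4j.Level;
import org.apache.logging.log4j.LogManager;
import org.apache.logging.log4j.core.config.Configurator;

// Hypothetical helper: enables only the "POI.FormulaEval" logger at INFO, which
// also lets its WARN messages through, without turning on DEBUG elsewhere.
final class FormulaEvalLogging {
    private FormulaEvalLogging() {}

    static void enableFormulaEvalTrace() {
        Configurator.setLevel("POI.FormulaEval", Level.INFO);
    }

    static boolean isTraceVisible() {
        return LogManager.getLogger("POI.FormulaEval").isInfoEnabled();
    }
}
```

<p>Call <code>enableFormulaEvalTrace()</code> once, before <code>setDebugEvaluationOutputForNextEval(true)</code> and the evaluation you want to inspect.</p>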
+ </section>
+
+ <anchor id="sxssf"/>
+ <section><title>Formula Evaluation and SXSSF</title>
+ <p>For versions before 3.13 final, no formula evaluation is possible with
+ SXSSF.</p>
+ <p>If you are using POI 3.13 final or newer, formula evaluation is possible with SXSSF,
+ but with some caveats.</p>
+ <p>The biggest restriction is that, since evaluating a cell needs that cell in memory
+ and any others it depends on, only pure-function formulas and formulas referencing
+ nearby cells can be evaluated with SXSSF. If a formula references a cell that hasn't
+ yet been written, or one which has already been flushed to disk, then it won't be
+ possible to evaluate it.</p>
+ <p>Because of this, a call to <em>wb.getCreationHelper().createFormulaEvaluator().evaluateAll();</em>
+ will very rarely work on SXSSF, as it's very rare that all the cells will be available
+ and in memory at any time! Instead, it is suggested to evaluate formula cells just
+ after writing them, or shortly after when cells they depend on are added. Just make
+ sure that all cells needing or needed for evaluation are inside the window.</p>
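<p>
A minimal sketch of that pattern, assuming every formula only refers to cells in its own row, so the referenced cells are always still inside the window when the formula is evaluated. The window size of 100, the formula and the class and method names are illustrative choices.
</p>

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import org.apache.poi.ss.usermodel.Cell;
import org.apache.poi.ss.usermodel.FormulaEvaluator;
import org.apache.poi.ss.usermodel.Row;
import org.apache.poi.ss.usermodel.Sheet;
import org.apache.poi.xssf.streaming.SXSSFWorkbook;

// Hypothetical demo: evaluate each formula right after writing it, while the
// row it depends on is guaranteed to be in memory.
final class StreamingEvaluation {
    private StreamingEvaluation() {}

    /** Writes rows with A = i and B = A*2; returns the sum of the evaluated B values. */
    static double writeAndEvaluate(int rows) {
        try (SXSSFWorkbook wb = new SXSSFWorkbook(100)) { // keep at most 100 rows in memory
            Sheet sheet = wb.createSheet("Data");
            FormulaEvaluator evaluator = wb.getCreationHelper().createFormulaEvaluator();
            double total = 0;
            for (int i = 0; i < rows; i++) {
                Row row = sheet.createRow(i);
                row.createCell(0).setCellValue(i);
                Cell formula = row.createCell(1);
                formula.setCellFormula("A" + (i + 1) + "*2"); // refers to the same row only
                evaluator.evaluateFormulaCell(formula);         // store the cached result now
                total += formula.getNumericCellValue();
            }
            return total;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

<p>With 250 rows and a window of 100, earlier rows are flushed to disk as the loop proceeds, yet each formula is evaluated while its own row is still in memory.</p>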
+ </section>
+ </body>
+</document>
diff --git a/src/documentation/content/xdocs/components/spreadsheet/examples.xml b/src/documentation/content/xdocs/components/spreadsheet/examples.xml
new file mode 100644
index 0000000000..87feff0b59
--- /dev/null
+++ b/src/documentation/content/xdocs/components/spreadsheet/examples.xml
@@ -0,0 +1,274 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<!--
+ ====================================================================
+ Licensed to the Apache Software Foundation (ASF) under one or more
+ contributor license agreements. See the NOTICE file distributed with
+ this work for additional information regarding copyright ownership.
+ The ASF licenses this file to You under the Apache License, Version 2.0
+ (the "License"); you may not use this file except in compliance with
+ the License. You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+ Unless required by applicable law or agreed to in writing, software
+ distributed under the License is distributed on an "AS IS" BASIS,
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ See the License for the specific language governing permissions and
+ limitations under the License.
+ ====================================================================
+-->
+<!DOCTYPE document PUBLIC "-//APACHE//DTD Documentation V2.0//EN" "document-v20.dtd">
+
+<document>
+ <header>
+ <title>HSSF and XSSF Examples</title>
+ <authors>
+ <person id="YK" name="Yegor Kozlov" email="user@poi.apache.org"/>
+ </authors>
+ </header>
+ <body>
+ <section><title>HSSF and XSSF common examples</title>
+ <p>Apache POI comes with a number of examples that demonstrate how you
+ can use the POI API to create "real life" documents.
+ The examples below are based on the common XSSF-HSSF interfaces, so you
+ can generate either *.xls or *.xlsx output just by setting a
+ command-line argument:
+ </p>
+ <source>
+ BusinessPlan -xls
+ or
+ BusinessPlan -xlsx
+ </source>
+ <p>All sample source code is available on <a href="https://github.com/apache/poi/tree/trunk/poi-examples/src/main/java/org/apache/poi/examples/ss/">GitHub</a>.</p>
+ <p>In addition, there are a handful of
+ <a href="#hssf-only">HSSF only</a> and
+ <a href="#xssf-only">XSSF only</a> examples as well.
+ </p>
+
+ <section><title>Available Examples</title>
+ <p>
+ The following examples are available:
+ </p>
+ <ul>
+ <li><a href="#ss-common">Common HSSF and XSSF</a><ul>
+ <li><a href="#business-plan">Business Plan</a></li>
+ <li><a href="#calendar">Calendar</a></li>
+ <li><a href="#loan-calculator">Loan Calculator</a></li>
+ <li><a href="#timesheet">Timesheet</a></li>
+ <li><a href="#conditional-formats">Conditional Formats</a></li>
+ <li><a href="#common-formulas">Formula Examples</a></li>
+ <li><a href="#add-dimensioned-image">Add Dimensioned Image</a></li>
+ <li><a href="#aligned-cells">Aligned Cells</a></li>
+ <li><a href="#cell-style-details">Cell Style Details</a></li>
+ <li><a href="#linked-dropdown">Linked Dropdown Lists</a></li>
+ <li><a href="#performance-test">Common SS Performance Test</a></li>
+ <li><a href="#to-html">To HTML</a></li>
+ <li><a href="#to-csv">To CSV</a></li>
+ </ul></li>
+ <li><a href="#hssf-only">HSSF-Only</a></li>
+ <li><a href="#xssf-only">XSSF-Only</a></li>
+ </ul>
+ </section>
+
+ <anchor id="ss-common" />
+ <anchor id="business-plan" />
+ <section><title>Business Plan</title>
+ <p> The <a href="https://github.com/apache/poi/tree/trunk/poi-examples/src/main/java/org/apache/poi/examples/ss/BusinessPlan.java">BusinessPlan</a>
+ application creates a sample business plan with three phases, weekly iterations and time highlighting. Demonstrates advanced cell formatting
+ (number and date formats, alignments, fills, borders) and various settings for organizing data in a sheet (frozen panes, grouped rows).
+ </p>
+ <figure src="images/businessplan.jpg" alt="business plan demo"/>
+ </section>
+
+ <anchor id="calendar" />
+ <section><title>Calendar</title>
+ <p> The <a href="https://github.com/apache/poi/tree/trunk/poi-examples/src/main/java/org/apache/poi/examples/ss/CalendarDemo.java">Calendar</a>
+ demo creates a multi-sheet calendar. Each month is on a separate sheet.
+ </p>
+ <figure src="images/calendar.jpg" alt="calendar demo"/>
+ </section>
+
+ <anchor id="loan-calculator" />
+ <section><title>Loan Calculator</title>
+ <p> The <a href="https://github.com/apache/poi/tree/trunk/poi-examples/src/main/java/org/apache/poi/examples/ss/LoanCalculator.java">LoanCalculator</a>
+ demo creates a simple loan calculator. Demonstrates advanced usage of cell formulas and named ranges.
+ </p>
+ <figure src="images/loancalc.jpg" alt="loan calculator demo"/>
+ </section>
+
+ <anchor id="timesheet" />
+ <section><title>Timesheet</title>
+ <p> The <a href="https://github.com/apache/poi/tree/trunk/poi-examples/src/main/java/org/apache/poi/examples/ss/TimesheetDemo.java">Timesheet</a>
+ demo creates a weekly timesheet with automatic calculation of total hours. Demonstrates advanced usage of cell formulas.
+ </p>
+ <figure src="images/timesheet.jpg" alt="timesheet demo"/>
+ </section>
+
+ <anchor id="conditional-formats" />
+ <section><title>Conditional Formats</title>
+ <p> The <a href="https://github.com/apache/poi/tree/trunk/poi-examples/src/main/java/org/apache/poi/examples/ss/ConditionalFormats.java">ConditionalFormats</a>
+ demo is a collection of short examples showing what you can do with Excel conditional formatting in POI:
+ </p>
+ <ul>
+ <li>Highlight cells based on their values</li>
+ <li>Highlight a range of cells based on a formula</li>
+ <li>Hide errors</li>
+ <li>Hide the duplicate values</li>
+ <li>Highlight duplicate entries in a column</li>
+ <li>Highlight items that are in a list on the worksheet</li>
+ <li>Highlight payments that are due in the next thirty days</li>
+ <li>Shade alternating rows on the worksheet</li>
+ <li>Shade bands of rows on the worksheet</li>
+ </ul>
+ </section>
+
+ <anchor id="common-formulas" />
+ <section><title>Formula Examples</title>
+ <p>The <a href="https://github.com/apache/poi/tree/trunk/poi-examples/src/main/java/org/apache/poi/examples/ss/formula/CalculateMortgage.java">CalculateMortgage</a>
+ example demonstrates a simple user-defined function to calculate
+ principal and interest.</p>
+ <p>The <a href="https://github.com/apache/poi/tree/trunk/poi-examples/src/main/java/org/apache/poi/examples/ss/formula/CheckFunctionsSupported.java">CheckFunctionsSupported</a>
+ example shows how to check which functions and formulas in a given
+ file aren't supported.</p>
+ <p>The <a href="https://github.com/apache/poi/tree/trunk/poi-examples/src/main/java/org/apache/poi/examples/ss/formula/SettingExternalFunction.java">SettingExternalFunction</a>
+ example demonstrates how to use externally provided (third-party)
+ formula add-ins.</p>
+ <p>The <a href="https://github.com/apache/poi/tree/trunk/poi-examples/src/main/java/org/apache/poi/examples/ss/formula/UserDefinedFunctionExample.java">UserDefinedFunctionExample</a>
+ example demonstrates how to invoke a User Defined Function for a
+ given Workbook instance using POI's UDFFinder implementation.</p>
+ </section>
+
+ <anchor id="add-dimensioned-image" />
+ <section><title>Add Dimensioned Image</title>
+ <p>The <a href="https://github.com/apache/poi/tree/trunk/poi-examples/src/main/java/org/apache/poi/examples/ss/AddDimensionedImage.java">AddDimensionedImage</a>
+ example demonstrates how to add an image to a worksheet and set that
+ image's size to a specific number of millimetres irrespective of the
+ width of the columns or height of the rows.</p>
+ </section>
+
+ <anchor id="aligned-cells" />
+ <section><title>Aligned Cells</title>
+ <p>The <a href="https://github.com/apache/poi/tree/trunk/poi-examples/src/main/java/org/apache/poi/examples/ss/AligningCells.java">AligningCells</a>
+ example demonstrates how various alignment options work.</p>
+ </section>
+
+ <anchor id="cell-style-details" />
+ <section><title>Cell Style Details</title>
+ <p>The <a href="https://github.com/apache/poi/tree/trunk/poi-examples/src/main/java/org/apache/poi/examples/ss/CellStyleDetails.java">CellStyleDetails</a>
+ example demonstrates how to read Excel styles for cells.</p>
+ </section>
+
+ <anchor id="linked-dropdown" />
+ <section><title>Linked Dropdown Lists</title>
+ <p>The <a href="https://github.com/apache/poi/tree/trunk/poi-examples/src/main/java/org/apache/poi/examples/ss/LinkedDropDownLists.java">LinkedDropDownLists</a>
+ example demonstrates one technique that may be used to create linked
+ or dependent drop-down lists.</p>
+ </section>
+
+ <anchor id="performance-test" />
+ <section><title>Common SS Performance Test</title>
+ <p>The <a href="https://github.com/apache/poi/tree/trunk/poi-examples/src/main/java/org/apache/poi/examples/ss/SSPerformanceTest.java">SSPerformanceTest</a>
+ example provides a way to create simple example files of varying
+ sizes and to measure how long that takes. It is useful for benchmarking
+ your system, and also for testing whether slow performance is due to Apache
+ POI itself or to your own code.</p>
+ </section>
+
+ <anchor id="to-html" />
+ <section><title>ToHtml</title>
+ <p> The <a href="https://github.com/apache/poi/tree/trunk/poi-examples/src/main/java/org/apache/poi/examples/ss/html/ToHtml.java">ToHtml</a>
+ example shows how to display a spreadsheet in HTML using the classes for spreadsheet display.
+ </p>
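+ <p>One detail that any HTML conversion has to handle is escaping cell text. Here is a minimal sketch.
+ It assumes the cell values have already been formatted as strings, for instance with POI's
+ DataFormatter. The class is hypothetical and is not taken from ToHtml.</p>

```java
/**
 * Hypothetical helper: escapes already formatted cell text so it can be
 * placed inside an HTML table cell without breaking the markup.
 */
final class HtmlCellText {
    private HtmlCellText() {}

    static String escape(String text) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
                case '&': out.append("&amp;"); break;
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                case '"': out.append("&quot;"); break;
                case '\'': out.append("&#39;"); break;
                default: out.append(c);
            }
        }
        return out.toString();
    }

    /** Wraps one formatted cell value in a td element. */
    static String td(String formattedValue) {
        return "<td>" + escape(formattedValue) + "</td>";
    }
}
```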
+ </section>
+
+ <anchor id="to-csv" />
+ <section><title>ToCSV</title>
+ <p>The <a href="https://github.com/apache/poi/tree/trunk/poi-examples/src/main/java/org/apache/poi/examples/ss/ToCSV.java">ToCSV</a>
+ example demonstrates <em>one</em> way to convert an Excel spreadsheet into a CSV file.
+ </p>
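+ <p>Part of any CSV conversion is quoting fields correctly. Below is a minimal sketch of RFC 4180-style
+ quoting. It assumes the cell values are already formatted strings. The class is hypothetical,
+ and ToCSV itself may handle separators and formatting differently.</p>

```java
import java.util.List;
import java.util.StringJoiner;

/**
 * Hypothetical helper: builds one CSV record from formatted cell values.
 * Fields containing the separator, a double quote, CR or LF are wrapped
 * in double quotes, and embedded double quotes are doubled (RFC 4180).
 */
final class CsvLine {
    private CsvLine() {}

    static String escapeField(String value, char separator) {
        boolean needsQuotes = value.indexOf(separator) >= 0
                || value.indexOf('"') >= 0
                || value.indexOf('\n') >= 0
                || value.indexOf('\r') >= 0;
        if (!needsQuotes) {
            return value;
        }
        return "\"" + value.replace("\"", "\"\"") + "\"";
    }

    /** Joins the cells of one row; a null (missing) cell becomes an empty field. */
    static String toRecord(List<String> cells, char separator) {
        StringJoiner joiner = new StringJoiner(String.valueOf(separator));
        for (String cell : cells) {
            joiner.add(escapeField(cell == null ? "" : cell, separator));
        }
        return joiner.toString();
    }
}
```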
+ </section>
+ </section>
+
+ <anchor id="hssf-only" />
+ <section><title>HSSF-only Examples</title>
+ <p>All the HSSF-only examples can be found in
+ <a href="https://github.com/apache/poi/tree/trunk/poi-examples/src/main/java/org/apache/poi/examples/hssf/">the GitHub repository</a></p>
+ <ul>
+ <li><a href="https://github.com/apache/poi/tree/trunk/poi-examples/src/main/java/org/apache/poi/examples/hssf/usermodel/CellComments.java">CellComments</a></li>
+ <li><a href="https://github.com/apache/poi/tree/trunk/poi-examples/src/main/java/org/apache/poi/examples/hssf/usermodel/HyperlinkFormula.java">HyperlinkFormula</a></li>
+ <li><a href="https://github.com/apache/poi/tree/trunk/poi-examples/src/main/java/org/apache/poi/examples/hssf/usermodel/EventExample.java">EventExample</a></li>
+ <li><a href="https://github.com/apache/poi/tree/trunk/poi-examples/src/main/java/org/apache/poi/examples/hssf/usermodel/OfficeDrawingWithGraphics.java">OfficeDrawingWithGraphics</a></li>
+ <li><a href="https://github.com/apache/poi/tree/trunk/poi-examples/src/main/java/org/apache/poi/examples/hssf/usermodel/CreateDateCells.java">CreateDateCells</a></li>
+ <li><a href="https://github.com/apache/poi/tree/trunk/poi-examples/src/main/java/org/apache/poi/examples/hssf/usermodel/NewWorkbook.java">NewWorkbook</a></li>
+ <li><a href="https://github.com/apache/poi/tree/trunk/poi-examples/src/main/java/org/apache/poi/examples/hssf/usermodel/EmeddedObjects.java">EmeddedObjects</a></li>
+ <li><a href="https://github.com/apache/poi/tree/trunk/poi-examples/src/main/java/org/apache/poi/examples/hssf/usermodel/Hyperlinks.java">Hyperlinks</a></li>
+ <li><a href="https://github.com/apache/poi/tree/trunk/poi-examples/src/main/java/org/apache/poi/examples/hssf/usermodel/OfficeDrawing.java">OfficeDrawing</a></li>
+ <li><a href="https://github.com/apache/poi/tree/trunk/poi-examples/src/main/java/org/apache/poi/examples/hssf/usermodel/HSSFReadWrite.java">HSSFReadWrite</a></li>
+ <li><a href="https://github.com/apache/poi/tree/trunk/poi-examples/src/main/java/org/apache/poi/examples/hssf/usermodel/NewSheet.java">NewSheet</a></li>
+ <li><a href="https://github.com/apache/poi/tree/trunk/poi-examples/src/main/java/org/apache/poi/examples/hssf/usermodel/SplitAndFreezePanes.java">SplitAndFreezePanes</a></li>
+ <li><a href="https://github.com/apache/poi/tree/trunk/poi-examples/src/main/java/org/apache/poi/examples/hssf/usermodel/InCellLists.java">InCellLists</a></li>
+ <li><a href="https://github.com/apache/poi/tree/trunk/poi-examples/src/main/java/org/apache/poi/examples/hssf/usermodel/RepeatingRowsAndColumns.java">RepeatingRowsAndColumns</a></li>
+ <li><a href="https://github.com/apache/poi/tree/trunk/poi-examples/src/main/java/org/apache/poi/examples/hssf/usermodel/MergedCells.java">MergedCells</a></li>
+ <li><a href="https://github.com/apache/poi/tree/trunk/poi-examples/src/main/java/org/apache/poi/examples/hssf/usermodel/CellTypes.java">CellTypes</a></li>
+ <li><a href="https://github.com/apache/poi/tree/trunk/poi-examples/src/main/java/org/apache/poi/examples/hssf/usermodel/ZoomSheet.java">ZoomSheet</a></li>
+ <li><a href="https://github.com/apache/poi/tree/trunk/poi-examples/src/main/java/org/apache/poi/examples/hssf/usermodel/ReadWriteWorkbook.java">ReadWriteWorkbook</a></li>
+ <li><a href="https://github.com/apache/poi/tree/trunk/poi-examples/src/main/java/org/apache/poi/examples/hssf/usermodel/CreateCells.java">CreateCells</a></li>
+ <li><a href="https://github.com/apache/poi/tree/trunk/poi-examples/src/main/java/org/apache/poi/examples/hssf/usermodel/Alignment.java">Alignment</a></li>
+ <li><a href="https://github.com/apache/poi/tree/trunk/poi-examples/src/main/java/org/apache/poi/examples/hssf/usermodel/FrillsAndFills.java">FrillsAndFills</a></li>
+ <li><a href="https://github.com/apache/poi/tree/trunk/poi-examples/src/main/java/org/apache/poi/examples/hssf/usermodel/AddDimensionedImage.java">AddDimensionedImage</a></li>
+ <li><a href="https://github.com/apache/poi/tree/trunk/poi-examples/src/main/java/org/apache/poi/examples/hssf/usermodel/Borders.java">Borders</a></li>
+ <li><a href="https://github.com/apache/poi/tree/trunk/poi-examples/src/main/java/org/apache/poi/examples/hssf/usermodel/NewLinesInCells.java">NewLinesInCells</a></li>
+ <li><a href="https://github.com/apache/poi/tree/trunk/poi-examples/src/main/java/org/apache/poi/examples/hssf/usermodel/WorkingWithFonts.java">WorkingWithFonts</a></li>
+ <li><a href="https://github.com/apache/poi/tree/trunk/poi-examples/src/main/java/org/apache/poi/examples/hssf/usermodel/BigExample.java">BigExample</a></li>
+ <li><a href="https://github.com/apache/poi/tree/trunk/poi-examples/src/main/java/org/apache/poi/examples/hssf/usermodel/Outlines.java">Outlines</a></li>
+ <li><a href="https://github.com/apache/poi/tree/trunk/poi-examples/src/main/java/org/apache/poi/examples/hssf/eventusermodel/XLS2CSVmra.java">XLS2CSVmra</a></li>
+ </ul>
+ </section>
+
+ <anchor id="xssf-only" />
+ <section><title>XSSF-only Examples</title>
+ <p>All the XSSF-only examples can be found in
+ <a href="https://github.com/apache/poi/tree/trunk/poi-examples/src/main/java/org/apache/poi/examples/xssf/">the GitHub repository</a></p>
+ <ul>
+ <li><a href="https://github.com/apache/poi/tree/trunk/poi-examples/src/main/java/org/apache/poi/examples/xssf/usermodel/CellComments.java">CellComments</a></li>
+ <li><a href="https://github.com/apache/poi/tree/trunk/poi-examples/src/main/java/org/apache/poi/examples/xssf/usermodel/HeadersAndFooters.java">HeadersAndFooters</a></li>
+ <li><a href="https://github.com/apache/poi/tree/trunk/poi-examples/src/main/java/org/apache/poi/examples/xssf/usermodel/CreateUserDefinedDataFormats.java">CreateUserDefinedDataFormats</a></li>
+ <li><a href="https://github.com/apache/poi/tree/trunk/poi-examples/src/main/java/org/apache/poi/examples/xssf/usermodel/CreatePivotTable.java">CreatePivotTable</a></li>
+ <li><a href="https://github.com/apache/poi/tree/trunk/poi-examples/src/main/java/org/apache/poi/examples/xssf/usermodel/CreatePivotTable2.java">CreatePivotTable2</a></li>
+ <li><a href="https://github.com/apache/poi/tree/trunk/poi-examples/src/main/java/org/apache/poi/examples/xssf/usermodel/FillsAndColors.java">FillsAndColors</a></li>
+ <li><a href="https://github.com/apache/poi/tree/trunk/poi-examples/src/main/java/org/apache/poi/examples/xssf/usermodel/WorkingWithBorders.java">WorkingWithBorders</a></li>
+ <li><a href="https://github.com/apache/poi/tree/trunk/poi-examples/src/main/java/org/apache/poi/examples/xssf/usermodel/BigGridDemo.java">BigGridDemo</a></li>
+ <li><a href="https://github.com/apache/poi/tree/trunk/poi-examples/src/main/java/org/apache/poi/examples/xssf/usermodel/CreateTable.java">CreateTable</a></li>
+ <li><a href="https://github.com/apache/poi/tree/trunk/poi-examples/src/main/java/org/apache/poi/examples/xssf/usermodel/CalendarDemo.java">CalendarDemo</a></li>
+ <li><a href="https://github.com/apache/poi/tree/trunk/poi-examples/src/main/java/org/apache/poi/examples/xssf/usermodel/AligningCells.java">AligningCells</a></li>
+ <li><a href="https://github.com/apache/poi/tree/trunk/poi-examples/src/main/java/org/apache/poi/examples/xssf/usermodel/SplitAndFreezePanes.java">SplitAndFreezePanes</a></li>
+ <li><a href="https://github.com/apache/poi/tree/trunk/poi-examples/src/main/java/org/apache/poi/examples/xssf/usermodel/WorkingWithPageSetup.java">WorkingWithPageSetup</a></li>
+ <li><a href="https://github.com/apache/poi/tree/trunk/poi-examples/src/main/java/org/apache/poi/examples/xssf/usermodel/WorkingWithPictures.java">WorkingWithPictures</a></li>
+ <li><a href="https://github.com/apache/poi/tree/trunk/poi-examples/src/main/java/org/apache/poi/examples/xssf/usermodel/MergingCells.java">MergingCells</a></li>
+ <li><a href="https://github.com/apache/poi/tree/trunk/poi-examples/src/main/java/org/apache/poi/examples/xssf/usermodel/CustomXMLMapping.java">CustomXMLMapping</a></li>
+ <li><a href="https://github.com/apache/poi/tree/trunk/poi-examples/src/main/java/org/apache/poi/examples/xssf/usermodel/SelectedSheet.java">SelectedSheet</a></li>
+ <li><a href="https://github.com/apache/poi/tree/trunk/poi-examples/src/main/java/org/apache/poi/examples/xssf/usermodel/EmbeddedObjects.java">EmbeddedObjects</a></li>
+ <li><a href="https://github.com/apache/poi/tree/trunk/poi-examples/src/main/java/org/apache/poi/examples/xssf/usermodel/WorkbookProperties.java">WorkbookProperties</a></li>
+ <li><a href="https://github.com/apache/poi/tree/trunk/poi-examples/src/main/java/org/apache/poi/examples/xssf/usermodel/NewLinesInCells.java">NewLinesInCells</a></li>
+ <li><a href="https://github.com/apache/poi/tree/trunk/poi-examples/src/main/java/org/apache/poi/examples/xssf/usermodel/Outlining.java">Outlining</a></li>
+ <li><a href="https://github.com/apache/poi/tree/trunk/poi-examples/src/main/java/org/apache/poi/examples/xssf/usermodel/CreateCell.java">CreateCell</a></li>
+ <li><a href="https://github.com/apache/poi/tree/trunk/poi-examples/src/main/java/org/apache/poi/examples/xssf/usermodel/IterateCells.java">IterateCells</a></li>
+ <li><a href="https://github.com/apache/poi/tree/trunk/poi-examples/src/main/java/org/apache/poi/examples/xssf/usermodel/BarChart.java">BarChart</a></li>
+ <li><a href="https://github.com/apache/poi/tree/trunk/poi-examples/src/main/java/org/apache/poi/examples/xssf/usermodel/BarAndLineChart.java">BarAndLineChart</a></li>
+ <li><a href="https://github.com/apache/poi/tree/trunk/poi-examples/src/main/java/org/apache/poi/examples/xssf/usermodel/LineChart.java">LineChart</a></li>
+ <li><a href="https://github.com/apache/poi/tree/trunk/poi-examples/src/main/java/org/apache/poi/examples/xssf/usermodel/ScatterChart.java">ScatterChart</a></li>
+ <li><a href="https://github.com/apache/poi/tree/trunk/poi-examples/src/main/java/org/apache/poi/examples/xssf/usermodel/WorkingWithFonts.java">WorkingWithFonts</a></li>
+ <li><a href="https://github.com/apache/poi/tree/trunk/poi-examples/src/main/java/org/apache/poi/examples/xssf/usermodel/HyperlinkExample.java">HyperlinkExample</a></li>
+ <li><a href="https://github.com/apache/poi/tree/trunk/poi-examples/src/main/java/org/apache/poi/examples/xssf/usermodel/ShiftRows.java">ShiftRows</a></li>
+ <li><a href="https://github.com/apache/poi/tree/trunk/poi-examples/src/main/java/org/apache/poi/examples/xssf/usermodel/WorkingWithRichText.java">WorkingWithRichText</a></li>
+ <li><a href="https://github.com/apache/poi/tree/trunk/poi-examples/src/main/java/org/apache/poi/examples/xssf/usermodel/FitSheetToOnePage.java">FitSheetToOnePage</a></li>
+ <li><a href="https://github.com/apache/poi/tree/trunk/poi-examples/src/main/java/org/apache/poi/examples/xssf/streaming/HybridStreaming.java">HybridStreaming</a></li>
+ <li><a href="https://github.com/apache/poi/tree/trunk/poi-examples/src/main/java/org/apache/poi/examples/xssf/streaming/Outlining.java">Outlining (SXSSF output)</a></li>
+ <li><a href="https://github.com/apache/poi/tree/trunk/poi-examples/src/main/java/org/apache/poi/examples/xssf/streaming/DeferredGeneration.java">DeferredGeneration (SXSSF output)</a></li>
+ <li><a href="https://github.com/apache/poi/tree/trunk/poi-examples/src/main/java/org/apache/poi/examples/xssf/streaming/SavePasswordProtectedXlsx.java">SavePasswordProtectedXlsx (SXSSF output)</a></li>
+ <li><a href="https://github.com/apache/poi/tree/trunk/poi-examples/src/main/java/org/apache/poi/examples/xssf/eventusermodel/XLSX2CSV.java">XLSX2CSV (streaming read)</a></li>
+ <li><a href="https://github.com/apache/poi/tree/trunk/poi-examples/src/main/java/org/apache/poi/examples/xssf/eventusermodel/FromHowTo.java">FromHowTo (streaming read)</a></li>
+ </ul>
+ </section>
+ </body>
+</document>
diff --git a/src/documentation/content/xdocs/components/spreadsheet/excelant.xml b/src/documentation/content/xdocs/components/spreadsheet/excelant.xml
new file mode 100644
index 0000000000..01e03c3213
--- /dev/null
+++ b/src/documentation/content/xdocs/components/spreadsheet/excelant.xml
@@ -0,0 +1,317 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<!--
+ ====================================================================
+ Licensed to the Apache Software Foundation (ASF) under one or more
+ contributor license agreements. See the NOTICE file distributed with
+ this work for additional information regarding copyright ownership.
+ The ASF licenses this file to You under the Apache License, Version 2.0
+ (the "License"); you may not use this file except in compliance with
+ the License. You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+ Unless required by applicable law or agreed to in writing, software
+ distributed under the License is distributed on an "AS IS" BASIS,
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ See the License for the specific language governing permissions and
+ limitations under the License.
+ ====================================================================
+-->
+<!DOCTYPE document PUBLIC "-//APACHE//DTD Documentation V2.0//EN" "document-v20.dtd">
+
+<document>
+ <header>
+ <title>ExcelAnt - Ant Tasks for Validating Excel Spreadsheets</title>
+ <authors>
+ <person email="jon@loquatic.com" name="Jon Svede" id="JDS"/>
+ <person email="brian.bush@nrel.gov" name="Brian Bush" id="BWB"/>
+ </authors>
+ </header>
+ <body>
+ <section><title>ExcelAnt - Ant Tasks for Validating Excel Spreadsheets</title>
+
+ <section><title>Introduction</title>
+ <p>ExcelAnt is a set of Ant tasks that make it possible to verify or test
+ a workbook without having to write Java code. Of course, the tasks themselves
+ are written in Java, but to use this framework you only need to know a little
+ bit about Ant.</p>
+ <p>This document covers the basic usage and setup of ExcelAnt.</p>
+ <p>It assumes basic familiarity with Ant and Ant build files.</p>
+ </section>
+ <section><title>Setup</title>
+ <p>To start with ExcelAnt, you'll need the jar files from POI 3.8 or higher. If you test only .xls
+workbooks, you need the following jars on your classpath:</p>
+ <ul>
+ <li>poi-excelant-$version-YYYYMMDD.jar</li>
+ <li>poi-$version-YYYYMMDD.jar</li>
+ <li>poi-ooxml-$version-YYYYMMDD.jar</li>
+ </ul>
+ <p>If you evaluate .xlsx workbooks, you also need to add these:</p>
+ <ul>
+ <li>poi-ooxml-lite-$version-YYYYDDMM.jar</li>
+ <li>xmlbeans.jar</li>
+ </ul>
+ <p>For example, if you have these jars in a lib/ dir in your project, your build.xml
+ might look like this:</p>
+<source><![CDATA[
+<property name="lib.dir" value="lib" />
+
+<path id="excelant.path">
+ <pathelement location="${lib.dir}/poi-excelant-3.8-beta1-20101230.jar" />
+ <pathelement location="${lib.dir}/poi-3.8-beta1-20101230.jar" />
+ <pathelement location="${lib.dir}/poi-ooxml-3.8-beta1-20101230.jar" />
+</path>
+]]></source>
+ <p>Next, you'll need to define the Ant tasks. There are several ways to use ExcelAnt:</p>
+
+<ul><li>The traditional way:</li></ul>
+<source><![CDATA[
+ <typedef resource="org/apache/poi/ss/excelant/antlib.xml" classpathref="excelant.path" />
+]]></source>
+<p>
+ Here, excelant.path refers to the classpath containing the POI jars.
+ With this approach the provided tasks live in the default namespace. Note that the default task and type names (evaluate, test) may be too generic, so they should either be explicitly overridden or used with a namespace.
+</p>
+<ul><li>Similarly, but assigning a namespace URI:</li></ul>
+<source><![CDATA[
+<project name="excelant-demo" xmlns:poi="antlib:org.apache.poi.ss.excelant">
+
+ <typedef resource="org/apache/poi/ss/excelant/antlib.xml"
+ classpathref="excelant.path"
+ uri="antlib:org.apache.poi.ss.excelant"/>
+
+ <target name="test-nofile">
+ <poi:excelant>
+
+ </poi:excelant>
+ </target>
+</project>
+]]></source>
+ </section>
+
+ <section><title>A Simple Example</title>
+ <p>The simplest use of ExcelAnt is to validate that POI gives you back
+ the value you expect. Does this mean that POI is inaccurate? Hardly. There are cases
+ where POI is unable to evaluate cells for a variety of reasons. If you need to write code
+ to integrate a worksheet into an app, you may want to know that it's going to work before
+ you actually write that code. ExcelAnt helps with that.</p>
+
+ <p>Consider the <a href="https://github.com/apache/poi/tree/trunk/poi-examples/src/main/java/org/apache/poi/examples/ss/excelant/simple-mortgage-calculation.xls">simple-mortgage-calculation.xls</a>
+ file from the examples (note: this link is currently broken, as the file is missing). The sheet is shown below:</p>
+
+ <figure src="images/simple-xls-with-function.jpg" alt="mortgage calculation spreadsheet"/>
+
+ <p>This sheet calculates the principal and interest payment for a mortgage based
+ on the amount of the loan, term and rate. To write a simple ExcelAnt test you
+ need to tell ExcelAnt about the file like this:</p>
+<source><![CDATA[
+<property name="xls.file" value="" />
+
+<target name="simpleTest">
+ <excelant fileName="${xls.file}">
+ <test name="checkValue" showFailureDetail="true">
+ <evaluate showDelta="true" cell="'MortgageCalculator'!$B$4" expectedValue="790.7936" precision="1.0e-4" />
+ </test>
+ </excelant>
+</target>
+]]></source>
+
+
+ <p>This code sets up ExcelAnt to access the file defined in the Ant property
+ xls.file. It then creates a test named 'checkValue'. Finally, it evaluates
+ cell B4 on the sheet named 'MortgageCalculator'. Some assumptions here are
+ worth explaining. For starters, ExcelAnt is focused on testing
+ numerically oriented sheets. The &lt;evaluate&gt; task actually evaluates the
+ cell as a formula using a FormulaEvaluator instance from POI. Therefore it will fail
+ if you point it at a cell that doesn't contain a formula, such as one that holds a plain old number.</p>
+
+ <p>Having said all that, here is what the output looks like:</p>
+
+<source><![CDATA[
+simpleTest:
+ [excelant] ExcelAnt version 0.4.0 Copyright 2011
+ [excelant] Using input file: resources/excelant.xls
+ [excelant] 1/1 tests passed.
+BUILD SUCCESSFUL
+Total time: 391 milliseconds
+]]></source>
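+ <p>The precision attribute on &lt;evaluate&gt; acts as a tolerance on the evaluated value. Here is a minimal
+ sketch of such a check. It assumes a test passes when the absolute difference between the evaluated
+ and expected values does not exceed the precision. The boundary handling is an assumption, and this is
+ not ExcelAnt's source.</p>

```java
/**
 * Sketch of a tolerance check like the one implied by
 * <evaluate expectedValue="..." precision="..."/>.
 * Assumption: passing means |actual - expected| <= precision.
 */
final class PrecisionCheck {
    private PrecisionCheck() {}

    /** Absolute difference between the evaluated and the expected value. */
    static double delta(double actual, double expected) {
        return Math.abs(actual - expected);
    }

    static boolean passes(double actual, double expected, double precision) {
        return delta(actual, expected) <= precision;
    }
}
```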
+
+ </section>
+
+ <section><title>Setting Values into a Cell</title>
+ <p>So now we know that, at a minimum, POI can use our sheet to calculate the existing value.
+ This is an important point: in many cases sheets have dependencies, i.e., cells they reference.
+ Those cells may in turn have dependencies of their own, and so on.
+ The point is that sometimes a dependent cell may get adjusted by a macro or a function
+ that POI is not able to replicate. This test
+ verifies that we can rely on POI to retrieve the default value, based on the stored values
+ of the sheet. Now we want to know if we can manipulate those dependencies and verify
+ the output.</p>
+
+ <p>To verify that we can manipulate cell values, we need a way in ExcelAnt to set a value.
+ This is provided by the following task types:</p>
+ <ul>
+ <li>&lt;setDouble&gt; - sets the specified cell to a double value.</li>
+ <li>&lt;setFormula&gt; - sets the specified cell to a formula.</li>
+ <li>&lt;setString&gt; - sets the specified cell to a String.</li>
+ </ul>
+
+ <p>For the purposes of this example we'll use the &lt;setDouble&gt; task. Let's
+ start with a $240,000, 30-year loan at 11% (let's pretend it's 1984). Here
+ is how we will set that up:</p>
+
+<source><![CDATA[
+ <setDouble cell="'MortgageCalculator'!$B$1" value="240000"/>
+ <setDouble cell="'MortgageCalculator'!$B$2" value ="0.11"/>
+ <setDouble cell="'MortgageCalculator'!$B$3" value ="30"/>
+ <evaluate showDelta="true" cell="'MortgageCalculator'!$B$4" expectedValue="2285.576149" precision="1.0e-4" />
+]]></source>
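+ <p>The expected value 2285.576149 can be cross-checked outside Excel. The sketch below assumes the sheet computes
+ the standard monthly payment on a fixed-rate loan from an annual rate and a term in years. That is an
+ assumption about its formula, because the document does not show it. The class name is hypothetical.</p>

```java
/**
 * Fixed-rate amortisation (equivalent to Excel's PMT with the sign flipped):
 *   payment = principal * r / (1 - (1 + r)^-n)
 * where r is the monthly rate and n the number of monthly payments.
 */
final class MortgagePayment {
    private MortgagePayment() {}

    static double monthlyPayment(double principal, double annualRate, int years) {
        double r = annualRate / 12.0;   // monthly interest rate
        int n = years * 12;             // number of monthly payments
        if (r == 0.0) {
            return principal / n;       // zero-interest edge case
        }
        return principal * r / (1.0 - Math.pow(1.0 + r, -n));
    }
}
```

+ <p>With the inputs set above, monthlyPayment(240000, 0.11, 30) comes out at approximately 2285.57615,
+ which is within the 1.0e-4 precision used in the &lt;evaluate&gt; task.</p>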
+
+ <p>Don't forget that we're verifying the behavior, so you also need to enter all these values
+ into the sheet itself. That is how I got the result of $2,285 and change. Then save your
+ changes and run the test; you should get the following: </p>
+
+<source><![CDATA[
+Buildfile: C:\opt\eclipse\workspaces\excelant\excelant.examples\build.xml
+simpleTest:
+ [excelant] ExcelAnt version 0.4.0 Copyright 2011
+ [excelant] Using input file: resources/excelant.xls
+ [excelant] 1/1 tests passed.
+BUILD SUCCESSFUL
+Total time: 406 milliseconds
+]]></source>
+
+</section>
+
+ <section><title>Getting More Details</title>
+
+ <p>This is great, it's working! However, suppose you want to see a little more detail. The
+ ExcelAnt tasks use Ant's logging, so you can add the -verbose and -debug flags to
+ the Ant command line to get more detail. Try adding -verbose. Here is what
+ you should see:</p>
+
+<source><![CDATA[
+simpleTest:
+ [excelant] ExcelAnt version 0.4.0 Copyright 2011
+ [excelant] Using input file: resources/excelant.xls
+ [evaluate] test precision = 1.0E-4 global precision = 0.0
+ [evaluate] Using evaluate precision of 1.0E-4
+ [excelant] 1/1 tests passed.
+BUILD SUCCESSFUL
+Total time: 406 milliseconds
+]]></source>
+
+
+ <p>We see a little more detail. Notice that there is a setting for global precision.
+ Up until now we've been setting the precision on each evaluate that we call. This
+ is obviously useful but it gets cumbersome. It would be better if there were a way
+ that we could specify a global precision - and there is. There is a &lt;precision&gt;
+ tag that you can specify as a child of the &lt;excelant&gt; tag. Let's go back to
+ our original task we set up earlier and modify it:</p>
+
+<source><![CDATA[
+<property name="xls.file" value="" />
+
+<target name="simpleTest">
+ <excelant fileName="${xls.file}">
+ <precision value="1.0e-3"/>
+ <test name="checkValue" showFailureDetail="true">
+ <evaluate showDelta="true" cell="'MortgageCalculator'!$B$4" expectedValue="790.7936" />
+ </test>
+ </excelant>
+</target>
+]]></source>
+
+ <p>In this example we have set the global precision to 1.0e-3. This means that
+ in the absence of something more stringent, all tests in the task will use
+ the global precision. We can still override this by specifying the
+ precision attribute on any individual &lt;evaluate&gt; task. Let's first run
+ this task with the global precision and the -verbose flag:</p>
+
+<source><![CDATA[
+simpleTest:
+[excelant] ExcelAnt version 0.4.0 Copyright 2011
+[excelant] Using input file: resources/excelant.xls
+[excelant] setting precision for the test checkValue
+ [test] setting globalPrecision to 0.0010 in the evaluator
+[evaluate] test precision = 0.0 global precision = 0.0010
+[evaluate] Using global precision of 0.0010
+[excelant] 1/1 tests passed.
+]]></source>
+
+
+ <p>As the output shows, the test itself has no precision set, but there is
+ a global precision, and ExcelAnt tells us it is going to use that
+ global value. Now suppose that for this test we want
+ to use a more stringent precision, say 1.0e-4. We can do that by adding
+ the precision attribute back to the &lt;evaluate&gt; task:</p>
+
+<source><![CDATA[
+<excelant fileName="${xls.file}">
+ <precision value="1.0e-3"/>
+ <test name="checkValue" showFailureDetail="true">
+ <setDouble cell="'MortgageCalculator'!$B$1" value="240000"/>
+ <setDouble cell="'MortgageCalculator'!$B$2" value ="0.11"/>
+ <setDouble cell="'MortgageCalculator'!$B$3" value ="30"/>
+ <evaluate showDelta="true" cell="'MortgageCalculator'!$B$4" expectedValue="2285.576149" precision="1.0e-4" />
+ </test>
+</excelant>
+]]></source>
+
+
+ <p>Now when you re-run this test with the verbose flag you will see that
+ your test ran and passed with the higher precision:</p>
+<source><![CDATA[
+simpleTest:
+ [excelant] ExcelAnt version 0.4.0 Copyright 2011
+ [excelant] Using input file: resources/excelant.xls
+ [excelant] setting precision for the test checkValue
+ [test] setting globalPrecision to 0.0010 in the evaluator
+ [evaluate] test precision = 1.0E-4 global precision = 0.0010
+ [evaluate] Using evaluate precision of 1.0E-4 over the global precision of 0.0010
+ [excelant] 1/1 tests passed.
+BUILD SUCCESSFUL
+Total time: 390 milliseconds
+]]></source>
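+ <p>The -verbose output above suggests a simple rule for choosing the tolerance. A non-zero precision on
+ &lt;evaluate&gt; takes priority; otherwise the global &lt;precision&gt; value applies. Below is a sketch of that
+ rule as inferred from the log lines, with 0.0 meaning "not set". What happens when neither value is set is an
+ assumption (here: an exact match). The class is hypothetical, not ExcelAnt's code.</p>

```java
/**
 * Chooses the tolerance for one evaluate step, following the rule visible
 * in the -verbose log lines: an evaluate precision (non-zero) overrides the
 * global precision; 0.0 stands for "not set".
 */
final class PrecisionSelection {
    private PrecisionSelection() {}

    static double effectivePrecision(double evaluatePrecision, double globalPrecision) {
        if (evaluatePrecision > 0.0) {
            return evaluatePrecision;   // precision attribute on <evaluate> wins
        }
        return globalPrecision;         // fall back to <precision value="..."/>; 0.0 if unset
    }
}
```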
+ </section>
+
+ <section><title>Leveraging User Defined Functions</title>
+ <p>POI has an excellent feature (besides ExcelAnt) called <a href="user-defined-functions.html">User Defined Functions</a>,
+ which allows you to write Java code that is used in place of custom VB
+ code or macros in a spreadsheet. If you have read the documentation and written
+ your own FreeRefFunction implementations, ExcelAnt can make use of this code.
+ For each &lt;excelant&gt; task you define, you can nest a &lt;udf&gt; tag,
+ which allows you to specify the function alias and the class name.</p>
+
+ <p>Consider the previous example of the mortgage calculator. What if, instead
+ of being a formula in a cell, it was a function defined in a VB macro? As luck
+ would have it, we already have an example of this among the
+ User Defined Functions examples, so let's use that. In the example spreadsheet
+ there is a tab named MortgageCalculatorFunction, which we will use. If you look in
+ cell B4, you see that rather than a messy cell-based formula, there is only the function
+ call. Let's not get bogged down in the function's Java implementation, as it
+ is covered in the User Defined Function documentation. Let's just add
+ a new target and test to our existing build file:</p>
+<source><![CDATA[
+ <target name="functionTest">
+ <excelant fileName="${xls.file}">
+ <udf functionAlias="calculatePayment" class="org.apache.poi.examples.ss.formula.CalculateMortgage"/>
+ <precision value="1.0e-3"/>
+ <test name="checkValue" showFailureDetail="true">
+ <setDouble cell="'MortgageCalculator'!$B$1" value="240000"/>
+ <setDouble cell="'MortgageCalculator'!$B$2" value ="0.11"/>
+ <setDouble cell="'MortgageCalculator'!$B$3" value ="30"/>
+ <evaluate showDelta="true" cell="'MortgageCalculatorFunction'!$B$4" expectedValue="2285.576149" precision="1.0e-4" />
+ </test>
+ </excelant>
+ </target>
+]]></source>
+
+ <p>If you look at this carefully, it is essentially the same as the previous examples. We
+ still use the global precision, we're still setting values, and we still want
+ to evaluate a cell. The only real differences are the sheet name and the
+ addition of the function.</p>
+ </section>
+ </section>
+</body>
+</document>
diff --git a/src/documentation/content/xdocs/components/spreadsheet/formula.xml b/src/documentation/content/xdocs/components/spreadsheet/formula.xml
new file mode 100644
index 0000000000..3e2ed30647
--- /dev/null
+++ b/src/documentation/content/xdocs/components/spreadsheet/formula.xml
@@ -0,0 +1,120 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<!--
+ ====================================================================
+ Licensed to the Apache Software Foundation (ASF) under one or more
+ contributor license agreements. See the NOTICE file distributed with
+ this work for additional information regarding copyright ownership.
+ The ASF licenses this file to You under the Apache License, Version 2.0
+ (the "License"); you may not use this file except in compliance with
+ the License. You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+ Unless required by applicable law or agreed to in writing, software
+ distributed under the License is distributed on an "AS IS" BASIS,
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ See the License for the specific language governing permissions and
+ limitations under the License.
+ ====================================================================
+-->
+<!DOCTYPE document PUBLIC "-//APACHE//DTD Documentation V2.0//EN" "document-v20.dtd">
+
+<document>
+ <header>
+ <title>Formula Support</title>
+ <authors>
+ <person email="avik@apache.org" name="Avik Sengupta" id="AS"/>
+ </authors>
+ </header>
+ <body>
+ <section><title>Introduction</title>
+ <p>
+ This document describes the current state of formula support in POI.
+ The information in this document currently applies to the 3.13 version of POI.
+ Since this area is a work in progress, this document will be updated with new
+ features as and when they are added.
+ </p>
+
+ </section>
+ <section><title>The basics</title>
+ <p>
+ In org.apache.poi.ss.usermodel.Cell
+ <strong> setCellFormula(&quot;formulaString&quot;) </strong> is used to add a
+ formula to a sheet, and <strong> getCellFormula() </strong> is used to retrieve
+ the string representation of a formula.
+ </p>
+ <p>
+ We aim to support the complete Excel grammar for formulas. Thus, the string that
+ you pass in to the <em> setCellFormula </em> call should be what you would
+ type into Excel. Also, note that you should NOT add a "=" to the front of the string.
+ </p>
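+ <p>If formulas arrive the way users type them into Excel, with a leading "=", that character has to be
+ removed before the string is passed to setCellFormula. Below is a hypothetical helper that does this; it is not part of POI.</p>

```java
/**
 * Hypothetical helper: turns "=SUM(A1:A3)" (as typed into Excel) into
 * "SUM(A1:A3)" (as expected by Cell.setCellFormula).
 */
final class FormulaText {
    private FormulaText() {}

    static String forSetCellFormula(String typed) {
        String s = typed.trim();
        // Drop a single leading '=' if present; otherwise pass the text through.
        return s.startsWith("=") ? s.substring(1) : s;
    }
}
```

+ <p>Typical use would be cell.setCellFormula(FormulaText.forSetCellFormula(input)). The helper does not
+ handle array formulas entered as {=...}.</p>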
+ <p>
+ Please note that localized versions of Excel allow users to enter localized
+ function names. However, internally Excel stores the English names, and thus POI
+ only supports these and not the localized ones. Also note that only commas may be
+ used to separate arguments, as in the English version of Excel; alternate delimiters
+ used in other localizations are not supported.
+ </p>
+ </section>
+ <section><title>Supported Features</title>
+ <ul>
+ <li>References: single cell &amp; area, 2D &amp; 3D, relative &amp; absolute</li>
+ <li>Literals: number, text, boolean, error and array</li>
+ <li>Operators: arithmetic and logical, some region operators</li>
+ <li>Built-in functions: over 350 recognised, 280 evaluatable</li>
+ <li>Add-in functions: 24 from the Analysis ToolPak</li>
+ <li>Array Formulas: via Sheet.setArrayFormula() and Sheet.removeArrayFormula()</li>
+ <li>Region operators: union, intersection</li>
+ </ul>
+ </section>
+ <section><title>Not yet supported</title>
+ <ul>
+ <li>Manipulating table formulas (In Excel, formulas that look like "{=...}" as opposed to "=...")</li>
+ <li>Parsing of previously uncalled add-in functions</li>
+ <li>Preservation of whitespace in formulas (when POI manipulates them)</li>
+ </ul>
+ </section>
+
+ <section><title>Supported Functions</title>
+ <p>To get the list of formula functions that POI supports, you need to
+ call some code!</p>
+ <p>The methods you need are available on
+ <a href="../../apidocs/dev/org/apache/poi/ss/formula/eval/FunctionEval.html">org.apache.poi.ss.formula.eval.FunctionEval</a>.
+ To find which functions your copy of Apache POI supports, use
+ <a href="../../apidocs/dev/org/apache/poi/ss/formula/eval/FunctionEval.html#getSupportedFunctionNames()">getSupportedFunctionNames()</a>
+ to get a list of the implemented function names. For the list of functions that
+ POI knows the name of, but doesn't currently implement, use
+ <a href="../../apidocs/dev/org/apache/poi/ss/formula/eval/FunctionEval.html#getNotSupportedFunctionNames()">getNotSupportedFunctionNames()</a>.
+ </p>
+ </section>
+
+ <section><title>Internals</title>
+ <p>
+ Formulas in Excel are stored as sequences of tokens in Reverse Polish Notation order. The
+ <a href="https://sc.openoffice.org/excelfileformat.pdf">OpenOffice.org Excel file format documentation</a> is the best
+ documentation you will find for the format.
+ </p>
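+ <p>As a toy illustration of RPN order: in plain RPN, (1+2)*3 becomes the token sequence 1 2 + 3 *,
+ and that sequence can be evaluated with a stack. Excel's token stream also contains extra tokens, such as
+ one recording the parentheses for display. This sketch handles only numbers and the four arithmetic
+ operators. It is not POI's evaluator.</p>

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;

/** Toy stack-based evaluator for RPN token sequences (numbers and + - * / only). */
final class RpnDemo {
    private RpnDemo() {}

    static double evaluate(List<String> tokens) {
        Deque<Double> stack = new ArrayDeque<>();
        for (String token : tokens) {
            switch (token) {
                case "+": case "-": case "*": case "/": {
                    double right = stack.pop();   // operands come off the stack in reverse order
                    double left = stack.pop();
                    stack.push(apply(token, left, right));
                    break;
                }
                default:
                    stack.push(Double.parseDouble(token));   // operand token
            }
        }
        if (stack.size() != 1) {
            throw new IllegalArgumentException("malformed RPN sequence: " + tokens);
        }
        return stack.pop();
    }

    private static double apply(String op, double left, double right) {
        switch (op) {
            case "+": return left + right;
            case "-": return left - right;
            case "*": return left * right;
            default:  return left / right;
        }
    }
}
```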
+
+ <p>
+ The tokens used by Excel are modeled as individual *Ptg classes in the <strong>
+ org.apache.poi.ss.formula.ptg</strong> package.
+ </p>
+ <p>
+ The task of parsing a formula string into an array of RPN-ordered tokens is done by the <strong>
+ org.apache.poi.ss.formula.FormulaParser</strong> class. This class implements a hand-written
+ recursive descent parser.
+ </p>
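To make the RPN idea concrete, here is a toy sketch in the spirit of a hand-written recursive descent parser. It is not POI's FormulaParser: the class and method names are hypothetical, it only understands numeric literals, + - * / and parentheses, and it emits plain strings instead of Ptg objects. A second method evaluates the token list with an operand stack, which is how an RPN token stream is consumed.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical toy parser, NOT org.apache.poi.ss.formula.FormulaParser.
// Grammar: expr   := term (('+'|'-') term)*
//          term   := factor (('*'|'/') factor)*
//          factor := number | '(' expr ')'
final class ToyRpnParser {
    private final String src;
    private int pos;
    private final List<String> out = new ArrayList<>();

    private ToyRpnParser(String src) {
        this.src = src.replace(" ", "");
    }

    /** Parses an infix formula body (without the leading '=') into RPN-ordered tokens. */
    static List<String> toRpn(String formula) {
        ToyRpnParser p = new ToyRpnParser(formula);
        p.expr();
        if (p.pos != p.src.length()) {
            throw new IllegalArgumentException("Unexpected input at " + p.pos);
        }
        return p.out;
    }

    private void expr() {
        term();
        while (peek() == '+' || peek() == '-') {
            char op = src.charAt(pos++);
            term();
            out.add(String.valueOf(op)); // operator is emitted after both operands
        }
    }

    private void term() {
        factor();
        while (peek() == '*' || peek() == '/') {
            char op = src.charAt(pos++);
            factor();
            out.add(String.valueOf(op));
        }
    }

    private void factor() {
        if (peek() == '(') {
            pos++;
            expr();
            if (peek() != ')') {
                throw new IllegalArgumentException("Missing ')' at " + pos);
            }
            pos++;
            return;
        }
        int start = pos;
        while (pos < src.length() && (Character.isDigit(src.charAt(pos)) || src.charAt(pos) == '.')) {
            pos++;
        }
        if (start == pos) {
            throw new IllegalArgumentException("Expected a number at " + pos);
        }
        out.add(src.substring(start, pos)); // operand token
    }

    private char peek() {
        return pos < src.length() ? src.charAt(pos) : '\0';
    }

    /** Evaluates RPN tokens: operands are pushed, each operator pops two and pushes the result. */
    static double evaluate(List<String> rpn) {
        Deque<Double> stack = new ArrayDeque<>();
        for (String tok : rpn) {
            switch (tok) {
                case "+": case "-": case "*": case "/": {
                    double b = stack.pop();
                    double a = stack.pop();
                    stack.push(tok.equals("+") ? a + b
                             : tok.equals("-") ? a - b
                             : tok.equals("*") ? a * b
                             : a / b);
                    break;
                }
                default:
                    stack.push(Double.parseDouble(tok));
            }
        }
        return stack.pop();
    }
}
```

With this sketch, `1+2*3` becomes the token list `1 2 3 * +`: operands come before the operator that consumes them, and operator precedence is already encoded in the order, so no parentheses are needed at evaluation time.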
+ <p>
+ Formula tokens in Excel are stored in one of three possible <em>operand classes</em>:
+ Reference, Value and Array. Depending on where a token appears, its class can change
+ in complicated and undocumented ways. While we support most cases, we
+ are not sure that we have covered them all (since there is no documentation for this area).
+ We would therefore like you to report any
+ occurrence of #VALUE! in a cell when opening a POI-generated workbook in Excel. (First check that
+ typing the formula into Excel directly gives a valid result.)
+ </p>
+ <p>Check out the <a href="site:javadocs">Javadocs</a> for details.
+ </p>
+ </section>
+ </body>
+</document>
diff --git a/src/documentation/content/xdocs/components/spreadsheet/hacking-hssf.xml b/src/documentation/content/xdocs/components/spreadsheet/hacking-hssf.xml
new file mode 100644
index 0000000000..784aafbf22
--- /dev/null
+++ b/src/documentation/content/xdocs/components/spreadsheet/hacking-hssf.xml
@@ -0,0 +1,89 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<!--
+ ====================================================================
+ Licensed to the Apache Software Foundation (ASF) under one or more
+ contributor license agreements. See the NOTICE file distributed with
+ this work for additional information regarding copyright ownership.
+ The ASF licenses this file to You under the Apache License, Version 2.0
+ (the "License"); you may not use this file except in compliance with
+ the License. You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+ Unless required by applicable law or agreed to in writing, software
+ distributed under the License is distributed on an "AS IS" BASIS,
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ See the License for the specific language governing permissions and
+ limitations under the License.
+ ====================================================================
+-->
+<!DOCTYPE document PUBLIC "-//APACHE//DTD Documentation V2.0//EN" "document-v20.dtd">
+
+<document>
+ <header>
+ <title>Hacking HSSF</title>
+ <authors>
+ <person email="user@poi.apache.org" name="Glen Stampoultzis" id="GJS"/>
+ <person email="acoliver@apache.org" name="Andrew Oliver" id="AO"/>
+ </authors>
+ </header>
+ <body>
+ <section><title>Where Can I Find Documentation on Feature X</title>
+ <p>
+ You might find the
+ 'Excel 97 Developer's Kit' (out of print, Microsoft Press, no
+ restrictive covenants, available on Amazon.com) helpful for
+ understanding the file format.
+ </p>
+ <p>
+ Also useful is the <a href="https://sc.openoffice.org/excelfileformat.pdf">OpenOffice.org XLS spec</a>. We
+ are collaborating with the maintainer of the spec, so if you think you can add something to their
+ document, just send through your changes.
+ </p>
+ </section>
+ <section><title>Help, I Can't Find Feature X Documented Anywhere</title>
+ <ol>
+ <li>
+ Look at the OpenOffice.org or Gnumeric sources to see whether it's implemented there.
+ </li>
+ <li>
+ Use org.apache.poi.hssf.dev.BiffViewer to view the structure of the
+ file. Experiment by adding one criterion at a time: see what it
+ does to the structure, and infer behavior and structure from it. Using the
+ Unix diff command (or Cygwin from www.cygwin.com on Windows) you
+ can figure out a lot very quickly. Unimplemented records show up as
+ 'UNKNOWN' along with a hex dump.
+ </li>
+ </ol>
+ </section>
+ <section><title>Low-level Record Generation</title>
+ <p>
+ Low-level records can be time-consuming to create. We created a record
+ generator to help with some of the simpler ones.
+ </p>
+ <p>
+ We use XML
+ descriptors to generate the Java code for low-level records (which sure beats
+ the heck out of the PERL scripts originally used ;-). The
+ generator is kinda alpha-ish right now and could use some enhancement,
+ so you may find that to be about 1/2 of the work. Note that the
+ descriptors live in org.apache.poi.hssf.record.definitions.
+ </p>
+ </section>
+ <section><title>Important Notice</title>
+ <p>One thing to note: if you are making a large code contribution, we need to ensure
+ that no participant in this process has ever
+ signed a "Non Disclosure Agreement" with Microsoft or
+ received any information covered by such an agreement. Anyone who has
+ will not be able to participate in the POI project. For large contributions we
+ may ask you to sign an agreement.</p>
+ </section>
+ <section><title>What Can I Work On?</title>
+ <p>Ask in the dev mailing list for advice.</p>
+ </section>
+ <section><title>What Else Should I Know?</title>
+ <p>Make sure you <a href="site:guidelines">read the contributing section</a>
+ as it contains more general information about contributing to POI.</p>
+ </section>
+ </body>
+</document>
diff --git a/src/documentation/content/xdocs/components/spreadsheet/how-to.xml b/src/documentation/content/xdocs/components/spreadsheet/how-to.xml
new file mode 100644
index 0000000000..17582e914c
--- /dev/null
+++ b/src/documentation/content/xdocs/components/spreadsheet/how-to.xml
@@ -0,0 +1,884 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<!--
+ ====================================================================
+ Licensed to the Apache Software Foundation (ASF) under one or more
+ contributor license agreements. See the NOTICE file distributed with
+ this work for additional information regarding copyright ownership.
+ The ASF licenses this file to You under the Apache License, Version 2.0
+ (the "License"); you may not use this file except in compliance with
+ the License. You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+ Unless required by applicable law or agreed to in writing, software
+ distributed under the License is distributed on an "AS IS" BASIS,
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ See the License for the specific language governing permissions and
+ limitations under the License.
+ ====================================================================
+-->
+<!DOCTYPE document PUBLIC "-//APACHE//DTD Documentation V2.0//EN" "document-v20.dtd">
+
+<document>
+ <header>
+ <title>The New Halloween Document</title>
+ <authors>
+ <person email="acoliver2@users.sourceforge.net" name="Andrew C. Oliver" id="AO"/>
+ <person email="user@poi.apache.org" name="Glen Stampoultzis" id="GJS"/>
+ <person email="nick@apache.org" name="Nick Burch" id="NB"/>
+ <person email="sergeikozello@mail.ru" name="Sergei Kozello" id="SK"/>
+ </authors>
+ </header>
+ <body>
+ <section><title>How to use the HSSF API</title>
+
+ <section><title>Capabilities</title>
+ <p>This release of the how-to outlines functionality for the
+ current trunk.
+ Those looking for information on previous releases should
+ look in the documentation distributed with that release.</p>
+ <p>
+ HSSF allows numeric, string, date or formula cell values to be written to
+ or read from an XLS file. This release also supports
+ row and column sizing, cell styling (bold,
+ italics, borders, etc.), and both built-in and
+ user-defined data formats. Also available is
+ an event-based API for reading XLS files.
+ It differs greatly from the read/write API
+ and is intended for intermediate developers who need a smaller
+ memory footprint.
+ </p>
+ </section>
+ <section><title>Different APIs</title>
+ <p>There are a few different ways to access the HSSF API. These
+ have different characteristics, so you should read up on
+ all of them to select the best one for you.</p>
+ <ul>
+ <li><a href="#user_api">User API (HSSF and XSSF)</a></li>
+ <li><a href="#event_api">Event API (HSSF Only)</a></li>
+ <li><a href="#record_aware_event_api">Event API with extensions to be Record Aware (HSSF Only)</a></li>
+ <li><a href="#xssf_sax_api">XSSF and SAX (Event API)</a></li>
+ <li><a href="#sxssf">SXSSF (Streaming User API)</a></li>
+ <li><a href="#low_level_api">Low Level API</a></li>
+ </ul>
+ </section>
+ </section>
+ <section><title>General Use</title>
+ <anchor id="user_api" />
+ <section><title>User API (HSSF and XSSF)</title>
+ <section><title>Writing a new file</title>
+
+ <p>The high level API (package: org.apache.poi.ss.usermodel)
+ is what most people should use. Usage is very simple.
+ </p>
+ <p>Workbooks are created by creating an instance of
+ org.apache.poi.ss.usermodel.Workbook. Either create
+ a concrete class directly
+ (org.apache.poi.hssf.usermodel.HSSFWorkbook or
+ org.apache.poi.xssf.usermodel.XSSFWorkbook), or use
+ the handy factory class
+ org.apache.poi.ss.usermodel.WorkbookFactory.
+ </p>
+ <p>Sheets are created by calling createSheet() on an existing
+ instance of Workbook; the created sheet is automatically added in
+ sequence to the workbook. A sheet created this way only gets a default
+ name; you set the name shown on its tab (at the bottom) by calling
+ Workbook.setSheetName(sheetindex,&quot;SheetName&quot;).
+ Both HSSF and XSSF handle Unicode sheet names automatically, so no
+ encoding parameter is needed.
+ </p>
+ <p>Rows are created by calling createRow(rowNumber) on an existing
+ instance of Sheet. Only rows that have cell values should be
+ added to the sheet. To set the row's height, call
+ setHeight(height) on the row object. The height must be given in
+ twips, or 1/20th of a point. If you prefer, there is also a
+ setHeightInPoints method.
+ </p>
+ <p>Cells are created by calling createCell(column, type) on an
+ existing Row. Only cells that have values should be added to the
+ row. Cells should have their cell type set to either
+ CellType.NUMERIC or CellType.STRING depending on
+ whether they contain a numeric or textual value. Cells must also have
+ a value set. Set the value by calling setCellValue with either a
+ String or double as a parameter. Individual cells do not have a
+ width; you must call setColumnWidth(colindex, width) (in units of
+ 1/256th of a character) on the Sheet object. (You can't set it on
+ an individual basis in the GUI either.)</p>
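Because row heights are given in twips and column widths in 1/256ths of a character, a pair of small conversion helpers can keep that arithmetic readable. The SheetUnits class below is a hypothetical helper, not part of POI; the 255-character ceiling reflects Excel's maximum column width, which POI also enforces when you call setColumnWidth.

```java
// Hypothetical helper (not part of POI) for the units expected by
// Row.setHeight(short) and Sheet.setColumnWidth(int, int).
final class SheetUnits {
    static final int TWIPS_PER_POINT = 20;       // 1 twip = 1/20 of a point
    static final int WIDTH_UNITS_PER_CHAR = 256; // column width unit = 1/256 of a character
    static final int MAX_COLUMN_CHARS = 255;     // Excel's column width limit

    private SheetUnits() {}

    /** Converts a row height in points to twips, e.g. for Row.setHeight(short). */
    static short pointsToTwips(float points) {
        return (short) Math.round(points * TWIPS_PER_POINT);
    }

    /** Converts a width in characters to 1/256th-of-a-character units. */
    static int charsToWidthUnits(double chars) {
        if (chars < 0 || chars > MAX_COLUMN_CHARS) {
            throw new IllegalArgumentException("Width must be 0..255 characters: " + chars);
        }
        return (int) Math.round(chars * WIDTH_UNITS_PER_CHAR);
    }
}
```

For example, `row.setHeight(SheetUnits.pointsToTwips(15f))` sets the same 300-twip height as `row.setHeightInPoints(15f)`, and `sheet.setColumnWidth(0, SheetUnits.charsToWidthUnits(20))` makes the first column roughly 20 characters wide.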
+ <p>Cells are styled with CellStyle objects, which in turn contain
+ a reference to a Font object. These are created via the
+ Workbook object by calling createCellStyle() and createFont().
+ Once you create the object, you must set its parameters (colors,
+ borders, etc). To set a font for a CellStyle, call
+ setFont(fontobj).
+ </p>
+ <p>Once you have generated your workbook, you can write it out by
+ calling write(outputStream) from your instance of Workbook, passing
+ it an OutputStream (for instance, a FileOutputStream or
+ ServletOutputStream). You must close the OutputStream yourself. HSSF
+ does not close it for you.
+ </p>
+ <p>Here is some example code (excerpted and adapted from
+ org.apache.poi.hssf.dev.HSSF test class):</p>
+<source><![CDATA[
+import java.io.FileOutputStream;
+import java.io.OutputStream;
+
+import org.apache.poi.hssf.usermodel.HSSFDataFormat;
+import org.apache.poi.hssf.usermodel.HSSFWorkbook;
+import org.apache.poi.ss.usermodel.*;
+
+// create a new workbook
+Workbook wb = new HSSFWorkbook();
+// create a new sheet
+Sheet s = wb.createSheet();
+// declare a row object reference
+Row r = null;
+// declare a cell object reference
+Cell c = null;
+// create 3 cell styles
+CellStyle cs = wb.createCellStyle();
+CellStyle cs2 = wb.createCellStyle();
+CellStyle cs3 = wb.createCellStyle();
+DataFormat df = wb.createDataFormat();
+// create 2 font objects
+Font f = wb.createFont();
+Font f2 = wb.createFont();
+
+// set font 1 to 12 point type
+f.setFontHeightInPoints((short) 12);
+// make it blue
+f.setColor(IndexedColors.BLUE.getIndex());
+// make it bold (arial is the default font)
+f.setBold(true);
+
+// set font 2 to 10 point type
+f2.setFontHeightInPoints((short) 10);
+// make it red
+f2.setColor(Font.COLOR_RED);
+// make it bold
+f2.setBold(true);
+// and strike it through
+f2.setStrikeout(true);
+
+// set cell style
+cs.setFont(f);
+// set the cell format
+cs.setDataFormat(df.getFormat("#,##0.0"));
+
+// set a thin border
+cs2.setBorderBottom(BorderStyle.THIN);
+// fill with the foreground fill color
+cs2.setFillPattern(FillPatternType.SOLID_FOREGROUND);
+// set the cell format to text; see DataFormat for a full list
+cs2.setDataFormat(HSSFDataFormat.getBuiltinFormat("text"));
+// set the font
+cs2.setFont(f2);
+
+// set the sheet name in Unicode
+wb.setSheetName(0, "\u0422\u0435\u0441\u0442\u043E\u0432\u0430\u044F " +
+    "\u0421\u0442\u0440\u0430\u043D\u0438\u0447\u043A\u0430" );
+// in case of plain ascii
+// wb.setSheetName(0, "HSSF Test");
+// create a sheet with 30 rows (0-29)
+int rownum;
+for (rownum = 0; rownum < 30; rownum++)
+{
+    // create a row
+    r = s.createRow(rownum);
+    // on every other row
+    if ((rownum % 2) == 0)
+    {
+        // make the row height bigger (in twips - 1/20 of a point)
+        r.setHeight((short) 0x249);
+    }
+
+    // create 10 cells (0-9); the reason for += 2 becomes apparent below
+    for (int cellnum = 0; cellnum < 10; cellnum += 2)
+    {
+        // create a numeric cell
+        c = r.createCell(cellnum);
+        // do some goofy math to demonstrate decimals
+        c.setCellValue(rownum * 10000 + cellnum
+                + (((double) rownum / 1000)
+                + ((double) cellnum / 10000)));
+
+        // create a string cell in the next column (hence the += 2)
+        c = r.createCell(cellnum + 1);
+
+        // on every other row
+        if ((rownum % 2) == 0)
+        {
+            // set this cell to the first cell style we defined
+            c.setCellStyle(cs);
+            // set the cell's string value to "Test"
+            c.setCellValue("Test");
+        }
+        else
+        {
+            c.setCellStyle(cs2);
+            // set the cell's string value to "\u0422\u0435\u0441\u0442"
+            c.setCellValue("\u0422\u0435\u0441\u0442");
+        }
+
+        // make this column a bit wider: 8000 / 256 is roughly 31 characters
+        s.setColumnWidth(cellnum + 1, 8000);
+    }
+}
+
+// draw a thick black border on the row at the bottom using BLANKS
+// advance 2 rows
+rownum++;
+rownum++;
+
+r = s.createRow(rownum);
+
+// define the third style to be the default
+// except with a thick black border at the bottom
+cs3.setBorderBottom(BorderStyle.THICK);
+
+// create 50 cells
+for (int cellnum = 0; cellnum < 50; cellnum++)
+{
+    // create a blank type cell (no value)
+    c = r.createCell(cellnum);
+    // set it to the thick black border style
+    c.setCellStyle(cs3);
+}
+
+// end draw thick black border
+
+// demonstrate adding/naming and deleting a sheet
+// create a sheet, set its title then delete it
+s = wb.createSheet();
+wb.setSheetName(1, "DeletedSheet");
+wb.removeSheetAt(1);
+// end deleted sheet
+
+// write the workbook to a file; try-with-resources closes the stream
+// for us (don't blow out our file handles)
+try (OutputStream out = new FileOutputStream("workbook.xls"))
+{
+    wb.write(out);
+}
+wb.close();
+ ]]></source>
+ </section>
+ <section><title>Reading or modifying an existing file</title>
+
+<p>Reading in a file is equally simple. To read in a file, create a
+new instance of org.apache.poi.poifs.filesystem.POIFSFileSystem, passing in an open InputStream, such as a FileInputStream
+for your XLS, to the constructor. Construct a new instance of
+org.apache.poi.hssf.usermodel.HSSFWorkbook, passing the
+POIFSFileSystem instance to the constructor. From there you have access to
+all of the high level model objects through their accessor methods
+(workbook.getSheetAt(sheetNum), sheet.getRow(rownum), etc).
+</p>
+<p>Modifying the file you have read in is simple. You retrieve the
+object via an accessor method, remove it via a parent object's remove
+method (sheet.removeRow(hssfrow)) and create objects just as you
+would if creating a new xls. When you are done modifying cells, just
+call workbook.write(outputstream) as you did above.</p>
+<p>An example of this can be seen in
+<a href="https://github.com/apache/poi/tree/trunk/poi-examples/src/main/java/org/apache/poi/examples/hssf/usermodel/HSSFReadWrite.java">org.apache.poi.examples.hssf.usermodel.HSSFReadWrite</a>.</p>
+ </section>
+ </section>
+
+ <anchor id="event_api" />
+ <section><title>Event API (HSSF Only)</title>
+
+ <p>The event API is newer than the User API. It is intended for intermediate
+ developers who are willing to learn a little bit of the low level API
+ structures. It's relatively simple to use, but requires a basic
+ understanding of the parts of an Excel file (or willingness to
+ learn). The advantage provided is that you can read an XLS with a
+ relatively small memory footprint.
+ </p>
+ <p>One important thing to note with the basic Event API is that it
+ triggers events only for things actually stored within the file.
+ With the XLS file format, it is quite common for things that
+ have yet to be edited to simply not exist in the file. This means
+ there may well be apparent "gaps" in the record stream, which
+ you either need to work around, or use the
+ <a href="#record_aware_event_api">Record Aware</a> extension
+ to the Event API.</p>
+ <p>To use this API you construct an instance of
+ org.apache.poi.hssf.eventusermodel.HSSFRequest. Register a class you
+ create that supports the
+ org.apache.poi.hssf.eventusermodel.HSSFListener interface using
+ HSSFRequest.addListener(yourlistener, recordsid). The recordsid
+ should be a static reference number (such as BOFRecord.sid) contained
+ in the classes in org.apache.poi.hssf.record. The trick is you
+ have to know what these records are. Alternatively you can call
+ HSSFRequest.addListenerForAllRecords(mylistener). In order to learn
+ about these records you can either read all of the javadoc in the
+ org.apache.poi.hssf.record package or you can just hack up a
+ copy of org.apache.poi.hssf.dev.EFHSSF and adapt it to your
+ needs. TODO: better documentation on records.</p>
+ <p>Once you've registered your listeners in the HSSFRequest object
+ you can construct an instance of
+ org.apache.poi.poifs.filesystem.POIFSFileSystem (see POIFS howto) and
+ pass it your XLS file inputstream. You can either pass this, along
+ with the request you constructed, to an instance of HSSFEventFactory
+ via the HSSFEventFactory.processWorkbookEvents(request, poifs)
+ method, or you can get an instance of DocumentInputStream from
+ poifs.createDocumentInputStream(&quot;Workbook&quot;) and pass
+ it to HSSFEventFactory.processEvents(request, inputStream). Once you
+ make this call, the listeners that you constructed receive calls to
+ their processRecord(Record) methods with each Record they are
+ registered to listen for until the file has been completely read.
+ </p>
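The register-then-dispatch flow described above can be pictured with a small stdlib-only sketch. The ToyRequest, Rec and Listener types are hypothetical stand-ins for HSSFRequest, Record and HSSFListener, not POI classes; the real HSSFRequest and HSSFEventFactory also decode the binary stream and do considerably more.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-ins for HSSFRequest / HSSFListener, for illustration only.
final class ToyRequest {
    /** A decoded record: its sid (record type code) plus an opaque payload. */
    static final class Rec {
        final short sid;
        final String payload;

        Rec(short sid, String payload) {
            this.sid = sid;
            this.payload = payload;
        }
    }

    interface Listener {
        void processRecord(Rec record);
    }

    private final Map<Short, List<Listener>> bySid = new HashMap<>();
    private final List<Listener> forAll = new ArrayList<>();

    /** Like addListener(listener, sid): only records with this sid are delivered. */
    void addListener(Listener l, short sid) {
        bySid.computeIfAbsent(sid, k -> new ArrayList<>()).add(l);
    }

    /** Like addListenerForAllRecords(listener): every record is delivered. */
    void addListenerForAllRecords(Listener l) {
        forAll.add(l);
    }

    /** Plays the role of the event factory: walk the records once, in file order. */
    void process(List<Rec> stream) {
        for (Rec r : stream) {
            for (Listener l : bySid.getOrDefault(r.sid, Collections.emptyList())) {
                l.processRecord(r);
            }
            for (Listener l : forAll) {
                l.processRecord(r);
            }
        }
    }
}
```

Because listeners only see records as they stream past, nothing has to be kept in memory beyond what each listener chooses to remember, which is where the Event API's small footprint comes from.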
+ <p>A code excerpt from org.apache.poi.hssf.dev.EFHSSF (which is
+ in the source distribution) is reprinted below with excessive
+ comments:</p>
+<source><![CDATA[
+import java.io.FileInputStream;
+import java.io.IOException;
+import java.io.InputStream;
+
+import org.apache.poi.hssf.eventusermodel.HSSFEventFactory;
+import org.apache.poi.hssf.eventusermodel.HSSFListener;
+import org.apache.poi.hssf.eventusermodel.HSSFRequest;
+import org.apache.poi.hssf.record.BOFRecord;
+import org.apache.poi.hssf.record.BoundSheetRecord;
+import org.apache.poi.hssf.record.LabelSSTRecord;
+import org.apache.poi.hssf.record.NumberRecord;
+import org.apache.poi.hssf.record.Record;
+import org.apache.poi.hssf.record.RowRecord;
+import org.apache.poi.hssf.record.SSTRecord;
+import org.apache.poi.poifs.filesystem.POIFSFileSystem;
+
+/**
+ * This example shows how to use the event API for reading a file.
+ */
+public class EventExample
+    implements HSSFListener
+{
+    private SSTRecord sstrec;
+
+    /**
+     * This method listens for incoming records and handles them as required.
+     * @param record The record that was found while reading.
+     */
+    public void processRecord(Record record)
+    {
+        switch (record.getSid())
+        {
+            // the BOFRecord can represent either the beginning of a sheet or the workbook
+            case BOFRecord.sid:
+                BOFRecord bof = (BOFRecord) record;
+                if (bof.getType() == BOFRecord.TYPE_WORKBOOK)
+                {
+                    System.out.println("Encountered workbook");
+                } else if (bof.getType() == BOFRecord.TYPE_WORKSHEET)
+                {
+                    System.out.println("Encountered sheet reference");
+                }
+                break;
+            case BoundSheetRecord.sid:
+                BoundSheetRecord bsr = (BoundSheetRecord) record;
+                System.out.println("New sheet named: " + bsr.getSheetname());
+                break;
+            case RowRecord.sid:
+                RowRecord rowrec = (RowRecord) record;
+                System.out.println("Row found, first column at "
+                        + rowrec.getFirstCol() + " last column at " + rowrec.getLastCol());
+                break;
+            case NumberRecord.sid:
+                NumberRecord numrec = (NumberRecord) record;
+                System.out.println("Cell found with value " + numrec.getValue()
+                        + " at row " + numrec.getRow() + " and column " + numrec.getColumn());
+                break;
+            // SSTRecords store an array of unique strings used in Excel.
+            case SSTRecord.sid:
+                // assigned to the class level member, so later LabelSSTRecords can use it
+                sstrec = (SSTRecord) record;
+                for (int k = 0; k < sstrec.getNumUniqueStrings(); k++)
+                {
+                    System.out.println("String table value " + k + " = " + sstrec.getString(k));
+                }
+                break;
+            case LabelSSTRecord.sid:
+                LabelSSTRecord lrec = (LabelSSTRecord) record;
+                System.out.println("String cell found with value "
+                        + sstrec.getString(lrec.getSSTIndex()));
+                break;
+        }
+    }
+
+    /**
+     * Read an excel file and spit out what we find.
+     *
+     * @param args Expect one argument that is the file to read.
+     * @throws IOException When there is an error processing the file.
+     */
+    public static void main(String[] args) throws IOException
+    {
+        // open the input file specified at the command line, wrap it in a
+        // POIFSFileSystem and get the Workbook (excel part) stream;
+        // try-with-resources closes all three (don't want to leak these!)
+        try (FileInputStream fin = new FileInputStream(args[0]);
+             POIFSFileSystem poifs = new POIFSFileSystem(fin);
+             InputStream din = poifs.createDocumentInputStream("Workbook"))
+        {
+            // construct our HSSFRequest object
+            HSSFRequest req = new HSSFRequest();
+            // lazy listen for ALL records with the listener shown above
+            req.addListenerForAllRecords(new EventExample());
+            // create our event factory
+            HSSFEventFactory factory = new HSSFEventFactory();
+            // process our events based on the document input stream
+            factory.processEvents(req, din);
+        }
+        System.out.println("done.");
+    }
+}
+]]></source>
+ </section>
+
+ <anchor id="record_aware_event_api" />
+ <section><title>Record Aware Event API (HSSF Only)</title>
+<p>
+This is an extension to the normal
+<a href="#event_api">Event API</a>. With this, your listener
+will be called with extra, dummy records. These dummy records should
+alert you to records which aren't present in the file (e.g. cells that have
+yet to be edited), and allow you to handle these.
+</p>
+<p>
+There are three dummy records that your HSSFListener will be called with:
+</p>
+<ul>
+ <li>org.apache.poi.hssf.eventusermodel.dummyrecord.MissingRowDummyRecord
+ <br />
+ This is called during the row record phase (which typically occurs before
+ the cell records), and indicates that the row record for the given
+ row is not present in the file.</li>
+ <li>org.apache.poi.hssf.eventusermodel.dummyrecord.MissingCellDummyRecord
+ <br />
+ This is called during the cell record phase. It is called when a cell
+ record is encountered which leaves a gap between it and the previous one.
+ You may get several of these before the real cell record.</li>
+ <li>org.apache.poi.hssf.eventusermodel.dummyrecord.LastCellOfRowDummyRecord
+ <br />
+ This is called after the last cell of a given row. It indicates that there
+ are no more cells for the row, and also tells you how many cells you have
+ had. For a row with no cells, this will be the only record you get.</li>
+</ul>
+<p>
+To use the Record Aware Event API, you should create an
+org.apache.poi.hssf.eventusermodel.MissingRecordAwareHSSFListener, and pass
+it your HSSFListener. Then, register the MissingRecordAwareHSSFListener
+with your HSSFRequest, and process the file as normal.
+</p>
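The idea behind the dummy records can be shown without POI: given the columns that actually have cell records in one row, emit a "missing cell" event for each gap and a "last cell of row" event at the end. This is a hypothetical sketch of the concept only; it is not how MissingRecordAwareHSSFListener is implemented, and the event strings are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of gap filling; not POI's MissingRecordAwareHSSFListener.
final class GapFiller {
    private GapFiller() {}

    /**
     * Given the ascending column indexes that really have cell records in one row,
     * returns the event sequence a record-aware listener would conceptually see:
     * "MISSING:c" for each skipped column, "CELL:c" for each real cell, and a final
     * "LAST_CELL_OF_ROW:n" where n is one past the last column index covered.
     */
    static List<String> eventsForRow(int[] presentColumns) {
        List<String> events = new ArrayList<>();
        int next = 0; // the next column we expect to see
        for (int col : presentColumns) {
            for (int gap = next; gap < col; gap++) {
                events.add("MISSING:" + gap);       // plays the role of MissingCellDummyRecord
            }
            events.add("CELL:" + col);               // the real cell record
            next = col + 1;
        }
        events.add("LAST_CELL_OF_ROW:" + next);      // plays the role of LastCellOfRowDummyRecord
        return events;
    }
}
```

A row with no cells produces only the final event, mirroring the behaviour described for LastCellOfRowDummyRecord. A CSV writer can emit an empty field for every MISSING event, so each output row gets a consistent number of columns.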
+<p>
+One example use for this API is to write a CSV outputter, which always
+outputs a minimum number of columns, even where the file doesn't contain
+some of the rows or cells. It can be found at
+<code>/poi-examples/src/main/java/org/apache/poi/examples/hssf/eventusermodel/XLS2CSVmra.java</code>,
+and may be called on the command line, or from within your own code.
+The latest version is always available from
+<a href="https://github.com/apache/poi/tree/trunk/poi-examples/src/main/java/org/apache/poi/examples/hssf/eventusermodel/">git</a>.
+</p>
+<p>
+<em>In POI versions before 3.0.3, this code lived in the scratchpad section.
+ If you're using one of these older versions of POI, you will either
+ need to include the scratchpad jar on your classpath, or build from a</em>
+ <a href="site:git">git checkout</a>.
+</p>
+ </section>
+
+ <anchor id="xssf_sax_api"/>
+ <section><title>XSSF and SAX (Event API)</title>
+
+ <p>If memory footprint is an issue, then for XSSF, you can get at
+ the underlying XML data, and process it yourself. This is intended
+ for intermediate developers who are willing to learn a little bit of
+ low level structure of .xlsx files, and who are happy processing
+ XML in Java. It's relatively simple to use, but requires a basic
+ understanding of the file structure. The advantage provided is that
+ you can read an XLSX file with a relatively small memory footprint.
+ </p>
+ <p>One important thing to note with the basic Event API is that it
+ triggers events only for things actually stored within the file.
+ With the XLSX file format, it is quite common for things that
+ have yet to be edited to simply not exist in the file. This means
+ there may well be apparent "gaps" in the record stream, which
+ you need to work around.</p>
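In the sheet XML each <code>c</code> element carries its address in the <code>r</code> attribute (for example <code>C5</code>), so one way to notice gaps is to convert that address into zero-based row and column indexes and compare it with the previous cell. POI's CellReference class offers this conversion; the stdlib-only A1Ref helper below is a hypothetical sketch of the same idea.

```java
// Hypothetical helper: converts an A1-style reference such as "C5" or "AA10"
// into zero-based {row, column}. POI's CellReference offers the same conversion.
final class A1Ref {
    private A1Ref() {}

    static int[] parse(String ref) {
        int i = 0;
        int col = 0;
        // Column letters form a bijective base-26 number: A=1 ... Z=26, AA=27, ...
        while (i < ref.length()) {
            char c = Character.toUpperCase(ref.charAt(i));
            if (c < 'A' || c > 'Z') {
                break;
            }
            col = col * 26 + (c - 'A' + 1);
            i++;
        }
        if (i == 0 || i == ref.length()) {
            throw new IllegalArgumentException("Not an A1 reference: " + ref);
        }
        int row = Integer.parseInt(ref.substring(i)); // 1-based row number
        return new int[] {row - 1, col - 1};
    }
}
```

Inside a SAX handler, comparing the column of each <code>c</code> element with that of the previous cell in the same row tells you how many empty cells were skipped.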
+ <p>To use this API you construct an instance of
+ org.apache.poi.xssf.eventusermodel.XSSFReader. This will optionally
+ provide a nice interface on the shared strings table, and the styles.
+ It provides methods to get the raw XML data from the rest of the
+ file, which you will then pass to SAX.</p>
+ <p>This example shows how to get at a single known sheet, or at
+ all sheets in the file. It is based on the example in
+ <a href="https://github.com/apache/poi/tree/trunk/poi-examples/src/main/java/org/apache/poi/examples/xssf/eventusermodel/FromHowTo.java">git
+ poi-examples/src/main/java/org/apache/poi/examples/xssf/eventusermodel/FromHowTo.java</a></p>
+<source><![CDATA[
+import java.io.InputStream;
+import java.util.Iterator;
+
+import org.apache.poi.util.XMLHelper;
+import org.apache.poi.openxml4j.opc.OPCPackage;
+import org.apache.poi.openxml4j.opc.PackageAccess;
+import org.apache.poi.xssf.eventusermodel.XSSFReader;
+import org.apache.poi.xssf.model.SharedStringsTable;
+import org.xml.sax.Attributes;
+import org.xml.sax.ContentHandler;
+import org.xml.sax.InputSource;
+import org.xml.sax.SAXException;
+import org.xml.sax.XMLReader;
+import org.xml.sax.helpers.DefaultHandler;
+
+import javax.xml.parsers.ParserConfigurationException;
+
+public class ExampleEventUserModel {
+  public void processOneSheet(String filename) throws Exception {
+    // open read-only; try-with-resources closes the package when we're done
+    try (OPCPackage pkg = OPCPackage.open(filename, PackageAccess.READ)) {
+      XSSFReader r = new XSSFReader( pkg );
+      SharedStringsTable sst = r.getSharedStringsTable();
+
+      XMLReader parser = fetchSheetParser(sst);
+
+      // To look up the Sheet Name / Sheet Order / rID,
+      //  you need to process the core Workbook stream.
+      // Normally it's of the form rId# or rSheet#
+      try (InputStream sheet2 = r.getSheet("rId2")) {
+        InputSource sheetSource = new InputSource(sheet2);
+        parser.parse(sheetSource);
+      }
+    }
+  }
+
+  public void processAllSheets(String filename) throws Exception {
+    try (OPCPackage pkg = OPCPackage.open(filename, PackageAccess.READ)) {
+      XSSFReader r = new XSSFReader( pkg );
+      SharedStringsTable sst = r.getSharedStringsTable();
+
+      XMLReader parser = fetchSheetParser(sst);
+
+      Iterator<InputStream> sheets = r.getSheetsData();
+      while(sheets.hasNext()) {
+        System.out.println("Processing new sheet:\n");
+        try (InputStream sheet = sheets.next()) {
+          InputSource sheetSource = new InputSource(sheet);
+          parser.parse(sheetSource);
+        }
+        System.out.println("");
+      }
+    }
+  }
+
+ public XMLReader fetchSheetParser(SharedStringsTable sst) throws SAXException, ParserConfigurationException {
+ XMLReader parser = XMLHelper.newXMLReader();
+ ContentHandler handler = new SheetHandler(sst);
+ parser.setContentHandler(handler);
+ return parser;
+ }
+
+ /**
+ * See org.xml.sax.helpers.DefaultHandler javadocs
+ */
+ private static class SheetHandler extends DefaultHandler {
+ private SharedStringsTable sst;
+ private String lastContents;
+ private boolean nextIsString;
+
+ private SheetHandler(SharedStringsTable sst) {
+ this.sst = sst;
+ }
+
+ public void startElement(String uri, String localName, String name,
+ Attributes attributes) throws SAXException {
+ // c => cell
+ if(name.equals("c")) {
+ // Print the cell reference
+ System.out.print(attributes.getValue("r") + " - ");
+ // Figure out if the value is an index in the SST
+ String cellType = attributes.getValue("t");
+ if(cellType != null && cellType.equals("s")) {
+ nextIsString = true;
+ } else {
+ nextIsString = false;
+ }
+ }
+ // Clear contents cache
+ lastContents = "";
+ }
+
+ public void endElement(String uri, String localName, String name)
+ throws SAXException {
+ // Process the last contents as required.
+ // Do now, as characters() may be called more than once
+ if(nextIsString) {
+ int idx = Integer.parseInt(lastContents);
+ lastContents = sst.getItemAt(idx).getString();
+ nextIsString = false;
+ }
+
+ // v => contents of a cell
+ // Output after we've seen the string contents
+ if(name.equals("v")) {
+ System.out.println(lastContents);
+ }
+ }
+
+ public void characters(char[] ch, int start, int length) {
+ lastContents += new String(ch, start, length);
+ }
+ }
+
+ public static void main(String[] args) throws Exception {
+ ExampleEventUserModel example = new ExampleEventUserModel();
+ example.processOneSheet(args[0]);
+ example.processAllSheets(args[0]);
+ }
+}
+]]></source>
+ <p>
+ For a fuller example, including support for fetching number formatting
+ information and applying it to numeric cells (e.g. to format dates or
+ percentages), please see
+ <a href="https://github.com/apache/poi/tree/trunk/poi-examples/src/main/java/org/apache/poi/examples/xssf/eventusermodel/XLSX2CSV.java">the XLSX2CSV example in git</a>.
+ </p>
+ <p>An example is also <a href="https://github.com/apache/poi/tree/trunk/poi-examples/src/main/java/org/apache/poi/examples/xssf/streaming/HybridStreaming.java">provided</a>
+ showing how to combine the user API and the SAX API by doing a streaming parse
+ of larger worksheets and a traditional user-model parse of the rest of a workbook.</p>
+ </section>
+ <anchor id="sxssf"/>
+ <section><title>SXSSF (Streaming Usermodel API)</title>
+ <p>
+ SXSSF (package: org.apache.poi.xssf.streaming) is an API-compatible streaming extension of XSSF to be used when
+ very large spreadsheets have to be produced, and heap space is limited.
+ SXSSF achieves its low memory footprint by limiting access to the rows that
+ are within a sliding window, while XSSF gives access to all rows in the
+ document. Older rows that are no longer in the window become inaccessible,
+ as they are written to the disk.
+ </p>
+ <p>
+ You can specify the window size at workbook construction time via <em>new SXSSFWorkbook(int windowSize)</em>
+ or you can set it per-sheet via <em>SXSSFSheet#setRandomAccessWindowSize(int windowSize)</em>.
+ </p>
+ <p>
+ When a new row is created via createRow() and the total number
+ of unflushed records would exceed the specified window size, then the
+ row with the lowest index value is flushed and cannot be accessed
+ via getRow() anymore.
+ </p>
+ <p>
+ The default window size is <em>100</em> and defined by SXSSFWorkbook.DEFAULT_WINDOW_SIZE.
+ </p>
+ <p>
+ A windowSize of -1 indicates unlimited access. In this case all
+ records that have not been flushed by a call to flushRows() are available
+ for random access.
+ </p>
+ <p>
+ Note that SXSSF allocates temporary files that you <strong>must</strong> always clean up explicitly, by calling the dispose method.
+ </p>
+ <p>
+ SXSSFWorkbook defaults to using inline strings instead of a shared strings
+ table. This is very efficient, since no document content needs to be kept in
+ memory, but it is also known to produce documents that are incompatible with
+ some clients. With shared strings enabled, all unique strings in the document
+ have to be kept in memory. Depending on your document content, this could use
+ a lot more resources than with shared strings disabled.
+ </p>
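+<p>The memory trade-off described above follows from how a shared strings table works: every distinct string is stored once and cells refer to it by index, so the whole table must stay in memory until the workbook is written. The sketch below is a conceptual model only; the class <em>SharedStringsModel</em> is hypothetical and is not POI's SharedStringsTable. In POI itself, shared strings for SXSSF are enabled through an SXSSFWorkbook constructor that takes a useSharedStringsTable flag; check the SXSSFWorkbook Javadoc for the exact signature in your version.</p>
+<source><![CDATA[
```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Conceptual model of a shared strings table (hypothetical class, not POI's).
// Each distinct string is stored once; a cell would store only the returned index.
class SharedStringsModel {
    private final Map<String, Integer> indexByString = new HashMap<>();
    private final List<String> strings = new ArrayList<>();

    // Returns the index a cell would reference instead of holding the text itself.
    int add(String text) {
        Integer existing = indexByString.get(text);
        if (existing != null) {
            return existing; // repeated strings cost no extra memory
        }
        int index = strings.size();
        strings.add(text);
        indexByString.put(text, index);
        return index;
    }

    String get(int index) {
        return strings.get(index);
    }

    // Number of strings that must stay in memory until the workbook is written.
    int uniqueCount() {
        return strings.size();
    }
}
```
+]]></source>
+<p>Memory therefore grows with the number of distinct strings, not the number of cells: a column of a few repeated status values is cheap, while a column of unique IDs keeps every ID in memory.</p>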
+ <p>
+ Please note that some features may still consume a large
+ amount of memory, depending on how you use them: merged regions,
+ hyperlinks, comments, etc. are still only stored in memory and thus may require a lot of
+ memory if used extensively.
+ </p>
+ <p>
+ Carefully review your memory budget and compatibility needs before deciding
+ whether to enable shared strings or not.
+ </p>
+ <p> The example below writes a sheet with a window of 100 rows. When the row count reaches 101,
+ the row with rownum=0 is flushed to disk and removed from memory; when the row count reaches 102, the row with rownum=1 is flushed, and so on.
+ </p>
+
+
+<source><![CDATA[
+import java.io.FileOutputStream;
+import java.io.OutputStream;
+
+import org.apache.poi.ss.usermodel.Cell;
+import org.apache.poi.ss.usermodel.Row;
+import org.apache.poi.ss.usermodel.Sheet;
+import org.apache.poi.ss.util.CellReference;
+import org.apache.poi.xssf.streaming.SXSSFWorkbook;
+
+public class SXSSFWindowExample {
+    public static void main(String[] args) throws Exception {
+        SXSSFWorkbook wb = new SXSSFWorkbook(100); // keep 100 rows in memory, exceeding rows will be flushed to disk
+        try {
+            Sheet sh = wb.createSheet();
+            for (int rownum = 0; rownum < 1000; rownum++) {
+                Row row = sh.createRow(rownum);
+                for (int cellnum = 0; cellnum < 10; cellnum++) {
+                    Cell cell = row.createCell(cellnum);
+                    String address = new CellReference(cell).formatAsString();
+                    cell.setCellValue(address);
+                }
+            }
+
+            // Rows with rownum < 900 are flushed and not accessible
+            for (int rownum = 0; rownum < 900; rownum++) {
+                if (sh.getRow(rownum) != null) {
+                    throw new IllegalStateException("row " + rownum + " should have been flushed");
+                }
+            }
+
+            // The last 100 rows are still in memory
+            for (int rownum = 900; rownum < 1000; rownum++) {
+                if (sh.getRow(rownum) == null) {
+                    throw new IllegalStateException("row " + rownum + " should still be in memory");
+                }
+            }
+
+            try (OutputStream out = new FileOutputStream("sxssf.xlsx")) {
+                wb.write(out);
+            }
+        } finally {
+            // dispose of temporary files backing this workbook on disk
+            wb.dispose();
+            wb.close();
+        }
+    }
+}
+
+
+]]></source>
+<p>The next example turns off auto-flushing (windowSize=-1), so the code itself controls when portions of the data are written to disk.</p>
+<source><![CDATA[
+import java.io.FileOutputStream;
+import java.io.OutputStream;
+
+import org.apache.poi.ss.usermodel.Cell;
+import org.apache.poi.ss.usermodel.Row;
+import org.apache.poi.ss.util.CellReference;
+import org.apache.poi.xssf.streaming.SXSSFSheet;
+import org.apache.poi.xssf.streaming.SXSSFWorkbook;
+
+public class SXSSFManualFlushExample {
+    public static void main(String[] args) throws Exception {
+        SXSSFWorkbook wb = new SXSSFWorkbook(-1); // turn off auto-flushing and accumulate all rows in memory
+        try {
+            SXSSFSheet sh = wb.createSheet();
+            for (int rownum = 0; rownum < 1000; rownum++) {
+                Row row = sh.createRow(rownum);
+                for (int cellnum = 0; cellnum < 10; cellnum++) {
+                    Cell cell = row.createCell(cellnum);
+                    String address = new CellReference(cell).formatAsString();
+                    cell.setCellValue(address);
+                }
+
+                // manually control how rows are flushed to disk
+                if (rownum % 100 == 0) {
+                    sh.flushRows(100); // retain the last 100 rows and flush all others
+
+                    // sh.flushRows() is a shortcut for sh.flushRows(0),
+                    // which flushes all rows
+                }
+            }
+
+            try (OutputStream out = new FileOutputStream("sxssf.xlsx")) {
+                wb.write(out);
+            }
+        } finally {
+            // dispose of temporary files backing this workbook on disk
+            wb.dispose();
+            wb.close();
+        }
+    }
+}
+
+
+]]></source>
+<p>SXSSF flushes sheet data to temporary files (one temp file per sheet), and these temporary files
+can grow very large. For example, for 20 MB of CSV data the temporary XML can exceed a gigabyte.
+If the size of the temp files is an issue, you can tell SXSSF to use gzip compression:
+</p>
+<source><![CDATA[
+ SXSSFWorkbook wb = new SXSSFWorkbook();
+ wb.setCompressTempFiles(true); // temp files will be gzipped
+
+]]></source>
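+<p>Compression pays off because sheet XML is highly repetitive markup. The sketch below uses only java.util.zip to illustrate this on simplified row XML that is merely shaped like the sheet data SXSSF writes (it is not the exact format SXSSF emits); the class <em>TempFileCompressionDemo</em> is hypothetical. Actual ratios depend on your data, and compression costs some extra CPU time.</p>
+<source><![CDATA[
```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPOutputStream;

// Illustration only (hypothetical class): builds simplified, SpreadsheetML-like row XML
// and compares its raw size with its gzipped size.
class TempFileCompressionDemo {

    // Produces rows like <row r="1"><c r="A1" t="inlineStr"><is><t>A1</t></is></c>...</row>
    static byte[] rowXml(int rows, int cols) {
        StringBuilder sb = new StringBuilder();
        for (int r = 1; r <= rows; r++) {
            sb.append("<row r=\"").append(r).append("\">");
            for (int c = 0; c < cols; c++) { // cols must be at most 26 (columns A..Z)
                String ref = (char) ('A' + c) + Integer.toString(r);
                sb.append("<c r=\"").append(ref).append("\" t=\"inlineStr\"><is><t>")
                  .append(ref).append("</t></is></c>");
            }
            sb.append("</row>\n");
        }
        return sb.toString().getBytes(StandardCharsets.UTF_8);
    }

    // Gzips a byte array in memory.
    static byte[] gzip(byte[] data) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bytes)) {
            gz.write(data);
        } catch (IOException e) {
            throw new UncheckedIOException(e); // not expected with an in-memory stream
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) {
        byte[] raw = rowXml(1000, 10);
        byte[] packed = gzip(raw);
        System.out.printf("raw=%d bytes, gzipped=%d bytes%n", raw.length, packed.length);
    }
}
```
+]]></source>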
+ </section>
+
+ <anchor id="low_level_api" />
+ <section><title>Low Level APIs</title>
+
+<p>The low level API is not much to look at. It consists of lots of
+&quot;Records&quot; in the org.apache.poi.hssf.record.* package,
+and a set of helper classes in org.apache.poi.hssf.model.*. The
+record classes are consistent with the low level binary structures
+inside a BIFF8 file (which is embedded in a POIFS file system). You
+probably need the book &quot;Microsoft Excel 97 Developer's Kit&quot;
+from Microsoft Press in order to understand how these fit together
+(out of print but easily obtainable from Amazon's used books). To
+gain a good understanding of how to use the low level APIs, you
+should view the source in org.apache.poi.hssf.usermodel.* and
+the classes in org.apache.poi.hssf.model.*. You should read the
+documentation for the POIFS libraries as well.</p>
+ </section>
+ <section><title>Generating XLS from XML</title>
+<p>If you wish to generate an XLS file from some XML, it is possible to
+write your own XML processing code, then use the User API to write out
+the document.</p>
+<p>The other option is to use <a href="https://cocoon.apache.org/">Cocoon</a>.
+In Cocoon, there is the <a href="https://cocoon.apache.org/2.1/userdocs/xls-serializer.html">HSSF Serializer</a>,
+which takes in XML (in the gnumeric format), and outputs an XLS file for you.</p>
+ </section>
+ <section><title>HSSF Class/Test Application</title>
+
+<p>The HSSF application is nothing more than a test for the high
+level API (and indirectly the low level support). The main body of
+its code is repeated above. To run it:
+</p>
+<ul>
+ <li>download the poi-alpha build and untar it (tar xvzf
+ tarball.tar.gz)
+ </li>
+ <li>set up your classpath as follows:
+ <code>export HSSFDIR={wherever you put HSSF's jar files}
+export LOG4JDIR={wherever you put LOG4J's jar files}
+export CLASSPATH=$CLASSPATH:$HSSFDIR/hssf.jar:$HSSFDIR/poi-poifs.jar:$HSSFDIR/poi-util.jar:$LOG4JDIR/log4j.jar</code>
+ </li><li>type:
+ <code>java org.apache.poi.hssf.dev.HSSF ~/myxls.xls write</code></li>
+</ul>
+<p></p>
+<p>This should generate a test sheet in your home directory called <code>&quot;myxls.xls&quot;</code>. </p>
+<ul>
+ <li>Type:
+ <code>java org.apache.poi.hssf.dev.HSSF ~/input.xls output.xls</code>
+ <br/>
+ <br/>
+This is the read/write/modify test. It reads in the spreadsheet, modifies a cell, and writes it back out.
+Failing this test is not necessarily a bad thing. If HSSF tries to modify a non-existent sheet then this will
+most likely fail. No big deal. </li>
+</ul>
+ </section>
+ <section><title>HSSF Developer's Tools</title>
+
+<p>HSSF has a number of tools useful for developers to debug/develop
+stuff using HSSF (and more generally XLS files). We've already
+discussed the app for testing HSSF read/write/modify capabilities;
+now we'll talk a bit about BiffViewer. Early on in the development of
+HSSF, it was decided that knowing what was in a record, what was
+wrong with it, etc. was virtually impossible with the available
+tools. So we developed BiffViewer. You can find it at
+org.apache.poi.hssf.dev.BiffViewer. It performs two basic
+functions and a derivative.
+</p>
+<p>The first is &quot;biffview&quot;. To do this you run it (assuming
+you have everything set up in your classpath and that you know what
+you're doing enough to be thinking about this) with an xls file as a
+parameter. It will give you a listing of all understood records with
+their data and a list of not-yet-understood records with no data
+(because it doesn't know how to interpret them). This listing is
+useful for several things. First, you can look at the values and SEE
+what is wrong in quasi-English. Second, you can send the output to a
+file and compare it.
+</p>
+<p>The second function is &quot;big freakin dump&quot;, just pass a
+file and a second argument matching &quot;bfd&quot; exactly. This
+will just make a big hexdump of the file.
+</p>
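+<p>For readers unfamiliar with the output, a hex dump lists each byte of the file in hexadecimal next to its offset. The sketch below shows the general shape of such output. It is a stand-alone illustration (the class <em>SimpleHexDump</em> is hypothetical), not BiffViewer's code, and BiffViewer's exact layout may differ.</p>
+<source><![CDATA[
```java
// Stand-alone hex dump (hypothetical helper, not BiffViewer or POI's HexDump):
// an offset column, 16 bytes per line in hex, then the printable ASCII characters.
class SimpleHexDump {

    static String dump(byte[] data) {
        StringBuilder out = new StringBuilder();
        for (int offset = 0; offset < data.length; offset += 16) {
            out.append(String.format("%08X ", offset));
            StringBuilder ascii = new StringBuilder();
            for (int i = 0; i < 16; i++) {
                if (offset + i < data.length) {
                    int b = data[offset + i] & 0xFF; // treat the byte as unsigned
                    out.append(String.format("%02X ", b));
                    ascii.append(b >= 0x20 && b < 0x7F ? (char) b : '.'); // non-printable -> '.'
                } else {
                    out.append("   "); // pad the last, shorter line so columns line up
                }
            }
            out.append(' ').append(ascii).append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) throws java.io.IOException {
        // Dump the file named on the command line
        byte[] bytes = java.nio.file.Files.readAllBytes(java.nio.file.Paths.get(args[0]));
        System.out.print(dump(bytes));
    }
}
```
+]]></source>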
+<p>Lastly, there is &quot;mixed&quot; mode which does the same as
+regular biffview, only it includes hex dumps of certain records
+intertwined. To use that just pass a file with a second argument
+matching &quot;on&quot; exactly.</p>
+<p>In the next release cycle we'll also have something called a
+FormulaViewer. The class is already there, but it's not very useful
+yet. When it does something, we'll document it.</p>
+
+ </section>
+ <section><title>What's Next?</title>
+
+<p>Further effort on HSSF is going to focus on the following major areas: </p>
+<ul>
+<li>Performance: POI currently uses a lot of memory for large sheets.</li>
+<li>Charts: This is a hard problem, with very little documentation.</li>
+</ul>
+<p><a href="site:guidelines"> So jump in! </a> </p>
+
+ </section>
+
+</section>
+</body>
+</document>
diff --git a/src/documentation/content/xdocs/components/spreadsheet/index.xml b/src/documentation/content/xdocs/components/spreadsheet/index.xml
new file mode 100644
index 0000000000..ec262b554c
--- /dev/null
+++ b/src/documentation/content/xdocs/components/spreadsheet/index.xml
@@ -0,0 +1,119 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<!--
+ ====================================================================
+ Licensed to the Apache Software Foundation (ASF) under one or more
+ contributor license agreements. See the NOTICE file distributed with
+ this work for additional information regarding copyright ownership.
+ The ASF licenses this file to You under the Apache License, Version 2.0
+ (the "License"); you may not use this file except in compliance with
+ the License. You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+ Unless required by applicable law or agreed to in writing, software
+ distributed under the License is distributed on an "AS IS" BASIS,
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ See the License for the specific language governing permissions and
+ limitations under the License.
+ ====================================================================
+-->
+<!DOCTYPE document PUBLIC "-//APACHE//DTD Documentation V2.0//EN" "document-v20.dtd">
+
+<document>
+ <header>
+ <title>POI-HSSF and POI-XSSF/SXSSF - Java API To Access Microsoft Excel Format Files</title>
+ <subtitle>Overview</subtitle>
+ <authors>
+ <person name="Andrew C. Oliver" email="acoliver@apache.org"/>
+ <person name="Nicola Ken Barozzi" email="barozzi@nicolaken.com"/>
+ </authors>
+ </header>
+
+ <body>
+ <section>
+ <title>Overview</title>
+
+ <p>HSSF is the POI Project's pure Java implementation of the
+ Excel '97(-2007) file format. XSSF is the POI Project's pure
+ Java implementation of the Excel 2007 OOXML (.xlsx) file
+ format.</p>
+ <p>HSSF and XSSF provide ways to create, modify, read and
+ write Excel spreadsheets. They provide:
+ </p>
+ <ul>
+ <li>low level structures for those with special needs</li>
+ <li>an eventmodel api for efficient read-only access</li>
+ <li>a full usermodel api for creating, reading and modifying XLS files</li>
+ </ul>
+ <p>If you are converting from the pure HSSF usermodel and wish
+ to use the joint SS usermodel for HSSF and XSSF support,
+ see the <a href="converting.html">ss usermodel converting
+ guide</a>.
+ </p>
+ <p>
+ An alternate way of generating a spreadsheet is via the <a href="https://cocoon.apache.org">Cocoon</a> serializer (though you'll still be using HSSF indirectly).
+ With Cocoon you can serialize any XML data source (which might be an ESQL page outputting SQL query results, for instance) by simply
+ applying the stylesheet and designating the serializer.
+ </p>
+ <p>
+ If you're merely reading spreadsheet data, then use the
+ eventmodel api in either the org.apache.poi.hssf.eventusermodel
+ package, or the org.apache.poi.xssf.eventusermodel package, depending
+ on your file format.
+ </p>
+ <p>
+ If you're modifying spreadsheet data then use the usermodel api. You
+ can also generate spreadsheets this way.
+ </p>
+ <p>
+ Note that the usermodel system has a higher memory footprint than
+ the low level eventusermodel, but has the major advantage of being
+ much simpler to work with. Also please be aware that because the
+ Excel 2007 OOXML (.xlsx) files supported by XSSF are XML-based,
+ the memory footprint for processing them is higher than for the
+ older binary (.xls) files supported by HSSF.
+ </p>
+
+
+
+ </section>
+
+<section>
+<title>SXSSF (Since POI 3.8 beta3)</title>
+<p>Since 3.8-beta3, POI provides a low-memory footprint SXSSF API built on top of XSSF.</p>
+<p>
+SXSSF is an API-compatible streaming extension of XSSF to be used when
+very large spreadsheets have to be produced, and heap space is limited.
+SXSSF achieves its low memory footprint by limiting access to the rows that
+are within a sliding window, while XSSF gives access to all rows in the
+document. Older rows that are no longer in the window become inaccessible,
+as they are written to the disk.
+</p>
+<p>
+In auto-flush mode the size of the access window can be specified, to hold a certain number of rows in memory.
+When that value is reached, the creation of an additional row causes the row with the lowest index to be
+removed from the access window and written to disk. Alternatively, the window size can be set to grow dynamically;
+it can then be trimmed periodically by an explicit call to flushRows(int keepRows) as needed.
+</p>
+<p>
+Due to the streaming nature of the implementation, there are the following
+limitations when compared to XSSF:
+</p>
+ <ul>
+ <li>Only a limited number of rows are accessible at a point in time.</li>
+ <li>Sheet.clone() is not supported.</li>
+ <li>Formula evaluation is not supported.</li>
+ </ul>
+
+ <p> See more details at <a href="how-to.html#sxssf">SXSSF How-To</a></p>
+
+<p>The table below synopsizes the comparative features of POI's Spreadsheet API:</p>
+ <p><em>Spreadsheet API Feature Summary</em></p>
+
+ <p>
+ <img src="images/ss-features.png" alt="Spreadsheet API Feature Summary"/>
+ </p>
+</section>
+
+ </body>
+</document>
diff --git a/src/documentation/content/xdocs/components/spreadsheet/limitations.xml b/src/documentation/content/xdocs/components/spreadsheet/limitations.xml
new file mode 100644
index 0000000000..ce9f8afc56
--- /dev/null
+++ b/src/documentation/content/xdocs/components/spreadsheet/limitations.xml
@@ -0,0 +1,99 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<!--
+ ====================================================================
+ Licensed to the Apache Software Foundation (ASF) under one or more
+ contributor license agreements. See the NOTICE file distributed with
+ this work for additional information regarding copyright ownership.
+ The ASF licenses this file to You under the Apache License, Version 2.0
+ (the "License"); you may not use this file except in compliance with
+ the License. You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+ Unless required by applicable law or agreed to in writing, software
+ distributed under the License is distributed on an "AS IS" BASIS,
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ See the License for the specific language governing permissions and
+ limitations under the License.
+ ====================================================================
+-->
+<!DOCTYPE document PUBLIC "-//APACHE//DTD Documentation V2.0//EN" "document-v20.dtd">
+
+<document>
+ <header>
+ <title>Apache POI™ - HSSF and XSSF Limitations</title>
+ <authors>
+ <person email="user@poi.apache.org" name="Glen Stampoultzis" id="GJS"/>
+ </authors>
+ </header>
+ <body>
+ <section><title>Current HSSF / XSSF main limitations</title>
+ <p>
+ The intent of this document is to outline some of the known limitations of the
+ POI HSSF and XSSF APIs. It is not intended to be a complete list of every bug
+ or missing feature of HSSF or XSSF; rather, its purpose is to provide a broad
+ feel for some of the functionality that is missing or broken.
+ </p>
+ <ul>
+ <li>
+ File sizes/Memory usage<br/><br/>
+ <ul>
+ <li>
+ There are some inherent limits in the Excel file formats. These are defined in class
+ <a href="../../apidocs/dev/org/apache/poi/ss/SpreadsheetVersion.html">SpreadsheetVersion</a>.
+ As long as you have enough main-memory, you should be able to handle files up to these limits. For huge files
+ using the default POI classes you will likely need a very large amount of memory.
+ <br/>
+ <br/>
+ There are ways to overcome the main-memory limitations if needed:
+ <br/>
+ <ul>
+ <li>
+ For writing very large files, there is <a href="site:spreadsheet">SXSSFWorkbook</a>,
+ which allows you to stream data out to files (with certain limitations on what you can do, as only
+ parts of the file are held in memory).
+ </li>
+ <li>
+ For reading very large files, take a look at the sample
+ <a href="https://github.com/apache/poi/tree/trunk/poi-examples/src/main/java/org/apache/poi/examples/xssf/eventusermodel/XLSX2CSV.java">XLSX2CSV</a>
+ which shows how you can read a file in streaming fashion (again with some limitations on what information you
+ can read out of the file, but there are ways to get at most of it if necessary).
+ </li>
+ </ul>
+ </li>
+ </ul>
+ </li>
+ <li>
+ Charts<br/><br/>
+ <ul>
+ <li>
+ HSSF has some limited support for creating a handful of very simple Chart types,
+ but largely this isn't supported. HSSF (largely) doesn't support changing Charts.
+ You can however create a chart in Excel using Named ranges, modify the chart data
+ values using HSSF and write a new spreadsheet out. This is possible because POI
+ attempts to keep existing records intact as far as possible.<br/>
+ </li>
+ <li>
+ XSSF has only limited chart support including making some simple changes
+ and adding at least some line and scatter charts, see the examples <em>LineChart</em>
+ and <em>ScatterChart</em>.<br/><br/>
+ </li>
+ </ul>
+ </li>
+ <li>
+ Macros<br/><br/>
+ Macros cannot be created. There are currently no plans to support macros.
+ However, reading and re-writing files containing macros will safely preserve
+ the macros. Recent versions of Apache POI support extracting the macro data
+ via <a href="../../apidocs/dev/org/apache/poi/poifs/macros/VBAMacroExtractor.html">VBAMacroExtractor</a>
+ and <a href="../../apidocs/dev/org/apache/poi/poifs/macros/VBAMacroReader.html">VBAMacroReader</a><br/><br/>
+ </li>
+ <li>
+ Pivot Tables<br/><br/>
+ HSSF doesn't have support for reading or creating Pivot tables. XSSF has limited
+ support for creating Pivot Tables, and very limited read/change support.
+ </li>
+ </ul>
+ </section>
+ </body>
+</document>
diff --git a/src/documentation/content/xdocs/components/spreadsheet/quick-guide.xml b/src/documentation/content/xdocs/components/spreadsheet/quick-guide.xml
new file mode 100644
index 0000000000..3ef09310b7
--- /dev/null
+++ b/src/documentation/content/xdocs/components/spreadsheet/quick-guide.xml
@@ -0,0 +1,2455 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<!--
+ ====================================================================
+ Licensed to the Apache Software Foundation (ASF) under one or more
+ contributor license agreements. See the NOTICE file distributed with
+ this work for additional information regarding copyright ownership.
+ The ASF licenses this file to You under the Apache License, Version 2.0
+ (the "License"); you may not use this file except in compliance with
+ the License. You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+ Unless required by applicable law or agreed to in writing, software
+ distributed under the License is distributed on an "AS IS" BASIS,
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ See the License for the specific language governing permissions and
+ limitations under the License.
+ ====================================================================
+-->
+<!DOCTYPE document PUBLIC "-//APACHE//DTD Documentation V2.0//EN" "document-v20.dtd">
+
+<document>
+ <header>
+ <title>Busy Developers' Guide to HSSF and XSSF Features</title>
+ </header>
+ <body>
+ <section><title>Busy Developers' Guide to Features</title>
+ <p>
+ Want to use HSSF and XSSF to read and write spreadsheets in a hurry? This
+ guide is for you. If you're after more in-depth coverage of the HSSF and
+ XSSF user-APIs, please consult the <a href="how-to.html">HOWTO</a>
+ guide as it contains actual descriptions of how to use this stuff.
+ </p>
+ <section><title>Index of Features</title>
+ <ul>
+ <li><a href="#NewWorkbook">How to create a new workbook</a></li>
+ <li><a href="#NewSheet">How to create a sheet</a></li>
+ <li><a href="#CreateCells">How to create cells</a></li>
+ <li><a href="#CreateDateCells">How to create date cells</a></li>
+ <li><a href="#CellTypes">Working with different types of cells</a></li>
+ <li><a href="#Iterator">Iterate over rows and cells</a></li>
+ <li><a href="#CellContents">Getting the cell contents</a></li>
+ <li><a href="#TextExtraction">Text Extraction</a></li>
+ <li><a href="#FileInputStream">Files vs InputStreams</a></li>
+ <li><a href="#Alignment">Aligning cells</a></li>
+ <li><a href="#Borders">Working with borders</a></li>
+ <li><a href="#FillsAndFrills">Fills and color</a></li>
+ <li><a href="#MergedCells">Merging cells</a></li>
+ <li><a href="#WorkingWithFonts">Working with fonts</a></li>
+ <li><a href="#CustomColors">Custom colors</a></li>
+ <li><a href="#ReadWriteWorkbook">Reading and writing</a></li>
+ <li><a href="#NewLinesInCells">Use newlines in cells.</a></li>
+ <li><a href="#DataFormats">Create user defined data formats</a></li>
+ <li><a href="#FitTo">Fit Sheet to One Page</a></li>
+ <li><a href="#PrintArea2">Set print area for a sheet</a></li>
+ <li><a href="#FooterPageNumbers">Set page numbers on the footer of a sheet</a></li>
+ <li><a href="#ShiftRows">Shift rows</a></li>
+ <li><a href="#SelectSheet">Set a sheet as selected</a></li>
+ <li><a href="#Zoom">Set the zoom magnification for a sheet</a></li>
+ <li><a href="#Splits">Create split and freeze panes</a></li>
+ <li><a href="#Repeating">Repeating rows and columns</a></li>
+ <li><a href="#HeaderFooter">Headers and Footers</a></li>
+ <li><a href="#XSSFHeaderFooter">XSSF enhancement for Headers and Footers</a></li>
+ <li><a href="#DrawingShapes">Drawing Shapes</a></li>
+ <li><a href="#StylingShapes">Styling Shapes</a></li>
+ <li><a href="#Graphics2d">Shapes and Graphics2d</a></li>
+ <li><a href="#Outlining">Outlining</a></li>
+ <li><a href="#Images">Images</a></li>
+ <li><a href="#NamedRanges">Named Ranges and Named Cells</a></li>
+ <li><a href="#CellComments">How to set cell comments</a></li>
+ <li><a href="#Autofit">How to adjust column width to fit the contents</a></li>
+ <li><a href="#Hyperlinks">Hyperlinks</a></li>
+ <li><a href="#Validation">Data Validations</a></li>
+ <li><a href="#Embedded">Embedded Objects</a></li>
+ <li><a href="#Autofilter">Autofilters</a></li>
+ <li><a href="#ConditionalFormatting">Conditional Formatting</a></li>
+ <li><a href="#Hiding">Hiding and Un-Hiding Rows</a></li>
+ <li><a href="#CellProperties">Setting Cell Properties</a></li>
+ <li><a href="#DrawingBorders">Drawing Borders</a></li>
+ <li><a href="#PivotTable">Create a Pivot Table</a></li>
+ <li><a href="#RichText">Cells with multiple styles</a></li>
+ </ul>
+ </section>
+ <section><title>Features</title>
+ <anchor id="NewWorkbook"/>
+ <section><title>New Workbook</title>
+ <source>
+ Workbook wb = new HSSFWorkbook();
+ ...
+ try (OutputStream fileOut = new FileOutputStream("workbook.xls")) {
+ wb.write(fileOut);
+ }
+
+ Workbook wb = new XSSFWorkbook();
+ ...
+ try (OutputStream fileOut = new FileOutputStream("workbook.xlsx")) {
+ wb.write(fileOut);
+ }
+ </source>
+ </section>
+ <anchor id="NewSheet"/>
+ <section><title>New Sheet</title>
+ <source>
+ Workbook wb = new HSSFWorkbook(); // or new XSSFWorkbook();
+ Sheet sheet1 = wb.createSheet("new sheet");
+ Sheet sheet2 = wb.createSheet("second sheet");
+
+ // Note that a sheet name in Excel must not exceed 31 characters
+ // and must not contain any of the following characters:
+ // 0x0000
+ // 0x0003
+ // colon (:)
+ // backslash (\)
+ // asterisk (*)
+ // question mark (?)
+ // forward slash (/)
+ // opening square bracket ([)
+ // closing square bracket (])
+
+ // You can use org.apache.poi.ss.util.WorkbookUtil#createSafeSheetName(String nameProposal)
+ // for a safe way to create valid names; this utility replaces invalid characters with a space (' ')
+ String safeName = WorkbookUtil.createSafeSheetName("[O'Brien's sales*?]"); // returns " O'Brien's sales   "
+ Sheet sheet3 = wb.createSheet(safeName);
+
+ try (OutputStream fileOut = new FileOutputStream("workbook.xls")) {
+ wb.write(fileOut);
+ }
+ </source>
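+ <p>To show how the rules listed above translate into code, here is a stand-alone sketch. It is a hypothetical re-implementation (class <em>SheetNames</em>) that covers only the rules in the comment: truncate to 31 characters, then replace each forbidden character. It is not the WorkbookUtil source; in real code prefer WorkbookUtil.createSafeSheetName, which may apply additional rules.</p>
+ <source><![CDATA[
```java
// Hypothetical re-implementation of the sheet-name rules listed above (not POI's
// WorkbookUtil source): truncate to 31 characters, then replace forbidden characters.
class SheetNames {
    static final int MAX_LENGTH = 31;

    static String safeSheetName(String proposal, char replacement) {
        String name = proposal.length() > MAX_LENGTH ? proposal.substring(0, MAX_LENGTH) : proposal;
        StringBuilder sb = new StringBuilder(name);
        for (int i = 0; i < sb.length(); i++) {
            switch (sb.charAt(i)) {
                case '\u0000': case '\u0003': case ':': case '\\':
                case '*': case '?': case '/': case '[': case ']':
                    sb.setCharAt(i, replacement); // forbidden in Excel sheet names
                    break;
                default:
                    break; // every other character is kept
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // prints "[ O'Brien's sales   ]" (brackets added to make the spaces visible)
        System.out.println("[" + safeSheetName("[O'Brien's sales*?]", ' ') + "]");
    }
}
```
+ ]]></source>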
+ </section>
+ <anchor id="CreateCells"/>
+ <section><title>Creating Cells</title>
+ <source>
+ Workbook wb = new HSSFWorkbook();
+ //Workbook wb = new XSSFWorkbook();
+ CreationHelper createHelper = wb.getCreationHelper();
+ Sheet sheet = wb.createSheet("new sheet");
+
+ // Create a row and put some cells in it. Rows are 0 based.
+ Row row = sheet.createRow(0);
+ // Create a cell and put a value in it.
+ Cell cell = row.createCell(0);
+ cell.setCellValue(1);
+
+ // Or do it on one line.
+ row.createCell(1).setCellValue(1.2);
+ row.createCell(2).setCellValue(
+ createHelper.createRichTextString("This is a string"));
+ row.createCell(3).setCellValue(true);
+
+ // Write the output to a file
+ try (OutputStream fileOut = new FileOutputStream("workbook.xls")) {
+ wb.write(fileOut);
+ }
+ </source>
+ </section>
+ <anchor id="CreateDateCells"/>
+ <section><title>Creating Date Cells</title>
+ <source>
+ Workbook wb = new HSSFWorkbook();
+ //Workbook wb = new XSSFWorkbook();
+ CreationHelper createHelper = wb.getCreationHelper();
+ Sheet sheet = wb.createSheet("new sheet");
+
+ // Create a row and put some cells in it. Rows are 0 based.
+ Row row = sheet.createRow(0);
+
+ // Create a cell and put a date value in it. The first cell is not styled
+ // as a date.
+ Cell cell = row.createCell(0);
+ cell.setCellValue(new Date());
+
+ // we style the second cell as a date (and time). It is important to
+ // create a new cell style from the workbook, otherwise you can end up
+ // modifying the built-in style and affecting not only this cell but other cells.
+ CellStyle cellStyle = wb.createCellStyle();
+ cellStyle.setDataFormat(
+ createHelper.createDataFormat().getFormat("m/d/yy h:mm"));
+ cell = row.createCell(1);
+ cell.setCellValue(new Date());
+ cell.setCellStyle(cellStyle);
+
+ //you can also set date as java.util.Calendar
+ cell = row.createCell(2);
+ cell.setCellValue(Calendar.getInstance());
+ cell.setCellStyle(cellStyle);
+
+ // Write the output to a file
+ try (OutputStream fileOut = new FileOutputStream("workbook.xls")) {
+ wb.write(fileOut);
+ }
+ </source>
+ </section>
+ <anchor id="CellTypes"/>
+ <section><title>Working with different types of cells</title>
+ <source>
+ Workbook wb = new HSSFWorkbook();
+ Sheet sheet = wb.createSheet("new sheet");
+ Row row = sheet.createRow(2);
+ row.createCell(0).setCellValue(1.1);
+ row.createCell(1).setCellValue(new Date());
+ row.createCell(2).setCellValue(Calendar.getInstance());
+ row.createCell(3).setCellValue("a string");
+ row.createCell(4).setCellValue(true);
+ row.createCell(5).setCellType(CellType.ERROR);
+
+ // Write the output to a file
+ try (OutputStream fileOut = new FileOutputStream("workbook.xls")) {
+ wb.write(fileOut);
+ }
+ </source>
+ </section>
+
+ <anchor id="FileInputStream"/>
+ <section><title>Files vs InputStreams</title>
+ <p>When opening a workbook, either a .xls HSSFWorkbook, or a .xlsx
+ XSSFWorkbook, the Workbook can be loaded from either a <em>File</em>
+ or an <em>InputStream</em>. Using a <em>File</em> object allows for
+ lower memory consumption, while an <em>InputStream</em> requires more
+ memory as it has to buffer the whole file.</p>
+ <p>If using <em>WorkbookFactory</em>, it's very easy to use one or
+ the other:</p>
+ <source>
+ // Use a file
+ Workbook wb = WorkbookFactory.create(new File("MyExcel.xls"));
+
+ // Use an InputStream, needs more memory
+ Workbook wb = WorkbookFactory.create(new FileInputStream("MyExcel.xlsx"));
+ </source>
+ <p>If using <em>HSSFWorkbook</em> or <em>XSSFWorkbook</em> directly,
+ you should generally go through <em>POIFSFileSystem</em> or
+ <em>OPCPackage</em>, to have full control of the lifecycle (including
+ closing the file when done):</p>
+ <source>
+ // HSSFWorkbook, File
+ POIFSFileSystem fs = new POIFSFileSystem(new File("file.xls"));
+ HSSFWorkbook wb = new HSSFWorkbook(fs.getRoot(), true);
+ ....
+ fs.close();
+
+ // HSSFWorkbook, InputStream, needs more memory
+ POIFSFileSystem fs = new POIFSFileSystem(myInputStream);
+ HSSFWorkbook wb = new HSSFWorkbook(fs.getRoot(), true);
+ ....
+ fs.close();
+
+ // XSSFWorkbook, File
+ OPCPackage pkg = OPCPackage.open(new File("file.xlsx"));
+ XSSFWorkbook wb = new XSSFWorkbook(pkg);
+ ....
+ pkg.close();
+
+ // XSSFWorkbook, InputStream, needs more memory
+ OPCPackage pkg = OPCPackage.open(myInputStream);
+ XSSFWorkbook wb = new XSSFWorkbook(pkg);
+ ....
+ pkg.close();
+ </source>
+ </section>
+
+ <anchor id="Alignment"/>
+ <section><title>Demonstrates various alignment options</title>
+ <source>
+ public static void main(String[] args) throws Exception {
+ Workbook wb = new XSSFWorkbook(); //or new HSSFWorkbook();
+
+ Sheet sheet = wb.createSheet();
+ Row row = sheet.createRow(2);
+ row.setHeightInPoints(30);
+
+ createCell(wb, row, 0, HorizontalAlignment.CENTER, VerticalAlignment.BOTTOM);
+ createCell(wb, row, 1, HorizontalAlignment.CENTER_SELECTION, VerticalAlignment.BOTTOM);
+ createCell(wb, row, 2, HorizontalAlignment.FILL, VerticalAlignment.CENTER);
+ createCell(wb, row, 3, HorizontalAlignment.GENERAL, VerticalAlignment.CENTER);
+ createCell(wb, row, 4, HorizontalAlignment.JUSTIFY, VerticalAlignment.JUSTIFY);
+ createCell(wb, row, 5, HorizontalAlignment.LEFT, VerticalAlignment.TOP);
+ createCell(wb, row, 6, HorizontalAlignment.RIGHT, VerticalAlignment.TOP);
+
+ // Write the output to a file
+ try (OutputStream fileOut = new FileOutputStream("xssf-align.xlsx")) {
+ wb.write(fileOut);
+ }
+
+ wb.close();
+ }
+
+ /**
+ * Creates a cell and aligns it a certain way.
+ *
+ * @param wb the workbook
+ * @param row the row to create the cell in
+ * @param column the column number to create the cell in
+ * @param halign the horizontal alignment for the cell.
+ * @param valign the vertical alignment for the cell.
+ */
+ private static void createCell(Workbook wb, Row row, int column, HorizontalAlignment halign, VerticalAlignment valign) {
+ Cell cell = row.createCell(column);
+ cell.setCellValue("Align It");
+ CellStyle cellStyle = wb.createCellStyle();
+ cellStyle.setAlignment(halign);
+ cellStyle.setVerticalAlignment(valign);
+ cell.setCellStyle(cellStyle);
+ }
+ </source>
+ </section>
+ <anchor id="Borders"/>
+ <section><title>Working with borders</title>
+ <source>
+ Workbook wb = new HSSFWorkbook();
+ Sheet sheet = wb.createSheet("new sheet");
+
+ // Create a row and put some cells in it. Rows are 0 based.
+ Row row = sheet.createRow(1);
+
+ // Create a cell and put a value in it.
+ Cell cell = row.createCell(1);
+ cell.setCellValue(4);
+
+ // Style the cell with borders all around.
+ CellStyle style = wb.createCellStyle();
+ style.setBorderBottom(BorderStyle.THIN);
+ style.setBottomBorderColor(IndexedColors.BLACK.getIndex());
+ style.setBorderLeft(BorderStyle.THIN);
+ style.setLeftBorderColor(IndexedColors.GREEN.getIndex());
+ style.setBorderRight(BorderStyle.THIN);
+ style.setRightBorderColor(IndexedColors.BLUE.getIndex());
+ style.setBorderTop(BorderStyle.MEDIUM_DASHED);
+ style.setTopBorderColor(IndexedColors.BLACK.getIndex());
+ cell.setCellStyle(style);
+
+ // Write the output to a file
+ try (OutputStream fileOut = new FileOutputStream("workbook.xls")) {
+ wb.write(fileOut);
+ }
+
+ wb.close();
+ </source>
+ </section>
+ <anchor id="Iterator"/>
+ <section><title>Iterate over rows and cells</title>
+ <p>Sometimes, you'd like to just iterate over all the sheets in
+ a workbook, all the rows in a sheet, or all the cells in a row.
+ This is possible with a simple for loop.</p>
+ <p>These iterators are available by calling <em>workbook.sheetIterator()</em>,
+ <em>sheet.rowIterator()</em>, and <em>row.cellIterator()</em>, or
+ implicitly using a for-each loop.
+ Note that a rowIterator and cellIterator iterate over rows or
+ cells that have been created, skipping empty rows and cells.</p>
+
+ <source>
+ for (Sheet sheet : wb ) {
+ for (Row row : sheet) {
+ for (Cell cell : row) {
+ // Do something here
+ }
+ }
+ }
+ </source>
+ </section>
+ <section><title>Iterate over cells, with control of missing / blank cells</title>
+ <p>In some cases, when iterating, you need full control over how
+ missing or blank rows and cells are treated, and you need to ensure
+ you visit every cell and not just those defined in the file. (The
+ CellIterator only returns the cells defined in the file, which are
+ largely those with values or styling, but exactly which cells are
+ written depends on Excel.)</p>
+ <p>In cases such as these, you should fetch the first and last column
+ information for a row, then call <em>getCell(int, MissingCellPolicy)</em>
+ to fetch the cell. Use a
+ <a href="../../apidocs/dev/org/apache/poi/ss/usermodel/Row.MissingCellPolicy.html">MissingCellPolicy</a>
+ to control how blank or null cells are handled.</p>
+ <source>
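+ // Decide which rows to process: at least rows 15 to 1400 (0-based),
+ // extended to the sheet's first and last rows if it is bigger
+ int rowStart = Math.min(15, sheet.getFirstRowNum());
+ int rowEnd = Math.max(1400, sheet.getLastRowNum());
+
+ // getLastRowNum() is 0-based and inclusive, so use &lt;= to include the last row
+ for (int rowNum = rowStart; rowNum &lt;= rowEnd; rowNum++) {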
+ Row r = sheet.getRow(rowNum);
+ if (r == null) {
+ // This whole row is empty
+ // Handle it as needed
+ continue;
+ }
+
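+ // MY_MINIMUM_COLUMN_COUNT is an application-defined constant; getLastCellNum()
+ // returns the last cell index plus one (or -1 for a row with no cells)
+ int lastColumn = Math.max(r.getLastCellNum(), MY_MINIMUM_COLUMN_COUNT);
+
+ for (int cn = 0; cn &lt; lastColumn; cn++) {
+ Cell c = r.getCell(cn, Row.MissingCellPolicy.RETURN_BLANK_AS_NULL);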
+ if (c == null) {
+ // The spreadsheet is empty in this cell
+ } else {
+ // Do something useful with the cell's contents
+ }
+ }
+ }
+ </source>
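+ <p>The difference between iterating and fetching cells with a policy can be shown without POI. The sketch below is a hypothetical, POI-free model (RowModel and its Policy enum are illustrative names, not POI classes). Only cells defined in the file are stored, an empty string stands for a blank cell, and getCell() mimics what the three MissingCellPolicy values do when a cell is missing or blank.</p>

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

/**
 * Hypothetical model of a spreadsheet row, used only to illustrate missing-cell policies.
 * Only cells defined in the file are stored; BLANK ("") represents a blank cell.
 */
final class RowModel {
    enum Policy { RETURN_NULL_AND_BLANK, RETURN_BLANK_AS_NULL, CREATE_NULL_AS_BLANK }

    static final String BLANK = "";

    // 0-based column index -> cell value, kept sorted like the cells of a row
    private final TreeMap<Integer, String> cells = new TreeMap<>();

    void setCell(int column, String value) {
        cells.put(column, value);
    }

    /** Columns a for-each loop would visit: defined cells only, in order. */
    List<Integer> iteratedColumns() {
        return new ArrayList<>(cells.keySet());
    }

    /** Indexed access whose handling of missing and blank cells depends on the policy. */
    String getCell(int column, Policy policy) {
        boolean defined = cells.containsKey(column);
        String value = cells.get(column); // null when the cell is missing
        switch (policy) {
            case RETURN_NULL_AND_BLANK:
                return value; // missing -> null, blank -> blank
            case RETURN_BLANK_AS_NULL:
                return BLANK.equals(value) ? null : value; // missing and blank -> null
            case CREATE_NULL_AS_BLANK:
                if (!defined) {
                    cells.put(column, BLANK); // the missing cell is created as a blank one
                    return BLANK;
                }
                return value;
            default:
                throw new IllegalArgumentException("Unknown policy: " + policy);
        }
    }
}
```

+ <p>In this model, iteration visits only the defined columns. RETURN_BLANK_AS_NULL turns both missing and blank cells into null. CREATE_NULL_AS_BLANK adds a blank cell, which later iteration then visits.</p>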
+ </section>
+
+ <anchor id="CellContents"/>
+ <section><title>Getting the cell contents</title>
+ <p>To get the contents of a cell, you first need to
+ know what kind of cell it is (asking a string cell
+ for its numeric contents will throw an
+ IllegalStateException, for example). So, you will
+ want to switch on the cell's type, and then call
+ the appropriate getter for that cell.</p>
+ <p>In the code below, we loop over every cell
+ in one sheet, print out the cell's reference
+ (e.g. A3), and then the cell's contents.</p>
+ <source>
+ // import org.apache.poi.ss.usermodel.*;
+ // import org.apache.poi.ss.util.CellReference;
+
+ DataFormatter formatter = new DataFormatter();
+ Sheet sheet1 = wb.getSheetAt(0);
+ for (Row row : sheet1) {
+ for (Cell cell : row) {
+ CellReference cellRef = new CellReference(row.getRowNum(), cell.getColumnIndex());
+ System.out.print(cellRef.formatAsString());
+ System.out.print(" - ");
+
+ // get the text that appears in the cell by getting the cell value and applying any data formats (Date, 0.00, 1.23e9, $1.23, etc)
+ String text = formatter.formatCellValue(cell);
+ System.out.println(text);
+
+ // Alternatively, get the value and format it yourself
+ // Case labels use the unqualified CellType constants
+ switch (cell.getCellType()) {
+ case STRING:
+ System.out.println(cell.getRichStringCellValue().getString());
+ break;
+ case NUMERIC:
+ if (DateUtil.isCellDateFormatted(cell)) {
+ System.out.println(cell.getDateCellValue());
+ } else {
+ System.out.println(cell.getNumericCellValue());
+ }
+ break;
+ case BOOLEAN:
+ System.out.println(cell.getBooleanCellValue());
+ break;
+ case FORMULA:
+ System.out.println(cell.getCellFormula());
+ break;
+ case BLANK:
+ System.out.println();
+ break;
+ default:
+ System.out.println();
+ }
+ }
+ }
+ </source>
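+ <p>CellReference.formatAsString() turns a 0-based row and column into a reference such as A3. The conversion encodes the column in bijective base 26 and appends the 1-based row number. The hypothetical helper below sketches it in plain Java for illustration. In real code, POI's own CellReference, including its static convertNumToColString method, already does this.</p>

```java
/** Hypothetical helper: maps a 0-based (row, column) pair to an A1-style reference. */
final class A1Reference {
    /** Converts a 0-based column index to Excel letters: 0 -> A, 25 -> Z, 26 -> AA. */
    static String columnLetters(int columnIndex) {
        if (columnIndex < 0) {
            throw new IllegalArgumentException("Negative column: " + columnIndex);
        }
        StringBuilder sb = new StringBuilder();
        int n = columnIndex + 1; // bijective base-26 works on 1-based numbers
        while (n > 0) {
            int rem = (n - 1) % 26;
            sb.append((char) ('A' + rem));
            n = (n - 1) / 26;
        }
        return sb.reverse().toString();
    }

    /** 0-based row and column to a reference such as "A3" (rows are shown 1-based). */
    static String format(int rowIndex, int columnIndex) {
        return columnLetters(columnIndex) + (rowIndex + 1);
    }
}
```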
+ </section>
+
+ <anchor id="TextExtraction"/>
+ <section><title>Text Extraction</title>
+ <p>For most text extraction requirements, the standard
+ ExcelExtractor class should provide all you need.</p>
+ <source>
+ try (InputStream inp = new FileInputStream("workbook.xls");
+ HSSFWorkbook wb = new HSSFWorkbook(new POIFSFileSystem(inp))) {
+ ExcelExtractor extractor = new ExcelExtractor(wb);
+
+ extractor.setFormulasNotResults(true);
+ extractor.setIncludeSheetNames(false);
+ String text = extractor.getText();
+ System.out.println(text);
+ }
+ </source>
+ <p>For more advanced text extraction, such as XLS to CSV conversion,
+ take a look at
+ <em>/poi-examples/src/main/java/org/apache/poi/examples/hssf/eventusermodel/XLS2CSVmra.java</em>
+ </p>
+ </section>
+ <anchor id="FillsAndFrills"/>
+ <section><title>Fills and colors</title>
+ <source>
+ Workbook wb = new XSSFWorkbook();
+ Sheet sheet = wb.createSheet("new sheet");
+
+ // Create a row and put some cells in it. Rows are 0 based.
+ Row row = sheet.createRow(1);
+
+ // Aqua background
+ CellStyle style = wb.createCellStyle();
+ style.setFillBackgroundColor(IndexedColors.AQUA.getIndex());
+ style.setFillPattern(FillPatternType.BIG_SPOTS);
+ Cell cell = row.createCell(1);
+ cell.setCellValue("X");
+ cell.setCellStyle(style);
+
+ // Orange "foreground", foreground being the fill foreground not the font color.
+ style = wb.createCellStyle();
+ style.setFillForegroundColor(IndexedColors.ORANGE.getIndex());
+ style.setFillPattern(FillPatternType.SOLID_FOREGROUND);
+ cell = row.createCell(2);
+ cell.setCellValue("X");
+ cell.setCellStyle(style);
+
+ // Write the output to a file
+ try (OutputStream fileOut = new FileOutputStream("workbook.xlsx")) {
+ wb.write(fileOut);
+ }
+
+ wb.close();
+ </source>
+ </section>
+ <anchor id="MergedCells"/>
+ <section><title>Merging cells</title>
+ <source>
+ Workbook wb = new HSSFWorkbook();
+ Sheet sheet = wb.createSheet("new sheet");
+
+ Row row = sheet.createRow(1);
+ Cell cell = row.createCell(1);
+ cell.setCellValue("This is a test of merging");
+
+ sheet.addMergedRegion(new CellRangeAddress(
+ 1, //first row (0-based)
+ 1, //last row (0-based)
+ 1, //first column (0-based)
+ 2 //last column (0-based)
+ ));
+
+ // Write the output to a file
+ try (OutputStream fileOut = new FileOutputStream("workbook.xls")) {
+ wb.write(fileOut);
+ }
+
+ wb.close();
+ </source>
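+ <p>CellRangeAddress takes its bounds in the order first row, last row, first column, last column, all 0-based and inclusive, so the region above covers B2:C2. Recent POI versions reject a merged region that overlaps one already on the sheet. As far as we know, this raises an IllegalStateException, and a merged region must also contain at least two cells. The sketch below uses a hypothetical Range class (not POI's) to show the inclusive-bounds arithmetic and an overlap test you could run before merging.</p>

```java
/** Hypothetical stand-in for a merged range: 0-based, inclusive bounds, same order as CellRangeAddress. */
final class Range {
    final int firstRow, lastRow, firstCol, lastCol;

    Range(int firstRow, int lastRow, int firstCol, int lastCol) {
        if (firstRow > lastRow || firstCol > lastCol) {
            throw new IllegalArgumentException("Inverted range");
        }
        this.firstRow = firstRow;
        this.lastRow = lastRow;
        this.firstCol = firstCol;
        this.lastCol = lastCol;
    }

    /** Number of cells covered; inclusive bounds need the +1 on each side. */
    int cellCount() {
        return (lastRow - firstRow + 1) * (lastCol - firstCol + 1);
    }

    /** Two inclusive rectangles overlap unless one lies entirely above, below, left or right of the other. */
    boolean intersects(Range o) {
        return firstRow <= o.lastRow && o.firstRow <= lastRow
            && firstCol <= o.lastCol && o.firstCol <= lastCol;
    }
}
```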
+ </section>
+ <anchor id="WorkingWithFonts"/>
+ <section><title>Working with fonts</title>
+ <source>
+ Workbook wb = new HSSFWorkbook();
+ Sheet sheet = wb.createSheet("new sheet");
+
+ // Create a row and put some cells in it. Rows are 0 based.
+ Row row = sheet.createRow(1);
+
+ // Create a new font and alter it.
+ Font font = wb.createFont();
+ font.setFontHeightInPoints((short)24);
+ font.setFontName("Courier New");
+ font.setItalic(true);
+ font.setStrikeout(true);
+
+ // Fonts are set into a style so create a new one to use.
+ CellStyle style = wb.createCellStyle();
+ style.setFont(font);
+
+ // Create a cell and put a value in it.
+ Cell cell = row.createCell(1);
+ cell.setCellValue("This is a test of fonts");
+ cell.setCellStyle(style);
+
+ // Write the output to a file
+ try (OutputStream fileOut = new FileOutputStream("workbook.xls")) {
+ wb.write(fileOut);
+ }
+
+ wb.close();
+ </source>
+<p>
+ Note that a workbook can contain at most 32767 unique fonts. You should re-use fonts in your
+ applications instead of creating a font for each cell; the same applies to cell styles.
+ Examples:
+</p>
+<p><strong>Wrong:</strong></p>
+<source>
+ for (int i = 0; i &lt; 10000; i++) {
+ Row row = sheet.createRow(i);
+ Cell cell = row.createCell(0);
+
+ CellStyle style = workbook.createCellStyle();
+ Font font = workbook.createFont();
+ font.setBold(true);
+ style.setFont(font);
+ cell.setCellStyle(style);
+ }
+</source>
+<p><strong>Correct:</strong></p>
+<source>
+ CellStyle style = workbook.createCellStyle();
+ Font font = workbook.createFont();
+ font.setBold(true);
+ style.setFont(font);
+ for (int i = 0; i &lt; 10000; i++) {
+ Row row = sheet.createRow(i);
+ Cell cell = row.createCell(0);
+ cell.setCellStyle(style);
+ }
+</source>
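+ <p>When fonts are chosen dynamically rather than up front, a small cache keyed on the font's attributes keeps the number of fonts bounded. The sketch below is a hypothetical, generic registry (FontKey and FontRegistry are illustrative names, not POI classes). It hands out one shared object per distinct key and calls the factory only the first time a key is seen.</p>

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;
import java.util.function.Supplier;

/** Hypothetical value object describing a font; equal descriptions share one font. */
final class FontKey {
    final String name;
    final int heightInPoints;
    final boolean bold;

    FontKey(String name, int heightInPoints, boolean bold) {
        this.name = name;
        this.heightInPoints = heightInPoints;
        this.bold = bold;
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof FontKey)) {
            return false;
        }
        FontKey k = (FontKey) o;
        return heightInPoints == k.heightInPoints && bold == k.bold && Objects.equals(name, k.name);
    }

    @Override
    public int hashCode() {
        return Objects.hash(name, heightInPoints, bold);
    }
}

/** Hypothetical cache: F would be POI's Font in a real workbook. */
final class FontRegistry<F> {
    private final Map<FontKey, F> cache = new HashMap<>();
    private int created = 0;

    /** Returns the cached font for this key, calling the factory only on first use. */
    F get(FontKey key, Supplier<F> factory) {
        F font = cache.get(key);
        if (font == null) {
            font = factory.get();
            created++;
            cache.put(key, font);
        }
        return font;
    }

    int createdCount() {
        return created;
    }
}
```

+ <p>In a real workbook, the factory would be a lambda that calls workbook.createFont() and sets the name, height and bold flag. Every later call with an equal key then returns the same font.</p>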
+
+ </section>
+ <anchor id="CustomColors"/>
+ <section><title>Custom colors</title>
+ <p><strong>HSSF:</strong></p>
+ <source>
+ HSSFWorkbook wb = new HSSFWorkbook();
+ HSSFSheet sheet = wb.createSheet();
+ HSSFRow row = sheet.createRow(0);
+ HSSFCell cell = row.createCell(0);
+ cell.setCellValue("Default Palette");
+
+ //apply some colors from the standard palette,
+ // as in the previous examples.
+ //we'll use red text on a lime background
+
+ HSSFCellStyle style = wb.createCellStyle();
+ style.setFillForegroundColor(IndexedColors.LIME.getIndex());
+ style.setFillPattern(FillPatternType.SOLID_FOREGROUND);
+
+ HSSFFont font = wb.createFont();
+ font.setColor(IndexedColors.RED.getIndex());
+ style.setFont(font);
+
+ cell.setCellStyle(style);
+
+ //save with the default palette
+ try (OutputStream out = new FileOutputStream("default_palette.xls")) {
+ wb.write(out);
+ }
+
+ //now, let's replace RED and LIME in the palette
+ // with a more attractive combination
+ // (lovingly borrowed from freebsd.org)
+
+ cell.setCellValue("Modified Palette");
+
+ //creating a custom palette for the workbook
+ HSSFPalette palette = wb.getCustomPalette();
+
+ //replacing the standard red with freebsd.org red
+ palette.setColorAtIndex(IndexedColors.RED.getIndex(),
+ (byte) 153, //RGB red (0-255)
+ (byte) 0, //RGB green
+ (byte) 0 //RGB blue
+ );
+ //replacing lime with freebsd.org gold
+ palette.setColorAtIndex(IndexedColors.LIME.getIndex(), (byte) 255, (byte) 204, (byte) 102);
+
+ //save with the modified palette
+ // note that wherever we have previously used RED or LIME, the
+ // new colors magically appear
+ try (OutputStream out = new FileOutputStream("modified_palette.xls")) {
+ wb.write(out);
+ }
+
+ wb.close();
+ </source>
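+ <p>The (byte) casts above are needed because setColorAtIndex takes Java bytes, which are signed. Channel values above 127 wrap to negative numbers (153 becomes -103 and 255 becomes -1), while the palette still treats each channel as an unsigned 0-255 value. The hypothetical helper below (RgbBytes is not a POI class) shows the conversion in both directions.</p>

```java
/** Hypothetical helper for packing 0-255 colour channels into signed Java bytes and back. */
final class RgbBytes {
    /** Packs a 0-255 channel value into a (signed) Java byte. */
    static byte toByte(int channel) {
        if (channel < 0 || channel > 255) {
            throw new IllegalArgumentException("Channel out of range: " + channel);
        }
        return (byte) channel; // values above 127 wrap to negative numbers
    }

    /** Recovers the unsigned 0-255 value from a byte by masking off the sign extension. */
    static int toUnsigned(byte b) {
        return b & 0xFF;
    }
}
```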
+ <p><strong>XSSF:</strong></p>
+ <source>
+ XSSFWorkbook wb = new XSSFWorkbook();
+ XSSFSheet sheet = wb.createSheet();
+ XSSFRow row = sheet.createRow(0);
+ XSSFCell cell = row.createCell( 0);
+ cell.setCellValue("custom XSSF colors");
+
+ XSSFCellStyle style1 = wb.createCellStyle();
+ style1.setFillForegroundColor(new XSSFColor(new java.awt.Color(128, 0, 128), new DefaultIndexedColorMap()));
+ style1.setFillPattern(FillPatternType.SOLID_FOREGROUND);
+ cell.setCellStyle(style1);
+ </source>
+ </section>
+ <anchor id="ReadWriteWorkbook"/>
+ <section><title>Reading and Rewriting Workbooks</title>
+ <source>
+ // Use "workbook.xlsx" here instead to work with an XSSF file
+ try (InputStream inp = new FileInputStream("workbook.xls");
+ Workbook wb = WorkbookFactory.create(inp)) {
+ Sheet sheet = wb.getSheetAt(0);
+ Row row = sheet.getRow(2);
+ if (row == null)
+ row = sheet.createRow(2);
+ Cell cell = row.getCell(3);
+ if (cell == null)
+ cell = row.createCell(3);
+ // setCellValue(String) also makes this a string cell
+ cell.setCellValue("a test");
+
+ // Write the output to a file
+ try (OutputStream fileOut = new FileOutputStream("workbook.xls")) {
+ wb.write(fileOut);
+ }
+ }
+ </source>
+ </section>
+ <anchor id="NewLinesInCells"/>
+ <section><title>Using newlines in cells</title>
+ <source>
+ Workbook wb = new XSSFWorkbook(); //or new HSSFWorkbook();
+ Sheet sheet = wb.createSheet();
+
+ Row row = sheet.createRow(2);
+ Cell cell = row.createCell(2);
+ cell.setCellValue("Use \n with word wrap on to create a new line");
+
+ //to enable newlines you need to set a cell style with wrap=true
+ CellStyle cs = wb.createCellStyle();
+ cs.setWrapText(true);
+ cell.setCellStyle(cs);
+
+ //increase row height to accommodate two lines of text
+ row.setHeightInPoints((2*sheet.getDefaultRowHeightInPoints()));
+
+ //adjust column width to fit the content
+ sheet.autoSizeColumn(2);
+
+ try (OutputStream fileOut = new FileOutputStream("ooxml-newlines.xlsx")) {
+ wb.write(fileOut);
+ }
+
+ wb.close();
+ </source>
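+ <p>Doubling the default row height works here because the example text contains exactly one newline. The hypothetical generalization below (RowHeight is an illustrative name, not POI) counts the lines and multiplies by the default height. It ignores extra lines created by word wrapping within a narrow column, so treat the result as a lower bound for row.setHeightInPoints().</p>

```java
/** Hypothetical helper: row height needed for text containing explicit newlines. */
final class RowHeight {
    /** Number of lines produced by '\n'; the -1 limit keeps trailing empty lines. */
    static int lineCount(String text) {
        return text.split("\n", -1).length;
    }

    /** One default row height per line of text. */
    static float heightInPoints(String text, float defaultRowHeightInPoints) {
        return lineCount(text) * defaultRowHeightInPoints;
    }
}
```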
+ </section>
+ <anchor id="DataFormats"/>
+ <section><title>Data Formats</title>
+ <source>
+ Workbook wb = new HSSFWorkbook();
+ Sheet sheet = wb.createSheet("format sheet");
+ CellStyle style;
+ DataFormat format = wb.createDataFormat();
+ Row row;
+ Cell cell;
+ int rowNum = 0;
+ int colNum = 0;
+
+ row = sheet.createRow(rowNum++);
+ cell = row.createCell(colNum);
+ cell.setCellValue(11111.25);
+ style = wb.createCellStyle();
+ style.setDataFormat(format.getFormat("0.0"));
+ cell.setCellStyle(style);
+
+ row = sheet.createRow(rowNum++);
+ cell = row.createCell(colNum);
+ cell.setCellValue(11111.25);
+ style = wb.createCellStyle();
+ style.setDataFormat(format.getFormat("#,##0.0000"));
+ cell.setCellStyle(style);
+
+ try (OutputStream fileOut = new FileOutputStream("workbook.xls")) {
+ wb.write(fileOut);
+ }
+
+ wb.close();
+ </source>
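+ <p>To preview roughly what the two formats display for 11111.25, you can use java.text.DecimalFormat, because these two particular patterns mean the same thing in Java and in Excel. This is only an approximation: FormatPreview is a hypothetical helper, and Excel's format language has many codes that DecimalFormat lacks. HALF_UP rounding is set because DecimalFormat defaults to HALF_EVEN, which would show 11111.2 rather than the 11111.3 we expect Excel to display. For real workbooks, POI's DataFormatter (used earlier for reading cells) is the better tool.</p>

```java
import java.math.RoundingMode;
import java.text.DecimalFormat;
import java.text.DecimalFormatSymbols;
import java.util.Locale;

/** Hypothetical preview of simple numeric format strings using java.text.DecimalFormat. */
final class FormatPreview {
    static String render(String pattern, double value) {
        // Fixed US symbols so the grouping and decimal separators are ',' and '.'
        DecimalFormat df = new DecimalFormat(pattern, DecimalFormatSymbols.getInstance(Locale.US));
        // Round halves upward, as we expect Excel's display to, instead of Java's HALF_EVEN default
        df.setRoundingMode(RoundingMode.HALF_UP);
        return df.format(value);
    }
}
```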
+ </section>
+ <anchor id="FitTo"/>
+ <section><title>Fit Sheet to One Page</title>
+ <source>
+ Workbook wb = new HSSFWorkbook();
+ Sheet sheet = wb.createSheet("format sheet");
+ PrintSetup ps = sheet.getPrintSetup();
+
+ sheet.setAutobreaks(true);
+
+ ps.setFitHeight((short)1);
+ ps.setFitWidth((short)1);
+
+
+ // Create various cells and rows for spreadsheet.
+
+ try (OutputStream fileOut = new FileOutputStream("workbook.xls")) {
+ wb.write(fileOut);
+ }
+
+ wb.close();
+ </source>
+ </section>
+ <anchor id="PrintArea2"/>
+ <section><title>Set Print Area</title>
+ <source>
+ Workbook wb = new HSSFWorkbook();
+ Sheet sheet = wb.createSheet("Sheet1");
+ //sets the print area for the first sheet
+ wb.setPrintArea(0, "$A$1:$C$2");
+
+ //Alternatively, the same area ($A$1:$C$2) given as 0-based indices:
+ wb.setPrintArea(
+ 0, //sheet index
+ 0, //start column (A)
+ 2, //end column (C)
+ 0, //start row (row 1)
+ 1 //end row (row 2)
+ );
+
+ try (OutputStream fileOut = new FileOutputStream("workbook.xls")) {
+ wb.write(fileOut);
+ }
+
+ wb.close();
+ </source>
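+ <p>The two setPrintArea forms are related by a simple conversion: column letters become 0-based column indices, and row numbers drop by one. The hypothetical parser below (not part of POI) turns an area such as $A$1:$C$2 into the startColumn, endColumn, startRow, endRow values that the five-argument form expects after the sheet index.</p>

```java
import java.util.Locale;

/**
 * Hypothetical parser for A1-style areas, for illustration only.
 * parse() returns {startColumn, endColumn, startRow, endRow}, all 0-based.
 */
final class AreaParser {
    static int[] parse(String area) {
        String[] parts = area.split(":");
        if (parts.length != 2) {
            throw new IllegalArgumentException("Expected FROM:TO, got " + area);
        }
        int[] from = parseCell(parts[0]);
        int[] to = parseCell(parts[1]);
        return new int[] { from[1], to[1], from[0], to[0] };
    }

    /** Parses "$C$2" or "C2" into {rowIndex, columnIndex}, both 0-based. */
    static int[] parseCell(String ref) {
        String s = ref.replace("$", "").toUpperCase(Locale.ROOT);
        int i = 0;
        int column = 0;
        while (i < s.length() && s.charAt(i) >= 'A' && s.charAt(i) <= 'Z') {
            column = column * 26 + (s.charAt(i) - 'A' + 1); // bijective base 26: A=1 ... Z=26
            i++;
        }
        if (i == 0 || i == s.length()) {
            throw new IllegalArgumentException("Not a cell reference: " + ref);
        }
        int row = Integer.parseInt(s.substring(i)); // 1-based row number as shown in Excel
        return new int[] { row - 1, column - 1 };
    }
}
```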
+ </section>
+
+ <anchor id="FooterPageNumbers"/>
+ <section><title>Set Page Numbers on Footer</title>
+ <source>
+ Workbook wb = new HSSFWorkbook(); // or new XSSFWorkbook();
+ Sheet sheet = wb.createSheet("format sheet");
+ Footer footer = sheet.getFooter();
+
+ footer.setRight( "Page " + HeaderFooter.page() + " of " + HeaderFooter.numPages() );
+
+ // Create various cells and rows for spreadsheet.
+
+ try (OutputStream fileOut = new FileOutputStream("workbook.xls")) {
+ wb.write(fileOut);
+ }
+
+ wb.close();
+ </source>
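+ <p>HeaderFooter.page() and HeaderFooter.numPages() insert Excel's header/footer field codes &amp;P (current page) and &amp;N (total pages), and Excel substitutes the real values when printing. The hypothetical FooterPreview helper below expands just those two codes, so you can check what the footer above would read on a given page. Any other code is copied through unchanged.</p>

```java
/** Hypothetical preview of the &P (page) and &N (total pages) codes in a header or footer. */
final class FooterPreview {
    static String expand(String footer, int page, int totalPages) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < footer.length(); i++) {
            char c = footer.charAt(i);
            if (c == '&' && i + 1 < footer.length()) {
                char code = footer.charAt(i + 1);
                if (code == 'P') { // current page number
                    out.append(page);
                    i++; // skip the code letter
                    continue;
                }
                if (code == 'N') { // total number of pages
                    out.append(totalPages);
                    i++;
                    continue;
                }
            }
            out.append(c); // everything else is copied as-is
        }
        return out.toString();
    }
}
```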
+ </section>
+
+ <anchor id="ConvenienceFunctions"/>
+ <section><title>Using the Convenience Functions</title>
+ <p>
+ The convenience functions provide
+ utility features such as setting borders around merged
+ regions and changing style attributes without explicitly
+ creating new styles.
+ </p>
+ <source>
+ Workbook wb = new HSSFWorkbook(); // or new XSSFWorkbook()
+ Sheet sheet1 = wb.createSheet( "new sheet" );
+
+ // Create a merged region
+ Row row = sheet1.createRow( 1 );
+ Row row2 = sheet1.createRow( 2 );
+ Cell cell = row.createCell( 1 );
+ cell.setCellValue( "This is a test of merging" );
+ CellRangeAddress region = CellRangeAddress.valueOf("B2:E5");
+ sheet1.addMergedRegion( region );
+
+ // Set the border and border colors.
+ RegionUtil.setBorderBottom( BorderStyle.MEDIUM_DASHED, region, sheet1 );
+ RegionUtil.setBorderTop( BorderStyle.MEDIUM_DASHED, region, sheet1 );
+ RegionUtil.setBorderLeft( BorderStyle.MEDIUM_DASHED, region, sheet1 );
+ RegionUtil.setBorderRight( BorderStyle.MEDIUM_DASHED, region, sheet1 );
+ RegionUtil.setBottomBorderColor(IndexedColors.AQUA.getIndex(), region, sheet1);
+ RegionUtil.setTopBorderColor( IndexedColors.AQUA.getIndex(), region, sheet1);
+ RegionUtil.setLeftBorderColor( IndexedColors.AQUA.getIndex(), region, sheet1);
+ RegionUtil.setRightBorderColor( IndexedColors.AQUA.getIndex(), region, sheet1);
+
+ // Shows some usages of CellUtil
+ CellStyle style = wb.createCellStyle();
+ style.setIndention((short)4);
+ CellUtil.createCell(row, 8, "This is the value of the cell", style);
+ Cell cell2 = CellUtil.createCell( row2, 8, "This is the value of the cell");
+ CellUtil.setAlignment(cell2, HorizontalAlignment.CENTER);
+
+ // Write out the workbook
+ try (OutputStream fileOut = new FileOutputStream( "workbook.xls" )) {
+ wb.write( fileOut );
+ }
+
+ wb.close();
+ </source>
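+ <p>Conceptually, setting a border on a region means styling the cells along the matching edge. For example, a bottom border on B2:E5 touches every cell in row 5, columns B to E. The hypothetical RegionEdges helper below (not POI code) lists those cells as 0-based {row, column} pairs. It illustrates the bookkeeping RegionUtil saves you from; treat it as an illustration rather than a description of RegionUtil's internals.</p>

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper listing the cells on each edge of an inclusive, 0-based region. */
final class RegionEdges {
    static List<int[]> topEdge(int firstRow, int lastRow, int firstCol, int lastCol) {
        return rowCells(firstRow, firstCol, lastCol);
    }

    static List<int[]> bottomEdge(int firstRow, int lastRow, int firstCol, int lastCol) {
        return rowCells(lastRow, firstCol, lastCol);
    }

    static List<int[]> leftEdge(int firstRow, int lastRow, int firstCol, int lastCol) {
        return columnCells(firstCol, firstRow, lastRow);
    }

    static List<int[]> rightEdge(int firstRow, int lastRow, int firstCol, int lastCol) {
        return columnCells(lastCol, firstRow, lastRow);
    }

    // Every cell of one row between two columns, inclusive
    private static List<int[]> rowCells(int row, int fromCol, int toCol) {
        List<int[]> cells = new ArrayList<>();
        for (int c = fromCol; c <= toCol; c++) {
            cells.add(new int[] { row, c });
        }
        return cells;
    }

    // Every cell of one column between two rows, inclusive
    private static List<int[]> columnCells(int col, int fromRow, int toRow) {
        List<int[]> cells = new ArrayList<>();
        for (int r = fromRow; r <= toRow; r++) {
            cells.add(new int[] { r, col });
        }
        return cells;
    }
}
```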
+ </section>
+
+ <anchor id="ShiftRows"/>
+ <section><title>Shift rows up or down on a sheet</title>
+ <source>
+ Workbook wb = new HSSFWorkbook();
+ Sheet sheet = wb.createSheet("row sheet");
+
+ // Create various cells and rows for spreadsheet.
+
+ // Shift rows 6 - 11 (0-based 5 - 10) to the top of the sheet (rows 1 - 6, 0-based 0 - 5)
+ sheet.shiftRows(5, 10, -5);
+
+ </source>
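+ <p>You can picture the effect of shiftRows on a sparse map of row index to contents. The hypothetical RowShifter below is a simplified model, not POI code. Rows in [startRow, endRow] move by n, and every destination row is overwritten. Formulas, merged regions and other references, which POI also has to adjust, are ignored.</p>

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical model of shifting a block of rows on a sparse sheet. */
final class RowShifter {
    static TreeMap<Integer, String> shift(TreeMap<Integer, String> rows, int startRow, int endRow, int n) {
        if (startRow > endRow) {
            throw new IllegalArgumentException("startRow > endRow");
        }
        if (startRow + n < 0) {
            throw new IllegalArgumentException("Rows would be shifted above row 0");
        }
        TreeMap<Integer, String> result = new TreeMap<>(rows);
        // Copy the rows being moved, then remove them from their old positions
        TreeMap<Integer, String> moving = new TreeMap<>(rows.subMap(startRow, true, endRow, true));
        for (int r = startRow; r <= endRow; r++) {
            result.remove(r);
        }
        // Every destination row is overwritten, even if its source row was empty
        for (int r = startRow; r <= endRow; r++) {
            result.remove(r + n);
        }
        for (Map.Entry<Integer, String> e : moving.entrySet()) {
            result.put(e.getKey() + n, e.getValue());
        }
        return result;
    }
}
```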
+ </section>
+
+ <anchor id="SelectSheet"/>
+ <section><title>Set a sheet as selected</title>
+ <source>
+ Workbook wb = new HSSFWorkbook();
+ Sheet sheet = wb.createSheet("row sheet");
+ sheet.setSelected(true);
+
+ </source>
+ </section>
+
+ <anchor id="Zoom"/>
+ <section><title>Set the zoom magnification</title>
+ <p>
+ The zoom is expressed as a percentage, from 10 to 400. For example, to
+ set a zoom of 75%, pass 75 to setZoom (the same magnification
+ as the fraction 3/4).
+ </p>
+ <source>
+ Workbook wb = new HSSFWorkbook();
+ Sheet sheet1 = wb.createSheet("new sheet");
+ sheet1.setZoom(75); // 75 percent magnification
+ </source>
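+ <p>A zoom percentage and a zoom fraction describe the same magnification: 75% is 75/100, which reduces to 3/4 when both parts are divided by their greatest common divisor. The hypothetical ZoomFraction helper below performs that reduction. Its 10 to 400 range check reflects our understanding of Excel's limits.</p>

```java
/** Hypothetical helper: converts a zoom percentage to the reduced fraction it represents. */
final class ZoomFraction {
    /** Returns {numerator, denominator}, e.g. 75 -> {3, 4}. */
    static int[] toFraction(int percent) {
        if (percent < 10 || percent > 400) {
            throw new IllegalArgumentException("Zoom must be 10-400%: " + percent);
        }
        int g = gcd(percent, 100);
        return new int[] { percent / g, 100 / g };
    }

    /** Euclid's algorithm. */
    static int gcd(int a, int b) {
        while (b != 0) {
            int t = a % b;
            a = b;
            b = t;
        }
        return a;
    }
}
```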
+ </section>
+
+ <anchor id="Splits"/>
+ <section><title>Splits and freeze panes</title>
+ <p>
+ There are two types of panes you can create: freeze panes and split panes.
+ </p>
+ <p>
+ A freeze pane is split by columns and rows. You create
+ a freeze pane using the following mechanism:
+ </p>
+ <p>
+ <code>sheet1.createFreezePane( 3, 2, 3, 2 );</code>
+ </p>
+ <p>
+ The first two parameters are the number of columns and rows
+ to freeze. The last two parameters are the left-most column and
+ top-most row that are visible in the bottom right quadrant.
+ </p>
+ <p>
+
+ Split panes appear differently. The split area is
+ divided into four separate work areas. The split
+ occurs at the pixel level and the user is able to
+ adjust the split by dragging it to a new position.
+ </p>
+ <p>
+
+ Split panes are created with the following call:
+ </p>
+ <p>
+ <code>sheet2.createSplitPane( 2000, 2000, 0, 0, Sheet.PANE_LOWER_LEFT );</code>
+ </p>
+ <p>
+
+ The first parameter is the x position of the split.
+ This is in 1/20th of a point (a twip). A point in this case
+ seems to equate to a pixel. The second parameter is
+ the y position of the split, again in 1/20th of a point.
+ The third and fourth parameters are the left-most column
+ visible in the right pane and the top-most row visible in the bottom pane.
+ </p>
+ <p>
+ The last parameter indicates which pane currently has
+ the focus. This will be one of Sheet.PANE_LOWER_LEFT,
+ PANE_LOWER_RIGHT, PANE_UPPER_RIGHT or PANE_UPPER_LEFT.
+ </p>
+ <source>
+ Workbook wb = new HSSFWorkbook();
+ Sheet sheet1 = wb.createSheet("new sheet");
+ Sheet sheet2 = wb.createSheet("second sheet");
+ Sheet sheet3 = wb.createSheet("third sheet");
+ Sheet sheet4 = wb.createSheet("fourth sheet");
+
+ // Freeze just one row
+ sheet1.createFreezePane( 0, 1, 0, 1 );
+ // Freeze just one column
+ sheet2.createFreezePane( 1, 0, 1, 0 );
+ // Freeze the columns and rows (forget about scrolling position of the lower right quadrant).
+ sheet3.createFreezePane( 2, 2 );
+ // Create a split with the lower left side being the active quadrant
+ sheet4.createSplitPane( 2000, 2000, 0, 0, Sheet.PANE_LOWER_LEFT );
+
+ try (OutputStream fileOut = new FileOutputStream("workbook.xls")) {
+ wb.write(fileOut);
+ }
+ </source>
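+ <p>Split positions are given in 1/20th of a point, so the 2000 used above is 100 points, or about 1.39 inches at 72 points per inch. The hypothetical helpers below do that arithmetic in both directions. Converting further to screen pixels would depend on the display, so it is left out.</p>

```java
/** Hypothetical conversions for split-pane positions given in 1/20th of a point (twips). */
final class Twips {
    static final double TWIPS_PER_POINT = 20.0;
    static final double POINTS_PER_INCH = 72.0;

    static double toPoints(int twips) {
        return twips / TWIPS_PER_POINT;
    }

    static double toInches(int twips) {
        return toPoints(twips) / POINTS_PER_INCH;
    }

    /** Inverse conversion, e.g. to place a split 1.5 inches from the edge. */
    static int fromInches(double inches) {
        return (int) Math.round(inches * POINTS_PER_INCH * TWIPS_PER_POINT);
    }
}
```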
+ </section>
+
+ <anchor id="Repeating"/>
+ <section><title>Repeating rows and columns</title>
+ <p>
+ It's possible to set up repeating rows and columns in
+ your printouts by using the setRepeatingRows() and
+ setRepeatingColumns() methods in the Sheet class.
+ </p>
+ <p>
+ These methods expect a CellRangeAddress parameter
+ which specifies the range for the rows or columns to
+ repeat.
+ For setRepeatingRows(), it should specify a range of
+ rows to repeat, with the column part spanning all
+ columns.
+ For setRepeatingColumns(), it should specify a range of
+ columns to repeat, with the row part spanning all
+ rows.
+ If the parameter is null, the repeating rows or columns
+ will be removed.
+ </p>
+ <source>
+ Workbook wb = new HSSFWorkbook(); // or new XSSFWorkbook();
+ Sheet sheet1 = wb.createSheet("Sheet1");
+ Sheet sheet2 = wb.createSheet("Sheet2");
+
+ // Set the rows to repeat from row 4 to 5 on the first sheet.
+ sheet1.setRepeatingRows(CellRangeAddress.valueOf("4:5"));
+ // Set the columns to repeat from column A to C on the second sheet
+ sheet2.setRepeatingColumns(CellRangeAddress.valueOf("A:C"));
+
+ try (OutputStream fileOut = new FileOutputStream("workbook.xls")) {
+ wb.write(fileOut);
+ }
+ </source>
+ </section>
+ <anchor id="HeaderFooter"/>
+ <section><title>Headers and Footers</title>
+ <p>
+ Example is for headers but applies directly to footers.
+ </p>
+ <source>
+ Workbook wb = new HSSFWorkbook();
+ Sheet sheet = wb.createSheet("new sheet");
+
+ Header header = sheet.getHeader();
+ header.setCenter("Center Header");
+ header.setLeft("Left Header");
+ header.setRight(HSSFHeader.font("Stencil-Normal", "Italic") +
+ HSSFHeader.fontSize((short) 16) + "Right w/ Stencil-Normal Italic font and size 16");
+
+ try (OutputStream fileOut = new FileOutputStream("workbook.xls")) {
+ wb.write(fileOut);
+ }
+ </source>
+ </section>
+ <anchor id="XSSFHeaderFooter"/>
+ <section><title>XSSF Enhancement for Headers and Footers</title>
+ <p>
+ Example is for headers but applies directly to footers. Note that the above example for
+ basic headers and footers applies to XSSF workbooks as well as HSSF workbooks. The HSSFHeader
+ helper methods, however, do not work for XSSF workbooks.
+ </p>
+ <p>
+ XSSF has the ability to handle First page headers and footers, as well as Even/Odd
+ headers and footers. All Header/Footer Property flags can be handled in XSSF as well.
+ The odd header and footer is the default header and footer. It is displayed on all
+ pages that do not display either a first page header or an even page header. That is,
+ if the Even header/footer does not exist, then the odd header/footer is displayed on
+ even pages. If the first page header/footer does not exist, then the odd header/footer
+ is displayed on the first page. If the even/odd property is not set, that is the same as
+ the even header/footer not existing. If the first page property does not exist, that is
+ the same as the first page header/footer not existing.
+ </p>
+ <source>
+ Workbook wb = new XSSFWorkbook();
+ XSSFSheet sheet = (XSSFSheet) wb.createSheet("new sheet");
+
+ // Create a first page header
+ Header header = sheet.getFirstHeader();
+ header.setCenter("Center First Page Header");
+ header.setLeft("Left First Page Header");
+ header.setRight("Right First Page Header");
+
+ // Create an even page header
+ Header header2 = sheet.getEvenHeader();
+ header2.setCenter("Center Even Page Header");
+ header2.setLeft("Left Even Page Header");
+ header2.setRight("Right Even Page Header");
+
+ // Create an odd page header
+ Header header3 = sheet.getOddHeader();
+ header3.setCenter("Center Odd Page Header");
+ header3.setLeft("Left Odd Page Header");
+ header3.setRight("Right Odd Page Header");
+
+ // Set/Remove Header properties
+ // Names as in XSSFHeaderFooterProperties; verify against the Javadoc of your POI version
+ XSSFHeaderFooterProperties prop = sheet.getHeaderFooterProperties();
+ prop.setAlignWithMargins(true);
+ prop.setScaleWithDoc(true);
+ prop.removeDifferentFirst(); // This does not remove first page headers or footers
+ prop.removeDifferentEvenOdd(); // This does not remove even headers or footers
+
+ try (OutputStream fileOut = new FileOutputStream("workbook.xlsx")) {
+ wb.write(fileOut);
+ }
+ </source>
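+ <p>The selection rules described above can be written down as a small decision function. The sketch below is a hypothetical model, not XSSF code, and assumes pages are numbered from 1. It returns which header kind would be shown on a page, given whether the different-first-page and different-even/odd flags are set and whether the corresponding headers exist.</p>

```java
/** Hypothetical model of which header (first, even or odd) is shown on a 1-based page. */
final class HeaderChooser {
    enum Kind { FIRST, EVEN, ODD }

    static Kind headerFor(int page,
                          boolean differentFirstPage, boolean hasFirstHeader,
                          boolean differentEvenOdd, boolean hasEvenHeader) {
        if (page < 1) {
            throw new IllegalArgumentException("Pages are 1-based: " + page);
        }
        // A first-page header is used only when the flag is set AND the header exists
        if (page == 1 && differentFirstPage && hasFirstHeader) {
            return Kind.FIRST;
        }
        // Likewise, even pages use the even header only when the flag is set AND it exists
        if (page % 2 == 0 && differentEvenOdd && hasEvenHeader) {
            return Kind.EVEN;
        }
        return Kind.ODD; // the odd header is the default for every other page
    }
}
```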
+ </section>
+
+ <anchor id="DrawingShapes"/>
+ <section><title>Drawing Shapes</title>
+ <p>
+ POI supports drawing shapes using the Microsoft Office
+ drawing tools. Shapes on a sheet are organized in a
+ hierarchy of groups and shapes. The top-most shape
+ is the patriarch. This is not visible on the sheet
+ at all. To start drawing you need to call <code>createDrawingPatriarch</code>
+ on the <code>HSSFSheet</code> class. This has the
+ effect of erasing any other shape information stored
+ in that sheet. By default POI will leave shape
+ records alone in the sheet unless you make a call to
+ this method.
+ </p>
+ <p>
+ To create a shape you have to go through the following
+ steps:
+ </p>
+ <ol>
+ <li>Create the patriarch.</li>
+ <li>Create an anchor to position the shape on the sheet.</li>
+ <li>Ask the patriarch to create the shape.</li>
+ <li>Set the shape type (line, oval, rectangle etc...)</li>
+ <li>Set any other style details concerning the shape. (eg:
+ line thickness, etc...)</li>
+ </ol>
+ <source>
+ HSSFPatriarch patriarch = sheet.createDrawingPatriarch();
+ HSSFClientAnchor a = new HSSFClientAnchor( 0, 0, 1023, 255, (short) 1, 0, (short) 1, 0 );
+ HSSFSimpleShape shape1 = patriarch.createSimpleShape(a);
+ shape1.setShapeType(HSSFSimpleShape.OBJECT_TYPE_LINE);
+ </source>
+ <p>
+ Text boxes are created using a different call:
+ </p>
+ <source>
+ HSSFTextbox textbox1 = patriarch.createTextbox(
+ new HSSFClientAnchor(0,0,0,0,(short)1,1,(short)2,2));
+ textbox1.setString(new HSSFRichTextString("This is a test") );
+ </source>
+ <p>
+ It's possible to use different fonts to style parts of
+ the text in the textbox. Here's how:
+ </p>
+ <source>
+ HSSFFont font = wb.createFont();
+ font.setItalic(true);
+ font.setUnderline(HSSFFont.U_DOUBLE);
+ HSSFRichTextString string = new HSSFRichTextString("Woo!!!");
+ string.applyFont(2,5,font);
+ textbox1.setString(string);
+ </source>
+ <p>
+ Just as can be done manually using Excel, it is possible
+ to group shapes together. This is done by calling
+ <code>createGroup()</code> and then creating the shapes
+ using those groups.
+ </p>
+ <p>
+ It's also possible to create groups within groups.
+ </p>
+ <warning>Any group you create should contain at least two
+ other shapes or subgroups.</warning>
+ <p>
+ Here's how to create a shape group:
+ </p>
+ <source>
+ // Create a shape group.
+ HSSFShapeGroup group = patriarch.createGroup(
+ new HSSFClientAnchor(0,0,900,200,(short)2,2,(short)2,2));
+
+ // Create a couple of lines in the group.
+ HSSFSimpleShape shape1 = group.createShape(new HSSFChildAnchor(3,3,500,500));
+ shape1.setShapeType(HSSFSimpleShape.OBJECT_TYPE_LINE);
+ ( (HSSFChildAnchor) shape1.getAnchor() ).setAnchor(3,3,500,500);
+ HSSFSimpleShape shape2 = group.createShape(new HSSFChildAnchor(1,200,400,600));
+ shape2.setShapeType(HSSFSimpleShape.OBJECT_TYPE_LINE);
+ </source>
+ <p>
+ If you're being observant you'll notice that the shapes
+ that are added to the group use a new type of anchor:
+ the <code>HSSFChildAnchor</code>. What happens is that
+ the created group has its own coordinate space for
+ shapes that are placed into it. POI defaults this to
+ (0,0,1023,255) but you are able to change it as desired.
+ Here's how:
+ </p>
+ <source>
+ group.setCoordinates(10,10,20,20); // top-left, bottom-right
+ </source>
+ <p>
+ If you create a group within a group it's also going
+ to have its own coordinate space.
+ </p>
+ </section>
+
+ <anchor id="StylingShapes"/>
+ <section><title>Styling Shapes</title>
+ <p>
+ By default shapes can look a little plain. It's possible
+ to apply different styles to the shapes however. The
+ sorts of things that can currently be done are:
+ </p>
+ <ul>
+ <li>Change the fill color.</li>
+ <li>Make a shape with no fill color.</li>
+ <li>Change the thickness of the lines.</li>
+ <li>Change the style of the lines. Eg: dashed, dotted.</li>
+ <li>Change the line color.</li>
+ </ul>
+ <p>
+ Here's an example of how this is done:
+ </p>
+ <source>
+ HSSFSimpleShape s = patriarch.createSimpleShape(a); // a is an HSSFClientAnchor, as in the previous examples
+ s.setShapeType(HSSFSimpleShape.OBJECT_TYPE_OVAL);
+ s.setLineStyleColor(10,10,10);
+ s.setFillColor(90,10,200);
+ s.setLineWidth(HSSFShape.LINEWIDTH_ONE_PT * 3);
+ s.setLineStyle(HSSFShape.LINESTYLE_DOTSYS);
+ </source>
+ </section>
+ <anchor id="Graphics2d"/>
+ <section><title>Shapes and Graphics2d</title>
+ <p>
+ While the native POI shape drawing commands are the
+ recommended way to draw shapes in a sheet, it's sometimes
+ desirable to use a standard API for compatibility with
+ external libraries. With this in mind we created some
+ wrappers for <code>Graphics</code> and <code>Graphics2D</code>.
+ </p>
+ <warning>
+ It's important to note before continuing, however, that
+ <code>Graphics2D</code> is a poor match to the capabilities
+ of the Microsoft Office drawing commands. The older
+ <code>Graphics</code> class offers a closer match but is
+ still a square peg in a round hole.
+ </warning>
+ <p>
+ All Graphics commands are issued into an <code>HSSFShapeGroup</code>.
+ Here's how it's done:
+ </p>
+ <source>
+ HSSFClientAnchor a = new HSSFClientAnchor( 0, 0, 1023, 255, (short) 1, 0, (short) 1, 0 );
+ HSSFShapeGroup group = patriarch.createGroup( a );
+ group.setCoordinates( 0, 0, 80 * 4 , 12 * 23 );
+ float verticalPointsPerPixel = a.getAnchorHeightInPoints(sheet) / (float)Math.abs(group.getY2() - group.getY1());
+ EscherGraphics g = new EscherGraphics( group, wb, java.awt.Color.black, verticalPointsPerPixel );
+ EscherGraphics2d g2d = new EscherGraphics2d( g );
+ drawChemicalStructure( g2d ); // your own drawing routine, not part of POI
+ </source>
+ <p>
+ The first thing we do is create the group and set its coordinates
+ to match what we plan to draw. Next we calculate a reasonable
+ verticalPointsPerPixel value and then create the EscherGraphics object.
+ Since what we really want is a <code>Graphics2D</code>
+ object we create an EscherGraphics2d object and pass in
+ the graphics object we created. Finally we call a routine
+ that draws into the EscherGraphics2d object.
+ </p>
+ <p>
+ The vertical points per pixel deserves some more explanation.
+ One of the difficulties in converting Graphics calls
+ into escher drawing calls is that Excel does not have
+ the concept of absolute pixel positions. It measures
+ its cell widths in 'characters' and the cell heights in points.
+ Unfortunately it's not defined exactly what type of character it's
+ measuring. Presumably this is because Excel uses
+ different fonts on different platforms or even within the same
+ platform.
+ </p>
+ <p>
+ Because of this constraint we've had to implement the concept of a
+ verticalPointsPerPixel. This is the amount the font should be scaled by when
+ you issue commands such as drawString(). To calculate this value
+ use the following formula:
+ </p>
+ <source>
+ verticalPointsPerPixel = groupHeightInPoints / heightOfGroup
+ </source>
+ <p>
+ The height of the group is simply the
+ difference between the y coordinates of the bounding box of the shape. The
+ group height in points can be obtained with the convenience method
+ <code>HSSFClientAnchor.getAnchorHeightInPoints()</code>.
+ </p>
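+ <p>Here is a worked example with made-up numbers. If the anchor is 138 points tall and the group's coordinate space is 12 * 23 = 276 units high (as in the setCoordinates call above), then each group unit corresponds to 138 / 276 = 0.5 points vertically. The hypothetical helper below performs the same calculation as the verticalPointsPerPixel line in the earlier code.</p>

```java
/** Hypothetical helper computing the vertical scale factor passed to EscherGraphics. */
final class GroupScale {
    static float verticalPointsPerPixel(float anchorHeightInPoints, int groupY1, int groupY2) {
        int groupHeight = Math.abs(groupY2 - groupY1); // height of the group's coordinate space
        if (groupHeight == 0) {
            throw new IllegalArgumentException("Group has zero height");
        }
        return anchorHeightInPoints / groupHeight;
    }
}
```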
+ <p>
+ Many of the functions supported by the graphics classes
+ are not complete. Here are some of the functions that are known
+ to work.
+ </p>
+ <ul>
+ <li>fillRect()</li>
+ <li>fillOval()</li>
+ <li>drawString()</li>
+ <li>drawOval()</li>
+ <li>drawLine()</li>
+ <li>clearRect()</li>
+ </ul>
+ <p>
+ Functions that are not supported will return and log a message
+ using the POI logging infrastructure (disabled by default).
+ </p>
+ </section>
+ <anchor id="Outlining"/>
+ <section>
+ <title>Outlining</title>
+ <p>
+ Outlines are great for grouping sections of information
+ together and can be added easily to columns and rows
+ using the POI API. Here's how:
+ </p>
+ <source>
+ Workbook wb = new HSSFWorkbook();
+ Sheet sheet1 = wb.createSheet("new sheet");
+
+ sheet1.groupRow( 5, 14 );
+ sheet1.groupRow( 7, 14 );
+ sheet1.groupRow( 16, 19 );
+
+ sheet1.groupColumn( 4, 7 );
+ sheet1.groupColumn( 9, 12 );
+ sheet1.groupColumn( 10, 11 );
+
+ try (OutputStream fileOut = new FileOutputStream("workbook.xls")) {
+ wb.write(fileOut);
+ }
+ </source>
+ <p>
+ To collapse (or expand) an outline use the following calls:
+ </p>
+ <source>
+ sheet1.setRowGroupCollapsed( 7, true );
+ sheet1.setColumnGroupCollapsed( 4, true );
+ </source>
+ <p>
+ The row/column you choose should contain an already
+ created group. It can be anywhere within the group.
+ </p>
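+ <p>Overlapping groups nest. In the example, rows 7 to 14 (0-based) fall inside both groupRow( 5, 14 ) and groupRow( 7, 14 ), so they sit one outline level deeper than rows 5 and 6. The hypothetical OutlineLevels helper below is a model, not POI code. It computes that nesting depth for each row, assuming each group call raises the level of every row in its inclusive range by one.</p>

```java
/** Hypothetical model: outline level of each row after a series of groupRow(from, to) calls. */
final class OutlineLevels {
    static int[] levels(int rowCount, int[][] groups) {
        int[] level = new int[rowCount];
        for (int[] g : groups) {
            if (g[0] < 0 || g[1] >= rowCount || g[0] > g[1]) {
                throw new IllegalArgumentException("Bad group range");
            }
            // Each group raises every row in its 0-based, inclusive range by one level
            for (int r = g[0]; r <= g[1]; r++) {
                level[r]++;
            }
        }
        return level;
    }
}
```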
+ </section>
+ </section>
+ </section>
+ <anchor id="Images"/>
+ <section>
+ <title>Images</title>
+ <p>
+ Images are part of the drawing support. To add an image just
+ call <code>createPicture()</code> on the drawing patriarch.
+ At the time of writing the following types are supported:
+ </p>
+ <ul>
+ <li>PNG</li>
+ <li>JPG</li>
+ <li>DIB</li>
+ </ul>
+ <p>
+ It should be noted that any existing drawings may be erased
+ once you add an image to a sheet.
+ </p>
+ <source>
+ //create a new workbook
+ Workbook wb = new XSSFWorkbook(); //or new HSSFWorkbook();
+
+ //add picture data to this workbook.
+ byte[] bytes;
+ try (InputStream is = new FileInputStream("image1.jpeg")) {
+ bytes = IOUtils.toByteArray(is);
+ }
+ int pictureIdx = wb.addPicture(bytes, Workbook.PICTURE_TYPE_JPEG);
+
+ CreationHelper helper = wb.getCreationHelper();
+
+ //create sheet
+ Sheet sheet = wb.createSheet();
+
+ // Create the drawing patriarch. This is the top level container for all shapes.
+ Drawing drawing = sheet.createDrawingPatriarch();
+
+ //add a picture shape
+ ClientAnchor anchor = helper.createClientAnchor();
+ //set top-left corner of the picture,
+ //subsequent call of Picture#resize() will operate relative to it
+ anchor.setCol1(3);
+ anchor.setRow1(2);
+ Picture pict = drawing.createPicture(anchor, pictureIdx);
+
+ //auto-size picture relative to its top-left corner
+ pict.resize();
+
+ //save workbook
+ String file = "picture.xls";
+ if(wb instanceof XSSFWorkbook) file += "x";
+ try (OutputStream fileOut = new FileOutputStream(file)) {
+ wb.write(fileOut);
+ }
+ </source>
+ <warning>
+ Picture.resize() works only for JPEG and PNG. Other formats are not yet supported.
+ </warning>
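+ <p>The <code>addPicture()</code> call needs a type constant that matches the bytes, such as <code>Workbook.PICTURE_TYPE_JPEG</code> above. If the format is not known in advance, one option is to inspect the file signature. The plain-JDK sketch below recognises only the standard PNG and JPEG signatures and reports DIB data as unknown. Mapping its result to <code>Workbook.PICTURE_TYPE_PNG</code> or <code>Workbook.PICTURE_TYPE_JPEG</code> is left to the caller. <code>ImageSniffer</code> is a hypothetical helper.</p>

```java
/** Hypothetical helper that guesses an image format from its leading signature bytes. */
class ImageSniffer {
    private static final int[] PNG = {0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A};
    private static final int[] JPEG = {0xFF, 0xD8, 0xFF};

    /** Returns "png", "jpeg" or "unknown". */
    static String detect(byte[] data) {
        if (startsWith(data, PNG)) {
            return "png";
        }
        if (startsWith(data, JPEG)) {
            return "jpeg";
        }
        return "unknown";
    }

    private static boolean startsWith(byte[] data, int[] signature) {
        if (signature.length > data.length) {
            return false;
        }
        for (int i = 0; signature.length > i; i++) {
            // Java bytes are signed, so compare their unsigned value with the signature
            if (Byte.toUnsignedInt(data[i]) != signature[i]) {
                return false;
            }
        }
        return true;
    }
}
```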
+ <p>Reading images from a workbook:</p>
+ <source>
+ List&lt;? extends PictureData&gt; lst = workbook.getAllPictures();
+ int count = 0;
+ for (PictureData pict : lst) {
+ String ext = pict.suggestFileExtension();
+ byte[] data = pict.getData();
+ if (ext.equals("jpeg")) {
+ // number the files so that each picture gets its own file
+ try (OutputStream out = new FileOutputStream("pict" + (count++) + ".jpg")) {
+ out.write(data);
+ }
+ }
+ }
+ </source>
+ </section>
+ <anchor id="NamedRanges"/>
+ <section>
+ <title>Named Ranges and Named Cells</title>
+ <p>
+ A named range is a way to refer to a group of cells by a name. A named cell is a
+ degenerate case of a named range in which the 'group of cells' contains exactly one
+ cell. You can both create named ranges and refer to cells in a workbook through them.
+ When working with Named Ranges, the classes <code>org.apache.poi.ss.util.CellReference</code>
+ and <code>org.apache.poi.ss.util.AreaReference</code> are used.
+ </p>
+ <p>
+ Note: using relative references like 'A1:B1' can cause the cell that the name points to
+ to move unexpectedly when the workbook is used in Microsoft Excel. Using absolute references
+ like '$A$1:$B$1' usually avoids this; see also
+ <a href="https://superuser.com/a/1031047/126954">this discussion</a>.
+ </p>
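+ <p>Roughly speaking, Excel interprets a relative reference in a name relative to the active cell. The parts of the reference without a '$' are therefore offset, while the '$'-marked parts stay fixed. The plain-JDK sketch below illustrates the effect of such an offset. <code>RefShifter</code> is a hypothetical class, and it handles only a single cell reference without a sheet prefix.</p>

```java
/** Hypothetical illustration: offset the non-absolute parts of an A1-style cell reference. */
class RefShifter {
    /** Offsets the column and row of a reference unless they are marked absolute with '$'. */
    static String shift(String ref, int rowDelta, int colDelta) {
        int i = 0;
        boolean colAbsolute = ref.charAt(i) == '$';
        if (colAbsolute) {
            i++;
        }
        int start = i;
        while (Character.isLetter(ref.charAt(i))) {
            i++;
        }
        int col = toIndex(ref.substring(start, i));
        boolean rowAbsolute = ref.charAt(i) == '$';
        if (rowAbsolute) {
            i++;
        }
        int row = Integer.parseInt(ref.substring(i));   // 1-based row number
        if (!colAbsolute) {
            col += colDelta;
        }
        if (!rowAbsolute) {
            row += rowDelta;
        }
        return (colAbsolute ? "$" : "") + toLetters(col) + (rowAbsolute ? "$" : "") + row;
    }

    // "A" -> 0, "Z" -> 25, "AA" -> 26
    static int toIndex(String letters) {
        int n = 0;
        for (char c : letters.toUpperCase().toCharArray()) {
            n = n * 26 + (c - 'A' + 1);
        }
        return n - 1;
    }

    // 0 -> "A", 25 -> "Z", 26 -> "AA"
    static String toLetters(int index) {
        StringBuilder sb = new StringBuilder();
        int n = index + 1;
        while (n > 0) {
            int rem = (n - 1) % 26;
            sb.insert(0, (char) ('A' + rem));
            n = (n - 1) / 26;
        }
        return sb.toString();
    }
}
```

+ <p>For example, offsetting <code>A1</code> by one row and one column gives <code>B2</code>, while <code>$A$1</code> is returned unchanged.</p>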
+ <p>
+ Creating Named Range / Named Cell
+ </p>
+ <source>
+ // setup code
+ String sname = "TestSheet", cname = "TestName", cvalue = "TestVal";
+ Workbook wb = new HSSFWorkbook();
+ Sheet sheet = wb.createSheet(sname);
+ sheet.createRow(0).createCell(0).setCellValue(cvalue);
+
+ // 1. create named range for a single cell using areareference
+ Name namedCell = wb.createName();
+ namedCell.setNameName(cname + "1");
+ String reference = sname+"!$A$1:$A$1"; // area reference
+ namedCell.setRefersToFormula(reference);
+
+ // 2. create named range for a single cell using cellreference
+ Name namedCel2 = wb.createName();
+ namedCel2.setNameName(cname + "2");
+ reference = sname+"!$A$1"; // cell reference
+ namedCel2.setRefersToFormula(reference);
+
+ // 3. create named range for an area using AreaReference
+ Name namedCel3 = wb.createName();
+ namedCel3.setNameName(cname + "3");
+ reference = sname+"!$A$1:$C$5"; // area reference
+ namedCel3.setRefersToFormula(reference);
+
+ // 4. create named formula
+ Name namedCel4 = wb.createName();
+ namedCel4.setNameName("my_sum");
+ namedCel4.setRefersToFormula("SUM(" + sname + "!$I$2:$I$6)");
+ </source>
+ <p>
+ Reading from Named Range / Named Cell
+ </p>
+ <source>
+ // setup code
+ String cname = "TestName";
+ Workbook wb = getMyWorkbook(); // retrieve workbook
+
+ // retrieve the named range
+ Name aNamedCell = wb.getName(cname);
+
+ // retrieve the cell at the named range and test its contents
+ AreaReference aref = new AreaReference(aNamedCell.getRefersToFormula());
+ CellReference[] crefs = aref.getAllReferencedCells();
+ for (int i=0; i&lt;crefs.length; i++) {
+ Sheet s = wb.getSheet(crefs[i].getSheetName());
+ Row r = s.getRow(crefs[i].getRow());
+ Cell c = r.getCell(crefs[i].getCol());
+ // extract the cell contents based on cell type etc.
+ }
+ </source>
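+ <p><code>getAllReferencedCells()</code> returns every cell of the area, not only its corners. To illustrate what that expansion involves, the plain-JDK sketch below turns a reference such as <code>TestSheet!$A$1:$C$5</code> into 0-based (row, column) pairs, walking the area row by row. <code>AreaExpander</code> is hypothetical. It assumes a single contiguous area, optionally prefixed by a sheet name, whose first corner is the top-left one.</p>

```java
/** Hypothetical sketch of expanding an A1-style area such as "Sheet!$A$1:$C$5". */
class AreaExpander {
    /** Returns {row, col} pairs (0-based), walking the area row by row. */
    static int[][] allCells(String area) {
        int bang = area.lastIndexOf('!');
        String refs = area.substring(bang + 1);             // drop an optional sheet prefix
        String[] corners = refs.split(":");
        int[] first = parse(corners[0]);
        int[] last = parse(corners[corners.length - 1]);    // a single cell is its own last corner
        int rows = last[0] - first[0] + 1;
        int cols = last[1] - first[1] + 1;
        int[][] cells = new int[rows * cols][];
        int k = 0;
        for (int r = first[0]; last[0] >= r; r++) {
            for (int c = first[1]; last[1] >= c; c++) {
                cells[k++] = new int[] {r, c};
            }
        }
        return cells;
    }

    /** Parses "$B$3" or "B3" into {2, 1} (0-based row, column). */
    static int[] parse(String ref) {
        String s = ref.replace("$", "").toUpperCase();
        int i = 0;
        int col = 0;
        while (Character.isLetter(s.charAt(i))) {
            col = col * 26 + (s.charAt(i) - 'A' + 1);
            i++;
        }
        int row = Integer.parseInt(s.substring(i));
        return new int[] {row - 1, col - 1};
    }
}
```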
+ <p>
+ Reading from non-contiguous Named Ranges
+ </p>
+ <source>
+ // Setup code
+ String cname = "TestName";
+ Workbook wb = getMyWorkbook(); // retrieve workbook
+
+ // Retrieve the named range
+ // Will be something like "$C$10,$D$12:$D$14";
+ Name aNamedCell = wb.getName(cname);
+
+ // Retrieve the cell at the named range and test its contents
+ // Will get back one AreaReference for C10, and
+ // another for D12 to D14
+ AreaReference[] arefs = AreaReference.generateContiguous(aNamedCell.getRefersToFormula());
+ for (int i=0; i&lt;arefs.length; i++) {
+ // Only get the corners of the Area
+ // (use arefs[i].getAllReferencedCells() to get all cells)
+ CellReference[] crefs = arefs[i].getCells();
+ for (int j=0; j&lt;crefs.length; j++) {
+ // Resolve the reference to an actual cell
+ Sheet s = wb.getSheet(crefs[j].getSheetName());
+ Row r = s.getRow(crefs[j].getRow());
+ Cell c = r.getCell(crefs[j].getCol());
+ // Do something with this corner cell
+ }
+ }
+ </source>
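+ <p><code>generateContiguous()</code> splits a non-contiguous reference at its commas. A sheet name may itself contain a comma, so a naive <code>String.split(",")</code> is not enough. The plain-JDK sketch below ignores commas inside single-quoted sheet names. <code>RangeSplitter</code> is hypothetical and is not POI's implementation.</p>

```java
/** Hypothetical sketch: split "$C$10,$D$12:$D$14" into its contiguous pieces. */
class RangeSplitter {
    static String[] split(String refersTo) {
        String[] out = new String[scan(refersTo, null)];
        scan(refersTo, out);
        return out;
    }

    // Walks the text once: counts the pieces, and also stores them when out is non-null.
    private static int scan(String text, String[] out) {
        int count = 0;
        boolean quoted = false;
        StringBuilder current = new StringBuilder();
        for (char ch : text.toCharArray()) {
            if (ch == '\'') {
                quoted = !quoted;        // '' inside a name toggles twice, so it stays quoted
            } else if (ch == ',') {
                if (!quoted) {           // a separator between two areas
                    if (out != null) {
                        out[count] = current.toString().trim();
                    }
                    count++;
                    current.setLength(0);
                    continue;
                }
            }
            current.append(ch);
        }
        if (out != null) {
            out[count] = current.toString().trim();
        }
        return count + 1;
    }
}
```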
+ <p>
+ Note that when a cell is deleted, Excel does not delete the
+ attached named range. As a result, a workbook can contain
+ named ranges that point to cells that no longer exist.
+ You should check the validity of a reference before
+ constructing an AreaReference:
+ </p>
+ <source>
+ if(name.isDeleted()){
+ //named range points to a deleted cell.
+ } else {
+ AreaReference ref = new AreaReference(name.getRefersToFormula());
+ }
+ </source>
+ </section>
+ <anchor id="CellComments"/>
+ <section><title>Cell Comments - HSSF and XSSF</title>
+ <p>
+ A comment is a rich text note that is attached to, and
+ associated with, a cell, separate from the other cell content.
+ Comment content is stored separately from the cell and is displayed in a drawing object (like a text box)
+ that is separate from, but associated with, the cell.
+ </p>
+ <source>
+ Workbook wb = new XSSFWorkbook(); //or new HSSFWorkbook();
+
+ CreationHelper factory = wb.getCreationHelper();
+
+ Sheet sheet = wb.createSheet();
+
+ Row row = sheet.createRow(3);
+ Cell cell = row.createCell(5);
+ cell.setCellValue("F4");
+
+ Drawing drawing = sheet.createDrawingPatriarch();
+
+ // When the comment box is visible, have it span 1 column and 3 rows
+ ClientAnchor anchor = factory.createClientAnchor();
+ anchor.setCol1(cell.getColumnIndex());
+ anchor.setCol2(cell.getColumnIndex()+1);
+ anchor.setRow1(row.getRowNum());
+ anchor.setRow2(row.getRowNum()+3);
+
+ // Create the comment and set the text+author
+ Comment comment = drawing.createCellComment(anchor);
+ RichTextString str = factory.createRichTextString("Hello, World!");
+ comment.setString(str);
+ comment.setAuthor("Apache POI");
+
+ // Assign the comment to the cell
+ cell.setCellComment(comment);
+
+ String fname = "comment-xssf.xls";
+ if(wb instanceof XSSFWorkbook) fname += "x";
+ try (OutputStream out = new FileOutputStream(fname)) {
+ wb.write(out);
+ }
+
+ wb.close();
+ </source>
+ <p>
+ Reading cell comments
+ </p>
+ <source>
+ Cell cell = sheet.getRow(3).getCell(1);
+ Comment comment = cell.getCellComment();
+ if (comment != null) {
+ RichTextString str = comment.getString();
+ String author = comment.getAuthor();
+ }
+ // alternatively you can retrieve cell comments by (row, column)
+ comment = sheet.getCellComment(3, 1);
+ </source>
+
+ <p>To get all the comments on a sheet:</p>
+ <source>
+ Map&lt;CellAddress, ? extends Comment&gt; comments = sheet.getCellComments();
+ Comment commentA1 = comments.get(new CellAddress(0, 0));
+ Comment commentB1 = comments.get(new CellAddress(0, 1));
+ for (Entry&lt;CellAddress, ? extends Comment&gt; e : comments.entrySet()) {
+ CellAddress loc = e.getKey();
+ Comment comment = e.getValue();
+ System.out.println("Comment at " + loc + ": " +
+ "[" + comment.getAuthor() + "] " + comment.getString().getString());
+ }
+ </source>
+ </section>
+
+ <anchor id="Autofit"/>
+ <section><title>Adjust column width to fit the contents</title>
+ <source>
+ Sheet sheet = workbook.getSheetAt(0);
+ sheet.autoSizeColumn(0); //adjust width of the first column
+ sheet.autoSizeColumn(1); //adjust width of the second column
+ </source>
+ <p>
+ For SXSSFWorkbooks only: the random access window is likely to exclude most of the rows
+ in the worksheet, and those rows are needed to compute the best-fit width of a column, so the columns must
+ be tracked for auto-sizing before any rows are flushed.
+ </p>
+ <source>
+ SXSSFWorkbook workbook = new SXSSFWorkbook();
+ SXSSFSheet sheet = workbook.createSheet();
+ sheet.trackColumnForAutoSizing(0);
+ sheet.trackColumnForAutoSizing(1);
+ // If you have a Collection of column indices, see SXSSFSheet#trackColumnForAutoSizing(Collection&lt;Integer&gt;)
+ // or roll your own for-loop.
+ // Alternatively, use SXSSFSheet#trackAllColumnsForAutoSizing() if the columns that will be auto-sized aren't
+ // known in advance or you are upgrading existing code and are trying to minimize changes. Keep in mind
+ // that tracking all columns will require more memory and CPU cycles, as the best-fit width is calculated
+ // on all tracked columns on every row that is flushed.
+
+ // create some cells
+ for (int r=0; r &lt; 10; r++) {
+ Row row = sheet.createRow(r);
+ for (int c = 0; c &lt; 10; c++) {
+ Cell cell = row.createCell(c);
+ cell.setCellValue("Cell " + cell.getAddress().formatAsString());
+ }
+ }
+
+ // Auto-size the columns.
+ sheet.autoSizeColumn(0);
+ sheet.autoSizeColumn(1);
+ </source>
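+ <p>The need to track columns before flushing follows from how the streaming window works. Once a row has been flushed it can no longer be measured, so the best-fit width has to be accumulated while rows pass through. The plain-JDK model below keeps a running maximum for each tracked column. It counts characters as a stand-in, whereas POI measures the rendered text with the workbook's fonts. <code>WidthTracker</code> is a hypothetical class.</p>

```java
/** Hypothetical model of SXSSF column tracking: widths are accumulated as rows are flushed. */
class WidthTracker {
    private final int[] maxChars;
    private final boolean[] tracked;

    WidthTracker(int columns) {
        maxChars = new int[columns];
        tracked = new boolean[columns];
    }

    /** Must be called before the relevant rows are flushed, like trackColumnForAutoSizing(). */
    void track(int column) {
        tracked[column] = true;
    }

    /** Called once per row just before it leaves the window; untracked columns are ignored. */
    void onFlush(String[] row) {
        int n = Math.min(row.length, tracked.length);
        for (int c = 0; n > c; c++) {
            if (!tracked[c] || row[c] == null) {
                continue;
            }
            maxChars[c] = Math.max(maxChars[c], row[c].length());
        }
    }

    /** Running best-fit width in characters; stays 0 for columns that were never tracked. */
    int bestFit(int column) {
        return maxChars[column];
    }
}
```

+ <p>Tracking a column only after some rows have been flushed would leave those rows out of the maximum. That is why <code>trackColumnForAutoSizing()</code> has to be called first.</p>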
+ <p>
+ Note that Sheet#autoSizeColumn() does not evaluate formula cells;
+ the width of a formula cell is calculated from its cached formula result.
+ If your workbook has many formulas, it is a good idea to evaluate them before auto-sizing.
+ </p>
+ <warning>
+ To calculate column widths, Sheet.autoSizeColumn uses Java2D classes
+ that throw an exception if a graphical environment is not available. If no graphical environment
+ is available, you must tell Java that you are running in headless mode by
+ setting the following system property: <code> java.awt.headless=true </code>.
+ You should also ensure that the fonts you use in your workbook are
+ available to Java.
+ </warning>
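+ <p>If you cannot change the JVM command line, the property can also be set from code. This must happen before AWT reads it, that is, before the first call to <code>autoSizeColumn()</code>. A minimal sketch follows; <code>HeadlessSetup</code> is a hypothetical class name.</p>

```java
import java.awt.GraphicsEnvironment;

/** Hypothetical start-up helper for servers without a display. */
class HeadlessSetup {
    /** Same effect as -Djava.awt.headless=true, but only if called before AWT is first used. */
    static void enableHeadless() {
        System.setProperty("java.awt.headless", "true");
    }

    /** Reports what AWT decided; the answer is fixed once AWT has read the property. */
    static boolean isHeadless() {
        return GraphicsEnvironment.isHeadless();
    }
}
```

+ <p>Call <code>HeadlessSetup.enableHeadless()</code> at the top of <code>main()</code>, before any workbook code runs.</p>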
+ </section>
+ <anchor id="Hyperlinks"/>
+ <section><title>How to read hyperlinks</title>
+ <source>
+ Sheet sheet = workbook.getSheetAt(0);
+
+ Cell cell = sheet.getRow(0).getCell(0);
+ Hyperlink link = cell.getHyperlink();
+ if(link != null){
+ System.out.println(link.getAddress());
+ }
+ </source>
+ </section>
+ <section><title>How to create hyperlinks</title>
+ <source>
+ Workbook wb = new XSSFWorkbook(); //or new HSSFWorkbook();
+ CreationHelper createHelper = wb.getCreationHelper();
+
+ //cell style for hyperlinks
+ //by default hyperlinks are blue and underlined
+ CellStyle hlink_style = wb.createCellStyle();
+ Font hlink_font = wb.createFont();
+ hlink_font.setUnderline(Font.U_SINGLE);
+ hlink_font.setColor(IndexedColors.BLUE.getIndex());
+ hlink_style.setFont(hlink_font);
+
+ Cell cell;
+ Sheet sheet = wb.createSheet("Hyperlinks");
+ //URL
+ cell = sheet.createRow(0).createCell(0);
+ cell.setCellValue("URL Link");
+
+ Hyperlink link = createHelper.createHyperlink(HyperlinkType.URL);
+ link.setAddress("https://poi.apache.org/");
+ cell.setHyperlink(link);
+ cell.setCellStyle(hlink_style);
+
+ //link to a file in the current directory
+ cell = sheet.createRow(1).createCell(0);
+ cell.setCellValue("File Link");
+ link = createHelper.createHyperlink(HyperlinkType.FILE);
+ link.setAddress("link1.xls");
+ cell.setHyperlink(link);
+ cell.setCellStyle(hlink_style);
+
+ //e-mail link
+ cell = sheet.createRow(2).createCell(0);
+ cell.setCellValue("Email Link");
+ link = createHelper.createHyperlink(HyperlinkType.EMAIL);
+ //note: if the subject contains white space, make sure it is URL-encoded
+ link.setAddress("mailto:poi@apache.org?subject=Hyperlinks");
+ cell.setHyperlink(link);
+ cell.setCellStyle(hlink_style);
+
+ //link to a place in this workbook
+
+ //create a target sheet and cell
+ Sheet sheet2 = wb.createSheet("Target Sheet");
+ sheet2.createRow(0).createCell(0).setCellValue("Target Cell");
+
+ cell = sheet.createRow(3).createCell(0);
+ cell.setCellValue("Worksheet Link");
+ Hyperlink link2 = createHelper.createHyperlink(HyperlinkType.DOCUMENT);
+ link2.setAddress("'Target Sheet'!A1");
+ cell.setHyperlink(link2);
+ cell.setCellStyle(hlink_style);
+
+ try (OutputStream out = new FileOutputStream("hyperlinks.xlsx")) {
+ wb.write(out);
+ }
+
+ wb.close();
+ </source>
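+ <p>The e-mail example above asks for the subject to be URL-encoded. <code>URLEncoder</code> applies HTML form encoding and turns spaces into '+', so the plain-JDK sketch below rewrites them as %20, the form expected in a mailto URI. <code>MailtoBuilder</code> is a hypothetical helper, and Java 10 or later is assumed for the <code>Charset</code> overload of <code>URLEncoder.encode</code>.</p>

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper that builds the address for a HyperlinkType.EMAIL link. */
class MailtoBuilder {
    static String mailto(String recipient, String subject) {
        // URLEncoder applies HTML form encoding, which turns a space into '+'
        // (a literal '+' becomes %2B); in a mailto URI a space is written as %20.
        String encoded = URLEncoder.encode(subject, StandardCharsets.UTF_8).replace("+", "%20");
        return "mailto:" + recipient + "?subject=" + encoded;
    }
}
```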
+ </section>
+ <anchor id="Validation"/>
+ <section><title>Data Validations</title>
+ <p>
+ As of version 3.8, POI uses slightly different syntax for working with data validations in the .xls and .xlsx formats.
+ </p>
+ <section>
+ <title>hssf.usermodel (binary .xls format)</title>
+ <p><strong>Check the value a user enters into a cell against one or more predefined value(s).</strong></p>
+ <p>The following code will limit the value the user can enter into cell A1 to one of three integer values, 10, 20 or 30.</p>
+ <source>
+ HSSFWorkbook workbook = new HSSFWorkbook();
+ HSSFSheet sheet = workbook.createSheet("Data Validation");
+ CellRangeAddressList addressList = new CellRangeAddressList(
+ 0, 0, 0, 0);
+ DVConstraint dvConstraint = DVConstraint.createExplicitListConstraint(
+ new String[]{"10", "20", "30"});
+ DataValidation dataValidation = new HSSFDataValidation
+ (addressList, dvConstraint);
+ dataValidation.setSuppressDropDownArrow(true);
+ sheet.addValidationData(dataValidation);
+ </source>
+ <p><strong> Drop Down Lists:</strong></p>
+ <p>This code will do the same but offer the user a drop down list to select a value from.</p>
+ <source>
+ HSSFWorkbook workbook = new HSSFWorkbook();
+ HSSFSheet sheet = workbook.createSheet("Data Validation");
+ CellRangeAddressList addressList = new CellRangeAddressList(
+ 0, 0, 0, 0);
+ DVConstraint dvConstraint = DVConstraint.createExplicitListConstraint(
+ new String[]{"10", "20", "30"});
+ DataValidation dataValidation = new HSSFDataValidation
+ (addressList, dvConstraint);
+ dataValidation.setSuppressDropDownArrow(false);
+ sheet.addValidationData(dataValidation);
+ </source>
+ <p><strong>Messages On Error:</strong></p>
+ <p>The following creates a message box that will be shown to the user if the value they enter is invalid:</p>
+ <source>
+ dataValidation.setErrorStyle(DataValidation.ErrorStyle.STOP);
+ dataValidation.createErrorBox("Box Title", "Message Text");
+ </source>
+ <p>Replace 'Box Title' with the text you wish to display in the message box's title bar
+ and 'Message Text' with the text of your error message.</p>
+ <p><strong>Prompts:</strong></p>
+ <p>The following creates a prompt that the user will see when the cell containing the data validation receives focus:</p>
+ <source>
+ dataValidation.createPromptBox("Title", "Message Text");
+ dataValidation.setShowPromptBox(true);
+ </source>
+ <p>The text passed as the first parameter to the createPromptBox() method will appear in bold
+ as the title of the prompt, whilst the second will be displayed as the text of the message.
+ The createExplicitListConstraint() method can be passed an array of String(s) containing integer, floating point, date or text values.</p>
+
+ <p><strong>Further Data Validations:</strong></p>
+ <p>To obtain a validation that would check the value entered was, for example, an integer between 10 and 100,
+ use the DVConstraint.createNumericConstraint(int, int, String, String) factory method.</p>
+ <source>
+ dvConstraint = DVConstraint.createNumericConstraint(
+ DVConstraint.ValidationType.INTEGER,
+ DVConstraint.OperatorType.BETWEEN, "10", "100");
+ </source>
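+ <p>To make the behaviour of this INTEGER / BETWEEN constraint concrete, here is a small plain-JDK model. It assumes literal bounds and treats both ends as included, as Excel's 'between' operator does. It also rejects fractional input. Bounds written as formulas are outside its scope. <code>IntegerBetweenRule</code> is hypothetical and is not POI code.</p>

```java
/** Hypothetical model of an INTEGER / BETWEEN validation with literal bounds such as "10" and "100". */
class IntegerBetweenRule {
    private final long min;
    private final long max;

    IntegerBetweenRule(String formula1, String formula2) {
        // Only literal bounds are modelled here; "=SUM(A1:A3)" would need a formula evaluator.
        this.min = Long.parseLong(formula1.trim());
        this.max = Long.parseLong(formula2.trim());
    }

    /** True if the entered value is a whole number from min to max, both ends included. */
    boolean accepts(double entered) {
        if (entered != Math.rint(entered)) {
            return false;              // INTEGER validation rejects fractional values (and NaN)
        }
        if (min > entered) {
            return false;
        }
        return max >= entered;
    }
}
```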
+ <p>Look at the javadoc for the other validation and operator types; also note that not all validation
+ types are supported for this method. The values passed to the two String parameters can be formulas; the '=' symbol is used to denote a formula.</p>
+ <source>
+ dvConstraint = DVConstraint.createNumericConstraint(
+ DVConstraint.ValidationType.INTEGER,
+ DVConstraint.OperatorType.BETWEEN, "=SUM(A1:A3)", "100");
+ </source>
+ <p>It is not possible to create a drop down list if the createNumericConstraint() method is called;
+ the setSuppressDropDownArrow(false) method call will simply be ignored.</p>
+ <p>Date and time constraints can be created by calling the createDateConstraint(int, String, String, String)
+ or createTimeConstraint(int, String, String) methods. Both are very similar to the above and are explained in the javadoc.</p>
+ <p><strong>Creating Data Validations From Spreadsheet Cells.</strong></p>
+ <p>The contents of specific cells can be used to provide the values for the data validation
+ and the DVConstraint.createFormulaListConstraint(String) method supports this.
+ To specify that the values come from a contiguous range of cells do either of the following:</p>
+ <source>
+ dvConstraint = DVConstraint.createFormulaListConstraint("$A$1:$A$3");
+ </source>
+ <p>or</p>
+ <source>
+ Name namedRange = workbook.createName();
+ namedRange.setNameName("list1");
+ namedRange.setRefersToFormula("$A$1:$A$3");
+ dvConstraint = DVConstraint.createFormulaListConstraint("list1");
+ </source>
+ <p>and in both cases the user will be able to select from a drop down list containing the values from cells A1, A2 and A3.</p>
+ <p>The data does not have to be on the same sheet as the data validation. To select the data from a different sheet, however, the sheet
+ must be given a name when it is created, and that name should be used in the formula. So, assuming a sheet named 'Data Sheet' exists, this will work:</p>
+ <source>
+ Name namedRange = workbook.createName();
+ namedRange.setNameName("list1");
+ namedRange.setRefersToFormula("'Data Sheet'!$A$1:$A$3");
+ dvConstraint = DVConstraint.createFormulaListConstraint("list1");
+ </source>
+ <p>as will this:</p>
+ <source>
+ dvConstraint = DVConstraint.createFormulaListConstraint("'Data Sheet'!$A$1:$A$3");
+ </source>
+ <p>whilst this will not:</p>
+ <source>
+ Name namedRange = workbook.createName();
+ namedRange.setNameName("list1");
+ namedRange.setRefersToFormula("'Sheet1'!$A$1:$A$3");
+ dvConstraint = DVConstraint.createFormulaListConstraint("list1");
+ </source>
+ <p>and nor will this:</p>
+ <source>
+ dvConstraint = DVConstraint.createFormulaListConstraint("'Sheet1'!$A$1:$A$3");
+ </source>
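+ <p>Sheet names containing spaces, such as 'Data Sheet', must be enclosed in single quotes in the formula. The plain-JDK sketch below always quotes the name and doubles any apostrophe inside it, which is how Excel escapes such names. <code>SheetRef</code> is a hypothetical helper.</p>

```java
/** Hypothetical helper that builds a sheet-qualified area reference for a formula list constraint. */
class SheetRef {
    /** Encloses the sheet name in single quotes, doubling any apostrophe it contains. */
    static String quote(String sheetName) {
        return "'" + sheetName.replace("'", "''") + "'";
    }

    /** For example, qualify("Data Sheet", "$A$1:$A$3") gives 'Data Sheet'!$A$1:$A$3. */
    static String qualify(String sheetName, String area) {
        return quote(sheetName) + "!" + area;
    }
}
```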
+ </section>
+ <section>
+ <title>xssf.usermodel (.xlsx format)</title>
+<p>
+Data validations work similarly when you are creating an XML-based SpreadsheetML
+workbook file, but there are differences. Explicit casts are required, for example,
+in a few places, because much of the support for data validations in the xssf stream was
+built into the unifying ss stream, of which more later. Other differences are
+noted with comments in the code.
+</p>
+
+<p><strong>Check the value the user enters into a cell against one or more predefined value(s).</strong></p>
+<source>
+ XSSFWorkbook workbook = new XSSFWorkbook();
+ XSSFSheet sheet = workbook.createSheet("Data Validation");
+ XSSFDataValidationHelper dvHelper = new XSSFDataValidationHelper(sheet);
+ XSSFDataValidationConstraint dvConstraint = (XSSFDataValidationConstraint)
+ dvHelper.createExplicitListConstraint(new String[]{"11", "21", "31"});
+ CellRangeAddressList addressList = new CellRangeAddressList(0, 0, 0, 0);
+ XSSFDataValidation validation =(XSSFDataValidation)dvHelper.createValidation(
+ dvConstraint, addressList);
+
+ // Here the boolean value false is passed to the setSuppressDropDownArrow()
+ // method. In the hssf.usermodel examples above, the value passed to this
+ // method is true.
+ validation.setSuppressDropDownArrow(false);
+
+ // Note this extra method call. If this method call is omitted, or if the
+ // boolean value false is passed, then Excel will not validate the value the
+ // user enters into the cell.
+ validation.setShowErrorBox(true);
+ sheet.addValidationData(validation);
+</source>
+
+<p><strong>Drop Down Lists:</strong></p>
+<p>This code will do the same but offer the user a drop down list to select a value from.</p>
+<source>
+ XSSFWorkbook workbook = new XSSFWorkbook();
+ XSSFSheet sheet = workbook.createSheet("Data Validation");
+ XSSFDataValidationHelper dvHelper = new XSSFDataValidationHelper(sheet);
+ XSSFDataValidationConstraint dvConstraint = (XSSFDataValidationConstraint)
+ dvHelper.createExplicitListConstraint(new String[]{"11", "21", "31"});
+ CellRangeAddressList addressList = new CellRangeAddressList(0, 0, 0, 0);
+ XSSFDataValidation validation = (XSSFDataValidation)dvHelper.createValidation(
+ dvConstraint, addressList);
+ validation.setShowErrorBox(true);
+ sheet.addValidationData(validation);
+</source>
+<p>Note that the call to the setSuppressDropDownArrow() method can either be omitted or replaced with:</p>
+<source>
+ validation.setSuppressDropDownArrow(true);
+</source>
+
+<p><strong>Prompts and Error Messages:</strong></p>
+<p>
+These both exactly mirror hssf.usermodel, so please refer to the 'Messages On Error:' and 'Prompts:' sections above.
+</p>
+
+<p><strong>Further Data Validations:</strong></p>
+<p>
+To obtain a validation that would check the value entered was, for example,
+an integer between 10 and 100, use the XSSFDataValidationHelper's createNumericConstraint(int, int, String, String) factory method.
+</p>
+<source>
+
+ XSSFDataValidationConstraint dvConstraint = (XSSFDataValidationConstraint)
+ dvHelper.createNumericConstraint(
+ XSSFDataValidationConstraint.ValidationType.INTEGER,
+ XSSFDataValidationConstraint.OperatorType.BETWEEN,
+ "10", "100");
+</source>
+<p>
+The values passed to the final two String parameters can be formulas; the '=' symbol is used to denote a formula.
+Thus, the following would create a validation that allows values only if they fall between the results of summing two cell ranges:
+</p>
+<source>
+ XSSFDataValidationConstraint dvConstraint = (XSSFDataValidationConstraint)
+ dvHelper.createNumericConstraint(
+ XSSFDataValidationConstraint.ValidationType.INTEGER,
+ XSSFDataValidationConstraint.OperatorType.BETWEEN,
+ "=SUM(A1:A10)", "=SUM(B24:B27)");
+</source>
+<p>
+It is not possible to create a drop down list if the createNumericConstraint() method is called;
+the setSuppressDropDownArrow(true) method call will simply be ignored.
+</p>
+<p>
+Please check the javadoc for other constraint types, as examples for those are not included here.
+There are, for example, methods defined on the XSSFDataValidationHelper class allowing you to create
+the following types of constraint: date, time, decimal, integer, numeric, formula, text length and custom constraints.
+</p>
+<p><strong>Creating Data Validations From Spreadsheet Cells:</strong></p>
+<p>
+One other type of constraint not mentioned above is the formula list constraint.
+It allows you to create a validation that takes its value(s) from a range of cells. This code
+</p>
+<source>
+XSSFDataValidationConstraint dvConstraint = (XSSFDataValidationConstraint)
+ dvHelper.createFormulaListConstraint("$A$1:$F$1");
+</source>
+
+<p>
+would create a validation that takes its values from cells in the range A1 to F1.
+</p>
+<p>
+The usefulness of this technique can be extended if you use named ranges like this:
+</p>
+
+<source>
+ XSSFName name = workbook.createName();
+ name.setNameName("data");
+ name.setRefersToFormula("$B$1:$F$1");
+ XSSFDataValidationHelper dvHelper = new XSSFDataValidationHelper(sheet);
+ XSSFDataValidationConstraint dvConstraint = (XSSFDataValidationConstraint)
+ dvHelper.createFormulaListConstraint("data");
+ CellRangeAddressList addressList = new CellRangeAddressList(
+ 0, 0, 0, 0);
+ XSSFDataValidation validation = (XSSFDataValidation)
+ dvHelper.createValidation(dvConstraint, addressList);
+ validation.setSuppressDropDownArrow(true);
+ validation.setShowErrorBox(true);
+ sheet.addValidationData(validation);
+</source>
+<p>
+OpenOffice Calc has slightly different rules with regard to the scope of names.
+Excel supports both Workbook and Sheet scope for a name, but Calc does not; it seems to support only Sheet scope for a name.
+Thus it is often best to fully qualify the name for the region or area, something like this:
+</p>
+<source>
+ XSSFName name = workbook.createName();
+ name.setNameName("data");
+ name.setRefersToFormula("'Data Validation'!$B$1:$F$1");
+ ....
+</source>
+<p>
+This does, however, open a further interesting opportunity: placing all of the data for the validation(s) into named ranges of cells on a hidden sheet within the workbook. These ranges can then be explicitly identified in the setRefersToFormula() method argument.
+</p>
+ </section>
+ <section><title>ss.usermodel</title>
+<p>
+The classes within the ss.usermodel package allow developers to create code that can be used
+to generate both binary (.xls) and SpreadsheetML (.xlsx) workbooks.
+</p>
+<p>
+The techniques used to create data validations share much in common with the xssf.usermodel examples above.
+As a result, just one or two examples will be presented here.
+</p>
+<p><strong>Check the value the user enters into a cell against one or more predefined value(s).</strong></p>
+<source>
+ Workbook workbook = new XSSFWorkbook(); // or new HSSFWorkbook
+ Sheet sheet = workbook.createSheet("Data Validation");
+ DataValidationHelper dvHelper = sheet.getDataValidationHelper();
+ DataValidationConstraint dvConstraint = dvHelper.createExplicitListConstraint(
+ new String[]{"13", "23", "33"});
+ CellRangeAddressList addressList = new CellRangeAddressList(0, 0, 0, 0);
+ DataValidation validation = dvHelper.createValidation(
+ dvConstraint, addressList);
+ // Note the check on the actual type of the DataValidation object.
+ // If it is an instance of the XSSFDataValidation class then the
+ // boolean value 'false' must be passed to the setSuppressDropDownArrow()
+ // method and an explicit call made to the setShowErrorBox() method.
+ if(validation instanceof XSSFDataValidation) {
+ validation.setSuppressDropDownArrow(false);
+ validation.setShowErrorBox(true);
+ }
+ else {
+ // If the DataValidation is an instance of the HSSFDataValidation
+ // class then 'true' should be passed to the setSuppressDropDownArrow()
+ // method and the call to setShowErrorBox() is not necessary.
+ validation.setSuppressDropDownArrow(true);
+ }
+ sheet.addValidationData(validation);
+</source>
+
+<p><strong>Drop Down Lists:</strong></p>
+
+<p>This code will do the same but offer the user a drop down list to select a value from.</p>
+
+<source>
+ Workbook workbook = new XSSFWorkbook(); // or new HSSFWorkbook
+ Sheet sheet = workbook.createSheet("Data Validation");
+ DataValidationHelper dvHelper = sheet.getDataValidationHelper();
+ DataValidationConstraint dvConstraint = dvHelper.createExplicitListConstraint(
+ new String[]{"13", "23", "33"});
+ CellRangeAddressList addressList = new CellRangeAddressList(0, 0, 0, 0);
+ DataValidation validation = dvHelper.createValidation(
+ dvConstraint, addressList);
+ // Note the check on the actual type of the DataValidation object.
+ // For a drop down list the values are the reverse of the previous example:
+ // for an XSSFDataValidation, 'true' is passed to the setSuppressDropDownArrow()
+ // method and an explicit call is made to the setShowErrorBox() method.
+ if(validation instanceof XSSFDataValidation) {
+ validation.setSuppressDropDownArrow(true);
+ validation.setShowErrorBox(true);
+ }
+ else {
+ // For an HSSFDataValidation, 'false' is passed to the setSuppressDropDownArrow()
+ // method and the call to setShowErrorBox() is not necessary.
+ validation.setSuppressDropDownArrow(false);
+ }
+ sheet.addValidationData(validation);
+</source>
+
+<p><strong>Prompts and Error Messages:</strong></p>
+<p>
+These both exactly mirror hssf.usermodel, so please refer to the 'Messages On Error:' and 'Prompts:' sections above.
+</p>
+<p>
+The differences between the ss.usermodel and xssf.usermodel examples are small.
+They are restricted largely to the way the DataValidationHelper is obtained, the lack of any
+need to explicitly cast data types, and the small difference in behaviour between
+the hssf and xssf interpretations of the setSuppressDropDownArrow() method.
+For that reason, no further examples will be included in this section.
+</p>
+<p><strong>Advanced Data Validations.</strong></p>
+<p><strong>Dependent Drop Down Lists.</strong></p>
+<p>
+In some cases, it may be necessary to present to the user a sheet which contains more than one drop down list.
+Further, the choice the user makes in one drop down list may affect the options that are presented to them in
+the second or subsequent drop down lists. One technique that may be used to implement this behaviour will now be explained.
+</p>
+<p>
+There are two keys to the technique. The first is to use named areas or regions of cells to hold the data for the drop down lists;
+the second is to use the INDIRECT() function to convert between the name and the actual addresses of the cells.
+The examples section contains a complete working example, called LinkedDropDownLists.java,
+that demonstrates how to create linked or dependent drop down lists. Only the more relevant points are explained here.
+</p>
+<p>
+To create two drop down lists where the options shown in the second depend upon the selection made in the first,
+begin by creating a named region of cells to hold all of the data for populating the first drop down list.
+Next, create a data validation that will look to this named area for its data, something like this:
+</p>
+<source>
+ CellRangeAddressList addressList = new CellRangeAddressList(0, 0, 0, 0);
+ DataValidationHelper dvHelper = sheet.getDataValidationHelper();
+ DataValidationConstraint dvConstraint = dvHelper.createFormulaListConstraint(
+ "CHOICES");
+ DataValidation validation = dvHelper.createValidation(
+ dvConstraint, addressList);
+ sheet.addValidationData(validation);
+</source>
+<p>
+Note that the name of the area (in the example above it is 'CHOICES')
+is simply passed to the createFormulaListConstraint() method. This is sufficient
+to cause Excel to populate the drop down list with data from that named region.
+</p>
+<p>
+Next, for each of the options the user could select in the first drop down list,
+create a matching named region of cells. The name of that region should match the
+text the user could select in the first drop down list. Note, in the example,
+all upper case letters are used in the names of the regions of cells.
+</p>
+
+<p>
+Now, very similar code can be used to create a second, linked, drop down list:
+</p>
+
+<source>
+ CellRangeAddressList addressList = new CellRangeAddressList(0, 0, 1, 1);
+ DataValidationConstraint dvConstraint = dvHelper.createFormulaListConstraint(
+ "INDIRECT(UPPER($A$1))");
+ DataValidation validation = dvHelper.createValidation(
+ dvConstraint, addressList);
+ sheet.addValidationData(validation);
+</source>
+
+<p>
+The key here is the Excel formula INDIRECT(UPPER($A$1)), which is used to populate the second,
+linked, drop down list. Working from the innermost pair of brackets, it instructs Excel to look
+at the contents of cell A1 and convert what it reads there into upper case, because upper case letters are used
+in the names of each region. It then converts this name into the addresses of the cells that contain
+the data for the second drop down list.
+</p>
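+<p>The lookup performed by INDIRECT(UPPER($A$1)) can be modelled outside Excel. In this plain-JDK sketch, <code>DependentLists</code> and the sample names are hypothetical. Each named region is represented by its upper case name and the values it holds. A choice without a matching region yields an empty list; in the same situation, Excel's INDIRECT would return a #REF! error.</p>

```java
import java.util.Locale;

/** Hypothetical model of the lookup performed by INDIRECT(UPPER($A$1)). */
class DependentLists {
    private final String[] names;     // names of the regions, all upper case, e.g. "APPLE"
    private final String[][] values;  // the cell values held in each named region

    DependentLists(String[] names, String[][] values) {
        this.names = names.clone();
        this.values = values.clone();
    }

    /** Options for the second list, given the text chosen in the first list (cell A1). */
    String[] optionsFor(String firstChoice) {
        String name = firstChoice.toUpperCase(Locale.ROOT);   // UPPER($A$1)
        for (int i = 0; names.length > i; i++) {
            if (names[i].equals(name)) {
                return values[i].clone();                      // INDIRECT(name)
            }
        }
        return new String[0];                                  // no such region: INDIRECT gives #REF!
    }
}
```

+<p>For instance, with regions named APPLE and ORANGE, choosing 'Apple' in the first list selects the values of the APPLE region for the second list.</p>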
+ </section>
+ </section>
+ <anchor id="Embedded"/>
+ <section><title>Embedded Objects</title>
+ <p>It is possible to perform more detailed processing of an embedded Excel, Word or PowerPoint document,
+ or to work with any other type of embedded object.</p>
+ <p><strong>HSSF:</strong></p>
+ <source>
+ POIFSFileSystem fs = new POIFSFileSystem(new File("excel_with_embedded.xls"));
+ HSSFWorkbook workbook = new HSSFWorkbook(fs);
+ for (HSSFObjectData obj : workbook.getAllEmbeddedObjects()) {
+ //the OLE2 Class Name of the object
+ String oleName = obj.getOLE2ClassName();
+ if (oleName.equals("Worksheet")) {
+ DirectoryNode dn = (DirectoryNode) obj.getDirectory();
+ HSSFWorkbook embeddedWorkbook = new HSSFWorkbook(dn, false);
+ //System.out.println(oleName + ": " + embeddedWorkbook.getNumberOfSheets());
+ } else if (oleName.equals("Document")) {
+ DirectoryNode dn = (DirectoryNode) obj.getDirectory();
+ HWPFDocument embeddedWordDocument = new HWPFDocument(dn);
+ //System.out.println(oleName + ": " + embeddedWordDocument.getRange().text());
+ } else if (oleName.equals("Presentation")) {
+ DirectoryNode dn = (DirectoryNode) obj.getDirectory();
+ SlideShow&lt;?,?&gt; embeddedPowerPointDocument = new HSLFSlideShow(dn);
+ //System.out.println(oleName + ": " + embeddedPowerPointDocument.getSlides().size());
+ } else {
+ if(obj.hasDirectoryEntry()){
+ // The DirectoryEntry is a DirectoryNode. Examine its entries to find out what it is
+ DirectoryNode dn = (DirectoryNode) obj.getDirectory();
+ for (Entry entry : dn) {
+ //System.out.println(oleName + "." + entry.getName());
+ }
+ } else {
+ // There is no DirectoryEntry
+ // Recover the object's data from the HSSFObjectData instance.
+ byte[] objectData = obj.getObjectData();
+ }
+ }
+ }
+ </source>
+ <p><strong>XSSF:</strong></p>
+ <source>
+ XSSFWorkbook workbook = new XSSFWorkbook("excel_with_embedded.xlsx");
+ for (PackagePart pPart : workbook.getAllEmbeddedParts()) {
+ String contentType = pPart.getContentType();
+ // Excel Workbook - either binary or OpenXML
+ if (contentType.equals("application/vnd.ms-excel")) {
+ HSSFWorkbook embeddedWorkbook = new HSSFWorkbook(pPart.getInputStream());
+ }
+ // Excel Workbook - OpenXML file format
+ else if (contentType.equals("application/vnd.openxmlformats-officedocument.spreadsheetml.sheet")) {
+ OPCPackage docPackage = OPCPackage.open(pPart.getInputStream());
+ XSSFWorkbook embeddedWorkbook = new XSSFWorkbook(docPackage);
+ }
+ // Word Document - binary (OLE2CDF) file format
+ else if (contentType.equals("application/msword")) {
+ HWPFDocument document = new HWPFDocument(pPart.getInputStream());
+ }
+ // Word Document - OpenXML file format
+ else if (contentType.equals("application/vnd.openxmlformats-officedocument.wordprocessingml.document")) {
+ OPCPackage docPackage = OPCPackage.open(pPart.getInputStream());
+ XWPFDocument document = new XWPFDocument(docPackage);
+ }
+ // PowerPoint Document - binary file format
+ else if (contentType.equals("application/vnd.ms-powerpoint")) {
+ HSLFSlideShow slideShow = new HSLFSlideShow(pPart.getInputStream());
+ }
+ // PowerPoint Document - OpenXML file format
+ else if (contentType.equals("application/vnd.openxmlformats-officedocument.presentationml.presentation")) {
+ OPCPackage docPackage = OPCPackage.open(pPart.getInputStream());
+ XSLFSlideShow slideShow = new XSLFSlideShow(docPackage);
+ }
+ // Any other type of embedded object.
+ else {
+ System.out.println("Unknown Embedded Document: " + contentType);
+ InputStream inputStream = pPart.getInputStream();
+ }
+ }
+ </source>
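+ <p>
+ The chain of content-type checks above can also be written as a table lookup. The sketch below uses
+ only the JDK: it names the POI class that the matching branch above uses, but does not open the part.
+ The class and method names are illustrative and are not part of POI.
+ </p>
+ <source><![CDATA[
```java
import java.util.Map;

/** Illustrative lookup of the content types handled in the XSSF example; not a POI class. */
final class EmbeddedPartClassifier {

    // Content type -> POI class used by the matching branch of the example.
    private static final Map<String, String> HANDLERS = Map.of(
            "application/vnd.ms-excel", "HSSFWorkbook",
            "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet", "XSSFWorkbook",
            "application/msword", "HWPFDocument",
            "application/vnd.openxmlformats-officedocument.wordprocessingml.document", "XWPFDocument",
            "application/vnd.ms-powerpoint", "HSLFSlideShow",
            "application/vnd.openxmlformats-officedocument.presentationml.presentation", "XSLFSlideShow");

    /** Returns the handler class name, or null for "any other type of embedded object". */
    static String handlerFor(String contentType) {
        return HANDLERS.get(contentType);
    }

    public static void main(String[] args) {
        System.out.println(handlerFor("application/msword")); // prints HWPFDocument
    }
}
```
+ ]]></source>
+ <p>
+ Keeping the supported types in one map makes it easy to see which formats are handled; a null result
+ corresponds to the final else branch of the example.
+ </p>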
+ </section>
+ <anchor id="Autofilter"/>
+ <section><title>Autofilters</title>
+ <p>(Since POI-3.7)</p>
+ <source>
+ Workbook wb = new HSSFWorkbook(); //or new XSSFWorkbook();
+ Sheet sheet = wb.createSheet();
+ sheet.setAutoFilter(CellRangeAddress.valueOf("C5:F200"));
+ </source>
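+ <p>
+ CellRangeAddress.valueOf accepts an A1-style range. To show what "C5:F200" means in terms of the
+ zero-based indices taken by the CellRangeAddress(firstRow, lastRow, firstCol, lastCol) constructor,
+ here is a JDK-only parser. It is a simplified sketch that ignores '$' signs and sheet names, and it is
+ not POI's implementation.
+ </p>
+ <source><![CDATA[
```java
/** JDK-only sketch: converts an A1-style range such as "C5:F200" into zero-based indices. */
final class A1Range {
    final int firstRow, lastRow, firstCol, lastCol;

    private A1Range(int firstRow, int lastRow, int firstCol, int lastCol) {
        this.firstRow = firstRow;
        this.lastRow = lastRow;
        this.firstCol = firstCol;
        this.lastCol = lastCol;
    }

    /** Parses "C5:F200" or a single cell such as "C5"; no support for '$' or sheet names. */
    static A1Range parse(String ref) {
        String[] parts = ref.split(":");
        int[] first = cell(parts[0]);
        int[] last = cell(parts.length > 1 ? parts[1] : parts[0]);
        return new A1Range(first[0], last[0], first[1], last[1]);
    }

    /** Returns {row, col}, both zero-based; column letters count A=1 ... Z=26, AA=27. */
    private static int[] cell(String ref) {
        int i = 0;
        int col = 0;
        while (i < ref.length() && Character.isLetter(ref.charAt(i))) {
            col = col * 26 + (Character.toUpperCase(ref.charAt(i)) - 'A' + 1);
            i++;
        }
        int row = Integer.parseInt(ref.substring(i));
        return new int[] {row - 1, col - 1};
    }

    public static void main(String[] args) {
        A1Range r = parse("C5:F200");
        // prints rows 4..199, cols 2..5
        System.out.println("rows " + r.firstRow + ".." + r.lastRow + ", cols " + r.firstCol + ".." + r.lastCol);
    }
}
```
+ ]]></source>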
+ </section>
+ <anchor id="ConditionalFormatting"/>
+ <section><title>Conditional Formatting</title>
+ <source>
+ Workbook workbook = new HSSFWorkbook(); // or new XSSFWorkbook();
+ Sheet sheet = workbook.createSheet();
+
+ SheetConditionalFormatting sheetCF = sheet.getSheetConditionalFormatting();
+
+ ConditionalFormattingRule rule1 = sheetCF.createConditionalFormattingRule(ComparisonOperator.EQUAL, "0");
+ FontFormatting fontFmt = rule1.createFontFormatting();
+ fontFmt.setFontStyle(true, false);
+ fontFmt.setFontColorIndex(IndexedColors.DARK_RED.index);
+
+ BorderFormatting bordFmt = rule1.createBorderFormatting();
+ bordFmt.setBorderBottom(BorderStyle.THIN);
+ bordFmt.setBorderTop(BorderStyle.THICK);
+ bordFmt.setBorderLeft(BorderStyle.DASHED);
+ bordFmt.setBorderRight(BorderStyle.DOTTED);
+
+ PatternFormatting patternFmt = rule1.createPatternFormatting();
+ patternFmt.setFillBackgroundColor(IndexedColors.YELLOW.index);
+
+ ConditionalFormattingRule rule2 = sheetCF.createConditionalFormattingRule(ComparisonOperator.BETWEEN, "-10", "10");
+ ConditionalFormattingRule[] cfRules = { rule1, rule2 };
+
+ CellRangeAddress[] regions = {
+ CellRangeAddress.valueOf("A3:A5")
+ };
+
+ sheetCF.addConditionalFormatting(regions, cfRules);
+ </source>
+ <p> More examples of Excel conditional formatting can be found in
+ <a href="https://github.com/apache/poi/tree/trunk/poi-examples/src/main/java/org/apache/poi/examples/ss/ConditionalFormats.java">ConditionalFormats.java</a>
+ </p>
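+ <p>
+ In the example, rule1 uses ComparisonOperator.EQUAL with "0" and rule2 uses ComparisonOperator.BETWEEN
+ with "-10" and "10". The JDK-only sketch below models which of the two conditions a numeric value in
+ A3:A5 satisfies, assuming (as with Excel's "between" condition) that both bounds are included. It does
+ not model how Excel combines the formatting of several matching rules.
+ </p>
+ <source><![CDATA[
```java
import java.util.ArrayList;
import java.util.List;

/** JDK-only model of the two comparison rules in the conditional formatting example. */
final class CfRuleModel {

    /** Returns the names of the rules whose condition the value satisfies. */
    static List<String> matchingRules(double value) {
        List<String> matches = new ArrayList<>();
        if (value == 0) {
            matches.add("rule1"); // ComparisonOperator.EQUAL, "0"
        }
        if (value >= -10 && value <= 10) {
            matches.add("rule2"); // ComparisonOperator.BETWEEN, "-10", "10" (bounds assumed inclusive)
        }
        return matches;
    }

    public static void main(String[] args) {
        System.out.println(matchingRules(0));  // prints [rule1, rule2]
        System.out.println(matchingRules(5));  // prints [rule2]
        System.out.println(matchingRules(42)); // prints []
    }
}
```
+ ]]></source>
+ <p>
+ Note that rule2 in the example defines no font, border or pattern formatting, so a value that only
+ satisfies rule2 keeps its normal appearance.
+ </p>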
+
+ </section>
+ <anchor id="Hiding"/>
+ <section><title>Hiding and Un-Hiding Rows</title>
+ <p>
+ Using Excel, it is possible to hide a row on a worksheet by selecting that row (or rows),
+ right-clicking the selection and choosing 'Hide' from the pop-up menu that appears.
+ </p>
+ <p>
+ To emulate this using POI, call the setZeroHeight() method with the value true on an instance of either
+ XSSFRow or HSSFRow (the method is defined on the ss.usermodel.Row interface that both classes implement), like this:
+ </p>
+ <source>
+ Workbook workbook = new XSSFWorkbook(); // OR new HSSFWorkbook()
+ Sheet sheet = workbook.createSheet();
+ Row row = sheet.createRow(0);
+ row.setZeroHeight(true);
+ </source>
+ <p>
+ If the workbook were saved to disk now, the first row on the first sheet would not be visible.
+ </p>
+ <p>
+ Using Excel, it is possible to unhide previously hidden rows by selecting the row above and the row below
+ the one that is hidden and then pressing Ctrl+Shift+9 (holding down Ctrl and Shift while pressing 9)
+ before releasing all three keys.
+ </p>
+ <p>
+ To emulate this behaviour using POI, do something like this:
+ </p>
+ <source>
+ Workbook workbook = WorkbookFactory.create(new File(.......));
+ Sheet sheet = workbook.getSheetAt(0);
+ Iterator&lt;Row&gt; rowIter = sheet.iterator();
+ while(rowIter.hasNext()) {
+ Row row = rowIter.next();
+ if(row.getZeroHeight()) {
+ row.setZeroHeight(false);
+ }
+ }
+ </source>
+ <p>
+ If the workbook were saved to disk now, any previously hidden rows on its first sheet would be visible.
+ </p>
+ <p>
+ The example illustrates two features. Firstly, that it is possible to unhide a row simply by calling the setZeroHeight()
+ method and passing the boolean value 'false'. Secondly, it illustrates how to test whether a row is hidden or not.
+ Simply call the getZeroHeight() method and it will return 'true' if the row is hidden, 'false' otherwise.
+ </p>
+ </section>
+ <anchor id="CellProperties"/>
+ <section><title>Setting Cell Properties</title>
+ <p>
+ Sometimes it is easier or more efficient to create a spreadsheet with basic styles and then apply special styles to certain cells
+ such as drawing borders around a range of cells or setting fills for a region. CellUtil.setCellStyleProperties lets you do that without creating
+ a bunch of unnecessary intermediate styles in your spreadsheet.
+ </p>
+ <p>
+ Properties are created as a Map and applied to a cell in the following manner.
+ </p>
+ <source>
+ Workbook workbook = new XSSFWorkbook(); // OR new HSSFWorkbook()
+ Sheet sheet = workbook.createSheet("Sheet1");
+ Map&lt;String, Object&gt; properties = new HashMap&lt;String, Object&gt;();
+
+ // border around a cell
+ properties.put(CellUtil.BORDER_TOP, BorderStyle.MEDIUM);
+ properties.put(CellUtil.BORDER_BOTTOM, BorderStyle.MEDIUM);
+ properties.put(CellUtil.BORDER_LEFT, BorderStyle.MEDIUM);
+ properties.put(CellUtil.BORDER_RIGHT, BorderStyle.MEDIUM);
+
+ // Give it a color (RED)
+ properties.put(CellUtil.TOP_BORDER_COLOR, IndexedColors.RED.getIndex());
+ properties.put(CellUtil.BOTTOM_BORDER_COLOR, IndexedColors.RED.getIndex());
+ properties.put(CellUtil.LEFT_BORDER_COLOR, IndexedColors.RED.getIndex());
+ properties.put(CellUtil.RIGHT_BORDER_COLOR, IndexedColors.RED.getIndex());
+
+ // Apply the borders to the cell at B2
+ Row row = sheet.createRow(1);
+ Cell cell = row.createCell(1);
+ CellUtil.setCellStyleProperties(cell, properties);
+
+ // Apply the borders to a 3x3 region starting at D4
+ for (int ix=3; ix &lt;= 5; ix++) {
+ row = sheet.createRow(ix);
+ for (int iy = 3; iy &lt;= 5; iy++) {
+ cell = row.createCell(iy);
+ CellUtil.setCellStyleProperties(cell, properties);
+ }
+ }
+ </source>
+
+ <p>
+ NOTE: This does not replace the properties of the cell, it merges the properties you have put into the Map with the
+ cell's existing style properties. If a property already exists, it is replaced with the new property. If a property does not
+ exist, it is added. This method will not remove CellStyle properties.
+ </p>
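+ <p>
+ The merge rule in the note above behaves like Map.putAll applied to a copy of the cell's current
+ properties. The JDK-only sketch below models just that rule; the property names are plain strings
+ standing in for the CellUtil constants, and the method is illustrative rather than part of POI.
+ </p>
+ <source><![CDATA[
```java
import java.util.HashMap;
import java.util.Map;

/** JDK-only model of the merge behaviour described for CellUtil.setCellStyleProperties. */
final class StylePropertyMerge {

    static Map<String, Object> merge(Map<String, Object> current, Map<String, Object> updates) {
        Map<String, Object> result = new HashMap<>(current); // start from the cell's existing style properties
        result.putAll(updates);                             // existing keys are replaced, new keys are added
        return result;                                      // keys present only in 'current' are kept
    }

    public static void main(String[] args) {
        Map<String, Object> current = Map.of("fillPattern", "SOLID_FOREGROUND", "borderTop", "THIN");
        Map<String, Object> updates = Map.of("borderTop", "MEDIUM", "borderLeft", "MEDIUM");
        // contains borderTop=MEDIUM, borderLeft=MEDIUM and the original fillPattern
        System.out.println(merge(current, updates));
    }
}
```
+ ]]></source>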
+ </section>
+ <anchor id="DrawingBorders"/>
+ <section>
+ <title>Drawing Borders</title>
+ <p>
+ In Excel, you can apply a set of borders to an entire region of a worksheet at the press of a button. The PropertyTemplate
+ object simulates this with methods and constants defined to allow drawing top, bottom, left, right, horizontal,
+ vertical, inside, outside, or all borders around a range of cells. Additional methods allow for applying colors
+ to the borders.
+ </p>
+ <p>
+ It works like this: you create a PropertyTemplate object which is a container for the borders you wish to apply to a
+ sheet. Then you add borders and colors to the PropertyTemplate, and finally apply it to whichever sheets you need
+ that set of borders on. You can create multiple PropertyTemplate objects and apply them to a single sheet, or you can
+ apply the same PropertyTemplate object to multiple sheets. It is just like a preprinted form.
+ </p>
+ <p>
+ Enums:
+ </p>
+ <dl>
+ <dt>BorderStyle</dt>
+ <dd>
+ Defines the look of the border: thick or thin, solid or dashed, single or double.
+ This enum replaces the CellStyle.BORDER_XXXXX constants which have been deprecated. The PropertyTemplate will not
+ support the older style BORDER_XXXXX constants. A special value of BorderStyle.NONE will remove the border from
+ a Cell once it is applied.
+ </dd>
+ <dt>BorderExtent</dt>
+ <dd>
+ Describes the portion of the region that the BorderStyle will apply to. For example, TOP, BOTTOM, INSIDE, or OUTSIDE.
+ A special value of BorderExtent.NONE will remove the border from the PropertyTemplate. When the template is applied,
+ no change will be made to a cell border where no border properties exist in the PropertyTemplate.
+ </dd>
+ </dl>
+ <source>
+ // draw borders (three 3x3 grids)
+ PropertyTemplate pt = new PropertyTemplate();
+ // #1) these borders will all be medium in default color
+ pt.drawBorders(new CellRangeAddress(1, 3, 1, 3),
+ BorderStyle.MEDIUM, BorderExtent.ALL);
+ // #2) these cells will have medium outside borders and thin inside borders
+ pt.drawBorders(new CellRangeAddress(5, 7, 1, 3),
+ BorderStyle.MEDIUM, BorderExtent.OUTSIDE);
+ pt.drawBorders(new CellRangeAddress(5, 7, 1, 3), BorderStyle.THIN,
+ BorderExtent.INSIDE);
+ // #3) these cells will all be medium weight with different colors for the
+ // outside, inside horizontal, and inside vertical borders. The center
+ // cell will have no borders.
+ pt.drawBorders(new CellRangeAddress(9, 11, 1, 3),
+ BorderStyle.MEDIUM, IndexedColors.RED.getIndex(),
+ BorderExtent.OUTSIDE);
+ pt.drawBorders(new CellRangeAddress(9, 11, 1, 3),
+ BorderStyle.MEDIUM, IndexedColors.BLUE.getIndex(),
+ BorderExtent.INSIDE_VERTICAL);
+ pt.drawBorders(new CellRangeAddress(9, 11, 1, 3),
+ BorderStyle.MEDIUM, IndexedColors.GREEN.getIndex(),
+ BorderExtent.INSIDE_HORIZONTAL);
+ pt.drawBorders(new CellRangeAddress(10, 10, 2, 2),
+ BorderStyle.NONE,
+ BorderExtent.ALL);
+
+ // apply borders to sheet
+ Workbook wb = new XSSFWorkbook();
+ Sheet sh = wb.createSheet("Sheet1");
+ pt.applyBorders(sh);
+ </source>
+ <p>
+ NOTE: The last pt.drawBorders() call removes the borders from the center cell of the third grid by using
+ BorderStyle.NONE. Like setCellStyleProperties, the applyBorders method merges the properties of a cell style,
+ so existing borders are changed only if they are replaced by something else, or removed only if they are
+ replaced by BorderStyle.NONE. To remove a color from a border, use IndexedColors.AUTOMATIC.getIndex().
+ </p>
+ <p>Additionally, to remove a border or color from the PropertyTemplate object, use BorderExtent.NONE.</p>
+ <p>
+ This does not work with diagonal borders yet.
+ </p>
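+ <p>
+ To make the BorderExtent values more concrete, the JDK-only sketch below computes which edges of a single
+ cell an extent draws for a rectangular range. It is a simplified model, not PropertyTemplate's code: the
+ enums are local stand-ins covering four of the BorderExtent values, and an edge shared by two cells is
+ reported from each cell's own point of view.
+ </p>
+ <source><![CDATA[
```java
import java.util.EnumSet;
import java.util.Set;

/** JDK-only model of which edges of one cell a border extent draws; not PropertyTemplate's code. */
final class BorderExtentModel {

    enum Edge { TOP, BOTTOM, LEFT, RIGHT }

    /** Local stand-in for four of the BorderExtent values. */
    enum Extent { ALL, OUTSIDE, INSIDE_HORIZONTAL, INSIDE_VERTICAL }

    /**
     * Edges of cell (row, col) that the extent draws, for the range rows r1..r2, columns c1..c2
     * (zero-based and inclusive, as in CellRangeAddress).
     */
    static Set<Edge> edges(Extent extent, int r1, int r2, int c1, int c2, int row, int col) {
        Set<Edge> result = EnumSet.noneOf(Edge.class);
        switch (extent) {
            case ALL:
                result.addAll(EnumSet.allOf(Edge.class));
                break;
            case OUTSIDE: // only edges on the perimeter of the range
                if (row == r1) result.add(Edge.TOP);
                if (row == r2) result.add(Edge.BOTTOM);
                if (col == c1) result.add(Edge.LEFT);
                if (col == c2) result.add(Edge.RIGHT);
                break;
            case INSIDE_HORIZONTAL: // horizontal lines between rows of the range
                if (row != r1) result.add(Edge.TOP);
                if (row != r2) result.add(Edge.BOTTOM);
                break;
            case INSIDE_VERTICAL: // vertical lines between columns of the range
                if (col != c1) result.add(Edge.LEFT);
                if (col != c2) result.add(Edge.RIGHT);
                break;
        }
        return result;
    }

    public static void main(String[] args) {
        // Center cell of grid #3 (rows 9-11, columns 1-3), before BorderStyle.NONE is applied to it.
        System.out.println(edges(Extent.OUTSIDE, 9, 11, 1, 3, 10, 2));           // prints []
        System.out.println(edges(Extent.INSIDE_VERTICAL, 9, 11, 1, 3, 10, 2));   // prints [LEFT, RIGHT]
        System.out.println(edges(Extent.INSIDE_HORIZONTAL, 9, 11, 1, 3, 10, 2)); // prints [TOP, BOTTOM]
    }
}
```
+ ]]></source>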
+ </section>
+ <anchor id="PivotTable"/>
+ <section><title>Creating a Pivot Table</title>
+ <p>
+ Pivot Tables are a powerful feature of spreadsheet files. You can create a pivot table with the following piece of code.
+ </p>
+ <source>
+ XSSFWorkbook wb = new XSSFWorkbook();
+ XSSFSheet sheet = wb.createSheet();
+
+ //Create some data to build the pivot table on (setCellData is a helper from the example that fills A1:D4; it is not a POI method)
+ setCellData(sheet);
+
+ XSSFPivotTable pivotTable = sheet.createPivotTable(new AreaReference("A1:D4"), new CellReference("H5"));
+ //Configure the pivot table
+ //Use first column as row label
+ pivotTable.addRowLabel(0);
+ //Sum up the second column
+ pivotTable.addColumnLabel(DataConsolidateFunction.SUM, 1);
+ //Average the third column
+ pivotTable.addColumnLabel(DataConsolidateFunction.AVERAGE, 2);
+ //Add filter on fourth column
+ pivotTable.addReportFilter(3);
+ </source>
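+ <p>
+ To illustrate what the configured row label, SUM and AVERAGE columns compute, the JDK-only sketch below
+ produces the same summary over a few in-memory rows. The sample data is hypothetical, since setCellData
+ is not shown; the header row of A1:D4 is assumed to be excluded, and the report filter on the fourth
+ column is not modelled.
+ </p>
+ <source><![CDATA[
```java
import java.util.LinkedHashMap;
import java.util.Map;

/** JDK-only model of the pivot configuration: row label = column 0, SUM of column 1, AVERAGE of column 2. */
final class PivotModel {

    /** Maps each distinct column-0 value to {sum of column 1, average of column 2}. */
    static Map<String, double[]> summarize(Object[][] rows) {
        Map<String, double[]> totals = new LinkedHashMap<>(); // {sum1, sum2, count} per label
        for (Object[] row : rows) {
            double[] t = totals.computeIfAbsent((String) row[0], k -> new double[3]);
            t[0] += ((Number) row[1]).doubleValue();
            t[1] += ((Number) row[2]).doubleValue();
            t[2] += 1;
        }
        Map<String, double[]> result = new LinkedHashMap<>();
        totals.forEach((label, t) -> result.put(label, new double[] {t[0], t[1] / t[2]}));
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical data rows (A2:D4); column 3 would only feed the report filter.
        Object[][] rows = {
            {"North", 10, 2.0, "X"},
            {"South", 20, 4.0, "Y"},
            {"North", 30, 6.0, "Z"},
        };
        // prints North: sum=40.0, average=4.0 and then South: sum=20.0, average=4.0
        summarize(rows).forEach((label, v) ->
                System.out.println(label + ": sum=" + v[0] + ", average=" + v[1]));
    }
}
```
+ ]]></source>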
+ </section>
+ <anchor id="RichText"/>
+ <section><title>Cells with multiple styles (Rich Text Strings)</title>
+ <p>
+ To apply a single set of text formatting (colour, style, font etc.)
+ to a cell, you should create a
+ <a href="../../apidocs/dev/org/apache/poi/ss/usermodel/CellStyle.html">CellStyle</a>
+ for the workbook, then apply it to the cells.
+ </p>
+ <source>
+ // HSSF Example (font1 and font2 are assumed to have been created with wb.createFont())
+ HSSFCell hssfCell = row.createCell(idx);
+ //rich text consists of two runs
+ HSSFRichTextString richString = new HSSFRichTextString( "Hello, World!" );
+ richString.applyFont( 0, 6, font1 );
+ richString.applyFont( 6, 13, font2 );
+ hssfCell.setCellValue( richString );
+
+
+ // XSSF Example
+ XSSFCell cell = row.createCell(1);
+ XSSFRichTextString rt = new XSSFRichTextString("The quick brown fox");
+
+ XSSFFont font1 = wb.createFont();
+ font1.setBold(true);
+ font1.setColor(new XSSFColor(new java.awt.Color(255, 0, 0)));
+ rt.applyFont(0, 10, font1);
+
+ XSSFFont font2 = wb.createFont();
+ font2.setItalic(true);
+ font2.setUnderline(XSSFFont.U_DOUBLE);
+ font2.setColor(new XSSFColor(new java.awt.Color(0, 255, 0)));
+ rt.applyFont(10, 19, font2);
+
+ XSSFFont font3 = wb.createFont();
+ font3.setColor(new XSSFColor(new java.awt.Color(0, 0, 255)));
+ rt.append(" Jumped over the lazy dog", font3);
+
+ cell.setCellValue(rt);
+ </source>
+ <p>
+ To apply different formatting to different parts of a cell, you
+ need to use
+ <a href="../../apidocs/dev/org/apache/poi/ss/usermodel/RichTextString.html">RichTextString</a>,
+ which permits styling of parts of the text within the cell.
+ </p>
+ <p>
+ There are some slight differences between HSSF and XSSF, especially
+ around font colours (the two formats store colours quite differently
+ internally); refer to the
+ <a href="../../apidocs/dev/org/apache/poi/hssf/usermodel/HSSFRichTextString.html">HSSF Rich Text String</a>
+ and
+ <a href="../../apidocs/dev/org/apache/poi/xssf/usermodel/XSSFRichTextString.html">XSSF Rich Text String</a>
+ javadocs for more details.
+ </p>
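+ <p>
+ In both examples the end index passed to applyFont is exclusive, just like String.substring, which is
+ why "Hello, World!" is split at 0, 6 and 13. The JDK-only sketch below shows which characters each run
+ covers; it is an illustration, not a POI method.
+ </p>
+ <source><![CDATA[
```java
import java.util.ArrayList;
import java.util.List;

/** JDK-only illustration of the character ranges covered by applyFont(start, end) calls. */
final class RichTextRuns {

    /** Splits text at consecutive boundaries; run i covers [boundaries[i], boundaries[i + 1]). */
    static List<String> runs(String text, int... boundaries) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i + 1 < boundaries.length; i++) {
            out.add(text.substring(boundaries[i], boundaries[i + 1]));
        }
        return out;
    }

    public static void main(String[] args) {
        // the two runs are "Hello," and " World!" (the space belongs to the second run)
        System.out.println(runs("Hello, World!", 0, 6, 13));
        // the two runs are "The quick " and "brown fox"
        System.out.println(runs("The quick brown fox", 0, 10, 19));
    }
}
```
+ ]]></source>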
+ </section>
+ </body>
+</document>
diff --git a/src/documentation/content/xdocs/components/spreadsheet/record-generator.xml b/src/documentation/content/xdocs/components/spreadsheet/record-generator.xml
new file mode 100644
index 0000000000..39f328bc78
--- /dev/null
+++ b/src/documentation/content/xdocs/components/spreadsheet/record-generator.xml
@@ -0,0 +1,212 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<!--
+ ====================================================================
+ Licensed to the Apache Software Foundation (ASF) under one or more
+ contributor license agreements. See the NOTICE file distributed with
+ this work for additional information regarding copyright ownership.
+ The ASF licenses this file to You under the Apache License, Version 2.0
+ (the "License"); you may not use this file except in compliance with
+ the License. You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+ Unless required by applicable law or agreed to in writing, software
+ distributed under the License is distributed on an "AS IS" BASIS,
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ See the License for the specific language governing permissions and
+ limitations under the License.
+ ====================================================================
+-->
+<!DOCTYPE document PUBLIC "-//APACHE//DTD Documentation V2.0//EN" "document-v20.dtd">
+
+<document>
+ <header>
+ <title>Record Generator HOWTO</title>
+ <authors>
+ <person email="user@poi.apache.org" name="Glen Stampoultzis" id="glens"/>
+ <person email="acoliver@apache.org" name="Andrew C. Oliver" id="acoliver"/>
+ </authors>
+ </header>
+ <body>
+ <section><title>How to Use the Record Generator</title>
+
+ <section><title>History</title>
+ <p>
+ The record generator was born from frustration with translating
+ the Excel records to Java classes. Doing this manually is a
+ time-consuming process. It's also very easy to make mistakes.
+ </p>
+ <p>
+ A utility was needed to take the definition of what a
+ record looked like and do all the boring and repetitive work.
+ </p>
+ </section>
+
+ <section><title>Capabilities</title>
+ <p>
+ The record generator takes XML as input and produces the following
+ output:
+ </p>
+ <ul>
+ <li>A Java file capable of decoding and encoding the record.</li>
+ <li>A test class that provides a fill-in-the-blanks implementation
+ of a test case for ensuring the record operates as
+ designed.</li>
+ </ul>
+ </section>
+ <section><title>Usage</title>
+ <p>
+ The record generator is invoked as an Ant target
+ (generate-records). It scans <code>src/records/definitions</code>
+ for all files whose names end with <code>_record.xml</code> and,
+ for each one, creates two files: the Java record definition and the
+ Java test case template.
+ </p>
+ <p>
+ The records themselves have the following general layout:
+ </p>
+ <source><![CDATA[
+<record id="0x1032" name="Frame" package="org.apache.poi.hssf.record"
+ excel-record-id="FRAME">
+ <description>The frame record indicates whether there is a border
+ around the displayed text of a chart.</description>
+ <author>Glen Stampoultzis (glens at apache.org)</author>
+ <fields>
+ <field type="int" size="2" name="border type">
+ <const name="regular" value="0" description="regular rectangle or no border"/>
+ <const name="shadow" value="1" description="rectangle with shadow"/>
+ </field>
+ <field type="int" size="2" name="options">
+ <bit number="0" name="auto size"
+ description="excel calculates the size automatically if true"/>
+ <bit number="1" name="auto position"
+ description="excel calculates the position automatically"/>
+ </field>
+ </fields>
+</record>
+ ]]></source>
+ <p>
+ The following table details the allowable types and sizes for
+ the fields.
+ </p>
+ <table>
+ <tr>
+ <th>Type</th>
+ <th>Size</th>
+ <th>Java Type</th>
+ </tr>
+ <tr>
+ <td>int</td>
+ <td>1</td>
+ <td>byte</td>
+ </tr>
+ <tr>
+ <td>int</td>
+ <td>2</td>
+ <td>short</td>
+ </tr>
+ <tr>
+ <td>int</td>
+ <td>4</td>
+ <td>int</td>
+ </tr>
+ <tr>
+ <td>int</td>
+ <td>8</td>
+ <td>long</td>
+ </tr>
+ <tr>
+ <td>int</td>
+ <td>varword</td>
+ <td>array of shorts</td>
+ </tr>
+ <tr>
+ <td>bits</td>
+ <td>1</td>
+ <td>A byte composed of bits (defined by the bit elements)
+ </td>
+ </tr>
+ <tr>
+ <td>bits</td>
+ <td>2</td>
+ <td>A short composed of bits</td>
+ </tr>
+ <tr>
+ <td>bits</td>
+ <td>4</td>
+ <td>An int composed of bits</td>
+ </tr>
+ <tr>
+ <td>float</td>
+ <td>8</td>
+ <td>double</td>
+ </tr>
+ <tr>
+ <td>hbstring</td>
+ <td>java expression</td>
+ <td>String</td>
+ </tr>
+ </table>
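+ <p>
+ To give a feel for what the generated code has to do, here is a hand-written, JDK-only sketch of a class
+ for the Frame record defined above, using the mapping from the table: both fields are type "int" with
+ size 2 and so become shorts, and the bit elements become accessors on the options field. It is not the
+ generator's actual output; class, field and method names are illustrative. It assumes the little-endian
+ byte order of BIFF records and handles only the 4-byte record body, not the record header.
+ </p>
+ <source><![CDATA[
```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/** Hand-written sketch of a Frame record class; not the record generator's real output. */
final class FrameRecordSketch {
    static final short SID = 0x1032;              // id="0x1032" in the definition

    // const values of the "border type" field
    static final short BORDER_TYPE_REGULAR = 0;
    static final short BORDER_TYPE_SHADOW = 1;

    // bit elements of the "options" field
    private static final int AUTO_SIZE = 0x1;     // bit 0
    private static final int AUTO_POSITION = 0x2; // bit 1

    short borderType;                             // type="int" size="2" maps to short
    short options;                                // type="int" size="2" maps to short

    boolean isAutoSize() { return (options & AUTO_SIZE) != 0; }
    void setAutoSize(boolean on) { options = (short) (on ? options | AUTO_SIZE : options & ~AUTO_SIZE); }

    boolean isAutoPosition() { return (options & AUTO_POSITION) != 0; }
    void setAutoPosition(boolean on) { options = (short) (on ? options | AUTO_POSITION : options & ~AUTO_POSITION); }

    /** Encodes the 4-byte record body (no sid/length header), little-endian. */
    byte[] serializeBody() {
        return ByteBuffer.allocate(4).order(ByteOrder.LITTLE_ENDIAN)
                .putShort(borderType)
                .putShort(options)
                .array();
    }

    /** Decodes a 4-byte record body produced by serializeBody(). */
    static FrameRecordSketch parseBody(byte[] body) {
        ByteBuffer in = ByteBuffer.wrap(body).order(ByteOrder.LITTLE_ENDIAN);
        FrameRecordSketch r = new FrameRecordSketch();
        r.borderType = in.getShort();
        r.options = in.getShort();
        return r;
    }

    public static void main(String[] args) {
        FrameRecordSketch frame = new FrameRecordSketch();
        frame.borderType = BORDER_TYPE_SHADOW;
        frame.setAutoPosition(true);
        byte[] body = frame.serializeBody();                  // bytes 1, 0, 2, 0
        System.out.println(parseBody(body).isAutoPosition()); // prints true
    }
}
```
+ ]]></source>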
+ <p>
+ The Java records are regenerated each time the record generator is
+ run; however, the test stubs are only created if the test stub does
+ not already exist. This means that you may edit the test stubs, but
+ not the generated records, because any changes to a generated record
+ are overwritten the next time the generator runs.
+ </p>
+ </section>
+ <section><title>Custom Field Types</title>
+ <p>
+ Occasionally the built-in types are not enough. More control
+ over the encoding and decoding of the streams is required. This
+ can be achieved using a custom type.
+ </p>
+ <p>
+ A custom type lets you escape to Java to define the way in which
+ the field encodes and decodes. To code a custom type you
+ declare your field like this:
+ </p>
+ <source><![CDATA[
+ <field type="custom:org.apache.poi.hssf.record.LinkedDataFormulaField"
+ size="var" name="formula of link" description="formula"/>
+ ]]></source>
+ <p>
+ Where the class name specified after <code>custom:</code> is a
+ class implementing the interface <code>CustomField</code>.
+ </p>
+ <p>
+ You then implement the encoding and decoding yourself in that class.
+ </p>
+ </section>
+ <section><title>How it Works</title>
+ <p>
+ The record generation works by taking an XML file and styling it
+ using XSLT. Given that XSLT is a little limited in some ways, it was
+ necessary to add a little Java code to the mix.
+ </p>
+ <p>
+ See record.xsl, record_test.xsl, FieldIterator.java,
+ RecordUtil.java and RecordGenerator.java for details.
+ </p>
+ <p>
+ There is a corresponding &quot;type&quot; generator for HWPF.
+ See the HWPF documentation for details.
+ </p>
+ </section>
+ <section><title>Limitations</title>
+ <p>
+ The record generator does not handle all possible record types and
+ is not intended to. When dealing with a non-standard record, coding
+ the record by hand is sometimes more cost-effective than attempting
+ to modify the generator. The main point of the record generator is
+ to save time, so keep that in mind.
+ </p>
+ <p>
+ Currently the XSL file that generates the record calls out to
+ Java objects. The Java code for the record generation is
+ currently quite messy with minimal comments.
+ </p>
+ </section>
+</section>
+</body>
+</document>
diff --git a/src/documentation/content/xdocs/components/spreadsheet/use-case.xml b/src/documentation/content/xdocs/components/spreadsheet/use-case.xml
new file mode 100644
index 0000000000..6c8ed246e7
--- /dev/null
+++ b/src/documentation/content/xdocs/components/spreadsheet/use-case.xml
@@ -0,0 +1,200 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<!--
+ ====================================================================
+ Licensed to the Apache Software Foundation (ASF) under one or more
+ contributor license agreements. See the NOTICE file distributed with
+ this work for additional information regarding copyright ownership.
+ The ASF licenses this file to You under the Apache License, Version 2.0
+ (the "License"); you may not use this file except in compliance with
+ the License. You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+ Unless required by applicable law or agreed to in writing, software
+ distributed under the License is distributed on an "AS IS" BASIS,
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ See the License for the specific language governing permissions and
+ limitations under the License.
+ ====================================================================
+-->
+<!DOCTYPE document PUBLIC "-//APACHE//DTD Documentation V2.0//EN" "document-v20.dtd">
+
+<document>
+ <header>
+ <title>HSSF Use Cases</title>
+ <authors>
+ <person email="marc.johnson@yahoo.com" name="Marc Johnson" id="MJ"/>
+ </authors>
+ </header>
+ <body>
+ <section><title>HSSF Use Cases</title>
+ <section><title>Use Case 1: Read existing HSSF</title>
+
+<p><strong>Primary Actor:</strong> HSSF client</p>
+<p><strong>Scope:</strong> HSSF</p>
+<p><strong>Level:</strong> Summary</p>
+<p><strong>Stakeholders and Interests:</strong></p>
+<ul>
+ <li>HSSF client - wants to read content
+ of HSSF file</li>
+ <li>HSSF - understands HSSF file</li>
+ <li>POIFS - understands underlying POI
+ file system</li>
+</ul>
+<p><strong>Precondition:</strong> None</p>
+<p><strong>Minimal Guarantee:</strong> None</p>
+<p><strong>Main Success Guarantee:</strong></p>
+<ol>
+ <li>HSSF client requests HSSF to read
+ an HSSF file, providing an InputStream
+ containing the HSSF file in question.</li>
+ <li>HSSF requests POIFS to read the HSSF
+ file, passing the InputStream
+ object to POIFS (POIFS use case 1, read existing file system)</li>
+ <li>HSSF reads the &quot;Workbook&quot;
+ file (use case 4, read workbook entry)</li>
+</ol>
+<p><strong>Extensions:</strong></p>
+<p>2a. Exceptions
+thrown by POIFS will be passed on to the HSSF client.</p>
+</section>
+ <section><title>Use Case 2: Write HSSF file</title>
+
+<p><strong>Primary Actor:</strong> HSSF client</p>
+<p><strong>Scope:</strong> HSSF</p>
+<p><strong>Level:</strong> Summary</p>
+<p><strong>Stakeholders and Interests:</strong></p>
+<ul>
+ <li>HSSF client - wants to write file
+ out.</li>
+ <li>HSSF - knows how to write file
+ out.</li>
+ <li>POIFS - knows how to write file
+ system out.</li>
+</ul>
+<p><strong>Precondition:</strong></p>
+<ul>
+ <li>File has been
+ read (use case 1, read existing HSSF file) and subsequently modified
+ or file has been created (use case 3, create HSSF file)</li>
+</ul>
+<p><strong>Minimal Guarantee:</strong> None</p>
+<p><strong>Main Success Guarantee:</strong></p>
+<ol>
+ <li>HSSF client
+ provides an OutputStream to
+ write the file to.</li>
+ <li>HSSF writes
+ the &quot;Workbook&quot; to its associated POIFS file system (use case
+ 5, write workbook entry)</li>
+ <li>HSSF
+ requests POIFS to write its file system out, using the OutputStream
+ obtained from the HSSF client (POIFS use case 2, write file system).</li>
+</ol>
+<p><strong>Extensions:</strong></p>
+<p>3a. Exceptions
+from POIFS are passed to the HSSF client.</p>
+
+</section>
+ <section><title>Use Case 3: Create HSSF file</title>
+
+<p><strong>Primary Actor:</strong> HSSF client</p>
+<p><strong>Scope:</strong> HSSF</p>
+<p>
+<strong>Level:</strong> Summary</p>
+<p><strong>Stakeholders and Interests:</strong></p>
+<ul>
+ <li>HSSF client - wants to create a new
+ file.</li>
+ <li>HSSF - knows how to create a new
+ file.</li>
+ <li>POIFS - knows how to create a new
+ file system.</li>
+</ul>
+<p><strong>Precondition:</strong></p>
+<p><strong>Minimal Guarantee:</strong> None</p>
+<p><strong>Main Success Guarantee:</strong></p>
+<ol>
+ <li>HSSF requests
+ POIFS to create a new file system (POIFS use case 3, create new file
+ system)</li>
+</ol>
+<p><strong>Extensions:</strong>
+None</p>
+
+</section>
+ <section><title>Use Case 4: Read workbook entry</title>
+<p><strong>Primary Actor:</strong> HSSF</p>
+<p><strong>Scope:</strong> HSSF</p>
+<p>
+<strong>Level:</strong> Summary</p>
+<p><strong>Stakeholders and Interests:</strong></p>
+<ul>
+ <li>HSSF - knows how to read the
+ workbook entry</li>
+ <li>POIFS - knows how to manage the file
+ system.</li>
+</ul>
+<p><strong>Precondition:</strong></p>
+<ul>
+ <li>The file
+ system has been read (use case 1, read existing HSSF file) or has
+ been created and written to (use case 3, create HSSF file system;
+ use case 5, write workbook entry).</li>
+</ul>
+<p><strong>Minimal
+Guarantee:</strong> None</p>
+<p><strong>Main Success Guarantee:</strong></p>
+<ol>
+ <li>
+ HSSF requests POIFS for the &quot;Workbook&quot; file</li>
+ <li>POIFS returns
+ an InputStream for the file.</li>
+ <li>HSSF reads
+ from the InputStream provided by POIFS</li>
+ <li>HSSF closes
+ the InputStream provided by POIFS</li>
+</ol>
+<p><strong>Extensions:</strong></p>
+<p>3a. Exceptions
+thrown by POIFS will be passed on.</p>
+</section>
+ <section><title>Use Case 5: Write workbook entry</title>
+
+
+<p><strong>Primary Actor:</strong> HSSF</p>
+<p><strong>Scope:</strong> HSSF</p>
+<p>
+<strong>Level:</strong> Summary</p>
+<p><strong>Stakeholders and Interests:</strong></p>
+<ul>
+ <li>HSSF - knows how to write the
+ workbook entry.</li>
+ <li>POIFS - knows how to manage the file
+ system.</li>
+</ul>
+<p><strong>Precondition:</strong>
+</p>
+<ul>
+ <li>Either an existing HSSF file has
+ been read (use case 1, read existing HSSF file) or an HSSF file has
+ been created (use case 3, create HSSF file).</li>
+</ul>
+<p><strong>Minimal Guarantee:</strong> None</p>
+<p><strong>Main Success Guarantee:</strong></p>
+<ol>
+ <li>HSSF
+ checks the POIFS file system directory for the &quot;Workbook&quot;
+ file (POIFS use case 8, read file system directory)</li>
+ <li>If &quot;Workbook&quot; is in the directory, HSSF requests POIFS to
+ replace it with the new workbook entry (POIFS use case 4, replace file
+ in file system). Otherwise, HSSF requests POIFS to write the new
+ workbook file, with the name &quot;Workbook&quot; (POIFS use case 6,
+ write new file to file system)</li>
+</ol>
+<p><strong>Extensions:</strong> None</p>
+</section>
+
+</section>
+</body>
+</document>
diff --git a/src/documentation/content/xdocs/components/spreadsheet/user-defined-functions.xml b/src/documentation/content/xdocs/components/spreadsheet/user-defined-functions.xml
new file mode 100644
index 0000000000..1d5819b88b
--- /dev/null
+++ b/src/documentation/content/xdocs/components/spreadsheet/user-defined-functions.xml
@@ -0,0 +1,414 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<!--
+ ====================================================================
+ Licensed to the Apache Software Foundation (ASF) under one or more
+ contributor license agreements. See the NOTICE file distributed with
+ this work for additional information regarding copyright ownership.
+ The ASF licenses this file to You under the Apache License, Version 2.0
+ (the "License"); you may not use this file except in compliance with
+ the License. You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+ Unless required by applicable law or agreed to in writing, software
+ distributed under the License is distributed on an "AS IS" BASIS,
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ See the License for the specific language governing permissions and
+ limitations under the License.
+ ====================================================================
+-->
+<!DOCTYPE document PUBLIC "-//APACHE//DTD Documentation V2.0//EN" "document-v20.dtd">
+
+<document>
+ <header>
+ <title>User Defined Functions</title>
+ <authors>
+ <person email="jon@loquatic.com" name="Jon Svede" id="JDS"/>
+ <person email="brian.bush@nrel.gov" name="Brian Bush" id="BWB"/>
+ </authors>
+ </header>
+ <body>
+ <section><title>How to Create and Use User Defined Functions</title>
+
+ <section><title>Description</title>
+ <p>This document describes the User Defined Functions within POI.
+ User defined functions allow you to take code that is written in VBA,
+ rewrite it in Java and use it within POI. Consider the following example.</p>
+ </section>
+ <section><title>An Example</title>
+ <p>Suppose you are given a spreadsheet that can calculate the principal and interest
+ payments for a mortgage. The user enters the principal loan amount, the interest rate
+ and the term of the loan. The Excel spreadsheet does the rest.</p>
+ <p>
+ <img src="images/simple-xls-with-function.jpg" alt="mortgage calculation spreadsheet"/>
+ </p>
+ <p>When you actually look at the workbook, you discover that rather than
+ having the formula in a cell, it has been written as a VBA function. You review the
+ function and determine that it could be written in Java:</p>
+ <p>
+ <img src="images/calculatePayment.jpg" alt="VBA code"/>
+ </p>
+ <p>If we write a small program to try to evaluate this cell, we'll fail. Consider this source code:</p>
+ <source><![CDATA[
+import java.io.File ;
+import java.io.FileInputStream ;
+import java.io.IOException ;
+
+import org.apache.poi.ss.usermodel.Cell ;
+import org.apache.poi.ss.usermodel.CellValue ;
+import org.apache.poi.ss.usermodel.FormulaEvaluator ;
+import org.apache.poi.ss.usermodel.Row ;
+import org.apache.poi.ss.usermodel.Sheet ;
+import org.apache.poi.ss.usermodel.Workbook ;
+import org.apache.poi.ss.usermodel.WorkbookFactory ;
+import org.apache.poi.ss.util.CellReference ;
+
+public class Evaluator {
+
+    public static void main( String[] args ) {
+
+        System.out.println( "fileName: " + args[0] ) ;
+        System.out.println( "cell: " + args[1] ) ;
+
+        File workbookFile = new File( args[0] ) ;
+
+        // try-with-resources closes the workbook and the input stream when we are done
+        try ( FileInputStream fis = new FileInputStream( workbookFile ) ;
+              Workbook workbook = WorkbookFactory.create( fis ) ) {
+
+            FormulaEvaluator evaluator = workbook.getCreationHelper().createFormulaEvaluator();
+
+            // The cell reference must include the sheet name, e.g. Sheet1!B4
+            CellReference cr = new CellReference( args[1] ) ;
+            String sheetName = cr.getSheetName() ;
+            Sheet sheet = workbook.getSheet( sheetName ) ;
+            int rowIdx = cr.getRow() ;
+            int colIdx = cr.getCol() ;
+            Row row = sheet.getRow( rowIdx ) ;
+            Cell cell = row.getCell( colIdx ) ;
+
+            // Evaluates the formula in the cell without modifying the cell
+            CellValue value = evaluator.evaluate( cell ) ;
+
+            System.out.println( "returns value: " + value ) ;
+
+        } catch( IOException e ) {
+            // Also covers FileNotFoundException
+            e.printStackTrace();
+        }
+    }
+}
+
+]]></source>
+ <p>If you run this code, you're likely to get the following error:</p>
+
+ <source><![CDATA[
+Exception in thread "main" org.apache.poi.ss.formula.eval.NotImplementedException: Error evaluating cell Sheet1!B4
+ at org.apache.poi.ss.formula.WorkbookEvaluator.addExceptionInfo(WorkbookEvaluator.java:321)
+ at org.apache.poi.ss.formula.WorkbookEvaluator.evaluateAny(WorkbookEvaluator.java:288)
+ at org.apache.poi.ss.formula.WorkbookEvaluator.evaluate(WorkbookEvaluator.java:221)
+ at org.apache.poi.hssf.usermodel.HSSFFormulaEvaluator.evaluateFormulaCellValue(HSSFFormulaEvaluator.java:320)
+ at org.apache.poi.hssf.usermodel.HSSFFormulaEvaluator.evaluate(HSSFFormulaEvaluator.java:182)
+ at poi.tests.Evaluator.main(Evaluator.java:61)
+Caused by: org.apache.poi.ss.formula.eval.NotImplementedException: calculatePayment
+ at org.apache.poi.ss.formula.UserDefinedFunction.evaluate(UserDefinedFunction.java:59)
+ at org.apache.poi.ss.formula.OperationEvaluatorFactory.evaluate(OperationEvaluatorFactory.java:129)
+ at org.apache.poi.ss.formula.WorkbookEvaluator.evaluateFormula(WorkbookEvaluator.java:456)
+ at org.apache.poi.ss.formula.WorkbookEvaluator.evaluateAny(WorkbookEvaluator.java:279)
+ ... 4 more
+
+]]></source>
+
+ <p>How would we make it so POI can use this sheet?</p>
+ </section>
+
+ <section><title>Defining Your Function</title>
+ <p>To 'convert' this code to Java and make it available to POI, you need to implement
+ the FreeRefFunction interface from the org.apache.poi.ss.formula.functions package.
+ This interface defines one method, evaluate(ValueEval[] args, OperationEvaluationContext ec),
+ through which POI passes you the argument values.</p>
+ <p>The evaluate() method is where you convert the ValueEval instances to the
+ proper number types. The following code snippet shows how to get your values:</p>
+
+ <source><![CDATA[
+public class CalculateMortgage implements FreeRefFunction {
+
+@Override
+public ValueEval evaluate( ValueEval[] args, OperationEvaluationContext ec ) {
+ if (args.length != 3) {
+ return ErrorEval.VALUE_INVALID;
+ }
+
+ double principal, rate, years, result;
+ try {
+ ValueEval v1 = OperandResolver.getSingleValue( args[0],
+ ec.getRowIndex(),
+ ec.getColumnIndex() ) ;
+ ValueEval v2 = OperandResolver.getSingleValue( args[1],
+ ec.getRowIndex(),
+ ec.getColumnIndex() ) ;
+ ValueEval v3 = OperandResolver.getSingleValue( args[2],
+ ec.getRowIndex(),
+ ec.getColumnIndex() ) ;
+
+ principal = OperandResolver.coerceValueToDouble( v1 ) ;
+ rate = OperandResolver.coerceValueToDouble( v2 ) ;
+ years = OperandResolver.coerceValueToDouble( v3 ) ;
+ ]]></source>
+
+ <p>The first thing we do is check the number of arguments being passed. If it is wrong, we return a #VALUE! error
+ (ErrorEval.VALUE_INVALID), because there is no point in going further when critical information is missing.</p>
+ <p>Next we declare our variables. In our case we need variables for:</p>
+ <ul>
+ <li>principal - the amount of the loan</li>
+ <li>rate - the interest rate as a decimal</li>
+ <li>years - the length of the loan in years</li>
+ <li>result - the result of the calculation</li>
+ </ul>
+ <p>Next, we use the OperandResolver to convert the ValueEval instances to doubles, in two steps.
+ First, OperandResolver.getSingleValue() reduces each argument passed in by the cell to a single,
+ discrete value. When an argument is a single-row or single-column range, the row and column of the
+ cell being evaluated determine which value is used. Then OperandResolver.coerceValueToDouble() converts
+ each of those values to a double. The class also has coercion methods for getting Strings, ints and
+ booleans. Now that we have our primitive values, we can move on to calculating the payment.</p>
+ <p>As shown previously, we have the VBA source. We now need to add code to our class to calculate
+ the payment. You could simply add it to the method we've already created, but I've
+ chosen to put it in its own method. Add the following method:</p>
+ <source><![CDATA[
+public double calculateMortgagePayment( double p, double r, double y ) {
+
+ double i = r / 12 ;
+ double n = y * 12 ;
+
+ double principalAndInterest =
+ p * (( i * Math.pow((1 + i),n ) ) / ( Math.pow((1 + i),n) - 1)) ;
+
+ return principalAndInterest ;
+}
+ ]]></source>
+ <p>The biggest change necessary is related to the exponents. Java has no exponentiation operator
+ like VBA's ^, so we had to use calls to Math.pow() instead. Now we need to call this method from evaluate():</p>
+ <source><![CDATA[
+ result = calculateMortgagePayment( principal, rate, years ) ;
+ ]]></source>
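+ <p>To sanity-check the translated formula outside of POI, you can run the arithmetic on its own. The following
+ is a minimal standalone sketch and is not part of the POI example. The class name PaymentCheck and the sample
+ inputs (100,000 borrowed at 6% a year for 30 years) are hypothetical. Each VBA <code>(1 + i) ^ n</code>
+ becomes <code>Math.pow(1 + i, n)</code>.</p>
+ <source><![CDATA[
+/**
+ * Hypothetical standalone check of the formula used by CalculateMortgage:
+ * M = P * [ i(1 + i)^n ] / [ (1 + i)^n - 1 ]
+ * where i is the monthly rate and n is the number of monthly payments.
+ */
+public class PaymentCheck {
+
+    public static double payment( double p, double r, double y ) {
+        double i = r / 12 ;                      // annual rate (as a decimal) -> monthly rate
+        double n = y * 12 ;                      // years -> number of monthly payments
+        double growth = Math.pow( 1 + i, n ) ;   // VBA: (1 + i) ^ n
+        return p * ( i * growth ) / ( growth - 1 ) ;
+    }
+
+    public static void main( String[] args ) {
+        // Hypothetical inputs: 100,000 at 6% a year for 30 years, rounded to cents
+        double pay = payment( 100000, 0.06, 30 ) ;
+        System.out.println( Math.round( pay * 100 ) / 100.0 ) ;  // prints 599.55
+    }
+}
+ ]]></source>
+ <p>Computing <code>Math.pow(1 + i, n)</code> once and reusing it is only a readability choice. The result is the same
+ as the two separate calls in calculateMortgagePayment().</p>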
+ <p>Having done that, the last things we need to do are to check that we didn't get a bad result and,
+ if we didn't, return the value. Add the following method to the class:</p>
+ <source><![CDATA[
+static final void checkValue(double result) throws EvaluationException {
+ if (Double.isNaN(result) || Double.isInfinite(result)) {
+ throw new EvaluationException(ErrorEval.NUM_ERROR);
+ }
+}
+ ]]></source>
+ <p>Then add a line of code to our evaluate method to call this new static method, complete our try/catch and return the value:</p>
+ <source><![CDATA[
+ checkValue(result);
+
+ } catch (EvaluationException e) {
+ e.printStackTrace() ;
+ return e.getErrorEval();
+ }
+
+ return new NumberEval( result ) ;
+ ]]></source>
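+ <p>Why is checkValue() needed? With Java doubles, some inputs make the formula produce NaN or Infinity
+ instead of throwing an exception. The standalone sketch below shows two such cases. The class name EdgeCases and
+ the inputs are hypothetical. A 0% rate gives 0 / 0 = NaN, and a term of 0 years divides a non-zero value by 0,
+ which gives Infinity. In the UDF, checkValue() turns both into a #NUM! error.</p>
+ <source><![CDATA[
+public class EdgeCases {
+
+    // Same arithmetic as CalculateMortgage.calculateMortgagePayment()
+    public static double payment( double p, double r, double y ) {
+        double i = r / 12 ;
+        double n = y * 12 ;
+        return p * ( ( i * Math.pow( 1 + i, n ) ) / ( Math.pow( 1 + i, n ) - 1 ) ) ;
+    }
+
+    // Same test as checkValue(): true when the UDF should report #NUM!
+    public static boolean isNumError( double result ) {
+        return Double.isNaN( result ) || Double.isInfinite( result ) ;
+    }
+
+    public static void main( String[] args ) {
+        System.out.println( payment( 100000, 0.0, 30 ) ) ;                // prints NaN (0 / 0)
+        System.out.println( payment( 100000, 0.06, 0 ) ) ;                // prints Infinity (0.005 / 0)
+        System.out.println( isNumError( payment( 100000, 0.06, 30 ) ) ) ; // prints false
+    }
+}
+ ]]></source>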
+
+ <p>So the whole class would be as follows:</p>
+
+ <source><![CDATA[
+import org.apache.poi.ss.formula.OperationEvaluationContext ;
+import org.apache.poi.ss.formula.eval.ErrorEval ;
+import org.apache.poi.ss.formula.eval.EvaluationException ;
+import org.apache.poi.ss.formula.eval.NumberEval ;
+import org.apache.poi.ss.formula.eval.OperandResolver ;
+import org.apache.poi.ss.formula.eval.ValueEval ;
+import org.apache.poi.ss.formula.functions.FreeRefFunction ;
+
+/**
+ * A simple function to calculate principal and interest.
+ *
+ * @author Jon Svede
+ *
+ */
+public class CalculateMortgage implements FreeRefFunction {
+
+ @Override
+ public ValueEval evaluate( ValueEval[] args, OperationEvaluationContext ec ) {
+ if (args.length != 3) {
+ return ErrorEval.VALUE_INVALID;
+ }
+
+ double principal, rate, years, result;
+ try {
+ ValueEval v1 = OperandResolver.getSingleValue( args[0],
+ ec.getRowIndex(),
+ ec.getColumnIndex() ) ;
+ ValueEval v2 = OperandResolver.getSingleValue( args[1],
+ ec.getRowIndex(),
+ ec.getColumnIndex() ) ;
+ ValueEval v3 = OperandResolver.getSingleValue( args[2],
+ ec.getRowIndex(),
+ ec.getColumnIndex() ) ;
+
+ principal = OperandResolver.coerceValueToDouble( v1 ) ;
+ rate = OperandResolver.coerceValueToDouble( v2 ) ;
+ years = OperandResolver.coerceValueToDouble( v3 ) ;
+
+ result = calculateMortgagePayment( principal, rate, years ) ;
+
+ checkValue(result);
+
+ } catch (EvaluationException e) {
+ e.printStackTrace() ;
+ return e.getErrorEval();
+ }
+
+ return new NumberEval( result ) ;
+ }
+
+ public double calculateMortgagePayment( double p, double r, double y ) {
+ double i = r / 12 ;
+ double n = y * 12 ;
+
+ // M = P [ i(1 + i)^n ] / [ (1 + i)^n - 1 ]
+ double principalAndInterest =
+ p * (( i * Math.pow((1 + i),n ) ) / ( Math.pow((1 + i),n) - 1)) ;
+
+ return principalAndInterest ;
+ }
+
+ /**
+ * Excel does not support infinities and NaNs; instead, it gives a #NUM! error in these cases.
+ *
+ * @throws EvaluationException (#NUM!) if <tt>result</tt> is <tt>NaN</tt> or <tt>Infinity</tt>
+ */
+ static final void checkValue(double result) throws EvaluationException {
+ if (Double.isNaN(result) || Double.isInfinite(result)) {
+ throw new EvaluationException(ErrorEval.NUM_ERROR);
+ }
+ }
+
+}
+
+ ]]></source>
+
+ <p>Great! Now we need to go back to our original program that failed to evaluate our cell and add code that will allow it to run our new Java code.</p>
+
+ </section>
+
+ <section><title>Registering Your Function</title>
+ <p>Now we need to register our function with the Workbook, so that the formula evaluator can resolve the name "calculatePayment"
+ and map it to the actual implementation (CalculateMortgage). This is done using a UDFFinder, which manages FreeRefFunctions,
+ our Java counterparts of the VBA code. There are a few things we need to know in order to create one:</p>
+ <ul>
+ <li>The name of the function in the VBA code (in our case it is calculatePayment)</li>
+ <li>The Class name of our FreeRefFunction</li>
+ </ul>
+ <p>UDFFinder is an interface, so we need a concrete implementation of it: the org.apache.poi.ss.formula.udf.DefaultUDFFinder class.
+ If you refer to the Javadocs, you'll see that this class expects two arrays. One contains the aliases and the other contains an instance of
+ the class that implements each alias. In our case the alias is calculatePayment and the instance is of the CalculateMortgage type.
+ This class needs to be available at both compile time and runtime. Keep these arrays well organized. You'll run into problems if they
+ are of different sizes or if an alias is not at the same index as its implementation. Add the following code:</p>
+ <source><![CDATA[
+String[] functionNames = { "calculatePayment" } ;
+FreeRefFunction[] functionImpls = { new CalculateMortgage() } ;
+
+UDFFinder udfs = new DefaultUDFFinder( functionNames, functionImpls ) ;
+UDFFinder udfToolpack = new AggregatingUDFFinder( udfs ) ;
+ ]]></source>
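+ <p>The pairing rule is easiest to see in plain Java. The sketch below is a hypothetical illustration and is not POI's
+ DefaultUDFFinder. NameTable, calculateTerm and CalculateTerm are made-up names. It builds a name-to-implementation
+ map from two parallel arrays and refuses arrays of different lengths. It also normalizes names to upper case,
+ on the assumption that function names should be matched case-insensitively, as Excel does.</p>
+ <source><![CDATA[
+import java.util.HashMap ;
+import java.util.Locale ;
+import java.util.Map ;
+
+/**
+ * Hypothetical illustration (not POI code) of pairing function names with
+ * implementations by array index, as DefaultUDFFinder's two arrays require.
+ */
+public class NameTable<T> {
+
+    private final Map<String, T> byName = new HashMap<>() ;
+
+    public NameTable( String[] names, T[] impls ) {
+        // Different lengths mean some alias has no implementation (or vice versa)
+        if ( names.length != impls.length ) {
+            throw new IllegalArgumentException( names.length + " names but "
+                    + impls.length + " implementations" ) ;
+        }
+        for ( int i = 0 ; i < names.length ; i++ ) {
+            // Assumption: lookups ignore case, as Excel does for function names
+            byName.put( names[i].toUpperCase( Locale.ROOT ), impls[i] ) ;
+        }
+    }
+
+    /** Returns the implementation registered under name, or null if there is none. */
+    public T find( String name ) {
+        return byName.get( name.toUpperCase( Locale.ROOT ) ) ;
+    }
+
+    public static void main( String[] args ) {
+        // Hypothetical second function, to show that position i pairs with position i
+        String[] names = { "calculatePayment", "calculateTerm" } ;
+        String[] impls = { "CalculateMortgage", "CalculateTerm" } ;
+        NameTable<String> table = new NameTable<>( names, impls ) ;
+        System.out.println( table.find( "CALCULATEPAYMENT" ) ) ; // prints CalculateMortgage
+        System.out.println( table.find( "unknownFunction" ) ) ;  // prints null
+    }
+}
+ ]]></source>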
+ <p>Now we have our UDFFinder instance, wrapped in an AggregatingUDFFinder. The last step is to pass this to our Workbook:</p>
+
+ <source><![CDATA[
+workbook.addToolPack(udfToolpack);
+ ]]></source>
+ <p>So now the whole class will look like this:</p>
+ <source><![CDATA[
+import java.io.File ;
+import java.io.FileInputStream ;
+import java.io.IOException ;
+
+import org.apache.poi.ss.formula.functions.FreeRefFunction ;
+import org.apache.poi.ss.formula.udf.AggregatingUDFFinder ;
+import org.apache.poi.ss.formula.udf.DefaultUDFFinder ;
+import org.apache.poi.ss.formula.udf.UDFFinder ;
+import org.apache.poi.ss.usermodel.Cell ;
+import org.apache.poi.ss.usermodel.CellValue ;
+import org.apache.poi.ss.usermodel.FormulaEvaluator ;
+import org.apache.poi.ss.usermodel.Row ;
+import org.apache.poi.ss.usermodel.Sheet ;
+import org.apache.poi.ss.usermodel.Workbook ;
+import org.apache.poi.ss.usermodel.WorkbookFactory ;
+import org.apache.poi.ss.util.CellReference ;
+
+public class Evaluator {
+
+    public static void main( String[] args ) {
+
+        System.out.println( "fileName: " + args[0] ) ;
+        System.out.println( "cell: " + args[1] ) ;
+
+        File workbookFile = new File( args[0] ) ;
+
+        // try-with-resources closes the workbook and the input stream when we are done
+        try ( FileInputStream fis = new FileInputStream( workbookFile ) ;
+              Workbook workbook = WorkbookFactory.create( fis ) ) {
+
+            // Map the VBA function name to its Java implementation (same index in both arrays)
+            String[] functionNames = { "calculatePayment" } ;
+            FreeRefFunction[] functionImpls = { new CalculateMortgage() } ;
+
+            UDFFinder udfs = new DefaultUDFFinder( functionNames, functionImpls ) ;
+            UDFFinder udfToolpack = new AggregatingUDFFinder( udfs ) ;
+
+            // Register our functions with the workbook
+            workbook.addToolPack( udfToolpack ) ;
+
+            FormulaEvaluator evaluator = workbook.getCreationHelper().createFormulaEvaluator();
+
+            // The cell reference must include the sheet name, e.g. Sheet1!B4
+            CellReference cr = new CellReference( args[1] ) ;
+            String sheetName = cr.getSheetName() ;
+            Sheet sheet = workbook.getSheet( sheetName ) ;
+            int rowIdx = cr.getRow() ;
+            int colIdx = cr.getCol() ;
+            Row row = sheet.getRow( rowIdx ) ;
+            Cell cell = row.getCell( colIdx ) ;
+
+            CellValue value = evaluator.evaluate( cell ) ;
+
+            System.out.println( "returns value: " + value ) ;
+
+        } catch( IOException e ) {
+            // Also covers FileNotFoundException
+            e.printStackTrace();
+        }
+    }
+}
+
+ ]]></source>
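+ <p>Now that our evaluator is aware of the UDFFinder, which in turn is aware of our FreeRefFunction, we're ready to re-run our example
+ with POI and our classes on the classpath. The quotes stop interactive shells such as bash from treating the ! as history expansion:</p>
+ <source>java Evaluator mortgage-calculation.xls 'Sheet1!B4'</source>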
+ <p>which prints the following output in the console:</p>
+ <source><![CDATA[
+fileName: mortgage-calculation.xls
+cell: Sheet1!B4
+returns value: org.apache.poi.ss.usermodel.CellValue [790.7936267415464]
+ ]]></source>
+ <p>That's it! Now you can write Java code and register it, allowing your POI-based application to run spreadsheets that were previously inaccessible.</p>
+ <p>This example can be found in the <a href="https://github.com/apache/poi/tree/trunk/poi-examples/src/main/java/org/apache/poi/examples/ss/formula">poi-examples/src/main/java/org/apache/poi/examples/ss/formula</a> folder in the source.</p>
+ </section>
+ </section>
+</body>
+</document>
+