6 files changed, 2200 insertions, 0 deletions
diff --git a/src/documentation/content/xdocs/hpsf/book.xml b/src/documentation/content/xdocs/hpsf/book.xml
new file mode 100644
index 0000000000..529baed75a
--- /dev/null
+++ b/src/documentation/content/xdocs/hpsf/book.xml
@@ -0,0 +1,21 @@
+<?xml version="1.0"?>
+<!DOCTYPE book PUBLIC "-//APACHE//DTD Cocoon Documentation Book V1.0//EN" "../dtd/book-cocoon-v10.dtd">
+<!-- $Id$ -->
+<book software="POI Project" 
+      title="HPSF" 
+      copyright="@year@ POI Project">
+
+  <menu label="Navigation">
+    <menu-item label="Main" href="../index.html"/>
+  </menu>
+  <menu label="HPSF">
+    <menu-item label="Overview" href="index.html"/>
+    <menu-item label="How To" href="how-to.html"/>
+    <menu-item label="Thumbnails" href="thumbnails.html"/>
+    <menu-item label="Internals" href="internals.html"/>
+    <menu-item label="To Do" href="todo.html"/>
+  </menu>
+
+</book>
+
+
diff --git a/src/documentation/content/xdocs/hpsf/how-to.xml b/src/documentation/content/xdocs/hpsf/how-to.xml
new file mode 100644
index 0000000000..57f880700e
--- /dev/null
+++ b/src/documentation/content/xdocs/hpsf/how-to.xml
@@ -0,0 +1,868 @@
+<?xml version="1.0" encoding="iso-8859-1"?>
+<!DOCTYPE document PUBLIC "-//APACHE//DTD Documentation V1.1//EN"
+"../dtd/document-v11.dtd">
+<!-- $Id$ -->
+
+<document>
+ <header>
+  <title>HPSF HOW-TO</title>
+  <authors>
+   <person name="Rainer Klute" email="klute@apache.org"/>
+  </authors>
+ </header>
+ <body>
+  <section><title>How To Use the HPSF APIs</title>
+
+   <p>This HOW-TO is organized in three sections. You should read them
+    sequentially because the later sections build upon the earlier ones.</p>
+
+   <ol>
+    <li>
+     The <link href="#sec1">first section</link> explains how to read
+      the most important standard properties of a Microsoft Office
+      document. Standard properties are things like title, author, creation
+      date etc. It is quite likely that you will find here what you need and
+      don't have to read the other sections.
+    </li>
+
+    <li>
+     The <link href="#sec2">second section</link> goes a small step
+      further and focusses on reading additional standard properties. It also
+      talks about exceptions that may be thrown when dealing with HPSF and
+      shows how you can read properties of embedded objects.
+     </li>
+
+    <li>
+     The <link href="#sec3">third section</link> tells how to read
+      non-standard properties. Non-standard properties are application-specific
+      triples consisting of an ID, a type, and a value.
+     </li>
+   </ol>
+
+
+
+   <anchor id="sec1"/>
+   <section><title>Reading Standard Properties</title>
+
+    <note>This section explains how to read
+      the most important standard properties of a Microsoft Office
+      document. Standard properties are things like title, author, creation
+      date etc. Chances are that you will find here what you need and
+      don't have to read the other sections.</note>
+
+    <p>The first thing you should understand is that properties are stored in
+     separate documents inside the POI filesystem. (If you don't know what a
+     POI filesystem is, read the <link href="../poifs/index.html">POIFS
+      documentation</link>.)  A document in a POI filesystem is also called a
+     <strong>stream</strong>.</p>
+
+    <p>The following example shows how to read a POI filesystem's
+     "title" property. Reading other properties is similar. Consult the API
+     documentation of <code>org.apache.poi.hpsf.SummaryInformation</code> to
+     learn which methods are available.</p>
+
+    <p>The standard properties this section focusses on can be found in a
+     document called <em>\005SummaryInformation</em> located in the root of the
+     POI filesystem. The notation <em>\005</em> in the document's name means
+     the character with the decimal value of 5. In order to read the title, an
+     application has to perform the following steps:</p>
+
+    <ol>
+     <li>
+      Open the document <em>\005SummaryInformation</em> located in the root
+       of the POI filesystem.
+     </li>
+     <li>
+      Create an instance of the class <code>SummaryInformation</code> from
+       that document.
+     </li>
+     <li>
+      Call the <code>SummaryInformation</code> instance's
+       <code>getTitle()</code> method.
+     </li>
+    </ol>
+
+    <p>Sounds easy, doesn't it? Here are the steps in detail.</p>
+
+
+    <section><title>Open the document \005SummaryInformation in the root of the
+       POI filesystem</title>
+
+     <p>An application that wants to open a document in a POI filesystem
+      (POIFS) proceeds as shown by the following code fragment. (The full
+      source code of the sample application is available in the
+      <em>examples</em> section of the POI source tree as
+      <em>ReadTitle.java</em>.)</p>
+
+     <source>
+import java.io.*;
+import org.apache.poi.hpsf.*;
+import org.apache.poi.poifs.eventfilesystem.*;
+
+// ...
+
+public static void main(String[] args)
+    throws IOException
+{
+    final String filename = args[0];
+    POIFSReader r = new POIFSReader();
+    r.registerListener(new MyPOIFSReaderListener(),
+                       "\005SummaryInformation");
+    r.read(new FileInputStream(filename));
+}</source>
+
+     <p>The first interesting statement is</p>
+
+     <source>POIFSReader r = new POIFSReader();</source>
+
+     <p>It creates a
+      <code>org.apache.poi.poifs.eventfilesystem.POIFSReader</code> instance
+      which we shall need to read the POI filesystem. Before the application
+      actually opens the POI filesystem we have to tell the
+      <code>POIFSReader</code> which documents we are interested in. In this
+      case the application should do something with the document
+      <em>\005SummaryInformation</em>.</p>
+
+     <source>
+r.registerListener(new MyPOIFSReaderListener(),
+                   "\005SummaryInformation");</source>
+
+     <p>This method call registers a
+      <code>org.apache.poi.poifs.eventfilesystem.POIFSReaderListener</code>
+      with the <code>POIFSReader</code>. The <code>POIFSReaderListener</code>
+      interface specifies the method <code>processPOIFSReaderEvent</code>
+      which processes a document. The class
+      <code>MyPOIFSReaderListener</code> implements the
+      <code>POIFSReaderListener</code> interface and thus the
+      <code>processPOIFSReaderEvent</code> method. The eventing POI filesystem
+      calls this method when it finds the <em>\005SummaryInformation</em>
+      document. In the sample application <code>MyPOIFSReaderListener</code> is
+      a static class in the <em>ReadTitle.java</em> source file.</p>
+
+     <p>Now everything is prepared and reading the POI filesystem can
+      start:</p>
+
+     <source>r.read(new FileInputStream(filename));</source>
+
+     <p>The following source code fragment shows the
+      <code>MyPOIFSReaderListener</code> class and how it retrieves the
+      title.</p>
+
+     <source>
+static class MyPOIFSReaderListener implements POIFSReaderListener
+{
+    public void processPOIFSReaderEvent(POIFSReaderEvent event)
+    {
+        SummaryInformation si = null;
+        try
+        {
+            si = (SummaryInformation)
+                 PropertySetFactory.create(event.getStream());
+        }
+        catch (Exception ex)
+        {
+            throw new RuntimeException
+                ("Property set stream \"" +
+                 event.getPath() + event.getName() + "\": " + ex);
+        }
+        final String title = si.getTitle();
+        if (title != null)
+            System.out.println("Title: \"" + title + "\"");
+        else
+            System.out.println("Document has no title.");
+    }
+}
+</source>
+
+     <p>The line</p>
+
+     <source>SummaryInformation si = null;</source>
+
+     <p>declares a <code>SummaryInformation</code> variable and initializes it
+      with <code>null</code>. We need an instance of this class to access the
+      title. The instance is created in a <code>try</code> block:</p>
+
+     <source>si = (SummaryInformation)
+                 PropertySetFactory.create(event.getStream());</source>
+
+     <p>The expression <code>event.getStream()</code> returns the input stream
+      containing the bytes of the property set stream named
+      <em>\005SummaryInformation</em>. This stream is passed into the
+      <code>create</code> method of the factory class
+      <code>org.apache.poi.hpsf.PropertySetFactory</code> which returns
+      a <code>org.apache.poi.hpsf.PropertySet</code> instance. It is more or
+      less safe to cast this result to <code>SummaryInformation</code>, a
+      convenience class with methods like <code>getTitle()</code>,
+      <code>getAuthor()</code> etc.</p>
+
+     <p>The <code>PropertySetFactory.create</code> method may throw all sorts
+      of exceptions. We'll deal with them in the next sections. For now we just
+      catch all exceptions and throw a <code>RuntimeException</code>
+      containing the message text of the original exception.</p>
+
+     <p>If all goes well, the sample application retrieves the title and prints
+      it to the standard output. As you can see, you must be prepared for the
+      case that the document does not have a title.</p>
+
+     <source>final String title = si.getTitle();
+if (title != null)
+    System.out.println("Title: \"" + title + "\"");
+else
+    System.out.println("Document has no title.");</source>
+
+     <p>Please note that a Microsoft Office document does not necessarily
+      contain the <em>\005SummaryInformation</em> stream. The documents created
+      by the Microsoft Office suite have one, as far as I know. However, an
+      Excel spreadsheet exported from StarOffice 5.2 won't have a
+      <em>\005SummaryInformation</em> stream. In this case the application
+      won't throw an exception; the <code>POIFSReader</code> simply never calls
+      the <code>processPOIFSReaderEvent</code> method. You have been warned!</p>
+    </section>
+   </section>
+
+   <anchor id="sec2"/>
+   <section><title>Additional Standard Properties, Exceptions And Embedded Objects</title>
+
+    <note>This section focusses on reading additional standard properties. It
+     also talks about exceptions that may be thrown when dealing with HPSF and
+     shows how you can read properties of embedded objects.</note>
+
+    <p>A couple of <strong>additional standard properties</strong> are not
+     contained in the <em>\005SummaryInformation</em> stream explained above,
+     for example a document's category or the number of multimedia clips in a
+     PowerPoint presentation. Microsoft has invented an additional stream named
+     <em>\005DocumentSummaryInformation</em> to hold these properties. With two
+     minor exceptions you can proceed exactly as described above to read the
+     properties stored in <em>\005DocumentSummaryInformation</em>:</p>
+
+    <ul>
+     <li>Instead of <em>\005SummaryInformation</em> use
+       <em>\005DocumentSummaryInformation</em> as the stream's name.</li>
+     <li>Replace all occurrences of the class
+       <code>SummaryInformation</code> by
+       <code>DocumentSummaryInformation</code>.</li>
+    </ul>
+
+    <p>And of course you cannot call <code>getTitle()</code> because
+     <code>DocumentSummaryInformation</code> has different query methods. See
+     the Javadoc API documentation for the details!</p>
+
+    <p>In the previous section the application simply caught all
+     <strong>exceptions</strong> and was in no way interested in any
+     details. However, a real application will likely want to know what went
+     wrong and act appropriately. Besides any IO exceptions there are three
+     exceptions specific to HPSF or POI that you should know about:</p>
+
+    <dl>
+     <dt><code>NoPropertySetStreamException</code></dt>
+     <dd>
+      This exception is thrown if the application tries to create a
+       <code>PropertySet</code> instance from a stream that is not a
+       property set stream. (<code>SummaryInformation</code> and
+       <code>DocumentSummaryInformation</code> are subclasses of
+       <code>PropertySet</code>.) A faulty property set stream counts as not
+       being a property set stream at all. An application should be prepared to
+       deal with this case even if it opens streams named
+       <em>\005SummaryInformation</em> or
+       <em>\005DocumentSummaryInformation</em> only. These are just names. A
+       stream's name by itself does not ensure that the stream contains the
+       expected contents and that these contents are correct.
+     </dd>
+
+     <dt><code>UnexpectedPropertySetTypeException</code></dt>
+     <dd>This exception is thrown if a certain type of property set is
+       expected somewhere (e.g. a <code>SummaryInformation</code> or
+       <code>DocumentSummaryInformation</code>) but the provided property
+       set is not of that type.</dd>
+
+     <dt><code>MarkUnsupportedException</code></dt>
+     <dd>This exception is thrown if an input stream that is to be parsed
+       into a property set does not support the
+       <code>InputStream.mark(int)</code> operation. The POI filesystem uses
+       the <code>DocumentInputStream</code> class which does support this
+       operation, so you are safe here. However, if you read a property set
+       stream from another kind of input stream things may be
+       different.</dd>
+    </dl>
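+
+    <p>If a property set stream comes from some other kind of input stream,
+     one way to avoid the <code>MarkUnsupportedException</code> is to wrap the
+     stream before handing it to HPSF. The following is only a minimal sketch
+     that uses the standard <code>java.io</code> classes. The class name
+     <code>MarkSupport</code> and the method <code>ensureMarkSupported</code>
+     are made up for this illustration and are not part of POI.</p>
+
+    <source>import java.io.BufferedInputStream;
+import java.io.InputStream;
+
+/* Hypothetical helper class, not part of POI. */
+public final class MarkSupport
+{
+    private MarkSupport() {}
+
+    /* Returns the stream itself if it already supports mark(int),
+     * otherwise a BufferedInputStream wrapper, which does. */
+    public static InputStream ensureMarkSupported(InputStream in)
+    {
+        return in.markSupported() ? in : new BufferedInputStream(in);
+    }
+}</source>
+
+    <p>An application would then pass the result of
+     <code>MarkSupport.ensureMarkSupported(in)</code> to
+     <code>PropertySetFactory.create()</code> instead of the raw stream.</p>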
+
+    <p>Many Microsoft Office documents contain <strong>embedded
+      objects</strong>, for example an Excel sheet on a page in a Word
+     document. Embedded objects may have property sets of their own. An
+     application can open these property set streams as described above. The
+     only difference is that they are not located in the POI filesystem's root
+     but in a <strong>nested directory</strong> instead. Just register a
+     <code>POIFSReaderListener</code> for the property set streams you are
+     interested in. For example, the <em>POIBrowser</em> application in the
+     contrib section tries to open each and every document in a POI filesystem
+     as a property set stream. If this operation was successful it displays the
+     properties.</p>
+   </section>
+
+   <anchor id="sec3"/>
+   <section><title>Reading Non-Standard Properties</title>
+
+    <note>This section tells how to read non-standard properties. Non-standard
+     properties are application-specific ID/type/value triples.</note>
+
+    <section><title>Overview</title>
+     <p>Now comes the real hardcore stuff. As mentioned above,
+      <code>SummaryInformation</code> and
+      <code>DocumentSummaryInformation</code> are just special cases of the
+      general concept of a property set. This concept says that a
+      <strong>property set</strong> consists of properties and that each
+      <strong>property</strong> is an entity with an <strong>ID</strong>, a
+      <strong>type</strong>, and a <strong>value</strong>.</p>
+
+     <p>Okay, that was still rather easy. However, to make things more
+      complicated, Microsoft in its infinite wisdom decided that a property set
+      shall be broken into one or more <strong>sections</strong>. Each section
+      holds a bunch of properties. But since that's still not complicated
+      enough, a section may have an optional <strong>dictionary</strong> that
+      maps property IDs to <strong>property names</strong> - we'll explain
+      later what that means.</p>
+
+     <p>The procedure to get to the properties is the following:</p>
+
+     <ol>
+      <li>Use the <strong><code>PropertySetFactory</code></strong> class to
+       create a <code>PropertySet</code> object from a property set stream. If
+       you don't know whether an input stream is a property set stream, just
+       try to call <code>PropertySetFactory.create(java.io.InputStream)</code>:
+       You'll either get a <code>PropertySet</code> instance returned or an
+       exception is thrown.</li>
+
+      <li>Call the <code>PropertySet</code>'s method <code>getSections()</code>
+       to get the sections contained in the property set. Each section is
+       an instance of the <code>Section</code> class.</li>
+
+      <li>Each section has a format ID. The format ID of the first section in a
+       property set determines the property set's type. For example, the first
+       (and only) section of the SummaryInformation property set has a format
+       ID of <code>F29F85E0-4FF9-1068-AB-91-08-00-2B-27-B3-D9</code>. You can
+       get the format ID with <code>Section.getFormatID()</code>.</li>
+
+      <li>The properties contained in a <code>Section</code> can be retrieved
+       with <code>Section.getProperties()</code>. The result is an array of
+       <code>Property</code> instances.</li>
+
+      <li>A property has an ID, a type, and a value. The <code>Property</code>
+       class has methods to retrieve them.</li>
+     </ol>
+    </section>
+
+    <section><title>A Sample Application</title>
+     <p>Let's have a look at a sample Java application that dumps all property
+      set streams contained in a POI file system. The full source code of this
+      program can be found as <em>ReadCustomPropertySets.java</em> in the
+      <em>examples</em> area of the POI source code tree. Here are the key
+      sections:</p>
+
+    <source>import java.io.*;
+import java.util.*;
+import org.apache.poi.hpsf.*;
+import org.apache.poi.poifs.eventfilesystem.*;
+import org.apache.poi.util.HexDump;</source>
+
+    <p>The most important package the application needs is
+     <code>org.apache.poi.hpsf.*</code>. This package contains the HPSF
+     classes. Most classes named below are from the HPSF package. Of course we
+     also need the POIFS event file system's classes and <code>java.io.*</code>
+     since we are dealing with POI I/O. From the <code>java.util</code> package
+     we use the <code>List</code> and <code>Iterator</code> interfaces. The class
+     <code>org.apache.poi.util.HexDump</code> provides methods to dump byte
+     arrays as nicely formatted strings.</p>
+
+    <source>public static void main(String[] args)
+    throws IOException
+{
+    final String filename = args[0];
+    POIFSReader r = new POIFSReader();
+
+    /* Register a listener for *all* documents. */
+    r.registerListener(new MyPOIFSReaderListener());
+    r.read(new FileInputStream(filename));
+}</source>
+
+    <p>The <code>POIFSReader</code> is set up so that the listener
+     <code>MyPOIFSReaderListener</code> is called for every document in the POI
+     file system.</p>
+    </section>
+
+    <section><title>The Property Set</title>
+     <p>The listener class tries to create a <code>PropertySet</code> from each
+     stream using the <code>PropertySetFactory.create()</code> method:</p>
+
+    <source>static class MyPOIFSReaderListener implements POIFSReaderListener
+{
+    public void processPOIFSReaderEvent(POIFSReaderEvent event)
+    {
+        PropertySet ps = null;
+        try
+        {
+            ps = PropertySetFactory.create(event.getStream());
+        }
+        catch (NoPropertySetStreamException ex)
+        {
+            out("No property set stream: \"" + event.getPath() +
+                event.getName() + "\"");
+            return;
+        }
+        catch (Exception ex)
+        {
+            throw new RuntimeException
+                ("Property set stream \"" +
+                 event.getPath() + event.getName() + "\": " + ex);
+        }
+
+        /* Print the name of the property set stream: */
+        out("Property set stream \"" + event.getPath() +
+            event.getName() + "\":");</source>
+
+    <p>Creating the <code>PropertySet</code> is done in a <code>try</code>
+     block, because not every stream in the POI file system contains a property
+     set. If the stream is something else,
+     <code>PropertySetFactory.create()</code> throws a
+     <code>NoPropertySetStreamException</code>, which is caught and
+     logged. Then the program continues with the next stream. However, all
+     other types of exceptions cause the program to terminate by throwing a
+     runtime exception. If all went well, we can print the name of the property
+     set stream.</p>
+    </section>
+
+    <section><title>The Sections</title>
+     <p>The next step is to print the number of sections followed by the
+     sections themselves:</p>
+
+    <source>/* Print the number of sections: */
+final long sectionCount = ps.getSectionCount();
+out("   No. of sections: " + sectionCount);
+
+/* Print the list of sections: */
+List sections = ps.getSections();
+int nr = 0;
+for (Iterator i = sections.iterator(); i.hasNext();)
+{
+    /* Print a single section: */
+    Section sec = (Section) i.next();
+
+    // See below for the complete loop body.
+}</source>
+
+     <p>The <code>PropertySet</code>'s method <code>getSectionCount()</code>
+      returns the number of sections.</p>
+
+     <p>To retrieve the sections, use the <code>getSections()</code>
+      method. This method returns a <code>java.util.List</code> containing
+      instances of the <code>Section</code> class in their proper order.</p>
+
+     <p>The sample code shows a loop that retrieves the <code>Section</code>
+      objects one by one and prints some information about each one. Here is
+      the complete body of the loop:</p>
+
+     <source>/* Print a single section: */
+Section sec = (Section) i.next();
+out("   Section " + nr++ + ":");
+String s = hex(sec.getFormatID().getBytes());
+s = s.substring(0, s.length() - 1);
+out("      Format ID: " + s);
+
+/* Print the number of properties in this section. */
+int propertyCount = sec.getPropertyCount();
+out("      No. of properties: " + propertyCount);
+
+/* Print the properties: */
+Property[] properties = sec.getProperties();
+for (int i2 = 0; i2 &lt; properties.length; i2++)
+{
+    /* Print a single property: */
+    Property p = properties[i2];
+    int id = p.getID();
+    long type = p.getType();
+    Object value = p.getValue();
+    out("      Property ID: " + id + ", type: " + type +
+        ", value: " + value);
+}</source>
+    </section>
+
+    <section><title>The Section's Format ID</title>
+     <p>The first method called on the <code>Section</code> instance is
+      <code>getFormatID()</code>. As explained above, the format ID of the
+      first section in a property set determines the type of the property
+      set. Its type is <code>ClassID</code> which is essentially a sequence of
+      16 bytes. A real application using its own type of a custom property set
+      should have defined a unique format ID and, when reading a property set
+      stream, should check the format ID is equal to that unique format ID. The
+      sample program just prints the format ID it finds in a section:</p>
+
+     <source>String s = hex(sec.getFormatID().getBytes());
+s = s.substring(0, s.length() - 1);
+out("      Format ID: " + s);</source>
+
+     <p>As you can see, the <code>getFormatID()</code> method returns a
+      <code>ClassID</code> object. An array containing the bytes can be
+      retrieved with <code>ClassID.getBytes()</code>. In order to get a nicely
+      formatted printout, the sample program uses the <code>hex()</code> helper
+      method which in turn uses the POI utility class <code>HexDump</code> in
+      the <code>org.apache.poi.util</code> package. Another helper method is
+      <code>out()</code> which just saves typing
+      <code>System.out.println()</code>.</p>
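+
+     <p>The check against an expected format ID, and a more compact printout
+      than a full hex dump, might look like the following sketch. It assumes
+      that <code>ClassID.getBytes()</code> delivers the bytes in the order in
+      which the format ID is written in the overview above. The class
+      <code>FormatIds</code> and its methods are hypothetical helpers and not
+      part of HPSF.</p>
+
+     <source>import java.util.Arrays;
+
+/* Hypothetical helper class, not part of HPSF. */
+public final class FormatIds
+{
+    private FormatIds() {}
+
+    /* Format ID of the summary information section as quoted in the
+     * overview: F29F85E0-4FF9-1068-AB-91-08-00-2B-27-B3-D9 */
+    public static final byte[] SUMMARY_INFORMATION = {
+        (byte) 0xF2, (byte) 0x9F, (byte) 0x85, (byte) 0xE0,
+        0x4F, (byte) 0xF9, 0x10, 0x68,
+        (byte) 0xAB, (byte) 0x91, 0x08, 0x00,
+        0x2B, 0x27, (byte) 0xB3, (byte) 0xD9
+    };
+
+    /* Compares all 16 bytes, not the array references. */
+    public static boolean isSummaryInformation(byte[] formatId)
+    {
+        return Arrays.equals(SUMMARY_INFORMATION, formatId);
+    }
+
+    /* Formats the bytes as an uppercase hex string without separators. */
+    public static String toHex(byte[] formatId)
+    {
+        StringBuilder sb = new StringBuilder();
+        for (byte b : formatId)
+            sb.append(String.format("%02X", Byte.toUnsignedInt(b)));
+        return sb.toString();
+    }
+}</source>
+
+     <p>An application with its own custom property set would define its own
+      constant in the same way and compare it with
+      <code>sec.getFormatID().getBytes()</code>.</p>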
+    </section>
+
+    <section><title>The Properties</title>
+     <p>Before getting the properties, it is possible to find out how many
+      properties are available in the section via the
+      <code>Section.getPropertyCount()</code> method. The sample application
+      uses it to print the number of properties to the standard output:</p>
+
+     <source>int propertyCount = sec.getPropertyCount();
+out("      No. of properties: " + propertyCount);</source>
+
+     <p>Now it's time to get to the properties themselves. You can retrieve a
+      section's properties with the method
+      <code>Section.getProperties()</code>:</p>
+
+     <source>Property[] properties = sec.getProperties();</source>
+
+     <p>As you can see the result is an array of <code>Property</code>
+      objects. This class has three methods to retrieve a property's ID, its
+      type, and its value. The following code snippet shows how to call
+      them:</p>
+
+     <source>for (int i2 = 0; i2 &lt; properties.length; i2++)
+{
+    /* Print a single property: */
+    Property p = properties[i2];
+    int id = p.getID();
+    long type = p.getType();
+    Object value = p.getValue();
+    out("      Property ID: " + id + ", type: " + type +
+        ", value: " + value);
+}</source>
+    </section>
+
+    <section><title>Sample Output</title>
+     <p>The output of the sample program might look like the following. It
+      shows the summary information and the document summary information
+      property sets of a Microsoft Word document. However, unlike the first and
+      second section of this HOW-TO the application does not have any code
+      which is specific to the <code>SummaryInformation</code> and
+      <code>DocumentSummaryInformation</code> classes.</p>
+
+     <source>Property set stream "/SummaryInformation":
+   No. of sections: 1
+   Section 0:
+      Format ID: 00000000 F2 9F 85 E0 4F F9 10 68 AB 91 08 00 2B 27 B3 D9 ....O..h....+'..
+      No. of properties: 17
+      Property ID: 1, type: 2, value: 1252
+      Property ID: 2, type: 30, value: Titel
+      Property ID: 3, type: 30, value: Thema
+      Property ID: 4, type: 30, value: Rainer Klute (Autor)
+      Property ID: 5, type: 30, value: Test (Stichwörter)
+      Property ID: 6, type: 30, value: This is a document for testing HPSF
+      Property ID: 7, type: 30, value: Normal.dot
+      Property ID: 8, type: 30, value: Unknown User
+      Property ID: 9, type: 30, value: 3
+      Property ID: 18, type: 30, value: Microsoft Word 9.0
+      Property ID: 12, type: 64, value: Mon Jan 01 00:59:25 CET 1601
+      Property ID: 13, type: 64, value: Thu Jul 18 16:22:00 CEST 2002
+      Property ID: 14, type: 3, value: 1
+      Property ID: 15, type: 3, value: 20
+      Property ID: 16, type: 3, value: 93
+      Property ID: 19, type: 3, value: 0
+      Property ID: 17, type: 71, value: [B@13582d
+Property set stream "/DocumentSummaryInformation":
+   No. of sections: 2
+   Section 0:
+      Format ID: 00000000 D5 CD D5 02 2E 9C 10 1B 93 97 08 00 2B 2C F9 AE ............+,..
+      No. of properties: 14
+      Property ID: 1, type: 2, value: 1252
+      Property ID: 2, type: 30, value: Test
+      Property ID: 14, type: 30, value: Rainer Klute (Manager)
+      Property ID: 15, type: 30, value: Rainer Klute IT-Consulting GmbH
+      Property ID: 5, type: 3, value: 3
+      Property ID: 6, type: 3, value: 2
+      Property ID: 17, type: 3, value: 111
+      Property ID: 23, type: 3, value: 592636
+      Property ID: 11, type: 11, value: false
+      Property ID: 16, type: 11, value: false
+      Property ID: 19, type: 11, value: false
+      Property ID: 22, type: 11, value: false
+      Property ID: 13, type: 4126, value: [B@56a499
+      Property ID: 12, type: 4108, value: [B@506411
+   Section 1:
+      Format ID: 00000000 D5 CD D5 05 2E 9C 10 1B 93 97 08 00 2B 2C F9 AE ............+,..
+      No. of properties: 7
+      Property ID: 0, type: 0, value: {6=Test-JaNein, 5=Test-Zahl, 4=Test-Datum, 3=Test-Text, 2=_PID_LINKBASE}
+      Property ID: 1, type: 2, value: 1252
+      Property ID: 2, type: 65, value: [B@c9ba38
+      Property ID: 3, type: 30, value: This is some text.
+      Property ID: 4, type: 64, value: Wed Jul 17 00:00:00 CEST 2002
+      Property ID: 5, type: 3, value: 27
+      Property ID: 6, type: 11, value: true
+No property set stream: "/WordDocument"
+No property set stream: "/CompObj"
+No property set stream: "/1Table"</source>
+
+     <p>There are some interesting items to note:</p>
+
+     <ul>
+      <li>The first property set (summary information) consists of a single
+       section, the second property set (document summary information) consists
+       of two sections.</li>
+
+      <li>Each section type (identified by its format ID) has its own domain of
+       property IDs. For example, in the second property set the properties with
+       ID 2 have different meanings in the two sections. By the way, the format
+       IDs of these sections are <strong>not</strong> equal, but you have to
+       look hard to find the difference.</li>
+
+      <li>The properties are not in any particular order in the section,
+       although they slightly tend to be sorted by their IDs.</li>
+     </ul>
+    </section>
+
+    <section><title>Property IDs</title>
+     <p>Properties in the same section are distinguished by their IDs. This is
+      similar to variables in a programming language like Java, which are
+      distinguished by their names. But unlike variable names, property IDs are
+      simple integral numbers. There is another similarity, however. Just like
+      a Java variable has a certain scope (e.g. a member variable in a class),
+      a property ID also has its scope of validity: the section.</p>
+
+     <p>Two property IDs in sections with different section format IDs
+      don't have the same meaning even though their IDs might be equal. For
+      example, ID 4 in the first (and only) section of a summary
+      information property set denotes the document's author, while ID 4 in the
+      first section of the document summary information property set means the
+      document's byte count. The sample output above does not show a property
+      with an ID of 4 in the first section of the document summary information
+      property set. That means that the document does not have a byte
+      count. However, there is a property with an ID of 4 in the
+      <em>second</em> section: This is a user-defined property ID - we'll get
+      to that topic in a minute.</p>
+
+     <p>So, how can you find out what the meaning of a certain property ID in
+      the summary information and the document summary information property set
+      is? The standard property sets as such don't have any hints about the
+      <strong>meanings of their property IDs</strong>. For example, the summary
+      information property set does not tell you that the property ID 4 stands
+      for the document's author. This is external knowledge. Microsoft defined
+      standard meanings for some of the property IDs in the summary information
+      and the document summary information property sets. As a help to the Java
+      and POI programmer, the class <code>PropertyIDMap</code> in the
+      <code>org.apache.poi.hpsf.wellknown</code> package defines constants
+      for the "well-known" property IDs. For example, there is the
+      definition</p>
+
+     <source>public final static int PID_AUTHOR = 4;</source>
+
+     <p>These definitions allow you to use symbolic names instead of
+      numbers.</p>
+
+     <p>To support the opposite direction as well, i.e. mapping property IDs
+      to property names, the class <code>PropertyIDMap</code>
+      defines two static methods:
+      <code>getSummaryInformationProperties()</code> and
+      <code>getDocumentSummaryInformationProperties()</code>. Both return
+      <code>java.util.Map</code> objects which map property IDs to
+      strings. Such a string gives a hint about the property's meaning. For
+      example,
+      <code>PropertyIDMap.getSummaryInformationProperties().get(4)</code>
+      returns the string "PID_AUTHOR". An application could use this string as
+      a key to a localized string which is displayed to the user, e.g. "Author"
+      in English or "Verfasser" in German. HPSF might provide such
+      language-dependent ("localized") mappings in a later release.</p>
+
+     <p>Usually you won't have to deal with those two maps. Instead you should
+      call the <code>Section.getPIDString(int)</code> method. It returns the
+      string associated with the specified property ID in the context of the
+      <code>Section</code> object.</p>
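+
+     <p>The chain from property ID to ID string to display label described
+      above can be sketched in plain Java. This is a minimal, self-contained
+      illustration and not HPSF code: the class <code>PidLabels</code>, its
+      two hard-coded maps, the <code>Long</code> key type and the format of
+      the label keys are assumptions made for this example. Only the property
+      IDs and the labels "Author" and "Verfasser" come from this
+      documentation.</p>
+
+     <source><![CDATA[
```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

/** Hypothetical helper, not part of HPSF: turns a property ID into a display label. */
final class PidLabels {

    /** Stand-in for the ID-to-string map that PropertyIDMap is described to provide. */
    private static final Map<Long, String> ID_STRINGS = new HashMap<>();

    /** Assumed localized labels, keyed by "language:ID string". */
    private static final Map<String, String> LABELS = new HashMap<>();

    static {
        ID_STRINGS.put(2L, "PID_TITLE");
        ID_STRINGS.put(4L, "PID_AUTHOR");
        LABELS.put("en:PID_TITLE", "Title");
        LABELS.put("en:PID_AUTHOR", "Author");
        LABELS.put("de:PID_AUTHOR", "Verfasser");
    }

    /** Returns a label for the ID, falling back to the ID string or the bare number. */
    static String label(long id, Locale locale) {
        String idString = ID_STRINGS.get(id);
        if (idString == null) {
            return Long.toString(id);
        }
        String label = LABELS.get(locale.getLanguage() + ":" + idString);
        return label != null ? label : idString;
    }
}
```
+     ]]></source>
+
+     <p>Falling back to the ID string, and finally to the bare number, keeps
+      the display usable for IDs or languages that have no entry.</p>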
+
+     <p>Above you learned that property IDs have a meaning in the scope of a
+      section only. However, there are two exceptions to the rule: The property
+      IDs 0 and 1 have a fixed meaning in <strong>all</strong> sections:</p>
+
+     <table>
+      <tr>
+       <th>Property ID</th>
+       <th>Meaning</th>
+      </tr>
+
+      <tr>
+       <td>0</td>
+       <td>The property's value is a <strong>dictionary</strong>, i.e. a
+	mapping from property IDs to strings.</td>
+      </tr>
+
+      <tr>
+       <td>1</td>
+       <td>The property's value is the number of a <strong>codepage</strong>,
+	i.e. a mapping from character codes to characters. All strings in the
+	section containing this property must be interpreted using this
+	codepage. Typical property values are 1252 (8-bit "western" characters)
+	or 1200 (16-bit Unicode characters).</td>
+      </tr>
+     </table>
+    </section>
+
+    <section><title>Property types</title>
+     <p>A property is nothing without its value. It is stored in a property set
+      stream as a sequence of bytes. You must know the property's
+      <strong>type</strong> in order to properly interpret those bytes and
+      reasonably handle the value. A property's type is one of the so-called
+      Microsoft-defined <strong>"variant types"</strong>. When you call
+      <code>Property.getType()</code> you'll get a <code>long</code> value
+      denoting the property's variant type. The class
+      <code>Variant</code> in the <code>org.apache.poi.hpsf</code> package
+      holds most of those <code>long</code> values as named constants. For
+      example, the constant <code>VT_I4 = 3</code> means a signed integer value
+      of four bytes. Examples of other types are <code>VT_LPSTR = 30</code>
+      meaning a null-terminated string of 8-bit characters, <code>VT_LPWSTR =
+       31</code> which means a null-terminated Unicode string, or <code>VT_BOOL
+       = 11</code> denoting a boolean value.</p>
+
+     <p>In most cases you won't need a property's type because HPSF does all
+      the work for you.</p>
+    </section>
+
+    <section><title>Property values</title>
+     <p>When an application wants to retrieve a property's value and calls
+      <code>Property.getValue()</code>, HPSF has to interpret the bytes making
+      up the value according to the property's type. The type determines how
+      many bytes the value consists of and what
+      to do with them. For example, if the type is <code>VT_I4</code>, HPSF
+      knows that the value is four bytes long and that these bytes
+      comprise a signed integer value in the little-endian format. This is
+      quite different from e.g. a type of <code>VT_LPWSTR</code>. In this case
+      HPSF has to scan the value bytes for a Unicode null character and collect
+      everything from the beginning to that null character as a Unicode
+      string.</p>
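+
+     <p>The two interpretations just described can be sketched as follows.
+      The code mirrors the description in the paragraph above and is not
+      HPSF's actual implementation; the class and method names are
+      hypothetical, and the Unicode string is assumed to consist of
+      little-endian 16-bit characters.</p>
+
+     <source><![CDATA[
```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper showing how value bytes could be interpreted by type. */
final class ValueDecoding {

    /** Reads four bytes at the offset as a signed little-endian integer (VT_I4). */
    static int readI4(byte[] data, int offset) {
        return ByteBuffer.wrap(data, offset, 4)
                .order(ByteOrder.LITTLE_ENDIAN)
                .getInt();
    }

    /** Collects 16-bit characters from the offset up to the first null character. */
    static String readNullTerminatedUtf16(byte[] data, int offset) {
        int end = offset;
        // Advance two bytes at a time until a 16-bit zero or the end of the data.
        while (end + 1 < data.length && (data[end] != 0 || data[end + 1] != 0)) {
            end += 2;
        }
        return new String(data, offset, end - offset, StandardCharsets.UTF_16LE);
    }
}
```
+     ]]></source>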
+
+     <p>The good news is that HPSF does another job for you, too: It maps the
+      variant type to an adequate Java type.</p>
+
+     <table>
+      <tr>
+       <th>Variant type:</th>
+       <th>Java type:</th>
+      </tr>
+
+      <tr>
+       <td>VT_I2</td>
+       <td>java.lang.Integer</td>
+      </tr>
+
+      <tr>
+       <td>VT_I4</td>
+       <td>java.lang.Long</td>
+      </tr>
+
+      <tr>
+       <td>VT_FILETIME</td>
+       <td>java.util.Date</td>
+      </tr>
+
+      <tr>
+       <td>VT_LPSTR</td>
+       <td>java.lang.String</td>
+      </tr>
+
+      <tr>
+       <td>VT_LPWSTR</td>
+       <td>java.lang.String</td>
+      </tr>
+
+      <tr>
+       <td>VT_CF</td>
+       <td>byte[]</td>
+      </tr>
+
+      <tr>
+       <td>VT_BOOL</td>
+       <td>java.lang.Boolean</td>
+      </tr>
+
+     </table>
+
+     <p>The bad news is that there are still a couple of variant types HPSF
+      does not yet support. If it encounters one of these types it
+      returns the property's value as a byte array and leaves it to be
+      interpreted by the application.</p>
+
+     <p>An application retrieves a property's value by calling the
+      <code>Property.getValue()</code> method. This method's return type is the
+      general <code>Object</code> class. The <code>getValue()</code> method
+      looks up the property's variant type, reads the property's value bytes,
+      creates an instance of an adequate Java type, assigns it the property's
+      value and returns it. Primitive types like <code>int</code> or
+      <code>long</code> will be returned as the corresponding wrapper class,
+      e.g. <code>Integer</code> or <code>Long</code>.</p>
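+
+     <p>Because the value arrives as an <code>Object</code>, an application
+      typically checks its runtime class before using it. The following
+      sketch shows one way to do that for the Java types listed in the table
+      above. <code>ValueDescriber</code> is a hypothetical helper written for
+      this example; it takes any object and does not call HPSF itself.</p>
+
+     <source><![CDATA[
```java
import java.util.Date;

/** Hypothetical helper: inspects a value of the kind getValue() is described to return. */
final class ValueDescriber {

    static String describe(Object value) {
        if (value instanceof Integer || value instanceof Long) {
            return "number: " + value;
        }
        if (value instanceof Boolean) {
            return "boolean: " + value;
        }
        if (value instanceof Date) {
            return "date: " + ((Date) value).getTime() + " ms since 1970";
        }
        if (value instanceof String) {
            return "string: " + value;
        }
        if (value instanceof byte[]) {
            // VT_CF data and unsupported variant types arrive as raw bytes.
            return "raw bytes: " + ((byte[]) value).length;
        }
        return "unexpected: " + value;
    }
}
```
+     ]]></source>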
+    </section>
+
+
+    <section><title>Dictionaries</title>
+     <p>The property with ID 0 has a very special meaning: It is a
+      <strong>dictionary</strong> mapping property IDs to property names. We
+      have seen already that the meanings of standard properties in the
+      summary information and the document summary information property sets
+      have been defined by Microsoft. The advantage is that the labels of
+      properties like "Author" or "Title" don't have to be stored in the
+      property set. However, a user can define custom fields in, say, Microsoft
+      Word. For each field the user has to specify a name, a type, and a
+      value.</p>
+
+     <p>The names of the custom-defined fields (i.e. the property names) are
+      stored in the <strong>dictionary</strong> of the document summary
+      information's second section. The dictionary is a map which associates
+      property IDs with property names.</p>
+
+     <p>The method <code>Section.getPIDString(int)</code> returns not only
+      the well-known property names of the summary information and document
+      summary information property sets, but also the names of self-defined
+      properties. It should also work with self-defined properties in
+      self-defined sections.</p>
+    </section>
+
+    <section><title>Codepage support</title>
+     <fixme author="Rainer Klute">Improve codepage support!</fixme>
+
+     <p>The property with ID 1 holds the number of the codepage which was used
+      to encode the strings in this section. The present HPSF codepage support
+      is still very limited: When reading property value strings, HPSF
+      distinguishes between 16-bit characters and 8-bit characters. 16-bit
+      characters should be Unicode characters and thus be okay. 8-bit
+      characters are interpreted according to the platform's default character
+      set. This is fine as long as the document being read has been written on
+      a platform with the same default character set. However, if you receive a
+      document from another region of the world and want to process it with
+      HPSF you are in trouble - unless the creator used Unicode, of course.</p>
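+
+     <p>An application that reads raw 8-bit string bytes itself could avoid
+      the platform dependency by honoring the codepage property. The sketch
+      below is an assumption about how this might be done and does not
+      describe HPSF's current behavior. The class
+      <code>CodepageDecoding</code> is hypothetical, and it covers only the
+      two codepage numbers mentioned in this document, mapped to the Java
+      charsets <code>windows-1252</code> and <code>UTF-16LE</code>.</p>
+
+     <source><![CDATA[
```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper: decodes string bytes according to a codepage number. */
final class CodepageDecoding {

    /** Maps the two codepage numbers named in this document to Java charsets. */
    static Charset charsetFor(int codepage) {
        switch (codepage) {
            case 1200:
                return StandardCharsets.UTF_16LE;
            case 1252:
                return Charset.forName("windows-1252");
            default:
                throw new IllegalArgumentException("Unsupported codepage: " + codepage);
        }
    }

    /** Decodes the bytes with an explicit charset instead of the platform default. */
    static String decode(byte[] bytes, int codepage) {
        return new String(bytes, charsetFor(codepage));
    }
}
```
+     ]]></source>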
+    </section>
+
+    <section><title>Further Reading</title>
+     <p>There are still some aspects of HPSF left which are not covered by this
+      HOW-TO. You should dig into the Javadoc API documentation to learn
+      further details. Since you've struggled through this document up to this
+      point, you are well prepared.</p>
+    </section>
+   </section>
+  </section>
+ </body>
+</document>
+
+<!-- Keep this comment at the end of the file
+Local variables:
+mode: xml
+sgml-omittag:nil
+sgml-shorttag:nil
+sgml-namecase-general:nil
+sgml-general-insert-case:lower
+sgml-minimize-attributes:nil
+sgml-always-quote-attributes:t
+sgml-indent-step:1
+sgml-indent-data:t
+sgml-parent-document:nil
+sgml-exposed-tags:nil
+sgml-local-catalogs:nil
+sgml-local-ecat-files:nil
+End:
+-->
diff --git a/src/documentation/content/xdocs/hpsf/index.xml b/src/documentation/content/xdocs/hpsf/index.xml
new file mode 100644
index 0000000000..b0e52c86d8
--- /dev/null
+++ b/src/documentation/content/xdocs/hpsf/index.xml
@@ -0,0 +1,54 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<!DOCTYPE document PUBLIC "-//APACHE//DTD Documentation V1.1//EN" "../dtd/document-v11.dtd">
+<!-- $Id$ -->
+
+<document>
+ <header>
+  <title>HPSF (Horrible Property Set Format)</title>
+  <subtitle>Overview</subtitle>
+  <authors>     
+   <person name="Rainer Klute" email="klute@apache.org"/>
+  </authors>
+ </header>
+ <body>
+  <section><title>Overview</title>
+   <p>Microsoft applications like "Word", "Excel" or "Powerpoint" let the user
+    describe his document by properties like "title", "category" and so on. The
+    application itself adds further information: last author, creation date
+    etc. These document properties are stored in so-called <strong>property set
+     streams</strong>. A property set stream is a separate document within a
+    <link href="../poifs/index.html">POI filesystem</link>. We'll call property
+    set streams mostly just "property sets". HPSF is POI's pure-Java
+    implementation to read (and in future to write) property sets.</p>
+
+   <p>The <link href="how-to.html">HPSF HOWTO</link> describes what a Java
+    application should do to read a property set using HPSF and to retrieve the
+    information it needs.</p>
+
+   <p>HPSF supports OLE2 property set streams in general, and is not limited to
+    the special case of document properties in the Microsoft Office files
+    mentioned above. The <link href="internals.html">HPSF description</link>
+    describes the internal structure of property set streams. A separate
+    document explains the internals of <link href="thumbnails.html">thumbnail
+     images</link>.</p>
+  </section>
+ </body>
+</document>
+
+<!-- Keep this comment at the end of the file
+Local variables:
+mode: xml
+sgml-omittag:nil
+sgml-shorttag:nil
+sgml-namecase-general:nil
+sgml-general-insert-case:lower
+sgml-minimize-attributes:nil
+sgml-always-quote-attributes:t
+sgml-indent-step:1
+sgml-indent-data:t
+sgml-parent-document:nil
+sgml-exposed-tags:nil
+sgml-local-catalogs:nil
+sgml-local-ecat-files:nil
+End:
+-->
diff --git a/src/documentation/content/xdocs/hpsf/internals.xml b/src/documentation/content/xdocs/hpsf/internals.xml
new file mode 100644
index 0000000000..b4792dcd33
--- /dev/null
+++ b/src/documentation/content/xdocs/hpsf/internals.xml
@@ -0,0 +1,1010 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<!DOCTYPE document PUBLIC "-//APACHE//DTD Documentation V1.1//EN" "../dtd/document-v11.dtd">
+<!-- $Id$ -->
+
+<document>
+  <header>
+    <title>HPSF Internals: The Horrible Property Set Format</title>
+    <authors>
+      <person name="Rainer Klute" email="klute@rainer-klute.de"/>
+    </authors>
+  </header>
+  <body>
+    <section><title>HPSF Internals</title>
+
+    <section><title>Introduction</title>
+
+    <p>A Microsoft Office document is internally organized like a filesystem
+     with directories and files. Microsoft calls these files
+     <strong>streams</strong>. A document can have properties attached to it,
+     like author, title, number of words etc. These metadata are not stored in
+     the main stream of, say, a Word document, but instead in a dedicated
+     stream with a special format. Usually this stream's name is
+     <code>\005SummaryInformation</code>, where <code>\005</code> represents
+     the character with a decimal value of 5.</p>
+
+    <p>A single piece of information in the stream is called a
+     <strong>property</strong>, for example the document title. Each property
+     has an integral <strong>ID</strong> (e.g. 2 for title), a
+     <strong>type</strong> (telling that the title is a string of bytes) and a
+     <strong>value</strong> (what this is should be obvious). A stream
+     containing properties is called a
+     <strong>property set stream</strong>.</p>
+
+    <p>This document describes the internal structure of a property set stream,
+     i.e. the <strong>Horrible Property Set Format (HPSF)</strong>. It does not
+     describe how a Microsoft Office document is organized internally and how
+     to retrieve a stream from it. See the <link
+      href="../poifs/index.html">POIFS documentation</link> for that kind of
+     stuff.</p>
+
+    <p>The Horrible Property Set Format is not only used in the Summary
+     Information stream in the top-level document of a Microsoft Office
+     document. Often there is also a property set stream named
+     <code>\005DocumentSummaryInformation</code> with additional properties.
+     Embedded documents may have their own property set streams. You cannot
+     tell by a stream's name whether it is a property set stream or not.
+     Instead you have to open the stream and look at its bytes.</p>
+   </section>
+
+
+
+    <section><title>Data Types</title>
+
+    <p>Before delving into the details of the property set stream format we
+     have to have a short look at data types. Integral values are stored in the
+     so-called <strong>little endian</strong> format. In this format the bytes
+     that make up an integral value are stored in the "wrong" order. For
+     example, the decimal value 4660 is 0x1234 in the hexadecimal notation. If
+     you think this should be represented by a byte 0x12 followed by another
+     byte 0x34, you are right. This is called the <strong>big endian</strong>
+     format. In the little endian format, however, this order is reversed and
+     the low-value byte comes first: 0x3412.
+    </p>
+
+    <p>The following table gives an overview of some important data
+     types:</p>
+
+    <table>
+
+     <tr>
+      <th>Name</th>
+      <th>Length</th>
+      <th>Example (Little Endian)</th>
+      <th>Example (Big Endian)</th>
+     </tr>
+
+     <tr>
+      <td><strong>Byte</strong></td>
+      <td>1 byte</td>
+      <td><code>0x12</code></td>
+      <td><code>0x12</code></td>
+     </tr>
+
+     <tr>
+      <td><strong>Word</strong></td>
+      <td>2 bytes</td>
+      <td><code>0x1234</code></td>
+      <td><code>0x3412</code></td>
+     </tr>
+
+     <tr>
+      <td><strong>DWord</strong></td>
+      <td>4 bytes</td>
+      <td><code>0x12345678</code></td>
+      <td><code>0x78563412</code></td>
+     </tr>
+
+     <tr>
+      <td><strong>ClassID</strong><br/>
+       A sequence of one DWord, two Words and eight Bytes</td>
+
+      <td>16 bytes</td>
+
+      <td><code>0xE0859FF2F94F6810AB9108002B27B3D9</code> resp.
+	<code>E0859FF2-F94F-6810-AB-91-08-00-2B-27-B3-D9</code></td>
+
+      <td><code>0xF29F85E04FF91068AB9108002B27B3D9</code> resp.
+	<code>F29F85E0-4FF9-1068-AB-91-08-00-2B-27-B3-D9</code></td>
+     </tr>
+
+     <tr>
+      <td></td>
+      <td></td>
+      <td>The ClassID examples are given here in two different notations. The
+	second notation without the "0x" at the beginning and with dashes
+	inside shows the internal grouping into one DWord, two Words and eight
+	Bytes.</td>
+      <td><em>Watch out:</em> Microsoft documentation and tools show class IDs
+	a little bit differently like
+	<code>F29F85E0-4FF9-1068-AB91-08002B27B3D9</code>.
+	However, that representation is (intentionally?) misleading with
+	respect to endianness.</td>
+     </tr>
+    </table>
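+
+    <p>The ClassID layout from the table can be turned into the dashed
+     notation used by Microsoft tools with a few lines of Java. This is a
+     sketch under the assumption that the 16 bytes are given in the order
+     shown in the "Little Endian" column; the class
+     <code>ClassIdFormat</code> is hypothetical and not part of HPSF.</p>
+
+    <source><![CDATA[
```java
/** Hypothetical helper: formats 16 stored ClassID bytes in the Microsoft notation. */
final class ClassIdFormat {

    static String format(byte[] raw) {
        if (raw.length != 16) {
            throw new IllegalArgumentException("A ClassID has exactly 16 bytes");
        }
        StringBuilder sb = new StringBuilder(36);
        appendReversed(sb, raw, 0, 4);   // one DWord, stored low byte first
        sb.append('-');
        appendReversed(sb, raw, 4, 2);   // first Word
        sb.append('-');
        appendReversed(sb, raw, 6, 2);   // second Word
        sb.append('-');
        for (int i = 8; i < 16; i++) {   // eight single bytes, order unchanged
            sb.append(String.format("%02X", raw[i] & 0xFF));
            if (i == 9) {
                sb.append('-');
            }
        }
        return sb.toString();
    }

    /** Appends the bytes of one little-endian field with the high byte first. */
    private static void appendReversed(StringBuilder sb, byte[] raw, int start, int length) {
        for (int i = start + length - 1; i >= start; i--) {
            sb.append(String.format("%02X", raw[i] & 0xFF));
        }
    }
}
```
+    ]]></source>
+
+    <p>Only the DWord and the two Words are byte-swapped. The trailing eight
+     bytes keep their stored order.</p>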
+   </section>
+
+
+
+   <section><title>HPSF Overview</title>
+
+    <p>A property set stream consists of three main parts:</p>
+
+    <ol>
+     <li>the <strong>header</strong>,</li>
+     <li>the <strong>section list</strong>, and</li>
+     <li>the <strong>section(s)</strong> containing the properties.</li>
+    </ol>
+   </section>
+
+
+
+   <section><title>The Header</title>
+
+    <p>The first bytes in a property set stream make up the <strong>header</strong>.
+     It has a fixed length and looks like this:</p>
+
+    <table>
+      <tr>
+       <th>Offset</th>
+       <th>Type</th>
+       <th>Contents</th>
+       <th>Remarks</th>
+      </tr>
+
+      <tr>
+       <td>0</td>
+       <td>Word</td>
+       <td><code>0xFFFE</code></td>
+       <td>If the first four bytes of a stream do not contain these values, the
+	 stream is not a property set stream.</td>
+      </tr>
+
+      <tr>
+       <td>2</td>
+       <td>Word</td>
+       <td><code>0x0000</code></td>
+      <td></td>
+      </tr>
+
+      <tr>
+       <td>4</td>
+       <td>DWord</td>
+       <td>Denotes the operating system and the OS version under which this
+	 stream was created. The operating system ID is in the DWord's higher
+	 word (after little endian decoding): <code>0x0000</code> for Win16,
+	 <code>0x0001</code> for Macintosh and <code>0x0002</code> for Win32 - that's
+	 all. The reader is most likely aware of the fact that there are some
+	 more operating systems. However, Microsoft does not seem to know.</td>
+      <td></td>
+      </tr>
+
+      <tr>
+       <td>8</td>
+       <td>ClassID</td>
+       <td><code>0x00000000000000000000000000000000</code></td>
+       <td>Most property set streams have this value but this is not
+	 required.</td>
+      </tr>
+
+      <tr>
+       <td>24</td>
+       <td>DWord</td>
+       <td><code>0x01000000</code> or greater</td>
+       <td>Section count. This field's value should be equal to 1 or greater.
+	 Microsoft claims that this is a "reserved" field, but it seems to tell
+	 how many sections (see below) are following in the stream. This would
+	 really make sense because otherwise you could not know where and how
+	 far you should read section data.</td>
+     </tr>
+    </table>
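+
+    <p>Reading the header fields from the table could look like the
+     following sketch. It assumes that <code>0xFFFE</code> in the first row
+     is the decoded value, i.e. that the stream starts with the bytes
+     <code>FE FF</code>, and that the section count is stored as a
+     little-endian DWord. The class <code>HeaderSketch</code> and its field
+     names are hypothetical.</p>
+
+    <source><![CDATA[
```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/** Hypothetical reader for the fixed-length header of a property set stream. */
final class HeaderSketch {

    /** Last field starts at offset 24 and is 4 bytes long. */
    static final int LENGTH = 28;

    final int osId;
    final int osVersion;
    final long sectionCount;

    private HeaderSketch(int osId, int osVersion, long sectionCount) {
        this.osId = osId;
        this.osVersion = osVersion;
        this.sectionCount = sectionCount;
    }

    static HeaderSketch parse(byte[] stream) {
        if (stream.length < LENGTH) {
            throw new IllegalArgumentException("Stream too short for a header");
        }
        ByteBuffer buf = ByteBuffer.wrap(stream).order(ByteOrder.LITTLE_ENDIAN);
        int byteOrder = buf.getShort(0) & 0xFFFF;
        int format = buf.getShort(2) & 0xFFFF;
        if (byteOrder != 0xFFFE || format != 0x0000) {
            throw new IllegalArgumentException("Not a property set stream");
        }
        int os = buf.getInt(4);
        // The ClassID at offsets 8 to 23 is skipped in this sketch.
        long sectionCount = buf.getInt(24) & 0xFFFFFFFFL;
        // The OS ID is the higher word of the decoded DWord, the version the lower.
        return new HeaderSketch(os >>> 16, os & 0xFFFF, sectionCount);
    }
}
```
+    ]]></source>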
+   </section>
+
+
+
+   <section><title>Section List</title>
+
+    <p>Following the header is the section list. This is an array of pairs each
+     consisting of a section format ID and an offset. This array has as many
+     pairs of ClassID and DWord fields as the section count field in the
+     header says. The Summary Information stream contains a single section, the
+     Document Summary Information stream contains two.</p>
+
+    <table>
+      <tr>
+       <th>Type</th>
+       <th>Contents</th>
+       <th>Remarks</th>
+      </tr>
+
+      <tr>
+       <td>ClassID</td>
+       <td>Section format ID</td>
+       <td><code>0xF29F85E04FF91068AB9108002B27B3D9</code> for the single section
+	 in the Summary Information stream.<br/><br/>
+
+	<code>0xD5CDD5022E9C101B939708002B2CF9AE</code> for the first
+	 section in the Document Summary Information stream.</td>
+      </tr>
+
+      <tr>
+       <td>DWord</td>
+       <td>Offset</td>
+       <td>The number of bytes between the beginning of the stream and the
+	 beginning of the section within the stream.</td>
+      </tr>
+
+      <tr>
+       <td>ClassID</td>
+       <td>Section format ID</td>
+       <td>...</td>
+      </tr>
+
+      <tr>
+       <td>DWord</td>
+       <td>Offset</td>
+       <td>...</td>
+      </tr>
+
+      <tr>
+       <td>...</td>
+       <td>...</td>
+       <td>...</td>
+      </tr>
+    </table>
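+
+    <p>The section list can be walked as an array of 20-byte entries. The
+     sketch below assumes that the list starts directly after the header at
+     offset 28 (the header's last field starts at offset 24 and is four
+     bytes long) and that the offsets are little-endian DWords. The names
+     <code>SectionListSketch</code> and <code>Entry</code> are
+     hypothetical.</p>
+
+    <source><![CDATA[
```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical reader for the section list that follows the header. */
final class SectionListSketch {

    static final int HEADER_LENGTH = 28;

    /** One pair of section format ID and section offset. */
    static final class Entry {
        final byte[] formatId;
        final long offset;

        Entry(byte[] formatId, long offset) {
            this.formatId = formatId;
            this.offset = offset;
        }
    }

    static List<Entry> parse(byte[] stream, int sectionCount) {
        ByteBuffer buf = ByteBuffer.wrap(stream).order(ByteOrder.LITTLE_ENDIAN);
        buf.position(HEADER_LENGTH);
        List<Entry> entries = new ArrayList<>();
        for (int i = 0; i < sectionCount; i++) {
            byte[] formatId = new byte[16];
            buf.get(formatId);                         // ClassID, kept as stored
            long offset = buf.getInt() & 0xFFFFFFFFL;  // counted from stream start
            entries.add(new Entry(formatId, offset));
        }
        return entries;
    }
}
```
+    ]]></source>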
+   </section>
+
+
+
+   <section><title>Section</title>
+
+    <p>A section is divided into three parts: the section header (with the
+     section length and the number of properties in the section), the
+     properties list (with type and offset of each property), and the
+     properties themselves. Here are the details:</p>
+
+    <table>
+      <tr>
+       <th>&nbsp;</th>
+       <th>Type</th>
+       <th>Contents</th>
+       <th>Remarks</th>
+      </tr>
+
+      <tr>
+       <td>Section header</td>
+
+       <td>DWord</td>
+       <td>Length</td>
+       <td>The length of the section in bytes.</td>
+      </tr>
+
+     <tr>
+      <td></td>
+      <td>DWord</td>
+       <td>Property count</td>
+       <td>The number of properties in the section.</td>
+      </tr>
+
+      <tr>
+
+       <td>Properties list</td>
+
+       <td>DWord</td>
+       <td>Property ID</td>
+       <td>The property ID tells what the property means. For example, an ID of
+	 <code>0x0002</code> in the Summary Information stands for the document's
+	title. See the <link href="#property_ids">Property IDs</link>
+	 chapter below for more details.</td>
+      </tr>
+
+      <tr>
+      <td></td>
+       <td>DWord</td>
+       <td>Offset</td>
+       <td>The number of bytes between the beginning of the section and the
+	 property.</td>
+      </tr>
+
+      <tr>
+      <td></td>
+       <td>...</td>
+       <td>...</td>
+       <td>...</td>
+      </tr>
+
+      <tr>
+       <td>Properties</td>
+
+       <td>DWord</td>
+       <td>Property type ("variant")</td>
+       <td>This is the property's data type, e.g. an integer value,  a byte
+	 string or a Unicode string. See the
+	 <link href="#property_types"><em>Property Types</em></link> chapter for
+	 details!</td>
+      </tr>
+
+      <tr>
+      <td></td>
+       <td><em>Field length depends on the property type
+	  ("variant")</em></td>
+       <td>Property value</td>
+       <td>This field's length depends on the property's type. These are the
+	 bytes that make up the DWord, the byte string or some other data of
+	 fixed or variable length.<br/><br/>
+
+	 The property value is always stored in an area whose length is a
+	 multiple of 4. If the value is shorter, e.g. a byte string of 13
+	 bytes, the remaining bytes are padded with <code>0x00</code>
+	 bytes.</td>
+      </tr>
+
+      <tr>
+      <td></td>
+       <td>...</td>
+       <td>...</td>
+       <td>...</td>
+      </tr>
+    </table>
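+
+    <p>The layout in the table translates into two small routines: one that
+     follows the properties list to each property's type field, and one that
+     computes the padded size of a value. This is a sketch under the
+     assumptions that all DWords are little-endian and that each properties
+     list entry is eight bytes long (two DWords). <code>SectionSketch</code>
+     and its method names are hypothetical.</p>
+
+    <source><![CDATA[
```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical reader for a single section of a property set stream. */
final class SectionSketch {

    /** Rounds a value length up to the next multiple of 4 (the padding rule). */
    static int paddedLength(int length) {
        return (length + 3) & ~3;
    }

    /** Returns the property type ("variant") for each property ID in the section. */
    static Map<Long, Long> propertyTypes(byte[] stream, int sectionOffset) {
        ByteBuffer buf = ByteBuffer.wrap(stream).order(ByteOrder.LITTLE_ENDIAN);
        // The section length DWord at sectionOffset is not needed here.
        int count = buf.getInt(sectionOffset + 4);
        Map<Long, Long> types = new LinkedHashMap<>();
        for (int i = 0; i < count; i++) {
            int entry = sectionOffset + 8 + i * 8;
            long id = buf.getInt(entry) & 0xFFFFFFFFL;
            // Property offsets are counted from the beginning of the section.
            int propertyOffset = buf.getInt(entry + 4);
            long type = buf.getInt(sectionOffset + propertyOffset) & 0xFFFFFFFFL;
            types.put(id, type);
        }
        return types;
    }
}
```
+    ]]></source>
+
+    <p>With this rule a byte string of 13 bytes occupies 16 bytes in the
+     stream.</p>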
+   </section>
+
+
+
+   <section><title>Property IDs</title>
+    <anchor id="property_ids"/>
+
+    <p>As seen above, a section holds a property list: an array with property
+     IDs and offsets. The property ID gives each property a meaning. For
+     example, in the Summary Information stream the property ID 2 says that
+     this property is the document's title.</p>
+
+    <p>If you want to know a property ID's meaning, it is not sufficient to
+     know the ID itself. You must also know the
+     <strong>section format ID</strong>.  For example, in the Document Summary
+     Information stream the property ID 2 means not the document's title but
+     its category. Due to Microsoft's infinite wisdom the section format ID is
+     not part of the section. Thus if you have only a section without the
+     stream it is in, you cannot make any sense of the properties because you
+     do not know what they mean.</p>
+
+    <p>So each section format ID has its own name space of property IDs.
+     Microsoft defined some "well-known" property IDs for the Summary
+     Information and the Document Summary Information streams. You can extend
+     them by your own additional IDs. This will be described below.</p>
+
+    <section><title>Property IDs in The Summary Information Stream</title>
+
+     <p>The Summary Information stream has a single section with a section
+      format ID of <code>0xF29F85E04FF91068AB9108002B27B3D9</code>. The following
+      table defines the meaning of its property IDs. Each row associates a
+      property ID with a <em>name</em> and an <em>ID string</em>. (The property
+      <em>type</em> is given here for informational purposes only. As we have
+      seen above, the type is always given along with the value.)</p>
+
+     <p>The property <em>name</em> is a readable string which could be
+      displayed to the user. However, this string is useful only for users who
+      understand English. The property name does not help with other
+      languages.</p>
+
+     <p>The property <em>ID string</em> is about the same but looks more
+      technical and is nothing a user should bother with. You could take the ID
+      string and map it to an appropriate display string in a particular
+      language. Of course you could do that with the property ID as well and
+      with less overhead, but people (including software developers) tend to be
+      better at remembering symbolic constants than remembering numbers.</p>
+
+     <table>
+      <tr>
+       <th>Property ID</th>
+       <th>Property Name</th>
+       <th>Property ID String</th>
+       <th>Property Type</th>
+      </tr>
+      <tr>
+       <td>2</td>
+       <td>Title</td>
+       <td>PID_TITLE</td>
+       <td>VT_LPSTR</td>
+      </tr>
+       <tr>
+	<td>3</td>
+	<td>Subject</td>
+	<td>PID_SUBJECT</td>
+	<td>VT_LPSTR</td>
+       </tr>
+       <tr>
+	<td>4</td>
+	<td>Author</td>
+	<td>PID_AUTHOR</td>
+	<td>VT_LPSTR</td>
+       </tr>
+       <tr>
+	<td>5</td>
+	<td>Keywords</td>
+	<td>PID_KEYWORDS</td>
+	<td>VT_LPSTR</td>
+       </tr>
+       <tr>
+	<td>6</td>
+	<td>Comments</td>
+	<td>PID_COMMENTS</td>
+	<td>VT_LPSTR</td>
+       </tr>
+       <tr>
+	<td>7</td>
+	<td>Template</td>
+	<td>PID_TEMPLATE</td>
+	<td>VT_LPSTR</td>
+       </tr>
+       <tr>
+	<td>8</td>
+	<td>Last Saved By</td>
+	<td>PID_LASTAUTHOR</td>
+	<td>VT_LPSTR</td>
+       </tr>
+       <tr>
+	<td>9</td>
+	<td>Revision Number</td>
+	<td>PID_REVNUMBER</td>
+	<td>VT_LPSTR</td>
+       </tr>
+       <tr>
+	<td>10</td>
+	<td>Total Editing Time</td>
+	<td>PID_EDITTIME</td>
+	<td>VT_FILETIME</td>
+       </tr>
+       <tr>
+	<td>11</td>
+	<td>Last Printed</td>
+	<td>PID_LASTPRINTED</td>
+	<td>VT_FILETIME</td>
+       </tr>
+       <tr>
+	<td>12</td>
+	<td>Create Time/Date</td>
+	<td>PID_CREATE_DTM</td>
+	<td>VT_FILETIME</td>
+       </tr>
+       <tr>
+	<td>13</td>
+	<td>Last Saved Time/Date</td>
+	<td>PID_LASTSAVE_DTM</td>
+	<td>VT_FILETIME</td>
+       </tr>
+       <tr>
+	<td>14</td>
+	<td>Number of Pages</td>
+	<td>PID_PAGECOUNT</td>
+	<td>VT_I4</td>
+       </tr>
+       <tr>
+	<td>15</td>
+	<td>Number of Words</td>
+	<td>PID_WORDCOUNT</td>
+	<td>VT_I4</td>
+       </tr>
+       <tr>
+	<td>16</td>
+	<td>Number of Characters</td>
+	<td>PID_CHARCOUNT</td>
+	<td>VT_I4</td>
+       </tr>
+       <tr>
+	<td>17</td>
+	<td>Thumbnail</td>
+	<td>PID_THUMBNAIL</td>
+	<td>VT_CF</td>
+       </tr>
+       <tr>
+	<td>18</td>
+	<td>Name of Creating Application</td>
+	<td>PID_APPNAME</td>
+	<td>VT_LPSTR</td>
+       </tr>
+       <tr>
+	<td>19</td>
+	<td>Security</td>
+	<td>PID_SECURITY</td>
+	<td>VT_I4</td>
+      </tr>
+     </table>
+    </section>
+
+
+
+    <section><title>Property IDs in The Document Summary Information Stream</title>
+
+     <p>The Document Summary Information stream has two sections with a section
+      format ID of <code>0xD5CDD5022E9C101B939708002B2CF9AE</code> for the first
+      one.  The following table defines the meaning of the property IDs in the
+      first section. See the preceding section for interpreting the table.</p>
+
+     <table>
+       <tr>
+	<th>Property ID</th>
+	<th>Property name</th>
+	<th>Property ID string</th>
+	<th>VT type</th>
+       </tr>
+
+       <tr>
+	<td>0</td>
+	<td>Dictionary</td>
+	<td>PID_DICTIONARY</td>
+	<td>[Special format]</td>
+       </tr>
+       <tr>
+	<td>1</td>
+	<td>Code page</td>
+	<td>PID_CODEPAGE</td>
+	<td>VT_I2</td>
+       </tr>
+       <tr>
+	<td>2</td>
+	<td>Category</td>
+	<td>PID_CATEGORY</td>
+	<td>VT_LPSTR</td>
+       </tr>
+       <tr>
+	<td>3</td>
+	<td>PresentationTarget</td>
+	<td>PID_PRESFORMAT</td>
+	<td>VT_LPSTR</td>
+       </tr>
+       <tr>
+	<td>4</td>
+	<td>Bytes</td>
+	<td>PID_BYTECOUNT</td>
+	<td>VT_I4</td>
+       </tr>
+       <tr>
+	<td>5</td>
+	<td>Lines</td>
+	<td>PID_LINECOUNT</td>
+	<td>VT_I4</td>
+       </tr>
+       <tr>
+	<td>6</td>
+	<td>Paragraphs</td>
+	<td>PID_PARCOUNT</td>
+	<td>VT_I4</td>
+       </tr>
+       <tr>
+	<td>7</td>
+	<td>Slides</td>
+	<td>PID_SLIDECOUNT</td>
+	<td>VT_I4</td>
+       </tr>
+       <tr>
+	<td>8</td>
+	<td>Notes</td>
+	<td>PID_NOTECOUNT</td>
+	<td>VT_I4</td>
+       </tr>
+       <tr>
+	<td>9</td>
+	<td>HiddenSlides</td>
+	<td>PID_HIDDENCOUNT</td>
+	<td>VT_I4</td>
+       </tr>
+       <tr>
+	<td>10</td>
+	<td>MMClips</td>
+	<td>PID_MMCLIPCOUNT</td>
+	<td>VT_I4</td>
+       </tr>
+       <tr>
+	<td>11</td>
+	<td>ScaleCrop</td>
+	<td>PID_SCALE</td>
+	<td>VT_BOOL</td>
+       </tr>
+       <tr>
+	<td>12</td>
+	<td>HeadingPairs</td>
+	<td>PID_HEADINGPAIR</td>
+	<td>VT_VARIANT | VT_VECTOR</td>
+       </tr>
+       <tr>
+	<td>13</td>
+	<td>TitlesofParts</td>
+	<td>PID_DOCPARTS</td>
+	<td>VT_LPSTR | VT_VECTOR</td>
+       </tr>
+       <tr>
+	<td>14</td>
+	<td>Manager</td>
+	<td>PID_MANAGER</td>
+	<td>VT_LPSTR</td>
+       </tr>
+       <tr>
+	<td>15</td>
+	<td>Company</td>
+	<td>PID_COMPANY</td>
+	<td>VT_LPSTR</td>
+       </tr>
+       <tr>
+	<td>16</td>
+	<td>LinksUpToDate</td>
+	<td>PID_LINKSDIRTY</td>
+	<td>VT_BOOL</td>
+       </tr>
+     </table>
+    </section>
+   </section>
+
+
+
+   <section><title>Property Types</title>
+    <anchor id="property_types"/>
+
+    <p>A property consists of a DWord <em>type field</em> followed by the
+     property value. The property type is an integer value and tells how the
+     data bytes following it are to be interpreted. In the Microsoft world it is
+     also known as the <em>variant</em>.</p>
+
+    <p>The <em>Usage</em> column says where a variant type may occur. Not all
+     of them are allowed in a property set but just those marked with a [P].
+     <strong>[V]</strong> - may appear in a VARIANT, <strong>[T]</strong> - may
+     appear in a TYPEDESC, <strong>[P]</strong> - may appear in an OLE property
+     set, <strong>[S]</strong> - may appear in a Safe Array.</p>
+
+    <table>
+      <tr>
+       <th>Variant ID</th>
+       <th>Variant Type</th>
+       <th>Usage</th>
+       <th>Description</th>
+      </tr>
+      <tr>
+       <td>0</td>
+       <td>VT_EMPTY</td>
+       <td>[V] [P]</td>
+       <td>nothing</td>
+      </tr>
+      <tr>
+       <td>1</td>
+       <td>VT_NULL</td>
+       <td>[V] [P]</td>
+       <td>SQL style Null</td>
+      </tr>
+      <tr>
+       <td>2</td>
+       <td>VT_I2</td>
+       <td>[V] [T] [P] [S]</td>
+       <td>2 byte signed int</td>
+      </tr>
+      <tr>
+       <td>3</td>
+       <td>VT_I4</td>
+       <td>[V] [T] [P] [S]</td>
+       <td>4 byte signed int</td>
+      </tr>
+      <tr>
+       <td>4</td>
+       <td>VT_R4</td>
+       <td>[V] [T] [P] [S]</td>
+       <td>4 byte real</td>
+      </tr>
+      <tr>
+       <td>5</td>
+       <td>VT_R8</td>
+       <td>[V] [T] [P] [S]</td>
+       <td>8 byte real</td>
+      </tr>
+      <tr>
+       <td>6</td>
+       <td>VT_CY</td>
+       <td>[V] [T] [P] [S]</td>
+       <td>currency</td>
+      </tr>
+      <tr>
+       <td>7</td>
+       <td>VT_DATE</td>
+       <td>[V] [T] [P] [S]</td>
+       <td>date</td>
+      </tr>
+      <tr>
+       <td>8</td>
+       <td>VT_BSTR</td>
+       <td>[V] [T] [P] [S]</td>
+       <td>OLE Automation string</td>
+      </tr>
+      <tr>
+       <td>9</td>
+       <td>VT_DISPATCH</td>
+       <td>[V] [T] [P] [S]</td>
+       <td>IDispatch *</td>
+      </tr>
+      <tr>
+       <td>10</td>
+       <td>VT_ERROR</td>
+       <td>[V] [T] [S]</td>
+       <td>SCODE</td>
+      </tr>
+      <tr>
+       <td>11</td>
+       <td>VT_BOOL</td>
+       <td>[V] [T] [P] [S]</td>
+       <td>True=-1, False=0</td>
+      </tr>
+      <tr>
+       <td>12</td>
+       <td>VT_VARIANT</td>
+       <td>[V] [T] [P] [S]</td>
+       <td>VARIANT *</td>
+      </tr>
+      <tr>
+       <td>13</td>
+       <td>VT_UNKNOWN</td>
+       <td>[V] [T] [S]</td>
+       <td>IUnknown *</td>
+      </tr>
+      <tr>
+       <td>14</td>
+       <td>VT_DECIMAL</td>
+       <td>[V] [T] [S]</td>
+       <td>16 byte fixed point</td>
+      </tr>
+      <tr>
+       <td>16</td>
+       <td>VT_I1</td>
+       <td>[T]</td>
+       <td>signed char</td>
+      </tr>
+      <tr>
+       <td>17</td>
+       <td>VT_UI1</td>
+       <td>[V] [T] [P] [S]</td>
+       <td>unsigned char</td>
+      </tr>
+      <tr>
+       <td>18</td>
+       <td>VT_UI2</td>
+       <td>[T] [P]</td>
+       <td>unsigned short</td>
+      </tr>
+      <tr>
+       <td>19</td>
+       <td>VT_UI4</td>
+       <td>[T] [P]</td>
+       <td>unsigned long</td>
+      </tr>
+      <tr>
+       <td>20</td>
+       <td>VT_I8</td>
+       <td>[T] [P]</td>
+       <td>signed 64-bit int</td>
+      </tr>
+      <tr>
+       <td>21</td>
+       <td>VT_UI8</td>
+       <td>[T] [P]</td>
+       <td>unsigned 64-bit int</td>
+      </tr>
+      <tr>
+       <td>22</td>
+       <td>VT_INT</td>
+       <td>[T]</td>
+       <td>signed machine int</td>
+      </tr>
+      <tr>
+       <td>23</td>
+       <td>VT_UINT</td>
+       <td>[T]</td>
+       <td>unsigned machine int</td>
+      </tr>
+      <tr>
+       <td>24</td>
+       <td>VT_VOID</td>
+       <td>[T]</td>
+       <td>C style void</td>
+      </tr>
+      <tr>
+       <td>25</td>
+       <td>VT_HRESULT</td>
+       <td>[T]</td>
+       <td>Standard return type</td>
+      </tr>
+      <tr>
+       <td>26</td>
+       <td>VT_PTR</td>
+       <td>[T]</td>
+       <td>pointer type</td>
+      </tr>
+      <tr>
+       <td>27</td>
+       <td>VT_SAFEARRAY</td>
+       <td>[T]</td>
+       <td>(use VT_ARRAY in VARIANT)</td>
+      </tr>
+      <tr>
+       <td>28</td>
+       <td>VT_CARRAY</td>
+       <td>[T]</td>
+       <td>C style array</td>
+      </tr>
+      <tr>
+       <td>29</td>
+       <td>VT_USERDEFINED</td>
+       <td>[T]</td>
+       <td>user defined type</td>
+      </tr>
+      <tr>
+       <td>30</td>
+       <td>VT_LPSTR</td>
+       <td>[T] [P]</td>
+       <td>null terminated string</td>
+      </tr>
+      <tr>
+       <td>31</td>
+       <td>VT_LPWSTR</td>
+       <td>[T] [P]</td>
+       <td>wide null terminated string</td>
+      </tr>
+      <tr>
+       <td>64</td>
+       <td>VT_FILETIME</td>
+       <td>[P]</td>
+       <td>FILETIME</td>
+      </tr>
+      <tr>
+       <td>65</td>
+       <td>VT_BLOB</td>
+       <td>[P]</td>
+       <td>Length prefixed bytes</td>
+      </tr>
+      <tr>
+       <td>66</td>
+       <td>VT_STREAM</td>
+       <td>[P]</td>
+       <td>Name of the stream follows</td>
+      </tr>
+      <tr>
+       <td>67</td>
+       <td>VT_STORAGE</td>
+       <td>[P]</td>
+       <td>Name of the storage follows</td>
+      </tr>
+      <tr>
+       <td>68</td>
+       <td>VT_STREAMED_OBJECT</td>
+       <td>[P]</td>
+       <td>Stream contains an object</td>
+      </tr>
+      <tr>
+       <td>69</td>
+       <td>VT_STORED_OBJECT</td>
+       <td>[P]</td>
+       <td>Storage contains an object</td>
+      </tr>
+      <tr>
+       <td>70</td>
+       <td>VT_BLOB_OBJECT</td>
+       <td>[P]</td>
+       <td>Blob contains an object</td>
+      </tr>
+      <tr>
+       <td>71</td>
+       <td>VT_CF</td>
+       <td>[P]</td>
+       <td>Clipboard format</td>
+      </tr>
+      <tr>
+       <td>72</td>
+       <td>VT_CLSID</td>
+       <td>[P]</td>
+       <td>A Class ID</td>
+      </tr>
+      <tr>
+       <td>0x1000</td>
+       <td>VT_VECTOR</td>
+       <td>[P]</td>
+       <td>simple counted array</td>
+      </tr>
+      <tr>
+       <td>0x2000</td>
+       <td>VT_ARRAY</td>
+       <td>[V]</td>
+       <td>SAFEARRAY*</td>
+      </tr>
+      <tr>
+       <td>0x4000</td>
+       <td>VT_BYREF</td>
+       <td>[V]</td>
+       <td>void* for local use</td>
+      </tr>
+      <tr>
+       <td>0x8000</td>
+       <td>VT_RESERVED</td>
+       <td><br/></td>
+       <td><br/></td>
+      </tr>
+      <tr>
+       <td>0xFFFF</td>
+       <td>VT_ILLEGAL</td>
+       <td><br/></td>
+       <td><br/></td>
+      </tr>
+      <tr>
+       <td>0xFFF</td>
+       <td>VT_ILLEGALMASKED</td>
+       <td><br/></td>
+       <td><br/></td>
+      </tr>
+      <tr>
+       <td>0xFFF</td>
+       <td>VT_TYPEMASK</td>
+       <td><br/></td>
+       <td><br/></td>
+      </tr>
+    </table>
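+
+    <p>The values from <code>VT_VECTOR</code> onwards are bit flags that are
+     combined with one of the base types, and <code>VT_TYPEMASK</code> removes
+     them again. The following Java fragment is only a sketch of that
+     arithmetic. The class <code>VariantTypeSketch</code> and its methods are
+     hypothetical names chosen for illustration; they are not part of
+     HPSF.</p>
+
+    <source><![CDATA[
```java
/** Hypothetical helper, not part of HPSF: splits a raw variant type value. */
final class VariantTypeSketch {
    // Numeric values copied from the table above.
    static final int VT_LPSTR = 30;
    static final int VT_VECTOR = 0x1000;
    static final int VT_TYPEMASK = 0xFFF;

    /** Returns the base type, i.e. the raw value without modifier flags. */
    static int baseType(int rawType) {
        return rawType & VT_TYPEMASK;
    }

    /** Tells whether the VT_VECTOR flag is set in the raw value. */
    static boolean isVector(int rawType) {
        return (rawType & VT_VECTOR) != 0;
    }

    public static void main(String[] args) {
        // A counted array of null terminated strings: flag OR base type.
        int raw = VT_VECTOR | VT_LPSTR;
        System.out.println("base type: " + baseType(raw));
        System.out.println("is vector: " + isVector(raw));
    }
}
```
+]]></source>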
+   </section>
+
+
+
+   <section><title>References</title>
+
+    <p>To assemble the HPSF description I used only information publicly
+     available on the Internet. The references given below have been very
+     helpful. If you have any amendments or corrections, please let us know!
+     Thank you!</p>
+
+   <ol>
+
+    <li>In
+      <link href="http://www.kyler.com/pubs/ddj9894.html"><em>Understanding OLE
+	 documents</em></link>, Ken Kyler gives an introduction to OLE2
+      documents and especially to property sets. He lists the property names,
+      types, and IDs of the Summary Information and Document Summary
+      Information streams.</li>
+
+    <li>The
+      <link href="http://www.dwam.net/docs/oleref/"><em>ActiveX Programmer's
+	Reference</em></link> at
+      <link href="http://www.dwam.net/docs/oleref/">http://www.dwam.net/docs/oleref/</link>
+      seems a little outdated, but that's what I have found.</li>
+
+    <li>An overview of the <code>VT_</code> types is in
+      <link href="http://www.marin.clara.net/COM/variant_type_definitions.htm"><em>Variant
+	Type Definitions</em></link>.</li>
+
+    <li>What is a <code>FILETIME</code>? The answer can be found
+     under <link
+      href="http://msdn.microsoft.com/library/default.asp?url=/library/en-us/sysinfo/base/filetime_str.asp">http://msdn.microsoft.com/library/default.asp?url=/library/en-us/sysinfo/base/filetime_str.asp</link>, <link href="http://www.vbapi.com/ref/f/filetime.html">http://www.vbapi.com/ref/f/filetime.html</link> or
+      <link href="http://www.cs.rpi.edu/courses/fall01/os/FILETIME.html">http://www.cs.rpi.edu/courses/fall01/os/FILETIME.html</link>.
+      In short: <em>The FILETIME structure holds a date and time associated
+      with a file. The structure identifies a 64-bit integer specifying the
+      number of 100-nanosecond intervals which have passed since January 1,
+      1601. This 64-bit value is split into the two dwords stored in the
+      structure.</em></li>
+
+    <li>Information about the code page property in the
+     DocumentSummaryInformation stream is available at <link
+      href="http://msdn.microsoft.com/library/default.asp?url=/library/en-us/stg/stg/property_id_1.asp">http://msdn.microsoft.com/library/default.asp?url=/library/en-us/stg/stg/property_id_1.asp</link>.</li>
+
+    <li>This documentation originates from the <link href="http://www.rainer-klute.de/~klute/Software/poibrowser/doc/HPSF-Description.html">HPSF description</link> available at <link href="http://www.rainer-klute.de/~klute/Software/poibrowser/doc/HPSF-Description.html">http://www.rainer-klute.de/~klute/Software/poibrowser/doc/HPSF-Description.html</link>.</li>
+    </ol>
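+
+    <p>The <code>FILETIME</code> definition quoted in the list above can be
+     turned into a Java date roughly as follows. This is a sketch under stated
+     assumptions, not HPSF's implementation: the class and method names are
+     hypothetical, the two dwords are assumed to have been read from the
+     stream already, and the value is assumed to fit into a signed 64-bit
+     integer. The constant is the number of milliseconds between January 1,
+     1601 and January 1, 1970 (UTC).</p>
+
+    <source><![CDATA[
```java
import java.util.Date;

/** Hypothetical helper, not part of HPSF: converts a FILETIME to a Date. */
final class FiletimeSketch {
    /** Milliseconds from 1601-01-01 to 1970-01-01, both at 00:00 UTC. */
    static final long EPOCH_DIFF_MS = 11644473600000L;

    /**
     * Combines the two dwords into one 64-bit count of 100-nanosecond
     * intervals and converts it to milliseconds since the Java epoch.
     * Precision below one millisecond is dropped.
     */
    static Date filetimeToDate(int high, int low) {
        long filetime = ((long) high << 32) | (low & 0xFFFFFFFFL);
        long msSince1601 = filetime / 10000L; // 10,000 intervals per ms
        return new Date(msSince1601 - EPOCH_DIFF_MS);
    }

    public static void main(String[] args) {
        // The FILETIME value that corresponds to 1970-01-01 00:00 UTC.
        long epoch = EPOCH_DIFF_MS * 10000L;
        Date d = filetimeToDate((int) (epoch >>> 32), (int) epoch);
        System.out.println(d.getTime());
    }
}
```
+]]></source>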
+   </section>
+  </section>
+ </body>
+</document>
+
+<!-- Keep this comment at the end of the file
+Local variables:
+mode: xml
+sgml-omittag:nil
+sgml-shorttag:nil
+sgml-namecase-general:nil
+sgml-general-insert-case:lower
+sgml-minimize-attributes:nil
+sgml-always-quote-attributes:t
+sgml-indent-step:1
+sgml-indent-data:t
+sgml-parent-document:nil
+sgml-exposed-tags:nil
+sgml-local-catalogs:nil
+sgml-local-ecat-files:nil
+End:
+-->
diff --git a/src/documentation/content/xdocs/hpsf/thumbnails.xml b/src/documentation/content/xdocs/hpsf/thumbnails.xml
new file mode 100644
index 0000000000..c6d4bb19c3
--- /dev/null
+++ b/src/documentation/content/xdocs/hpsf/thumbnails.xml
@@ -0,0 +1,182 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<!DOCTYPE document PUBLIC "-//APACHE//DTD Documentation V1.1//EN"
+"../dtd/document-v11.dtd">
+<!-- $Id$ -->
+
+<document>
+ <header>
+  <title>HPSF THUMBNAIL HOW-TO</title>
+  <authors>
+   <person name="Drew Varner" email="Drew.Varner@-deleteThis-sc.edu" />
+  </authors>
+ </header>
+ <body>
+  <section><title>The VT_CF Format</title>
+
+   <p>Thumbnail information is stored as a VT_CF, or Thumbnail Variant. The
+    Thumbnail Variant is used to store various types of information in a
+    clipboard. The VT_CF can store information in formats for the Macintosh or
+    Windows clipboard.</p>
+
+   <p>There are many types of data that can be copied to the clipboard, but the
+    only types of information needed for thumbnail manipulation are the image
+    formats.</p>
+
+   <p>The <code>VT_CF</code> structure looks like this:</p>
+
+   <table>
+    <tr>
+     <th>Element:</th>
+     <td>Clipboard Size</td>
+     <td>Clipboard Format Tag</td>
+     <td>Clipboard Data</td>
+    </tr>
+    <tr>
+     <th>Size:</th>
+     <td>32 bit unsigned integer (DWord)</td>
+     <td>32 bit signed integer (DWord)</td>
+     <td>variable length (byte array)</td>
+    </tr>
+   </table>
+
+   <p>The Clipboard Size is the size (in bytes) of the Clipboard Data
+    (variable size) plus the Clipboard Format Tag (four bytes).</p>
+
+   <p>Clipboard Format Tag has four possible values:</p>
+
+   <table>
+    <tr>
+     <th>Value</th>
+     <th>Identifier</th>
+     <th>Description</th>
+    </tr>
+    <tr>
+     <td><code>-1L</code></td>
+     <td><code>CFTAG_WINDOWS</code></td>
+     <td>a built-in Windows&copy; clipboard format value</td>
+    </tr>
+    <tr>
+     <td><code>-2L</code></td>
+     <td><code>CFTAG_MACINTOSH</code></td>
+     <td>a Macintosh clipboard format value</td>
+    </tr>
+    <tr>
+     <td><code>-3L</code></td>
+     <td><code>CFTAG_FMTID</code></td>
+     <td>a format identifier (FMTID). This is rarely used.</td>
+    </tr>
+    <tr>
+     <td><code>0L</code></td>
+     <td><code>CFTAG_NODATA</code></td>
+     <td>No data. This is rarely used.</td>
+    </tr>
+   </table>
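+
+   <p>Reading the two fixed fields of a <code>VT_CF</code> value could look
+    like the following Java sketch. It is an illustration only: the class
+    <code>VtCfHeaderSketch</code> is a hypothetical name and not part of HPSF,
+    the byte array is assumed to start at the Clipboard Size field, and the
+    integers are assumed to be stored in little-endian byte order.</p>
+
+   <source><![CDATA[
```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/** Hypothetical helper, not part of HPSF: reads the fixed VT_CF fields. */
final class VtCfHeaderSketch {
    // Clipboard Format Tag values from the table above.
    static final int CFTAG_WINDOWS = -1;
    static final int CFTAG_MACINTOSH = -2;
    static final int CFTAG_FMTID = -3;
    static final int CFTAG_NODATA = 0;

    final long clipboardSize; // unsigned 32-bit value, kept in a long
    final int formatTag;      // signed 32-bit value

    /** Assumes vtCf starts at the Clipboard Size field (little-endian). */
    VtCfHeaderSketch(byte[] vtCf) {
        ByteBuffer buf = ByteBuffer.wrap(vtCf).order(ByteOrder.LITTLE_ENDIAN);
        clipboardSize = buf.getInt() & 0xFFFFFFFFL;
        formatTag = buf.getInt();
    }

    /** Length of the Clipboard Data: the size minus the four-byte tag. */
    long dataLength() {
        return clipboardSize - 4;
    }

    public static void main(String[] args) {
        // Size 6, tag CFTAG_WINDOWS, followed by two bytes of data.
        byte[] sample = {6, 0, 0, 0, (byte) 0xFF, (byte) 0xFF,
                         (byte) 0xFF, (byte) 0xFF, 1, 2};
        VtCfHeaderSketch header = new VtCfHeaderSketch(sample);
        System.out.println("tag: " + header.formatTag);
        System.out.println("data length: " + header.dataLength());
    }
}
```
+]]></source>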
+  </section>
+
+
+
+  <section><title>Windows Clipboard Data</title>
+
+   <p>Windows clipboard data has four image formats for thumbnails:</p>
+
+   <table>
+    <tr>
+     <th>Value</th>
+     <th>Identifier</th>
+     <th>Description</th>
+    </tr>
+    <tr>
+     <td>3</td>
+     <td><code>CF_METAFILEPICT</code></td>
+     <td>Windows metafile format - recommended</td>
+    </tr>
+    <tr>
+     <td>8</td>
+     <td><code>CF_DIB</code></td>
+     <td>Device Independent Bitmap</td>
+    </tr>
+    <tr>
+     <td>14</td>
+     <td><code>CF_ENHMETAFILE</code></td>
+     <td>Enhanced Windows metafile format</td>
+    </tr>
+    <tr>
+     <td>2</td>
+     <td><code>CF_BITMAP</code></td>
+     <td>Bitmap - Obsolete - Use <code>CF_DIB</code> instead</td>
+    </tr>
+   </table>
+  </section>
+
+  <section><title>Windows Metafile Format</title>
+
+   <p>The most common format for thumbnails on the Windows platform is the
+    Windows metafile format. The clipboard places an extra header in front of
+    the standard Windows Metafile Format data.</p>
+
+   <p>The Clipboard Data byte array looks like this when an image is stored in
+    the Windows clipboard WMF format:</p>
+
+   <table>
+    <tr>
+     <th>Identifier</th>
+     <td>CF_METAFILEPICT</td>
+     <td>mm</td>
+     <td>width</td>
+     <td>height</td>
+     <td>handle</td>
+     <td>WMF data</td>
+    </tr>
+    <tr>
+     <th>Size</th>
+     <td>32 bit unsigned int</td>
+     <td>16 bit unsigned(?) int</td>
+     <td>16 bit unsigned(?) int</td>
+     <td>16 bit unsigned(?) int</td>
+     <td>16 bit unsigned(?) int</td>
+     <td>byte array - variable length</td>
+    </tr>
+    <tr>
+     <th>Description</th>
+     <td>Clipboard WMF</td>
+     <td>Mapping Mode</td>
+     <td>Image Width</td>
+     <td>Image Height</td>
+     <td>handle to the WMF data array in memory, or 0</td>
+     <td>standard WMF byte stream</td>
+    </tr>
+   </table>
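+
+   <p>The layout above can be read with a few lines of Java. The following
+    is a sketch under assumptions, not HPSF code: the class
+    <code>ClipboardWmfSketch</code> is a hypothetical name, the byte order is
+    assumed to be little-endian, and the four 16 bit fields are read as
+    unsigned values although the table marks their signedness as
+    uncertain.</p>
+
+   <source><![CDATA[
```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Arrays;

/** Hypothetical helper, not part of HPSF: splits clipboard WMF data. */
final class ClipboardWmfSketch {
    static final int CF_METAFILEPICT = 3;

    /** One 32-bit identifier plus four 16-bit fields. */
    static final int HEADER_LENGTH = 4 + 4 * 2;

    /** Returns {mm, width, height, handle}; unsigned reading is assumed. */
    static int[] readHeader(byte[] clipboardData) {
        ByteBuffer buf =
            ByteBuffer.wrap(clipboardData).order(ByteOrder.LITTLE_ENDIAN);
        int format = buf.getInt();
        if (format != CF_METAFILEPICT) {
            throw new IllegalArgumentException("Not CF_METAFILEPICT: " + format);
        }
        int[] fields = new int[4];
        for (int i = 0; i < fields.length; i++) {
            fields[i] = buf.getShort() & 0xFFFF;
        }
        return fields;
    }

    /** Returns the standard WMF byte stream that follows the header. */
    static byte[] wmfData(byte[] clipboardData) {
        return Arrays.copyOfRange(clipboardData, HEADER_LENGTH,
                                  clipboardData.length);
    }

    public static void main(String[] args) {
        byte[] sample = {3, 0, 0, 0, 8, 0, 16, 0, 32, 0, 0, 0,
                         (byte) 0xAA, (byte) 0xBB};
        System.out.println(Arrays.toString(readHeader(sample)));
        System.out.println(wmfData(sample).length);
    }
}
```
+]]></source>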
+  </section>
+
+
+  <section><title>Device Independent Bitmap</title>
+   <p><strong>FIXME:</strong> Describe the Device Independent Bitmap
+    format!</p>
+  </section>
+
+
+
+  <section><title>Macintosh Clipboard Data</title>
+   <p><strong>FIXME:</strong> Describe the Macintosh clipboard formats!</p>
+  </section>
+
+ </body>
+</document>
+
+<!-- Keep this comment at the end of the file
+Local variables:
+mode: xml
+sgml-omittag:nil
+sgml-shorttag:nil
+sgml-namecase-general:nil
+sgml-general-insert-case:lower
+sgml-minimize-attributes:nil
+sgml-always-quote-attributes:t
+sgml-indent-step:1
+sgml-indent-data:t
+sgml-parent-document:nil
+sgml-exposed-tags:nil
+sgml-local-catalogs:nil
+sgml-local-ecat-files:nil
+End:
+-->
diff --git a/src/documentation/content/xdocs/hpsf/todo.xml b/src/documentation/content/xdocs/hpsf/todo.xml
new file mode 100644
index 0000000000..3c3ca4e7f3
--- /dev/null
+++ b/src/documentation/content/xdocs/hpsf/todo.xml
@@ -0,0 +1,65 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<!DOCTYPE document PUBLIC "-//APACHE//DTD Documentation V1.1//EN" "../dtd/document-v11.dtd">
+<!-- $Id$ -->
+
+<document>
+ <header>
+  <title>To Do</title>
+  <authors>
+   <person name="Rainer Klute" email="klute@rainer-klute.de"/>
+  </authors>
+ </header>
+ <body>
+  <section><title>To Do</title>
+
+   <p>The following features should be added to HPSF:</p>
+
+   <ol>
+    <li>
+     Add writing capability for property sets. At present, property sets can
+      only be read.
+    </li>
+    <li>
+     Add codepage support: Presently the bytes making up the string in a
+      property's value are interpreted using the platform's default character
+      set.
+    </li>
+    <li>
+      Add resource bundles to
+      <code>org.apache.poi.hpsf.wellknown</code> to ease
+      localizations. This would be useful for mapping standard property IDs to
+      localized strings. Example: The property ID 4 could be mapped to "Author"
+      in English or "Verfasser" in German.
+    </li>
+    <li>
+     Implement reading functionality for those property types that are not
+      yet supported. HPSF should return proper Java types instead of just byte
+      arrays.
+    </li>
+    <li>
+     Add example code for converting WMF to <code>java.awt.Image</code> to
+     the <link
+     href="thumbnails.html">Thumbnail
+     HOW-TO</link>.
+    </li>
+   </ol>
+  </section>
+ </body>
+</document>
+
+<!-- Keep this comment at the end of the file
+Local variables:
+mode: xml
+sgml-omittag:nil
+sgml-shorttag:nil
+sgml-namecase-general:nil
+sgml-general-insert-case:lower
+sgml-minimize-attributes:nil
+sgml-always-quote-attributes:t
+sgml-indent-step:1
+sgml-indent-data:t
+sgml-parent-document:nil
+sgml-exposed-tags:nil
+sgml-local-catalogs:nil
+sgml-local-ecat-files:nil
+End:
+-->