author Rainer Klute <klute@apache.org> 2002-03-06 09:03:53 +0000
committer Rainer Klute <klute@apache.org> 2002-03-06 09:03:53 +0000
commit 641974525ab8040e733a8694d4fa8190357cce80
tree 1d9ed60d597bdef5ed8099a88e124cd43d6709bd /build/jakarta-poi/docs/hpsf
parent 4d54d7cb622bc7f90b6bc4983888dea777dd2e3f
- Added first sections to HPSF HOW-TO.
git-svn-id: https://svn.apache.org/repos/asf/jakarta/poi/trunk@352153 13f79535-47bb-0310-9956-ffa450edef68
Diffstat (limited to 'build/jakarta-poi/docs/hpsf')
-rw-r--r-- build/jakarta-poi/docs/hpsf/how-to.html 498
1 files changed, 495 insertions, 3 deletions
diff --git a/build/jakarta-poi/docs/hpsf/how-to.html b/build/jakarta-poi/docs/hpsf/how-to.html
index 2987064335..7e6bc47384 100644
--- a/build/jakarta-poi/docs/hpsf/how-to.html
+++ b/build/jakarta-poi/docs/hpsf/how-to.html
@@ -73,11 +73,503 @@
<tr>
<td>
<br>
+
+
+<p align="justify">This HOW-TO is organized in three sections. You should read them
+ sequentially because the later sections build upon the earlier ones.</p>
+
+
+<ol>
+
+<li>
+
+<p align="justify">The <a href="#sec1">first section</a> explains how to read
+ the most important standard properties of a Microsoft Office
+ document. Standard properties are things like title, author, creation
+ date etc. It is quite likely that you will find here what you need and
+ don't have to read the other sections.</p>
+
+</li>
+
+
+<li>
+
+<p align="justify">The <a href="#sec2">second section</a> goes a small step
+ further and focusses on reading additional standard properties. It also
+ talks about exceptions that may be thrown when dealing with HPSF and
+ shows how you can read properties of embedded objects.</p>
+
+</li>
+
+
+<li>
+
+<p align="justify">The <a href="#sec3">third section</a> tells how to read
+ non-standard properties. Non-standard properties are application-specific
+ name/value/type triples.</p>
+
+</li>
+
+</ol>
+
+
+<anchor id="sec1"></anchor>
+
+<div align="right">
+<table cellspacing="0" cellpadding="2" border="0" width="99%">
+<tr>
+<td bgcolor="#525D76"><font color="#ffffff" size="+0"><font face="Arial,sans-serif"><b>Reading Standard Properties</b></font></font></td>
+</tr>
+<tr>
+<td>
+<br>
+
+
+<note>This section explains how to read
+ the most important standard properties of a Microsoft Office
+ document. Standard properties are things like title, author, creation
+ date etc. Chances are that you will find here what you need and
+ don't have to read the other sections.</note>
+
+
+<p align="justify">The first thing you should understand is that properties are stored in
+ separate documents inside the POI filesystem. (If you don't know what a
+ POI filesystem is, read its <a href="../poifs/index.html">documentation</a>.) A document in a POI
+ filesystem is also called a <em>stream</em>.</p>
+
+
+<p align="justify">The following example shows how to read the "title"
+ property of a document stored in a POI filesystem. Reading other properties
+ is similar; consult the API documentation of
+ <code>org.apache.poi.hpsf.SummaryInformation</code> for details.</p>
+
+
+<p align="justify">The standard properties this section focusses on can be
+ found in a document called <em>\005SummaryInformation</em> in the root of
+ the POI filesystem. The notation <em>\005</em> in the document's name
+ means the character with the decimal value of 5. In order to read the
+ title, an application has to perform the following steps:</p>
+
+
+<ol>
+
+<li>
-<p align="justify">TODO: This documentation is still to be written. For the
- time being, please see the API documentation (javadocs) of the
- <code>org.apache.poi.hpsf</code> package.</p>
+<p align="justify">Open the document <em>\005SummaryInformation</em> located in the root
+ of the POI filesystem.</p>
+
+</li>
+
+<li>
+
+<p align="justify">Create an instance of the class
+ <code>SummaryInformation</code> from that
+ document.</p>
+
+</li>
+
+<li>
+
+<p align="justify">Call the <code>SummaryInformation</code> instance's
+ <code>getTitle()</code> method.</p>
+
+</li>
+
+</ol>
+
+
+<p align="justify">Sounds easy, doesn't it? Here are the steps in detail.</p>
+
+
+<div align="right">
+<table cellspacing="0" cellpadding="2" border="0" width="98%">
+<tr>
+<td bgcolor="#525D76"><font color="#ffffff" size="-1"><font face="Arial,sans-serif"><b>Open the document \005SummaryInformation in the root of the POI filesystem</b></font></font></td>
+</tr>
+<tr>
+<td>
+<br>
+
+
+<p align="justify">An application that wants to open a document in a POI filesystem
+ (POIFS) proceeds as shown by the following code fragment. (The full
+ source code of the sample application is available in the
+ <em>examples</em> section of the POI source tree as
+ <em>ReadTitle.java</em>.)</p>
+
+
+<div align="center">
+<table cellspacing="2" cellpadding="2" border="1">
+<tr>
+<td>
+<pre>
+import java.io.*;
+import org.apache.poi.hpsf.*;
+import org.apache.poi.poifs.eventfilesystem.*;
+
+// ...
+
+public static void main(String[] args)
+ throws IOException
+{
+ final String filename = args[0];
+ POIFSReader r = new POIFSReader();
+ r.registerListener(new MyPOIFSReaderListener(),
+ "\005SummaryInformation");
+ r.read(new FileInputStream(filename));
+}</pre>
+</td>
+</tr>
+</table>
+</div>
+
+
+<p align="justify">The first interesting statement is</p>
+
+
+<div align="center">
+<table cellspacing="2" cellpadding="2" border="1">
+<tr>
+<td>
+<pre>POIFSReader r = new POIFSReader();</pre>
+</td>
+</tr>
+</table>
+</div>
+
+
+<p align="justify">It creates a
+ <code>org.apache.poi.poifs.eventfilesystem.POIFSReader</code> instance
+ which we shall need to read the POI filesystem. Before the application
+ actually opens the POI filesystem we have to tell the
+ <code>POIFSReader</code> which documents we are interested in. In this
+ case the application should do something with the document
+ <em>\005SummaryInformation</em>.</p>
+
+
+<div align="center">
+<table cellspacing="2" cellpadding="2" border="1">
+<tr>
+<td>
+<pre>
+r.registerListener(new MyPOIFSReaderListener(),
+ "\005SummaryInformation");</pre>
+</td>
+</tr>
+</table>
+</div>
+
+
+<p align="justify">This method call registers a
+ <code>org.apache.poi.poifs.eventfilesystem.POIFSReaderListener</code>
+ with the <code>POIFSReader</code>. The <code>POIFSReaderListener</code>
+ interface specifies the method <code>processPOIFSReaderEvent</code>
+ which processes a document. The class
+ <code>MyPOIFSReaderListener</code> implements the
+ <code>POIFSReaderListener</code> interface and thus the
+ <code>processPOIFSReaderEvent</code> method. The eventing POI filesystem
+ calls this method when it finds the <em>\005SummaryInformation</em>
+ document. (In the sample application <code>MyPOIFSReaderListener</code> is
+ a static class in the <em>ReadTitle.java</em> source file.)</p>
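+<p align="justify">The stream name in the call above starts with <code>\005</code>. In a
+ Java string literal this is an octal escape for a single character with
+ the value 5, not a backslash followed by three digits. The following
+ small sketch makes this visible; the class name
+ <code>StreamNameDemo</code> is made up for illustration and is not part of
+ POI.</p>

```java
// Illustration only: shows what the literal "\005SummaryInformation" contains.
class StreamNameDemo
{
    // "\005" is an octal escape: one character with the value 5.
    static final String SUMMARY_INFORMATION = "\005SummaryInformation";

    public static void main(String[] args)
    {
        // Numeric value of the first character of the stream name
        System.out.println((int) SUMMARY_INFORMATION.charAt(0));
        // Total length: one control character plus "SummaryInformation"
        System.out.println(SUMMARY_INFORMATION.length());
        // The readable part of the name
        System.out.println(SUMMARY_INFORMATION.substring(1));
    }
}
```

+<p align="justify">Keeping the name in a constant like this avoids typing the escape
+ sequence in more than one place.</p>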
+
+
+<p align="justify">Now everything is prepared and reading the POI filesystem can
+ start:</p>
+
+
+<div align="center">
+<table cellspacing="2" cellpadding="2" border="1">
+<tr>
+<td>
+<pre>r.read(new FileInputStream(filename));</pre>
+</td>
+</tr>
+</table>
+</div>
+
+
+<p align="justify">The following source code fragment shows the
+ <code>MyPOIFSReaderListener</code> class and how it retrieves the
+ title.</p>
+
+
+<div align="center">
+<table cellspacing="2" cellpadding="2" border="1">
+<tr>
+<td>
+<pre>
+static class MyPOIFSReaderListener implements POIFSReaderListener
+{
+ public void processPOIFSReaderEvent(POIFSReaderEvent e)
+ {
+ SummaryInformation si = null;
+ try
+ {
+ si = (SummaryInformation)
+ PropertySetFactory.create(e.getStream());
+ }
+ catch (Exception ex)
+ {
+ throw new RuntimeException
+ ("Property set stream \"" +
+ e.getPath() + e.getName() + "\": " + ex);
+ }
+ final String title = si.getTitle();
+ if (title != null)
+ System.out.println("Title: \"" + title + "\"");
+ else
+ System.out.println("Document has no title.");
+ }
+}
+</pre>
+</td>
+</tr>
+</table>
+</div>
+
+
+<p align="justify">The line</p>
+
+
+<div align="center">
+<table cellspacing="2" cellpadding="2" border="1">
+<tr>
+<td>
+<pre>SummaryInformation si = null;</pre>
+</td>
+</tr>
+</table>
+</div>
+
+
+<p align="justify">declares a <code>SummaryInformation</code> variable and initializes it
+ with <code>null</code>. We need an instance of this class to access the
+ title. The instance is created in a <code>try</code> block:</p>
+
+
+<div align="center">
+<table cellspacing="2" cellpadding="2" border="1">
+<tr>
+<td>
+<pre>si = (SummaryInformation)
+ PropertySetFactory.create(e.getStream());</pre>
+</td>
+</tr>
+</table>
+</div>
+
+
+<p align="justify">The expression <code>e.getStream()</code> returns the input stream
+ containing the bytes of the property set stream named
+ <em>\005SummaryInformation</em>. This stream is passed into the
+ <code>create</code> method of the factory class
+ <code>org.apache.poi.hpsf.PropertySetFactory</code> which returns
+ a <code>org.apache.poi.hpsf.PropertySet</code> instance. It is more or
+ less safe to cast this result to <code>SummaryInformation</code>, a
+ convenience class with methods like <code>getTitle()</code>,
+ <code>getAuthor()</code> etc.</p>
+
+
+<p align="justify">The <code>PropertySetFactory.create</code> method may throw all sorts
+ of exceptions. We'll deal with them in the next section. For now we just
+ catch all exceptions and throw a <code>RuntimeException</code>
+ containing a description of the original exception.</p>
+
+
+<p align="justify">If all goes well, the sample application retrieves the title and prints
+ it to the standard output. As you can see, you must be prepared for the
+ case that the document does not have a title.</p>
+
+
+<div align="center">
+<table cellspacing="2" cellpadding="2" border="1">
+<tr>
+<td>
+<pre>final String title = si.getTitle();
+ if (title != null)
+ System.out.println("Title: \"" + title + "\"");
+ else
+ System.out.println("Document has no title.");</pre>
+</td>
+</tr>
+</table>
+</div>
+
+
+<p align="justify">Please note that a Microsoft Office document does not necessarily
+ contain the <em>\005SummaryInformation</em> stream. The documents created
+ by the Microsoft Office suite have one, as far as I know. However, an
+ Excel spreadsheet exported from StarOffice 5.2 won't have a
+ <em>\005SummaryInformation</em> stream. In this case the application
+ won't throw an exception; the <code>POIFSReader</code> simply never calls the
+ <code>processPOIFSReaderEvent</code> method. You have been warned!</p>
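+<p align="justify">One way to notice a missing stream is to let the listener record
+ that it was called and to check this after reading has finished. The
+ following sketch shows only the idea. It does not use POI:
+ <code>ToyListener</code>, <code>RecordingListener</code> and the
+ <code>read</code> method are hypothetical stand-ins for the event-driven
+ reader, and the list of names stands in for the documents of a POI
+ filesystem.</p>

```java
import java.util.ArrayList;
import java.util.List;

// Illustration only: a toy model of an event-driven reader, NOT the POI API.
class MissingStreamDemo
{
    /** Hypothetical stand-in for a reader listener. */
    interface ToyListener
    {
        void process(String streamName);
    }

    /** Listener that remembers whether it has ever been called. */
    static class RecordingListener implements ToyListener
    {
        boolean called = false;

        public void process(String streamName)
        {
            called = true;
        }
    }

    /** Hypothetical reader: notifies the listener only for the wanted stream. */
    static void read(List<String> streamNames, ToyListener listener,
                     String wanted)
    {
        for (String name : streamNames)
            if (name.equals(wanted))
                listener.process(name);
    }

    public static void main(String[] args)
    {
        // A "file" that contains no summary information stream
        List<String> streams = new ArrayList<String>();
        streams.add("Workbook");

        RecordingListener listener = new RecordingListener();
        read(streams, listener, "\005SummaryInformation");

        // No exception was thrown, so the flag is the only sign of absence.
        if (!listener.called)
            System.out.println("No \\005SummaryInformation stream found.");
    }
}
```

+<p align="justify">With the real <code>POIFSReader</code> the same flag could be set
+ inside <code>processPOIFSReaderEvent</code> and tested after
+ <code>r.read(...)</code> returns.</p>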
+
+</td>
+</tr>
+</table>
+</div>
+<br>
+
+</td>
+</tr>
+</table>
+</div>
+<br>
+
+
+<anchor id="sec2"></anchor>
+
+<div align="right">
+<table cellspacing="0" cellpadding="2" border="0" width="99%">
+<tr>
+<td bgcolor="#525D76"><font color="#ffffff" size="+0"><font face="Arial,sans-serif"><b>Additional Standard Properties, Exceptions And Embedded Objects</b></font></font></td>
+</tr>
+<tr>
+<td>
+<br>
+
+
+<note>This section focusses on reading additional standard properties. It
+ also talks about exceptions that may be thrown when dealing with HPSF and
+ shows how you can read properties of embedded objects.</note>
+
+
+<p align="justify">A couple of <em>additional standard properties</em> are not
+ contained in the <em>\005SummaryInformation</em> stream explained above,
+ for example a document's category or the number of multimedia clips in a
+ PowerPoint presentation. Microsoft has invented an additional stream named
+ <em>\005DocumentSummaryInformation</em> to hold these properties. With two
+ minor exceptions you can proceed exactly as described above to read the
+ properties stored in <em>\005DocumentSummaryInformation</em>:</p>
+
+
+<ul>
+
+<li>
+<p align="justify">Instead of <em>\005SummaryInformation</em> use
+ <em>\005DocumentSummaryInformation</em> as the stream's name.</p>
+</li>
+
+<li>
+<p align="justify">Replace all occurrences of the class
+ <code>SummaryInformation</code> by
+ <code>DocumentSummaryInformation</code>.</p>
+</li>
+
+</ul>
+
+
+<p align="justify">And of course you cannot call <code>getTitle()</code> because
+ <code>DocumentSummaryInformation</code> has different query methods. See
+ the API documentation for the details!</p>
+
+
+<p align="justify">In the previous section the application simply caught all
+ <em>exceptions</em> and was in no way interested in any
+ details. However, a real application will likely want to know what went
+ wrong and act appropriately. Besides I/O exceptions, there are three
+ exceptions specific to HPSF or POI that you should know about:</p>
+
+
+<dl>
+
+<dt>
+<code>NoPropertySetStreamException</code></dt>
+
+<dd>
+<p align="justify">This exception is thrown if the application tries to create a
+ <code>PropertySet</code>, or one of its subclasses
+ <code>SummaryInformation</code> or
+ <code>DocumentSummaryInformation</code>, from a stream that is not a
+ property set stream. A faulty property set stream counts as not being a
+ property set stream at all. An application should be prepared to deal
+ with this case even if it opens only streams named
+ <em>\005SummaryInformation</em> or
+ <em>\005DocumentSummaryInformation</em>. These are just names. A
+ stream's name by itself does not ensure that the stream contains the
+ expected contents or that these contents are correct.</p>
+</dd>
+
+
+<dt>
+<code>UnexpectedPropertySetTypeException</code>
+</dt>
+
+<dd>
+<p align="justify">This exception is thrown if a certain type of property set is
+ expected somewhere (e.g. a <code>SummaryInformation</code> or
+ <code>DocumentSummaryInformation</code>) but the provided property
+ set is not of that type.</p>
+</dd>
+
+
+<dt>
+<code>MarkUnsupportedException</code>
+</dt>
+
+<dd>
+<p align="justify">This exception is thrown if an input stream that is to be parsed
+ into a property set does not support the
+ <code>InputStream.mark(int)</code> operation. The POI filesystem uses
+ the <code>DocumentInputStream</code> class which does support this
+ operation, so you are safe here. However, if you read a property set
+ stream from another kind of input stream things may be
+ different.</p>
+</dd>
+
+</dl>
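+<p align="justify">Whether a given input stream supports <code>mark</code> can be
+ checked with the standard <code>InputStream.markSupported()</code> method,
+ and <code>java.io.BufferedInputStream</code> adds mark support to any
+ stream. The following sketch uses only the Java standard library; the
+ class <code>MarkSupportDemo</code> and its helper methods are made up for
+ illustration. That HPSF accepts a stream wrapped this way is an
+ assumption, not something this HOW-TO states.</p>

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;

// Illustration only: adding mark/reset support to an arbitrary input stream.
class MarkSupportDemo
{
    /** Returns the stream itself if it supports mark, else a buffered wrapper. */
    static InputStream ensureMarkSupported(InputStream in)
    {
        return in.markSupported() ? in : new BufferedInputStream(in);
    }

    /** Builds a minimal stream without mark support (InputStream's default). */
    static InputStream unmarkable(final byte[] data)
    {
        return new InputStream()
        {
            private int pos = 0;

            public int read()
            {
                return pos < data.length ? (data[pos++] & 0xff) : -1;
            }
        };
    }

    public static void main(String[] args) throws IOException
    {
        InputStream raw = unmarkable(new byte[] {1, 2, 3});
        InputStream in = ensureMarkSupported(raw);

        in.mark(16);             // remember the current position
        int first = in.read();   // read one byte ...
        in.reset();              // ... and go back to the mark

        System.out.println(raw.markSupported());
        System.out.println(in.markSupported());
        System.out.println(first == in.read());
    }
}
```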
+
+
+<p align="justify">Many Microsoft Office documents contain <em>embedded
+ objects</em>, for example an Excel sheet on a page in a Word
+ document. Embedded objects may have property sets of their own. An
+ application can open these property set streams as described above. The
+ only difference is that they are not located in the POI filesystem's root
+ but in a nested directory instead. Just register a
+ <code>POIFSReaderListener</code> for the property set streams you are
+ interested in. For example, the <em>POIBrowser</em> application in the
+ contrib section tries to open each and every document in a POI filesystem
+ as a property set stream. If this succeeds, it displays the
+ properties.</p>
+
+</td>
+</tr>
+</table>
+</div>
+<br>
+
+
+<anchor id="sec3"></anchor>
+
+<div align="right">
+<table cellspacing="0" cellpadding="2" border="0" width="99%">
+<tr>
+<td bgcolor="#525D76"><font color="#ffffff" size="+0"><font face="Arial,sans-serif"><b>Reading Non-Standard Properties</b></font></font></td>
+</tr>
+<tr>
+<td>
+<br>
+
+
+<note>This section tells how to read
+ non-standard properties. Non-standard properties are application-specific
+ name/value/type triples.</note>
+
+
+<div align="center">
+<table cellspacing="2" cellpadding="2" border="1">
+<tr>
+<td bgcolor="#c0c0c0"><font size="-1" color="#023264">Write this section!</font></td>
+</tr>
+</table>
+</div>
+
+</td>
+</tr>
+</table>
+</div>
+<br>
+
</td>
</tr>
</table>