author | Nick Burch <nick@apache.org> | 2011-03-04 11:59:23 +0000 |
---|---|---|
committer | Nick Burch <nick@apache.org> | 2011-03-04 11:59:23 +0000 |
commit | ce77707b83b197ffd1b7a6be4b06d03ee3c042e4 (patch) | |
tree | c8500495311777b5cf87e99b5d41a269dbb9fcad | |
parent | e9f5fbd58d3b4c9cb04738a4b277f99d2ddf5b56 (diff) | |
Add documentation for the HMEF (TNEF/winmail.dat) support so far.
Also add a little bit to the HPBF docs, and tweak build.xml to check the right files when deciding if the docs are up to date.
git-svn-id: https://svn.apache.org/repos/asf/poi/trunk@1077891 13f79535-47bb-0310-9956-ffa450edef68
-rw-r--r-- | build.xml | 2 |
-rw-r--r-- | src/documentation/content/xdocs/hmef/index.xml | 182 |
-rw-r--r-- | src/documentation/content/xdocs/hpbf/book.xml | 35 |
-rw-r--r-- | src/documentation/content/xdocs/hpbf/index.xml | 5 |
4 files changed, 209 insertions, 15 deletions
@@ -748,7 +748,7 @@ under the License.
     <target name="-check-docs">
         <uptodate property="main.docs.notRequired" targetfile="${build.site}/index.html">
-            <srcfiles dir="${build.site.src}"/>
+            <srcfiles dir="${main.documentation}" />
         </uptodate>
     </target>
diff --git a/src/documentation/content/xdocs/hmef/index.xml b/src/documentation/content/xdocs/hmef/index.xml
index 99a3c92298..ef4246db40 100644
--- a/src/documentation/content/xdocs/hmef/index.xml
+++ b/src/documentation/content/xdocs/hmef/index.xml
@@ -35,19 +35,15 @@
       <p>HMEF is the POI Project's pure Java implementation of the
        TNEF (Transport Neutral Encoding Format), aka winmail.dat,
        which is used by Outlook and Exchange in some situations.</p>
-      <p>Currently, HMEF provides a low-level, read-only api for
-       accessing core TNEF attributes. It is able to provide access
-       to both TNEF and MAPI attributes, and low level access to
-       attachments. Compressed RTF is not yet fully supported, and
-       user-facing access to common attributes and attachment contents
-       is not yet present.</p>
-      <p>HMEF is currently very much a work-in-progress, and we hope
-       to add a text extractor and attachment extractor in the not
-       too distant future.</p>
-      <p>To get a feel for the contents of a file, and to track down
-       where data of interest is stored, HMEF comes with
-       <link href="http://svn.apache.org/repos/asf/poi/trunk/src/scratchpad/src/org/apache/poi/hmef/dev/">HMEFDumper</link>
-       to print out the contents of the file.</p>
+      <p>Currently, HMEF provides a read-only api for accessing common
+       message and attachment attributes, including the message body
+       and attachment files. In addition, it's possible to have
+       read-only access to all of the underlying TNEF and MAPI
+       attributes of the message and attachments.</p>
+      <p>HMEF also provides a command line tool for extracting out
+       the message body and attachment files from a TNEF (winmail.dat)
+       file.</p>
+      <note>
        This code currently lives in the
        <link href="http://svn.apache.org/viewcvs.cgi/poi/trunk/src/scratchpad/">scratchpad area</link>
@@ -55,7 +51,167 @@
        Ensure that you have the scratchpad jar or the scratchpad
        build area in your classpath before experimenting with this code.
       </note>
+      <note>
+       This code is a new POI feature, and the first release that will
+       contain it will be POI 3.8 beta 2. Until then, you will need to
+       build your own jars from a <link href="../subversion.html">svn
+       checkout</link>.
+      </note>
+    </section>
+
+    <section>
+      <title>Using HMEF to access TNEF (winmail.dat) files</title>
+
+      <section>
+        <title>Easy extraction of message body and attachment files</title>
+
+        <p>The class <em>org.apache.poi.hmef.extractor.HMEFContentsExtractor</em>
+         provides both command line and Java extraction. It allows the
+         saving of the message body (an RTF file), and all of the
+         attachment files, to a single directory as specified.</p>
+
+        <p>From the command line, simply call the class specifying the
+         TNEF file to extract, and the directory to place the extracted
+         files into, eg:</p>
+        <source>
+         java -classpath poi-3.8-FINAL.jar:poi-scratchpad-3.8-FINAL.jar org.apache.poi.hmef.extractor.HMEFContentsExtractor winmail.dat /tmp/extracted/
+        </source>
+
+        <p>From Java, there are two method calls on the class, one to
+         extract the message body RTF to a file, and the other to extract
+         all the attachments to a directory. A typical use would be:</p>
+        <source>
+public void extract(String winmailFilename, String directoryName) throws Exception {
+   HMEFContentsExtractor ext = new HMEFContentsExtractor(new File(winmailFilename));
+
+   File dir = new File(directoryName);
+   File rtf = new File(dir, "message.rtf");
+   if(! dir.exists()) {
+      throw new FileNotFoundException("Output directory " + dir.getName() + " not found");
+   }
+
+   System.out.println("Extracting...");
+   ext.extractMessageBody(rtf);
+   ext.extractAttachments(dir);
+   System.out.println("Extraction completed");
+}
+        </source>
+      </section>
+
+      <section>
+        <title>Attachment attributes and contents</title>
+
+        <p>To get at your attachments, simply call the
+         <em>getAttachments()</em> method on a <em>HMEFMessage</em>
+         instance, and you'll receive a list of all the attachments.</p>
+        <p>When you have an <em>org.apache.poi.hmef.Attachment</em> object,
+         there are several helper methods available. These will all
+         return the value of the appropriate underlying attachment
+         attributes, or null if for some reason the attribute isn't
+         present in your file.</p>
+        <ul>
+          <li><em>getFilename()</em> - returns the name of the attachment
+           file, possibly in 8.3 format</li>
+          <li><em>getLongFilename()</em> - returns the full name of the
+           attachment file</li>
+          <li><em>getExtension()</em> - returns the extension of the
+           attachment file, including the "."</li>
+          <li><em>getModifiedDate()</em> - returns the date that the
+           attachment file was last edited on</li>
+          <li><em>getContents()</em> - returns a byte array of the contents
+           of the attached file</li>
+          <li><em>getRenderedMetaFile()</em> - returns a byte array of
+           a windows meta file representation of the attached file</li>
+        </ul>
+      </section>
+
+      <section>
+        <title>Message attributes and message body</title>
+
+        <p>An <em>org.apache.poi.hmef.HMEFMessage</em> instance is created
+         from an <em>InputStream</em> of the underlying TNEF (winmail.dat)
+         file.</p>
+        <p>From a <em>HMEFMessage</em>, there are three main methods of
+         interest to call:</p>
+        <ul>
+          <li><em>getBody()</em> - returns a String containing the RTF
+           contents of the message body.
+           <em>Note - see limitations</em></li>
+          <li><em>getSubject()</em> - returns the message subject</li>
+          <li><em>getAttachments()</em> - returns the list of
+           <em>Attachment</em> objects for the message</li>
+        </ul>
+      </section>
+
+      <section>
+        <title>Low level attribute access</title>
+
+        <p>Both Messages and Attachments contain two kinds of attributes.
+         These are <em>TNEFAttribute</em> and <em>MAPIAttribute</em>.</p>
+        <p>TNEFAttribute is specific to TNEF files in terms of the
+         available types and properties. In general, Attachments have a
+         few more useful ones of these than Messages.</p>
+        <p>MAPIAttributes hold standard MAPI properties and values, and
+         work in a similar way to how <link href="../hsmf/">HSMF
+         (Outlook)</link> does. There are typically many of these on both
+         Messages and Attachments. <em>Note - see limitations</em></p>
+        <p>Both <em>HMEFMessage</em> and <em>Attachment</em>
+         support two different ways of getting to attributes of interest.
+         Firstly, they support list getters, to return all attributes
+         (either TNEF or MAPI). Secondly, they support specific getters by
+         TNEF or MAPI property.</p>
+        <source>
+HMEFMessage msg = new HMEFMessage(new FileInputStream(file));
+for(TNEFAttribute attr : msg.getMessageAttributes()) {
+   System.out.println("TNEF : " + attr);
+}
+for(MAPIAttribute attr : msg.getMessageMAPIAttributes()) {
+   System.out.println("MAPI : " + attr);
+}
+System.out.println("Subject is " + msg.getMessageMAPIAttribute(MAPIProperty.CONVERSATION_TOPIC));
+
+for(Attachment attach : msg.getAttachments()) {
+   for(TNEFAttribute attr : attach.getAttributes()) {
+      System.out.println("A.TNEF : " + attr);
+   }
+   for(MAPIAttribute attr : attach.getMAPIAttributes()) {
+      System.out.println("A.MAPI : " + attr);
+   }
+   System.out.println("Filename is " + attach.getAttribute(TNEFProperty.CID_ATTACHTITLE));
+   System.out.println("Extension is " + attach.getMAPIAttribute(MAPIProperty.ATTACH_EXTENSION));
+}
+        </source>
+      </section>
+    </section>
+
+    <section>
+      <title>Investigating a TNEF file</title>
+
+      <p>To get a feel for the contents of a file, and to track down
+       where data of interest is stored, HMEF comes with
+       <link href="http://svn.apache.org/repos/asf/poi/trunk/src/scratchpad/src/org/apache/poi/hmef/dev/">HMEFDumper</link>
+       to print out the contents of the file.</p>
+    </section>
+
+    <section>
+      <title>Limitations</title>
+      <p>HMEF is currently a work-in-progress, and not everything
+       works yet. The current limitations are:</p>
+      <ul>
+        <li>Compressed RTF Message Bodies are not correctly
+         decompressed. This means that a call to
+         <em>HMEFMessage.getBody()</em> is unlikely to return the
+         correct RTF.</li>
+        <li>Non-standard MAPI properties from the range 0x8000 to 0x8fff
+         may not be quite correctly turned into attributes.
+         The values show up, but the name and type may not always
+         be correct.</li>
+        <li>All testing so far has been performed on a small number of
+         English documents. We think we're correctly turning bytes into
+         Java unicode strings, but we need a few non-English sample
+         files in the test suite to verify this!</li>
+      </ul>
     </section>
   </body>
 </document>
diff --git a/src/documentation/content/xdocs/hpbf/book.xml b/src/documentation/content/xdocs/hpbf/book.xml
new file mode 100644
index 0000000000..9aceee7a36
--- /dev/null
+++ b/src/documentation/content/xdocs/hpbf/book.xml
@@ -0,0 +1,35 @@
+<?xml version="1.0"?>
+<!--
+   ====================================================================
+   Licensed to the Apache Software Foundation (ASF) under one or more
+   contributor license agreements. See the NOTICE file distributed with
+   this work for additional information regarding copyright ownership.
+   The ASF licenses this file to You under the Apache License, Version 2.0
+   (the "License"); you may not use this file except in compliance with
+   the License. You may obtain a copy of the License at
+
+   http://www.apache.org/licenses/LICENSE-2.0
+
+   Unless required by applicable law or agreed to in writing, software
+   distributed under the License is distributed on an "AS IS" BASIS,
+   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+   See the License for the specific language governing permissions and
+   limitations under the License.
+   ====================================================================
+-->
+<!DOCTYPE book PUBLIC "-//APACHE//DTD Cocoon Documentation Book V1.0//EN" "../dtd/book-cocoon-v10.dtd">
+
+<book software="POI Project"
+    title="HPBF"
+    copyright="@year@ POI Project">
+
+    <menu label="Apache POI">
+        <menu-item label="Top" href="../index.html"/>
+    </menu>
+
+    <menu label="HPBF">
+        <menu-item label="Overview" href="index.html"/>
+        <menu-item label="File Format" href="file-format.xml"/>
+    </menu>
+
+</book>
diff --git a/src/documentation/content/xdocs/hpbf/index.xml b/src/documentation/content/xdocs/hpbf/index.xml
index e862efb19c..4443db97c9 100644
--- a/src/documentation/content/xdocs/hpbf/index.xml
+++ b/src/documentation/content/xdocs/hpbf/index.xml
@@ -45,7 +45,10 @@
        the document (partly supported). Additional low level code to
        process the file format may follow, if there is demand and
        developer interest warrant it.</p>
-       <p>At this time, there is no <em>usermodel</em> api or similar.
+      <p>Text Extraction is available via the
+       <em>org.apache.poi.hpbf.extractor.PublisherTextExtractor</em>
+       class.</p>
+      <p>At this time, there is no <em>usermodel</em> api or similar.
        There is only low level support for certain parts of
        the file, but by no means all of it.</p>
       <p>Our current understanding of the file format is documented
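Note on the HMEF docs in this change: they state that an HMEFMessage is created from an InputStream of the underlying TNEF (winmail.dat) file. The following is a minimal sketch, not part of this commit and not POI code, of a pre-check a caller might run before handing a file to HMEF. It assumes only the TNEF format's 32-bit signature 0x223E9F78, stored little-endian at the start of the file; the class and method names (`TnefSignatureCheck`, `hasTnefSignature`, `looksLikeTnef`) are hypothetical and chosen for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

/** Hypothetical helper (not part of POI): checks for the TNEF file signature. */
public class TnefSignatureCheck {
    // TNEF signature; on disk it appears little-endian as 78 9F 3E 22.
    static final int TNEF_SIGNATURE = 0x223E9F78;

    /** Returns true if the first four bytes decode (little-endian) to the TNEF signature. */
    static boolean hasTnefSignature(byte[] header) {
        if (header == null || header.length < 4) {
            return false;
        }
        int value = (header[0] & 0xFF)
                  | (header[1] & 0xFF) << 8
                  | (header[2] & 0xFF) << 16
                  | (header[3] & 0xFF) << 24;
        return value == TNEF_SIGNATURE;
    }

    /** Reads up to four bytes from the stream and checks them; the bytes are consumed. */
    static boolean looksLikeTnef(InputStream in) throws IOException {
        byte[] header = new byte[4];
        int read = 0;
        while (read < header.length) {
            int n = in.read(header, read, header.length - read);
            if (n < 0) {
                return false; // stream ended before four bytes were available
            }
            read += n;
        }
        return hasTnefSignature(header);
    }

    public static void main(String[] args) throws IOException {
        // Synthetic header bytes, not a real winmail.dat file.
        byte[] sample = {0x78, (byte) 0x9F, 0x3E, 0x22, 0x01, 0x00};
        System.out.println(looksLikeTnef(new ByteArrayInputStream(sample))); // prints true
    }
}
```

Because `looksLikeTnef` consumes the bytes it reads, a caller would need to open a fresh stream (or use mark/reset on a buffered one) before constructing the HMEFMessage.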