<p>HMEF is the POI Project's pure Java implementation of the
TNEF (Transport Neutral Encapsulation Format), aka winmail.dat,
which is used by Outlook and Exchange in some situations.</p>
- <p>Currently, HMEF provides a low-level, read-only api for
- accessing core TNEF attributes. It is able to provide access
- to both TNEF and MAPI attributes, and low level access to
- attachments. Compressed RTF is not yet fully supported, and
- user-facing access to common attributes and attachment contents
- is not yet present.</p>
- <p>HMEF is currently very much a work-in-progress, and we hope
- to add a text extractor and attachment extractor in the not
- too distant future.</p>
- <p>To get a feel for the contents of a file, and to track down
- where data of interest is stored, HMEF comes with
- <link href="http://svn.apache.org/repos/asf/poi/trunk/src/scratchpad/src/org/apache/poi/hmef/dev/">HMEFDumper</link>
- to print out the contents of the file.</p>
+ <p>Currently, HMEF provides a read-only API for accessing common
+ message and attachment attributes, including the message body
+ and attachment files. In addition, it's possible to have
+ read-only access to all of the underlying TNEF and MAPI
+ attributes of the message and attachments.</p>
+ <p>HMEF also provides a command line tool for extracting
+ the message body and attachment files from a TNEF (winmail.dat)
+ file.</p>
+
<note>
This code currently lives in the
<link href="http://svn.apache.org/viewcvs.cgi/poi/trunk/src/scratchpad/">scratchpad area</link>.
Ensure that you have the scratchpad jar or the scratchpad
build area in your classpath before experimenting with this code.
</note>
+ <note>
+ This code is a new POI feature, and the first release that will
+ contain it will be POI 3.8 beta 2. Until then, you will need to
+ build your own jars from a <link href="../subversion.html">svn
+ checkout</link>.
+ </note>
+ </section>
+
+ <section>
+ <title>Using HMEF to access TNEF (winmail.dat) files</title>
+
+ <section>
+ <title>Easy extraction of message body and attachment files</title>
+
+ <p>The class <em>org.apache.poi.hmef.extractor.HMEFContentsExtractor</em>
+ provides both command line and Java extraction. It allows the
+ saving of the message body (an RTF file), and all of the
+ attachment files, to a single directory as specified.</p>
+
+ <p>From the command line, simply call the class specifying the
+ TNEF file to extract, and the directory to place the extracted
+ files into, eg:</p>
+ <source>
+ java -classpath poi-3.8-FINAL.jar:poi-scratchpad-3.8-FINAL.jar org.apache.poi.hmef.extractor.HMEFContentsExtractor winmail.dat /tmp/extracted/
+ </source>
+
+ <p>From Java, there are two method calls on the class, one to
+ extract the message body RTF to a file, and the other to extract
+ all the attachments to a directory. A typical use would be:</p>
+ <source>
+public void extract(String winmailFilename, String directoryName) throws Exception {
+ HMEFContentsExtractor ext = new HMEFContentsExtractor(new File(winmailFilename));
+
+ File dir = new File(directoryName);
+ File rtf = new File(dir, "message.rtf");
+ if(! dir.exists()) {
+ throw new FileNotFoundException("Output directory " + dir.getName() + " not found");
+ }
+
+ System.out.println("Extracting...");
+ ext.extractMessageBody(rtf);
+ ext.extractAttachments(dir);
+ System.out.println("Extraction completed");
+}
+ </source>
+ </section>
+
+ <section>
+ <title>Attachment attributes and contents</title>
+
+ <p>To get at your attachments, simply call the
+ <em>getAttachments()</em> method on a <em>HMEFMessage</em>
+ instance, and you'll receive a list of all the attachments.</p>
+ <p>When you have an <em>org.apache.poi.hmef.Attachment</em> object,
+ there are several helper methods available. These will all
+ return the value of the appropriate underlying attachment
+ attributes, or null if for some reason the attribute isn't
+ present in your file.</p>
+ <ul>
+ <li><em>getFilename()</em> - returns the name of the attachment
+ file, possibly in 8.3 format</li>
+ <li><em>getLongFilename()</em> - returns the full name of the
+ attachment file</li>
+ <li><em>getExtension()</em> - returns the extension of the
+ attachment file, including the "."</li>
+ <li><em>getModifiedDate()</em> - returns the date that the
+ attachment file was last edited on</li>
+ <li><em>getContents()</em> - returns a byte array of the contents
+ of the attached file</li>
+ <li><em>getRenderedMetaFile()</em> - returns a byte array of
+ a windows meta file representation of the attached file</li>
+ </ul>
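+ <p>As a rough illustration of how these helpers might be combined,
+ the sketch below picks a file name (preferring the long name,
+ then the short one, then a default) and writes the contents to a
+ directory. The class <em>AttachmentSaver</em>, its methods and the
+ fallback name are hypothetical and not part of HMEF. The sketch
+ takes plain values rather than an <em>Attachment</em>, and assumes
+ Java 7 or later for <em>java.nio.file.Files</em>.</p>

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;

// Hypothetical helper, not part of HMEF
class AttachmentSaver {
    /** Prefers the long name, then the short name, then the fallback. */
    static String chooseName(String longName, String shortName, String fallback) {
        if (isUsable(longName)) return longName;
        if (isUsable(shortName)) return shortName;
        return fallback;
    }

    /** A name is usable when it is non-null and not just whitespace. */
    private static boolean isUsable(String s) {
        if (s == null) return false;
        return s.trim().length() > 0;
    }

    /** Writes the contents into dir, using only the last element of the name. */
    static File save(File dir, String name, byte[] contents) throws IOException {
        // Drop any directory part so the file is created directly inside dir
        String safe = new File(name.replace('\\', '/')).getName();
        File out = new File(dir, safe);
        // A missing contents attribute (null) is written as an empty file
        Files.write(out.toPath(), contents == null ? new byte[0] : contents);
        return out;
    }
}
```

+ <p>With a real message, the arguments would come from
+ <em>getLongFilename()</em>, <em>getFilename()</em> and
+ <em>getContents()</em> on each <em>Attachment</em>.</p>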
+ </section>
+
+ <section>
+ <title>Message attributes and message body</title>
+
+ <p>An <em>org.apache.poi.hmef.HMEFMessage</em> instance is created
+ from an <em>InputStream</em> of the underlying TNEF (winmail.dat)
+ file.</p>
+ <p>From a <em>HMEFMessage</em>, there are three main methods of
+ interest to call:</p>
+ <ul>
+ <li><em>getBody()</em> - returns a String containing the RTF
+ contents of the message body.
+ <em>Note - see limitations</em></li>
+ <li><em>getSubject()</em> - returns the message subject</li>
+ <li><em>getAttachments()</em> - returns the list of
+ <em>Attachment</em> objects for the message</li>
+ </ul>
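+ <p>Since the body is expected to be RTF, and RTF documents begin
+ with <em>{\rtf</em>, a caller could run a rough sanity check on the
+ returned string before saving it. The sketch below is one way of
+ doing that. The class <em>RtfCheck</em> and its method are
+ hypothetical and not part of HMEF, and the check can only catch
+ obviously wrong results, not prove that the body is correct.</p>

```java
// Hypothetical helper, not part of HMEF
class RtfCheck {
    /** Returns true when the text, ignoring leading whitespace, starts with the RTF header. */
    static boolean looksLikeRtf(String body) {
        if (body == null) return false;
        return body.trim().startsWith("{\\rtf");
    }
}
```

+ <p>In use, the argument would be the value returned by
+ <em>getBody()</em>.</p>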
+ </section>
+
+ <section>
+ <title>Low level attribute access</title>
+
+ <p>Both Messages and Attachments contain two kinds of attributes.
+ These are <em>TNEFAttribute</em> and <em>MAPIAttribute</em>.</p>
+ <p>TNEFAttribute is specific to TNEF files in terms of the
+ available types and properties. In general, Attachments have a
+ few more useful ones of these than Messages.</p>
+ <p>MAPIAttributes hold standard MAPI properties and values, and
+ work in a similar way to those in <link href="../hsmf/">HSMF
+ (Outlook)</link>. There are typically many of these on both
+ Messages and Attachments. <em>Note - see limitations</em></p>
+ <p>Both <em>HMEFMessage</em> and <em>Attachment</em>
+ support two different ways of getting to attributes of interest.
+ Firstly, they support list getters, to return all attributes
+ (either TNEF or MAPI). Secondly, they support specific getters by
+ TNEF or MAPI property.</p>
+ <source>
+HMEFMessage msg = new HMEFMessage(new FileInputStream(file));
+// List getters: all TNEF and all MAPI attributes of the message
+for(TNEFAttribute attr : msg.getMessageAttributes()) {
+ System.out.println("TNEF : " + attr);
+}
+for(MAPIAttribute attr : msg.getMessageMAPIAttributes()) {
+ System.out.println("MAPI : " + attr);
+}
+// Specific getter: a single MAPI attribute, looked up by property
+System.out.println("Subject is " + msg.getMessageMAPIAttribute(MAPIProperty.CONVERSATION_TOPIC));
+
+for(Attachment attach : msg.getAttachments()) {
+ // The same two styles of getter are available on each attachment
+ for(TNEFAttribute attr : attach.getAttributes()) {
+ System.out.println("A.TNEF : " + attr);
+ }
+ for(MAPIAttribute attr : attach.getMAPIAttributes()) {
+ System.out.println("A.MAPI : " + attr);
+ }
+ System.out.println("Filename is " + attach.getAttribute(TNEFProperty.CID_ATTACHTITLE));
+ System.out.println("Extension is " + attach.getMAPIAttribute(MAPIProperty.ATTACH_EXTENSION));
+}
+ </source>
+ </section>
+ </section>
+
+ <section>
+ <title>Investigating a TNEF file</title>
+
+ <p>To get a feel for the contents of a file, and to track down
+ where data of interest is stored, HMEF comes with
+ <link href="http://svn.apache.org/repos/asf/poi/trunk/src/scratchpad/src/org/apache/poi/hmef/dev/">HMEFDumper</link>
+ to print out the contents of the file.</p>
+ </section>
+
+ <section>
+ <title>Limitations</title>
+ <p>HMEF is currently a work-in-progress, and not everything
+ works yet. The current limitations are:</p>
+ <ul>
+ <li>Compressed RTF Message Bodies are not correctly
+ decompressed. This means that a call to
+ <em>HMEFMessage.getBody()</em> is unlikely to return the
+ correct RTF.</li>
+ <li>Non-standard MAPI properties from the range 0x8000 to 0x8fff
+ may not be turned into attributes quite correctly.
+ The values show up, but the name and type may not always
+ be correct.</li>
+ <li>All testing so far has been performed on a small number of
+ English documents. We think we're correctly turning bytes into
+ Java unicode strings, but we need a few non-English sample
+ files in the test suite to verify this!</li>
+ </ul>
</section>
</body>
</document>