From ce77707b83b197ffd1b7a6be4b06d03ee3c042e4 Mon Sep 17 00:00:00 2001 From: Nick Burch Date: Fri, 4 Mar 2011 11:59:23 +0000 Subject: [PATCH] Add documentation for the HMEF (TNEF/winmail.dat) support so far. Also add a little bit to the HPBF docs, and tweak build.xml to check the right files when deciding if the docs are up to date. git-svn-id: https://svn.apache.org/repos/asf/poi/trunk@1077891 13f79535-47bb-0310-9956-ffa450edef68 --- build.xml | 2 +- .../content/xdocs/hmef/index.xml | 182 ++++++++++++++++-- src/documentation/content/xdocs/hpbf/book.xml | 35 ++++ .../content/xdocs/hpbf/index.xml | 5 +- 4 files changed, 209 insertions(+), 15 deletions(-) create mode 100644 src/documentation/content/xdocs/hpbf/book.xml diff --git a/build.xml b/build.xml index a61f43a334..0d17e8d3fc 100644 --- a/build.xml +++ b/build.xml @@ -748,7 +748,7 @@ under the License. - + diff --git a/src/documentation/content/xdocs/hmef/index.xml b/src/documentation/content/xdocs/hmef/index.xml index 99a3c92298..ef4246db40 100644 --- a/src/documentation/content/xdocs/hmef/index.xml +++ b/src/documentation/content/xdocs/hmef/index.xml @@ -35,19 +35,15 @@

HMEF is the POI Project's pure Java implementation of the TNEF (Transport Neutral Encapsulation Format), aka winmail.dat, which is used by Outlook and Exchange in some situations.

-

Currently, HMEF provides a low-level, read-only api for - accessing core TNEF attributes. It is able to provide access - to both TNEF and MAPI attributes, and low level access to - attachments. Compressed RTF is not yet fully supported, and - user-facing access to common attributes and attachment contents - is not yet present.

-

HMEF is currently very much a work-in-progress, and we hope - to add a text extractor and attachment extractor in the not - too distant future.

-

To get a feel for the contents of a file, and to track down - where data of interest is stored, HMEF comes with - HMEFDumper - to print out the contents of the file.

+

Currently, HMEF provides a read-only api for accessing common + message and attachment attributes, including the message body + and attachment files. In addition, it's possible to have + read-only access to all of the underlying TNEF and MAPI + attributes of the message and attachments.

+

HMEF also provides a command line tool for extracting out + the message body and attachment files from a TNEF (winmail.dat) + file.

+ This code currently lives in the scratchpad area @@ -55,7 +51,167 @@ Ensure that you have the scratchpad jar or the scratchpad build area in your classpath before experimenting with this code. + + This code is a new POI feature, and the first release that will + contain it will be POI 3.8 beta 2. Until then, you will need to + build your own jars from a svn + checkout. + + + +
+ Using HMEF to access TNEF (winmail.dat) files + +
+ Easy extraction of message body and attachment files + +

The class org.apache.poi.hmef.extractor.HMEFContentsExtractor + provides both command line and Java extraction. It allows the + saving of the message body (an RTF file), and all of the + attachment files, to a single directory as specified.

+ +

From the command line, simply call the class specifying the + TNEF file to extract, and the directory to place the extracted + files into, eg:

+ + java -classpath poi-3.8-FINAL.jar:poi-scratchpad-3.8-FINAL.jar org.apache.poi.hmef.extractor.HMEFContentsExtractor winmail.dat /tmp/extracted/ + + +

From Java, there are two method calls on the class, one to + extract the message body RTF to a file, and the other to extract + all the attachments to a directory. A typical use would be:

+
+public void extract(String winmailFilename, String directoryName) throws Exception {
+   HMEFContentsExtractor ext = new HMEFContentsExtractor(new File(winmailFilename));
+
+   File dir = new File(directoryName);
+   File rtf = new File(dir, "message.rtf");
+   if(! dir.exists()) {
+       throw new FileNotFoundException("Output directory " + dir.getName() + " not found");
+   }
+
+   System.out.println("Extracting...");
+   ext.extractMessageBody(rtf);
+   ext.extractAttachments(dir);
+   System.out.println("Extraction completed");
+}
+
+
+ +
+ Attachment attributes and contents + +

To get at your attachments, simply call the + getAttachments() method on an HMEFMessage + instance, and you'll receive a list of all the attachments.

+

When you have an org.apache.poi.hmef.Attachment object, + there are several helper methods available. Each of these + returns the value of the corresponding underlying attachment + attribute, or null if for some reason the attribute isn't + present in your file.

+
    +
  • getFilename() - returns the name of the attachment + file, possibly in 8.3 format
  • +
  • getLongFilename() - returns the full name of the + attachment file
  • +
  • getExtension() - returns the extension of the + attachment file, including the "."
  • +
  • getModifiedDate() - returns the date that the + attachment file was last edited on
  • +
  • getContents() - returns a byte array of the contents + of the attached file
  • +
  • getRenderedMetaFile() - returns a byte array of + a windows meta file representation of the attached file
  • +
+
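Because each of the helper methods above may return null, code that saves attachments needs a fallback when choosing a file name. The following is a minimal sketch of one such policy, assuming the caller passes in the values returned by getLongFilename(), getFilename() and getExtension(). The class AttachmentNames and its choose method are hypothetical helpers for illustration only, and are not part of POI.

```java
// Hypothetical helper, not part of POI: picks a name for saving an attachment.
final class AttachmentNames {
    private AttachmentNames() {}

    /**
     * Prefers the long filename, then the (possibly 8.3) short filename,
     * then a generated name built from the index and the extension.
     * Any of the String arguments may be null, mirroring the helper methods.
     */
    static String choose(String longName, String shortName, String extension, int index) {
        if (longName != null && !longName.isEmpty()) {
            return longName;
        }
        if (shortName != null && !shortName.isEmpty()) {
            return shortName;
        }
        // getExtension() is documented as including the ".", so append it as-is
        String ext = (extension == null) ? "" : extension;
        return "attachment-" + index + ext;
    }
}
```

With the scratchpad jar on the classpath, this would typically be called as AttachmentNames.choose(attach.getLongFilename(), attach.getFilename(), attach.getExtension(), i) while looping over msg.getAttachments().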
+ +
+ Message attributes and message body + +

An org.apache.poi.hmef.HMEFMessage instance is created + from an InputStream of the underlying TNEF (winmail.dat) + file.

+

From a HMEFMessage, there are three main methods of + interest to call:

+
    +
  • getBody() - returns a String containing the RTF + contents of the message body. + Note - see limitations
  • +
  • getSubject() - returns the message subject
  • +
  • getAttachments() - returns the list of + Attachment objects for the message
  • +
+
+ +
+ Low level attribute access + +

Both Messages and Attachments contain two kinds of attributes. + These are TNEFAttribute and MAPIAttribute.

+

TNEFAttribute is specific to TNEF files in terms of the + available types and properties. In general, Attachments have a + few more useful ones of these than Messages.

+

MAPIAttributes hold standard MAPI properties and values, and + work in a similar way to those in HSMF + (Outlook). There are typically many of these on both + Messages and Attachments. Note - see limitations

+

Both HMEFMessage and Attachment + support two different ways of getting to attributes of interest. + Firstly, they support list getters, to return all attributes + (either TNEF or MAPI). Secondly, they support specific getters by + TNEF or MAPI property.

+
+HMEFMessage msg = new HMEFMessage(new FileInputStream(file));
+for(TNEFAttribute attr : msg.getMessageAttributes()) {
+   System.out.println("TNEF : " + attr);
+}
+for(MAPIAttribute attr : msg.getMessageMAPIAttributes()) {
+   System.out.println("MAPI : " + attr);
+}
+System.out.println("Subject is " + msg.getMessageMAPIAttribute(MAPIProperty.CONVERSATION_TOPIC));
+
+for(Attachment attach : msg.getAttachments()) {
+   for(TNEFAttribute attr : attach.getAttributes()) {
+      System.out.println("A.TNEF : " + attr);
+   }
+   for(MAPIAttribute attr : attach.getMAPIAttributes()) {
+      System.out.println("A.MAPI : " + attr);
+   }
+   System.out.println("Filename is " + attach.getAttribute(TNEFProperty.CID_ATTACHTITLE));
+   System.out.println("Extension is " + attach.getMAPIAttribute(MAPIProperty.ATTACH_EXTENSION));
+}
+
+
+
+ +
+ Investigating a TNEF file + +

To get a feel for the contents of a file, and to track down + where data of interest is stored, HMEF comes with + HMEFDumper + to print out the contents of the file.
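Before dumping a file, it can be useful to confirm that it really is TNEF. The sketch below is one way to do that, assuming the standard TNEF signature 0x223E9F78 stored little-endian in the first four bytes (this value comes from the TNEF format itself, not from this documentation). The class TnefSniffer and its methods are hypothetical names for illustration, not part of POI, and the file-based variant assumes Java 11 or later for readNBytes.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper, not part of POI: cheap check for the TNEF signature.
final class TnefSniffer {
    // TNEF signature; stored little-endian, so a file starts 78 9F 3E 22
    static final int TNEF_SIGNATURE = 0x223E9F78;

    private TnefSniffer() {}

    /** Returns true if the first four bytes decode to the TNEF signature. */
    static boolean looksLikeTnef(byte[] data) {
        if (data == null || data.length < 4) {
            return false;
        }
        // Assemble a little-endian 32-bit integer from the first four bytes
        int value = (data[0] & 0xFF)
                  | ((data[1] & 0xFF) << 8)
                  | ((data[2] & 0xFF) << 16)
                  | ((data[3] & 0xFF) << 24);
        return value == TNEF_SIGNATURE;
    }

    /** Reads at most four bytes from the file and checks them. */
    static boolean fileLooksLikeTnef(Path file) throws IOException {
        try (InputStream in = Files.newInputStream(file)) {
            return looksLikeTnef(in.readNBytes(4));
        }
    }
}
```

A file that fails this check is unlikely to be worth passing to HMEFDumper or HMEFMessage, though passing it says nothing about whether the rest of the file is well formed.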

+
+ +
+ Limitations +

HMEF is currently a work-in-progress, and not everything + works yet. The current limitations are:

+
diff --git a/src/documentation/content/xdocs/hpbf/book.xml b/src/documentation/content/xdocs/hpbf/book.xml new file mode 100644 index 0000000000..9aceee7a36 --- /dev/null +++ b/src/documentation/content/xdocs/hpbf/book.xml @@ -0,0 +1,35 @@ + + + + + + + + + + + + + + + + diff --git a/src/documentation/content/xdocs/hpbf/index.xml b/src/documentation/content/xdocs/hpbf/index.xml index e862efb19c..4443db97c9 100644 --- a/src/documentation/content/xdocs/hpbf/index.xml +++ b/src/documentation/content/xdocs/hpbf/index.xml @@ -45,7 +45,10 @@ the document (partly supported). Additional low level code to process the file format may follow, if demand and developer interest warrant it.

-

At this time, there is no usermodel api or similar. +

Text Extraction is available via the + org.apache.poi.hpbf.extractor.PublisherTextExtractor + class.

+

At this time, there is no usermodel api or similar. There is only low level support for certain parts of the file, but by no means all of it.

Our current understanding of the file format is documented -- 2.39.5