+++ /dev/null
-<?xml version="1.0" encoding="UTF-8"?>
-<!DOCTYPE document PUBLIC "-//APACHE//DTD Documentation V1.1//EN" "../dtd/document-v11.dtd">
-
-<document>
- <header>
- <title>HDF</title>
- <subtitle>Word file format</subtitle>
- <authors>
- <person name="S. Ryan Ackley" email="sackley@cfl.rr.com"/>
- </authors>
- </header>
-
- <body>
- <section><title>The Word 97 File Format in semi-plain English</title>
-
- <p>The purpose of this document is to give a brief, high-level overview of the
- Word document format as handled by HDF. It does not go into in-depth technical
- detail and is only meant as a supplement to the Microsoft Word 97 Binary
- File Format specification, freely available at <link href="http://wotsit.org">Wotsit.org</link>.</p>
- <p>The OLE file format is not discussed in this document. It is assumed that
- the reader has a working knowledge of the POIFS API. </p>
-
- <section><title>Word file structure</title>
- <p>A Word file is made up of the document text and data structures
- containing formatting information about the text. This is, of course, a
- very simplified picture: Word files also contain fields, macros, and other
- structures that are not considered here. At this stage, HDF is mainly
- concerned with formatted text.</p>
- </section>
- <section><title>Reading Word files</title>
- <p>The entry point for HDF's reading of a Word file is the File Information
- Block (FIB). This structure records the locations and sizes of the
- document's text and data structures. The FIB is located at the
- beginning of the main stream (the WordDocument stream).</p>
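- <p>The sketch below is a rough illustration of reading the FIB. It pulls a few fields out of the raw bytes of the main stream. The byte offsets are assumed from the Word 97 spec and should be checked against it: the flag word at 0x000A with the fWhichTblStm bit 0x0200, fcMin at 0x0018, ccpText at 0x004C, fcClx at 0x01A2 and lcbClx at 0x01A6. The class FibSketch is hypothetical and is not part of HDF.</p>
- <source><![CDATA[
```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/**
 * Hypothetical sketch (not an HDF class): reads a few Word 97 FIB fields
 * from the raw bytes of the main (WordDocument) stream.
 */
class FibSketch {
    // Assumed byte offsets from the Word 97 spec; all fields are little-endian.
    static final int FLAGS = 0x000A;           // 16-bit bit field
    static final int FC_MIN = 0x0018;          // start of the text in the main stream
    static final int CCP_TEXT = 0x004C;        // length of the main text, in characters
    static final int FC_CLX = 0x01A2;          // offset of the Clx in the table stream
    static final int LCB_CLX = 0x01A6;         // size of the Clx in bytes
    static final int F_WHICH_TBL_STM = 0x0200; // which table stream is in use

    final boolean useTable1;
    final int fcMin;
    final int ccpText;
    final int fcClx;
    final int lcbClx;

    FibSketch(byte[] mainStream) {
        ByteBuffer buf = ByteBuffer.wrap(mainStream).order(ByteOrder.LITTLE_ENDIAN);
        useTable1 = (buf.getShort(FLAGS) & F_WHICH_TBL_STM) != 0;
        fcMin = buf.getInt(FC_MIN);
        ccpText = buf.getInt(CCP_TEXT);
        fcClx = buf.getInt(FC_CLX);
        lcbClx = buf.getInt(LCB_CLX);
    }

    /** Name of the table stream that fcClx and the other table offsets point into. */
    String tableStreamName() {
        return useTable1 ? "1Table" : "0Table";
    }
}
```
- ]]></source>
- <p>The fWhichTblStm bit decides whether the table stream is named 0Table or 1Table. In HDF the raw stream bytes would be obtained through POIFS; the sketch only shows the byte-level layout.</p>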
- <section><title>Text</title>
- <p>The document's text is also located in the main stream. Its starting
- location is given by FIB.fcMin and its length, in characters rather than
- bytes, by FIB.ccpText. These two values are not very useful for getting
- the text, because Unicode text may be intermingled with 8-bit text, so
- character counts do not map directly to byte offsets. That brings us to
- the piece table.</p>
- <p>The piece table divides the text into pieces, each stored either as
- Unicode (two bytes per character) or as 8-bit text. It is stored in the
- table stream, at the offset and with the size given by FIB.fcClx and
- FIB.lcbClx respectively. The piece table may also contain Property
- Modifiers (prm). These are used in complex (fast-saved) files and are
- skipped. Each piece records the offset in the main stream of the text for
- that piece. If the piece is stored as 8-bit text rather than Unicode, the
- offset is marked with a flag bit (0x40000000); clear that bit and divide
- by 2 to get the real file offset. Otherwise the offset is used as-is.</p>
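- <p>Here is a minimal sketch of decoding the piece table, assuming the Word 97 layout. The Clx in the table stream starts with zero or more property-modifier entries (clxt = 1, a 2-byte size, then the data). These are followed by the piece table itself (clxt = 2, a 4-byte size, then n + 1 character positions and n 8-byte piece descriptors). In each descriptor the file offset follows 2 flag bytes, and bit 0x40000000 marks a piece stored as 8-bit text. The class and method names are hypothetical.</p>
- <source><![CDATA[
```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch (not an HDF class): decodes a Word 97 Clx into text pieces. */
class PieceTableSketch {
    static final int COMPRESSED_BIT = 0x40000000;

    /** One piece: character positions [cpStart, cpEnd) and where its bytes live. */
    static final class Piece {
        final int cpStart;
        final int cpEnd;
        final int fileOffset;  // real byte offset in the main stream
        final boolean unicode; // true: 2 bytes per character; false: 1 byte (8-bit)

        Piece(int cpStart, int cpEnd, int fileOffset, boolean unicode) {
            this.cpStart = cpStart;
            this.cpEnd = cpEnd;
            this.fileOffset = fileOffset;
            this.unicode = unicode;
        }

        int byteLength() {
            return (cpEnd - cpStart) * (unicode ? 2 : 1);
        }
    }

    /** clx holds bytes [fcClx, fcClx + lcbClx) of the table stream. */
    static List<Piece> parseClx(byte[] clx) {
        ByteBuffer buf = ByteBuffer.wrap(clx).order(ByteOrder.LITTLE_ENDIAN);
        int pos = 0;
        // clxt == 1: property modifiers (2-byte size, then data); skipped.
        while (clx[pos] == 1) {
            int cb = buf.getShort(pos + 1) & 0xFFFF;
            pos += 3 + cb;
        }
        if (clx[pos] != 2) {
            throw new IllegalArgumentException("expected piece table (clxt == 2)");
        }
        int lcb = buf.getInt(pos + 1);
        int base = pos + 5;
        // (n + 1) 4-byte character positions, then n 8-byte piece descriptors
        int n = (lcb - 4) / 12;
        List<Piece> pieces = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            int cpStart = buf.getInt(base + 4 * i);
            int cpEnd = buf.getInt(base + 4 * (i + 1));
            int pcd = base + 4 * (n + 1) + 8 * i;
            int fc = buf.getInt(pcd + 2); // skip 2 flag bytes; trailing prm is ignored
            boolean compressed = (fc & COMPRESSED_BIT) != 0;
            int offset = compressed ? (fc & ~COMPRESSED_BIT) / 2 : fc;
            pieces.add(new Piece(cpStart, cpEnd, offset, !compressed));
        }
        return pieces;
    }
}
```
- ]]></source>
- <p>To read a piece's text, take byteLength() bytes from the main stream at fileOffset. Decode them as UTF-16LE for Unicode pieces, or with code page 1252 for 8-bit pieces.</p>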
- </section>
- <section><title>Text Formatting</title>
- <section><title>Stylesheet</title>
- <p>All text formatting is based on styles contained in the StyleSheet.
- The StyleSheet is a data structure containing, among other things, style
- descriptions. A style description contains either both paragraph and
- character properties (a paragraph style) or character properties only
- (a character style). Each style description is stored on file in
- compressed form, as a set of deltas from its base style.</p>
- <p>Following the chain of base styles eventually leads back to the nil
- style, an imaginary style with certain implied default values.</p>
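- <p>Resolving a style through its chain of base styles can be sketched as follows. The sketch models each style description as a base style index (istdBase) plus a map of property deltas. Real style descriptions hold grpprls rather than maps, so StyleDescription and resolve() are hypothetical simplifications. The value 0x0FFF (istdNil) is assumed, per the Word 97 spec, to be the "no base style" marker.</p>
- <source><![CDATA[
```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch (not an HDF class) of resolving a style through its base-style chain. */
class StyleChainSketch {
    static final int ISTD_NIL = 0x0FFF; // assumed istdBase value meaning "no base style"

    /** Simplified style description: a base style index plus property deltas. */
    static final class StyleDescription {
        final int istdBase;
        final Map<String, Integer> deltas;

        StyleDescription(int istdBase, Map<String, Integer> deltas) {
            this.istdBase = istdBase;
            this.deltas = deltas;
        }
    }

    /** Applies deltas from the root of the chain down to style istd, starting from nil defaults. */
    static Map<String, Integer> resolve(StyleDescription[] styles, int istd,
                                        Map<String, Integer> nilDefaults) {
        Deque<StyleDescription> chain = new ArrayDeque<>();
        for (int cur = istd; cur != ISTD_NIL; cur = styles[cur].istdBase) {
            if (chain.size() >= styles.length) {
                throw new IllegalStateException("cycle in base-style chain");
            }
            chain.push(styles[cur]); // leaf first, so the root ends up on top
        }
        Map<String, Integer> props = new HashMap<>(nilDefaults);
        while (!chain.isEmpty()) {
            props.putAll(chain.pop().deltas); // root first; more specific styles override
        }
        return props;
    }
}
```
- ]]></source>
- <p>Deltas are applied starting from the style closest to the nil style, so a more specific style always overrides its base.</p>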
- </section>
- <section><title>Paragraph and Character styles</title>
- <p>Paragraph and character formatting properties for a document's text are
- stored on file as deltas from some base style in the StyleSheet. The
- deltas are used to create a complete, uncompressed style in memory.</p>
- <p>Uncompressed paragraph styles are represented by the Paragraph
- Properties (PAP) data structure. Uncompressed character styles are
- represented by the Character Properties (CHP) data structure. The styles
- for the document text are stored in compressed format in the
- corresponding Formatted Disk Pages (FKP). A compressed PAP is referred
- to as a PAPX and a compressed CHP is a CHPX. The FKP locations are
- stored in the bin tables; there are separate bin tables for CHPXs and
- PAPXs. The bin tables' locations and sizes are stored in the FIB.</p>
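- <p>Here is a sketch of reading a bin table, assuming the Word 97 layout. The table is located through FIB fields such as fcPlcfbteChpx/lcbPlcfbteChpx or fcPlcfbtePapx/lcbPlcfbtePapx. It holds n + 1 four-byte file offsets followed by n four-byte entries whose low 22 bits are a page number. Multiplying the page number by 512 gives the FKP's offset in the main stream. BinTableSketch is a hypothetical name.</p>
- <source><![CDATA[
```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/** Hypothetical sketch (not an HDF class): lists FKP offsets from a Word 97 bin table. */
class BinTableSketch {
    static final int FKP_SIZE = 512;

    /**
     * plcfbte holds a whole bin table read from the table stream:
     * (n + 1) 4-byte file offsets followed by n 4-byte entries (page numbers).
     * Returns the byte offset of each FKP in the main stream.
     */
    static int[] fkpOffsets(byte[] plcfbte) {
        ByteBuffer buf = ByteBuffer.wrap(plcfbte).order(ByteOrder.LITTLE_ENDIAN);
        int n = (plcfbte.length - 4) / 8;
        int[] offsets = new int[n];
        for (int i = 0; i < n; i++) {
            // Assumed: the page number occupies the low 22 bits of each entry.
            int pn = buf.getInt(4 * (n + 1) + 4 * i) & 0x3FFFFF;
            offsets[i] = pn * FKP_SIZE;
        }
        return offsets;
    }
}
```
- ]]></source>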
- <p>An FKP is a 512-byte page in the main stream. It contains the offsets of
- the beginning and end of each paragraph/character run in the main stream
- and the compressed properties for that interval. The compressed PAPX is
- based on its base style in the StyleSheet. The compressed CHPX is based on
- the enclosing paragraph's base style in the StyleSheet.</p>
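- <p>Here is a minimal sketch of unpacking a character (CHPX) FKP, assuming the Word 97 layout:</p>
- <ul>
- <li>The last byte of the page holds the run count crun.</li>
- <li>The page starts with crun + 1 four-byte file offsets.</li>
- <li>These are followed by crun one-byte values giving each CHPX's position in 2-byte words; 0 means the run has no CHPX.</li>
- <li>Each CHPX is a one-byte length followed by a grpprl.</li>
- </ul>
- <p>Paragraph FKPs use a different entry layout and are not handled here. ChpxFkpSketch is a hypothetical name.</p>
- <source><![CDATA[
```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Arrays;

/** Hypothetical sketch (not an HDF class): unpacks the runs of a Word 97 CHPX FKP. */
class ChpxFkpSketch {
    /** One character run: main-stream byte range [fcStart, fcEnd) and its CHPX grpprl. */
    static final class Run {
        final int fcStart;
        final int fcEnd;
        final byte[] grpprl; // empty when the run has no CHPX (default properties)

        Run(int fcStart, int fcEnd, byte[] grpprl) {
            this.fcStart = fcStart;
            this.fcEnd = fcEnd;
            this.grpprl = grpprl;
        }
    }

    static Run[] parse(byte[] page) {
        if (page.length != 512) {
            throw new IllegalArgumentException("an FKP is 512 bytes");
        }
        ByteBuffer buf = ByteBuffer.wrap(page).order(ByteOrder.LITTLE_ENDIAN);
        int crun = page[511] & 0xFF; // run count is stored in the last byte
        Run[] runs = new Run[crun];
        for (int i = 0; i < crun; i++) {
            int fcStart = buf.getInt(4 * i);
            int fcEnd = buf.getInt(4 * (i + 1));
            int wordOffset = page[4 * (crun + 1) + i] & 0xFF; // CHPX position, in 2-byte words
            byte[] grpprl = new byte[0];
            if (wordOffset != 0) {
                int at = wordOffset * 2;
                int cb = page[at] & 0xFF; // CHPX = 1-byte length + grpprl
                grpprl = Arrays.copyOfRange(page, at + 1, at + 1 + cb);
            }
            runs[i] = new Run(fcStart, fcEnd, grpprl);
        }
        return runs;
    }
}
```
- ]]></source>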
- </section>
- <section><title>Uncompressing styles and other data structures</title>
- <p>All compressed properties (CHPX, PAPX, SEPX) contain a grpprl. A grpprl
- is an array of sprms. A sprm defines a delta from some base property.
- There is a table of possible sprms in the Word 97 spec. Each sprm is a
- two-byte opcode followed by an operand. The operand size depends on
- the sprm. Each sprm describes an operation that should be performed on
- the base style. Once every sprm in the grpprl has been applied to the base
- style, you have the style for the paragraph, character run,
- section, etc.</p>
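- <p>The sketch below splits a grpprl into individual sprms. It assumes the Word 97 opcode layout. Bits 10-12 (sgc) give the kind of property: 1 paragraph, 2 character, 3 picture, 4 section, 5 table. The top three bits (spra) give the operand size, and spra 6 means the first operand byte holds the length. The spec lists a few exceptions, such as sprmTDefTable and sprmPChgTabs, that this sketch does not handle. SprmSketch is a hypothetical name.</p>
- <source><![CDATA[
```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch (not an HDF class): splits a Word 97 grpprl into sprms. */
class SprmSketch {
    /** A decoded sprm: 16-bit opcode plus its operand bytes. */
    static final class Sprm {
        final int opcode;
        final byte[] operand;

        Sprm(int opcode, byte[] operand) {
            this.opcode = opcode;
            this.operand = operand;
        }

        /** Property kind: 1 paragraph, 2 character, 3 picture, 4 section, 5 table. */
        int sgc() {
            return (opcode >> 10) & 0x7;
        }
    }

    /** Operand size in bytes for a given spra (top 3 opcode bits), or -1 if variable. */
    static int operandSize(int spra) {
        switch (spra) {
            case 0: case 1: return 1;
            case 2: case 4: case 5: return 2;
            case 3: return 4;
            case 7: return 3;
            default: return -1; // spra 6: first operand byte holds the length
        }
    }

    static List<Sprm> split(byte[] grpprl) {
        List<Sprm> out = new ArrayList<>();
        int pos = 0;
        while (pos + 2 <= grpprl.length) {
            int opcode = (grpprl[pos] & 0xFF) | ((grpprl[pos + 1] & 0xFF) << 8);
            pos += 2;
            int size = operandSize(opcode >>> 13);
            if (size < 0) { // variable-length operand: read its length byte
                size = grpprl[pos] & 0xFF;
                pos++;
            }
            out.add(new Sprm(opcode, Arrays.copyOfRange(grpprl, pos, pos + size)));
            pos += size;
        }
        return out;
    }
}
```
- ]]></source>
- <p>Applying a grpprl then amounts to looking up each opcode in the spec's sprm table and updating the corresponding PAP, CHP or SEP field with the operand.</p>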
- </section>
- </section>
- </section>
- </section>
- </body>
-</document>
-
+++ /dev/null
-<?xml version="1.0" encoding="UTF-8"?>
-<!DOCTYPE document PUBLIC "-//APACHE//DTD Documentation V1.1//EN" "../dtd/document-v11.dtd">
-
-<document>
- <header>
- <title>Jakarta POI - HDF - Java API to Manipulate MS Word Files</title>
- <subtitle>Overview</subtitle>
- <authors>
- <person name="Nicola Ken Barozzi" email="barozzi@nicolaken.com"/>
- <person name="Andrew C. Oliver" email="acoliver@apache.org"/>
- <person name="Ryan Ackley" email="sackley@apache.org"/>
- </authors>
- </header>
-
- <body>
- <section><title>Overview</title>
-
- <p>HDF is the name of our port of the Microsoft Word 97 (-2002) file format to
- pure Java.</p>
- <p>HDF is still in early development. It is in the
- <link href="http://cvs.apache.org/viewcvs/jakarta-poi/src/scratchpad/">scratchpad section of
- CVS</link>. Source code in the <em>org.apache.poi.hdf.extractor</em> tree is
- legacy code. Source in the <em>org.apache.poi.hdf.model</em>
- tree is the old legacy code refactored into an object model. Check the How-To
- page for detailed examples of using HDF.
- </p>
- <p>
- We are looking for developers! If you are interested in helping with HDF,
- familiarize yourself with the source code and just start coding. Make sure
- you read the guidelines for <link href="http://jakarta.apache.org/poi/getinvolved/index.html">
- getting involved</link>.</p>
- </section>
- </body>
-</document>
+++ /dev/null
-<?xml version="1.0"?>
-<!DOCTYPE document PUBLIC "-//APACHE//DTD Documentation V1.1//EN" "../dtd/document-v11.dtd">
-<document>
- <body>
- <p>HWPF Milestones</p>
- <table>
- <tr>
- <th>
- Milestones
- </th>
- <th>
- Target Date
- </th>
- <th>
- Owner
- </th>
- </tr>
- <tr>
- <td>
- Read in a Word document
-with minimum formatting
-(no lists, tables, footnotes,
-endnotes, headers, footers)
-and write it back out with the
-result viewable in Word
-97/2000
- </td>
- <td>
- 07/11/2003
- </td>
- <td>
- Ryan
- </td>
- </tr>
- <tr>
- <td>
- Add support for Lists and
-Tables
- </td>
- <td>
- 8/15/2003
- </td>
- <td>
-  
- </td>
- </tr>
- <tr>
- <td>
- HWPF 1.0-alpha release with
-documentation and examples
- </td>
- <td>
- 8/18/2003
- </td>
- <td>
- Praveen/Ryan
- </td>
- </tr>
- <tr>
- <td>
- Add support for Headers,
-Footers, endnotes, and
-footnotes
- </td>
- <td>
- 8/31/2003
- </td>
- <td>
- ?
- </td>
- </tr>
- <tr>
- <td>
- Add support for forms and
-mail merge
- </td>
- <td>
- September/October 2003
- </td>
- <td>
- ?
- </td>
- </tr>
- </table>
- <p>HWPF Task Lists</p>
- <p>Read in a Word document with minimum formatting (no lists, tables, footnotes,
-endnotes, headers, footers) and write it back out with the result viewable in Word 97/2000</p>
- <table>
- <tr>
- <th>
- Task
- </th>
- <th>
- Target Date
- </th>
- <th>
- Owner
- </th>
- </tr>
- <tr>
- <td>
- Create classes to read and
-write low level data
-structures with test cases
- </td>
- <td>
- 7/10/2003
- </td>
- <td>
- Ryan
- </td>
- </tr>
- <tr>
- <td>
- Create classes to read and
-write FontTable and Font
-names with test case
- </td>
- <td>
- 7/10/2003
- </td>
- <td>
- Praveen
- </td>
- </tr>
- <tr>
- <td>
- Final test
- </td>
- <td>
- 7/11/2003
- </td>
- <td>
- Ryan
- </td>
- </tr>
- </table>
- <p>Develop a user-friendly API so that it is fun and easy to read and write Word documents
-with Java.</p>
- <table>
- <tr>
- <th>
- Task
- </th>
- <th>
- Target Date
- </th>
- <th>
- Owner
- </th>
- </tr>
- <tr>
- <td>
- Develop a way for SPRMs to
-be compressed and
-uncompressed
- </td>
- <td>
-
- </td>
- <td>
-
- </td>
- </tr>
- <tr>
- <td>
- Override CHPAbstractType
-with a concrete class that
-exposes attributes with
-human readable names
- </td>
- <td>
-
- </td>
- <td>
-
- </td>
- </tr>
- <tr>
- <td>
- Override PAPAbstractType
-with a concrete class that
-exposes attributes with
-human readable names
- </td>
- <td>
-
- </td>
- <td>
-
- </td>
- </tr>
- <tr>
- <td>
- Override SEPAbstractType
-with a concrete class that
-exposes attributes with
-human readable names
- </td>
- <td>
-
- </td>
- <td>
-
- </td>
- </tr>
- <tr>
- <td>
- Override DOPAbstractType
-with a concrete class that
-exposes attributes with
-human readable names
- </td>
- <td>
-
- </td>
- <td>
-
- </td>
- </tr>
- <tr>
- <td>
- Override TAPAbstractType
-with a concrete class that
-exposes attributes with
-human readable names
- </td>
- <td>
-
- </td>
- <td>
-
- </td>
- </tr>
- <tr>
- <td>
- Override TCAbstractType
-with a concrete class that
-exposes attributes with
-human readable names
- </td>
- <td>
-
- </td>
- <td>
-
- </td>
- </tr>
- <tr>
- <td>
- Develop a VerifyIntegrity
-class for testing so it is easy
-to determine if a Word
-document is well-formed.
- </td>
- <td>
-
- </td>
- <td>
-
- </td>
- </tr>
- <tr>
- <td>
- Develop a general, intuitive
-API to tie everything together
- </td>
- <td>
-
- </td>
- <td>
-
- </td>
- </tr>
- </table>
- <p>Add support for lists and tables</p>
- <table>
- <tr>
- <th>
- Task
- </th>
- <th>
- Target Date
- </th>
- <th>
- Owner
- </th>
- </tr>
- <tr>
- <td>
- Add data structures for
-reading and writing list data
-with test cases.
- </td>
- <td>
-
- </td>
- <td>
-
- </td>
- </tr>
- <tr>
- <td>
- Add data structures for
-reading and writing tables
-with test cases.
- </td>
- <td>
-
- </td>
- <td>
-
- </td>
- </tr>
- </table>
- <p>HWPF 1.0-alpha release with documentation and examples</p>
- <table>
- <tr>
- <th>
- Task
- </th>
- <th>
- Target Date
- </th>
- <th>
- Owner
- </th>
- </tr>
- <tr>
- <td>
- Document the user model
-API
- </td>
- <td>
-
- </td>
- <td>
-
- </td>
- </tr>
- <tr>
- <td>
- Document the low level
-classes
- </td>
- <td>
-
- </td>
- <td>
-
- </td>
- </tr>
- <tr>
- <td>
- Come up with detailed How-Tos
- </td>
- <td>
-
- </td>
- <td>
-
- </td>
- </tr>
- </table>
- </body>
-</document>