<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2//EN">
<HTML>
<HEAD>
<META HTTP-EQUIV="CONTENT-TYPE" CONTENT="text/html; charset=iso-8859-1">
<TITLE>POI Filesystem format</TITLE>
<META NAME="GENERATOR" CONTENT="StarOffice/5.2 (Linux)">
<META NAME="AUTHOR" CONTENT=" ">
<META NAME="CREATED" CONTENT="20010728;10223600">
<META NAME="CHANGEDBY" CONTENT="Marc Johnson">
<META NAME="CHANGED" CONTENT="20010810;13415800">
<STYLE>
<!--
@page { margin-left: 1.25in; margin-right: 1.25in; margin-top: 1in; margin-bottom: 1in }
H1 { margin-bottom: 0.08in; font-size: 16pt }
TD P { margin-bottom: 0.08in }
H2 { margin-bottom: 0.08in; font-size: 14pt; font-style: italic }
H3 { margin-bottom: 0.08in }
H4 { margin-bottom: 0.08in; font-size: 11pt; font-style: italic }
P { margin-bottom: 0.08in }
-->
</STYLE>
</HEAD>
<BODY>
<H1>POI Filesystem format</H1>
<H2>Introduction</H2>
<P STYLE="margin-bottom: 0in; font-weight: medium">
The POI file format is essentially an archive wrapper
around files. It is intended to mimic a filesystem. For
the remainder of this document it is referred to as a
filesystem in order to avoid confusion with the
&quot;files&quot; it contains.
</P>
<P STYLE="margin-bottom: 0in; font-weight: medium; text-decoration: none">
POI filesystems are compatible with the document formats
used by a well-known software company's popular office
productivity suite and with programs that output compatible
data. Because the POI filesystem does not provide
compression, encryption or any other worthwhile feature,
it's not a good choice unless you require interoperability
with these programs.
</P>
<P STYLE="margin-bottom: 0in; font-weight: medium">
The POI filesystem does not encode the documents
themselves. For example, if you had a word processor file
with the extension &quot;.doc&quot;, you would actually
have a POI filesystem with a document file archived inside
it.
</P>
<H2>Document Conventions</H2>
<P STYLE="margin-bottom: 0in">
This document uses the numeric types described by
the Java Language Specification, which can be found at
java.sun.com. In short:
</P>
<UL>
<LI>
<P STYLE="margin-bottom: 0in">
a byte is an 8-bit signed integer ranging from
(-128) to 127
</P>
</LI>
<LI>
<P STYLE="margin-bottom: 0in">
a short is a 16-bit signed integer ranging from
(-32768) to 32767
</P>
</LI>
<LI>
<P STYLE="margin-bottom: 0in">
an int is a 32-bit signed integer ranging from
(-2147483648) to 2147483647, roughly &plusmn;2.14e+9
</P>
</LI>
<LI>
<P STYLE="margin-bottom: 0in">
a long is a 64-bit signed integer ranging from
(-9223372036854775808) to 9223372036854775807, roughly &plusmn;9.22e+18
</P>
</LI>
</UL>
<P STYLE="margin-bottom: 0in">
The Java Language Specification spells out a number of
other types that are not referred to by this document.
</P>
<P STYLE="margin-bottom: 0in">
Where this document refers to &quot;endian
conversion&quot;, it means the byte order of
stored numbers. Numbers in &quot;little-endian order&quot;
are stored with the LEAST significant byte first. To
read a short, for example, you read two bytes, shift the
second byte 8 bits to the left, and combine it with the
first byte using an <CODE>or</CODE> operation, after
stripping the &quot;sign&quot; from the first byte by
masking it with <CODE>0xff</CODE>. The following code
illustrates this method:
</P>
<PRE>
public int getShort(byte[] rec)
{
    // rec[1] is sign-extended when promoted to int, so the result
    // keeps the sign of the stored short; masking rec[0] with 0xff
    // keeps only its low 8 bits.
    return (rec[1] &lt;&lt; 8) | (rec[0] &amp; 0xff);
}
</PRE>
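<P STYLE="margin-bottom: 0in">
The same mask-and-shift approach extends to the 32-bit and 64-bit values used throughout the filesystem. The following is a minimal sketch and is not part of any POI API. The class <CODE>LittleEndianSketch</CODE> and its <CODE>offset</CODE> parameters are hypothetical. For comparison, it also reads an int with <CODE>java.nio.ByteBuffer</CODE> set to <CODE>ByteOrder.LITTLE_ENDIAN</CODE>, which should give the same result.
</P>

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical helper class: reads little-endian values from a byte array.
final class LittleEndianSketch {
    private LittleEndianSketch() {}

    // Signed 16-bit value stored least significant byte first.
    static short getShort(byte[] rec, int offset) {
        return (short) ((rec[offset + 1] << 8) | (rec[offset] & 0xff));
    }

    // Signed 32-bit value: every byte except the most significant one is masked.
    static int getInt(byte[] rec, int offset) {
        return (rec[offset + 3] << 24)
             | ((rec[offset + 2] & 0xff) << 16)
             | ((rec[offset + 1] & 0xff) << 8)
             | (rec[offset] & 0xff);
    }

    // Signed 64-bit value built from an unsigned low half and a signed high half.
    static long getLong(byte[] rec, int offset) {
        long low = getInt(rec, offset) & 0xffffffffL;
        long high = getInt(rec, offset + 4);
        return (high << 32) | low;
    }

    // The same int read through the standard library, for comparison.
    static int getIntViaBuffer(byte[] rec, int offset) {
        return ByteBuffer.wrap(rec).order(ByteOrder.LITTLE_ENDIAN).getInt(offset);
    }
}
```

<P STYLE="margin-bottom: 0in">
Only the most significant byte is left unmasked, so it alone carries the sign. Every lower byte is masked with <CODE>0xff</CODE> so that sign extension cannot leak into the higher bits.
</P>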
<H2>Filesystem Introduction</H2>
<P STYLE="margin-bottom: 0in">
POI filesystems are essentially normal files stored on a
Java-compatible platform's native filesystem. They are
identified by a file name extension that indicates what
type of data they contain. For example, a file
ending in &quot;.xls&quot; would likely contain
spreadsheet data, and a file ending in &quot;.doc&quot;
would probably contain a word processing document. POI
filesystems are called &quot;filesystems&quot; because
they contain multiple embedded files in a manner similar
to traditional filesystems. Along functional lines, it
would be more accurate to call them POI archives.
</P>
<P STYLE="margin-bottom: 0in">
POI filesystems do not provide encryption, compression, or
any other feature of a modern archive and are therefore a
poor choice for implementing new file formats. They are
most useful for interoperability with legacy applications
that use a compatible file format.
</P>
<H2>Filesystem Walkthrough</H2>
<P STYLE="margin-bottom: 0in">
This is a walkthrough of a POI filesystem and how it is
put together. It is not intended as a precise
description. Instead, it gives a &quot;big picture&quot; of the
general structure and how it is interpreted.
</P>
  141. A POI filesystem begins with a <A
  142. HREF="HeaderBlock"><B><I>header</I></B></A>. This header
  143. identifies locations in the file by function and provides
  144. a sanity check identifying a native filesystem file as
  145. indeed a POI filesystem.
  146. </P>
<P STYLE="margin-bottom: 0in">
The first 64 bits of the header compose a <B><I>magic
number identifier.</I></B> This identifier tells the
client software that this is indeed a POI filesystem and
that it should be treated as such. This is a &quot;sanity
check&quot; to make sure this is a POI filesystem and not
some other format. The header also contains an <B><I>array
of block numbers</I></B>. These block numbers refer to
blocks in the file. When these blocks are read together,
they form the <A HREF="#BAT"><B><I>Block Allocation
Table</I></B></A>. The header also contains a pointer to
the first block of the <A
HREF="#PropertyTable"><B><I>property table</I></B></A>,
whose first element is the <A HREF="#RootEntry"><B><I>root
element</I></B></A>, and a pointer to the <B>small Block
Allocation Table (SBAT)</B>.
</P>
<P STYLE="margin-bottom: 0in">
The <A HREF="#BAT"><B><I>block allocation
table</I></B></A>, or <B><I>BAT</I></B>, along with the <A
HREF="#PropertyTable"><B><I>property table</I></B></A>,
specifies which blocks in the filesystem belong to which
files. The Block Allocation Table is somewhat hard to
conceptualize at first. It is essentially an array of
integers that point at each other. These elements form
chains.
</P>
<P STYLE="margin-bottom: 0in">
To read a file using the <A HREF="#BAT"><B><I>block allocation
table</I></B></A>, you must first read the file's <B><I>start
block</I></B> from the <A
HREF="#PropertyTable"><B><I>property
table</I></B></A>. This is both your index for the next
element in the <B><I>BAT</I></B> array and the
index of the first block in your file. For instance, if
the <B><I>start block</I></B> from your file's property is
0, then you read block 0 (the first block after the header)
from your filesystem as the first block of your file. You
also read element 0 from the <B><I>BAT array</I></B>.
Supposing this element has a value equal to 2, you'd read
block 2 from your filesystem as the next block of your
file and element 2 from your <B><I>BAT array</I></B>.
This will be covered further later in this document.
</P>
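<P STYLE="margin-bottom: 0in">
Because block 0 is the first block after the 512-byte header, a block index can be turned into a byte offset in the native file by skipping one block's worth of header. The sketch below assumes 512-byte big blocks. <CODE>BlockOffsets</CODE> and <CODE>blockOffset</CODE> are hypothetical names.
</P>

```java
// Hypothetical helper: converts a big-block index into a byte offset in the file.
final class BlockOffsets {
    static final int BIG_BLOCK_SIZE = 512; // the header occupies bytes 0 through 511

    private BlockOffsets() {}

    // Block 0 starts immediately after the header, at byte 512.
    static long blockOffset(int blockIndex) {
        if (blockIndex < 0) {
            // Negative BAT values (-1, -2, -3) are markers, not block numbers.
            throw new IllegalArgumentException("not a block index: " + blockIndex);
        }
        return (long) (blockIndex + 1) * BIG_BLOCK_SIZE;
    }
}
```

<P STYLE="margin-bottom: 0in">
With this mapping, block 0 starts at byte 512 and block 2 starts at byte 1536. The multiplication is done in <CODE>long</CODE> so that offsets in files larger than 2 GB do not overflow.
</P>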
<P STYLE="margin-bottom: 0in">
The <A HREF="#PropertyTable"><B><I>Property
Table</I></B></A> is essentially the directory structure
for the filesystem. Each entry holds the name of a file or
directory, its <B><I>start block</I></B> (an index into both the
filesystem and the <B><I>BAT</I></B>), and its actual size.
The first property in the <A
HREF="#PropertyTable">property table</A> is the <A
HREF="#RootEntry"><B><I>root element</I></B></A>. Its real
purpose is to hold the start block for the <B><I>small
blocks.</I></B>
</P>
<H3>Filesystem Structure</H3>
<P STYLE="margin-bottom: 0in; font-weight: medium">
All values in the POI filesystem are stored in
&quot;little-endian&quot; order. This means that a reader
using Java's default (big-endian) byte order must reverse
the order of the bytes before assigning them to
variables. The values given below are logical values;
on disk, their bytes are stored in reverse order.
</P>
<P STYLE="margin-bottom: 0in; font-weight: medium">
The POI filesystem is divided into 512-byte blocks. Each
block has an implicit block type. The order and
description of these blocks are given below.
</P>
<A NAME="HeaderBlock"><H3>Header Block</H3></A>
<P STYLE="margin-bottom: 0in; font-weight: medium">
The POI filesystem begins with a <B><I>header
block</I></B>. The first 64 bits of the header form a long
<B><I>file type id</I></B> or <B><I>magic number
identifier</I></B> of
<CODE>0xE11AB1A1E011CFD0L</CODE>. This is basically a
sanity check. If this isn't the first thing in the header
(and consequently the filesystem), then this is not a POI
filesystem and should be read with some other library.
</P>
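<P STYLE="margin-bottom: 0in">
Here is a minimal sketch of this sanity check. It assumes the caller has already read at least the first 8 bytes of the file into an array. The class <CODE>MagicCheck</CODE> is hypothetical. Because the magic number is stored least significant byte first, it appears on disk as the bytes D0 CF 11 E0 A1 B1 1A E1.
</P>

```java
// Hypothetical helper: tests whether a buffer starts with the POI filesystem magic number.
final class MagicCheck {
    // Logical value; on disk the bytes appear as D0 CF 11 E0 A1 B1 1A E1.
    static final long POIFS_MAGIC = 0xE11AB1A1E011CFD0L;

    private MagicCheck() {}

    static boolean looksLikePoifs(byte[] start) {
        if (start.length < 8) {
            return false; // too short to contain the magic number
        }
        long value = 0;
        // Assemble little-endian: the most significant byte (index 7) goes in first.
        for (int i = 7; i >= 0; i--) {
            value = (value << 8) | (start[i] & 0xff);
        }
        return value == POIFS_MAGIC;
    }
}
```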
<P STYLE="margin-bottom: 0in; font-weight: medium">
The most important parts of the header are discussed in
the rest of this section.
</P>
<H4>BATs</H4>
<P STYLE="margin-bottom: 0in">
At offset <B>0x2C</B> is an int specifying the number of
elements in the <B><I>BAT array</I></B>. At offset
<B>0x4C</B> is the BAT array itself, an array of ints. This
array contains the index of every block that makes up the
<A HREF="#BAT">Block Allocation Table</A>.
</P>
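<P STYLE="margin-bottom: 0in">
The sketch below shows one way to pull these indices out of a 512-byte header held in memory. It is an illustration, not POI's own code. <CODE>HeaderBatArray</CODE> is a hypothetical name. Any BAT blocks beyond the 109 that fit in the header (see XBATs below) are deliberately not handled.
</P>

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical helper: extracts the BAT block indices stored in a 512-byte header.
final class HeaderBatArray {
    static final int BAT_COUNT_OFFSET = 0x2C;
    static final int BAT_ARRAY_OFFSET = 0x4C;
    static final int MAX_HEADER_ENTRIES = (0x200 - BAT_ARRAY_OFFSET) / 4; // 109

    private HeaderBatArray() {}

    static int[] batBlockIndices(byte[] header) {
        ByteBuffer buf = ByteBuffer.wrap(header).order(ByteOrder.LITTLE_ENDIAN);
        int count = buf.getInt(BAT_COUNT_OFFSET);
        // Only the first 109 entries fit in the header; any others come from XBAT blocks.
        int inHeader = Math.max(0, Math.min(count, MAX_HEADER_ENTRIES));
        int[] indices = new int[inHeader];
        for (int i = 0; i < inHeader; i++) {
            indices[i] = buf.getInt(BAT_ARRAY_OFFSET + 4 * i);
        }
        return indices;
    }
}
```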
<H4><I><B>XBATs</B></I></H4>
<P STYLE="margin-bottom: 0in">
Very large POI archives may have more blocks than can be
addressed by the BAT blocks enumerated in the header
block. How large? Well, the BAT array in the header can
contain up to 109 BAT block indices; each BAT block
references up to 128 blocks, and each block is 512 bytes,
so we're talking about 109 * 128 * 512 = 7,143,424 bytes,
or about 6.8MB. That's a pretty respectable document! But
you could have much more data than that, and in today's
world of cheap gigabyte drives, why not? So the BAT may be
extended in that event. The integer value at offset
<B>0x44</B> of the header is the index of the first
<B><I>extended BAT (XBAT) block</I></B>. At offset
<B>0x48</B> of the header, there is an int value that
specifies how many XBAT blocks there are. The XBAT blocks
begin at the specified index into the array of blocks
making up the POI filesystem, and continue in sequence for
the specified count of XBAT blocks.
</p>
<p>
Each XBAT block contains the indices of up to 128 BAT
blocks, so the document size can be expanded by another
8MB for each XBAT block. The BAT blocks indexed by an XBAT
block are appended to the end of the list of BAT blocks
enumerated in the header block. Thus the BAT blocks
enumerated in the header block are BAT blocks 0 through
108, the BAT blocks enumerated in the first XBAT block are
BAT blocks 109 through 236, the BAT blocks enumerated in
the second XBAT block are BAT blocks 237 through 364, and
so on.
</P>
<p>
Through the use of XBAT blocks, the only limit on the overall
document size is the one imposed by the 4-byte block indices.
If the indices are unsigned ints, the maximum file size is
2 terabytes (2^32 blocks of 512 bytes); it is 1 terabyte if
the indices are treated as signed ints. Either way, I have
yet to see a disk drive large enough to accommodate such a
file on the shelves at the local office supply stores.
</p>
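<P STYLE="margin-bottom: 0in">
The size limits quoted above follow from simple multiplication. This sketch uses a hypothetical class name and reproduces the header-only capacity and the limits imposed by 4-byte block indices. It does not model XBAT blocks.
</P>

```java
// Hypothetical helper: reproduces the size limits discussed in the text.
final class PoifsCapacity {
    static final long BLOCK_SIZE = 512;
    static final long INDICES_PER_BAT_BLOCK = BLOCK_SIZE / 4; // 128 four-byte indices
    static final long HEADER_BAT_ENTRIES = 109;

    private PoifsCapacity() {}

    // Bytes addressable using only the BAT blocks listed in the header.
    static long headerOnlyCapacity() {
        return HEADER_BAT_ENTRIES * INDICES_PER_BAT_BLOCK * BLOCK_SIZE;
    }

    // Upper bound imposed by 4-byte block indices, signed or unsigned.
    static long indexLimit(boolean signedIndices) {
        long blocks = signedIndices ? (1L << 31) : (1L << 32);
        return blocks * BLOCK_SIZE;
    }
}
```

<P STYLE="margin-bottom: 0in">
<CODE>headerOnlyCapacity()</CODE> works out to 7,143,424 bytes (about 6.8MB). <CODE>indexLimit(false)</CODE> gives 2^41 bytes (2 terabytes), and <CODE>indexLimit(true)</CODE> gives 2^40 bytes (1 terabyte).
</P>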
<H4>SBATs</H4>
<P STYLE="margin-bottom: 0in">
If a file contained in a POI archive is smaller than 4096
bytes, it is stored in small blocks. Small blocks are 64
bytes in length and are contained within big blocks, up to
8 to a big block. Just as the main BAT is used to navigate
the array of big blocks, the <B><I>small block allocation
table</I></B> (SBAT) is used to navigate the array of small
blocks. The SBAT's start block index is found at offset
<B>0x3C</B> of the header block. The remaining blocks
constituting the SBAT are found by walking the main BAT as
if the SBAT were an ordinary file in the POI filesystem (this
process is described below).
</P>
<H4>Property Table Start Index</H4>
<P STYLE="margin-bottom: 0in">
An integer at offset <B>0x30</B> specifies the start
index of the <A HREF="#PropertyTable">property
table</A>. This integer is specified as a
<B><I>&quot;block index&quot;</I></B>. The <A
HREF="#PropertyTable">Property Table</A> is stored, like
almost everything in a POI filesystem, in big blocks and is
walked via the BAT. The <A HREF="#PropertyTable">Property
Table</A> is described below.
</P>
<A NAME="PropertyTable"><H3>Property Table</H3></A>
<P STYLE="margin-bottom: 0in">
The property table is essentially nothing more than the
directory system. Properties are 128-byte records
contained within the 512-byte blocks, four to a block. The
first property is always the <A HREF="#RootEntry">Root Entry</A>. The
following applies to individual properties within a
property table:
</P>
<P STYLE="margin-bottom: 0in">
At offset <B>0x00</B> in the property is the
&quot;<B><I>name</I></B>&quot;. This is stored as an
uncompressed 16-bit Unicode string. For names made of
ASCII characters, every other byte is the character's
&quot;ASCII&quot; code and the bytes in between are zero. The
size of this string is stored at offset <B>0x40</B>
(<B><I>string size</I></B>) as a short giving the length of
the name in bytes, including the two-byte null terminator.
</P>
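<P STYLE="margin-bottom: 0in">
Here is a minimal sketch of decoding the name. It assumes that the size short at offset 0x40 counts bytes, including the two-byte null terminator, so &quot;Root Entry&quot; would be stored with a size of 22. That interpretation is an assumption of this sketch. <CODE>PropertyName</CODE> is a hypothetical class.
</P>

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: decodes the NAME field of one 128-byte property record.
final class PropertyName {
    static final int NAME_OFFSET = 0x00;
    static final int NAME_SIZE_OFFSET = 0x40;
    static final int NAME_FIELD_BYTES = 0x40; // 32 two-byte characters

    private PropertyName() {}

    static String decode(byte[] property) {
        // Assumption: the short at 0x40 counts bytes, including the two-byte null terminator.
        int sizeInBytes = ByteBuffer.wrap(property).order(ByteOrder.LITTLE_ENDIAN)
                .getShort(NAME_SIZE_OFFSET) & 0xffff;
        int textBytes = Math.min(Math.max(sizeInBytes - 2, 0), NAME_FIELD_BYTES);
        // UTF-16LE: for ASCII names the second byte of every character is zero.
        return new String(property, NAME_OFFSET, textBytes, StandardCharsets.UTF_16LE);
    }
}
```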
<P STYLE="margin-bottom: 0in">
At offset <B>0x42</B> is the <B><I>property type</I></B>
(byte). The type is 1 for directory, 2 for file or 5 for
the Root Entry.
</P>
<P STYLE="margin-bottom: 0in">
At offset <B>0x43</B> is the <B><I>node color</I></B>
(byte). The color is either 1 (black) or 0
(red). Properties are apparently meant to be arranged in a
red-black binary tree, subject to the following rules:
<A name="node_rules"></A>
</P>
<OL>
<LI>The root of the tree is always black
<LI>Two consecutive nodes cannot both be red (a red node
never has a red parent)
<LI>A property is less than another property if its
name length is less than the other property's name
length
<LI>If two properties have the same name length, the
sort order is determined by the sort order of the
properties' names.
</OL>
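<P STYLE="margin-bottom: 0in">
Rules 3 and 4 amount to a comparison function for sibling names. The sketch below is one reading of them. The text does not say how names of equal length are compared, so upper-casing both names before comparing them is an assumption, as is the class name <CODE>PropertyOrder</CODE>.
</P>

```java
import java.util.Locale;

// Hypothetical helper: orders sibling property names following rules 3 and 4.
final class PropertyOrder {
    private PropertyOrder() {}

    // Negative if a sorts before b, positive if after, 0 if they are equal.
    static int compareNames(String a, String b) {
        // Rule 3: a shorter name sorts first.
        if (a.length() != b.length()) {
            return Integer.compare(a.length(), b.length());
        }
        // Rule 4: equal lengths fall back to comparing the names themselves.
        // Assumption: the comparison is case-insensitive (both names upper-cased first).
        return a.toUpperCase(Locale.ROOT).compareTo(b.toUpperCase(Locale.ROOT));
    }
}
```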
<P STYLE="margin-bottom: 0in">
At offset <B>0x44</B> is the index (int) of the
<B><I>previous property</I></B> (PREVIOUS_PROP).
</P>
<P STYLE="margin-bottom: 0in">
At offset <B>0x48</B> is the index (int) of the <B><I>next
property</I></B> (NEXT_PROP).
</P>
<P STYLE="margin-bottom: 0in">
At offset <B>0x4C</B> is the index (int) of the
<B><I>first directory entry</I></B> (CHILD_PROP), that is,
a child of this property. It is used only by directories
and the Root Entry.
</P>
<P STYLE="margin-bottom: 0in">
At offset <B>0x74</B> is an integer giving the <B><I>start
block</I></B> for the file described by this
property. This value is both an index into the Block
Allocation Table (or the Small Block Allocation Table) and
the index of the first block in the file.
</P>
<P STYLE="margin-bottom: 0in">
At offset <B>0x78</B> is an integer giving the total
<B><I>actual size</I></B> of the file pointed at by this
property. If the file size is less than 4096 bytes, the file is
stored in small blocks and the SBAT is used to walk the
small blocks making up the file. If the file size is 4096 bytes
or larger, the file is stored in big blocks and the main
BAT is used to walk the big blocks making up the file. The
exception to this rule is the <B><I>Root Entry</I></B>,
which, regardless of its size, is ALWAYS stored in big
blocks; the main BAT is used to walk the big blocks
making up this special file.
</P>
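<P STYLE="margin-bottom: 0in">
The size rule and its Root Entry exception can be captured in a small helper. This is a sketch with hypothetical names. The 4096-byte threshold, the 64-byte and 512-byte block sizes, and the type value 5 for the Root Entry all come from this document.
</P>

```java
// Hypothetical helper: decides which allocation table describes a property's data.
final class StorageChoice {
    static final int SMALL_FILE_LIMIT = 4096; // files below this size use small blocks
    static final byte TYPE_ROOT_ENTRY = 5;

    private StorageChoice() {}

    // True when the file lives in 64-byte small blocks walked with the SBAT.
    static boolean usesSmallBlocks(byte propertyType, int size) {
        if (propertyType == TYPE_ROOT_ENTRY) {
            return false; // the Root Entry is always in big blocks, walked with the main BAT
        }
        return size < SMALL_FILE_LIMIT;
    }

    // Number of blocks the file occupies, rounding up to a whole block.
    static int blockCount(byte propertyType, int size) {
        int blockSize = usesSmallBlocks(propertyType, size) ? 64 : 512;
        return (size + blockSize - 1) / blockSize;
    }
}
```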
<A NAME="RootEntry"><H3>Root Entry</H3></A>
<P STYLE="margin-bottom: 0in">
The <B><I>Root Entry</I></B> in the <A
HREF="#PropertyTable"><B><I>Property Table</I></B></A>
contains the information necessary to read and write small
files, which are files less than 4096 bytes long. The
start block field of the Root Entry is the start index of
the <B><I>Small Block Array</I></B>, which is read like
any other file in the POI filesystem. Since the SBAT
cannot be used without the Small Block Array, the Root
Entry MUST be read or written using the <A
HREF="#BAT"><B><I>Block Allocation Table</I></B></A>. The
blocks making up the Small Block Array are divided into
64-byte small blocks, up to the size indicated in the Root
Entry (which should always be a multiple of 64).
</P>
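<P STYLE="margin-bottom: 0in">
Each 512-byte big block of the Small Block Array holds eight 64-byte small blocks. Small block <I>k</I> therefore lives in entry <I>k</I> / 8 (integer division) of the Root Entry's chain of big blocks, at offset (<I>k</I> % 8) &times; 64 within that block. The sketch below assumes the Root Entry's chain of big blocks has already been obtained by walking the main BAT. <CODE>SmallBlockLocator</CODE> is a hypothetical name.
</P>

```java
// Hypothetical helper: locates a 64-byte small block inside the Small Block Array.
final class SmallBlockLocator {
    static final int BIG_BLOCK_SIZE = 512;
    static final int SMALL_BLOCK_SIZE = 64;
    static final int SMALL_PER_BIG = BIG_BLOCK_SIZE / SMALL_BLOCK_SIZE; // 8

    private SmallBlockLocator() {}

    /**
     * rootChain lists, in order, the big-block indices holding the Small Block Array
     * (the Root Entry's chain, found by walking the main BAT from its start block).
     * Returns the byte offset of small block smallIndex within the native file.
     */
    static long fileOffset(int[] rootChain, int smallIndex) {
        int bigIndex = rootChain[smallIndex / SMALL_PER_BIG];
        int withinBig = (smallIndex % SMALL_PER_BIG) * SMALL_BLOCK_SIZE;
        // Big block 0 begins right after the 512-byte header.
        return (long) (bigIndex + 1) * BIG_BLOCK_SIZE + withinBig;
    }
}
```

<P STYLE="margin-bottom: 0in">
For example, with a Root Entry chain of big blocks 3 and 9, small block 9 is the second small block of big block 9. It starts at byte 5184 of the file.
</P>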
<H3>Walking the Nodes of the <A HREF="#PropertyTable">Property
Table</A></H3>
<P STYLE="margin-bottom: 0in">
The individual properties form a directory tree, with the
<B><I>Root Entry</I></B> as the directory tree's root, as
shown in the accompanying drawing. Note the numbers in
parentheses in each node; they represent the node's index
in the array of properties. The <B>NEXT_PROP</B>,
<B>PREVIOUS_PROP</B>, and <B>CHILD_PROP</B> fields hold
these indices and are used to navigate the tree.
</P>
<P>
<IMG SRC="PropertySet.jpg">
</P>
<P STYLE="margin-bottom: 0in">
Each <A NAME="directoryEntry">directory entry</A> (i.e., a
property whose type is <B><I>directory</I></B> or
<B><I>root entry</I></B>) uses its <B>CHILD_PROP</B> field
to point to one of its subordinate (child) properties. It
doesn't seem to matter which of its children it points
to. Thus in the previous drawing, the Root Entry's
CHILD_PROP field may contain 1, 4, or the index of one of
its other children. Similarly, the directory node (index
1) may have, in its CHILD_PROP field, 2, 3, or the index
of one of its other children.
</P>
<P STYLE="margin-bottom: 0in">
The children of a given <A
HREF="#directoryEntry">directory property</A> point to
each other in a similar fashion by using their
<B>NEXT_PROP</B> and <B>PREVIOUS_PROP</B> fields. The
ordering of the children is governed by the rules described <a
href="#node_rules">here</a>.
</P>
<P STYLE="margin-bottom: 0in">
Unused <B>NEXT_PROP</B>, <B>PREVIOUS_PROP</B>, and
<B>CHILD_PROP</B> fields contain the marker value of
-1. For example, all file properties have a value of -1 in
their CHILD_PROP fields.
</P>
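<P STYLE="margin-bottom: 0in">
The following sketch lists the children of a directory property. It starts at the directory's CHILD_PROP and follows PREVIOUS_PROP and NEXT_PROP links until it reaches -1. It assumes the three fields have already been read into parallel arrays indexed by property number. The class and method names are hypothetical. Emitting the PREVIOUS_PROP side before the NEXT_PROP side is an assumption about ordering, not something stated above. A malformed file with a loop in these links would make the recursion run until the stack overflows.
</P>

```java
import java.util.stream.IntStream;

// Hypothetical helper: lists the children of a directory property.
// prev, next and child hold the PREVIOUS_PROP, NEXT_PROP and CHILD_PROP fields
// of every property, indexed by property number; -1 marks an unused field.
final class PropertyTreeWalk {
    static final int UNUSED = -1;

    private PropertyTreeWalk() {}

    static int[] children(int directory, int[] prev, int[] next, int[] child) {
        IntStream.Builder out = IntStream.builder();
        visit(child[directory], prev, next, out);
        return out.build().toArray();
    }

    // Visits the PREVIOUS_PROP side, then the node itself, then the NEXT_PROP side.
    private static void visit(int node, int[] prev, int[] next, IntStream.Builder out) {
        if (node == UNUSED) {
            return;
        }
        visit(prev[node], prev, next, out);
        out.add(node);
        visit(next[node], prev, next, out);
    }
}
```

<P STYLE="margin-bottom: 0in">
Suppose the Root Entry (index 0) points at child 1, and property 1 points at property 4 through NEXT_PROP. Then <CODE>children(0, ...)</CODE> yields 1 and 4, matching the Root Entry's children in the drawing.
</P>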
<A NAME="BAT"><H3>Block Allocation Table</H3></A>
<P STYLE="margin-bottom: 0in">
The <B><I>BAT blocks</I></B> are pointed at by the BAT
array contained in the <A HREF="#HeaderBlock">header</A>
and supplemented, if necessary, by the <B><I>XBAT
blocks</I></B>. These blocks form a large table of
integers. These integers are block numbers. The
<B><I>Block Allocation Table</I></B> holds chains of
integers. These chains are terminated with -2. The
elements in these chains refer to blocks in the files. The
starting block of a file is NOT specified in the BAT; it
is specified by the <B><I>property</I></B> for a given
file. Each value stored in the BAT is both the number of
the file's next block (counting blocks from the end of the
header) AND the index of the next BAT element in the chain.
This can be thought of as a linked list of blocks. The BAT
array contains the links from one block to the next,
including the end of chain marker.
</P>
<P>
Here's an example: Let's assume that the BAT begins as
follows:
</P>
<PRE>
BAT[ 0 ] = 2
BAT[ 1 ] = 5
BAT[ 2 ] = 3
BAT[ 3 ] = 4
BAT[ 4 ] = 6
BAT[ 5 ] = -2
BAT[ 6 ] = 7
BAT[ 7 ] = -2
...
</PRE>
<P STYLE="margin-bottom: 0in">
Now, if we have a file whose <A
HREF="#PropertyTable">Property Table</A> entry says it
begins with index 0, we walk the BAT array and see that
the file consists of blocks 0 (because the start block is
0), 2 (because BAT[ 0 ] is 2), 3 (BAT[ 2 ] is 3), 4 (BAT[ 3 ] is 4),
6 (BAT[ 4 ] is 6), and 7 (BAT[ 6 ] is 7). It
ends at block 7 because BAT[ 7 ] is -2, which is the end
of chain marker.
</P>
<P STYLE="margin-bottom: 0in">
Similarly, a file beginning at index 1 consists of
blocks 1 and 5.
</P>
<P STYLE="margin-bottom: 0in">
Other special numbers in a BAT array are:
</P>
<UL>
<LI>
<P STYLE="margin-bottom: 0in">
-1, which indicates an unused block
</P>
</LI>
<LI>
<P STYLE="margin-bottom: 0in">
-3, which indicates a &quot;special&quot; block, such as a
block that makes up part of the main BAT itself (the Small
Block Array, the <A HREF="#PropertyTable">Property
Table</A>, and the SBAT are walked through the BAT like
ordinary files)
</P>
</LI>
</UL>
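<P STYLE="margin-bottom: 0in">
The chain-walking procedure can be sketched as follows. <CODE>BatChain</CODE> is a hypothetical name, and the BAT is assumed to be already loaded into an <CODE>int</CODE> array. Treating -1 or -3 inside a chain as an error is a choice of this sketch. The sketch also stops if a chain is longer than the BAT itself, which guards against loops in a damaged file.
</P>

```java
import java.util.stream.IntStream;

// Hypothetical helper: follows a file's chain through the Block Allocation Table.
final class BatChain {
    static final int UNUSED = -1;
    static final int END_OF_CHAIN = -2;
    static final int SPECIAL = -3;

    private BatChain() {}

    // Returns the file's block numbers, in order, given the start block from its property.
    static int[] walk(int[] bat, int startBlock) {
        IntStream.Builder blocks = IntStream.builder();
        int current = startBlock;
        int steps = 0;
        while (current != END_OF_CHAIN) {
            if (current == UNUSED || current == SPECIAL) {
                throw new IllegalStateException("chain runs into marker " + current);
            }
            if (current < 0 || current >= bat.length) {
                throw new IllegalStateException("block index out of range: " + current);
            }
            if (++steps > bat.length) {
                throw new IllegalStateException("chain longer than the BAT; probable loop");
            }
            blocks.add(current);    // this element names a block of the file...
            current = bat[current]; // ...and is the index of the next BAT element
        }
        return blocks.build().toArray();
    }
}
```

<P STYLE="margin-bottom: 0in">
Applied to the example BAT above, <CODE>walk(bat, 0)</CODE> returns blocks 0, 2, 3, 4, 6 and 7, and <CODE>walk(bat, 1)</CODE> returns blocks 1 and 5.
</P>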
<H2>Filesystem Structures</H2>
<P>
The following outlines the basic filesystem structures.
</P>
<H3>Header (precedes block 0) -- 512 (0x200) bytes</H3>
<TABLE BORDER=0 CELLPADDING=4 CELLSPACING=0>
<TR VALIGN=TOP>
<TD><B>Field</B></TD>
<TD><B>Description</B></TD>
<TD><B>Offset</B></TD>
<TD><B>Length</B></TD>
<TD><B>Default value or const</B></TD>
</TR>
<TR VALIGN=TOP>
<TD>FILETYPE</TD>
<TD>Magic number identifying this as a POI
filesystem.</TD>
<TD>0x0000</TD>
<TD>Long</TD>
<TD>0xE11AB1A1E011CFD0</TD>
</TR>
<TR VALIGN=TOP>
<TD>UK1</TD>
<TD>Unknown constant</TD>
<TD>0x0008</TD>
<TD>Integer</TD>
<TD>0</TD>
</TR>
<TR VALIGN=TOP>
<TD>UK2</TD>
<TD>Unknown Constant</TD>
<TD>0x000C</TD>
<TD>Integer</TD>
<TD>0</TD>
</TR>
<TR VALIGN=TOP>
<TD>UK3</TD>
<TD>Unknown Constant</TD>
<TD>0x0014</TD>
<TD>Integer</TD>
<TD>0</TD>
</TR>
<TR VALIGN=TOP>
<TD>UK4</TD>
<TD>Unknown Constant (revision?)</TD>
<TD>0x0018</TD>
<TD>Short</TD>
<TD>0x003B</TD>
</TR>
<TR VALIGN=TOP>
<TD>UK5</TD>
<TD>Unknown Constant (version?)</TD>
<TD>0x001A</TD>
<TD>Short</TD>
<TD>0x0003</TD>
</TR>
<TR VALIGN=TOP>
<TD>UK6</TD>
<TD>Unknown Constant</TD>
<TD>0x001C</TD>
<TD>Short</TD>
<TD>-2</TD>
</TR>
<TR VALIGN=TOP>
<TD>LOG_2_BIG_BLOCK_SIZE</TD>
<TD>Log, base 2, of the big block size</TD>
<TD>0x001E</TD>
<TD>Short</TD>
<TD>9 (2 ^ 9 = 512 bytes)</TD>
</TR>
<TR VALIGN=TOP>
<TD>LOG_2_SMALL_BLOCK_SIZE</TD>
<TD>Log, base 2, of the small block size</TD>
<TD>0x0020</TD>
<TD>Integer</TD>
<TD>6 (2 ^ 6 = 64 bytes)</TD>
</TR>
<TR VALIGN=TOP>
<TD>UK7</TD>
<TD>Unknown Constant</TD>
<TD>0x0024</TD>
<TD>Integer</TD>
<TD>0</TD>
</TR>
<TR VALIGN=TOP>
<TD>UK8</TD>
<TD>Unknown Constant</TD>
<TD>0x0028</TD>
<TD>Integer</TD>
<TD>0</TD>
</TR>
<TR VALIGN=TOP>
<TD>BAT_COUNT</TD>
<TD>Number of elements in the BAT array</TD>
<TD>0x002C</TD>
<TD>Integer</TD>
<TD>required</TD>
</TR>
<TR VALIGN=TOP>
<TD>PROPERTIES_START</TD>
<TD>Block index of the first block of the <A
HREF="#PropertyTable">property table</A></TD>
<TD>0x0030</TD>
<TD>Integer</TD>
<TD>required</TD>
</TR>
<TR VALIGN=TOP>
<TD>UK9</TD>
<TD>Unknown Constant</TD>
<TD>0x0034</TD>
<TD>Integer</TD>
<TD>0</TD>
</TR>
<TR VALIGN=TOP>
<TD>UK10</TD>
<TD>Unknown Constant</TD>
<TD>0x0038</TD>
<TD>Integer</TD>
<TD>0x00001000</TD>
</TR>
<TR VALIGN=TOP>
<TD>SBAT_START</TD>
<TD>Block index of the first big block containing the
small block allocation table (SBAT)</TD>
<TD>0x003C</TD>
<TD>Integer</TD>
<TD>-2</TD>
</TR>
<TR VALIGN=TOP>
<TD>UK11</TD>
<TD>Unknown Constant</TD>
<TD>0x0040</TD>
<TD>Integer</TD>
<TD>1</TD>
</TR>
<TR VALIGN=TOP>
<TD>XBAT_START</TD>
<TD>Block index of the first block in the Extended
Block Allocation Table (XBAT)</TD>
<TD>0x0044</TD>
<TD>Integer</TD>
<TD>-2</TD>
</TR>
<TR VALIGN=TOP>
<TD>XBAT_COUNT</TD>
<TD>Number of blocks in the Extended Block
Allocation Table (XBAT), whose entries are added to the BAT</TD>
<TD>0x0048</TD>
<TD>Integer</TD>
<TD>0</TD>
</TR>
<TR VALIGN=TOP>
<TD>BAT_ARRAY</TD>
<TD>Array of block indices constituting the <A
HREF="#BAT">Block Allocation Table (BAT)</A></TD>
<TD>0x004C, 0x0050, 0x0054 ... 0x01FC</TD>
<TD>Integer[ ]</TD>
<TD>-1 for unused elements; at least the first element
must be filled.</TD>
</TR>
<TR VALIGN=TOP>
<TD>N/A</TD>
<TD>Header block data not otherwise described in this
table</TD>
<TD>N/A</TD>
<TD>N/A</TD>
<TD>-1</TD>
</TR>
</TABLE>
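<P STYLE="margin-bottom: 0in">
Here is a sketch of reading the named fields of this table from a 512-byte header held in memory. The unknown constants are skipped. The class name <CODE>HeaderFields</CODE> is hypothetical. The block sizes are computed from the two LOG_2 fields rather than hard-coded.
</P>

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical holder for the named header fields, read at the offsets in the table.
final class HeaderFields {
    final long fileType;
    final int bigBlockSize;
    final int smallBlockSize;
    final int batCount;
    final int propertiesStart;
    final int sbatStart;
    final int xbatStart;
    final int xbatCount;

    HeaderFields(byte[] header) {
        ByteBuffer buf = ByteBuffer.wrap(header).order(ByteOrder.LITTLE_ENDIAN);
        fileType = buf.getLong(0x0000);
        bigBlockSize = 1 << buf.getShort(0x001E);  // 2 ^ 9 = 512 in the default case
        smallBlockSize = 1 << buf.getInt(0x0020);  // 2 ^ 6 = 64 in the default case
        batCount = buf.getInt(0x002C);
        propertiesStart = buf.getInt(0x0030);
        sbatStart = buf.getInt(0x003C);
        xbatStart = buf.getInt(0x0044);
        xbatCount = buf.getInt(0x0048);
    }
}
```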
<A HREF="#BAT"><H3><B>Block Allocation Table Block -- 512
(0x200) bytes</B></H3></A>
<TABLE BORDER=0 CELLPADDING=4 CELLSPACING=0>
<TR VALIGN=TOP>
<TD><B>Field</B></TD>
<TD><B>Description</B></TD>
<TD><B>Offset</B></TD>
<TD><B>Length</B></TD>
<TD><B>Default value or const</B></TD>
</TR>
<TR VALIGN=TOP>
<TD>BAT_ELEMENT</TD>
<TD>Any given element in the BAT block</TD>
<TD>0x0000, 0x0004, 0x0008, ... 0x01FC</TD>
<TD>Integer</TD>
<TD>-1 = unused<BR>
-2 = end of chain<BR>
-3 = special (e.g., BAT block)<BR>
All other values are the index of the next block
composing the file, which is also the index of the
next element in the chain.</TD>
</TR>
</TABLE>
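<P STYLE="margin-bottom: 0in">
Each BAT block holds 128 of these elements, so the full BAT can be built by concatenating the BAT blocks in order. Element <I>e</I> of BAT block <I>b</I> becomes overall BAT element <I>b</I> &times; 128 + <I>e</I>. The sketch below assumes that the raw contents of the BAT blocks (those listed in the header, followed by any from XBAT blocks) have already been read. <CODE>BatAssembler</CODE> is hypothetical.
</P>

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical helper: concatenates BAT blocks into one BAT array.
final class BatAssembler {
    static final int BLOCK_SIZE = 512;
    static final int ENTRIES_PER_BLOCK = BLOCK_SIZE / 4; // 128 ints per BAT block

    private BatAssembler() {}

    // batBlocks holds the raw 512-byte contents of each BAT block, in BAT order.
    static int[] assemble(byte[][] batBlocks) {
        int[] bat = new int[batBlocks.length * ENTRIES_PER_BLOCK];
        for (int b = 0; b < batBlocks.length; b++) {
            ByteBuffer buf = ByteBuffer.wrap(batBlocks[b]).order(ByteOrder.LITTLE_ENDIAN);
            for (int e = 0; e < ENTRIES_PER_BLOCK; e++) {
                // Element e of BAT block b is overall BAT element b * 128 + e.
                bat[b * ENTRIES_PER_BLOCK + e] = buf.getInt(e * 4);
            }
        }
        return bat;
    }
}
```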
<H3>Property Block -- 512 (0x200) byte block</H3>
<TABLE BORDER=0 CELLPADDING=4 CELLSPACING=0>
<TR VALIGN=TOP>
<TD><B>Field</B></TD>
<TD><B>Description</B></TD>
<TD><B>Offset</B></TD>
<TD><B>Length</B></TD>
<TD><B>Default value or const</B></TD>
</TR>
<TR VALIGN=TOP>
<TD>Properties[ ]</TD>
<TD>This block contains the properties.</TD>
<TD>0x0000, 0x0080, 0x0100, 0x0180</TD>
<TD>128 bytes</TD>
<TD>All unused space is set to -1.</TD>
</TR>
</TABLE>
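<P STYLE="margin-bottom: 0in">
With four 128-byte properties per block, property number <I>n</I> is found in block <I>n</I> / 4 (integer division, counting from the first block of the property table's chain). It sits at offset (<I>n</I> % 4) &times; 128 within that block. This is a minimal sketch with hypothetical names.
</P>

```java
// Hypothetical helper: locates property number n within the property table.
final class PropertyLocator {
    static final int PROPERTY_SIZE = 128;
    static final int PROPERTIES_PER_BLOCK = 512 / PROPERTY_SIZE; // 4

    private PropertyLocator() {}

    // Position, within the property table's chain of blocks, of the block holding property n.
    static int chainPosition(int n) {
        return n / PROPERTIES_PER_BLOCK;
    }

    // Byte offset of property n inside that block: 0x00, 0x80, 0x100 or 0x180.
    static int offsetInBlock(int n) {
        return (n % PROPERTIES_PER_BLOCK) * PROPERTY_SIZE;
    }
}
```

<P STYLE="margin-bottom: 0in">
For example, property 5 is the second record (offset 0x80) in the second block of the property table.
</P>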
<H3>Property -- 128 (0x80) byte record</H3>
<TABLE BORDER=0 CELLPADDING=4 CELLSPACING=0>
<TR VALIGN=TOP>
<TD><B>Field</B></TD>
<TD><B>Description</B></TD>
<TD><B>Offset</B></TD>
<TD><B>Length</B></TD>
<TD><B>Default value or const</B></TD>
</TR>
<TR VALIGN=TOP>
<TD>NAME</TD>
<TD>A null-terminated, uncompressed 16-bit Unicode
string containing the name of the property (for ASCII
names, the high bytes are zero and can be dropped).</TD>
<TD>0x00, 0x02, 0x04, ... 0x3E</TD>
<TD>Short[ ]</TD>
<TD>0x0000 for unused elements; field required; 32
elements (0x40 bytes) max</TD>
</TR>
<TR VALIGN=TOP>
<TD>NAME_SIZE</TD>
<TD>Length of the NAME field in bytes, including the
terminating null character</TD>
<TD>0x40</TD>
<TD>Short</TD>
<TD>Required</TD>
</TR>
<TR VALIGN=TOP>
<TD>PROPERTY_TYPE</TD>
<TD>Property type (directory, file, or root)</TD>
<TD>0x42</TD>
<TD>Byte</TD>
<TD>1 (directory), 2 (file), or 5 (root entry)</TD>
</TR>
<TR VALIGN=TOP>
<TD>NODE_COLOR</TD>
<TD>Node color</TD>
<TD>0x43</TD>
<TD>Byte</TD>
<TD>0 (red) or 1 (black)</TD>
</TR>
<TR VALIGN=TOP>
<TD>PREVIOUS_PROP</TD>
<TD>Previous property index</TD>
<TD>0x44</TD>
<TD>Integer</TD>
<TD>-1</TD>
</TR>
<TR VALIGN=TOP>
<TD>NEXT_PROP</TD>
<TD>Next property index</TD>
<TD>0x48</TD>
<TD>Integer</TD>
<TD>-1</TD>
</TR>
<TR VALIGN=TOP>
<TD>CHILD_PROP</TD>
<TD>First child property index</TD>
<TD>0x4C</TD>
<TD>Integer</TD>
<TD>-1</TD>
</TR>
<TR VALIGN=TOP>
<TD>SECONDS_1</TD>
<TD>Seconds component of the created timestamp?</TD>
<TD>0x64</TD>
<TD>Integer</TD>
<TD>0</TD>
</TR>
<TR VALIGN=TOP>
<TD>DAYS_1</TD>
<TD>Days since epoch component of the created
timestamp?</TD>
<TD>0x68</TD>
<TD>Integer</TD>
<TD>0</TD>
</TR>
<TR VALIGN=TOP>
<TD>SECONDS_2</TD>
<TD>Seconds component of the modified timestamp?</TD>
<TD>0x6C</TD>
<TD>Integer</TD>
<TD>0</TD>
</TR>
<TR VALIGN=TOP>
<TD>DAYS_2</TD>
<TD>Days since epoch component of the modified
timestamp?</TD>
<TD>0x70</TD>
<TD>Integer</TD>
<TD>0</TD>
</TR>
<TR VALIGN=TOP>
<TD>START_BLOCK</TD>
<TD>Starting block of the file, used as the first
block in the file and the pointer to the next
block from the BAT</TD>
<TD>0x74</TD>
<TD>Integer</TD>
<TD>Required</TD>
</TR>
<TR VALIGN=TOP>
<TD>SIZE</TD>
<TD>Actual size of the file this property points
to (used to truncate the blocks to the real
size)</TD>
<TD>0x78</TD>
<TD>Integer</TD>
<TD>0</TD>
</TR>
</TABLE>
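<P STYLE="margin-bottom: 0in">
Here is a sketch that reads the fixed-offset fields of one property from a property block. It is an illustration with a hypothetical class name. NAME decoding and the uncertain timestamp fields are left out.
</P>

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical holder for the fixed-offset fields of one 128-byte property.
final class PropertyRecord {
    final byte propertyType; // 1 directory, 2 file, 5 root entry
    final byte nodeColor;    // 0 red, 1 black
    final int previousProp;
    final int nextProp;
    final int childProp;
    final int startBlock;
    final int size;

    // block is a 512-byte property block; offset is 0x00, 0x80, 0x100 or 0x180.
    PropertyRecord(byte[] block, int offset) {
        ByteBuffer buf = ByteBuffer.wrap(block).order(ByteOrder.LITTLE_ENDIAN);
        propertyType = buf.get(offset + 0x42);
        nodeColor = buf.get(offset + 0x43);
        previousProp = buf.getInt(offset + 0x44);
        nextProp = buf.getInt(offset + 0x48);
        childProp = buf.getInt(offset + 0x4C);
        startBlock = buf.getInt(offset + 0x74);
        size = buf.getInt(offset + 0x78);
    }
}
```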
</BODY>
</HTML>