<?xml version="1.0" encoding="UTF-8"?>
<!--
   ====================================================================
   Licensed to the Apache Software Foundation (ASF) under one or more
   contributor license agreements.  See the NOTICE file distributed with
   this work for additional information regarding copyright ownership.
   The ASF licenses this file to You under the Apache License, Version 2.0
   (the "License"); you may not use this file except in compliance with
   the License.  You may obtain a copy of the License at

       http://www.apache.org/licenses/LICENSE-2.0

   Unless required by applicable law or agreed to in writing, software
   distributed under the License is distributed on an "AS IS" BASIS,
   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
   See the License for the specific language governing permissions and
   limitations under the License.
   ====================================================================
-->
<!DOCTYPE document PUBLIC "-//APACHE//DTD Documentation V2.0//EN" "document-v20.dtd">
<document>
  <header>
    <title>Apache POI™ - POIFS - Documents embedded in other documents</title>
    <subtitle>Overview</subtitle>
    <authors>
      <person name="Nick Burch" email="nick@apache.org"/>     
      <person name="Yegor Kozlov" email="yegor@apache.org"/>     
    </authors>
  </header>
  <body>
    <section><title>Overview</title>
      <p>It is possible for one OLE 2 based document to have other
         OLE 2 documents embedded in it. For example, an Excel file
         may have a Word document and a PowerPoint slideshow 
         embedded as part of it.</p>
      <p>Normally, these other documents are stored in subdirectories
         of the OLE 2 (POIFS) filesystem. The exact location of the
         embedded documents will vary depending on the type of the
         master document, and the exact directory names will differ
         each time. To figure out exactly which directory to look
         in, you will either need to process the appropriate OLE 2
         linking entry in the master document, or simply iterate
         over all the directories in the filesystem.</p>
      <p>As a general rule, each subdirectory holds the same OLE 2
         entries that you would find at the root of the filesystem
         if the document were not embedded.</p>
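      <p>As a rough sketch of the second approach (iterating over all
         the directories), the code below walks a filesystem recursively
         and records the path of every directory it finds. It assumes a
         recent POI release in which <em>DirectoryEntry</em> can be
         iterated directly. The class <em>DirectoryWalker</em>, its
         methods and the sample directory names are made up for
         illustration and are not part of POI.</p>
      <source><![CDATA[
```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.poi.poifs.filesystem.DirectoryEntry;
import org.apache.poi.poifs.filesystem.Entry;
import org.apache.poi.poifs.filesystem.POIFSFileSystem;

// Hypothetical helper class, not part of POI.
class DirectoryWalker {

    /** Adds the path of every directory below dir to found, depth first. */
    public static void collectDirectories(DirectoryEntry dir, String path, List<String> found) {
        for (Entry entry : dir) {
            if (entry.isDirectoryEntry()) {
                String childPath = path + "/" + entry.getName();
                found.add(childPath);
                collectDirectories((DirectoryEntry) entry, childPath, found);
            }
        }
    }

    /** Builds a small in-memory filesystem with sample directories and walks it. */
    public static List<String> demo() throws IOException {
        try (POIFSFileSystem fs = new POIFSFileSystem()) {
            DirectoryEntry root = fs.getRoot();
            // Sample names only, following the conventions described in this document.
            DirectoryEntry pool = root.createDirectory("ObjectPool");
            pool.createDirectory("_1234567890");
            root.createDirectory("MBD0012AB34");

            List<String> found = new ArrayList<>();
            collectDirectories(root, "", found);
            return found;
        }
    }

    public static void main(String[] args) throws IOException {
        for (String path : demo()) {
            System.out.println(path);
        }
    }
}
```
]]></source>
      <p>For a real file, the same method could be given the root of a
         POIFSFileSystem opened from that file instead of the
         in-memory one built here.</p>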

       <section><title>Files embedded in Excel</title>
         <p>Excel normally stores embedded files in subdirectories
         of the filesystem root. Typically these subdirectories
         are named starting with MBD, with 8 hex characters following.</p>
       </section>

       <section><title>Files embedded in Word</title>
         <p>Word normally stores embedded files in subdirectories
         of the ObjectPool directory, itself a subdirectory of the
         filesystem root. Typically these subdirectories are named
         starting with an underscore, followed by 10 digits.</p>
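         <p>The typical naming conventions for Excel and Word described
         above could be checked with two regular expressions, as in
         the sketch below. The class and method names are hypothetical,
         and accepting both upper and lower case hex digits is an
         assumption. Since the names are only typical, a match is a
         hint rather than proof that a directory holds an embedded
         document.</p>
         <source><![CDATA[
```java
import java.util.regex.Pattern;

// Hypothetical helper class, not part of POI.
class EmbeddedDirNames {

    // "MBD" followed by 8 hex characters (assumption: either case is accepted).
    private static final Pattern EXCEL_STYLE = Pattern.compile("MBD[0-9A-Fa-f]{8}");

    // An underscore followed by 10 digits.
    private static final Pattern WORD_STYLE = Pattern.compile("_[0-9]{10}");

    /** True if the name matches the typical Excel convention. */
    public static boolean looksLikeExcelEmbedding(String name) {
        return EXCEL_STYLE.matcher(name).matches();
    }

    /** True if the name matches the typical Word convention (inside ObjectPool). */
    public static boolean looksLikeWordEmbedding(String name) {
        return WORD_STYLE.matcher(name).matches();
    }

    public static void main(String[] args) {
        System.out.println(looksLikeExcelEmbedding("MBD0012AB34"));
        System.out.println(looksLikeWordEmbedding("_1234567890"));
    }
}
```
]]></source>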
       </section>

       <section><title>Files embedded in PowerPoint</title>
         <p>PowerPoint does not normally store embedded files
         in the OLE2 layer. Instead, they are held within records
         of the main PowerPoint file. 
         <br/>See the <a href="./../slideshow/how-to-shapes.html#OLE">HSLF Tutorial</a> 
         for how to retrieve embedded OLE objects from a presentation.</p>
       </section>
    </section>

    <section><title>Listing POIFS contents</title>
      <p>POIFS provides a simple tool for listing the contents of
         OLE2 files. This lets you see what your POIFS file
         contains, and hence whether it has any embedded documents
         in it, and where.</p>
      <p>The tool to use is <em>org.apache.poi.poifs.dev.POIFSLister</em>.
         This tool may be run from the command line, and takes a filename
         as its parameter. It will print out all the directories and 
         files contained within the POIFS file.</p>
    </section>

    <section><title>Opening embedded files</title>
      <p>All of the POIDocument classes (HSSFWorkbook, HSLFSlideShow,
         HWPFDocument and HDGFDiagram) can either be opened from
         a POIFSFileSystem, or from a specific directory within a
         POIFSFileSystem. So, to open embedded files, simply locate the
         appropriate DirectoryNode that represents the subdirectory
         of interest, and pass this, together with the overall
         POIFSFileSystem, to the constructor.</p>
      <p>If you want to extract the textual contents of the embedded file,
         open the appropriate POIDocument and pass this to the
         extractor class, instead of simply passing the POIFSFileSystem
         to the extractor.</p>
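      <p>The two steps above (open the document from its directory,
         then hand the document to an extractor) might look roughly
         like the sketch below for an embedded Excel workbook. It
         assumes a recent POI release, where the HSSFWorkbook
         constructor takes the DirectoryNode and a preserveNodes flag.
         Other releases may use a different signature, such as one that
         also takes the POIFSFileSystem. The class
         <em>EmbeddedWorkbookDemo</em>, the directory name
         <em>MBD00000001</em> and the in-memory container are made up
         for illustration. A real master document would also hold its
         own streams and linking entries.</p>
      <source><![CDATA[
```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;

import org.apache.poi.hssf.extractor.ExcelExtractor;
import org.apache.poi.hssf.usermodel.HSSFWorkbook;
import org.apache.poi.poifs.filesystem.DirectoryEntry;
import org.apache.poi.poifs.filesystem.DirectoryNode;
import org.apache.poi.poifs.filesystem.EntryUtils;
import org.apache.poi.poifs.filesystem.POIFSFileSystem;

// Hypothetical demo class, not part of POI.
class EmbeddedWorkbookDemo {

    // Made-up directory name following the "MBD + 8 hex characters" convention.
    static final String EMBEDDED_DIR = "MBD00000001";

    /** Builds an OLE 2 container in memory with a small workbook copied into a subdirectory. */
    public static byte[] buildContainer() throws IOException {
        ByteArrayOutputStream workbookBytes = new ByteArrayOutputStream();
        try (HSSFWorkbook wb = new HSSFWorkbook()) {
            wb.createSheet("Embedded").createRow(0).createCell(0).setCellValue("hello");
            wb.write(workbookBytes);
        }

        ByteArrayOutputStream containerBytes = new ByteArrayOutputStream();
        try (POIFSFileSystem inner =
                 new POIFSFileSystem(new ByteArrayInputStream(workbookBytes.toByteArray()));
             POIFSFileSystem outer = new POIFSFileSystem()) {
            // Copy the workbook's entries into a subdirectory of the container.
            DirectoryEntry target = outer.getRoot().createDirectory(EMBEDDED_DIR);
            EntryUtils.copyNodes(inner.getRoot(), target);
            outer.writeFilesystem(containerBytes);
        }
        return containerBytes.toByteArray();
    }

    /** Opens the workbook stored in the subdirectory and returns its text. */
    public static String extractEmbeddedText(byte[] container) throws IOException {
        try (POIFSFileSystem fs = new POIFSFileSystem(new ByteArrayInputStream(container))) {
            // Locate the directory that holds the embedded document.
            DirectoryNode dir = (DirectoryNode) fs.getRoot().getEntry(EMBEDDED_DIR);
            // Open the document from that directory, then pass the document
            // (not the filesystem) to the extractor.
            try (HSSFWorkbook wb = new HSSFWorkbook(dir, false);
                 ExcelExtractor extractor = new ExcelExtractor(wb)) {
                return extractor.getText();
            }
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(extractEmbeddedText(buildContainer()));
    }
}
```
]]></source>
      <p>The other POIDocument classes could be handled the same way,
         using their own directory-based constructors and extractor
         classes.</p>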
    </section>
  </body>
</document>