1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
|
<?xml version="1.0" encoding="UTF-8"?>
<!--
====================================================================
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
this work for additional information regarding copyright ownership.
The ASF licenses this file to You under the Apache License, Version 2.0
(the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
====================================================================
-->
<!DOCTYPE document PUBLIC "-//APACHE//DTD Documentation V2.0//EN" "document-v20.dtd">
<document>
<header>
<title>Apache POI™ - Configuration</title>
<authors>
<person id="POI" name="POI Developers" email="dev@poi.apache.org"/>
</authors>
</header>
<body>
<section><title>Overview</title>
<p>The best way to learn about using Apache POI is to read through the <a href="index.html">feature documentation</a>
and other online examples online.
</p>
<p>To keep the features documentation focused on the APIs, there is little mention of some of the configuration
settings that can be enabled that may prove useful to users who have to handle very large documents or very
large throughput.
</p>
</section>
<section><title>Configuration via Java-code when calling Apache POI</title>
<p>These API methods allow to configure behavior of Apache POI for special needs, e.g. when processing excessively
large files.
</p>
<table>
<tr>
<th>Configuration Setting</th>
<th>Description</th>
</tr>
<tr>
<td>org.apache.poi.ooxml.POIXMLTypeLoader.DEFAULT_XML_OPTIONS</td>
<td>POI support for XSSF APIs relies heavily on <a href="https://xmlbeans.apache.org">XMLBeans</a>.
This instance can be <a href="https://xmlbeans.apache.org/docs/5.0.0/org/apache/xmlbeans/XmlOptions.html">configured</a>.
It is recommended to take care if you do change any of the config items.
In POI 5.1.0, we will disallow Doc Type parsing in the XML files embedded in xlsx/docx/pptx/etc files, by default.
DEFAULT_XML_OPTIONS.setDisallowDocTypeDeclaration(false) will undo this change.
</td>
</tr>
<tr>
<td><a href="https://poi.apache.org/apidocs/5.0/org/apache/poi/util/IOUtils.html#setByteArrayMaxOverride-int-">
org.apache.poi.util.IOUtils.setByteArrayMaxOverride(int maxOverride)</a>
</td>
<td>If this value is set to > 0, IOUtils.safelyAllocate(long, int) will ignore the maximum record length parameter.
This is designed to allow users to bypass the hard-coded maximum record lengths if they are willing to accept the risk of allocating memory up to the size specified.
It also allows to impose a lower limit than used for very memory constrained systems.
<p>
<strong>Note</strong>: This is a per-allocation limit and does not allow you to limit overall sum of allocations! Use -1 for using the limits specified per record-type.
</p>
</td>
</tr>
<tr>
<td><a href="https://poi.apache.org/apidocs/5.0/org/apache/poi/openxml4j/util/ZipSecureFile.html#setMinInflateRatio-double-">
org.apache.poi.openxml4j.util.ZipSecureFile.setMinInflateRatio(double ratio)</a>
</td>
<td>Sets the ratio between de- and inflated bytes to detect zipbomb.
It defaults to 1% (= 0.01d), i.e. when the compression is better than 1% for any given read package part, the parsing will fail indicating a Zip-Bomb.
</td>
</tr>
<tr>
<td><a href="https://poi.apache.org/apidocs/5.0/org/apache/poi/openxml4j/util/ZipSecureFile.html#setMaxEntrySize-long-">
org.apache.poi.openxml4j.util.ZipSecureFile.setMaxEntrySize(long maxEntrySize)</a>
</td>
<td>Sets the maximum file size of a single zip entry. It defaults to 4GB, i.e. the 32-bit zip format maximum.
This can be used to limit memory consumption and protect against security vulnerabilities when documents are provided by users.
POI 5.1.0 removes the previous limit of 4GB on this setting.
</td>
</tr>
<tr>
<td><a href="https://poi.apache.org/apidocs/5.0/org/apache/poi/openxml4j/util/ZipSecureFile.html#setMaxTextSize-long-">
org.apache.poi.openxml4j.util.ZipSecureFile.setMaxTextSize(long maxTextSize)</a>
</td>
<td>Sets the maximum number of characters of text that are extracted before an exception is thrown during extracting text from documents.
This can be used to limit memory consumption and protect against security vulnerabilities when documents are provided by users.
The default is approx 10 million chars. Prior to POI 5.1.0, the max allowed was approx 4 billion chars.
</td>
</tr>
<tr>
<td>org.apache.poi.openxml4j.util.ZipInputStreamZipEntrySource.setThresholdBytesForTempFiles(int thresholdBytes)
</td>
<td><strong>Added in POI 5.1.0.</strong>
Number of bytes at which a zip entry is regarded as too large for holding in memory
and the data is put in a temp file instead - defaults to -1 meaning temp files are not used
and that zip entries with more than 2GB of data after decompressing will fail, 0 means all
zip entries are stored in temp files. A threshold like 50000000 (approx 50Mb is recommended)
</td>
</tr>
<tr>
<td>org.apache.poi.openxml4j.util.ZipInputStreamZipEntrySource.setEncryptTempFiles(boolean encrypt)
</td>
<td><strong>Added in POI 5.1.0.</strong>
Whether temp files should be encrypted (default false). Only affects temp files related to zip entries.
</td>
</tr>
<tr>
<td>org.apache.poi.openxml4j.opc.ZipPackage.setUseTempFilePackageParts(boolean tempFilePackageParts)
</td>
<td><strong>Added in POI 5.1.0.</strong>
Whether to save package part data in temp files to save memory (default=false).
</td>
</tr>
<tr>
<td>org.apache.poi.openxml4j.opc.ZipPackage.setEncryptTempFilePackageParts(boolean encryptTempFiles)
</td>
<td><strong>Added in POI 5.1.0.</strong>
Whether to encrypt package part temp files (default=false).
</td>
</tr>
<tr>
<td>org.apache.poi.extractor.ExtractorFactory.setThreadPrefersEventExtractors(boolean preferEventExtractors) and
org.apache.poi.extractor.ExtractorFactory.setAllThreadsPreferEventExtractors(Boolean preferEventExtractors)
</td>
<td>
When creating text-extractors for documents, allows to choose a different type of extractor which parses documents
via an event-based parser.
</td>
</tr>
<tr>
<td>Various classes: setMaxRecordLength(int length)
</td>
<td>
Allows to override the default max record length for various classes which
parse input data. E.g. XMLSlideShow, XSSFBParser, HSLFSlideShow, HWPFDocument,
HSSFWorkbook, EmbeddedExtractor, StringUtil, ...
<br/>
This may be useful if you try to process very large files which otherwise trigger
the excessive-memory-allocation prevention in Apache POI.
</td>
</tr>
<tr>
<td>org.apache.poi.xslf.usermodel.XSLFPictureData.setMaxImageSize(int length)
</td>
<td>
Allows to override the default max image size allowed for XSLF pictures.
</td>
</tr>
<tr>
<td>org.apache.poi.xssf.usermodel.XSSFPictureData#setMaxImageSize(int length)
</td>
<td>
Allows to override the default max image size allowed for XSSF pictures.
</td>
</tr>
<tr>
<td>org.apache.poi.xwpf.usermodel.XWPFPictureData#setMaxImageSize(int length)
</td>
<td>
Allows to override the default max image size allowed for XWPF pictures.
</td>
</tr>
</table>
</section>
<section><title>Observed Java System Properties</title>
<p>Apache POI supports some Java System Properties.
</p>
<table>
<tr>
<th>System property</th>
<th>Description</th>
</tr>
<tr>
<td>java.io.tmpdir</td>
<td>
Apache POI uses the default mechanism of the JDK for specifying the location of
temporary files.
</td>
</tr>
<tr>
<td>org.apache.poi.hwpf.preserveBinTables and org.apache.poi.hwpf.preserveTextTable</td>
<td>
Allows to adjust how parsing Word documents via HWPF is handling tables.
</td>
</tr>
<tr>
<td>org.apache.poi.ss.ignoreMissingFontSystem</td>
<td><strong>Added in POI 5.2.3.</strong>
Instructs Apache POI to ignore some errors due to missing fonts and thus allows
to perform more functionality even when no fonts are installed.
<br/>
Note: Some functionality will still not be possible as it cannot use default-values, e.g. rendering
slides, drawing, ...
</td>
</tr>
</table>
</section>
</body>
<footer>
<legal>
Copyright (c) @year@ The Apache Software Foundation. All rights reserved.
<br />
Apache POI, POI, Apache, the Apache feather logo, and the Apache
POI project logo are trademarks of The Apache Software Foundation.
</legal>
</footer>
</document>
|