Tim Allison
e6ff9b74f4
60826 -- add initial support for streaming reading of xlsb files.
git-svn-id: https://svn.apache.org/repos/asf/poi/trunk@1787228 13f79535-47bb-0310-9956-ffa450edef68
7 years ago
Andreas Beeker
41d8585723
SonarQube fix - make members private
git-svn-id: https://svn.apache.org/repos/asf/poi/trunk@1773908 13f79535-47bb-0310-9956-ffa450edef68
7 years ago
Andreas Beeker
1690ec2d19
merge trunk to branch
git-svn-id: https://svn.apache.org/repos/asf/poi/branches/hssf_cryptoapi@1762709 13f79535-47bb-0310-9956-ffa450edef68
7 years ago
Dominik Stadler
e6a2d46800
Compiler/IDE warnings, unnecessary keywords,
git-svn-id: https://svn.apache.org/repos/asf/poi/trunk@1761675 13f79535-47bb-0310-9956-ffa450edef68
7 years ago
Javen O'Neal
b19c1253ee
add cause to exceptions, log exceptions that are caught and suppressed
git-svn-id: https://svn.apache.org/repos/asf/poi/trunk@1753030 13f79535-47bb-0310-9956-ffa450edef68
8 years ago
Javen O'Neal
8fc36c7367
whitespace
git-svn-id: https://svn.apache.org/repos/asf/poi/trunk@1753028 13f79535-47bb-0310-9956-ffa450edef68
8 years ago
Nick Burch
1bbba4e39a
Remove un-used imports
git-svn-id: https://svn.apache.org/repos/asf/poi/trunk@1752229 13f79535-47bb-0310-9956-ffa450edef68
8 years ago
Nick Burch
ef2af2d53d
Start moving logic over into the main and scratchpad jars for OLE2
git-svn-id: https://svn.apache.org/repos/asf/poi/trunk@1752226 13f79535-47bb-0310-9956-ffa450edef68
8 years ago
Nick Burch
39bd51fbe4
Notes on use
git-svn-id: https://svn.apache.org/repos/asf/poi/trunk@1752222 13f79535-47bb-0310-9956-ffa450edef68
8 years ago
Javen O'Neal
a43a344d16
Add comments describing Outlook .msg DirectoryNode names
git-svn-id: https://svn.apache.org/repos/asf/poi/trunk@1752054 13f79535-47bb-0310-9956-ffa450edef68
8 years ago
Javen O'Neal
103116bff9
move string literals out to array that can be for-looped over
git-svn-id: https://svn.apache.org/repos/asf/poi/trunk@1752053 13f79535-47bb-0310-9956-ffa450edef68
8 years ago
Dominik Stadler
bb7c632559
Fix some Sonar issues and some IntelliJ warnings
git-svn-id: https://svn.apache.org/repos/asf/poi/trunk@1746627 13f79535-47bb-0310-9956-ffa450edef68
8 years ago
Dominik Stadler
15d70b0828
Check for null in IOUtils.closeQuietly() to not log this unnecessarily
Add coverage for some more methods in ExtractorFactory
Fix some IntelliJ warnings
git-svn-id: https://svn.apache.org/repos/asf/poi/trunk@1736146 13f79535-47bb-0310-9956-ffa450edef68
8 years ago
Dominik Stadler
71f5735238
Refactor some common code from the various Document-Factories into a helper class
Fix a potential file-handle-leak for password protected workbooks or slideshows
git-svn-id: https://svn.apache.org/repos/asf/poi/trunk@1734691 13f79535-47bb-0310-9956-ffa450edef68
8 years ago
Nick Burch
6e21b85d8e
#59074 More helpful exception if Excel 1-95 files are given to ExtractorFactory
git-svn-id: https://svn.apache.org/repos/asf/poi/trunk@1732587 13f79535-47bb-0310-9956-ffa450edef68
8 years ago
Nick Burch
05d455310c
Refactor out the POIFS directory entry name for Excel 1-95 entries, and have ExtractorFactory detect (but not support) these old files
git-svn-id: https://svn.apache.org/repos/asf/poi/trunk@1732583 13f79535-47bb-0310-9956-ffa450edef68
8 years ago
Nick Burch
7fdd90fecb
Refactor to pull out the list of Excel 97+ directory entry names to a common place, avoiding duplication. Also starts on unit testing #59074
git-svn-id: https://svn.apache.org/repos/asf/poi/trunk@1732579 13f79535-47bb-0310-9956-ffa450edef68
8 years ago
Dominik Stadler
17ed7975e2
One more possible resource leak when creating the TextExtractor fails with a RuntimeException or one of the named exceptions
git-svn-id: https://svn.apache.org/repos/asf/poi/trunk@1731561 13f79535-47bb-0310-9956-ffa450edef68
8 years ago
Dominik Stadler
a74cded68d
Handle some cases better where file handles were left open by the ExtractorFactory, mostly when opening files failed, but also when using the NPOIFSFileSystem for initialization.
git-svn-id: https://svn.apache.org/repos/asf/poi/trunk@1721064 13f79535-47bb-0310-9956-ffa450edef68
8 years ago
Dustin Spicuzza
bc6ee96e1a
Add Visio OOXML text extractor + tests
git-svn-id: https://svn.apache.org/repos/asf/poi/trunk@1709361 13f79535-47bb-0310-9956-ffa450edef68
8 years ago
Andreas Beeker
fad6546d8a
sonar fixes
Very interesting was the exception swallowing in PackagePropertiesPart. Once the exception was properly thrown, it led to various errors in the JUnit tests - I've fixed the handling for at least the ones which are in our test set
git-svn-id: https://svn.apache.org/repos/asf/poi/trunk@1706169 13f79535-47bb-0310-9956-ffa450edef68
8 years ago
Nick Burch
5abd6431a2
#56791 More updates from OPOIFS to NPOIFS
git-svn-id: https://svn.apache.org/repos/asf/poi/trunk@1678801 13f79535-47bb-0310-9956-ffa450edef68
9 years ago
Nick Burch
0227765619
Detect OOXML-strict, and give more helpful exceptions for them
git-svn-id: https://svn.apache.org/repos/asf/poi/trunk@1666525 13f79535-47bb-0310-9956-ffa450edef68
9 years ago
Nick Burch
47a2847cbe
Give a more helpful exception if a Visio VSDX ooxml file is passed to ExtractorFactory
git-svn-id: https://svn.apache.org/repos/asf/poi/trunk@1665929 13f79535-47bb-0310-9956-ffa450edef68
9 years ago
Dominik Stadler
6eeb0a7c19
Add missing close and handle theme-pptx in ExtractorFactory. Add creating slide-bitmaps to PPTX integration test.
git-svn-id: https://svn.apache.org/repos/asf/poi/trunk@1663137 13f79535-47bb-0310-9956-ffa450edef68
9 years ago
Dominik Stadler
a3e087268a
* Verify some more Text-Extraction features as part of integration tests, fix some NullPointerExceptions that showed up now because the event-based extraction does not have a Document available
* Also handle an XLSX which does not have row numbers in the sheet XML. Excel can read it, so it makes sense to allow reading it in the XSSFSheetXMLHandler as well
* Remove some Eclipse warnings in test-code
git-svn-id: https://svn.apache.org/repos/asf/poi/trunk@1662691 13f79535-47bb-0310-9956-ffa450edef68
9 years ago
Dominik Stadler
76307fe94b
* Add text-extraction verification to integration-tests via a new abstract base FileHandler
* Fix NullPointerException found in some documents when running against the test-data
* Add support for extracting text from Dir-Entries WORKBOOK and BOOK to support some old/strangely formatted XLS files.
git-svn-id: https://svn.apache.org/repos/asf/poi/trunk@1662652 13f79535-47bb-0310-9956-ffa450edef68
9 years ago
Nick Burch
cef16bab94
Have ExtractorFactory open OPCPackages from files in read-only mode by default, since writing should never be needed when extracting text
git-svn-id: https://svn.apache.org/repos/asf/poi/trunk@1652877 13f79535-47bb-0310-9956-ffa450edef68
9 years ago
Dominik Stadler
8a1411bda1
fix some Eclipse warnings, unnecessary null-check and missing close() in tests
git-svn-id: https://svn.apache.org/repos/asf/poi/trunk@1594201 13f79535-47bb-0310-9956-ffa450edef68
10 years ago
Nick Burch
ed23692537
Fix some Eclipse identified warnings
git-svn-id: https://svn.apache.org/repos/asf/poi/trunk@1589765 13f79535-47bb-0310-9956-ffa450edef68
10 years ago
Sergey Vladimirov
49697de696
Add Word-to-Text converter and use it as replacement for WordExtractor
git-svn-id: https://svn.apache.org/repos/asf/poi/trunk@1155336 13f79535-47bb-0310-9956-ffa450edef68
13 years ago
Nick Burch
4c8a39924b
Inside ExtractorFactory, support finding embedded OOXML documents and providing extractors for them
git-svn-id: https://svn.apache.org/repos/asf/poi/trunk@1049802 13f79535-47bb-0310-9956-ffa450edef68
13 years ago
Nick Burch
8261bb3e8a
Support nested outlook files in ExtractorFactory
git-svn-id: https://svn.apache.org/repos/asf/poi/trunk@982334 13f79535-47bb-0310-9956-ffa450edef68
14 years ago
Nick Burch
fbff3e557b
Refactor to make it easier to tell which content types each POIXMLTextExtractor handles
git-svn-id: https://svn.apache.org/repos/asf/poi/trunk@980414 13f79535-47bb-0310-9956-ffa450edef68
14 years ago
Nick Burch
11b69146c1
Fix indent, and make the mime type matching more consistent
git-svn-id: https://svn.apache.org/repos/asf/poi/trunk@964844 13f79535-47bb-0310-9956-ffa450edef68
14 years ago
Nick Burch
fd922298ef
Enable Word6Extractor in ExtractorFactory
git-svn-id: https://svn.apache.org/repos/asf/poi/trunk@959360 13f79535-47bb-0310-9956-ffa450edef68
14 years ago
Yegor Kozlov
a4dfc23a0b
Properly close internal InputStream in ExtractorFactory#createExtractor(File), see Bugzilla 49147
git-svn-id: https://svn.apache.org/repos/asf/poi/trunk@935900 13f79535-47bb-0310-9956-ffa450edef68
14 years ago
Nick Burch
63dc16b762
New event based xssf text extractor (XSSFEventBasedExcelExtractor)
git-svn-id: https://svn.apache.org/repos/asf/poi/trunk@903182 13f79535-47bb-0310-9956-ffa450edef68
14 years ago
Nick Burch
4c1c3a3ae3
Most of the support suggested by Phil Varner on the list - ExtractorFactory can now be told to prefer Event Based extractors (currently Excel only) on a per-thread or overall basis
git-svn-id: https://svn.apache.org/repos/asf/poi/trunk@902927 13f79535-47bb-0310-9956-ffa450edef68
14 years ago
Maxim Valyanskiy
2956525db2
revert previous commit
git-svn-id: https://svn.apache.org/repos/asf/poi/trunk@899129 13f79535-47bb-0310-9956-ffa450edef68
14 years ago
Maxim Valyanskiy
4e3c970131
ExtractorFactory: save OOXML stream into temporary file before text extraction - this reduces memory usage and allows temporary file cleanup
git-svn-id: https://svn.apache.org/repos/asf/poi/trunk@899123 13f79535-47bb-0310-9956-ffa450edef68
14 years ago
Maxim Valyanskiy
ababd504b5
add more powerpoint xml mime types
git-svn-id: https://svn.apache.org/repos/asf/poi/trunk@898292 13f79535-47bb-0310-9956-ffa450edef68
14 years ago
Nick Burch
63387c5c31
Add PublisherTextExtractor support to ExtractorFactory
git-svn-id: https://svn.apache.org/repos/asf/poi/trunk@897887 13f79535-47bb-0310-9956-ffa450edef68
14 years ago
Nick Burch
f37c8f303a
Add embedded (attachment) support to the outlook text extractor
git-svn-id: https://svn.apache.org/repos/asf/poi/trunk@897258 13f79535-47bb-0310-9956-ffa450edef68
14 years ago
Nick Burch
07551a0925
Rename the outlook extractor to be more consistent with other extractors
git-svn-id: https://svn.apache.org/repos/asf/poi/trunk@897249 13f79535-47bb-0310-9956-ffa450edef68
14 years ago
Nick Burch
f7ccc5d5f5
Wire up the new HSMFTextExtactor to the ExtractorFactory
git-svn-id: https://svn.apache.org/repos/asf/poi/trunk@897246 13f79535-47bb-0310-9956-ffa450edef68
14 years ago
Nick Burch
698c9b1279
Add in a few bits of Generics to avoid warnings
git-svn-id: https://svn.apache.org/repos/asf/poi/trunk@895477 13f79535-47bb-0310-9956-ffa450edef68
14 years ago
Yegor Kozlov
8f4139a9a6
reduced the number of compiler warnings generated by JDK 1.6.13 with -Xlint
git-svn-id: https://svn.apache.org/repos/asf/poi/trunk@821551 13f79535-47bb-0310-9956-ffa450edef68
14 years ago
Yegor Kozlov
d09ab59ab0
Fixed ExtractorFactory to support .xltx and .dotx files, see Bugzilla 47517
git-svn-id: https://svn.apache.org/repos/asf/poi/trunk@795327 13f79535-47bb-0310-9956-ffa450edef68
15 years ago
Josh Micich
aca8d5187d
Renamed Package (in org.apache.poi.openxml4j.opc) to OPCPackage so as to avoid clash with java.lang.Package (see bugzilla 46859)
git-svn-id: https://svn.apache.org/repos/asf/poi/trunk@755699 13f79535-47bb-0310-9956-ffa450edef68
15 years ago