56 Commits (e6ff9b74f4e0ddf1d0c373005a4d295f8c4cc792)

Author SHA1 Message Date
  Tim Allison e6ff9b74f4 60826 -- add initial support for streaming reading of xlsb files. 7 years ago
  Andreas Beeker 41d8585723 SonarQube fix - make members private 7 years ago
  Andreas Beeker 1690ec2d19 merge trunk to branch 7 years ago
  Dominik Stadler e6a2d46800 Compiler/IDE warnings, unnecessary keywords, 7 years ago
  Javen O'Neal b19c1253ee add cause to exceptions, log exceptions that are caught and suppressed 8 years ago
  Javen O'Neal 8fc36c7367 whitespace 8 years ago
  Nick Burch 1bbba4e39a Remove un-used imports 8 years ago
  Nick Burch ef2af2d53d Start moving logic over into the main and scratchpad jars for OLE2 8 years ago
  Nick Burch 39bd51fbe4 Notes on use 8 years ago
  Javen O'Neal a43a344d16 Add comments describing Outlook .msg DirectoryNode names 8 years ago
  Javen O'Neal 103116bff9 move string literals out to array that can be for-looped over 8 years ago
  Dominik Stadler bb7c632559 Fix some Sonar issues and some IntelliJ warnings 8 years ago
  Dominik Stadler 15d70b0828 Check for null in IOUtils.closeQuietly() to not log this unnecessarily 8 years ago
  Dominik Stadler 71f5735238 Refactor some common code from the various Document-Factories into a helper class 8 years ago
  Nick Burch 6e21b85d8e #59074 More helpful exception if Excel 1-95 files are given to ExtractorFactory 8 years ago
  Nick Burch 05d455310c Refactor out the POIFS directory entry name for Excel 1-95 entries, and have ExtractorFactory detect (but not support) these old files 8 years ago
  Nick Burch 7fdd90fecb Refactor to pull out the list of Excel 97+ directory entry names to a common place, avoiding duplication. Also starts on unit testing #59074 8 years ago
  Dominik Stadler 17ed7975e2 One more possible resource leak when creating the TextExtractor fails with a RuntimeException or one of the named exceptions 8 years ago
  Dominik Stadler a74cded68d Handle some cases better where file handles were left open by the ExtractorFactory, mostly when opening files failed, but also when using the NPOIFSFileSystem for initialization. 8 years ago
  Dustin Spicuzza bc6ee96e1a Add Visio OOXML text extractor + tests 8 years ago
  Andreas Beeker fad6546d8a sonar fixes 8 years ago
  Nick Burch 5abd6431a2 #56791 More updates from OPOIFS to NPOIFS 9 years ago
  Nick Burch 0227765619 Detect OOXML-strict, and give more helpful exceptions for them 9 years ago
  Nick Burch 47a2847cbe Give a more helpful exception if a Visio VSDX ooxml file is passed to ExtractorFactory 9 years ago
  Dominik Stadler 6eeb0a7c19 Add missing close and handle theme-pptx in ExtractorFactory. Add creating slide-bitmaps to PPTX integration test. 9 years ago
  Dominik Stadler a3e087268a Verify some more Text-Extraction features as part of integration tests, fix some NullPointerExceptions that showed up now because the event-based extraction does not have a Document available 9 years ago
  Dominik Stadler 76307fe94b Add text-extraction verification to integration-tests via a new abstract base FileHandler 9 years ago
  Nick Burch cef16bab94 Have ExtractorFactory open OPCPackages from files in read-only mode by default, since writing should never be needed when extracting text 9 years ago
  Dominik Stadler 8a1411bda1 fix some Eclipse warnings, unnecessary null-check and missing close() in tests 10 years ago
  Nick Burch ed23692537 Fix some Eclipse identified warnings 10 years ago
  Sergey Vladimirov 49697de696 Add Word-to-Text converter and use it as replacement for WordExtractor 13 years ago
  Nick Burch 4c8a39924b Inside ExtractorFactory, support finding embedded OOXML documents and providing extractors for them 13 years ago
  Nick Burch 8261bb3e8a Support nested outlook files in ExtractorFactory 14 years ago
  Nick Burch fbff3e557b Refactor to make it easier to tell which content types each POIXMLTextExtractor handles 14 years ago
  Nick Burch 11b69146c1 Fix indent, and make the mime type matching more consistent 14 years ago
  Nick Burch fd922298ef Enable Word6Extractor in ExtractorFactory 14 years ago
  Yegor Kozlov a4dfc23a0b Properly close internal InputStream in ExtractorFactory#createExtractor(File), see Bugzilla 49147 14 years ago
  Nick Burch 63dc16b762 New event based xssf text extractor (XSSFEventBasedExcelExtractor) 14 years ago
  Nick Burch 4c1c3a3ae3 Most of the support suggested by Phil Varner on the list - ExtractorFactory can now be told to prefer Event Based extractors (currently Excel only) on a per-thread or overall basis 14 years ago
  Maxim Valyanskiy 2956525db2 revert previous commit 14 years ago
  Maxim Valyanskiy 4e3c970131 ExtractorFactory: save OOXML stream into temporary file before text extraction - this reduces memory usage and allows temporary file cleanup 14 years ago
  Maxim Valyanskiy ababd504b5 add more powerpoint xml mime types 14 years ago
  Nick Burch 63387c5c31 Add PublisherTextExtractor support to ExtractorFactory 14 years ago
  Nick Burch f37c8f303a Add embedded (attachment) support to the outlook text extractor 14 years ago
  Nick Burch 07551a0925 Rename the outlook extractor to be more consistent with other extractors 14 years ago
  Nick Burch f7ccc5d5f5 Wire up the new HSMFTextExtactor to the ExtractorFactory 14 years ago
  Nick Burch 698c9b1279 Add in a few bits of Generics to avoid warnings 14 years ago
  Yegor Kozlov 8f4139a9a6 reduced the number of compiler warnings generated by JDK 1.6.13 with -Xlint 14 years ago
  Yegor Kozlov d09ab59ab0 Fixed ExtractorFactory to support .xltx and .dotx files, see Bugzilla 47517 15 years ago
  Josh Micich aca8d5187d Renamed Package (in org.apache.poi.openxml4j.opc) to OPCPackage so as to avoid clash with java.lang.Package (see bugzilla 46859) 15 years ago