Downloading them has become nearly impossible now that bintray.dl is shutting down,
so let's persist the jars as part of the source distribution instead, for now.
Andreas Beeker [Wed, 7 Apr 2021 21:40:33 +0000 (21:40 +0000)]
65206 - Migrate ant / maven to gradle build
compile / jar / test of mrJars
don't include ant's build.xml anymore
rename directories to match project and maven artifact names
refactor artifacts - so each project has one artifact
replace static references in hssf/dev tests, which caused problems in parallel tests, with junit5 constructs
increase gradle heap to 4gb because of OOM - maybe less would also work
XLSX2CSV: Do not double-encode if the value already has quotes, and escape double-quotes
Most CSV formats use "" (two quotes) to escape a "-character; we should do this in this
example as well to produce files that other CSV processors can parse correctly.
Also, values that are already enclosed in quotes should not get additional quotes.
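A minimal Java sketch of the quoting rule described above. The class and method names are hypothetical and this is not the code in XLSX2CSV; it assumes that "already enclosed in quotes" means the value both starts and ends with a double quote.

```java
/**
 * Sketch of CSV field quoting as described in the commit message.
 * Hypothetical helper, not the actual XLSX2CSV code. Assumption: a value that
 * already starts and ends with a double quote is treated as pre-quoted.
 */
final class CsvQuoting {
    private CsvQuoting() {}

    static String quote(String value) {
        // Assumed rule: a value that is already enclosed in quotes is left untouched.
        if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"")) {
            return value;
        }
        // Escape each embedded quote by doubling it, then enclose the whole field.
        return "\"" + value.replace("\"", "\"\"") + "\"";
    }

    public static void main(String[] args) {
        System.out.println(quote("plain"));      // "plain"
        System.out.println(quote("say \"hi\"")); // "say ""hi"""
        System.out.println(quote("\"quoted\"")); // "quoted"
    }
}
```

Under this assumption a pre-quoted value with unescaped inner quotes passes through unchanged, so a stricter variant might validate the inner quotes before skipping the escaping step.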
Add a simple initial test to module "examples" to verify basic functionality of XLSX2CSV,
as I often rely on it for converting some very large xlsx-files to csv
Dominik Stadler [Sun, 28 Mar 2021 19:54:54 +0000 (19:54 +0000)]
Remove support for japicmp from Gradle build
I have now spent a few hours trying to make it work, and the Gradle support is
simply not production-ready and also not maintained; it triggers various
strange errors and does not support the usual Gradle conventions.
So I do not want to spend more time on it. Feel free to revive it if you know how
to make this work properly.
Andreas Beeker [Sat, 27 Mar 2021 14:03:16 +0000 (14:03 +0000)]
65206 - Migrate ant / maven to gradle build
update gradle files and project structure following https://github.com/centic9/poi/tree/gradle_build
remove eclipse IDE project files
remove obsolete record generator files
Marius Volkhart [Sun, 14 Mar 2021 20:43:43 +0000 (20:43 +0000)]
Change Gradle to use java-library plugin
This plugin is specifically built for libraries. The major difference from the regular java plugin is that it allows declaring dependencies as part of the api or the implementation. Both are used by the project at compile time and runtime, but only api dependencies are exposed on the compile classpath of dependent projects.
In our current setup, this doesn't matter much, since we deploy to Maven Central using pre-built POMs. It's more a matter of future-proofing, and it makes it a little clearer which Gradle projects actually require which dependencies.
Marius Volkhart [Sun, 14 Mar 2021 18:56:30 +0000 (18:56 +0000)]
Exclude batik-script dependency from OOXML artifact
We do not make use of the batik-script dependency. While this is likely to be true of a variety of the Batik dependencies, batik-script causes problems for our users who are using JPMS. See [bug-65103].
Marius Volkhart [Sun, 14 Mar 2021 18:51:12 +0000 (18:51 +0000)]
Specify more granular Batik dependencies
Batik-all is a strange artifact. Its POM declares dependencies on all the sub-JARs, but its own JAR contains all of the sub-JARs repackaged. As a result, multiple JARs containing the same packages are added to consuming applications, which leads to problems for JPMS users. See [bug-65183].
The Ant build does not use batik-all, so the Maven and Gradle builds should not either.
Marius Volkhart [Sun, 14 Mar 2021 11:31:18 +0000 (11:31 +0000)]
Disable parallel tests on Gradle again
Something is causing parallel tests to fail on CI. I haven't been able to track down what it is. The symptoms look similar to others where the cause was a test modifying the test-data directory.
The integration tests also sometimes run into OutOfMemoryErrors when I run them in parallel.
Marius Volkhart [Sun, 14 Mar 2021 10:42:15 +0000 (10:42 +0000)]
Limit which tests can run in parallel
Some tests modify global resources. Those tests cannot be run in parallel with others, as they cause problems or become flaky. Where possible, indicate to JUnit the resources in contention. Otherwise, mark the tests as needing to run in isolation.
Reduce the number of map lookups necessary to compute the return values for methods that return collections of property details. Since we maintain parity between the `props` and `dictionary` contents, when retrieving property details, we can reference the `props` directly and avoid the `dictionary` indirection.
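The commit does not show the code, so the following is a hypothetical Java sketch of the idea. Only the names `props` and `dictionary` come from the message; the `PropertySection` class, its methods, and the `String` value types are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical section that keeps two maps in parity, keyed by property ID.
 * Illustrative only; not POI's actual implementation.
 */
final class PropertySection {
    private final Map<Long, String> props = new LinkedHashMap<>();      // ID -> value
    private final Map<Long, String> dictionary = new LinkedHashMap<>(); // ID -> name

    void put(long id, String name, String value) {
        // Both maps are updated together, so they always hold the same keys.
        props.put(id, value);
        dictionary.put(id, name);
    }

    /** Indirect version: one extra props lookup per dictionary entry. */
    List<String> valuesViaDictionary() {
        List<String> out = new ArrayList<>();
        for (Long id : dictionary.keySet()) {
            out.add(props.get(id));
        }
        return out;
    }

    /** Direct version: relies on key parity and reads props without per-key lookups. */
    List<String> values() {
        return new ArrayList<>(props.values());
    }
}
```

The direct version is only equivalent because `put` maintains the parity invariant; if the two maps could ever diverge, the indirect version would be the safer one.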
Marius Volkhart [Tue, 9 Mar 2021 19:26:31 +0000 (19:26 +0000)]
Additional debug logging for unknown records in HSLF
Recently, while debugging app behavior on HSLF documents, I had to dig into the OOXML that Microsoft PowerPoint places into files saved in PPT format. Having information in the logs about when records were not parsed by POI was very helpful. The hex identifier was critical in being able to quickly search the [MS-PPT] spec for what type of record it was, and the integer identifier was helpful in quickly finding the Record type in RecordTypes.java.
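A hypothetical sketch of a debug message that carries both forms of the record identifier. The class, method, and message wording are invented for illustration; they are not POI's actual logging code.

```java
import java.util.Locale;

/**
 * Hypothetical helper that formats a message for an unparsed record type.
 * Not the text POI actually logs.
 */
final class UnknownRecordMessage {
    private UnknownRecordMessage() {}

    static String describe(int recordType, long offset) {
        // Hex form for searching the [MS-PPT] spec; decimal form for searching
        // the numeric type IDs in RecordTypes.java. Locale.ROOT keeps digits ASCII.
        return String.format(Locale.ROOT, "Unknown record type 0x%04X (%d) at offset %d",
                recordType, recordType, offset);
    }

    public static void main(String[] args) {
        // prints "Unknown record type 0x03E8 (1000) at offset 128"
        System.out.println(describe(1000, 128));
    }
}
```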
Marius Volkhart [Mon, 1 Mar 2021 00:25:23 +0000 (00:25 +0000)]
Deprecate functions that duplicate functionality
DrawingGroupRecord#processChildRecords and AbstractEscherHolderRecord#convertRawBytesToEscherRecords duplicate the functionality of AbstractEscherHolderRecord#decode. This makes the code harder to follow, as it is not clear when certain access patterns repeat. Accordingly, these functions are deprecated and flagged for removal.
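A hypothetical Java sketch of the deprecation pattern: the duplicate entry point delegates to the canonical one so only a single parsing path remains until removal. `RecordHolder`, `decode()`, and `convertRawBytes()` are illustrative names, the byte-to-int loop is a stand-in for real record parsing, and the sketch uses the standard Java 9+ `@Deprecated(forRemoval = true)`; the project's own removal marker may differ.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical holder with one canonical decode path and one deprecated duplicate. */
class RecordHolder {
    private final byte[] rawData;
    private List<Integer> decoded;

    RecordHolder(byte[] rawData) {
        this.rawData = rawData.clone();
    }

    /** Canonical path: decode once and cache the result. */
    List<Integer> decode() {
        if (decoded == null) {
            List<Integer> out = new ArrayList<>();
            for (byte b : rawData) {
                out.add(b & 0xFF); // stand-in for real record parsing
            }
            decoded = Collections.unmodifiableList(out);
        }
        return decoded;
    }

    /**
     * Duplicate entry point kept only for compatibility.
     * @deprecated use {@link #decode()} instead
     */
    @Deprecated(forRemoval = true)
    void convertRawBytes() {
        decode(); // delegate so there is only one parsing code path
    }
}
```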
Marius Volkhart [Mon, 1 Mar 2021 00:04:51 +0000 (00:04 +0000)]
Review EscherContainerRecord#getChildRecords() call sites for unnecessary work
This started as a wish to add the EscherContainerRecord#getChildCount() function in order to check efficiently how many children the container has. This was desirable in new code for editing HSSF pictures. The existing option of calling getChildRecords().size() was undesirable, as it requires copying the list first.
In the process of finding call sites that would benefit from replacing getChildRecords().size(), I realized that several other patterns would benefit from eliminating a copy, such as iterating over the children in a for-each loop, and indexed access to specific children.
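A hypothetical Java sketch of the access patterns described above. `getChildRecords()` returns a defensive copy, so callers that only need the size, an iteration, or one element can use cheaper accessors. `ChildContainer` and its `String` children are illustrative stand-ins, not POI's `EscherContainerRecord`.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;

/** Hypothetical container illustrating copy-free accessors next to a copying one. */
final class ChildContainer implements Iterable<String> {
    private final List<String> children = new ArrayList<>();

    void addChild(String child) {
        children.add(child);
    }

    /** Defensive copy: allocates a new list on every call. */
    List<String> getChildRecords() {
        return new ArrayList<>(children);
    }

    /** Size without copying. */
    int getChildCount() {
        return children.size();
    }

    /** Indexed access without copying. */
    String getChild(int index) {
        return children.get(index);
    }

    /** Read-only for-each iteration without copying. */
    @Override
    public Iterator<String> iterator() {
        return Collections.unmodifiableList(children).iterator();
    }
}
```

A call site such as `container.getChildRecords().size()` then becomes `container.getChildCount()`, and `for (String c : container.getChildRecords())` becomes `for (String c : container)`.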
Marius Volkhart [Sun, 28 Feb 2021 23:16:14 +0000 (23:16 +0000)]
Add the ability to edit HSLFPictureData contents
Pictures can now be edited by calling HSLFPictureData#setData(byte[]). The byte[] should contain the image data as an image viewer might read it.
To enable this functionality, a tighter coupling between the EscherBSERecords of the slideshow and the HSLFPictureData was required. This ensures that changes in image data size are accurately recorded in the records.
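A hypothetical Java sketch of the coupling: the picture object holds a reference to the record that describes it, so replacing the bytes also updates the size stored in that record. `BlipRecord` and `PictureData` are illustrative stand-ins, not POI's `EscherBSERecord` or `HSLFPictureData`, and the size field here is simply the byte count; a real record may also need to account for header bytes.

```java
/** Hypothetical record that stores the size of the picture it describes. */
final class BlipRecord {
    private int size;

    int getSize() {
        return size;
    }

    void setSize(int size) {
        this.size = size;
    }
}

/** Hypothetical picture data coupled to its describing record. */
final class PictureData {
    private final BlipRecord record;
    private byte[] data = new byte[0];

    PictureData(BlipRecord record) {
        this.record = record;
    }

    void setData(byte[] newData) {
        data = newData.clone();
        // Keep the record in sync so the written file reports the new length.
        // Simplification: a real size field may include header bytes too.
        record.setSize(data.length);
    }

    byte[] getData() {
        return data.clone();
    }
}
```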
In the course of coupling the records and the HSLFPictureData, various scenarios arose where a mapping of records to pictures was non-trivial. Accordingly, the HSLFSlideShowImpl#matchPicturesAndRecords(...) function was added to perform a more sophisticated matching pass. This function is heavily exercised by org.apache.poi.hslf.usermodel.TestBugs.testFile[5] and PPTX2PNG.render[2], as well as the new TestPictures#testSlideshowWithIncorrectOffsets().
Marius Volkhart [Sun, 28 Feb 2021 19:18:13 +0000 (19:18 +0000)]
Rename EscherRecordHolder to OfficeArtContent
While the class does indeed hold EscherRecords, due to recent refactoring it is much more structured now than it was before. The contents of the class now closely resemble the OfficeArtContent structure referenced in the MS-DOC spec. Naming the class after the specification structure makes it easier to find and understand.
Marius Volkhart [Sun, 28 Feb 2021 18:49:42 +0000 (18:49 +0000)]
Rework EscherRecordHolder parsing
Modify the parsing done by EscherRecordHolder to be more deterministic. The format of the OfficeArtContent structure, which the EscherRecordHolder represents, is well defined in the MS-DOC spec. A clear class structure makes it easier to reason about the availability of data.