Apache POI supports reading several variants of encrypted Office files, listed in the table below. Some "write-protected" files are encrypted with the built-in password "VelvetSweatshop"; POI can read those files, too.
Encryption (binary formats) | HSSF | HSLF | HWPF |
---|---|---|---|
XOR obfuscation *) | Yes (writing since 3.16) | N/A | No |
40-bit RC4 encryption | Yes (writing since 3.16) | N/A | Yes (since 3.17) |
Office Binary Document RC4 CryptoAPI Encryption | Yes (since 3.16) | Yes | Yes (since 3.17) |

Encryption (OOXML formats) | XSSF | XSLF | XWPF |
---|---|---|---|
Office Binary Document RC4 Encryption **) | Yes | Yes | Yes |
ECMA-376 Standard Encryption | Yes | Yes | Yes |
ECMA-376 Agile Encryption | Yes | Yes | Yes |
ECMA-376 XML Signature | Yes | Yes | Yes |
*) The XOR obfuscation is flawed and works only for very small files; see bug #59857.
**) The MS-OFFCRYPTO documentation only mentions RC4 encryption (without CryptoAPI) as an "in-place" encryption, but apparently there is also a container-based method that uses the same key generation logic.
For the binary formats, use Biff8EncryptionKey.setCurrentUserPassword(String password) to specify the password before opening the file.
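A minimal sketch of reading a password-protected .xls with HSSF, assuming POI 4.x or 5.x on the classpath. The class `ReadEncryptedXls` and its `countSheets` helper are hypothetical names. The password is kept per thread, so the sketch resets it in a `finally` block to avoid reusing it for unrelated files.

```java
import java.io.File;
import java.io.IOException;

import org.apache.poi.hssf.record.crypto.Biff8EncryptionKey;
import org.apache.poi.hssf.usermodel.HSSFWorkbook;
import org.apache.poi.poifs.filesystem.POIFSFileSystem;

public class ReadEncryptedXls {
    // Opens a password-protected .xls and returns its sheet count (hypothetical helper).
    public static int countSheets(File xls, String password) throws IOException {
        // HSSF picks the password up from this per-thread slot when it meets encrypted records
        Biff8EncryptionKey.setCurrentUserPassword(password);
        try (POIFSFileSystem fs = new POIFSFileSystem(xls, true);
             HSSFWorkbook wb = new HSSFWorkbook(fs)) {
            return wb.getNumberOfSheets();
        } finally {
            // Reset, so later files on this thread are not opened with a stale password
            Biff8EncryptionKey.setCurrentUserPassword(null);
        }
    }

    public static void main(String[] args) throws IOException {
        // args[0]: path to the .xls file, args[1]: its password
        System.out.println(countSheets(new File(args[0]), args[1]));
    }
}
```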
Encrypted XML-based formats are stored in the "EncryptedPackage" stream of an OLE2 (POIFS) container. Use org.apache.poi.poifs.crypt.Decryptor to decrypt the file.
To read a file encrypted with the built-in password, use Decryptor.DEFAULT_PASSWORD.
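The decryption steps can be sketched as follows, assuming POI 4.x or 5.x with poi-ooxml on the classpath. `ReadEncryptedXlsx` and `countSheets` are hypothetical names; passing `null` as the password makes the sketch fall back to `Decryptor.DEFAULT_PASSWORD`.

```java
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.security.GeneralSecurityException;

import org.apache.poi.poifs.crypt.Decryptor;
import org.apache.poi.poifs.crypt.EncryptionInfo;
import org.apache.poi.poifs.filesystem.POIFSFileSystem;
import org.apache.poi.xssf.usermodel.XSSFWorkbook;

public class ReadEncryptedXlsx {
    // Decrypts the "EncryptedPackage" stream and opens the result as an XSSFWorkbook.
    // A null password falls back to the built-in default ("VelvetSweatshop").
    public static int countSheets(File file, String password)
            throws IOException, GeneralSecurityException {
        try (POIFSFileSystem fs = new POIFSFileSystem(file, true)) {
            EncryptionInfo info = new EncryptionInfo(fs);
            Decryptor d = Decryptor.getInstance(info);
            String pw = (password != null) ? password : Decryptor.DEFAULT_PASSWORD;
            if (!d.verifyPassword(pw)) {
                throw new GeneralSecurityException("Unable to process: wrong password");
            }
            // The decrypted stream is the plain OOXML zip package
            try (InputStream dataStream = d.getDataStream(fs);
                 XSSFWorkbook wb = new XSSFWorkbook(dataStream)) {
                return wb.getNumberOfSheets();
            }
        }
    }
}
```

The same decrypted stream can be handed to `XSLFSlideShow`/`XMLSlideShow` or `XWPFDocument` instead of `XSSFWorkbook` for the other OOXML formats.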
Encrypting a file works similarly to the decryption process described above. You need to choose between binaryRC4, standard and agile encryption; the cryptoAPI mode is used internally, and using it directly would result in an incomplete file. Apart from the encryption mode (EncryptionMode), the EncryptionInfo class provides further parameters to specify the cipher and hashing algorithms to be used.
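A sketch of agile encryption, assuming POI 4.x or 5.x with poi-ooxml. Rather than re-reading an existing file, it writes a freshly created `XSSFWorkbook` into the encrypting stream; `EncryptOoxml` and `writeEncrypted` are hypothetical names. The encrypting stream has to be closed before the POIFS container is written, so that the encrypted package is complete.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.security.GeneralSecurityException;

import org.apache.poi.poifs.crypt.EncryptionInfo;
import org.apache.poi.poifs.crypt.EncryptionMode;
import org.apache.poi.poifs.crypt.Encryptor;
import org.apache.poi.poifs.filesystem.POIFSFileSystem;
import org.apache.poi.xssf.usermodel.XSSFWorkbook;

public class EncryptOoxml {
    // Writes a new workbook, agile-encrypted with the given password, to target.
    public static void writeEncrypted(File target, String password)
            throws IOException, GeneralSecurityException {
        try (POIFSFileSystem fs = new POIFSFileSystem()) {
            // Alternatives: EncryptionMode.standard or EncryptionMode.binaryRC4 (not cryptoAPI)
            EncryptionInfo info = new EncryptionInfo(EncryptionMode.agile);
            Encryptor enc = info.getEncryptor();
            enc.confirmPassword(password);

            // Resources close in reverse order: the encrypting stream first, which
            // finalises the encrypted package inside fs, then the workbook
            try (XSSFWorkbook wb = new XSSFWorkbook();
                 OutputStream os = enc.getDataStream(fs)) {
                wb.createSheet("Sheet1").createRow(0).createCell(0).setCellValue("secret");
                wb.write(os);
            }

            // Write the OLE2 container that now holds the encryption info and the package
            try (FileOutputStream fos = new FileOutputStream(target)) {
                fs.writeFilesystem(fos);
            }
        }
    }
}
```

To pick the cipher and hash explicitly, EncryptionInfo also has a longer constructor taking a `CipherAlgorithm` and `HashAlgorithm`; check its exact parameter list in the Javadoc of your POI version.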
An Office document can be digitally signed with an XML Signature to protect it from unauthorized modifications, i.e. modifications made by someone who does not have the original certificate. The current implementation is based on the eID Applet, which is dual-licensed under the Apache License 2.0 and the LGPL v3.0. Instead of the internal JDK API, this version is built on Apache Santuario.
The classes have been tested against the following libraries, which need to be included in addition to the default dependencies:
Depending on the configuration and the activated facets, various XAdES levels are supported. Support for the higher levels (XAdES-T and above) depends on supporting services; although the code has been adopted, the integration is not well tested. Please help us with integration testing against timestamp and revocation (OCSP) services.
Further test examples can be found in the corresponding test class.
If you use a hash algorithm with a 64-byte digest (currently this only applies to SHA-512), a Base64 "feature" in xmlsec inserts line breaks into the digest values, which Office does not accept. To work around this, set the following system property:
-Dorg.apache.xml.security.ignoreLineBreaks=true
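The effect can be reproduced with the JDK alone. This sketch uses the MIME encoder of `java.util.Base64` as a stand-in for xmlsec's Base64 handling, assuming xmlsec likewise wraps its output at 76 characters per line; it is not xmlsec code. A 64-byte SHA-512 digest encodes to 88 characters and therefore gets a CRLF, while SHA-256 and SHA-384 digests (44 and 64 characters) stay on one line.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Base64;

public class DigestLineBreaks {
    // Base64-encodes the digest of data, either MIME-style (76-char lines joined by CRLF)
    // or as a single line. The MIME encoder stands in for xmlsec's line wrapping.
    public static String encodeDigest(String algorithm, byte[] data, boolean wrapLines) {
        try {
            byte[] digest = MessageDigest.getInstance(algorithm).digest(data);
            Base64.Encoder encoder = wrapLines ? Base64.getMimeEncoder() : Base64.getEncoder();
            return encoder.encodeToString(digest);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(algorithm + " not available", e);
        }
    }

    public static void main(String[] args) {
        byte[] data = "signed part".getBytes(StandardCharsets.UTF_8);
        for (String alg : new String[] {"SHA-256", "SHA-384", "SHA-512"}) {
            String wrapped = encodeDigest(alg, data, true);
            // SHA-256: 44 chars, SHA-384: 64 chars, SHA-512: 90 chars including one CRLF
            System.out.printf("%s: %d chars, line break: %b%n",
                    alg, wrapped.length(), wrapped.contains("\r\n"));
        }
    }
}
```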
When saving an OOXML document, POI creates missing relationships on the fly. Signing the document before saving it would therefore result in an invalid signature. Instead of trying to fix every save invocation, POI asks the user to first save the document to an intermediate byte array (stream) and then sign that stream.
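A sketch of the save-then-sign order, assuming POI 5.x (where `SignatureInfo.setOpcPackage` exists; POI 4.x set the package on `SignatureConfig` instead), Santuario on the classpath, and a PKCS#12 keystore holding the signing key. `SignViaByteArray`, `saveFirst` and `signSaved`, as well as the command-line arguments, are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.security.KeyStore;
import java.security.PrivateKey;
import java.security.cert.X509Certificate;
import java.util.Collections;

import org.apache.poi.openxml4j.opc.OPCPackage;
import org.apache.poi.poifs.crypt.dsig.SignatureConfig;
import org.apache.poi.poifs.crypt.dsig.SignatureInfo;
import org.apache.poi.xssf.usermodel.XSSFWorkbook;

public class SignViaByteArray {
    // Step 1: save first, so POI creates any missing relationships now and not after signing
    public static byte[] saveFirst(XSSFWorkbook wb) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        wb.write(bos);
        return bos.toByteArray();
    }

    // Step 2: re-open the saved bytes, sign that package and write it out
    public static void signSaved(byte[] ooxml, PrivateKey key, X509Certificate cert,
                                 OutputStream out) throws Exception {
        SignatureConfig config = new SignatureConfig();
        config.setKey(key);
        config.setSigningCertificateChain(Collections.singletonList(cert));
        try (OPCPackage pkg = OPCPackage.open(new ByteArrayInputStream(ooxml))) {
            SignatureInfo si = new SignatureInfo();
            si.setOpcPackage(pkg); // POI 5.x API
            si.setSignatureConfig(config);
            si.confirmSignature();
            pkg.save(out);
        }
    }

    public static void main(String[] args) throws Exception {
        // args: keystore.p12, keystore password, key alias, output.xlsx (all hypothetical)
        char[] pw = args[1].toCharArray();
        KeyStore ks = KeyStore.getInstance("PKCS12");
        try (InputStream in = new FileInputStream(args[0])) {
            ks.load(in, pw);
        }
        PrivateKey key = (PrivateKey) ks.getKey(args[2], pw);
        X509Certificate cert = (X509Certificate) ks.getCertificate(args[2]);

        byte[] saved;
        try (XSSFWorkbook wb = new XSSFWorkbook()) {
            wb.createSheet("Sheet1");
            saved = saveFirst(wb);
        }
        try (OutputStream out = new FileOutputStream(args[3])) {
            signSaved(saved, key, cert, out);
        }
    }
}
```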
For security-conscious environments where data at rest must be stored encrypted, the creation of plaintext temporary files is a grey area.
The code example, written by PJ Fanning, modifies the behavior of SXSSFWorkbook so that it extracts the zipped OOXML spreadsheet container and writes its contents to disk with AES encryption.
See SXSSFWorkbookWithCustomZipEntrySource.java and other files that are needed for this example.
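The underlying idea can be sketched with the JDK alone: temporary data is encrypted with a random AES key that only ever lives in memory, so the file on disk is unreadable once the JVM is gone. This is not PJ Fanning's code or a POI class; `EncryptedTempFile` is a hypothetical name, and AES in CTR mode is simply one reasonable choice for streaming data.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import javax.crypto.Cipher;
import javax.crypto.CipherInputStream;
import javax.crypto.CipherOutputStream;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.IvParameterSpec;

/** Temp file whose contents are AES-CTR encrypted with a key that exists only in memory. */
public class EncryptedTempFile implements AutoCloseable {
    private final Path file;
    private final SecretKey key;
    private final byte[] iv = new byte[16];

    public EncryptedTempFile() throws IOException, GeneralSecurityException {
        KeyGenerator kg = KeyGenerator.getInstance("AES");
        kg.init(128);
        key = kg.generateKey();            // never written to disk
        new SecureRandom().nextBytes(iv);  // random initial counter block
        file = Files.createTempFile("poi-enc", ".tmp");
    }

    private Cipher cipher(int mode) throws GeneralSecurityException {
        Cipher c = Cipher.getInstance("AES/CTR/NoPadding");
        c.init(mode, key, new IvParameterSpec(iv));
        return c;
    }

    // Write the data once: a second write with the same key and IV would reuse the keystream
    public OutputStream openOutput() throws IOException, GeneralSecurityException {
        return new CipherOutputStream(Files.newOutputStream(file), cipher(Cipher.ENCRYPT_MODE));
    }

    // Read back the decrypted data (may be called repeatedly)
    public InputStream openInput() throws IOException, GeneralSecurityException {
        return new CipherInputStream(Files.newInputStream(file), cipher(Cipher.DECRYPT_MODE));
    }

    public Path path() {
        return file;
    }

    @Override
    public void close() throws IOException {
        Files.deleteIfExists(file);
    }
}
```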
Finding the source of an XML signature problem can sometimes be a pain in the ... neck, because the hashing of the canonicalized form happens more or less in the background.
One tripping hazard is the different line breaks on Windows and Unix, so use the non-indented form of the XML files. Furthermore, the elements/ancestors containing namespace definitions, and the prefixes used, might also differ.
The next step is to compare a document successfully signed by Office with one signed by POI, i.e. unzip both files and look for differences. Usually the package relationships (*.rels) will differ, as will sig1.xml, core.xml and [Content_Types].xml, due to a different order of the references.
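The unzip-and-compare step can be automated with `java.util.zip`. This sketch (hypothetical class `OoxmlDiff`) lists every part that is missing on one side or differs byte for byte, and flags parts that only differ in CRLF versus LF line endings, the tripping hazard mentioned above. It assumes the XML parts are UTF-8 encoded.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

public class OoxmlDiff {
    // Reads every entry of a zip (OOXML) stream into a sorted part name -> bytes map
    public static Map<String, byte[]> readEntries(InputStream in) throws IOException {
        Map<String, byte[]> entries = new TreeMap<>();
        try (ZipInputStream zis = new ZipInputStream(in)) {
            ZipEntry e;
            while ((e = zis.getNextEntry()) != null) {
                if (!e.isDirectory()) {
                    entries.put(e.getName(), zis.readAllBytes()); // reads up to the end of this entry
                }
            }
        }
        return entries;
    }

    // Names of parts that exist on only one side or whose bytes differ
    public static TreeSet<String> differingParts(Map<String, byte[]> a, Map<String, byte[]> b) {
        TreeSet<String> names = new TreeSet<>(a.keySet());
        names.addAll(b.keySet());
        names.removeIf(n -> a.containsKey(n) && b.containsKey(n) && Arrays.equals(a.get(n), b.get(n)));
        return names;
    }

    // True if two parts are equal once CRLF is normalised to LF (assumes UTF-8 text)
    public static boolean sameIgnoringCrLf(byte[] x, byte[] y) {
        String sx = new String(x, StandardCharsets.UTF_8).replace("\r\n", "\n");
        String sy = new String(y, StandardCharsets.UTF_8).replace("\r\n", "\n");
        return sx.equals(sy);
    }

    public static void main(String[] args) throws IOException {
        // args[0]: file signed by Office, args[1]: file signed by POI
        Map<String, byte[]> office = readEntries(new FileInputStream(args[0]));
        Map<String, byte[]> poi = readEntries(new FileInputStream(args[1]));
        for (String part : differingParts(office, poi)) {
            boolean both = office.containsKey(part) && poi.containsKey(part);
            String note = both && sameIgnoringCrLf(office.get(part), poi.get(part))
                    ? " (line endings only)" : "";
            System.out.println("differs: " + part + note);
        }
    }
}
```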
The package relationships (*.rels) are handled specially, i.e. they are filtered and only a subset is processed; see 13.2.4.24 Relationships Transform Algorithm.
POI and Santuario (XmlSec) use Log4J 2.x and SLF4J respectively for logging.