But please note that, because processing external files is inherently risky, you should design your application
in a way that limits the impact of malicious documents as much as possible. The higher your security-related
requirements are, the more you will likely need to invest in your application to contain those effects.
Architecting your Application
If you are processing documents from an untrusted source, you should add a number of safeguards to
your application to contain any unexpected side effects.
Apache POI cannot fully prevent some documents from affecting the current process. We therefore
suggest the following additional layers of security.
Expect any type of Exception when processing documents
Parsing the various formats is complex, so unexpected types of errors and exceptions can be thrown,
for example StackOverflowError or many different types of RuntimeException.
Make sure to have a broad catch-statement around your document-parsing functionality and be prepared
to handle all of them gracefully.
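A minimal sketch of such a broad catch-statement follows. It is not taken from the POI documentation: SafeParse and tryParse are hypothetical names, and the Callable stands in for whatever POI call your application makes. Catching OutOfMemoryError is deliberately left out here, as recovering from it inside the same process is unreliable.

```java
import java.util.Optional;
import java.util.concurrent.Callable;

public final class SafeParse {
    private SafeParse() {}

    /**
     * Runs a parsing task (hypothetical helper) and turns any exception,
     * as well as a StackOverflowError, into an empty result.
     */
    public static <T> Optional<T> tryParse(Callable<T> parser) {
        try {
            return Optional.ofNullable(parser.call());
        } catch (Exception | StackOverflowError e) {
            if (e instanceof InterruptedException) {
                // Preserve the interrupt status for callers further up the stack.
                Thread.currentThread().interrupt();
            }
            // Treat the document as unreadable instead of letting the failure propagate.
            System.err.println("Document rejected: " + e);
            return Optional.empty();
        }
    }
}
```

A caller would pass its own parsing code as a lambda, for example SafeParse.tryParse(() -> parseDocument(file)), where parseDocument is whatever method of yours invokes POI, and treat an empty result as "document could not be processed".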
Expect long parsing time
Because of this complexity, some documents can cause prolonged CPU usage and long parsing times.
If this is a concern, make sure you have a way to stop processing after some time, for example by
using the sandboxing approach described below.
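As one possible in-process way to bound parsing time, the following sketch runs the parsing task on a separate thread and gives up after a timeout. TimedParse and parseWithTimeout are hypothetical names; only standard java.util.concurrent classes are used.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public final class TimedParse {
    private TimedParse() {}

    /**
     * Runs the parsing task on its own thread and throws TimeoutException
     * if no result is available within the given time (hypothetical helper).
     */
    public static <T> T parseWithTimeout(Callable<T> parser, long timeout, TimeUnit unit)
            throws TimeoutException, ExecutionException, InterruptedException {
        ExecutorService executor = Executors.newSingleThreadExecutor(runnable -> {
            Thread thread = new Thread(runnable, "document-parser");
            // A parser thread that never finishes must not keep the JVM alive.
            thread.setDaemon(true);
            return thread;
        });
        Future<T> future = executor.submit(parser);
        try {
            return future.get(timeout, unit);
        } catch (TimeoutException e) {
            // This only requests interruption; the task may keep running.
            future.cancel(true);
            throw e;
        } finally {
            executor.shutdownNow();
        }
    }
}
```

Note the limitation: cancelling the Future only interrupts the thread, and CPU-bound parsing code that never checks the interrupt flag will continue to consume CPU and memory in the background. The caller merely stops waiting for it, so a hard stop still requires terminating a separate process.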
Memory use can be very high
The data in Microsoft format files is usually compressed, so even small files can expand to a large amount of data.
The core POI APIs are not optimized to avoid excessive memory use. POI does have streaming APIs for reading
and writing xlsx files, so if you are working with large xlsx files, you should consider using those
streaming APIs.
Consider sandboxing document-parsing
If you operate in a highly sensitive environment and would like to avoid any side effect from
parsing documents on your application, then consider extracting the parsing logic into a separate
process which is configured with appropriate memory settings and which you stop after some timeout.
It is a good idea to be able to auto-restart the process in case of a crash.
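The separate-process approach could be sketched as follows, using only the JDK's ProcessBuilder. SandboxedParser, childJvmCommand and runWithTimeout are hypothetical names, and the sketch assumes you have packaged your parsing logic behind a main class of your own that takes the document path as its argument. Restart logic and passing results back to the parent are left out.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public final class SandboxedParser {
    private SandboxedParser() {}

    /** Builds the command line for a child JVM with a fixed maximum heap size. */
    public static List<String> childJvmCommand(String maxHeap, String classPath,
                                               String mainClass, String documentPath) {
        String javaBin = System.getProperty("java.home")
                + File.separator + "bin" + File.separator + "java";
        List<String> command = new ArrayList<>();
        command.add(javaBin);
        command.add("-Xmx" + maxHeap); // memory limit for the child only
        command.add("-cp");
        command.add(classPath);
        command.add(mainClass);        // assumed: your own parser entry point
        command.add(documentPath);
        return command;
    }

    /**
     * Runs the command and returns its exit code. If the process is still
     * running after the timeout, it is killed and TimeoutException is thrown.
     */
    public static int runWithTimeout(List<String> command, long timeout, TimeUnit unit)
            throws IOException, InterruptedException, TimeoutException {
        Process process = new ProcessBuilder(command)
                .inheritIO() // child output goes to the parent's console
                .start();
        if (!process.waitFor(timeout, unit)) {
            process.destroyForcibly();
            process.waitFor(); // wait until the child is actually gone
            throw new TimeoutException("parser process exceeded timeout");
        }
        return process.exitValue();
    }
}
```

A call such as runWithTimeout(childJvmCommand("256m", "parser.jar", "com.example.ParserMain", path), 30, TimeUnit.SECONDS) would then treat a TimeoutException or a non-zero exit code as a failed document; the jar and class names here are placeholders. Because the child has its own heap and can be killed by the operating system, a crash or runaway parse does not take the main application down with it.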
Keep up to date with releases
Apache POI occasionally issues CVEs for security issues. Each release also contains other bug fixes and
improvements, some of which make POI more robust against malicious inputs even though they are not
explicitly security-related.