Any information in here that might be perceived as legal information is informational only. We're not lawyers, so consult a legal professional if needed.
The POI project is open source and developed/distributed under the Apache Software License v2. Unlike some other licenses, the Apache license allows free open source development. Unlike some other Open Source licenses, it does not require you to release your source or use any particular license for your code which builds on top of it. (There are a handful of restrictions, especially around attribution, notices and trademarks, so it's worth a read of the license - it isn't scary!) If you wish to contribute to Apache POI (which you're very welcome and encouraged to do), then you must agree to grant your contributions to us under the same license.
There are a lot of open issues in Bugzilla and TODOs in the code. Please see the section below for more on these. Get in touch using our mailing lists if you want to volunteer.
The Apache Contributors Tech Guide gives a good overview of how to start contributing patches.
The Nutch project also has a very useful guide on becoming a new developer in their project. While it is written for their project, a large part of it will apply to POI too. You can read it at http://wiki.apache.org/nutch/Becoming_A_Nutch_Developer. The Apache Community Development Project also provides guidance and mentoring for new contributors.
If you use GitHub, you can submit Pull Requests to https://github.com/apache/poi. It is probably a good idea to create an issue in the Bug Database first and reference it in the PR.
You can add patch files to the Bugzilla issues at Bug Database. If there is already a bug report, attach your patch there; otherwise create a new bug and set the subject to [PATCH] followed by a brief description. Explain your patch and any special instructions, then submit/save it. Next, go back to the bug and create attachments for the patch files you created. Be sure to describe not only the file's purpose, but also its format (is that a ZIP or a tgz or a bz2 or what?).
Ideally, patches should be submitted early and often. This is for two key reasons. Firstly, it's much easier to review smaller patches than large ones. This means that smaller patches are much more likely to be applied to Git in a timely fashion. Secondly, by sending in your patches earlier rather than later, it's much easier to get feedback on your coding and direction. If you've missed an easier way to do something, or are duplicating some (probably hidden) existing code, or taking things in an unusual direction, it's best to get the feedback sooner rather than later! As such, when submitting patches to POI, as with other Apache Software Foundation projects, do please try to submit early and often, rather than "throwing a large patch over the wall" at the end.
A number of Apache projects provide far more comprehensive guides to producing and submitting patches than we do; you may wish to review some of their information if you're unsure. The Apache Commons one is fairly similar as a starting point.
You may create your patch file using either of the following approaches (the committers recommend the first):
If you are working on a Git clone of Apache POI (see the Version Control page for more info), it is possible to generate a patch of your changes (including new binary files) using Git.
When generating a patch / patch set from Git, for many related and small changes a squashed patch is probably best, as it makes the (manual) review quicker. For larger changes, several distinct patches are probably best.
If you intend to do a noticeable amount of work enhancing Apache POI on your own Git repo, we would suggest sending in patches early and asking for advice. There's nothing worse than spending a week working hard on your own on a change, only to discover you did something on Day 1 that isn't acceptable to the project meaning your whole patch needs re-doing... Git's offline workflow makes this easier, so try not to fall into that trap!
[...] @author tags.
[...] @Disabled from org.junit for in-progress work).
[...] svn diff.
The long standing Minimal Coding Standards from 2002 still largely apply to the project.
When making changes to an existing file, please try to follow the same style that that file already uses. This will keep things looking similar, and will prevent patches becoming largely about whitespace. Whitespace fixing changes, if needed, should normally be in their own commit, so that they don't crowd out coding changes in review.
Normally, tabs should not be used to indent code. Instead, spaces should be used. If starting on a fresh file, please use 4 spaces to indent your code. If working on an existing file, please use whichever of 3 or 4 spaces that file already follows.
Normally, braces should open on the same line as the decision statement. Braces should normally close on their own line. Brackets should normally have a space before them when they are the first.
Lines normally shouldn't be too long. There's no hard and fast rule, but if your line is getting above about 90 characters think about splitting it, and you should rarely create something over about 100 characters without a very good reason!
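As a rough illustration of the layout conventions described above (4-space indentation with no tabs, opening braces on the same line as the statement, closing braces on their own line, and lines kept well under 90 characters), here is a minimal sketch. The class CellLabel and its toA1 method are hypothetical names invented for this example; they are not part of Apache POI and are not meant as a reference implementation.

```java
// Hypothetical example class, used only to show the code layout.
// It is NOT part of the Apache POI API.
final class CellLabel {
    private CellLabel() {
        // utility class, no instances
    }

    /**
     * Builds an A1-style label such as "C5" from zero-based indexes.
     */
    static String toA1(int rowIndex, int columnIndex) {
        // Opening brace on the same line as the decision statement.
        if (rowIndex < 0 || columnIndex < 0) {
            throw new IllegalArgumentException("indexes must not be negative");
        }

        StringBuilder letters = new StringBuilder();
        // Column letters count A..Z, then AA..AZ, and so on.
        for (int col = columnIndex; col >= 0; col = col / 26 - 1) {
            letters.insert(0, (char) ('A' + col % 26));
        }

        // A long expression would be split across lines rather than
        // allowed to run past roughly 90 characters.
        return letters.toString() + (rowIndex + 1);
    }
}
```

Calling CellLabel.toA1(4, 2) yields "C5". When editing an existing file, that file's own style still takes precedence over this sketch.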
The POI project will generally offer committership to contributors who send in consistently good patches over a period of several months.
The requirement for "good patches" generally means patches which can be applied to Git with little or no changes. These patches should include unit tests and appropriate documentation. Whilst your first patch to POI may require quite a bit of work before it can be committed by an existing committer, with any luck your later patches will be applied with no / minor tweaks. Please do take note of any changes required by your earlier patches, to learn for later ones! If in doubt, ask on the dev mailing list.
The requirement for patches over several months is to ensure that committers remain with the project. It's very easy for a good developer to fire off half a dozen good patches in the couple of weeks that they're working on a POI powered project. However, if that developer then moves away, and stops contributing to POI after that spurt, then they're not a good candidate for committership. As such, we generally require people to stay around for a while, submitting patches and helping on the mailing list before considering them for committership.
Where possible, patches should be submitted early and often. For more details on this, please see the "Submitting Patches" section above.
Where possible, the existing developers will try to help and mentor new contributors. However, everyone involved in POI is a volunteer, and it may happen that your first few patches come in at a time when all the committers are very busy. Do please have patience, and remember to use the dev mailing list so that other contributors can assist you!
For more information on getting started at Apache, mentoring, and local Apache Committers near you who can offer advice, please see the Apache Community Development Project website.
In early 2008, Microsoft made a fairly complete set of documentation on the binary file formats freely and publicly available. These were released under the Open Specification Promise, which does allow us to use them for building open source software under the Apache Software License.
You can download the documentation on Excel, Word, PowerPoint and Escher (drawing) from http://msdn.microsoft.com/en-us/library/cc313118.aspx. Documentation on a few of the supporting technologies used in these file formats can be downloaded from http://msdn.microsoft.com/en-us/library/jj633110.aspx.
For the VSDX format (implemented in Apache POI as XDGF), an introduction is available from Microsoft, and full details are available here and here.
Previously, Microsoft published a book on the Excel 97 file format. It can still be of plenty of use, and comes in handy dead tree form. Pick up a copy of "Excel 97 Developer's Kit" from your favourite second hand book store.
The newer Office Open XML (ooxml) file formats are documented as part of the ECMA / ISO standardisation effort for the formats. This documentation is quite large, but you can normally find the bit you need without too much effort! This can be downloaded from https://ecma-international.org/publications-and-standards/standards/ecma-376/, and is also under the OSP.
Additionally, for the newer Office Open XML (ooxml) file formats, you can find some good introductory documentation (often clearer for getting started with) at officeopenxml.com, which is an independent site documenting the file formats.
It is also worth checking the documentation and code of the other open source implementations of the file formats.
In short, stay away, stay far far away. Implementing these file formats in POI is done strictly by using public information. Most of this public information currently comes from the documentation that Microsoft makes freely available (see above). The rest comes from other open source projects, from books that state that their intended purpose is to allow implementation of the file format and that do not require any non-disclosure agreement, and from just plain hard work. We are intent on keeping it legal; by contributing patches you agree to do the same.
If you've ever received information regarding the OLE 2 Compound Document Format under any type of exclusionary agreement from Microsoft, or received such information from a person bound by such an agreement, you cannot participate in this project. Sorry. Well, unless you can persuade Microsoft to release you from the terms of the NDA on the grounds that most of the information is now publicly available. However, if you have been party to a Microsoft NDA, you will need to get clearance from Microsoft before contributing.
Those submitting patches that show insight into the file format may be asked to state explicitly that they have only ever read the publicly available file format information, and not any received under an NDA or similar, and have only made use of the public documentation.