XWPF has a fairly stable core API that provides read and write access to the main parts of a Word .docx file, but it isn't complete. For some tasks, you may need to drop down to the low-level XMLBeans objects and manipulate the OOXML structure directly. If you find yourself doing this, please consider sending in a patch that adds the missing functionality to XWPF; see the "Contribution to POI" page.
For basic text extraction, use org.apache.poi.xwpf.extractor.XWPFWordExtractor. It accepts an OPCPackage or an XWPFDocument. The getText() method returns the text from all the paragraphs, along with tables, headers etc.
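As a minimal sketch of the extraction step described above, the following builds a small document in memory and passes it to XWPFWordExtractor. The class name ExtractTextSketch, its helper methods and the sample strings are made up for illustration; it assumes a reasonably recent POI release with poi-ooxml on the classpath.

```java
import java.io.IOException;
import java.io.UncheckedIOException;

import org.apache.poi.xwpf.extractor.XWPFWordExtractor;
import org.apache.poi.xwpf.usermodel.XWPFDocument;

// Hypothetical class name; not part of POI.
class ExtractTextSketch {

    // Builds a tiny in-memory document so the sketch needs no input file.
    static XWPFDocument sampleDocument() {
        XWPFDocument doc = new XWPFDocument();
        doc.createParagraph().createRun().setText("Hello from XWPF");
        doc.createParagraph().createRun().setText("Second paragraph");
        return doc;
    }

    // Returns the text of the whole document as a single string.
    static String extractAll(XWPFDocument doc) {
        // The extractor is not closed here because the caller owns the document.
        XWPFWordExtractor extractor = new XWPFWordExtractor(doc);
        return extractor.getText();
    }

    static String demo() {
        try (XWPFDocument doc = sampleDocument()) {
            return extractAll(doc);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

For a file on disk, the same extractAll method could be given an XWPFDocument opened from a FileInputStream instead of the in-memory sample.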
To get specific bits of text, first create an org.apache.poi.xwpf.usermodel.XWPFDocument. Select the IBodyElement of interest (table, paragraph etc.), and from there get an XWPFRun. Finally, fetch the text and properties from that run.
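One way to read that walk from document to body element to run is sketched below. It collects the text of every bold run, looking inside both paragraphs and table cells. The class BodyElementSketch, the method names and the sample content are illustrative assumptions, and reading only the first text element with getText(0) is a simplification.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.poi.xwpf.usermodel.IBodyElement;
import org.apache.poi.xwpf.usermodel.XWPFDocument;
import org.apache.poi.xwpf.usermodel.XWPFParagraph;
import org.apache.poi.xwpf.usermodel.XWPFRun;
import org.apache.poi.xwpf.usermodel.XWPFTable;
import org.apache.poi.xwpf.usermodel.XWPFTableCell;
import org.apache.poi.xwpf.usermodel.XWPFTableRow;

// Hypothetical class name; not part of POI.
class BodyElementSketch {

    // Walks the body elements in document order and collects bold run text.
    static List<String> boldText(XWPFDocument doc) {
        List<String> found = new ArrayList<>();
        for (IBodyElement element : doc.getBodyElements()) {
            if (element instanceof XWPFParagraph) {
                collectBold((XWPFParagraph) element, found);
            } else if (element instanceof XWPFTable) {
                for (XWPFTableRow row : ((XWPFTable) element).getRows()) {
                    for (XWPFTableCell cell : row.getTableCells()) {
                        for (XWPFParagraph p : cell.getParagraphs()) {
                            collectBold(p, found);
                        }
                    }
                }
            }
        }
        return found;
    }

    // Reads a property (bold) and the first text element from each run.
    private static void collectBold(XWPFParagraph paragraph, List<String> found) {
        for (XWPFRun run : paragraph.getRuns()) {
            String text = run.getText(0); // null when the run holds no text
            if (run.isBold() && text != null) {
                found.add(text);
            }
        }
    }

    static List<String> demo() {
        try (XWPFDocument doc = new XWPFDocument()) {
            XWPFParagraph p = doc.createParagraph();
            p.createRun().setText("plain text");
            XWPFRun bold = p.createRun();
            bold.setBold(true);
            bold.setText("bold in paragraph");

            XWPFTable table = doc.createTable(); // starts with one row and one cell
            XWPFRun cellRun = table.getRow(0).getCell(0).addParagraph().createRun();
            cellRun.setBold(true);
            cellRun.setText("bold in table");

            return boldText(doc);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```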
To get at the headers and footers of a Word document, first create an org.apache.poi.xwpf.usermodel.XWPFDocument. Next, create an org.apache.poi.xwpf.model.XWPFHeaderFooterPolicy, passing it your XWPFDocument. The XWPFHeaderFooterPolicy then gives you access to the headers and footers, including the first, even and odd page ones if they are defined in your document.
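The sketch below shows one way to read headers and footers. It uses org.apache.poi.xwpf.model.XWPFHeaderFooterPolicy, which is the class whose constructor takes an XWPFDocument in current POI releases. The sample document is created in memory and round-tripped through a byte array so that the reading code sees a loaded document; the class HeaderFooterSketch and the sample strings are hypothetical. It assumes POI 4.x or later, where this constructor declares no checked exceptions and XWPFDocument.createHeader(HeaderFooterType) is available.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

import org.apache.poi.wp.usermodel.HeaderFooterType;
import org.apache.poi.xwpf.model.XWPFHeaderFooterPolicy;
import org.apache.poi.xwpf.usermodel.XWPFDocument;
import org.apache.poi.xwpf.usermodel.XWPFFooter;
import org.apache.poi.xwpf.usermodel.XWPFHeader;

// Hypothetical class name; not part of POI.
class HeaderFooterSketch {

    // Writes a sample .docx with a default header and footer to a byte array.
    static byte[] sampleDocx() throws IOException {
        try (XWPFDocument doc = new XWPFDocument();
             ByteArrayOutputStream out = new ByteArrayOutputStream()) {
            doc.createParagraph().createRun().setText("Body text");
            doc.createHeader(HeaderFooterType.DEFAULT)
               .createParagraph().createRun().setText("Sample header");
            doc.createFooter(HeaderFooterType.DEFAULT)
               .createParagraph().createRun().setText("Sample footer");
            doc.write(out);
            return out.toByteArray();
        }
    }

    // Summarises the headers and footers that the policy exposes.
    static String describe(XWPFDocument doc) {
        XWPFHeaderFooterPolicy policy = new XWPFHeaderFooterPolicy(doc);
        XWPFHeader header = policy.getDefaultHeader();
        XWPFFooter footer = policy.getDefaultFooter();
        XWPFHeader first = policy.getFirstPageHeader(); // null if not defined
        return "header=" + (header == null ? "none" : header.getText().trim())
             + "; footer=" + (footer == null ? "none" : footer.getText().trim())
             + "; firstPageHeader=" + (first == null ? "none" : first.getText().trim());
    }

    static String demo() {
        try (XWPFDocument doc = new XWPFDocument(new ByteArrayInputStream(sampleDocx()))) {
            return describe(doc);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

The even page variants follow the same pattern through getEvenPageHeader() and getEvenPageFooter(); each getter returns null when the document does not define that header or footer.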
From an XWPFParagraph, it is possible to fetch the existing XWPFRun elements that make up the text. To add new text, the createRun() method will add a new XWPFRun to the end of the list. insertNewRun(int) can instead be used to add a new XWPFRun at a specific point in the paragraph.
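To make the difference between createRun() and insertNewRun(int) concrete, here is a small sketch that appends two runs and then inserts a third between them. The class name RunInsertSketch and the sample strings are made up for illustration.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.poi.xwpf.usermodel.XWPFDocument;
import org.apache.poi.xwpf.usermodel.XWPFParagraph;
import org.apache.poi.xwpf.usermodel.XWPFRun;

// Hypothetical class name; not part of POI.
class RunInsertSketch {

    static List<String> demo() {
        try (XWPFDocument doc = new XWPFDocument()) {
            XWPFParagraph paragraph = doc.createParagraph();

            // createRun() appends to the end of the paragraph's run list.
            paragraph.createRun().setText("first");
            paragraph.createRun().setText("third");

            // insertNewRun(1) places the new run at index 1, between the two above.
            XWPFRun inserted = paragraph.insertNewRun(1);
            inserted.setText("second");

            // Read the runs back in paragraph order.
            List<String> texts = new ArrayList<>();
            for (XWPFRun run : paragraph.getRuns()) {
                texts.add(run.getText(0));
            }
            return texts;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```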
Once you have an XWPFRun, you can use the setText(String) method to make changes to the text. To add whitespace elements such as tabs and line breaks, it is necessary to use methods like addTab() and addCarriageReturn().
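A short sketch of mixing text with whitespace elements inside a single run follows. The class name RunWhitespaceSketch and the sample strings are illustrative. It relies on the behaviour, as I understand current POI releases, that each setText(String) call appends another text element to the run rather than replacing the previous one.

```java
import java.io.IOException;
import java.io.UncheckedIOException;

import org.apache.poi.xwpf.usermodel.XWPFDocument;
import org.apache.poi.xwpf.usermodel.XWPFRun;

// Hypothetical class name; not part of POI.
class RunWhitespaceSketch {

    static String demo() {
        try (XWPFDocument doc = new XWPFDocument()) {
            XWPFRun run = doc.createParagraph().createRun();

            run.setText("Name");       // first text element
            run.addTab();              // tab element between the two words
            run.setText("Value");      // appended as a further text element (assumed behaviour)
            run.addCarriageReturn();   // line break element
            run.setText("Next line");

            // text() flattens the run, rendering tabs and breaks as characters.
            return run.text();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Putting a literal tab or newline character inside the string passed to setText(String) is not the same thing as adding these elements, which is why the dedicated methods are used here.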
For now, there are only a limited number of XWPF examples in the Examples Package. Beyond those, the unit tests are the best source of additional examples. Browse the XWPF unit tests.