From: Nick Burch
Date: Fri, 26 May 2006 10:43:42 +0000 (+0000)
Subject: Add a quick guide to using the text extractor and friends, since that's a common use
X-Git-Tag: REL_3_0_ALPHA3~102
X-Git-Url: https://source.dussan.org/?a=commitdiff_plain;h=30541423a2f5993cc49c7a383a70401b5f56d6eb;p=poi.git

Add a quick guide to using the text extractor and friends, since that's a common use

git-svn-id: https://svn.apache.org/repos/asf/jakarta/poi/trunk@409632 13f79535-47bb-0310-9956-ffa450edef68
---
diff --git a/src/documentation/content/xdocs/hwpf/book.xml b/src/documentation/content/xdocs/hwpf/book.xml
index 772577a6ee..d2d95fe9c3 100644
--- a/src/documentation/content/xdocs/hwpf/book.xml
+++ b/src/documentation/content/xdocs/hwpf/book.xml
@@ -7,6 +7,7 @@
diff --git a/src/documentation/content/xdocs/hwpf/quick-guide.xml b/src/documentation/content/xdocs/hwpf/quick-guide.xml
new file mode 100644
index 0000000000..2a91b50e59
--- /dev/null
+++ b/src/documentation/content/xdocs/hwpf/quick-guide.xml
@@ -0,0 +1,45 @@
POI-HWPF - A Quick Guide
Overview
Basic Text Extraction +

For basic text extraction, make use of
org.apache.poi.hwpf.extractor.WordExtractor. It accepts an input
stream or an HWPFDocument. The getText()
method returns the text of all the paragraphs as a single string, while
getParagraphText() returns the text of each paragraph in turn,
one array entry per paragraph. The third option is
getTextFromPieces(), which is very fast but tends to also return
things that don't appear as text on the page. YMMV.
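
A minimal sketch of the WordExtractor route follows. It assumes a reasonably recent Apache POI release (where WordExtractor can be closed with try-with-resources) and Java 8+; the class name BasicTextExtraction and the command-line argument are hypothetical, used only for illustration.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

import org.apache.poi.hwpf.extractor.WordExtractor;

/** Hypothetical helper class: sketches the two main WordExtractor calls. */
public final class BasicTextExtraction {

    /** Returns all of the document's text as a single String. */
    public static String allText(InputStream in) throws IOException {
        try (WordExtractor extractor = new WordExtractor(in)) {
            return extractor.getText();
        }
    }

    /** Returns the text of each paragraph, one array entry per paragraph. */
    public static String[] paragraphText(InputStream in) throws IOException {
        try (WordExtractor extractor = new WordExtractor(in)) {
            return extractor.getParagraphText();
        }
    }

    public static void main(String[] args) throws IOException {
        // Assumption: args[0] is the path to a Word 97-2003 (.doc) file.
        try (InputStream in = new FileInputStream(args[0])) {
            String[] paragraphs = paragraphText(in);
            for (int i = 0; i < paragraphs.length; i++) {
                System.out.println(i + ": " + paragraphs[i]);
            }
        }
    }
}
```

Each helper builds its own extractor because the input stream can only be consumed once; to get both views of the same file, open the stream twice or keep a single extractor around.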

Specific Text Extraction +

To get specific bits of text, first create an
org.apache.poi.hwpf.HWPFDocument. Fetch the document's range
with getRange(), then get the paragraphs from that range (for example
with numParagraphs() and getParagraph(int)). From each
paragraph you can then get the text and other properties.
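
The steps above can be sketched as follows, again assuming a recent Apache POI release (so HWPFDocument can be used in try-with-resources) and Java 8+. The class name SpecificTextExtraction and the output format are hypothetical; getStyleIndex() and isInTable() are shown simply as examples of the "other properties" a Paragraph exposes.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;

import org.apache.poi.hwpf.HWPFDocument;
import org.apache.poi.hwpf.usermodel.Paragraph;
import org.apache.poi.hwpf.usermodel.Range;

/** Hypothetical helper class: walks the paragraphs of a document's main range. */
public final class SpecificTextExtraction {

    /** Returns one descriptive line per paragraph: index, style, table flag, text. */
    public static List<String> describeParagraphs(InputStream in) throws IOException {
        List<String> lines = new ArrayList<>();
        try (HWPFDocument doc = new HWPFDocument(in)) {
            Range range = doc.getRange();
            for (int i = 0; i < range.numParagraphs(); i++) {
                Paragraph p = range.getParagraph(i);
                // text() is the raw range text, which usually ends with the
                // paragraph mark ('\r'), so it is trimmed for display here.
                lines.add(String.format("%d (style %d, in table: %b): %s",
                        i, p.getStyleIndex(), p.isInTable(), p.text().trim()));
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        // Assumption: args[0] is the path to a Word 97-2003 (.doc) file.
        try (InputStream in = new FileInputStream(args[0])) {
            for (String line : describeParagraphs(in)) {
                System.out.println(line);
            }
        }
    }
}
```

Going one level deeper, each Paragraph is itself a Range, so the same pattern (numCharacterRuns() and getCharacterRun(int)) reaches the individual character runs.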

Changing Text +

It is possible to change the text via
 insertBefore() and insertAfter()
 on a Range object (either a plain Range, or a
 Paragraph or CharacterRun, both of which are kinds of Range).
 It is also possible to delete a Range, but this
 code is known to have bugs in it.
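
A minimal sketch of inserting text and saving the result, assuming a recent Apache POI release and Java 8+. The class name ChangingText, the marker strings and the input/output paths are hypothetical. Deletion is left out, given the bugs noted above, and exactly where insertAfter() on the whole range lands relative to the final paragraph mark is worth checking against the POI version in use.

```java
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

import org.apache.poi.hwpf.HWPFDocument;
import org.apache.poi.hwpf.usermodel.Paragraph;
import org.apache.poi.hwpf.usermodel.Range;

/** Hypothetical helper class: inserts placeholder text and saves a copy. */
public final class ChangingText {

    /** Reads a .doc from in, inserts two markers, and writes the result to out. */
    public static void addMarkers(InputStream in, OutputStream out) throws IOException {
        try (HWPFDocument doc = new HWPFDocument(in)) {
            Range range = doc.getRange();

            // Paragraph is a Range, so insertBefore() puts the text at the
            // start of the first paragraph.
            Paragraph first = range.getParagraph(0);
            first.insertBefore("[start] ");

            // insertAfter() on the whole document range appends at its end.
            range.insertAfter(" [end]");

            // Save the modified document; the original input is untouched.
            doc.write(out);
        }
    }

    public static void main(String[] args) throws IOException {
        // Assumption: args[0] is an existing .doc file, args[1] the output path.
        try (InputStream in = new FileInputStream(args[0]);
             OutputStream out = new FileOutputStream(args[1])) {
            addMarkers(in, out);
        }
    }
}
```

Writing to a separate output file keeps the original available for comparison, which is useful given how limited HWPF's editing support has historically been.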
