From b3d7315bebf58696747a227682245de0d3d70b41 Mon Sep 17 00:00:00 2001
From: Nick Burch
For basic text extraction, make use of
-org.apache.poi.extractor.PowerPointExtractor. It accepts a file or an input
+org.apache.poi.hslf.extractor.PowerPointExtractor. It accepts a file or an input
stream. The getText() method can be used to get the text from the slides, and the getNotes() method can be used to get the text
from the notes. Finally, getText(true,true) will get the text
from both.
@@ -22,8 +22,8 @@ from both.
To get specific bits of text, first create a
-org.apache.poi.usermodel.SlideShow
-(from a org.apache.poi.HSLFSlideShow, which accepts a file or an input
+org.apache.poi.hslf.usermodel.SlideShow
+(from a org.apache.poi.hslf.HSLFSlideShow, which accepts a file or an input
stream). Use getSlides() and getNotes() to get the slides and notes.
These can be queried to get their page ID (though they should be returned
in the right order).
-org.apache.poi.extractor.QuickButCruddyTextExtractor
+org.apache.poi.hslf.extractor.QuickButCruddyTextExtractor
might be of use.
QuickButCruddyTextExtractor doesn't use the normal record parsing code, instead it uses a tree structure blind search
@@ -109,7 +109,7 @@ same character and paragraph formatting.
org.apache.poi.hslf.extractor.PowerPointExtractor
Uses the model code to allow extraction of text from files
-org.apache.poi.extractor.QuickButCruddyTextExtractor
+org.apache.poi.hslf.extractor.QuickButCruddyTextExtractor
Uses the record code to extract all the text from files very fast,
but including deleted text (and other bits of Crud).
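QuickButCruddyTextExtractor is described here as using the record code to pull out all the text very fast, deleted text included. The Java sketch below illustrates why that trade-off arises; it is not POI's implementation. It walks a raw record stream without building any model: every container record is entered, every text atom is decoded, and nothing checks whether a slide still refers to that atom. The header layout (a 2-byte version/instance field with the version in its low 4 bits, a 2-byte type, a 4-byte length, all little-endian), the container marker (version 0xF) and the atom types 4000 (TextCharsAtom, UTF-16LE) and 4008 (TextBytesAtom, one byte per character) are assumptions taken from the MS-PPT binary format specification. The class BlindTextScan and its helpers are hypothetical, and the input is assumed to be the already-extracted bytes of the "PowerPoint Document" stream.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch of a "blind" text scan over a PowerPoint record stream.
 * No record model is built: it reads 8-byte headers, steps into containers,
 * and decodes every text atom it meets, whether or not anything still uses it.
 */
final class BlindTextScan {
    // Record type numbers assumed from the MS-PPT binary format specification.
    static final int TEXT_CHARS_ATOM = 4000; // 0x0FA0: UTF-16LE characters
    static final int TEXT_BYTES_ATOM = 4008; // 0x0FA8: one byte per character
    static final int HEADER_SIZE = 8;        // ver/instance (2) + type (2) + length (4)

    private BlindTextScan() {}

    /** Returns the text of every text atom in {@code stream}, in file order. */
    static List<String> scan(byte[] stream) {
        List<String> found = new ArrayList<>();
        ByteBuffer buf = ByteBuffer.wrap(stream).order(ByteOrder.LITTLE_ENDIAN);
        int pos = 0;
        while (pos + HEADER_SIZE <= stream.length) {
            int verInstance = buf.getShort(pos) & 0xFFFF;
            int type = buf.getShort(pos + 2) & 0xFFFF;
            long len = buf.getInt(pos + 4) & 0xFFFFFFFFL;
            int bodyStart = pos + HEADER_SIZE;
            if ((verInstance & 0x0F) == 0x0F) {
                // Container (version 0xF): its children follow the header
                // directly, so keep scanning inside it instead of skipping it.
                pos = bodyStart;
                continue;
            }
            if (bodyStart + len > stream.length) {
                break; // truncated or corrupt atom: stop instead of over-reading
            }
            int n = (int) len;
            if (type == TEXT_CHARS_ATOM) {
                found.add(new String(stream, bodyStart, n, StandardCharsets.UTF_16LE));
            } else if (type == TEXT_BYTES_ATOM) {
                found.add(new String(stream, bodyStart, n, StandardCharsets.ISO_8859_1));
            }
            pos = bodyStart + n; // any other atom is skipped unread
        }
        return found;
    }

    /** Builds one record (8-byte header followed by the body). */
    static byte[] record(int verInstance, int type, byte[] body) {
        ByteBuffer buf = ByteBuffer.allocate(HEADER_SIZE + body.length).order(ByteOrder.LITTLE_ENDIAN);
        buf.putShort((short) verInstance).putShort((short) type).putInt(body.length).put(body);
        return buf.array();
    }

    /** Concatenates byte arrays in order. */
    static byte[] concat(byte[]... parts) {
        int total = 0;
        for (byte[] p : parts) total += p.length;
        byte[] out = new byte[total];
        int at = 0;
        for (byte[] p : parts) {
            System.arraycopy(p, 0, out, at, p.length);
            at += p.length;
        }
        return out;
    }

    public static void main(String[] args) {
        // A container holding two text atoms and one non-text atom (example
        // types), followed by a top-level text atom that nothing owns.
        byte[] stream = concat(
            record(0x000F, 1000, concat(
                record(0, TEXT_CHARS_ATOM, "Hi".getBytes(StandardCharsets.UTF_16LE)),
                record(0, 1001, new byte[] {1, 2, 3, 4}),
                record(0, TEXT_BYTES_ATOM, "Yo".getBytes(StandardCharsets.ISO_8859_1)))),
            record(0, TEXT_BYTES_ATOM, "old".getBytes(StandardCharsets.ISO_8859_1)));
        System.out.println(scan(stream)); // prints [Hi, Yo, old]
    }
}
```

Because the scan never consults the slide list, the trailing atom that no container owns is still reported, which mirrors the "deleted text (and other bits of Crud)" caveat. Reading the "PowerPoint Document" stream out of the OLE2 file (for example through POIFS) is left out to keep the sketch self-contained.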