aboutsummaryrefslogtreecommitdiffstats
path: root/src/documentation/content/xdocs/components/slideshow/quick-guide.xml
diff options
context:
space:
mode:
Diffstat (limited to 'src/documentation/content/xdocs/components/slideshow/quick-guide.xml')
-rw-r--r--src/documentation/content/xdocs/components/slideshow/quick-guide.xml133
1 files changed, 133 insertions, 0 deletions
diff --git a/src/documentation/content/xdocs/components/slideshow/quick-guide.xml b/src/documentation/content/xdocs/components/slideshow/quick-guide.xml
new file mode 100644
index 0000000000..88d85d877c
--- /dev/null
+++ b/src/documentation/content/xdocs/components/slideshow/quick-guide.xml
@@ -0,0 +1,133 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<!--
+ ====================================================================
+ Licensed to the Apache Software Foundation (ASF) under one or more
+ contributor license agreements. See the NOTICE file distributed with
+ this work for additional information regarding copyright ownership.
+ The ASF licenses this file to You under the Apache License, Version 2.0
+ (the "License"); you may not use this file except in compliance with
+ the License. You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+ Unless required by applicable law or agreed to in writing, software
+ distributed under the License is distributed on an "AS IS" BASIS,
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ See the License for the specific language governing permissions and
+ limitations under the License.
+ ====================================================================
+-->
+<!DOCTYPE document PUBLIC "-//APACHE//DTD Documentation V2.0//EN" "document-v20.dtd">
+
+<document>
+ <header>
+ <title>POI-HSLF - A Quick Guide</title>
+ <subtitle>Overview</subtitle>
+ <authors>
+ <person name="Nick Burch" email="nick at torchbox dot com"/>
+ </authors>
+ </header>
+
+ <body>
+ <section><title>Basic Text Extraction</title>
+ <p>For basic text extraction, make use of
+ <code>org.apache.poi.sl.extractor.SlideShowExtractor</code>.
+ It accepts a slideshow which can be created from a file or stream via <code>org.apache.poi.sl.usermodel.SlideShowFactory</code>.
+ The <code>getText()</code> method can be used to get the text from the slides.
+ </p>
+ </section>
+
+ <section><title>Specific Text Extraction</title>
+ <p>To get specific bits of text, first create a <code>org.apache.poi.hslf.usermodel.HSLFSlideShow</code>
+(from a <code>org.apache.poi.hslf.usermodel.HSLFSlideShowImpl</code>, which accepts a file or an input
+stream). Use <code>getSlides()</code> and <code>getNotes()</code> to get the slides and notes.
+These can be queried to get their page ID (though they should be returned
+in the right order).</p>
+ <p>You can then call <code>getTextParagraphs()</code> on these, to get
+their blocks of text. (A list of <code>HSLFTextParagraph</code> normally holds all the text in a
+given area of the page, eg in the title bar, or in a box).
+From the <code>HSLFTextParagraph</code>, you can extract the text, and check
+what type of text it is (eg Body, Title). You can also call
+<code>getTextRuns()</code>, which will return the
+<code>HSLFTextRun</code>s that make up the <code>TextParagraph</code>. A
+<code>HSLFTextRun</code> is a text fragment, having the same character formatting.
+The paragraph formatting is defined in the parent <code>HSLFTextParagraph</code>.
+ </p>
+ </section>
+
+ <section><title>Poor Quality Text Extraction</title>
+ <p>If speed is the most important thing for you, you don't care
+ about getting duplicate blocks of text, you don't care about
+ getting text from master sheets, and you don't care about getting
+ old text, then
+ <code>org.apache.poi.hslf.extractor.QuickButCruddyTextExtractor</code>
+ might be of use.</p>
+ <p>QuickButCruddyTextExtractor doesn't use the normal record
+ parsing code, instead it uses a tree structure blind search
+ method to get all text holding records. You will get all the text,
+ including lots of text you normally wouldn't ever want. However,
+ you will get it back very very fast!</p>
+ <p>There are two ways of getting the text back.
+ <code>getTextAsString()</code> will return a single string with all
+ the text in it. <code>getTextAsVector()</code> will return a
+ vector of strings, one for each text record found in the file.
+ </p>
+ </section>
+
+ <section><title>Changing Text</title>
+ <p>It is possible to change the text via
+ <code>HSLFTextParagraph.setText(List&lt;HSLFTextParagraph&gt;,String)</code> or
+ <code>HSLFTextRun.setText(String)</code>. It is possible to add additional TextRuns
+ with <code>HSLFTextParagraph.appendText(List&lt;HSLFTextParagraph&gt;,String,boolean)</code>
+ or <code>HSLFTextParagraph.addTextRun(HSLFTextRun)</code></p>
+ <p>When calling <code>HSLFTextParagraph.setText(List&lt;HSLFTextParagraph&gt;,String)</code>, all
+ the text will end up with the same formatting. When calling
+ <code>HSLFTextRun.setText(String)</code>, the text will retain
+ the old formatting of that <code>HSLFTextRun</code>.
+ </p>
+ </section>
+
+ <section><title>Adding Slides</title>
+ <p>You may add new slides by calling
+ <code>HSLFSlideShow.createSlide()</code>, which will add a new slide
+ to the end of the SlideShow. It is possible to re-order slides with <code>HSLFSlideShow.reorderSlide(...)</code>.
+ </p>
+ </section>
+
+ <section><title>Guide to key classes</title>
+ <ul>
+ <li><code>org.apache.poi.hslf.usermodel.HSLFSlideShowImpl</code>
+ Handles reading in and writing out files. Calls
+ <code>org.apache.poi.hslf.record.record</code> to build a tree
+ of all the records in the file, which it allows access to.
+ </li>
+ <li><code>org.apache.poi.hslf.record.Record</code>
+ Base class of all records. Also provides the main record generation
+ code, which will build up a tree of records for a file.
+ </li>
+ <li><code>org.apache.poi.hslf.usermodel.HSLFSlideShow</code>
+ Builds up model entries from the records, and presents a user facing
+ view of the file
+ </li>
+ <li><code>org.apache.poi.hslf.usermodel.HSLFSlide</code>
+ A user facing view of a Slide in a slideshow. Allows you to get at the
+ Text of the slide, and at any drawing objects on it.
+ </li>
+ <li><code>org.apache.poi.hslf.usermodel.HSLFTextParagraph</code>
+ A list of <code>HSLFTextParagraph</code>s holds all the text in a given area of the Slide, and will
+ contain one or more <code>HSLFTextRun</code>s.
+ </li>
+ <li><code>org.apache.poi.hslf.usermodel.HSLFTextRun</code>
+ Holds a run of text, all having the same character stylings. It is possible to modify text, and/or text stylings.
+ </li>
+ <li><code>org.apache.poi.sl.extractor.SlideShowExtractor</code>
+ Uses the model code to allow extraction of text from files
+ </li>
+ <li><code>org.apache.poi.hslf.extractor.QuickButCruddyTextExtractor</code>
+ Uses the record code to extract all the text from files very fast,
+ but including deleted text (and other bits of Crud).
+ </li>
+ </ul>
+ </section>
+ </body>
+</document>