summaryrefslogtreecommitdiffstats
path: root/contrib/snowball/doc/libstemmer_c_README
diff options
context:
space:
mode:
Diffstat (limited to 'contrib/snowball/doc/libstemmer_c_README')
-rw-r--r--contrib/snowball/doc/libstemmer_c_README125
1 files changed, 125 insertions, 0 deletions
diff --git a/contrib/snowball/doc/libstemmer_c_README b/contrib/snowball/doc/libstemmer_c_README
new file mode 100644
index 000000000..9d3af8bec
--- /dev/null
+++ b/contrib/snowball/doc/libstemmer_c_README
@@ -0,0 +1,125 @@
+libstemmer_c
+============
+
+This document pertains to the C version of the libstemmer distribution,
+available for download from:
+
+http://snowball.tartarus.org/dist/libstemmer_c.tgz
+
+
+Compiling the library
+=====================
+
+A simple makefile is provided for Unix style systems. On such systems, it
+should be possible simply to run "make", and the file "libstemmer.o"
+and the example program "stemwords" will be generated.
+
+If this doesn't work on your system, you need to write your own build
+system (or call the compiler directly). The files to compile are
+all contained in the "libstemmer", "runtime" and "src_c" directories,
+and the public header file is contained in the "include" directory.
+
+The library comes in two flavours; UTF-8 only, and UTF-8 plus other character
+sets. To use the utf-8 only flavour, compile "libstemmer_utf8.c" instead of
+"libstemmer.c".
+
+For convenience "mkinc.mak" is a makefile fragment listing the source files and
+header files used to compile the standard version of the library.
+"mkinc_utf8.mak" is a comparable makefile fragment listing just the source
+files for the UTF-8 only version of the library.
+
+
+Using the library
+=================
+
+The library provides a simple C API. Essentially, a new stemmer can
+be obtained by using "sb_stemmer_new". "sb_stemmer_stem" is then
+used to stem a word, "sb_stemmer_length" returns the stemmed
+length of the last word processed, and "sb_stemmer_delete" is
+used to delete a stemmer.
+
+Creating a stemmer is a relatively expensive operation - the expected
+usage pattern is that a new stemmer is created when needed, used
+to stem many words, and deleted after some time.
+
+Stemmers are re-entrant, but not threadsafe. In other words, if
+you wish to access the same stemmer object from multiple threads,
+you must ensure that all access is protected by a mutex or similar
+device.
+
+libstemmer does not currently incorporate any mechanism for caching the results
+of stemming operations. Such caching can greatly increase the performance of a
+stemmer under certain situations, so suitable patches will be considered for
+inclusion.
+
+The standard libstemmer sources contain an algorithm for each of the supported
+languages. The algorithm may be selected using the english name of the
+language, or using the 2 or 3 letter ISO 639 language codes. In addition,
+the traditional "Porter" stemming algorithm for english is included for
+backwards compatibility purposes, but we recommend use of the "English"
+stemmer in preference for new projects.
+
+(Some minor algorithms which are included only as curiosities in the snowball
+website, such as the Lovins stemmer and the Kraaij Pohlmann stemmer, are not
+included in the standard libstemmer sources. These are not really supported by
+the snowball project, but it would be possible to compile a modified libstemmer
+library containing these if desired.)
+
+
+The stemwords example
+=====================
+
+The stemwords example program allows you to run any of the stemmers
+compiled into the libstemmer library on a sample vocabulary. For
+details on how to use it, run it with the "-h" command line option.
+
+
+Using the library in a larger system
+====================================
+
+If you are incorporating the library into the build system of a larger
+program, I recommend copying the unpacked tarball without modification into
+a subdirectory of the sources of your program. Future versions of the
+library are intended to keep the same structure, so this will keep the
+work required to move to a new version of the library to a minimum.
+
+As an additional convenience, the list of source and header files used
+in the library is detailed in mkinc.mak - a file which is in a suitable
+format for inclusion by a Makefile. By including this file in your build
+system, you can link the snowball system into your program with a few
+extra rules.
+
+Using the library in a system using GNU autotools
+=================================================
+
+The libstemmer_c library can be integrated into a larger system which uses the
+GNU autotool framework (and in particular, automake and autoconf) as follows:
+
+1) Unpack libstemmer_c.tgz in the top level project directory so that there is
+ a libstemmer_c subdirectory of the top level directory of the project.
+
+2) Add a file "Makefile.am" to the unpacked libstemmer_c folder, containing:
+
+noinst_LTLIBRARIES = libstemmer.la
+include $(srcdir)/mkinc.mak
+noinst_HEADERS = $(snowball_headers)
+libstemmer_la_SOURCES = $(snowball_sources)
+
+(You may also need to add other lines to this, for example, if you are using
+compiler options which are not compatible with compiling the libstemmer
+library.)
+
+3) Add libstemmer_c to the AC_CONFIG_FILES declaration in the project's
+ configure.ac file.
+
+4) Add to the top level makefile the following lines (or modify existing
+ assignments to these variables appropriately):
+
+AUTOMAKE_OPTIONS = subdir-objects
+AM_CPPFLAGS = -I$(top_srcdir)/libstemmer_c/include
+SUBDIRS=libstemmer_c
+<name>_LIBADD = libstemmer_c/libstemmer.la
+
+(Where <name> is the name of the library or executable which links against
+libstemmer.)
+