aboutsummaryrefslogtreecommitdiffstats
path: root/vendor/github.com/blevesearch/go-porterstemmer/README.md
diff options
context:
space:
mode:
Diffstat (limited to 'vendor/github.com/blevesearch/go-porterstemmer/README.md')
-rw-r--r--vendor/github.com/blevesearch/go-porterstemmer/README.md118
1 files changed, 118 insertions, 0 deletions
diff --git a/vendor/github.com/blevesearch/go-porterstemmer/README.md b/vendor/github.com/blevesearch/go-porterstemmer/README.md
new file mode 100644
index 0000000000..d96911ace5
--- /dev/null
+++ b/vendor/github.com/blevesearch/go-porterstemmer/README.md
@@ -0,0 +1,118 @@
+# This fork...
+
+I'm maintaining this fork because the original author was not replying to issues or pull requests. For now I plan on maintaining this fork as necessary.
+
+## Status
+
+[![Build Status](https://travis-ci.org/blevesearch/go-porterstemmer.svg?branch=master)](https://travis-ci.org/blevesearch/go-porterstemmer)
+
+[![Coverage Status](https://coveralls.io/repos/blevesearch/go-porterstemmer/badge.png?branch=HEAD)](https://coveralls.io/r/blevesearch/go-porterstemmer?branch=HEAD)
+
+# Go Porter Stemmer
+
+A native Go clean room implementation of the Porter Stemming Algorithm.
+
+This algorithm is of interest to people doing Machine Learning or
+Natural Language Processing (NLP).
+
+This is NOT a port. This is a native Go implementation from the human-readable
+description of the algorithm.
+
+I've tried to make it (more) efficient by NOT internally using string's, but
+instead internally using []rune's and using the same (array) buffer used by
+the []rune slice (and sub-slices) at all steps of the algorithm.
+
+For Porter Stemmer algorithm, see:
+
+http://tartarus.org/martin/PorterStemmer/def.txt (URL #1)
+
+http://tartarus.org/martin/PorterStemmer/ (URL #2)
+
+# Departures
+
+Also, since when I initially implemented it, it failed the tests at...
+
+http://tartarus.org/martin/PorterStemmer/voc.txt (URL #3)
+
+http://tartarus.org/martin/PorterStemmer/output.txt (URL #4)
+
+... after reading the human-readble text over and over again to try to figure out
+what the error I made was (and doing all sorts of things to debug it) I came to the
+conclusion that the some of these tests were wrong according to the human-readable
+description of the algorithm.
+
+This led me to wonder if maybe other people's code that was passing these tests had
+rules that were not in the human-readable description. Which led me to look at the source
+code here...
+
+http://tartarus.org/martin/PorterStemmer/c.txt (URL #5)
+
+... When I looked there I noticed that there are some items marked as a "DEPARTURE",
+which differ from the original algorithm. (There are 2 of these.)
+
+I implemented these departures, and the tests at URL #3 and URL #4 all passed.
+
+## Usage
+
+To use this Golang library, use with something like:
+
+ package main
+
+ import (
+ "fmt"
+ "github.com/reiver/go-porterstemmer"
+ )
+
+ func main() {
+
+ word := "Waxes"
+
+ stem := porterstemmer.StemString(word)
+
+ fmt.Printf("The word [%s] has the stem [%s].\n", word, stem)
+ }
+
+Alternatively, if you want to be a bit more efficient, use []rune slices instead, with code like:
+
+ package main
+
+ import (
+ "fmt"
+ "github.com/reiver/go-porterstemmer"
+ )
+
+ func main() {
+
+ word := []rune("Waxes")
+
+ stem := porterstemmer.Stem(word)
+
+ fmt.Printf("The word [%s] has the stem [%s].\n", string(word), string(stem))
+ }
+
+Although NOTE that the above code may modify original slice (named "word" in the example) as a side
+effect, for efficiency reasons. And that the slice named "stem" in the example above may be a
+sub-slice of the slice named "word".
+
+Also alternatively, if you already know that your word is already lowercase (and you don't need
+this library to lowercase your word for you) you can instead use code like:
+
+ package main
+
+ import (
+ "fmt"
+ "github.com/reiver/go-porterstemmer"
+ )
+
+ func main() {
+
+ word := []rune("waxes")
+
+ stem := porterstemmer.StemWithoutLowerCasing(word)
+
+ fmt.Printf("The word [%s] has the stem [%s].\n", string(word), string(stem))
+ }
+
+Again NOTE (like with the previous example) that the above code may modify original slice (named
+"word" in the example) as a side effect, for efficiency reasons. And that the slice named "stem"
+in the example above may be a sub-slice of the slice named "word".