path: root/src/tokenizers/tokenizers.c
Commit message | Author | Age | Files | Lines
* Another debian license fix. | Vsevolod Stakhov | 2012-09-10 | 1 | -1/+1
|   Add apache license for regexp that were delivered from SpamAssassin project.
|   Fix debian/copyright for src/dns.c.
* Update copyright (required by debian). | Vsevolod Stakhov | 2012-09-04 | 1 | -3/+3
|
* * Add configuration utils for kvstorage | Vsevolod Stakhov | 2011-10-17 | 1 | -3/+0
|
* Fix signness in arithmetic operations. | Vsevolod Stakhov | 2011-08-04 | 1 | -1/+1
|
* * Fix build under CentOS 5 with old glib 2.12 | Vsevolod Stakhov | 2011-07-29 | 1 | -1/+1
|   * Fix build of rspamd with CMAKE_BINARY_DIR differs from CMAKE_SOURCE_DIR
|   Rework include style.
* * Add correcting factor to statistics. | Vsevolod Stakhov | 2011-06-28 | 1 | -1/+1
|   Now learning increments version of a statfile.
|   Avoid learning and classifying of similar text parts if a message has 2 text parts.
|   Several fixes to statistics.
* Fix incorrect calculating of token length. | Vsevolod Stakhov | 2011-06-27 | 1 | -2/+2
|
* * Welcome 0.4.0 | Vsevolod Stakhov | 2011-06-24 | 1 | -16/+48
|   Uncompatible changes:
|   - Statistics is uncompatible in utf8 mode
|   Major changes:
|   - Improved utf8 mode
|   - Convert all characters to lowercase in statistics
|   - Skip URL's in statistics
|   - Improve speed of bayes classifier by using integer arithmetics
|   - Fixed statfiles synchronization that was broken for a long time
|   - Synchronization is now configurable
|   Minor changes:
|   - Bugfixes
|   - Removed some of legacy code
|   - Types polishing
* * Skip short utf words in statistics | Vsevolod Stakhov | 2011-06-03 | 1 | -2/+2
|
* * Major cleanup of cmake build system | Vsevolod Stakhov | 2011-05-06 | 1 | -2/+2
|   * Add initial version of statshow utility for statfiles debugging
|   * Add debugging for statistics
|   * Remove unused utilities
* * Rewrite URL storage system | Vsevolod Stakhov | 2011-02-24 | 1 | -32/+0
|
* * Write Emails: header in output | Vsevolod Stakhov | 2011-02-11 | 1 | -1/+1
|
* * Tokenize subject using osb tokenizer. | Vsevolod Stakhov | 2011-02-11 | 1 | -13/+5
|
* Fixes in classifying for small messages. | Vsevolod Stakhov | 2011-01-25 | 1 | -1/+1
|
* * Many fixes to fuzzy hashes logic and tokenization. | Vsevolod Stakhov | 2011-01-24 | 1 | -4/+33
|
* * Add Subject header to statistics | Vsevolod Stakhov | 2010-12-24 | 1 | -0/+38
|   * Write log message about symbols that are removed when composite symbol is inserted
* * Fix shared usage of statfiles | Vsevolod Stakhov | 2010-09-16 | 1 | -2/+4
|   * Add invalidation of statfiles in case of learning, so now statfiles are invalidated in about a minute after learning
|   * This should fix shared usage of statfile pool by several processes
* * Retab, no functional changes | Vsevolod Stakhov | 2009-10-02 | 1 | -48/+48
|
* * Make autolearn working | Vsevolod Stakhov | 2009-07-09 | 1 | -5/+3
|
* * Rework url parsing algorithms | Vsevolod Stakhov | 2009-06-02 | 1 | -1/+5
|   * Adopt all parts of rspamd for new url parser
|   * Improve url-extracter utility by avoiding cut&paste of mime parsing
|   * Small fixes to rspamc client
|   * Bump version to 0.1.3
* * Add functions to parse headers and urls into statfile tokens | Vsevolod Stakhov | 2009-03-16 | 1 | -0/+116
|
* * Prepare to migrate to cmake (still need to write install target and working with XS implicitly) | Vsevolod Stakhov | 2009-02-16 | 1 | -1/+1
|   * Move all system includes to one file where we detect availability of all that includes
|   * Fix license misprint
|   * Fix some issues with perl initializing
* * Add BSD license text | Vsevolod Stakhov | 2009-02-16 | 1 | -0/+24
|
* * Rewrite message parser | Vsevolod Stakhov | 2009-01-21 | 1 | -2/+4
|   * Change mime parts storage
|   * Add html tags striping (ported from php code)
|   * Rework learning to process only text and striped html parts
* * Rewrite perl client for rspamd, now it allows access to both normal and control interfaces | Vsevolod Stakhov | 2009-01-19 | 1 | -2/+4
|   * Fix small errors in tokenizer and controller interface
* * Fix errors in learning implementation | Vsevolod Stakhov | 2009-01-11 | 1 | -3/+7
|
* * Use binary tree in tokenizers, that would provide us fast checking for unique tokens and have O(log n) difficulty | Vsevolod Stakhov | 2008-12-04 | 1 | -0/+12
* * Add learning interface to rspamd (still requires classifier to work) | Vsevolod Stakhov | 2008-12-02 | 1 | -0/+18
|
* * Add simple implementation of OSB tokenizer | Vsevolod Stakhov | 2008-11-07 | 1 | -0/+45