Commit log for src/tokenizers/tokenizers.h
Commit message | Author | Date | Files | Lines
* Refactor worker task structure and API. | Vsevolod Stakhov | 2014-04-21 | 1 | -2/+2
* Refactor memory pool naming. | Vsevolod Stakhov | 2014-04-20 | 1 | -3/+3
* Parse classifiers and statfiles in ucl. | Vsevolod Stakhov | 2013-11-07 | 1 | -1/+1
* New chi2square based bayes normalizer. | Vsevolod Stakhov | 2013-05-23 | 1 | -1/+1
* Fix build under CentOS 5 with old glib 2.12 | Vsevolod Stakhov | 2011-07-29 | 1 | -9/+4
    * Fix build of rspamd when CMAKE_BINARY_DIR differs from CMAKE_SOURCE_DIR
    * Rework include style.
* Welcome 0.4.0 | Vsevolod Stakhov | 2011-06-24 | 1 | -7/+9
    Incompatible changes:
    - Statistics are incompatible in utf8 mode
    Major changes:
    - Improved utf8 mode
    - Convert all characters to lowercase in statistics
    - Skip URLs in statistics
    - Improve speed of the bayes classifier by using integer arithmetic
    - Fixed statfiles synchronization, which had been broken for a long time
    - Synchronization is now configurable
    Minor changes:
    - Bugfixes
    - Removed some legacy code
    - Types polishing
* Skip short utf words in statistics | Vsevolod Stakhov | 2011-06-03 | 1 | -2/+2
* Major cleanup of cmake build system | Vsevolod Stakhov | 2011-05-06 | 1 | -2/+2
    * Add initial version of statshow utility for statfiles debugging
    * Add debugging for statistics
    * Remove unused utilities
* Rewrite URL storage system | Vsevolod Stakhov | 2011-02-24 | 1 | -2/+0
* Add Subject header to statistics | Vsevolod Stakhov | 2010-12-24 | 1 | -0/+2
    * Write log message about symbols that are removed when a composite symbol is inserted
* Implement new learning system; now rspamd should be much more intelligent while learning messages | Vsevolod Stakhov | 2010-05-27 | 1 | -0/+1
* Add binlog API implementation | Vsevolod Stakhov | 2009-11-06 | 1 | -0/+1
* Add functions to parse headers and urls into statfile tokens | Vsevolod Stakhov | 2009-03-16 | 1 | -0/+4
* Rewrite message parser | Vsevolod Stakhov | 2009-01-21 | 1 | -2/+2
    * Change mime parts storage
    * Add html tag stripping (ported from php code)
    * Rework learning to process only text and stripped html parts
* Use a binary tree in tokenizers to provide fast checking for unique tokens with O(log n) complexity | Vsevolod Stakhov | 2008-12-04 | 1 | -6/+6
* Add learning interface to rspamd (still requires classifier to work) | Vsevolod Stakhov | 2008-12-02 | 1 | -0/+15
* Add simple implementation of OSB tokenizer | Vsevolod Stakhov | 2008-11-07 | 1 | -0/+29