path: root/src/tokenizers
Commit message | Author | Age | Files | Lines
* * Add configuration utils for kvstorage | Vsevolod Stakhov | 2011-10-17 | 1 | -3/+0
* Fix signness in arithmetic operations. | Vsevolod Stakhov | 2011-08-04 | 1 | -1/+1
* * Fix build under CentOS 5 with old glib 2.12 | Vsevolod Stakhov | 2011-07-29 | 2 | -10/+5
    * Fix build of rspamd with CMAKE_BINARY_DIR differs from CMAKE_SOURCE_DIR
    Rework include style.
* * Add correcting factor to statistics. | Vsevolod Stakhov | 2011-06-28 | 2 | -16/+15
    Now learning increments version of a statfile.
    Avoid learning and classifying of similar text parts if a message has 2 text parts.
    Several fixes to statistics.
* Remove debug. | Vsevolod Stakhov | 2011-06-27 | 1 | -1/+0
* Fix incorrect calculating of token length. | Vsevolod Stakhov | 2011-06-27 | 2 | -2/+3
* * Welcome 0.4.0 | Vsevolod Stakhov | 2011-06-24 | 3 | -51/+86
    Uncompatible changes:
    - Statistics is uncompatible in utf8 mode
    Major changes:
    - Improved utf8 mode
    - Convert all characters to lowercase in statistics
    - Skip URL's in statistics
    - Improve speed of bayes classifier by using integer arithmetics
    - Fixed statfiles synchronization that was broken for a long time
    - Synchronization is now configurable
    Minor changes:
    - Bugfixes
    - Removed some of legacy code
    - Types polishing
* * Skip short utf words in statistics | Vsevolod Stakhov | 2011-06-03 | 3 | -7/+14
* * Rework build process: | Vsevolod Stakhov | 2011-05-10 | 1 | -1/+2
    - add librspamdserver
    - link this library to all daemons and utils of rspamd
    - use subdirectories more often
    * Rework global variables logic
    - move them to the main process
    * Fix logging to handle utf-8 correctly
    * Add statshow utility and make it working
    * Move printf functions to separate source file
* * Major cleanup of cmake build system | Vsevolod Stakhov | 2011-05-06 | 3 | -5/+8
    * Add initial version of statshow utility for statfiles debugging
    * Add debugging for statistics
    * Remove unused utilities
* * Rewrite URL storage system | Vsevolod Stakhov | 2011-02-24 | 2 | -34/+0
* * Write Emails: header in output | Vsevolod Stakhov | 2011-02-11 | 1 | -1/+1
* * Tokenize subject using osb tokenizer. | Vsevolod Stakhov | 2011-02-11 | 1 | -13/+5
* Fixes in classifying for small messages. | Vsevolod Stakhov | 2011-01-25 | 1 | -1/+1
* * Many fixes to fuzzy hashes logic and tokenization. | Vsevolod Stakhov | 2011-01-24 | 1 | -4/+33
* * Add Subject header to statistics | Vsevolod Stakhov | 2010-12-24 | 2 | -0/+40
    * Write log message about symbols that are removed when composite symbol is inserted
* * Fix shared usage of statfiles | Vsevolod Stakhov | 2010-09-16 | 1 | -2/+4
    * Add invalidation of statfiles in case of learning, so now statfiles are invalidated in about a minute after learning
    * This should fix shared usage of statfile pool by several processes
* * Implement new learning system, now rspamd should be much more intelligent while learning messages | Vsevolod Stakhov | 2010-05-27 | 2 | -1/+2
* * Introduce new logging system: | Vsevolod Stakhov | 2009-12-22 | 1 | -2/+0
    - independent and customizeable buffering
    - line buffering
    - errors handling support
    - custom (ip based) debug
    - append function name automaticaly (based on __FUNCTION__)
    - add some logic to logs system
* * Add binlog API implementation | Vsevolod Stakhov | 2009-11-06 | 1 | -0/+1
* * Retab, no functional changes | Vsevolod Stakhov | 2009-10-02 | 2 | -62/+62
* * Make autolearn working | Vsevolod Stakhov | 2009-07-09 | 2 | -7/+11
* * Rework url parsing algorithms | Vsevolod Stakhov | 2009-06-02 | 1 | -1/+5
    * Adopt all parts of rspamd for new url parser
    * Improve url-extracter utility by avoiding cut&paste of mime parsing
    * Small fixes to rspamc client
    * Bump version to 0.1.3
* * Add functions to parse headers and urls into statfile tokens | Vsevolod Stakhov | 2009-03-16 | 3 | -13/+121
* * Prepare to migrate to cmake (still need to write install target and working with XS implicitly) | Vsevolod Stakhov | 2009-02-16 | 2 | -2/+2
    * Move all system includes to one file where we detect availability of all that includes
    * Fix license misprint
    * Fix some issues with perl initializing
* * Add BSD license text | Vsevolod Stakhov | 2009-02-16 | 2 | -0/+48
* * Rewrite message parser | Vsevolod Stakhov | 2009-01-21 | 3 | -14/+17
    * Change mime parts storage
    * Add html tags striping (ported from php code)
    * Rework learning to process only text and striped html parts
* * Rewrite perl client for rspamd, now it allows access to both normal and control interfaces | Vsevolod Stakhov | 2009-01-19 | 2 | -2/+5
    * Fix small errors in tokenizer and controller interface
* * Fix errors in learning implementation | Vsevolod Stakhov | 2009-01-11 | 2 | -4/+8
* * Use binary tree in tokenizers, that would provide us fast checking for unique tokens and have O(log n) difficulty | Vsevolod Stakhov | 2008-12-04 | 3 | -21/+30
* * Add learning interface to rspamd (still requires classifier to work) | Vsevolod Stakhov | 2008-12-02 | 3 | -2/+35
* * Add simple implementation of OSB tokenizer | Vsevolod Stakhov | 2008-11-07 | 3 | -0/+143