path: root/src/tokenizers/osb.c
Commit message | Author | Age | Files | Lines
* Another debian license fix. | Vsevolod Stakhov | 2012-09-10 | 1 | -1/+1
  Add apache license for regexp that were delivered from SpamAssassin project.
  Fix debian/copyright for src/dns.c.
* Update copyright (required by debian). | Vsevolod Stakhov | 2012-09-04 | 1 | -3/+3
* * Add correcting factor to statistics. | Vsevolod Stakhov | 2011-06-28 | 1 | -15/+14
  Now learning increments version of a statfile.
  Avoid learning and classifying of similar text parts if a message has 2 text parts.
  Several fixes to statistics.
* Remove debug. | Vsevolod Stakhov | 2011-06-27 | 1 | -1/+0
* Fix incorrect calculating of token length. | Vsevolod Stakhov | 2011-06-27 | 1 | -0/+1
* * Welcome 0.4.0 | Vsevolod Stakhov | 2011-06-24 | 1 | -28/+29
  Uncompatible changes:
  - Statistics is uncompatible in utf8 mode
  Major changes:
  - Improved utf8 mode
  - Convert all characters to lowercase in statistics
  - Skip URL's in statistics
  - Improve speed of bayes classifier by using integer arithmetics
  - Fixed statfiles synchronization that was broken for a long time
  - Synchronization is now configurable
  Minor changes:
  - Bugfixes
  - Removed some of legacy code
  - Types polishing
* * Skip short utf words in statistics | Vsevolod Stakhov | 2011-06-03 | 1 | -3/+10
* * Rework build process: | Vsevolod Stakhov | 2011-05-10 | 1 | -1/+2
  - add librspamdserver
  - link this library to all daemons and utils of rspamd
  - use subdirectories more often
  * Rework global variables logic
  - move them to the main process
  * Fix logging to handle utf-8 correctly
  * Add statshow utility and make it working
  * Move printf functions to separate source file
* * Major cleanup of cmake build system | Vsevolod Stakhov | 2011-05-06 | 1 | -1/+4
  * Add initial version of statshow utility for statfiles debugging
  * Add debugging for statistics
  * Remove unused utilities
* * Implement new learning system, now rspamd should be much more intelligent while learning messages | Vsevolod Stakhov | 2010-05-27 | 1 | -1/+1
* * Introduce new logging system: | Vsevolod Stakhov | 2009-12-22 | 1 | -2/+0
  - independent and customizeable buffering
  - line buffering
  - errors handling support
  - custom (ip based) debug
  - append function name automaticaly (based on __FUNCTION__)
  - add some logic to logs system
* * Retab, no functional changes | Vsevolod Stakhov | 2009-10-02 | 1 | -14/+14
* * Make autolearn working | Vsevolod Stakhov | 2009-07-09 | 1 | -2/+8
* * Add functions to parse headers and urls into statfile tokens | Vsevolod Stakhov | 2009-03-16 | 1 | -13/+1
* * Prepare to migrate to cmake (still need to write install target and working with XS implicitly) | Vsevolod Stakhov | 2009-02-16 | 1 | -1/+1
  * Move all system includes to one file where we detect availability of all that includes
  * Fix license misprint
  * Fix some issues with perl initializing
* * Add BSD license text | Vsevolod Stakhov | 2009-02-16 | 1 | -0/+24
* * Rewrite message parser | Vsevolod Stakhov | 2009-01-21 | 1 | -10/+11
  * Change mime parts storage
  * Add html tags striping (ported from php code)
  * Rework learning to process only text and striped html parts
* * Rewrite perl client for rspamd, now it allows access to both normal and control interfaces | Vsevolod Stakhov | 2009-01-19 | 1 | -0/+1
  * Fix small errors in tokenizer and controller interface
* * Fix errors in learning implementation | Vsevolod Stakhov | 2009-01-11 | 1 | -1/+1
* * Use binary tree in tokenizers, that would provide us fast checking for unique tokens and have O(log n) difficulty | Vsevolod Stakhov | 2008-12-04 | 1 | -15/+12
* * Add learning interface to rspamd (still requires classifier to work) | Vsevolod Stakhov | 2008-12-02 | 1 | -2/+2
* * Add simple implementation of OSB tokenizer | Vsevolod Stakhov | 2008-11-07 | 1 | -0/+69