path: root/src/libstat/tokenizers
Commit message | Author | Date | Files | Lines (-/+)
* [Feature] Skip stop words in statistics | Vsevolod Stakhov | 2018-11-15 | 2 | -19/+31
* [Fix] Rework bayes calculations... | Vsevolod Stakhov | 2018-11-14 | 1 | -1/+1
* [Minor] Move subject tokenisation to a separate routine | Vsevolod Stakhov | 2018-11-08 | 2 | -3/+69
* [CritFix] Fix words decay one more time (affects long messages) | Vsevolod Stakhov | 2018-09-25 | 1 | -4/+8
* [Fix] Fix words decay algorithm | Vsevolod Stakhov | 2018-09-11 | 1 | -1/+1
* [Minor] Properly set flag on text tokens | Vsevolod Stakhov | 2018-09-07 | 1 | -3/+4
* [Minor] Further fixes in tokenization algorithm | Vsevolod Stakhov | 2018-09-07 | 1 | -20/+28
* [Feature] Implement new text tokenizer based on libicu | Vsevolod Stakhov | 2018-09-06 | 2 | -203/+218
* [Rework] Rework utf content processing in text parts | Vsevolod Stakhov | 2018-09-05 | 2 | -5/+5
* [Project] Start unicode rework | Vsevolod Stakhov | 2018-08-23 | 2 | -20/+28
* [Minor] Fix out-of-boundary access | Vsevolod Stakhov | 2018-03-27 | 1 | -1/+1
* [Fix] Do not skip the last character | Vsevolod Stakhov | 2017-10-31 | 1 | -0/+1
* [Fix] Do not try to dereference last character | Vsevolod Stakhov | 2017-10-31 | 1 | -1/+8
* [Minor] Further g_slice cleanup | Vsevolod Stakhov | 2017-10-28 | 1 | -2/+2
* [Fix] Further tokenization fixes | Vsevolod Stakhov | 2017-10-21 | 1 | -1/+1
* [Fix] Deal with another case when processing exceptions | Vsevolod Stakhov | 2017-10-21 | 1 | -0/+8
* [Fix] Do not strip last character in the last word | Vsevolod Stakhov | 2017-10-21 | 1 | -2/+2
* [Fix] Fix another tokenization issue | Vsevolod Stakhov | 2017-10-21 | 1 | -1/+31
* [CritFix] Another portion of tokenization fixes | Vsevolod Stakhov | 2017-10-18 | 1 | -16/+19
* [Feature] Add unigramms support in bayes | Vsevolod Stakhov | 2017-04-13 | 1 | -0/+12
* [Minor] More strict boundaries checks and composites policies fix | Vsevolod Stakhov | 2017-04-09 | 1 | -0/+2
* [Fix] Fix processing of small tokens vectors | Vsevolod Stakhov | 2017-04-04 | 1 | -3/+8
* [Rework] Set token data as uint64_t instead of chars array | Vsevolod Stakhov | 2017-04-04 | 2 | -17/+3
* [Minor] Some fixes for displaying tokens info | Vsevolod Stakhov | 2017-03-31 | 1 | -2/+3
* [Feature] Store text tokens inside bayes tokens | Vsevolod Stakhov | 2017-03-31 | 2 | -11/+23
* [Minor] Fix various style issues | Vsevolod Stakhov | 2017-03-23 | 1 | -1/+0
* [Minor] Use libicu for tokenizers | Vsevolod Stakhov | 2017-02-25 | 1 | -18/+22
* [Rework] Use a special structure for stats tokens | Vsevolod Stakhov | 2017-02-14 | 3 | -13/+26
* [Rework] Rework exceptions and newlines processing | Vsevolod Stakhov | 2016-07-13 | 1 | -9/+13
* [Fix] Switch hashes to mumhash | Vsevolod Stakhov | 2016-07-13 | 1 | -9/+12
* [Feature] New abstract hashing API in cryptobox | Vsevolod Stakhov | 2016-05-10 | 1 | -3/+4
* Refactor UCL API | Vsevolod Stakhov | 2016-02-16 | 1 | -5/+5
* Switch the rest to apache 2 | Vsevolod Stakhov | 2016-02-04 | 2 | -42/+24
* Fix tokenization | Vsevolod Stakhov | 2016-01-05 | 2 | -119/+89
* Some more fixes to OSB algorithm | Vsevolod Stakhov | 2015-11-23 | 1 | -1/+4
* Implement words decaying for text parts. | Vsevolod Stakhov | 2015-11-12 | 2 | -6/+65
* Fix format issues found by static analysis | Vsevolod Stakhov | 2015-11-11 | 2 | -2/+2
* Allow conditional build of snowball. | Vsevolod Stakhov | 2015-10-23 | 1 | -1/+0
* Fix statistics. | Vsevolod Stakhov | 2015-10-06 | 3 | -17/+16
* Rename main.h and main.c to `rspamd.X` | Vsevolod Stakhov | 2015-09-22 | 2 | -2/+2
* More logging updates. | Vsevolod Stakhov | 2015-08-29 | 1 | -7/+7
* Fix sqlite3 backend initialization. | Vsevolod Stakhov | 2015-07-27 | 1 | -0/+1
* Some more fixes to tokenizator init. | Vsevolod Stakhov | 2015-07-27 | 1 | -0/+6
* Fix issues with compatibility tokenization. | Vsevolod Stakhov | 2015-07-27 | 1 | -1/+9
* Fix tokenizers and mmapped file. | Vsevolod Stakhov | 2015-07-27 | 2 | -22/+57
* Fix stat processing. | Vsevolod Stakhov | 2015-07-27 | 1 | -0/+4
* More changes to tokenization. | Vsevolod Stakhov | 2015-07-27 | 1 | -2/+4
* Start tokenizers rework. | Vsevolod Stakhov | 2015-07-27 | 1 | -4/+8
* Use new tokenization by default for created statfiles. | Vsevolod Stakhov | 2015-07-27 | 1 | -1/+1
* Allow adding of prefix for tokenizers. | Vsevolod Stakhov | 2015-07-26 | 2 | -4/+18