rspamd.git
Rapid spam filtering system: https://github.com/rspamd/rspamd
path: root/src/libstat/tokenizers
| Commit message | Author | Date | Files | Lines |
|---|---|---|---|---|
| [Feature] Add unigramms support in bayes | Vsevolod Stakhov | 2017-04-13 | 1 | -0/+12 |
| [Minor] More strict boundaries checks and composites policies fix | Vsevolod Stakhov | 2017-04-09 | 1 | -0/+2 |
| [Fix] Fix processing of small tokens vectors | Vsevolod Stakhov | 2017-04-04 | 1 | -3/+8 |
| [Rework] Set token data as uint64_t instead of chars array | Vsevolod Stakhov | 2017-04-04 | 2 | -17/+3 |
| [Minor] Some fixes for displaying tokens info | Vsevolod Stakhov | 2017-03-31 | 1 | -2/+3 |
| [Feature] Store text tokens inside bayes tokens | Vsevolod Stakhov | 2017-03-31 | 2 | -11/+23 |
| [Minor] Fix various style issues | Vsevolod Stakhov | 2017-03-23 | 1 | -1/+0 |
| [Minor] Use libicu for tokenizers | Vsevolod Stakhov | 2017-02-25 | 1 | -18/+22 |
| [Rework] Use a special structure for stats tokens | Vsevolod Stakhov | 2017-02-14 | 3 | -13/+26 |
| [Rework] Rework exceptions and newlines processing | Vsevolod Stakhov | 2016-07-13 | 1 | -9/+13 |
| [Fix] Switch hashes to mumhash | Vsevolod Stakhov | 2016-07-13 | 1 | -9/+12 |
| [Feature] New abstract hashing API in cryptobox | Vsevolod Stakhov | 2016-05-10 | 1 | -3/+4 |
| Refactor UCL API | Vsevolod Stakhov | 2016-02-16 | 1 | -5/+5 |
| Switch the rest to apache 2 | Vsevolod Stakhov | 2016-02-04 | 2 | -42/+24 |
| Fix tokenization | Vsevolod Stakhov | 2016-01-05 | 2 | -119/+89 |
| Some more fixes to OSB algorithm | Vsevolod Stakhov | 2015-11-23 | 1 | -1/+4 |
| Implement words decaying for text parts. | Vsevolod Stakhov | 2015-11-12 | 2 | -6/+65 |
| Fix format issues found by static analysis | Vsevolod Stakhov | 2015-11-11 | 2 | -2/+2 |
| Allow conditional build of snowball. | Vsevolod Stakhov | 2015-10-23 | 1 | -1/+0 |
| Fix statistics. | Vsevolod Stakhov | 2015-10-06 | 3 | -17/+16 |
| Rename main.h and main.c to `rspamd.X` | Vsevolod Stakhov | 2015-09-22 | 2 | -2/+2 |
| More logging updates. | Vsevolod Stakhov | 2015-08-29 | 1 | -7/+7 |
| Fix sqlite3 backend initialization. | Vsevolod Stakhov | 2015-07-27 | 1 | -0/+1 |
| Some more fixes to tokenizator init. | Vsevolod Stakhov | 2015-07-27 | 1 | -0/+6 |
| Fix issues with compatibility tokenization. | Vsevolod Stakhov | 2015-07-27 | 1 | -1/+9 |
| Fix tokenizers and mmapped file. | Vsevolod Stakhov | 2015-07-27 | 2 | -22/+57 |
| Fix stat processing. | Vsevolod Stakhov | 2015-07-27 | 1 | -0/+4 |
| More changes to tokenization. | Vsevolod Stakhov | 2015-07-27 | 1 | -2/+4 |
| Start tokenizers rework. | Vsevolod Stakhov | 2015-07-27 | 1 | -4/+8 |
| Use new tokenization by default for created statfiles. | Vsevolod Stakhov | 2015-07-27 | 1 | -1/+1 |
| Allow adding of prefix for tokenizers. | Vsevolod Stakhov | 2015-07-26 | 2 | -4/+18 |
| Disable signatures detection as it breaks stuff. | Vsevolod Stakhov | 2015-07-14 | 1 | -1/+1 |
| Implement skipping of signatures in text messages. | Vsevolod Stakhov | 2015-07-14 | 2 | -13/+35 |
| Use not common name for tokenization exceptions. | Vsevolod Stakhov | 2015-05-21 | 1 | -2/+2 |
| More fixes to tokenization. | Vsevolod Stakhov | 2015-05-21 | 1 | -4/+7 |
| Fix critical bug in tokenization logic. | Vsevolod Stakhov | 2015-05-20 | 1 | -1/+1 |
| Save OSB window index inside token. | Vsevolod Stakhov | 2015-04-13 | 1 | -0/+2 |
| Use new siphash implementation. | Vsevolod Stakhov | 2015-04-08 | 1 | -5/+6 |
| Fix tokenization of the last token in a message. | Vsevolod Stakhov | 2015-04-02 | 1 | -1/+1 |
| Fix normalization and tokenization. | Vsevolod Stakhov | 2015-04-02 | 1 | -1/+3 |
| Update remain on tokenization. | Vsevolod Stakhov | 2015-04-01 | 1 | -0/+1 |
| Add new UTF8 tokenizer. | Vsevolod Stakhov | 2015-04-01 | 2 | -23/+142 |
| Add compatibility layer for tokenization. | Vsevolod Stakhov | 2015-04-01 | 3 | -5/+76 |
| Rework osb configuration. | Vsevolod Stakhov | 2015-04-01 | 1 | -56/+112 |
| Save classifier configuration inside statfile config. | Vsevolod Stakhov | 2015-04-01 | 2 | -4/+1 |
| Rework tokenization: | Vsevolod Stakhov | 2015-02-23 | 3 | -43/+124 |
| Allow configurable tokenizers. | Vsevolod Stakhov | 2015-02-22 | 2 | -3/+3 |
| Rework tokenization invocation. | Vsevolod Stakhov | 2015-01-23 | 2 | -40/+0 |
| Add initial processing routines. | Vsevolod Stakhov | 2015-01-23 | 2 | -7/+6 |
| Rework types for tokenizers functions. | Vsevolod Stakhov | 2015-01-23 | 3 | -27/+17 |