index: rspamd.git
Rapid spam filtering system: https://github.com/rspamd/rspamd
path: root/src/libstat/tokenizers/tokenizers.h
| Commit message | Author | Age | Files | Lines |
|---|---|---|---|---|
| [Rework] Further types conversion (no functional changes) | Vsevolod Stakhov | 2024-03-18 | 1 | -16/+16 |
| [Rework] Remove some of the GLib types in lieu of standard ones | Vsevolod Stakhov | 2024-03-18 | 1 | -1/+1 |
| [Fix] Fix format string and some length issues | Vsevolod Stakhov | 2023-09-26 | 1 | -1/+17 |
| [Rework] Use clang-format to unify formatting in all sources | Vsevolod Stakhov | 2023-07-26 | 1 | -34/+34 |
| [Minor] Add safety check when using icu ubrk iterators | Vsevolod Stakhov | 2019-10-24 | 1 | -1/+2 |
| [Rework] Add C++ guards to all headers | Vsevolod Stakhov | 2019-07-08 | 1 | -16/+27 |
| [Project] Use more generalised API to produce meta words | Vsevolod Stakhov | 2018-11-26 | 1 | -2/+3 |
| [Project] Another try to normalize unicode properly | Vsevolod Stakhov | 2018-11-25 | 1 | -0/+1 |
| [Project] Rework stemming | Vsevolod Stakhov | 2018-11-24 | 1 | -4/+5 |
| [Project] Add function to normalize unicode on per words basis | Vsevolod Stakhov | 2018-11-24 | 1 | -0/+4 |
| [Feature] Skip stop words in statistics | Vsevolod Stakhov | 2018-11-15 | 1 | -11/+11 |
| [Minor] Move subject tokenisation to a separate routine | Vsevolod Stakhov | 2018-11-08 | 1 | -0/+2 |
| [Feature] Implement new text tokenizer based on libicu | Vsevolod Stakhov | 2018-09-06 | 1 | -0/+3 |
| [Rework] Rework utf content processing in text parts | Vsevolod Stakhov | 2018-09-05 | 1 | -1/+1 |
| [Project] Start unicode rework | Vsevolod Stakhov | 2018-08-23 | 1 | -3/+11 |
| [Rework] Use a special structure for stats tokens | Vsevolod Stakhov | 2017-02-14 | 1 | -1/+1 |
| Fix tokenization | Vsevolod Stakhov | 2016-01-05 | 1 | -25/+10 |
| Implement words decaying for text parts. | Vsevolod Stakhov | 2015-11-12 | 1 | -2/+2 |
| Fix statistics. | Vsevolod Stakhov | 2015-10-06 | 1 | -1/+1 |
| Rename main.h and main.c to `rspamd.X` | Vsevolod Stakhov | 2015-09-22 | 1 | -1/+1 |
| Fix tokenizers and mmapped file. | Vsevolod Stakhov | 2015-07-27 | 1 | -4/+8 |
| Fix stat processing. | Vsevolod Stakhov | 2015-07-27 | 1 | -0/+4 |
| More changes to tokenization. | Vsevolod Stakhov | 2015-07-27 | 1 | -2/+4 |
| Start tokenizers rework. | Vsevolod Stakhov | 2015-07-27 | 1 | -4/+8 |
| Allow adding of prefix for tokenizers. | Vsevolod Stakhov | 2015-07-26 | 1 | -2/+4 |
| Implement skipping of signatures in text messages. | Vsevolod Stakhov | 2015-07-14 | 1 | -1/+2 |
| Add new UTF8 tokenizer. | Vsevolod Stakhov | 2015-04-01 | 1 | -1/+1 |
| Add compatibility layer for tokenization. | Vsevolod Stakhov | 2015-04-01 | 1 | -2/+12 |
| Save classifier configuration inside statfile config. | Vsevolod Stakhov | 2015-04-01 | 1 | -3/+0 |
| Rework tokenization: | Vsevolod Stakhov | 2015-02-23 | 1 | -1/+1 |
| Allow configurable tokenizers. | Vsevolod Stakhov | 2015-02-22 | 1 | -2/+2 |
| Rework tokenization invocation. | Vsevolod Stakhov | 2015-01-23 | 1 | -3/+0 |
| Rework types for tokenizers functions. | Vsevolod Stakhov | 2015-01-23 | 1 | -12/+9 |
| Reorganize libstat API. | Vsevolod Stakhov | 2015-01-23 | 1 | -0/+49 |