index : rspamd.git
Rapid spam filtering system: https://github.com/rspamd/rspamd
path: root/src/libmime/lang_detection.c
| Commit message | Author | Date | Files | Lines (-/+) |
|---|---|---|---|---|
| [Rework] Change the logic of skipping symbols | Vsevolod Stakhov | 2024-09-04 | 1 | -1/+1 |
| [Fix] Fix another corner case that allows candidates to be freed without init | Vsevolod Stakhov | 2024-04-29 | 1 | -8/+9 |
| [Fix] Apply detection phase if fasttext could not detect language | Vsevolod Stakhov | 2024-04-28 | 1 | -71/+93 |
| [Rework] Further types conversion (no functional changes) | Vsevolod Stakhov | 2024-03-18 | 1 | -122/+122 |
| [Rework] Remove some of the GLib types in lieu of standard ones | Vsevolod Stakhov | 2024-03-18 | 1 | -5/+5 |
| [Fix] Do not save multipatterns to FS in certain cases | Vsevolod Stakhov | 2024-03-15 | 1 | -1/+1 |
| [Minor] Print some more stats | Vsevolod Stakhov | 2024-01-19 | 1 | -1/+5 |
| [Fix] Really fix the language detector statistical heuristic | Vsevolod Stakhov | 2024-01-18 | 1 | -12/+26 |
| [Fix] Make words selection random deterministic upon content | Vsevolod Stakhov | 2024-01-18 | 1 | -12/+19 |
| [Rework] More steps to do refactoring | Vsevolod Stakhov | 2023-08-16 | 1 | -4/+4 |
| [Rework] Use clang-format to unify formatting in all sources | Vsevolod Stakhov | 2023-07-26 | 1 | -606/+612 |
| [Minor] Add some more debug to the fasttext classifier | Vsevolod Stakhov | 2023-05-03 | 1 | -2/+2 |
| [Feature] Allow to use other methods when fasttext detection is enabled | Vsevolod Stakhov | 2023-05-02 | 1 | -1/+9 |
| [Fix] Feed fasttext language model with the pre-tokenized words | Vsevolod Stakhov | 2023-05-02 | 1 | -2/+1 |
| [Project] Some further fixes (ref: vstakhov-fasttext-langdet) | Vsevolod Stakhov | 2023-04-29 | 1 | -16/+33 |
| [Fix] Ignore non-unique stop words | Vsevolod Stakhov | 2023-04-29 | 1 | -8/+30 |
| [Project] Implement fasttext language detection | Vsevolod Stakhov | 2023-04-29 | 1 | -61/+108 |
| [Project] Show fasttext info | Vsevolod Stakhov | 2023-04-29 | 1 | -2/+9 |
| Spelling (#4086) | Josh Soref | 2022-02-22 | 1 | -32/+32 |
| [Minor] More divisions by zero | Vsevolod Stakhov | 2021-12-25 | 1 | -0/+4 |
| [Minor] Fix multipattern usage | Vsevolod Stakhov | 2020-08-04 | 1 | -8/+6 |
| [Rework] Refactor libraries structure | Vsevolod Stakhov | 2020-02-10 | 1 | -1/+1 |
| [Minor] Add some more heuristics for stop words detection | Vsevolod Stakhov | 2020-02-08 | 1 | -1/+39 |
| [Minor] Oops, fix format string | Vsevolod Stakhov | 2020-02-07 | 1 | -1/+1 |
| [Minor] Further fixes in stop words detection | Vsevolod Stakhov | 2020-02-07 | 1 | -14/+15 |
| [Fix] Ignore diacritics in chartable module for specific languages | Vsevolod Stakhov | 2020-02-04 | 1 | -1/+1 |
| [Minor] Add diacritics flag for language detector | Vsevolod Stakhov | 2020-02-04 | 1 | -9/+35 |
| [Minor] Langdet: Add threshold for stop words | Vsevolod Stakhov | 2019-08-02 | 1 | -0/+6 |
| [Minor] Langdet: Exclude exceptions (e.g. urls) | Vsevolod Stakhov | 2019-08-02 | 1 | -0/+1 |
| [Minor] Show stop words found | Vsevolod Stakhov | 2019-08-02 | 1 | -1/+10 |
| [Feature] Langdet: Limit number of stop words to be checked | Vsevolod Stakhov | 2019-07-25 | 1 | -0/+5 |
| [Minor] Another try to plug a leak | Vsevolod Stakhov | 2019-06-26 | 1 | -0/+4 |
| [Minor] Plug more leaks | Vsevolod Stakhov | 2019-06-26 | 1 | -7/+2 |
| [Minor] Langdet: Improve debugging slightly | Vsevolod Stakhov | 2019-06-05 | 1 | -0/+3 |
| [CritFix] Langdet: Fix language detection where no stop words found | Vsevolod Stakhov | 2019-06-05 | 1 | -3/+20 |
| [Minor] Langdet: Increase cut-off limit | Vsevolod Stakhov | 2019-06-05 | 1 | -1/+1 |
| [Fix] Lang_det: Try better to distinguish Chinese and Japanese | Vsevolod Stakhov | 2019-06-05 | 1 | -13/+47 |
| [Fix] Fix memory leak in language detector during reloads | Vsevolod Stakhov | 2019-05-03 | 1 | -0/+2 |
| [Minor] Fix leak | Vsevolod Stakhov | 2019-02-26 | 1 | -1/+4 |
| [Minor] Fix loading of unicode multipatterns | Vsevolod Stakhov | 2019-02-14 | 1 | -0/+14 |
| [Rework] Slashing: Distinguish lualibdir, pluginsdir and sharedir | Vsevolod Stakhov | 2018-12-26 | 1 | -1/+1 |
| [Minor] Count words based on text words | Vsevolod Stakhov | 2018-11-30 | 1 | -3/+3 |
| [Fix] Fix double free | Vsevolod Stakhov | 2018-11-29 | 1 | -4/+0 |
| [Minor] Another fail-safety check | Vsevolod Stakhov | 2018-11-27 | 1 | -2/+5 |
| [Minor] Fix indefinite loop in language detector | Vsevolod Stakhov | 2018-11-26 | 1 | -5/+4 |
| [Project] Finish basic tasks in new unicode project | Vsevolod Stakhov | 2018-11-25 | 1 | -5/+5 |
| [Project] Rework language detector to work with ucs32 | Vsevolod Stakhov | 2018-11-25 | 1 | -30/+37 |
| [Project] Various unicode fixes in language detector | Vsevolod Stakhov | 2018-11-25 | 1 | -40/+18 |
| [Project] Rework stemming | Vsevolod Stakhov | 2018-11-24 | 1 | -16/+17 |
| [Project] Add function to normalize unicode on per words basis | Vsevolod Stakhov | 2018-11-24 | 1 | -1/+1 |