path: root/src/libmime/lang_detection.c
Commit message | Author | Date | Files | Lines (-removed/+added)
* [Rework] Change the logic of skipping symbols | Vsevolod Stakhov | 2024-09-04 | 1 | -1/+1
  We now do not skip pre/post filters even if the task result has reached the threshold.
* [Fix] Fix another corner case that allows candidates to be freed without init | Vsevolod Stakhov | 2024-04-29 | 1 | -8/+9
* [Fix] Apply detection phase if fasttext could not detect language | Vsevolod Stakhov | 2024-04-28 | 1 | -71/+93
  Issue: #4929
* [Rework] Further types conversion (no functional changes) | Vsevolod Stakhov | 2024-03-18 | 1 | -122/+122
* [Rework] Replace some of the GLib types with standard ones | Vsevolod Stakhov | 2024-03-18 | 1 | -5/+5
  These types constantly conflict with the system ones, especially on OSX.
* [Fix] Do not save multipatterns to FS in certain cases | Vsevolod Stakhov | 2024-03-15 | 1 | -1/+1
* [Minor] Print some more stats | Vsevolod Stakhov | 2024-01-19 | 1 | -1/+5
* [Fix] Really fix the language detector statistical heuristic | Vsevolod Stakhov | 2024-01-18 | 1 | -12/+26
* [Fix] Make random word selection deterministic based on content | Vsevolod Stakhov | 2024-01-18 | 1 | -12/+19
* [Rework] More steps to do refactoring | Vsevolod Stakhov | 2023-08-16 | 1 | -4/+4
* [Rework] Use clang-format to unify formatting in all sources | Vsevolod Stakhov | 2023-07-26 | 1 | -606/+612
  No meaningful changes.
* [Minor] Add some more debug to the fasttext classifier | Vsevolod Stakhov | 2023-05-03 | 1 | -2/+2
* [Feature] Allow using other methods when fasttext detection is enabled | Vsevolod Stakhov | 2023-05-02 | 1 | -1/+9
* [Fix] Feed fasttext language model with the pre-tokenized words | Vsevolod Stakhov | 2023-05-02 | 1 | -2/+1
* [Project] Some further fixes [vstakhov-fasttext-langdet] | Vsevolod Stakhov | 2023-04-29 | 1 | -16/+33
* [Fix] Ignore non-unique stop words | Vsevolod Stakhov | 2023-04-29 | 1 | -8/+30
* [Project] Implement fasttext language detection | Vsevolod Stakhov | 2023-04-29 | 1 | -61/+108
* [Project] Show fasttext info | Vsevolod Stakhov | 2023-04-29 | 1 | -2/+9
* Spelling (#4086) | Josh Soref | 2022-02-22 | 1 | -32/+32
  [Rework] Massive spelling fix from @jsoref
* [Minor] More divisions by zero | Vsevolod Stakhov | 2021-12-25 | 1 | -0/+4
* [Minor] Fix multipattern usage | Vsevolod Stakhov | 2020-08-04 | 1 | -8/+6
* [Rework] Refactor libraries structure | Vsevolod Stakhov | 2020-02-10 | 1 | -1/+1
  * Move logger implementation to libserver
  * Move fuzzy backend files to a separate subdir
  TODO: Move HTTP code from libutil
* [Minor] Add some more heuristics for stop words detection | Vsevolod Stakhov | 2020-02-08 | 1 | -1/+39
* [Minor] Oops, fix format string | Vsevolod Stakhov | 2020-02-07 | 1 | -1/+1
* [Minor] Further fixes in stop words detection | Vsevolod Stakhov | 2020-02-07 | 1 | -14/+15
* [Fix] Ignore diacritics in chartable module for specific languages | Vsevolod Stakhov | 2020-02-04 | 1 | -1/+1
  Issue: #3156
* [Minor] Add diacritics flag for language detector | Vsevolod Stakhov | 2020-02-04 | 1 | -9/+35
* [Minor] Langdet: Add threshold for stop words | Vsevolod Stakhov | 2019-08-02 | 1 | -0/+6
* [Minor] Langdet: Exclude exceptions (e.g. URLs) | Vsevolod Stakhov | 2019-08-02 | 1 | -0/+1
* [Minor] Show stop words found | Vsevolod Stakhov | 2019-08-02 | 1 | -1/+10
* [Feature] Langdet: Limit number of stop words to be checked | Vsevolod Stakhov | 2019-07-25 | 1 | -0/+5
* [Minor] Another attempt to plug a leak | Vsevolod Stakhov | 2019-06-26 | 1 | -0/+4
* [Minor] Plug more leaks | Vsevolod Stakhov | 2019-06-26 | 1 | -7/+2
* [Minor] Langdet: Improve debugging slightly | Vsevolod Stakhov | 2019-06-05 | 1 | -0/+3
* [CritFix] Langdet: Fix language detection where no stop words found | Vsevolod Stakhov | 2019-06-05 | 1 | -3/+20
* [Minor] Langdet: Increase cut-off limit | Vsevolod Stakhov | 2019-06-05 | 1 | -1/+1
* [Fix] Lang_det: Try better to distinguish Chinese and Japanese | Vsevolod Stakhov | 2019-06-05 | 1 | -13/+47
* [Fix] Fix memory leak in language detector during reloads | Vsevolod Stakhov | 2019-05-03 | 1 | -0/+2
* [Minor] Fix leak | Vsevolod Stakhov | 2019-02-26 | 1 | -1/+4
* [Minor] Fix loading of unicode multipatterns | Vsevolod Stakhov | 2019-02-14 | 1 | -0/+14
* [Rework] Slashing: Distinguish lualibdir, pluginsdir and sharedir | Vsevolod Stakhov | 2018-12-26 | 1 | -1/+1
* [Minor] Count words based on text words | Vsevolod Stakhov | 2018-11-30 | 1 | -3/+3
* [Fix] Fix double free | Vsevolod Stakhov | 2018-11-29 | 1 | -4/+0
* [Minor] Another fail-safety check | Vsevolod Stakhov | 2018-11-27 | 1 | -2/+5
* [Minor] Fix infinite loop in language detector | Vsevolod Stakhov | 2018-11-26 | 1 | -5/+4
* [Project] Finish basic tasks in new unicode project | Vsevolod Stakhov | 2018-11-25 | 1 | -5/+5
* [Project] Rework language detector to work with ucs32 | Vsevolod Stakhov | 2018-11-25 | 1 | -30/+37
* [Project] Various unicode fixes in language detector | Vsevolod Stakhov | 2018-11-25 | 1 | -40/+18
* [Project] Rework stemming | Vsevolod Stakhov | 2018-11-24 | 1 | -16/+17
* [Project] Add function to normalize unicode on a per-word basis | Vsevolod Stakhov | 2018-11-24 | 1 | -1/+1