Starting from rspamd 1.0, we recommend using `sqlite3` as the backend and `osb` as the tokenizer. This also enables additional features, such as token normalization and
meta-information in statistics. The following configuration demonstrates the recommended statistics setup:
-~~~nginx
+~~~ucl
classifier {
type = "bayes";
tokenizer {
It is also possible to write custom Lua scripts that select a custom user or language for a specific task. Here is an example
of such a script that extracts domain names from recipients, thereby organizing per-domain statistics:
-~~~nginx
+~~~ucl
classifier {
tokenizer {
name = "osb";
Rspamd can learn and check multiple classifiers for a single message. This might be useful, for example, if you have both common and per-user statistics. It is even possible to use the same statfiles for these purposes. Classifiers **might** have the same symbols (though this is not recommended), and they should have a **unique** `name` attribute that is used for learning. Here is an example of such a configuration:
-~~~nginx
+~~~ucl
classifier {
tokenizer {
name = "osb";
From version 1.1, it is also possible to specify Redis as a backend for statistics and for the cache of learned messages. Redis is recommended for clustered configurations because it allows simultaneous learning and checking and is also very fast. To set up Redis, use the `redis` backend for a classifier (the cache is set to the same servers accordingly).
-~~~nginx
+~~~ucl
classifier {
tokenizer {
name = "osb";
name = "bayes";
min_tokens = 11;
backend = "redis";
-
+ servers = "localhost:6379";
+ #write_servers = "localhost:6379"; # Optional: separate servers used for learning
+ #password = "xxx"; # Optional password
+ #database = "2"; # Optional database id
statfile {
- servers = "127.0.0.1";
- write_servers = "127.0.0.1";
symbol = "BAYES_SPAM";
}
statfile {
- servers = "127.0.0.1";
- write_servers = "127.0.0.1";
symbol = "BAYES_HAM";
}
per_user = true;
~~~
`per_languages` is not supported by the Redis backend, which stores everything in the same place. `write_servers` are used in the
-`master-slave` rotation by default and used for learning, whilst `read_servers` are selected randomly each time:
+`master-slave` rotation by default and used for learning, whilst `servers` are selected randomly each time:
write_servers = "master.example.com:6379:10, slave.example.com:6379:1"
write_servers = "master.example.com:6379, slave.example.com:6379"
* `autolearn = [1, 10]`: autolearn as ham if the score is less than the smaller of the two numbers (< `1` here) and as spam if the score is greater than the larger of the two (> `10` in this case)
* `autolearn = "return function(task) ... end"`: use the given Lua function to decide whether autolearn is needed (the function should return the string 'ham' to learn as ham or the string 'spam' to learn as spam; if no learning is needed, it can return anything else, including `nil`)
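
As an illustration of the threshold form above, here is a minimal sketch of a Redis-backed classifier with autolearning enabled. It assumes that `autolearn` is set at the classifier level and that each statfile declares its class with `spam = true` or `spam = false`; the server address and the threshold values are placeholders, not recommendations:

~~~ucl
classifier {
  tokenizer {
    name = "osb";
  }
  name = "bayes";
  backend = "redis";
  servers = "localhost:6379";   # placeholder address
  # Assumed placement: autolearn is a classifier-level option.
  # Learn as ham when score < 1, as spam when score > 10;
  # messages scoring in between are not learned.
  autolearn = [1, 10];
  statfile {
    symbol = "BAYES_SPAM";
    spam = true;
  }
  statfile {
    symbol = "BAYES_HAM";
    spam = false;
  }
}
~~~

With these values, a message scoring `0.5` would be learned as ham, one scoring `12` as spam, and one scoring `5` would be left alone.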
-Redis backend is highly recommended for autolearning purposes since it's the only backend with high concurrency level when multiple writers are properly synchronized.
\ No newline at end of file
+Redis backend is highly recommended for autolearning purposes since it's the only backend with high concurrency level when multiple writers are properly synchronized.