author | Vsevolod Stakhov <vsevolod@highsecure.ru> | 2016-07-08 11:32:43 +0100 |
---|---|---|
committer | Vsevolod Stakhov <vsevolod@highsecure.ru> | 2016-07-08 11:32:43 +0100 |
commit | ce2688ea374d49cac7fd3cbb669f0d50d606f0d3 (patch) | |
tree | 6b3de5213b61036d9fd306fcb2cfa7daea0b850c /doc/markdown/configuration | |
parent | dcf3b7caff4fe32a42d1dbdfa92e874c8ae0b74c (diff) | |
[Doc] Massive documentation rework
Diffstat (limited to 'doc/markdown/configuration')
-rw-r--r-- | doc/markdown/configuration/composites.md | 4 |
-rw-r--r-- | doc/markdown/configuration/index.md | 20 |
-rw-r--r-- | doc/markdown/configuration/logging.md | 14 |
-rw-r--r-- | doc/markdown/configuration/metrics.md | 20 |
-rw-r--r-- | doc/markdown/configuration/options.md | 18 |
-rw-r--r-- | doc/markdown/configuration/settings.md | 4 |
-rw-r--r-- | doc/markdown/configuration/statistic.md | 24 |
7 files changed, 52 insertions, 52 deletions
diff --git a/doc/markdown/configuration/composites.md b/doc/markdown/configuration/composites.md
index 90c633228..c5e97ed1d 100644
--- a/doc/markdown/configuration/composites.md
+++ b/doc/markdown/configuration/composites.md
@@ -1,4 +1,4 @@
-# rspamd composite symbols
+# Rspamd composite symbols

 ## Introduction

@@ -45,7 +45,7 @@ composite {
 }
 ~~~

-Composites should not be recursive; this is normally detected by rspamd.
+Composites should not be recursive; this is normally detected by Rspamd.

 ## Composite weight rules

diff --git a/doc/markdown/configuration/index.md b/doc/markdown/configuration/index.md
index 6d480b7c0..6cc5e049e 100644
--- a/doc/markdown/configuration/index.md
+++ b/doc/markdown/configuration/index.md
@@ -1,25 +1,25 @@
-# rspamd configuration
+# Rspamd configuration

-rspamd uses the Universal Configuration Language (UCL) for its configuration. The UCL format is described in detail in this [document](ucl.md). rspamd defines several variables and macros to extend
+rspamd uses the Universal Configuration Language (UCL) for its configuration. The UCL format is described in detail in this [document](ucl.md). Rspamd defines several variables and macros to extend
 UCL functionality.

-## rspamd variables
+## Rspamd variables

-- *CONFDIR*: configuration directory for rspamd, found in `$PREFIX/etc/rspamd/`
+- *CONFDIR*: configuration directory for Rspamd, found in `$PREFIX/etc/rspamd/`
 - *RUNDIR*: runtime directory to store pidfiles or unix sockets
 - *DBDIR*: persistent databases directory (used for statistics or symbols cache).
 - *LOGDIR*: a directory to store log files
 - *PLUGINSDIR*: plugins directory for lua plugins
 - *PREFIX*: basic installation prefix
-- *VERSION*: rspamd version string (e.g. "0.6.6")
+- *VERSION*: Rspamd version string (e.g. "0.6.6")

-## rspamd specific macros
+## Rspamd specific macros

 - *.include_map*: defines a map that is dynamically reloaded and updated if its content has changed. This macro is intended to define dynamic configuration files.

-## rspamd basic configuration
+## Rspamd basic configuration

-The basic rspamd configuration is stored in `$CONFDIR/rspamd.conf`. By default, this file looks like this one:
+The basic Rspamd configuration is stored in `$CONFDIR/rspamd.conf`. By default, this file looks like this one:

 ~~~ucl
 lua = "$CONFDIR/lua/rspamd.lua"
@@ -39,7 +39,7 @@ modules {
 }
 ~~~

-In this file, we read a lua script placed in `$CONFDIR/lua/rspamd.lua` and load lua rules from it. Then we include a global [options](options.md) section followed by [logging](logging.md) logging configuration. The [metrics](metrics.md) section defines metric settings, including rule weights and rspamd actions. The [workers](../workers/index.md) section specifies rspamd workers settings. [Composites](composites.md) is a utility section that describes composite symbols. Statistical filters are defined in the [statistic](statistic.md) section. rspamd stores module configurations (for both lua and internal modules) in the [modules](../modules/index.md) section while modules themselves are loaded from the following portion of the configuration:
+In this file, we read a lua script placed in `$CONFDIR/lua/rspamd.lua` and load lua rules from it. Then we include a global [options](options.md) section followed by [logging](logging.md) logging configuration. The [metrics](metrics.md) section defines metric settings, including rule weights and Rspamd actions. The [workers](../workers/index.md) section specifies Rspamd workers settings. [Composites](composites.md) is a utility section that describes composite symbols. Statistical filters are defined in the [statistic](statistic.md) section. Rspamd stores module configurations (for both lua and internal modules) in the [modules](../modules/index.md) section while modules themselves are loaded from the following portion of the configuration:

 ~~~ucl
 modules {
@@ -49,4 +49,4 @@ modules {

 The modules section defines the path or paths of directories or specific files. If a directory is specified then all files with a `.lua` suffix are loaded as lua plugins (the directory path is treated as a `*.lua` shell pattern).

-This configuration is not intended to be changed by the user, rather you can include your own configuration options as `.include`s. To redefine symbol weights and actions, it is recommended to use [dynamic configuration](settings.md). Nevertheless, the rspamd installation script will never overwrite a user's configuration if it exists already. Please read the rspamd changelog carefully, if you upgrade rspamd to a new version, for all incompatible configuration changes.
+This configuration is not intended to be changed by the user, rather you can include your own configuration options as `.include`s. To redefine symbol weights and actions, it is recommended to use [dynamic configuration](settings.md). Nevertheless, the Rspamd installation script will never overwrite a user's configuration if it exists already. Please read the Rspamd changelog carefully, if you upgrade Rspamd to a new version, for all incompatible configuration changes.
diff --git a/doc/markdown/configuration/logging.md b/doc/markdown/configuration/logging.md
index b7c6c44b2..63e0799b7 100644
--- a/doc/markdown/configuration/logging.md
+++ b/doc/markdown/configuration/logging.md
@@ -1,4 +1,4 @@
-# rspamd logging settings
+# Rspamd logging settings

 ## Introduction
 rspamd has a number of logging options. Firstly, there are three types of log output that are supported: console logging (just output log messages to console), file logging (output log messages to file) and logging via syslog. It is also possible to restrict logging to a specific level:
@@ -8,17 +8,17 @@ rspamd has a number of logging options. Firstly, there are three types of log ou
 * `info` - log all non-debug messages
 * `debug` - log all including debug messages (huge amount of logging)

-It is possible to turn on debug messages for specific ip addresses. This can be useful for testing. For each logging type there are special mandatory parameters: log facility for syslog (read `syslog(3)` man page for details about facilities), log file for file logging. Also, file logging may be buffered for performance. To reduce logging noise, rspamd detects sequential matching log messages and replaces them with a total number of repeats:
+It is possible to turn on debug messages for specific ip addresses. This can be useful for testing. For each logging type there are special mandatory parameters: log facility for syslog (read `syslog(3)` man page for details about facilities), log file for file logging. Also, file logging may be buffered for performance. To reduce logging noise, Rspamd detects sequential matching log messages and replaces them with a total number of repeats:

- #81123(fuzzy): May 11 19:41:54 rspamd file_log_function: Last message repeated 155 times
- #81123(fuzzy): May 11 19:41:54 rspamd process_write_command: fuzzy hash was successfully added
+ #81123(fuzzy): May 11 19:41:54 Rspamd file_log_function: Last message repeated 155 times
+ #81123(fuzzy): May 11 19:41:54 Rspamd process_write_command: fuzzy hash was successfully added

 ## Unique id

-From version 1.0, rspamd logs contain a unique id for each logging message. This allows finding relevant messages quickly. Moreover, there is now a `module` definition: for example, `task` or `cfg` modules. Here is a quick example of how it works: imagine that we have an incoming task for some message. Then you'd see something like this in the logs:
+From version 1.0, Rspamd logs contain a unique id for each logging message. This allows finding relevant messages quickly. Moreover, there is now a `module` definition: for example, `task` or `cfg` modules. Here is a quick example of how it works: imagine that we have an incoming task for some message. Then you'd see something like this in the logs:

 2015-09-02 16:41:59 #45015(normal) <ed2abb>; task; accept_socket: accepted connection from ::1 port 52895
- 2015-09-02 16:41:59 #45015(normal) <ed2abb>; task; rspamd_message_parse: loaded message; id: <F66099EE-BCAB-4D4F-A4FC-7C15A6686397@FreeBSD.org>; queue-id: <undef>
+ 2015-09-02 16:41:59 #45015(normal) <ed2abb>; task; Rspamd_message_parse: loaded message; id: <F66099EE-BCAB-4D4F-A4FC-7C15A6686397@FreeBSD.org>; queue-id: <undef>

 So the tag is `ed2abb` in this case. All subsequent processing related to this task will have the same tag. It is enabled not only on the `task` module, but also others, such as the `spf` or `lua` modules. For other modules, such as `cfg`, the tag is generated statically using a specific characteristic, for example the configuration file checksum.

@@ -44,7 +44,7 @@ Here is summary of logging parameters:
 + `dkim` - messages from dkim module
 + `main` - messages from the main process
 + `dns` - messages from DNS resolver
- + `map` - messages from maps in rspamd
+ + `map` - messages from maps in Rspamd
 + `logger` - messages from the logger itself

 ### Log format
diff --git a/doc/markdown/configuration/metrics.md b/doc/markdown/configuration/metrics.md
index 3ec495db1..8c6d55fdd 100644
--- a/doc/markdown/configuration/metrics.md
+++ b/doc/markdown/configuration/metrics.md
@@ -1,10 +1,10 @@
-# rspamd metrics settings
+# Rspamd metrics settings

 ## Introduction

-The metrics section configures weights for symbols and actions applied to a message by rspamd. You can imagine a metric as a decision made by rspamd for a specific message by a set of rules. Each rule can insert a `symbol` into the metric, which means that this rule is true for this message. Each symbol can have a floating point value called a `weight`, which means the significance of the corresponding rule. Rules with a positive weight increase the spam factor, while rules with negative weights increase the ham factor. The result is the overall message score.
+The metrics section configures weights for symbols and actions applied to a message by Rspamd. You can imagine a metric as a decision made by Rspamd for a specific message by a set of rules. Each rule can insert a `symbol` into the metric, which means that this rule is true for this message. Each symbol can have a floating point value called a `weight`, which means the significance of the corresponding rule. Rules with a positive weight increase the spam factor, while rules with negative weights increase the ham factor. The result is the overall message score.

-After a score is evaluated, rspamd selects an appropriate `action` for a message. rspamd defines the following actions, ordered by spam factor, in ascending order:
+After a score is evaluated, Rspamd selects an appropriate `action` for a message. Rspamd defines the following actions, ordered by spam factor, in ascending order:

 1. `no action` - a message is likely ham
 2. `greylist` - a message should be greylisted to ensure sender's validity
@@ -13,12 +13,12 @@ After a score is evaluated, rspamd selects an appropriate `action` for a message
 5. `soft reject` - temporarily reject a message
 6. `reject` - permanently reject a message

-Actions are assumed to be applied simultaneously, meaning that the `add header` action implies, for example, the `greylist` action. `add header` and `rewrite subject` are equivalent to rspamd. They are just two options with the same purpose: to mark a message as probable spam. The `soft reject` action is mainly used to indicate temporary issues in mail delivery, for instance, exceeding a rate limit.
+Actions are assumed to be applied simultaneously, meaning that the `add header` action implies, for example, the `greylist` action. `add header` and `rewrite subject` are equivalent to Rspamd. They are just two options with the same purpose: to mark a message as probable spam. The `soft reject` action is mainly used to indicate temporary issues in mail delivery, for instance, exceeding a rate limit.

-There is also a special purpose metric called `default` that acts as the main metric to treat a message as spam or ham. Actually, all clients that use rspamd just check the default metric to determine whether a message is spam or ham. Therefore, the default configuration just defines the `default` metric.
+There is also a special purpose metric called `default` that acts as the main metric to treat a message as spam or ham. Actually, all clients that use Rspamd just check the default metric to determine whether a message is spam or ham. Therefore, the default configuration just defines the `default` metric.

 ## Configuring metrics
-Each metric is defined by a `metric` object in the rspamd configuration file. This object has one mandatory attribute - `name` - which defines the name of the metric:
+Each metric is defined by a `metric` object in the Rspamd configuration file. This object has one mandatory attribute - `name` - which defines the name of the metric:

 ~~~ucl
 metric {
@@ -41,9 +41,9 @@ $$

 By default this value is `1.0` meaning that no weight growing is defined. By increasing this value you increase the effective score of messages with multiple `spam` rules matched. This value is not affected by negative score values.
 * `subject` - string value that is prepended to the message's subject if the `rewrite subject` action is applied
-* `unknown_weight` - weight for unknown rules. If this parameter is specified, all rules can add symbols to this metric. If such a rule is not specified by this metric then its weight is equal to this option's value. Please note, that adding this option means that all rules will be checked by rspamd, on the contrary, if no `unknown_weight` metric is specified then rules that are not registered anywhere are silently ignored by rspamd.
+* `unknown_weight` - weight for unknown rules. If this parameter is specified, all rules can add symbols to this metric. If such a rule is not specified by this metric then its weight is equal to this option's value. Please note, that adding this option means that all rules will be checked by Rspamd, on the contrary, if no `unknown_weight` metric is specified then rules that are not registered anywhere are silently ignored by Rspamd.

-The content of this section is in two parts: symbols and actions. Actions is an object of all actions defined by this metric. If some actions are skipped, they won't be ever suggested by rspamd. The Actions section looks as follows:
+The content of this section is in two parts: symbols and actions. Actions is an object of all actions defined by this metric. If some actions are skipped, they won't be ever suggested by Rspamd. The Actions section looks as follows:

 ~~~ucl
 metric {
@@ -65,7 +65,7 @@ Symbols are defined by an object with the following properties:
 * `name` - symbolic name for a symbol (mandatory attribute)
 * `group` - a group of symbols, for example `DNSBL symbols` (as shown in WebUI)
 * `description` - optional symbolic description for WebUI
-* `one_shot` - normally, rspamd inserts a symbol as many times as the corresponding rule matches for the specific message; however, if `one_shot` is `true` then only the **maximum** weight is added to the metric. `grow_factor` is correspondingly not modified by a repeated triggering of `one_shot` rules.
+* `one_shot` - normally, Rspamd inserts a symbol as many times as the corresponding rule matches for the specific message; however, if `one_shot` is `true` then only the **maximum** weight is added to the metric. `grow_factor` is correspondingly not modified by a repeated triggering of `one_shot` rules.

 A symbol definition can look like this:

@@ -82,7 +82,7 @@ A single metric can contain multiple symbols definitions.

 ## Symbol groups

-Symbols can be grouped to specify their common functionality. For example, one could group all `RBL` symbols together. Moreover, from rspamd version 0.9 it is possible to specify a group score limit, which could be useful, for instance, if a specific group should not unconditionally send a message to the `spam` class. Here is an example of such a functionality:
+Symbols can be grouped to specify their common functionality. For example, one could group all `RBL` symbols together. Moreover, from Rspamd version 0.9 it is possible to specify a group score limit, which could be useful, for instance, if a specific group should not unconditionally send a message to the `spam` class. Here is an example of such a functionality:

 ~~~ucl
 metric {
diff --git a/doc/markdown/configuration/options.md b/doc/markdown/configuration/options.md
index 0f08d3369..fcf524288 100644
--- a/doc/markdown/configuration/options.md
+++ b/doc/markdown/configuration/options.md
@@ -1,8 +1,8 @@
-# rspamd options settings
+# Rspamd options settings

 ## Introduction

-The options section defines basic rspamd behaviour. Options are global for all types of workers. The default options are shown in the following example snippet:
+The options section defines basic Rspamd behaviour. Options are global for all types of workers. The default options are shown in the following example snippet:

 ~~~ucl
 filters = "chartable,dkim,spf,surbl,regexp,fuzzy_check";
@@ -32,29 +32,29 @@ control_socket = "$DBDIR/rspamd.sock mode=0600";

 ## Global options

-* `filters`: comma separated string that defines enabled **internal** rspamd filters; for a list of the internal filters please check the [modules page](../modules/)
+* `filters`: comma separated string that defines enabled **internal** Rspamd filters; for a list of the internal filters please check the [modules page](../modules/)
 * `one_shot`: if this flag is set to `true` then multiple rule triggers do not increase the total score of messages (however, this option can also be individually configured in the `metric` section for each symbol)
-* `cache_file`: used to store information about rules and their statistics; this file is automatically generated if rspamd detects that a symbol's list has been changed.
+* `cache_file`: used to store information about rules and their statistics; this file is automatically generated if Rspamd detects that a symbol's list has been changed.
 * `map_watch_interval`: interval between map scanning; the actual check interval is jittered to avoid simultaneous checking, so the real interval is from this value up to 2x this value
 * `check_all_filters`: turns off optimizations when a message gains an overall score more than the `reject` score for the default metric; this optimization can also be turned off for each request individually
 * `history_file`: this file is automatically created and refreshed on shutdown to preserve the rolling history of operations displayed by the WebUI across restarts
 * `temp_dir`: a directory for temporary files (can also be set via the environment variable `TMPDIR`).
-* `url_tld`: path to file with top level domain suffixes used by rspamd to find URLs in messages; by default this file is shipped with rspamd and should not be touched manually
-* `pid_file`: file used to store pid of the rspamd main process (not used with systemd)
-* `min_word_len`: minimum size in letters (valid for utf-8 as well) for a sequence of characters to be treated as a word; normally rspamd skips sequences if they are shorter or equal to three symbols
+* `url_tld`: path to file with top level domain suffixes used by Rspamd to find URLs in messages; by default this file is shipped with Rspamd and should not be touched manually
+* `pid_file`: file used to store pid of the Rspamd main process (not used with systemd)
+* `min_word_len`: minimum size in letters (valid for utf-8 as well) for a sequence of characters to be treated as a word; normally Rspamd skips sequences if they are shorter or equal to three symbols
 * `control_socket`: path/bind for the control socket
 * `classify_headers`: list of headers that are processed by statistics
 * `history_rows`: number of rows in the recent history table
 * `explicit_modules`: always load modules from the list even if they have no configuration section in the file
 * `disable_hyperscan`: disable hyperscan optimizations (if enabled at compile time)
-* `cores_dir`: directory where rspamd should drop core files
+* `cores_dir`: directory where Rspamd should drop core files
 * `max_cores_size`: maximum total size of core files that are placed in `cores_dir`
 * `max_cores_count`: maximum number of files in `cores_dir`
 * `local_addrs` or `local_networks`: map or list of ip networks used as local, so certain checks are skipped for them (e.g. SPF checks)

 ## DNS options

-These options are in a separate subsection named `dns` and specify the behaviour of rspamd name resolution. Here is a list of available tunables:
+These options are in a separate subsection named `dns` and specify the behaviour of Rspamd name resolution. Here is a list of available tunables:

 * `nameserver`: list (or array) of DNS servers to be used (if this option is skipped, then `/etc/resolv.conf` is parsed instead). It is also possible to specify weights of DNS servers to balance the payload, e.g.

diff --git a/doc/markdown/configuration/settings.md b/doc/markdown/configuration/settings.md
index 31b175344..0f4a70af2 100644
--- a/doc/markdown/configuration/settings.md
+++ b/doc/markdown/configuration/settings.md
@@ -1,8 +1,8 @@
-# rspamd user settings
+# Rspamd user settings

 ## Introduction

-rspamd allows exceptional control over the settings which will apply to incoming messages. Each setting can define a set of custom metric weights, symbols or actions. An administrator can also skip spam checks for certain messages completely, if required. rspamd settings can be loaded as dynamic maps and updated automatically if a corresponding file or URL has changed since its last update.
+rspamd allows exceptional control over the settings which will apply to incoming messages. Each setting can define a set of custom metric weights, symbols or actions. An administrator can also skip spam checks for certain messages completely, if required. Rspamd settings can be loaded as dynamic maps and updated automatically if a corresponding file or URL has changed since its last update.

 To load settings as a dynamic map, you can set 'settings' to a map string:

diff --git a/doc/markdown/configuration/statistic.md b/doc/markdown/configuration/statistic.md
index 18e870652..26b2b70e7 100644
--- a/doc/markdown/configuration/statistic.md
+++ b/doc/markdown/configuration/statistic.md
@@ -2,7 +2,7 @@

 ## Introduction

-Statistics is used by rspamd to define the `class` of message: either spam or ham. The overall algorithm is based on Bayesian theorem
+Statistics is used by Rspamd to define the `class` of message: either spam or ham. The overall algorithm is based on Bayesian theorem
 that defines probabilities combination. In general, it defines the probability of that a message belongs to the specified class (namely, `spam` or `ham`)
 base on the following factors:

@@ -11,13 +11,13 @@ base on the following factors:

 ## Statistics Architecture

-However, rspamd uses more advanced techniques to combine probabilities, such as sparsed bigramms (OSB) and inverse chi-square distribution.
+However, Rspamd uses more advanced techniques to combine probabilities, such as sparsed bigramms (OSB) and inverse chi-square distribution.
 The key idea of `OSB` algorithm is to use not merely single words as tokens but combinations of words weighted by theirs positions.
 This schema is displayed in the following picture:

 ![OSB algorithm](https://rspamd.com/img/rspamd-schemes.004.png "Rspamd OSB scheme")

-The main disadvantage is the amount of tokens which is multiplied by size of window. In rspamd, we use a window of 5 tokens that means that
+The main disadvantage is the amount of tokens which is multiplied by size of window. In Rspamd, we use a window of 5 tokens that means that
 the number of tokens is about 5 times larger than the amount of words.

 Statistical tokens are stored in statfiles which, in turn, are mapped to specific backends. This architecture is displayed in the following image:
@@ -26,7 +26,7 @@ Statistical tokens are stored in statfiles which, in turn, are mapped to specifi

 ## Statistics Configuration

-Starting from rspamd 1.0, we propose to use `sqlite3` as backed and `osb` as tokenizer. That also enables additional features, such as tokens normalization and
+Starting from Rspamd 1.0, we propose to use `sqlite3` as backed and `osb` as tokenizer. That also enables additional features, such as tokens normalization and
 metainformation in statistics. The following configuration demonstrates the recommended statistics configuration:

 ~~~ucl
@@ -63,11 +63,11 @@ classifier "bayes" {
 }
 ~~~

-It is also possible to organize per-user statistics using sqlite3 backend. However, you should ensure that rspamd is called at the
-finally delivery stage (e.g. LDA mode) to avoid multi-recipients messages. In case of a multi-recipient message, rspamd would just use the
-first recipient for user-based statistics which might be inappropriate for your configuration (however, rspamd preferes SMTP recipients over MIME ones and prioritize
+It is also possible to organize per-user statistics using sqlite3 backend. However, you should ensure that Rspamd is called at the
+finally delivery stage (e.g. LDA mode) to avoid multi-recipients messages. In case of a multi-recipient message, Rspamd would just use the
+first recipient for user-based statistics which might be inappropriate for your configuration (however, Rspamd preferes SMTP recipients over MIME ones and prioritize
 the special LDA header called `Deliver-To` that can be appended by `-d` options for `rspamc`). To enable per-user statistics, just add `users_enabled = true` property
-to the **classifier** configuration. You can use per-user and per-language statistics simulataneously. For both types of spearation, rspamd also
+to the **classifier** configuration. You can use per-user and per-language statistics simulataneously. For both types of spearation, Rspamd also
 looks to the default language and default user's statistics allowing to have the common set of tokens shared for all users/languages.

 ## Using lua scripts for `per_user` classifier
@@ -115,10 +115,10 @@ EOD

 ## Applying per-user and per-language statistics

-From version 1.1, rspamd uses independent statistics for users and joint statistics for languages. That means the following:
+From version 1.1, Rspamd uses independent statistics for users and joint statistics for languages. That means the following:

-* If `per_user` is enabled then rspamd looks for users statistics **only**
-* If `per_language` is enabled then rspamd looks for language specific statistics **plus** language independent statistics
+* If `per_user` is enabled then Rspamd looks for users statistics **only**
+* If `per_language` is enabled then Rspamd looks for language specific statistics **plus** language independent statistics

 It is different from 1.0 version where the second approach was used for both cases.

@@ -215,7 +215,7 @@ Where the last number is priority used to distinguish master from slave.

 ## Autolearning

-From version 1.1, rspamd supports autolearning for statfiles. Autolearning is applied after all rules are processed (including statistics) if and only if the same symbol has not been inserted. E.g. a message won't be learned as spam if `BAYES_SPAM` is already in the results of checking.
+From version 1.1, Rspamd supports autolearning for statfiles. Autolearning is applied after all rules are processed (including statistics) if and only if the same symbol has not been inserted. E.g. a message won't be learned as spam if `BAYES_SPAM` is already in the results of checking.

 There are 3 possibilities to specify autolearning: