From ce2688ea374d49cac7fd3cbb669f0d50d606f0d3 Mon Sep 17 00:00:00 2001 From: Vsevolod Stakhov Date: Fri, 8 Jul 2016 11:32:43 +0100 Subject: [PATCH] [Doc] Massive documentation rework --- doc/markdown/architecture/index.md | 50 ++++++++-------- doc/markdown/architecture/protocol.md | 20 +++---- doc/markdown/configuration/composites.md | 4 +- doc/markdown/configuration/index.md | 20 +++---- doc/markdown/configuration/logging.md | 14 ++--- doc/markdown/configuration/metrics.md | 20 +++---- doc/markdown/configuration/options.md | 18 +++--- doc/markdown/configuration/settings.md | 4 +- doc/markdown/configuration/statistic.md | 24 ++++---- doc/markdown/index.md | 49 ++++++++++++---- doc/markdown/lua/index.md | 72 ++++++++++++------------ doc/markdown/tutorials/migrate_sa.md | 42 +++++++------- doc/markdown/tutorials/writing_rules.md | 40 ++++++------- doc/markdown/workers/index.md | 8 +-- 14 files changed, 205 insertions(+), 180 deletions(-) diff --git a/doc/markdown/architecture/index.md b/doc/markdown/architecture/index.md index f93ce9818..e4959728c 100644 --- a/doc/markdown/architecture/index.md +++ b/doc/markdown/architecture/index.md @@ -1,8 +1,8 @@ -# rspamd architecture +# Rspamd architecture ## Introduction -rspamd is a universal spam filtering system based on an event-driven processing model, which means that rspamd is not intended to block anywhere in the code. To process messages rspamd uses a set of `rules`. Each `rule` is a symbolic name associated with a message property. For example, we can define the following rules: +rspamd is a universal spam filtering system based on an event-driven processing model, which means that Rspamd is not intended to block anywhere in the code. To process messages Rspamd uses a set of `rules`. Each `rule` is a symbolic name associated with a message property. 
For example, we can define the following rules: - `SPF_ALLOW` - means that a message is validated by SPF; - `BAYES_SPAM` - means that a message is statistically considered as spam; @@ -29,7 +29,7 @@ rspamd uses the HTTP protocol for all operations. This protocol is described in ## Metrics -Rules in rspamd define a logic of checks, but it is required to set up weights for each rule. (For rspamd, weight means `significance`.) Rules with a greater absolute value of weight are considered more important. The weight of rules is defined in `metrics`. Each metric is a set of grouped rules with specific weights. For example, we may define the following weights for our SPF rules: +Rules in Rspamd define a logic of checks, but it is required to set up weights for each rule. (For Rspamd, weight means `significance`.) Rules with a greater absolute value of weight are considered more important. The weight of rules is defined in `metrics`. Each metric is a set of grouped rules with specific weights. For example, we may define the following weights for our SPF rules: - `SPF_ALLOW`: -1 - `SPF_DENY`: 2 @@ -39,7 +39,7 @@ Positive weights mean that this rule increases a messages 'spammyness', while ne ### Rules scheduler -To avoid unnecessary checks rspamd uses a scheduler of rules for each message. If a message is considered as definite spam then further checks are not performed. This scheduler is rather naive and it performs the following logic: +To avoid unnecessary checks Rspamd uses a scheduler of rules for each message. If a message is considered as definite spam then further checks are not performed. This scheduler is rather naive and it performs the following logic: - select negative rules *before* positive ones to prevent false positives; - prefer rules with the following characteristics: @@ -49,11 +49,11 @@ To avoid unnecessary checks rspamd uses a scheduler of rules for each message. I These optimizations can filter definite spam more quickly than a generic queue. 
-Since rspamd-0.9 there are further optimizations for rules and expressions that are described generally in the [following presentation](http://highsecure.ru/ast-rspamd.pdf). +Since Rspamd-0.9 there are further optimizations for rules and expressions that are described generally in the [following presentation](http://highsecure.ru/ast-rspamd.pdf). ## Actions -Another important property of metrics is their actions set. This set defines recommended actions for a message if it reaches a certain score defined by all rules which have been triggered. rspamd defines the following actions: +Another important property of metrics is their actions set. This set defines recommended actions for a message if it reaches a certain score defined by all rules which have been triggered. Rspamd defines the following actions: - `No action`: a message is likely to be ham; - `Greylist`: greylist a message if it is not certainly ham; @@ -65,21 +65,21 @@ These actions are just recommendations for the MTA and are not to be strictly fo ## Rules weight -The weight of rules is not necessarily constant. For example, for statistics rules we have no certain confidence if a message is spam or not; instead we have a measure of probability. To allow fuzzy rules weight, rspamd supports `dynamic weights`. Generally, it means that a rule may add a dynamic range from 0 to a defined weight in the metric. So if we define the symbol `BAYES_SPAM` with a weight of 5.0, then this rule can add a resulting symbol with a weight from 0 to 5.0. To distribute values, rspamd uses a form of Sigma function to provide a fair distribution curve. The majority of rspamd rules, with the exception of fuzzy rules, use static weights. +The weight of rules is not necessarily constant. For example, for statistics rules we have no certain confidence if a message is spam or not; instead we have a measure of probability. To allow fuzzy rules weight, Rspamd supports `dynamic weights`. 
Generally, it means that a rule may add a dynamic range from 0 to a defined weight in the metric. So if we define the symbol `BAYES_SPAM` with a weight of 5.0, then this rule can add a resulting symbol with a weight from 0 to 5.0. To distribute values, Rspamd uses a form of Sigma function to provide a fair distribution curve. The majority of Rspamd rules, with the exception of fuzzy rules, use static weights. ## Statistics -rspamd uses statistic algorithms to precisely calculate the final score of a message. Currently, the only algorithm defined is OSB-Bayes. You can find details of this algorithm in the following [paper](http://osbf-lua.luaforge.net/papers/osbf-eddc.pdf). rspamd uses a window size of 5 words in its classification. During the classification procedure, rspamd splits a message into a set of tokens. Tokens are separated by punctuation or whitespace characters. Short tokens (less than 3 symbols) are ignored. For each token, rspamd calculates two non-cryptographic hashes used subsequently as indices. All these tokens are stored in different statistics backends (mmapped files, sqlite3 database or redis server). Currently, the recommended backend for statistics is `redis`. +rspamd uses statistic algorithms to precisely calculate the final score of a message. Currently, the only algorithm defined is OSB-Bayes. You can find details of this algorithm in the following [paper](http://osbf-lua.luaforge.net/papers/osbf-eddc.pdf). Rspamd uses a window size of 5 words in its classification. During the classification procedure, Rspamd splits a message into a set of tokens. Tokens are separated by punctuation or whitespace characters. Short tokens (less than 3 symbols) are ignored. For each token, Rspamd calculates two non-cryptographic hashes used subsequently as indices. All these tokens are stored in different statistics backends (mmapped files, sqlite3 database or redis server). Currently, the recommended backend for statistics is `redis`. 
-## Running rspamd +## Running Rspamd -There are several command-line options that can be passed to rspamd. All of them can be displayed by passing the `--help` argument. +There are several command-line options that can be passed to Rspamd. All of them can be displayed by passing the `--help` argument. -All options are optional: by default rspamd will try to read the `etc/rspamd.conf` config file and run as a daemon. Also there is a test mode that can be turned on by passing the `-t` argument. In test mode, rspamd reads the config file and checks its syntax. If a configuration file is OK, the exit code is zero. Test mode is useful for testing new config files without restarting rspamd. +All options are optional: by default Rspamd will try to read the `etc/rspamd.conf` config file and run as a daemon. Also there is a test mode that can be turned on by passing the `-t` argument. In test mode, Rspamd reads the config file and checks its syntax. If a configuration file is OK, the exit code is zero. Test mode is useful for testing new config files without restarting Rspamd. -## Managing rspamd using signals +## Managing Rspamd using signals -It is important to note that all user signals should be sent to the rspamd main process and not to its children (as for child processes these signals can have other meanings). You can identify the main process: +It is important to note that all user signals should be sent to the Rspamd main process and not to its children (as for child processes these signals can have other meanings). 
You can identify the main process: - by reading the pidfile: @@ -87,20 +87,20 @@ It is important to note that all user signals should be sent to the rspamd main - by getting process info: - $ ps auxwww | grep rspamd - nobody 28378 0.0 0.2 49744 9424 rspamd: main process - nobody 64082 0.0 0.2 50784 9520 rspamd: worker process - nobody 64083 0.0 0.3 51792 11036 rspamd: worker process - nobody 64084 0.0 2.7 158288 114200 rspamd: controller process - nobody 64085 0.0 1.8 116304 75228 rspamd: fuzzy storage + $ ps auxwww | grep rspamd + nobody 28378 0.0 0.2 49744 9424 rspamd: main process + nobody 64082 0.0 0.2 50784 9520 rspamd: worker process + nobody 64083 0.0 0.3 51792 11036 rspamd: worker process + nobody 64084 0.0 2.7 158288 114200 rspamd: controller process + nobody 64085 0.0 1.8 116304 75228 rspamd: fuzzy storage - $ ps auxwww | grep rspamd | grep main - nobody 28378 0.0 0.2 49744 9424 rspamd: main process + $ ps auxwww | grep rspamd | grep main + nobody 28378 0.0 0.2 49744 9424 rspamd: main process -After getting the pid of the main process it is possible to manage rspamd with signals, as follows: +After getting the pid of the main process it is possible to manage Rspamd with signals, as follows: -- `SIGHUP` - restart rspamd: reread config file, start new workers (as well as controller and other processes), stop accepting connections by old workers, reopen all log files. Note that old workers would be terminated after one minute which should allow processing of all pending requests. All new requests to rspamd will be processed by the newly started workers. -- `SIGTERM` - terminate rspamd. +- `SIGHUP` - restart Rspamd: reread config file, start new workers (as well as controller and other processes), stop accepting connections by old workers, reopen all log files. Note that old workers would be terminated after one minute which should allow processing of all pending requests. All new requests to Rspamd will be processed by the newly started workers. 
+- `SIGTERM` - terminate Rspamd. - `SIGUSR1` - reopen log files (useful for log file rotation). -These signals may be used in rc-style scripts. Restarting of rspamd is performed softly: no connections are dropped and if a new config is incorrect then the old config is used. +These signals may be used in rc-style scripts. Restarting of Rspamd is performed softly: no connections are dropped and if a new config is incorrect then the old config is used. diff --git a/doc/markdown/architecture/protocol.md b/doc/markdown/architecture/protocol.md index 56e1da1d4..09bffd4d1 100644 --- a/doc/markdown/architecture/protocol.md +++ b/doc/markdown/architecture/protocol.md @@ -1,10 +1,10 @@ -# rspamd protocol +# Rspamd protocol ## Protocol basics -rspamd uses the HTTP protocol, either version 1.0 or 1.1. (There is also a compatibility layer described further in this document.) rspamd defines some headers which allow the passing of extra information about a scanned message, such as envelope data, IP address or SMTP sasl authentication data, etc. rspamd supports normal and chunked encoded HTTP requests. +rspamd uses the HTTP protocol, either version 1.0 or 1.1. (There is also a compatibility layer described further in this document.) Rspamd defines some headers which allow the passing of extra information about a scanned message, such as envelope data, IP address or SMTP sasl authentication data, etc. Rspamd supports normal and chunked encoded HTTP requests. -## rspamd HTTP request +## Rspamd HTTP request rspamd encourages the use of the HTTP protocol since it is standard and can be used by every programming language without the use of exotic libraries. A typical HTTP request looks like the following: @@ -28,7 +28,7 @@ Normally, you should just use '/check' here. 
However, if you want to communicate ### HTTP headers -To avoid unnecessary work, rspamd allows an MTA to pass pre-processed data about the message by using either HTTP headers or a JSON control block (described further in this document). rspamd supports the following non-standard HTTP headers: +To avoid unnecessary work, Rspamd allows an MTA to pass pre-processed data about the message by using either HTTP headers or a JSON control block (described further in this document). Rspamd supports the following non-standard HTTP headers: | Header | Description | | :-------------- | :-------------------------------- | @@ -50,13 +50,13 @@ Controller also defines certain headers: Standard HTTP headers, such as `Content-Length`, are also supported. -## rspamd HTTP reply +## Rspamd HTTP reply rspamd reply is encoded in `JSON`. Here is a typical HTTP reply: HTTP/1.1 200 OK Connection: close - Server: rspamd/0.9.0 + Server: rspamd/0.9.0 Date: Mon, 30 Mar 2015 16:19:35 GMT Content-Length: 825 Content-Type: application/json @@ -131,13 +131,13 @@ Additional keys which may be in the reply include: * `urls` - a list of urls found in a message (only hostnames) * `emails` - a list of emails found in a message * `message-id` - ID of message (useful for logging) -* `messages` - array of optional messages added by rspamd filters (such as `SPF`) +* `messages` - array of optional messages added by Rspamd filters (such as `SPF`) -## rspamd JSON control block +## Rspamd JSON control block -Since rspamd version 0.9 it is also possible to pass additional data by prepending a JSON control block to a message. So you can use either headers or a JSON block to pass data from the MTA to rspamd. +Since Rspamd version 0.9 it is also possible to pass additional data by prepending a JSON control block to a message. So you can use either headers or a JSON block to pass data from the MTA to Rspamd. -To use a JSON control block, you need to pass an extra header called `Message-Length` to rspamd. 
This header should be equal to the size of the message **excluding** the JSON control block. Therefore, the size of the control block is equal to `Content-Length - Message-Length`. rspamd assumes that a message starts immediately after the control block (with no extra CRLF). This method is equally compatible with streaming transfer, however even if you are not specifying `Content-Length` you are still required to specify `Message-Length`. +To use a JSON control block, you need to pass an extra header called `Message-Length` to Rspamd. This header should be equal to the size of the message **excluding** the JSON control block. Therefore, the size of the control block is equal to `Content-Length - Message-Length`. Rspamd assumes that a message starts immediately after the control block (with no extra CRLF). This method is equally compatible with streaming transfer, however even if you are not specifying `Content-Length` you are still required to specify `Message-Length`. Here is an example of a JSON control block: diff --git a/doc/markdown/configuration/composites.md b/doc/markdown/configuration/composites.md index 90c633228..c5e97ed1d 100644 --- a/doc/markdown/configuration/composites.md +++ b/doc/markdown/configuration/composites.md @@ -1,4 +1,4 @@ -# rspamd composite symbols +# Rspamd composite symbols ## Introduction @@ -45,7 +45,7 @@ composite { } ~~~ -Composites should not be recursive; this is normally detected by rspamd. +Composites should not be recursive; this is normally detected by Rspamd. ## Composite weight rules diff --git a/doc/markdown/configuration/index.md b/doc/markdown/configuration/index.md index 6d480b7c0..6cc5e049e 100644 --- a/doc/markdown/configuration/index.md +++ b/doc/markdown/configuration/index.md @@ -1,25 +1,25 @@ -# rspamd configuration +# Rspamd configuration -rspamd uses the Universal Configuration Language (UCL) for its configuration. The UCL format is described in detail in this [document](ucl.md). 
rspamd defines several variables and macros to extend +rspamd uses the Universal Configuration Language (UCL) for its configuration. The UCL format is described in detail in this [document](ucl.md). Rspamd defines several variables and macros to extend UCL functionality. -## rspamd variables +## Rspamd variables -- *CONFDIR*: configuration directory for rspamd, found in `$PREFIX/etc/rspamd/` +- *CONFDIR*: configuration directory for Rspamd, found in `$PREFIX/etc/rspamd/` - *RUNDIR*: runtime directory to store pidfiles or unix sockets - *DBDIR*: persistent databases directory (used for statistics or symbols cache). - *LOGDIR*: a directory to store log files - *PLUGINSDIR*: plugins directory for lua plugins - *PREFIX*: basic installation prefix -- *VERSION*: rspamd version string (e.g. "0.6.6") +- *VERSION*: Rspamd version string (e.g. "0.6.6") -## rspamd specific macros +## Rspamd specific macros - *.include_map*: defines a map that is dynamically reloaded and updated if its content has changed. This macro is intended to define dynamic configuration files. -## rspamd basic configuration +## Rspamd basic configuration -The basic rspamd configuration is stored in `$CONFDIR/rspamd.conf`. By default, this file looks like this one: +The basic Rspamd configuration is stored in `$CONFDIR/rspamd.conf`. By default, this file looks like this one: ~~~ucl lua = "$CONFDIR/lua/rspamd.lua" @@ -39,7 +39,7 @@ modules { } ~~~ -In this file, we read a lua script placed in `$CONFDIR/lua/rspamd.lua` and load lua rules from it. Then we include a global [options](options.md) section followed by [logging](logging.md) logging configuration. The [metrics](metrics.md) section defines metric settings, including rule weights and rspamd actions. The [workers](../workers/index.md) section specifies rspamd workers settings. [Composites](composites.md) is a utility section that describes composite symbols. Statistical filters are defined in the [statistic](statistic.md) section. 
rspamd stores module configurations (for both lua and internal modules) in the [modules](../modules/index.md) section while modules themselves are loaded from the following portion of the configuration: +In this file, we read a lua script placed in `$CONFDIR/lua/rspamd.lua` and load lua rules from it. Then we include a global [options](options.md) section followed by [logging](logging.md) logging configuration. The [metrics](metrics.md) section defines metric settings, including rule weights and Rspamd actions. The [workers](../workers/index.md) section specifies Rspamd workers settings. [Composites](composites.md) is a utility section that describes composite symbols. Statistical filters are defined in the [statistic](statistic.md) section. Rspamd stores module configurations (for both lua and internal modules) in the [modules](../modules/index.md) section while modules themselves are loaded from the following portion of the configuration: ~~~ucl modules { @@ -49,4 +49,4 @@ modules { The modules section defines the path or paths of directories or specific files. If a directory is specified then all files with a `.lua` suffix are loaded as lua plugins (the directory path is treated as a `*.lua` shell pattern). -This configuration is not intended to be changed by the user, rather you can include your own configuration options as `.include`s. To redefine symbol weights and actions, it is recommended to use [dynamic configuration](settings.md). Nevertheless, the rspamd installation script will never overwrite a user's configuration if it exists already. Please read the rspamd changelog carefully, if you upgrade rspamd to a new version, for all incompatible configuration changes. +This configuration is not intended to be changed by the user, rather you can include your own configuration options as `.include`s. To redefine symbol weights and actions, it is recommended to use [dynamic configuration](settings.md). 
Nevertheless, the Rspamd installation script will never overwrite a user's configuration if it exists already. Please read the Rspamd changelog carefully, if you upgrade Rspamd to a new version, for all incompatible configuration changes. diff --git a/doc/markdown/configuration/logging.md b/doc/markdown/configuration/logging.md index b7c6c44b2..63e0799b7 100644 --- a/doc/markdown/configuration/logging.md +++ b/doc/markdown/configuration/logging.md @@ -1,4 +1,4 @@ -# rspamd logging settings +# Rspamd logging settings ## Introduction rspamd has a number of logging options. Firstly, there are three types of log output that are supported: console logging (just output log messages to console), file logging (output log messages to file) and logging via syslog. It is also possible to restrict logging to a specific level: @@ -8,17 +8,17 @@ rspamd has a number of logging options. Firstly, there are three types of log ou * `info` - log all non-debug messages * `debug` - log all including debug messages (huge amount of logging) -It is possible to turn on debug messages for specific ip addresses. This can be useful for testing. For each logging type there are special mandatory parameters: log facility for syslog (read `syslog(3)` man page for details about facilities), log file for file logging. Also, file logging may be buffered for performance. To reduce logging noise, rspamd detects sequential matching log messages and replaces them with a total number of repeats: +It is possible to turn on debug messages for specific ip addresses. This can be useful for testing. For each logging type there are special mandatory parameters: log facility for syslog (read `syslog(3)` man page for details about facilities), log file for file logging. Also, file logging may be buffered for performance. 
To reduce logging noise, Rspamd detects sequential matching log messages and replaces them with a total number of repeats: - #81123(fuzzy): May 11 19:41:54 rspamd file_log_function: Last message repeated 155 times - #81123(fuzzy): May 11 19:41:54 rspamd process_write_command: fuzzy hash was successfully added + #81123(fuzzy): May 11 19:41:54 rspamd file_log_function: Last message repeated 155 times + #81123(fuzzy): May 11 19:41:54 rspamd process_write_command: fuzzy hash was successfully added ## Unique id -From version 1.0, rspamd logs contain a unique id for each logging message. This allows finding relevant messages quickly. Moreover, there is now a `module` definition: for example, `task` or `cfg` modules. Here is a quick example of how it works: imagine that we have an incoming task for some message. Then you'd see something like this in the logs: +From version 1.0, Rspamd logs contain a unique id for each logging message. This allows finding relevant messages quickly. Moreover, there is now a `module` definition: for example, `task` or `cfg` modules. Here is a quick example of how it works: imagine that we have an incoming task for some message. Then you'd see something like this in the logs: 2015-09-02 16:41:59 #45015(normal) <ed2abb>; task; accept_socket: accepted connection from ::1 port 52895 - 2015-09-02 16:41:59 #45015(normal) <ed2abb>; task; rspamd_message_parse: loaded message; id: ; queue-id: + 2015-09-02 16:41:59 #45015(normal) <ed2abb>; task; rspamd_message_parse: loaded message; id: ; queue-id: So the tag is `ed2abb` in this case. All subsequent processing related to this task will have the same tag. It is enabled not only on the `task` module, but also others, such as the `spf` or `lua` modules. For other modules, such as `cfg`, the tag is generated statically using a specific characteristic, for example the configuration file checksum. 
@@ -44,7 +44,7 @@ Here is summary of logging parameters: + `dkim` - messages from dkim module + `main` - messages from the main process + `dns` - messages from DNS resolver - + `map` - messages from maps in rspamd + + `map` - messages from maps in Rspamd + `logger` - messages from the logger itself ### Log format diff --git a/doc/markdown/configuration/metrics.md b/doc/markdown/configuration/metrics.md index 3ec495db1..8c6d55fdd 100644 --- a/doc/markdown/configuration/metrics.md +++ b/doc/markdown/configuration/metrics.md @@ -1,10 +1,10 @@ -# rspamd metrics settings +# Rspamd metrics settings ## Introduction -The metrics section configures weights for symbols and actions applied to a message by rspamd. You can imagine a metric as a decision made by rspamd for a specific message by a set of rules. Each rule can insert a `symbol` into the metric, which means that this rule is true for this message. Each symbol can have a floating point value called a `weight`, which means the significance of the corresponding rule. Rules with a positive weight increase the spam factor, while rules with negative weights increase the ham factor. The result is the overall message score. +The metrics section configures weights for symbols and actions applied to a message by Rspamd. You can imagine a metric as a decision made by Rspamd for a specific message by a set of rules. Each rule can insert a `symbol` into the metric, which means that this rule is true for this message. Each symbol can have a floating point value called a `weight`, which means the significance of the corresponding rule. Rules with a positive weight increase the spam factor, while rules with negative weights increase the ham factor. The result is the overall message score. -After a score is evaluated, rspamd selects an appropriate `action` for a message. rspamd defines the following actions, ordered by spam factor, in ascending order: +After a score is evaluated, Rspamd selects an appropriate `action` for a message. 
Rspamd defines the following actions, ordered by spam factor, in ascending order: 1. `no action` - a message is likely ham 2. `greylist` - a message should be greylisted to ensure sender's validity @@ -13,12 +13,12 @@ After a score is evaluated, rspamd selects an appropriate `action` for a message 5. `soft reject` - temporarily reject a message 6. `reject` - permanently reject a message -Actions are assumed to be applied simultaneously, meaning that the `add header` action implies, for example, the `greylist` action. `add header` and `rewrite subject` are equivalent to rspamd. They are just two options with the same purpose: to mark a message as probable spam. The `soft reject` action is mainly used to indicate temporary issues in mail delivery, for instance, exceeding a rate limit. +Actions are assumed to be applied simultaneously, meaning that the `add header` action implies, for example, the `greylist` action. `add header` and `rewrite subject` are equivalent to Rspamd. They are just two options with the same purpose: to mark a message as probable spam. The `soft reject` action is mainly used to indicate temporary issues in mail delivery, for instance, exceeding a rate limit. -There is also a special purpose metric called `default` that acts as the main metric to treat a message as spam or ham. Actually, all clients that use rspamd just check the default metric to determine whether a message is spam or ham. Therefore, the default configuration just defines the `default` metric. +There is also a special purpose metric called `default` that acts as the main metric to treat a message as spam or ham. Actually, all clients that use Rspamd just check the default metric to determine whether a message is spam or ham. Therefore, the default configuration just defines the `default` metric. ## Configuring metrics -Each metric is defined by a `metric` object in the rspamd configuration file. 
This object has one mandatory attribute - `name` - which defines the name of the metric: +Each metric is defined by a `metric` object in the Rspamd configuration file. This object has one mandatory attribute - `name` - which defines the name of the metric: ~~~ucl metric { @@ -41,9 +41,9 @@ $$ By default this value is `1.0` meaning that no weight growing is defined. By increasing this value you increase the effective score of messages with multiple `spam` rules matched. This value is not affected by negative score values. * `subject` - string value that is prepended to the message's subject if the `rewrite subject` action is applied -* `unknown_weight` - weight for unknown rules. If this parameter is specified, all rules can add symbols to this metric. If such a rule is not specified by this metric then its weight is equal to this option's value. Please note, that adding this option means that all rules will be checked by rspamd, on the contrary, if no `unknown_weight` metric is specified then rules that are not registered anywhere are silently ignored by rspamd. +* `unknown_weight` - weight for unknown rules. If this parameter is specified, all rules can add symbols to this metric. If such a rule is not specified by this metric then its weight is equal to this option's value. Please note, that adding this option means that all rules will be checked by Rspamd, on the contrary, if no `unknown_weight` metric is specified then rules that are not registered anywhere are silently ignored by Rspamd. -The content of this section is in two parts: symbols and actions. Actions is an object of all actions defined by this metric. If some actions are skipped, they won't be ever suggested by rspamd. The Actions section looks as follows: +The content of this section is in two parts: symbols and actions. Actions is an object of all actions defined by this metric. If some actions are skipped, they won't be ever suggested by Rspamd. 
The Actions section looks as follows: ~~~ucl metric { @@ -65,7 +65,7 @@ Symbols are defined by an object with the following properties: * `name` - symbolic name for a symbol (mandatory attribute) * `group` - a group of symbols, for example `DNSBL symbols` (as shown in WebUI) * `description` - optional symbolic description for WebUI -* `one_shot` - normally, rspamd inserts a symbol as many times as the corresponding rule matches for the specific message; however, if `one_shot` is `true` then only the **maximum** weight is added to the metric. `grow_factor` is correspondingly not modified by a repeated triggering of `one_shot` rules. +* `one_shot` - normally, Rspamd inserts a symbol as many times as the corresponding rule matches for the specific message; however, if `one_shot` is `true` then only the **maximum** weight is added to the metric. `grow_factor` is correspondingly not modified by a repeated triggering of `one_shot` rules. A symbol definition can look like this: @@ -82,7 +82,7 @@ A single metric can contain multiple symbols definitions. ## Symbol groups -Symbols can be grouped to specify their common functionality. For example, one could group all `RBL` symbols together. Moreover, from rspamd version 0.9 it is possible to specify a group score limit, which could be useful, for instance, if a specific group should not unconditionally send a message to the `spam` class. Here is an example of such a functionality: +Symbols can be grouped to specify their common functionality. For example, one could group all `RBL` symbols together. Moreover, from Rspamd version 0.9 it is possible to specify a group score limit, which could be useful, for instance, if a specific group should not unconditionally send a message to the `spam` class. 
Here is an example of such a functionality: ~~~ucl metric { diff --git a/doc/markdown/configuration/options.md b/doc/markdown/configuration/options.md index 0f08d3369..fcf524288 100644 --- a/doc/markdown/configuration/options.md +++ b/doc/markdown/configuration/options.md @@ -1,8 +1,8 @@ -# rspamd options settings +# Rspamd options settings ## Introduction -The options section defines basic rspamd behaviour. Options are global for all types of workers. The default options are shown in the following example snippet: +The options section defines basic Rspamd behaviour. Options are global for all types of workers. The default options are shown in the following example snippet: ~~~ucl filters = "chartable,dkim,spf,surbl,regexp,fuzzy_check"; @@ -32,29 +32,29 @@ control_socket = "$DBDIR/rspamd.sock mode=0600"; ## Global options -* `filters`: comma separated string that defines enabled **internal** rspamd filters; for a list of the internal filters please check the [modules page](../modules/) +* `filters`: comma separated string that defines enabled **internal** Rspamd filters; for a list of the internal filters please check the [modules page](../modules/) * `one_shot`: if this flag is set to `true` then multiple rule triggers do not increase the total score of messages (however, this option can also be individually configured in the `metric` section for each symbol) -* `cache_file`: used to store information about rules and their statistics; this file is automatically generated if rspamd detects that a symbol's list has been changed. +* `cache_file`: used to store information about rules and their statistics; this file is automatically generated if Rspamd detects that a symbol's list has been changed. 
* `map_watch_interval`: interval between map scanning; the actual check interval is jittered to avoid simultaneous checking, so the real interval is from this value up to 2x this value * `check_all_filters`: turns off optimizations when a message gains an overall score more than the `reject` score for the default metric; this optimization can also be turned off for each request individually * `history_file`: this file is automatically created and refreshed on shutdown to preserve the rolling history of operations displayed by the WebUI across restarts * `temp_dir`: a directory for temporary files (can also be set via the environment variable `TMPDIR`). -* `url_tld`: path to file with top level domain suffixes used by rspamd to find URLs in messages; by default this file is shipped with rspamd and should not be touched manually -* `pid_file`: file used to store pid of the rspamd main process (not used with systemd) -* `min_word_len`: minimum size in letters (valid for utf-8 as well) for a sequence of characters to be treated as a word; normally rspamd skips sequences if they are shorter or equal to three symbols +* `url_tld`: path to file with top level domain suffixes used by Rspamd to find URLs in messages; by default this file is shipped with Rspamd and should not be touched manually +* `pid_file`: file used to store pid of the Rspamd main process (not used with systemd) +* `min_word_len`: minimum size in letters (valid for utf-8 as well) for a sequence of characters to be treated as a word; normally Rspamd skips sequences if they are shorter or equal to three symbols * `control_socket`: path/bind for the control socket * `classify_headers`: list of headers that are processed by statistics * `history_rows`: number of rows in the recent history table * `explicit_modules`: always load modules from the list even if they have no configuration section in the file * `disable_hyperscan`: disable hyperscan optimizations (if enabled at compile time) -* `cores_dir`: 
directory where rspamd should drop core files +* `cores_dir`: directory where Rspamd should drop core files * `max_cores_size`: maximum total size of core files that are placed in `cores_dir` * `max_cores_count`: maximum number of files in `cores_dir` * `local_addrs` or `local_networks`: map or list of ip networks used as local, so certain checks are skipped for them (e.g. SPF checks) ## DNS options -These options are in a separate subsection named `dns` and specify the behaviour of rspamd name resolution. Here is a list of available tunables: +These options are in a separate subsection named `dns` and specify the behaviour of Rspamd name resolution. Here is a list of available tunables: * `nameserver`: list (or array) of DNS servers to be used (if this option is skipped, then `/etc/resolv.conf` is parsed instead). It is also possible to specify weights of DNS servers to balance the payload, e.g. diff --git a/doc/markdown/configuration/settings.md b/doc/markdown/configuration/settings.md index 31b175344..0f4a70af2 100644 --- a/doc/markdown/configuration/settings.md +++ b/doc/markdown/configuration/settings.md @@ -1,8 +1,8 @@ -# rspamd user settings +# Rspamd user settings ## Introduction -rspamd allows exceptional control over the settings which will apply to incoming messages. Each setting can define a set of custom metric weights, symbols or actions. An administrator can also skip spam checks for certain messages completely, if required. rspamd settings can be loaded as dynamic maps and updated automatically if a corresponding file or URL has changed since its last update. +rspamd allows exceptional control over the settings which will apply to incoming messages. Each setting can define a set of custom metric weights, symbols or actions. An administrator can also skip spam checks for certain messages completely, if required. Rspamd settings can be loaded as dynamic maps and updated automatically if a corresponding file or URL has changed since its last update. 
To load settings as a dynamic map, you can set 'settings' to a map string: diff --git a/doc/markdown/configuration/statistic.md b/doc/markdown/configuration/statistic.md index 18e870652..26b2b70e7 100644 --- a/doc/markdown/configuration/statistic.md +++ b/doc/markdown/configuration/statistic.md @@ -2,7 +2,7 @@ ## Introduction -Statistics is used by rspamd to define the `class` of message: either spam or ham. The overall algorithm is based on Bayesian theorem +Statistics is used by Rspamd to define the `class` of message: either spam or ham. The overall algorithm is based on Bayesian theorem that defines probabilities combination. In general, it defines the probability of that a message belongs to the specified class (namely, `spam` or `ham`) base on the following factors: @@ -11,13 +11,13 @@ base on the following factors: ## Statistics Architecture -However, rspamd uses more advanced techniques to combine probabilities, such as sparsed bigramms (OSB) and inverse chi-square distribution. +However, Rspamd uses more advanced techniques to combine probabilities, such as sparsed bigramms (OSB) and inverse chi-square distribution. The key idea of `OSB` algorithm is to use not merely single words as tokens but combinations of words weighted by theirs positions. This schema is displayed in the following picture: ![OSB algorithm](https://rspamd.com/img/rspamd-schemes.004.png "Rspamd OSB scheme") -The main disadvantage is the amount of tokens which is multiplied by size of window. In rspamd, we use a window of 5 tokens that means that +The main disadvantage is the amount of tokens which is multiplied by size of window. In Rspamd, we use a window of 5 tokens that means that the number of tokens is about 5 times larger than the amount of words. Statistical tokens are stored in statfiles which, in turn, are mapped to specific backends. 
This architecture is displayed in the following image: @@ -26,7 +26,7 @@ Statistical tokens are stored in statfiles which, in turn, are mapped to specifi ## Statistics Configuration -Starting from rspamd 1.0, we propose to use `sqlite3` as backed and `osb` as tokenizer. That also enables additional features, such as tokens normalization and +Starting from Rspamd 1.0, we propose to use `sqlite3` as backed and `osb` as tokenizer. That also enables additional features, such as tokens normalization and metainformation in statistics. The following configuration demonstrates the recommended statistics configuration: ~~~ucl @@ -63,11 +63,11 @@ classifier "bayes" { } ~~~ -It is also possible to organize per-user statistics using sqlite3 backend. However, you should ensure that rspamd is called at the -finally delivery stage (e.g. LDA mode) to avoid multi-recipients messages. In case of a multi-recipient message, rspamd would just use the -first recipient for user-based statistics which might be inappropriate for your configuration (however, rspamd preferes SMTP recipients over MIME ones and prioritize +It is also possible to organize per-user statistics using sqlite3 backend. However, you should ensure that Rspamd is called at the +finally delivery stage (e.g. LDA mode) to avoid multi-recipients messages. In case of a multi-recipient message, Rspamd would just use the +first recipient for user-based statistics which might be inappropriate for your configuration (however, Rspamd preferes SMTP recipients over MIME ones and prioritize the special LDA header called `Deliver-To` that can be appended by `-d` options for `rspamc`). To enable per-user statistics, just add `users_enabled = true` property -to the **classifier** configuration. You can use per-user and per-language statistics simulataneously. For both types of spearation, rspamd also +to the **classifier** configuration. You can use per-user and per-language statistics simulataneously. 
For both types of spearation, Rspamd also looks to the default language and default user's statistics allowing to have the common set of tokens shared for all users/languages. ## Using lua scripts for `per_user` classifier @@ -115,10 +115,10 @@ EOD ## Applying per-user and per-language statistics -From version 1.1, rspamd uses independent statistics for users and joint statistics for languages. That means the following: +From version 1.1, Rspamd uses independent statistics for users and joint statistics for languages. That means the following: -* If `per_user` is enabled then rspamd looks for users statistics **only** -* If `per_language` is enabled then rspamd looks for language specific statistics **plus** language independent statistics +* If `per_user` is enabled then Rspamd looks for users statistics **only** +* If `per_language` is enabled then Rspamd looks for language specific statistics **plus** language independent statistics It is different from 1.0 version where the second approach was used for both cases. @@ -215,7 +215,7 @@ Where the last number is priority used to distinguish master from slave. ## Autolearning -From version 1.1, rspamd supports autolearning for statfiles. Autolearning is applied after all rules are processed (including statistics) if and only if the same symbol has not been inserted. E.g. a message won't be learned as spam if `BAYES_SPAM` is already in the results of checking. +From version 1.1, Rspamd supports autolearning for statfiles. Autolearning is applied after all rules are processed (including statistics) if and only if the same symbol has not been inserted. E.g. a message won't be learned as spam if `BAYES_SPAM` is already in the results of checking. 
There are 3 possibilities to specify autolearning: diff --git a/doc/markdown/index.md b/doc/markdown/index.md index 1a74b7e7e..68955f8e3 100644 --- a/doc/markdown/index.md +++ b/doc/markdown/index.md @@ -1,16 +1,41 @@ -# Rspamd documentation project +# Rspamd documentation -## Introduction -Rspamd is a fast and advanced spam filtering system. It is based on an event-driven processing model which allows it to work with multiple messages simultaneously without blocking anywhere during message processing. Rspamd contains various modules shipped in the default distribution and allows extension with custom modules and rules written in [Lua](http://lua.org). +## Tutorials and introduction documents -Rspamd uses a complex estimation system based on a set of rules. Each of these rules has its own score and the final score of a message is defined as the sum of the scores that were true for that message. This approach is similar to other complex spam filtering systems, such as [SpamAssassin](http://spamassassin.apache.org). Rspamd also implements fuzzy logic, including fuzzy hashes and a statistics module, to process messages. +Here are the main introduction documents that are recommended for reading if you are going to use Rspamd in your mail system. 
-## Table of Contents +* **[Quick Start](quick_start.md)** - learn how to install, setup and perform initial configuring of Rspamd +* **[Upgrading](migration.md)** - the list of incompatible changes between versions of Rspamd +* **[Frequently asked questions](faq.md)** - common questions about Rspamd and Rmilter +* **[Migrating from SA](migrate_sa.md)** - the guide for those who wants to migrate an existing SpamAssassin system to Rspamd +* **[MTA integration](integration.md)** document describes how to integrate Rspamd into your mail infrastructure +* **[Creating your fuzzy storage](http://rspamd.com/doc/fuzzy_storage.html)** document provides information about how to make your own hashes storage and how to learn it efficiently -- [Tutorials](tutorials/) a collection of tutorial-like documents for rspamd -- [Architecture](architecture/) presents the architecture of rspamd and explains how spam filtering is performed -- [Rspamd configuration](configuration/) describes the principles of rspamd configuration -- [Modules](modules/) lists rspamd modules and defines their configuration attributes -- [Workers](workers/) describes worker processes that are implemented in rspamd -- [Lua API](lua/) explains how to extend rspamd with lua modules -- [Migration](migration.md) contains the list of incompatible changes between rspamd versions and recommendations on how to update your rspamd system +### Rspamd and Dovecot Antispam integration + +* [Training Rspamd with dovecot antispam plugin, part 1](https://kaworu.ch/blog/2014/03/25/dovecot-antispam-with-Rspamd/) - this tutorial describes how to train Rspamd automatically using the `antispam` pluging of the `dovecot` IMAP server +* [Training Rspamd with dovecot antispam plugin, part 2](https://kaworu.ch/blog/2015/10/12/dovecot-antispam-with-Rspamd-part2/) - continuation of the previous tutorial + +## Configuration + +This section contains documents about various configuration details. 
+ +* **[General information](./configuration/index.md)** explains basic principles of Rspamd configuration +* **[Modules documentation](./modules/)** gives a detailed description of each Rspamd module +* **[Workers documentation](./workers/)** contains information about different Rspamd worker processes: scanners, controller, fuzzy storage and so on +* **[Users settings description](./configuration/settings.md)** could be useful if you need to setup per-user configuration or want process mail in different ways, for example, for inbound and outbound messages. + +## Architecture + +These documents are useful if you need to know details about Rspamd internals. + +* **[General information](./architecture/index.md)** provides an overview of the Rspamd architecture +* **[Protocol documentation](./architecture/protocol.md)** describes Rspamd protocol which is used to communicate with external tools, such as Rmilter or `rspamc` client utility + + +## Extending Rspamd + +This section contains documents about writing new rules for Rspamd and, in particular, Rspamd Lua API. + +* **[Writing Rspamd rules](./tutorials/writing_rules.md)** is a step-by-step guide that describes how to write rules for Rspamd +* **[LUA API reference](./lua/)** provides the extensive information about all LUA modules available in Rspamd diff --git a/doc/markdown/lua/index.md b/doc/markdown/lua/index.md index d003959d9..8015c6171 100644 --- a/doc/markdown/lua/index.md +++ b/doc/markdown/lua/index.md @@ -1,10 +1,10 @@ -# Rspamd lua API {#top} +# Rspamd Lua API {#top} -Rspamd lua api is a core part of rspamd functionality. Lua is used for writing rules and plugins in rspamd. There are several objects and libraries that simplify classifying of mail. +Lua api is a core part of Rspamd functionality. [Lua language](http://www.lua.org) is used for writing rules and plugins. -## Using lua API from rules {#luarules} +## Using Lua API from rules {#luarules} -Many lua rules are shipped with rspamd. 
They can be included to rspamd by using tag **lua** in rspamd.conf: +Many Lua rules are shipped with Rspamd. They can be included to Rspamd by using tag **lua** in Rspamd.conf: ~~~ucl lua = "$CONFDIR/lua/rspamd.lua" @@ -12,7 +12,7 @@ lua = "$CONFDIR/lua/rspamd.lua" ### Global configuration tables {#luaglobal} -While load of this file rspamd defines two global variables: +While load of this file Rspamd defines two global variables: - *config* - a global table of modules configuration. Here is a sample of usage of this table: ~~~lua @@ -70,7 +70,7 @@ classifiers['bayes'] = function(classifier, task, is_learn, is_spam) for _,st in pairs(classifier:get_statfiles()) do local st_l = st:get_param('language') if st_l and st_l == language then - -- Insert statfile with specified language + -- Insert statfile with specified language table.insert(selected, st) end end @@ -100,7 +100,7 @@ end ## Writing advanced rules {#luarules} -So by using these two tables it is possible to configure rules and metrics. Also note that it is possible to use any lua functions and rspamd libraries: +So by using these two tables it is possible to configure rules and metrics. Also note that it is possible to use any Lua functions and Rspamd libraries: ~~~lua -- Declare variable that contains regexp rule definition @@ -110,14 +110,14 @@ local rulebody = string.format('%s & !%s', '/re1/', '/re2') rspamd_logger.info('Loaded test rule: ' .. 
rulebody) ~~~ -Also it is possible to declare functions and use `closures` when defining rspamd rules: +Also it is possible to declare functions and use `closures` when defining Rspamd rules: ~~~lua -- Here is a sample of using closure function inside rule local function check_headers_tab(task, header_name) -- Extract raw headers from message local raw_headers = task:get_raw_header(header_name) - -- Make match of headers, that are separated with tabs, not spaces + -- Make match of headers, that are separated with tabs, not spaces if raw_headers then for _,rh in ipairs(raw_headers) do if rh['tab_separated'] then @@ -127,7 +127,7 @@ local function check_headers_tab(task, header_name) end end return false -end +end rspamd_config.HEADER_TAB_FROM_WHITELISTED = function(task) return check_headers_tab(task, "From") end rspamd_config.HEADER_TAB_TO_WHITELISTED = function(task) return check_headers_tab(task, "To") end @@ -137,15 +137,15 @@ rspamd_config.HEADER_TAB_DATE_WHITELISTED = function(task) return check_headers_ rspamd_config.R_EMPTY_IMAGE = { callback = function(task) local tp = task:get_text_parts() -- get text parts in a message - + for _,p in ipairs(tp) do -- iterate over text parts array using `ipairs` if p:is_html() then -- if the current part is html part local hc = p:get_html() -- we get HTML context local len = p:get_length() -- and part's length - + if len < 50 then -- if we have a part that has less than 50 bytes of text local images = hc:get_images() -- then we check for HTML images - + if images then -- if there are images for _,i in ipairs(images) do -- then iterate over images in the part if i['height'] + i['width'] >= 400 then -- if we have a large image @@ -169,19 +169,19 @@ rspamd_config.R_EMPTY_IMAGE = { } ~~~ -Using lua in rules provides many abilities to write complex mail filtering rules. +Using Lua in rules provides many abilities to write complex mail filtering rules. 
-## Writing lua plugins {#luaplugins} +## Writing Lua plugins {#luaplugins} -Plugins are more complex filters than ordinary rules. Plugins can have their own configuration parameters and multiple callbacks. Plugins can make DNS requests, read from rspamd maps and insert custom results. +Plugins are more complex filters than ordinary rules. Plugins can have their own configuration parameters and multiple callbacks. Plugins can make DNS requests, read from Rspamd maps and insert custom results. ### Structure of the typical plugin -Each rspamd plugin has a common structure: +Each Rspamd plugin has a common structure: - Registering configuration parameters - Reading configuration parameters and set up callbacks -- Callbacks that are called by rspamd during message processing +- Callbacks that are called by Rspamd during message processing Here is a simple plugin example: @@ -195,12 +195,12 @@ end -- Reading configuration -- Get all options for this plugin -local opts = rspamd_config:get_all_opt('sample') +local opts = Rspamd_config:get_all_opt('sample') if opts then if opts['config'] then - config_param = opts['config'] + config_param = opts['config'] -- Register callback - rspamd_config:register_symbol('some_symbol', sample_callback) + Rspamd_config:register_symbol('some_symbol', sample_callback) end end ~~~ @@ -209,10 +209,10 @@ This plugin uses global variable *rspamd_config* to extract configuration option ### Using DNS requests inside plugins -It is often required to make DNS requests for messages checks. Here is an example of making asynchronous DNS request from rspamd lua plugin: +It is often required to make DNS requests for messages checks. 
Here is an example of making asynchronous DNS request from Rspamd Lua plugin: ~~~lua --- Function-callback of rspamd rule +-- Function-callback of Rspamd rule local function symbol_cb(task) -- Task is now local variable @@ -224,37 +224,37 @@ local function symbol_cb(task) end end -- Resolve 'example.com' using primitives from the task passed - task:get_resolver():resolve_a(task:get_session(), task:get_mempool(), + task:get_resolver():resolve_a(task:get_session(), task:get_mempool(), 'example.com', dns_cb, 'sample string') end ~~~ -### Using maps from lua plugin +### Using maps from Lua plugin Maps hold dynamically loaded data like lists or ip trees. It is possible to use 3 types of maps: -* **radix_tree** stores ip addresses +* **radix_tree** stores ip addresses * **hash_map** stores plain strings (domains usually) -* **callback** call for a specified lua callback when a map is loaded or changed, map's content is passed to that callback as a parameter +* **callback** call for a specified Lua callback when a map is loaded or changed, map's content is passed to that callback as a parameter -Here is a sample of using maps from lua API: +Here is a sample of using maps from Lua API: ~~~lua -local rspamd_logger = require "rspamd_logger" +local Rspamd_logger = require "rspamd_logger" -- Add two maps in configuration section -local hash_map = rspamd_config:add_hash_map('file:///path/to/file', 'sample map') -local radix_tree = rspamd_config:add_radix_map('http://somehost.com/test.dat', 'sample ip map') -local generic_map = rspamd_config:add_map('file:///path/to/file', 'sample generic map', +local hash_map = Rspamd_config:add_hash_map('file:///path/to/file', 'sample map') +local radix_tree = Rspamd_config:add_radix_map('http://somehost.com/test.dat', 'sample ip map') +local generic_map = Rspamd_config:add_map('file:///path/to/file', 'sample generic map', function(str) -- This callback is called when a map is loaded or changed -- Str contains map content - 
rspamd_logger.info('Got generic map content: ' .. str) + Rspamd_logger.info('Got generic map content: ' .. str) end) local function sample_symbol_cb(task) -- Check whether hash map contains from address of message - if hash_map:get_key(task:get_from()) then + if hash_map:get_key(task:get_from()) then -- Check whether radix map contains client's ip if radix_map:get_key(task:get_from_ip_num()) then ... @@ -265,9 +265,9 @@ end ## Conclusions {#luaconclusion} -Lua plugins is a powerful tool for creating complex filters that can access practically all features of rspamd. Lua plugins can be used for writing custom rules and interact with rspamd in many ways, can use maps and make DNS requests. Rspamd is shipped with a couple of lua plugins that can be used as examples while writing your own plugins. +Lua plugins is a powerful tool for creating complex filters that can access practically all features of Rspamd. Lua plugins can be used for writing custom rules and interact with Rspamd in many ways, can use maps and make DNS requests. Rspamd is shipped with a couple of Lua plugins that can be used as examples while writing your own plugins. ## References {#luareference} - [Lua manual](http://www.lua.org/manual/5.2/) -- [Programming in lua](http://www.lua.org/pil/) +- [Programming in Lua](http://www.lua.org/pil/) diff --git a/doc/markdown/tutorials/migrate_sa.md b/doc/markdown/tutorials/migrate_sa.md index 6af9e8441..3ccc6de4a 100644 --- a/doc/markdown/tutorials/migrate_sa.md +++ b/doc/markdown/tutorials/migrate_sa.md @@ -1,34 +1,34 @@ # Migrating from SpamAssassin to Rspamd -This guide provides information for those who wants to migrate an existing system from [SpamAssassin](https://spamassassin.apache.org) to rspamd. You will find information about major differences between the spam filtering engines and how to deal with the transition process. 
+This guide provides information for those who wants to migrate an existing system from [SpamAssassin](https://spamassassin.apache.org) to Rspamd. You will find information about major differences between the spam filtering engines and how to deal with the transition process. -## Why migrate to rspamd +## Why migrate to Rspamd -rspamd runs **significantly faster** than SpamAssassin while providing approximately the same quality of filtering. However, if you don't care about the performance and resource consumption of your spam filtering engine you might still find rspamd useful because it has a simple but powerful web management system (WebUI). +rspamd runs **significantly faster** than SpamAssassin while providing approximately the same quality of filtering. However, if you don't care about the performance and resource consumption of your spam filtering engine you might still find Rspamd useful because it has a simple but powerful web management system (WebUI). On the other hand, if you have a lot of custom rules, or you use Pyzor/Razor/DCC, or you have some commercial 3rd party products that depend on SpamAssassin then you may not want to migrate. -In short: rspamd is for **speed**! +In short: Rspamd is for **speed**! ## What about dspam/spamoracle...? -You could also move from these projects to rspamd. You should bear in mind, however, that rspamd and SA are multi-factor spam filtering systems that use three main approaches to filter messages: +You could also move from these projects to Rspamd. 
You should bear in mind, however, that Rspamd and SA are multi-factor spam filtering systems that use three main approaches to filter messages: * Content filtering - static rules that are designed to find known bad patterns in messages (usually regexp or other custom rules) * Dynamic lists - DNS or reputation lists that are used to filter known bad content, such as abused IP addresses or URL domains * Statistical filters - which learn to distinguish spam and ham messages -`dspam`, `spamoracle` and others usually implement the third approach, only providing statistical filtering. This method is quite powerful but it can cause false-positives and is not very suitable for multi-user environments. rspamd and SA, in contrast, are designed for systems with many users. rspamd, in particular, was written for a very large system with more than 40 million users and about 10 million emails per hour. +`dspam`, `spamoracle` and others usually implement the third approach, only providing statistical filtering. This method is quite powerful but it can cause false-positives and is not very suitable for multi-user environments. Rspamd and SA, in contrast, are designed for systems with many users. Rspamd, in particular, was written for a very large system with more than 40 million users and about 10 million emails per hour. ## Before you start There are a couple of things you need to know before transition: -1. rspamd does not support SpamAssassin statistics so you'd need to **train** your filter from scratch with spam and ham samples (or install the [pre-built statistics](https://rspamd.com/rspamd_statistics/)). rspamd uses a different statistical engine - called [OSB-Bayes](http://osbf-lua.luaforge.net/papers/trec2006_osbf_lua.pdf) - which is intended to be more precise than SA's 'naive' Bayes classifier -2. 
rspamd uses `Lua` for plugins and rules, so basic knowledge of this language is more than useful for playing with rspamd; however, Lua is very simple and can be learned [very quickly](http://lua-users.org/wiki/LuaTutorial) -3. rspamd uses the `HTTP` protocol to communicate with the MTA or milter, so SA native milters might not communicate with rspamd. There is some limited support of the SpamAssassin protocol, though some commands are not supported, in particular those which require copying of data between scanner and milter. More importantly, `Length`-less messages are not supported by rspamd as they completely break HTTP semantics and will never be supported. To achieve the same functionality, a dedicated scanner could use, e.g. HTTP `chunked` encoding. -4. rspamd is **NOT** intended to work with blocking libraries or services, hence, something like `mysql` or `postgresql` will likely not be supported -5. rspamd is developing quickly so you should be aware that there might be some incompatible changes between major versions - they are usually listed in the [migration](../migration.md) section of the site. +1. Rspamd does not support SpamAssassin statistics so you'd need to **train** your filter from scratch with spam and ham samples (or install the [pre-built statistics](https://rspamd.com/rspamd_statistics/)). Rspamd uses a different statistical engine - called [OSB-Bayes](http://osbf-lua.luaforge.net/papers/trec2006_osbf_lua.pdf) - which is intended to be more precise than SA's 'naive' Bayes classifier +2. Rspamd uses `Lua` for plugins and rules, so basic knowledge of this language is more than useful for playing with Rspamd; however, Lua is very simple and can be learned [very quickly](http://lua-users.org/wiki/LuaTutorial) +3. Rspamd uses the `HTTP` protocol to communicate with the MTA or milter, so SA native milters might not communicate with Rspamd. 
There is some limited support of the SpamAssassin protocol, though some commands are not supported, in particular those which require copying of data between scanner and milter. More importantly, `Length`-less messages are not supported by Rspamd as they completely break HTTP semantics and will never be supported. To achieve the same functionality, a dedicated scanner could use, e.g. HTTP `chunked` encoding. +4. Rspamd is **NOT** intended to work with blocking libraries or services, hence, something like `mysql` or `postgresql` will likely not be supported +5. Rspamd is developing quickly so you should be aware that there might be some incompatible changes between major versions - they are usually listed in the [migration](../migration.md) section of the site. 6. Unlike SA where there are only `spam` and `ham` results, Rspamd supports five levels of messages called `actions`: + `no action` - ham message + `greylist` - turn on adaptive greylisting (which is also used on higher levels) @@ -36,19 +36,19 @@ There are a couple of things you need to know before transition: + `rewrite subject` - rewrite subject to `*** SPAM *** original subject` + `reject` - ultimately reject message -Each action can have its own score limit which could also be modified by a user's settings. rspamd assumes the following order of actions: `no action` <= `greylist` <= `add header` <= `rewrite subject` <= `reject`. +Each action can have its own score limit which could also be modified by a user's settings. Rspamd assumes the following order of actions: `no action` <= `greylist` <= `add header` <= `rewrite subject` <= `reject`. -Actions are **NOT** performed by rspamd itself - they are just recommendations for the MTA agent, rmilter for example, that performs the necessary actions such as adding headers or rejecting mail. 
+Actions are **NOT** performed by Rspamd itself - they are just recommendations for the MTA agent, rmilter for example, that performs the necessary actions such as adding headers or rejecting mail. -SA `spam` is almost equal to the rspamd `add header` action in the default setup. With this action, users will be able to check messages in their `Junk` folder, which is usually a desired behaviour. +SA `spam` is almost equal to the Rspamd `add header` action in the default setup. With this action, users will be able to check messages in their `Junk` folder, which is usually a desired behaviour. -## First steps with rspamd +## First steps with Rspamd -To install rspamd, I recommend using one of the [official packages](https://rspamd.com/downloads.html) that are available for many popular platforms. If you'd like to have more features then you can consider the `experimental` branch of packages, while if you would like to have more stability then you can select the `stable` branch. However, normally even the `experimental` branch is stable enough for production use, and bugs are fixed more quickly in the `experimental` branch. +To install Rspamd, I recommend using one of the [official packages](https://rspamd.com/downloads.html) that are available for many popular platforms. If you'd like to have more features then you can consider the `experimental` branch of packages, while if you would like to have more stability then you can select the `stable` branch. However, normally even the `experimental` branch is stable enough for production use, and bugs are fixed more quickly in the `experimental` branch. ## General SpamAssassin rules -For those who have a lot of custom rules, there is good news: rspamd supports a certain set of SpamAssassin rules via a special [plugin](../modules/spamassassin.md) that allows **direct** loading of SA rules into rspamd. 
You just need to specify your SA configuration files in the plugin configuration: +For those who have a lot of custom rules, there is good news: Rspamd supports a certain set of SpamAssassin rules via a special [plugin](../modules/spamassassin.md) that allows **direct** loading of SA rules into Rspamd. You just need to specify your SA configuration files in the plugin configuration: ~~~ucl spamassassin { @@ -57,20 +57,20 @@ spamassassin { } ~~~ -On the other hand, if you don't have a lot of custom rules and primarily use the default ruleset then you shouldn't use this plugin: many SA rules are already implemented natively in rspamd so you won't get any benefit from including such rules from SA. +On the other hand, if you don't have a lot of custom rules and primarily use the default ruleset then you shouldn't use this plugin: many SA rules are already implemented natively in Rspamd so you won't get any benefit from including such rules from SA. ## Integration -If you have your SA up and running it is usually possible to switch the system to rspamd using the existing tools. However, please check the [integration document](https://rspamd.com/doc/integration.html) for further details. +If you have your SA up and running it is usually possible to switch the system to Rspamd using the existing tools. However, please check the [integration document](https://rspamd.com/doc/integration.html) for further details. ## Statistics -rspamd statistics are not compatible with SA as rspamd uses a more advanced statistics algorithm, described in the following [article](http://osbf-lua.luaforge.net/papers/trec2006_osbf_lua.pdf), so please bear in mind that you need to **relearn** your statistics. This can be done, for example, by using the `rspamc` command: assuming that you have your messages in separate files (e.g. 
`maildir` format), placed in directories `spam` and `ham`: +Rspamd statistics are not compatible with SA as Rspamd uses a more advanced statistics algorithm, described in the following [article](http://osbf-lua.luaforge.net/papers/trec2006_osbf_lua.pdf), so please bear in mind that you need to **relearn** your statistics. This can be done, for example, by using the `rspamc` command: assuming that you have your messages in separate files (e.g. `maildir` format), placed in directories `spam` and `ham`: rspamc learn_spam spam/ rspamd learn_ham ham/ -(You will need rspamd up and running to use these commands.) +(You will need Rspamd up and running to use these commands.) ### Learning using mail interface diff --git a/doc/markdown/tutorials/writing_rules.md b/doc/markdown/tutorials/writing_rules.md index 1941b41cd..f7e78bbf5 100644 --- a/doc/markdown/tutorials/writing_rules.md +++ b/doc/markdown/tutorials/writing_rules.md @@ -1,17 +1,17 @@ -# Writing rspamd rules +# Writing Rspamd rules -In this tutorial, I describe how to create new rules for rspamd - both Lua and regexp rules. +In this tutorial, I describe how to create new rules for Rspamd - both Lua and regexp rules. ## Introduction -Rules are the essential part of a spam filtering system and rspamd ships with some prepared rules by default. However, if you run your own system you might want to have your own rules for better spam filtering or a better false positives rate. Rules are usually written in `Lua`, where you can specify both custom logic and generic regular expressions. +Rules are the essential part of a spam filtering system and Rspamd ships with some prepared rules by default. However, if you run your own system you might want to have your own rules for better spam filtering or a better false positives rate. Rules are usually written in `Lua`, where you can specify both custom logic and generic regular expressions. 
## Configuration files -Since rspamd ships with its own rules it is a good idea to store your custom rules and configuration in separate files to avoid clashing with the default rules which might change from version to version. There are some possibilities to achieve this: +Since Rspamd ships with its own rules it is a good idea to store your custom rules and configuration in separate files to avoid clashing with the default rules which might change from version to version. There are some possibilities to achieve this: - Local rules in Lua should be stored in the file named `${CONFDIR}/lua/rspamd.local.lua` where `${CONFDIR}` is the directory where your configuration files are placed (e.g. `/etc/rspamd`, or `/usr/local/etc/rspamd` for some systems) -- Local configuration that **adds** options to rspamd should be placed in `${CONFDIR}/rspamd.conf.local` +- Local configuration that **adds** options to Rspamd should be placed in `${CONFDIR}/rspamd.conf.local` - Local configuration that **overrides** the default settings should be placed in `${CONFDIR}/rspamd.conf.override` Lua local configuration can be used to both override and extend: @@ -99,14 +99,14 @@ section "name" { } ~~~ -For each individual configuration file shipped with rspamd, there are two special includes: +For each individual configuration file shipped with Rspamd, there are two special includes: .include(try=true,priority=1) "$CONFDIR/local.d/config.conf" .include(try=true,priority=1) "$CONFDIR/override.d/config.conf" -Therefore, you can either extend (using local.d) or ultimately override (using override.d) any settings in the rspamd configuration. +Therefore, you can either extend (using local.d) or ultimately override (using override.d) any settings in the Rspamd configuration. -For example, let's override some default symbols shipped with rspamd. To do that we can create and edit `etc/rspamd/local.d/metrics.conf`: +For example, let's override some default symbols shipped with Rspamd. 
To do that we can create and edit `etc/rspamd/local.d/metrics.conf`: symbol "BLAH" { score = 20.0; @@ -130,12 +130,12 @@ as this will set the other actions (`add_header` and `greylist`) as undefined. ## Writing rules -There are two types of rules that are normally defined by rspamd: +There are two types of rules that are normally defined by Rspamd: - `Lua` rules: code in written in Lua - `Regexp` rules: regular expressions and combinations of regular expressions to match specific patterns -Lua rules are useful for some complex tasks: check DNS, query redis or HTTP, examine some task-specific details. Regexp rules are useful since they are heavily optimized by rspamd (especially when `hyperscan` is enabled) and allow matching custom patterns in headers, urls, text parts and even the entire message body. +Lua rules are useful for some complex tasks: check DNS, query redis or HTTP, examine some task-specific details. Regexp rules are useful since they are heavily optimized by Rspamd (especially when `hyperscan` is enabled) and allow matching custom patterns in headers, urls, text parts and even the entire message body. ### Rule weights @@ -187,7 +187,7 @@ rspamd_config.MY_LUA_SYMBOL = { ## Regexp rules -Regexp rules are executed by the `regexp` module of rspamd. You can find a detailed description of the syntax in [the regexp module documentation](../modules/regexp.md) +Regexp rules are executed by the `regexp` module of Rspamd. 
You can find a detailed description of the syntax in [the regexp module documentation](../modules/regexp.md) Here are some hints to maximise performance of your regexp rules: @@ -196,7 +196,7 @@ Here are some hints to maximise performance of your regexp rules: * If you **really** need to match the whole messages, then you might consider using the [trie](../modules/trie.md) module as it is significantly faster * Avoid complex regexps, avoid backtracing, avoid negative groups `(?!)`, avoid capturing patterns (replace with `(?:)`), avoid potentially empty patterns, e.g. `/^.*$/` -Following these rules allows you to create fast but efficient rules. To add regexp rules you should use the `config` global table that is defined in any Lua file used by rspamd: +Following these rules allows you to create fast but efficient rules. To add regexp rules you should use the `config` global table that is defined in any Lua file used by Rspamd: ~~~lua config['regexp'] = {} -- Remove all regexp rules (including internal ones) @@ -330,7 +330,7 @@ rspamd_config.SUBJ_ALL_CAPS = { } ~~~ -You can also access HTTP headers, urls and other useful properties of rspamd tasks. Moreover, you can use global convenience modules exported by rspamd, such as [rspamd_util](../lua/util.md) or [rspamd_logger](../lua/logger.md) by requiring them in your rules: +You can also access HTTP headers, urls and other useful properties of Rspamd tasks. Moreover, you can use global convenience modules exported by Rspamd, such as [rspamd_util](../lua/util.md) or [rspamd_logger](../lua/logger.md) by requiring them in your rules: ~~~lua rspamd_config.SUBJ_ALL_CAPS = { @@ -342,7 +342,7 @@ rspamd_config.SUBJ_ALL_CAPS = { } ~~~ -## rspamd symbols +## Rspamd symbols rspamd rules fall under three categories: @@ -350,7 +350,7 @@ rspamd rules fall under three categories: 2. Filters - run normally 3. Post-filters - run after all checks -The most common type of rules are generic filters. 
Each filter is basically a callback that is executed by rspamd at some time, along with an optional symbol name associated with this callback. In general, there are three options to register symbols: +The most common type of rules are generic filters. Each filter is basically a callback that is executed by Rspamd at some time, along with an optional symbol name associated with this callback. In general, there are three options to register symbols: * register callback and associated symbol * register just a plain callback @@ -365,7 +365,7 @@ The last option is useful when you have a single callback but with different pos `nominal_weight` is used to define priority and the initial score multiplier. It should usually be `1.0` for normal symbols and `-1.0` for symbols with negative scores that should be executed before other symbols. Here is an example of registering one callback and a couple of virtual symbols used in the [dmarc](../modules/dmarc.md) module: ~~~lua -local id = rspamd_config:register_callback_symbol('DMARC_CALLBACK', 1.0, +local id = rspamd_config:register_callback_symbol('DMARC_CALLBACK', 1.0, dmarc_callback) rspamd_config:register_virtual_symbol('DMARC_POLICY_ALLOW', -1, id) rspamd_config:register_virtual_symbol('DMARC_POLICY_REJECT', 1, id) @@ -418,13 +418,13 @@ if rule['score'] then rule['group'] = 'whitelist' end rule['name'] = symbol - rspamd_config:set_metric_symbol(rule) + rspamd_config:set_metric_symbol(rule) end ~~~ ## Difference between `config` and `rspamd_config` -It might be confusing that there are two variables with a common meaning. (This is a legacy of older versions of rspamd). However, currently `rspamd_config` represents an object that can have many purposes: +It might be confusing that there are two variables with a common meaning. (This is a legacy of older versions of Rspamd). 
However, currently `rspamd_config` represents an object that can have many purposes: * Get configuration options: @@ -435,7 +435,7 @@ rspamd_config:get_all_opts('section') * Add maps: ~~~lua -rule['map'] = rspamd_config:add_kv_map(rule['domains'], +rule['map'] = rspamd_config:add_kv_map(rule['domains'], "Whitelist map for " .. symbol) ~~~ @@ -475,7 +475,7 @@ There is a strict order of configuration application: ## Rules check order -Rules in rspamd are checked in the following order: +Rules in Rspamd are checked in the following order: 1. **Pre-filters**: checked every time and can stop all further processing by calling `task:set_pre_result()` 2. **All symbols***: can depend on each other by calling `rspamd_config:add_dependency(from, to)` diff --git a/doc/markdown/workers/index.md b/doc/markdown/workers/index.md index 24208cebc..55857a73d 100644 --- a/doc/markdown/workers/index.md +++ b/doc/markdown/workers/index.md @@ -3,11 +3,11 @@ Rspamd defines several types of worker processes. Each type is designed for its specific purpose, for example to scan mail messages, to perform control actions, such as learning or statistic grabbing. There is also flexible worker type named `lua` worker that allows -to run any lua script as rspamd worker providing proxy from rspamd lua API. +to run any lua script as Rspamd worker providing proxy from Rspamd lua API. 
## Worker types -Currently rspamd defines the following worker types: +Currently Rspamd defines the following worker types: - [normal](normal.md): this worker is designed to scan mail messages - [controller](controller.md): this worker performs configuration actions, such as @@ -63,7 +63,7 @@ bind_socket = "*v4:11333"; # any ipv4 address bind_socket = "*v6:11333"; # any ipv6 address ~~~ -Moreover, you can specify systemd sockets if rspamd is invoked by systemd: +Moreover, you can specify systemd sockets if Rspamd is invoked by systemd: ~~~ucl bind_socket = "systemd:1"; # the first socket passed by systemd throught environment @@ -75,7 +75,7 @@ For unix sockets, it is also possible to specify owner and mode using this synta bind_socket = "/tmp/rspamd.sock mode=0666 owner=user"; ~~~ -Without owner and mode, rspamd uses the active user as owner (e.g. if started by root, +Without owner and mode, Rspamd uses the active user as owner (e.g. if started by root, then `root` is used) and `0644` as access mask. Please mention that you need to specify **octal** number for mode, namely prefixed by a zero. Otherwise, modes like `666` will produce a weird result. -- 2.39.5