From: Vsevolod Stakhov Date: Sat, 9 Jul 2016 10:17:18 +0000 (+0100) Subject: [Doc] Documentation now lives in rspamd.com repo X-Git-Tag: 1.3.0~129 X-Git-Url: https://source.dussan.org/?a=commitdiff_plain;h=14803e9faeefeee69e97902573f3e367ceaf9744;p=rspamd.git [Doc] Documentation now lives in rspamd.com repo --- diff --git a/doc/markdown/architecture/index.md b/doc/markdown/architecture/index.md deleted file mode 100644 index 710a21064..000000000 --- a/doc/markdown/architecture/index.md +++ /dev/null @@ -1,106 +0,0 @@ -# Rspamd architecture - -## Introduction - -Rspamd is a universal spam filtering system based on an event-driven processing model, which means that Rspamd is not intended to block anywhere in the code. To process messages Rspamd uses a set of `rules`. Each `rule` is a symbolic name associated with a message property. For example, we can define the following rules: - -- `SPF_ALLOW` - means that a message is validated by SPF; -- `BAYES_SPAM` - means that a message is statistically considered as spam; -- `FORGED_OUTLOOK_MID` - message ID seems to be forged for the Outlook MUA. - -Rules are defined by [modules](../modules/). If there is a module, for example, that performs SPF checks it may define several rules according to SPF policy: - -- `SPF_ALLOW` - a sender is allowed to send messages for this domain; -- `SPF_DENY` - a sender is denied by SPF policy; -- `SPF_SOFTFAIL` - there is no affinity defined by SPF policy. - -Rspamd supports two main types of modules: internal modules written in C and external modules written in Lua. There is no real difference between the two types with the exception that C modules are embedded and can be enabled in a `filters` attribute in the `options` section of the config: - -~~~ucl -options { - filters = "regexp,surbl,spf,dkim,fuzzy_check,chartable,email"; - ... -} -~~~ - -## Protocol - -Rspamd uses the HTTP protocol for all operations. This protocol is described in the [protocol section](protocol.md). 
- -## Metrics - -Rules in Rspamd define the logic of checks, but each rule also needs a weight. (For Rspamd, weight means `significance`.) Rules with a greater absolute value of weight are considered more important. The weight of rules is defined in `metrics`. Each metric is a set of grouped rules with specific weights. For example, we may define the following weights for our SPF rules: - -- `SPF_ALLOW`: -1 -- `SPF_DENY`: 2 -- `SPF_SOFTFAIL`: 0.5 - -Positive weights mean that a rule increases a message's 'spammyness', while negative weights mean the opposite. - -### Rules scheduler - -To avoid unnecessary checks Rspamd uses a scheduler of rules for each message. If a message is considered definite spam then further checks are not performed. This scheduler is rather naive and applies the following logic: - -- select negative rules *before* positive ones to prevent false positives; -- prefer rules with the following characteristics: - - frequent rules; - - rules with more weight; - - faster rules - -These optimizations can filter definite spam more quickly than a generic queue. - -Since Rspamd-0.9 there are further optimizations for rules and expressions that are described generally in the [following presentation](http://highsecure.ru/ast-rspamd.pdf). - -## Actions - -Another important property of metrics is their actions set. This set defines recommended actions for a message if it reaches a certain score defined by all rules which have been triggered. Rspamd defines the following actions: - -- `No action`: a message is likely to be ham; -- `Greylist`: greylist a message if it is not certainly ham; -- `Add header`: a message is likely spam, so add a specific header; -- `Rewrite subject`: a message is likely spam, so rewrite its subject; -- `Reject`: a message is very likely spam, so reject it completely - -These actions are just recommendations for the MTA and are not to be strictly followed. 
For all actions that are greater than or equal to `greylist` it is recommended to perform explicit greylisting. `Add header` and `rewrite subject` actions are very close in semantics and are both considered as probable spam. `Reject` is a strong rule which usually means that a message should really be rejected by the MTA. The triggering score for these actions should be specified according to their logical priorities. If two actions have the same weight, the result is unspecified. - -## Rules weight - -The weight of rules is not necessarily constant. For example, for statistics rules we have no certain confidence if a message is spam or not; instead we have a measure of probability. To allow fuzzy rule weights, Rspamd supports `dynamic weights`. Generally, it means that a rule may add a weight anywhere in the range from 0 to the weight defined in the metric. So if we define the symbol `BAYES_SPAM` with a weight of 5.0, then this rule can add a resulting symbol with a weight from 0 to 5.0. To distribute values, Rspamd uses a form of Sigma function to provide a fair distribution curve. The majority of Rspamd rules, with the exception of fuzzy rules, use static weights. - -## Statistics - -Rspamd uses statistical algorithms to precisely calculate the final score of a message. Currently, the only algorithm defined is OSB-Bayes. You can find details of this algorithm in the following [paper](http://osbf-lua.luaforge.net/papers/osbf-eddc.pdf). Rspamd uses a window size of 5 words in its classification. During the classification procedure, Rspamd splits a message into a set of tokens. Tokens are separated by punctuation or whitespace characters. Short tokens (fewer than 3 characters) are ignored. For each token, Rspamd calculates two non-cryptographic hashes used subsequently as indices. All these tokens are stored in different statistics backends (mmapped files, SQLite3 database or Redis server). Currently, the recommended backend for statistics is `Redis`. 
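The tokenization steps described in the Statistics section can be sketched in Python. This is only an illustration under stated assumptions, not Rspamd's actual implementation: the function names `tokenize` and `window_pairs` are hypothetical, the regular expression only approximates "punctuation or whitespace", and the pairing of each token with the ones that follow it inside a 5-word window is the general OSB scheme rather than a transcription of Rspamd's code. The two hashes computed per token are omitted.

```python
import re

WINDOW_SIZE = 5     # window size stated in the text
MIN_TOKEN_LEN = 3   # tokens shorter than this are ignored


def tokenize(text):
    """Split on runs of punctuation/whitespace and drop short tokens."""
    raw = re.split(r"[\W_]+", text)
    return [tok for tok in raw if len(tok) >= MIN_TOKEN_LEN]


def window_pairs(tokens, window=WINDOW_SIZE):
    """Yield (first, other, distance) for every pair inside one window.

    Assumption: each token is combined with the next window-1 tokens,
    as in the generic OSB (orthogonal sparse bigrams) scheme.
    """
    for i, first in enumerate(tokens):
        for dist in range(1, window):
            j = i + dist
            if j >= len(tokens):
                break
            yield first, tokens[j], dist
```

For instance, `tokenize("Buy cheap pills now, ok?")` keeps `Buy`, `cheap`, `pills` and `now` and drops `ok` because it has fewer than 3 characters.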
- -## Running rspamd - -There are several command-line options that can be passed to rspamd. All of them can be displayed by passing the `--help` argument. - -All options are optional: by default rspamd will try to read the `etc/rspamd.conf` config file and run as a daemon. Also there is a test mode that can be turned on by passing the `-t` argument. In test mode, rspamd reads the config file and checks its syntax. If a configuration file is OK, the exit code is zero. Test mode is useful for testing new config files without restarting rspamd. - -## Managing rspamd using signals - -It is important to note that all user signals should be sent to the rspamd main process and not to its children (as for child processes these signals can have other meanings). You can identify the main process: - -- by reading the pidfile: - - $ cat pidfile - -- by getting process info: - - $ ps auxwww | grep rspamd - nobody 28378 0.0 0.2 49744 9424 rspamd: main process - nobody 64082 0.0 0.2 50784 9520 rspamd: worker process - nobody 64083 0.0 0.3 51792 11036 rspamd: worker process - nobody 64084 0.0 2.7 158288 114200 rspamd: controller process - nobody 64085 0.0 1.8 116304 75228 rspamd: fuzzy storage - - $ ps auxwww | grep rspamd | grep main - nobody 28378 0.0 0.2 49744 9424 rspamd: main process - -After getting the pid of the main process it is possible to manage rspamd with signals, as follows: - -- `SIGHUP` - restart rspamd: reread config file, start new workers (as well as controller and other processes), stop accepting connections by old workers, reopen all log files. Note that old workers would be terminated after one minute which should allow processing of all pending requests. All new requests to rspamd will be processed by the newly started workers. -- `SIGTERM` - terminate rspamd. -- `SIGUSR1` - reopen log files (useful for log file rotation). - -These signals may be used in rc-style scripts. 
Restarting of rspamd is performed softly: no connections are dropped and if a new config is incorrect then the old config is used. diff --git a/doc/markdown/architecture/protocol.md b/doc/markdown/architecture/protocol.md deleted file mode 100644 index 81d10d67b..000000000 --- a/doc/markdown/architecture/protocol.md +++ /dev/null @@ -1,154 +0,0 @@ -# Rspamd protocol - -## Protocol basics - -Rspamd uses the HTTP protocol, either version 1.0 or 1.1. (There is also a compatibility layer described further in this document.) Rspamd defines some headers which allow the passing of extra information about a scanned message, such as envelope data, IP address or SMTP SASL authentication data, etc. Rspamd supports normal and chunked encoded HTTP requests. - -## Rspamd HTTP request - -Rspamd encourages the use of the HTTP protocol since it is standard and can be used by every programming language without the use of exotic libraries. A typical HTTP request looks like the following: - - POST /check HTTP/1.0 - Content-Length: 26969 - From: smtp@example.com - Pass: all - Ip: 95.211.146.161 - Helo: localhost.localdomain - Hostname: localhost - - - -You can also use chunked encoding, which allows streamed data transfer and is useful if you don't know the length of a message in advance. - -### HTTP request - -Normally, you should just use '/check' here. However, if you want to communicate with the controller then you might want to use controller commands. - -(TODO: write this part) - -### HTTP headers - -To avoid unnecessary work, Rspamd allows an MTA to pass pre-processed data about the message by using either HTTP headers or a JSON control block (described further in this document). Rspamd supports the following non-standard HTTP headers: - -| Header | Description | -| :-------------- | :-------------------------------- | -| **Deliver-To:** | Defines actual delivery recipient of message. Can be used for personalized statistics and for user specific options. 
| -| **IP:** | Defines IP from which this message is received. | -| **Helo:** | Defines SMTP helo | -| **Hostname:** | Defines resolved hostname | -| **From:** | Defines SMTP mail from command data | -| **Queue-Id:** | Defines SMTP queue id for message (can be used instead of message id in logging). | -| **Rcpt:** | Defines SMTP recipient (there may be several `Rcpt` headers) | -| **Pass:** | If this header has `all` value, all filters would be checked for this message. | -| **Subject:** | Defines subject of message (is used for non-mime messages). | -| **User:** | Defines SMTP user. | -| **Message-Length:** | Defines the length of message excluding the control block. | - -Controller also defines certain headers: - -(TODO: write this part) - -Standard HTTP headers, such as `Content-Length`, are also supported. - -## Rspamd HTTP reply - -Rspamd reply is encoded in `JSON`. Here is a typical HTTP reply: - - HTTP/1.1 200 OK - Connection: close - Server: rspamd/0.9.0 - Date: Mon, 30 Mar 2015 16:19:35 GMT - Content-Length: 825 - Content-Type: application/json - -~~~json -{ - "default": { - "is_spam": false, - "is_skipped": false, - "score": 5.2, - "required_score": 7, - "action": "add header", - "DATE_IN_PAST": { - "name": "DATE_IN_PAST", - "score": 0.1 - }, - "FORGED_SENDER": { - "name": "FORGED_SENDER", - "score": 5 - }, - "TEST": { - "name": "TEST", - "score": 100500 - }, - "FUZZY_DENIED": { - "name": "FUZZY_DENIED", - "score": 0, - "options": [ - "1: 1.00 / 1.00", - "1: 1.00 / 1.00" - ] - }, - "HFILTER_HELO_5": { - "name": "HFILTER_HELO_5", - "score": 0.1 - } - }, - "urls": [ - "www.example.com", - "another.example.com" - ], - "emails": [ - "user@example.com" - ], - "message-id": "4E699308EFABE14EB3F18A1BB025456988527794@example" -} -~~~ - -For convenience, the reply is LINTed using [JSONLint](http://jsonlint.com). The actual reply is compressed for speed. 
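A client can consume a reply like the one above with any JSON library. The following Python sketch shows one way to pull out the decision for the `default` metric. The helper name `summarize_reply` is hypothetical, and the sketch assumes that symbols are exactly those members of the metric object that are themselves objects carrying a `name` key, as in the example reply.

```python
import json


def summarize_reply(body, metric="default"):
    """Return action, score, threshold and symbol scores for one metric."""
    reply = json.loads(body)
    result = reply[metric]
    # Assumption: symbol entries are nested objects with a "name" key;
    # scalar members such as "score" or "action" are metric fields.
    symbols = {
        key: value.get("score", 0)
        for key, value in result.items()
        if isinstance(value, dict) and "name" in value
    }
    return {
        "action": result["action"],
        "score": result["score"],
        "required_score": result["required_score"],
        "symbols": symbols,
    }
```

An MTA integration would typically branch on the returned `action` and log the `symbols` mapping for diagnostics.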
- -The reply can be treated as a JSON object where keys are metric names (namely `default`) and values are objects that represent metrics. - -Each metric has the following fields: - -* `is_spam` - boolean value that indicates whether a message is spam -* `is_skipped` - boolean flag that is `true` if a message has been skipped due to settings -* `score` - floating point value representing the effective score of the message -* `required_score` - floating point value meaning the threshold value for the metric -* `action` - recommended action for a message: - - `no action` - message is likely ham; - - `greylist` - message should be greylisted; - - `add header` - message is suspicious and should be marked as spam - - `rewrite subject` - message is suspicious and should have subject rewritten - - `soft reject` - message should be temporarily rejected (for example, because a rate limit has been exhausted) - - `reject` - message should be rejected as spam - -Additionally, each metric contains all symbols added during a message's processing, indexed by symbol names. - -Additional keys which may be in the reply include: - -* `subject` - if action is `rewrite subject` this value defines the desired subject for a message -* `urls` - a list of URLs found in a message (only hostnames) -* `emails` - a list of emails found in a message -* `message-id` - ID of message (useful for logging) -* `messages` - array of optional messages added by Rspamd filters (such as `SPF`) - -## Rspamd JSON control block - -Since Rspamd version 0.9 it is also possible to pass additional data by prepending a JSON control block to a message. So you can use either headers or a JSON block to pass data from the MTA to Rspamd. - -To use a JSON control block, you need to pass an extra header called `Message-Length` to Rspamd. The value of this header should be equal to the size of the message **excluding** the JSON control block. Therefore, the size of the control block is equal to `Content-Length - Message-Length`. 
Rspamd assumes that a message starts immediately after the control block (with no extra CRLF). This method is equally compatible with streaming transfer, however even if you are not specifying `Content-Length` you are still required to specify `Message-Length`. - -Here is an example of a JSON control block: - -~~~json -{ - "from": "smtp@example.com", - "pass_all": "true", - "ip": "95.211.146.161", - "helo": "localhost.localdomain", - "hostname": "localhost" -} -~~~ - -Moreover, [UCL](https://github.com/vstakhov/libucl) JSON extensions and syntax conventions are also supported inside the control block. diff --git a/doc/markdown/configuration/composites.md b/doc/markdown/configuration/composites.md deleted file mode 100644 index 3e4596399..000000000 --- a/doc/markdown/configuration/composites.md +++ /dev/null @@ -1,114 +0,0 @@ -# Rspamd composite symbols - -## Introduction - -Rspamd composites are used to combine rules and create more complex rules. Composite rules are defined by `composite` keys. The value of the key should be an object that defines the composite's name and value, which is the combination of rules in a joint expression. - -For example, you can define a composite that is added when two specific symbols are found: - -~~~ucl -composite { - name = "TEST_COMPOSITE"; - expression = "SYMBOL1 and SYMBOL2"; -} -~~~ - -In this case, if a message has both `SYMBOL1` and `SYMBOL2` then they are replaced by symbol `TEST_COMPOSITE`. The weights of `SYMBOL1` and `SYMBOL2` are subtracted from the metric accordingly. - -## Composite expressions - -You can use the following operations in a composite expression: - -* `AND` `&` - matches true only if both operands are true -* `OR` `|` - matches true if any operands are true -* `NOT` `!` - matches true if operand is false - -You also can use braces to define priorities. Otherwise operators are evaluated from left to right. 
For example: - -~~~ucl -composite { - name = "TEST"; - expression = "SYMBOL1 and SYMBOL2 and ( not SYMBOL3 | not SYMBOL4 | not SYMBOL5 )"; -} -~~~ - -A composite rule can include other composites in its body. There is no restriction on definition order: - -~~~ucl -composite { - name = "TEST1"; - expression = "SYMBOL1 AND TEST2"; -} -composite { - name = "TEST2"; - expression = "SYMBOL2 OR NOT SYMBOL3"; -} -~~~ - -Composites should not be recursive; this is normally detected by Rspamd. - -## Composite weight rules - -Composites can record symbols in a metric or record their weights. That could be used to create non-captive composites. For example, suppose you have symbols `A` and `B` with weights `W_a` and `W_b` and a composite `C` with weight `W_c`. - -* If `C` is `A & B` and both rule `A` and rule `B` match, then these symbols are *removed* and their weights are removed as well, leading to a single symbol `C` with weight `W_c`. -* If `C` is `-A & B`, then symbol `A` is preserved and the symbol `C` is inserted alongside it. The weight of `A` is preserved as well, so the total weight of `-A & B` will be `W_a + W_c`. -* If `C` is `~A & B`, then rule `A` is *removed* but its weight is *preserved*, - leading to a single symbol `C` with weight `W_a + W_c` - -When multiple composites include the same symbol and one composite wants to remove it while another wants to preserve it, the symbol is preserved by default. Here are some more examples: - -~~~ucl -composite "COMP1" { - expression = "BLAH || !DATE_IN_PAST"; -} -composite "COMP2" { - expression = "!BLAH || DATE_IN_PAST"; -} -composite "COMP3" { - expression = "!BLAH || -DATE_IN_PAST"; -} -~~~ - -Both `BLAH` and `DATE_IN_PAST` exist in the message's check results. However, `COMP3` wants to preserve `DATE_IN_PAST` so it will be saved in the output. 
- -If we rewrite the previous example but replace `-` with `~` then `DATE_IN_PAST` will be removed (however, its weight won't be removed): - -~~~ucl -composite "COMP1" { - expression = "BLAH || !DATE_IN_PAST"; -} -composite "COMP2" { - expression = "!BLAH || DATE_IN_PAST"; -} -composite "COMP3" { - expression = "!BLAH || ~DATE_IN_PAST"; -} -~~~ - -When we want to remove a symbol, despite other composites combinations, it is possible to add the prefix `^` to the symbol: - -~~~ucl -composite "COMP1" { - expression = "BLAH || !DATE_IN_PAST"; -} -composite "COMP2" { - expression = "!BLAH || ^DATE_IN_PAST"; -} -composite "COMP3" { - expression = "!BLAH || -DATE_IN_PAST"; -} -~~~ - -In this example `COMP3` wants to save `DATE_IN_PAST` once again, however `COMP2` overrides this and removes `DATE_IN_PAST`. - -## Composites with symbol groups - -It is possible to include a group of symbols in a composite rule. This effectively means **any** symbol of the specified group: - -~~~ucl -composite { - name = "TEST2"; - expression = "SYMBOL2 && !g:mua"; -} -~~~ diff --git a/doc/markdown/configuration/index.md b/doc/markdown/configuration/index.md deleted file mode 100644 index f1c49aa4c..000000000 --- a/doc/markdown/configuration/index.md +++ /dev/null @@ -1,52 +0,0 @@ -# Rspamd configuration - -Rspamd uses the Universal Configuration Language (UCL) for its configuration. The UCL format is described in detail in this [document](ucl.md). Rspamd defines several variables and macros to extend -UCL functionality. - -## Rspamd variables - -- *CONFDIR*: configuration directory for Rspamd, found in `$PREFIX/etc/rspamd/` -- *RUNDIR*: runtime directory to store pidfiles or UNIX sockets -- *DBDIR*: persistent databases directory (used for statistics or symbols cache). -- *LOGDIR*: a directory to store log files -- *PLUGINSDIR*: plugins directory for Lua plugins -- *PREFIX*: basic installation prefix -- *VERSION*: Rspamd version string (e.g. 
"0.6.6") - -## Rspamd specific macros - -- *.include_map*: defines a map that is dynamically reloaded and updated if its content has changed. This macro is intended to define dynamic configuration files. - -## Rspamd basic configuration - -The basic Rspamd configuration is stored in `$CONFDIR/rspamd.conf`. By default, this file looks like this one: - -~~~ucl -lua = "$CONFDIR/lua/rspamd.lua" - -.include "$CONFDIR/options.conf" -.include "$CONFDIR/logging.conf" -.include "$CONFDIR/metrics.conf" -.include "$CONFDIR/workers.conf" -.include "$CONFDIR/composites.conf" - -.include "$CONFDIR/statistic.conf" - -.include "$CONFDIR/modules.conf" - -modules { - path = "$PLUGINSDIR/lua/" -} -~~~ - -In this file, we read a Lua script placed in `$CONFDIR/lua/rspamd.lua` and load Lua rules from it. Then we include a global [options](options.md) section followed by [logging](logging.md) logging configuration. The [metrics](metrics.md) section defines metric settings, including rule weights and Rspamd actions. The [workers](../workers/index.md) section specifies Rspamd workers settings. [Composites](composites.md) is a utility section that describes composite symbols. Statistical filters are defined in the [statistic](statistic.md) section. Rspamd stores module configurations (for both Lua and internal modules) in the [modules](../modules/index.md) section while modules themselves are loaded from the following portion of the configuration: - -~~~ucl -modules { - path = "$PLUGINSDIR/lua/" -} -~~~ - -The modules section defines the path or paths of directories or specific files. If a directory is specified then all files with a `.lua` suffix are loaded as lua plugins (the directory path is treated as a `*.lua` shell pattern). - -This configuration is not intended to be changed by the user, rather you can include your own configuration options as `.include`s. To redefine symbol weights and actions, it is recommended to use [dynamic configuration](settings.md). 
Nevertheless, the Rspamd installation script will never overwrite a user's configuration if it exists already. Please read the Rspamd changelog carefully, if you upgrade Rspamd to a new version, for all incompatible configuration changes. diff --git a/doc/markdown/configuration/logging.md b/doc/markdown/configuration/logging.md deleted file mode 100644 index 4ae51d532..000000000 --- a/doc/markdown/configuration/logging.md +++ /dev/null @@ -1,90 +0,0 @@ -# Rspamd logging settings - -## Introduction -Rspamd has a number of logging options. Firstly, there are three types of log output that are supported: console logging (just output log messages to console), file logging (output log messages to file) and logging via syslog. It is also possible to restrict logging to a specific level: - -* `error` - log only critical errors -* `warning` - log errors and warnings -* `info` - log all non-debug messages -* `debug` - log all including debug messages (huge amount of logging) - -It is possible to turn on debug messages for specific IP addresses. This can be useful for testing. For each logging type there are special mandatory parameters: log facility for syslog (read `syslog(3)` man page for details about facilities), log file for file logging. Also, file logging may be buffered for performance. To reduce logging noise, Rspamd detects sequential matching log messages and replaces them with a total number of repeats: - - #81123(fuzzy): May 11 19:41:54 rspamd file_log_function: Last message repeated 155 times - #81123(fuzzy): May 11 19:41:54 rspamd process_write_command: fuzzy hash was successfully added - -## Unique ID - -From version 1.0, Rspamd logs contain a unique ID for each logging message. This allows finding relevant messages quickly. Moreover, there is now a `module` definition: for example, `task` or `cfg` modules. Here is a quick example of how it works: imagine that we have an incoming task for some message. 
Then you'd see something like this in the logs: - - 2015-09-02 16:41:59 #45015(normal) <ed2abb>; task; accept_socket: accepted connection from ::1 port 52895 - 2015-09-02 16:41:59 #45015(normal) <ed2abb>; task; rspamd_message_parse: loaded message; id: ; queue-id: - -So the tag is `ed2abb` in this case. All subsequent processing related to this task will have the same tag. It is enabled not only on the `task` module, but also others, such as the `spf` or `lua` modules. For other modules, such as `cfg`, the tag is generated statically using a specific characteristic, for example the configuration file checksum. - -## Configuration parameters - -Here is a summary of logging parameters: - -- `type` - Defines logging type (file, console or syslog). For some types mandatory attributes may be required: - + `filename` - path to log file for file logging - + `facility` - logging facility for syslog -- `level` - Defines logging level (error, warning, info or debug). -- `log_buffer` - For file and console logging defines buffer size that will be used for logging output. -- `log_urls` - Flag that defines whether all URLs in message should be logged. Useful for testing. -- `debug_ip` - List that contains IP addresses for which debugging should be turned on. -- `log_color` - Turn on coloring for log messages. Default: `no`. -- `debug_modules` - A list of modules that are enabled for debugging. The following modules are available here: - + `task` - task messages - + `cfg` - configuration messages - + `symcache` - messages from symbols cache - + `fuzzy_backend` - messages from fuzzy backend - + `lua` - messages from Lua code - + `spf` - messages from spf module - + `dkim` - messages from dkim module - + `main` - messages from the main process - + `dns` - messages from DNS resolver - + `map` - messages from maps in Rspamd - + `logger` - messages from the logger itself - -### Log format - -Rspamd supports a custom log format when writing information about a message to the log. 
(This feature is supported since version 1.1.) The format string looks as follows: - - - log_format =<< EOD - id: <$mid>,$if_qid{ qid: <$>,}$if_ip{ ip: $,}$if_user{ user: $,}$if_smtp_from{ from: <$>,} - (default: $is_spam ($action): [$scores] [$symbols]), - len: $len, time: $time_real real, - $time_virtual virtual, dns req: $dns_req - EOD - -Newlines are replaced with spaces. Both text and variables are supported in the log format line. Each variable can have an optional `if_` prefix, which makes it log only when the variable is set. Moreover, each variable can have an optional body value, where `$` is replaced with the variable value (as many times as it is found in the body, e.g. `$var{$$$$}` will be replaced with the variable's value repeated 4 times). - -Rspamd supports the following variables: - -- `mid` - message ID -- `qid` - queue ID -- `ip` - from IP -- `user` - authenticated user -- `smtp_from` - envelope from (or MIME from if SMTP from is absent) -- `mime_from` - MIME from -- `smtp_rcpt` - envelope rcpt (or MIME rcpt if SMTP rcpt is absent) - the first recipient -- `mime_rcpt` - MIME rcpt - the first recipient -- `smtp_rcpts` - envelope rcpts - all recipients -- `mime_rcpts` - MIME rcpts - all recipients -- `len` - length of message -- `is_spam` - a one-letter rating of spammyness: `T` for spam, `F` for ham and `S` for skipped messages -- `action` - default metric action -- `symbols` - list of all symbols -- `time_real` - real time of task processing -- `time_virtual` - CPU time of task processing -- `dns_req` - number of DNS requests -- `lua` - custom Lua script, e.g.: - -~~~lua - $lua{ - return function(task) - return 'text parts: ' .. 
tostring(#task:get_text_parts()) end - } -~~~ diff --git a/doc/markdown/configuration/metrics.md b/doc/markdown/configuration/metrics.md deleted file mode 100644 index 8c6d55fdd..000000000 --- a/doc/markdown/configuration/metrics.md +++ /dev/null @@ -1,109 +0,0 @@ -# Rspamd metrics settings - -## Introduction - -The metrics section configures weights for symbols and actions applied to a message by Rspamd. You can imagine a metric as a decision made by Rspamd for a specific message by a set of rules. Each rule can insert a `symbol` into the metric, which means that this rule is true for this message. Each symbol can have a floating point value called a `weight`, which means the significance of the corresponding rule. Rules with a positive weight increase the spam factor, while rules with negative weights increase the ham factor. The result is the overall message score. - -After a score is evaluated, Rspamd selects an appropriate `action` for a message. Rspamd defines the following actions, ordered by spam factor, in ascending order: - -1. `no action` - a message is likely ham -2. `greylist` - a message should be greylisted to ensure sender's validity -3. `add header` - add the specific `spam` header indicating that a message is likely spam -4. `rewrite subject` - add spam subject to a message -5. `soft reject` - temporarily reject a message -6. `reject` - permanently reject a message - -Actions are assumed to be applied simultaneously, meaning that the `add header` action implies, for example, the `greylist` action. `add header` and `rewrite subject` are equivalent to Rspamd. They are just two options with the same purpose: to mark a message as probable spam. The `soft reject` action is mainly used to indicate temporary issues in mail delivery, for instance, exceeding a rate limit. - -There is also a special purpose metric called `default` that acts as the main metric to treat a message as spam or ham. 
Actually, all clients that use Rspamd just check the default metric to determine whether a message is spam or ham. Therefore, the default configuration just defines the `default` metric. - -## Configuring metrics -Each metric is defined by a `metric` object in the Rspamd configuration file. This object has one mandatory attribute - `name` - which defines the name of the metric: - -~~~ucl -metric { - # Define default metric - name = "default"; -} -~~~ -It is also possible to define some generic attributes for the metric: - -* `grow_factor` - the multiplier applied to subsequently inserted symbols, according to the following rule: - -$$ -score = score + grow\_factor * symbol\_weight -$$ - -$$ - grow\_factor = grow\_factor * grow\_factor -$$ - -By default this value is `1.0` meaning that no weight growing is defined. By increasing this value you increase the effective score of messages with multiple `spam` rules matched. This value is not affected by negative score values. - -* `subject` - string value that is prepended to the message's subject if the `rewrite subject` action is applied -* `unknown_weight` - weight for unknown rules. If this parameter is specified, all rules can add symbols to this metric. If such a rule is not specified by this metric then its weight is equal to this option's value. Please note that adding this option means that all rules will be checked by Rspamd. Conversely, if `unknown_weight` is not specified, then rules that are not registered anywhere are silently ignored by Rspamd. - -The content of this section is in two parts: symbols and actions. Actions is an object of all actions defined by this metric. If some actions are omitted, they will never be suggested by Rspamd. The Actions section looks as follows: - -~~~ucl -metric { -... - actions { - reject = 15; - add_header = 6; - greylist = 4; - }; -... -} -~~~ - -You can use an underscore (`_`) instead of white space in action names to simplify the configuration. 
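To make the `grow_factor` rule from this section more concrete, here is a small Python sketch of one possible reading of the two formulas. It rests on assumptions that the text does not spell out: the running multiplier is taken to start at `1.0` and to be multiplied by the configured `grow_factor` after each positive-weight symbol, and negative weights are added unchanged without touching the multiplier. The function name `apply_grow_factor` is hypothetical.

```python
def apply_grow_factor(weights, grow_factor=1.0):
    """Accumulate symbol weights using a growing multiplier.

    Assumptions (not confirmed by the text): the running multiplier
    starts at 1.0, grows only on positive weights, and negative
    weights are added as-is.
    """
    score = 0.0
    current = 1.0  # running multiplier applied to the next positive symbol
    for weight in weights:
        if weight > 0:
            score += current * weight
            current *= grow_factor
        else:
            score += weight
    return score
```

Under these assumptions, with `grow_factor = 1.5` the weights `2.0, 2.0, -1.0` give `2.0 + 1.5 * 2.0 - 1.0`, whereas the default of `1.0` reduces to a plain sum.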
- -Symbols are defined by an object with the following properties: - -* `weight` - the symbol weight as floating point number (negative or positive); by default the weight is `1.0` -* `name` - symbolic name for a symbol (mandatory attribute) -* `group` - a group of symbols, for example `DNSBL symbols` (as shown in WebUI) -* `description` - optional symbolic description for WebUI -* `one_shot` - normally, Rspamd inserts a symbol as many times as the corresponding rule matches for the specific message; however, if `one_shot` is `true` then only the **maximum** weight is added to the metric. `grow_factor` is correspondingly not modified by a repeated triggering of `one_shot` rules. - -A symbol definition can look like this: - -~~~ucl -symbol { - name = "RWL_SPAMHAUS_WL_IND"; - weight = -0.7; - description = "Sender listed at Spamhaus whitelist"; -} -~~~ - -A single metric can contain multiple symbols definitions. - - -## Symbol groups - -Symbols can be grouped to specify their common functionality. For example, one could group all `RBL` symbols together. Moreover, from Rspamd version 0.9 it is possible to specify a group score limit, which could be useful, for instance, if a specific group should not unconditionally send a message to the `spam` class. Here is an example of such a functionality: - -~~~ucl -metric { - name = default; # This is mandatory option - - group { - name = "RBL group"; - max_score = 6.0; - - symbol { - name = "RBL1"; - weight = 1; - } - symbol { - name = "RBL2"; - weight = 4; - } - symbol { - name = "RBL3"; - weight = 5; - } - } -} -~~~ diff --git a/doc/markdown/configuration/options.md b/doc/markdown/configuration/options.md deleted file mode 100644 index 7f494fd6f..000000000 --- a/doc/markdown/configuration/options.md +++ /dev/null @@ -1,79 +0,0 @@ -# Rspamd options settings - -## Introduction - -The options section defines basic Rspamd behaviour. Options are global for all types of workers. 
The default options are shown in the following example snippet: - -~~~ucl -filters = "chartable,dkim,spf,surbl,regexp,fuzzy_check"; -raw_mode = false; -one_shot = false; -cache_file = "$DBDIR/symbols.cache"; -map_watch_interval = 1min; -dynamic_conf = "$DBDIR/rspamd_dynamic"; -history_file = "$DBDIR/rspamd.history"; -check_all_filters = false; -dns { - timeout = 1s; - sockets = 16; - retransmits = 5; -} -tempdir = "/tmp"; -url_tld = "${PLUGINSDIR}/effective_tld_names.dat"; -classify_headers = [ - "User-Agent", - "X-Mailer", - "Content-Type", - "X-MimeOLE", -]; - -control_socket = "$DBDIR/rspamd.sock mode=0600"; -~~~ - -## Global options - -* `filters`: comma separated string that defines enabled **internal** Rspamd filters; for a list of the internal filters please check the [modules page](../modules/) -* `one_shot`: if this flag is set to `true` then multiple rule triggers do not increase the total score of messages (however, this option can also be individually configured in the `metric` section for each symbol) -* `cache_file`: used to store information about rules and their statistics; this file is automatically generated if Rspamd detects that a symbol's list has been changed. -* `map_watch_interval`: interval between map scanning; the actual check interval is jittered to avoid simultaneous checking, so the real interval is from this value up to 2x this value -* `check_all_filters`: turns off optimizations when a message gains an overall score more than the `reject` score for the default metric; this optimization can also be turned off for each request individually -* `history_file`: this file is automatically created and refreshed on shutdown to preserve the rolling history of operations displayed by the WebUI across restarts -* `temp_dir`: a directory for temporary files (can also be set via the environment variable `TMPDIR`). 
-* `url_tld`: path to file with top level domain suffixes used by Rspamd to find URLs in messages; by default this file is shipped with Rspamd and should not be touched manually -* `pid_file`: file used to store PID of the Rspamd main process (not used with systemd) -* `min_word_len`: minimum size in letters (valid for utf-8 as well) for a sequence of characters to be treated as a word; normally Rspamd skips sequences if they are shorter or equal to three symbols -* `control_socket`: path/bind for the control socket -* `classify_headers`: list of headers that are processed by statistics -* `history_rows`: number of rows in the recent history table -* `explicit_modules`: always load modules from the list even if they have no configuration section in the file -* `disable_hyperscan`: disable Hyperscan optimizations (if enabled at compile time) -* `cores_dir`: directory where Rspamd should drop core files -* `max_cores_size`: maximum total size of core files that are placed in `cores_dir` -* `max_cores_count`: maximum number of files in `cores_dir` -* `local_addrs` or `local_networks`: map or list of IP networks used as local, so certain checks are skipped for them (e.g. SPF checks) - -## DNS options - -These options are in a separate subsection named `dns` and specify the behaviour of Rspamd name resolution. Here is a list of available tunables: - -* `nameserver`: list (or array) of DNS servers to be used (if this option is skipped, then `/etc/resolv.conf` is parsed instead). It is also possible to specify weights of DNS servers to balance the payload, e.g. 
- -~~~ucl -options { - dns { - # 9/10 on 127.0.0.1 and 1/10 to 8.8.8.8 - nameserver = ["127.0.0.1:10", "8.8.8.8:1"]; - # or - # nameserver = "127.0.0.1:10"; - # nameserver = "8.8.8.8:1"; - } -} -~~~ - -* `timeout`: timeout for each DNS request -* `retransmits`: how many times each request is retransmitted before it is treated as failed (the overall timeout for each request is thus `timeout * retransmits`) -* `sockets`: how many sockets are opened to a remote DNS resolver (can be tuned if you have tens of thousands of requests per second). - -## Upstream options - -**TODO** diff --git a/doc/markdown/configuration/settings.md b/doc/markdown/configuration/settings.md deleted file mode 100644 index c35920368..000000000 --- a/doc/markdown/configuration/settings.md +++ /dev/null @@ -1,81 +0,0 @@ -# Rspamd user settings - -## Introduction - -Rspamd allows exceptional control over the settings which will apply to incoming messages. Each setting can define a set of custom metric weights, symbols or actions. An administrator can also skip spam checks for certain messages completely, if required. Rspamd settings can be loaded as dynamic maps and updated automatically if a corresponding file or URL has changed since its last update. - -To load settings as a dynamic map, you can set `settings` to a map string: - -~~~ucl -settings = "http://host/url" -~~~ - -If you don't want dynamic updates then you can define settings as an object: - -~~~ucl -settings { - setting1 = { - ... - } - setting2 = { - ... 
- } -} -~~~ - -## Settings structure - -The settings file should contain a single section called "settings": - -~~~ucl -settings { - some_users { - priority = high; - from = "@example.com"; - rcpt = "admin"; - rcpt = "/user.*/"; - ip = "172.16.0.0/16"; - user = "@example.net"; - apply "default" { - symbol1 = 10.0; - symbol2 = 0.0; - actions { - reject = 100.0; - greylist = 10.0; - "add header" = 5.0; # Please note the space, NOT an underscore - } - } - # Always add these symbols when settings rule has matched - symbols [ - "symbol2", "symbol4" - ] - } - whitelist { - priority = low; - rcpt = "postmaster@example.com"; - want_spam = yes; - } -} -~~~ - -So each setting has the following attributes: - -- `name` - section name that identifies this specific setting (e.g. `some_users`) -- `priority` - high or low; high priority rules are matched first (default priority is low) -- `match list` - list of rules which this rule matches: - + `from` - match SMTP from - + `rcpt` - match RCPT - + `ip` - match source IP address - + `user` - matches authenticated user ID of message sender if any -- `apply` - list of applied rules, identified by metric name (e.g. `default`) - + `symbol` - modify weight of a symbol - + `actions` - defines actions -- `symbols` - add symbols from the list if a rule has matched - -The match section performs `AND` operation on different matches: for example, if you have `from` and `rcpt` in the same rule, then the rule matches only when `from` `AND` `rcpt` match. For similar matches, the `OR` rule applies: if you have multiple `rcpt` matches, then *any* of these will trigger the rule. If a rule is triggered then no more rules are matched. - -Regexp rules can be slow and should not be used extensively. - -The picture below describes the architecture of settings matching. 
- -![Settings match procedure](settings.png "Settings match procedure") diff --git a/doc/markdown/configuration/settings.png b/doc/markdown/configuration/settings.png deleted file mode 100644 index 2fbec2777..000000000 Binary files a/doc/markdown/configuration/settings.png and /dev/null differ diff --git a/doc/markdown/configuration/statistic.md b/doc/markdown/configuration/statistic.md deleted file mode 100644 index 329832091..000000000 --- a/doc/markdown/configuration/statistic.md +++ /dev/null @@ -1,226 +0,0 @@ -# Rspamd statistic settings - -## Introduction - -Rspamd uses statistics to determine the `class` of a message: either spam or ham. The overall algorithm is based on Bayes' theorem, -which defines how probabilities are combined. In general, it defines the probability that a message belongs to the specified class (namely, `spam` or `ham`) -based on the following factors: - -- the probability of a specific token being spam or ham (which effectively means the count of a token's occurrences in spam and ham messages) -- the probability of a specific token appearing in a message (which effectively means the frequency of a token divided by the number of tokens in a message) - -## Statistics Architecture - -However, Rspamd uses more advanced techniques to combine probabilities, such as orthogonal sparse bigrams (OSB) and the inverse chi-square distribution. -The key idea of the `OSB` algorithm is to use not merely single words as tokens but combinations of words weighted by their positions. -This scheme is displayed in the following picture: - -![OSB algorithm](https://rspamd.com/img/rspamd-schemes.004.png "Rspamd OSB scheme") - -The main disadvantage is the number of tokens, which is multiplied by the size of the window. In Rspamd, we use a window of 5 tokens, which means that -the number of tokens is about 5 times larger than the number of words. - -Statistical tokens are stored in statfiles which, in turn, are mapped to specific backends. 
This architecture is displayed in the following image: - -![Statistics architecture](https://rspamd.com/img/rspamd-schemes.005.png "Rspamd statistics architecture") - -## Statistics Configuration - -Starting from Rspamd 1.0, we recommend using `sqlite3` as the backend and `osb` as the tokenizer. That also enables additional features, such as token normalization and -metainformation in statistics. The following example demonstrates the recommended statistics configuration: - -~~~ucl -# Classifier's algorithm is BAYES -classifier "bayes" { - tokenizer { - name = "osb"; - } - - # Unique name used to learn the specific classifier - name = "common_bayes"; - - cache { - path = "${DBDIR}/learn_cache.sqlite"; - } - - # Minimum number of words required for statistics processing - min_tokens = 11; - # Minimum learn count for both spam and ham classes to perform classification - min_learns = 200; - - backend = "sqlite3"; - languages_enabled = true; - statfile { - symbol = "BAYES_HAM"; - path = "${DBDIR}/bayes.ham.sqlite"; - spam = false; - } - statfile { - symbol = "BAYES_SPAM"; - path = "${DBDIR}/bayes.spam.sqlite"; - spam = true; - } -} -~~~ - -It is also possible to organize per-user statistics using the SQLite3 backend. However, you should ensure that Rspamd is called at the -final delivery stage (e.g. LDA mode) to avoid multi-recipient messages. In case of a multi-recipient message, Rspamd would just use the -first recipient for user-based statistics, which might be inappropriate for your configuration (however, Rspamd prefers SMTP recipients over MIME ones and prioritizes -the special LDA header called `Deliver-To` that can be appended by the `-d` option of `rspamc`). To enable per-user statistics, just add the `users_enabled = true` property -to the **classifier** configuration. You can use per-user and per-language statistics simultaneously. 
For both types of statistics, Rspamd also -looks to the default language and default user's statistics allowing to have the common set of tokens shared for all users/languages. - -## Using Lua scripts for `per_user` classifier - -It is also possible to create custom Lua scripts to use customized user or language for a specific task. Here is an example -of such a script for extracting domain names from recipients organizing thus per-domain statistics: - -~~~ucl -classifier "bayes" { - tokenizer { - name = "osb"; - } - - name = "bayes2"; - - min_tokens = 11; - min_learns = 200; - - backend = "sqlite3"; - per_language = true; - per_user = < `10` in this case) -* `autolearn = "return function(task) ... end"`: use the following Lua function to detect if autolearn is needed (function should return 'ham' if learn as ham is needed and string 'spam' if learn as spam is needed, if no learn is needed then a function can return anything including `nil`) - -Redis backend is highly recommended for autolearning purposes since it's the only backend with high concurrency level when multiple writers are properly synchronized. 
diff --git a/doc/markdown/configuration/ucl.md b/doc/markdown/configuration/ucl.md deleted file mode 100644 index 56b582c1f..000000000 --- a/doc/markdown/configuration/ucl.md +++ /dev/null @@ -1,386 +0,0 @@ -# UCL configuration language - -**Table of Contents** *generated with [DocToc](http://doctoc.herokuapp.com/)* - -- [Introduction](#introduction) -- [Basic structure](#basic-structure) -- [Improvements to the json notation](#improvements-to-the-json-notation) - - [General syntax sugar](#general-syntax-sugar) - - [Automatic arrays creation](#automatic-arrays-creation) - - [Named keys hierarchy](#named-keys-hierarchy) - - [Convenient numbers and booleans](#convenient-numbers-and-booleans) -- [General improvements](#general-improvements) - - [Commments](#commments) - - [Macros support](#macros-support) - - [Variables support](#variables-support) - - [Multiline strings](#multiline-strings) -- [Emitter](#emitter) -- [Validation](#validation) -- [Performance](#performance) -- [Conclusion](#conclusion) - -## Introduction {#introduction} - -This document describes the main features and principles of the configuration -language called `UCL` - universal configuration language. - -## Basic structure {#basic-structure} - -UCL is heavily infused by `nginx` configuration as the example of a convenient configuration -system. However, UCL is fully compatible with `JSON` format and is able to parse json files. 
-For example, you can write the same configuration in the following ways: - -* in nginx like: - -~~~ucl -param = value; -section { - param = value; - param1 = value1; - flag = true; - number = 10k; - time = 0.2s; - string = "something"; - subsection { - host = { - host = "hostname"; - port = 900; - } - host = { - host = "hostname"; - port = 901; - } - } -} -~~~ - -* or in JSON: - -~~~json -{ - "param": "value", - "param1": "value1", - "flag": true, - "subsection": { - "host": [ - { - "host": "hostname", - "port": 900 - }, - { - "host": "hostname", - "port": 901 - } - ] - } -} -~~~ - -## Improvements to the json notation. {#improvements-to-the-json-notation} - -There are various things that make ucl configuration more convenient for editing than strict json: - -### General syntax sugar - -* Braces are not necessary to enclose a top object: it is automatically treated as an object: - -~~~json -"key": "value" -~~~ - -is equal to: - -~~~json -{"key": "value"} -~~~ - -* There is no requirement of quotes for strings and keys, moreover, `:` may be replaced `=` or even be skipped for objects: - -~~~ucl -key = value; -section { - key = value; -} -~~~ - -is equal to: - -~~~json -{ - "key": "value", - "section": { - "key": "value" - } -} -~~~ - -* No commas mess: you can safely place a comma or semicolon for the last element in an array or an object: - -~~~json -{ - "key1": "value", - "key2": "value", -} -~~~ - -### Automatic arrays creation - -* Non-unique keys in an object are allowed and are automatically converted to the arrays internally: - -~~~json -{ - "key": "value1", - "key": "value2" -} -~~~ - -is converted to: - -~~~json -{ - "key": ["value1", "value2"] -} -~~~ - -### Named keys hierarchy - -UCL accepts named keys and organize them into objects hierarchy internally. 
Here is an example of this process: - -~~~ucl -section "blah" { - key = value; -} -section foo { - key = value; -} -~~~ - -is converted to the following object: - -~~~ucl -section { - blah { - key = value; - } - foo { - key = value; - } -} -~~~ - -Plain definitions may be more complex and contain more than a single level of nested objects: - -~~~ucl -section "blah" "foo" { - key = value; -} -~~~ - -is presented as: - -~~~ucl -section { - blah { - foo { - key = value; - } - } -} -~~~ - -### Convenient numbers and booleans - -* Numbers can have suffixes to specify standard multipliers: - + `[kKmMgG]` - standard 10 base multipliers (so `1k` is translated to 1000) - + `[kKmMgG]b` - 2 power multipliers (so `1kb` is translated to 1024) - + `[s|min|d|w|y]` - time multipliers, all time values are translated to float number of seconds, for example `10min` is translated to 600.0 and `10ms` is translated to 0.01 -* Hexadecimal integers can be used by `0x` prefix, for example `key = 0xff`. However, floating point values can use decimal base only. -* Booleans can be specified as `true` or `yes` or `on` and `false` or `no` or `off`. -* It is still possible to treat numbers and booleans as strings by enclosing them in double quotes. - -## General improvements {#general-improvements} - -### Commments {#comments} - -UCL supports different style of comments: - -* single line: `#` -* multiline: `/* ... */` - -Multiline comments may be nested: - -~~~c -# Sample single line comment -/* - some comment - /* nested comment */ - end of comment -*/ -~~~ - -### Macros support - -UCL supports external macros both multiline and single line ones: - -~~~ucl -.macro "sometext"; -.macro { - Some long text - .... -}; -~~~ - -Moreover, each macro can accept an optional list of arguments in braces. 
These -arguments themselves are a UCL object that is parsed and passed to a macro as -options: - -~~~ucl -.macro(param=value) "something"; -.macro(param={key=value}) "something"; -.macro(.include "params.conf") "something"; -.macro(#this is multiline macro -param = [value1, value2]) "something"; -.macro(key="()") "something"; -~~~ - -UCL also provides a convenient `include` macro to load content from other files -into the current UCL object. This macro accepts either a path to a file: - -~~~ucl -.include "/full/path.conf" -.include "./relative/path.conf" -.include "${CURDIR}/path.conf" -~~~ - -or a URL (if ucl is built with url support provided by either `libcurl` or `libfetch`): - - .include "http://example.com/file.conf" - -The `.include` macro supports a set of options: - -* `try` (default: **false**) - if this option is `true` then UCL treats errors on loading of -this file as non-fatal. For example, such a file can be absent but it won't stop the parsing -of the top-level document. -* `sign` (default: **false**) - if this option is `true` UCL loads and checks the signature for -a file from path named `.sig`. Trusted public keys should be provided for UCL API after -parser is created but before any configurations are parsed. -* `glob` (default: **false**) - if this option is `true` UCL treats the filename as a GLOB pattern and loads -all files that match the specified pattern (normally the format of patterns is defined in the `glob` manual page -for your operating system). This option is meaningless for URL includes. -* `url` (default: **true**) - allow URL includes. -* `path` (default: empty) - A UCL_ARRAY of directories to search for the include file. -Search ends after the first match, unless `glob` is true, in which case all matches are included. -* `prefix` (default false) - Put included contents inside an object, instead -of loading them into the root. 
If no `key` is provided, one is automatically generated based on each file's basename() -* `key` (default: ) - Key to load contents of include into. If -the key already exists, it must be the correct type -* `target` (default: object) - Specify if the `prefix` `key` should be an -object or an array. -* `priority` (default: 0) - specify priority for the include (see below). -* `duplicate` (default: 'append') - specify the policy for resolving duplicates: - - `append` - default strategy: if the new object has a higher priority then it replaces the old one, if the new object has a lower priority it is ignored completely, and if two duplicate objects have the same priority then we have a multi-value key (implicit array) - - `merge` - if we have an object or array, then new keys are merged inside; if we have a plain object then an implicit array is formed (regardless of priorities) - - `error` - create an error on duplicate keys and stop parsing - - `rewrite` - always rewrite an old value with the new one (ignoring priorities) - -Priorities are used by the UCL parser to manage the policy of rewriting objects while including other files, -as follows: - -* If we have two objects with the same priority then we form an implicit array -* If a new object has a higher priority then we overwrite the old one -* If a new object has a lower priority then we ignore it - -By default, the priority of the top-level object is set to zero (lowest priority). Currently, -you can define up to 16 priorities (from 0 to 15). Includes with higher priorities will -rewrite keys from the objects with lower priorities as specified by the policy. - -### Variables support - -UCL supports variables in input. Variables are registered by a user of the UCL parser and can be presented in the following forms: - -* `${VARIABLE}` -* `$VARIABLE` - -UCL currently does not support nested variables. 
To escape variables one could use double dollar signs: - -* `$${VARIABLE}` is converted to `${VARIABLE}` -* `$$VARIABLE` is converted to `$VARIABLE` - -However, if no valid variables are found in a string, no expansion will be performed (and `$$` thus remains unchanged). This may be a subject -to change in future libucl releases. - -### Multiline strings - -UCL can handle multiline strings as well as single line ones. It uses shell/perl like notation for such objects: - - key = < 1 then - return selected - end - else - -- Language not detected - local selected = {} - for _,st in ipairs(classifier:get_statfiles()) do - local st_l = st:get_param('language') - -- Insert only statfiles without language - if not st_l then - table.insert(selected, st) - end - end - if table.maxn(selected) > 1 then - return selected - end - end - - return nil -end -~~~ - -* *rspamd_config* - is a global object that allows you to modify configuration and register new symbols. - -## Writing advanced rules {#luarules} - -So by using these two tables it is possible to configure rules and metrics. Also note that it is possible to use any Lua functions and Rspamd libraries: - -~~~lua --- Declare variable that contains regexp rule definition -local rulebody = string.format('%s & !%s', '/re1/', '/re2') --- Set global table element config['regexp']['test_rule'] = rulebody --- Write message to log -rspamd_logger.info('Loaded test rule: ' .. 
rulebody) -~~~ - -Also it is possible to declare functions and use `closures` when defining Rspamd rules: - -~~~lua --- Here is a sample of using closure function inside rule -local function check_headers_tab(task, header_name) - -- Extract raw headers from message - local raw_headers = task:get_raw_header(header_name) - -- Make match of headers, that are separated with tabs, not spaces - if raw_headers then - for _,rh in ipairs(raw_headers) do - if rh['tab_separated'] then - -- We have header value separated by tab symbol - return true,rh['name'] - end - end - end - return false -end - -rspamd_config.HEADER_TAB_FROM_WHITELISTED = function(task) return check_headers_tab(task, "From") end -rspamd_config.HEADER_TAB_TO_WHITELISTED = function(task) return check_headers_tab(task, "To") end -rspamd_config.HEADER_TAB_DATE_WHITELISTED = function(task) return check_headers_tab(task, "Date") end - --- Table form of rule definition -rspamd_config.R_EMPTY_IMAGE = { - callback = function(task) - local tp = task:get_text_parts() -- get text parts in a message - - for _,p in ipairs(tp) do -- iterate over text parts array using `ipairs` - if p:is_html() then -- if the current part is html part - local hc = p:get_html() -- we get HTML context - local len = p:get_length() -- and part's length - - if len < 50 then -- if we have a part that has less than 50 bytes of text - local images = hc:get_images() -- then we check for HTML images - - if images then -- if there are images - for _,i in ipairs(images) do -- then iterate over images in the part - if i['height'] + i['width'] >= 400 then -- if we have a large image - return true -- add symbol - end - end - end - end - end - end - end, - score = 10.0, - condition = function(task) - if task:get_header('Subject') then - return true - end - return false - end, - description = 'No text parts and a large image', - score = 3.1, -} -~~~ - -Using Lua in rules provides many abilities to write complex mail filtering rules. 
- -## Writing Lua plugins {#luaplugins} - -Plugins are more complex filters than ordinary rules. Plugins can have their own configuration parameters and multiple callbacks. Plugins can make DNS requests, read from Rspamd maps and insert custom results. - -### Structure of the typical plugin - -Each Rspamd plugin has a common structure: - -- Registering configuration parameters -- Reading configuration parameters and setting up callbacks -- Callbacks that are called by Rspamd during message processing - -Here is a simple plugin example: - -~~~lua -local config_param = 'default' - -local function sample_callback(task) -end - - --- Reading configuration - --- Get all options for this plugin -local opts = rspamd_config:get_all_opt('sample') -if opts then - if opts['config'] then - config_param = opts['config'] - -- Register callback - rspamd_config:register_symbol('some_symbol', sample_callback) - end -end -~~~ - -This plugin uses the global variable *rspamd_config* to extract configuration options. Then it registers the function `sample_callback` that will be called for processing the symbol `some_symbol`. - -### Using DNS requests inside plugins - -It is often required to make DNS requests for message checks. Here is an example of making an asynchronous DNS request from an Rspamd Lua plugin: - -~~~lua --- Function-callback of Rspamd rule -local function symbol_cb(task) - -- Task is now local variable - - local function dns_cb(resolver, to_resolve, results, err, str) - -- Increase total count of dns requests - task:inc_dns_req() - if results then - task:insert_result('symbol', 1, str) - end - end - -- Resolve 'example.com' using primitives from the task passed - task:get_resolver():resolve_a(task:get_session(), task:get_mempool(), - 'example.com', dns_cb, 'sample string') -end -~~~ - -### Using maps from Lua plugin - -Maps hold dynamically loaded data like lists or ip trees. 
It is possible to use 3 types of maps: - -* **radix_tree** stores ip addresses -* **hash_map** stores plain strings (domains usually) -* **callback** calls a specified Lua callback when a map is loaded or changed; the map's content is passed to that callback as a parameter - -Here is a sample of using maps from Lua API: - -~~~lua -local rspamd_logger = require "rspamd_logger" - --- Add two maps in configuration section -local hash_map = rspamd_config:add_hash_map('file:///path/to/file', 'sample map') -local radix_tree = rspamd_config:add_radix_map('http://somehost.com/test.dat', 'sample ip map') -local generic_map = rspamd_config:add_map('file:///path/to/file', 'sample generic map', - function(str) - -- This callback is called when a map is loaded or changed - -- Str contains map content - rspamd_logger.info('Got generic map content: ' .. str) - end) - -local function sample_symbol_cb(task) - -- Check whether hash map contains from address of message - if hash_map:get_key(task:get_from()) then - -- Check whether radix map contains client's ip - if radix_tree:get_key(task:get_from_ip_num()) then - ... - end - end -end -~~~ - -## Conclusions {#luaconclusion} - -Lua plugins are a powerful tool for creating complex filters that can access practically all features of Rspamd. Lua plugins can be used for writing custom rules, can interact with Rspamd in many ways, and can use maps and make DNS requests. Rspamd is shipped with a couple of Lua plugins that can be used as examples while writing your own plugins. 
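The simple plugin example earlier in this document only registers `some_symbol` when `get_all_opt('sample')` returns a table containing a `config` key. A minimal sketch of a configuration section that would satisfy that check is shown below. It assumes that plugin options live in a top-level section named after the plugin, and the value `"custom"` is purely illustrative.

```ucl
# Hypothetical section read by the sample plugin via get_all_opt('sample')
sample {
  # Becomes opts['config'] in the plugin; replaces the 'default' value
  config = "custom";
}
```

If this section or its `config` key is missing, the sample plugin leaves `config_param` at `'default'` and never registers its callback.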
- -## References {#luareference} - -- [Lua manual](http://www.lua.org/manual/5.2/) -- [Programming in Lua](http://www.lua.org/pil/) diff --git a/doc/markdown/migration.md b/doc/markdown/migration.md deleted file mode 100644 index 889b44574..000000000 --- a/doc/markdown/migration.md +++ /dev/null @@ -1,232 +0,0 @@ -# Migrating between rspamd versions - -This document describes incompatible changes introduced in recent rspamd versions and details how to update your rules and configuration accordingly. - -## Migrating from rspamd 1.0 to rspamd 1.1 - -The only change here affects users with per-user statistics enabled. There is an incompatible change in sqlite3 and per-user behaviour: - -Now both redis and sqlite3 follow common principles for per-user statistics: - -* If per-user statistics is enabled check per-user tokens **ONLY** -* If per-user statistics is not enabled then check common tokens **ONLY** - -If you need the old behaviour, then you need to use a separate classifier for per-user statistics, for example: - -~~~ucl - classifier { - tokenizer { - name = "osb"; - } - name = "bayes_user"; - min_tokens = 11; - backend = "sqlite3"; - per_language = true; - per_user = true; - statfile { - path = "/tmp/bayes.spam.sqlite"; - symbol = "BAYES_SPAM_USER"; - } - statfile { - path = "/tmp/bayes.ham.sqlite"; - symbol = "BAYES_HAM_USER"; - } - } - classifier { - tokenizer { - name = "osb"; - } - name = "bayes"; - min_tokens = 11; - backend = "sqlite3"; - per_language = true; - statfile { - path = "/tmp/bayes.spam.sqlite"; - symbol = "BAYES_SPAM"; - } - statfile { - path = "/tmp/bayes.ham.sqlite"; - symbol = "BAYES_HAM"; - } - } -~~~ - -## Migrating from rspamd 0.9 to rspamd 1.0 - -In rspamd 1.0 the default settings for statistics tokenization have been changed to `modern`, meaning that tokens are now generated from normalized words and there are various improvements which are incompatible with the statistics model used in pre-1.0 versions. 
To use these new features you should either **relearn** your statistics or continue using your old statistics **without** new features by adding a `compat` parameter: - -~~~ucl -classifier { -... - tokenizer { - compat = true; - } -... -} -~~~ - -The recommended way to store statistics now is the `sqlite3` backend (which is incompatible with the old mmap backend): - -~~~ucl -classifier { - type = "bayes"; - tokenizer { - name = "osb"; - } - cache { - path = "${DBDIR}/learn_cache.sqlite"; - } - min_tokens = 11; - backend = "sqlite3"; - languages_enabled = true; - statfile { - symbol = "BAYES_HAM"; - path = "${DBDIR}/bayes.ham.sqlite"; - spam = false; - } - statfile { - symbol = "BAYES_SPAM"; - path = "${DBDIR}/bayes.spam.sqlite"; - spam = true; - } -} -~~~ - -## Migrating from rspamd 0.6 to rspamd 0.7 - -### WebUI changes - -The rspamd web interface is now a part of the rspamd distribution. Moreover, all static files are now served by rspamd itself so you won't need to set up a separate web server to distribute static files. At the same time, the WebUI worker has been removed and the controller acts as WebUI+old_controller which allows it to work with both a web browser and the rspamc client. However, you might still want to set up a full-featured HTTP server in front of rspamd to enable, for example, TLS and access controls. - -Now there are two password levels for rspamd: `password` for read-only commands and `enable_password` for data changing commands. If `enable_password` is not specified then `password` is used for both commands. - -Here is an example of the full configuration of the rspamd controller worker to serve the WebUI: - -~~~ucl -worker { - type = "controller"; - bind_socket = "localhost:11334"; - count = 1; - password = "q1"; - enable_password = "q2"; - secure_ip = "127.0.0.1"; # Allows to use *all* commands from this IP - static_dir = "${WWWDIR}"; -} -~~~ - -### Settings changes - -The settings system has been completely reworked. 
It is now a lua plugin that registers pre-filters and assigns settings according to dynamic maps or a static configuration. Should you want to use the new settings system then please check the recent [documentation](https://rspamd.com/doc/configuration/settings.html). The old settings have been completely removed from rspamd. - -### Lua changes - -There are many changes in the lua API and some of them are, unfortunately, breaking ones. - -* many superglobals are removed: now rspamd modules need to be loaded explicitly, -the only global remaining is `rspamd_config`. This affects the following modules: - - `rspamd_logger` - - `rspamd_ip` - - `rspamd_http` - - `rspamd_cdb` - - `rspamd_regexp` - - `rspamd_trie` - -~~~lua -local rspamd_logger = require "rspamd_logger" -local rspamd_trie = require "rspamd_trie" -local rspamd_cdb = require "rspamd_cdb" -local rspamd_ip = require "rspamd_ip" -local rspamd_regexp = require "rspamd_regexp" -~~~ - -* new system of symbols registration: now symbols can be registered by adding new indices to `rspamd_config` object. Old version: - -~~~lua -local reconf = config['regexp'] -reconf['SYMBOL'] = function(task) -... -end -~~~ - -new one: - -~~~lua -rspamd_config.SYMBOL = function(task) -... -end -~~~ - -`rspamd_message` is **removed** completely; you should use task methods to access message data. This includes such methods as: - -* `get_date` - this method can now return a date for task and message based on the arguments: - -~~~lua -local dm = task:get_date{format = 'message'} -- MIME message date -local dt = task:get_date{format = 'connect'} -- check date -~~~ - -* `get_header` - this function is totally reworked. Now `get_header` version returns just a decoded string, `get_header_raw` returns an undecoded string and `get_header_full` returns the full list of tables. Please consult the corresponding [documentation](https://rspamd.com/doc/lua/task.html) for details. 
You also might want to update the old invocation of task:get_header to the new one. -Old version: - -~~~lua -function kmail_msgid (task) - local msg = task:get_message() - local header_msgid = msg:get_header('Message-Id') - if header_msgid then - -- header_from and header_msgid are tables - for _,header_from in ipairs(msg:get_header('From')) do - ... - end - end - return false -end -~~~ - -new one: - -~~~lua -function kmail_msgid (task) - local header_msgid = task:get_header('Message-Id') - if header_msgid then - local header_from = task:get_header('From') - -- header_from and header_msgid are strings - end - return false -end -~~~ - -or with the full version: - -~~~lua -rspamd_config.FORGED_GENERIC_RECEIVED5 = function (task) - local headers_recv = task:get_header_full('Received') - if headers_recv then - -- headers_recv is now the list of tables - for _,header_r in ipairs(headers_recv) do - if re:match(header_r['value']) then - return true - end - end - end - return false -end -~~~ - -* `get_from` and `get_recipients` now accept optional numeric arguments that specifies where to get sender and recipients for a message. By default, this argument is `0` which means that data is initially checked in the SMTP envelope (meaning `MAIL FROM` and `RCPT TO` SMTP commands) and if the envelope data is inaccessible then it is grabbed from MIME headers. Value `1` means that data is checked on envelope only, while `2` switches mode to MIME headers. Here is an example from the `forged_recipients` module: - -~~~lua --- Check sender -local smtp_from = task:get_from(1) -if smtp_from then - local mime_from = task:get_from(2) - if not mime_from or - not (string.lower(mime_from[1]['addr']) == - string.lower(smtp_from[1]['addr'])) then - task:insert_result(symbol_sender, 1) - end -end -~~~ - -### Protocol changes - -rspamd now uses `HTTP` protocols for all operations, therefore an additional client library is unlikely to be needed. 
The fallback to the old `spamc` protocol has also been implemented to be automatically compatible with `rmilter` and other software that uses the `rspamc` protocol. diff --git a/doc/markdown/modules/chartable.md b/doc/markdown/modules/chartable.md deleted file mode 100644 index 5458427a0..000000000 --- a/doc/markdown/modules/chartable.md +++ /dev/null @@ -1,10 +0,0 @@ -# Chartable module - -This module counts the characters that belong to different [unicode scripts](http://www.unicode.org/reports/tr24/). It then evaluates the number of script changes, e.g. 'a網絡a' is treated as 2 script changes (from Latin to Chinese and from Chinese back to Latin), divided by the total number of unicode characters. If the result of this division is higher than the threshold then a symbol is inserted. By default the threshold is `0.1`, meaning that script changes occur for approximately 10% of characters. - -~~~ucl -chartable { - symbol = "R_CHARSET_MIXED"; - threshold = 0.1; -} -~~~ diff --git a/doc/markdown/modules/dcc.md b/doc/markdown/modules/dcc.md deleted file mode 100644 index 36931ac3a..000000000 --- a/doc/markdown/modules/dcc.md +++ /dev/null @@ -1,40 +0,0 @@ -# DCC module - -This module performs [DCC](http://www.dcc-servers.net/dcc/) lookups to determine -the *bulkiness* of a message (e.g. how many recipients have seen it). - -Identifying bulk messages is very useful in composite rules, e.g. if a message is -from a freemail domain *AND* the message is reported as bulk by DCC then you can -be sure the message is spam and can assign a greater weight to it. - -Please view the License terms on the DCC website before you enable this module. - -## Module configuration - -This module requires that you have the `dccifd` daemon configured, running and -working correctly. To do this you must download and build the -[latest DCC client](https://www.dcc-servers.net/dcc/source/dcc.tar.Z).
Once installed, edit -`/var/dcc/dcc_conf` set `DCCIFD_ENABLE=on` and set `DCCM_LOG_AT=NEVER` and -`DCCM_REJECT_AT=MANY`, then start the daemon by running `/var/dcc/libexec/rcDCC start`. - -Once the `dccifd` daemon is started it will listen on the UNIX domain socket /var/dcc/dccifd -and all you have to do is tell the rspamd where `dccifd` is listening: - -~~~ucl -dcc { - host = "/var/dcc/dccifd"; - # Port is only required if `dccifd` listens on a TCP socket - # port = 1234 -} -~~~ - -Once this module is configured it will write the DCC output to the rspamd as each -message is scanned: - -````` -Apr 5 14:19:53 mail1-ewh rspamd: (normal) lua; dcc.lua:98: sending to dcc: client=217.78.2.204#015DNSERROR helo="003b046f.slimabs.top" envfrom="23SecondAbs@slimabs.top" envrcpt="xxxx@xxxx.com" -Apr 5 14:19:53 mail1-ewh rspamd: (normal) lua; dcc.lua:65: DCC result=R disposition=R header="X-DCC--Metrics: xxxxx.xxxx.com 1282; bulk Body=1 Fuz1=1 Fuz2=many" -````` - -Any messages that DCC returns a *reject* result for (based on the configured `DCCM_REJECT_AT` -value) will cause the symbol `DCC_BULK` to fire. diff --git a/doc/markdown/modules/dkim.md b/doc/markdown/modules/dkim.md deleted file mode 100644 index 48e589386..000000000 --- a/doc/markdown/modules/dkim.md +++ /dev/null @@ -1,32 +0,0 @@ -# DKIM module - -This module checks [DKIM](http://www.dkim.org/) signatures for emails scanned. -DKIM signatures can establish that this specific message has been signed by a trusted -relay. For example, if a message comes from `gmail.com` then a valid DKIM signature -means that this message was definitely signed by `gmail.com` (unless gmail.com private -key has been compromised, which is not a likewise case). - -## Principles of work - -Rspamd can deal with many types of DKIM signatures and messages canonicalisation. -The major difficulty with DKIM are line endings: many MTA treat them differently which -leads to broken signatures. 
Basically, rspamd treats all line endings as `CR+LF` that -is compatible with the most of DKIM implementations. - -## Configuration - -DKIM module has several useful configuration options: - -- `dkim_cache_size` (or `expire`) - maximum size of DKIM keys cache -- `whitelist` - a map of domains that should not be checked with DKIM (e.g. if that domains have totally broken DKIM signer) -- `domains` - a map of domains that should have more strict scores for DKIM violation -- `strict_multiplier` - multiply the value of symbols by this value if received from `domains` map -- `trusted_only` - do not check DKIM signatures for all domains but those which are from the `domains` map -- `skip_multi` - skip DKIM check for messages with multiple signatures - -The last option can help for some circumstances when rspamd lacks the proper support of -multiple DKIM signatures. Unfortunately, with some mailing lists, or other software -this option could be useful to reduce false positives rate as rspamd deals with -multiple signatures poorly: it just uses the first one to check. On the other hand, -the proper support of multiple DKIM signatures is planned to be implemented in rspamd -in the next releases, which will make this option meaningless. \ No newline at end of file diff --git a/doc/markdown/modules/dmarc.md b/doc/markdown/modules/dmarc.md deleted file mode 100644 index 7bec587ec..000000000 --- a/doc/markdown/modules/dmarc.md +++ /dev/null @@ -1,48 +0,0 @@ -# DMARC module - -DMARC is a technology leveraging SPF & DKIM which allows domain owners to publish policies regarding how messages bearing -their domain in the RFC5322.From field should be handled (for example to quarantine or reject messages which do not have an -aligned DKIM or SPF identifier) and to elect to receive reporting information about such messages (to help them identify -abuse and/or misconfiguration and make informed decisions about policy application). 
- -## DMARC in rspamd - -The default configuration for the DMARC module in rspamd is an empty collection: - -~~~ucl -dmarc { -} -~~~ - -This is enough to enable the module and check/apply DMARC policies. - -Symbols added by the module are as follows: - -- `DMARC_POLICY_ALLOW`: Message was authenticated & allowed by DMARC policy -- `DMARC_POLICY_REJECT`: Authentication failed, rejection suggested by DMARC policy -- `DMARC_POLICY_QUARANTINE`: Authentication failed, quarantine suggested by DMARC policy -- `DMARC_POLICY_SOFTFAIL`: Authentication failed, no action suggested by DMARC policy - -Rspamd is able to store records in `redis` which could be used to generate DMARC aggregate reports, but there is as yet no tool available to generate such reports from them. The format of the records stored in `redis` is as follows: - - unixtime,ip,spf_result,dkim_result,dmarc_disposition - -where the spf and dkim results are `true` or `false`, indicating whether an aligned spf/dkim identifier was found, and dmarc_disposition is one of `none`/`quarantine`/`reject`, indicating the policy applied to the message. - -These records are added to a list named $prefix$domain where $domain is the domain which defined the policy for the message being reported on and $prefix is the value of the `key_prefix` setting (or "dmarc_" if this isn't set). - -Keys are inserted into redis servers, with the server selected by a hash of the sender's domain. - -To enable storing of report information, `reporting` must be set to `true`.
- -~~~ucl -dmarc { - # Enables storing reporting information to redis - reporting = true; - # If Redis server is not configured below, settings from redis {} will be used - #servers = "127.0.0.1:6379"; # Servers to use for reads and writes (can be a list) - # Alternatively set read_servers / write_servers to split reads and writes - # To set custom prefix for redis keys: - #key_prefix = "dmarc_"; -} -~~~ diff --git a/doc/markdown/modules/emails.md b/doc/markdown/modules/emails.md deleted file mode 100644 index e69de29bb..000000000 diff --git a/doc/markdown/modules/fann.md b/doc/markdown/modules/fann.md deleted file mode 100644 index b91e35da3..000000000 --- a/doc/markdown/modules/fann.md +++ /dev/null @@ -1,40 +0,0 @@ -# Neural network module - -The neural network module is an experimental module that performs post-classification of messages based on their current symbols and a training corpus obtained from previous learning. - -To use this module, you need to build rspamd with `libfann` support. It is normally enabled if you use pre-built packages, however, it can also be enabled explicitly by passing `-DENABLE_FANN=ON` to the `cmake` command during the build process. - -The idea behind this module is to learn which symbol combinations are common for spam and which are common for ham. To achieve this goal, the fann module studies log files via the `log_helper` worker until it has gathered a reasonable number of log samples (`1k` by default). The neural network is trained for spam when a message has the `reject` action (definite spam) and it is trained as ham when a message has a negative score. You could also use your own criteria for learning. - -Training is performed in the background and after a certain number of training iterations (`1k` again) the neural network is updated on disk, allowing scanners to load and update their own data. - -After a certain number of such iterations (`100` by default), the training process removes the old neural network and starts training a new one.
This is done to ensure that old data does not influence on the current processing. The neural network is also reset when you add or remove rules from rspamd. Once trained, neural network data is saved into file so it could persist between restarts. The current training epoch is however vanished upon restart. - -## Configuration - -First of all, you need a special worker called `log_helper` to accept rspamd scan results. This logger has a trivial setup: - -~~~ucl -worker "log_helper" { - count = 1; -} -~~~ - -Then you'd need to setup fann plugin: - -~~~ucl -fann_scores { - fann_file = "${DBDIR}/data.fann"; # Used to store ANN file on disk - train { - max_train = 10k; # Number of trains per epoch - max_epoch = 1k # Number of epoch while ANN data is valid - spam_score = 8; # Score to learn spam - ham_score = -2; # Score to learn ham - } - use_settings = false; # If enabled, then settings-id could switch this module to another FANN -} -~~~ - -## Settings usage - -TODO \ No newline at end of file diff --git a/doc/markdown/modules/forged_recipients.md b/doc/markdown/modules/forged_recipients.md deleted file mode 100644 index e69de29bb..000000000 diff --git a/doc/markdown/modules/fuzzy_check.md b/doc/markdown/modules/fuzzy_check.md deleted file mode 100644 index 13e8c6878..000000000 --- a/doc/markdown/modules/fuzzy_check.md +++ /dev/null @@ -1,163 +0,0 @@ -# Fuzzy check module - -This module is intended to check messages for specific fuzzy patterns stored in -[fuzzy storage workers](../workers/fuzzy_storage.md). At the same time, this module -is responsible for learning fuzzy storage with message patterns. - -## Fuzzy patterns - -Rspamd uses `shingles` algorithm to perform fuzzy match of messages. This algorithm -is probabilistic and uses words chains to detect some common patterns and filter -thus spam or ham messages. Shingles algorithm is described in the following -[research paper](http://dl.acm.org/citation.cfm?id=283370). 
We use 3-grams for this -algorithm and [siphash](https://131002.net/siphash/) as the hash function. Currently, -rspamd uses 32 hashes for shingles. Using siphash allows private storages to be -used, since nobody can generate the same sequence of hashes without some shared -secret called `shingles key`. By default, rspamd uses the string `rspamd` as siphash -key, however, it is possible to change this value from the configuration. - -Each shingles set is accompanied by a collision resistant hash, namely [blake2](https://blake2.net/) hash. -This digest is used as the unique ID of the hash. - -Attachments and images are not currently matched against fuzzy hashes, however they -are checked by means of blake2 digests using strict match. - -## Module configuration - -Fuzzy check module has several global options and allows multiple match -storages to be specified. Global options include: - -- `symbol`: default symbol to insert (if no flags match) -- `min_length`: minimum length of text parts in words to perform fuzzy check (default - check all text parts) -- `min_bytes`: minimum length of attachments and images in bytes to check them in fuzzy storage -- `whitelist`: IP list to skip all fuzzy checks -- `timeout`: timeout for reply waiting - -Fuzzy rules are defined as a set of `rule` definitions. Each `rule` must have a servers -list to check or learn and a set of flags and optional parameters.
Here is an example of -rule's settings: - -~~~ucl -fuzzy_check { - rule { - # List of servers, can be an array or multi-value item - servers = "localhost:11335"; - servers = "highsecure.ru:11335"; - - # Default symbol - symbol = "FUZZY_UNKNOWN"; - - # List of additional mime types to be checked in this fuzzy - mime_types = "application/pdf"; - mime_types = ["application/*", "*/octet-stream", "*"]; - - # Maximum global score for all maps - max_score = 20.0; - - # Ignore flags that are not listed in maps for this rule - skip_unknown = yes; - - # If this value is false, then allow learning for this fuzzy rule - read_only = no; - - # Key for strict digests (default: "rspamd") - fuzzy_key = "somebigrandomstring"; - - # Key for fuzzy siphash (default: "rspamd") - fuzzy_shingles_key = "anotherbigrandomstring"; - - # maps - } -} -~~~ - -Each rule can have several maps defined by a `flag` value. For example, a single -fuzzy storage can contain both good and bad hashes that should have different symbols -and thus different weights. Maps are defined inside fuzzy rules as follows: - -~~~ucl -fuzzy_check { - rule { - ... - fuzzy_map = { - FUZZY_DENIED { - # Maximum weight for this list - max_score = 20.0; - # Flag value - flag = 1 - } - FUZZY_PROB { - max_score = 10.0; - flag = 2 - } - FUZZY_WHITE { - max_score = 2.0; - flag = 3 - } - } - } -} -~~~ - -The meaning of `max_score` can be rather unclear. First of all, all hashes in -fuzzy storage have their own weights. For example, if we have a hash `A` and 100 users -marked it as a spam hash, then it will have a weight of `100 * single_vote_weight`. -Therefore, if `single_vote_weight` is `1` then the final weight will be `100`. -`max_score` means the weight that is required for the rule to add the symbol with the maximum -score 1.0 (that will of course be multiplied by the metric's weight). In our example, -if the weight of the hash is `100` and `max_score` is `99`, then the rule will be -added with the weight of `1`.
If `max_score` is `200`, then the rule will be added with a -weight of about `0.2` (the real function is hyperbolic tangent). In the following configuration: - -~~~ucl -metric { - name = "default"; - ... - symbol { - name = "FUZZY_DENIED"; - weight = 10.0; - } - ... -} -fuzzy_check { - rule { - ... - fuzzy_map = { - FUZZY_DENIED { - # Maximum weight for this list - max_score = 20.0; - # Flag value - flag = 1 - } - ... - } - } -} -~~~ - -If a hash has value `10`, then a symbol `FUZZY_DENIED` with weight of `2.0` will be added. -If a hash has value `100500`, then `FUZZY_DENIED` will have weight `10.0`. - -## Learning fuzzy_check - -The `fuzzy_check` module also allows learning messages. You can use the `rspamc` command or -connect to the **controller** worker using the HTTP protocol. For learning you must check -the following settings: - -1. Controller worker should be accessible by `rspamc` or HTTP (check `bind_socket`) -2. Controller should allow privileged commands for this client (check `enable_password` or `allow_ip` settings) -3. Controller should have `fuzzy_check` module configured to the servers specified -4. You should know `fuzzy_key` and `fuzzy_shingles_key` to operate with this storage -5. Your `fuzzy_check` module should have `fuzzy_map` configured to the flags used by server -6. Your `fuzzy_check` rule must have the `read_only` option turned off (`read_only = false`) -7. Your `fuzzy_storage` worker should allow updates from the controller's host (`allow_update` option) -8. Your controller should be able to communicate with fuzzy storage by means of the `UDP` protocol - -If all these conditions are met then you can learn messages with rspamc: - - rspamc -w <weight> -f <flag> fuzzy_add ... - -or delete hashes: - - rspamc -f <flag> fuzzy_del ... - -On learning, rspamd sends commands to **all** servers inside a specific rule. On check, -rspamd selects a server in a round-robin manner.
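The learning checklist above touches three places in the configuration. A minimal sketch of how they might fit together is shown below. The addresses, password and keys are placeholders, and both the `type = "fuzzy"` worker declaration and the value format of `allow_update` are assumptions rather than something this page specifies:

~~~ucl
# Controller: must be reachable by rspamc/HTTP and accept privileged commands
worker {
    type = "controller";
    bind_socket = "localhost:11334";
    enable_password = "q2";          # placeholder password
}

# Fuzzy storage: must accept updates from the controller's host
worker {
    type = "fuzzy";                  # assumed worker type name
    bind_socket = "localhost:11335";
    allow_update = ["127.0.0.1"];    # assumed value format
}

fuzzy_check {
    rule {
        servers = "localhost:11335";
        read_only = no;              # learning is allowed for this rule
        fuzzy_key = "somebigrandomstring";
        fuzzy_shingles_key = "anotherbigrandomstring";
        fuzzy_map = {
            FUZZY_DENIED {
                max_score = 20.0;
                flag = 1;
            }
        }
    }
}
~~~

With a setup along these lines, `rspamc -w 10 -f 1 fuzzy_add message.eml` would be expected to store the hashes of the (hypothetical) file `message.eml` with weight `10` under flag `1`, which this sketch maps to `FUZZY_DENIED`.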
diff --git a/doc/markdown/modules/index.md b/doc/markdown/modules/index.md deleted file mode 100644 index afb440a8e..000000000 --- a/doc/markdown/modules/index.md +++ /dev/null @@ -1,70 +0,0 @@ -# Rspamd modules - -Rspamd ships with a set of modules. Some modules are written in C to speed up -complex procedures while others are written in lua to reduce code size. -New modules are encouraged to be written in lua, adding any essential -support to the Lua API itself. In fact, lua modules are very close to -C modules in terms of performance. However, lua modules can be written and loaded -dynamically. - -## C Modules - -C modules provide the core functionality of rspamd and are statically linked -to the main rspamd code. C modules are defined in the `options` section of rspamd -configuration. If no `filters` attribute is defined then all modules are disabled. -The default configuration enables all modules explicitly: - -~~~ucl -filters = "chartable,dkim,spf,surbl,regexp,fuzzy_check"; -~~~ - -Here is the list of C modules available: - -- [regexp](regexp.md): the core module that allows defining regexp rules, -rspamd internal functions and lua rules. -- [surbl](surbl.md): this module extracts URLs from messages and checks them against -public DNS black lists to filter messages with malicious URLs. -- [spf](spf.md): checks SPF records for messages processed. -- [dkim](dkim.md): performs DKIM signatures checks. -- [dmarc](dmarc.md): performs DMARC policy checks. -- [fuzzy_check](fuzzy_check.md): checks messages fuzzy hashes against public blacklists. -- [chartable](chartable.md): checks character sets of text parts in messages. - -## Lua modules - -Lua modules are dynamically loaded on rspamd startup and are reloaded on rspamd -reconfiguration. Should you want to write a lua module, consult the -[Lua API documentation](../lua/).
To define the path to lua modules there is a special section -named `modules` in rspamd: - -~~~ucl -modules { - path = "/path/to/dir/"; - path = "/path/to/module.lua"; - path = "$PLUGINSDIR/lua"; -} -~~~ - -If a path is a directory then rspamd scans it for the `*.lua` pattern and loads all -files matched. - -Here is the list of Lua modules shipped with rspamd: - -- [multimap](multimap.md) - a complex module that operates with different types -of maps. -- [rbl](rbl.md) - a plugin that checks messages against DNS blacklists based on -either SMTP FROM addresses or on information from `Received` headers. -- [emails](emails.md) - extracts emails from a message and checks them against DNS -blacklists. -- [maillist](maillist.md) - determines the common mailing list signatures in a message. -- [once_received](once_received.md) - detects messages with a single `Received` header -and performs some additional checks for such messages. -- [phishing](phishing.md) - detects messages with phished URLs. -- [ratelimit](ratelimit.md) - implements the leaky bucket algorithm for ratelimiting and -uses `redis` to store data. -- [trie](trie.md) - uses suffix trie for extra-fast patterns lookup in messages. -- [mime_types](mime_types.md) - applies some rules about mime types met in messages -- [rspamd_update](rspamd_update.md) - loads dynamic rules and other rspamd updates -- [spamassassin](spamassassin.md) - loads spamassassin rules -- [dmarc](dmarc.md) - performs DMARC policy checks -- [dcc](dcc.md) - performs [DCC](http://www.dcc-servers.net/dcc/) lookups to determine message bulkiness diff --git a/doc/markdown/modules/maillist.md b/doc/markdown/modules/maillist.md deleted file mode 100644 index 7563eae24..000000000 --- a/doc/markdown/modules/maillist.md +++ /dev/null @@ -1,15 +0,0 @@ -# Mail list module - -Mailing list module is a simple module that checks whether a message is -sent by some popular mailing list software.
This module is designed to negate -some rules as they are likely to be touched unnecessarily if a message comes from -some list. - -Here is a list of currently supported mailing lists programs: - -- Ezmlm -- Mailman -- Google groups -- Majordomo -- Communigate PRO mailing lists -- subscribe.ru mailing list \ No newline at end of file diff --git a/doc/markdown/modules/mime_types.md b/doc/markdown/modules/mime_types.md deleted file mode 100644 index 4910fcae9..000000000 --- a/doc/markdown/modules/mime_types.md +++ /dev/null @@ -1,30 +0,0 @@ -# Rspamd mime types module - -This module is intended to do some mime types sanity checks. That includes the following: - -1. Checks whether mime type is from the `good` list (e.g. `multipart/alternative` or `text/html`) -2. Checks if a mime type is from the `bad` list (e.g. `multipart/form-data`) -3. Checks if an attachement filename extension is different from the intended mime type - -## Configuration - -`mime_types` module reads mime types map specified in `file` option. This map contains binding - -``` -type/subtype score -``` - -When score is more than `0` then it is considered as `bad` if it is less than `0` it is considered as `good` (with the corresponding multiplier). -When mime type is not listed then `MIME_UNKNOWN` symbol is inserted. - -`extension_map` option allows to specify map from a known extension to a specific mime type: - -~~~ucl -extension_map = { - html = "text/html"; - txt = "text/plain"; - pdf = "application/pdf"; -} -~~~ - -When an attachement extension matches left part but the content type does not match the right part then symbol `MIME_BAD_ATTACHMENT` is inserted. diff --git a/doc/markdown/modules/multimap.md b/doc/markdown/modules/multimap.md deleted file mode 100644 index cede3bc94..000000000 --- a/doc/markdown/modules/multimap.md +++ /dev/null @@ -1,162 +0,0 @@ -# Multimap module - -Multimap module is designed to handle rules that are based on different types of maps. 
- -## Principles of work - -Maps in rspamd are the files or HTTP links that are automatically monitored and reloaded -if changed. For example, maps can be defined as following: - - "http://example.com/file" - "file:///etc/rspamd/file.map" - "/etc/rspamd/file.map" - -Rspamd respects `304 Not Modified` reply from HTTP server allowing to save traffic -when a map has not been actually changed since last load. For file maps, rspamd uses normal -`mtime` attribute (time modified). The global map watching settings are defined in the -`options` section of the configuration file: - -* `map_watch_interval`: defines time when all maps are rescanned; the actual check interval is jittered to avoid simultaneous checking (hence, the real interval is from this value up to the this interval doubled). - -Multimap module allows to build rules based on the dynamic maps content. Rspamd supports the following -map types in this module: - -* `hash map` - a list of domains or `user@domain` -* `regexp map` - a list of regular expressions -* `ip map` - an effective radix trie of `ip/mask` values (supports both IPv4 and IPv6 addresses) -* `cdb` - constant database format (files only) - -Multimap has different message attributes to be checked via maps. - - -Multimap can also be used for pre-filtering of message: so if map matches then no further checks will be performed. This feature is particularly useful for whitelisting, blacklisting and allows to save scan resources. To enable this mode just add `action` option to the map configuration (see below). - -## Configuration - -The module itself contains a set of rules in form: - - symbol { type = type; map = uri; [optional params] } - -### Map types - -Type attribute means what is matched with this map. 
The following types are supported: - -* `ip` - matches source IP of message (radix map) -* `from` - matches envelope from (or header `From` if envelope from is absent) -* `rcpt` - matches any of envelope rcpt or header `To` if envelope info is missing -* `header` - matches any header specified (must have `header = "Header-Name"` configuration attribute) -* `dnsbl` - matches source IP against some DNS blacklist (consider using [RBL](rbl.md) module for this) -* `url` - matches URLs in messages against maps -* `filename` - matches attachment filename against map - -DNS maps are legacy and are not encouraged to use in new projects (use [rbl](rbl.md) for that). - -Maps can also be specified as [CDB](http://www.corpit.ru/mjt/tinycdb.html) databases which might be useful for large maps: - - map = "cdb:///path/to/file.cdb"; - -### Pre-filter maps - -To enable pre-filter support, you should specify `action` parameter which can take the -following values: - -* `accept` - accept a message (no action) -* `add header` or `add_header` - adds a header to message -* `rewrite subject` or `rewrite_subject` - change subject -* `greylist` - greylist message -* `reject` - drop message - -No filters will be processed for a message if such a map matches. - -~~~ucl -multimap { - test { type = "ip"; map = "/tmp/ip.map"; symbol = "TESTMAP"; } - spamhaus { type = "dnsbl"; map = "pbl.spamhaus.org"; symbol = "R_IP_PBL"; - description = "PBL dns block list"; } # Better use RBL module instead -} -~~~ - -### Regexp maps - - -All maps but `ip` and `dnsbl` support `regexp` mode. In this mode, all keys in maps are treated as regular expressions, for example: - - /example\d+\.com/i - /other\d+\.com/i test - # Comments are still enabled - -For performance considerations, use only expressions supported by [hyperscan](http://01org.github.io/hyperscan/dev-reference/compilation.html#pattern-support) as this engine provides blazing performance at no additional cost. 
Currently, there is no way to distinguish which particular regexp was matched if multiple regexps were matched. - -To enable regexp mode, you should set the `regexp` option to `true`: - -~~~ucl -sender_from_whitelist_user { - type = "from"; - map = "file:///tmp/from.map"; - symbol = "SENDER_FROM_WHITELIST"; - regexp = true; -} -~~~ - -### Map filters - -It is also possible to apply a filtering expression before checking a value against some map. This is mainly useful -for `header` rules. Filters are specified with the `filter` option. Rspamd supports the following filters so far: - -* `email` or `email:addr` - parse header value and extract email address from it (`Somebody <user@example.com>` -> `user@example.com`) -* `email:user` - parse header value as email address and extract user name from it (`Somebody <user@example.com>` -> `user`) -* `email:domain` - parse header value as email address and extract domain part from it (`Somebody <user@example.com>` -> `example.com`) -* `email:name` - parse header value as email address and extract displayed name from it (`Somebody <user@example.com>` -> `Somebody`) -* `regexp:/re/` - extracts generic information using the specified regular expression - -URL maps allow another set of filters (by default, url maps are matched using the hostname part): - -* `tld` - matches TLD (top level domain) part of urls -* `full` - matches the complete URL, not the hostname -* `is_phished` - matches hostname but if and only if the URL is phished (e.g.
pretended to be from another domain) -* `regexp:/re/` - extracts generic information using the specified regular expression from the hostname -* `tld:regexp:/re/` - extracts generic information using the specified regular expression from the TLD part -* `full:regexp:/re/` - extracts generic information using the specified regular expression from the full URL text - -Filename maps support this filters set: - -* `extension` - matches file extension -* `regexp:/re/` - extract data from filename according to some regular expression - -Here are some examples of pre-filter configurations: - -~~~ucl -sender_from_whitelist_user { - type = "from"; - filter = "email:user"; - map = "file:///tmp/from.map"; - symbol = "SENDER_FROM_WHITELIST_USER"; - action = "accept"; # Prefilter mode -} -sender_from_regexp { - type = "header"; - header = "from"; - filter = "regexp:/.*@/"; - map = "file:///tmp/from_re.map"; - symbol = "SENDER_FROM_REGEXP"; -} -url_map { - type = "url"; - filter = "tld"; - map = "file:///tmp/url.map"; - symbol = "URL_MAP"; -} -url_tld_re { - type = "url"; - filter = "tld:regexp:/\.[^.]+$/"; # Extracts the last component of URL - map = "file:///tmp/url.map"; - symbol = "URL_MAP_RE"; -} -filename_blacklist { - type = "filename"; - filter = "extension"; - map = "/${LOCAL_CONFDIR}/filename.map"; - symbol = "FILENAME_BLACKLISTED"; - action = "reject"; -} -~~~ diff --git a/doc/markdown/modules/once_received.md b/doc/markdown/modules/once_received.md deleted file mode 100644 index cb91522a5..000000000 --- a/doc/markdown/modules/once_received.md +++ /dev/null @@ -1,22 +0,0 @@ -# Once received module - -This module is intended to do simple checks for mail with one `Received` header. The idea behind these checks is that legitimate mail likely has more than one received and some bad patterns, such as `dynamic` or `broadband` are common for spam from hacked users' machines. 
- -## Configuration - -The configuration of this module is pretty straightforward: specify `symbol` for generic one received mail, specify `symbol_strict` for emails with bad patterns or with unresolvable hostnames and add **good** and **bad** patterns. Patterns can contain [lua patterns](http://lua-users.org/wiki/PatternsTutorial). `good_host` lines are used to negate this module for certain hosts, `bad_host` lines are used to specify certain bad patterns. It is also possible to specify `whitelist` to define a list of networks for which `once_received` checks should be excluded. - -## Example - -~~~ucl -once_received { - good_host = "^mail"; - bad_host = "static"; - bad_host = "dynamic"; - symbol_strict = "ONCE_RECEIVED_STRICT"; - symbol = "ONCE_RECEIVED"; - whitelist = "/tmp/ip.map"; -} -~~~ - -IP map can contain, as usually, IP's (both v4 and v6), networks (in CIDR notation) and optional comments starting from `#` symbol. diff --git a/doc/markdown/modules/phishing.md b/doc/markdown/modules/phishing.md deleted file mode 100644 index 55884f287..000000000 --- a/doc/markdown/modules/phishing.md +++ /dev/null @@ -1,114 +0,0 @@ -# Phishing module - -This module is designed to report about potentially phished URL's. - -## Principles of phishing detection - -Rspamd tries to detect phished URL's merely in HTML text parts. First, -it get URL from `href` or `src` attribute and then tries to find the text enclosed -within this link tag. If some url is also enclosed in the specific tag then -rspamd decides to compare whether these two URL's are related, namely if they -belong to the same top level domain. Here are examples of urls that are considered -to be non-phished: - - http://example.com/other - http://example.com/ - -And the following URLs are considered as phished: - - http://example.co.uk - http://example.com - http://example.com - -## Configuration of phishing module - -Here is an example of full module configuration. 
- -~~~ucl -phishing { - symbol = "R_PHISHING"; # Default symbol - - # Check only domains from this list - domains = "file:///path/to/map"; - - # Make exclusions for known redirectors - # Entry format: URL/path for map, colon, name of symbol - redirector_domains = [ - "${CONFDIR}/redirectors.map:REDIRECTOR_FALSE" - ]; - # For certain domains from the specified strict maps - # use another symbol for phishing plugin - strict_domains = [ - "${CONFDIR}/paypal.map:PAYPAL_PHISHING" - ]; -} -~~~ - -If an anchoring (actual as opposed to phished) domain is found in a map -referenced by the `redirector_domains` setting then the related symbol is -yielded and the URL is not checked further. This allows making exclusions -for known redirectors, especially ESPs. - -Further to this, if the phished domain is found in a map referenced by -`strict_domains` the related symbol is yielded and the URL not checked -further. This allows fine-grained control to avoid false positives and -enforce some really bad phishing mails, such as bank phishing or other -payments system phishing. - -Finally, the default symbol is yielded- if `domains` is specified then -only if the phished domain is found in the related map. - -Maps for this module can consist of effective second level domain parts (eSLD) -or whole domain parts of the URLs (FQDN) as well. - -## Openphish support - -Since version 1.3, there is [openphish](https://openphish.com) support in rspamd. -Now rspamd loads this public feed as a map (using HTTPS) and checks URLs in messages using -openphish list. If any match is found, then rspamd adds symbol `PHISHED_OPENPHISH`. - -If you use research or commercial data feed, rspamd can also use its data and gives -more details about URLs found: their sector (e.g. 'Finance'), brand name (e.g. -'Bank of Zimbabwe') and other useful information. 
- -There are couple of options available to configure openphish module: - -~~~ucl -phishing { - # URL of feed, default is public url: - openphish_map = "https://www.openphish.com/feed.txt"; - # For premium feed, change that to your personal URL, e.g. - # openphish_map = "https://openphish.com/samples/premium_feed.json"; - - # Change this to true if premium feed is enabled - openphish_premium = false; -} -~~~ - -## Phishtank support - -There is also [phishtank](https://phishtank.com) support in rspamd since 1.3. Unlike -openphish feed, phishtank's one is not enabled by default since it has quite a big size (about 50Mb) so -you might want to setup some reverse proxy (e.g. nginx) to cache that data among rspamd instances: - -~~~nginx -proxy_cache_path /data/nginx/cache levels=1:2 keys_zone=phish:10m; - -server { - listen 8080; - location / { - proxy_pass http://data.phishtank.com:80; - proxy_cache phish; - proxy_cache_lock on; - } -} -~~~ - - -To enable phishtank feed, you can edit `local.d/phishing.conf` file and add the following lines there: - -~~~ucl -phishtank_enabled = true; -# Where nginx is installed -phishtank_map = "http://localhost:8080/data/online-valid.json"; -~~~ diff --git a/doc/markdown/modules/ratelimit.md b/doc/markdown/modules/ratelimit.md deleted file mode 100644 index c36017461..000000000 --- a/doc/markdown/modules/ratelimit.md +++ /dev/null @@ -1,95 +0,0 @@ -# Ratelimit plugin - -Ratelimit plugin is designed to limit messages coming from certain senders, to -certain recipients from certain IP addresses combining these parameters into -a separate limits. - -All limits are stored in [redis](http://redis.io) server (or servers cluster) to enable -shared cache between different scanners. - -## Module configuration - -In the default configuration, there are no cache servers specified, hence, the module won't work unless you add this option to the configuration. 
- -`Ratelimit` module supports the following configuration options: - -- `servers` - list of servers where ratelimit data is stored -- `whitelisted_rcpts` - comma separated list of whitelisted recipients. By default -the value of this option is 'postmaster, mailer-daemon' -- `whitelisted_ip` - a map of whitelisted IP addresses or networks -- `max_rcpts` - do not apply ratelimit if a message has more than this number of recipients (5 by default). This -option avoids excessive work on setting buckets when a message has a lot of recipients. -- `max_delay` - maximum lifetime for any limit bucket (1 day by default) -- `rates` - a table of allowed rates in form: - - type = [burst,leak]; - -Where `type` is one of: - -- `to` -- `to_ip` -- `to_ip_from` -- `bounce_to` -- `bounce_to_ip` - -`burst` is a capacity of a bucket and `leak` is a rate in messages per second. -Both these attributes are floating point values. - -- `symbol` - if this option is specified, then `ratelimit` plugin just adds the corresponding symbol instead of setting pre-result, the value is scaled as $$ 2 * tanh(\frac{bucket}{threshold * 2}) $$, where `tanh` is the hyperbolic tangent function - -## Principles of work - -The basic principle of ratelimiting in rspamd is called `leaky bucket`. It could -be visually represented as a bucket that has some capacity, and a small hole in the bottom. -Messages come to this bucket and leak through the hole over time (it doesn't delay messages, just counts them). If the capacity of -a bucket is exhausted, then a temporary reject is sent. This continues until the -bucket has enough free capacity to accept more messages (since messages are leaking, after some -time it will be possible to process new messages). 
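The bucket behaviour described above can be sketched in a few lines. This is only an illustration of the general leaky bucket idea, not rspamd's actual implementation: the class name `LeakyBucket`, its fields and the exact rejection condition (reject once the level reaches the capacity) are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class LeakyBucket:
    """Hypothetical leaky bucket; names are illustrative, not rspamd's."""
    capacity: float      # "burst": messages the bucket holds before limiting
    leak: float          # messages drained from the bucket per second
    level: float = 0.0   # current fill level
    last: float = 0.0    # timestamp of the previous update, in seconds

    def allow(self, now: float) -> bool:
        # Drain whatever leaked out since the last update, never below empty.
        self.level = max(0.0, self.level - (now - self.last) * self.leak)
        self.last = now
        if self.level >= self.capacity:
            return False  # bucket is full: answer with a temporary reject
        self.level += 1.0  # count the message; it is not delayed
        return True


bucket = LeakyBucket(capacity=100, leak=1)
accepted = sum(bucket.allow(0.0) for _ in range(150))  # burst at t = 0
```

With these assumed semantics, a burst of 150 messages at the same instant gets 100 of them accepted, and one second later exactly one more message fits into the bucket.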
- -Rspamd uses 3 types of limit buckets: - -- `to` - a bucket based on a recipient only -- `to:ip` - a bucket combining a recipient and a sender's IP -- `to:from:ip` - a bucket combining a recipient, a sender and a sender's IP - -For bounce messages there are special buckets that lack `from` component and have more -restricted limits. Rspamd treats the following senders as bounce senders: - -- 'postmaster', -- 'mailer-daemon' -- '' (empty sender) -- 'null' -- 'fetchmail-daemon' -- 'mdaemon' - -Each recipient has its own triple of buckets, hence it is useful -to limit number of recipients to check. - -Each bucket has two parameters: -- `capacity` - how many messages could go into a bucket before a limit is reached -- `leak` - how many messages per second are leaked from a bucket. - -For example, a bucket with capacity `100` and leak `1` can accept up to 100 messages but then -will accept not more than a message per second. - -By default, ratelimit module has the following settings which disable all limits: - -~~~lua --- Default settings for limits, 1-st member is burst, second is rate and the third is numeric type -local settings = { - -- Limit for all mail per recipient (burst 100, rate 2 per minute) - to = {0, 0.033333333}, - -- Limit for all mail per one source ip (burst 30, rate 1.5 per minute) - to_ip = {0, 0.025}, - -- Limit for all mail per one source ip and from address (burst 20, rate 1 per minute) - to_ip_from = {0, 0.01666666667}, - - -- Limit for all bounce mail (burst 10, rate 2 per hour) - bounce_to = {0, 0.000555556}, - -- Limit for bounce mail per one source ip (burst 5, rate 1 per hour) - bounce_to_ip = {0, 0.000277778}, - - -- Limit for all mail per user (authuser) (burst 20, rate 1 per minute) - user = {0, 0.01666666667} -} -~~~ diff --git a/doc/markdown/modules/rbl.md b/doc/markdown/modules/rbl.md deleted file mode 100644 index b7e73a1d1..000000000 --- a/doc/markdown/modules/rbl.md +++ /dev/null @@ -1,116 +0,0 @@ -# RBL module - -The RBL module 
provides support for checking the IPv4/IPv6 source address of a message's sender against a set of RBLs as well as various less conventional methods of using RBLs: against addresses in Received headers; against the reverse DNS name of the sender and against the parameter used for HELO/EHLO at SMTP time. - -Configuration is structured as follows: - -~~~ucl -rbl { - # default settings defined here - rbls { - # 'rbls' subsection under which the RBL definitions are nested - an_rbl { - # rbl-specific subsection - } - # ... - } -} -~~~ - -The default settings define the ways in which the RBLs are used unless overridden in an RBL-specific subsection. - -Defaults may be set for the following parameters (default values used if these are not set are shown in brackets - note that these may be redefined in the default config): - -- default_ipv4 (true) - -Use this RBL to test IPv4 addresses. - -- default_ipv6 (false) - -Use this RBL to test IPv6 addresses. - -- default_received (true) - -Use this RBL to test IPv4/IPv6 addresses found in Received headers. The RBL should also be configured to check one/both of IPv4/IPv6 addresses. - -- default_from (false) - -Use this RBL to test IPv4/IPv6 addresses of message senders. The RBL should also be configured to check one/both of IPv4/IPv6 addresses. - -- default_rdns (false) - -Use this RBL to test reverse DNS names of message senders (hostnames passed to rspamd should have been validated with a forward lookup, particularly if this is to be used to provide whitelisting). - -- default_helo (false) - -Use this RBL to test parameters sent for HELO/EHLO at SMTP time. - -- default_dkim (false) - -Use this RBL to test domains found in validated DKIM signatures. - -- default_dkim_domainonly (true) - -If true test top-level domain only, otherwise test entire domain found in DKIM signature. 
- -- default_emails (false) - -Use this RBL to test email addresses in form [localpart].[domainpart].[rbl] or if set to "domain_only" uses [domainpart].[rbl]. - -- default_unknown (false) - -If set to false, do not yield a result unless the response received from the RBL is defined in its related returncodes {} subsection, else return the default symbol for the RBL. - -- default_exclude_users (false) - -If set to true, do not use this RBL if the message sender is authenticated. - -- default_exclude_private_ips (true) - -If true, do not use the RBL if the sending host address is in `local_addrs` & do not check received headers baring these addresses. - -- default_exclude_local (true) - -If true & local_exclude_ip_map has been set - do not use the RBL if the sending host address is in the local IP list & do not check received headers baring these addresses. - -- default_is_whitelist (false) - -If true matches on this list should neutralise any listings where this setting is false and ignore_whitelists is not true. - -- default_ignore_whitelists (false) - -If true this list should not be neutralised by whitelists. - -Other parameters which can be set here are: - -- local_exclude_ip_map - -Can be set to a URL of a list of IPv4/IPv6 addresses & subnets not to be considered as local exclusions by exclude_local checks. - -RBL-specific subsection is structured as follows: - -~~~ucl -# Descriptive name of RBL or symbol if symbol is not defined. -an_rbl { - # Explicitly defined symbol - symbol = "SOME_SYMBOL"; - # RBL-specific defaults (where different from global defaults) - #The global defaults may be overridden using 'helo' to override 'default_helo' and so on. 
- ipv6 = true; - ipv4 = false; - # Address used for RBL-testing - rbl = "v6bl.example.net"; - # Possible responses from RBL and symbols to yield - returncodes { - # Name_of_symbol = "address"; - EXAMPLE_ONE = "127.0.0.1"; - EXAMPLE_TWO = "127.0.0.2"; - } -} -~~~ - -The following extra settings are valid in the RBL subsection: - -- whitelist_exception - -(For whitelists) - Symbols named as parameters for this setting will not be used for neutralising blacklists (set this multiple times to add multiple exceptions). diff --git a/doc/markdown/modules/regexp.md b/doc/markdown/modules/regexp.md deleted file mode 100644 index 01d7a0635..000000000 --- a/doc/markdown/modules/regexp.md +++ /dev/null @@ -1,146 +0,0 @@ -# Rspamd regexp module - -This is a core module that deals with regexp expressions to filter messages. - -## Principles of work - -Regexp module operates with `expressions` - a logical sequence of different `atoms`. Atoms -are elements of the expression and could be represented as regular expressions, rspamd -functions and lua functions. Rspamd supports the following operators in expressions: - -* `&&` - logical AND (can be also written as `and` or even `&`) -* `||` - logical OR (`or` `|`) -* `!` - logical NOT (`not`) -* `+` - logical PLUS, usually used with comparisons: - - `>` more than - - `<` less than - - `>=` more or equal - - `<=` less or equal - -Whilst logical operators are clear for understanding, PLUS is not so clear. In rspamd, -it is used to join multiple atoms or subexpressions and compare them to a specific number: - - A + B + C + D > 2 - evaluates to `true` if at least 3 operands are true - (A & B) + C + D + E >= 2 - evaluates to `true` if at least 2 operands are true - -Operators has their own priorities: - -1. NOT -2. PLUS -3. COMPARE -4. AND -5. OR - -You can change priorities by braces, of course. All operations are *right* associative in rspamd. 
-While evaluating expressions, rspamd tries to optimize their execution time by reordering and does not evaluate -unnecessary branches. - -## Expressions components - -Rspamd support the following components within expressions: - -* Regular expressions -* Internal functions -* Lua global functions (not widely used) - -### Regular expressions - -In rspamd, regular expressions could match different parts of messages: - -* Headers (should be `Header-Name=/regexp/flags`), mime headers -* Full headers string -* Textual mime parts -* Raw messages -* URLs - -The match type is defined by special flags after the last `/` symbol: - -* `H` - header regexp -* `X` - undecoded header regexp (e.g. without quoted-printable decoding) -* `B` - MIME header regexp (applied for headers in MIME parts only) -* `R` - full headers content (applied for all headers undecoded and for the message only - **not** including MIME headers) -* `M` - raw message regexp -* `P` - part regexp without HTML tags -* `Q` - part regexp with HTML tags -* `C` - spamassassin `BODY` regexp analogue(see http://spamassassin.apache.org/full/3.4.x/doc/Mail_SpamAssassin_Conf.txt) -* `D` - spamassassin `RAWBODY` regexp analogue -* `U` - URL regexp - -From 1.3, it is also possible to specify long regexp types for convenience in curly braces: - -* `{header}` - header regexp -* `{raw_header}` - undecoded header regexp (e.g. 
without quoted-printable decoding) -* `{mime_header}` - MIME header regexp (applied for headers in MIME parts only) -* `{all_header}` - full headers content (applied for all headers undecoded and for the message only - **not** including MIME headers) -* `{body}` - raw message regexp -* `{mime}` - part regexp without HTML tags -* `{raw_mime}` - part regexp with HTML tags -* `{sa_body}` - spamassassin `BODY` regexp analogue(see http://spamassassin.apache.org/full/3.4.x/doc/Mail_SpamAssassin_Conf.txt) -* `{sa_raw_body}` - spamassassin `RAWBODY` regexp analogue -* `{url}` - URL regexp - -Each regexp also supports the following flags: - -* `i` - ignore case -* `u` - use utf8 regexp -* `m` - multiline regexp - treat string as multiple lines. That is, change "^" and "$" from matching the start of the string's first line and the end of its last line to matching the start and end of each line within the string -* `x` - extended regexp - this flag tells the regular expression parser to ignore most whitespace that is neither backslashed nor within a bracketed character class. You can use this to break up your regular expression into (slightly) more readable parts. Also, the # character is treated as a metacharacter introducing a comment that runs up to the pattern's closing delimiter, or to the end of the current line if the pattern extends onto the next line. -* `s` - dotall regexp - treat string as single line. That is, change `.` to match any character whatsoever, even a newline, which normally it would not match. Used together, as `/ms`, they let the `.` match any character whatsoever, while still allowing `^` and `$` to match, respectively, just after and just before newlines within the string. 
-* `O` - do not optimize regexp (rspamd optimizes regexps by default) - -### Internal functions - -Rspamd supports a set of internal functions to do some common spam filtering tasks: - -* `check_smtp_data(type[, str or /re/])` - checks for the specific envelope argument: `from`, `rcpt`, `user`, `subject` -* `compare_encoding(str or /re/)` - compares message encoding with string or regexp -* `compare_parts_distance(inequality_percent)` - if a message is multipart/alternative, compare two parts and return `true` if they are inequal more than `inequality_percent` -* `compare_recipients_distance(inequality_percent)` - check how different are recipients of a message (works for > 5 recipients) -* `compare_transfer_encoding(str or /re/)` - compares message transfer encoding with string or regexp -* `content_type_compare_param(param, str or /re/)` - compare content-type parameter `param` with string or regexp -* `content_type_has_param(param)` - return true if `param` exists in content-type -* `content_type_is_subtype(str or /re/` - return `true` if subtype of content-type matches string or regexp -* `content_type_is_type(str or /re/)`- return `true` if type of content-type matches string or regexp -* `has_content_part(type)` - return `true` if the part with the specified `type` exists -* `has_content_part_len(type, len)` - return `true` if the part with the specified `type` exists and have at least `len` lenght -* `has_fake_html()` - check if there is an HTML part in message with no HTML tags -* `has_html_tag(tagname)` - return `true` if html part contains specified tag -* `has_only_html_part()` - return `true` if there is merely a single HTML part -* `header_exists(header)` - return if a specified header exists in the message -* `is_html_balanced()` - check whether HTML part has balanced tags -* `is_recipients_sorted()` - return `true` if there are more than 5 recipients in a message and they are sorted -* `raw_header_exists()` - does the same as `header_exists` - -Many 
of these functions are legacy but remain supported for compatibility. - -### Lua atoms - -Lua atoms now can be Lua global function names or callbacks. This is -a compatibility feature for previously written rules. - -### Regexp objects - -From rspamd 1.0, it is possible to add more power to regexp rules by using -table notation while writing rules. A table can have the following fields: - -- `callback`: lua callback for the rule -- `re`: regular expression (mutually exclusive with `callback` option) -- `condition`: function of task that determines when a rule should be executed -- `score`: default score -- `description`: default description -- `one_shot`: default one shot settings - -Here is an example of table form definition of regexp rule: - -~~~lua -config['regexp']['RE_TEST'] = { - re = '/test/i{mime}', - score = 10.0, - condition = function(task) - if task:get_header('Subject') then - return true - end - return false - end, -} -~~~ \ No newline at end of file diff --git a/doc/markdown/modules/replies.md b/doc/markdown/modules/replies.md deleted file mode 100644 index 6beb59133..000000000 --- a/doc/markdown/modules/replies.md +++ /dev/null @@ -1,48 +0,0 @@ -# Replies module - -This module collects the `message-id` header of messages sent by authenticated users and stores corresponding hashes to Redis, which are set to expire after a configurable amount of time (by default 1 day). Furthermore, it hashes `in-reply-to` headers of all received messages & checks for matches (i.e. messages sent in response to messages our system originated), and yields a symbol which could be used to adjust scoring or force an action (most likely "no action" to accept) according to configuration. - - -## Configuration - -Settings for the module are described below (default values are indicated in brackets). - -- action (null) - -If set, apply the given action to messages identified as replies (would typically be set to "no action" to accept). 
- -- expire (86400) - -Time, in seconds, after which to expire records (default is one day). - -- key_prefix (rr) - -String prefixed to keys in Redis. - -- message (Message is reply to one we originated) - -Message passed when action is forced. - -- servers (null) - -Comma separated list of Redis hosts - -- symbol (REPLY) - -Symbol yielded on messages identified as replies. - -## Example - -~~~ucl -replies { - # This setting is non-default & is required to be set - servers = "localhost"; - # This setting is non-default & may be desirable - action = "no action"; - # These are default settings you may want to change - expire = 86400; - key_prefix = "rr"; - message = "Message is reply to one we originated"; - symbol = "REPLY"; -} -~~~ diff --git a/doc/markdown/modules/rspamd_update.md b/doc/markdown/modules/rspamd_update.md deleted file mode 100644 index 78b004c8c..000000000 --- a/doc/markdown/modules/rspamd_update.md +++ /dev/null @@ -1,90 +0,0 @@ -# Rspamd update module - -This module allows loading rspamd rules and adjusting symbol scores and actions without a full daemon restart. -`rspamd_update` provides a way to backport new rules and score changes without updating rspamd itself. This might be useful, for example, if you want to use the stable version of rspamd but would like to improve filtering quality at the same time. - -## Security considerations - -Rspamd update module can execute Lua code, which runs with the scanner's privileges - usually `_rspamd` or `nobody` user. Therefore, you should not use untrusted sources of updates. -Rspamd supports digital signatures to check the validity of downloaded updates using the [EdDSA](http://ed25519.cr.yp.to/) signature scheme. -For your own updates that are loaded from the filesystem or from some trusted network you might use unsigned files, however, signing is recommended even in this case. 
- -To sign a map you can use `rspamadm signtool` and to generate signing keypair - `rspamadm keypair -s -u`: - -~~~ucl -keypair { - pubkey = "zo4sejrs9e5idqjp8rn6r3ow3x38o8hi5pyngnz6ktdzgmamy48y"; - privkey = "pwq38sby3yi68xyeeuup788z6suqk3fugrbrxieri637bypqejnqbipt1ec9tsm8h14qerhj1bju91xyxamz5yrcrq7in8qpsozywxy"; - id = "bs4zx9tcf1cs5ed5mt4ox8za54984frudpzzny3jwdp8mkt3feh7nz795erfhij16b66piupje4wooa5dmpdzxeh5mi68u688ixu3yd"; - encoding = "base32"; - algorithm = "curve25519"; - type = "sign"; -} -~~~ - -Then you can use `signtool` to edit the map file: - -``` -rspamadm signtool -e --editor=vim -k -``` - -To enforce signing policies you should add `sign+` string to your map definition: - -~~~ucl -map = "sign+http://example.com/map" -~~~ - -To specify a trusted key you could either put the **public** key from the keypair to `local.d/options.inc` file as follows: - -``` -trusted_keys = [""]; -``` - -or add it as `key` definition to the map string: - -~~~ucl -map = "sign+key=+http://example.com/map" -~~~ - -## Module configuration - -The module itself has very few parameters: - -* `key`: use this key (base32 encoded) as trusted key - -All other keys are treated as rules to load maps. By default, rspamd tries to load signed updates from `rspamd.com` site using trusted key `qxuogdh5eghytji1utkkte1dn3n81c3y5twe61uzoddzwqzuxxyb`: - -~~~ucl -rspamd_update { - rules = "sign+http://rspamd.com/update/rspamd-${BRANCH_VERSION}.ucl"; - key = "qxuogdh5eghytji1utkkte1dn3n81c3y5twe61uzoddzwqzuxxyb"; -} -~~~ - -## Updates structure - -Update files are quite simple: they have 3 sections: - -* `symbols` - list of new scores for symbols that are already in rspamd (loaded with `priority = 1` to override default settings) -* `actions` - list of scores for actions (also loaded with `priority = 1`) -* `rules` - list of lua code fragments to load into rspamd, they can use `rspamd_config` global to register new rules - -Here is an example of update file: - -~~~ucl -rules = { - test =<`. 
In this case rspamd uses `HELO` to grab domain information as specified in the -standard. - -## Principles of work - -`SPF` can be a powerful tool when properly used. However, it is very fragile in many -cases: when a message is somehow redirected or reconstructed by mailing list software. - -Moreover, many mail providers have no clear understanding of this technology and -misuse the SPF technique. Hence, the scores for SPF symbols are relatively small -in rspamd. - -SPF uses DNS service extensively, therefore rspamd maintains a cache of SPF records. -This cache operates on the principle of `least recently used` expiration. The lifetime of -each cached item is also limited by the time to live of the matching DNS record. - -You can manually specify the size of this cache by configuring SPF module: - -~~~ucl -spf { - spf_cache_size = 1k; # cache up to 1000 of the most recent SPF records -} -~~~ - -Currently, rspamd supports the full set of SPF elements and macros, and has internal -protection from DNS recursion. diff --git a/doc/markdown/modules/surbl.md b/doc/markdown/modules/surbl.md deleted file mode 100644 index ec39a6c7d..000000000 --- a/doc/markdown/modules/surbl.md +++ /dev/null @@ -1,184 +0,0 @@ -# SURBL module - -This module performs scanning of URLs found in messages against a set of known -DNS lists. It can add different symbols depending on the DNS replies from a -specific DNS URL list. It is also possible to resolve domains of URLs and then -check the IP addresses against the normal `RBL` style list. - -## Module configuration - -The default configuration defines several public URL lists. However, their terms -of usage normally disallow commercial or very extensive usage without purchasing -a specific sort of license. - -Nonetheless, they can be used by personal services or low volume requests free -of charge. 
- -~~~ucl -surbl { - # List of domains that are not checked by surbl - whitelist = "file://$CONFDIR/surbl-whitelist.inc"; - # Additional exceptions for TLD rules - exceptions = "file://$CONFDIR/2tld.inc"; - - rule { - # DNS suffix for this rule - suffix = "multi.surbl.org"; - symbol = "SURBL_MULTI"; - bits { - # List of bits ORed when reply is given - JP_SURBL_MULTI = 64; - AB_SURBL_MULTI = 32; - MW_SURBL_MULTI = 16; - PH_SURBL_MULTI = 8; - WS_SURBL_MULTI = 4; - SC_SURBL_MULTI = 2; - } - } - rule { - suffix = "multi.uribl.com"; - symbol = "URIBL_MULTI"; - bits { - URIBL_BLACK = 2; - URIBL_GREY = 4; - URIBL_RED = 8; - } - } - rule { - suffix = "uribl.rambler.ru"; - # Also check images - images = true; - symbol = "RAMBLER_URIBL"; - } - rule { - suffix = "dbl.spamhaus.org"; - symbol = "DBL"; - # Do not check numeric URL's - noip = true; - } - rule { - suffix = "uribl.spameatingmonkey.net"; - symbol = "SEM_URIBL_UNKNOWN"; - bits { - SEM_URIBL = 2; - } - noip = true; - } - rule { - suffix = "fresh15.spameatingmonkey.net"; - symbol = "SEM_URIBL_FRESH15_UNKNOWN"; - bits { - SEM_URIBL_FRESH15 = 2; - } - noip = true; - } -} -~~~ - -In general, the configuration of `surbl` module is definition of DNS lists. Each -list must have suffix that defines the list itself and optionally for some lists -it is possible to specify either `bit` or `ips` sections. - -Since some URL lists do not accept `IP` addresses, it is also possible to disable sending of URLs with IP address in the host to such lists. That could be done by specifying `noip = true` option: - -~~~ucl - rule { - suffix = "dbl.spamhaus.org"; - symbol = "DBL"; - # Do not check numeric URL's - noip = true; - } -~~~ - -It is also possible to check HTML images URLs using URL blacklists. 
Just specify `images = true` for such list and you are done: - -~~~ucl - rule { - suffix = "uribl.rambler.ru"; - # Also check images - images = true; - symbol = "RAMBLER_URIBL"; - } -~~~ - -## Principles of operation - -In this section, we define how `surbl` module performs its checks. - -### TLD composition - -By default, we want to check some top level domain, however, many domains contain -two components while others can have 3 or even more components to check against the -list. By default, rspamd takes top level domain as defined in the [public suffixes](https://publicsuffix.org). -Then one more component is prepended, for example: - - sub.example.com -> [.com] -> example.com - sub.co.uk -> [.co.uk] -> sub.co.uk - -However, sometimes even more levels of domain components are required. In this case, -the `exceptions` map can be used. For example, if we want to check all subdomains of -`example.com` and `example.co.uk`, then we can define the following list: - - example.com - example.co.uk - -Here are new composition rules: - - sub.example.com -> [.example.com] -> sub.example.com - sub1.sub2.example.co.uk -> [.example.co.uk] -> sub2.example.co.uk - -### DNS composition - -SURBL module composes the DNS request of two parts: - -- TLD component as defined in the previous section; -- DNS list suffix - -For example, to form a request to multi.surbl.org, the following is applied: - - example.com -> example.com.multi.surbl.org - -### Results parsing - -Normally, DNS blacklists encode reply in A record from some private network -(namely, `127.0.0.0/8`). Encoding varies from one service to another. Some lists -use bits encoding, where a single DNS list or error message is encoded as a bit -in the least significant octet of the IP address. 
For example, if bit 1 encodes `LISTA` -and bit 2 encodes `LISTB`, then the reply is the bitwise `OR` of all matching bits, and each specific bit is tested -to decode reply: - - 127.0.0.3 -> LISTA | LISTB -> both bit symbols are added - 127.0.0.2 -> LISTB only - 127.0.0.1 -> LISTA only - -This encoding saves DNS requests, since multiple lists are queried with a single request. - -Some other lists use direct encoding of lists by some specific addresses. In this -case you should define results decoding principle in `ips` section not `bits` since -bitwise rules are not applicable to these lists. In `ips` section you explicitly -match the IP returned by a list and its meaning. - -## IP lists - -From rspamd 1.1 it is also possible to do two step checks: - -1. Resolve IP addresses of each URL -2. Check each IP resolved against SURBL list - -In general this procedure could be represented as follows: - -* Check `A` or `AAAA` records for `example.com` -* For each IP address resolve it using reverse octets composition: so if IP address of `example.com` is `1.2.3.4`, then checks would be for `4.3.2.1.uribl.tld` - -For example, [SBL list](https://www.spamhaus.org/sbl/) of `spamhaus` project provides such functions using `ZEN` multi list. This is included in rspamd default configuration: - -~~~ucl - rule { - suffix = "zen.spamhaus.org"; - symbol = "ZEN_URIBL"; - resolve_ip = true; - ips { - URIBL_SBL = "127.0.0.2"; - } - } -~~~ diff --git a/doc/markdown/modules/trie.md b/doc/markdown/modules/trie.md deleted file mode 100644 index 18e9f6808..000000000 --- a/doc/markdown/modules/trie.md +++ /dev/null @@ -1,39 +0,0 @@ -# Trie plugin - -Trie plugin is designed to search multiple strings within raw messages or text parts -and to do this very fast. In fact, it uses the Aho-Corasick algorithm, which performs -well even on large texts and many input strings. - -This module provides a convenient interface to the search trie structure. 
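To make the search trie idea concrete, here is a compact sketch of multi-pattern matching with an automaton built from a trie plus failure links. It is a textbook-style illustration, not the code rspamd uses: the function names `build_automaton` and `search` and the returned `(offset, pattern)` pairs are assumptions made for this example.

```python
from collections import deque


def build_automaton(patterns):
    """Build trie transitions, failure links and per-state outputs."""
    goto, fail, out = [{}], [0], [set()]  # state 0 is the root
    for pattern in patterns:
        state = 0
        for ch in pattern:
            if ch not in goto[state]:
                goto.append({})
                fail.append(0)
                out.append(set())
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        out[state].add(pattern)
    # Breadth-first pass: a failure link points to the longest proper
    # suffix of the current prefix that is also a prefix in the trie.
    queue = deque(goto[0].values())
    while queue:
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            out[nxt] |= out[fail[nxt]]  # inherit matches ending here
    return goto, fail, out


def search(text, automaton):
    """Return (start_offset, pattern) for every occurrence in text."""
    goto, fail, out = automaton
    state, hits = 0, []
    for i, ch in enumerate(text):
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        for pattern in sorted(out[state]):
            hits.append((i - len(pattern) + 1, pattern))
    return hits


automaton = build_automaton(["test", "es"])
print(search("123testing", automaton))  # [(4, 'es'), (3, 'test')]
```

The text is scanned once regardless of the number of patterns, which is why this approach scales to many input strings. Matches are plain substring hits, so `test` is reported inside `123testing`.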
- -## Configuration - -Here is an example of trie configuration: - -~~~ucl -trie { - # Each subsection defines a single rule with associated symbol - SYMBOL1 { - # Define rules in the file (it is *NOT* a map) - file = "/some/path"; - # Raw rules search within the whole undecoded messages - raw = true; - # If we have multiple occurrences of strings from this rule - # then we insert a symbol multiple times - multi = true; - } - SYMBOL2 { - patterns = [ - "pattern1", - "pattern2", - "pattern3" - ] - } -} -~~~ - -Although the Aho-Corasick trie is very fast, it supports only plain -strings. Moreover, it cannot distinguish word boundaries, for example, a string -`test` will be found in texts `test`, `tests` or even `123testing`. Therefore, it -might be used to search some concrete and relatively specific patterns and should -not be used for word matching. diff --git a/doc/markdown/modules/whitelist.md b/doc/markdown/modules/whitelist.md deleted file mode 100644 index 5b2417194..000000000 --- a/doc/markdown/modules/whitelist.md +++ /dev/null @@ -1,119 +0,0 @@ -# Whitelist module - -Whitelist module is intended to negate or increase scores for some messages that are known to -be from trusted sources. Due to `SMTP` protocol design flaws, it is quite easy to -forge a sender. Therefore, rspamd tries to validate the sender based on the following additional -properties: - -- `DKIM`: a message has a valid DKIM signature for this domain -- `SPF`: a message matches SPF record for the domain -- `DMARC`: a message also satisfies domain's DMARC policy (usually implies SPF and DKIM) - -## Whitelist setup - -Whitelist configuration is quite straightforward. You can define a set of rules within -`rules` section. Each rule **must** have `domains` attribute that specifies either -map of domains (if specified as a string) or a direct list of domains (if specified as an array). 
- -### Whitelist constraints - -The following constraints are allowed: - -- `valid_spf`: require a valid SPF policy -- `valid_dkim`: require DKIM validation -- `valid_dmarc`: require a valid DMARC policy - -### Whitelist rules modes - -Each whitelist rule can work in 3 modes: - -- `whitelist` (default): add symbol when a domain has been found and one of constraints defined is satisfied (e.g. `valid_dmarc`) -- `blacklist`: add symbol when a domain has been found and one of constraints defined is *NOT* satisfied (e.g. `valid_dmarc`) -- `strict`: add symbol with negative (ham) score when a domain has been found and one of constraints defined is satisfied (e.g. `valid_dmarc`) and add symbol with **POSITIVE** (spam) score when some of constraints defined has failed - -If you do not define any constraints, then both `strict` and `whitelist` rules just insert a result for all mail from the specified domains. For `blacklist` rules the result normally has a positive score. - -These options are combined using the `AND` operator for `whitelist` and using `OR` for `blacklist` and `strict` rules. Therefore, setting `valid_dkim = true` and -`valid_spf = true` requires both DKIM and SPF validation to whitelist domains from -the list. On the contrary, for blacklist and strict rules any violation causes a symbol with a positive score to be inserted. - -### Optional settings - -You can also set the default metric settings using the ordinary attributes, such as: - -- `score`: default score -- `group`: default group (`whitelist` group is used if not specified explicitly) -- `one_shot`: default one shot mode -- `description`: default description - -Within lists, you can also use the optional `multiplier` argument that defines an additional -multiplier for the score added by this module.
For example, let's double the -score for `github.com`: - - ["github.com", 2.0] - -or if using map: - - github.com 2.0 - -## Configuration example - -~~~ucl -whitelist { - rules { - WHITELIST_SPF = { - valid_spf = true; - domains = [ - "github.com", - ]; - score = -1.0; - } - - WHITELIST_DKIM = { - valid_dkim = true; - domains = [ - "github.com", - ]; - score = -2.0; - } - - WHITELIST_SPF_DKIM = { - valid_spf = true; - valid_dkim = true; - domains = [ - ["github.com", 2.0], - ]; - score = -3.0; - } - - STRICT_SPF_DKIM = { - valid_spf = true; - valid_dkim = true; - strict = true; - domains = [ - ["paypal.com", 2.0], - ]; - score = -3.0; # For strict rules negative score should be defined - } - - BLACKLIST_DKIM = { - valid_spf = true; - valid_dkim = true; - blacklist = true; - domains = "/some/file/blacklist_dkim.map"; - score = 3.0; # Mention positive score here - } - - WHITELIST_DMARC_DKIM = { - valid_dkim = true; - valid_dmarc = true; - domains = [ - "github.com", - ]; - score = -7.0; - } - } -} -~~~ - -Rspamd also comes with a set of pre-defined whitelisted domains that can be a useful starting point. diff --git a/doc/markdown/tutorials/index.md b/doc/markdown/tutorials/index.md deleted file mode 100644 index 4a3b0d5d9..000000000 --- a/doc/markdown/tutorials/index.md +++ /dev/null @@ -1,9 +0,0 @@ -# Rspamd tutorials - -In this section you can find the current step-by-step tutorials covering various topics about rspamd.
- -* [Migrating from SA](migrate_sa.md) - the guide for those who want to migrate an existing SpamAssassin system to Rspamd -* [Writing rspamd rules](writing_rules.md) - how to extend rspamd by writing your own rules -* [Creating your fuzzy storage](http://rspamd.com/doc/fuzzy_storage.html) - learn how to make your own fuzzy storage -* [Training rspamd with dovecot antispam plugin, part 1](https://kaworu.ch/blog/2014/03/25/dovecot-antispam-with-rspamd/) - this tutorial describes how to train rspamd automatically using the `antispam` plugin of the `dovecot` IMAP server -* [Training rspamd with dovecot antispam plugin, part 2](https://kaworu.ch/blog/2015/10/12/dovecot-antispam-with-rspamd-part2/) - continuation of the previous tutorial diff --git a/doc/markdown/tutorials/migrate_sa.md b/doc/markdown/tutorials/migrate_sa.md deleted file mode 100644 index 3ccc6de4a..000000000 --- a/doc/markdown/tutorials/migrate_sa.md +++ /dev/null @@ -1,82 +0,0 @@ -# Migrating from SpamAssassin to Rspamd - -This guide provides information for those who want to migrate an existing system from [SpamAssassin](https://spamassassin.apache.org) to Rspamd. You will find information about major differences between the spam filtering engines and how to deal with the transition process. - -## Why migrate to Rspamd - -Rspamd runs **significantly faster** than SpamAssassin while providing approximately the same quality of filtering. However, if you don't care about the performance and resource consumption of your spam filtering engine you might still find Rspamd useful because it has a simple but powerful web management system (WebUI). - -On the other hand, if you have a lot of custom rules, or you use Pyzor/Razor/DCC, or you have some commercial 3rd party products that depend on SpamAssassin then you may not want to migrate. - -In short: Rspamd is for **speed**! - -## What about dspam/spamoracle...? - -You could also move from these projects to Rspamd.
You should bear in mind, however, that Rspamd and SA are multi-factor spam filtering systems that use three main approaches to filter messages: - -* Content filtering - static rules that are designed to find known bad patterns in messages (usually regexp or other custom rules) -* Dynamic lists - DNS or reputation lists that are used to filter known bad content, such as abused IP addresses or URL domains -* Statistical filters - which learn to distinguish spam and ham messages - -`dspam`, `spamoracle` and others usually implement the third approach, only providing statistical filtering. This method is quite powerful but it can cause false-positives and is not very suitable for multi-user environments. Rspamd and SA, in contrast, are designed for systems with many users. Rspamd, in particular, was written for a very large system with more than 40 million users and about 10 million emails per hour. - -## Before you start - -There are a couple of things you need to know before transition: - -1. Rspamd does not support SpamAssassin statistics so you'd need to **train** your filter from scratch with spam and ham samples (or install the [pre-built statistics](https://rspamd.com/rspamd_statistics/)). Rspamd uses a different statistical engine - called [OSB-Bayes](http://osbf-lua.luaforge.net/papers/trec2006_osbf_lua.pdf) - which is intended to be more precise than SA's 'naive' Bayes classifier -2. Rspamd uses `Lua` for plugins and rules, so basic knowledge of this language is more than useful for playing with Rspamd; however, Lua is very simple and can be learned [very quickly](http://lua-users.org/wiki/LuaTutorial) -3. Rspamd uses the `HTTP` protocol to communicate with the MTA or milter, so SA native milters might not communicate with Rspamd. There is some limited support of the SpamAssassin protocol, though some commands are not supported, in particular those which require copying of data between scanner and milter. 
More importantly, `Length`-less messages are not supported by Rspamd as they completely break HTTP semantics and will never be supported. To achieve the same functionality, a dedicated scanner could use, e.g. HTTP `chunked` encoding. -4. Rspamd is **NOT** intended to work with blocking libraries or services, hence, something like `mysql` or `postgresql` will likely not be supported -5. Rspamd is developing quickly so you should be aware that there might be some incompatible changes between major versions - they are usually listed in the [migration](../migration.md) section of the site. -6. Unlike SA where there are only `spam` and `ham` results, Rspamd supports five levels of messages called `actions`: - + `no action` - ham message - + `greylist` - turn on adaptive greylisting (which is also used on higher levels) - + `add header` - adds Spam header (meaning soft-spam action) - + `rewrite subject` - rewrite subject to `*** SPAM *** original subject` - + `reject` - ultimately reject message - -Each action can have its own score limit which could also be modified by a user's settings. Rspamd assumes the following order of actions: `no action` <= `greylist` <= `add header` <= `rewrite subject` <= `reject`. - -Actions are **NOT** performed by Rspamd itself - they are just recommendations for the MTA agent, rmilter for example, that performs the necessary actions such as adding headers or rejecting mail. - -SA `spam` is almost equal to the Rspamd `add header` action in the default setup. With this action, users will be able to check messages in their `Junk` folder, which is usually a desired behaviour. - -## First steps with Rspamd - -To install Rspamd, I recommend using one of the [official packages](https://rspamd.com/downloads.html) that are available for many popular platforms. If you'd like to have more features then you can consider the `experimental` branch of packages, while if you would like to have more stability then you can select the `stable` branch. 
However, normally even the `experimental` branch is stable enough for production use, and bugs are fixed more quickly in the `experimental` branch. - -## General SpamAssassin rules - -For those who have a lot of custom rules, there is good news: Rspamd supports a certain set of SpamAssassin rules via a special [plugin](../modules/spamassassin.md) that allows **direct** loading of SA rules into Rspamd. You just need to specify your SA configuration files in the plugin configuration: - -~~~ucl -spamassassin { - sa_main = "/etc/spamassassin/conf.d/*"; - sa_local = "/etc/spamassassin/local.cf"; -} -~~~ - -On the other hand, if you don't have a lot of custom rules and primarily use the default ruleset then you shouldn't use this plugin: many SA rules are already implemented natively in Rspamd so you won't get any benefit from including such rules from SA. - -## Integration - -If you have your SA up and running it is usually possible to switch the system to Rspamd using the existing tools. However, please check the [integration document](https://rspamd.com/doc/integration.html) for further details. - -## Statistics - -Rspamd statistics are not compatible with SA as Rspamd uses a more advanced statistics algorithm, described in the following [article](http://osbf-lua.luaforge.net/papers/trec2006_osbf_lua.pdf), so please bear in mind that you need to **relearn** your statistics. This can be done, for example, by using the `rspamc` command, assuming that you have your messages in separate files (e.g. `maildir` format), placed in directories `spam` and `ham`: - - rspamc learn_spam spam/ - rspamc learn_ham ham/ - -(You will need Rspamd up and running to use these commands.) - -### Learning using mail interface - -You can also set up rspamc to learn by passing messages to a certain email address. I'd recommend using `/etc/aliases` for this purpose and a `mail-redirect` command (e.g.
provided by [Mail Redirect addon](https://addons.mozilla.org/en-GB/thunderbird/addon/mailredirect/) for `thunderbird` MUA). The desired aliases could be the following: - - learn-spam123: "| rspamc learn_spam" - learn-ham123: "| rspamc learn_ham" - -(You would need to use less predictable aliases to avoid the sending of messages to such addresses by an adversary, or just by mistake, to prevent statistics pollution.) diff --git a/doc/markdown/tutorials/writing_rules.md b/doc/markdown/tutorials/writing_rules.md deleted file mode 100644 index f7e78bbf5..000000000 --- a/doc/markdown/tutorials/writing_rules.md +++ /dev/null @@ -1,484 +0,0 @@ -# Writing Rspamd rules - -In this tutorial, I describe how to create new rules for Rspamd - both Lua and regexp rules. - -## Introduction - -Rules are the essential part of a spam filtering system and Rspamd ships with some prepared rules by default. However, if you run your own system you might want to have your own rules for better spam filtering or a better false positives rate. Rules are usually written in `Lua`, where you can specify both custom logic and generic regular expressions. - -## Configuration files - -Since Rspamd ships with its own rules it is a good idea to store your custom rules and configuration in separate files to avoid clashing with the default rules which might change from version to version. There are some possibilities to achieve this: - -- Local rules in Lua should be stored in the file named `${CONFDIR}/lua/rspamd.local.lua` where `${CONFDIR}` is the directory where your configuration files are placed (e.g. 
`/etc/rspamd`, or `/usr/local/etc/rspamd` for some systems) -- Local configuration that **adds** options to Rspamd should be placed in `${CONFDIR}/rspamd.conf.local` -- Local configuration that **overrides** the default settings should be placed in `${CONFDIR}/rspamd.conf.override` - -Lua local configuration can be used to both override and extend: - -`rspamd.lua`: - -~~~lua -config['regexp']['symbol'] = '/some_re/' -~~~ - -`rspamd.local.lua`: - -~~~lua -config['regexp']['symbol1'] = '/other_re/' -- add 'symbol1' key to the table -config['regexp']['symbol'] = '/override_re/' -- replace regexp for 'symbol' -~~~ - -For configuration rules you can take a look at the following examples: - -`rspamd.conf`: - -~~~ucl -var1 = "value1"; - -section "name" { - var2 = "value2"; -} -~~~ - -`rspamd.conf.local`: - -~~~ucl -var1 = "value2"; - -section "name" { - var3 = "value3"; -} -~~~ - -Resulting config: - -~~~ucl -var1 = "value1"; -var1 = "value2"; - -section "name" { - var2 = "value2"; -} -section "name" { - var3 = "value3"; -} -~~~ - -Override example: - -`rspamd.conf`: - -~~~ucl -var1 = "value1"; - -section "name" { - var2 = "value2"; -} -~~~ - -`rspamd.conf.override`: - -~~~ucl -var1 = "value2"; - -section "name" { - var3 = "value3"; -} -~~~ - -Resulting config: - -~~~ucl -var1 = "value2"; - -# Note that var2 is removed completely - -section "name" { - var3 = "value3"; -} -~~~ - -For each individual configuration file shipped with Rspamd, there are two special includes: - - .include(try=true,priority=1) "$CONFDIR/local.d/config.conf" - .include(try=true,priority=1) "$CONFDIR/override.d/config.conf" - -Therefore, you can either extend (using local.d) or ultimately override (using override.d) any settings in the Rspamd configuration. - -For example, let's override some default symbols shipped with Rspamd. To do that we can create and edit `etc/rspamd/local.d/metrics.conf`: - - symbol "BLAH" { - score = 20.0; - } - -We can also use an override file. 
For example, let's redefine actions and set a more restrictive `reject` score. To do this, we create `etc/rspamd/override.d/metrics.conf` with the following content: - - actions { - reject = 150; - add_header = 6; - greylist = 4; - } - -Note that you need to define the complete `actions` section to redefine an existing one. For example, you **cannot** write something like - - actions { - reject = 150; - } - -as this will leave the other actions (`add_header` and `greylist`) undefined. - -## Writing rules - -There are two types of rules that are normally defined by Rspamd: - -- `Lua` rules: code written in Lua -- `Regexp` rules: regular expressions and combinations of regular expressions to match specific patterns - -Lua rules are useful for some complex tasks: check DNS, query redis or HTTP, examine some task-specific details. Regexp rules are useful since they are heavily optimized by Rspamd (especially when `hyperscan` is enabled) and allow matching custom patterns in headers, urls, text parts and even the entire message body. - -### Rule weights - -Rule weights are usually defined in the `metrics` section and contain the following data: - -- score triggers for different actions -- symbol scores -- symbol descriptions -- symbol group definitions: - + symbols in group - + description of group - + joint group score limit - -For built-in rules, scores are placed in the file called `${CONFDIR}/metrics.conf`. However, you have two possibilities to define scores for your own rules: - -1. Define scores in `rspamd.conf.local` as follows: - -~~~ucl -metric "default" { - symbol "MY_SYMBOL" { - description = "my cool rule"; - score = 1.5; - } -} -~~~ - -2.
Define scores directly in Lua when describing a symbol: - -~~~lua --- regexp rule -config['regexp']['MY_SYMBOL'] = { - re = '/a/M & From=/blah/', - score = 1.5, - description = 'my cool rule', - group = 'my symbols' -} - --- lua rule -rspamd_config.MY_LUA_SYMBOL = { - callback = function(task) - -- Do something - return true - end, - score = -1.5, - description = 'another cool rule', - group = 'my symbols' -} -~~~ - -## Regexp rules - -Regexp rules are executed by the `regexp` module of Rspamd. You can find a detailed description of the syntax in [the regexp module documentation](../modules/regexp.md). - -Here are some hints to maximise performance of your regexp rules: - -* Prefer lightweight regexps, such as header or url, to heavy ones, such as mime or body regexps -* If you need to match text in a message's content, prefer `mime` regexps as they are executed on text content only -* If you **really** need to match the whole message, then you might consider using the [trie](../modules/trie.md) module as it is significantly faster -* Avoid complex regexps, avoid backtracking, avoid negative groups `(?!)`, avoid capturing patterns (replace with `(?:)`), avoid potentially empty patterns, e.g. `/^.*$/` - -Following these hints allows you to create rules that are both fast and effective.
To add regexp rules you should use the `config` global table that is defined in any Lua file used by Rspamd: - -~~~lua -config['regexp'] = {} -- Remove all regexp rules (including internal ones) -local reconf = config['regexp'] -- Create alias for regexp configs - -local re1 = 'From=/foo@/H' -- Mind local here -local re2 = '/blah/P' - -reconf['SYMBOL'] = { - re = string.format('(%s) && !(%s)', re1, re2), -- use string.format to create expression - score = 1.2, - description = 'some description', - - condition = function(task) -- run this rule only if some condition is satisfied - return true - end, -} -~~~ - -## Lua rules - -Lua rules are more powerful than regexp ones but they are not as heavily optimized and can cause performance issues if written incorrectly. All Lua rules accept a special parameter called `task` which represents a scanned message. - -### Return values - -Each Lua rule can return 0, or false, meaning that the rule has not matched, or true if the symbol should be inserted. In fact, you can return any positive or negative number which would be multiplied by the rule's score, e.g. if the rule score is `1.2`, then when your function returns `1` the symbol will have a score of `1.2`, and when your function returns `2.0` then the symbol will have a score of `2.4`. - -### Rule conditions - -Like regexp rules, Lua rules can have conditions, for example: - -~~~lua -rspamd_config.SYMBOL = { - callback = function(task) - return 1 - end, - score = 1.2, - description = 'some description', - - condition = function(task) -- run this rule only if some condition is satisfied - return true - end, -} -~~~ - -### Useful task manipulations - -There are a number of methods in [task](../lua/task.md) objects.
For example, you can get any part of a message: - -~~~lua -rspamd_config.HTML_MESSAGE = { - callback = function(task) - local parts = task:get_text_parts() - - if parts then - for i,p in ipairs(parts) do - if p:is_html() then - return 1 - end - end - end - - return 0 - end, - score = -0.1, - description = 'HTML included in message', -} -~~~ - -You can get HTML information: - -~~~lua -local function check_html_image(task, min, max) - local tp = task:get_text_parts() - - for _,p in ipairs(tp) do - if p:is_html() then - local hc = p:get_html() - local len = p:get_length() - - - if len >= min and len < max then - local images = hc:get_images() - if images then - for _,i in ipairs(images) do - if i['embedded'] then - return true - end - end - end - end - end - end -end - -rspamd_config.HTML_SHORT_LINK_IMG_1 = { - callback = function(task) - return check_html_image(task, 0, 1024) - end, - score = 3.0, - group = 'html', - description = 'Short html part (0..1K) with a link to an image' -} -~~~ - -You can get message headers with full information passed: - -~~~lua - -rspamd_config.SUBJ_ALL_CAPS = { - callback = function(task) - local util = require "rspamd_util" - local sbj = task:get_header('Subject') - - if sbj then - local stripped_subject = subject_re:search(sbj, false, true) - if stripped_subject and stripped_subject[1] and stripped_subject[1][2] then - sbj = stripped_subject[1][2] - end - - if util.is_uppercase(sbj) then - return true - end - end - - return false - end, - score = 3.0, - group = 'headers', - description = 'All capital letters in subject' -} -~~~ - -You can also access HTTP headers, urls and other useful properties of Rspamd tasks. 
Moreover, you can use global convenience modules exported by Rspamd, such as [rspamd_util](../lua/util.md) or [rspamd_logger](../lua/logger.md) by requiring them in your rules: - -~~~lua -rspamd_config.SUBJ_ALL_CAPS = { - callback = function(task) - local util = require "rspamd_util" - local logger = require "rspamd_logger" - ... - end, -} -~~~ - -## Rspamd symbols - -Rspamd rules fall under three categories: - -1. Pre-filters - run before other rules -2. Filters - run normally -3. Post-filters - run after all checks - -The most common type of rules are generic filters. Each filter is basically a callback that is executed by Rspamd at some time, along with an optional symbol name associated with this callback. In general, there are three options to register symbols: - -* register callback and associated symbol -* register just a plain callback -* register symbol with no callback (*virtual* symbol) - -The last option is useful when you have a single callback but with different possible results; for example `SYMBOL_ALLOW` or `SYMBOL_DENY`. Filters are registered with three methods: - -* `rspamd_config:register_symbol('SYMBOL', nominal_weight, callback)` - registers normal symbol -* `rspamd_config:register_callback_symbol(nominal_weight, callback)` - registers callback only symbol -* `rspamd_config:register_virtual_symbol('SYMBOL', nominal_weight, id)` - registers virtual symbol - -`nominal_weight` is used to define priority and the initial score multiplier. It should usually be `1.0` for normal symbols and `-1.0` for symbols with negative scores that should be executed before other symbols.
Here is an example of registering one callback and a couple of virtual symbols used in the [dmarc](../modules/dmarc.md) module: - -~~~lua -local id = rspamd_config:register_callback_symbol('DMARC_CALLBACK', 1.0, - dmarc_callback) -rspamd_config:register_virtual_symbol('DMARC_POLICY_ALLOW', -1, id) -rspamd_config:register_virtual_symbol('DMARC_POLICY_REJECT', 1, id) -rspamd_config:register_virtual_symbol('DMARC_POLICY_QUARANTINE', 1, id) -rspamd_config:register_virtual_symbol('DMARC_POLICY_SOFTFAIL', 1, id) -rspamd_config:register_dependency(id, symbols['spf_allow_symbol']) -rspamd_config:register_dependency(id, symbols['dkim_allow_symbol']) -~~~ - -Numeric `id` is returned by a registration function with callbacks (`register_symbol` or `register_callback_symbol`) and can be used to link symbols: - -* add virtual symbols associated with this callback -* correctly display average time for symbols without callbacks -* properly sort symbols -* register dependencies on virtual symbols (in fact, the true dependency is created based on the parent symbol but it is sometimes convenient to use virtual symbols for simplicity) - -### Asynchronous actions - -For asynchronous actions, such as redis access or DNS checks, it is recommended to use -dedicated callbacks, called symbol handlers. The difference from generic Lua rules is that -dedicated callbacks are not obliged to return a value; instead they use the method `task:insert_result(symbol, weight)` to indicate a match. All Lua plugins are implemented as symbol handlers.
Here is a simple example of a symbol handler that checks DNS: - -~~~lua -rspamd_config:register_symbol('SOME_SYMBOL', 1.0, - function(task) - local to_resolve = 'google.com' - local logger = require "rspamd_logger" - - local dns_cb = function(resolver, to_resolve, results, err) - if results then - logger.infox(task, '<%1> host: [%2] resolved for symbol: %3', - task:get_message_id(), to_resolve, 'SOME_SYMBOL') - task:insert_result('SOME_SYMBOL', 1) - end - end - task:get_resolver():resolve_a({ - task=task, - name = to_resolve, - callback = dns_cb}) - end) -~~~ - -You can also set the desired score and description: - -~~~lua -rspamd_config:set_metric_symbol('SOME_SYMBOL', 1.2, 'some description') --- Table version -if rule['score'] then - if not rule['group'] then - rule['group'] = 'whitelist' - end - rule['name'] = symbol - rspamd_config:set_metric_symbol(rule) -end -~~~ - -## Difference between `config` and `rspamd_config` - -It might be confusing that there are two variables with a common meaning. (This is a legacy of older versions of Rspamd.) However, currently `rspamd_config` represents an object that can have many purposes: - -* Get configuration options: - -~~~lua -rspamd_config:get_all_opts('section') -~~~ - -* Add maps: - -~~~lua -rule['map'] = rspamd_config:add_kv_map(rule['domains'], - "Whitelist map for " .. symbol) -~~~ - -* Register callbacks for symbols: - -~~~lua -rspamd_config:register_symbol('SOME_SYMBOL', 1.0, some_functions) -~~~ - -* Register lua rules (note that `__newindex` metamethod is actually used here): - -~~~lua -rspamd_config.SYMBOL = {...} -~~~ - -* Register composites, pre-filters, post-filters and so on - -On the other hand, the `config` global is extremely simple: it's just a plain table of configuration options that is exactly the same as defined in `rspamd.conf` (and `rspamd.conf.local` or `rspamd.conf.override`). However, you can also use Lua tables and even functions for some options.
For example, the `regexp` module can also accept a `callback` argument: - -~~~lua -config['regexp']['SYMBOL'] = { - callback = function(task) ... end, - ... -} -~~~ - -Such syntax is discouraged, however, and is preserved mostly for compatibility reasons. - -## Configuration order - -There is a strict order of configuration application: - -1. `rspamd.conf` and `rspamd.conf.local` are processed -2. `rspamd.conf.override` is processed and it **overrides** anything parsed on the previous step -3. **Lua** rules are loaded and they can override everything from the previous steps, with the important exception of rules scores, which are **NOT** overridden if the relevant symbol is also defined in a `metric` section -4. **Dynamic** configuration options defined in the WebUI (normally) are loaded and can override rule scores or action scores from the previous steps - -## Rules check order - -Rules in Rspamd are checked in the following order: - -1. **Pre-filters**: checked every time and can stop all further processing by calling `task:set_pre_result()` -2. **All symbols**: can depend on each other by calling `rspamd_config:register_dependency(from, to)` -3. **Statistics**: is checked only when all symbols are checked -4. **Composites**: combine symbols to adjust the final results -5. **Post-filters**: are executed even if a message is already rejected and symbols processing has been stopped diff --git a/doc/markdown/workers/controller.md b/doc/markdown/workers/controller.md deleted file mode 100644 index 44f8e2b0b..000000000 --- a/doc/markdown/workers/controller.md +++ /dev/null @@ -1,69 +0,0 @@ -# Controller worker - -Controller worker is used to manage rspamd stats, to learn rspamd and to serve WebUI. - -Internally, the controller worker is just a web server that accepts requests and sends replies using JSON serialization. -Each command is defined by URL.
Some commands are read only and are considered `unprivileged`, whilst other commands, such as -maps modification, config modifications and learning, require a higher level of privileges: the `enable` level. The difference between levels is specified -by password. If only one password is specified in the configuration, it is used for both types of commands. - -## Controller configuration - -Rspamd controller worker supports the following options: - -* `password`: password for read-only commands -* `enable_password`: password for write commands -* `secure_ip`: list or map with IP addresses that are treated as `secure` so **all** commands are allowed from these IPs **without** passwords -* `static_dir`: directory where interface static files are placed (usually `${WWWDIR}`) -* `stats_path`: path where the controller saves persistent stats about rspamd (such as scanned messages count) - -## Encryption support - -To generate a keypair for the scanner you could use: - - rspamadm keypair -u - -The resulting keypair should look as follows: - -~~~ucl -keypair { - pubkey = "tm8zjw3ougwj1qjpyweugqhuyg4576ctg6p7mbrhma6ytjewp4ry"; - privkey = "ykkrfqbyk34i1ewdmn81ttcco1eaxoqgih38duib1e7b89h9xn3y"; -} -~~~ - -You can use its **public** part thereafter when scanning messages as follows: - - rspamc --key tm8zjw3ougwj1qjpyweugqhuyg4576ctg6p7mbrhma6ytjewp4ry - -## Passwords encryption - -Rspamd now recommends encrypting passwords when storing them in a configuration. Currently, it uses the `PBKDF2-Blake2` function to derive a key from a password. To encrypt a password, you can use the `rspamadm pw` command as follows: - - rspamadm pw - Enter passphrase: - $1$cybjp37q4w63iogc4erncz1tgm1ce9i5$kxfx9xc1wk9uuakw7nittbt6dgf3qyqa394cnradg191iqgxr8kb - -You can use that line as `password` and `enable_password` values.
- -## Supported commands - -* `/auth` -* `/symbols` -* `/actions` -* `/maps` -* `/getmap` -* `/graph` -* `/pie` -* `/history` -* `/historyreset` (priv) -* `/learnspam` (priv) -* `/learnham` (priv) -* `/saveactions` (priv) -* `/savesymbols` (priv) -* `/savemap` (priv) -* `/scan` -* `/check` -* `/stat` -* `/statreset` (priv) -* `/counters` diff --git a/doc/markdown/workers/fuzzy_storage.md b/doc/markdown/workers/fuzzy_storage.md deleted file mode 100644 index 2d9b61411..000000000 --- a/doc/markdown/workers/fuzzy_storage.md +++ /dev/null @@ -1,138 +0,0 @@ -# Fuzzy storage worker - -Fuzzy storage worker is intended to store fuzzy hashes of messages. - -## Protocol format - -Fuzzy storage accepts requests using `UDP` protocol with the following structure: - -~~~C -struct fuzzy_cmd { /* attribute(packed) */ - uint8_t version; /* command version, must be 0x2 */ - uint8_t cmd; /* numeric command */ - uint8_t shingles_count; /* number of shingles */ - uint8_t flag; /* flag number */ - int32_t value; /* value to store */ - uint32_t tag; /* random tag */ - char digest[64]; /* blake2b digest */ -}; -~~~ - -All numbers are in host byte order, so if you want to check fuzzy hashes from a -host with different byte order you need some additional conversions (not currently -supported by rspamd). In future, rspamd might use little endian byte order for all -operations. - -Fuzzy storage accepts the following commands: -- `FUZZY_CHECK` - check for a fuzzy hash -- `FUZZY_ADD` - add a new hash -- `FUZZY_DEL` - remove a hash - -`flag` field is used to store different hashes in a single storage. For example, -it allows storing blacklists and whitelists in the same fuzzy storage worker. -A client should set the `flag` field when adding or deleting hashes and check it -when querying for a hash. - -`value` is added to the currently stored value of a hash if that hash has been found. -This field can handle negative numbers as well. - -`tag` is used to distinguish requests by a client.
Fuzzy storage just sets this -field in the reply equal to the value in the request. - -`digest` field contains the content of hash. Currently, rspamd uses `blake2b` hash -in its binary form granting the `2^512` of possible hashes with negligible collisions -probability. At the same time, rspamd saves the legacy format of fuzzy hashes by -means of this field. Old rspamd can work with legacy hashes only. - -`shingles_count` defines how many `shingles` are attached to this command. -Currently, rspamd uses 32 shingles and this value thus should be 32 for commands -with shingles. Shingles should be included in the same packet and follow the command as -an array of int64_t values. Please note, that rspamd rejects commands that have wrong -shingles count or their size is not equal to the desired one: - - sizeof(fuzzy_cmd) + shingles_count * sizeof(int64_t) - -Reply format of fuzzy storage is also presented as a structure: - -~~~C -struct fuzzy_cmd { /* attribute(packed) */ - int32_t value; - uint32_t flag; - uint32_t tag; - float prob; -}; -~~~ - -`prob` field is used to store the probability of match. This value is changed from -`0.0` (no match) to `1.0` (full match). - -## Storage format - -Rspamd fuzzy storage uses `sqlite3` for storing hashes. All update operations are -performed in a transaction which is committed to the main database approximately once -per minute. `VACUUM` command is executed on startup and hashes expiration is performed -at the termination of rspamd fuzzy storage worker. - -Here is the internal database structure: - -``` -CREATE TABLE digests(id INTEGER PRIMARY KEY, - flag INTEGER NOT NULL, - digest TEXT NOT NULL, - value INTEGER, - time INTEGER); - -CREATE TABLE shingles(value INTEGER NOT NULL, - number INTEGER NOT NULL, - digest_id INTEGER REFERENCES digests(id) ON DELETE CASCADE ON UPDATE CASCADE); -``` - -Since rspamd uses normal sqlite3 you can use all tools for working with the hashes -database to perform, for example backup or analysis. 
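As a rough illustration of the request size rule from the protocol section above, the check can be sketched in C. This is only a sketch under stated assumptions: the helper names `fuzzy_expected_size` and `fuzzy_packet_size_ok` are hypothetical and are not taken from the rspamd sources, and the `packed` attribute syntax is the GCC/Clang one.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Request header as documented above; the packed attribute (GCC/Clang
 * syntax) guarantees that no padding is added between the fields. */
struct fuzzy_cmd {
    uint8_t version;        /* command version, must be 0x2 */
    uint8_t cmd;            /* numeric command */
    uint8_t shingles_count; /* number of shingles */
    uint8_t flag;           /* flag number */
    int32_t value;          /* value to store */
    uint32_t tag;           /* random tag */
    char digest[64];        /* blake2b digest */
} __attribute__((packed));

/* 4 one-byte fields + two 4-byte fields + a 64-byte digest. */
static_assert(sizeof(struct fuzzy_cmd) == 76, "unexpected padding in fuzzy_cmd");

/* Hypothetical helper: size of a datagram carrying the given number of shingles. */
size_t fuzzy_expected_size(uint8_t shingles_count)
{
    return sizeof(struct fuzzy_cmd) + (size_t)shingles_count * sizeof(int64_t);
}

/* Hypothetical helper: returns 1 if the datagram length agrees with the
 * shingles count stored in its header, 0 otherwise. Only 0 (no shingles)
 * and 32 are accepted, following the text above. */
int fuzzy_packet_size_ok(const unsigned char *buf, size_t len)
{
    uint8_t count;

    if (len < sizeof(struct fuzzy_cmd)) {
        return 0;
    }

    count = buf[offsetof(struct fuzzy_cmd, shingles_count)];
    if (count != 0 && count != 32) {
        return 0;
    }

    return len == fuzzy_expected_size(count);
}
```

With these definitions, a plain command is 76 bytes long and a command with 32 shingles is 76 + 32 * 8 = 332 bytes long; a datagram of any other length would be rejected by this sketch.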
-
-## Operation notes
-
-To check a hash, rspamd fuzzy storage initially queries for a direct match using
-the `digest` field as a key. If that match succeeds then the value is returned immediately.
-Otherwise, if a command contains shingles then rspamd checks for a fuzzy match by trying
-to find each shingle's value. If more than 50% of the shingles match the same digest
-then rspamd returns that digest's value and the probability of the match, which is
-generally `match_count / shingles_count`.
-
-## Configuration
-
-Fuzzy storage accepts the following extra options:
-
-- `hashfile` - path to the sqlite storage (there are also a few outdated aliases for this option: hash_file, file, database)
-- `sync` - time to perform database sync in seconds, default value: 60
-- `expire` - time value for hashes expiration in seconds, default value: 2 days
-- `keypair` - encryption keypair (can be repeated for different keys), can be obtained via *rspamadm keypair -u* command
-- `keypair_cache_size` - size of keypairs cache, default value: 512
-- `encrypted_only` - allow encrypted requests only (and forbid all unknown keys or plaintext requests)
-- `master_timeout` - master protocol IO timeout
-- `sync_keypair` - encryption key for master/slave updates
-- `masters` - string, allow master/slave updates from the following IP addresses
-- `master_key` - allow master/slave updates only when the specified key is used
-- `slave` - list of slave hosts
-- `mirror` - list of slave hosts, same as `slave`
-- `allow_update` - string, array of strings or a map of IP addresses that are allowed
-to perform changes to fuzzy storage (you should also set `read_only = no` in your fuzzy_check plugin).
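
Returning to the lookup order from the operation notes above, the following Python sketch shows one way to read that procedure. It is an illustration under stated assumptions, not rspamd's implementation: the in-memory dictionaries stand in for the sqlite tables, shingles are assumed to be keyed by their position and value, and a direct digest match is assumed to report a probability of `1.0`. The name `check_hash` is hypothetical.

```python
from collections import Counter


def check_hash(digest, shingles, digests, shingle_index):
    """Return (value, prob) for a query, or None if nothing matches.

    digests:       dict mapping digest -> stored value (stand-in for the table)
    shingle_index: dict mapping (number, shingle_value) -> digest (assumed keying)
    """
    # 1. A direct match on the digest is returned immediately.
    if digest in digests:
        return digests[digest], 1.0  # assumption: direct match is a full match
    if not shingles:
        return None

    # 2. Count, per stored digest, how many shingle positions match.
    hits = Counter()
    for number, value in enumerate(shingles):
        owner = shingle_index.get((number, value))
        if owner is not None:
            hits[owner] += 1
    if not hits:
        return None

    # 3. Accept only if more than 50% of the shingles point to the same digest.
    best, match_count = hits.most_common(1)[0]
    if match_count * 2 > len(shingles):
        return digests[best], match_count / len(shingles)
    return None
```

With four shingles, three matching positions give a probability of `0.75` and are accepted, while exactly two (50%) are rejected because the rule requires strictly more than half.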
-
-Here is an example configuration of fuzzy storage:
-
-~~~ucl
-worker {
-    type = "fuzzy";
-    bind_socket = "*:11335";
-    hashfile = "${DBDIR}/fuzzy.db";
-    expire = 90d;
-    allow_update = ["127.0.0.1", "::1"];
-}
-~~~
-
-## Compatibility notes
-
-Rspamd fuzzy storage of version `0.8` can work with rspamd clients of all versions,
-however, updates from legacy versions (less than `0.8`) won't update the fuzzy shingles
-database. Rspamd [fuzzy check module](../modules/fuzzy_check.md) can work **only**
-with the recent rspamd fuzzy storage (it won't get anything from the legacy storages).
diff --git a/doc/markdown/workers/index.md b/doc/markdown/workers/index.md
deleted file mode 100644
index 55857a73d..000000000
--- a/doc/markdown/workers/index.md
+++ /dev/null
@@ -1,84 +0,0 @@
-# Rspamd workers
-
-Rspamd defines several types of worker processes. Each type is designed for a specific
-purpose, for example scanning mail messages or performing control actions such as learning or
-gathering statistics. There is also a flexible worker type named the `lua` worker that allows
-running any Lua script as an Rspamd worker, providing a proxy to the Rspamd Lua API.
-
-## Worker types
-
-Currently Rspamd defines the following worker types:
-
-- [normal](normal.md): this worker is designed to scan mail messages
-- [controller](controller.md): this worker performs configuration actions, such as
-learning, adding fuzzy hashes and serving web interface requests
-- [fuzzy_storage](fuzzy_storage.md): stores fuzzy hashes
-- [lua](lua_worker.md): runs custom Lua scripts
-
-## Workers connections
-
-All client applications should interact with two main workers: `normal` and `controller`.
-Both of these workers use the `HTTP` protocol for all operations and rely on HTTP headers
-to get extra information from a client. Depending on network configuration, it might be
-useful to bind all workers to the loopback interface, preventing all interaction from the
-outside.
Rspamd workers are **not** supposed to run in an unprotected environment, such as
-the Internet. Currently there is neither secrecy nor integrity control in these protocols and
-using plain HTTP might leak sensitive information.
-
-[Fuzzy worker](fuzzy_storage.md) is different: it is intended to serve external requests, however, it
-listens on a UDP port and does not save any state information.
-
-## Common workers options
-
-All workers share a set of common options. Here is a typical example of a normal
-worker configuration that uses only common worker options:
-
-~~~ucl
-worker {
-    type = "normal";
-    bind_socket = "*:11333";
-}
-~~~
-
-Here are the options available to all workers:
-
-- `type` - a **mandatory** string that defines the type of worker.
-- `bind_socket` - a string that defines the bind address of a worker.
-- `count` - number of worker instances to run (some workers ignore that option, e.g. `fuzzy_storage`)
-
-`bind_socket` is the most commonly used option. It defines the address where a worker should accept
-connections. Rspamd allows both names and IP addresses for this option:
-
-~~~ucl
-bind_socket = "localhost:11333";
-bind_socket = "127.0.0.1:11333";
-bind_socket = "[::1]:11333"; # note that you need to enclose ipv6 in '[]'
-~~~
-
-Universal listening addresses are also defined:
-
-~~~ucl
-bind_socket = "*:11333"; # any ipv4 and ipv6 address
-bind_socket = "*v4:11333"; # any ipv4 address
-bind_socket = "*v6:11333"; # any ipv6 address
-~~~
-
-Moreover, you can specify systemd sockets if Rspamd is invoked by systemd:
-
-~~~ucl
-bind_socket = "systemd:1"; # the first socket passed by systemd through the environment
-~~~
-
-For unix sockets, it is also possible to specify owner and mode using this syntax:
-
-~~~ucl
-bind_socket = "/tmp/rspamd.sock mode=0666 owner=user";
-~~~
-
-Without owner and mode, Rspamd uses the active user as owner (e.g. if started by root,
-then `root` is used) and `0644` as access mask.
Please note that you need to specify an
-**octal** number for the mode, that is, one prefixed by a zero. Otherwise, modes like `666` will produce
-a weird result.
-
-You can specify multiple `bind_socket` options to listen on as many addresses as
-you want.
diff --git a/doc/markdown/workers/lua_worker.md b/doc/markdown/workers/lua_worker.md
deleted file mode 100644
index cad1ad998..000000000
--- a/doc/markdown/workers/lua_worker.md
+++ /dev/null
@@ -1,3 +0,0 @@
-# Lua worker
-
-TODO
diff --git a/doc/markdown/workers/normal.md b/doc/markdown/workers/normal.md
deleted file mode 100644
index 9b935fb5a..000000000
--- a/doc/markdown/workers/normal.md
+++ /dev/null
@@ -1,29 +0,0 @@
-# Rspamd normal worker
-
-Rspamd normal worker is intended to scan messages for spam. It has the following configuration options available:
-
-* `mime`: turn to `off` if you want to scan non-mime messages (e.g. forum comments or SMS), default: `on`
-* `allow_learn`: turn to `on` if you want to learn messages using this worker (usually you should use the [controller](controller.md) worker), default: `off`
-* `timeout`: input/output timeout, default: `1min`
-* `task_timeout`: maximum time to process a single task, default: `8s`
-* `max_tasks`: maximum count of tasks processed simultaneously, default: `0` - no limit
-* `keypair`: encryption keypair
-
-## Encryption support
-
-To generate a keypair for the scanner you could use:
-
-    rspamadm keypair -u
-
-After that, the keypair should appear as follows:
-
-~~~ucl
-keypair {
-    pubkey = "tm8zjw3ougwj1qjpyweugqhuyg4576ctg6p7mbrhma6ytjewp4ry";
-    privkey = "ykkrfqbyk34i1ewdmn81ttcco1eaxoqgih38duib1e7b89h9xn3y";
-}
-~~~
-
-You can use its **public** part thereafter when scanning messages as follows:
-
-    rspamc --key tm8zjw3ougwj1qjpyweugqhuyg4576ctg6p7mbrhma6ytjewp4ry