[Doc] Documentation now lives in rspamd.com repo

tags/1.3.0
Vsevolod Stakhov 7 years ago
parent
commit
14803e9fae
46 files changed, with 0 additions and 4511 deletions
  1. +0 -106 doc/markdown/architecture/index.md
  2. +0 -154 doc/markdown/architecture/protocol.md
  3. +0 -114 doc/markdown/configuration/composites.md
  4. +0 -52 doc/markdown/configuration/index.md
  5. +0 -90 doc/markdown/configuration/logging.md
  6. +0 -109 doc/markdown/configuration/metrics.md
  7. +0 -79 doc/markdown/configuration/options.md
  8. +0 -81 doc/markdown/configuration/settings.md
  9. BIN doc/markdown/configuration/settings.png
  10. +0 -226 doc/markdown/configuration/statistic.md
  11. +0 -386 doc/markdown/configuration/ucl.md
  12. +0 -41 doc/markdown/index.md
  13. +0 -259 doc/markdown/lua/index.md
  14. +0 -231 doc/markdown/migration.md
  15. +0 -10 doc/markdown/modules/chartable.md
  16. +0 -40 doc/markdown/modules/dcc.md
  17. +0 -32 doc/markdown/modules/dkim.md
  18. +0 -48 doc/markdown/modules/dmarc.md
  19. +0 -0 doc/markdown/modules/emails.md
  20. +0 -40 doc/markdown/modules/fann.md
  21. +0 -0 doc/markdown/modules/forged_recipients.md
  22. +0 -163 doc/markdown/modules/fuzzy_check.md
  23. +0 -70 doc/markdown/modules/index.md
  24. +0 -15 doc/markdown/modules/maillist.md
  25. +0 -30 doc/markdown/modules/mime_types.md
  26. +0 -162 doc/markdown/modules/multimap.md
  27. +0 -22 doc/markdown/modules/once_received.md
  28. +0 -114 doc/markdown/modules/phishing.md
  29. +0 -94 doc/markdown/modules/ratelimit.md
  30. +0 -116 doc/markdown/modules/rbl.md
  31. +0 -146 doc/markdown/modules/regexp.md
  32. +0 -48 doc/markdown/modules/replies.md
  33. +0 -90 doc/markdown/modules/rspamd_update.md
  34. +0 -72 doc/markdown/modules/spamassassin.md
  35. +0 -34 doc/markdown/modules/spf.md
  36. +0 -184 doc/markdown/modules/surbl.md
  37. +0 -39 doc/markdown/modules/trie.md
  38. +0 -119 doc/markdown/modules/whitelist.md
  39. +0 -9 doc/markdown/tutorials/index.md
  40. +0 -82 doc/markdown/tutorials/migrate_sa.md
  41. +0 -481 doc/markdown/tutorials/writing_rules.md
  42. +0 -69 doc/markdown/workers/controller.md
  43. +0 -138 doc/markdown/workers/fuzzy_storage.md
  44. +0 -84 doc/markdown/workers/index.md
  45. +0 -3 doc/markdown/workers/lua_worker.md
  46. +0 -29 doc/markdown/workers/normal.md

+ 0
- 106
doc/markdown/architecture/index.md

@@ -1,106 +0,0 @@
# Rspamd architecture

## Introduction

Rspamd is a universal spam filtering system based on an event-driven processing model, which means that Rspamd is not intended to block anywhere in the code. To process messages Rspamd uses a set of `rules`. Each `rule` is a symbolic name associated with a message property. For example, we can define the following rules:

- `SPF_ALLOW` - means that a message is validated by SPF;
- `BAYES_SPAM` - means that a message is statistically considered as spam;
- `FORGED_OUTLOOK_MID` - message ID seems to be forged for the Outlook MUA.

Rules are defined by [modules](../modules/). If there is a module, for example, that performs SPF checks it may define several rules according to SPF policy:

- `SPF_ALLOW` - a sender is allowed to send messages for this domain;
- `SPF_DENY` - a sender is denied by SPF policy;
- `SPF_SOFTFAIL` - a sender is probably not authorized, but the SPF policy does not ask for a hard failure.

Rspamd supports two main types of modules: internal modules written in C and external modules written in Lua. There is no real difference between the two types with the exception that C modules are embedded and can be enabled in a `filters` attribute in the `options` section of the config:

~~~ucl
options {
    filters = "regexp,surbl,spf,dkim,fuzzy_check,chartable,email";
    ...
}
~~~

## Protocol

Rspamd uses the HTTP protocol for all operations. This protocol is described in the [protocol section](protocol.md).

## Metrics

Rules in Rspamd define a logic of checks, but it is required to set up weights for each rule. (For Rspamd, weight means `significance`.) Rules with a greater absolute value of weight are considered more important. The weight of rules is defined in `metrics`. Each metric is a set of grouped rules with specific weights. For example, we may define the following weights for our SPF rules:

- `SPF_ALLOW`: -1
- `SPF_DENY`: 2
- `SPF_SOFTFAIL`: 0.5

Positive weights mean that a rule increases a message's 'spammyness', while negative weights mean the opposite.
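
As a rough illustration of how weights combine, the following Python sketch sums the weights of the symbols triggered for a message. The `WEIGHTS` table reuses the example values above; the helper name `message_score` is hypothetical and is not part of Rspamd.

```python
# Example weights from the list above; illustrative values only.
WEIGHTS = {"SPF_ALLOW": -1.0, "SPF_DENY": 2.0, "SPF_SOFTFAIL": 0.5}


def message_score(triggered):
    """Sum the metric weights of all triggered symbols (hypothetical helper)."""
    return sum(WEIGHTS[name] for name in triggered)


print(message_score(["SPF_DENY"]))   # positive: pushes the message towards spam
print(message_score(["SPF_ALLOW"]))  # negative: pushes the message towards ham
```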

### Rules scheduler

To avoid unnecessary checks Rspamd uses a scheduler of rules for each message. If a message is considered as definite spam then further checks are not performed. This scheduler is rather naive and it performs the following logic:

- select negative rules *before* positive ones to prevent false positives;
- prefer rules with the following characteristics:
    - frequent rules;
    - rules with more weight;
    - faster rules.

These optimizations can filter definite spam more quickly than a generic queue.
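
The ordering described above could be sketched in Python as a sort key. This is only an approximation under stated assumptions: the `Rule` fields `frequency` and `avg_time`, and the relative priority of the three tie-breaking criteria, are hypothetical and are not taken from the Rspamd sources.

```python
from dataclasses import dataclass


@dataclass
class Rule:
    name: str
    weight: float     # metric weight; negative means evidence for ham
    frequency: float  # assumed: fraction of messages on which the rule fires
    avg_time: float   # assumed: average execution time in seconds


def schedule(rules):
    """Order rules: negative ones first, then frequent, heavy and fast ones."""
    return sorted(
        rules,
        key=lambda r: (r.weight >= 0, -r.frequency, -abs(r.weight), r.avg_time),
    )
```

With this key, a whitelist rule with a negative weight is always checked before any positive rule, no matter how frequent or cheap the positive rule is.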

Since Rspamd-0.9 there are further optimizations for rules and expressions that are described generally in the [following presentation](http://highsecure.ru/ast-rspamd.pdf).

## Actions

Another important property of metrics is their actions set. This set defines recommended actions for a message if it reaches a certain score defined by all rules which have been triggered. Rspamd defines the following actions:

- `No action`: a message is likely to be ham;
- `Greylist`: greylist a message if it is not certainly ham;
- `Add header`: a message is likely spam, so add a specific header;
- `Rewrite subject`: a message is likely spam, so rewrite its subject;
- `Reject`: a message is very likely spam, so reject it completely.

These actions are only recommendations for the MTA and need not be followed strictly. For all actions greater than or equal to `greylist`, it is recommended to perform explicit greylisting. The `add header` and `rewrite subject` actions are very close in semantics: both mark a message as probable spam. `Reject` is a strong action which usually means that the MTA should really reject the message. The triggering scores for these actions should be specified according to their logical priorities. If two actions have the same triggering score, the result is unspecified.
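
A minimal Python sketch of how a score could be mapped to a recommended action follows. The threshold values and the helper name `recommended_action` are hypothetical; real thresholds come from the metric configuration.

```python
# Hypothetical thresholds: the score needed to trigger each action.
THRESHOLDS = {"greylist": 4.0, "add header": 6.0, "reject": 15.0}


def recommended_action(score, thresholds=THRESHOLDS):
    """Return the action with the highest threshold that the score reaches."""
    reached = [(limit, name) for name, limit in thresholds.items() if score >= limit]
    if not reached:
        return "no action"
    # Tuples compare by threshold first, so max() picks the strongest action.
    return max(reached)[1]
```

Keeping distinct thresholds per action avoids the unspecified case mentioned above, where two actions share the same triggering score.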

## Rules weight

The weight of a rule is not necessarily constant. For example, statistical rules cannot say with certainty whether a message is spam; instead they provide a measure of probability. To support such fuzzy weights, Rspamd provides `dynamic weights`. Generally, this means that a rule may add any weight in the range from 0 to the weight defined in the metric. So if we define the symbol `BAYES_SPAM` with a weight of 5.0, then this rule can add a resulting symbol with a weight from 0 to 5.0. To distribute values, Rspamd uses a form of Sigma function to provide a fair distribution curve. The majority of Rspamd rules, with the exception of fuzzy rules, use static weights.

## Statistics

Rspamd uses statistical algorithms to precisely calculate the final score of a message. Currently, the only algorithm defined is OSB-Bayes. You can find details of this algorithm in the following [paper](http://osbf-lua.luaforge.net/papers/osbf-eddc.pdf). Rspamd uses a window size of 5 words in its classification. During the classification procedure, Rspamd splits a message into a set of tokens. Tokens are separated by punctuation or whitespace characters. Short tokens (fewer than 3 characters) are ignored. For each token, Rspamd calculates two non-cryptographic hashes that are subsequently used as indices. All these tokens are stored in one of several statistics backends (mmapped files, SQLite3 database or Redis server). Currently, the recommended backend for statistics is `Redis`.
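
The tokenization steps above can be sketched in Python as follows. This is an illustration, not Rspamd's implementation: the way tokens are paired inside the window follows the general orthogonal sparse bigram idea, and `zlib.crc32` and `zlib.adler32` merely stand in for the two non-cryptographic hashes, which the text does not name.

```python
import re
import zlib

WINDOW = 5   # window size stated above
MIN_LEN = 3  # tokens shorter than this are ignored


def tokenize(text):
    """Split on whitespace and punctuation, then drop short tokens."""
    return [t for t in re.split(r"[\W_]+", text) if len(t) >= MIN_LEN]


def osb_features(tokens):
    """Pair each token with the following ones inside the window.

    Every pair is hashed twice; both hash functions are assumptions.
    """
    features = []
    for i, first in enumerate(tokens):
        for gap in range(1, WINDOW):
            if i + gap >= len(tokens):
                break
            data = f"{first}|{gap}|{tokens[i + gap]}".encode()
            features.append((zlib.crc32(data), zlib.adler32(data)))
    return features
```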

## Running rspamd

There are several command-line options that can be passed to rspamd. All of them can be displayed by passing the `--help` argument.

No option is mandatory: by default, rspamd tries to read the `etc/rspamd.conf` config file and runs as a daemon. There is also a test mode that can be turned on by passing the `-t` argument. In test mode, rspamd reads the config file and checks its syntax. If the configuration file is valid, the exit code is zero. Test mode is useful for testing new config files without restarting rspamd.

## Managing rspamd using signals

It is important to note that all user signals should be sent to the rspamd main process and not to its children (as for child processes these signals can have other meanings). You can identify the main process:

- by reading the pidfile:

        $ cat pidfile

- by getting process info:

        $ ps auxwww | grep rspamd
        nobody 28378 0.0 0.2 49744 9424 rspamd: main process
        nobody 64082 0.0 0.2 50784 9520 rspamd: worker process
        nobody 64083 0.0 0.3 51792 11036 rspamd: worker process
        nobody 64084 0.0 2.7 158288 114200 rspamd: controller process
        nobody 64085 0.0 1.8 116304 75228 rspamd: fuzzy storage

        $ ps auxwww | grep rspamd | grep main
        nobody 28378 0.0 0.2 49744 9424 rspamd: main process

After getting the pid of the main process it is possible to manage rspamd with signals, as follows:

- `SIGHUP` - restart rspamd: reread the config file, start new workers (as well as the controller and other processes), stop accepting connections in old workers, and reopen all log files. Note that old workers are terminated after one minute, which should allow all pending requests to be processed. All new requests to rspamd are processed by the newly started workers.
- `SIGTERM` - terminate rspamd.
- `SIGUSR1` - reopen log files (useful for log file rotation).

These signals may be used in rc-style scripts. Restarting of rspamd is performed softly: no connections are dropped and if a new config is incorrect then the old config is used.
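
For scripting, the signal handling above could be wrapped as in the following Python sketch (POSIX only). The command names `reload`, `stop` and `reopen_logs` and both helper functions are hypothetical; the pidfile location depends on the installation and must be supplied by the caller.

```python
import os
import signal

# Signal meanings as listed above.
SIGNALS = {
    "reload": signal.SIGHUP,        # soft restart with a new config
    "stop": signal.SIGTERM,         # terminate rspamd
    "reopen_logs": signal.SIGUSR1,  # reopen log files after rotation
}


def read_main_pid(pidfile):
    """Read the PID of the main process from the given pidfile."""
    with open(pidfile) as fh:
        return int(fh.read().strip())


def signal_rspamd(pidfile, command):
    """Send the signal for `command` to the main process, never to a worker."""
    os.kill(read_main_pid(pidfile), SIGNALS[command])
```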

+ 0
- 154
doc/markdown/architecture/protocol.md

@@ -1,154 +0,0 @@
# Rspamd protocol

## Protocol basics

Rspamd uses the HTTP protocol, either version 1.0 or 1.1. (There is also a compatibility layer described further in this document.) Rspamd defines some headers which allow the passing of extra information about a scanned message, such as envelope data, IP address or SMTP SASL authentication data. Rspamd supports normal and chunked encoded HTTP requests.

## Rspamd HTTP request

Rspamd encourages the use of the HTTP protocol since it is standard and can be used by every programming language without the use of exotic libraries. A typical HTTP request looks like the following:

    POST /check HTTP/1.0
    Content-Length: 26969
    From: smtp@example.com
    Pass: all
    Ip: 95.211.146.161
    Helo: localhost.localdomain
    Hostname: localhost

    <your message goes here>

You can also use chunked encoding that allows streamlined data transfer which is useful if you don't know the length of a message.
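
As a sketch of what a client might look like, the following Python code sends a message to `/check` with the standard library's `http.client`. The host, the port `11333` and both function names are assumptions made for illustration; adjust them to the worker you actually run.

```python
import http.client


def build_headers(sender, ip, helo, hostname):
    """Build the Rspamd-specific headers used in the example request."""
    return {"From": sender, "Pass": "all", "Ip": ip, "Helo": helo, "Hostname": hostname}


def check_message(message, headers, host="localhost", port=11333):
    """POST a raw message (bytes) to /check; host and port are assumptions."""
    conn = http.client.HTTPConnection(host, port, timeout=10)
    try:
        # http.client adds Content-Length automatically for a bytes body.
        conn.request("POST", "/check", body=message, headers=headers)
        response = conn.getresponse()
        return response.status, response.read()
    finally:
        conn.close()
```

A caller would pass the raw message as bytes, for example `check_message(raw, build_headers("smtp@example.com", "95.211.146.161", "localhost.localdomain", "localhost"))`.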

### HTTP request

Normally, you should just use `/check` here. However, if you want to communicate with the controller, then you might want to use controller commands.

(TODO: write this part)

### HTTP headers

To avoid unnecessary work, Rspamd allows an MTA to pass pre-processed data about the message by using either HTTP headers or a JSON control block (described further in this document). Rspamd supports the following non-standard HTTP headers:

| Header | Description |
| :-------------- | :-------------------------------- |
| **Deliver-To:** | Defines actual delivery recipient of message. Can be used for personalized statistics and for user specific options. |
| **IP:** | Defines IP from which this message is received. |
| **Helo:** | Defines SMTP helo |
| **Hostname:** | Defines resolved hostname |
| **From:** | Defines SMTP mail from command data |
| **Queue-Id:** | Defines SMTP queue id for message (can be used instead of message id in logging). |
| **Rcpt:** | Defines SMTP recipient (there may be several `Rcpt` headers) |
| **Pass:** | If this header has `all` value, all filters would be checked for this message. |
| **Subject:** | Defines subject of message (is used for non-mime messages). |
| **User:** | Defines SMTP user. |
| **Message-Length:** | Defines the length of message excluding the control block. |

Controller also defines certain headers:

(TODO: write this part)

Standard HTTP headers, such as `Content-Length`, are also supported.

## Rspamd HTTP reply

The Rspamd reply is encoded in `JSON`. Here is a typical HTTP reply:

    HTTP/1.1 200 OK
    Connection: close
    Server: rspamd/0.9.0
    Date: Mon, 30 Mar 2015 16:19:35 GMT
    Content-Length: 825
    Content-Type: application/json

~~~json
{
    "default": {
        "is_spam": false,
        "is_skipped": false,
        "score": 5.2,
        "required_score": 7,
        "action": "add header",
        "DATE_IN_PAST": {
            "name": "DATE_IN_PAST",
            "score": 0.1
        },
        "FORGED_SENDER": {
            "name": "FORGED_SENDER",
            "score": 5
        },
        "TEST": {
            "name": "TEST",
            "score": 100500
        },
        "FUZZY_DENIED": {
            "name": "FUZZY_DENIED",
            "score": 0,
            "options": [
                "1: 1.00 / 1.00",
                "1: 1.00 / 1.00"
            ]
        },
        "HFILTER_HELO_5": {
            "name": "HFILTER_HELO_5",
            "score": 0.1
        }
    },
    "urls": [
        "www.example.com",
        "another.example.com"
    ],
    "emails": [
        "user@example.com"
    ],
    "message-id": "4E699308EFABE14EB3F18A1BB025456988527794@example"
}
~~~

For convenience, the reply above has been formatted with [JSONLint](http://jsonlint.com). The actual reply is compressed for speed.

The reply can be treated as a JSON object where keys are metric names (namely `default`) and values are objects that represent metrics.

Each metric has the following fields:

* `is_spam` - boolean value that indicates whether a message is spam
* `is_skipped` - boolean flag that is `true` if a message has been skipped due to settings
* `score` - floating point value representing the effective score of message
* `required_score` - floating point value meaning the threshold value for the metric
* `action` - recommended action for a message:
    - `no action` - message is likely ham;
    - `greylist` - message should be greylisted;
    - `add header` - message is suspicious and should be marked as spam;
    - `rewrite subject` - message is suspicious and should have its subject rewritten;
    - `soft reject` - message should be temporarily rejected (for example, because a rate limit has been exceeded);
    - `reject` - message should be rejected as spam.

Additionally, a metric contains all symbols added during a message's processing, indexed by symbol names.

Additional keys which may be in the reply include:

* `subject` - if action is `rewrite subject` this value defines the desired subject for a message
* `urls` - a list of URLs found in a message (only hostnames)
* `emails` - a list of emails found in a message
* `message-id` - ID of message (useful for logging)
* `messages` - array of optional messages added by Rspamd filters (such as `SPF`)
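
A client could extract the interesting parts of such a reply roughly as in the following Python sketch. The function name `summarize_reply` is hypothetical, and the code assumes that every object-valued entry of a metric is a symbol, as in the example reply.

```python
import json


def summarize_reply(body, metric="default"):
    """Extract the action, score and symbol scores from a JSON reply."""
    reply = json.loads(body)
    result = reply[metric]
    # Symbols are the object-valued entries of the metric, keyed by name.
    symbols = {k: v["score"] for k, v in result.items() if isinstance(v, dict)}
    return {
        "action": result["action"],
        "score": result["score"],
        "is_spam": result["is_spam"],
        "symbols": symbols,
        "urls": reply.get("urls", []),
    }
```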

## Rspamd JSON control block

Since Rspamd version 0.9 it is also possible to pass additional data by prepending a JSON control block to a message. So you can use either headers or a JSON block to pass data from the MTA to Rspamd.

To use a JSON control block, you need to pass an extra header called `Message-Length` to Rspamd. This header should be equal to the size of the message **excluding** the JSON control block. Therefore, the size of the control block is equal to `Content-Length - Message-Length`. Rspamd assumes that a message starts immediately after the control block (with no extra CRLF). This method is equally compatible with streaming transfer; however, even if you do not specify `Content-Length`, you are still required to specify `Message-Length`.

Here is an example of a JSON control block:

~~~json
{
    "from": "smtp@example.com",
    "pass_all": "true",
    "ip": "95.211.146.161",
    "helo": "localhost.localdomain",
    "hostname": "localhost"
}
~~~

Moreover, [UCL](https://github.com/vstakhov/libucl) JSON extensions and syntax conventions are also supported inside the control block.
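
The length arithmetic for a control block can be sketched in Python as follows. The helper name `build_body` is hypothetical; the sketch only shows how `Message-Length` and `Content-Length` relate to each other.

```python
import json


def build_body(control, message):
    """Prepend a JSON control block to a message (bytes) and compute both lengths."""
    block = json.dumps(control).encode("utf-8")
    body = block + message  # no extra CRLF between the block and the message
    headers = {
        "Message-Length": str(len(message)),  # message only
        "Content-Length": str(len(body)),     # control block + message
    }
    return body, headers
```

The receiver can then recover the size of the control block as `Content-Length - Message-Length`.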

+ 0
- 114
doc/markdown/configuration/composites.md

@@ -1,114 +0,0 @@
# Rspamd composite symbols

## Introduction

Rspamd composites are used to combine rules and create more complex rules. Composite rules are defined by `composite` keys. The value of the key should be an object that defines the composite's name and value, which is the combination of rules in a joint expression.

For example, you can define a composite that is added when two specific symbols are found:

~~~ucl
composite {
    name = "TEST_COMPOSITE";
    expression = "SYMBOL1 and SYMBOL2";
}
~~~

In this case, if a message has both `SYMBOL1` and `SYMBOL2` then they are replaced by symbol `TEST_COMPOSITE`. The weights of `SYMBOL1` and `SYMBOL2` are subtracted from the metric accordingly.
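
The replacement described above can be pictured with a small Python sketch that handles only a plain `and` of symbols. The function `apply_composite` and the weights used with it are hypothetical.

```python
def apply_composite(symbols, name, parts, weight):
    """Replace all `parts` by the composite `name` if every part is present.

    `symbols` maps symbol names to weights; only AND semantics are covered.
    """
    if not all(p in symbols for p in parts):
        return dict(symbols)
    result = {k: v for k, v in symbols.items() if k not in parts}
    result[name] = weight
    return result
```

For instance, a result containing `SYMBOL1`, `SYMBOL2` and one unrelated symbol ends up with the unrelated symbol plus `TEST_COMPOSITE`.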

## Composite expressions

You can use the following operations in a composite expression:

* `AND` `&` - true only if both operands are true
* `OR` `|` - true if at least one operand is true
* `NOT` `!` - true if the operand is false

You can also use parentheses to define priorities. Otherwise operators are evaluated from left to right. For example:

~~~ucl
composite {
    name = "TEST";
    expression = "SYMBOL1 and SYMBOL2 and ( not SYMBOL3 | not SYMBOL4 | not SYMBOL5 )";
}
~~~

A composite rule can include other composites in its expression. There is no restriction on definition order:

~~~ucl
composite {
    name = "TEST1";
    expression = "SYMBOL1 AND TEST2";
}
composite {
    name = "TEST2";
    expression = "SYMBOL2 OR NOT SYMBOL3";
}
~~~

Composites should not be recursive; this is normally detected by Rspamd.

## Composite weight rules

Composites can record symbols in a metric or record their weights. That could be used to create non-captive composites. For example, suppose you have symbols `A` and `B` with weights `W_a` and `W_b` and a composite `C` with weight `W_c`.

* If `C` is `A & B`, then when rules `A` and `B` both match, these symbols are *removed* and their weights are removed as well, leading to a single symbol `C` with weight `W_c`.
* If `C` is `-A & B`, then rule `A` is preserved and the symbol `C` is inserted. The weight of `A` is preserved as well, so the total weight of `-A & B` will be `W_a + W_c`.
* If `C` is `~A & B`, then rule `A` is *removed* but its weight is *preserved*, leading to a single symbol `C` with weight `W_a + W_c`.
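
The three cases above can be checked with a short Python sketch. It covers only a plain AND of symbols with optional `-` and `~` prefixes; the function `combine` is hypothetical and does not reflect how Rspamd evaluates expressions internally.

```python
def combine(symbols, composite, weight, parts):
    """Apply an AND composite whose parts may carry '-' or '~' prefixes.

    No prefix: remove the symbol and its weight. '-': keep both.
    '~': remove the symbol but keep its weight.
    Returns (resulting symbols, total score).
    """
    names = [p.lstrip("-~") for p in parts]
    if not all(n in symbols for n in names):
        return dict(symbols), sum(symbols.values())
    result = dict(symbols)
    kept_weight = 0.0
    for part, name in zip(parts, names):
        if part.startswith("-"):
            continue                     # symbol and weight both stay
        if part.startswith("~"):
            kept_weight += result[name]  # weight stays, symbol goes
        del result[name]
    result[composite] = weight
    return result, sum(result.values()) + kept_weight
```

With `W_a = 1`, `W_b = 2` and `W_c = 5`, the plain form yields a total of 5, while both the `-A` and `~A` forms yield 6; they differ only in whether `A` remains visible in the result.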

When you have multiple composites which include the same symbol and a composite wants to remove the symbol and another composite wants to preserve it, then the symbol is preserved by default. Here are some more examples:

~~~ucl
composite "COMP1" {
    expression = "BLAH || !DATE_IN_PAST";
}
composite "COMP2" {
    expression = "!BLAH || DATE_IN_PAST";
}
composite "COMP3" {
    expression = "!BLAH || -DATE_IN_PAST";
}
~~~

Suppose that both `BLAH` and `DATE_IN_PAST` exist in the message's check results. `COMP3` wants to preserve `DATE_IN_PAST`, so it will be saved in the output.

If we rewrite the previous example but replace `-` with `~` then `DATE_IN_PAST` will be removed (however, its weight won't be removed):

~~~ucl
composite "COMP1" {
    expression = "BLAH || !DATE_IN_PAST";
}
composite "COMP2" {
    expression = "!BLAH || DATE_IN_PAST";
}
composite "COMP3" {
    expression = "!BLAH || ~DATE_IN_PAST";
}
~~~

When we want to remove a symbol regardless of what other composites request, it is possible to add the prefix `^` to the symbol:

~~~ucl
composite "COMP1" {
    expression = "BLAH || !DATE_IN_PAST";
}
composite "COMP2" {
    expression = "!BLAH || ^DATE_IN_PAST";
}
composite "COMP3" {
    expression = "!BLAH || -DATE_IN_PAST";
}
~~~

In this example, `COMP3` wants to save `DATE_IN_PAST` once again; however, `COMP2` overrides this and removes `DATE_IN_PAST`.

## Composites with symbol groups

It is possible to include a group of symbols in a composite rule. This effectively means **any** symbol of the specified group:

~~~ucl
composite {
    name = "TEST2";
    expression = "SYMBOL2 && !g:mua";
}
~~~

+ 0
- 52
doc/markdown/configuration/index.md

@@ -1,52 +0,0 @@
# Rspamd configuration

Rspamd uses the Universal Configuration Language (UCL) for its configuration. The UCL format is described in detail in this [document](ucl.md). Rspamd defines several variables and macros to extend UCL functionality.

## Rspamd variables

- *CONFDIR*: configuration directory for Rspamd, found in `$PREFIX/etc/rspamd/`
- *RUNDIR*: runtime directory to store pidfiles or UNIX sockets
- *DBDIR*: persistent databases directory (used for statistics or symbols cache)
- *LOGDIR*: a directory to store log files
- *PLUGINSDIR*: plugins directory for Lua plugins
- *PREFIX*: basic installation prefix
- *VERSION*: Rspamd version string (e.g. "0.6.6")

## Rspamd specific macros

- *.include_map*: defines a map that is dynamically reloaded and updated if its content has changed. This macro is intended to define dynamic configuration files.

## Rspamd basic configuration

The basic Rspamd configuration is stored in `$CONFDIR/rspamd.conf`. By default, this file looks like this:

~~~ucl
lua = "$CONFDIR/lua/rspamd.lua"

.include "$CONFDIR/options.conf"
.include "$CONFDIR/logging.conf"
.include "$CONFDIR/metrics.conf"
.include "$CONFDIR/workers.conf"
.include "$CONFDIR/composites.conf"

.include "$CONFDIR/statistic.conf"

.include "$CONFDIR/modules.conf"

modules {
    path = "$PLUGINSDIR/lua/"
}
~~~

In this file, we read a Lua script placed in `$CONFDIR/lua/rspamd.lua` and load Lua rules from it. Then we include a global [options](options.md) section followed by the [logging](logging.md) configuration. The [metrics](metrics.md) section defines metric settings, including rule weights and Rspamd actions. The [workers](../workers/index.md) section specifies the settings of Rspamd workers. [Composites](composites.md) is a utility section that describes composite symbols. Statistical filters are defined in the [statistic](statistic.md) section. Rspamd stores module configurations (for both Lua and internal modules) in the [modules](../modules/index.md) section, while the modules themselves are loaded from the following portion of the configuration:

~~~ucl
modules {
    path = "$PLUGINSDIR/lua/"
}
~~~

The modules section defines the path or paths of directories or specific files. If a directory is specified, then all files with a `.lua` suffix are loaded as Lua plugins (the directory path is treated as a `*.lua` shell pattern).

This configuration is not intended to be changed by the user; instead, you can add your own configuration options via `.include` directives. To redefine symbol weights and actions, it is recommended to use [dynamic configuration](settings.md). In any case, the Rspamd installation script never overwrites a user's configuration if it already exists. If you upgrade Rspamd to a new version, please read the Rspamd changelog carefully for any incompatible configuration changes.

+ 0
- 90
doc/markdown/configuration/logging.md

@@ -1,90 +0,0 @@
# Rspamd logging settings

## Introduction
Rspamd has a number of logging options. Three types of log output are supported: console logging (log messages are written to the console), file logging (log messages are written to a file) and logging via syslog. It is also possible to restrict logging to a specific level:

* `error` - log only critical errors
* `warning` - log errors and warnings
* `info` - log all non-debug messages
* `debug` - log all including debug messages (huge amount of logging)

It is possible to turn on debug messages for specific IP addresses. This can be useful for testing. For each logging type there are special mandatory parameters: log facility for syslog (read `syslog(3)` man page for details about facilities), log file for file logging. Also, file logging may be buffered for performance. To reduce logging noise, Rspamd detects sequential matching log messages and replaces them with a total number of repeats:

    #81123(fuzzy): May 11 19:41:54 rspamd file_log_function: Last message repeated 155 times
    #81123(fuzzy): May 11 19:41:54 rspamd process_write_command: fuzzy hash was successfully added

## Unique ID

From version 1.0, Rspamd logs contain a unique ID for each logging message. This allows finding relevant messages quickly. Moreover, there is now a `module` definition: for example, `task` or `cfg` modules. Here is a quick example of how it works: imagine that we have an incoming task for some message. Then you'd see something like this in the logs:

    2015-09-02 16:41:59 #45015(normal) <ed2abb>; task; accept_socket: accepted connection from ::1 port 52895
    2015-09-02 16:41:59 #45015(normal) <ed2abb>; task; rspamd_message_parse: loaded message; id: <F66099EE-BCAB-4D4F-A4FC-7C15A6686397@FreeBSD.org>; queue-id: <undef>

So the tag is `ed2abb` in this case. All subsequent processing related to this task will have the same tag. It is enabled not only on the `task` module, but also others, such as the `spf` or `lua` modules. For other modules, such as `cfg`, the tag is generated statically using a specific characteristic, for example the configuration file checksum.

## Configuration parameters

Here is a summary of the logging parameters:

- `type` - Defines logging type (file, console or syslog). For some types mandatory attributes may be required:
    + `filename` - path to log file for file logging
    + `facility` - logging facility for syslog
- `level` - Defines logging level (error, warning, info or debug).
- `log_buffer` - For file and console logging defines buffer size that will be used for logging output.
- `log_urls` - Flag that defines whether all URLs in message should be logged. Useful for testing.
- `debug_ip` - List that contains IP addresses for which debugging should be turned on.
- `log_color` - Turn on coloring for log messages. Default: `no`.
- `debug_modules` - A list of modules that are enabled for debugging. The following modules are available here:
    + `task` - task messages
    + `cfg` - configuration messages
    + `symcache` - messages from symbols cache
    + `fuzzy_backend` - messages from fuzzy backend
    + `lua` - messages from Lua code
    + `spf` - messages from spf module
    + `dkim` - messages from dkim module
    + `main` - messages from the main process
    + `dns` - messages from DNS resolver
    + `map` - messages from maps in Rspamd
    + `logger` - messages from the logger itself

### Log format

Rspamd supports a custom log format when writing information about a message to the log. (This feature is supported since version 1.1.) The format string looks as follows:


    log_format =<< EOD
    id: <$mid>,$if_qid{ qid: <$>,}$if_ip{ ip: $,}$if_user{ user: $,}$if_smtp_from{ from: <$>,}
    (default: $is_spam ($action): [$scores] [$symbols]),
    len: $len, time: $time_real real,
    $time_virtual virtual, dns req: $dns_req
    EOD

Newlines are replaced with spaces. Both text and variables are supported in the log format line. Each variable can have an optional `if_` prefix, in which case it is logged only if it is set. Moreover, each variable can have an optional body, where `$` is replaced with the variable's value (as many times as it is found in the body, e.g. `$var{$$$$}` will be replaced with the variable's value repeated 4 times).
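
The expansion rules above can be approximated with the following Python sketch. It is an illustration under simplifying assumptions: bodies containing nested braces (such as a `$lua{...}` script) are not handled, an `if_` variable is treated as unset when its value is empty, and the function name `render` is hypothetical.

```python
import re

# Matches $name, $name{body} and $if_name{body}; nested braces are not handled.
_VAR = re.compile(r"\$(if_)?([a-z_]+)(?:\{([^}]*)\})?")


def render(fmt, values):
    """Expand a log format string from a dict of variable values."""
    def expand(match):
        conditional, name, body = match.groups()
        value = values.get(name, "")
        if conditional and not value:
            return ""  # if_ variables vanish when the value is unset
        if body is None:
            return str(value)
        return body.replace("$", str(value))

    # Newlines are replaced with spaces, as described above.
    return _VAR.sub(expand, fmt.replace("\n", " "))
```

With `mid` and `ip` set but `qid` missing, the `$if_qid{...}` part of the example format simply disappears from the rendered line.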

Rspamd supports the following variables:

- `mid` - message ID
- `qid` - queue ID
- `ip` - from IP
- `user` - authenticated user
- `smtp_from` - envelope from (or MIME from if SMTP from is absent)
- `mime_from` - MIME from
- `smtp_rcpt` - envelope rcpt (or MIME rcpt if SMTP rcpt is absent) - the first recipient
- `mime_rcpt` - MIME rcpt - the first recipient
- `smtp_rcpts` - envelope rcpts - all recipients
- `mime_rcpts` - MIME rcpts - all recipients
- `len` - length of message
- `is_spam` - a one-letter rating of spammyness: `T` for spam, `F` for ham and `S` for skipped messages
- `action` - default metric action
- `symbols` - list of all symbols
- `time_real` - real time of task processing
- `time_virtual` - CPU time of task processing
- `dns_req` - number of DNS requests
- `lua` - custom Lua script, e.g:

~~~lua
$lua{
    return function(task)
        return 'text parts: ' .. tostring(#task:get_text_parts())
    end
}
~~~

+ 0
- 109
doc/markdown/configuration/metrics.md

@@ -1,109 +0,0 @@
# Rspamd metrics settings

## Introduction

The metrics section configures weights for symbols and actions applied to a message by Rspamd. You can imagine a metric as a decision made by Rspamd for a specific message by a set of rules. Each rule can insert a `symbol` into the metric, which means that this rule is true for this message. Each symbol can have a floating point value called a `weight`, which means the significance of the corresponding rule. Rules with a positive weight increase the spam factor, while rules with negative weights increase the ham factor. The result is the overall message score.

After a score is evaluated, Rspamd selects an appropriate `action` for a message. Rspamd defines the following actions, ordered by spam factor, in ascending order:

1. `no action` - a message is likely ham
2. `greylist` - a message should be greylisted to ensure sender's validity
3. `add header` - add the specific `spam` header indicating that a message is likely spam
4. `rewrite subject` - add spam subject to a message
5. `soft reject` - temporarily reject a message
6. `reject` - permanently reject a message

Actions are assumed to be cumulative, meaning that the `add header` action implies, for example, the `greylist` action. From Rspamd's point of view, `add header` and `rewrite subject` are equivalent: they are just two options with the same purpose, namely to mark a message as probable spam. The `soft reject` action is mainly used to indicate temporary issues in mail delivery, for instance, exceeding a rate limit.

There is also a special purpose metric called `default` that acts as the main metric to treat a message as spam or ham. Actually, all clients that use Rspamd just check the default metric to determine whether a message is spam or ham. Therefore, the default configuration just defines the `default` metric.

## Configuring metrics
Each metric is defined by a `metric` object in the Rspamd configuration file. This object has one mandatory attribute - `name` - which defines the name of the metric:

~~~ucl
metric {
    # Define default metric
    name = "default";
}
~~~
It is also possible to define some generic attributes for the metric:

* `grow_factor` - the multiplier applied to subsequently inserted symbols, according to the following rule:

$$
score = score + grow\_factor * symbol\_weight
$$

$$
grow\_factor = grow\_factor * grow\_factor
$$

By default this value is `1.0` meaning that no weight growing is defined. By increasing this value you increase the effective score of messages with multiple `spam` rules matched. This value is not affected by negative score values.

* `subject` - string value that is prepended to the message's subject if the `rewrite subject` action is applied
* `unknown_weight` - weight for unknown rules. If this parameter is specified, all rules can add symbols to this metric. If a rule is not specified by this metric, its weight is equal to this option's value. Please note that adding this option means that all rules will be checked by Rspamd. In contrast, if `unknown_weight` is not specified, rules that are not registered anywhere are silently ignored by Rspamd.
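
To make the `grow_factor` rule more concrete, here is a small Python sketch. It rests on an assumption that goes beyond the formulas above: the running multiplier starts at `1.0` and is multiplied by the configured `grow_factor` after each positive symbol, while negative weights are added unchanged. The function name `grown_score` is hypothetical.

```python
def grown_score(weights, grow_factor=1.0):
    """Accumulate symbol weights, boosting later positive ones.

    Assumption: the running multiplier starts at 1.0 and grows by the
    configured factor after each positive symbol; negative weights are
    not affected.
    """
    score = 0.0
    multiplier = 1.0
    for weight in weights:
        if weight > 0:
            score += multiplier * weight
            multiplier *= grow_factor
        else:
            score += weight
    return score
```

With the default factor of `1.0` this reduces to a plain sum; with a larger factor, each additional spam symbol counts for more than the previous one.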

The content of this section has two parts: symbols and actions. `actions` is an object containing all actions defined by this metric. If some actions are omitted, they will never be suggested by Rspamd. The actions section looks as follows:

~~~ucl
metric {
    ...
    actions {
        reject = 15;
        add_header = 6;
        greylist = 4;
    };
    ...
}
~~~

You can use an underscore (`_`) instead of white space in action names to simplify the configuration.

Symbols are defined by an object with the following properties:

* `weight` - the symbol weight as floating point number (negative or positive); by default the weight is `1.0`
* `name` - symbolic name for a symbol (mandatory attribute)
* `group` - a group of symbols, for example `DNSBL symbols` (as shown in WebUI)
* `description` - optional symbolic description for WebUI
* `one_shot` - normally, Rspamd inserts a symbol as many times as the corresponding rule matches for the specific message; however, if `one_shot` is `true` then only the **maximum** weight is added to the metric. `grow_factor` is correspondingly not modified by a repeated triggering of `one_shot` rules.

A symbol definition can look like this:

~~~ucl
symbol {
name = "RWL_SPAMHAUS_WL_IND";
weight = -0.7;
description = "Sender listed at Spamhaus whitelist";
}
~~~

A single metric can contain multiple symbol definitions.


## Symbol groups

Symbols can be grouped to specify their common functionality. For example, one could group all `RBL` symbols together. Moreover, from Rspamd version 0.9 it is possible to specify a group score limit, which could be useful, for instance, if a specific group should not unconditionally send a message to the `spam` class. Here is an example of such a functionality:

~~~ucl
metric {
    name = default; # This is mandatory option
    group {
        name = "RBL group";
        max_score = 6.0;
        symbol {
            name = "RBL1";
            weight = 1;
        }
        symbol {
            name = "RBL2";
            weight = 4;
        }
        symbol {
            name = "RBL3";
            weight = 5;
        }
    }
}
~~~
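
To make the effect of `max_score` concrete, here is a minimal sketch of the arithmetic for the group above. It assumes that the limit is applied to the summed weight of the matched symbols in the group; `group_score` is a hypothetical helper, not an Rspamd function.

```python
def group_score(weights, max_score=None):
    """Sum the weights of matched symbols in one group, capped at max_score."""
    total = sum(weights)
    if max_score is not None:
        total = min(total, max_score)
    return total


# All three RBL symbols from the example match: 1 + 4 + 5 = 10, capped at 6.0
print(group_score([1, 4, 5], max_score=6.0))  # 6.0
# Only RBL1 and RBL2 match: 5 is below the cap, so it is used as is
print(group_score([1, 4], max_score=6.0))     # 5
```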

+ 0
- 79
doc/markdown/configuration/options.md

@@ -1,79 +0,0 @@
# Rspamd options settings

## Introduction

The options section defines basic Rspamd behaviour. Options are global for all types of workers. The default options are shown in the following example snippet:

~~~ucl
filters = "chartable,dkim,spf,surbl,regexp,fuzzy_check";
raw_mode = false;
one_shot = false;
cache_file = "$DBDIR/symbols.cache";
map_watch_interval = 1min;
dynamic_conf = "$DBDIR/rspamd_dynamic";
history_file = "$DBDIR/rspamd.history";
check_all_filters = false;
dns {
timeout = 1s;
sockets = 16;
retransmits = 5;
}
tempdir = "/tmp";
url_tld = "${PLUGINSDIR}/effective_tld_names.dat";
classify_headers = [
"User-Agent",
"X-Mailer",
"Content-Type",
"X-MimeOLE",
];

control_socket = "$DBDIR/rspamd.sock mode=0600";
~~~

## Global options

* `filters`: comma-separated string that defines enabled **internal** Rspamd filters; for a list of the internal filters please check the [modules page](../modules/)
* `one_shot`: if this flag is set to `true` then multiple rule triggers do not increase the total score of messages (however, this option can also be individually configured in the `metric` section for each symbol)
* `cache_file`: used to store information about rules and their statistics; this file is automatically regenerated if Rspamd detects that the list of symbols has changed.
* `map_watch_interval`: interval between map scanning; the actual check interval is jittered to avoid simultaneous checking, so the real interval is from this value up to 2x this value
* `check_all_filters`: turns off the optimization that stops processing once a message's overall score exceeds the `reject` score for the default metric; this optimization can also be turned off for each request individually
* `history_file`: this file is automatically created and refreshed on shutdown to preserve the rolling history of operations displayed by the WebUI across restarts
* `temp_dir`: a directory for temporary files (can also be set via the environment variable `TMPDIR`).
* `url_tld`: path to file with top level domain suffixes used by Rspamd to find URLs in messages; by default this file is shipped with Rspamd and should not be touched manually
* `pid_file`: file used to store PID of the Rspamd main process (not used with systemd)
* `min_word_len`: minimum size in letters (valid for utf-8 as well) for a sequence of characters to be treated as a word; normally Rspamd skips sequences that are three characters long or shorter
* `control_socket`: path/bind for the control socket
* `classify_headers`: list of headers that are processed by statistics
* `history_rows`: number of rows in the recent history table
* `explicit_modules`: always load modules from the list even if they have no configuration section in the file
* `disable_hyperscan`: disable Hyperscan optimizations (if enabled at compile time)
* `cores_dir`: directory where Rspamd should drop core files
* `max_cores_size`: maximum total size of core files that are placed in `cores_dir`
* `max_cores_count`: maximum number of files in `cores_dir`
* `local_addrs` or `local_networks`: map or list of IP networks used as local, so certain checks are skipped for them (e.g. SPF checks)

## DNS options

These options are in a separate subsection named `dns` and specify the behaviour of Rspamd name resolution. Here is a list of available tunables:

* `nameserver`: list (or array) of DNS servers to be used (if this option is skipped, then `/etc/resolv.conf` is parsed instead). It is also possible to specify weights for DNS servers to balance the load, e.g.

~~~ucl
options {
dns {
# 9/10 on 127.0.0.1 and 1/10 to 8.8.8.8
nameserver = ["127.0.0.1:10", "8.8.8.8:1"];
# or
# nameserver = "127.0.0.1:10";
# nameserver = "8.8.8.8:1";
}
}
~~~

* `timeout`: timeout for each DNS request
* `retransmits`: how many times each request is retransmitted before it is treated as failed (the overall timeout for each request is thus `timeout * retransmits`)
* `sockets`: how many sockets are opened to a remote DNS resolver (can be tuned if you have tens of thousands of requests per second).
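
As a quick check of the `timeout * retransmits` relation described for `retransmits`, the following sketch computes the worst-case wait for the default values shown in the options snippet. The helper name is hypothetical.

```python
def overall_dns_timeout(timeout_s, retransmits):
    """Worst-case time before a DNS request is treated as failed."""
    return timeout_s * retransmits


# Defaults from the options snippet: timeout = 1s, retransmits = 5
print(overall_dns_timeout(1.0, 5))  # 5.0
```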

## Upstream options

**TODO**

+ 0
- 81
doc/markdown/configuration/settings.md

@@ -1,81 +0,0 @@
# Rspamd user settings

## Introduction

Rspamd allows exceptional control over the settings which will apply to incoming messages. Each setting can define a set of custom metric weights, symbols or actions. An administrator can also skip spam checks for certain messages completely, if required. Rspamd settings can be loaded as dynamic maps and updated automatically if a corresponding file or URL has changed since its last update.

To load settings as a dynamic map, set `settings` to a map string:

~~~ucl
settings = "http://host/url"
~~~

If you don't want dynamic updates then you can define settings as an object:

~~~ucl
settings {
setting1 = {
...
}
setting2 = {
...
}
}
~~~

## Settings structure

The settings file should contain a single section called "settings":

~~~ucl
settings {
    some_users {
        priority = high;
        from = "@example.com";
        rcpt = "admin";
        rcpt = "/user.*/";
        ip = "172.16.0.0/16";
        user = "@example.net";
        apply "default" {
            symbol1 = 10.0;
            symbol2 = 0.0;
            actions {
                reject = 100.0;
                greylist = 10.0;
                "add header" = 5.0; # Please note the space, NOT an underscore
            }
        }
        # Always add these symbols when settings rule has matched
        symbols [
            "symbol2", "symbol4"
        ]
    }
    whitelist {
        priority = low;
        rcpt = "postmaster@example.com";
        want_spam = yes;
    }
}
~~~

Each setting has the following attributes:

- `name` - section name that identifies this specific setting (e.g. `some_users`)
- `priority` - `high` or `low`; high priority rules are matched first (the default priority is `low`)
- `match list` - list of conditions that a message must satisfy for this rule to match:
    + `from` - match SMTP from
    + `rcpt` - match RCPT
    + `ip` - match source IP address
    + `user` - match the authenticated user ID of the message sender, if any
- `apply` - list of applied rules, identified by metric name (e.g. `default`)
    + `symbol` - modify the weight of a symbol
    + `actions` - define actions
- `symbols` - add symbols from the list if a rule has matched

The match section performs an `AND` operation on different match types: for example, if you have `from` and `rcpt` in the same rule, then the rule matches only when `from` `AND` `rcpt` match. For matches of the same type, the `OR` rule applies: if you have multiple `rcpt` matches, then *any* of these will trigger the rule. Once a rule is triggered, no more rules are matched.
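
The matching logic described above can be sketched as follows. This is a simplified model, not Rspamd's implementation: the pattern handling (`/re/` for regexps, `@domain` for domain suffixes, otherwise full address or local part), the dictionary layout and all function names are assumptions made for illustration. IP matching and multi-recipient messages are left out.

```python
import re


def pattern_matches(pattern, value):
    """Simplified matcher: '/re/' is a regexp, '@domain' matches a domain
    suffix, anything else must equal the value or its local part."""
    if len(pattern) > 1 and pattern.startswith("/") and pattern.endswith("/"):
        return re.search(pattern[1:-1], value) is not None
    if pattern.startswith("@"):
        return value.endswith(pattern)
    return value == pattern or value.split("@")[0] == pattern


def rule_matches(rule, message):
    """AND across match types, OR within the patterns of one type."""
    for field, patterns in rule["match"].items():
        value = message.get(field)
        if value is None or not any(pattern_matches(p, value) for p in patterns):
            return False
    return True


def select_setting(rules, message):
    """Return the name of the first matching rule, high priority first."""
    ordered = sorted(rules, key=lambda r: r.get("priority", "low") != "high")
    for rule in ordered:
        if rule_matches(rule, message):
            return rule["name"]  # stop at the first triggered rule
    return None


rules = [
    {"name": "whitelist", "priority": "low",
     "match": {"rcpt": ["postmaster@example.com"]}},
    {"name": "some_users", "priority": "high",
     "match": {"from": ["@example.com"], "rcpt": ["admin", "/user.*/"]}},
]

msg = {"from": "bob@example.com", "rcpt": "user42@example.org"}
print(select_setting(rules, msg))  # some_users
```

Here `some_users` is tried first because of its high priority; it matches because the sender domain matches `from` and one of the two `rcpt` patterns matches the recipient.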

Regexp rules can be slow and should not be used extensively.

The picture below describes the architecture of settings matching.

![Settings match procedure](settings.png "Settings match procedure")

BIN
doc/markdown/configuration/settings.png


+ 0
- 226
doc/markdown/configuration/statistic.md

@@ -1,226 +0,0 @@
# Rspamd statistic settings

## Introduction

Rspamd uses statistics to determine the `class` of a message: either spam or ham. The overall algorithm is based on Bayes' theorem,
which defines how probabilities are combined. In general, it gives the probability that a message belongs to the specified class (namely, `spam` or `ham`)
based on the following factors:

- the probability that a specific token is spam or ham (effectively, the count of the token's occurrences in spam and ham messages)
- the probability that a specific token appears in a message (effectively, the frequency of the token divided by the number of tokens in the message)

## Statistics Architecture

However, Rspamd uses more advanced techniques to combine probabilities, such as sparse bigrams (OSB) and the inverse chi-square distribution.
The key idea of the `OSB` algorithm is to use not merely single words as tokens but combinations of words weighted by their positions.
This scheme is displayed in the following picture:

![OSB algorithm](https://rspamd.com/img/rspamd-schemes.004.png "Rspamd OSB scheme")

The main disadvantage is the number of tokens, which is multiplied by the size of the window. Rspamd uses a window of 5 tokens, which means that
the number of tokens is about 5 times larger than the number of words.
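
The idea of sparse bigrams can be sketched in a few lines. This is an illustration only: it pairs every word with each following word inside the window and records the distance, whereas the real tokenizer hashes and weights the pairs and may count tokens differently. The function name `osb_pairs` is hypothetical.

```python
def osb_pairs(words, window=5):
    """Pair every word with each of the following words inside the window.

    Each pair keeps the distance between the two words, so that
    ("buy", "now", 1) and ("buy", "now", 3) are different tokens.
    """
    pairs = []
    for i, first in enumerate(words):
        for distance in range(1, window):
            j = i + distance
            if j >= len(words):
                break
            pairs.append((first, words[j], distance))
    return pairs


words = "buy cheap pills online now today".split()
tokens = osb_pairs(words)
print(len(words), len(tokens))  # 6 14
print(tokens[:4])
```

Even in this simplified form, the number of tokens grows several times faster than the number of words, which is the storage cost mentioned above.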

Statistical tokens are stored in statfiles which, in turn, are mapped to specific backends. This architecture is displayed in the following image:

![Statistics architecture](https://rspamd.com/img/rspamd-schemes.005.png "Rspamd statistics architecture")

## Statistics Configuration

Starting from Rspamd 1.0, we recommend using `sqlite3` as the backend and `osb` as the tokenizer. This also enables additional features, such as token normalization and
meta-information in statistics. The following example demonstrates the recommended statistics configuration:

~~~ucl
# Classifier's algorithm is BAYES
classifier "bayes" {
tokenizer {
name = "osb";
}

# Unique name used to learn the specific classifier
name = "common_bayes";

cache {
path = "${DBDIR}/learn_cache.sqlite";
}

# Minimum number of words required for statistics processing
min_tokens = 11;
# Minimum learn count for both spam and ham classes to perform classification
min_learns = 200;

backend = "sqlite3";
languages_enabled = true;
statfile {
symbol = "BAYES_HAM";
path = "${DBDIR}/bayes.ham.sqlite";
spam = false;
}
statfile {
symbol = "BAYES_SPAM";
path = "${DBDIR}/bayes.spam.sqlite";
spam = true;
}
}
~~~

It is also possible to organize per-user statistics using the SQLite3 backend. However, you should ensure that Rspamd is called at the
final delivery stage (e.g. LDA mode) to avoid multi-recipient messages. For a multi-recipient message, Rspamd just uses the
first recipient for user-based statistics, which might be inappropriate for your configuration (however, Rspamd prefers SMTP recipients over MIME ones and prioritizes
the special LDA header called `Deliver-To`, which can be appended by the `-d` option of `rspamc`). To enable per-user statistics, add the `users_enabled = true` property
to the **classifier** configuration. You can use per-user and per-language statistics simultaneously. For both types of statistics, Rspamd also
looks at the default language and default user's statistics, which allows a common set of tokens to be shared by all users/languages.

## Using Lua scripts for `per_user` classifier

It is also possible to use custom Lua scripts to select a customized user or language for a specific task. Here is an example
of such a script, which extracts the domain name from the first recipient and thus organizes per-domain statistics:

~~~ucl
classifier "bayes" {
tokenizer {
name = "osb";
}

name = "bayes2";

min_tokens = 11;
min_learns = 200;

backend = "sqlite3";
per_language = true;
per_user = <<EOD
return function(task)
    local rcpt = task:get_recipients(1)

    if rcpt then
        -- Use a local variable to avoid leaking a global
        local one_rcpt = rcpt[1]
        if one_rcpt['domain'] then
            return one_rcpt['domain']
        end
    end

    return nil
end
EOD
statfile {
path = "/tmp/bayes2.spam.sqlite";
symbol = "BAYES_SPAM2";
}
statfile {
path = "/tmp/bayes2.ham.sqlite";
symbol = "BAYES_HAM2";
}
}
~~~

## Applying per-user and per-language statistics

From version 1.1, Rspamd uses independent statistics for users and joint statistics for languages. That means the following:

* If `per_user` is enabled then Rspamd looks for user statistics **only**
* If `per_language` is enabled then Rspamd looks for language-specific statistics **plus** language-independent statistics

This differs from version 1.0, where the second approach was used for both cases.

## Using multiple classifiers

Rspamd can learn and check multiple classifiers for a single message. This might be useful, for example, if you have common and per-user statistics. It is even possible to use the same statfiles for these purposes. Classifiers **might** have the same symbols (though this is not recommended) and they should have a **unique** `name` attribute that is used for learning. Here is an example of such a configuration:

~~~ucl
classifier "bayes" {
tokenizer {
name = "osb";
}

name = "users";
min_tokens = 11;
min_learns = 200;
backend = "sqlite3";
per_language = true;
per_user = true;

statfile {
path = "/tmp/bayes.spam.sqlite";
symbol = "BAYES_SPAM_USER";
}
statfile {
path = "/tmp/bayes.ham.sqlite";
symbol = "BAYES_HAM_USER";
}
}

classifier "bayes" {
tokenizer {
name = "osb";
}

name = "common";
min_tokens = 11;
min_learns = 200;
backend = "sqlite3";
per_language = true;

statfile {
path = "/tmp/bayes.spam.sqlite";
symbol = "BAYES_SPAM";
}
statfile {
path = "/tmp/bayes.ham.sqlite";
symbol = "BAYES_HAM";
}
}
~~~

To learn a specific classifier, use the `-c` option of `rspamc` (or the `Classifier` HTTP header) with the classifier's `name`:

    rspamc -c common learn_spam ...
    rspamc -c users -d user@example.com learn_ham ...

## Redis statistics

From version 1.1, it is also possible to specify Redis as a backend for statistics and for the cache of learned messages. Redis is recommended for clustered configurations, as it allows simultaneous learning and checking and is also very fast. To set up Redis, use the `redis` backend for a classifier (the cache is set to the same servers accordingly).

~~~ucl
classifier "bayes" {
tokenizer {
name = "osb";
}

name = "bayes";
min_tokens = 11;
min_learns = 200;
backend = "redis";
servers = "localhost:6379";
#write_servers = "localhost:6379"; # Separate servers for learning, if needed
#password = "xxx"; # Optional password
#database = "2"; # Optional database id

statfile {
symbol = "BAYES_SPAM";
}
statfile {
symbol = "BAYES_HAM";
}
per_user = true;
}
~~~

`per_language` is not supported by Redis: it just stores everything in the same place. `write_servers` are used for learning and are selected in
`master-slave` rotation by default, whilst `servers` are selected randomly each time:

    write_servers = "master.example.com:6379:10, slave.example.com:6379:1"
    write_servers = "master.example.com:6379, slave.example.com:6379"

Here the last number is the priority used to distinguish master from slave.

## Autolearning

From version 1.1, Rspamd supports autolearning for statfiles. Autolearning is applied after all rules are processed (including statistics) if and only if the same symbol has not been inserted. E.g. a message won't be learned as spam if `BAYES_SPAM` is already in the results of checking.

There are 3 possibilities to specify autolearning:

* `autolearn = true`: a message is learned as spam if it has the `reject` action and as ham if it has a **negative** score
* `autolearn = [1, 10]`: learn as ham if the score is less than the minimum of the 2 numbers (< `1` here) and as spam if the score is more than the maximum of the 2 numbers (> `10` in this case)
* `autolearn = "return function(task) ... end"`: use the given Lua function to decide whether autolearning is needed (the function should return the string 'ham' to learn as ham and the string 'spam' to learn as spam; if no learning is needed, it can return anything else, including `nil`)
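
The first two forms can be summarised by the following sketch. It only models the threshold logic described in the list; the check that the corresponding statistics symbol has not already been inserted is left out, and the function name and argument layout are assumptions for illustration.

```python
def autolearn_class(score, action, autolearn):
    """Return 'spam', 'ham' or None for the first two autolearn forms."""
    if not autolearn:
        return None
    if autolearn is True:
        if action == "reject":
            return "spam"
        if score < 0:
            return "ham"
        return None
    # Two thresholds given in any order, e.g. [1, 10]
    low, high = min(autolearn), max(autolearn)
    if score < low:
        return "ham"
    if score > high:
        return "spam"
    return None


print(autolearn_class(16.0, "reject", True))         # spam
print(autolearn_class(-2.5, "no action", True))      # ham
print(autolearn_class(0.5, "no action", [1, 10]))    # ham
print(autolearn_class(5.0, "add header", [1, 10]))   # None
print(autolearn_class(12.0, "add header", [10, 1]))  # spam
```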

Redis backend is highly recommended for autolearning purposes since it's the only backend with high concurrency level when multiple writers are properly synchronized.

+ 0
- 386
doc/markdown/configuration/ucl.md

@@ -1,386 +0,0 @@
# UCL configuration language

**Table of Contents** *generated with [DocToc](http://doctoc.herokuapp.com/)*

- [Introduction](#introduction)
- [Basic structure](#basic-structure)
- [Improvements to the json notation](#improvements-to-the-json-notation)
- [General syntax sugar](#general-syntax-sugar)
- [Automatic arrays creation](#automatic-arrays-creation)
- [Named keys hierarchy](#named-keys-hierarchy)
- [Convenient numbers and booleans](#convenient-numbers-and-booleans)
- [General improvements](#general-improvements)
- [Comments](#comments)
- [Macros support](#macros-support)
- [Variables support](#variables-support)
- [Multiline strings](#multiline-strings)
- [Emitter](#emitter)
- [Validation](#validation)
- [Performance](#performance)
- [Conclusion](#conclusion)

## Introduction {#introduction}

This document describes the main features and principles of the configuration
language called `UCL` - universal configuration language.

## Basic structure {#basic-structure}

UCL is heavily influenced by the `nginx` configuration format as an example of a convenient configuration
system. However, UCL is fully compatible with the `JSON` format and is able to parse JSON files.
For example, you can write the same configuration in the following ways:

* in nginx-like notation:

~~~ucl
param = value;
section {
    param = value;
    param1 = value1;
    flag = true;
    number = 10k;
    time = 0.2s;
    string = "something";
    subsection {
        host = {
            host = "hostname";
            port = 900;
        }
        host = {
            host = "hostname";
            port = 901;
        }
    }
}
~~~

* or in JSON:

~~~json
{
    "param": "value",
    "section": {
        "param": "value",
        "param1": "value1",
        "flag": true,
        "number": 10000,
        "time": 0.2,
        "string": "something",
        "subsection": {
            "host": [
                {
                    "host": "hostname",
                    "port": 900
                },
                {
                    "host": "hostname",
                    "port": 901
                }
            ]
        }
    }
}
~~~

## Improvements to the json notation {#improvements-to-the-json-notation}

There are various things that make UCL configuration more convenient for editing than strict JSON:

### General syntax sugar

* Braces are not necessary to enclose a top object: it is automatically treated as an object:

~~~json
"key": "value"
~~~

is equal to:

~~~json
{"key": "value"}
~~~

* Quotes are not required for strings and keys; moreover, `:` may be replaced by `=` or even be skipped for objects:

~~~ucl
key = value;
section {
key = value;
}
~~~

is equal to:

~~~json
{
"key": "value",
"section": {
"key": "value"
}
}
~~~

* No comma mess: you can safely place a comma or semicolon after the last element in an array or an object:

~~~json
{
"key1": "value",
"key2": "value",
}
~~~

### Automatic arrays creation

* Non-unique keys in an object are allowed and are automatically converted to arrays internally:

~~~json
{
"key": "value1",
"key": "value2"
}
~~~

is converted to:

~~~json
{
"key": ["value1", "value2"]
}
~~~
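
The same conversion can be imitated with Python's standard `json` module, which makes the behaviour easy to experiment with. This is only a model of the idea, not libucl code; `merge_duplicates` is a hypothetical helper passed to the real `object_pairs_hook` parameter of `json.loads`.

```python
import json


def merge_duplicates(pairs):
    """Collect repeated keys of one object into a list, keeping order."""
    merged = {}
    seen_twice = set()
    for key, value in pairs:
        if key not in merged:
            merged[key] = value
        elif key in seen_twice:
            merged[key].append(value)
        else:
            merged[key] = [merged[key], value]
            seen_twice.add(key)
    return merged


text = '{"key": "value1", "key": "value2", "other": 1}'
print(json.loads(text, object_pairs_hook=merge_duplicates))
# {'key': ['value1', 'value2'], 'other': 1}
```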

### Named keys hierarchy

UCL accepts named keys and organizes them into an object hierarchy internally. Here is an example of this process:

~~~ucl
section "blah" {
key = value;
}
section foo {
key = value;
}
~~~

is converted to the following object:

~~~ucl
section {
blah {
key = value;
}
foo {
key = value;
}
}
~~~

Plain definitions may be more complex and contain more than a single level of nested objects:

~~~ucl
section "blah" "foo" {
key = value;
}
~~~

is presented as:

~~~ucl
section {
blah {
foo {
key = value;
}
}
}
~~~
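
The nesting rule can be expressed as a tiny function: every name in front of the braces becomes one level of the resulting hierarchy. This is a sketch of the transformation only; `nest` is a hypothetical helper and plain dictionaries stand in for UCL objects.

```python
def nest(names, body):
    """Wrap body in one object per name, outermost name first."""
    result = body
    for name in reversed(names):
        result = {name: result}
    return result


# section "blah" "foo" { key = value; }
print(nest(["section", "blah", "foo"], {"key": "value"}))
# {'section': {'blah': {'foo': {'key': 'value'}}}}
```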

### Convenient numbers and booleans

* Numbers can have suffixes to specify standard multipliers:
    + `[kKmMgG]` - standard base 10 multipliers (so `1k` is translated to 1000)
    + `[kKmMgG]b` - power of 2 multipliers (so `1kb` is translated to 1024)
    + `[ms|s|min|d|w|y]` - time multipliers; all time values are translated to a floating point number of seconds, for example `10min` is translated to 600.0 and `10ms` is translated to 0.01
* Hexadecimal integers can be written with the `0x` prefix, for example `key = 0xff`. However, floating point values can use the decimal base only.
* Booleans can be specified as `true` or `yes` or `on` and `false` or `no` or `off`.
* It is still possible to treat numbers and booleans as strings by enclosing them in double quotes.
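
The suffix rules above can be tried out with a small parser. This is a sketch, not libucl's parser: the lengths used for `d`, `w` and `y` (in particular a 365-day year), the case handling and the function name `parse_number` are assumptions.

```python
import re

SIZE = {"k": 1000, "m": 1000**2, "g": 1000**3,
        "kb": 1024, "mb": 1024**2, "gb": 1024**3}
TIME = {"ms": 0.001, "s": 1.0, "min": 60.0, "d": 86400.0,
        "w": 7 * 86400.0, "y": 365 * 86400.0}  # year length is an assumption


def parse_number(text):
    """Parse '10k', '1kb', '10min', '0xff' or a plain decimal number."""
    if text.lower().startswith("0x"):
        return int(text, 16)
    match = re.fullmatch(r"(-?\d+(?:\.\d+)?)([a-zA-Z]*)", text)
    if match is None:
        raise ValueError(f"not a number: {text!r}")
    digits, suffix = match.groups()
    if suffix in TIME:  # time values always become float seconds
        return float(digits) * TIME[suffix]
    value = float(digits) if "." in digits else int(digits)
    if suffix == "":
        return value
    multiplier = SIZE.get(suffix.lower())
    if multiplier is None:
        raise ValueError(f"unknown suffix: {suffix!r}")
    return value * multiplier


print(parse_number("1k"))     # 1000
print(parse_number("1kb"))    # 1024
print(parse_number("10min"))  # 600.0
print(parse_number("0xff"))   # 255
```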

## General improvements {#general-improvements}

### Comments {#comments}

UCL supports two styles of comments:

* single line: `#`
* multiline: `/* ... */`

Multiline comments may be nested:

~~~c
# Sample single line comment
/*
some comment
/* nested comment */
end of comment
*/
~~~

### Macros support

UCL supports external macros, both multiline and single line:

~~~ucl
.macro "sometext";
.macro {
Some long text
....
};
~~~

Moreover, each macro can accept an optional list of arguments in parentheses. These
arguments form a UCL object that is parsed and passed to the macro as
options:

~~~ucl
.macro(param=value) "something";
.macro(param={key=value}) "something";
.macro(.include "params.conf") "something";
.macro(#this is multiline macro
param = [value1, value2]) "something";
.macro(key="()") "something";
~~~

UCL also provides a convenient `include` macro to load content from other files
into the current UCL object. This macro accepts either a path to a file:

~~~ucl
.include "/full/path.conf"
.include "./relative/path.conf"
.include "${CURDIR}/path.conf"
~~~

or URL (if ucl is built with url support provided by either `libcurl` or `libfetch`):

    .include "http://example.com/file.conf"

The `.include` macro supports a set of options:

* `try` (default: **false**) - if this option is `true` then UCL treats errors on loading of
this file as non-fatal. For example, such a file can be absent but it won't stop the parsing
of the top-level document.
* `sign` (default: **false**) - if this option is `true` UCL loads and checks the signature for
a file from the path named `<FILEPATH>.sig`. Trusted public keys should be provided to the UCL API after
the parser is created but before any configurations are parsed.
* `glob` (default: **false**) - if this option is `true` UCL treats the filename as a GLOB pattern and loads
all files that match the specified pattern (normally the format of patterns is defined in the `glob` manual page
for your operating system). This option is meaningless for URL includes.
* `url` (default: **true**) - allow URL includes.
* `path` (default: empty) - a UCL_ARRAY of directories to search for the include file.
The search ends after the first match, unless `glob` is true, in which case all matches are included.
* `prefix` (default: **false**) - put included contents inside an object, instead
of loading them into the root. If no `key` is provided, one is automatically generated based on each file's basename()
* `key` (default: <empty string>) - key to load the contents of the include into. If
the key already exists, it must be of the correct type
* `target` (default: object) - specify whether the `prefix` `key` should be an
object or an array.
* `priority` (default: 0) - specify the priority for the include (see below).
* `duplicate` (default: 'append') - specify the policy for resolving duplicates:
    - `append` - default strategy: if the new object has a higher priority then it replaces the old one, if the new object has a lower priority it is ignored completely, and if two duplicate objects have the same priority then we have a multi-value key (implicit array)
    - `merge` - if we have an object or array, then new keys are merged inside; if we have a plain object then an implicit array is formed (regardless of priorities)
    - `error` - create an error on duplicate keys and stop parsing
    - `rewrite` - always rewrite an old value with the new one (ignoring priorities)

The UCL parser uses priorities to decide how objects are rewritten when other files are included,
as follows:

* If we have two objects with the same priority then we form an implicit array
* If a new object has a higher priority then we overwrite the old one
* If a new object has a lower priority then we ignore it

By default, the priority of the top-level object is set to zero (lowest priority). Currently,
you can define up to 16 priorities (from 0 to 15). Includes with higher priorities will
rewrite keys from the objects with lower priorities as specified by the policy.
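
The three priority rules can be modelled in a few lines. This sketch covers only the default `append` policy and uses a plain dictionary in place of a UCL object; the function name and the `(priority, values)` layout are assumptions for illustration.

```python
def add_with_priority(store, key, value, priority):
    """Apply the default `append` policy to one incoming key.

    store maps key -> (priority, list of values).
    """
    if key not in store:
        store[key] = (priority, [value])
        return
    old_priority, values = store[key]
    if priority > old_priority:
        store[key] = (priority, [value])  # higher priority overwrites
    elif priority == old_priority:
        values.append(value)              # same priority: implicit array
    # lower priority: the new value is ignored


store = {}
add_with_priority(store, "port", 80, priority=0)
add_with_priority(store, "port", 8080, priority=0)
print(store["port"])  # (0, [80, 8080])
add_with_priority(store, "port", 443, priority=5)
add_with_priority(store, "port", 21, priority=1)
print(store["port"])  # (5, [443])
```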

### Variables support

UCL supports variables in input. Variables are registered by the user of the UCL parser and can be written in the following forms:

* `${VARIABLE}`
* `$VARIABLE`

UCL currently does not support nested variables. To escape variables one could use double dollar signs:

* `$${VARIABLE}` is converted to `${VARIABLE}`
* `$$VARIABLE` is converted to `$VARIABLE`

However, if no valid variables are found in a string, no expansion will be performed (and `$$` thus remains unchanged). This may be a subject
to change in future libucl releases.
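
The expansion and escaping rules can be sketched as follows. This is one reading of the rules above, not libucl's implementation: unknown variable names are left untouched, and a string that contains no registered variable is returned unchanged, including its `$$` sequences. The function name `expand` and the sample variable are assumptions.

```python
import re

TOKEN = re.compile(r"\$\$|\$\{(\w+)\}|\$(\w+)")


def expand(text, variables):
    """Expand ${NAME} and $NAME; '$$' escapes a dollar sign.

    A string without any known variable is returned unchanged,
    including its '$$' sequences.
    """
    names = [m.group(1) or m.group(2) for m in TOKEN.finditer(text)]
    if not any(name in variables for name in names):
        return text

    def replace(match):
        if match.group(0) == "$$":
            return "$"
        name = match.group(1) or match.group(2)
        return variables.get(name, match.group(0))  # unknown names stay as is

    return TOKEN.sub(replace, text)


variables = {"DBDIR": "/var/lib/rspamd"}
print(expand("$DBDIR/symbols.cache", variables))   # /var/lib/rspamd/symbols.cache
print(expand("${DBDIR}/x costs $$5", variables))   # /var/lib/rspamd/x costs $5
print(expand("price: $$5", variables))             # price: $$5
```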

### Multiline strings

UCL can handle multiline strings as well as single line ones. It uses shell/perl like notation for such objects:

    key = <<EOD
    some text
    splitted to
    lines
    EOD

In this example `key` will be interpreted as the following string: `some text\nsplitted to\nlines`.
Here are some rules for this syntax:

* Multiline terminator must start just after `<<` symbols and it must consist of capital letters only (e.g. `<<eof` or `<< EOF` won't work);
* Terminator must end with a single newline character (and no spaces are allowed between terminator and newline character);
* To finish multiline string you need to include a terminator string just after newline and followed by a newline (no spaces or other characters are allowed as well);
* The initial and the final newlines are not inserted to the resulting string, but you can still specify newlines at the begin and at the end of a value, for example:


    key <<EOD

    some
    text

    EOD


## Emitter {#emitter}

Each UCL object can be serialized to one of the four supported formats:

* `JSON` - canonical JSON notation (structure indented with spaces);
* `Compacted JSON` - compact JSON notation (without spaces or newlines);
* `Configuration` - nginx-like notation;
* `YAML` - inlined YAML notation.

## Validation {#validation}

UCL allows validation of objects. It uses the same schema that is used for JSON: [json schema v4](http://json-schema.org). UCL supports the full set of JSON schema features with the exception of remote references, which are unlikely to be useful for configuration objects. A schema definition can be written in UCL format instead of JSON, which simplifies schema writing. Moreover, since UCL supports multiple values for keys in an object, it is possible to specify the generic integer constraints `maxValues` and `minValues` to limit the number of values in a single key. UCL is currently not absolutely strict about validation schemas themselves, so UCL users should supply valid schemas (as defined in json-schema draft v4) to ensure that input objects are validated properly.

## Performance {#performance}

Are the UCL parser and emitter fast enough? Well, here are some numbers.
I took a 19Mb file that consists of ~700 thousand lines of JSON (obtained via
http://www.json-generator.com/). Then I ran the jansson library, which performs JSON
parsing and emitting, and compared it with UCL. Here are the results:

    jansson: parsed json in 1.3899 seconds
    jansson: emitted object in 0.2609 seconds

    ucl: parsed input in 0.6649 seconds
    ucl: emitted config in 0.2423 seconds
    ucl: emitted json in 0.2329 seconds
    ucl: emitted compact json in 0.1811 seconds
    ucl: emitted yaml in 0.2489 seconds

So far, UCL seems to be significantly faster than jansson on parsing and slightly faster on emitting. Moreover,
UCL compiled with optimizations (-O3) performs faster:


    ucl: parsed input in 0.3002 seconds
    ucl: emitted config in 0.1174 seconds
    ucl: emitted json in 0.1174 seconds
    ucl: emitted compact json in 0.0991 seconds
    ucl: emitted yaml in 0.1354 seconds


You can do your own benchmarks by running `make check` in libucl top directory.

## Conclusion {#conclusion}

UCL has a clear design that should be very convenient for reading and writing. At the same time it is compatible with
the JSON language and can therefore be used as a simple JSON parser. Macro logic makes it possible to extend the configuration
language (for example by including some Lua code), and comments make it possible to disable or enable parts of a configuration
quickly.

+ 0
- 41
doc/markdown/index.md

@@ -1,41 +0,0 @@
# Rspamd documentation

## Tutorials and introduction documents

Here are the main introduction documents that are recommended for reading if you are going to use Rspamd in your mail system.

* **[Quick Start](quick_start.md)** - learn how to install, set up and perform the initial configuration of Rspamd
* **[Upgrading](migration.md)** - the list of incompatible changes between versions of Rspamd
* **[Frequently asked questions](faq.md)** - common questions about Rspamd and Rmilter
* **[Migrating from SA](migrate_sa.md)** - the guide for those who want to migrate an existing SpamAssassin system to Rspamd
* **[MTA integration](integration.md)** - describes how to integrate Rspamd into your mail infrastructure
* **[Creating your fuzzy storage](http://rspamd.com/doc/fuzzy_storage.html)** - provides information about how to set up your own hash storage and how to train it efficiently

### Rspamd and Dovecot Antispam integration

* [Training Rspamd with Dovecot antispam plugin, part 1](https://kaworu.ch/blog/2014/03/25/dovecot-antispam-with-rspamd/) - this tutorial describes how to train Rspamd automatically using the `antispam` plugin of the `Dovecot` IMAP server
* [Training Rspamd with Dovecot antispam plugin, part 2](https://kaworu.ch/blog/2015/10/12/dovecot-antispam-with-rspamd-part2/) - continuation of the previous tutorial

## Configuration

This section contains documents about various configuration details.

* **[General information](./configuration/index.md)** explains basic principles of Rspamd configuration
* **[Modules documentation](./modules/)** gives a detailed description of each Rspamd module
* **[Workers documentation](./workers/)** contains information about different Rspamd worker processes: scanners, controller, fuzzy storage and so on
* **[Users settings description](./configuration/settings.md)** could be useful if you need to set up per-user configuration or want to process mail in different ways, for example, for inbound and outbound messages.

## Architecture

These documents are useful if you need to know details about Rspamd internals.

* **[General information](./architecture/index.md)** provides an overview of the Rspamd architecture
* **[Protocol documentation](./architecture/protocol.md)** describes the Rspamd protocol, which is used to communicate with external tools, such as Rmilter or the `rspamc` client utility


## Extending Rspamd

This section contains documents about writing new rules for Rspamd and, in particular, Rspamd Lua API.

* **[Writing Rspamd rules](./tutorials/writing_rules.md)** is a step-by-step guide that describes how to write rules for Rspamd
* **[Lua API reference](./lua/)** provides extensive information about all Lua modules available in Rspamd

+ 0
- 259
doc/markdown/lua/index.md

@@ -1,273 +0,0 @@
# Rspamd Lua API {#top}

The Lua API is a core part of Rspamd functionality. The [Lua language](http://www.lua.org) is used for writing rules and plugins.

## Using Lua API from rules {#luarules}

Many Lua rules are shipped with Rspamd. They can be included in Rspamd by using the **lua** option in `rspamd.conf`:

~~~ucl
lua = "$CONFDIR/lua/rspamd.lua"
~~~

### Global configuration tables {#luaglobal}

While loading this file, Rspamd defines the following global variables:
- *config* - a global table of modules configuration. Here is a sample of usage of this table:

~~~lua
config['module'] = {}

config['regexp'] = {
RULE_NAME = '/some_re/'
}

config['regexp']['RULE_NAME2'] = '/more_re/'

~~~

- *metrics* - a global table of metric definitions. This variable is a table that is indexed by metric name and provides the ability to set up symbols' properties:

~~~lua

metrics['default'] = {
  -- Set weight and description
  SYMBOL = { weight = 9.0, description = 'description' },
  -- Just set weight
  SYMBOL2 = 9.0,
}
metrics['default']['SYMBOL3'] = { weight = 1, description = 'description' }
~~~

- *classifiers* - a table of classifier pre-filters. A pre-filter must be a function that accepts 4 parameters: `classifier`, `task`, `is_learn` and `is_spam`. It must return a table of statfiles to be checked or learned for this message, or nil if all suitable statfiles must be learned or checked. Here is an example of language detection for classification:

~~~lua


classifiers['bayes'] = function(classifier, task, is_learn, is_spam)
  -- Helper that detects the message's language
  local detect_language = function(task)
    local parts = task:get_text_parts()
    for _,p in ipairs(parts) do
      local l = p:get_language()
      if l then
        return l
      end
    end
    return nil
  end

  -- Main procedure
  local language = detect_language(task)
  if language then
    -- Find statfiles with the specified language
    local selected = {}
    for _,st in pairs(classifier:get_statfiles()) do
      local st_l = st:get_param('language')
      if st_l and st_l == language then
        -- Insert statfile with the specified language
        table.insert(selected, st)
      end
    end
    if table.maxn(selected) > 1 then
      return selected
    end
  else
    -- Language not detected
    local selected = {}
    for _,st in ipairs(classifier:get_statfiles()) do
      local st_l = st:get_param('language')
      -- Insert only statfiles without a language
      if not st_l then
        table.insert(selected, st)
      end
    end
    if table.maxn(selected) > 1 then
      return selected
    end
  end

  return nil
end
~~~

- *rspamd_config* - a global object that allows you to modify the configuration and register new symbols.

## Writing advanced rules {#luarules}

By using the `config` and `metrics` tables it is possible to configure rules and metrics. Note that it is also possible to use any Lua functions and Rspamd libraries:

~~~lua
local rspamd_logger = require "rspamd_logger"

local rulebody = string.format('%s & !%s', '/re1/', '/re2/')
rspamd_logger.info('Loaded test rule: ' .. rulebody)
~~~

It is also possible to declare functions and use closures when defining Rspamd rules:

~~~lua
local function check_headers_tab(task, header_name)
  -- Extract all headers with this name as a list of tables
  local raw_headers = task:get_header_full(header_name)
  -- Look for headers whose value is separated by a tab, not a space
  if raw_headers then
    for _,rh in ipairs(raw_headers) do
      if rh['tab_separated'] then
        -- We have a header value separated by a tab symbol
        return true,rh['name']
      end
    end
  end
  return false
end

rspamd_config.HEADER_TAB_FROM_WHITELISTED = function(task) return check_headers_tab(task, "From") end
rspamd_config.HEADER_TAB_TO_WHITELISTED = function(task) return check_headers_tab(task, "To") end
rspamd_config.HEADER_TAB_DATE_WHITELISTED = function(task) return check_headers_tab(task, "Date") end

rspamd_config.R_EMPTY_IMAGE = {
  callback = function(task)
    local tp = task:get_text_parts() -- get text parts in a message

    for _,p in ipairs(tp) do -- iterate over text parts array using `ipairs`
      if p:is_html() then -- if the current part is html part
        local hc = p:get_html() -- we get HTML context
        local len = p:get_length() -- and part's length

        if len < 50 then -- if we have a part that has less than 50 bytes of text
          local images = hc:get_images() -- then we check for HTML images

          if images then -- if there are images
            for _,i in ipairs(images) do -- then iterate over images in the part
              if i['height'] + i['width'] >= 400 then -- if we have a large image
                return true -- add symbol
              end
            end
          end
        end
      end
    end
  end,
  condition = function(task)
    if task:get_header('Subject') then
      return true
    end
    return false
  end,
  description = 'No text parts and a large image',
  score = 3.1,
}
~~~
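Closures also make it easy to generate several similar rules from one factory function. The following is a minimal sketch under stated assumptions: `make_header_check`, the symbol names `HAS_SUBJECT` and `HAS_DATE`, and the `fake_task` stub are all hypothetical and exist only so that the snippet can run outside Rspamd. Inside Rspamd the real `rspamd_config` global and real task objects would be used instead.

```lua
-- Stand-in so that the sketch runs outside Rspamd (assumption for this example)
local rspamd_config = rspamd_config or {}

-- Factory: returns a rule callback that remembers header_name in a closure
local function make_header_check(header_name)
  return function(task)
    return task:get_header(header_name) ~= nil
  end
end

-- Hypothetical symbol names
rspamd_config.HAS_SUBJECT = make_header_check('Subject')
rspamd_config.HAS_DATE = make_header_check('Date')

-- Hypothetical stub that mimics only task:get_header()
local fake_task = {
  headers = { Subject = 'hello' },
  get_header = function(self, name) return self.headers[name] end,
}

print(rspamd_config.HAS_SUBJECT(fake_task), rspamd_config.HAS_DATE(fake_task))
```

Each generated callback keeps its own `header_name`, so no global state is shared between the rules.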

Using Lua in rules makes it possible to write complex mail filtering rules.

## Writing Lua plugins {#luaplugins}

Plugins are more complex filters than ordinary rules. Plugins can have their own configuration parameters and multiple callbacks. Plugins can make DNS requests, read from Rspamd maps and insert custom results.

### Structure of the typical plugin

Each Rspamd plugin has a common structure:

- Registering configuration parameters
- Reading configuration parameters and setting up callbacks
- Callbacks that are called by Rspamd during message processing

Here is a simple plugin example:

~~~lua
local config_param = 'default'

local function sample_callback(task)
  -- Message processing goes here
end

local opts = rspamd_config:get_all_opt('sample')
if opts then
  if opts['config'] then
    config_param = opts['config']
    -- Register callback
    rspamd_config:register_symbol('some_symbol', sample_callback)
  end
end
~~~

This plugin uses the global variable *rspamd_config* to extract configuration options. It then registers the function `sample_callback`, which will be called to process the symbol `some_symbol`.

### Using DNS requests inside plugins

Message checks often require DNS requests. Here is an example of making an asynchronous DNS request from an Rspamd Lua plugin:

~~~lua
local function symbol_cb(task)
  -- Task is now a local variable

  local function dns_cb(resolver, to_resolve, results, err, str)
    -- Increase total count of dns requests
    task:inc_dns_req()
    if results then
      task:insert_result('symbol', 1, str)
    end
  end
  -- Resolve 'example.com' using primitives from the task passed
  task:get_resolver():resolve_a(task:get_session(), task:get_mempool(),
    'example.com', dns_cb, 'sample string')
end
~~~

### Using maps from Lua plugin

Maps hold dynamically loaded data such as lists or IP trees. It is possible to use 3 types of maps:

* **radix_tree** stores IP addresses
* **hash_map** stores plain strings (usually domains)
* **callback** calls a specified Lua callback when a map is loaded or changed; the map's content is passed to that callback as a parameter

Here is a sample of using maps from Lua API:

~~~lua
local rspamd_logger = require "rspamd_logger"

local hash_map = rspamd_config:add_hash_map('file:///path/to/file', 'sample map')
local radix_tree = rspamd_config:add_radix_map('http://somehost.com/test.dat', 'sample ip map')
local generic_map = rspamd_config:add_map('file:///path/to/file', 'sample generic map',
  function(str)
    -- This callback is called when a map is loaded or changed
    -- Str contains map content
    rspamd_logger.info('Got generic map content: ' .. str)
  end)

local function sample_symbol_cb(task)
  -- Check whether hash map contains from address of message
  if hash_map:get_key(task:get_from()) then
    -- Check whether radix map contains client's ip
    if radix_tree:get_key(task:get_from_ip_num()) then
      ...
    end
  end
end
~~~

## Conclusions {#luaconclusion}

Lua plugins are a powerful tool for creating complex filters that can access practically all features of Rspamd. They can be used to write custom rules, interact with Rspamd in many ways, use maps and make DNS requests. Rspamd is shipped with a number of Lua plugins that can be used as examples when writing your own.

## References {#luareference}

- [Lua manual](http://www.lua.org/manual/5.2/)
- [Programming in Lua](http://www.lua.org/pil/)

+ 0
- 231
doc/markdown/migration.md

@@ -1,232 +0,0 @@
# Migrating between rspamd versions

This document describes incompatible changes introduced in recent rspamd versions and details how to update your rules and configuration accordingly.

## Migrating from rspamd 1.0 to rspamd 1.1

The only change here affects users with per-user statistics enabled: there is an incompatible change in the per-user behaviour of the sqlite3 backend.

Both redis and sqlite3 now follow common principles for per-user statistics:

* If per-user statistics is enabled, check per-user tokens **ONLY**
* If per-user statistics is not enabled, check common tokens **ONLY**

If you need the old behaviour, then you need to use a separate classifier for per-user statistics, for example:

~~~ucl
classifier {
    tokenizer {
        name = "osb";
    }
    name = "bayes_user";
    min_tokens = 11;
    backend = "sqlite3";
    per_language = true;
    per_user = true;
    statfile {
        path = "/tmp/bayes.spam.sqlite";
        symbol = "BAYES_SPAM_USER";
    }
    statfile {
        path = "/tmp/bayes.ham.sqlite";
        symbol = "BAYES_HAM_USER";
    }
}
classifier {
    tokenizer {
        name = "osb";
    }
    name = "bayes";
    min_tokens = 11;
    backend = "sqlite3";
    per_language = true;
    statfile {
        path = "/tmp/bayes.spam.sqlite";
        symbol = "BAYES_SPAM";
    }
    statfile {
        path = "/tmp/bayes.ham.sqlite";
        symbol = "BAYES_HAM";
    }
}
~~~

## Migrating from rspamd 0.9 to rspamd 1.0

In rspamd 1.0 the default settings for statistics tokenization have been changed to `modern`, meaning that tokens are now generated from normalized words and there are various improvements which are incompatible with the statistics model used in pre-1.0 versions. To use these new features you should either **relearn** your statistics or continue using your old statistics **without** new features by adding a `compat` parameter:

~~~ucl
classifier {
    ...
    tokenizer {
        compat = true;
    }
    ...
}
~~~

The recommended way to store statistics now is the `sqlite3` backend (which is incompatible with the old mmap backend):

~~~ucl
classifier {
    type = "bayes";
    tokenizer {
        name = "osb";
    }
    cache {
        path = "${DBDIR}/learn_cache.sqlite";
    }
    min_tokens = 11;
    backend = "sqlite3";
    languages_enabled = true;
    statfile {
        symbol = "BAYES_HAM";
        path = "${DBDIR}/bayes.ham.sqlite";
        spam = false;
    }
    statfile {
        symbol = "BAYES_SPAM";
        path = "${DBDIR}/bayes.spam.sqlite";
        spam = true;
    }
}
~~~

## Migrating from rspamd 0.6 to rspamd 0.7

### WebUI changes

The rspamd web interface is now a part of the rspamd distribution. Moreover, all static files are now served by rspamd itself so you won't need to set up a separate web server to distribute static files. At the same time, the WebUI worker has been removed and the controller acts as WebUI+old_controller which allows it to work with both a web browser and the rspamc client. However, you might still want to set up a full-featured HTTP server in front of rspamd to enable, for example, TLS and access controls.

There are now two password levels for rspamd: `password` for read-only commands and `enable_password` for data-changing commands. If `enable_password` is not specified then `password` is used for both kinds of commands.

Here is an example of the full configuration of the rspamd controller worker to serve the WebUI:

~~~ucl
worker {
    type = "controller";
    bind_socket = "localhost:11334";
    count = 1;
    password = "q1";
    enable_password = "q2";
    secure_ip = "127.0.0.1"; # Allows the use of *all* commands from this IP
    static_dir = "${WWWDIR}";
}
~~~
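The two password levels can be modelled in a few lines of plain Lua. This is only a sketch of the rules stated above, not the controller's actual code: `is_authorized` and its arguments are hypothetical, `secure_ip` is compared as an exact string, and the sketch does not try to model whether `enable_password` is also accepted for read-only commands.

```lua
-- Controller settings taken from the configuration above
local controller = {
  password = 'q1',
  enable_password = 'q2',
  secure_ip = '127.0.0.1',
}

-- Hypothetical check; is_write marks a data-changing command
local function is_authorized(cfg, client_ip, given_password, is_write)
  -- Clients from secure_ip may use all commands (exact match in this sketch)
  if cfg.secure_ip and client_ip == cfg.secure_ip then
    return true
  end
  -- Fall back to `password` when `enable_password` is not specified
  local expected = cfg.password
  if is_write and cfg.enable_password then
    expected = cfg.enable_password
  end
  return expected ~= nil and given_password == expected
end

print(is_authorized(controller, '192.0.2.10', 'q1', false)) -- read-only command
print(is_authorized(controller, '192.0.2.10', 'q1', true))  -- write needs enable_password
print(is_authorized(controller, '127.0.0.1', nil, true))    -- request from secure_ip
```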

### Settings changes

The settings system has been completely reworked. It is now a lua plugin that registers pre-filters and assigns settings according to dynamic maps or a static configuration. Should you want to use the new settings system then please check the recent [documentation](https://rspamd.com/doc/configuration/settings.html). The old settings have been completely removed from rspamd.

### Lua changes

There are many changes in the Lua API and some of them are, unfortunately, breaking ones.

* many superglobals are removed: now rspamd modules need to be loaded explicitly,
the only global remaining is `rspamd_config`. This affects the following modules:
    - `rspamd_logger`
    - `rspamd_ip`
    - `rspamd_http`
    - `rspamd_cdb`
    - `rspamd_regexp`
    - `rspamd_trie`

~~~lua
local rspamd_logger = require "rspamd_logger"
local rspamd_trie = require "rspamd_trie"
local rspamd_cdb = require "rspamd_cdb"
local rspamd_ip = require "rspamd_ip"
local rspamd_http = require "rspamd_http"
local rspamd_regexp = require "rspamd_regexp"
~~~

* new system of symbols registration: now symbols can be registered by adding new indices to `rspamd_config` object. Old version:

~~~lua
local reconf = config['regexp']
reconf['SYMBOL'] = function(task)
  ...
end
~~~

new one:

~~~lua
rspamd_config.SYMBOL = function(task)
  ...
end
~~~

`rspamd_message` is **removed** completely; you should use task methods to access message data. This includes such methods as:

* `get_date` - this method can now return a date for task and message based on the arguments:

~~~lua
local dm = task:get_date{format = 'message'} -- MIME message date
local dt = task:get_date{format = 'connect'} -- check date
~~~
* `get_header` - this function has been completely reworked. `get_header` now returns just a decoded string, `get_header_raw` returns an undecoded string and `get_header_full` returns the full list of tables. Please consult the corresponding [documentation](https://rspamd.com/doc/lua/task.html) for details. You might also want to update old invocations of `task:get_header` to the new form.
Old version:

~~~lua
function kmail_msgid (task)
  local msg = task:get_message()
  local header_msgid = msg:get_header('Message-Id')
  if header_msgid then
    -- header_from and header_msgid are tables
    for _,header_from in ipairs(msg:get_header('From')) do
      ...
    end
  end
  return false
end
~~~
new one:

~~~lua
function kmail_msgid (task)
  local header_msgid = task:get_header('Message-Id')
  if header_msgid then
    local header_from = task:get_header('From')
    -- header_from and header_msgid are strings
  end
  return false
end
~~~

or with the full version:

~~~lua
rspamd_config.FORGED_GENERIC_RECEIVED5 = function (task)
  local headers_recv = task:get_header_full('Received')
  if headers_recv then
    -- headers_recv is now the list of tables
    for _,header_r in ipairs(headers_recv) do
      -- `re` is a regexp object defined elsewhere in the rule file
      if re:match(header_r['value']) then
        return true
      end
    end
  end
  return false
end
~~~

* `get_from` and `get_recipients` now accept an optional numeric argument that specifies where to get the sender and recipients of a message from. By default, this argument is `0`, which means that data is first looked up in the SMTP envelope (meaning the `MAIL FROM` and `RCPT TO` SMTP commands) and, if the envelope data is inaccessible, it is taken from the MIME headers. Value `1` means that data is taken from the envelope only, while `2` switches the mode to MIME headers. Here is an example from the `forged_recipients` module:

~~~lua
local smtp_from = task:get_from(1)
if smtp_from then
  local mime_from = task:get_from(2)
  if not mime_from or
      not (string.lower(mime_from[1]['addr']) ==
      string.lower(smtp_from[1]['addr'])) then
    task:insert_result(symbol_sender, 1)
  end
end
~~~
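The three modes can be modelled with plain tables. This is a stand-alone sketch of the selection logic described above, not rspamd code: `select_from`, `smtp` and `hdr` are names invented for the illustration, and the addresses are made up.

```lua
-- envelope: sender taken from SMTP MAIL FROM (may be nil)
-- mime: sender taken from the MIME From header (may be nil)
-- mode: 0 = envelope first, then MIME; 1 = envelope only; 2 = MIME only
local function select_from(envelope, mime, mode)
  mode = mode or 0
  if mode == 1 then
    return envelope
  elseif mode == 2 then
    return mime
  end
  return envelope or mime
end

local smtp = { { addr = 'bounce@example.com' } }
local hdr = { { addr = 'user@example.org' } }

print(select_from(smtp, hdr)[1].addr) -- envelope wins by default
print(select_from(nil, hdr)[1].addr)  -- falls back to the MIME header
print(select_from(nil, hdr, 1))       -- envelope only: nothing available
```

The `forged_recipients` example above relies on modes `1` and `2` precisely because it needs to compare the two sources rather than accept whichever one is present.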

### Protocol changes

rspamd now uses the `HTTP` protocol for all operations, so an additional client library is unlikely to be needed. A fallback to the old `spamc` protocol has also been implemented to remain automatically compatible with `rmilter` and other software that uses the `rspamc` protocol.

+ 0
- 10
doc/markdown/modules/chartable.md

@@ -1,10 +0,0 @@
# Chartable module

This module counts characters from different [unicode scripts](http://www.unicode.org/reports/tr24/) in a message. It then counts the number of script changes (for example, 'a網絡a' is treated as 2 script changes: from Latin to Chinese and from Chinese back to Latin) and divides it by the total number of unicode characters. If the result of this division is higher than the threshold, a symbol is inserted. By default the threshold is `0.1`, meaning that script changes occur for approximately 10% of characters.

~~~ucl
chartable {
    symbol = "R_CHARSET_MIXED";
    threshold = 0.1;
}
~~~
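The ratio of script changes to characters can be sketched in plain Lua. This is only an illustration under loud assumptions: the `script_of` classifier knows just two coarse, hypothetical ranges (Latin and CJK ideographs) instead of real Unicode script data, the UTF-8 decoder does no validation, and all function names are invented for this example.

```lua
-- Decode a UTF-8 string into a list of code points (no validation, sketch only)
local function decode_utf8(s)
  local cps, i = {}, 1
  while i <= #s do
    local b = s:byte(i)
    local cp, len
    if b < 0x80 then
      cp, len = b, 1
    elseif b < 0xE0 then
      cp, len = b - 0xC0, 2
    elseif b < 0xF0 then
      cp, len = b - 0xE0, 3
    else
      cp, len = b - 0xF0, 4
    end
    for j = i + 1, i + len - 1 do
      cp = cp * 64 + ((s:byte(j) or 0x80) - 0x80)
    end
    cps[#cps + 1] = cp
    i = i + len
  end
  return cps
end

-- Hypothetical, very coarse script classifier
local function script_of(cp)
  if cp >= 0x4E00 and cp <= 0x9FFF then
    return 'han'
  elseif cp < 0x0250 then
    return 'latin'
  end
  return 'other'
end

-- Number of script changes divided by the total number of characters
local function script_change_ratio(text)
  local cps = decode_utf8(text)
  if #cps == 0 then
    return 0
  end
  local changes, prev = 0, nil
  for _, cp in ipairs(cps) do
    local sc = script_of(cp)
    if prev and sc ~= prev then
      changes = changes + 1
    end
    prev = sc
  end
  return changes / #cps
end

local threshold = 0.1
local ratio = script_change_ratio('a網絡a')
print(ratio, ratio > threshold)
```

For 'a網絡a' the sketch counts 2 changes over 4 characters, which is well above the default threshold.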

+ 0
- 40
doc/markdown/modules/dcc.md

@@ -1,40 +0,0 @@
# DCC module

This module performs [DCC](http://www.dcc-servers.net/dcc/) lookups to determine
the *bulkiness* of a message (e.g. how many recipients have seen it).

Identifying bulk messages is very useful in composite rules e.g. if a message is
from a freemail domain *AND* the message is reported as bulk by DCC then you can
be sure the message is spam and can assign a greater weight to it.

Please view the License terms on the DCC website before you enable this module.

## Module configuration

This module requires that you have the `dccifd` daemon configured, running and
working correctly. To do this you must download and build the
[latest DCC client](https://www.dcc-servers.net/dcc/source/dcc.tar.Z). Once installed, edit
`/var/dcc/dcc_conf`, set `DCCIFD_ENABLE=on`, `DCCM_LOG_AT=NEVER` and
`DCCM_REJECT_AT=MANY`, then start the daemon by running `/var/dcc/libexec/rcDCC start`.

Once the `dccifd` daemon is started it will listen on the UNIX domain socket `/var/dcc/dccifd`
and all you have to do is tell rspamd where `dccifd` is listening:

~~~ucl
dcc {
    host = "/var/dcc/dccifd";
    # Port is only required if `dccifd` listens on a TCP socket
    # port = 1234
}
~~~

Once this module is configured it will write the DCC output to the rspamd log as each
message is scanned:

`````
Apr 5 14:19:53 mail1-ewh rspamd: (normal) lua; dcc.lua:98: sending to dcc: client=217.78.2.204#015DNSERROR helo="003b046f.slimabs.top" envfrom="23SecondAbs@slimabs.top" envrcpt="xxxx@xxxx.com"
Apr 5 14:19:53 mail1-ewh rspamd: (normal) lua; dcc.lua:65: DCC result=R disposition=R header="X-DCC--Metrics: xxxxx.xxxx.com 1282; bulk Body=1 Fuz1=1 Fuz2=many"
`````

Any messages that DCC returns a *reject* result for (based on the configured `DCCM_REJECT_AT`
value) will cause the symbol `DCC_BULK` to fire.
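
The counters in the DCC header shown in the log sample above can be pulled out with a Lua pattern. This is a hedged sketch for inspecting such log lines by hand; `parse_dcc_counts` is a hypothetical helper and is not part of the `dcc` module.

```lua
-- Collect all `name=value` pairs from a DCC metrics header
local function parse_dcc_counts(header)
  local counts = {}
  for name, value in header:gmatch('(%w+)=(%w+)') do
    -- Values are either numbers or words such as "many"
    counts[name] = tonumber(value) or value
  end
  return counts
end

local c = parse_dcc_counts(
  'X-DCC--Metrics: xxxxx.xxxx.com 1282; bulk Body=1 Fuz1=1 Fuz2=many')
print(c.Body, c.Fuz1, c.Fuz2)
```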

+ 0
- 32
doc/markdown/modules/dkim.md

@@ -1,32 +0,0 @@
# DKIM module

This module checks [DKIM](http://www.dkim.org/) signatures for scanned emails.
DKIM signatures can establish that a specific message has been signed by a trusted
relay. For example, if a message comes from `gmail.com` then a valid DKIM signature
means that this message was definitely signed by `gmail.com` (unless the `gmail.com` private
key has been compromised, which is not a likely case).

## Principles of work

Rspamd can deal with many types of DKIM signatures and message canonicalisation.
The major difficulty with DKIM is line endings: many MTAs treat them differently, which
leads to broken signatures. Basically, rspamd treats all line endings as `CR+LF`, which
is compatible with most DKIM implementations.

## Configuration

The DKIM module has several useful configuration options:

- `dkim_cache_size` (or `expire`) - maximum size of DKIM keys cache
- `whitelist` - a map of domains that should not be checked with DKIM (e.g. if those domains have a totally broken DKIM signer)
- `domains` - a map of domains that should have stricter scores for DKIM violations
- `strict_multiplier` - multiply the value of symbols by this value if received from the `domains` map
- `trusted_only` - check DKIM signatures only for domains from the `domains` map
- `skip_multi` - skip DKIM check for messages with multiple signatures

The last option can help in circumstances where rspamd lacks proper support for
multiple DKIM signatures: rspamd currently handles them poorly and just uses the
first one for the check. With some mailing lists or other software, this option can
therefore reduce the false positive rate. Proper support for multiple DKIM signatures
is planned for one of the next rspamd releases, which will make this option meaningless.

+ 0
- 48
doc/markdown/modules/dmarc.md

@@ -1,48 +0,0 @@
# DMARC module

DMARC is a technology leveraging SPF & DKIM which allows domain owners to publish policies regarding how messages bearing
their domain in the RFC5322.From field should be handled (for example to quarantine or reject messages which do not have an
aligned DKIM or SPF identifier) and to elect to receive reporting information about such messages (to help them identify
abuse and/or misconfiguration and make informed decisions about policy application).

## DMARC in rspamd

The default configuration for the DMARC module in rspamd is an empty collection:

~~~ucl
dmarc {
}
~~~

This is enough to enable the module and check/apply DMARC policies.

Symbols added by the module are as follows:

- `DMARC_POLICY_ALLOW`: Message was authenticated and allowed by DMARC policy
- `DMARC_POLICY_REJECT`: Authentication failed, rejection suggested by DMARC policy
- `DMARC_POLICY_QUARANTINE`: Authentication failed, quarantine suggested by DMARC policy
- `DMARC_POLICY_SOFTFAIL`: Authentication failed, no action suggested by DMARC policy

Rspamd is able to store records in `redis` which could be used to generate DMARC aggregate reports, but there is not yet a tool available to generate such reports from them. The format of the records stored in `redis` is as follows:

    unixtime,ip,spf_result,dkim_result,dmarc_disposition

where the spf and dkim results are `true` or `false`, indicating whether an aligned spf/dkim identifier was found, and dmarc_disposition is one of `none`/`quarantine`/`reject`, indicating the policy applied to the message.

These records are added to a list named `$prefix$domain`, where `$domain` is the domain which defined the policy for the message being reported on and `$prefix` is the value of the `key_prefix` setting (or "dmarc_" if this isn't set).
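
A record in this format can be split into named fields with a few lines of plain Lua. The helpers below (`parse_dmarc_record`, `report_key`) and the field names of the returned table are hypothetical and made up for this sketch; they are not part of rspamd, and the sample record is invented.

```lua
-- Split 'unixtime,ip,spf_result,dkim_result,dmarc_disposition' into a table
local function parse_dmarc_record(line)
  local fields = {}
  for field in line:gmatch('([^,]+)') do
    fields[#fields + 1] = field
  end
  if #fields ~= 5 then
    return nil, 'expected 5 comma-separated fields'
  end
  return {
    unixtime = tonumber(fields[1]),
    ip = fields[2],
    spf_aligned = fields[3] == 'true',
    dkim_aligned = fields[4] == 'true',
    disposition = fields[5],
  }
end

-- Build the list name: $prefix$domain, with "dmarc_" as the default prefix
local function report_key(domain, key_prefix)
  return (key_prefix or 'dmarc_') .. domain
end

local rec = parse_dmarc_record('1459865993,192.0.2.1,true,false,quarantine')
print(report_key('example.com'), rec.ip, rec.disposition)
```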

The redis server that receives a key is selected by a hash of the sender's domain.

To enable storing of report information, `reporting` must be set to `true`.

~~~ucl
dmarc {
    # Enables storing reporting information to redis
    reporting = true;
    # If Redis server is not configured below, settings from redis {} will be used
    #servers = "127.0.0.1:6379"; # Servers to use for reads and writes (can be a list)
    # Alternatively set read_servers / write_servers to split reads and writes
    # To set custom prefix for redis keys:
    #key_prefix = "dmarc_";
}
~~~

+ 0
- 0
doc/markdown/modules/emails.md


+ 0
- 40
doc/markdown/modules/fann.md

@@ -1,40 +0,0 @@
# Neural network module

The neural network module is an experimental module that performs post-classification of messages based on their current symbols and a training corpus obtained from previous learning.

To use this module, you need to build rspamd with `libfann` support. It is normally enabled if you use pre-built packages; otherwise, it can be enabled by passing `-DENABLE_FANN=ON` to the `cmake` command during the build process.

The idea behind this module is to learn which symbol combinations are common for spam and which are common for ham. To achieve this goal, the fann module studies log files via the `log_helper` worker until it has gathered a reasonable number of log samples (`1k` by default). The neural network is trained as spam when a message has the `reject` action (definite spam) and as ham when a message has a negative score. You can also use your own criteria for learning.

Training is performed in the background, and after a certain number of trains (`1k` again) the neural network is updated on disk, allowing scanners to load and update their own data.

After a number of such iterations (`100` by default), the training process removes the old neural network and starts training a new one. This is done to ensure that old data does not influence current processing. The neural network is also reset when you add or remove rules from rspamd. Once trained, neural network data is saved to a file so that it persists between restarts. The current training epoch, however, is lost upon restart.
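
The training cycle described above can be mimicked with simple counters. The sketch below is a plain Lua model of that description only, not the plugin's real code: `learn_class`, `new_state` and `on_sample` are hypothetical names, the constants mirror the defaults quoted in the text, and no neural network is involved.

```lua
local MAX_TRAIN = 1000 -- trains before the network is written to disk
local MAX_EPOCH = 100  -- such iterations before the network is rebuilt

-- Decide how a scanned message is used for training (nil: not used)
local function learn_class(action, score)
  if action == 'reject' then
    return 'spam'
  elseif score < 0 then
    return 'ham'
  end
  return nil
end

local function new_state()
  return { trains = 0, epoch = 0, saves = 0, resets = 0 }
end

-- Account for one scanned message
local function on_sample(state, action, score)
  if not learn_class(action, score) then
    return
  end
  state.trains = state.trains + 1
  if state.trains >= MAX_TRAIN then
    state.trains = 0
    state.saves = state.saves + 1   -- network updated on disk
    state.epoch = state.epoch + 1
    if state.epoch >= MAX_EPOCH then
      state.epoch = 0
      state.resets = state.resets + 1 -- old network dropped, new one started
    end
  end
end

local st = new_state()
for _ = 1, 1000 do
  on_sample(st, 'reject', 15)
end
on_sample(st, 'add header', 7) -- neither definite spam nor ham: ignored
print(st.saves, st.epoch, st.trains)
```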

## Configuration

First of all, you need a special worker called `log_helper` to accept rspamd scan results. This worker has a trivial setup:

~~~ucl
worker "log_helper" {
    count = 1;
}
~~~

Then you need to set up the fann plugin:

~~~ucl
fann_scores {
    fann_file = "${DBDIR}/data.fann"; # Used to store ANN file on disk
    train {
        max_train = 10k; # Number of trains per epoch
        max_epoch = 1k; # Number of epochs while ANN data is valid
        spam_score = 8; # Score to learn spam
        ham_score = -2; # Score to learn ham
    }
    use_settings = false; # If enabled, then settings-id could switch this module to another FANN
}
~~~

## Settings usage

TODO

+ 0
- 0
doc/markdown/modules/forged_recipients.md


+ 0
- 163
doc/markdown/modules/fuzzy_check.md

@@ -1,163 +0,0 @@
# Fuzzy check module

This module checks messages against specific fuzzy patterns stored in
[fuzzy storage workers](../workers/fuzzy_storage.md). It is also
responsible for teaching message patterns to the fuzzy storage.

## Fuzzy patterns

Rspamd uses the `shingles` algorithm to perform fuzzy matching of messages. This algorithm
is probabilistic and uses word chains to detect common patterns and thereby filter
spam or ham messages. The shingles algorithm is described in the following
[research paper](http://dl.acm.org/citation.cfm?id=283370). We use 3-grams for this
algorithm and [siphash](https://131002.net/siphash/) as the hash function. Currently,
rspamd uses 32 hashes for shingles. Using siphash allows private storages to be
used, since nobody can generate the same sequence of hashes without a shared
secret called the `shingles key`. By default, rspamd uses the string `rspamd` as the siphash
key; however, it is possible to change this value in the configuration.
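
The idea of shingles over word 3-grams can be sketched in plain Lua. This is a toy model under loud assumptions and not rspamd's implementation: `toy_hash` is a simple keyed polynomial hash standing in for siphash, only 4 hypothetical keys are used instead of 32 hashes, word splitting is naive, and all function names are invented for the example.

```lua
-- Toy keyed string hash (NOT siphash); values stay below 2^32
local function toy_hash(str, key)
  local h = key
  for i = 1, #str do
    h = (h * 31 + str:byte(i)) % 4294967296
  end
  return h
end

-- Naive word splitter
local function words(text)
  local out = {}
  for w in text:lower():gmatch('%w+') do
    out[#out + 1] = w
  end
  return out
end

-- For each key, keep the minimum hash over all windows of 3 consecutive words
local function shingles(text, keys)
  local w = words(text)
  local result = {}
  for k, key in ipairs(keys) do
    local min
    for i = 1, #w - 2 do
      local h = toy_hash(w[i] .. ' ' .. w[i + 1] .. ' ' .. w[i + 2], key)
      if not min or h < min then
        min = h
      end
    end
    result[k] = min
  end
  return result
end

-- Fraction of positions where two shingle sets agree
local function similarity(a, b)
  local same = 0
  for i = 1, #a do
    if a[i] == b[i] then
      same = same + 1
    end
  end
  return #a > 0 and same / #a or 0
end

local keys = { 17, 101, 9973, 65537 } -- hypothetical stand-in for the shingles key
local s1 = shingles('buy cheap pills online today from our trusted shop', keys)
local s2 = shingles('buy cheap pills online today from our trusted store', keys)
print(similarity(s1, s1), similarity(s1, s2))
```

Because each position stores the minimum over all windows, two texts that share most of their 3-grams tend to agree in most positions, and without the keys nobody can reproduce the same sequence of hashes.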

Each shingles set is accompanied by a collision-resistant hash, namely a [blake2](https://blake2.net/) hash.
This digest is used as the unique ID of the hash.

Attachments and images are not currently matched against fuzzy hashes; however, they
are checked by means of blake2 digests using strict match.

## Module configuration

The fuzzy check module has several global options and allows multiple
storages to be specified. Global options include:

- `symbol`: default symbol to insert (if no flags match)
- `min_length`: minimum length of text parts in words to perform fuzzy check (default - check all text parts)
- `min_bytes`: minimum length of attachments and images in bytes to check them in fuzzy storage
- `whitelist`: IP list to skip all fuzzy checks
- `timeout`: timeout for reply waiting

Fuzzy rules are defined as a set of `rule` definitions. Each `rule` must have a list of
servers to check or learn, a set of flags and optional parameters. Here is an example of
a rule's settings:

~~~ucl
fuzzy_check {
    rule {
        # List of servers, can be an array or multi-value item
        servers = "localhost:11335";
        servers = "highsecure.ru:11335";

        # Default symbol
        symbol = "FUZZY_UNKNOWN";

        # List of additional mime types to be checked in this fuzzy
        mime_types = "application/pdf";
        mime_types = ["application/*", "*/octet-stream", "*"];

        # Maximum global score for all maps
        max_score = 20.0;

        # Ignore flags that are not listed in maps for this rule
        skip_unknown = yes;

        # If this value is false, then allow learning for this fuzzy rule
        read_only = no;

        # Key for strict digests (default: "rspamd")
        fuzzy_key = "somebigrandomstring";

        # Key for fuzzy siphash (default: "rspamd")
        fuzzy_shingles_key = "anotherbigrandomstring";

        # maps
    }
}
~~~

Each rule can have several maps defined by a `flag` value. For example, a single
fuzzy storage can contain both good and bad hashes that should have different symbols
and thus different weights. Maps are defined inside fuzzy rules as follows:

~~~ucl
fuzzy_check {
    rule {
        ...
        fuzzy_map = {
            FUZZY_DENIED {
                # Maximum weight for this list
                max_score = 20.0;
                # Flag value
                flag = 1
            }
            FUZZY_PROB {
                max_score = 10.0;
                flag = 2
            }
            FUZZY_WHITE {
                max_score = 2.0;
                flag = 3
            }
        }
    }
}
~~~
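How a flag returned by the storage could be turned into a symbol can be sketched with plain Lua tables. This is a reading of the options described above rather than the module's code: `symbol_for_flag` is a hypothetical helper, and the assumed behaviour is that an unlisted flag falls back to the rule's default `symbol` unless `skip_unknown` is set.

```lua
-- Plain-table copy of the rule settings shown above
local rule = {
  symbol = 'FUZZY_UNKNOWN',
  skip_unknown = false,
  fuzzy_map = {
    FUZZY_DENIED = { max_score = 20.0, flag = 1 },
    FUZZY_PROB = { max_score = 10.0, flag = 2 },
    FUZZY_WHITE = { max_score = 2.0, flag = 3 },
  },
}

-- Hypothetical lookup: returns the symbol name for a flag, or nil if ignored
local function symbol_for_flag(rule, flag)
  for name, map in pairs(rule.fuzzy_map) do
    if map.flag == flag then
      return name
    end
  end
  if rule.skip_unknown then
    return nil
  end
  return rule.symbol
end

print(symbol_for_flag(rule, 2), symbol_for_flag(rule, 42))
```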

The meaning of `max_score` can be rather unclear. First of all, all hashes in
fuzzy storage have their own weights. For example, if we have a hash `A` and 100 users
marked it as a spam hash, then it will have a weight of `100 * single_vote_weight`.
Therefore, if `single_vote_weight` is `1` then the final weight will be `100`.
`max_score` is the weight that is required for the rule to add a symbol with the maximum
score of 1.0 (which is, of course, multiplied by the metric's weight). In our example,
if the weight of the hash is `100` and `max_score` is `99`, then the symbol will be
added with a weight of `1`. If `max_score` is `200`, then the symbol will be added with a
weight of about `0.2` (the real function is a hyperbolic tangent). In the following configuration:

~~~ucl
metric {
    name = "default";
    ...
    symbol {
        name = "FUZZY_DENIED";
        weight = 10.0;
    }
    ...
}
fuzzy_check {
    rule {
        ...
        fuzzy_map = {
            FUZZY_DENIED {
                # Maximum weight for this list
                max_score = 20.0;
                # Flag value
                flag = 1
            }
            ...
        }
    }
}
~~~

If a hash has value `10`, then a symbol `FUZZY_DENIED` with weight of `2.0` will be added.
If a hash has value `100500`, then `FUZZY_DENIED` will have weight `10.0`.

## Learning fuzzy_check

The `fuzzy_check` module can also learn messages. You can use the `rspamc` command or
connect to the **controller** worker using the HTTP protocol. For learning you must check
the following settings:

1. Controller worker should be accessible by `rspamc` or HTTP (check `bind_socket`)
2. Controller should allow privileged commands for this client (check `enable_password` or `allow_ip` settings)
3. Controller should have `fuzzy_check` module configured to the servers specified
4. You should know `fuzzy_key` and `fuzzy_shingles_key` to operate with this storage
5. Your `fuzzy_check` module should have `fuzzy_map` configured to the flags used by server
6. Your `fuzzy_check` rule must have the `read_only` option turned off - `read_only = false`
7. Your `fuzzy_storage` worker should allow updates from the controller's host (`allow_update` option)
8. Your controller should be able to communicate with fuzzy storage by means of the `UDP` protocol

If all these conditions are met then you can learn messages with rspamc:

    rspamc -w <weight> -f <flag> fuzzy_add ...

or delete hashes:

    rspamc -f <flag> fuzzy_del ...

On learning, rspamd sends commands to **all** servers inside a specific rule. On check,
rspamd selects a server in a round-robin manner.

+ 0
- 70
doc/markdown/modules/index.md

@@ -1,70 +0,0 @@
# Rspamd modules

Rspamd ships with a set of modules. Some modules are written in C to speed up
complex procedures while others are written in Lua to reduce code size.
New modules are encouraged to be written in Lua, adding any essential
support to the Lua API itself. In fact, Lua modules are very close to
C modules in terms of performance, and they can be written and loaded
dynamically.

## C Modules

C modules provide the core functionality of rspamd and are statically linked
to the main rspamd code. C modules are defined in the `options` section of the rspamd
configuration. If no `filters` attribute is defined then all modules are disabled.
The default configuration enables all modules explicitly:

~~~ucl
filters = "chartable,dkim,spf,surbl,regexp,fuzzy_check";
~~~

Here is the list of C modules available:

- [regexp](regexp.md): the core module that allows you to define regexp rules,
  rspamd internal functions and Lua rules.
- [surbl](surbl.md): this module extracts URLs from messages and checks them against
  public DNS black lists to filter messages with malicious URLs.
- [spf](spf.md): checks SPF records for messages processed.
- [dkim](dkim.md): performs DKIM signature checks.
- [fuzzy_check](fuzzy_check.md): checks fuzzy hashes of messages against public blacklists.
- [chartable](chartable.md): checks character sets of text parts in messages.

## Lua modules

Lua modules are dynamically loaded on rspamd startup and are reloaded on rspamd
reconfiguration. Should you want to write a Lua module, consult the
[Lua API documentation](../lua/). To define the path to Lua modules there is a special section
named `modules` in the rspamd configuration:

~~~ucl
modules {
    path = "/path/to/dir/";
    path = "/path/to/module.lua";
    path = "$PLUGINSDIR/lua";
}
~~~

If a path is a directory then rspamd scans it for the `*.lua` pattern and loads all
matching files.

Here is the list of Lua modules shipped with rspamd:

- [multimap](multimap.md) - a complex module that operates with different types
  of maps.
- [rbl](rbl.md) - a plugin that checks messages against DNS blacklists based on
  either SMTP FROM addresses or on information from `Received` headers.
- [emails](emails.md) - extracts emails from a message and checks them against DNS
  blacklists.
- [maillist](maillist.md) - determines the common mailing list signatures in a message.
- [once_received](once_received.md) - detects messages with a single `Received` header
  and performs some additional checks for such messages.
- [phishing](phishing.md) - detects messages with phished URLs.
- [ratelimit](ratelimit.md) - implements the leaky bucket algorithm for ratelimiting and
  uses `redis` to store data.
- [trie](trie.md) - uses a suffix trie for extra-fast pattern lookup in messages.
- [mime_types](mime_types.md) - applies some rules about mime types met in messages
- [rspamd_update](rspamd_update.md) - loads dynamic rules and other rspamd updates
- [spamassassin](spamassassin.md) - loads spamassassin rules
- [dmarc](dmarc.md) - performs DMARC policy checks
- [dcc](dcc.md) - performs [DCC](http://www.dcc-servers.net/dcc/) lookups to determine message bulkiness

+ 0
- 15
doc/markdown/modules/maillist.md

@@ -1,15 +0,0 @@
# Mail list module

The mailing list module is a simple module that checks whether a message was
sent by popular mailing list software. It is designed to negate
some rules that would otherwise be triggered unnecessarily when a message comes from
a list.

Here is the list of currently supported mailing list programs:

- Ezmlm
- Mailman
- Google groups
- Majordomo
- Communigate PRO mailing lists
- subscribe.ru mailing list

+ 0
- 30
doc/markdown/modules/mime_types.md

@@ -1,30 +0,0 @@
# Rspamd mime types module

This module is intended to do some mime types sanity checks. That includes the following:

1. Checks whether mime type is from the `good` list (e.g. `multipart/alternative` or `text/html`)
2. Checks if a mime type is from the `bad` list (e.g. `multipart/form-data`)
3. Checks if an attachment filename extension is different from the intended mime type

## Configuration

The `mime_types` module reads the mime types map specified in the `file` option. This map contains bindings of the form:

```
type/subtype score
```

When the score is greater than `0`, the type is considered `bad`; when it is less than `0`, it is considered `good` (with the corresponding multiplier).
When a mime type is not listed, the `MIME_UNKNOWN` symbol is inserted.

The `extension_map` option maps a known extension to a specific mime type:

~~~ucl
extension_map = {
html = "text/html";
txt = "text/plain";
pdf = "application/pdf";
}
~~~

When an attachment's extension matches the left part but its content type does not match the right part, the symbol `MIME_BAD_ATTACHMENT` is inserted.
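
The rules above can be sketched in a few lines of Python. This is only an illustration of the documented behaviour, not the module's code: the function names are hypothetical, the labels `good` and `bad` stand in for whatever symbols the module inserts, and the handling of a score of exactly `0` is an assumption because the text does not cover it.

```python
def parse_type_map(text):
    """Parse 'type/subtype score' lines into a dict."""
    scores = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        mime_type, score = line.split()
        scores[mime_type.lower()] = float(score)
    return scores


def classify_type(mime_type, scores):
    """Apply the sign rule: positive is bad, negative is good."""
    score = scores.get(mime_type.lower())
    if score is None:
        return "MIME_UNKNOWN"
    if score > 0:
        return "bad"
    if score < 0:
        return "good"
    return "neutral"  # assumption: a zero score is not described in the text


def is_bad_attachment(filename, content_type, extension_map):
    """True when a known extension disagrees with the declared content type."""
    if "." not in filename:
        return False
    extension = filename.rsplit(".", 1)[-1].lower()
    expected = extension_map.get(extension)
    return expected is not None and expected != content_type.lower()


scores = parse_type_map("text/html -1\nmultipart/form-data 2\n")
extension_map = {"html": "text/html", "txt": "text/plain", "pdf": "application/pdf"}
print(classify_type("multipart/form-data", scores))
print(is_bad_attachment("report.pdf", "application/zip", extension_map))
```

An extension that is absent from `extension_map` never triggers the mismatch check in this sketch, which follows from the wording "matches left part".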

+ 0
- 162
doc/markdown/modules/multimap.md

@@ -1,162 +0,0 @@
# Multimap module

Multimap module is designed to handle rules that are based on different types of maps.

## Principles of work

Maps in rspamd are files or HTTP links that are automatically monitored and reloaded
if changed. For example, maps can be defined as follows:

"http://example.com/file"
"file:///etc/rspamd/file.map"
"/etc/rspamd/file.map"

Rspamd respects the `304 Not Modified` reply from an HTTP server, which saves traffic
when a map has not actually changed since the last load. For file maps, rspamd uses the normal
`mtime` attribute (time modified). The global map watching settings are defined in the
`options` section of the configuration file:

* `map_watch_interval`: defines the interval at which all maps are rescanned; the actual check interval is jittered to avoid simultaneous checks (hence, the real interval ranges from this value up to twice this value).
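
As a rough illustration of the jitter described above, the delay before the next check can be drawn from the range between the configured interval and twice that interval. The uniform distribution and the function name are assumptions of this sketch; the text only states the bounds.

```python
import random


def next_check_delay(map_watch_interval, rng):
    """Return a jittered delay in [interval, 2 * interval] seconds."""
    return rng.uniform(map_watch_interval, 2 * map_watch_interval)


rng = random.Random(0)  # seeded only to make the demo repeatable
delays = [next_check_delay(60.0, rng) for _ in range(5)]
print(all(60.0 <= delay <= 120.0 for delay in delays))  # True
```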

The multimap module allows you to build rules based on dynamic map content. Rspamd supports the following
map types in this module:

* `hash map` - a list of domains or `user@domain`
* `regexp map` - a list of regular expressions
* `ip map` - an efficient radix trie of `ip/mask` values (supports both IPv4 and IPv6 addresses)
* `cdb` - constant database format (files only)

Multimap can check different message attributes against maps.

Multimap can also be used for pre-filtering messages: if a map matches, no further checks are performed. This feature is particularly useful for whitelisting and blacklisting, and it saves scan resources. To enable this mode, add the `action` option to the map configuration (see below).

## Configuration

The module itself contains a set of rules of the form:

symbol { type = type; map = uri; [optional params] }

### Map types

The `type` attribute defines what is matched against the map. The following types are supported:

* `ip` - matches source IP of message (radix map)
* `from` - matches envelope from (or header `From` if envelope from is absent)
* `rcpt` - matches any of envelope rcpt or header `To` if envelope info is missing
* `header` - matches any header specified (must have `header = "Header-Name"` configuration attribute)
* `dnsbl` - matches source IP against some DNS blacklist (consider using [RBL](rbl.md) module for this)
* `url` - matches URLs in messages against maps
* `filename` - matches attachment filename against map

DNS maps are legacy and are discouraged in new projects (use [rbl](rbl.md) instead).

Maps can also be specified as [CDB](http://www.corpit.ru/mjt/tinycdb.html) databases which might be useful for large maps:

map = "cdb:///path/to/file.cdb";

### Pre-filter maps

To enable pre-filter support, you should specify `action` parameter which can take the
following values:

* `accept` - accept a message (no action)
* `add header` or `add_header` - adds a header to message
* `rewrite subject` or `rewrite_subject` - change subject
* `greylist` - greylist message
* `reject` - drop message

No filters will be processed for a message if such a map matches.

~~~ucl
multimap {
test { type = "ip"; map = "/tmp/ip.map"; symbol = "TESTMAP"; }
spamhaus { type = "dnsbl"; map = "pbl.spamhaus.org"; symbol = "R_IP_PBL";
description = "PBL dns block list"; } # Better use RBL module instead
}
~~~

### Regexp maps


All maps but `ip` and `dnsbl` support `regexp` mode. In this mode, all keys in maps are treated as regular expressions, for example:

/example\d+\.com/i
/other\d+\.com/i test
# Comments are still enabled

For performance reasons, use only expressions supported by [hyperscan](http://01org.github.io/hyperscan/dev-reference/compilation.html#pattern-support), as this engine provides very fast matching at no additional cost. Currently, there is no way to tell which particular regexp matched when multiple regexps match.

To enable regexp mode, you should set `regexp` option to `true`:

~~~ucl
sender_from_whitelist_user {
type = "from";
map = "file:///tmp/from.map";
symbol = "SENDER_FROM_WHITELIST";
regexp = true;
}
~~~

### Map filters

It is also possible to apply a filtering expression before checking value against some map. This is mainly useful
for `header` rules. Filters are specified with `filter` option. Rspamd supports the following filters so far:

* `email` or `email:addr` - parse header value and extract email address from it (`Somebody <user@example.com>` -> `user@example.com`)
* `email:user` - parse header value as email address and extract user name from it (`Somebody <user@example.com>` -> `user`)
* `email:domain` - parse header value as email address and extract the domain part from it (`Somebody <user@example.com>` -> `example.com`)
* `email:name` - parse header value as email address and extract displayed name from it (`Somebody <user@example.com>` -> `Somebody`)
* `regexp:/re/` - extracts generic information using the specified regular expression
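
To make the `email` filters above concrete, here is a small Python sketch that reproduces the documented input and output pairs with the standard library's `email.utils.parseaddr`. It only mirrors the examples in the list; rspamd's own address parser may treat edge cases differently, and the function name is hypothetical.

```python
from email.utils import parseaddr


def apply_email_filter(header_value, mode="addr"):
    """Emulate the email, email:user, email:domain and email:name filters."""
    name, addr = parseaddr(header_value)
    user, _, domain = addr.partition("@")
    parts = {"addr": addr, "user": user, "domain": domain, "name": name}
    return parts[mode]


value = "Somebody <user@example.com>"
for mode in ("addr", "user", "domain", "name"):
    print(mode, "->", apply_email_filter(value, mode))
```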

URL maps allow another set of filters (by default, URL maps are matched using the hostname part):

* `tld` - matches TLD (top level domain) part of urls
* `full` - matches the complete URL not the hostname
* `is_phished` - matches the hostname, but only if the URL is phished (e.g. it pretends to be from another domain)
* `regexp:/re/` - extracts generic information using the specified regular expression from the hostname
* `tld:regexp:/re/` - extracts generic information using the specified regular expression from the TLD part
* `full:regexp:/re/` - extracts generic information using the specified regular expression from the full URL text

Filename maps support the following set of filters:

* `extension` - matches file extension
* `regexp:/re/` - extract data from filename according to some regular expression

Here are some examples of map configurations that use filters (the first and the last also use pre-filter mode):

~~~ucl
sender_from_whitelist_user {
type = "from";
filter = "email:user";
map = "file:///tmp/from.map";
symbol = "SENDER_FROM_WHITELIST_USER";
action = "accept"; # Prefilter mode
}
sender_from_regexp {
type = "header";
header = "from";
filter = "regexp:/.*@/";
map = "file:///tmp/from_re.map";
symbol = "SENDER_FROM_REGEXP";
}
url_map {
type = "url";
filter = "tld";
map = "file:///tmp/url.map";
symbol = "URL_MAP";
}
url_tld_re {
type = "url";
filter = "tld:regexp:/\.[^.]+$/"; # Extracts the last component of URL
map = "file:///tmp/url.map";
symbol = "URL_MAP_RE";
}
filename_blacklist {
type = "filename";
filter = "extension";
map = "/${LOCAL_CONFDIR}/filename.map";
symbol = "FILENAME_BLACKLISTED";
action = "reject";
}
~~~

+ 0
- 22
doc/markdown/modules/once_received.md

@@ -1,22 +0,0 @@
# Once received module

This module performs simple checks on mail with a single `Received` header. The idea behind these checks is that legitimate mail usually has more than one `Received` header, and that some bad patterns, such as `dynamic` or `broadband`, are common in spam sent from hacked users' machines.

## Configuration

The configuration of this module is straightforward: specify `symbol` for generic mail with one `Received` header, specify `symbol_strict` for mail with bad patterns or unresolvable hostnames, and add **good** and **bad** patterns. Patterns can contain [lua patterns](http://lua-users.org/wiki/PatternsTutorial). `good_host` lines negate this module for certain hosts, and `bad_host` lines specify bad patterns. It is also possible to specify `whitelist` to define a list of networks that are excluded from `once_received` checks.

## Example

~~~ucl
once_received {
good_host = "^mail";
bad_host = "static";
bad_host = "dynamic";
symbol_strict = "ONCE_RECEIVED_STRICT";
symbol = "ONCE_RECEIVED";
whitelist = "/tmp/ip.map";
}
~~~

As usual, an IP map can contain IPs (both v4 and v6), networks (in CIDR notation) and optional comments starting with the `#` symbol.
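
The IP map format can be illustrated with Python's `ipaddress` module. This is a sketch of the file format only, under the assumption that a comment may follow an entry on the same line; rspamd itself stores such maps in a radix trie rather than a list, and the function names are hypothetical.

```python
import ipaddress


def parse_ip_map(text):
    """Parse IPs, CIDR networks and '#' comments into a list of networks."""
    networks = []
    for raw_line in text.splitlines():
        entry = raw_line.split("#", 1)[0].strip()
        if entry:
            # strict=False accepts entries such as 10.1.2.3/8 with host bits set
            networks.append(ipaddress.ip_network(entry, strict=False))
    return networks


def is_whitelisted(ip, networks):
    """Check an address against every parsed network."""
    address = ipaddress.ip_address(ip)
    return any(address in network for network in networks)


ip_map = """
192.0.2.1        # a single host
10.0.0.0/8
# a comment-only line
2001:db8::/32
"""
networks = parse_ip_map(ip_map)
print(is_whitelisted("10.1.2.3", networks))
print(is_whitelisted("198.51.100.7", networks))
```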

+ 0
- 114
doc/markdown/modules/phishing.md

@@ -1,114 +0,0 @@
# Phishing module

This module is designed to report potentially phished URLs.

## Principles of phishing detection

Rspamd tries to detect phished URLs only in HTML text parts. First,
it gets the URL from the `href` or `src` attribute and then tries to find the text enclosed
within this link tag. If that text also contains a URL, then
rspamd checks whether these two URLs are related, namely whether they
belong to the same top level domain. Here are examples of URLs that are considered
to be non-phished:

<a href="http://sub.example.com/path">http://example.com/other</a>
<a href="https://user:password@sub.example.com/path">http://example.com/</a>

And the following URLs are considered as phished:

<a href="http://evil.co.uk">http://example.co.uk</a>
<a href="http://t.co/xxx">http://example.com</a>
<a href="http://redir.to/example.com">http://example.com</a>
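
The comparison behind these examples can be sketched in Python as follows. The sketch compares the effective second level domain of the link target with that of the displayed URL. The tiny `MULTI_LABEL_SUFFIXES` set is a stand-in for a real public suffix list and is purely an assumption of this example, as are the function names; rspamd's actual implementation is not shown in this document.

```python
from urllib.parse import urlparse

# Assumption: a minimal stand-in for a full public suffix list.
MULTI_LABEL_SUFFIXES = {"co.uk"}


def esld(url):
    """Return the effective second level domain of a URL's hostname."""
    host = (urlparse(url).hostname or "").lower()
    labels = host.split(".")
    keep = 3 if ".".join(labels[-2:]) in MULTI_LABEL_SUFFIXES else 2
    return ".".join(labels[-keep:])


def looks_phished(href, displayed):
    """True when the link target and the displayed URL belong to different domains."""
    return esld(href) != esld(displayed)


print(looks_phished("http://sub.example.com/path", "http://example.com/other"))  # False
print(looks_phished("http://evil.co.uk", "http://example.co.uk"))                # True
print(looks_phished("http://redir.to/example.com", "http://example.com"))        # True
```

Note that `urlparse(...).hostname` drops the `user:password@` part, so the second non-phished example above also compares equal.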

## Configuration of phishing module

Here is an example of full module configuration.

~~~ucl
phishing {
symbol = "R_PHISHING"; # Default symbol

# Check only domains from this list
domains = "file:///path/to/map";

# Make exclusions for known redirectors
# Entry format: URL/path for map, colon, name of symbol
redirector_domains = [
"${CONFDIR}/redirectors.map:REDIRECTOR_FALSE"
];
# For certain domains from the specified strict maps
# use another symbol for phishing plugin
strict_domains = [
"${CONFDIR}/paypal.map:PAYPAL_PHISHING"
];
}
~~~

If an anchoring (actual as opposed to phished) domain is found in a map
referenced by the `redirector_domains` setting then the related symbol is
yielded and the URL is not checked further. This allows making exclusions
for known redirectors, especially ESPs.

Further to this, if the phished domain is found in a map referenced by
`strict_domains`, the related symbol is yielded and the URL is not checked
further. This allows fine-grained control to avoid false positives and
to flag particularly harmful phishing mails, such as bank phishing or other
payment system phishing.

Finally, the default symbol is yielded; if `domains` is specified, it is yielded
only if the phished domain is found in the related map.

Maps for this module can consist of effective second level domain parts (eSLD)
as well as whole domain parts of the URLs (FQDN).

## Openphish support

Since version 1.3, rspamd supports [openphish](https://openphish.com).
Rspamd loads this public feed as a map (using HTTPS) and checks URLs in messages against
the openphish list. If any match is found, rspamd adds the symbol `PHISHED_OPENPHISH`.

If you use the research or commercial data feed, rspamd can also use its data to give
more details about the URLs found: their sector (e.g. 'Finance'), brand name (e.g.
'Bank of Zimbabwe') and other useful information.

There are a couple of options available to configure the openphish module:

~~~ucl
phishing {
# URL of feed, default is public url:
openphish_map = "https://www.openphish.com/feed.txt";
# For premium feed, change that to your personal URL, e.g.
# openphish_map = "https://openphish.com/samples/premium_feed.json";

# Change this to true if premium feed is enabled
openphish_premium = false;
}
~~~

## Phishtank support

Rspamd also supports [phishtank](https://phishtank.com) since 1.3. Unlike
the openphish feed, the phishtank feed is not enabled by default since it is quite large (about 50 MB), so
you might want to set up a reverse proxy (e.g. nginx) to cache that data among rspamd instances:

~~~nginx
proxy_cache_path /data/nginx/cache levels=1:2 keys_zone=phish:10m;

server {
listen 8080;
location / {
proxy_pass http://data.phishtank.com:80;
proxy_cache phish;
proxy_cache_lock on;
}
}
~~~


To enable the phishtank feed, edit the `local.d/phishing.conf` file and add the following lines:

~~~ucl
phishtank_enabled = true;
# Where nginx is installed
phishtank_map = "http://localhost:8080/data/online-valid.json";
~~~

+ 0
- 94
doc/markdown/modules/ratelimit.md

@@ -1,95 +0,0 @@
# Ratelimit plugin

The ratelimit plugin is designed to limit messages coming from certain senders, to
certain recipients, or from certain IP addresses, combining these parameters into
separate limits.

All limits are stored in a [redis](http://redis.io) server (or cluster of servers) to enable
a shared cache between different scanners.

## Module configuration

In the default configuration, there are no cache servers specified, hence, the module won't work unless you add this option to the configuration.

`Ratelimit` module supports the following configuration options:

- `servers` - list of servers where ratelimit data is stored
- `whitelisted_rcpts` - comma separated list of whitelisted recipients. By default
the value of this option is 'postmaster, mailer-daemon'
- `whitelisted_ip` - a map of whitelisted ip addresses or networks
- `max_rcpts` - do not apply ratelimit if a message has more than this number of recipients (5 by default). This
option avoids too much work on setting buckets when a message has a lot of recipients.
- `max_delay` - maximum lifetime for any limit bucket (1 day by default)
- `rates` - a table of allowed rates in the form:

type = [burst,leak];

Where `type` is one of:

- `to`
- `to_ip`
- `to_ip_from`
- `bounce_to`
- `bounce_to_ip`

`burst` is a capacity of a bucket and `leak` is a rate in messages per second.
Both these attributes are floating point values.

- `symbol` - if this option is specified, then the `ratelimit` plugin just adds the corresponding symbol instead of setting a pre-result; the value is scaled as $$ 2 \tanh\left(\frac{bucket}{2 \cdot threshold}\right) $$, where `tanh` is the hyperbolic tangent function
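
The scaling formula for `symbol` can be evaluated directly. In this sketch `bucket` is taken to be the current fill level of a bucket and `threshold` its burst capacity; that reading of the two names is an assumption, since the text does not define them, and the function name is hypothetical.

```python
import math


def symbol_score(bucket, threshold):
    """Scale a bucket level into a symbol weight: 2 * tanh(bucket / (2 * threshold))."""
    return 2.0 * math.tanh(bucket / (threshold * 2.0))


for level in (0, 50, 100, 200, 1000):
    print(level, round(symbol_score(level, 100), 4))
```

Because `tanh` saturates at 1, the resulting value grows with the bucket level but never exceeds 2.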

## Principles of work

The basic principle of ratelimiting in rspamd is called a `leaky bucket`. It can
be pictured as a bucket that has some capacity and a small hole in the bottom.
Messages come into this bucket and leak through the hole over time (this doesn't delay messages, it just counts them). If the capacity of
a bucket is exhausted, a temporary reject is sent. This continues until the bucket
has enough free capacity to accept more messages (and since messages keep leaking, after some
time it becomes possible to process new messages).

Rspamd uses 3 types of limit buckets:

- `to` - a bucket based on a recipient only
- `to:ip` - a bucket combining a recipient and a sender's IP
- `to:from:ip` - a bucket combining a recipient, a sender and a sender's IP

For bounce messages there are special buckets that lack `from` component and have more
restricted limits. Rspamd treats the following senders as bounce senders:

- 'postmaster'
- 'mailer-daemon'
- '' (empty sender)
- 'null'
- 'fetchmail-daemon'
- 'mdaemon'

Each recipient has its own triple of buckets, hence it is useful
to limit the number of recipients to check.

Each bucket has two parameters:

- `capacity` - how many messages could go into a bucket before a limit is reached
- `leak` - how many messages per second are leaked from a bucket.

For example, a bucket with capacity `100` and leak `1` can accept up to 100 messages but then
will accept no more than one message per second.
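
The bucket behaviour described above can be modelled in a few lines of Python. This is a minimal in-memory sketch with explicit timestamps, not the plugin's implementation (which keeps its state in redis); the class name is hypothetical, and the choice not to count rejected messages towards the bucket level is an assumption.

```python
class LeakyBucket:
    """A bucket with a fixed capacity that drains at `leak` messages per second."""

    def __init__(self, capacity, leak):
        self.capacity = capacity
        self.leak = leak
        self.level = 0.0   # messages currently held in the bucket
        self.last = 0.0    # timestamp of the previous update, in seconds

    def allow(self, now):
        # Drain whatever leaked out since the previous message.
        elapsed = now - self.last
        self.level = max(0.0, self.level - elapsed * self.leak)
        self.last = now
        if self.level + 1 > self.capacity:
            return False   # capacity exhausted: temporary reject
        self.level += 1
        return True


bucket = LeakyBucket(capacity=100, leak=1.0)
accepted = sum(bucket.allow(now=0.0) for _ in range(150))
print(accepted)                # 100 messages fit into the burst
print(bucket.allow(now=1.0))   # True: one message has leaked out after a second
print(bucket.allow(now=1.0))   # False: the bucket is full again
```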

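By default, the ratelimit module has the following settings, which disable all limits (every burst is set to `0`; the burst values quoted in the comments are not the values actually set):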

~~~lua
local settings = {
-- Limit for all mail per recipient (burst 100, rate 2 per minute)
to = {0, 0.033333333},
-- Limit for all mail per one source ip (burst 30, rate 1.5 per minute)
to_ip = {0, 0.025},
-- Limit for all mail per one source ip and from address (burst 20, rate 1 per minute)
to_ip_from = {0, 0.01666666667},

-- Limit for all bounce mail (burst 10, rate 2 per hour)
bounce_to = {0, 0.000555556},
-- Limit for bounce mail per one source ip (burst 5, rate 1 per hour)
bounce_to_ip = {0, 0.000277778},

-- Limit for all mail per user (authuser) (burst 20, rate 1 per minute)
user = {0, 0.01666666667}
}
~~~
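
The leak values in the settings above are rates in messages per second, so the per-minute and per-hour figures in the comments have to be converted. The following sketch shows the arithmetic; the helper names are hypothetical.

```python
def per_minute(messages):
    """Convert a rate in messages per minute to messages per second."""
    return messages / 60.0


def per_hour(messages):
    """Convert a rate in messages per hour to messages per second."""
    return messages / 3600.0


print(per_minute(2))     # compare with the leak of `to`
print(per_minute(1.5))   # compare with the leak of `to_ip`
print(per_hour(2))       # compare with the leak of `bounce_to`
print(per_hour(1))       # compare with the leak of `bounce_to_ip`
```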

+ 0
- 116
doc/markdown/modules/rbl.md

@@ -1,116 +0,0 @@
# RBL module

The RBL module provides support for checking the IPv4/IPv6 source address of a message's sender against a set of RBLs as well as various less conventional methods of using RBLs: against addresses in Received headers; against the reverse DNS name of the sender and against the parameter used for HELO/EHLO at SMTP time.

Configuration is structured as follows:

~~~ucl
rbl {
# default settings defined here
rbls {
# 'rbls' subsection under which the RBL definitions are nested
an_rbl {
# rbl-specific subsection
}
# ...
}
}
~~~

The default settings define the ways in which the RBLs are used unless overridden in an RBL-specific subsection.

Defaults may be set for the following parameters (default values used if these are not set are shown in brackets - note that these may be redefined in the default config):

- default_ipv4 (true)

Use this RBL to test IPv4 addresses.

- default_ipv6 (false)

Use this RBL to test IPv6 addresses.

- default_received (true)

Use this RBL to test IPv4/IPv6 addresses found in Received headers. The RBL should also be configured to check one/both of IPv4/IPv6 addresses.

- default_from (false)

Use this RBL to test IPv4/IPv6 addresses of message senders. The RBL should also be configured to check one/both of IPv4/IPv6 addresses.

- default_rdns (false)

Use this RBL to test reverse DNS names of message senders (hostnames passed to rspamd should have been validated with a forward lookup, particularly if this is to be used to provide whitelisting).

- default_helo (false)

Use this RBL to test parameters sent for HELO/EHLO at SMTP time.

- default_dkim (false)

Use this RBL to test domains found in validated DKIM signatures.

- default_dkim_domainonly (true)

If true, test the top-level domain only; otherwise test the entire domain found in the DKIM signature.

- default_emails (false)

Use this RBL to test email addresses in form [localpart].[domainpart].[rbl] or if set to "domain_only" uses [domainpart].[rbl].

- default_unknown (false)

If set to false, do not yield a result unless the response received from the RBL is defined in its related returncodes {} subsection, else return the default symbol for the RBL.

- default_exclude_users (false)

If set to true, do not use this RBL if the message sender is authenticated.

- default_exclude_private_ips (true)

If true, do not use the RBL if the sending host address is in `local_addrs`, and do not check `Received` headers bearing these addresses.

- default_exclude_local (true)

If true and `local_exclude_ip_map` has been set, do not use the RBL if the sending host address is in the local IP list, and do not check `Received` headers bearing these addresses.

- default_is_whitelist (false)

If true, matches on this list should neutralise any listings where this setting is false and `ignore_whitelists` is not true.

- default_ignore_whitelists (false)

If true, this list should not be neutralised by whitelists.

Other parameters which can be set here are:

- local_exclude_ip_map

Can be set to the URL of a list of IPv4/IPv6 addresses and subnets that `exclude_local` checks treat as local exclusions, i.e. addresses that are not checked against the RBL.

The RBL-specific subsection is structured as follows:

~~~ucl
# Descriptive name of RBL or symbol if symbol is not defined.
an_rbl {
# Explicitly defined symbol
symbol = "SOME_SYMBOL";
# RBL-specific defaults (where different from global defaults)
# The global defaults may be overridden using 'helo' to override 'default_helo' and so on.
ipv6 = true;
ipv4 = false;
# Address used for RBL-testing
rbl = "v6bl.example.net";
# Possible responses from RBL and symbols to yield
returncodes {
# Name_of_symbol = "address";
EXAMPLE_ONE = "127.0.0.1";
EXAMPLE_TWO = "127.0.0.2";
}
}
~~~
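
The interplay between `returncodes`, the RBL's own symbol and the `unknown` setting can be sketched as follows. The sketch matches responses by exact address only, which is an assumption, and the function name is hypothetical; it is meant to illustrate the rule stated for `default_unknown`, not the module's code.

```python
def symbols_for_response(addresses, returncodes, default_symbol, unknown=False):
    """Map the addresses returned by an RBL lookup to the symbols to yield."""
    by_address = {address: symbol for symbol, address in returncodes.items()}
    symbols = []
    for address in addresses:
        if address in by_address:
            symbols.append(by_address[address])
        elif unknown:
            # Response not listed in returncodes: fall back to the RBL's symbol.
            symbols.append(default_symbol)
    return symbols


returncodes = {"EXAMPLE_ONE": "127.0.0.1", "EXAMPLE_TWO": "127.0.0.2"}
print(symbols_for_response(["127.0.0.2"], returncodes, "SOME_SYMBOL"))
print(symbols_for_response(["127.0.0.9"], returncodes, "SOME_SYMBOL"))
print(symbols_for_response(["127.0.0.9"], returncodes, "SOME_SYMBOL", unknown=True))
```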

The following extra settings are valid in the RBL subsection:

- whitelist_exception

(For whitelists) - Symbols named as parameters for this setting will not be used for neutralising blacklists (set this multiple times to add multiple exceptions).

+ 0
- 146
doc/markdown/modules/regexp.md

@@ -1,146 +0,0 @@
# Rspamd regexp module

This is a core module that deals with regexp expressions to filter messages.

## Principles of work

The regexp module operates on `expressions` - logical sequences of different `atoms`. Atoms
are the elements of an expression and can be regular expressions, rspamd
functions or lua functions. Rspamd supports the following operators in expressions:

* `&&` - logical AND (can also be written as `and` or even `&`)
* `||` - logical OR (`or` `|`)
* `!` - logical NOT (`not`)
* `+` - logical PLUS, usually used with comparisons:
    - `>` more than
    - `<` less than
    - `>=` more or equal
    - `<=` less or equal

While the logical operators are self-explanatory, PLUS is less obvious. In rspamd,
it is used to join multiple atoms or subexpressions and compare the number of true operands to a specific number:

A + B + C + D > 2 - evaluates to `true` if at least 3 operands are true
(A & B) + C + D + E >= 2 - evaluates to `true` if at least 2 operands are true
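
The two expressions above can be mimicked in Python by counting the operands that evaluate to true. This is only an illustration of the semantics; the helper `plus` is hypothetical and not part of rspamd.

```python
def plus(*operands):
    """Count how many operands are true, as the PLUS operator does."""
    return sum(1 for value in operands if value)


A, B, C, D, E = True, False, True, True, False

# A + B + C + D > 2: three operands (A, C, D) are true
print(plus(A, B, C, D) > 2)          # True

# (A & B) + C + D + E >= 2: the subexpression is false, but C and D are true
print(plus(A and B, C, D, E) >= 2)   # True
```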

Operators have their own priorities:

1. NOT
2. PLUS
3. COMPARE
4. AND
5. OR

You can change priorities with parentheses, of course. All operations are *right* associative in rspamd.
While evaluating expressions, rspamd tries to optimize their execution time by reordering them and does not evaluate
unnecessary branches.

## Expressions components

Rspamd supports the following components within expressions:

* Regular expressions
* Internal functions
* Lua global functions (not widely used)

### Regular expressions

In rspamd, regular expressions could match different parts of messages:

* Headers (should be `Header-Name=/regexp/flags`), mime headers
* Full headers string
* Textual mime parts
* Raw messages
* URLs

The match type is defined by special flags after the last `/` symbol:

* `H` - header regexp
* `X` - undecoded header regexp (e.g. without quoted-printable decoding)
* `B` - MIME header regexp (applied for headers in MIME parts only)
* `R` - full headers content (applied for all headers undecoded and for the message only - **not** including MIME headers)
* `M` - raw message regexp
* `P` - part regexp without HTML tags
* `Q` - part regexp with HTML tags
* `C` - spamassassin `BODY` regexp analogue (see http://spamassassin.apache.org/full/3.4.x/doc/Mail_SpamAssassin_Conf.txt)
* `D` - spamassassin `RAWBODY` regexp analogue
* `U` - URL regexp

From 1.3, it is also possible to specify long regexp types for convenience in curly braces:

* `{header}` - header regexp
* `{raw_header}` - undecoded header regexp (e.g. without quoted-printable decoding)
* `{mime_header}` - MIME header regexp (applied for headers in MIME parts only)
* `{all_header}` - full headers content (applied for all headers undecoded and for the message only - **not** including MIME headers)
* `{body}` - raw message regexp
* `{mime}` - part regexp without HTML tags
* `{raw_mime}` - part regexp with HTML tags
* `{sa_body}` - spamassassin `BODY` regexp analogue (see http://spamassassin.apache.org/full/3.4.x/doc/Mail_SpamAssassin_Conf.txt)
* `{sa_raw_body}` - spamassassin `RAWBODY` regexp analogue
* `{url}` - URL regexp
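
The correspondence between the short type flags and the long names can be written down as a lookup table. The Python sketch below also splits an atom such as `Header-Name=/regexp/flags` or `/test/i{mime}` into its parts. The parser is a simplified illustration with hypothetical names; it is not rspamd's parser and does not handle escaped slashes specially.

```python
import re

SHORT_TO_LONG = {
    "H": "header", "X": "raw_header", "B": "mime_header", "R": "all_header",
    "M": "body", "P": "mime", "Q": "raw_mime",
    "C": "sa_body", "D": "sa_raw_body", "U": "url",
}

ATOM_RE = re.compile(r"(?:([\w-]+)=)?/(.*)/([A-Za-z]*)(?:\{(\w+)\})?")


def parse_atom(atom):
    """Split a regexp atom into header name, pattern, match type and other flags."""
    match = ATOM_RE.fullmatch(atom)
    if match is None:
        raise ValueError("not a regexp atom: " + atom)
    header, pattern, flags, long_type = match.groups()
    short_types = [SHORT_TO_LONG[f] for f in flags if f in SHORT_TO_LONG]
    return {
        "header": header,
        "pattern": pattern,
        "type": long_type or (short_types[0] if short_types else None),
        "modifiers": "".join(f for f in flags if f not in SHORT_TO_LONG),
    }


print(parse_atom("Subject=/viagra/iH"))
print(parse_atom("/test/i{mime}"))
```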

Each regexp also supports the following flags:

* `i` - ignore case
* `u` - use utf8 regexp
* `m` - multiline regexp - treat string as multiple lines. That is, change "^" and "$" from matching the start of the string's first line and the end of its last line to matching the start and end of each line within the string
* `x` - extended regexp - this flag tells the regular expression parser to ignore most whitespace that is neither backslashed nor within a bracketed character class. You can use this to break up your regular expression into (slightly) more readable parts. Also, the # character is treated as a metacharacter introducing a comment that runs up to the pattern's closing delimiter, or to the end of the current line if the pattern extends onto the next line.
* `s` - dotall regexp - treat string as single line. That is, change `.` to match any character whatsoever, even a newline, which normally it would not match. Used together, as `/ms`, they let the `.` match any character whatsoever, while still allowing `^` and `$` to match, respectively, just after and just before newlines within the string.
* `O` - do not optimize regexp (rspamd optimizes regexps by default)

### Internal functions

Rspamd supports a set of internal functions to do some common spam filtering tasks:

* `check_smtp_data(type[, str or /re/])` - checks for the specific envelope argument: `from`, `rcpt`, `user`, `subject`
* `compare_encoding(str or /re/)` - compares message encoding with string or regexp
* `compare_parts_distance(inequality_percent)` - if a message is multipart/alternative, compare two parts and return `true` if they differ by more than `inequality_percent`
* `compare_recipients_distance(inequality_percent)` - check how different the recipients of a message are (works for > 5 recipients)
* `compare_transfer_encoding(str or /re/)` - compares message transfer encoding with string or regexp
* `content_type_compare_param(param, str or /re/)` - compare content-type parameter `param` with string or regexp
* `content_type_has_param(param)` - return true if `param` exists in content-type
* `content_type_is_subtype(str or /re/)` - return `true` if subtype of content-type matches string or regexp
* `content_type_is_type(str or /re/)` - return `true` if type of content-type matches string or regexp
* `has_content_part(type)` - return `true` if the part with the specified `type` exists
* `has_content_part_len(type, len)` - return `true` if the part with the specified `type` exists and has at least `len` length
* `has_fake_html()` - check if there is an HTML part in message with no HTML tags
* `has_html_tag(tagname)` - return `true` if html part contains specified tag
* `has_only_html_part()` - return `true` if there is merely a single HTML part
* `header_exists(header)` - return `true` if the specified header exists in the message
* `is_html_balanced()` - check whether HTML part has balanced tags
* `is_recipients_sorted()` - return `true` if there are more than 5 recipients in a message and they are sorted
* `raw_header_exists()` - does the same as `header_exists`

Many of these functions are legacy, but they are still supported for compatibility.

### Lua atoms

Lua atoms now can be lua global functions names or callbacks. This is
a compatibility feature for previously written rules.

### Regexp objects

From rspamd 1.0, it is possible to add more power to regexp rules by using
table notation when writing rules. A table can have the following fields:

- `callback`: lua callback for the rule
- `re`: regular expression (mutually exclusive with `callback` option)
- `condition`: function of task that determines when a rule should be executed
- `score`: default score
- `description`: default description
- `one_shot`: default one shot settings

Here is an example of table form definition of regexp rule:

~~~lua
config['regexp']['RE_TEST'] = {
re = '/test/i{mime}',
score = 10.0,
condition = function(task)
if task:get_header('Subject') then
return true
end
return false
end,
}
~~~

+ 0
- 48
doc/markdown/modules/replies.md

@@ -1,48 +0,0 @@
# Replies module

This module collects the `message-id` header of messages sent by authenticated users and stores the corresponding hashes in Redis, where they expire after a configurable amount of time (by default 1 day). Furthermore, it hashes the `in-reply-to` headers of all received messages and checks for matches (i.e. messages sent in response to messages our system originated). On a match it yields a symbol, which can be used to adjust scoring, or forces an action (most likely "no action" to accept), according to configuration.


## Configuration

Settings for the module are described below (default values are indicated in brackets).

- action (null)

If set, apply the given action to messages identified as replies (would typically be set to "no action" to accept).

- expire (86400)

Time, in seconds, after which to expire records (default is one day).

- key_prefix (rr)

String prefixed to keys in Redis.

- message (Message is reply to one we originated)

Message passed when action is forced.

- servers (null)

Comma separated list of Redis hosts

- symbol (REPLY)

Symbol yielded on messages identified as replies.

## Example

~~~ucl
replies {
# This setting is non-default & is required to be set
servers = "localhost";
# This setting is non-default & may be desirable
action = "no action";
# These are default settings you may want to change
expire = 86400;
key_prefix = "rr";
message = "Message is reply to one we originated";
symbol = "REPLY";
}
~~~
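
The flow of the replies module can be sketched in Python with a dictionary standing in for Redis. The class and method names, the use of SHA-256 and the key layout (prefix followed by the hex digest) are all assumptions made for this illustration; the document only says that hashes of `message-id` values are stored with an expiry and later compared with hashes of `in-reply-to` values.

```python
import hashlib


class ReplyTracker:
    """In-memory stand-in for the Redis-backed store used by the replies module."""

    def __init__(self, expire=86400, key_prefix="rr"):
        self.expire = expire
        self.key_prefix = key_prefix
        self.store = {}  # key -> expiry timestamp in seconds

    def _key(self, message_id):
        # Assumption: SHA-256 hex digest; the real hash function is not specified here.
        digest = hashlib.sha256(message_id.encode("utf-8")).hexdigest()
        return self.key_prefix + digest

    def record_sent(self, message_id, now):
        """Remember a message sent by an authenticated user."""
        self.store[self._key(message_id)] = now + self.expire

    def is_reply(self, in_reply_to, now):
        """Check whether an incoming message answers one we originated."""
        expiry = self.store.get(self._key(in_reply_to))
        return expiry is not None and now < expiry


tracker = ReplyTracker()
tracker.record_sent("<abc@example.com>", now=0)
print(tracker.is_reply("<abc@example.com>", now=3600))    # True: within one day
print(tracker.is_reply("<abc@example.com>", now=90000))   # False: record expired
```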

+ 0
- 90
doc/markdown/modules/rspamd_update.md

@@ -1,90 +0,0 @@
# Rspamd update module

This module allows you to load rspamd rules and adjust symbol scores and actions without a full daemon restart.
`rspamd_update` provides a way to backport new rules and score changes without updating rspamd itself. This might be useful, for example, if you want to use the stable version of rspamd but would like to improve filtering quality at the same time.

## Security considerations

The rspamd update module can execute lua code, which runs with the scanner's privileges - usually those of the `_rspamd` or `nobody` user. Therefore, you should not use untrusted sources of updates.
Rspamd supports digital signatures, using the [EdDSA](http://ed25519.cr.yp.to/) signature scheme, to check the validity of downloaded updates.
For your own updates that are loaded from the filesystem or from some trusted network you might use unsigned files; however, signing is recommended even in this case.

To sign a map you can use `rspamadm signtool`, and to generate a signing keypair, `rspamadm keypair -s -u`:

~~~ucl
keypair {
pubkey = "zo4sejrs9e5idqjp8rn6r3ow3x38o8hi5pyngnz6ktdzgmamy48y";
privkey = "pwq38sby3yi68xyeeuup788z6suqk3fugrbrxieri637bypqejnqbipt1ec9tsm8h14qerhj1bju91xyxamz5yrcrq7in8qpsozywxy";
id = "bs4zx9tcf1cs5ed5mt4ox8za54984frudpzzny3jwdp8mkt3feh7nz795erfhij16b66piupje4wooa5dmpdzxeh5mi68u688ixu3yd";
encoding = "base32";
algorithm = "curve25519";
type = "sign";
}
~~~

Then you can use `signtool` to edit the map file:

```
rspamadm signtool -e --editor=vim -k <keypair_file> <map_file>
```

To enforce signing policies you should add `sign+` string to your map definition:

~~~ucl
map = "sign+http://example.com/map"
~~~

To specify a trusted key, you can either put the **public** key from the keypair into the `local.d/options.inc` file as follows:

```
trusted_keys = ["<public key string>"];
```

or add it as a `key` definition to the map string:

~~~ucl
map = "sign+key=<key_string>+http://example.com/map"
~~~
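
The map strings above follow a simple `flag+...+URL` shape. As an illustration only, the following plain Lua sketch splits such a string into its parts. The function name `parse_map_definition` and the returned field names are hypothetical, and this is not rspamd's own parser.

```lua
-- Split a map definition such as "sign+key=<key>+http://example.com/map"
-- into a table with `signed`, `key` and `url` fields (names are illustrative).
local function parse_map_definition(def)
  local result = { signed = false, key = nil, url = nil }
  local rest = def

  -- Optional "sign+" prefix requests signature verification
  if rest:sub(1, 5) == "sign+" then
    result.signed = true
    rest = rest:sub(6)
  end

  -- Optional "key=<key_string>+" part carries the trusted public key
  local key, after = rest:match("^key=([^+]+)%+(.*)$")
  if key then
    result.key = key
    rest = after
  end

  result.url = rest
  return result
end

local parsed = parse_map_definition("sign+key=abc+http://example.com/map")
print(parsed.signed, parsed.key, parsed.url)
```

A definition without the `sign+` prefix is returned with `signed = false`, which mirrors the unsigned case described in the text.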

## Module configuration

The module itself has very few parameters:

* `key`: use this key (base32 encoded) as trusted key

All other keys are treated as rules for loading maps. By default, rspamd tries to load signed updates from the `rspamd.com` site using the trusted key `qxuogdh5eghytji1utkkte1dn3n81c3y5twe61uzoddzwqzuxxyb`:

~~~ucl
rspamd_update {
rules = "sign+http://rspamd.com/update/rspamd-${BRANCH_VERSION}.ucl";
key = "qxuogdh5eghytji1utkkte1dn3n81c3y5twe61uzoddzwqzuxxyb";
}
~~~

## Updates structure

Update files are quite simple and have three sections:

* `symbols` - list of new scores for symbols that are already in rspamd (loaded with `priority = 1` to override default settings)
* `actions` - list of scores for actions (also loaded with `priority = 1`)
* `rules` - list of lua code fragments to load into rspamd, they can use `rspamd_config` global to register new rules

Here is an example of update file:

~~~ucl
rules = {
test =<<EOD
rspamd_config.TEST = {
callback = function(task) return true end,
score = 1.0,
description = 'test',
}
EOD
}
actions = {
greylist = 3.4,
}
symbols = {
R_DKIM_ALLOW = -0.5,
}
~~~
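
To illustrate what loading with `priority = 1` means, here is a minimal sketch in plain Lua. It assumes that built-in scores carry priority `0` and that a value replaces an existing one only when its priority is strictly higher. Both the helper `set_score` and the handling of ties are assumptions made for illustration, not rspamd internals.

```lua
-- Illustrative score table: name -> { value = number, priority = number }
local scores = {}

-- Store a score unless an entry with the same or higher priority exists
local function set_score(name, value, priority)
  local current = scores[name]
  if current == nil or priority > current.priority then
    scores[name] = { value = value, priority = priority }
  end
end

set_score("R_DKIM_ALLOW", -1.0, 0) -- hypothetical built-in default
set_score("R_DKIM_ALLOW", -0.5, 1) -- value from an update file wins
set_score("R_DKIM_ALLOW", -2.0, 0) -- lower priority, ignored

print(scores.R_DKIM_ALLOW.value)
```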

+ 0
- 72
doc/markdown/modules/spamassassin.md View file

@@ -1,72 +0,0 @@
# Spamassassin rules module

This module is designed to read and adapt spamassassin rules for rspamd.

## Overview

Spamassassin provides an excellent set of rules that are useful in some relatively
low volume environments. The goal of this plugin is to re-use the existing set
of spamassassin rules natively within rspamd. The configuration of this plugin
is very simple: just combine all your SA rules into a single file and feed it to
the spamassassin module:

~~~ucl
spamassassin {
ruleset = "/path/to/file";
# Limit search size to 100 kilobytes for all regular expressions
match_limit = 100k;
# Those regexp atoms will not be passed through hyperscan:
pcre_only = ["RULE1", "__RULE2"];
}
~~~

Rspamd can read multiple files containing SA rules; however, it doesn't support
glob patterns so far. All rules are parsed into the same structure, so individual
rules might be overwritten if they occur multiple times.

## Limitations and principles of work

Rspamd tries to optimize SA rules quite aggressively. Some of those optimizations
are described in the following [presentation](http://highsecure.ru/ast-rspamd.pdf).
To achieve this goal, rspamd treats all rules as `expression atoms`. Meta rules are
**real** rspamd rules that can have their own symbol and score. Other rules are normally
hidden. However, it is possible to specify a minimum score that is needed for a rule
to be treated as a normal rule:

alpha = 0.1

With this setting in the `spamassassin` section, all rules whose scores are higher than
`0.1` are treated not as atoms but as complete rules and are evaluated accordingly.
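
The effect of `alpha` can be sketched in plain Lua as follows. The rule names, the `meta` flag and the `classify` helper are hypothetical, and the sketch assumes a plain `score > alpha` comparison as stated in the text; the real plugin may differ in details.

```lua
-- Split rules into "full" rules (own symbol and score) and hidden atoms.
-- Meta rules are always full; other rules only when their score exceeds alpha.
local function classify(rules, alpha)
  local full, atoms = {}, {}
  for name, rule in pairs(rules) do
    if rule.meta or (rule.score ~= nil and rule.score > alpha) then
      full[#full + 1] = name
    else
      atoms[#atoms + 1] = name
    end
  end
  table.sort(full)
  table.sort(atoms)
  return full, atoms
end

-- Hypothetical rule set for illustration
local rules = {
  META_RULE = { meta = true, score = 2.0 },
  BODY_RULE = { score = 0.5 },
  __HELPER  = { score = 0.0 },
}

local full, atoms = classify(rules, 0.1)
print(table.concat(full, ","), table.concat(atoms, ","))
```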

Currently, rspamd supports the following functions:

* body, rawbody, meta, header, uri and other rules
* some header functions, such as `exists`
* some eval functions
* some plugins:
+ 'Mail::SpamAssassin::Plugin::FreeMail',
+ 'Mail::SpamAssassin::Plugin::HeaderEval',
+ 'Mail::SpamAssassin::Plugin::ReplaceTags',
+ 'Mail::SpamAssassin::Plugin::RelayEval',
+ 'Mail::SpamAssassin::Plugin::MIMEEval',
+ 'Mail::SpamAssassin::Plugin::BodyEval',
+ 'Mail::SpamAssassin::Plugin::MIMEHeader'

Rspamd does **not** support network plugins, HTML plugins and some other plugins.
Support for these is planned for future releases of rspamd.

Nevertheless, the vast majority of spamassassin rules can work in rspamd making
the migration process much smoother for those who decide to replace SA with rspamd.

The overall performance of rspamd, of course, goes down since SA rules contain a lot
of inefficient regular expressions that scan large text bodies. However, the optimizations
performed by rspamd can significantly reduce the amount of work required to process
SA rules. Moreover, if your PCRE library is built with JIT support, rspamd can benefit
significantly from it. On start, rspamd reports whether it can use JIT compilation and
warns if it cannot. Some regular expressions might also benefit from `hyperscan` support
that is available on x86_64 platforms starting from rspamd 1.1.

Spamassassin plugin is written in lua with many functional elements. Hence, to speed
it up you might want to build rspamd with [luajit](http://luajit.org) that performs
blazingly fast and is almost as fast as plain C. Luajit is enabled by default since
rspamd 0.9.

+ 0
- 34
doc/markdown/modules/spf.md View file

@@ -1,34 +0,0 @@
# SPF module

The SPF module checks the sender's [SPF](http://www.openspf.org/) policy.
Many mail providers use SPF records to define which hosts are eligible to send email
for a specific domain. There are many ways to create and use
SPF records; however, all of them check merely the sender's domain and the sender's IP.

A special case is automated messages from the special mailer daemon address:
`<>`. In this case rspamd uses `HELO` to obtain domain information, as specified in the
standard.

## Principles of work

`SPF` can be a powerful tool when properly used. However, it is very fragile in many
cases, for example when a message is redirected or reconstructed by mailing list software.

Moreover, many mail providers have no clear understanding of this technology and
misuse the SPF technique. Hence, the scores for SPF symbols are relatively small
in rspamd.

SPF uses the DNS service extensively, therefore rspamd maintains a cache of SPF records.
This cache operates on the principle of `least recently used` expiration. The lifetime of
each cached item is additionally limited by the time to live of the matching DNS record.

You can manually specify the size of this cache by configuring SPF module:

~~~ucl
spf {
spf_cache_size = 1k; # cache up to 1000 of the most recent SPF records
}
~~~
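
The caching behaviour described in this section (least recently used eviction plus a per-record time to live) can be sketched in plain Lua. This is a simplified model, not rspamd's implementation: the `Cache` type, its methods and the explicit `now` argument are all assumptions for illustration, and eviction here is a linear scan for brevity.

```lua
local Cache = {}
Cache.__index = Cache

function Cache.new(max_size)
  return setmetatable({ max_size = max_size, items = {}, size = 0, tick = 0 }, Cache)
end

-- Remove the entry that was used least recently (linear scan)
function Cache:evict()
  local oldest_key, oldest_used
  for key, entry in pairs(self.items) do
    if oldest_used == nil or entry.used < oldest_used then
      oldest_key, oldest_used = key, entry.used
    end
  end
  if oldest_key ~= nil then
    self.items[oldest_key] = nil
    self.size = self.size - 1
  end
end

-- Store a record; `ttl` is the DNS time to live in seconds
function Cache:put(key, value, ttl, now)
  if self.items[key] == nil then
    if self.size >= self.max_size then
      self:evict()
    end
    self.size = self.size + 1
  end
  self.tick = self.tick + 1
  self.items[key] = { value = value, expires = now + ttl, used = self.tick }
end

-- Fetch a record, dropping it if its TTL has elapsed
function Cache:get(key, now)
  local entry = self.items[key]
  if entry == nil then
    return nil
  end
  if now >= entry.expires then
    self.items[key] = nil
    self.size = self.size - 1
    return nil
  end
  self.tick = self.tick + 1
  entry.used = self.tick
  return entry.value
end
```

With `Cache.new(2)`, storing a third domain evicts whichever of the first two was read or written least recently, and any entry disappears once its TTL has passed regardless of how often it was used.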

Currently, rspamd supports the full set of SPF elements and macros, and has internal
protection from DNS recursion.

+ 0
- 184
doc/markdown/modules/surbl.md View file

@@ -1,184 +0,0 @@
# SURBL module

This module checks URLs found in messages against a set of known
DNS lists. It can add different symbols depending on the DNS replies from a
specific DNS URL list. It is also possible to resolve the domains of URLs and then
check the IP addresses against a normal `RBL` style list.

## Module configuration

The default configuration defines several public URL lists. However, their terms
of usage normally disallow commercial or very extensive usage without purchasing
a specific sort of license.

Nonetheless, they can be used free of charge by personal services or for low volume
requests.

~~~ucl
surbl {
# List of domains that are not checked by surbl
whitelist = "file://$CONFDIR/surbl-whitelist.inc";
# Additional exceptions for TLD rules
exceptions = "file://$CONFDIR/2tld.inc";

rule {
# DNS suffix for this rule
suffix = "multi.surbl.org";
symbol = "SURBL_MULTI";
bits {
# List of bits ORed when reply is given
JP_SURBL_MULTI = 64;
AB_SURBL_MULTI = 32;
MW_SURBL_MULTI = 16;
PH_SURBL_MULTI = 8;
WS_SURBL_MULTI = 4;
SC_SURBL_MULTI = 2;
}
}
rule {
suffix = "multi.uribl.com";
symbol = "URIBL_MULTI";
bits {
URIBL_BLACK = 2;
URIBL_GREY = 4;
URIBL_RED = 8;
}
}
rule {
suffix = "uribl.rambler.ru";
# Also check images
images = true;
symbol = "RAMBLER_URIBL";
}
rule {
suffix = "dbl.spamhaus.org";
symbol = "DBL";
# Do not check numeric URL's
noip = true;
}
rule {
suffix = "uribl.spameatingmonkey.net";
symbol = "SEM_URIBL_UNKNOWN";
bits {
SEM_URIBL = 2;
}
noip = true;
}
rule {
suffix = "fresh15.spameatingmonkey.net";
symbol = "SEM_URIBL_FRESH15_UNKNOWN";
bits {
SEM_URIBL_FRESH15 = 2;
}
noip = true;
}
}
~~~

In general, the configuration of the `surbl` module is a definition of DNS lists. Each
list must have a suffix that defines the list itself; optionally, for some lists
it is possible to specify either a `bits` or an `ips` section.

Since some URL lists do not accept `IP` addresses, it is also possible to disable sending of URLs with an IP address in the host to such lists. This can be done by specifying the `noip = true` option:

~~~ucl
rule {
suffix = "dbl.spamhaus.org";
symbol = "DBL";
# Do not check numeric URL's
noip = true;
}
~~~

It is also possible to check the URLs of HTML images against URL blacklists. Just specify `images = true` for such a list:

~~~ucl
rule {
suffix = "uribl.rambler.ru";
# Also check images
images = true;
symbol = "RAMBLER_URIBL";
}
~~~

## Principles of operation

In this section, we define how `surbl` module performs its checks.

### TLD composition

Usually we want to check the registered domain of a URL rather than the full host name. However,
some top level domains have a single component while others have two or even more. By default,
rspamd takes the top level domain as defined in the [public suffix list](https://publicsuffix.org)
and prepends one more component, for example:

sub.example.com -> [.com] -> example.com
sub.co.uk -> [.co.uk] -> sub.co.uk

However, sometimes even more levels of domain components are required. In this case,
the `exceptions` map can be used. For example, if we want to check all subdomains of
`example.com` and `example.co.uk`, then we can define the following list:

example.com
example.co.uk

Here are new composition rules:

sub.example.com -> [.example.com] -> sub.example.com
sub1.sub2.example.co.uk -> [.example.co.uk] -> sub2.example.co.uk
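
The composition rules in this section can be sketched in plain Lua. The tables `suffixes` and `exceptions` are tiny hypothetical stand-ins for the public suffix list and the `exceptions` map, and `compose` is an illustrative helper rather than the module's actual code.

```lua
-- Hypothetical stand-ins for the public suffix list and the exceptions map
local suffixes = { ["com"] = true, ["co.uk"] = true }
local exceptions = { ["example.com"] = true, ["example.co.uk"] = true }

-- Merge two sets into a new one
local function union(a, b)
  local out = {}
  for k in pairs(a) do out[k] = true end
  for k in pairs(b) do out[k] = true end
  return out
end

local function split_labels(host)
  local labels = {}
  for label in host:gmatch("[^.]+") do
    labels[#labels + 1] = label
  end
  return labels
end

-- Find the longest known suffix of `host` and prepend one more label
local function compose(host, known)
  local labels = split_labels(host:lower())
  for start = 2, #labels do
    local candidate = table.concat(labels, ".", start)
    if known[candidate] then
      return labels[start - 1] .. "." .. candidate
    end
  end
  return host -- no known suffix: leave the host unchanged
end

print(compose("sub.example.com", suffixes))                    --> example.com
print(compose("sub.co.uk", suffixes))                          --> sub.co.uk
print(compose("sub.example.com", union(suffixes, exceptions))) --> sub.example.com
```

Trying the longest candidate first is what lets an exception such as `example.co.uk` take precedence over the shorter public suffix `co.uk`.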

### DNS composition

SURBL module composes the DNS request of two parts:

- TLD component as defined in the previous section;
- DNS list suffix

For example, to form a request to multi.surbl.org, the following is applied:

example.com -> example.com.multi.surbl.org

### Results parsing

Normally, DNS blacklists encode the reply in an A record from the loopback network
(namely, `127.0.0.0/8`). Encoding varies from one service to another. Some lists
use bit encoding, where a single DNS list or error message is encoded as a bit
in the least significant octet of the IP address. For example, if bit 1 encodes `LISTA`
and bit 2 encodes `LISTB`, then the reply is the bitwise `OR` of the bits that apply,
and we test each specific bit to decode the reply:

127.0.0.3 -> LISTA | LISTB -> both bit symbols are added
127.0.0.2 -> LISTB only
127.0.0.1 -> LISTA only

This encoding saves DNS requests compared with querying multiple lists one at a time.

Some other lists encode results directly as specific addresses. In this
case you should define the decoding rules in an `ips` section rather than `bits`, since
bitwise rules are not applicable to these lists. In the `ips` section you explicitly
match the IP address returned by a list to its meaning.
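
Both decoding styles can be sketched in plain Lua. The helpers `decode_bits` and `decode_ips` are hypothetical names; the bit test uses arithmetic instead of bitwise operators so that the sketch runs on any Lua version.

```lua
-- Extract the least significant octet of a dotted IPv4 address
local function last_octet(ip)
  local octet = ip:match("^%d+%.%d+%.%d+%.(%d+)$")
  return tonumber(octet)
end

-- True if `bit` (a power of two) is set in `value`
local function has_bit(value, bit)
  return math.floor(value / bit) % 2 == 1
end

-- `bits` maps symbol -> bit value, as in a `bits` section
local function decode_bits(ip, bits)
  local found = {}
  local octet = last_octet(ip)
  if octet == nil then
    return found
  end
  for symbol, bit in pairs(bits) do
    if has_bit(octet, bit) then
      found[#found + 1] = symbol
    end
  end
  table.sort(found)
  return found
end

-- `ips` maps symbol -> exact address, as in an `ips` section
local function decode_ips(ip, ips)
  local found = {}
  for symbol, address in pairs(ips) do
    if address == ip then
      found[#found + 1] = symbol
    end
  end
  table.sort(found)
  return found
end

local bits = { LISTA = 1, LISTB = 2 }
print(table.concat(decode_bits("127.0.0.3", bits), ",")) --> LISTA,LISTB
print(table.concat(decode_bits("127.0.0.2", bits), ",")) --> LISTB
```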

## IP lists

From rspamd 1.1 it is also possible to do two step checks:

1. Resolve IP addresses of each URL
2. Check each IP resolved against SURBL list

In general this procedure can be represented as follows:

* Check `A` or `AAAA` records for `example.com`
* For each IP address, query the list using the reversed octets: if the IP address of `example.com` is `1.2.3.4`, then the check would be for `4.3.2.1.uribl.tld`
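
The reversed-octet query name can be built with a few lines of plain Lua. This sketch covers IPv4 only, and the helper names `reverse_ipv4` and `rbl_query` are illustrative.

```lua
-- Reverse the octets of a dotted IPv4 address, or return nil if malformed
local function reverse_ipv4(ip)
  local a, b, c, d = ip:match("^(%d+)%.(%d+)%.(%d+)%.(%d+)$")
  if a == nil then
    return nil
  end
  return table.concat({ d, c, b, a }, ".")
end

-- Build the DNS name to query for `ip` on the list with the given suffix
local function rbl_query(ip, suffix)
  local reversed = reverse_ipv4(ip)
  if reversed == nil then
    return nil
  end
  return reversed .. "." .. suffix
end

print(rbl_query("1.2.3.4", "uribl.tld")) --> 4.3.2.1.uribl.tld
```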

For example, the [SBL list](https://www.spamhaus.org/sbl/) of the `spamhaus` project provides such a function via the `ZEN` multi list. This is included in the rspamd default configuration:

~~~ucl
rule {
suffix = "zen.spamhaus.org";
symbol = "ZEN_URIBL";
resolve_ip = true;
ips {
URIBL_SBL = "127.0.0.2";
}
}
~~~

+ 0
- 39
doc/markdown/modules/trie.md View file

@@ -1,39 +0,0 @@
# Trie plugin

The trie plugin is designed to search for multiple strings within raw messages or text parts
very quickly. It uses the Aho-Corasick algorithm, which performs well
even on large texts and many input strings.

This module provides a convenient interface to the search trie structure.

## Configuration

Here is an example of trie configuration:

~~~ucl
trie {
# Each subsection defines a single rule with associated symbol
SYMBOL1 {
# Define rules in the file (it is *NOT* a map)
file = "/some/path";
# Raw rules search within the whole undecoded messages
raw = true;
# If we have multiple occurrences of strings from this rule
# then we insert a symbol multiple times
multi = true;
}
SYMBOL2 {
patterns = [
"pattern1",
"pattern2",
"pattern3"
]
}
}
~~~

Although the Aho-Corasick trie is very fast, it supports only plain
strings. Moreover, it cannot distinguish word boundaries: for example, the string
`test` will be found in the texts `test`, `tests` or even `123testing`. Therefore, it
should be used to search for concrete and relatively specific patterns and should
not be used for word matching.
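
The word boundary limitation is easy to reproduce with plain Lua string functions. The sketch below does not use the trie itself: `contains_plain` mimics plain substring search, while `contains_word` shows a boundary-aware alternative using Lua frontier patterns. Both helper names are hypothetical, and `contains_word` assumes the word consists of alphanumeric characters only.

```lua
-- Plain substring search, comparable to what a plain-string matcher reports
local function contains_plain(text, needle)
  return text:find(needle, 1, true) ~= nil
end

-- Boundary-aware search: the word must not be surrounded by alphanumerics
local function contains_word(text, word)
  return text:find("%f[%w]" .. word .. "%f[%W]") ~= nil
end

for _, text in ipairs({ "test", "tests", "123testing" }) do
  print(text, contains_plain(text, "test"), contains_word(text, "test"))
end
```

Only the first text contains `test` as a separate word, although all three contain it as a substring.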

+ 0
- 119
doc/markdown/modules/whitelist.md View file

@@ -1,119 +0,0 @@
# Whitelist module

The whitelist module is intended to decrease or increase scores for messages that are known to
come from trusted sources. Due to design flaws in the `SMTP` protocol, it is quite easy to
forge the sender. Therefore, rspamd tries to validate the sender based on the following additional
properties:

- `DKIM`: a message has a valid DKIM signature for this domain
- `SPF`: a message matches SPF record for the domain
- `DMARC`: a message also satisfies the domain's DMARC policy (usually implies SPF and DKIM)

## Whitelist setup

Whitelist configuration is quite straightforward. You can define a set of rules within the
`rules` section. Each rule **must** have a `domains` attribute that specifies either
a map of domains (if specified as a string) or a direct list of domains (if specified as an array).

### Whitelist constraints

The following constraints are allowed:

- `valid_spf`: require a valid SPF policy
- `valid_dkim`: require DKIM validation
- `valid_dmarc`: require a valid DMARC policy

### Whitelist rules modes

Each whitelist rule can work in 3 modes:

- `whitelist` (default): add the symbol when a domain has been found and one of the defined constraints is satisfied (e.g. `valid_dmarc`)
- `blacklist`: add the symbol when a domain has been found and one of the defined constraints is *NOT* satisfied (e.g. `valid_dmarc`)
- `strict`: add the symbol with a negative (ham) score when a domain has been found and one of the defined constraints is satisfied (e.g. `valid_dmarc`), and add the symbol with a **POSITIVE** (spam) score when some of the defined constraints have failed

If you do not define any constraints, then both `strict` and `whitelist` rules just insert the result for all mail from the specified domains. For `blacklist` rules the result normally has a positive score.

These options are combined using the `AND` operator for `whitelist` rules and using `OR` for `blacklist` and `strict` rules. Therefore, setting both `valid_dkim = true` and
`valid_spf = true` requires both DKIM and SPF validation to whitelist domains from
the list. In contrast, for blacklist and strict rules any violation causes a symbol with a positive score to be inserted.
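
The combination logic can be sketched in plain Lua. This follows the `AND`/`OR` description in the paragraph above and assumes the sender's domain has already been found in the rule's list. The function `evaluate` and its return values (`"ham"`, `"spam"` or `nil` for no symbol) are hypothetical.

```lua
-- mode:     "whitelist", "blacklist" or "strict"
-- required: list of constraint names enabled in the rule
-- passed:   table mapping constraint name -> true when validation succeeded
local function evaluate(mode, required, passed)
  local all_ok = true
  for _, name in ipairs(required) do
    if not passed[name] then
      all_ok = false
      break
    end
  end

  if mode == "whitelist" then
    return all_ok and "ham" or nil       -- every constraint must hold
  elseif mode == "blacklist" then
    return (not all_ok) and "spam" or nil -- any violation triggers
  elseif mode == "strict" then
    return all_ok and "ham" or "spam"
  end
  return nil
end

local required = { "valid_spf", "valid_dkim" }
print(evaluate("whitelist", required, { valid_spf = true, valid_dkim = true }))
print(evaluate("strict", required, { valid_spf = true }))
```

With no constraints in `required`, the loop never runs, so `whitelist` and `strict` rules always yield a result, which matches the behaviour described for rules without constraints.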

### Optional settings

You can also set the default metric settings using the ordinary attributes, such as:

- `score`: default score
- `group`: default group (`whitelist` group is used if not specified explicitly)
- `one_shot`: default one shot mode
- `description`: default description

Within lists, you can also use an optional `multiplier` argument that defines an additional
multiplier for the score added by this module. For example, let's define a score twice
as large for `github.com`:

["github.com", 2.0]

or if using map:

github.com 2.0
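
As a rough illustration of how a multiplier affects the final score, the following plain Lua sketch parses a map line and applies the multiplier to a rule score. The helpers `parse_entry` and `effective_score` are hypothetical, and a missing multiplier is assumed to mean `1.0`.

```lua
-- Parse "domain" or "domain multiplier" into its two parts
local function parse_entry(line)
  local domain, multiplier = line:match("^(%S+)%s+(%S+)$")
  if domain then
    return domain, tonumber(multiplier) or 1.0
  end
  return line:match("^(%S+)$"), 1.0
end

-- The symbol's score is the rule score scaled by the multiplier
local function effective_score(rule_score, multiplier)
  return rule_score * multiplier
end

local domain, multiplier = parse_entry("github.com 2.0")
print(domain, effective_score(-1.0, multiplier))
```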

## Configuration example

~~~ucl
whitelist {
rules {
WHITELIST_SPF = {
valid_spf = true;
domains = [
"github.com",
];
score = -1.0;
}

WHITELIST_DKIM = {
valid_dkim = true;
domains = [
"github.com",
];
score = -2.0;
}

WHITELIST_SPF_DKIM = {
valid_spf = true;
valid_dkim = true;
domains = [
["github.com", 2.0],
];
score = -3.0;
}

STRICT_SPF_DKIM = {
valid_spf = true;
valid_dkim = true;
strict = true;
domains = [
["paypal.com", 2.0],
];
score = -3.0; # For strict rules negative score should be defined
}

BLACKLIST_DKIM = {
valid_spf = true;
valid_dkim = true;
blacklist = true;
domains = "/some/file/blacklist_dkim.map";
score = 3.0; # Mention positive score here
}

WHITELIST_DMARC_DKIM = {
valid_dkim = true;
valid_dmarc = true;
domains = [
"github.com",
];
score = -7.0;
}
}
}
~~~

Rspamd also comes with a set of pre-defined whitelisted domains that could be useful for start.

+ 0
- 9
doc/markdown/tutorials/index.md View file

@@ -1,9 +0,0 @@
# Rspamd tutorials

In this section you can find the current step-by-step tutorials covering various topics about rspamd.

* [Migrating from SA](migrate_sa.md) - the guide for those who want to migrate an existing SpamAssassin system to Rspamd
* [Writing rspamd rules](writing_rules.md) - how to extend rspamd by writing your own rules
* [Creating your fuzzy storage](http://rspamd.com/doc/fuzzy_storage.html) - learn how to make your own fuzzy storage
* [Training rspamd with dovecot antispam plugin, part 1](https://kaworu.ch/blog/2014/03/25/dovecot-antispam-with-rspamd/) - this tutorial describes how to train rspamd automatically using the `antispam` plugin of the `dovecot` IMAP server
* [Training rspamd with dovecot antispam plugin, part 2](https://kaworu.ch/blog/2015/10/12/dovecot-antispam-with-rspamd-part2/) - continuation of the previous tutorial

+ 0
- 82
doc/markdown/tutorials/migrate_sa.md View file

@@ -1,82 +0,0 @@
# Migrating from SpamAssassin to Rspamd

This guide provides information for those who want to migrate an existing system from [SpamAssassin](https://spamassassin.apache.org) to Rspamd. You will find information about major differences between the spam filtering engines and how to deal with the transition process.

## Why migrate to Rspamd

Rspamd runs **significantly faster** than SpamAssassin while providing approximately the same quality of filtering. However, if you don't care about the performance and resource consumption of your spam filtering engine you might still find Rspamd useful because it has a simple but powerful web management system (WebUI).

On the other hand, if you have a lot of custom rules, or you use Pyzor/Razor/DCC, or you have some commercial 3rd party products that depend on SpamAssassin then you may not want to migrate.

In short: Rspamd is for **speed**!

## What about dspam/spamoracle...?

You could also move from these projects to Rspamd. You should bear in mind, however, that Rspamd and SA are multi-factor spam filtering systems that use three main approaches to filter messages:

* Content filtering - static rules that are designed to find known bad patterns in messages (usually regexp or other custom rules)
* Dynamic lists - DNS or reputation lists that are used to filter known bad content, such as abused IP addresses or URL domains
* Statistical filters - which learn to distinguish spam and ham messages

`dspam`, `spamoracle` and others usually implement the third approach, only providing statistical filtering. This method is quite powerful but it can cause false-positives and is not very suitable for multi-user environments. Rspamd and SA, in contrast, are designed for systems with many users. Rspamd, in particular, was written for a very large system with more than 40 million users and about 10 million emails per hour.

## Before you start

There are several things you need to know before the transition:

1. Rspamd does not support SpamAssassin statistics so you'd need to **train** your filter from scratch with spam and ham samples (or install the [pre-built statistics](https://rspamd.com/rspamd_statistics/)). Rspamd uses a different statistical engine - called [OSB-Bayes](http://osbf-lua.luaforge.net/papers/trec2006_osbf_lua.pdf) - which is intended to be more precise than SA's 'naive' Bayes classifier
2. Rspamd uses `Lua` for plugins and rules, so basic knowledge of this language is more than useful for playing with Rspamd; however, Lua is very simple and can be learned [very quickly](http://lua-users.org/wiki/LuaTutorial)
3. Rspamd uses the `HTTP` protocol to communicate with the MTA or milter, so SA native milters might not communicate with Rspamd. There is some limited support of the SpamAssassin protocol, though some commands are not supported, in particular those which require copying of data between scanner and milter. More importantly, `Length`-less messages are not supported by Rspamd as they completely break HTTP semantics and will never be supported. To achieve the same functionality, a dedicated scanner could use, e.g. HTTP `chunked` encoding.
4. Rspamd is **NOT** intended to work with blocking libraries or services, hence, something like `mysql` or `postgresql` will likely not be supported
5. Rspamd is developing quickly so you should be aware that there might be some incompatible changes between major versions - they are usually listed in the [migration](../migration.md) section of the site.
6. Unlike SA where there are only `spam` and `ham` results, Rspamd supports five levels of messages called `actions`:
+ `no action` - ham message
+ `greylist` - turn on adaptive greylisting (which is also used on higher levels)
+ `add header` - adds Spam header (meaning soft-spam action)
+ `rewrite subject` - rewrite subject to `*** SPAM *** original subject`
+ `reject` - ultimately reject message

Each action can have its own score limit which could also be modified by a user's settings. Rspamd assumes the following order of actions: `no action` <= `greylist` <= `add header` <= `rewrite subject` <= `reject`.
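
The ordering of actions can be sketched in plain Lua as a lookup from a message score to the strongest action whose limit has been reached. The threshold values below are hypothetical, the comparison `score >= limit` is an assumption, and `pick_action` is an illustrative helper, not part of Rspamd.

```lua
-- Actions from strongest to weakest; "no action" is the fallback
local order = { "reject", "rewrite subject", "add header", "greylist" }

-- Return the strongest action whose score limit is reached
local function pick_action(score, limits)
  for _, name in ipairs(order) do
    local limit = limits[name]
    if limit ~= nil and score >= limit then
      return name
    end
  end
  return "no action"
end

-- Hypothetical limits; actions without a limit are simply skipped
local limits = { ["greylist"] = 4, ["add header"] = 6, ["reject"] = 15 }

for _, score in ipairs({ 1, 5, 6, 20 }) do
  print(score, pick_action(score, limits))
end
```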

Actions are **NOT** performed by Rspamd itself - they are just recommendations for the MTA agent, rmilter for example, that performs the necessary actions such as adding headers or rejecting mail.

SA `spam` is almost equal to the Rspamd `add header` action in the default setup. With this action, users will be able to check messages in their `Junk` folder, which is usually a desired behaviour.

## First steps with Rspamd

To install Rspamd, I recommend using one of the [official packages](https://rspamd.com/downloads.html) that are available for many popular platforms. If you'd like to have more features then you can consider the `experimental` branch of packages, while if you would like to have more stability then you can select the `stable` branch. However, normally even the `experimental` branch is stable enough for production use, and bugs are fixed more quickly in the `experimental` branch.

## General SpamAssassin rules

For those who have a lot of custom rules, there is good news: Rspamd supports a certain set of SpamAssassin rules via a special [plugin](../modules/spamassassin.md) that allows **direct** loading of SA rules into Rspamd. You just need to specify your SA configuration files in the plugin configuration:

~~~ucl
spamassassin {
sa_main = "/etc/spamassassin/conf.d/*";
sa_local = "/etc/spamassassin/local.cf";
}
~~~

On the other hand, if you don't have a lot of custom rules and primarily use the default ruleset then you shouldn't use this plugin: many SA rules are already implemented natively in Rspamd so you won't get any benefit from including such rules from SA.

## Integration

If you have your SA up and running it is usually possible to switch the system to Rspamd using the existing tools. However, please check the [integration document](https://rspamd.com/doc/integration.html) for further details.

## Statistics

Rspamd statistics are not compatible with SA as Rspamd uses a more advanced statistics algorithm, described in the following [article](http://osbf-lua.luaforge.net/papers/trec2006_osbf_lua.pdf), so please bear in mind that you need to **relearn** your statistics. This can be done, for example, by using the `rspamc` command: assuming that you have your messages in separate files (e.g. `maildir` format), placed in directories `spam` and `ham`:

rspamc learn_spam spam/
rspamc learn_ham ham/

(You will need Rspamd up and running to use these commands.)

### Learning using mail interface

You can also set up rspamc to learn via passing messages to a certain email address. I'd recommend using `/etc/aliases` for this purpose and a `mail-redirect` command (e.g. provided by [Mail Redirect addon](https://addons.mozilla.org/en-GB/thunderbird/addon/mailredirect/) for `thunderbird` MUA). The desired aliases could be the following:

learn-spam123: "| rspamc learn_spam"
learn-ham123: "| rspamc learn_ham"

(You would need to use less predictable aliases to avoid the sending of messages to such addresses by an adversary, or just by mistake, to prevent statistics pollution.)

+ 0
- 481
doc/markdown/tutorials/writing_rules.md View file

@@ -1,484 +0,0 @@
# Writing Rspamd rules

In this tutorial, I describe how to create new rules for Rspamd - both Lua and regexp rules.

## Introduction

Rules are the essential part of a spam filtering system and Rspamd ships with some prepared rules by default. However, if you run your own system you might want to have your own rules for better spam filtering or a better false positives rate. Rules are usually written in `Lua`, where you can specify both custom logic and generic regular expressions.

## Configuration files

Since Rspamd ships with its own rules it is a good idea to store your custom rules and configuration in separate files to avoid clashing with the default rules which might change from version to version. There are some possibilities to achieve this:

- Local rules in Lua should be stored in the file named `${CONFDIR}/lua/rspamd.local.lua` where `${CONFDIR}` is the directory where your configuration files are placed (e.g. `/etc/rspamd`, or `/usr/local/etc/rspamd` for some systems)
- Local configuration that **adds** options to Rspamd should be placed in `${CONFDIR}/rspamd.conf.local`
- Local configuration that **overrides** the default settings should be placed in `${CONFDIR}/rspamd.conf.override`

Lua local configuration can be used to both override and extend:

`rspamd.lua`:

~~~lua
config['regexp']['symbol'] = '/some_re/'
~~~

`rspamd.local.lua`:

~~~lua
config['regexp']['symbol1'] = '/other_re/' -- add 'symbol1' key to the table
config['regexp']['symbol'] = '/override_re/' -- replace regexp for 'symbol'
~~~

For configuration rules you can take a look at the following examples:

`rspamd.conf`:

~~~ucl
var1 = "value1";

section "name" {
var2 = "value2";
}
~~~

`rspamd.conf.local`:

~~~ucl
var1 = "value2";

section "name" {
var3 = "value3";
}
~~~

Resulting config:

~~~ucl
var1 = "value1";
var1 = "value2";

section "name" {
var2 = "value2";
}
section "name" {
var3 = "value3";
}
~~~

Override example:

`rspamd.conf`:

~~~ucl
var1 = "value1";

section "name" {
var2 = "value2";
}
~~~

`rspamd.conf.override`:

~~~ucl
var1 = "value2";

section "name" {
var3 = "value3";
}
~~~

Resulting config:

~~~ucl
var1 = "value2";

# Note that var2 is removed completely

section "name" {
var3 = "value3";
}
~~~

For each individual configuration file shipped with Rspamd, there are two special includes:

.include(try=true,priority=1) "$CONFDIR/local.d/config.conf"
.include(try=true,priority=1) "$CONFDIR/override.d/config.conf"

Therefore, you can either extend (using local.d) or ultimately override (using override.d) any settings in the Rspamd configuration.

For example, let's override some default symbols shipped with Rspamd. To do that we can create and edit `/etc/rspamd/local.d/metrics.conf`:

symbol "BLAH" {
score = 20.0;
}

We can also use an override file. For example, let's redefine actions and set a more restrictive `reject` score. To do this, we create `/etc/rspamd/override.d/metrics.conf` with the following content:

actions {
reject = 150;
add_header = 6;
greylist = 4;
}

Note that you need to define the complete set of actions to redefine an existing one. For example, you **cannot** write something like

actions {
reject = 150;
}

as this will set the other actions (`add_header` and `greylist`) as undefined.

## Writing rules

There are two types of rules that are normally defined by Rspamd:

- `Lua` rules: code written in Lua
- `Regexp` rules: regular expressions and combinations of regular expressions to match specific patterns

Lua rules are useful for some complex tasks: check DNS, query redis or HTTP, examine some task-specific details. Regexp rules are useful since they are heavily optimized by Rspamd (especially when `hyperscan` is enabled) and allow matching custom patterns in headers, urls, text parts and even the entire message body.

### Rule weights

Rule weights are usually defined in the `metrics` section and contain the following data:

- score triggers for different actions
- symbol scores
- symbol descriptions
- symbol group definitions:
+ symbols in group
+ description of group
+ joint group score limit

For built-in rules scores are placed in the file called `${CONFDIR}/metrics.conf`, however, you have two possibilities to define scores for your rules:

1. Define scores in `rspamd.conf.local` as following:

~~~ucl
metric "default" {
symbol "MY_SYMBOL" {
description = "my cool rule";
score = 1.5;
}
}
~~~

2. Define scores directly in Lua when describing symbol:

~~~lua
config['regexp']['MY_SYMBOL'] = {
re = '/a/M & From=/blah/',
score = 1.5,
description = 'my cool rule',
group = 'my symbols'
}

rspamd_config.MY_LUA_SYMBOL = {
callback = function(task)
-- Do something
return true
end,
score = -1.5,
description = 'another cool rule',
group = 'my symbols'
}
~~~

## Regexp rules

Regexp rules are executed by the `regexp` module of Rspamd. You can find a detailed description of the syntax in [the regexp module documentation](../modules/regexp.md)

Here are some hints to maximise performance of your regexp rules:

* Prefer lightweight regexps, such as header or url, to heavy ones, such as mime or body regexps
* If you need to match text in a message's content, prefer `mime` regexps as they are executed on text content only
* If you **really** need to match the whole messages, then you might consider using the [trie](../modules/trie.md) module as it is significantly faster
* Avoid complex regexps, avoid backtracking, avoid negative groups `(?!)`, avoid capturing patterns (replace with `(?:)`), avoid potentially empty patterns, e.g. `/^.*$/`

Following these hints allows you to create fast and efficient rules. To add regexp rules you should use the `config` global table that is defined in any Lua file used by Rspamd:

~~~lua
config['regexp'] = {} -- Remove all regexp rules (including internal ones)
local reconf = config['regexp'] -- Create alias for regexp configs

local re1 = 'From=/foo@/H' -- Mind local here
local re2 = '/blah/P'

reconf['SYMBOL'] = {
re = string.format('(%s) && !(%s)', re1, re2), -- use string.format to create expression
score = 1.2,
description = 'some description',

condition = function(task) -- run this rule only if some condition is satisfied
return true
end,
}
~~~

## Lua rules

Lua rules are more powerful than regexp ones but they are not as heavily optimized and can cause performance issues if written incorrectly. All Lua rules accept a special parameter called `task` which represents a scanned message.

### Return values

Each Lua rule can return 0, or false, meaning that the rule has not matched, or true if the symbol should be inserted. In fact, you can return any positive or negative number which would be multiplied by the rule's score, e.g. if the rule score is `1.2`, then when your function returns `1` the symbol will have a score of `1.2`, and when your function returns `2.0` then the symbol will have a score of `2.4`.
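
The arithmetic above can be written down as a small plain Lua sketch. The function `symbol_score` is a hypothetical helper that models how a callback's return value maps to the inserted score; it returns `nil` when no symbol would be inserted.

```lua
-- rule_score: the score configured for the rule
-- result:     what the rule callback returned
local function symbol_score(rule_score, result)
  if result == nil or result == false or result == 0 then
    return nil -- rule has not matched, nothing is inserted
  end
  if result == true then
    return rule_score
  end
  return rule_score * result -- numeric results scale the score
end

print(symbol_score(1.2, true))
print(symbol_score(1.2, 2.0))
print(symbol_score(1.2, false))
```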

### Rule conditions

Like regexp rules, Lua rules can have conditions, for example:

~~~lua
rspamd_config.SYMBOL = {
  callback = function(task)
    return 1
  end,
  score = 1.2,
  description = 'some description',

  condition = function(task) -- run this rule only if some condition is satisfied
    return true
  end,
}
~~~
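The effect of a condition can be sketched with a toy runner in plain Lua. Everything here is an assumption made for illustration: `run_rule` is not how Rspamd schedules rules internally, and the `size` field stands in for whatever a real condition would read from the task.

```lua
-- Toy runner, not Rspamd internals: a rule's callback runs only when its
-- optional `condition` function returns a true value.
local function run_rule(rule, task)
  if rule.condition and not rule.condition(task) then
    return nil -- condition failed: the callback is skipped entirely
  end
  return rule.callback(task)
end

local calls = 0
local rule = {
  callback = function(task)
    calls = calls + 1
    return 1
  end,
  -- Hypothetical condition: only check messages larger than 100 bytes
  condition = function(task)
    return task.size > 100
  end,
}

print(run_rule(rule, { size = 10 }))  -- nil
print(run_rule(rule, { size = 500 })) -- 1
print(calls)                          -- 1
```

A cheap condition is a convenient way to keep an expensive callback from running on messages it cannot match.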

### Useful task manipulations

There are a number of methods in [task](../lua/task.md) objects. For example, you can get any part of a message:

~~~lua
rspamd_config.HTML_MESSAGE = {
  callback = function(task)
    local parts = task:get_text_parts()

    if parts then
      for i,p in ipairs(parts) do
        if p:is_html() then
          return 1
        end
      end
    end

    return 0
  end,
  score = -0.1,
  description = 'HTML included in message',
}
~~~

You can get HTML information:

~~~lua
local function check_html_image(task, min, max)
  local tp = task:get_text_parts()

  if not tp then
    return false -- no text parts to inspect
  end

  for _,p in ipairs(tp) do
    if p:is_html() then
      local hc = p:get_html()
      local len = p:get_length()

      if len >= min and len < max then
        local images = hc:get_images()
        if images then
          for _,i in ipairs(images) do
            if i['embedded'] then
              return true
            end
          end
        end
      end
    end
  end

  return false
end

rspamd_config.HTML_SHORT_LINK_IMG_1 = {
  callback = function(task)
    return check_html_image(task, 0, 1024)
  end,
  score = 3.0,
  group = 'html',
  description = 'Short html part (0..1K) with a link to an image'
}
~~~

You can get message headers with full information passed:

~~~lua
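local rspamd_regexp = require "rspamd_regexp"
-- `subject_re` was not defined in the original snippet; this pattern is an
-- assumption that captures the subject without leading Re:/Fwd: prefixes
local subject_re = rspamd_regexp.create('/^(?:(?:Re|Fwd|Fw):\\s*)+(.+)$/i')

rspamd_config.SUBJ_ALL_CAPS = {
  callback = function(task)
    local util = require "rspamd_util"
    local sbj = task:get_header('Subject')

    if sbj then
      local stripped_subject = subject_re:search(sbj, false, true)
      if stripped_subject and stripped_subject[1] and stripped_subject[1][2] then
        sbj = stripped_subject[1][2]
      end

      if util.is_uppercase(sbj) then
        return true
      end
    end

    return false
  end,
  score = 3.0,
  group = 'headers',
  description = 'All capital letters in subject'
}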
~~~

You can also access HTTP headers, urls and other useful properties of Rspamd tasks. Moreover, you can use global convenience modules exported by Rspamd, such as [rspamd_util](../lua/util.md) or [rspamd_logger](../lua/logger.md) by requiring them in your rules:

~~~lua
rspamd_config.SUBJ_ALL_CAPS = {
  callback = function(task)
    local util = require "rspamd_util"
    local logger = require "rspamd_logger"
    ...
  end,
}
~~~

## Rspamd symbols

Rspamd rules fall under three categories:

1. Pre-filters - run before other rules
2. Filters - run normally
3. Post-filters - run after all checks

The most common type of rules are generic filters. Each filter is basically a callback that is executed by Rspamd at some time, along with an optional symbol name associated with this callback. In general, there are three options to register symbols:

* register callback and associated symbol
* register just a plain callback
* register symbol with no callback (*virtual* symbol)

The last option is useful when you have a single callback but with different possible results; for example `SYMBOL_ALLOW` or `SYMBOL_DENY`. Filters are registered with three methods:

* `rspamd_config:register_symbol('SYMBOL', nominal_weight, callback)` - registers normal symbol
* `rspamd_config:register_callback_symbol(nominal_weight, callback)` - registers callback only symbol
* `rspamd_config:register_virtual_symbol('SYMBOL', nominal_weight, id)` - registers virtual symbol

`nominal_weight` is used to define priority and the initial score multiplier. It should usually be `1.0` for normal symbols and `-1.0` for symbols with negative scores that should be executed before other symbols. Here is an example of registering one callback and a couple of virtual symbols used in the [dmarc](../modules/dmarc.md) module:

~~~lua
local id = rspamd_config:register_callback_symbol('DMARC_CALLBACK', 1.0,
  dmarc_callback)
rspamd_config:register_virtual_symbol('DMARC_POLICY_ALLOW', -1, id)
rspamd_config:register_virtual_symbol('DMARC_POLICY_REJECT', 1, id)
rspamd_config:register_virtual_symbol('DMARC_POLICY_QUARANTINE', 1, id)
rspamd_config:register_virtual_symbol('DMARC_POLICY_SOFTFAIL', 1, id)
rspamd_config:register_dependency(id, symbols['spf_allow_symbol'])
rspamd_config:register_dependency(id, symbols['dkim_allow_symbol'])
~~~

Numeric `id` is returned by a registration function with callbacks (`register_symbol` or `register_callback_symbol`) and can be used to link symbols:

* add virtual symbols associated with this callback
* correctly display average time for symbols without callbacks
* properly sort symbols
* register dependencies on virtual symbols (in fact, the true dependency is created based on the parent symbol but it is sometimes convenient to use virtual symbols for simplicity)
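The role of the numeric `id` can be sketched with a toy registry in plain Lua. This is not Rspamd's real symbol cache: the `registry` table and its `callback_for` method are made up for illustration, and only the two registration method names mirror the ones described above.

```lua
-- Toy registry (not Rspamd's real symbol cache): shows how the numeric id
-- returned for a callback links virtual symbols to their parent.
local registry = { items = {} }

function registry:register_callback_symbol(name, weight, callback)
  local id = #self.items + 1
  self.items[id] = { name = name, weight = weight, callback = callback }
  return id
end

function registry:register_virtual_symbol(name, weight, parent_id)
  assert(self.items[parent_id], 'unknown parent id')
  local id = #self.items + 1
  self.items[id] = { name = name, weight = weight, parent = parent_id }
  return id
end

-- Find the callback that is responsible for a symbol
function registry:callback_for(name)
  for _, item in ipairs(self.items) do
    if item.name == name then
      local owner = item.parent and self.items[item.parent] or item
      return owner.callback, owner.name
    end
  end
end

local cb = function() end
local id = registry:register_callback_symbol('DMARC_CALLBACK', 1.0, cb)
registry:register_virtual_symbol('DMARC_POLICY_ALLOW', -1, id)
registry:register_virtual_symbol('DMARC_POLICY_REJECT', 1, id)

local _, owner = registry:callback_for('DMARC_POLICY_REJECT')
print(owner) -- DMARC_CALLBACK
```

Because a virtual symbol has no callback of its own, anything that needs timing or ordering information for it has to follow the parent link.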

### Asynchronous actions

For asynchronous actions, such as Redis access or DNS checks, it is recommended to use dedicated callbacks, called symbol handlers. The difference from generic Lua rules is that dedicated callbacks are not obliged to return a value; instead, they use the method `task:insert_result(symbol, weight)` to indicate a match. All Lua plugins are implemented as symbol handlers. Here is a simple example of a symbol handler that checks DNS:

~~~lua
rspamd_config:register_symbol('SOME_SYMBOL', 1.0,
  function(task)
    local to_resolve = 'google.com'
    local logger = require "rspamd_logger"

    local dns_cb = function(resolver, to_resolve, results, err)
      if results then
        logger.infox(task, '<%1> host: [%2] resolved for symbol: %3',
          task:get_message_id(), to_resolve, 'SOME_SYMBOL')
        -- report the match for the symbol registered above
        task:insert_result('SOME_SYMBOL', 1)
      end
    end
    task:get_resolver():resolve_a({
      task = task,
      name = to_resolve,
      callback = dns_cb})
  end)
~~~
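The control flow of a symbol handler can be tried outside Rspamd with mock objects. In the sketch below, `fake_resolver` and `fake_task` are stand-ins invented for this example; they only imitate the three methods used above and perform no real DNS lookups.

```lua
-- Mock objects only: `fake_resolver` and `fake_task` stand in for Rspamd's
-- resolver and task so that the control flow can run in plain Lua.
local pending = {}
local fake_resolver = {
  resolve_a = function(self, req)
    -- Queue the request instead of doing real DNS
    pending[#pending + 1] = req
  end,
}

local fake_task = {
  results = {},
  get_resolver = function(self) return fake_resolver end,
  insert_result = function(self, symbol, weight)
    self.results[symbol] = weight
  end,
}

-- Symbol handler: returns nothing, reports the match from the DNS callback
local function handler(task)
  local function dns_cb(resolver, to_resolve, results, err)
    if results then
      task:insert_result('SOME_SYMBOL', 1)
    end
  end
  task:get_resolver():resolve_a({
    task = task, name = 'example.com', callback = dns_cb })
end

handler(fake_task)
print(fake_task.results.SOME_SYMBOL) -- nil: the reply has not arrived yet

-- Later the event loop delivers the reply
for _, req in ipairs(pending) do
  req.callback(fake_resolver, req.name, { '192.0.2.1' }, nil)
end
print(fake_task.results.SOME_SYMBOL) -- 1
```

The handler itself has already returned by the time the result is inserted, which is why a return value cannot be used to signal the match.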

You can also set the desired score and description:

~~~lua
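rspamd_config:set_metric_symbol('SOME_SYMBOL', 1.2, 'some description')

-- A table can be passed instead of positional arguments. This fragment
-- assumes that `rule` and `symbol` are defined by the surrounding plugin code.
if rule['score'] then
  if not rule['group'] then
    rule['group'] = 'whitelist'
  end
  rule['name'] = symbol
  rspamd_config:set_metric_symbol(rule)
end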
~~~

## Difference between `config` and `rspamd_config`

It might be confusing that there are two variables with similar names and purposes (this is a legacy of older versions of Rspamd). Currently, however, `rspamd_config` represents an object that serves many purposes:

* Get configuration options:

~~~lua
rspamd_config:get_all_opts('section')
~~~

* Add maps:

~~~lua
rule['map'] = rspamd_config:add_kv_map(rule['domains'],
  "Whitelist map for " .. symbol)
~~~

* Register callbacks for symbols:

~~~lua
rspamd_config:register_symbol('SOME_SYMBOL', 1.0, some_functions)
~~~

* Register Lua rules (note that the `__newindex` metamethod is actually used here):

~~~lua
rspamd_config.SYMBOL = {...}
~~~

* Register composites, pre-filters, post-filters and so on
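The `__newindex` mechanism mentioned in the list above can be demonstrated in plain Lua. In this sketch `fake_config` is a stand-in table; it does not reproduce what `rspamd_config` really does on assignment and only shows how an assignment can be intercepted and turned into a registration.

```lua
-- Plain-Lua illustration of `__newindex`; `fake_config` is a stand-in and
-- does not reproduce what rspamd_config really does on assignment.
local registered = {}

local fake_config = setmetatable({}, {
  __newindex = function(tbl, name, rule)
    -- Called for assignments to keys that do not exist in the table yet.
    -- Nothing is stored in `tbl`, so every assignment ends up here.
    registered[#registered + 1] = { name = name, score = rule.score }
  end,
})

fake_config.SYMBOL = {
  callback = function(task) return true end,
  score = 1.2,
}

print(registered[1].name)            -- SYMBOL
print(registered[1].score)           -- 1.2
print(rawget(fake_config, 'SYMBOL')) -- nil
```

This is why `rspamd_config.SYMBOL = {...}` behaves like a function call rather than a simple table write.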

On the other hand, the `config` global is extremely simple: it's just a plain table of configuration options that is exactly the same as defined in `rspamd.conf` (and `rspamd.conf.local` or `rspamd.conf.override`). However, you can also use Lua tables and even functions for some options. For example, the `regexp` module can also accept a `callback` argument:

~~~lua
config['regexp']['SYMBOL'] = {
  callback = function(task) ... end,
  ...
}
~~~

Such syntax is discouraged, however, and is preserved mostly for compatibility reasons.

## Configuration order

There is a strict order of configuration application:

1. `rspamd.conf` and `rspamd.conf.local` are processed
2. `rspamd.conf.override` is processed and it **overrides** anything parsed on the previous step
3. **Lua** rules are loaded and they can override everything from the previous steps, with the important exception of rule scores, which are **NOT** overridden if the relevant symbol is also defined in a `metric` section
4. **Dynamic** configuration options defined in the WebUI (normally) are loaded and can override rule scores or action scores from the previous steps
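
The precedence of rule scores across these four steps can be modelled with a toy lookup in plain Lua. The layer names mirror the list above, but the data and the `resolve_score` helper are made up for illustration and say nothing about how Rspamd stores its configuration.

```lua
-- Toy model of score precedence; the data and `resolve_score` are
-- illustrative assumptions, not Rspamd code.
local layers = {
  conf     = { SYMBOL_A = 5.5 },                 -- rspamd.conf / .local
  override = { SYMBOL_A = 6.5 },                 -- rspamd.conf.override
  lua      = { SYMBOL_A = 1.5, SYMBOL_B = 2.5, SYMBOL_C = 0.5 },
  dynamic  = { SYMBOL_B = 3.5 },                 -- saved from the WebUI
}

local function resolve_score(symbol)
  local score = layers.conf[symbol]
  if layers.override[symbol] ~= nil then
    score = layers.override[symbol]
  end
  -- Lua scores apply only if no `metric` section defined the symbol
  if score == nil then
    score = layers.lua[symbol]
  end
  -- Dynamic configuration is applied last and wins
  if layers.dynamic[symbol] ~= nil then
    score = layers.dynamic[symbol]
  end
  return score
end

print(resolve_score('SYMBOL_A')) -- 6.5
print(resolve_score('SYMBOL_B')) -- 3.5
print(resolve_score('SYMBOL_C')) -- 0.5
```

`SYMBOL_A` keeps the score from the configuration files even though a Lua rule also sets one, which is the exception called out in step 3.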

## Rules check order

Rules in Rspamd are checked in the following order:

1. **Pre-filters**: checked every time and can stop all further processing by calling `task:set_pre_result()`
2. **All symbols**: can depend on each other by calling `rspamd_config:register_dependency(from, to)`
3. **Statistics**: is checked only when all symbols are checked
4. **Composites**: combine symbols to adjust the final results
5. **Post-filters**: are executed even if a message is already rejected and symbols processing has been stopped

+ 0
- 69
doc/markdown/workers/controller.md

@@ -1,69 +0,0 @@
# Controller worker

The controller worker is used to manage rspamd stats, to learn rspamd and to serve the WebUI.

Internally, the controller worker is just a web server that accepts requests and sends replies using JSON serialization.
Each command is defined by a URL. Some commands are read-only and are considered `unprivileged`, whilst other commands, such as
map modification, config modification and learning, require a higher level of privileges: the `enable` level. The difference between levels is specified
by password. If only one password is specified in the configuration, it is used for both types of commands.

## Controller configuration

Rspamd controller worker supports the following options:

* `password`: password for read-only commands
* `enable_password`: password for write commands
* `secure_ip`: list or map with IP addresses that are treated as `secure` so **all** commands are allowed from these IPs **without** passwords
* `static_dir`: directory where interface static files are placed (usually `${WWWDIR}`)
* `stats_path`: path where the controller saves persistent stats about rspamd (such as scanned messages count)
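
The access rules that follow from `password`, `enable_password` and `secure_ip` can be summarised in a short plain-Lua sketch. The `is_allowed` function and the shape of `cfg` are assumptions made for illustration; the real controller matches IP addresses and networks rather than comparing strings, and it does not store passwords in clear text.

```lua
-- Sketch of the access rules described above; `is_allowed` and the shape of
-- `cfg` are illustrative assumptions, not the controller's real code.
local function is_allowed(cfg, client_ip, password, privileged)
  -- Secure IPs may run every command without a password
  for _, ip in ipairs(cfg.secure_ip or {}) do
    if ip == client_ip then
      return true
    end
  end
  if password == nil then
    return false
  end
  if privileged then
    -- With a single password configured, it covers both levels
    return password == (cfg.enable_password or cfg.password)
  end
  -- Assumption: the enable password is also accepted for read-only commands
  return password == cfg.password or password == cfg.enable_password
end

local cfg = {
  password = 'q1',
  enable_password = 'q2',
  secure_ip = { '127.0.0.1' },
}

print(is_allowed(cfg, '10.0.0.5', 'q1', false)) -- true
print(is_allowed(cfg, '10.0.0.5', 'q1', true))  -- false
print(is_allowed(cfg, '127.0.0.1', nil, true))  -- true
```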

## Encryption support

To generate a keypair for the scanner you could use:

    rspamadm keypair -u

After that, the keypair should appear as follows:

~~~ucl
keypair {
    pubkey = "tm8zjw3ougwj1qjpyweugqhuyg4576ctg6p7mbrhma6ytjewp4ry";
    privkey = "ykkrfqbyk34i1ewdmn81ttcco1eaxoqgih38duib1e7b89h9xn3y";
}
~~~

You can then use its **public** part when scanning messages as follows:

    rspamc --key tm8zjw3ougwj1qjpyweugqhuyg4576ctg6p7mbrhma6ytjewp4ry <file>

## Password encryption

Rspamd now recommends encrypting passwords when storing them in a configuration. Currently, it uses the `PBKDF2-Blake2` function to derive a key from a password. To encrypt a password, you can use the `rspamadm pw` command as follows:

    rspamadm pw
    Enter passphrase: <hidden input>
    $1$cybjp37q4w63iogc4erncz1tgm1ce9i5$kxfx9xc1wk9uuakw7nittbt6dgf3qyqa394cnradg191iqgxr8kb

You can use that line as `password` and `enable_password` values.

## Supported commands

* `/auth`
* `/symbols`
* `/actions`
* `/maps`
* `/getmap`
* `/graph`
* `/pie`
* `/history`
* `/historyreset` (priv)
* `/learnspam` (priv)
* `/learnham` (priv)
* `/saveactions` (priv)
* `/savesymbols` (priv)
* `/savemap` (priv)
* `/scan`
* `/check`
* `/stat`
* `/statreset` (priv)
* `/counters`

+ 0
- 138
doc/markdown/workers/fuzzy_storage.md

@@ -1,138 +0,0 @@
# Fuzzy storage worker

Fuzzy storage worker is intended to store fuzzy hashes of messages.

## Protocol format

Fuzzy storage accepts requests using `UDP` protocol with the following structure:

~~~C
struct fuzzy_cmd { /* attribute(packed) */
    uint8_t version; /* command version, must be 0x2 */
    uint8_t cmd; /* numeric command */
    uint8_t shingles_count; /* number of shingles */
    uint8_t flag; /* flag number */
    int32_t value; /* value to store */
    uint32_t tag; /* random tag */
    char digest[64]; /* blake2b digest */
};
~~~

All numbers are in host byte order, so if you want to check fuzzy hashes from a
host with different byte order you need some additional conversions (not currently
supported by rspamd). In future, rspamd might use little endian byte order for all
operations.

Fuzzy storage accepts the following commands:
- `FUZZY_CHECK` - check for a fuzzy hash
- `FUZZY_ADD` - add a new hash
- `FUZZY_DEL` - remove a hash

The `flag` field is used to store different hashes in a single storage. For example,
it allows storing blacklists and whitelists in the same fuzzy storage worker.
A client should set the `flag` field when adding or deleting hashes and check it
when querying for a hash.

`value` is added to the currently stored value of a hash if that hash has been found.
This field can handle negative numbers as well.

`tag` is used to distinguish requests by a client. Fuzzy storage just sets this
field in the reply equal to the value in the request.

The `digest` field contains the content of the hash. Currently, rspamd uses the `blake2b` hash
in its binary form, giving `2^512` possible hashes with a negligible collision
probability. At the same time, rspamd keeps the legacy format of fuzzy hashes by
means of this field. Old rspamd can work with legacy hashes only.

`shingles_count` defines how many `shingles` are attached to this command.
Currently, rspamd uses 32 shingles, so this value should be 32 for commands
with shingles. Shingles should be included in the same packet and follow the command as
an array of `int64_t` values. Please note that rspamd rejects commands that have a wrong
shingles count or whose size is not equal to the expected one:

    sizeof(fuzzy_cmd) + shingles_count * sizeof(int64_t)

The reply format of fuzzy storage is also presented as a structure:

~~~C
struct fuzzy_reply { /* attribute(packed) */
    int32_t value;
    uint32_t flag;
    uint32_t tag;
    float prob;
};
~~~

The `prob` field stores the probability of a match. This value ranges from
`0.0` (no match) to `1.0` (full match).
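
The size rule for requests can be checked with a few lines of plain Lua. The field sizes are taken from the packed request structure shown earlier in this section; the `expected_request_size` helper is hypothetical and only restates the formula `sizeof(fuzzy_cmd) + shingles_count * sizeof(int64_t)`.

```lua
-- Field sizes of the packed request structure, in bytes
local request_fields = {
  { 'version', 1 },
  { 'cmd', 1 },
  { 'shingles_count', 1 },
  { 'flag', 1 },
  { 'value', 4 },
  { 'tag', 4 },
  { 'digest', 64 },
}
local SHINGLE_SIZE = 8 -- sizeof(int64_t)

local FUZZY_CMD_SIZE = 0
for _, field in ipairs(request_fields) do
  FUZZY_CMD_SIZE = FUZZY_CMD_SIZE + field[2]
end

-- Hypothetical helper: the size a request packet is expected to have
local function expected_request_size(shingles_count)
  return FUZZY_CMD_SIZE + shingles_count * SHINGLE_SIZE
end

print(FUZZY_CMD_SIZE)            -- 76
print(expected_request_size(0))  -- 76
print(expected_request_size(32)) -- 332
```

A request with the full set of 32 shingles is therefore 332 bytes long, which fits comfortably into a single UDP datagram.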

## Storage format

Rspamd fuzzy storage uses `sqlite3` for storing hashes. All update operations are
performed in a transaction which is committed to the main database approximately once
per minute. The `VACUUM` command is executed on startup, and hash expiration is performed
when the rspamd fuzzy storage worker terminates.

Here is the internal database structure:

```
CREATE TABLE digests(id INTEGER PRIMARY KEY,
flag INTEGER NOT NULL,
digest TEXT NOT NULL,
value INTEGER,
time INTEGER);
CREATE TABLE shingles(value INTEGER NOT NULL,
number INTEGER NOT NULL,
digest_id INTEGER REFERENCES digests(id) ON DELETE CASCADE ON UPDATE CASCADE);
```

Since rspamd uses normal sqlite3, you can use any sqlite3 tools to work with the hashes
database, for example to perform backups or analysis.

## Operation notes

To check a hash, rspamd fuzzy storage initially queries for a direct match using the
`digest` field as a key. If that match succeeds, the value is returned immediately.
Otherwise, if a command contains shingles, rspamd checks for a fuzzy match by trying
to find each shingle's value. If more than 50% of the shingles match the same digest,
rspamd returns that digest's value and the probability of the match, which is
generally `match_count / shingles_count`.
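
The lookup described above can be sketched with a toy in-memory storage in plain Lua. The data layout (`digests` and a `shingles` table keyed by position and value) and the `check` function are assumptions for illustration; the real worker keeps this data in sqlite. Returning a probability of `1.0` for a direct match is also an assumption of this sketch.

```lua
-- Toy in-memory storage; the real worker keeps this data in sqlite.
-- `digests` maps digest -> value, `shingles` maps "number:value" -> digest.
local function check(storage, digest, shingles)
  -- 1. Direct match on the digest
  local value = storage.digests[digest]
  if value then
    return value, 1.0
  end
  if not shingles or #shingles == 0 then
    return nil, 0.0
  end
  -- 2. Count how many shingles point to each stored digest
  local hits, best, best_count = {}, nil, 0
  for number, shingle in ipairs(shingles) do
    local found = storage.shingles[number .. ':' .. shingle]
    if found then
      hits[found] = (hits[found] or 0) + 1
      if hits[found] > best_count then
        best, best_count = found, hits[found]
      end
    end
  end
  -- 3. Accept only if more than half of the shingles agree
  if best and best_count * 2 > #shingles then
    return storage.digests[best], best_count / #shingles
  end
  return nil, 0.0
end

local storage = {
  digests = { aaa = 10 },
  shingles = {
    ['1:11'] = 'aaa', ['2:22'] = 'aaa', ['3:33'] = 'aaa', ['4:44'] = 'aaa',
  },
}

local value, prob = check(storage, 'bbb', { 11, 22, 33, 99 })
print(value) -- 10
print(prob)  -- 0.75
```

With only two of four shingles matching, the same query would be rejected, because exactly 50% is not "more than 50%".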

## Configuration

Fuzzy storage accepts the following extra options:

- `hashfile` - path to the sqlite storage (there are also a few outdated aliases for this option: hash_file, file, database)
- `sync` - time to perform database sync in seconds, default value: 60
- `expire` - time value for hashes expiration in seconds, default value: 2 days
- `keypair` - encryption keypair (can be repeated for different keys), can be obtained via *rspamadm keypair -u* command
- `keypair_cache_size` - Size of keypairs cache, default value: 512
- `encrypted_only` - allow encrypted requests only (and forbid all unknown keys or plaintext requests)
- `master_timeout` - master protocol IO timeout
- `sync_keypair` - encryption key for master/slave updates
- `masters` - string, allow master/slave updates from the following IP addresses
- `master_key` - allow master/slave updates only with the specified key
- `slave` - list of slave hosts.
- `mirror` - list of slave hosts, same as `slave`
- `allow_update` - string, array of strings or a map of IP addresses that are allowed
to perform changes to fuzzy storage (you should also set `read_only = no` in your fuzzy_check plugin).

Here is an example configuration of fuzzy storage:

~~~ucl
worker {
    type = "fuzzy";
    bind_socket = "*:11335";
    hashfile = "${DBDIR}/fuzzy.db";
    expire = 90d;
    allow_update = ["127.0.0.1", "::1"];
}
~~~

## Compatibility notes

Rspamd fuzzy storage version `0.8` can work with rspamd clients of all versions;
however, updates from legacy versions (less than `0.8`) won't update the fuzzy shingles
database. The rspamd [fuzzy check module](../modules/fuzzy_check.md) can work **only**
with a recent rspamd fuzzy storage (it won't get anything from legacy storages).

+ 0
- 84
doc/markdown/workers/index.md

@@ -1,84 +0,0 @@
# Rspamd workers

Rspamd defines several types of worker processes. Each type is designed for a specific
purpose, for example to scan mail messages or to perform control actions, such as learning or
gathering statistics. There is also a flexible worker type named the `lua` worker that allows
running any Lua script as an Rspamd worker, providing a proxy for the Rspamd Lua API.

## Worker types

Currently Rspamd defines the following worker types:

- [normal](normal.md): this worker is designed to scan mail messages
- [controller](controller.md): this worker performs configuration actions, such as
learning, adding fuzzy hashes and serving web interface requests
- [fuzzy_storage](fuzzy_storage.md): stores fuzzy hashes
- [lua](lua_worker.md): runs custom lua scripts

## Workers connections

All client applications should interact with two main workers: `normal` and `controller`.
Both of these workers use the `HTTP` protocol for all operations and rely on HTTP headers
to get extra information from a client. Depending on network configuration, it might be
useful to bind all workers to the loopback interface, preventing all interaction from the
outside. Rspamd workers are **not** supposed to run in an unprotected environment, such as
the Internet. Currently there is neither secrecy nor integrity control in these protocols, and
using plain HTTP might leak sensitive information.

The [fuzzy worker](fuzzy_storage.md) is different: it is intended to serve external requests; however, it
listens on a UDP port and does not save any state information.

## Common workers options

All workers share a set of common options. Here is a typical example of a normal
worker configuration that uses only common worker options:

~~~ucl
worker {
    type = "normal";
    bind_socket = "*:11333";
}
~~~

Here are options available to all workers:

- `type` - a **mandatory** string that defines the type of worker.
- `bind_socket` - a string that defines the bind address of a worker.
- `count` - number of worker instances to run (some workers ignore that option, e.g. `fuzzy_storage`)

`bind_socket` is the most commonly used option. It defines the address where a worker should accept
connections. Rspamd allows both names and IP addresses for this option:

~~~ucl
bind_socket = "localhost:11333";
bind_socket = "127.0.0.1:11333";
bind_socket = "[::1]:11333"; # note that you need to enclose ipv6 in '[]'
~~~

Universal listening addresses are also defined:

~~~ucl
bind_socket = "*:11333"; # any ipv4 and ipv6 address
bind_socket = "*v4:11333"; # any ipv4 address
bind_socket = "*v6:11333"; # any ipv6 address
~~~
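
The `host:port` forms listed so far can be told apart with two Lua patterns. The `parse_bind_socket` function below is a hypothetical helper written for this example, not Rspamd's own parser, and it deliberately ignores other kinds of `bind_socket` values; it mainly shows why an IPv6 address has to be enclosed in `[]`.

```lua
-- Hypothetical parser for the host:port forms shown above (not Rspamd code)
local function parse_bind_socket(value)
  -- Bracketed IPv6 address: [addr]:port
  local host, port = value:match('^%[(.+)%]:(%d+)$')
  if not host then
    -- Name, IPv4 address or wildcard: host:port
    host, port = value:match('^([^:]+):(%d+)$')
  end
  if not host then
    return nil -- not a host:port value this sketch understands
  end
  return host, tonumber(port)
end

local host, port = parse_bind_socket('[::1]:11333')
print(host) -- ::1
print(port) -- 11333

host, port = parse_bind_socket('*v4:11333')
print(host) -- *v4

-- Without brackets the colons of an IPv6 address are ambiguous
print(parse_bind_socket('::1:11333')) -- nil
```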

Moreover, you can specify systemd sockets if Rspamd is invoked by systemd:

~~~ucl
bind_socket = "systemd:1"; # the first socket passed by systemd through the environment
~~~

For unix sockets, it is also possible to specify owner and mode using this syntax:

~~~ucl
bind_socket = "/tmp/rspamd.sock mode=0666 owner=user";
~~~

Without owner and mode, Rspamd uses the active user as owner (e.g. if started by root,
then `root` is used) and `0644` as the access mask. Please note that you need to specify
an **octal** number for mode, namely one prefixed by a zero. Otherwise, modes like `666` will produce
a weird result.
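
The difference between the two notations is easy to see in plain Lua. The snippet below is only an arithmetic illustration; it makes no claim about how Rspamd itself parses the `mode` value.

```lua
-- Why the leading zero matters: the same digits read as octal and as decimal
local as_octal = tonumber('0666', 8) -- parse the digits in base 8
local as_decimal = tonumber('666')   -- what a plain `666` means in decimal

print(as_octal)                        -- 438
print(string.format('%o', as_octal))   -- 666  (rw-rw-rw-)
print(string.format('%o', as_decimal)) -- 1232 (not the intended mode)
```

Decimal `666` corresponds to the octal mask `1232`, which sets an unrelated combination of permission bits.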

You can specify multiple `bind_socket` options to listen on as many addresses as
you want.

+ 0
- 3
doc/markdown/workers/lua_worker.md

@@ -1,3 +0,0 @@
# Lua worker

TODO

+ 0
- 29
doc/markdown/workers/normal.md

@@ -1,29 +0,0 @@
# Rspamd normal worker

Rspamd normal worker is intended to scan messages for spam. It has the following configuration options available:

* `mime`: turn to `off` if you want to scan non-mime messages (e.g. forum comments or SMS), default: `on`
* `allow_learn`: turn to `on` if you want to learn messages using this worker (usually you should use [controller](controller.md) worker), default: `off`
* `timeout`: input/output timeout, default: `1min`
* `task_timeout`: maximum time to process a single task, default: `8s`
* `max_tasks`: maximum number of tasks processed simultaneously, default: `0` - no limit
* `keypair`: encryption keypair

## Encryption support

To generate a keypair for the scanner you could use:

    rspamadm keypair -u

After that, the keypair should appear as follows:

~~~ucl
keypair {
    pubkey = "tm8zjw3ougwj1qjpyweugqhuyg4576ctg6p7mbrhma6ytjewp4ry";
    privkey = "ykkrfqbyk34i1ewdmn81ttcco1eaxoqgih38duib1e7b89h9xn3y";
}
~~~

You can then use its **public** part when scanning messages as follows:

    rspamc --key tm8zjw3ougwj1qjpyweugqhuyg4576ctg6p7mbrhma6ytjewp4ry <file>
