## Introduction
-Rspamd is a universal spam filtering system based on event-driven processing
+Rspamd is a universal spam filtering system based on event-driven processing
model. This means that rspamd is designed never to block anywhere in the code. To
process messages, rspamd uses a set of so-called `rules`. Each `rule` is a symbolic
name associated with some message property. For example, we can define the following
that C modules are always embedded and can be enabled in the `filters` attribute
in the `options` section of the config:
-~~~nginx
+~~~ucl
options {
filters = "regexp,surbl,spf,dkim,fuzzy_check,chartable,email";
...
These actions are only recommendations for the MTA and need not be strictly followed.
For all actions that are greater than or equal to `greylist`, it is recommended to
perform explicit greylisting. `Add header` and `rewrite subject` actions are very
-close in semantics and are both considered as `probable spam`. `Reject` is a
+close in semantics and are both considered as `probable spam`. `Reject` is a
strong rule that usually means that the message should really be rejected by the MTA.
The triggering score for these actions should be specified according to their logical
priorities. If two actions have the same threshold, the result is unspecified.
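To illustrate how thresholds map to suggested actions, here is a minimal Python sketch. It assumes the suggested action is the one with the highest threshold that the message score reaches; the threshold values are hypothetical examples, not rspamd defaults.

~~~python
# Hypothetical thresholds: real values come from the metric's actions section.
ACTIONS = {
    "greylist": 4.0,
    "add header": 6.0,
    "rewrite subject": 7.0,
    "reject": 15.0,
}


def choose_action(score, actions=ACTIONS):
    """Return the action with the highest threshold that `score` reaches."""
    best_name, best_threshold = "no action", float("-inf")
    for name, threshold in actions.items():
        # Strict `>` keeps the first of two equal thresholds; in rspamd
        # itself the outcome for equal weights is unspecified.
        if score >= threshold and threshold > best_threshold:
            best_name, best_threshold = name, threshold
    return best_name


def should_greylist(action, actions=ACTIONS):
    """Actions at or above `greylist` call for explicit greylisting."""
    return action in actions and actions[action] >= actions["greylist"]


for score in (2.5, 6.3, 20.0):
    print(score, choose_action(score))
# 2.5 no action
# 6.3 add header
# 20.0 reject
~~~

Scanning all actions rather than relying on their order keeps the result independent of how the actions section is written.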
the only algorithm defined is OSB-Bayes. You can find the details of this
algorithm in the following [paper](http://osbf-lua.luaforge.net/papers/osbf-eddc.pdf).
Rspamd uses a window size of 5 words for classification. During classification,
-rspamd split a message to a set of tokens.
+rspamd splits a message into a set of tokens.
Tokens are separated by punctuation or space characters. Short tokens (fewer than 3 characters) are ignored. For each token rspamd
calculates two non-cryptographic hashes used subsequently as indices. All these tokens
chain, and if this chain is full then rspamd tries to expire less significant tokens to
insert a new one. It is possible to obtain the current state of tokens by running
- rspamc stat
+ rspamc stat
command, which asks the controller for the number of free and used tokens in each statfile.
Please note that if a statfile is close to being completely filled, then during subsequent
such statfiles.
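The tokenization step can be sketched in Python as follows. This is a simplified illustration, not rspamd's code: it assumes OSB pairs each token with each of the next four tokens inside the 5-word window and records their distance, and it uses a regular expression for splitting and `hashlib.blake2b` as a stand-in for rspamd's two non-cryptographic hashes.

~~~python
import hashlib
import re

WINDOW = 5          # window size in words, as stated above
MIN_TOKEN_LEN = 3   # tokens shorter than this are ignored


def tokenize(text):
    """Split on punctuation and whitespace, dropping short tokens."""
    words = re.split(r"[\W_]+", text.lower())
    return [w for w in words if len(w) >= MIN_TOKEN_LEN]


def osb_features(tokens, window=WINDOW):
    """Yield (first, second, distance) pairs inside the sliding window."""
    for i, first in enumerate(tokens):
        for dist in range(1, window):
            j = i + dist
            if j >= len(tokens):
                break
            yield first, tokens[j], dist


def two_hashes(feature):
    """Derive two 32-bit indices for a feature (stand-in hash function)."""
    data = "\x00".join(map(str, feature)).encode()
    digest = hashlib.blake2b(data, digest_size=8).digest()
    return int.from_bytes(digest[:4], "little"), int.from_bytes(digest[4:], "little")


tokens = tokenize("Buy cheap pills, buy NOW at our shop!")
print(tokens)  # ['buy', 'cheap', 'pills', 'buy', 'now', 'our', 'shop']
~~~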
## Running rspamd
-
-There are several command-line options that can be passed to rspamd. All of them can be displayed by passing `--help` argument:
-All options are optional: by default rspamd would try to read `etc/rspamd.conf` config file and run as daemon. Also there is test mode that can be turned on by passing `-t` argument. In test mode, rspamd reads config file and checks its syntax. If a configuration file is OK, then exit code is zero. Test mode is useful for testing new config file withou rspamd restart. `--convert-config` option can be used to convert old style (pre 0.6.0) config to [ucl](../configuration/ucl.md) one:
+There are several command-line options that can be passed to rspamd. All of them can be displayed by passing the `--help` argument:
+
+All options are optional: by default rspamd tries to read the `etc/rspamd.conf` config file and runs as a daemon. There is also a test mode that can be turned on by passing the `-t` argument. In test mode, rspamd reads the config file and checks its syntax; if the configuration file is OK, the exit code is zero. Test mode is useful for checking a new config file without restarting rspamd. The `--convert-config` option can be used to convert an old style (pre 0.6.0) config to a [ucl](../configuration/ucl.md) one:
$ rspamd -c ./rspamd.xml --convert-conf ./rspamd.conf
-
+
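Because test mode reports its result through the exit code, it is easy to script. A minimal Python sketch, assuming the `rspamd` binary is on `PATH` and that `-c` selects the config file as in the conversion example above; the helper name `config_is_valid` is hypothetical:

~~~python
import subprocess


def config_is_valid(config_path, run=subprocess.run):
    """Run `rspamd -t -c <config>` and report whether the config is OK.

    Test mode only checks the configuration and exits with code zero on
    success, so a running daemon is not affected. `run` is injectable so
    the helper can be exercised without rspamd installed.
    """
    result = run(["rspamd", "-t", "-c", config_path],
                 capture_output=True, text=True)
    return result.returncode == 0
~~~

Calling `config_is_valid("/path/to/rspamd.conf")` before sending a reload signal catches syntax errors early.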
## Managing rspamd using signals
-First of all, it is important to note that all user's signals should be sent to rspamd main process and not to its children (as for child processes these signals can have other meanings). To determine which process is main you can use two ways:
+First of all, it is important to note that all user signals should be sent to the rspamd main process and not to its children (for child processes these signals can have other meanings). There are two ways to determine which process is the main one:
-- by reading pidfile:
+- by reading the pidfile:
$ cat pidfile
-- by getting process info:
+- by getting process info:
$ ps auxwww | grep rspamd
nobody 28378 0.0 0.2 49744 9424 rspamd: main process
nobody 64083 0.0 0.3 51792 11036 rspamd: worker process
nobody 64084 0.0 2.7 158288 114200 rspamd: controller process
nobody 64085 0.0 1.8 116304 75228 rspamd: fuzzy storage
-
+
$ ps auxwww | grep rspamd | grep main
nobody 28378 0.0 0.2 49744 9424 rspamd: main process
Once you have the pid of the main process, you can manage rspamd with signals:
-
-- `SIGHUP` - restart rspamd: reread config file, start new workers (as well as controller and other processes), stop accepting connections by old workers, reopen all log files. Note that old workers would be terminated after one minute that should allow to process all pending requests. All new requests to rspamd will be processed by newly started workers.
+
+- `SIGHUP` - restart rspamd: reread the config file, start new workers (as well as the controller and other processes), stop accepting connections by old workers and reopen all log files. Note that old workers are terminated after one minute, which should give them enough time to process all pending requests. All new requests to rspamd are processed by the newly started workers.
- `SIGTERM` - terminate rspamd system.
-- `SIGUSR1` - reopen log files (useful for log files rotation).
+- `SIGUSR1` - reopen log files (useful for log file rotation).
-These signals may be used in start scripts as it is done in `FreeBSD` start script. Restarting of rspamd is performed softly: no connections are dropped and if a new config is incorrect then the old config is used.
\ No newline at end of file
+These signals may be used in start scripts, as is done in the `FreeBSD` start script. Rspamd restarts softly: no connections are dropped, and if the new config is incorrect, the old config is kept.
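A minimal Python sketch of this procedure: read the main process pid from the pidfile and send it one of the signals listed above. The helper names and any pidfile path you pass in are hypothetical; use the pidfile configured for your installation.

~~~python
import os
import signal

# Signals understood by the rspamd main process, as described above.
COMMANDS = {
    "restart": signal.SIGHUP,       # reread config, start new workers, reopen logs
    "stop": signal.SIGTERM,         # terminate rspamd
    "reopen_logs": signal.SIGUSR1,  # reopen log files after rotation
}


def read_main_pid(pidfile):
    """The pidfile holds the pid of the main process, not of its children."""
    with open(pidfile) as f:
        return int(f.read().strip())


def control(pidfile, command):
    """Send the signal for `command` to the rspamd main process."""
    pid = read_main_pid(pidfile)
    os.kill(pid, COMMANDS[command])
    return pid
~~~

A log rotation hook would then call something like `control(pidfile, "reopen_logs")`.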
For example, you can define a composite that is added when both of two symbols are found:
-~~~nginx
+~~~ucl
composite {
name = "TEST_COMPOSITE";
expression = "SYMBOL1 and SYMBOL2";
You can also use braces to define priorities; otherwise, operators are evaluated from left to right.
For example:
-~~~nginx
+~~~ucl
composite {
name = "TEST";
expression = "SYMBOL1 and SYMBOL2 and ( not SYMBOL3 | not SYMBOL4 | not SYMBOL5 )";
~~~
A composite rule can include other composites in its body. There is no restriction on definition order:
-~~~nginx
+~~~ucl
composite {
name = "TEST1";
expression = "SYMBOL1 AND TEST2";
It is also possible to include a whole group of symbols in a composite rule. This
effectively means **any** symbol of the specified group:
-~~~nginx
+~~~ucl
composite {
name = "TEST2";
expression = "SYMBOL2 && !g:mua";
}
-~~~
\ No newline at end of file
+~~~
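To make the expression semantics concrete, here is a small Python evaluator for composite expressions. It is an illustrative sketch, not rspamd's parser: it accepts only the spellings used in the examples above (`and`/`AND`/`&&`, `|`, `not`/`!`, braces, and `g:group` meaning any symbol of a group), applies binary operators strictly from left to right as the text describes, and assumes referenced composites have already been resolved into the set of matched symbols.

~~~python
import re

TOKEN_RE = re.compile(r"\s*(&&|\||!|\(|\)|[A-Za-z0-9_:]+)")
AND_OPS = {"and", "AND", "&&"}
OR_OPS = {"|"}
NOT_OPS = {"not", "!"}


def tokenize(expr):
    expr, pos, tokens = expr.strip(), 0, []
    while pos < len(expr):
        m = TOKEN_RE.match(expr, pos)
        if not m:
            raise ValueError(f"bad token at {expr[pos:]!r}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens


def evaluate(expr, matched, groups=None):
    """Return True if the composite expression holds for `matched` symbols."""
    groups = groups or {}
    tokens = tokenize(expr)
    pos = 0

    def operand():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        if tok in NOT_OPS:
            return not operand()        # `not` applies to the next operand only
        if tok == "(":
            value = sequence()
            if pos >= len(tokens) or tokens[pos] != ")":
                raise ValueError("missing ')'")
            pos += 1
            return value
        if tok.startswith("g:"):        # any symbol of the named group
            return any(s in matched for s in groups.get(tok[2:], ()))
        return tok in matched

    def sequence():
        nonlocal pos
        value = operand()
        while pos < len(tokens) and tokens[pos] != ")":
            op = tokens[pos]
            pos += 1
            rhs = operand()             # operators applied left to right
            if op in AND_OPS:
                value = value and rhs
            elif op in OR_OPS:
                value = value or rhs
            else:
                raise ValueError(f"unknown operator {op!r}")
        return value

    result = sequence()
    if pos != len(tokens):
        raise ValueError("unbalanced ')'")
    return result
~~~

Because evaluation is left to right, `A | B and C` is read as `(A | B) and C`. Braces make the intended grouping explicit.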
The basic rspamd configuration is stored in `$CONFDIR/rspamd.conf`. By default, this
file looks like this:
-~~~nginx
+~~~ucl
lua = "$CONFDIR/lua/rspamd.lua"
.include "$CONFDIR/options.conf"
and internal modules) in the [modules](../modules/index.md) section, while the modules themselves are
loaded from the following part of the configuration:
-~~~nginx
+~~~ucl
modules {
path = "$PLUGINSDIR/lua/"
}
Each metric is defined by a `metric` object in the rspamd configuration. This object has one
mandatory attribute, `name`, which defines the name of this metric:
-~~~nginx
+~~~ucl
metric {
# Define default metric
name = "default";
The actions section is an object containing all actions defined by this metric. If some actions are omitted,
rspamd will never suggest them. The actions section looks as follows:
-~~~nginx
+~~~ucl
metric {
...
actions {
Currently, a symbol definition looks like this:
-~~~nginx
+~~~ucl
symbol {
name = "RWL_SPAMHAUS_WL_IND";
weight = -0.7;
which could be useful, for instance, if some specific group should not unconditionally send a message
to the `spam` class. Here is an example of such functionality:
-~~~nginx
+~~~ucl
metric {
name = default; # This is mandatory option
}
}
}
-~~~
\ No newline at end of file
+~~~
The options section defines the basic rspamd behaviour; these options are global for all types of workers.
The default options are shown in the following example configuration snippet:
-~~~nginx
+~~~ucl
filters = "chartable,dkim,spf,surbl,regexp,fuzzy_check";
raw_mode = false;
one_shot = false;
* `nameserver`: list (or array) of DNS servers to be used (if this option is omitted, `/etc/resolv.conf` is parsed instead). It is also possible to specify weights for DNS servers to balance the load, e.g.
-~~~nginx
+~~~ucl
options {
dns {
# 9/10 on 127.0.0.1 and 1/10 to 8.8.8.8
To load settings as a dynamic map, you can set `settings` to a map string:
-~~~nginx
+~~~ucl
settings = "http://host/url"
~~~
If you don't want dynamic updates, you can set `settings` to an object:
-~~~nginx
+~~~ucl
settings {
setting1 = {
...
The settings file itself should contain a single section called "settings":
-~~~nginx
+~~~ucl
settings {
some_users {
priority = high;
* in an nginx-like form:
-~~~nginx
+~~~ucl
param = value;
section {
param = value;
string = "something";
subsection {
host = {
- host = "hostname";
+ host = "hostname";
port = 900;
}
host = {
* Quotes are not required for strings and keys; moreover, `:` may be replaced by `=` or even omitted for objects:
-~~~nginx
+~~~ucl
key = value;
section {
key = value;
UCL accepts named keys and organizes them into an object hierarchy internally. Here is an example of this process:
-~~~nginx
+~~~ucl
section "blah" {
key = value;
}
is converted to the following object:
-~~~nginx
+~~~ucl
section {
blah {
key = value;
}
}
~~~
-
+
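The named-key rule can be modelled with plain Python dictionaries: a section name followed by any number of quoted names becomes a chain of nested objects, and sections sharing a parent end up side by side. This is only a sketch of the documented behaviour with hypothetical helpers `nest` and `merge`, not libucl's parser; collecting repeated plain keys into a list mimics UCL's implicit arrays.

~~~python
def nest(names, body):
    """Model `section "a" "b" { ... }` as {"section": {"a": {"b": body}}}."""
    result = body
    for name in reversed(names):
        result = {name: result}
    return result


def merge(target, source):
    """Merge nested objects so named sections share a common parent."""
    for key, value in source.items():
        if key not in target:
            target[key] = value
        elif isinstance(target[key], dict) and isinstance(value, dict):
            merge(target[key], value)
        else:
            # Repeated plain keys become an implicit array of values.
            existing = target[key]
            target[key] = (existing if isinstance(existing, list) else [existing]) + [value]
    return target


print(nest(["section", "blah"], {"key": "value"}))
# {'section': {'blah': {'key': 'value'}}}
~~~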
Plain definitions may be more complex and contain more than a single level of nested objects:
-
-~~~nginx
+
+~~~ucl
section "blah" "foo" {
key = value;
}
is presented as:
-~~~nginx
+~~~ucl
section {
blah {
foo {
UCL supports different styles of comments:
-* single line: `#`
+* single line: `#`
* multiline: `/* ... */`
Multiline comments may be nested:
~~~c
# Sample single line comment
-/*
+/*
some comment
/* nested comment */
end of comment
UCL supports external macros, both multiline and single-line ones:
-~~~nginx
+~~~ucl
.macro "sometext";
.macro {
Some long text
arguments themselves are a UCL object that is parsed and passed to the macro as
options:
-~~~nginx
+~~~ucl
.macro(param=value) "something";
.macro(param={key=value}) "something";
.macro(.include "params.conf") "something";
UCL also provides a convenient `include` macro to load content from other files
into the current UCL object. This macro accepts either a path to a file:
-~~~nginx
+~~~ucl
.include "/full/path.conf"
.include "./relative/path.conf"
.include "${CURDIR}/path.conf"
key <<EOD
-
+
some
text
-
+
EOD
jansson: parsed json in 1.3899 seconds
jansson: emitted object in 0.2609 seconds
-
+
ucl: parsed input in 0.6649 seconds
ucl: emitted config in 0.2423 seconds
ucl: emitted json in 0.2329 seconds
Many Lua rules are shipped with rspamd. They can be included in rspamd by using the **lua** tag in rspamd.conf:
-~~~nginx
+~~~ucl
lua = "$CONFDIR/lua/rspamd.lua"
~~~
If you need the old behaviour, you need to use a separate classifier
for per-user statistics, for example:
-~~~nginx
+~~~ucl
classifier {
tokenizer {
name = "osb";
statistics model used in pre-1.0 versions. Therefore, to use all these advantages you should either **relearn**
your statistics or continue using your old statistics **without** the new features by adding the `compat` parameter:
-~~~nginx
+~~~ucl
classifier {
...
tokenizer {
The recommended way to store statistics now is the `sqlite3` backend (which, however, is incompatible with the old mmap backend):
-~~~nginx
+~~~ucl
classifier {
type = "bayes";
tokenizer {
Here is an example of the full configuration of the rspamd controller worker
serving the WebUI:
-~~~nginx
+~~~ucl
worker {
type = "controller";
bind_socket = "localhost:11334";
Rspamd now uses the `HTTP` protocol for all operations, so an additional
client library is unlikely to be needed. A fallback to the old `spamc` protocol has also
been implemented to keep compatibility with `rmilter` and other software
-that uses `rspamc` protocol.
\ No newline at end of file
+that uses the `rspamc` protocol.
This module counts the characters that belong to different [unicode scripts](http://www.unicode.org/reports/tr24/). It then evaluates the number of script changes (e.g. 'a網絡a' is treated as 2 script changes: from Latin to Chinese and from Chinese back to Latin) divided by the total number of Unicode characters. If the result of this division is higher than the threshold, a symbol is inserted. By default the threshold is `0.1`, meaning that script changes occur for approximately 10% of characters.
-~~~nginx
+~~~ucl
chartable {
symbol = "R_CHARSET_MIXED";
threshold = 0.1;
}
-~~~
\ No newline at end of file
+~~~
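The ratio can be sketched in Python. The script of each character is approximated here by the first word of its Unicode name (`LATIN`, `CJK`, `CYRILLIC`, ...), which is a rough stand-in for the Unicode Script property. Counting only letters, and dividing by the number of letters, is an assumption. The module's exact counting rules may differ.

~~~python
import unicodedata


def script_of(ch):
    """Rough script guess: 'a' -> 'LATIN', '網' -> 'CJK'."""
    try:
        return unicodedata.name(ch).split()[0]
    except ValueError:  # code points without a name
        return "UNKNOWN"


def script_change_ratio(text):
    """Script changes between consecutive letters divided by letter count."""
    letters = [ch for ch in text if ch.isalpha()]
    if not letters:
        return 0.0
    scripts = [script_of(ch) for ch in letters]
    changes = sum(1 for a, b in zip(scripts, scripts[1:]) if a != b)
    return changes / len(letters)


def charset_mixed(text, threshold=0.1):
    """True when the ratio exceeds the threshold (cf. R_CHARSET_MIXED)."""
    return script_change_ratio(text) > threshold


print(script_change_ratio("a網絡a"))  # 0.5
~~~

For 'a網絡a' this gives 2 changes over 4 characters, well above the default `0.1`.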
DMARC configuration is very simple:
-~~~nginx
+~~~ucl
dmarc {
servers = "localhost:6390";
key_prefix = "dmarc_"; # Keys would have format of dmarc_domain.com
where results are `true` or `false`, meaning allow and reject respectively.
Unixtime and IP are inserted in text form. Keys are therefore `lists` in redis terminology.
-Keys are inserted to redis servers when a server is selected by hash value from sender's domain.
\ No newline at end of file
+Keys are inserted into the redis server that is selected by the hash value of the sender's domain.
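The key naming and server selection can be sketched in Python. The `dmarc_` prefix and `localhost:6390` come from the example configuration above; the second server address is hypothetical, and CRC32 of the domain modulo the number of servers is an assumption, since the text only says that a server is selected by a hash of the sender's domain.

~~~python
import zlib


def dmarc_key(domain, prefix="dmarc_"):
    """Redis key for a sender domain, e.g. dmarc_domain.com."""
    return prefix + domain


def pick_server(domain, servers):
    """Choose a redis server deterministically from a hash of the domain.

    CRC32 is a stand-in hash: the same domain always maps to the same
    server, so all entries for one domain land in a single list.
    """
    return servers[zlib.crc32(domain.encode()) % len(servers)]


servers = ["localhost:6390", "localhost:6391"]
print(dmarc_key("domain.com"))  # dmarc_domain.com
~~~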
list to check or learn, a set of flags, and optional parameters. Here is an example of
a rule's settings:
-~~~nginx
+~~~ucl
fuzzy_check {
rule {
# List of servers, can be an array or multi-value item
fuzzy storage can contain both good and bad hashes that should have different symbols
and thus different weights. Maps are defined inside fuzzy rules as follows:
-~~~nginx
+~~~ucl
fuzzy_check {
rule {
...
added with a weight of `1`. If `max_score` is `200`, then the rule will be added with a
weight of roughly `0.2` (the actual function is a hyperbolic tangent). In the following configuration:
-~~~nginx
+~~~ucl
metric {
name = "default";
...
rspamc -f <flag> fuzzy_del ...
On learning, rspamd sends commands to **all** servers of the specific rule. On check,
-rspamd selects a server in round-robin matter.
\ No newline at end of file
+rspamd selects a server in a round-robin manner.
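A minimal Python model of this server selection policy. The class and method names and the server addresses are hypothetical; the sketch only illustrates "learn on all servers, check one server in turn", not rspamd's code.

~~~python
import itertools


class FuzzyRule:
    """Server selection for one fuzzy_check rule."""

    def __init__(self, servers):
        self.servers = list(servers)
        self._next = itertools.cycle(self.servers)

    def servers_for_learn(self):
        # Learning commands (e.g. fuzzy_add / fuzzy_del) go to every server.
        return list(self.servers)

    def server_for_check(self):
        # Checks rotate through the servers, one request at a time.
        return next(self._next)
~~~

Sending updates everywhere keeps all storages consistent, while rotating checks spreads the read load.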
configuration. If no `filters` attribute is defined then all modules are disabled.
The default configuration enables all modules explicitly:
-~~~nginx
+~~~ucl
filters = "chartable,dkim,spf,surbl,regexp,fuzzy_check";
~~~
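How the `filters` value enables modules can be sketched in Python. `enabled_modules` is a hypothetical helper, and stripping spaces around names is an assumption; the point is that only listed modules are enabled and an absent value enables none.

~~~python
def enabled_modules(filters):
    """Parse the comma-separated `filters` option into a set of module names.

    An absent or empty value yields an empty set: all modules disabled.
    """
    if not filters:
        return set()
    return {name.strip() for name in filters.split(",") if name.strip()}


default = enabled_modules("chartable,dkim,spf,surbl,regexp,fuzzy_check")
print("spf" in default, "email" in default)  # True False
~~~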
[Lua API documentation](../lua/). To define the path to Lua modules, there is a special section
named `modules` in rspamd:
-~~~nginx
+~~~ucl
modules {
path = "/path/to/dir/";
path = "/path/to/module.lua";
- [phishing](phishing.md) - detects messages with phished URLs.
- [ratelimit](ratelimit.md) - implements the leaky bucket algorithm for rate limiting and
uses `redis` to store data.
-- [trie](trie.md) - uses suffix trie for extra-fast patterns lookup in messages.
\ No newline at end of file
+- [trie](trie.md) - uses suffix trie for extra-fast patterns lookup in messages.
No filters will be processed for a message if such a map matches.
-~~~nginx
+~~~ucl
multimap {
test { type = "ip"; map = "/tmp/ip.map"; symbol = "TESTMAP"; }
spamhaus { type = "dnsbl"; map = "pbl.spamhaus.org"; symbol = "R_IP_PBL";
Here are some examples of pre-filter configurations:
-~~~nginx
+~~~ucl
sender_from_whitelist_user {
- type = "from";
- filter = "email:user";
- map = "file:///tmp/from.map";
- symbol = "SENDER_FROM_WHITELIST_USER";
- action = "accept"; # Prefilter mode
+ type = "from";
+ filter = "email:user";
+ map = "file:///tmp/from.map";
+ symbol = "SENDER_FROM_WHITELIST_USER";
+ action = "accept"; # Prefilter mode
}
sender_from_regexp {
- type = "header";
- header = "from";
- filter = "regexp:/.*@/";
- map = "file:///tmp/from_re.map";
- symbol = "SENDER_FROM_REGEXP";
+ type = "header";
+ header = "from";
+ filter = "regexp:/.*@/";
+ map = "file:///tmp/from_re.map";
+ symbol = "SENDER_FROM_REGEXP";
}
url_map {
- type = "url";
- filter = "tld";
- map = "file:///tmp/url.map";
- symbol = "URL_MAP";
+ type = "url";
+ filter = "tld";
+ map = "file:///tmp/url.map";
+ symbol = "URL_MAP";
}
url_tld_re {
- type = "url";
- filter = "tld:regexp:/\.[^.]+$/"; # Extracts the last component of URL
- map = "file:///tmp/url.map";
- symbol = "URL_MAP_RE";
+ type = "url";
+ filter = "tld:regexp:/\.[^.]+$/"; # Extracts the last component of URL
+ map = "file:///tmp/url.map";
+ symbol = "URL_MAP_RE";
}
-~~~
\ No newline at end of file
+filename_blacklist {
+ type = "filename";
+ filter = "extension";
+ map = "/${LOCAL_CONFDIR}/filename.map";
+ symbol = "FILENAME_BLACKLISTED";
+ action = "reject";
+}
+~~~
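To clarify what the `filter` values in these examples extract before the map lookup, here is a hypothetical Python model. The semantics are inferred from the option names and comments above (`email:user` takes the local part, `regexp:/re/` takes the matched text, `tld` takes the registered domain, `extension` takes the file extension); rspamd's real filters may behave differently, and the naive `tld` branch keeps the last two labels instead of consulting the public suffix list, so it is wrong for names like `example.co.uk`.

~~~python
import re


def apply_filter(spec, value):
    """Return the part of `value` a multimap filter would look up, or None."""
    if spec == "email:user":
        # Local part of an address: user@example.com -> user
        return value.split("@", 1)[0] if "@" in value else None
    if spec == "extension":
        # File extension: invoice.exe -> exe
        return value.rsplit(".", 1)[1].lower() if "." in value else None
    if spec == "tld":
        # Naive stand-in: keep the last two labels of the host name.
        return ".".join(value.lower().split(".")[-2:])
    for prefix in ("tld:regexp:", "regexp:"):
        if spec.startswith(prefix):
            body = spec[len(prefix):]
            if not (len(body) >= 2 and body[0] == "/" and body[-1] == "/"):
                raise ValueError(f"expected /pattern/ in {spec!r}")
            subject = apply_filter("tld", value) if prefix == "tld:regexp:" else value
            match = re.search(body[1:-1], subject)
            return match.group(0) if match else None
    raise ValueError(f"unsupported filter {spec!r}")
~~~

The returned string is what would be compared against map entries; `None` means the rule cannot match.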
## Example
-~~~nginx
+~~~ucl
once_received {
good_host = "^mail";
bad_host = "static";
Here is an example of a full module configuration:
-~~~nginx
+~~~ucl
phishing {
symbol = "R_PHISHING"; # Default symbol
Configuration is structured as follows:
-~~~nginx
+~~~ucl
rbl {
# default settings defined here
rbls {
The RBL-specific subsection is structured as follows:
-~~~nginx
+~~~ucl
# Descriptive name of RBL or symbol if symbol is not defined.
an_rbl {
# Explicitly defined symbol
is very simple: just glue all your SA rules into a single file and feed it to
the spamassassin module:
-~~~nginx
+~~~ucl
spamassassin {
ruleset = "/path/to/file";
# Limit search size to 100 kilobytes for all regular expressions
You can manually specify the size of this cache by configuring the SPF module:
-~~~nginx
+~~~ucl
spf {
spf_cache_size = 1k; # cache up to 1000 of the most recent SPF records
}
~~~
Currently, rspamd supports the full set of SPF elements and macros, and has internal
-protection from DNS recursion.
\ No newline at end of file
+protection from DNS recursion.
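The text does not say how the SPF cache evicts entries. Assuming a least-recently-used policy (suggested by "most recent SPF records" in the config comment), a bounded cache can be sketched in Python with `collections.OrderedDict`. The class name is hypothetical and cached records are treated as opaque values.

~~~python
from collections import OrderedDict


class SpfCache:
    """Bounded LRU cache of SPF records keyed by domain (illustrative)."""

    def __init__(self, max_size=1000):
        self.max_size = max_size
        self._items = OrderedDict()

    def get(self, domain):
        record = self._items.get(domain)
        if record is not None:
            self._items.move_to_end(domain)  # mark as most recently used
        return record

    def put(self, domain, record):
        self._items[domain] = record
        self._items.move_to_end(domain)
        if len(self._items) > self.max_size:
            self._items.popitem(last=False)  # evict the least recently used
~~~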
Nonetheless, they can be used free of charge by personal services or for low-volume
requests.
-~~~nginx
+~~~ucl
surbl {
# List of domains that are not checked by surbl
whitelist = "file://$CONFDIR/surbl-whitelist.inc";
Since some URL lists do not accept `IP` addresses, it is also possible to disable sending URLs that have an IP address as the host to such lists. This can be done by specifying the `noip = true` option:
-~~~nginx
+~~~ucl
rule {
suffix = "dbl.spamhaus.org";
symbol = "DBL";
It is also possible to check HTML image URLs using URL blacklists. Just specify `images = true` for such a list and you are done:
-~~~nginx
+~~~ucl
rule {
suffix = "uribl.rambler.ru";
# Also check images
For example, the [SBL list](https://www.spamhaus.org/sbl/) of the `spamhaus` project provides such functionality through the `ZEN` multi list. This is included in the rspamd default configuration:
-~~~nginx
+~~~ucl
rule {
suffix = "zen.spamhaus.org";
symbol = "ZEN_URIBL";
URIBL_SBL = "127.0.0.2";
}
}
-~~~
\ No newline at end of file
+~~~
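For DNS list lookups, the queried name is built from the checked item and the list `suffix`, and the returned address is mapped to a symbol, as with `URIBL_SBL = "127.0.0.2"` above. A Python sketch of that mapping, assuming the usual DNSBL convention of reversing IPv4 octets and prepending domains as-is; the function names are hypothetical and no DNS query is performed.

~~~python
import ipaddress


def query_name(item, suffix):
    """DNS name to look up for an IPv4 address or a domain in a DNS list."""
    try:
        ip = ipaddress.IPv4Address(item)
    except ValueError:
        return f"{item.rstrip('.')}.{suffix}"  # domain lists: name as-is
    return ".".join(reversed(str(ip).split("."))) + "." + suffix


def symbols_for(replies, codes):
    """Map reply addresses to symbols, e.g. 127.0.0.2 -> URIBL_SBL."""
    by_address = {addr: sym for sym, addr in codes.items()}
    return [by_address[r] for r in replies if r in by_address]


codes = {"URIBL_SBL": "127.0.0.2"}
print(query_name("192.0.2.10", "zen.spamhaus.org"))  # 10.2.0.192.zen.spamhaus.org
~~~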
Here is an example of a trie configuration:
-~~~nginx
+~~~ucl
trie {
# Each subsection defines a single rule with associated symbol
SYMBOL1 {
strings. Moreover, it cannot distinguish word boundaries: for example, a string
`test` will be found in texts `test`, `tests` or even `123testing`. Therefore, it
should be used to search for concrete and relatively specific patterns and should
-not be used for words match.
\ No newline at end of file
+not be used for word matching.
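The lack of word boundaries follows from how trie matching works: every position in the text is a potential start of a match. A minimal Python sketch of multi-pattern lookup with a character trie (the function names are hypothetical; rspamd's own implementation is in C and differs in detail):

~~~python
def build_trie(patterns):
    """Nested-dict trie; the key None marks the end of a pattern."""
    root = {}
    for pattern in patterns:
        node = root
        for ch in pattern:
            node = node.setdefault(ch, {})
        node[None] = pattern
    return root


def find_all(trie, text):
    """Return (offset, pattern) for every occurrence, ignoring word boundaries."""
    hits = []
    for start in range(len(text)):
        node = trie
        for ch in text[start:]:
            node = node.get(ch)
            if node is None:
                break
            if None in node:
                hits.append((start, node[None]))
    return hits


trie = build_trie(["test"])
print([find_all(trie, s) for s in ("test", "tests", "123testing")])
# [[(0, 'test')], [(0, 'test')], [(3, 'test')]]
~~~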
## Configuration example
-~~~nginx
+~~~ucl
whitelist {
rules {
WHITELIST_SPF = {
}
~~~
-Rspamd also comes with a set of pre-defined whitelisted domains that could be useful for start.
\ No newline at end of file
+Rspamd also comes with a set of pre-defined whitelisted domains that could be useful to start with.
For those who have a lot of custom rules, there is good news: rspamd supports a certain set of SpamAssassin rules via a special [plugin](../modules/spamassassin.md) that allows **direct** loading of SA rules into rspamd. You just need to specify all your configuration files in the plugin configuration:
-~~~nginx
+~~~ucl
spamassassin {
sa_main = "/etc/spamassassin/conf.d/*";
sa_local = "/etc/spamassassin/local.cf";
learn-spam123: "| rspamc learn_spam"
learn-ham123: "| rspamc learn_ham"
-You'd need some less predictable aliases to avoid sending messages to such addresses by some adversary or just by a mistake to prevent statistics pollution.
\ No newline at end of file
+You should use less predictable aliases, so that neither an adversary nor a mistaken sender can deliver messages to these addresses and pollute your statistics.
rspamd.conf:
-~~~nginx
+~~~ucl
var1 = "value1";
section "name" {
rspamd.conf.local:
-~~~nginx
+~~~ucl
var1 = "value2";
section "name" {
Resulting config:
-~~~nginx
+~~~ucl
var1 = "value1";
var1 = "value2";
rspamd.conf:
-~~~nginx
+~~~ucl
var1 = "value1";
section "name" {
rspamd.conf.override:
-~~~nginx
+~~~ucl
var1 = "value2";
section "name" {
Resulting config:
-~~~nginx
+~~~ucl
var1 = "value2";
# Note that var2 is removed completely
1. Define scores in `rspamd.conf.local` as follows:
-~~~nginx
+~~~ucl
metric "default" {
symbol "MY_SYMBOL" {
description = "my cool rule";
After that, the keypair should appear as follows:
-~~~nginx
+~~~ucl
keypair {
pubkey = "tm8zjw3ougwj1qjpyweugqhuyg4576ctg6p7mbrhma6ytjewp4ry";
privkey = "ykkrfqbyk34i1ewdmn81ttcco1eaxoqgih38duib1e7b89h9xn3y";
Here is an example configuration of fuzzy storage:
-~~~nginx
+~~~ucl
worker {
type = "fuzzy";
bind_socket = "*:11335";
Rspamd fuzzy storage of version `0.8` can work with rspamd clients of all versions;
however, updates from legacy versions (older than `0.8`) won't update the fuzzy shingles
database. Rspamd [fuzzy check module](../modules/fuzzy_check.md) can work **only**
-with the recent rspamd fuzzy storage (it won't get anything from the legacy storages).
\ No newline at end of file
+with the recent rspamd fuzzy storage (it won't get anything from the legacy storages).
All workers share a set of common options. Here is a typical example of a normal
worker configuration that uses only common worker options:
-~~~nginx
+~~~ucl
worker {
type = "normal";
bind_socket = "*:11333";
`bind_socket` is the most commonly used option. It defines the address where the worker should accept
connections. Rspamd allows both names and IP addresses for this option:
-~~~nginx
+~~~ucl
bind_socket = "localhost:11333";
bind_socket = "127.0.0.1:11333";
bind_socket = "[::1]:11333"; # note that you need to enclose ipv6 in '[]'
Universal listening addresses are also defined:
-~~~nginx
+~~~ucl
bind_socket = "*:11333"; # any ipv4 and ipv6 address
bind_socket = "*v4:11333"; # any ipv4 address
bind_socket = "*v6:11333"; # any ipv6 address
Moreover, you can specify systemd sockets if rspamd is invoked by systemd:
-~~~nginx
+~~~ucl
bind_socket = "systemd:1"; # the first socket passed by systemd through the environment
~~~
For unix sockets, it is also possible to specify owner and mode using this syntax:
-~~~nginx
+~~~ucl
bind_socket = "/tmp/rspamd.sock mode=0666 owner=user";
~~~
a weird result.
You can specify multiple `bind_socket` options to listen on as many addresses as
-you want.
\ No newline at end of file
+you want.
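The accepted `bind_socket` forms can be summarised by a small parser. This Python sketch only classifies the strings shown above into a hypothetical dictionary layout; it is not how rspamd parses them internally, and resolving names such as `localhost` or handling systemd descriptors is out of scope.

~~~python
def parse_bind_socket(value):
    """Classify a bind_socket string into a dict describing the listener."""
    spec, *options = value.split()
    if spec.startswith("/"):
        # This sketch treats absolute paths as unix sockets; `mode=` and
        # `owner=` options follow the path, separated by spaces.
        opts = dict(opt.split("=", 1) for opt in options)
        return {"kind": "unix", "path": spec, **opts}
    host, _, port = spec.rpartition(":")
    if host == "systemd":
        return {"kind": "systemd", "number": int(port)}
    if host.startswith("[") and host.endswith("]"):
        host = host[1:-1]  # IPv6 addresses are enclosed in '[]'
    wildcards = {"*": ("ipv4", "ipv6"), "*v4": ("ipv4",), "*v6": ("ipv6",)}
    if host in wildcards:
        return {"kind": "any", "families": wildcards[host], "port": int(port)}
    return {"kind": "inet", "host": host, "port": int(port)}
~~~

Splitting on the last `:` is what makes the bracketed IPv6 form necessary: without brackets, the colons inside an IPv6 address would be ambiguous with the port separator.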
After that, the keypair should appear as follows:
-~~~nginx
+~~~ucl
keypair {
pubkey = "tm8zjw3ougwj1qjpyweugqhuyg4576ctg6p7mbrhma6ytjewp4ry";
privkey = "ykkrfqbyk34i1ewdmn81ttcco1eaxoqgih38duib1e7b89h9xn3y";