From 9ebfb824efb4c1fd5325c9451669bf9c5cb5c544 Mon Sep 17 00:00:00 2001
From: Vsevolod Stakhov
Date: Mon, 30 Dec 2013 01:29:42 +0000
Subject: [PATCH] More documentation.

---
 doc/markdown/architecture/index.md        | 32 ++++++++++--
 doc/markdown/modules/chartable.md         |  0
 doc/markdown/modules/dkim.md              |  0
 doc/markdown/modules/emails.md            |  0
 doc/markdown/modules/forged_recipients.md |  0
 doc/markdown/modules/fuzzy_check.md       |  0
 doc/markdown/modules/index.md             | 64 +++++++++++++++++++++++
 doc/markdown/modules/maillist.md          |  0
 doc/markdown/modules/multimap.md          |  0
 doc/markdown/modules/once_received.md     |  0
 doc/markdown/modules/phishing.md          |  0
 doc/markdown/modules/ratelimit.md         |  0
 doc/markdown/modules/rbl.md               |  0
 doc/markdown/modules/regexp.md            |  0
 doc/markdown/modules/spf.md               |  0
 doc/markdown/modules/surbl.md             |  0
 doc/markdown/modules/trie.md              |  0
 17 files changed, 92 insertions(+), 4 deletions(-)
 create mode 100644 doc/markdown/modules/chartable.md
 create mode 100644 doc/markdown/modules/dkim.md
 create mode 100644 doc/markdown/modules/emails.md
 create mode 100644 doc/markdown/modules/forged_recipients.md
 create mode 100644 doc/markdown/modules/fuzzy_check.md
 create mode 100644 doc/markdown/modules/maillist.md
 create mode 100644 doc/markdown/modules/multimap.md
 create mode 100644 doc/markdown/modules/once_received.md
 create mode 100644 doc/markdown/modules/phishing.md
 create mode 100644 doc/markdown/modules/ratelimit.md
 create mode 100644 doc/markdown/modules/rbl.md
 create mode 100644 doc/markdown/modules/regexp.md
 create mode 100644 doc/markdown/modules/spf.md
 create mode 100644 doc/markdown/modules/surbl.md
 create mode 100644 doc/markdown/modules/trie.md

diff --git a/doc/markdown/architecture/index.md b/doc/markdown/architecture/index.md
index 2b1503933..33669f2d5 100644
--- a/doc/markdown/architecture/index.md
+++ b/doc/markdown/architecture/index.md
@@ -3,7 +3,7 @@
 ## Introduction
 
 Rspamd is a universal spam filtering system based on event-driven processing
-model. 
It means that rspamd is intented not to block anywhere in the code. To
+model. It means that rspamd is intended not to block anywhere in the code. To
 process messages rspamd uses a set of so called `rules`. Each `rule` is a symbolic
 name associated with some message property. For example, we can define the following
 rules:
@@ -13,7 +13,7 @@ rules:
 - FORGED_OUTLOOK_MID - message ID seems to be forged for Outlook MUA.
 
 Rules are defined by [modules](../modules/). So far, if there is a module that
-performs SPF checks it may define several rules accroding to SPF policy:
+performs SPF checks it may define several rules according to SPF policy:
 
 - SPF_ALLOW - a sender is allowed to send messages for this domain;
 - SPF_DENY - a sender is denied by SPF policy;
@@ -49,8 +49,9 @@ means the opposite.
 
 ### Rules scheduler
 
-To avoid unnecessary checks rspamd uses scheduler of rules for each message. This
-scheduler is rather naive and it performs the following logic:
+To avoid unnecessary checks rspamd uses a scheduler of rules for each message. For example,
+if a message is already considered `definite spam` then further checks are not performed.
+This scheduler is rather naive and it performs the following logic:
 
 - select negative rules *before* positive ones to prevent false positives;
 - prefer rules with the following characteristics:
@@ -91,3 +92,26 @@ a resulting symbol with weight from 0 to 5.0.
 To distribute values in the proper way, rspamd usually uses some sort of Sigma
 function to provide fair distribution curve. Nevertheless, the most of rspamd rules
 uses static weights with the exception of fuzzy rules.
+
+## Statistic
+
+Rspamd uses statistical algorithms to refine the final score of a message. Currently,
+the only algorithm defined is OSB-Bayes. You may find the concrete details of this
+algorithm in the following [paper](http://osbf-lua.luaforge.net/papers/osbf-eddc.pdf).
+Rspamd uses a window size of 5 words in its classification. 
During the classification procedure,
+rspamd splits a message into a set of tokens.
+
+Tokens are separated by punctuation or space characters. Short tokens (less than 3 symbols) are ignored. For each token rspamd
+calculates two non-cryptographic hashes used subsequently as indices. All these tokens
+are stored in memory-mapped files called `statistic files` (or `statfiles`). Each statfile
+is a set of token chains, indexed by the first hash. A new token may be inserted into some
+chain, and if this chain is full then rspamd tries to expire less significant tokens to
+insert a new one. It is possible to obtain the current state of tokens by running
+
+    rspamc stat
+
+command that asks controller for free and used tokens in each statfile.
+Please note that if a statfile is close to being completely filled then during subsequent
+learning you will lose existing data. Therefore, it is recommended to increase size for
+such statfiles.
+
diff --git a/doc/markdown/modules/chartable.md b/doc/markdown/modules/chartable.md
new file mode 100644
index 000000000..e69de29bb
diff --git a/doc/markdown/modules/dkim.md b/doc/markdown/modules/dkim.md
new file mode 100644
index 000000000..e69de29bb
diff --git a/doc/markdown/modules/emails.md b/doc/markdown/modules/emails.md
new file mode 100644
index 000000000..e69de29bb
diff --git a/doc/markdown/modules/forged_recipients.md b/doc/markdown/modules/forged_recipients.md
new file mode 100644
index 000000000..e69de29bb
diff --git a/doc/markdown/modules/fuzzy_check.md b/doc/markdown/modules/fuzzy_check.md
new file mode 100644
index 000000000..e69de29bb
diff --git a/doc/markdown/modules/index.md b/doc/markdown/modules/index.md
index e69de29bb..63c55ed2b 100644
--- a/doc/markdown/modules/index.md
+++ b/doc/markdown/modules/index.md
@@ -0,0 +1,64 @@
+# Rspamd modules
+
+Rspamd ships with a set of modules. Some modules are written in C to speed up
+complex procedures while others are written in lua to reduce code size. 
+In fact, new modules are encouraged to be written in lua, with any essential
+support added to the Lua API itself. In truth, lua modules are very close to
+C modules in terms of performance. However, lua modules can be written and loaded
+dynamically.
+
+## C Modules
+
+C modules provide core functionality of rspamd and are actually statically linked
+to the main rspamd code. C modules are defined in the `options` section of rspamd
+configuration. If no `filters` attribute is defined then all modules are disabled.
+The default configuration enables all modules explicitly:
+
+~~~nginx
+filters = "chartable,dkim,spf,surbl,regexp,fuzzy_check";
+~~~
+
+Here is the list of C modules available:
+
+- [regexp](regexp.md): the core module that allows defining regexp rules,
+rspamd internal functions and lua rules.
+- [surbl](surbl.md): this module extracts URLs from messages and checks them against
+public DNS black lists to filter messages with malicious URLs.
+- [spf](spf.md): checks SPF records for messages processed.
+- [dkim](dkim.md): performs DKIM signatures checks.
+- [fuzzy_check](fuzzy_check.md): checks messages fuzzy hashes against public blacklists.
+- [chartable](chartable.md): checks character sets of text parts in messages.
+
+## Lua modules
+
+Lua modules are dynamically loaded on rspamd startup and are reloaded on rspamd
+reconfiguration. Should you want to write a lua module, consult the
+[Lua API documentation](../lua/). To define path to lua modules there is a special section
+named `modules` in rspamd:
+
+~~~nginx
+modules {
+    path = "/path/to/dir/";
+    path = "/path/to/module.lua";
+    path = "$PLUGINSDIR/lua";
+}
+~~~
+
+If a path is a directory then rspamd scans it for `*.lua` pattern and loads all
+files matched.
+
+Here is the list of Lua modules shipped with rspamd:
+
+- [multimap](multimap.md) - a complex module that operates with different types
+of maps. 
+- [rbl](rbl.md) - a plugin that checks messages against DNS blacklists based on
+either SMTP FROM addresses or on information from `Received` headers.
+- [emails](emails.md) - extracts emails from a message and checks them against DNS
+blacklists.
+- [maillist](maillist.md) - determines the common mailing list signatures in a message.
+- [once_received](once_received.md) - detects messages with a single `Received` header
+and performs some additional checks for such messages.
+- [phishing](phishing.md) - detects messages with phished URLs.
+- [ratelimit](ratelimit.md) - implements leaky bucket algorithm for ratelimiting and
+uses `redis` to store data.
+- [trie](trie.md) - uses suffix trie for extra-fast patterns lookup in messages.
\ No newline at end of file
diff --git a/doc/markdown/modules/maillist.md b/doc/markdown/modules/maillist.md
new file mode 100644
index 000000000..e69de29bb
diff --git a/doc/markdown/modules/multimap.md b/doc/markdown/modules/multimap.md
new file mode 100644
index 000000000..e69de29bb
diff --git a/doc/markdown/modules/once_received.md b/doc/markdown/modules/once_received.md
new file mode 100644
index 000000000..e69de29bb
diff --git a/doc/markdown/modules/phishing.md b/doc/markdown/modules/phishing.md
new file mode 100644
index 000000000..e69de29bb
diff --git a/doc/markdown/modules/ratelimit.md b/doc/markdown/modules/ratelimit.md
new file mode 100644
index 000000000..e69de29bb
diff --git a/doc/markdown/modules/rbl.md b/doc/markdown/modules/rbl.md
new file mode 100644
index 000000000..e69de29bb
diff --git a/doc/markdown/modules/regexp.md b/doc/markdown/modules/regexp.md
new file mode 100644
index 000000000..e69de29bb
diff --git a/doc/markdown/modules/spf.md b/doc/markdown/modules/spf.md
new file mode 100644
index 000000000..e69de29bb
diff --git a/doc/markdown/modules/surbl.md b/doc/markdown/modules/surbl.md
new file mode 100644
index 000000000..e69de29bb
diff --git a/doc/markdown/modules/trie.md b/doc/markdown/modules/trie.md
new file mode 
100644
index 000000000..e69de29bb
-- 
2.39.5
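
Reviewer note on the `Statistic` section added by this patch: the text describes splitting a message into tokens, ignoring tokens shorter than 3 symbols, using a window of 5 words, and computing two non-cryptographic hashes per token. A minimal sketch of that flow follows. It is not rspamd's implementation: the function names `tokenize` and `osb_features`, the pairing of words inside the window (the generic orthogonal sparse bigram scheme), and the choice of `zlib.crc32` and `zlib.adler32` as the two hashes are all assumptions made for illustration.

```python
import re
import zlib

WINDOW = 5          # window size stated in the patch text
MIN_TOKEN_LEN = 3   # tokens shorter than this are ignored


def tokenize(text):
    """Split on punctuation or whitespace and drop short tokens."""
    # [\W_]+ matches runs of characters that are neither letters nor digits.
    return [t for t in re.split(r"[\W_]+", text) if len(t) >= MIN_TOKEN_LEN]


def osb_features(tokens, window=WINDOW):
    """Yield a (h1, h2) hash pair for each sparse bigram in a sliding window.

    Hypothetical scheme: each token is paired with each of the next
    window - 1 tokens, and the distance between them is part of the feature.
    """
    for i, first in enumerate(tokens):
        for dist in range(1, window):
            if i + dist >= len(tokens):
                break
            feature = f"{first}\x00{dist}\x00{tokens[i + dist]}".encode("utf-8")
            # Two independent non-cryptographic hashes (assumed choice).
            yield zlib.crc32(feature), zlib.adler32(feature)


if __name__ == "__main__":
    toks = tokenize("Buy cheap pills now, at our new online store!")
    print(toks)
    for h1, h2 in osb_features(toks):
        print(f"{h1:08x} {h2:08x}")
```

In this sketch the first hash would select a token chain and the second would identify the token within it, matching the patch's description of statfiles as chains indexed by the first hash. The chain layout and expiry policy are not modelled here.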