## Introduction
Rspamd is a universal spam filtering system based on event-driven processing
-model. It means that rspamd is intented not to block anywhere in the code. To
+model. It means that rspamd is intended not to block anywhere in the code. To
process messages rspamd uses a set of so called `rules`. Each `rule` is a symbolic
name associated with some message property. For example, we can define the following
rules:
- FORGED_OUTLOOK_MID - message ID seems to be forged for Outlook MUA.
Rules are defined by [modules](../modules/). So far, if there is a module that
-performs SPF checks it may define several rules accroding to SPF policy:
+performs SPF checks it may define several rules according to SPF policy:
- SPF_ALLOW - a sender is allowed to send messages for this domain;
- SPF_DENY - a sender is denied by SPF policy;
### Rules scheduler
-To avoid unnecessary checks rspamd uses scheduler of rules for each message. This
-scheduler is rather naive and it performs the following logic:
+To avoid unnecessary checks, rspamd uses a scheduler of rules for each message. In particular,
+if a message is considered `definite spam`, further checks are not performed.
+This scheduler is rather naive and applies the following logic:
- select negative rules *before* positive ones to prevent false positives;
- prefer rules with the following characteristics:
way, rspamd usually uses some sort of Sigma function to provide fair distribution curve.
Nevertheless, the most of rspamd rules uses static weights with the exception of
fuzzy rules.
+
+## Statistic
+
+Rspamd uses statistical algorithms to refine the final score of a message. Currently,
+the only algorithm defined is OSB-Bayes. You can find the details of this
+algorithm in the following [paper](http://osbf-lua.luaforge.net/papers/osbf-eddc.pdf).
+Rspamd uses a window size of 5 words in its classification. During the classification procedure,
+rspamd splits a message into a set of tokens.
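
To illustrate what a 5-word window means, here is a minimal Python sketch of how orthogonal sparse bigrams are commonly described: each token is paired with each of the up to four tokens that precede it, and the distance between the two is kept as part of the feature. The function name `osb_pairs` and the triple layout are hypothetical, and rspamd's actual implementation may differ in its details.

```python
def osb_pairs(tokens, window=5):
    """Return (earlier_token, gap, later_token) triples within a sliding window.

    Sketch only: the triple layout is an assumption made for illustration.
    """
    pairs = []
    for i, later in enumerate(tokens):
        # Pair the current token with each of the (window - 1) preceding tokens.
        for gap in range(1, window):
            j = i - gap
            if j < 0:
                break  # ran off the start of the message
            pairs.append((tokens[j], gap, later))
    return pairs


if __name__ == "__main__":
    for pair in osb_pairs(["buy", "cheap", "pills", "online", "today", "now"]):
        print(pair)
```

With a window of 5, a token is never paired with anything more than four positions away, so the number of features grows linearly with the message length.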
+
+Tokens are separated by punctuation or space characters. Short tokens (fewer than 3 characters) are ignored. For each token rspamd
+calculates two non-cryptographic hashes that are subsequently used as indices. All these tokens
+are stored in memory-mapped files called `statistic files` (or `statfiles`). Each statfile
+is a set of token chains, indexed by the first hash. A new token may be inserted into some
+chain, and if this chain is full then rspamd tries to expire less significant tokens to
+insert the new one. It is possible to obtain the current state of tokens by running
+
+    rspamc stat
+
+command that asks the controller for free and used tokens in each statfile.
+Please note that if a statfile is close to being completely filled, then subsequent
+learning will cause you to lose existing data. Therefore, it is recommended to increase the size of
+such statfiles.
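
The token storage described above can be sketched in Python as follows. This is an in-memory illustration under several assumptions, not rspamd's on-disk format: `zlib.crc32` and `zlib.adler32` stand in for the two non-cryptographic hashes, "less significant" is taken to mean the lowest hit count, and the names `Statfile`, `tokenize`, `num_chains` and `chain_length` are hypothetical.

```python
import re
import zlib


def tokenize(text):
    """Split on punctuation or whitespace and drop tokens shorter than 3 characters."""
    return [t for t in re.split(r"[\W_]+", text) if len(t) >= 3]


class Statfile:
    """Toy statfile: a fixed number of fixed-length token chains (assumed sizes)."""

    def __init__(self, num_chains=8, chain_length=4):
        self.chain_length = chain_length
        # Each chain maps the second hash of a token to its hit count.
        self.chains = [dict() for _ in range(num_chains)]

    def learn(self, token):
        data = token.encode("utf-8")
        h1 = zlib.crc32(data)    # first hash: selects the chain
        h2 = zlib.adler32(data)  # second hash: identifies the token in the chain
        chain = self.chains[h1 % len(self.chains)]
        if h2 not in chain and len(chain) >= self.chain_length:
            # Chain is full: expire the entry with the lowest count (assumption).
            victim = min(chain, key=chain.get)
            del chain[victim]
        chain[h2] = chain.get(h2, 0) + 1

    def stat(self):
        """Return (used, free) token slots, similar in spirit to `rspamc stat`."""
        used = sum(len(c) for c in self.chains)
        return used, len(self.chains) * self.chain_length - used


if __name__ == "__main__":
    sf = Statfile()
    for tok in tokenize("Buy cheap pills, at my-shop.example now!"):
        sf.learn(tok)
    print(sf.stat())
```

The sketch also shows why a nearly full statfile loses data: once a chain has no free slots, every newly learned token pushes out an existing one.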
+
+# Rspamd modules
+
+Rspamd ships with a set of modules. Some modules are written in C to speed up
+complex procedures, while others are written in Lua to reduce code size.
+New modules are encouraged to be written in Lua, with any essential missing
+support added to the Lua API itself. In fact, Lua modules are very close to
+C modules in terms of performance. In addition, Lua modules can be written and loaded
+dynamically.
+
+## C Modules
+
+C modules provide the core functionality of rspamd and are statically linked
+to the main rspamd code. C modules are defined in the `options` section of the rspamd
+configuration. If no `filters` attribute is defined, then all modules are disabled.
+The default configuration enables all modules explicitly:
+
+~~~nginx
+filters = "chartable,dkim,spf,surbl,regexp,fuzzy_check";
+~~~
+
+Here is the list of C modules available:
+
+- [regexp](regexp.md): the core module that allows you to define regexp rules,
+rspamd internal functions and Lua rules.
+- [surbl](surbl.md): this module extracts URLs from messages and checks them against
+public DNS blacklists to filter messages with malicious URLs.
+- [spf](spf.md): checks SPF records for the messages processed.
+- [dkim](dkim.md): performs DKIM signature checks.
+- [fuzzy_check](fuzzy_check.md): checks fuzzy hashes of messages against public blacklists.
+- [chartable](chartable.md): checks character sets of text parts in messages.
+
+## Lua modules
+
+Lua modules are dynamically loaded on rspamd startup and are reloaded on rspamd
+reconfiguration. If you want to write a Lua module, consult the
+[Lua API documentation](../lua/). To define paths to Lua modules, there is a special section
+named `modules` in the rspamd configuration:
+
+~~~nginx
+modules {
+ path = "/path/to/dir/";
+ path = "/path/to/module.lua";
+ path = "$PLUGINSDIR/lua";
+}
+~~~
+
+If a path is a directory, then rspamd scans it for files matching the `*.lua` pattern and loads all
+files matched.
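
The path handling described above can be illustrated with a short Python sketch. It only mirrors the stated rule (a directory is scanned for `*.lua`, a file is taken as is); the function name `resolve_module_paths`, the sorted order and the silent skipping of missing paths are assumptions made for the example, not documented rspamd behaviour.

```python
from pathlib import Path


def resolve_module_paths(paths):
    """Expand configured `path` entries into a flat list of Lua files."""
    files = []
    for entry in paths:
        p = Path(entry)
        if p.is_dir():
            # Directory: pick up every file matching *.lua (sorted for determinism).
            files.extend(sorted(p.glob("*.lua")))
        elif p.is_file():
            # Single file: load it as given.
            files.append(p)
        # Paths that do not exist are skipped in this sketch.
    return files


if __name__ == "__main__":
    for lua_file in resolve_module_paths(["/path/to/dir/", "/path/to/module.lua"]):
        print(lua_file)
```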
+
+Here is the list of Lua modules shipped with rspamd:
+
+- [multimap](multimap.md) - a complex module that operates with different types
+of maps.
+- [rbl](rbl.md) - a plugin that checks messages against DNS blacklists based on
+either SMTP FROM addresses or on information from `Received` headers.
+- [emails](emails.md) - extracts email addresses from a message and checks them against DNS
+blacklists.
+- [maillist](maillist.md) - detects common mailing list signatures in a message.
+- [once_received](once_received.md) - detects messages with a single `Received` header
+and performs some additional checks for such messages.
+- [phishing](phishing.md) - detects messages with phished URLs.
+- [ratelimit](ratelimit.md) - implements the leaky bucket algorithm for rate limiting and
+uses `redis` to store data.
+- [trie](trie.md) - uses a suffix trie for extra-fast pattern lookup in messages.
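
For readers unfamiliar with the leaky bucket algorithm mentioned for the ratelimit module, here is a generic Python sketch of the technique. It keeps state in memory rather than in `redis`, and the class name `LeakyBucket` and the `capacity` and `leak_rate` parameters are hypothetical; it does not reflect the module's actual settings or storage layout.

```python
class LeakyBucket:
    """Generic leaky bucket: the level drains at a constant rate over time."""

    def __init__(self, capacity, leak_rate):
        self.capacity = capacity    # maximum burst the bucket can hold
        self.leak_rate = leak_rate  # units drained per second
        self.level = 0.0
        self.last = None            # timestamp of the previous check

    def allow(self, now, cost=1.0):
        """Return True if a message arriving at time `now` fits into the bucket."""
        if self.last is not None:
            # Drain the bucket for the time elapsed since the last check.
            elapsed = now - self.last
            self.level = max(0.0, self.level - elapsed * self.leak_rate)
        self.last = now
        if self.level + cost > self.capacity:
            return False  # bucket would overflow: the message is rate limited
        self.level += cost
        return True


if __name__ == "__main__":
    bucket = LeakyBucket(capacity=2, leak_rate=1.0)
    for t in (0.0, 0.0, 0.0, 1.0):
        print(t, bucket.allow(t))
```

Passing the timestamp in explicitly keeps the sketch deterministic; a real limiter would read the clock and keep one bucket per sender or recipient key.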
\ No newline at end of file