More documentation.

author Vsevolod Stakhov <vsevolod@highsecure.ru>

Mon, 30 Dec 2013 01:29:42 +0000 (01:29 +0000)

committer Vsevolod Stakhov <vsevolod@highsecure.ru>

Mon, 30 Dec 2013 01:29:42 +0000 (01:29 +0000)
diff --git a/doc/markdown/architecture/index.md b/doc/markdown/architecture/index.md

index 2b1503933a469561f116a0fdb2a31245d25210f1..33669f2d538c6ccd5db9e4f282751d88c7d16ca1 100644 (file)
--- a/doc/markdown/architecture/index.md
+++ b/doc/markdown/architecture/index.md
@@ -3,7 +3,7 @@
  ## Introduction
  
  Rspamd is a universal spam filtering system based on event-driven processing 
-model. It means that rspamd is intented not to block anywhere in the code. To
+model. It means that rspamd is intended not to block anywhere in the code. To
  process messages rspamd uses a set of so called `rules`. Each `rule` is a symbolic
  name associated with some message property. For example, we can define the following
  rules:
@@ -13,7 +13,7 @@ rules:
  - FORGED_OUTLOOK_MID - message ID seems to be forged for Outlook MUA.
  
  Rules are defined by [modules](../modules/). So far, if there is a module that
-performs SPF checks it may define several rules accroding to SPF policy:
+performs SPF checks it may define several rules according to SPF policy:
  
  - SPF_ALLOW - a sender is allowed to send messages for this domain;
  - SPF_DENY - a sender is denied by SPF policy;
@@ -49,8 +49,9 @@ means the opposite.
  
  ### Rules scheduler
  
-To avoid unnecessary checks rspamd uses scheduler of rules for each message. This
-scheduler is rather naive and it performs the following logic:
+To avoid unnecessary checks, rspamd uses a rule scheduler for each message. For example,
+if a message is already considered `definite spam`, further checks are not performed.
+This scheduler is rather naive and applies the following logic:
  
  - select negative rules *before* positive ones to prevent false positives;
  - prefer rules with the following characteristics:
@@ -91,3 +92,26 @@ a resulting symbol with weight from 0 to 5.0. To distribute values in the proper
  way, rspamd usually uses some sort of Sigma function to provide fair distribution curve.
  Nevertheless, the most of rspamd rules uses static weights with the exception of
  fuzzy rules.
+
+## Statistics
+
+Rspamd uses statistical algorithms to refine the final score of a message. Currently,
+the only algorithm defined is OSB-Bayes. The details of this algorithm are described
+in the following [paper](http://osbf-lua.luaforge.net/papers/osbf-eddc.pdf).
+Rspamd uses a window size of 5 words for classification. During classification,
+rspamd splits a message into a set of tokens.
+
+Tokens are separated by punctuation or space characters. Short tokens (fewer than 3 characters) are ignored. For each token, rspamd
+calculates two non-cryptographic hashes that are subsequently used as indices. All tokens
+are stored in memory-mapped files called `statistic files` (or `statfiles`). Each statfile
+is a set of token chains indexed by the first hash. A new token is inserted into one of these
+chains; if that chain is full, rspamd tries to expire less significant tokens to make room
+for the new one. To obtain the current state of tokens, run the
+
+    rspamc stat
+
+command, which asks the controller for the number of free and used tokens in each statfile.
+Please note that if a statfile is almost full, subsequent learning
+will cause existing data to be lost. It is therefore recommended to increase the size of
+such statfiles.
+
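The tokenization and storage scheme described in the section added above can be illustrated with a short Python sketch. It is not rspamd's implementation: the pairing of each token with the following tokens inside the 5-word window, the use of CRC32 and Adler-32 as the two non-cryptographic hashes, the `ToyStatfile` class, and the "evict the lowest count" expiry policy are all assumptions chosen for illustration.

```python
import re
import zlib

WINDOW_SIZE = 5      # window size stated in the documentation
MIN_TOKEN_LEN = 3    # tokens shorter than this are ignored


def tokenize(text):
    """Split on whitespace and punctuation, then drop short tokens."""
    return [t for t in re.split(r"[\W_]+", text) if len(t) >= MIN_TOKEN_LEN]


def osb_pairs(tokens, window=WINDOW_SIZE):
    """Yield (first, second, distance) for every token pair inside the window."""
    for i, first in enumerate(tokens):
        for dist in range(1, window):
            if i + dist < len(tokens):
                yield first, tokens[i + dist], dist


def feature_hashes(first, second, dist):
    """Return two non-cryptographic hashes (CRC32 and Adler-32 are stand-ins)."""
    data = f"{first}\0{dist}\0{second}".encode("utf-8")
    return zlib.crc32(data), zlib.adler32(data)


class ToyStatfile:
    """Hypothetical in-memory stand-in for a statfile: fixed-size chains indexed by h1."""

    def __init__(self, num_chains, chain_len):
        self.chain_len = chain_len
        self.chains = [[] for _ in range(num_chains)]

    def learn(self, h1, h2):
        chain = self.chains[h1 % len(self.chains)]
        for entry in chain:
            if entry[0] == h2:
                entry[1] += 1          # token already known: bump its count
                return
        if len(chain) >= self.chain_len:
            # Assumed expiry policy: evict the entry with the lowest count.
            chain.remove(min(chain, key=lambda e: e[1]))
        chain.append([h2, 1])

    def used(self):
        return sum(len(c) for c in self.chains)

    def free(self):
        return len(self.chains) * self.chain_len - self.used()
```

Feeding every `feature_hashes(...)` result for a message into `ToyStatfile.learn` shows why a nearly full statfile loses data: once a chain is full, each new token displaces an existing one. The `used()` and `free()` counters mirror the kind of figures the text says `rspamc stat` reports.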
diff --git a/doc/markdown/modules/chartable.md b/doc/markdown/modules/chartable.md

new file mode 100644 (file)

index 0000000..e69de29
diff --git a/doc/markdown/modules/dkim.md b/doc/markdown/modules/dkim.md

new file mode 100644 (file)

index 0000000..e69de29
diff --git a/doc/markdown/modules/emails.md b/doc/markdown/modules/emails.md

new file mode 100644 (file)

index 0000000..e69de29
diff --git a/doc/markdown/modules/forged_recipients.md b/doc/markdown/modules/forged_recipients.md

new file mode 100644 (file)

index 0000000..e69de29
diff --git a/doc/markdown/modules/fuzzy_check.md b/doc/markdown/modules/fuzzy_check.md

new file mode 100644 (file)

index 0000000..e69de29
diff --git a/doc/markdown/modules/index.md b/doc/markdown/modules/index.md

index e69de29bb2d1d6434b8b29ae775ad8c2e48c5391..63c55ed2bf0879bdf6a535ae7e1a9ee0996c1982 100644 (file)
--- a/doc/markdown/modules/index.md
+++ b/doc/markdown/modules/index.md
@@ -0,0 +1,64 @@
+# Rspamd modules
+
+Rspamd ships with a set of modules. Some modules are written in C to speed up
+complex procedures, while others are written in Lua to reduce code size.
+New modules should preferably be written in Lua, with any essential support
+added to the Lua API itself. In practice, Lua modules are very close to
+C modules in terms of performance. Unlike C modules, Lua modules can be written and loaded
+dynamically.
+
+## C Modules
+
+C modules provide the core functionality of rspamd and are statically linked
+to the main rspamd code. C modules are defined in the `options` section of the rspamd
+configuration. If no `filters` attribute is defined, all modules are disabled.
+The default configuration enables all modules explicitly:
+
+~~~nginx
+filters = "chartable,dkim,spf,surbl,regexp,fuzzy_check";
+~~~
+
+Here is the list of C modules available:
+
+- [regexp](regexp.md): the core module that allows defining regexp rules,
+rspamd internal functions and Lua rules.
+- [surbl](surbl.md): this module extracts URLs from messages and checks them against
+public DNS blacklists to filter messages with malicious URLs.
+- [spf](spf.md): checks SPF records for the messages processed.
+- [dkim](dkim.md): performs DKIM signature checks.
+- [fuzzy_check](fuzzy_check.md): checks fuzzy hashes of messages against public blacklists.
+- [chartable](chartable.md): checks character sets of text parts in messages.
+
+## Lua modules
+
+Lua modules are dynamically loaded on rspamd startup and are reloaded on rspamd
+reconfiguration. If you want to write a Lua module, consult the
+[Lua API documentation](../lua/). Paths to Lua modules are defined in a special section
+named `modules` in the rspamd configuration:
+
+~~~nginx
+modules {
+  path = "/path/to/dir/";
+  path = "/path/to/module.lua";
+  path = "$PLUGINSDIR/lua";
+}
+~~~
+
+If a path is a directory, rspamd scans it for files matching the `*.lua` pattern and loads
+all matching files.
+
+Here is the list of Lua modules shipped with rspamd:
+
+- [multimap](multimap.md) - a complex module that operates with different types
+of maps.
+- [rbl](rbl.md) - a plugin that checks messages against DNS blacklists based on
+either SMTP FROM addresses or on information from `Received` headers.
+- [emails](emails.md) - extracts email addresses from a message and checks them against DNS
+blacklists.
+- [maillist](maillist.md) - detects common mailing list signatures in a message.
+- [once_received](once_received.md) - detects messages with a single `Received` header
+and performs some additional checks for such messages.
+- [phishing](phishing.md) - detects messages with phished URLs.
+- [ratelimit](ratelimit.md) - implements the leaky bucket algorithm for rate limiting and
+uses `redis` to store data.
+- [trie](trie.md) - uses a suffix trie for extra-fast pattern lookup in messages.
+\ No newline at end of file
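The path handling described for the `modules` section above (a directory is scanned for `*.lua`, a file is taken as-is) can be sketched in Python as follows. This is only an illustration of the stated rule, not rspamd's loader: the function name `resolve_module_paths` is hypothetical, the sorted order is an assumption, and variables such as `$PLUGINSDIR` are not expanded here.

```python
import glob
import os


def resolve_module_paths(paths):
    """Expand configured `path` entries into a flat list of module files."""
    files = []
    for path in paths:
        if os.path.isdir(path):
            # Directory: pick up every file matching *.lua (sorted for determinism).
            files.extend(sorted(glob.glob(os.path.join(path, "*.lua"))))
        else:
            # Anything else is treated as a single module file.
            files.append(path)
    return files
```

With the example configuration above, `/path/to/dir/` would contribute every `*.lua` file it contains, while `/path/to/module.lua` would be passed through unchanged.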
diff --git a/doc/markdown/modules/maillist.md b/doc/markdown/modules/maillist.md

new file mode 100644 (file)

index 0000000..e69de29
diff --git a/doc/markdown/modules/multimap.md b/doc/markdown/modules/multimap.md

new file mode 100644 (file)

index 0000000..e69de29
diff --git a/doc/markdown/modules/once_received.md b/doc/markdown/modules/once_received.md

new file mode 100644 (file)

index 0000000..e69de29
diff --git a/doc/markdown/modules/phishing.md b/doc/markdown/modules/phishing.md

new file mode 100644 (file)

index 0000000..e69de29
diff --git a/doc/markdown/modules/ratelimit.md b/doc/markdown/modules/ratelimit.md

new file mode 100644 (file)

index 0000000..e69de29
diff --git a/doc/markdown/modules/rbl.md b/doc/markdown/modules/rbl.md

new file mode 100644 (file)

index 0000000..e69de29
diff --git a/doc/markdown/modules/regexp.md b/doc/markdown/modules/regexp.md

new file mode 100644 (file)

index 0000000..e69de29
diff --git a/doc/markdown/modules/spf.md b/doc/markdown/modules/spf.md

new file mode 100644 (file)

index 0000000..e69de29
diff --git a/doc/markdown/modules/surbl.md b/doc/markdown/modules/surbl.md

new file mode 100644 (file)

index 0000000..e69de29
diff --git a/doc/markdown/modules/trie.md b/doc/markdown/modules/trie.md

new file mode 100644 (file)

index 0000000..e69de29