From 2c3794e8f11308be3d53e3761810bb2e9901970b Mon Sep 17 00:00:00 2001
From: Vsevolod Stakhov <vsevolod@highsecure.ru>
Date: Sun, 29 Dec 2013 01:21:20 +0000
Subject: [PATCH] Start architecture section.

---
 doc/markdown/architecture/index.md | 93 ++++++++++++++++++++++++++++++
 doc/markdown/index.md              | 10 ++--
 2 files changed, 98 insertions(+), 5 deletions(-)

diff --git a/doc/markdown/architecture/index.md b/doc/markdown/architecture/index.md
index e69de29bb..2b1503933 100644
--- a/doc/markdown/architecture/index.md
+++ b/doc/markdown/architecture/index.md
@@ -0,0 +1,93 @@
+# Rspamd architecture
+
+## Introduction
+
+Rspamd is a universal spam filtering system based on an event-driven processing
+model, which means that rspamd is intended never to block anywhere in its code.
+To process messages, rspamd uses a set of so-called `rules`. Each `rule` is a
+symbolic name associated with some message property. For example, we can define
+the following rules:
+
+- SPF_ALLOW - means that a message is validated by SPF;
+- BAYES_SPAM - means that a message is statistically considered as spam;
+- FORGED_OUTLOOK_MID - the message ID appears to be forged for the Outlook MUA.
+
+Rules are defined by [modules](../modules/). For example, a module that performs
+SPF checks may define several rules according to the SPF policy:
+
+- SPF_ALLOW - a sender is allowed to send messages for this domain;
+- SPF_DENY - a sender is denied by SPF policy;
+- SPF_SOFTFAIL - the SPF policy weakly indicates that the sender is probably not allowed.
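+
+The following minimal Python sketch shows how such a module might map the result
+of an SPF check onto the rule symbols above. It is not rspamd's actual SPF
+module. The `spf_symbol` function and the `SPF_SYMBOLS` table are hypothetical.
+Results with no listed rule, such as `neutral`, simply produce no symbol.
+
+~~~python
+from typing import Optional
+
+# Hypothetical table mapping SPF results (RFC 7208 names) to rule symbols.
+SPF_SYMBOLS = {
+    "pass": "SPF_ALLOW",
+    "fail": "SPF_DENY",
+    "softfail": "SPF_SOFTFAIL",
+}
+
+
+def spf_symbol(result: str) -> Optional[str]:
+    """Return the rule symbol to insert for an SPF result, if any."""
+    return SPF_SYMBOLS.get(result.lower())
+
+
+print(spf_symbol("pass"))     # SPF_ALLOW
+print(spf_symbol("neutral"))  # None
+~~~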
+
+Rspamd supports two main types of modules: internal modules written in C and
+external modules written in Lua. There is no real difference between the two
+types, except that C modules are always embedded in rspamd and are enabled via
+the `filters` attribute in the `options` section of the configuration:
+
+~~~nginx
+options {
+ filters = "regexp,surbl,spf,dkim,fuzzy_check,chartable,email";
+ ...
+}
+~~~
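+
+The `filters` value is a single comma-separated string. The following rough
+Python sketch reads it into a list of enabled C modules. It assumes that
+surrounding whitespace and empty entries are ignored, which is not documented
+behavior. `enabled_c_modules` is a hypothetical helper, not part of rspamd.
+
+~~~python
+from typing import List
+
+FILTERS = "regexp,surbl,spf,dkim,fuzzy_check,chartable,email"
+
+
+def enabled_c_modules(filters: str) -> List[str]:
+    """Split a `filters` string into module names, skipping empty entries."""
+    return [name.strip() for name in filters.split(",") if name.strip()]
+
+
+print(enabled_c_modules(FILTERS))
+print("spf" in enabled_c_modules(FILTERS))  # True
+~~~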
+
+## Metrics
+
+Rules in rspamd define only the logic of checks; in addition, each rule requires
+a weight. In rspamd terms, weight means `significance`: rules with a greater
+absolute weight are considered more important than the others. Rule weights are
+defined in `metrics`. Each metric is a set of grouped rules with specific
+weights. For example, we may define the following weights for our SPF rules:
+
+- SPF_ALLOW: -1
+- SPF_DENY: 2
+- SPF_SOFTFAIL: 0.5
+
+A positive weight means that the rule makes a message look more like spam, while
+a negative weight means the opposite.
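+
+The short Python sketch below applies the example weights above. It assumes that
+a message's score in a metric is the sum of the weights of all rules the message
+triggered. `metric_score` is a hypothetical helper. It also assumes that rules
+with no weight in the metric contribute 0.
+
+~~~python
+# Weights of the SPF rules from the example metric above.
+METRIC_WEIGHTS = {
+    "SPF_ALLOW": -1.0,
+    "SPF_DENY": 2.0,
+    "SPF_SOFTFAIL": 0.5,
+}
+
+
+def metric_score(triggered, weights=METRIC_WEIGHTS):
+    """Sum the weights of triggered rules; rules missing from the metric add 0."""
+    return sum(weights.get(symbol, 0.0) for symbol in triggered)
+
+
+print(metric_score(["SPF_ALLOW"]))                       # -1.0
+print(metric_score(["SPF_DENY", "FORGED_OUTLOOK_MID"]))  # 2.0
+~~~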
+
+### Rule scheduler
+
+To avoid unnecessary checks, rspamd schedules rules for each message. The
+scheduler is rather naive and follows this logic:
+
+- run negative rules *before* positive ones to prevent false positives;
+- prefer rules with the following characteristics:
+  - frequently triggered rules;
+  - rules with greater weight;
+  - faster rules.
+
+These optimizations allow definite spam to be recognized more quickly than with
+a generic queue.
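+
+The ordering rules above can be sketched in Python as a single sort key. The
+document does not say how the criteria are combined. This sketch assumes a
+lexicographic order: the sign of the weight first, then frequency, then absolute
+weight, then execution time. The `Rule` fields `frequency` and `avg_time` and all
+sample numbers are made up for illustration.
+
+~~~python
+from dataclasses import dataclass
+from typing import List
+
+
+@dataclass
+class Rule:
+    name: str
+    weight: float     # weight from the metric
+    frequency: float  # hypothetical: share of messages that triggered the rule
+    avg_time: float   # hypothetical: average check time in seconds
+
+
+def schedule(rules: List[Rule]) -> List[Rule]:
+    """Return rules in the order they would be checked."""
+    return sorted(
+        rules,
+        key=lambda r: (
+            r.weight >= 0,   # negative rules first (False sorts before True)
+            -r.frequency,    # then frequently triggered rules
+            -abs(r.weight),  # then rules with greater weight
+            r.avg_time,      # then faster rules
+        ),
+    )
+
+
+rules = [
+    Rule("SPF_DENY", 2.0, 0.1, 0.002),
+    Rule("BAYES_SPAM", 5.0, 0.1, 0.010),
+    Rule("SPF_ALLOW", -1.0, 0.6, 0.002),
+]
+print([r.name for r in schedule(rules)])  # ['SPF_ALLOW', 'BAYES_SPAM', 'SPF_DENY']
+~~~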
+
+## Actions
+
+Another important property of a metric is its set of actions. This set defines
+the recommended action for a message once the total score of all triggered rules
+reaches a certain threshold. Rspamd defines the following actions:
+
+- **No action**: a message is likely ham;
+- **Greylist**: greylist a message if it is not certainly ham;
+- **Add header**: a message is likely spam, so add a specific header;
+- **Rewrite subject**: a message is likely spam, so rewrite its subject;
+- **Reject**: a message is very likely spam, so reject it completely.
+
+These actions are only recommendations for the MTA and need not be followed
+strictly. For all actions greater than or equal to *greylist*, it is recommended
+to perform explicit greylisting. The *add header* and *rewrite subject* actions
+are very close in semantics and are both considered `probable spam`. *Reject* is
+a strong action that usually means the message should really be rejected by the
+MTA. The triggering scores of these actions should follow their logical
+priority, so that a more severe action requires a higher score. If two actions
+have the same triggering score, the result is unspecified.
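+
+Choosing an action amounts to finding the most severe action whose threshold the
+score has reached. The Python sketch below uses hypothetical threshold values,
+not rspamd defaults. It returns `no action` when no threshold is reached. When
+two thresholds are equal, the sketch picks one arbitrarily, in line with the
+unspecified behavior described above.
+
+~~~python
+from typing import Dict
+
+# Hypothetical thresholds; real values are configured per metric.
+ACTIONS: Dict[str, float] = {
+    "greylist": 4.0,
+    "add header": 6.0,
+    "rewrite subject": 8.0,
+    "reject": 15.0,
+}
+
+
+def choose_action(score: float, actions: Dict[str, float] = ACTIONS) -> str:
+    """Return the most severe action whose threshold `score` has reached."""
+    reached = [(name, limit) for name, limit in actions.items() if score >= limit]
+    if not reached:
+        return "no action"
+    # max() keeps the first of several equal thresholds: an arbitrary choice.
+    return max(reached, key=lambda item: item[1])[0]
+
+
+for score in (2.0, 5.0, 7.5, 20.0):
+    print(score, choose_action(score))
+~~~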
+
+## Rule weights
+
+Rule weights are not necessarily constant. For example, statistical rules give no
+certainty about whether a message is spam; they provide only a probability. To
+allow fuzzy rule weights, rspamd supports `dynamic weights`: a rule may add a
+score anywhere in the range from 0 to the weight defined in the metric. So if we
+define the symbol `BAYES_SPAM` with weight 5.0, this rule can add a score from 0
+to 5.0. To distribute values properly, rspamd usually applies some sort of
+sigmoid function, which provides a fair distribution curve. Nevertheless, most
+rspamd rules use static weights, with the exception of fuzzy rules.
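+
+As an illustration of a dynamic weight, the following Python sketch scales a
+rule's metric weight by a logistic (sigmoid) curve of the rule's spam
+probability. The exact curve and its `steepness` parameter are assumptions made
+for this example. The function name `dynamic_score` is hypothetical.
+
+~~~python
+import math
+
+
+def dynamic_score(weight: float, probability: float, steepness: float = 10.0) -> float:
+    """Map a spam probability onto the range (0, weight) with a logistic curve."""
+    p = min(max(probability, 0.0), 1.0)  # clamp to a valid probability
+    return weight / (1.0 + math.exp(-steepness * (p - 0.5)))
+
+
+# With BAYES_SPAM weighted 5.0 in the metric:
+for p in (0.1, 0.5, 0.9):
+    print(p, round(dynamic_score(5.0, p), 2))
+~~~
+
+A probability of 0.5 maps to exactly half the weight. Values near 0 or 1 approach
+the ends of the range but never reach them.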
diff --git a/doc/markdown/index.md b/doc/markdown/index.md
index 27e889f4d..181dc956e 100644
--- a/doc/markdown/index.md
+++ b/doc/markdown/index.md
@@ -15,8 +15,8 @@ statistics module.
 This document contains the basic documentation for rspamd spam filtering system. It is divided into the following
 parts:
 
-- [Architecture](/architecture/) presents the architecture of rspamd and how spam filtering is performed
-- [Rspamd configuration](/configuration/) describes principles of rspamd configuration
-- [Modules](/modules/) chapter lists rspamd modules and defines their configuration attributes
-- [Workers](/workers/) section describes workers that are implemented in the rspamd
-- [Lua API](/lua/) explains how to extend rspamd with own lua modules
\ No newline at end of file
+- [Architecture](architecture/) presents the architecture of rspamd and how spam filtering is performed
+- [Rspamd configuration](configuration/) describes principles of rspamd configuration
+- [Modules](modules/) chapter lists rspamd modules and defines their configuration attributes
+- [Workers](workers/) section describes the workers implemented in rspamd
+- [Lua API](lua/) explains how to extend rspamd with your own Lua modules
\ No newline at end of file
-- 
2.39.5