Add protocol description.

author Vsevolod Stakhov <vsevolod@highsecure.ru>

Mon, 30 Mar 2015 16:48:06 +0000 (17:48 +0100)

committer Vsevolod Stakhov <vsevolod@highsecure.ru>

Mon, 30 Mar 2015 16:48:06 +0000 (17:48 +0100)
diff --git a/doc/markdown/architecture/index.md b/doc/markdown/architecture/index.md
index 3c854a65baa187b6c8f1377a6c8988e49c7a1f1d..324c5260ab25ffbaf4e1b52c25ef8ba3f8d80ab6 100644
--- a/doc/markdown/architecture/index.md
+++ b/doc/markdown/architecture/index.md
@@ -31,6 +31,10 @@ options {
  }
  ~~~
  
+## Protocol
+
+Rspamd uses the HTTP protocol for all operations. This protocol is described in the [protocol section](protocol.md).
+
  ## Metrics
  
  Rules in rspamd, defines merely a logic of checks, however it is required to
@@ -61,6 +65,9 @@ This scheduler is rather naive and it performs the following logic:
  
  These optimizations can filter definite spam more quickly than a generic queue.
  
+Since rspamd-0.9 there are more optimizations for rules and expressions; they are
+outlined in the [following presentation](http://highsecure.ru/ast-rspamd.pdf).
+
  ## Actions
  
  Another important property of metrics is their actions set. This set defines recommended
diff --git a/doc/markdown/architecture/protocol.md b/doc/markdown/architecture/protocol.md
new file mode 100644
index 0000000..182d0f4
--- /dev/null
+++ b/doc/markdown/architecture/protocol.md
@@ -0,0 +1,185 @@
+# Rspamd protocol
+
+## Protocol basics
+
+Rspamd uses the HTTP protocol, either version 1.0 or 1.1. There is also a compatibility layer for legacy protocols, described later in this document.
+Rspamd defines some service headers that allow passing extra information about the scanned message, such as envelope data, IP address,
+SMTP SASL authentication data and so on. Rspamd supports both normal and chunked-encoded HTTP requests; however, form URL encoding is currently **NOT** supported.
+
+## Rspamd HTTP request
+
+Rspamd encourages use of the HTTP protocol since it is standard and can be used from virtually every programming language without exotic libraries.
+A typical HTTP request looks like the following:
+
+```
+POST /check HTTP/1.0
+Content-Length: 26969
+From: smtp@example.com
+Pass: all
+Ip: 95.211.146.161
+Helo: localhost.localdomain
+Hostname: localhost
+
+<your message goes here>
+```
+
+You can also use chunked encoding, which allows streaming data transfer and is useful if you don't know the length of the message in advance.
+
+### HTTP request
+
+Normally, you should just use `/check` here. However, if you talk to the controller, you might want to use controller commands here.
+
+(TODO: write this part)
+
+### HTTP headers
+
+To avoid unnecessary work, rspamd allows an MTA to pass pre-processed data about the message by using either HTTP headers or a JSON control block (described further in this document).
+Rspamd supports the following non-standard HTTP headers:
+
+| Header              | Description |
+| ------------------- | ----------- |
+| **Deliver-To:**     | Defines the actual delivery recipient of the message. Can be used for personalized statistics and for user-specific options. |
+| **IP:**             | Defines the IP address from which the message was received. |
+| **Helo:**           | Defines the SMTP HELO. |
+| **Hostname:**       | Defines the resolved hostname. |
+| **From:**           | Defines the SMTP `MAIL FROM` command data. |
+| **Queue-Id:**       | Defines the SMTP queue ID for the message (can be used instead of the message ID in logging). |
+| **Rcpt:**           | Defines an SMTP recipient (there may be several `Rcpt` headers). |
+| **Pass:**           | If this header has the value `all`, all filters are checked for this message. |
+| **Subject:**        | Defines the subject of the message (used for non-MIME messages). |
+| **User:**           | Defines the SMTP user. |
+| **Message-Length:** | Defines the length of the message excluding the control block. |
+
+The controller also defines certain headers:
+
+(TODO: write this part)
+
+Standard HTTP headers, such as `Content-Length`, are also supported.
+
+## Rspamd HTTP reply
+
+The rspamd reply is encoded in JSON format. Here is a typical HTTP reply:
+
+~~~json
+HTTP/1.1 200 OK
+Connection: close
+Server: rspamd/0.9.0
+Date: Mon, 30 Mar 2015 16:19:35 GMT
+Content-Length: 825
+Content-Type: application/json
+
+{
+    "default": {
+        "is_spam": false,
+        "is_skipped": false,
+        "score": 5.2,
+        "required_score": 7,
+        "action": "add header",
+        "DATE_IN_PAST": {
+            "name": "DATE_IN_PAST",
+            "score": 0.1
+        },
+        "FORGED_SENDER": {
+            "name": "FORGED_SENDER",
+            "score": 5
+        },
+        "TEST": {
+            "name": "TEST",
+            "score": 100500
+        },
+        "FUZZY_DENIED": {
+            "name": "FUZZY_DENIED",
+            "score": 0,
+            "options": [
+                "1: 1.00 / 1.00",
+                "1: 1.00 / 1.00"
+            ]
+        },
+        "HFILTER_HELO_5": {
+            "name": "HFILTER_HELO_5",
+            "score": 0.1
+        }
+    },
+    "urls": [
+        "www.example.com",
+        "another.example.com"
+    ],
+    "emails": [
+        "user@example.com"
+    ],
+    "message-id": "4E699308EFABE14EB3F18A1BB025456988527794@example"
+}
+~~~
+
+For readability, the reply above is pretty-printed using [jsonlint](http://jsonlint.com). The actual reply is sent in compact form for speed.
+
+The reply can be treated as a JSON object where keys are metric names (namely `default`) and values are objects that represent each metric.
+
+Each metric has the following fields:
+
+* `is_spam` - boolean value that indicates whether a message is spam
+* `is_skipped` - boolean flag that is `true` if a message has been skipped due to settings
+* `score` - floating point value representing the effective score of message
+* `required_score` - floating point value giving the threshold value for the metric
+* `action` - recommended action for a message:
+       - `no action` - message is likely ham
+       - `greylist` - message should be greylisted
+       - `add header` - message is suspicious and should be marked as spam
+       - `rewrite subject` - message is suspicious and should have its subject rewritten
+       - `soft reject` - message should be temporarily rejected for now (for example, due to rate limit exhaustion)
+       - `reject` - message should be rejected as spam
+
+Additionally, each metric contains all symbols added during message processing, indexed by symbol name.
+
+Moreover, some other keys might be in the reply:
+
+* `subject` - if action is `rewrite subject` then this value defines the desired subject for a message
+* `urls` - a list of URLs found in a message (only hostnames)
+* `emails` - a list of emails found in a message
+* `message-id` - ID of message (useful for logging)
+* `messages` - array of optional messages added by some rspamd filters (such as `SPF`) 
+
+## Legacy RSPAMC protocol
+
+For compatibility, rspamd also supports the legacy `RSPAMC` protocol and the spamassassin `SPAMC` protocol. Though their usage is discouraged, these protocols can still be used as a last resort to communicate with rspamd from legacy applications.
+An rspamc dialog looks like the following:
+
+```
+SYMBOLS RSPAMC/1.1
+Content-Length: 2200
+
+<message octets>
+```
+
+```
+RSPAMD/1.1 0 OK
+Metric: default; True; 10.40 / 10.00 / 0.00
+Symbol: R_UNDISC_RCPT
+Symbol: ONCE_RECEIVED
+Symbol: R_MISSING_CHARSET
+Urls: 
+```
+
+The rspamc protocol supports other commands as well:
+
+| Command | Meaning |
+| ------- | ------- |
+| CHECK   | Check a message and output results for each metric, but do not output symbols. |
+| SYMBOLS | Same as `CHECK`, but also output symbols. |
+| PROCESS | Same as `SYMBOLS`, but also output the original message with inserted X-Spam headers. |
+| PING    | Do not do any processing, just check the rspamd state. |
+
+
+After the command there should be one mandatory header, `Content-Length`, which defines the message length in bytes, followed by optional headers (the same as for HTTP).
+
+Rspamd supports the spamassassin `spamc` protocol, and you can even pass rspamc headers in spamc mode, but the rspamd reply in `spamc` mode is truncated to the "default" metric only, with no symbol options displayed. An rspamc reply looks like the following:
+
+```
+RSPAMD/1.1 0 OK
+Metric: default; True; 10.40 / 10.00 / 0.00
+Symbol: R_UNDISC_RCPT
+Symbol: ONCE_RECEIVED
+Symbol: R_MISSING_CHARSET
+Urls: 
+```
+ 
+\ No newline at end of file
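
The HTTP exchange documented in the new `protocol.md` can be sketched in Python as follows. This is an illustration only and not part of the commit: the helper names (`build_headers`, `summarize_reply`, `check_message`) are hypothetical, and the default host and port are assumptions that must be adjusted to the actual worker address. The sketch treats any top-level object with an `action` key as a metric and any nested object inside it as a symbol, which matches the sample reply but is an inference from it.

```python
import http.client
import json


def build_headers(mail_from, rcpts, ip, helo, hostname=None):
    """Return (name, value) pairs for the envelope headers from the table."""
    headers = [("From", mail_from), ("IP", ip), ("Helo", helo)]
    if hostname:
        headers.append(("Hostname", hostname))
    # Several Rcpt headers are allowed, one per recipient.
    headers.extend(("Rcpt", rcpt) for rcpt in rcpts)
    return headers


def summarize_reply(body):
    """Split one JSON reply into per-metric verdicts and symbol scores."""
    reply = json.loads(body)
    fixed = {"is_spam", "is_skipped", "score", "required_score", "action"}
    summary = {}
    for name, metric in reply.items():
        # Top-level keys such as "urls" or "message-id" are not metrics.
        if not isinstance(metric, dict) or "action" not in metric:
            continue
        symbols = {key: value.get("score") for key, value in metric.items()
                   if key not in fixed and isinstance(value, dict)}
        summary[name] = {"action": metric["action"],
                         "score": metric["score"],
                         "required_score": metric["required_score"],
                         "symbols": symbols}
    return summary


def check_message(message, headers, host="localhost", port=11333):
    """POST raw message bytes to /check; host and port are assumptions."""
    conn = http.client.HTTPConnection(host, port, timeout=10)
    try:
        conn.putrequest("POST", "/check")
        for name, value in headers:
            conn.putheader(name, value)
        conn.putheader("Content-Length", str(len(message)))
        conn.endheaders(message)
        return summarize_reply(conn.getresponse().read())
    finally:
        conn.close()
```

With a running scanner, `check_message(raw_bytes, build_headers(...))` would return a dictionary keyed by metric name; `summarize_reply` can also be applied on its own to a saved reply body.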
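
The legacy reply shown in the diff can be parsed line by line. The following Python sketch is an illustration under stated assumptions, not part of the commit: `parse_rspamc_reply` is a hypothetical helper, the three numbers on the `Metric:` line are kept as an unnamed list because the text does not say what each one means, and a comma is assumed as the separator on the `Urls:` line (the sample shows only an empty list).

```python
def parse_rspamc_reply(text):
    """Parse a legacy RSPAMD/1.1 reply into a dictionary."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    # Status line, e.g. "RSPAMD/1.1 0 OK".
    proto, code, status = lines[0].split(None, 2)
    result = {"protocol": proto, "code": int(code), "status": status,
              "metrics": {}, "symbols": [], "urls": []}
    for line in lines[1:]:
        key, _, value = line.partition(":")
        value = value.strip()
        if key == "Metric":
            # e.g. "default; True; 10.40 / 10.00 / 0.00"
            name, flag, numbers = [part.strip() for part in value.split(";")]
            result["metrics"][name] = {
                "flag": flag == "True",
                # Field meanings are not given in the text, so keep a list.
                "numbers": [float(n) for n in numbers.split("/")],
            }
        elif key == "Symbol":
            result["symbols"].append(value)
        elif key == "Urls" and value:
            # Assumption: URLs are comma-separated.
            result["urls"] = [url.strip() for url in value.split(",")]
    return result
```

Unknown header lines are ignored, so the same function tolerates replies that carry extra fields.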