From 5050901ef5b8f742be5218d033e1d8c00499d966 Mon Sep 17 00:00:00 2001
From: Larry Hynes
Date: Fri, 13 May 2016 23:57:48 +0100
Subject: [PATCH] [Doc] Update migrate_sa.md

---
 doc/markdown/tutorials/migrate_sa.md | 59 ++++++++++++++--------------
 1 file changed, 29 insertions(+), 30 deletions(-)

diff --git a/doc/markdown/tutorials/migrate_sa.md b/doc/markdown/tutorials/migrate_sa.md
index a85ff10ee..88b1d844f 100644
--- a/doc/markdown/tutorials/migrate_sa.md
+++ b/doc/markdown/tutorials/migrate_sa.md
@@ -1,54 +1,54 @@
-# Migrating from Spamassassin to Rspamd
+# Migrating from SpamAssassin to rspamd

-This guide provides information for those who wants to migrate their existing system from [spamassassin](https://spamassassin.apache.org) to rspamd. Here you will find information about major differences between these two spam filtering engines and how to deal with the transition process.
+This guide provides information for those who wants to migrate an existing system from [SpamAssassin](https://spamassassin.apache.org) to rspamd. You will find information about major differences between the spam filtering engines and how to deal with the transition process.

-## Why migrate to Rspamd
+## Why migrate to rspamd

-The first question you need to ask yourself is 'what is **wrong** with my spamassassin installation?' To my sense and according to many users' reports, rspamd runs **significantly faster** than spamassassin providing almost the same quality of filtering. However, if you don't care about performance and resources consumption of your spam filtering engine you might still find rspamd useful because it has a simple but yet powerful web management system (WebUI).
+rspamd runs **significantly faster** than SpamAssassin while providing approximately the same quality of filtering. However, if you don't care about the performance and resource consumption of your spam filtering engine you might still find rspamd useful because it has a simple but powerful web management system (WebUI).

-On the contrary, if you have a lot of custom rules, or if you use Pyzor/Razor/DCC, or you have some commercial 3rd party products that are SpamAssassin only then there are no clear reasons of migrating indeed.
+On the other hand, if you have a lot of custom rules, or you use Pyzor/Razor/DCC, or you have some commercial 3rd party products that depend on SpamAssassin then you may not want to migrate.

-In brief, Rspamd is for **speed**!
+In short: rspamd is for **speed**!

 ## What about dspam/spamoracle...?

-You could also move from these projects to Rspamd. However, you should bear in mind that Rspamd and SA are multi-factor spam filtering systems that uses 3 main approaches to filter Spam messages:
+You could also move from these projects to rspamd. You should bear in mind, however, that rspamd and SA are multi-factor spam filtering systems that use three main approaches to filter messages:

-* Content filtering - static rules that are designed to find some known `bad patterns` in messages (usually regexp or other custom rules)
+* Content filtering - static rules that are designed to find known bad patterns in messages (usually regexp or other custom rules)
 * Dynamic lists - DNS or reputation lists that are used to filter known bad content, such as abused IP addresses or URL domains
-* Statistical filters - are learned dynamically to distinguish spam and ham messages
+* Statistical filters - which learn to distinguish spam and ham messages

-`dspam`, `spamoracle` and many others usually implement the third approach providing merely statistical filtering. This method is quite powerful but it might cause many false-positives and is not very well suitable for more than one user. Rspamd and SA, in contrast, are designed for systems with many users. Rspamd in particular was written for a very large system with more than 40 millions of users and about 10 millions of emails per hour.
+`dspam`, `spamoracle` and others usually implement the third approach, only providing statistical filtering. This method is quite powerful but it can cause false-positives and is not very suitable for multi-user environments. rspamd and SA, in contrast, are designed for systems with many users. rspamd, in particular, was written for a very large system with more than 40 million users and about 10 million emails per hour.

 ## Before you start

 There are a couple of things you need to know before transition:

-1. Rspamd does not support Spamassassin statistics so you'd need to **train** your filter from the scratch with spam and ham samples (or install the [pre-built statistics](https://rspamd.com/rspamd_statistics/)). Rspamd uses different statistical engine called [OSB-Bayes](http://osbf-lua.luaforge.net/papers/trec2006_osbf_lua.pdf) which is intended to be more precise than SA 'naive' bayes classifier
-2. Rspamd uses `Lua` for plugins and rules, so basic knowledge of this language is more than useful for playing with rspamd, however, Lua is very simple and can be learnt [very quickly](http://lua-users.org/wiki/LuaTutorial)
-3. Rspamd uses `HTTP` protocol for communicating with MTA or milter, so SA native milters might fail to communicate with rspamd. There is some limited support of SpamAssassin protocol, thought some commands are not supported, in particular those which require copying of data batween scanner and milter. What's more important is that `Length`-less messages are not supported by Rspamd as they completely break HTTP semantics, so it won't be supported ever. For achieving the same functionality, a dedicated scanner could use, e.g. HTTP `chunked` encoding.
-4. Rspamd is **NOT** intended to work with blocking libraries or services, hence, something like `mysql` or `postgresql` won't likely be supported as well
-5. Rspamd is developping quickly, therefore you should be aware that there might be still some incompatible changes between major versions - they are usually listed in the [migration](../migration.md) section of the site.
-6. Unlike SA where there are only `spam` and `ham` results, Rspamd supports 4 levels of messages called `actions`:
+1. rspamd does not support SpamAssassin statistics so you'd need to **train** your filter from scratch with spam and ham samples (or install the [pre-built statistics](https://rspamd.com/rspamd_statistics/)). rspamd uses a different statistical engine - called [OSB-Bayes](http://osbf-lua.luaforge.net/papers/trec2006_osbf_lua.pdf) - which is intended to be more precise than SA's 'naive' Bayes classifier
+2. rspamd uses `Lua` for plugins and rules, so basic knowledge of this language is more than useful for playing with rspamd; however, Lua is very simple and can be learned [very quickly](http://lua-users.org/wiki/LuaTutorial)
+3. rspamd uses the `HTTP` protocol to communicate with the MTA or milter, so SA native milters might not communicate with rspamd. There is some limited support of the SpamAssassin protocol, though some commands are not supported, in particular those which require copying of data between scanner and milter. More importantly, `Length`-less messages are not supported by rspamd as they completely break HTTP semantics and will never be supported. To achieve the same functionality, a dedicated scanner could use, e.g. HTTP `chunked` encoding.
+4. rspamd is **NOT** intended to work with blocking libraries or services, hence, something like `mysql` or `postgresql` will likely not be supported
+5. rspamd is developing quickly so you should be aware that there might be some incompatible changes between major versions - they are usually listed in the [migration](../migration.md) section of the site.
+6. Unlike SA where there are only `spam` and `ham` results, Rspamd supports five levels of messages called `actions`:
     + `no action` - ham message
     + `greylist` - turn on adaptive greylisting (which is also used on higher levels)
     + `add header` - adds Spam header (meaning soft-spam action)
     + `rewrite subject` - rewrite subject to `*** SPAM *** original subject`
     + `reject` - ultimately reject message

-Each action can have its own score limit which could also be modified by users settings. Rspamd assumes the following order of actions scores: `no action` <= `greylist` <= `add header` <= `rewrite subject` <= `reject`.
+Each action can have its own score limit which could also be modified by a user's settings. rspamd assumes the following order of actions: `no action` <= `greylist` <= `add header` <= `rewrite subject` <= `reject`.

-Actions are **NOT** performed by rspamd itself - they are just recommendations for MTA agent, for example, rmilter that performs the necessary actions, such as adding headers or rejecting mail.
+Actions are **NOT** performed by rspamd itself - they are just recommendations for the MTA agent, rmilter for example, that performs the necessary actions such as adding headers or rejecting mail.

-SA `spam` is almost equal to rspamd `add header` action in the default setup. With this action, users will be able to observe messages in their `Junk` folder which is usually a desired behaviour.
+SA `spam` is almost equal to the rspamd `add header` action in the default setup. With this action, users will be able to check messages in their `Junk` folder, which is usually a desired behaviour.

-## The first steps in Rspamd
+## First steps with rspamd

-To install rspamd, I'd recommend using of the [official packages](https://rspamd.com/downloads.html) that are available for many popular platforms. If you'd like to have more features then you should consider `experimental` branch of packages, whilst if you'd like to have more stability then you could select the `stable` branch. However, normally even `experimental` branch is stable enough for the production usage, and the bugs are fixed more quickly in the `experimental` branch.
+To install rspamd, I recommend using one of the [official packages](https://rspamd.com/downloads.html) that are available for many popular platforms. If you'd like to have more features then you can consider the `experimental` branch of packages, while if you would like to have more stability then you can select the `stable` branch. However, normally even the `experimental` branch is stable enough for production use, and bugs are fixed more quickly in the `experimental` branch.

-## General spamassassin rules
+## General SpamAssassin rules

-For those who has a lot of custom rules, there is good news: rspamd supports a certain set of SpamAssassin rules via special [plugin](../modules/spamassassin.md) that allows **direct** loading of SA rules into rspamd. You just need to specify all your configuration files in the plugin configuration:
+For those who have a lot of custom rules, there is good news: rspamd supports a certain set of SpamAssassin rules via a special [plugin](../modules/spamassassin.md) that allows **direct** loading of SA rules into rspamd. You just need to specify your SA configuration files in the plugin configuration:

 ~~~ucl
 spamassassin {
@@ -57,27 +57,26 @@ spamassassin {
 }
 ~~~

-On the other hand, if you don't have many custom rules and use primarily the default ruleset then you shouldn't use this plugin: many rules of SA are already implemented in rspamd natively so you won't get any benefit from including such rules from SA.
+On the other hand, if you don't have a lot of custom rules and primarily use the default ruleset then you shouldn't use this plugin: many SA rules are already implemented natively in rspamd so you won't get any benefit from including such rules from SA.

 ## Integration

-If you have your SA up and running it is usually possible to switch the system to rspamd using the existing tools.
-However, please check the [integration document](https://rspamd.com/doc/integration.html) for furhter details.
+If you have your SA up and running it is usually possible to switch the system to rspamd using the existing tools. However, please check the [integration document](https://rspamd.com/doc/integration.html) for further details.

 ## Statistics

-Rspamd statistics is not compatible with SA as it uses more advanced statistics algorithms described in the following [article](http://osbf-lua.luaforge.net/papers/trec2006_osbf_lua.pdf). Statistics setup might be tricky, therefore, there are a couple of examples in [the statistics description](../configuration/statistics.md). However, please bear in mind that you need to **relearn** your statistics with messages. This can be done, for example, by using `rspamc` command assuming that you have your messages as a separate files (e.g. `Maildir` format) placed in directories `spam` and `ham`:
+rspamd statistics are not compatible with SA as rspamd uses a more advanced statistics algorithm, described in the following [article](http://osbf-lua.luaforge.net/papers/trec2006_osbf_lua.pdf), so please bear in mind that you need to **relearn** your statistics. This can be done, for example, by using the `rspamc` command: assuming that you have your messages in separate files (e.g. `maildir` format), placed in directories `spam` and `ham`:

     rspamc learn_spam spam/
     rspamd learn_ham ham/

-You need rspamd up and running for using of this commands.
+(You will need rspamd up and running to use these commands.)

 ### Learning using mail interface

-You can also setup rspamc to learn via passing messages to a certain email address. I'd recommend to use `/etc/aliases` for these purposes and `mail-redirect` command (e.g. provided by [Mail Redirect addon](https://addons.mozilla.org/en-GB/thunderbird/addon/mailredirect/) for `thunderbird` MUA). The desired aliases could be the following:
+You can also setup rspamc to learn via passing messages to a certain email address. I'd recommend using `/etc/aliases` for this purpose and a `mail-redirect` command (e.g. provided by [Mail Redirect addon](https://addons.mozilla.org/en-GB/thunderbird/addon/mailredirect/) for `thunderbird` MUA). The desired aliases could be the following:

     learn-spam123: "| rspamc learn_spam"
     learn-ham123: "| rspamc learn_ham"

-You'd need some less predictable aliases to avoid sending messages to such addresses by some adversary or just by a mistake to prevent statistics pollution.
+(You would need to use less predictable aliases to avoid the sending of messages to such addresses by an adversary, or just by mistake, to prevent statistics pollution.)
-- 
2.39.5
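
The patched tutorial advises using less predictable alias names than `learn-spam123` so that stray or hostile mail does not pollute the statistics. As a minimal sketch of that advice, the suffix could be generated randomly instead of chosen by hand. The helper names `make_alias` and `random_suffix` are hypothetical and not part of rspamd; only the alias line format is taken from the tutorial text.

```python
import secrets


def random_suffix(nbytes: int = 6) -> str:
    """Return a hard-to-guess hex suffix (2 hex characters per byte)."""
    return secrets.token_hex(nbytes)


def make_alias(kind: str, suffix: str) -> str:
    """Build one /etc/aliases line that pipes mail to rspamc.

    `kind` is "spam" or "ham"; the line format mirrors the tutorial's
    `learn-spam123: "| rspamc learn_spam"` example.
    """
    if kind not in ("spam", "ham"):
        raise ValueError("kind must be 'spam' or 'ham'")
    return f'learn-{kind}{suffix}: "| rspamc learn_{kind}"'


if __name__ == "__main__":
    # One shared suffix keeps the two addresses easy to remember together.
    suffix = random_suffix()
    for kind in ("spam", "ham"):
        print(make_alias(kind, suffix))
```

The printed lines would still have to be added to `/etc/aliases` by hand, and the alias database rebuilt in whatever way the local MTA requires; the sketch only produces the text.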