contain several lists identified by number. Each hash has its own weight that
allows to set up dynamic rules that add different score from different hashes.
+@chapter Rspamd modules.
+
+@section Introduction.
+
+This chapter describes the modules that are shipped with rspamd. Here you can
+find details about module configuration, principles of operation, and tricks to
+make spam filtering effective. The first sections describe internal modules
+written in C: regexp (regular expressions), surbl (blacklist for URLs),
+fuzzy_check (checks for fuzzy hashes), chartable (checks for character sets in
+messages) and emails (checks for blacklisted email addresses in messages).
+Module configuration can be done in Lua or in the config file itself.
+
+@subsection Lua configuration.
+You may use Lua to set configuration options for modules. With Lua you can
+write rather complex rules that contain not only text lines but also Lua
+functions that are called while messages are processed. To load a Lua
+configuration file, add a line like this to rspamd.xml:
+@example
+<lua src="/usr/local/etc/rspamd/lua/my.lua">fake</lua>
+@end example
+@noindent
+It is possible to load several scripts this way. Inside the Lua file a global
+table named @var{config} is defined. This table should contain the
+configuration options for modules, indexed by module name. This can be written
+as follows:
+@example
+config['module_name'] = @{@}
+local mconfig = config['module_name']
+
+mconfig['option_name'] = 'option value'
+
+local a = 'aa'
+local b = 'bb'
+
+mconfig['other_option'] = string.format('%s, %s', a, b)
+@end example
+@noindent
+In this simple example we define a new element of the @var{config} table that
+is associated with the module named 'module_name'. We assign an empty table
+(@code{@{@}}) to it and bind that table to the local variable mconfig. Then we
+set some elements of this table, which is equivalent to setting module options
+like this:
+@example
+option_name = option_value
+other_option = aa, bb
+@end example
+@noindent
+You may also assign functions to elements of a module's table. Such a function
+should accept one argument, the worker task object, and return a result
+appropriate for that option: a number, a string or a boolean. Here is a simple
+example:
+@example
+
+-- return 1 for messages whose IP is 127.0.0.1, 0 otherwise
+local function test (task)
+    if task:get_ip() == '127.0.0.1' then
+        return 1
+    else
+        return 0
+    end
+end
+
+mconfig['some_option'] = test
+@end example
+@noindent
+In this example we assign to the module option 'some_option' a function that
+checks the message's IP address and returns 1 if it is '127.0.0.1' and 0
+otherwise.
+
+Using Lua for configuration thus helps both to write complex rules and to
+structure them: you can place the options for specific modules in separate
+files and load them with the Lua function @code{dofile} (or add another
+@code{<lua>} tag to rspamd.xml).
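+
+As a rough sketch of such a layout, a top-level script might do nothing more
+than pull in one file per module. The directory and the per-module file names
+below are hypothetical; only @code{dofile} itself is standard Lua:
+@example
+-- my.lua: the script referenced by the <lua> tag in rspamd.xml
+local confdir = '/usr/local/etc/rspamd/lua/'
+
+-- each file fills in config['...'] for a single module
+dofile(confdir .. 'regexp.lua')
+dofile(confdir .. 'surbl.lua')
+@end example
+@noindent
+Keeping one file per module means a rule set can be disabled by commenting
+out a single @code{dofile} line.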
+
+@subsection XML configuration.
+
+Options for rspamd modules can be set in the XML file too. This is suitable for
+simple and/or temporary rules and should not be used for complex rules, as
+that would make the XML file too hard to read and edit. Complex rules in XML
+are possible, but not recommended for the sake of a readable config file. Here
+is a simple example of module config options:
+@example
+<module name="module_name">
+ <option name="option_name">option_value</option>
+ <option name="other_option">aa, bb</option>
+</module>
+@end example
+@noindent
+Note that you need to encode XML entities, for example @code{&} must be written
+as @code{&amp;}. Also, only UTF-8 encoding is allowed. In the sample rspamd
+configuration all modules except regexp are configured via XML, as they have
+only settings, whereas the regexp module has rules that are sometimes rather
+complex.
+
+@section Regexp module.
+
+@subsection Introduction.
+The regexp module is one of the most important rspamd modules. It can load
+regular expressions and filter messages according to them. It is also possible
+to combine regexps into logical expressions to create complex filtering rules.
+The following logical operators are allowed:
+@itemize @bullet
+@item & - logical @strong{AND} function
+@item | - logical @strong{OR} function
+@item ! - logical @strong{NOT} function
+@end itemize
+Brackets may also be used to set priorities in expressions. The regexp module
+operates on @emph{regexp items} that can be combined with logical operators
+into logical @emph{regexp expressions}. Each expression is associated with a
+symbol, and if the expression evaluates to true for a message, that symbol is
+inserted. Note that rspamd optimizes logical expressions internally (for
+example, in the expression 'rule1 & rule2', rule2 is not evaluated if rule1 is
+false) and keeps an internal regexp cache (so if rule1 and rule2 have common
+items, those items are evaluated only once). Take this into account if you
+need to optimize the speed of your rules.
+
+@subsection Regular expressions.
+Rspamd uses Perl-compatible regular expressions. You may read about Perl
+regular expression syntax here: @url{http://perldoc.perl.org/perlre.html}. In
+rspamd, regular expressions must be enclosed in slashes:
+@example
+/^\\d+$/
+@end example
+@noindent
+If the '/' symbol must appear inside a regular expression, it should be escaped:
+@example
+/^\\/\\w+$/
+@end example
+@noindent
+After the last slash it is possible to place regular expression modifiers:
+@multitable @columnfractions 0.1 0.9
+@headitem Modifier @tab Meaning
+@item @strong{i} @tab Ignore case for this expression.
+@item @strong{m} @tab Treat this expression as multiline.
+@item @strong{s} @tab Let @emph{.} match all characters including newline
+characters (should be used with the @strong{m} flag).
+@item @strong{x} @tab Treat this expression as an extended regexp.
+@item @strong{u} @tab Perform ungreedy matches.
+@item @strong{o} @tab Optimize the regular expression.
+@item @strong{r} @tab Treat this expression as @emph{raw} (this is relevant for
+the UTF-8 mode of rspamd).
+@item @strong{H} @tab Search for the expression in the message's headers.
+@item @strong{X} @tab Search for the expression in the message's raw headers
+(without MIME decoding).
+@item @strong{M} @tab Search for the expression in the whole message (use with
+care, as @strong{the whole message} is checked against this expression).
+@item @strong{P} @tab Search expression in all text parts.
+@item @strong{U} @tab Search expression in all urls.
+@end multitable
+
+You can combine flags with each other:
+@example
+/^some text$/iP
+@end example
+@noindent
+Every regexp must have a type flag (H, X, M, P or U), as rspamd needs to know
+where to search for the specified pattern. Header regexps (H and X) have a
+special syntax for checking a specific header, for example the @emph{From}
+header:
+@example
+From=/^evil.*$/Hi
+@end example
+@noindent
+If the header name is not specified, all headers are matched. Raw header
+matching is useful for searching for MIME-specific headers like MIME-Version.
+The problem is that gmime, which is used for MIME parsing, adds some headers
+implicitly, for example @emph{MIME-Version}, so you should match them using
+raw headers. Also, if a header's value is encoded (base64 or quoted-printable
+encoding), you can search the decoded version using the H modifier and the raw
+version using the X modifier. This is useful for finding bad encoding types or
+unnecessary encoding.
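+
+Putting the pieces above together, the following Lua fragment is a rough sketch
+of a rule built from two regexp items. It assumes that the regexp module reads
+its rules from @code{config['regexp']} with the symbol name used as the option
+name; the symbol @code{EXAMPLE_EVIL_SENDER} and both patterns are made up for
+illustration:
+@example
+-- reuse the module table if another script has already created it
+config['regexp'] = config['regexp'] or @{@}
+local reconf = config['regexp']
+
+-- H: decoded From header, case-insensitive; P: text parts, case-insensitive
+local evil_from = 'From=/^evil.*$/Hi'
+local evil_body = '/^some text$/iP'
+
+-- if the left operand is false, the right operand is not evaluated
+reconf['EXAMPLE_EVIL_SENDER'] = string.format('(%s) & (%s)', evil_from, evil_body)
+@end example
+@noindent
+Keeping each item in a local variable makes it easy to reuse the same regexp
+in several expressions.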
+
+@subsection Internal functions.
+Rspamd provides several internal functions that simplify message processing.
+You can use internal functions as items in logical expressions because, like
+regular expressions, they return a logical value (true or false). Here is the
+list of internal functions with their arguments:
+@multitable @columnfractions 0.3 0.2 0.5
+@headitem Function @tab Arguments @tab Description
+@item header_exists
+@tab header name
+@tab Returns true if specified header exists.
+
+@item compare_parts_distance
+@tab number
+@tab If the message has two parts (text/plain and text/html), compares how much
+they differ (HTML parts are compared with tags stripped). The difference is a
+percentage (0 means identical parts and 100 means totally different parts).
+The function returns true if the difference is greater than the given number.
+
+@item compare_transfer_encoding
+@tab string
+@tab Compares header Content-Transfer-Encoding with specified string.
+
+@item content_type_compare_param
+@tab param_name, param_value
+@tab Compares specified parameter of Content-Type header with regexp or certain
+string:
+@example
+content_type_compare_param(Charset, /windows-\d+/)
+content_type_compare_param(Charset, ascii)
+@end example
+@noindent
+
+@item content_type_has_param
+@tab param_name
+@tab Returns true if content-type has specified parameter.
+
+@item content_type_is_subtype
+@tab subtype_name
+@tab Returns true if the content-type is of the specified subtype (for example,
+for text/plain the subtype is 'plain').
+
+@item content_type_is_type
+@tab type_name
+@tab Returns true if the content-type is of the specified type (for example,
+for text/plain the type is 'text'):
+@example
+content_type_is_type(text)
+content_type_is_subtype(/?.html/)
+@end example
+@noindent
+
+@item regexp_match_number
+@tab number,[regexps list]
+@tab Returns true if the specified number of regexps match this message. This
+can be used for rules where you do not know which regexps should match, but
+the symbol should be inserted if 2 of them match. For example:
+@example
+regexp_match_number(2, /^some evil text.*$/Pi, From=/^hacker.*$/H, header_exists(Subject))
+@end example
+@noindent
+
+@item has_only_html_part
+@tab nothing
+@tab Returns true when the message has only an HTML part.
+
+@item compare_recipients_distance
+@tab number
+@tab Like compare_parts_distance, calculates the difference between recipients.
+The number is used as the minimum percentage of difference. Note that this
+function checks the distance only when there are more than 5 recipients in the
+message.
+
+@item is_recipients_sorted
+@tab nothing
+@tab Returns true if the recipients list is sorted. This function also works
+only when there are more than 5 recipients.
+
+@item is_html_balanced
+@tab nothing
+@tab Returns true when all HTML tags in message are balanced.
+
+@item has_html_tag
+@tab tag_name
+@tab Returns true if tag 'tag_name' exists in message.
+
+@end multitable
+
+These internal functions could easily be implemented in Lua, but I decided to
+make them built-in as they are widely used in our rules. This list may be
+extended in the future.
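+
+The functions above can be mixed freely with each other and with regexp items.
+The fragment below is only a sketch: as before, the use of
+@code{config['regexp']} and the symbol name @code{EXAMPLE_HTML_ONLY} are
+assumptions made for illustration, and functions without arguments are written
+with empty brackets:
+@example
+config['regexp'] = config['regexp'] or @{@}
+local reconf = config['regexp']
+
+-- HTML-only message with unbalanced tags and without a Subject header
+reconf['EXAMPLE_HTML_ONLY'] =
+    'has_only_html_part() & !is_html_balanced() & !header_exists(Subject)'
+@end example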
+
+@subsection Conclusion.
+The rspamd regexp module is a powerful tool for matching different patterns in
+messages. You may use logical expressions of regexps and internal rspamd
+functions to make rules. Rspamd is shipped with many rules for the regexp
+module (most of them are taken from spamassassin rules, as rspamd was
+originally a replacement for spamassassin), so you can look at them in the
+ETCDIR/rspamd/lua/regexp directory. There are many built-in rules with detailed
+comments. Also note that if you add a logical rule to the XML file you need to
+escape all XML entities (such as the @emph{&} operator). When you build complex
+rules from many parts, do not forget to put brackets around the parts inside
+the expression, as otherwise you cannot predict the order of checks. The regexp
+module has internal logical optimization and a regexp cache, so you may use an
+identical regexp many times and it is matched only once. In a logical
+expression you may improve performance by putting a likely TRUE regexp first in
+an @emph{OR} expression and a likely FALSE regexp first in an @emph{AND}
+expression. A number of internal functions can simplify complex expressions
+and help to build common filters. Lua functions can be added to rules as well
+(they should return a boolean value).
+
+@section SURBL module.
+
+The surbl module is designed for checking URLs against blacklists. You may read
+about SURBLs at @url{http://www.surbl.org}. Here is the sequence of operations
+performed by the surbl module:
+@enumerate 1
+@item Extract all URLs from the message and get the domain of each URL.
+@item Check each domain against a special list called '2tld': extract 3
+components for domains from that list and 2 components for domains that are
+not listed:
+@example
+http://virtual.somehost.domain.com/some_path
+-> somehost.domain.com if domain.com is in 2tld list
+-> domain.com if not in 2tld
+@end example
+@noindent
+@item Remove duplicates from the list of domains.
+@item For each registered surbl make a DNS request in the form
+@emph{domain.surbl_name}.
+@item Get the result and insert the symbol if that name resolves.
+@item It is possible to examine bits in the returned IP address and insert a
+different symbol for each bit that is set in the result.
+@end enumerate
+All DNS requests are done asynchronously, so you need not worry about blocking.
+The SURBL module has several configuration options:
+@itemize @bullet
+@item @emph{metric} - metric to insert the symbol into.
+@item @emph{2tld} - list of domains for which 3 components of the domain name
+are extracted.
+@item @emph{max_urls} - maximum number of URLs to check.
+@item @emph{whitelist} - map of domains for which surbl checks are not performed.
+@item @emph{suffix} - the name of a surbl. It is possible to add several suffixes:
+@example
+suffix_RAMBLER_URIBL = insecure-bl.rambler.ru
+or in xml:
+ <param name="suffix_RAMBLER_URIBL">insecure-bl.rambler.ru</param>
+@end example
+@noindent
+It is possible to add %b to the symbol name for checking specific bits:
+@example
+suffix_%b_SURBL_MULTI = multi.surbl.org
+then you may define replacements for %b in the symbol name for each bit in the result:
+bit_2 = SC -> sc.surbl.org
+bit_4 = WS -> ws.surbl.org
+bit_8 = PH -> ph.surbl.org
+bit_16 = OB -> ob.surbl.org
+bit_32 = AB -> ab.surbl.org
+bit_64 = JP -> jp.surbl.org
+@end example
+@noindent
+So we make one DNS request and check for specific lists by checking bits in
+the resulting IP address. This is described on the surbl page:
+@url{http://www.surbl.org/lists.html#multi}. Note that the resulting symbol
+does NOT contain %b, as it is replaced by the bit name. Also, if several bits
+are set, several corresponding symbols are added.
+@end itemize
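+
+To make the bit handling concrete, here is a small worked example. It assumes
+that the options listed above can be set from Lua under @code{config['surbl']}
+in the same way as for any other module, and that the list bits are carried in
+the last octet of the returned address:
+@example
+config['surbl'] = config['surbl'] or @{@}
+local surbl = config['surbl']
+
+surbl['metric'] = 'default'
+-- one DNS request per domain; %b is replaced by the name of each bit that is set
+surbl['suffix_%b_SURBL_MULTI'] = 'multi.surbl.org'
+surbl['bit_2'] = 'SC'
+surbl['bit_4'] = 'WS'
+@end example
+@noindent
+With these settings, a URL in @emph{domain.com} leads to a single request for
+@emph{domain.com.multi.surbl.org}. If the last octet of the answer is 6
+(@math{2 + 4}), both bits are set and the symbols @code{SC_SURBL_MULTI} and
+@code{WS_SURBL_MULTI} would be inserted; if it is 2, only
+@code{SC_SURBL_MULTI} would be inserted.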
+
+The surbl module can also use a redirector: a special daemon that resolves
+redirects. It uses HTTP/1.0 for requests, accepts a URL and returns the
+resolved result. The redirector is shipped with rspamd but is not enabled by
+default. You may enable it at the configure stage, but note that it requires
+many Perl modules to work. The rspamd redirector is described in detail later.
+Here are the surbl options for working with the redirector:
+@itemize @bullet
+@item @emph{redirector}: address of the redirector (in format host:port)
+@item @emph{redirector_connect_timeout} (seconds): redirector connect timeout (default: 1s)
+@item @emph{redirector_read_timeout} (seconds): timeout for reading data (default: 5s)
+@item @emph{redirector_hosts_map} (map string): map that contains domains to check with redirector
+@end itemize
+
+So the surbl module is an easy-to-use way to check a message's URLs, and it may
+be used in every configuration as it filters a rather large amount of email
+spam and scam.
+
+@section SPF module.
+
+The SPF module is designed to check the SPF records of senders' domains. SPF
+records are stored in TXT DNS records of domains that have SPF enabled. You may
+read about SPF at @url{http://en.wikipedia.org/wiki/Sender_Policy_Framework}.
+There are 3 possible results of an SPF check for a domain:
+@itemize @bullet
+@item ALLOW - this ip is allowed to send messages for this domain
+@item FAIL - this ip is @strong{not} allowed to send messages for this domain
+@item SOFTFAIL - it is unknown whether this ip is allowed to send mail for this
+domain
+@end itemize
+SPF supports different mechanisms for checking: DNS subrequests, macros,
+includes, blacklists. Rspamd supports most of them. For security reasons there
+are also internal limits on DNS subrequests and on include recursion. The SPF
+module supports only a small number of options:
+@itemize @bullet
+@item @emph{metric} (string): metric to insert symbol (default: 'default')
+@item @emph{symbol_allow} (string): symbol to insert (default: 'R_SPF_ALLOW')
+@item @emph{symbol_fail} (string): symbol to insert (default: 'R_SPF_FAIL')
+@item @emph{symbol_softfail} (string): symbol to insert (default: 'R_SPF_SOFTFAIL')
+@end itemize
+
+@section Chartable module.
+
+Chartable is a simple module that detects different charsets in a message. This
+module is intended to protect against emails that contain symbols from
+different character sets that look like each other. The chartable module works
+differently in raw and UTF modes: in UTF mode it detects characters from
+different Unicode tables, and in raw mode it only distinguishes ASCII and
+non-ASCII symbols. Configuration of this module is very simple:
+@itemize @bullet
+@item @emph{metric} (string): metric to insert symbol (default: 'default')
+@item @emph{symbol} (string): symbol to insert (default: 'R_BAD_CHARSET')
+@item @emph{threshold} (double): value that is used as the threshold in the expression
+@math{N_{charset-changes} / N_{chars}}
+(e.g. if the threshold is 0.1, a charset change must occur more often than once
+per 10 symbols), default: 0.1
+@end itemize
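+
+As a worked example of the threshold (the numbers are invented): a text of 200
+characters with 30 charset changes gives @math{30 / 200 = 0.15}, which is above
+the default threshold of 0.1, so the symbol would be inserted; with only 10
+changes the ratio is @math{10 / 200 = 0.05} and nothing is inserted. A sketch
+of the matching Lua configuration follows, assuming the module is addressed as
+@code{config['chartable']} and that the threshold may be given as a plain
+number:
+@example
+config['chartable'] = config['chartable'] or @{@}
+local chartable = config['chartable']
+
+chartable['metric'] = 'default'
+chartable['symbol'] = 'R_BAD_CHARSET'
+-- assumption: a Lua number is accepted here; 0.1 is the documented default
+chartable['threshold'] = 0.1
+@end example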
+
+@section Fuzzy check module.
+
+The fuzzy check module provides a client for rspamd fuzzy storage. Fuzzy check
+can work with a cluster of rspamd fuzzy storages, and the specific storage is
+selected by the value of a hash of the message's hash. The available
+configuration options are:
+@itemize @bullet
+@item @emph{metric} (string): metric to insert symbol (default: 'default')
+@item @emph{symbol} (string): symbol to insert (default: 'R_FUZZY')
+@item @emph{max_score} (double): maximum score to which the weights of hashes
+are normalized (default: 0 - no normalization)
+@item @emph{fuzzy_map} (string): a string that contains a map in the format
+@{ fuzzy_key => [ symbol, weight ] @}, where fuzzy_key is the number of a fuzzy
+list. The string itself should be in the format
+1:R_FUZZY_SAMPLE1:10,2:R_FUZZY_SAMPLE2:1 etc., where the first number is the
+fuzzy key, the second field is the symbol to insert and the third is the weight
+for normalization
+@item @emph{min_length} (integer): minimum length (in characters) for text part to be
+checked for fuzzy hash (default: 0 - no limit)
+@item @emph{whitelist} (map string): map of ip addresses that should not be checked
+with this module
+@item @emph{servers} (string): list of fuzzy servers in format
+"server1:port,server2:port" - these servers would be used for checking and
+storing fuzzy hashes
+@end itemize
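+
+As a sketch of how these options fit together, the fragment below configures
+two storages and two fuzzy lists. It assumes the module is configured under
+@code{config['fuzzy_check']}; the server names and the port are placeholders:
+@example
+config['fuzzy_check'] = config['fuzzy_check'] or @{@}
+local fuzzy = config['fuzzy_check']
+
+fuzzy['servers'] = 'fuzzy1.example.com:11335,fuzzy2.example.com:11335'
+-- list 1 inserts R_FUZZY_SAMPLE1 with weight 10,
+-- list 2 inserts R_FUZZY_SAMPLE2 with weight 1
+fuzzy['fuzzy_map'] = '1:R_FUZZY_SAMPLE1:10,2:R_FUZZY_SAMPLE2:1'
+@end example
+@noindent
+Giving each list its own symbol and weight lets a trusted list contribute much
+more to the score than an experimental one.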
+
+@section Forged recipients.
+
+Forged recipients is a Lua module that compares the recipients given in the
+SMTP dialog with the recipients from the @emph{To:} header. It is also possible
+to compare the @emph{From:} header with the SMTP sender. Set the
+@strong{symbol_rcpt} option to define the symbol that is inserted when the
+recipients differ, and @strong{symbol_sender} for the symbol inserted when the
+senders differ.
+
+@section Maillist.
+
+Maillist is a module that detects whether a message was sent using one of the
+popular mailing list systems (supported are ezmlm, mailman and subscribe.ru
+systems). The module has a single option, @strong{symbol}, that defines the
+symbol that is inserted if the message was sent via a mailing list.
+
+@section Once received.
+
+This Lua module checks the Received headers of a message and inserts a symbol
+if only one Received header is present (which usually signals that the mail
+was sent directly to our MTA). It is also possible to insert a @emph{strict}
+symbol, which indicates that the host from which we received the message is
+either unresolvable or matches bad patterns (like 'dynamic', 'broadband' etc.)
+that are typical of widely used botnets. Configuration options are:
+@itemize @bullet
+@item @emph{symbol}: symbol to insert for messages with one received header.
+@item @emph{symbol_strict}: symbol to insert for messages with one received
+header and containing bad patterns or an unresolvable sender.
+@item @emph{bad_host}: defines a pattern that is counted as "bad".
+@item @emph{good_host}: defines a pattern that is counted as "good" (no strict
+symbol is inserted); note that a "good" pattern has priority over a "bad" one.
+@end itemize
+You can define several "good" and "bad" patterns for this module.
+
+@section Received rbl.
+
+The received rbl module checks all Received headers and makes DNS requests to
+IP blacklists. This can be used for checking whether an email was transferred
+by some blacklisted gateway. Here are the available options:
+@itemize @bullet
+@item @emph{symbol}: symbol to insert if message contains blacklisted received
+headers
+@item @emph{rbl}: the name of an rbl to check; it is possible to define a
+specific symbol for this rbl by adding the symbol name after a colon:
+@example
+rbl = pbl.spamhaus.org:RECEIVED_PBL
+@end example
+@end itemize
+
+@section Conclusion.
+
+Rspamd is shipped with a number of modules that provide basic functionality
+for checking emails. You are allowed to add custom rules for the regexp module
+and to set the available parameters for other modules. You may also write your
+own modules (in C or Lua), but this is described later in this documentation.
+You may set configuration options for modules from Lua or from XML, depending
+on their complexity. Internal modules are enabled and disabled by the
+@strong{filters} configuration option. Lua modules are loaded and can usually
+be disabled by removing their configuration section from the XML file or by
+removing the corresponding line from the @strong{modules} section.
+
@bye