[Doc] Improve regexp module documentation

author Vsevolod Stakhov <vsevolod@highsecure.ru>

Sat, 9 Apr 2016 12:28:21 +0000 (13:28 +0100)

committer Vsevolod Stakhov <vsevolod@highsecure.ru>

Sat, 9 Apr 2016 12:28:21 +0000 (13:28 +0100)
diff --git a/doc/markdown/modules/regexp.md b/doc/markdown/modules/regexp.md

index a1a694f33278163907c819d481797bad6ccbcb13..f08079bff8c620b43b2f493aba9a25b5f6debc94 100644
--- a/doc/markdown/modules/regexp.md
+++ b/doc/markdown/modules/regexp.md
@@ -47,7 +47,8 @@ Rspamd support the following components within expressions:
  
  In rspamd, regular expressions could match different parts of messages:
  
-* Headers (should be `Header-Name=/regexp/flags`)
+* Headers (should be `Header-Name=/regexp/flags`), mime headers
+* Full headers string
  * Textual mime parts
  * Raw messages
  * URLs
@@ -55,10 +56,14 @@ In rspamd, regular expressions could match different parts of messages:
  The match type is defined by special flags after the last `/` symbol:
  
  * `H` - header regexp
+* `X` - undecoded header regexp (e.g. without quoted-printable decoding)
+* `B` - MIME header regexp (applied for headers in MIME parts only)
+* `R` - full headers content (applied for all headers undecoded and for the message only - **not** including MIME headers)
  * `M` - raw message regexp
  * `P` - part regexp
  * `U` - URL regexp
  
+
  We strongly discourage from using of raw message regexps as they are expensive and
  should be replaced by [trie](trie.md) rules if possible.
  
@@ -66,8 +71,9 @@ Each regexp also supports the following flags:
  
  * `i` - ignore case
  * `u` - use utf8 regexp
-* `m` - multiline regexp
-* `x` - extended regexp
+* `m` - multiline regexp - treat string as multiple lines. That is, change "^" and "$" from matching the start of the string's first line and the end of its last line to matching the start and end of each line within the string
+* `x` - extended regexp - this flag tells the regular expression parser to ignore most whitespace that is neither backslashed nor within a bracketed character class. You can use this to break up your regular expression into (slightly) more readable parts. Also, the # character is treated as a metacharacter introducing a comment that runs up to the pattern's closing delimiter, or to the end of the current line if the pattern extends onto the next line.
+* `s` - dotall regexp - treat string as single line. That is, change `.` to match any character whatsoever, even a newline, which normally it would not match. Used together, as `/ms`, they let the `.` match any character whatsoever, while still allowing `^` and `$` to match, respectively, just after and just before newlines within the string.
  * `O` - do not optimize regexp (rspamd optimizes regexps by default)
  
  ### Internal functions
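
The `m`, `x` and `s` flags documented in this change follow the usual Perl-style semantics, so their effect can be sketched outside of rspamd. The snippet below is only an illustration and uses Python's `re` module, which is an assumption made for convenience and not the engine rspamd uses; the sample text and all variable names are hypothetical. Note that in Python's verbose mode a `#` comment runs to the end of the line.

```python
import re

# Hypothetical sample: two header-like lines followed by a body line.
TEXT = "Subject: hello\nX-Spam: yes\nbody"

# `m` (multiline): ^ matches at the start of every line, not only the first.
starts_default = re.findall(r"^\w[\w-]*", TEXT)
starts_multiline = re.findall(r"^\w[\w-]*", TEXT, flags=re.MULTILINE)

# `s` (dotall): . also matches a newline.
dot_default = re.search(r"hello.X", TEXT)
dot_dotall = re.search(r"hello.X", TEXT, flags=re.DOTALL)

# `x` (extended): unescaped whitespace is ignored and # starts a comment.
verbose_re = re.compile(
    r"""
    ^ X-Spam :      # header name and colon
    \s* (yes|no)    # captured value
    """,
    re.VERBOSE | re.MULTILINE,
)
verbose_value = verbose_re.search(TEXT).group(1)

# `m` and `s` together: . crosses newlines while ^ and $ still
# match next to newlines inside the string.
only_m = re.findall(r"^X-Spam:.*$", TEXT, flags=re.MULTILINE)
m_and_s = re.findall(r"^X-Spam:.*$", TEXT, flags=re.MULTILINE | re.DOTALL)

if __name__ == "__main__":
    print(starts_default, starts_multiline)
    print(dot_default, dot_dotall.group(0))
    print(verbose_value)
    print(only_m, m_and_s)
```

With `m` alone the last pattern stops at the end of the `X-Spam` line; adding `s` lets the greedy `.*` run through the newline to the end of the string, which is the interaction the `s` entry describes.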