[Doc] Improve regexp module documentation

tags/1.2.3
Vsevolod Stakhov 8 years ago
parent
commit
f998a65903
1 changed files with 9 additions and 3 deletions
doc/markdown/modules/regexp.md



In rspamd, regular expressions can match different parts of messages:


- * Headers (should be `Header-Name=/regexp/flags`)
+ * Headers (should be `Header-Name=/regexp/flags`), MIME headers
+ * Full headers string
* Textual mime parts
* Raw messages
* URLs
The match type is defined by special flags after the last `/` symbol:


* `H` - header regexp
+ * `X` - undecoded header regexp (e.g. without quoted-printable decoding)
+ * `B` - MIME header regexp (applied to headers in MIME parts only)
+ * `R` - full headers content (applied to all undecoded headers of the message itself only, **not** including MIME headers)
* `M` - raw message regexp
* `P` - part regexp
* `U` - URL regexp
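
To make the `Header-Name=/regexp/flags` syntax concrete, here is a minimal Python sketch that splits such a rule into its header name, pattern, and flags, then separates the match-type flags listed above from the remaining ones. It is not rspamd's own parser: `split_rule`, `RULE_RE`, and `MATCH_TYPES` are hypothetical names introduced only for illustration, and the sketch assumes the flags are plain letters after the last `/`.

```python
import re

# Match-type flags, with descriptions taken from the list above.
MATCH_TYPES = {
    "H": "header regexp",
    "X": "undecoded header regexp",
    "B": "MIME header regexp",
    "R": "full headers content",
    "M": "raw message regexp",
    "P": "part regexp",
    "U": "URL regexp",
}

# Hypothetical pattern: optional 'Header-Name=', then '/regexp/', then flags.
# The greedy '.*' ensures the flags are taken after the LAST '/' symbol.
RULE_RE = re.compile(
    r"^(?:(?P<header>[A-Za-z0-9-]+)=)?/(?P<pattern>.*)/(?P<flags>[A-Za-z]*)$",
    re.S,
)


def split_rule(rule):
    """Return (header, pattern, type_flags, other_flags) for a rule string."""
    m = RULE_RE.match(rule)
    if m is None:
        raise ValueError("not in [Header-Name=]/regexp/flags form: %r" % rule)
    flags = m.group("flags")
    # Flags are case-sensitive: 'X' is a match type, 'x' is not.
    type_flags = [f for f in flags if f in MATCH_TYPES]
    other_flags = [f for f in flags if f not in MATCH_TYPES]
    return m.group("header"), m.group("pattern"), type_flags, other_flags


header_rule = split_rule("Subject=/^hello/iH")
part_rule = split_rule("/viagra/P")
```

Here `header_rule` carries the header name `Subject` and the type flag `H`, while `part_rule` has no header name because part regexps are not bound to a header.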



We strongly discourage the use of raw message regexps, as they are expensive and
should be replaced by [trie](trie.md) rules if possible.




* `i` - ignore case
* `u` - use utf8 regexp
- * `m` - multiline regexp
- * `x` - extended regexp
+ * `m` - multiline regexp - treat string as multiple lines. That is, change `^` and `$` from matching the start of the string's first line and the end of its last line to matching the start and end of each line within the string.
+ * `x` - extended regexp - this flag tells the regular expression parser to ignore most whitespace that is neither backslashed nor within a bracketed character class. You can use this to break up your regular expression into (slightly) more readable parts. Also, the `#` character is treated as a metacharacter introducing a comment that runs up to the pattern's closing delimiter, or to the end of the current line if the pattern extends onto the next line.
+ * `s` - dotall regexp - treat string as single line. That is, change `.` to match any character whatsoever, even a newline, which normally it would not match. Used together, as `/ms`, the two flags let `.` match any character whatsoever, while still allowing `^` and `$` to match, respectively, just after and just before newlines within the string.
* `O` - do not optimize regexp (rspamd optimizes regexps by default)
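
The effect of the `m`, `s` and `x` flags is easiest to see on a two-line string. The sketch below uses Python's `re` module, whose `re.M`, `re.S` and `re.X` options behave analogously for these simple cases. It is only an illustration of the semantics described above, not of rspamd's regexp engine, and the sample text is made up.

```python
import re

text = "first line\nsecond line"

# 'm': without it, ^ matches only at the very start of the string.
without_m = re.findall(r"^\w+", text)
with_m = re.findall(r"^\w+", text, re.M)

# 's': without it, '.' does not match a newline, so the search fails.
without_s = re.search(r"first.*second", text)
with_s = re.search(r"first.*second", text, re.S)

# 'ms' together: '.' crosses newlines while ^ and $ still anchor at line edges.
with_ms = re.search(r"^second.*$", text, re.M | re.S)

# 'x': whitespace and '#' comments inside the pattern are ignored.
verbose = re.compile(
    r"""
    ^(\w+)   # a word at the start of a line
    \s+      # whitespace must be spelled out, since literal spaces are ignored
    line$    # the word 'line' at the end of a line
    """,
    re.X | re.M,
)
words = verbose.findall(text)
```

With `m` the anchored search finds a word on each line instead of only the first, and with `x` the pattern can be laid out over several commented lines without changing what it matches.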


### Internal functions
