# Writing rspamd rules

In this tutorial, I describe how to create new rules for rspamd, both lua and regexp ones.

## Introduction

Rules are the essential part of a spam filtering system, and rspamd ships with a set of prepared rules. However, if you run your
own system, you might want to write your own rules to improve spam detection or to lower the false positive rate. Rules are usually
written in the `lua` language, where you can specify both custom logic and generic regular expressions.

## Configuration files

Since rspamd ships with internal rules, it is a good idea to store your custom rules and configuration in separate files
to avoid clashes with the pre-built rules, which might change from version to version. There are several options for this:

- Local rules in lua should be stored in the file named `${CONFDIR}/rspamd.local.lua` where `${CONFDIR}` is the directory where your configuration files are placed (e.g. `/etc/rspamd` or `/usr/local/etc/rspamd` for some systems)
- Local configuration that **adds** options to rspamd should be placed in `${CONFDIR}/rspamd.conf.local`
- Local configuration that **overrides** the default settings should be placed in `${CONFDIR}/rspamd.conf.override`

Lua local configuration can be used both to override and to extend the existing rules:

rspamd.lua:

~~~lua
config['regexp']['symbol'] = '/some_re/'
~~~

rspamd.local.lua:

~~~lua
config['regexp']['symbol1'] = '/other_re/' -- add 'symbol1' key to the table
config['regexp']['symbol'] = '/override_re/' -- replace regexp for 'symbol'
~~~

For the configuration files, take a look at the following examples:

rspamd.conf:

~~~nginx
var1 = "value1";

section "name" {
	var2 = "value2";
}
~~~

rspamd.conf.local:

~~~nginx
var1 = "value2";

section "name" {
	var3 = "value3";
}
~~~

Resulting config:

~~~nginx
var1 = "value1";
var1 = "value2";

section "name" {
	var2 = "value2";
}
section "name" {
	var3 = "value3";
}
~~~

Override example:

rspamd.conf:

~~~nginx
var1 = "value1";

section "name" {
	var2 = "value2";
}
~~~

rspamd.conf.override:

~~~nginx
var1 = "value2";

section "name" {
	var3 = "value3";
}
~~~

Resulting config:

~~~nginx
var1 = "value2";

# Note that var2 is removed completely
section "name" {
	var3 = "value3";
}
~~~

Combining the `override` and `local` configs should make it possible to resolve complicated cases without needing something like a Turing-complete language to distinguish them.

## Writing rules

There are two main types of rules that are normally defined in rspamd:

- `Lua` rules: pieces of code in the lua programming language that work with the processed messages
- `Regexp` rules: regular expressions and combinations of regular expressions that match specific patterns

Lua rules are useful for complex tasks: asking DNS, querying redis or HTTP, examining task-specific details. Regexp rules are useful since they are
heavily optimized by rspamd (especially when `hyperscan` is enabled) and can match custom patterns in headers, urls, text parts and even the whole message body.

### Rules weights

Rule weights are usually defined in `metrics`, which contain the following data:

- score thresholds for different actions
- symbol scores
- symbol descriptions
- symbol group definitions:
	+ symbols in group
	+ description of groups
	+ joint group score limit

For built-in rules, scores are placed in the file `${CONFDIR}/metrics.conf`. For your own rules, you have two possibilities to define scores:

1. Define scores in `rspamd.conf.local` as follows:

~~~nginx
metric "default" {
	symbol "MY_SYMBOL" {
		description = "my cool rule";
		score = 1.5;
	}
}
~~~

2. Define scores directly in lua when describing a symbol:

~~~lua
-- regexp rule
config['regexp']['MY_SYMBOL'] = {
	re = '/a/M & From=/blah/',
	score = 1.5,
	description = 'my cool rule',
	group = 'my symbols'
}

-- lua rule
rspamd_config.MY_LUA_SYMBOL = {
	callback = function(task)
		-- Do something
		return true
	end,
	score = -1.5,
	description = 'another cool rule',
	group = 'my symbols'
}
~~~

## Regexp rules

Regexp rules are executed by the `regexp` module of rspamd. You can find a detailed description of the regexp syntax in [the module documentation](../modules/regexp.md).
In this tutorial, I will only give some performance considerations about regular expressions:

* Prefer lightweight regexps, such as header or url regexps, to heavy ones, such as mime or body regexps
* If you need to match some text in the message's content, prefer `mime` regexps as they are executed on text content only
* If you **really** need to match the whole message, then you might also consider the [trie](../modules/trie.md) module as it is significantly faster
* Avoid complex regexps, avoid backtracking, avoid negative lookaheads `(?!)`, avoid capturing groups (replace them with `(?:)`), avoid potentially empty patterns, e.g. `/^.*$/`

Following these guidelines helps to create rules that are fast but still effective. To add regexp rules, use the `config` global table that is available in any lua file used by rspamd:

~~~lua
config['regexp'] = {} -- Remove all regexp rules (including internal ones)
local reconf = config['regexp'] -- Create alias for regexp configs

local re1 = 'From=/foo@/H' -- Mind local here
local re2 = '/blah/P'

reconf['SYMBOL'] = {
	re = string.format('(%s) && !(%s)', re1, re2), -- use string.format to create expression
	score = 1.2,
	description = 'some description',
	
	condition = function(task) -- run this rule only if some condition is satisfied
		return true
	end,
}
~~~
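
Because the expression is an ordinary lua string, a small helper can keep longer combinations readable. The following is only a sketch: `all_of` and the symbol name are hypothetical and not part of rspamd, and the `config or {...}` fallback exists only so that the snippet can also be loaded in a plain lua interpreter. Unlike the example above, it extends the existing `regexp` table instead of replacing it.

~~~lua
-- Fallback so the sketch loads outside rspamd; inside rspamd the real global is used
local config = config or { regexp = {} }
local reconf = config['regexp'] -- keep the rules that are already defined

-- Hypothetical helper: wrap each atom in parentheses and join them with '&'
local function all_of(...)
  return '(' .. table.concat({...}, ') & (') .. ')'
end

local from_re = 'From=/foo@/H'
local body_re = '/blah/P'

reconf['MY_COMBINED_SYMBOL'] = {
  re = all_of(from_re, body_re), -- builds "(From=/foo@/H) & (/blah/P)"
  score = 1.2,
  description = 'From matches foo@ and a text part matches blah',
}
~~~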

## Lua rules

Lua rules are more powerful than regexp ones, but they are not as heavily optimized and can cause performance issues if written incorrectly. All lua rules
accept a special parameter called `task`, which represents the message being scanned.

### Return values

Each lua rule can return `0` or `false`, which means that the rule has not matched, or `true` if the symbol should be inserted.
In fact, you can return any positive or negative number, which is multiplied by the rule's score: if the rule score is
`1.2`, then the symbol has score `1.2` when your function returns `1`, and score `2.4` when your function returns `2.0`.
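
As an illustration of the multiplier, here is a minimal sketch of a rule whose weight grows with the number of URLs in a message. The symbol name and the thresholds are made up for this example, and `task:get_urls()` is assumed to return a list of URLs (or nothing when there are none).

~~~lua
-- Fallback so the sketch loads outside rspamd; inside rspamd the real global is used
local rspamd_config = rspamd_config or {}

rspamd_config.MANY_URLS = {
  callback = function(task)
    local urls = task:get_urls() or {}
    if #urls < 10 then
      return false -- rule has not matched
    end
    -- 10 URLs give a multiplier of 1.0, 20 or more are capped at 2.0
    return math.min(#urls / 10, 2.0)
  end,
  score = 1.2, -- the inserted symbol gets 1.2 multiplied by the returned number
  description = 'Message contains many URLs',
}
~~~

With these numbers, a message with 15 URLs would get a multiplier of `1.5`, while a message with 50 URLs is capped at `2.0`.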

### Rules conditions

Like regexp rules, lua rules can have conditions, for example:

~~~lua
rspamd_config.SYMBOL = {
	callback = function(task)
		return 1
	end,
	score = 1.2,
	description = 'some description',
	
	condition = function(task) -- run this rule only if some condition is satisfied
		return true
	end,
}
~~~
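
A slightly more realistic condition might skip a rule for authenticated senders. This is a sketch only: the symbol name is hypothetical, and it assumes that `task:get_user()` returns the authenticated user name, or `nil` for unauthenticated mail.

~~~lua
-- Fallback so the sketch loads outside rspamd; inside rspamd the real global is used
local rspamd_config = rspamd_config or {}

rspamd_config.UNAUTH_HTML_ONLY = {
  callback = function(task)
    local parts = task:get_text_parts() or {}
    if #parts == 0 then
      return false
    end
    for _, p in ipairs(parts) do
      if not p:is_html() then
        return false -- at least one plain text part
      end
    end
    return true -- every text part is HTML
  end,
  score = 0.5,
  description = 'Only HTML text parts, unauthenticated sender',

  condition = function(task)
    -- the callback is skipped when the sender is authenticated
    return task:get_user() == nil
  end,
}
~~~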

### Useful task manipulations

There are a number of methods in [task](../lua/task.md) objects. For example, you can get the text parts of a message:

~~~lua
rspamd_config.HTML_MESSAGE = {
  callback = function(task)
    local parts = task:get_text_parts()
    
    if parts then
      for i,p in ipairs(parts) do
        if p:is_html() then
          return 1
        end
      end
    end
  
    return 0
  end,
  score = -0.1,
  description = 'HTML included in message',
}
~~~

You can get HTML information:

~~~lua
local function check_html_image(task, min, max)
  local tp = task:get_text_parts()

  for _,p in ipairs(tp) do
    if p:is_html() then
      local hc = p:get_html()
      local len = p:get_length()


      if len >= min and len < max then
        local images = hc:get_images()
        if images then
          for _,i in ipairs(images) do
            if i['embedded'] then
              return true
            end
          end
        end
      end
    end
  end
end

rspamd_config.HTML_SHORT_LINK_IMG_1 = {
  callback = function(task)
    return check_html_image(task, 0, 1024)
  end,
  score = 3.0,
  group = 'html',
  description = 'Short html part (0..1K) with a link to an image'
}
~~~

You can get message headers and process their values:

~~~lua

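local rspamd_regexp = require "rspamd_regexp"
-- Assumed pattern (not given in the original snippet): captures the subject after any Re:/Fwd: prefixes
local subject_re = rspamd_regexp.create('/^(?:(?:Re|Fwd|Fw):\\s*)+(.+)$/i')

rspamd_config.SUBJ_ALL_CAPS = {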
  callback = function(task)
    local util = require "rspamd_util"
    local sbj = task:get_header('Subject')

    if sbj then
      local stripped_subject = subject_re:search(sbj, false, true)
      if stripped_subject and stripped_subject[1] and stripped_subject[1][2] then
        sbj = stripped_subject[1][2]
      end

      if util.is_uppercase(sbj) then
        return true
      end
    end

    return false
  end,
  score = 3.0,
  group = 'headers',
  description = 'All capital letters in subject'
}
~~~

You can also access HTTP headers, urls and other useful properties of rspamd tasks. Moreover, you can use
the global convenience modules exported by rspamd, such as [rspamd_util](../lua/util.md) or [rspamd_logger](../lua/logger.md), by requiring them in your rules:

~~~lua
rspamd_config.SUBJ_ALL_CAPS = {
  callback = function(task)
    local util = require "rspamd_util"
    local logger = require "rspamd_logger"
    ...
  end,
}
~~~
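
For instance, URL access could look like the sketch below. It assumes that `task:get_urls()` returns a list of URL objects with a `get_host()` method; the symbol name and the host table are invented for this illustration.

~~~lua
-- Fallback so the sketch loads outside rspamd; inside rspamd the real global is used
local rspamd_config = rspamd_config or {}

-- Hypothetical local list of hosts that should trigger the rule
local suspicious_hosts = { ['example.net'] = true }

rspamd_config.SUSPICIOUS_URL_HOST = {
  callback = function(task)
    for _, u in ipairs(task:get_urls() or {}) do
      local host = u:get_host()
      if host and suspicious_hosts[host:lower()] then
        return true
      end
    end
    return false
  end,
  score = 2.0,
  description = 'URL points to a host from the local list',
}
~~~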

## Asynchronous actions

For asynchronous actions, such as redis access or DNS checks, it is recommended to use
dedicated callbacks called symbol handlers. The difference from generic lua rules is that
symbol handlers are not obliged to return a value; instead, they use the method `task:insert_result(symbol, weight)` to
indicate a match. All lua plugins are implemented as symbol handlers. Here is a simple example of a symbol handler that checks DNS:

~~~lua
rspamd_config:register_symbol('SOME_SYMBOL', 1.0,
	function(task)
		local to_resolve = 'google.com'
		local logger = require "rspamd_logger"
		
		local dns_cb = function(resolver, to_resolve, results, err)
			if results then
				logger.infox(task, '<%1> host: [%2] resolved for symbol: %3',
					task:get_message_id(), to_resolve, 'SOME_SYMBOL')
				task:insert_result('SOME_SYMBOL', 1)
			end
		end
		task:get_resolver():resolve_a({
			task=task, 
			name = to_resolve, 
			callback = dns_cb})
	end)
~~~

You can also set the desired score and description if you'd like:

~~~lua
rspamd_config:set_metric_symbol('SOME_SYMBOL', 1.2, 'some description')
-- Table version (`rule` and `symbol` are expected to be defined by the surrounding code)
if rule['score'] then
  if not rule['group'] then
    rule['group'] = 'whitelist'
  end
  rule['name'] = symbol
  rspamd_config:set_metric_symbol(rule)
end
~~~

## Difference between `config` and `rspamd_config`

It might be confusing that there are two variables with a similar meaning. This is a legacy of
rspamd's history, where the distinction once served a purpose. However, currently `rspamd_config` represents
the object that can do many things:

* Get configuration options:

~~~lua
rspamd_config:get_all_opts('section')
~~~

* Add maps:

~~~lua
rule['map'] = rspamd_config:add_kv_map(rule['domains'],
            "Whitelist map for " .. symbol)
~~~

* Register callbacks for symbols:

~~~lua
rspamd_config:register_symbol('SOME_SYMBOL', 1.0, some_functions)
~~~

* Register lua rules (note that `__newindex` metamethod is actually used here):

~~~lua
rspamd_config.SYMBOL = {...}
~~~

* Register composites, prefilters, postfilters and so on

In contrast, the `config` global is extremely simple: it's just a plain table of configuration options that is exactly the same
as defined in `rspamd.conf` (and `rspamd.conf.local` or `rspamd.conf.override`). However, you can also use lua tables and even functions for some
options. For example, the `regexp` module can also accept a `callback` argument:

~~~lua
config['regexp']['SYMBOL'] = {
  callback = function(task) ... end,
  ...
}
~~~

However, this syntax is discouraged and is preserved mostly for compatibility reasons.

## Configuration applying order

It might not be obvious, but configuration options are applied and replaced in a strict order:

1. `rspamd.conf` and `rspamd.conf.local` are processed
2. `rspamd.conf.override` is processed and it **overrides** anything parsed in the previous step
3. **Lua** rules are loaded and they can override everything from the previous steps, with the important exception of rule scores, which are **NOT** overridden if the corresponding symbol is also defined in some `metric` section
4. **Dynamic** configuration, normally defined by the webui, is loaded and it can override rule scores or action scores from the previous steps
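
For a single symbol score, the order above can be summarised with a toy model. This is plain lua written for illustration, not rspamd code: `resolve_score` and its field names are hypothetical, each field holds the score coming from one source (or `nil` when that source does not define the symbol), and both `conf` and `override` are assumed to define the score in a `metric` section.

~~~lua
-- Toy model of the precedence list; not an rspamd API
local function resolve_score(src)
  local score = src.conf        -- step 1: rspamd.conf and rspamd.conf.local
  if src.override ~= nil then   -- step 2: rspamd.conf.override replaces step 1
    score = src.override
  end
  if score == nil then          -- step 3: a lua score is used only when no metric section set one
    score = src.lua
  end
  if src.dynamic ~= nil then    -- step 4: dynamic configuration replaces any earlier score
    score = src.dynamic
  end
  return score
end

print(resolve_score({ conf = 1.5, lua = 2.5 }))                -- metric score is kept
print(resolve_score({ lua = 2.5 }))                            -- lua score is used
print(resolve_score({ conf = 1.5, lua = 2.5, dynamic = 4.5 })) -- dynamic score wins
~~~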

## Rules check order

Rules in rspamd are checked in the following order:

1. **Prefilters**: checked every time and can stop all further processing by calling `task:set_pre_result()`
2. **All symbols**: can depend on each other by calling `rspamd_config:add_dependency(from, to)`
3. **Statistics**: checked only when all symbols are checked
4. **Composites**: combine symbols to adjust the final results
5. **Post filters**: executed even if a message is already rejected and symbols processing has been stopped
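
As an illustration of step 2, the sketch below makes one symbol depend on another. It follows the `add_dependency(from, to)` signature quoted above and assumes that `from` is the symbol that waits for `to`. The symbol names are hypothetical, `task:has_symbol()` is assumed to report whether a symbol has already been inserted, and the stub table exists only so that the snippet can be loaded outside rspamd.

~~~lua
-- Fallback stub for plain lua; inside rspamd the real global is used
local rspamd_config = rspamd_config or {
  add_dependency = function(self, from, to) end,
}

rspamd_config.DEP_BASE = {
  callback = function(task)
    return task:get_header('Subject') == nil -- matches when Subject is missing
  end,
  score = 1.0,
  description = 'Message has no Subject header',
}

rspamd_config.DEP_FOLLOWUP = {
  callback = function(task)
    -- relies on DEP_BASE having been checked before this callback runs
    return task:has_symbol('DEP_BASE') and #(task:get_urls() or {}) > 0
  end,
  score = 2.0,
  description = 'No Subject header and at least one URL',
}

-- DEP_FOLLOWUP is checked only after DEP_BASE
rspamd_config:add_dependency('DEP_FOLLOWUP', 'DEP_BASE')
~~~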