../src/libserver/url.c: In function ‘rspamd_url_host_set_add’:
../src/libserver/url.c:3808:11: warning: variable ‘k’ set but not used [-Wunused-but-set-variable]
3808 | khiter_t k;
| ^
../src/lua/lua_task.c: In function ‘lua_task_has_urls’:
../src/lua/lua_task.c:2406:11: warning: variable ‘need_emails’ set but not used [-Wunused-but-set-variable]
2406 | gboolean need_emails = FALSE, ret = FALSE;
| ^~~~~~~~~~~
[Minor] Move static keyword to the beginning of declarations
In file included from ../src/libserver/logger/logger_file.c:23:
../src/libserver/logger/logger_private.h:106:1: warning: ‘static’ is not at beginning of declaration [-Wold-style-declaration]
106 | const static struct rspamd_logger_funcs file_log_funcs = {
| ^~~~~
../src/libserver/logger/logger_private.h:130:1: warning: ‘static’ is not at beginning of declaration [-Wold-style-declaration]
130 | const static struct rspamd_logger_funcs syslog_log_funcs = {
| ^~~~~
../src/libserver/logger/logger_private.h:154:1: warning: ‘static’ is not at beginning of declaration [-Wold-style-declaration]
154 | const static struct rspamd_logger_funcs console_log_funcs = {
| ^~~~~
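
The warnings above come from writing `const static` instead of `static const`. A minimal sketch of the preferred ordering follows; `struct demo_logger_funcs`, `demo_file_funcs` and `demo_file_level` are hypothetical stand-ins for illustration, not the real Rspamd definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct rspamd_logger_funcs. */
struct demo_logger_funcs {
    const char *name;
    int level;
};

/*
 * Old style, which GCC reports under -Wold-style-declaration:
 *   const static struct demo_logger_funcs demo_file_funcs = { ... };
 * Preferred: storage-class specifier first, then the qualifiers.
 */
static const struct demo_logger_funcs demo_file_funcs = {
    .name = "file",
    .level = 3,
};

/* Small accessor so the static object is actually used. */
int demo_file_level(void)
{
    assert(demo_file_funcs.name != NULL);
    return demo_file_funcs.level;
}
```

Both orderings declare the same object; only the position of `static` changes, so the fix is purely syntactic.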
* [Conf] Mark Rspamd emailbl as ignore whitelist
* [Conf] RBL: Add missing emails = true option
* [Feature] Add support for scripts in fuzzy storage
* [Feature] Arc: Add whitelisted_signers_map option
* [Feature] Implement hosts file processing
* [Feature] Neural: Introduce classes bias that allows non-equal classes learning
* [Feature] Update libev to 4.33
* [Fix] Another brain damage html standard adoptions
* [Fix] Another fix for brain damaged obs-fws state
* [Fix] Fix flags that caused force_actions failure
* [Fix] Fix logging issue
* [Fix] Fix lua symbols scores registration when config does not define scores
* [Fix] Fix opaque maps logic
* [Fix] Fix parsing of the html tags with no spaces after attributes
* [Fix] Fix some corner cases in urls parsing, add limits
* [Fix] Fix tlds extraction if custom composition rules are used
* [Fix] Fix variables replacement in mempool
* [Fix] Improve base64 detection
* [Fix] Normalize dynamic scores in ANN correctly
* [Fix] Plug memory leak introduced by #3153
* [Fix] Stat_redis_backend: Fix memory leak and simplify learn path
* [Fix] Try hard to deal with ghost workers
* [Fix] metadata_exporter default formatter
* [Rework] Change the way to extract URLs when dealing with alternative parts
* [Rework] Fix various url extraction issues
* [Rework] Re cache: Load compiled hyperscan in the main process as well
* [Rework] Re cache: Load hyperscan early
* [Rework] Rework URL structure: adjust tld part
* [Rework] Rework URL structure: host field
* [Rework] Rework URL structure: more structure optimisations
* [Rework] Rework URL structure: user field
* [Rework] URL: Another update for urls extraction logic
* [Rework] Urls: Improve query urls handling
* [Rework] Urls: adopt html related stuff
* [Rework] Urls: more rework of the urls sets
* [Rework] Urls: process query urls in HTML urls correctly
* [Rework] Urls: rework urls hash structure
* [Rework] Urls: update lua libraries
* [Rework] Use multiple search tries for different url extraction types
Riccardo Alfieri [Wed, 25 Mar 2020 13:44:20 +0000 (14:44 +0100)]
Update rbl.conf
MSBL lists a lot of Gmail dropboxes, but these are excluded from the checks because gmail.com is whitelisted. The same happens for other freemail providers.
Ignoring the whitelist in this case should be safe enough.
Vsevolod Stakhov [Mon, 23 Mar 2020 14:50:24 +0000 (14:50 +0000)]
[Rework] URL: Another update for urls extraction logic
URL extraction from HTML parts should work as follows:
1. Extract href links.
2. Convert HTML to plain text and extract:
   a) (http|https|ftp)://foo.bar and www.foo
   b) email-like strings: \bfoo@bar.baz\b
For all extracted strings, check if we have a host with a domain from the public suffix list.
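
Step 2 and the final suffix check can be sketched roughly as below. This is an illustration only, not the code in url.c: it assumes the HTML has already been converted to plain text, skips href extraction, splits on whitespace instead of using search tries, and checks hosts against a tiny hypothetical suffix table (`demo_suffixes`) rather than the real public suffix list. All function and type names are made up for the example, and hosts are assumed to be lowercase.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, tiny stand-in for the public suffix list. */
static const char *const demo_suffixes[] = { "com", "org", "co.uk" };

enum candidate_kind { CAND_NONE, CAND_URL, CAND_EMAIL };

/* True if host[0..len) ends with ".<suffix>" and has a label before it. */
static bool host_has_public_suffix(const char *host, size_t len)
{
    for (size_t i = 0; i < sizeof(demo_suffixes) / sizeof(demo_suffixes[0]); i++) {
        size_t slen = strlen(demo_suffixes[i]);
        if (len > slen + 1 && host[len - slen - 1] == '.' &&
            memcmp(host + len - slen, demo_suffixes[i], slen) == 0) {
            return true;
        }
    }
    return false;
}

/* Length of the leading host-like run, without trailing dots. */
static size_t host_span(const char *s)
{
    size_t n = 0;
    while (isalnum((unsigned char) s[n]) || s[n] == '.' || s[n] == '-') {
        n++;
    }
    while (n > 0 && s[n - 1] == '.') {
        n--;
    }
    return n;
}

/* Classify one whitespace-delimited token from the plain-text part. */
enum candidate_kind classify_token(const char *tok)
{
    static const char *const schemes[] = { "http://", "https://", "ftp://" };
    const char *host = NULL;
    enum candidate_kind kind = CAND_NONE;

    assert(tok != NULL);
    /* 2a) scheme://host or www.host */
    for (size_t i = 0; i < sizeof(schemes) / sizeof(schemes[0]); i++) {
        size_t plen = strlen(schemes[i]);
        if (strncmp(tok, schemes[i], plen) == 0) {
            host = tok + plen;
            kind = CAND_URL;
            break;
        }
    }
    if (host == NULL && strncmp(tok, "www.", 4) == 0) {
        host = tok;
        kind = CAND_URL;
    }
    /* 2b) email-like string: something@host */
    if (host == NULL) {
        const char *at = strchr(tok, '@');
        if (at != NULL && at != tok) {
            host = at + 1;
            kind = CAND_EMAIL;
        }
    }
    if (host == NULL) {
        return CAND_NONE;
    }
    /* Final filter: the host must end in a known public suffix. */
    return host_has_public_suffix(host, host_span(host)) ? kind : CAND_NONE;
}

/* Count tokens of the wanted kind in an already-converted text. */
size_t count_candidates(const char *text, enum candidate_kind want)
{
    char buf[256];
    size_t count = 0;

    while (*text != '\0') {
        while (*text != '\0' && isspace((unsigned char) *text)) {
            text++;
        }
        size_t n = 0;
        while (text[n] != '\0' && !isspace((unsigned char) text[n])) {
            n++;
        }
        if (n > 0 && n < sizeof(buf)) {
            memcpy(buf, text, n);
            buf[n] = '\0';
            if (classify_token(buf) == want) {
                count++;
            }
        }
        text += n;
    }
    return count;
}
```

In this sketch a string such as `www.foo` is dropped, because `foo` is not in the demo suffix table, while `foo@bar.com` is kept as an email candidate.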