path: root/src/html.c
Commit message | Author | Age | Files | Lines
* Another debian license fix. | Vsevolod Stakhov | 2012-09-10 | 1 | -1/+1
|   Add apache license for regexp that were delivered from SpamAssassin project.
|   Fix debian/copyright for src/dns.c.
* Update copyright (required by debian). | Vsevolod Stakhov | 2012-09-04 | 1 | -3/+3
|
* * Rework thread pools locking logic to avoid global lua mutex usage. | Vsevolod Stakhov | 2012-08-22 | 1 | -4/+3
|   Fixed several memory leaks with modern glib.
|   Fixed memory leak in dkim code.
|   Fixed a problem with static global variables in shared libraries.
* Do not try to detect tld urls inside HTML texts as it generates too many false positive matches. | Vsevolod Stakhov | 2012-08-21 | 1 | -1/+1
|   Add some prototypes for lua.
* * Add support of strict_domains. | Vsevolod Stakhov | 2012-05-29 | 1 | -1/+1
|   Several fixes in dkim code.
|   Make initial support of body relaxed canonicalization.
* Use DB_HASH access method for bdb backend. | Vsevolod Stakhov | 2012-03-01 | 1 | -1/+1
|   Fix signed and unsigned comparison while I'm here.
* Fix signedness in arithmetic operations. | Vsevolod Stakhov | 2011-08-04 | 1 | -2/+2
|
* * First commit to implement multi-statfile filter system with new learning mechanism (untested yet) | Vsevolod Stakhov | 2011-07-12 | 1 | -25/+2
* * Make fuzzy hashes utf8 compatible. | Vsevolod Stakhov | 2011-07-12 | 1 | -1/+1
|
* Fix phishing detection with img flag. | Vsevolod Stakhov | 2011-07-11 | 1 | -16/+32
|   Handle unclosed HTML tags properly.
|   Remove warnings for types on 32 bit archs.
|   Do not touch grow factor many times when one shot mode is turned on.
* * Fixes to fuzzy hashing logic, skip urls while estimating fuzzy hash | Vsevolod Stakhov | 2011-06-23 | 1 | -7/+5
|   Fix tags stripping.
|   Fix phishing checks (ignore img tags).
* * Fix phishing detector to find phished urls with tags inside | Vsevolod Stakhov | 2011-04-19 | 1 | -4/+27
|
* * Add ability to extract urls from subject field [0.3.10] | Vsevolod Stakhov | 2011-03-23 | 1 | -1/+7
|   Fix phishing plugin.
|   * Important fix for multimap/cdb handling
|   * Important fix for phishing detector
* Fix error with parsing phishing urls. | Vsevolod Stakhov | 2011-03-17 | 1 | -1/+1
|
* Fix phishing check for special cases like http://host.com and http://www.host.com | Vsevolod Stakhov | 2011-03-14 | 1 | -5/+37
* Small fix. | Vsevolod Stakhov | 2011-03-14 | 1 | -1/+1
|
* Make phishing checks working. | Vsevolod Stakhov | 2011-03-05 | 1 | -6/+7
|
* Try to fix memory issues. | Vsevolod Stakhov | 2011-03-02 | 1 | -4/+12
|
* Fix stupid bug in url parser. | Vsevolod Stakhov | 2011-02-25 | 1 | -1/+1
|
* * Rewrite URL storage system | Vsevolod Stakhov | 2011-02-24 | 1 | -1/+3
|
* Fix error with tags like <? xml ?> | Vsevolod Stakhov | 2011-01-25 | 1 | -1/+3
|
* * Many fixes to fuzzy hashes logic and tokenization. | Vsevolod Stakhov | 2011-01-24 | 1 | -1/+7
|
* Detect mailto: inside <a> and <img> tags. | Vsevolod Stakhov | 2010-12-01 | 1 | -1/+2
|
* Make own strlcpy that does not calculate remaining string length (faster and safer) | Vsevolod Stakhov | 2010-11-16 | 1 | -1/+1
|   Allow only ASCII symbols in logs, escape control chars
* * Add ability to obtain phished url from lua | Vsevolod Stakhov | 2010-11-12 | 1 | -0/+1
|   * Add ability to specify check domains for phishing check with 'domains' option
* Urgent fixes. | Vsevolod Stakhov | 2010-11-03 | 1 | -1/+1
|
* * Add phishing detector (now just compares <a href> with tag's data). | Vsevolod Stakhov | 2010-11-02 | 1 | -19/+55
|
* Fixes types (use glib ones) no functional change. | Vsevolod Stakhov | 2010-10-06 | 1 | -22/+22
|   Now all comments in commit logs beginning with '*' would be included in changelog, so important changes would be separated from small ones.
* * Make improvements to HTML entities decoder: now it replaces entities with common characters and removes unknown entities. | Vsevolod Stakhov | 2010-07-16 | 1 | -263/+283
|   This behaviour is closer to standard HTML to text conversion
|   * Add -d option to force debug output
* * Fix compatibility issues | Vsevolod Stakhov | 2010-06-23 | 1 | -2/+25
|
* * Introduce new logging system: | Vsevolod Stakhov | 2009-12-22 | 1 | -3/+3
|   - independent and customizable buffering
|   - line buffering
|   - error handling support
|   - custom (ip based) debug
|   - append function name automatically (based on __FUNCTION__)
|   - add some logic to logs system
* * Retab, no functional changes | Vsevolod Stakhov | 2009-10-02 | 1 | -512/+508
|
* * Small fixes in task construction | Vsevolod Stakhov | 2009-09-23 | 1 | -1/+0
|
* * Add decoding entities as it is specified in w3c recommendations | Vsevolod Stakhov | 2009-09-16 | 1 | -21/+336
|
* * Decode all html entities in html parts | Vsevolod Stakhov | 2009-09-16 | 1 | -5/+16
|
* * Fix html decoding when '/' are encoded too | Vsevolod Stakhov | 2009-08-28 | 1 | -2/+7
|
* * Strip urls from space characters | Vsevolod Stakhov | 2009-08-05 | 1 | -1/+0
|
* * Strip url line from spaces | Vsevolod Stakhov | 2009-07-30 | 1 | -0/+1
|
* * Fix entities decoding for hex and oct characters | Vsevolod Stakhov | 2009-07-28 | 1 | -4/+21
|
* * Use g_ascii_isalnum for more strict decoding | Vsevolod Stakhov | 2009-07-20 | 1 | -3/+4
|   * Keep undecoded entities undecoded
|   * Fix log message
* * Decode html entities in urls while extracting urls values from html tags | Vsevolod Stakhov | 2009-07-20 | 1 | -0/+46
|   NOTE: works only for ascii symbols
* * Handle <?xml> tags correctly | Vsevolod Stakhov | 2009-07-06 | 1 | -1/+1
|
* * Check return value from evdns_resolve | Vsevolod Stakhov | 2009-07-03 | 1 | -1/+1
|   * Do not parse html parts twice while extracting urls, just parse tags attributes
* * Add hack to disallow malformed urls | Vsevolod Stakhov | 2009-07-03 | 1 | -1/+1
|
* * If tag attribute value is empty, do not treat it as url | Vsevolod Stakhov | 2009-07-03 | 1 | -0/+4
|
* * Fix html urls processing | Vsevolod Stakhov | 2009-07-03 | 1 | -7/+34
|
* * Add autolearn config options | Vsevolod Stakhov | 2009-07-03 | 1 | -1/+1
|   * Fix parsing of invalid urls in html parser
|   * Add ability to specify symbols in view parameter as comma-separated list
* * Extract url encoded urls from html texts | Vsevolod Stakhov | 2009-07-03 | 1 | -2/+59
|
* * Fix issue with <?xml> tag | Vsevolod Stakhov | 2009-05-21 | 1 | -1/+1
|
* * Remove unused debug | Vsevolod Stakhov | 2009-05-19 | 1 | -6/+0
|