documentation/advanced/advanced-security.asciidoc


1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51

---
title: Common Security Issues
order: 8
layout: page
---

[[advanced.security]]
= Common Security Issues

[[advanced.security.sanitizing]]
== Sanitizing User Input to Prevent Cross-Site Scripting

You can put raw HTML content in many components, such as the [classname]#Label#
and [classname]#CustomLayout#, as well as in tooltips and notifications. In such
cases, you should make sure that if the content has any possibility to come from
user input, you must make sure that the content is safe before displaying it.
Otherwise, a malicious user can easily make a
link:http://en.wikipedia.org/wiki/Cross-site_scripting[cross-site scripting
attack] by injecting offensive JavaScript code in such components. See other
sources for more information about cross-site scripting.

Offensive code can easily be injected with [literal]#++<script>++# markup or in
tag attributes as events, such as [parameter]#onLoad#.

// TODO Consider an example, Alice, Bob, etc.

Cross-site scripting vulnerabilities are browser dependent, depending on the
situations in which different browsers execute scripting markup.

Therefore, if the content created by one user is shown to other users, the
content must be sanitized. There is no generic way to sanitize user input, as
different applications can allow different kinds of input. Pruning (X)HTML tags
out is somewhat simple, but some applications may need to allow (X)HTML content.
It is therefore the responsibility of the application to sanitize the input.

Character encoding can make sanitization more difficult, as offensive tags can
be encoded so that they are not recognized by a sanitizer. This can be done, for
example, with HTML character entities and with variable-width encodings such as
UTF-8 or various CJK encodings, by abusing multiple representations of a
character. Most trivially, you could input [literal]#++<++# and [literal]#++>++#
with [literal]#++&lt;++# and [literal]#++&gt;++#, respectively. The input could
also be malformed and the sanitizer must be able to interpret it exactly as the
browser would, and different browsers can interpret malformed HTML and
variable-width character encodings differently.

Notice that the problem applies also to user input from a
[classname]#RichTextArea# is transmitted as HTML from the browser to server-side
and is not sanitized. As the entire purpose of the [classname]#RichTextArea#
component is to allow input of formatted text, you can not just remove all HTML
tags. Also many attributes, such as [parameter]#style#, should pass through the
sanitization.