How safe is client-side HTML Sanitization?

问题

I have been looking at Pagedown.js lately for the allure of using mark-down on my pages instead of ugly readonly textareas.

I am extremely cautious though as it seems easy enough to dupe the sanitized converter. I have seen some discussion around Angular.js and it's html bindings and also heard something when Knockout.js 3.0 came out that there had been a previous unsafeness to the html binding.

It would seem all someone would need to do to disable the sanitizer in Pagedown.js for instance is something like -

var safeConverter = new Markdown.Converter();
// safeConverter is open to script injection

safeConverter = Markdown.getSanitizingConverter();
// safeConverter is now safe

// Override the getSanitizingConverter pseudo-code
Markdown.getSanitizingConverter = function () {
    return Markdown.Converter;
};

and they could open a site up to script injection. Is that not true?

Edit

Then why would libraries like that package a sanitizer to use client-side? Sure they say don't render unsanitized html but the next line says use Markdown.Sanitizer..

How is Angular not open to it with the sanitizer service or is that just a farce as well?

回答1:

I believe there is a little misunderstanding about the purpose and nature of such "sanitizers".

The purpose of a sanitizer (e.g. Angular's ngSanitize) is not to prevent "bad" data from being sent to the server-side. It is rather the other way around: A sanitizer is there to protect the non-malicious user from malicious data (being either a result of a security hole on the server side (yeah, no setup is perfect) or being fetched from other sources (ones that you are not in control of)).

Of course, being a client-side feature, a sanitizer could be bypassed, but (since the sanitizer is there to protect the user (not the server)) bypassing it would only leave the bypasser unprotected (which you can't do anything about, nor shouldn't you care - it's their choice).

Furthermore, sanitizers (can) have another (potentially more important) role: A sanitizer is a tool that helps the developer to better organize their code in a way that it is more easily testable for certain kinds of vulnerabilities (e.g. XSS attacks) and even helps in the actual code auditing for such kind of security holes.

In my opinion, the Angular docs summarize the concept pretty neatly:

Strict Contextual Escaping (SCE) is a mode in which AngularJS requires bindings in certain contexts to result in a value that is marked as safe to use for that context.
[...]
SCE assists in writing code in way that (a) is secure by default and (b) makes auditing for security vulnerabilities such as XSS, clickjacking, etc. a lot easier.

[...]
In a more realistic example, one may be rendering user comments, blog articles, etc. via bindings. (HTML is just one example of a context where rendering user controlled input creates security vulnerabilities.)

For the case of HTML, you might use a library, either on the client side, or on the server side, to sanitize unsafe HTML before binding to the value and rendering it in the document.

How would you ensure that every place that used these types of bindings was bound to a value that was sanitized by your library (or returned as safe for rendering by your server?) How can you ensure that you didn't accidentally delete the line that sanitized the value, or renamed some properties/fields and forgot to update the binding to the sanitized value?

To be secure by default, you want to ensure that any such bindings are disallowed unless you can determine that something explicitly says it's safe to use a value for binding in that context. You can then audit your code (a simple grep would do) to ensure that this is only done for those values that you can easily tell are safe - because they were received from your server, sanitized by your library, etc. You can organize your codebase to help with this - perhaps allowing only the files in a specific directory to do this. Ensuring that the internal API exposed by that code doesn't markup arbitrary values as safe then becomes a more manageable task.

_{Note 1: Emphasis is mine.
Note 2: Sorry for the lengthy quote, but I consider this to be a very improtant (as much as sensitive) matter and one that is too often misunderstood.}

回答2:

You can't sanitize client side. Anyone could always turn off JavaScript or change the values and submit them to your server.

The server needs to double check that. Client side in my mind is really just a usability feature. So they know as they type whether they are wrong or right.

In response to the edit:

Most of them are convenience or provide functionality. For example unobtrusive validation will make my text boxes red if they put in an invalid number. That is nice, and I didn't need to post and check and then change the text box border blablablabla...

But when they do post I still need to validate their data. I usually keep that in a core library which could be shared across applications (web, web service, mobile app, thick client etc)

The usability validation varies platform to platform. But at the core, before any application trusts the data, it has to check it within its trust barrier. No one can change the value behind my if on the server, but anyone can change the value in their browser without tripping my JS.

回答3:

Pagedown can run on the server as well as the client.

For sanitizing html on the client, it makes more sense to sanitize on output rather than input. You wouldn't sanitize before sending data to a server, but you might sanitize after recieving data from a server.

Imagine making a web-service call on the client and obtaining data from a third-party service. It could be passed through a sanitizer on the client before being rendered. The user could disable the sanitization on their own computer, but they're only hurting themselves.

It's also useful outside of security reasons just to prevent user input accidentally modifying the formatting of the surrounding page. Such as when typing a html post with a real-time preview (like on StackOverflow).

回答4:

Not at all safe.

Client side sanitation/validation should be used for few reasons:

easier and faster way to tell the non-malicious user what he did wrong
decrease the number of times non-malicious user communicate with your server (in case of errors)

Everything that you validate can be changed because the client is not controlled by you. Things like dev console, fiddler, wireshark allows you to manipulate the data in basically any way you want.

So only server is responsible for real sanitation/validation.

来源：https://stackoverflow.com/questions/23839173/how-safe-is-client-side-html-sanitization

标签

javascript

html

angularjs

knockout.js

javascript-injection