Does this set of regular expressions FULLY protect against cross-site scripting?

旧巷少年郎

What's an example of something dangerous that would not be caught by the code below?

EDIT: After some of the comments I added another line, commented below. See V

11 answers

迷失自我

2020-12-13 12:19

Whitespace makes you vulnerable. Read this.
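One way whitespace can slip past a pattern, as a small sketch: browsers strip ASCII tabs and newlines from a URL before resolving it, so `java<TAB>script:` in an `href` still runs as `javascript:`. The `NAIVE` pattern here is a made-up blacklist rule for illustration, not the code from the question.

```python
import re

# Hypothetical blacklist rule (an assumption, not the asker's regex).
NAIVE = re.compile(r"javascript:", re.IGNORECASE)

plain = '<a href="javascript:alert(1)">x</a>'
# Same link with a literal tab inside the scheme name.
with_tab = '<a href="java\tscript:alert(1)">x</a>'

def is_flagged(markup):
    """Return True when the naive pattern finds its literal match."""
    return bool(NAIVE.search(markup))

print(is_flagged(plain))     # True
print(is_flagged(with_tab))  # False: the tab breaks the literal match
```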

佛祖请我去吃肉

2020-12-13 12:20

Another vote for whitelisting. But it looks like you're going about this the wrong way. The way I do it is to parse the HTML into a tag tree. If the tag you're parsing is in the whitelist, give it a tree node and parse on. The same goes for its attributes.

Dropped attributes are just dropped. Everything else is HTML-escaped literal content.

And the bonus of this route is because you're effectively regenerating all the markup, it's all completely valid markup! (I hate it when people leave comments and they screw up the validation/design.)

Re "I can't whitelist" (para): Blacklisting is a maintenance-heavy approach. You'll have to keep an eye on new exploits and make sure you're covered. It's a miserable existence. Just do it right once and you'll never need to touch it again.
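A minimal sketch of the parse-and-regenerate idea, using Python's standard `html.parser`. The `ALLOWED` table, `SAFE_SCHEMES`, and the `sanitize` helper are assumptions made up for illustration. It emits output directly instead of building a real tree and does not rebalance unclosed tags, so treat it as a starting point rather than a finished sanitizer.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical whitelist: tag name -> attribute names permitted on that tag.
ALLOWED = {"b": set(), "i": set(), "p": set(), "a": {"href"}}
# Hypothetical list of URL prefixes accepted in href values.
SAFE_SCHEMES = ("http://", "https://", "mailto:")

class WhitelistSanitizer(HTMLParser):
    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []

    def handle_starttag(self, tag, attrs):
        if tag not in ALLOWED:
            # Not whitelisted: keep the raw tag text, but only as escaped literal content.
            self.out.append(escape(self.get_starttag_text()))
            return
        kept = []
        for name, value in attrs:
            if name not in ALLOWED[tag] or value is None:
                continue  # dropped attributes are just dropped
            value = value.strip()
            if name == "href" and not value.lower().startswith(SAFE_SCHEMES):
                continue  # rejects javascript:, data:, and anything else not listed
            kept.append(f' {name}="{escape(value, quote=True)}"')
        self.out.append(f"<{tag}{''.join(kept)}>")

    def handle_endtag(self, tag):
        closing = f"</{tag}>"
        self.out.append(closing if tag in ALLOWED else escape(closing))

    def handle_data(self, data):
        self.out.append(escape(data))

def sanitize(markup):
    parser = WhitelistSanitizer()
    parser.feed(markup)
    parser.close()
    return "".join(parser.out)

print(sanitize('<b onclick="evil()">ok</b>'))  # <b>ok</b>
```

Because the markup is regenerated from the parsed tags, attribute values are decoded before the check, so an entity-encoded `javascript:` URL is rejected the same way as a plain one.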

一整个雨季

2020-12-13 12:22
```
<a href="javascript:document.writeln('on' + 'unload' + ' and more malicious stuff here...');">example</a>
```
Any time you can write a string to the document, a big door swings open.

There are myriad places to inject malicious things into HTML/JavaScript. For this reason, Facebook didn't initially allow JavaScript in their applications platform. Their solution was to later implement a markup/script compiler that allows them to seriously filter out the bad stuff.

As said already, whitelist a few tags and attributes and strip out everything else. Don't blacklist a few known malicious attributes and allow everything else.
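To make the blacklist problem concrete, here is a rough sketch. The two patterns in `BLACKLIST` are hypothetical rules invented for this example (they are not the code from the question): one strips `<script>` blocks, the other strips inline `on*=` handlers. The link above passes through both untouched.

```python
import re

# Hypothetical blacklist rules (assumptions for illustration only).
BLACKLIST = [
    re.compile(r"<script\b.*?</script>", re.IGNORECASE | re.DOTALL),
    re.compile(r"\son\w+\s*=\s*(\"[^\"]*\"|'[^']*'|[^\s>]+)", re.IGNORECASE),
]

def blacklist_filter(markup):
    """Remove everything the blacklist recognizes and return the rest."""
    for pattern in BLACKLIST:
        markup = pattern.sub("", markup)
    return markup

payload = (
    "<a href=\"javascript:document.writeln('on' + 'unload' + "
    "' and more malicious stuff here...');\">example</a>"
)

print(blacklist_filter('<b onclick="x()">hi</b>'))  # <b>hi</b>
print(blacklist_filter(payload) == payload)         # True: nothing was removed
```

The filter only removes what it already knows about, and the handler name is assembled at runtime, so there is nothing for the pattern to match.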
难免孤独

2020-12-13 12:22
As an example of an attack that makes it through this:
```
  <div style="color: expression('alert(4)')">
```
Shameless plug: The Caja project defines whitelists of HTML elements and attributes so that it can control how and when scripts in HTML get executed.

See the project at http://code.google.com/p/google-caja/ and the whitelists are the JSON files in http://code.google.com/p/google-caja/source/browse/#svn/trunk/src/com/google/caja/lang/html and http://code.google.com/p/google-caja/source/browse/#svn/trunk/src/com/google/caja/lang/css
北海茫月

2020-12-13 12:24

As David shows, there's no easy way to protect yourself with just a few regexes: you can always forget something, like javascript: in your case. You'd better escape the HTML entities on output. There is a lot of discussion about the best way to do this, depending on what you actually need to allow, but what's certain is that your function is not enough.
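For the escape-on-output route, a minimal sketch with Python's standard `html.escape`. The `render_comment` helper and its markup are hypothetical. This only covers text placed in an HTML body or a quoted attribute; URLs and inline script contexts need their own handling.

```python
from html import escape

def render_comment(user_text):
    """Hypothetical template helper: escape user text at the point of output."""
    return f'<p class="comment">{escape(user_text, quote=True)}</p>'

print(render_comment("<script>alert(1)</script>"))
# <p class="comment">&lt;script&gt;alert(1)&lt;/script&gt;</p>
```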

Jeff has talked a bit about this here.
