.NET HTML whitelisting (anti-xss/Cross Site Scripting)


Question


I'm in the common situation where I have user input that uses a subset of HTML (entered via TinyMCE). I need some server-side protection against XSS attacks and am looking for a well-tested tool that people are actually using. On the PHP side I see lots of libraries like HTMLPurifier that do the job, but I can't seem to find anything comparable in .NET.

I'm basically looking for a library that filters input down to a whitelist of tags and of attributes on those tags, and that does the right thing with "difficult" attributes like a:href and img:src.

I've seen Jeff Atwood's post at http://refactormycode.com/codes/333-sanitize-html, but I don't know how up-to-date it is. Does it have any bearing at all on what the site is currently using? In any case, I'm not sure I'm comfortable with that strategy of trying to regex out valid input.

This blog post lays out what seems to be a much more compelling strategy:

http://blog.bvsoftware.com/post/2009/01/08/How-to-filter-Html-Input-to-Prevent-Cross-Site-Scripting-but-Still-Allow-Design.aspx

That method actually parses the HTML into a DOM, validates it, and then rebuilds valid HTML from it. If the parser can handle malformed HTML sensibly, great. If not, no big deal -- I can demand well-formed HTML, since users should be going through the TinyMCE editor. Either way, I end up rewriting what I know is safe, well-formed HTML.
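For concreteness, here is a minimal sketch of what that parse-validate-rebuild algorithm could look like on top of the HTML Agility Pack. The whitelist contents and helper names are mine, purely for illustration; a real implementation would also validate the href/src values themselves:

    using System;
    using System.Collections.Generic;
    using HtmlAgilityPack;

    static class HtmlWhitelist
    {
        // Allowed tags, each mapped to the attributes permitted on it.
        static readonly Dictionary<string, HashSet<string>> Allowed =
            new Dictionary<string, HashSet<string>>(StringComparer.OrdinalIgnoreCase)
            {
                ["p"]   = new HashSet<string>(),
                ["b"]   = new HashSet<string>(),
                ["i"]   = new HashSet<string>(),
                ["a"]   = new HashSet<string>(StringComparer.OrdinalIgnoreCase) { "href" },
                ["img"] = new HashSet<string>(StringComparer.OrdinalIgnoreCase) { "src", "alt" },
            };

        public static string Sanitize(string html)
        {
            var doc = new HtmlDocument();
            doc.LoadHtml(html);                // tolerant of malformed input
            Clean(doc.DocumentNode);
            return doc.DocumentNode.InnerHtml; // rebuilt from the validated DOM
        }

        static void Clean(HtmlNode node)
        {
            // Copy the child list first, since we mutate it while iterating.
            foreach (var child in new List<HtmlNode>(node.ChildNodes))
            {
                if (child.NodeType == HtmlNodeType.Element)
                {
                    if (!Allowed.TryGetValue(child.Name, out var allowedAttrs))
                    {
                        // Disallowed element: drop it and its whole subtree.
                        // (A gentler policy could keep the child text instead.)
                        child.Remove();
                        continue;
                    }
                    foreach (var attr in new List<HtmlAttribute>(child.Attributes))
                        if (!allowedAttrs.Contains(attr.Name))
                            attr.Remove();
                }
                Clean(child);
            }
        }
    }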

The problem is that's just a description, with no link to a library that actually implements the algorithm.

Does such a library exist? If not, what would be a good .NET HTML parsing engine, and what regular expressions should be used to perform extra validation of a:href and img:src? Am I missing something else important here?

I don't want to re-implement a buggy wheel here. Surely there are some commonly used libraries out there. Any ideas?


Answer 1:


Well, if you want to parse and you're worried about invalid (X)HTML coming in, the HTML Agility Pack is probably the best thing to use for parsing. Remember, though, that it's not just elements but also the attributes on allowed elements that you need to vet. (And of course you should work from a whitelist of allowed elements and their attributes, rather than trying to strip anything that might be dodgy via a blacklist.)
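For the "difficult" href/src attributes from the question, a safer alternative to a regex is to let System.Uri do the parsing and then whitelist schemes. A sketch -- the allowed-scheme list is an assumption, and relative URLs are rejected here for simplicity:

    using System;

    static bool IsSafeUrl(string value)
    {
        // Reject anything that doesn't parse as an absolute URI.
        // (Relax to UriKind.RelativeOrAbsolute if you want relative links.)
        if (!Uri.TryCreate(value, UriKind.Absolute, out var uri))
            return false;

        // Whitelist schemes; this blocks javascript:, data:, vbscript:, etc.
        return uri.Scheme == Uri.UriSchemeHttp || uri.Scheme == Uri.UriSchemeHttps;
    }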

There's also the OWASP AntiSamy Project, which is an ongoing work in progress. They also have a test site you can try to XSS.

A regex for this is probably too risky, IMO.




Answer 2:


Microsoft has an open-source library to protect against XSS: AntiXSS.




Answer 3:


We are using the HtmlSanitizer .NET library, which:

  • is open-source
  • is actively maintained
  • doesn't have the problems of the Microsoft Anti-XSS library
  • is unit tested against the OWASP XSS Filter Evasion Cheat Sheet
  • is purpose-built for this (in contrast to the HTML Agility Pack, which is a general-purpose parser)

It's also available on NuGet.
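Basic usage looks like this (a sketch against the Ganss.XSS package; the tag/attribute tweaks are illustrative, and the namespace and defaults may differ between versions):

    using System;
    using Ganss.XSS;

    var sanitizer = new HtmlSanitizer();

    // Optionally tighten the defaults to match your TinyMCE whitelist.
    sanitizer.AllowedTags.Clear();
    sanitizer.AllowedTags.Add("p");
    sanitizer.AllowedTags.Add("a");
    sanitizer.AllowedAttributes.Add("href");

    string safe = sanitizer.Sanitize("<a href=\"javascript:alert(1)\">click</a>");
    Console.WriteLine(safe); // the javascript: href is stripped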




Answer 4:


You can download a version at http://www.microsoft.com/en-us/download/details.aspx?id=28589, but I've linked it mainly for the useful DOCX file. My preferred method is to use the NuGet package manager to get the latest AntiXSS package.

You can use the HtmlSanitizationLibrary assembly found in the 4.x AntiXSS library. Note that GetSafeHtml() lives in HtmlSanitizationLibrary, under Microsoft.Security.Application.Sanitizer.
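Usage is a one-liner once the assembly is referenced (a sketch; GetSafeHtmlFragment is the companion method for markup that isn't a full document):

    using Microsoft.Security.Application;

    string untrustedHtml = "<div onclick=\"evil()\">hi</div>";

    string safeDocument = Sanitizer.GetSafeHtml(untrustedHtml);         // full document
    string safeFragment = Sanitizer.GetSafeHtmlFragment(untrustedHtml); // fragment, e.g. editor output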




Answer 5:


I had the exact same problem a few years back when I was using TinyMCE.

There still don't seem to be any decent XSS / HTML whitelisting solutions for .NET, so I've uploaded a solution I created and have been using for a few years.

http://www.codeproject.com/KB/aspnet/html-white-listing.aspx

The whitelist definition is based on TinyMCE's valid_elements.

Take two: looking around, Microsoft has recently released a whitelist-based Anti-XSS Library (V3.0). Check that out:

The Microsoft Anti-Cross Site Scripting Library V3.0 (Anti-XSS V3.0) is an encoding library designed to help developers protect their ASP.NET web-based applications from XSS attacks. It differs from most encoding libraries in that it uses the white-listing technique -- sometimes referred to as the principle of inclusions -- to provide protection against XSS attacks. This approach works by first defining a valid or allowable set of characters, and encoding anything outside this set (invalid characters or potential attacks). The white-listing approach provides several advantages over other encoding schemes. New features in this version of the Microsoft Anti-Cross Site Scripting Library include:

  • An expanded white list that supports more languages
  • Performance improvements
  • Performance data sheets (in the online help)
  • Support for Shift_JIS encoding for mobile browsers
  • A sample application
  • Security Runtime Engine (SRE) HTTP module
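To illustrate the encoding (as opposed to sanitizing) side, a sketch against the V3.0 API, where the entry point was the AntiXss static class (renamed Encoder in the later 4.x releases):

    using Microsoft.Security.Application;

    // Everything outside the whitelist of known-safe characters gets encoded,
    // so markup and script can no longer execute when the output is rendered.
    string encoded = AntiXss.HtmlEncode("<script>alert('xss')</script>");
    // roughly: &lt;script&gt;alert(&#39;xss&#39;)&lt;/script&gt;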




Answer 6:


https://github.com/Vereyon/HtmlRuleSanitizer solves exactly this problem.

I had this challenge when integrating the wysihtml5 editor into an ASP.NET MVC application. I noticed that it had a very nice yet simple whitelist-based sanitizer, which uses rules to allow a subset of HTML to pass through. I implemented a server-side version of it, which depends on the HTML Agility Pack for parsing.
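Getting started looks roughly like this (method names follow the project README at the time of writing; treat them as assumptions for your installed version):

    using Vereyon.Web;

    // Start from the preconfigured rule set for basic HTML 5 content,
    // then sanitize whatever the client-side editor submitted.
    var sanitizer = HtmlSanitizer.SimpleHtml5Sanitizer();
    string clean = sanitizer.Sanitize("<p onclick=\"evil()\">hello</p>");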

The Microsoft Web Protection Library (formerly AntiXSS) seems to simply rip out almost all HTML tags, and from what I've read you cannot easily tailor its rules to the HTML subset you want to allow. So that was not an option for me.

This HTML sanitizer also looks very promising and would be my second choice.



Source: https://stackoverflow.com/questions/1224049/net-html-whitelisting-anti-xss-cross-site-scripting
