How to use C# to sanitize input on an HTML page?

Submitted by 巧了我就是萌 on 2019-12-17 04:21:08

Question


Is there a library or an accepted method for sanitizing the input to an HTML page?

In this case I have a form with just a name, phone number, and email address.

Code must be C#.

For example:

"<script src='bobs.js'>John Doe</script>" should become "John Doe"


Answer 1:


We are using the HtmlSanitizer .NET library, which:

  • Is open-source (MIT) - GitHub link
  • Is fully customizable, e.g. you can configure which elements should be removed (see the wiki)
  • Is actively maintained
  • Doesn't have the problems of the Microsoft Anti-XSS library
  • Is unit tested with the OWASP XSS Filter Evasion Cheat Sheet
  • Is purpose-built for this (in contrast to HTML Agility Pack, which is a parser, not a sanitizer)
  • Doesn't use regular expressions (HTML isn't a regular language!)

Also on NuGet




Answer 2:


Based on the comment you made to this answer, you might find some useful info in this question:
https://stackoverflow.com/questions/72394/what-should-a-developer-know-before-building-a-public-web-site

Here's a parameterized query example. Instead of this:

string sql = "UPDATE UserRecord SET FirstName='" + txtFirstName.Text + "' WHERE UserID=" + UserID;

Do this:

// The values travel as typed parameters, so they are never parsed as SQL text.
SqlCommand cmd = new SqlCommand("UPDATE UserRecord SET FirstName = @FirstName WHERE UserID = @UserID");
cmd.Parameters.Add("@FirstName", SqlDbType.VarChar, 50).Value = txtFirstName.Text;
cmd.Parameters.Add("@UserID", SqlDbType.Int).Value = UserID;

Edit: Since there was no injection, I removed the portion of the answer dealing with that. I left the basic parameterized query example, since that may still be useful to anyone else reading the question.
--Joel




Answer 3:


If by sanitize you mean REMOVE the tags entirely, the RegEx example referenced by Bryant is the type of solution you want.

If you just want to ensure that the submitted markup DOESN'T mess with your design when it is rendered to the user, you can use the HttpUtility.HtmlEncode method to prevent that!
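
A minimal sketch of the encoding approach, using the example string from the question. It assumes a C# 9 top-level program and uses System.Net.WebUtility.HtmlEncode, which is swapped in here because it needs no reference to System.Web; HttpUtility.HtmlEncode is called the same way. The variable names are only illustrative.

```csharp
using System;
using System.Net;

// Untrusted value as it might arrive from the form (example from the question).
string submittedName = "<script src='bobs.js'>John Doe</script>";

// Encode at the point where the value is written into the page.
// The markup characters become entities, so the browser shows the text
// literally instead of interpreting it as a script element.
string safeForHtml = WebUtility.HtmlEncode(submittedName);

Console.WriteLine(safeForHtml);
```

Note that this keeps the tags visible as text rather than removing them, so the user would see the literal script markup around "John Doe".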




Answer 4:


What about using Microsoft Anti-Cross Site Scripting Library?




Answer 5:


It sounds like you have users that submit content but you cannot fully trust them, and yet you still want to render the content they provide as super safe HTML. Here are three techniques: HTML encode everything, HTML encode and/or remove just the evil parts, or use a DSL that compiles to HTML you are comfortable with.

  1. Should it become "John Doe"? I would HTML encode that string and let the user, "John Doe" (if indeed that is his real name...), have the stupid looking name <script src='bobs.js'>John Doe</script>. He shouldn't have wrapped his name in script tags or any tags in the first place. This is the approach I use in all cases unless there is a really good business case for one of the other techniques.

  2. Accept HTML from the user and then sanitize it (on output) using a whitelist approach like the sanitization method @Bryant mentioned. Getting this right is (extremely) hard, and I defer pulling that off to greater minds. Note that some sanitizers will HTML encode evil where others would have removed the offending bits completely.

  3. Another approach is to use a DSL that "compiles" to HTML. Make sure to whitehat your DSL compiler because some (like MarkdownSharp) will allow arbitrary HTML like <script> tags and evil attributes through unencoded (which by the way is perfectly reasonable but may not be what you need or expect). If that is the case you will need to use technique #2 and sanitize what your compiler outputs.

Closing thoughts:

  • If there is not a strong business case for technique #2 or #3, then reduce risk and save yourself effort and worry: go with technique #1.
  • Don't assume you're safe because you used a DSL. For example: the original implementation of Markdown allows HTML through, unencoded. "For any markup that is not covered by Markdown’s syntax, you simply use HTML itself. There’s no need to preface it or delimit it to indicate that you’re switching from Markdown to HTML; you just use the tags."
  • Encode when you output. You can also encode input, but doing so can put you in a bind. If you encoded incorrectly and saved that, how will you get the original input back so that you can re-encode after fixing the faulty encoder?
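
One way the "encode when you output" advice can play out is sketched below, assuming a C# 9 top-level program. RenderCell is a hypothetical helper, not part of any library, and System.Net.WebUtility.HtmlEncode stands in for whichever encoder the application uses. The sketch compares storing the raw value with storing an already-encoded value that is then encoded again at render time.

```csharp
using System;
using System.Net;

// Hypothetical rendering helper: encoding happens only here, at output time.
static string RenderCell(string storedValue) =>
    "<td>" + WebUtility.HtmlEncode(storedValue) + "</td>";

string rawInput = "Tom & Jerry";

// Option A: store the raw input, encode once when rendering.
string cellFromRaw = RenderCell(rawInput);

// Option B: the value was encoded before it was saved, then encoded again
// on output. The stored copy is no longer the user's original text.
string storedEncoded = WebUtility.HtmlEncode(rawInput);
string cellFromEncoded = RenderCell(storedEncoded);

Console.WriteLine(cellFromRaw);     // <td>Tom &amp; Jerry</td>
Console.WriteLine(cellFromEncoded); // <td>Tom &amp;amp; Jerry</td>
```

With option B the browser displays "Tom &amp; Jerry" instead of "Tom & Jerry", and every consumer of the stored data has to know it was pre-encoded.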



Answer 6:


You are looking for the Regex class and a pattern like this: <(.|\n)*?>

You can find a lot of examples on Google.
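
A short sketch of how that pattern could be applied with Regex.Replace, assuming a C# 9 top-level program; the variable names are illustrative. It also shows one input the pattern does not touch, which is worth knowing before relying on it as a security measure.

```csharp
using System;
using System.Text.RegularExpressions;

// Pattern from the answer: "<", then as few characters as possible
// (including newlines), then ">".
string tagPattern = @"<(.|\n)*?>";

string input = "<script src='bobs.js'>John Doe</script>";
string stripped = Regex.Replace(input, tagPattern, string.Empty);
Console.WriteLine(stripped); // John Doe

// An unterminated tag contains no ">", so the pattern never matches
// and the text passes through unchanged.
string unterminated = "<img src=x onerror=alert(1) ";
string stillThere = Regex.Replace(unterminated, tagPattern, string.Empty);
Console.WriteLine(stillThere == unterminated); // True
```

Because of cases like the second one, tag stripping with a regular expression is probably best treated as cosmetic cleanup, with HTML encoding still applied on output.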



Source: https://stackoverflow.com/questions/188870/how-to-use-c-sharp-to-sanitize-input-on-an-html-page
