XSS - Which HTML Tags and Attributes can trigger Javascript Events?

悲&欢浪女 2020-12-08 11:03

I'm trying to code a secure and lightweight white-list based HTML purifier which will use DOMDocument. In order to avoid unnecessary complexity I am willing to make the fol

4 Answers
  •  情书的邮戳
    2020-12-08 11:19

    Garuda has already given what I would deem as the "correct" answer, and his links are very useful, but he beat me to the punch!

    I give my answer only to reinforce.

    As the HTML and ECMAScript specs keep gaining features, avoiding script injection and similar vulnerabilities in HTML becomes more and more difficult. Each new addition introduces a whole new set of possible injections. On top of that, different browsers implement these specs in different ways, which opens up even more possible vulnerabilities.

    Take a look at a short list of vectors introduced by HTML5.

    The best solution is to choose what you will allow rather than what you will deny. It is much easier to say "These tags, and these attributes for those given tags alone, are allowed. Everything else will be sanitized accordingly or thrown out."
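
    As a rough illustration of that allow-list idea, here is a minimal sketch in Python using the standard library's html.parser rather than the DOMDocument the question mentions. The ALLOWED table, the DROP_WITH_CONTENT set, and the sanitize function are hypothetical names chosen for this example, and the tag list is deliberately tiny.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical allow-list: tag name -> attribute names permitted on that tag.
# URL-carrying attributes (href, src) are left out because their values
# would need separate validation.
ALLOWED = {"p": set(), "b": set(), "em": set(), "a": {"title"}}

# Elements whose text content is discarded along with the tag itself.
DROP_WITH_CONTENT = {"script", "style"}


class AllowlistSanitizer(HTMLParser):
    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.dropping = 0  # > 0 while inside a script/style element

    def handle_starttag(self, tag, attrs):
        if tag in DROP_WITH_CONTENT:
            self.dropping += 1
        elif tag in ALLOWED and not self.dropping:
            kept = ""
            for name, value in attrs:
                if name in ALLOWED[tag]:
                    # Re-encode the value so it cannot break out of the quotes.
                    kept += ' {}="{}"'.format(name, escape(value or "", quote=True))
            self.out.append("<{}{}>".format(tag, kept))
        # Any other tag is silently discarded; its text content is kept.

    def handle_endtag(self, tag):
        if tag in DROP_WITH_CONTENT:
            self.dropping = max(0, self.dropping - 1)
        elif tag in ALLOWED and not self.dropping:
            self.out.append("</{}>".format(tag))

    def handle_data(self, data):
        if not self.dropping:
            # Text is always re-encoded, never passed through raw.
            self.out.append(escape(data, quote=False))


def sanitize(html_text):
    parser = AllowlistSanitizer()
    parser.feed(html_text)
    parser.close()
    return "".join(parser.out)


print(sanitize('<p onclick="evil()">Hi <b>there</b><script>alert(1)</script></p>'))
```

    Comments and any tag or attribute not named in the table fall away by default, which is the point of the approach. This sketch does not balance unclosed tags and should not be treated as a complete purifier.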

    It would be very irresponsible for me to compile a list and say "okay, here you go: here's a list of all of the injection vectors you missed. You can sleep easy." In fact, there are probably many injection vectors that are not even known by black hats or white hats. As the ha.ckers website states, script injection is really only limited by the mind.

    I'd like to answer your specific question at least a little bit, so here are some glaring omissions from your blacklist:

    • The img src attribute. Note that src is a valid attribute on other elements too and could be harmful there as well. img also has dynsrc and lowsrc, and maybe even more.
    • type and language attributes
    • CDATA in addition to just html comments.
    • Improperly sanitized input values. This may not be a problem depending upon how strict your html parsing is.
    • Any ambiguous special characters. In my opinion, even unambiguous ones should probably be encoded.
    • Missing or incorrect quotes on attributes (such as grave quotes).
    • Premature closing of textarea tags.
    • UTF-8 (and UTF-7) encoded characters in scripts
    • Even though you will only return child nodes of the body tag, many browsers will still evaluate head and html elements, and most head-only elements, when they appear inside body, so this probably won't help much.
    • In addition to CSS expressions, background image expressions
    • frames and iframes
    • embed and probably object and applet
    • Server side includes
    • PHP tags
    • Any other injections (SQL Injection, executable injection, etc.)
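
    For URL-carrying attributes such as src, one common precaution is to accept only a short list of schemes. The following is a minimal sketch in Python; SAFE_SCHEMES and is_safe_url are hypothetical names, and it assumes the attribute value has already been entity-decoded by an HTML parser.

```python
import re

# Hypothetical allow-list of URL schemes for attributes like src and href.
SAFE_SCHEMES = {"http", "https", "mailto"}

# A scheme is a letter followed by letters, digits, "+", "." or "-", then ":".
SCHEME_RE = re.compile(r"([a-zA-Z][a-zA-Z0-9+.\-]*):")


def is_safe_url(value):
    # Drop whitespace and control characters before checking, since
    # payloads such as "java\tscript:" rely on browsers ignoring them.
    cleaned = "".join(ch for ch in value if ch > " " and ch != "\x7f")
    match = SCHEME_RE.match(cleaned)
    if match is None:
        # No scheme at all: treated here as a relative URL.
        return True
    return match.group(1).lower() in SAFE_SCHEMES


for candidate in ["https://example.com/a.png", "/img/a.png",
                  "JaVaScRiPt:alert(1)", "java\tscript:alert(1)"]:
    print(repr(candidate), is_safe_url(candidate))
```

    A check like this only covers the scheme. It says nothing about the other items in the list above.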

    By the way, camelCased attributes are invalid XHTML and should be lowercased, though I'm sure this doesn't affect you.
