Question

I'm trying to mitigate XSS. How can I protect against this:

j&#X41vascript:alert('test2')

in the href of a link?

I've tried the following, but it just assigns the literal, unresolved value of that above string as a relative path of the href, not a proper javascript: href capable of triggering code execution. I'm wondering how an attacker might be able to exploit this.

I've tried the following:

a = document.createElement('a');

and then both this:

a.href = "j&#X41vascript:alert('test2')";

and this:

a.setAttribute('href', "j&#X41vascript:alert('test2')");

But both return "j&#X41vascript:alert('test2')" upon then querying a.href, not the desired (or undesired, depending on your perspective) javascript:alert('test2');

If I can get all the entities to resolve, then I can parse out all occurrences of javascript: in the resulting string, and be safe -- right?

The other thing I was thinking was that what if someone does j&#&#X58;1;vascript:steal_cookie();. I mean, theoretically, they could have infinite levels of recursion, and it would all ultimately resolve, right?

Edit: how does this code look?

function resolve_entities(str) {
  var s = document.createElement('span')
    , nestTally = str.match(/&/) ? 0 : 1
    , limit = 5
    , limitReached = false;

  s.innerHTML = str;
  while (s.textContent.match(/&/)) {
    s.innerHTML = s.textContent;
    if(nestTally++ >= limit) {
      limitReached = true;
      break;
    }
  }

  return s.textContent;
}

Answer 1:

XML/HTML character entities like &#x41; or &amp; are decoded when the string containing them is parsed as XML or HTML. Typically, this happens when they are sent from the server to the browser as part of an HTML page, although other situations (such as assigning to element.innerHTML in JavaScript) can also cause a string to be parsed as XML or HTML.

Reading or writing to element attributes in JavaScript does not trigger XML/HTML parsing, and thus does not expand character entities. If you write

a.href = "j&#x41;vascript:alert('test')";

then the href attribute of that a element will be j&#x41;vascript:alert('test'), ampersands and all.

What's important to note is that, whenever a string is parsed as XML or HTML, character entities are decoded exactly once. Thus, &amp;#x41; becomes &#x41;, while &#x41; becomes A. It will not "all ultimately resolve", unless you're doing something silly like reading from .textContent and assigning to .innerHTML repeatedly.
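To make the decoded-exactly-once behaviour concrete, here is a minimal sketch that runs without a browser. decodeEntitiesOnce is a hypothetical helper, not a browser API, and it only understands numeric references and five named entities; a real HTML parser knows many more and is more forgiving about missing semicolons.

```javascript
// Hypothetical helper for illustration: decodes character references in ONE pass.
// Only numeric references and five named entities are covered.
const NAMED = { amp: '&', lt: '<', gt: '>', quot: '"', apos: "'" };

function decodeEntitiesOnce(str) {
  return str.replace(/&(#[xX][0-9a-fA-F]+|#[0-9]+|[a-zA-Z]+);/g, function (match, body) {
    if (body[0] === '#') {
      const isHex = body[1] === 'x' || body[1] === 'X';
      const code = parseInt(body.slice(isHex ? 2 : 1), isHex ? 16 : 10);
      // Leave out-of-range code points untouched.
      return code <= 0x10ffff ? String.fromCodePoint(code) : match;
    }
    // Unknown names are left as they are.
    return Object.prototype.hasOwnProperty.call(NAMED, body) ? NAMED[body] : match;
  });
}

// A doubly encoded value needs two separate passes to reach the letter "A".
const once = decodeEntitiesOnce('j&amp;#x41;vascript:');
const twice = decodeEntitiesOnce(once);
console.log(once);  // j&#x41;vascript:
console.log(twice); // jAvascript:
```

The replacement text produced by one pass is never rescanned within that pass, which is why the nested form only resolves further if the output is deliberately decoded again.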

Once the parsing is complete, it's completely irrelevant whether any character sequences in the output might or might not look like XML/HTML character entities — that is, unless you then take the output and feed it through an XML/HTML parser again. (Doing that is very rarely useful, and usually only happens due to bugs such as assigning to .innerHTML when one should have assigned to .textContent.)

Anyway, looking at the comments, you say you're writing some client-side JavaScript code that's getting some untrusted data from a server you don't control, and you're worried that simply assigning the data to .innerHTML could allow XSS attacks. If so, there are two cases:

1. The data you receive is meant to be plain text. In that case, you should just assign it to .textContent and be done with it.
2. The data you receive is, in fact, meant to be HTML. In that case you do need to undertake the difficult and laborious job of sanitizing it. This JavaScript HTML sanitizer from the Caja project might help.

Answer 2:

The best way to mitigate XSS is to encode ALL untrusted output rendered to the screen using the appropriate encoding method for the context the output will be in (HTML, HTML Attribute, CSS, JS, etc).

Even if you manage to solve this problem, there are likely other attack vectors using encoding that you have not thought of. A blacklist filter is rarely (if ever) the most effective way to protect your site.
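For the href case specifically, one alternative to a blacklist is to let a URL parser decide which scheme a value has and to accept only schemes on a short allowlist. The sketch below assumes the standard URL constructor is available (modern browsers and Node.js). isSafeHref, ALLOWED_PROTOCOLS, and the https://example.invalid/ base are illustrative names and values, not part of any library.

```javascript
// Illustrative allowlist; adjust it to the schemes the application really needs.
const ALLOWED_PROTOCOLS = new Set(['http:', 'https:', 'mailto:']);

// Hypothetical helper: returns true only if the value resolves to an allowed scheme.
// The base URL is a placeholder so that relative links can be parsed as well.
function isSafeHref(value, base = 'https://example.invalid/') {
  let url;
  try {
    url = new URL(value, base);
  } catch (e) {
    return false; // unparseable input is rejected
  }
  // The parser lowercases the scheme and strips leading whitespace,
  // so case and padding tricks do not slip past the comparison.
  return ALLOWED_PROTOCOLS.has(url.protocol);
}

console.log(isSafeHref('https://example.com/page')); // true
console.log(isSafeHref('/relative/path'));           // true
console.log(isSafeHref('JaVaScRiPt:alert(1)'));      // false
console.log(isSafeHref(' javascript:alert(1)'));     // false
```

The value passed in should be the attribute value as the DOM reports it, not raw HTML markup, since entity decoding is the HTML parser's job and happens before the URL is interpreted.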

~~I'm not sure what server-side language you are using, but there are likely encoding libs for it. ESAPI is available for several languages and was built for this purpose (plus MANY others).~~

UPDATE: Since you need to use JavaScript for this, you may want to look at the ESAPI Encoding Project (Reform). It has a JS version that looks like it will do what you need. I have not tested it, but if it works anything like ESAPI, then it may solve your problem.

To learn more about proper encoding per context, check out the OWASP XSS Prevention Cheat Sheet.
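As a rough illustration of what "encoding per context" means, the sketch below defines one encoder per output context. The function names are hypothetical and this is not ESAPI's API; it covers only three contexts and assumes attribute values are always quoted.

```javascript
// Hypothetical helpers, one per output context (not taken from any library).

// HTML text and QUOTED attribute values: escape the five significant characters.
function encodeForHtml(value) {
  const map = { '&': '&amp;', '<': '&lt;', '>': '&gt;', '"': '&quot;', "'": '&#39;' };
  return String(value).replace(/[&<>"']/g, (ch) => map[ch]);
}

// A single URL query component: percent-encode with the built-in function.
function encodeForUrlComponent(value) {
  return encodeURIComponent(String(value));
}

// A string literal inside an inline script: JSON-quote it, then hide "<"
// so that the text cannot close the surrounding script element.
function encodeForJsString(value) {
  return JSON.stringify(String(value)).replace(/</g, '\\u003c');
}

console.log(encodeForHtml('<a href="x">'));      // &lt;a href=&quot;x&quot;&gt;
console.log(encodeForUrlComponent('a b&c=d'));   // a%20b%26c%3Dd
console.log(encodeForJsString('</script>'));     // "\u003c/script>"
```

The same untrusted string needs a different encoder depending on where it lands, which is why a single "escape everything" function tends to be either too weak or too aggressive.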

Answer 3:

As long as the content is well-formed, you can use XML to parse it safely. Something like this, at least as a starting point (fiddle):

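function getXmlDoc(s) {
    var xmlDoc; // declared locally so it does not leak into the global scope
    // typeof avoids a ReferenceError where DOMParser is not defined
    if (typeof DOMParser !== "undefined") {
        xmlDoc = new DOMParser().parseFromString(s, "text/xml");
    } else {
        // Legacy IE
        xmlDoc = new ActiveXObject("Microsoft.XMLDOM");
        xmlDoc.async = false;
        xmlDoc.loadXML(s);
    }
    return xmlDoc;
}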

var xml = getXmlDoc("<root>j&#x0061;vascript:alert('test2')</root>");
alert(xml.documentElement.firstChild.nodeValue);

However, I would probably just escape unsafe characters:

function safeEscape(s) {
    return s.replace(/[\&\<\>]/g, function($0) {
        switch($0) {
            case '&': return '&amp;';
            case '<': return '&lt;';
            case '>': return '&gt;';
        }
    });
}

You shouldn't run into any problems with recursively escaped characters, since nested entities are not allowed: the parser decodes each entity once and does not parse the result again.

Source: https://stackoverflow.com/questions/12331195/how-can-i-programmatically-get-all-of-a-strings-unicode-entities-to-resolve-the

Tags

javascript