ColdFusion: Invalid XML Control Char (hex)

Submitted by 戏子无情 on 2020-01-14 12:53:12

Question


I'm trying to create an XML object using <cfxml>. I formatted all the data with XMLFormat(). The XML contains some invalid characters, such as '»'. I added these characters to the XML doctype as follows:

<!ENTITY raquo "»">

The HTML text is not very well formatted, but most of it works with my code. Some of the texts, however, contain control characters, and I get the following error:

An invalid XML character (Unicode: 0x13) was found in the element content of the document.

I tried adding the Unicode character to the doctype, and I tried this solution. Neither worked.


Answer 1:


Here is valid cfscript code that cleans up our XML. There are two methods: one replaces the higher international (non-ASCII) characters, and one replaces only the low ASCII character that was breaking our XML. If you find more problem characters, just expand the filter rules.

<cfscript>    
    function cleanHighAscii(text){
        var buffer = createObject("java", "java.lang.StringBuffer").init();
        var pattern = createObject("java", "java.util.regex.Pattern").compile(javaCast( "string", "[^\x00-\x7F]" ));
        var matcher = pattern.Matcher(javaCast( "string", text));

        while(matcher.find()){
            var value = matcher.group();
            var asciiValue = asc(value);

            if ((asciiValue == 8220) OR (asciiValue == 8221))
                value = """";
            else if ((asciiValue == 8216) || (asciiValue == 8217))
                value = "'";
            else if (asciiValue == 8230)
                value = "...";
            else
                value = "&###asciiValue#;";

            matcher.AppendReplacement(buffer, javaCast( "string", value ));
        }

        matcher.AppendTail(buffer);
        return buffer.ToString();
    }

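    function removeSubAscii(text){
        // Only U+001A (SUB) is handled here; add further rules for other control characters.
        // Note: XML 1.0 also rejects a numeric reference to character 26, so a strict
        // parser may need the character replaced with "" instead.
        return rereplaceNoCase(text, "\x1A","&###26#;", "all");
    }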

    function XMLSafe(text){
        text = cleanHighAscii(text);
        text = removeSubAscii(text);
        return text;
    }
</cfscript>
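For readers who would rather do this step in Java (for example in a helper class loaded through createObject), here is a minimal sketch of the same idea as cleanHighAscii. The class and method names are hypothetical and not part of the original answer. It walks the string by code point, so characters above U+FFFF become a single reference, and it leaves out the smart-quote and ellipsis special cases.

```java
final class NumericRefs {
    // Replaces every non-ASCII code point with a decimal numeric character reference.
    // Hypothetical helper: it does not map smart quotes or ellipses to ASCII.
    static String toNumericRefs(String text) {
        if (text == null) {
            return "";
        }
        StringBuilder out = new StringBuilder(text.length());
        text.codePoints().forEach(cp -> {
            if (cp <= 0x7F) {
                out.append((char) cp);                   // plain ASCII passes through
            } else {
                out.append("&#").append(cp).append(';'); // e.g. U+00BB becomes &#187;
            }
        });
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(toNumericRefs("Hello \u00BB world"));
    }
}
```

Like the cfscript version, this does nothing about control characters below 0x20, which still need separate handling.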

Another possibility is to use the CF10 function encodeForXML():

https://learn.adobe.com/wiki/display/coldfusionen/EncodeForXML

Or use ESAPI directly, which ships with CF10; for an older CF version, add the ESAPI JARs from the OWASP site (https://www.owasp.org/index.php/ESAPI_Overview):

var esapi = createObject("java", "org.owasp.esapi.ESAPI");
var esapiEncoder = esapi.encoder();
return esapiEncoder.encodeForXML(text);



Answer 2:


Try using &#187; instead of ». For example, this CFML:

<cfxml variable="x"><?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE doc
[
    <!ENTITY raquo "&#187;">
]>
<doc>
    Hello, &raquo; !
</doc>
</cfxml>

<cfdump var="#x#">
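The entity approach can be checked outside ColdFusion with the JDK's built-in DOM parser. The sketch below is only an illustration under that assumption (the class and method names are made up here). It also shows why the same trick does not help with the control character from the question: XML 1.0 rejects U+0013 both as a raw character and as the reference &#19;.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

final class EntityDemo {
    // Returns the root element's text, or null if the parser rejects the document.
    static String parseRootText(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement().getTextContent();
        } catch (SAXException e) {
            return null; // not well-formed
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String withEntity = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
                + "<!DOCTYPE doc [ <!ENTITY raquo \"&#187;\"> ]>"
                + "<doc>Hello, &raquo; !</doc>";
        System.out.println(parseRootText(withEntity));            // entity is expanded
        System.out.println(parseRootText("<doc>a\u0013b</doc>")); // raw control character
        System.out.println(parseRootText("<doc>a&#19;b</doc>"));  // same character as a reference
    }
}
```

So declaring entities works for printable characters such as », while control characters have to be removed or replaced before the text reaches the parser.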



Answer 3:


Pass your XML string through this method; it should solve your problem.

It lets only valid XML characters through. If you want to replace invalid characters with some other character instead, you can modify the method below to do that.

public String stripNonValidXMLCharacters(String in) {
    if (in == null || in.isEmpty()) return ""; // Nothing to clean.
    StringBuilder out = new StringBuilder(in.length()); // Used to hold the output.

    // Iterate by code point rather than by char: a char can never reach 0x10000,
    // so a char-based loop would drop every character above U+FFFF.
    for (int i = 0; i < in.length(); ) {
        int current = in.codePointAt(i); // The current code point.
        i += Character.charCount(current);
        if ((current == 0x9) ||
            (current == 0xA) ||
            (current == 0xD) ||
            ((current >= 0x20) && (current <= 0xD7FF)) ||
            ((current >= 0xE000) && (current <= 0xFFFD)) ||
            ((current >= 0x10000) && (current <= 0x10FFFF)))
            out.appendCodePoint(current);
    }
    return out.toString();
}
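The same filter can also be written as a single regular expression, which may be easier to adapt. This is a sketch only, with a hypothetical class name; the character class is the complement of the ranges tested in the method above.

```java
import java.util.regex.Pattern;

final class XmlCharFilter {
    // Matches anything outside the ranges allowed in XML 1.0:
    // 0x9, 0xA, 0xD, 0x20-0xD7FF, 0xE000-0xFFFD, 0x10000-0x10FFFF.
    private static final Pattern INVALID = Pattern.compile(
            "[^\\x09\\x0A\\x0D\\x20-\\uD7FF\\uE000-\\uFFFD\\x{10000}-\\x{10FFFF}]");

    // Removes every character that is not allowed in an XML 1.0 document.
    static String strip(String in) {
        return in == null ? "" : INVALID.matcher(in).replaceAll("");
    }

    public static void main(String[] args) {
        // The DC3 control character (0x13) from the question is dropped.
        System.out.println(strip("before\u0013after"));
    }
}
```

To substitute a placeholder instead of deleting, pass it as the replaceAll argument; note that `$` and `\` have special meaning in that replacement string.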


Source: https://stackoverflow.com/questions/13744603/coldfusion-invalid-xml-control-char-hex
