Disable XML validation when using XDocument

Posted by 人走茶凉 on 2019-12-21 18:40:24

Question


I'm parsing an XLIFF document using the XDocument class. Does XDocument perform some validation of the content which I read into it, and if so - is there any way to disable that validation?

I'm getting some weird errors if the XLIFF isn't valid XML (I don't care that it isn't, I just want to parse it).

E.g.

'.', hexadecimal value 0x00, is an invalid character. 

I'm currently reading the file like this:

string FileLocation = @"C:\XLIFF\text.xlf";
XDocument doc = XDocument.Load(FileLocation);

Thanks.


Answer 1:


I had a similar problem, which was fixed by letting a StreamReader read the content.

// This line throws an exception like yours
XDocument xd = XDocument.Load(@"C:\test.xml");

// This works (renamed so both declarations can live in the same scope)
XDocument xd2 = XDocument.Load(new System.IO.StreamReader(@"C:\test.xml"));

If that does not help, try specifying the proper encoding explicitly.
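As a minimal sketch of what passing the encoding explicitly might look like: the class and method names below (XliffLoader, LoadWithEncoding) are hypothetical, and which encoding is correct depends on how the file was actually written.

```csharp
using System.IO;
using System.Text;
using System.Xml.Linq;

public static class XliffLoader
{
    // Hypothetical helper: loads an XML file through a StreamReader
    // that decodes the bytes with a caller-supplied encoding.
    public static XDocument LoadWithEncoding(string path, Encoding encoding)
    {
        // The third argument lets a byte order mark, if present,
        // override the encoding that was passed in.
        using (var reader = new StreamReader(path, encoding, true))
        {
            return XDocument.Load(reader);
        }
    }
}
```

For a UTF-16 file this could be called as XliffLoader.LoadWithEncoding(path, Encoding.Unicode). Unlike the snippet above, the reader is wrapped in a using block so the file handle is released after loading.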




Answer 2:


If you want to strip characters from strings that are invalid for use in XML, you can use this method:

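using System.Text.RegularExpressions;

private static string RemoveXmlInvalidCharacters(string s)
{
    // .NET's \u escape takes exactly four hex digits, so supplementary characters
    // (U+10000-U+10FFFF) are kept by allowing well-formed surrogate pairs and
    // removing only unpaired surrogates.
    return Regex.Replace(
        s,
        @"[^\u0009\u000A\u000D\u0020-\uD7FF\uE000-\uFFFD\uD800-\uDFFF]" +
        @"|[\uD800-\uDBFF](?![\uDC00-\uDFFF])" +
        @"|(?<![\uD800-\uDBFF])[\uDC00-\uDFFF]",
        string.Empty);
}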

It removes any characters that fall outside of the set of valid character values, according to the XML standard.
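The same filtering can also be sketched without a regular expression, using the character checks that System.Xml.XmlConvert exposes. The class and method names (XmlSanitizer, RemoveInvalidXmlChars) are hypothetical; XmlConvert.IsXmlChar and XmlConvert.IsXmlSurrogatePair are available from .NET Framework 4.0 onward.

```csharp
using System.Text;
using System.Xml;

public static class XmlSanitizer
{
    // Hypothetical helper: keeps only characters that XML 1.0 allows,
    // including supplementary characters stored as surrogate pairs.
    public static string RemoveInvalidXmlChars(string s)
    {
        var sb = new StringBuilder(s.Length);
        for (int i = 0; i < s.Length; i++)
        {
            char c = s[i];
            if (XmlConvert.IsXmlChar(c))
            {
                // Legal character from the Basic Multilingual Plane
                sb.Append(c);
            }
            else if (i + 1 < s.Length && char.IsHighSurrogate(c)
                     && XmlConvert.IsXmlSurrogatePair(s[i + 1], c))
            {
                // Well-formed surrogate pair; note the (low, high) argument order
                sb.Append(c).Append(s[i + 1]);
                i++;
            }
            // Everything else (control characters, lone surrogates) is dropped
        }
        return sb.ToString();
    }
}
```

The loop makes the surrogate handling explicit, which may be easier to review than a pattern with lookarounds.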




Answer 3:


You can't parse XML that is not well-formed, because parsing requires a well-formed XML structure.
You may be reading the file as ASCII when it should be read as UTF-8 or UTF-16, which would lead to the problem you encountered.

Possible solution:
Read the file as UTF-8.
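To illustrate how an encoding mismatch can produce 0x00 characters, here is a small sketch. The names (EncodingMismatchDemo, DecodeUtf16AsUtf8) are hypothetical, and it deliberately decodes with the wrong encoding to reproduce the symptom.

```csharp
using System.Text;

public static class EncodingMismatchDemo
{
    // Hypothetical helper: encodes text as UTF-16 (little-endian, no BOM)
    // and then deliberately decodes those bytes as UTF-8.
    public static string DecodeUtf16AsUtf8(string text)
    {
        byte[] utf16Bytes = Encoding.Unicode.GetBytes(text);
        // Every ASCII character occupies two bytes in UTF-16, the second
        // being 0x00, so the wrong decoder yields a NUL after each character.
        return Encoding.UTF8.GetString(utf16Bytes);
    }
}
```

Calling DecodeUtf16AsUtf8("<a/>") returns an 8-character string in which every other character is U+0000, which is exactly the kind of input an XML parser rejects.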




Answer 4:


An XLIFF document is an XML document. Character 0x00 is not a legal XML character, and a document that contains it is not XML, so you cannot read it with an XML parser.

Note that well-formedness and validity are different things: a parser can read XML that is well-formed but not valid against its DTD or schema, but it cannot read XML that contains illegal characters.

Valid characters according to the XML specification:

 #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]

UPDATE

Suggested solution: pre-process the files to remove invalid characters. The \0 character can be replaced with a space unless it carries meaning (that is, it is part of binary data), in which case the data needs to be Base64-encoded.
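A minimal sketch of that pre-processing step, assuming the file is small enough to read into memory and that the NUL characters are stray content rather than a symptom of a wrong encoding. The class and method names (XliffPreprocessor, ParseReplacingNulls, LoadReplacingNulls) are hypothetical.

```csharp
using System.IO;
using System.Xml.Linq;

public static class XliffPreprocessor
{
    // Hypothetical helper: replaces every NUL character with a space,
    // then hands the cleaned text to the XML parser.
    public static XDocument ParseReplacingNulls(string rawXml)
    {
        string cleaned = rawXml.Replace('\0', ' ');
        return XDocument.Parse(cleaned);
    }

    // Hypothetical helper: reads the whole file as text, then cleans and parses it.
    public static XDocument LoadReplacingNulls(string path)
    {
        return ParseReplacingNulls(File.ReadAllText(path));
    }
}
```

If the NULs come from decoding UTF-16 as a single-byte encoding, replacing them with spaces would corrupt the markup, so the encoding should be checked first.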



Source: https://stackoverflow.com/questions/5497572/disable-xml-validation-when-using-xdocument
