Special characters with XDocument

情歌与酒 2021-01-23 09:57

I'm trying to read a file (not XML, but the structure is similar), but I'm getting this exception:

'┴', hexadecimal value 0x15, is an invalid character. Li


        
1 answer
  • 2021-01-23 10:07

    This XML is pretty bad:

    1. You have <Segment>0000016125 in there, which, while not technically illegal (it is a text node), is just kind of odd.
    2. Your <Control> element contains invalid characters that are not wrapped in an XML CDATA section.

    You can normalize the XML manually, or do it in C# with string manipulation, a Regex, or something similar.
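
As a rough sketch of the Regex route mentioned above, the helper below wraps the content of every <Control> element in a CDATA section unless it is already wrapped. The class and method names (ControlCdataWrapper, Wrap) are hypothetical and not part of any library, and the sketch assumes the content never contains the sequence "]]>", which would end the CDATA section early.

```csharp
using System;
using System.Text.RegularExpressions;

public static class ControlCdataWrapper
{
    // Matches <Control>...</Control> where the content does not already
    // start with a CDATA section. Singleline lets '.' span line breaks.
    private static readonly Regex ControlElement = new Regex(
        @"<Control>(?!<!\[CDATA\[)(.*?)</Control>",
        RegexOptions.Singleline);

    // Hypothetical helper: returns the input with each unwrapped
    // <Control> body placed inside <![CDATA[ ... ]]>.
    public static string Wrap(string xml)
    {
        // A MatchEvaluator is used so that '$' in the content is not
        // interpreted as a replacement-pattern substitution.
        return ControlElement.Replace(
            xml,
            m => "<Control><![CDATA[" + m.Groups[1].Value + "]]></Control>");
    }
}
```

Unlike two chained string.Replace() calls, running this twice on the same text should not nest a second CDATA section, because the negative lookahead skips elements that are already wrapped.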

    In your simple example, only the <Control> element has invalid characters, so it is relatively simple to fix: add a CDATA section using the string.Replace() method, to make it look like this:

    <Control><![CDATA[0003┴300000┴English(U.S.)PORTUGUESE┴┴bla.000┴webgui\messages\xsl\en\blabla\blabla.xlf]]></Control>
    

    Then you can load the good XML into your XDocument using XDocument.Parse(string xml):

    string badXml = @"
        <temproot>
            <Codepage>UTF16</Codepage>
            <Segment>0000016125
                <Control>0003┴300000┴English(U.S.)PORTUGUESE┴┴bla.000┴webgui\messages\xsl\en\blabla\blabla.xlf</Control>
                <Source>To blablablah the   firewall to blablablah local IP address.    </Source>
                <Target>Para blablablah a uma blablablah local específico.  </Target>
            </Segment>
        </temproot>";
    
    // assuming only the <Control> element has the invalid characters
    string goodXml = badXml
        .Replace("<Control>", "<Control><![CDATA[")
        .Replace("</Control>", "]]></Control>");
    
    XDocument xDoc = XDocument.Parse(goodXml);
    xDoc.Declaration = new XDeclaration("1.0", "utf-16", "yes");
    
    // do stuff with xDoc
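
One caveat worth checking: the exception in the question reports the value 0x15, which is a control character. XML 1.0 does not allow that character anywhere in a document, including inside a CDATA section, so if the file really contains U+0015 (rather than the printable box-drawing character '┴', U+2534, which is valid XML on its own), XDocument.Parse will most likely still throw after the CDATA wrapping. A minimal sketch of one way around that is to replace the forbidden characters before parsing. The class and method names (XmlSanitizer, ReplaceInvalidXmlChars) are hypothetical, and the choice of replacement string is an assumption you would adapt to your data.

```csharp
using System;
using System.Text;
using System.Xml;

public static class XmlSanitizer
{
    // Hypothetical helper: replaces every character that XML 1.0 forbids
    // (for example U+0015) with the given replacement string.
    public static string ReplaceInvalidXmlChars(string text, string replacement = "")
    {
        var sb = new StringBuilder(text.Length);
        for (int i = 0; i < text.Length; i++)
        {
            char c = text[i];
            if (XmlConvert.IsXmlChar(c))
            {
                // Ordinary valid character: keep it.
                sb.Append(c);
            }
            else if (i + 1 < text.Length
                     && XmlConvert.IsXmlSurrogatePair(text[i + 1], c))
            {
                // Valid surrogate pair (high surrogate c, low surrogate next):
                // keep both halves and skip the low surrogate.
                sb.Append(c).Append(text[++i]);
            }
            else
            {
                // Forbidden control character or lone surrogate.
                sb.Append(replacement);
            }
        }
        return sb.ToString();
    }
}
```

For instance, calling XmlSanitizer.ReplaceInvalidXmlChars(badXml, "|") before XDocument.Parse would turn each forbidden separator into a pipe, so the fields inside <Control> stay distinguishable. Markup characters such as < or & inside the content would still need CDATA or escaping.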
    