XmlWriter encoding issues

旧时模样 posted on 2019-12-04 12:21:27

Question


I have the following code:

    MemoryStream ms = new MemoryStream();
    XmlWriter w = XmlWriter.Create(ms);

    w.WriteStartDocument(true);
    w.WriteStartElement("data");

    w.WriteElementString("child", "myvalue");

    w.WriteEndElement();//data
    w.Close();
    ms.Close();

    string test = UTF8Encoding.UTF8.GetString(ms.ToArray());

The XML is generated correctly; however, the first character of the string 'test' is ï (char #239), which makes it invalid for some XML parsers. Where is this coming from? What exactly am I doing incorrectly?

I know I can resolve the issue by just starting after the first character, but I'd rather know why it's there than simply patching over the problem.

Thanks!


Answer 1:


Found one solution here: http://www.timvw.be/generating-utf-8-with-systemxmlxmlwriter/

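I was missing this at the top. With the default UTF-8 encoding the writer emits a byte order mark (BOM) at the start of the stream; passing new UTF8Encoding(false) turns it off: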

XmlWriterSettings xmlWriterSettings = new XmlWriterSettings();
xmlWriterSettings.Encoding = new UTF8Encoding(false);
MemoryStream ms = new MemoryStream();
XmlWriter w = XmlWriter.Create(ms, xmlWriterSettings);

Thanks for the help everyone!
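
To see where the stray leading character comes from, it can help to compare the raw bytes produced with and without the byte order mark. The following is only a minimal sketch: the class name BomDemo and the helper WriteXml are made up for illustration, and the document written is a cut-down version of the one in the question.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;

// Hypothetical helper class, not part of the original question or answers.
static class BomDemo
{
    // Writes a tiny document using the given encoding and returns the raw bytes.
    public static byte[] WriteXml(Encoding encoding)
    {
        var settings = new XmlWriterSettings { Encoding = encoding };
        using (var ms = new MemoryStream())
        {
            using (XmlWriter w = XmlWriter.Create(ms, settings))
            {
                w.WriteStartDocument(true);
                w.WriteElementString("data", "myvalue");
            } // disposing the writer flushes everything into ms

            return ms.ToArray();
        }
    }

    // True when the buffer starts with the UTF-8 byte order mark EF BB BF.
    public static bool StartsWithUtf8Bom(byte[] bytes)
    {
        return bytes.Length >= 3
            && bytes[0] == 0xEF && bytes[1] == 0xBB && bytes[2] == 0xBF;
    }
}
```

Calling BomDemo.StartsWithUtf8Bom(BomDemo.WriteXml(Encoding.UTF8)) should return true, while the same check on BomDemo.WriteXml(new UTF8Encoding(false)) should return false, because that output begins directly with the '<' of the XML declaration. The first BOM byte, 0xEF, is decimal 239, which matches the character code reported in the question.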




Answer 2:


The problem is that the XML generated by the writer is UTF-16 while you use UTF-8 to convert it to a string. Try this instead:

StringBuilder sb = new StringBuilder();
using (StringWriter writer = new StringWriter(sb))
using (XmlWriter w = XmlWriter.Create(writer))
{
    w.WriteStartDocument(true);
    w.WriteStartElement("data");

    w.WriteElementString("child", "myvalue");

    w.WriteEndElement();//data
}

string test = sb.ToString();
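
One thing to be aware of with the StringWriter approach: the XML declaration takes its encoding name from the TextWriter, and a plain StringWriter reports UTF-16. If the declaration should say utf-8 instead, a common workaround is to subclass StringWriter and override its Encoding property. This is a sketch under that assumption; Utf8StringWriter and StringWriterDemo are illustrative names, not framework types.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;

// Hypothetical helper: a StringWriter that reports UTF-8 as its encoding,
// so XmlWriter puts encoding="utf-8" into the XML declaration.
sealed class Utf8StringWriter : StringWriter
{
    public override Encoding Encoding => Encoding.UTF8;
}

static class StringWriterDemo
{
    // Builds the same document as in the question, entirely in memory as a string.
    public static string BuildXml()
    {
        using (var writer = new Utf8StringWriter())
        {
            using (XmlWriter w = XmlWriter.Create(writer))
            {
                w.WriteStartDocument(true);
                w.WriteStartElement("data");
                w.WriteElementString("child", "myvalue");
                w.WriteEndElement(); // data
            }

            return writer.ToString();
        }
    }
}
```

Because no bytes are involved, the string returned by StringWriterDemo.BuildXml() has no byte order mark in front of the declaration; the override only changes the encoding name that the declaration advertises.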



Answer 3:


Check

  • XMLTextWriter Encoding problem
  • XmlWriter, Strings and Byte Order Marks



Answer 4:


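You can change the encoding through the XmlWriterSettings passed to XmlWriter.Create (the Settings property of an existing writer is read-only, so the encoding has to be chosen up front):

XmlWriterSettings settings = new XmlWriterSettings { Encoding = Encoding.UTF8 };
XmlWriter w = XmlWriter.Create(ms, settings);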



Answer 5:


All of these are slightly off if you care about the byte order mark, which editors use (for example, Visual Studio relies on it to detect UTF-8 encoded XML and highlight its syntax properly).

Here's a solution:

MemoryStream stream = new MemoryStream();

XmlWriterSettings settings = new XmlWriterSettings();
settings.Encoding = Encoding.UTF8;
settings.Indent = true;
settings.IndentChars = "\t";

using (XmlWriter writer = XmlWriter.Create(stream, settings))
{
    // ... write

    // Flush so that all buffered output reaches the stream before reading it back
    writer.Flush();

    // StreamReader detects the byte order mark and skips it while decoding
    stream.Seek(0, SeekOrigin.Begin);
    StreamReader reader = new StreamReader(stream, Encoding.UTF8, true);
    result = reader.ReadToEnd(); // result: a string declared outside this block
}

I've got 2 snippets in full here
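
The difference between the question's Encoding.UTF8.GetString call and the StreamReader used above can be shown on a hand-built buffer. This is a minimal sketch; BomDecodeDemo and its members are illustrative names, and the byte array simply stands in for what the XmlWriter would have produced.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical demo class, not part of the original answers.
static class BomDecodeDemo
{
    // UTF-8 byte order mark (EF BB BF) followed by the four characters "<a/>".
    static readonly byte[] Bytes = { 0xEF, 0xBB, 0xBF, 0x3C, 0x61, 0x2F, 0x3E };

    // GetString decodes every byte, so the BOM survives as U+FEFF at index 0.
    public static string DecodeWithGetString()
    {
        return Encoding.UTF8.GetString(Bytes);
    }

    // StreamReader recognises the BOM and drops it before returning text.
    public static string DecodeWithStreamReader()
    {
        using (var reader = new StreamReader(new MemoryStream(Bytes), Encoding.UTF8, true))
        {
            return reader.ReadToEnd();
        }
    }
}
```

DecodeWithGetString() returns five characters, the first being U+FEFF, whereas DecodeWithStreamReader() returns just the four characters of "<a/>". That leftover leading character is what trips up parsers that are handed the string directly.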



Source: https://stackoverflow.com/questions/863437/xmlwriter-encoding-issues
