How do I XmlDocument.Save() to encoding=“us-ascii” with numeric character entities instead of question marks?

Submitted by 前提是你 on 2019-12-01 10:06:46

Question


My goal is to get a binary buffer (MemoryStream.ToArray() would yield byte[] in this case) of XML without losing the Unicode characters. I would expect the XML serializer to use numeric character references to represent anything that would be invalid in ASCII. So far, I have:

using System;
using System.IO;
using System.Text;
using System.Xml;

class Program
{
    static void Main(string[] args)
    {
        var doc = new XmlDocument();
        doc.LoadXml("<x>“∞π”</x>");
        using (var buf = new MemoryStream())
        {
            using (var writer = new StreamWriter(buf, Encoding.ASCII))
                doc.Save(writer);
            Console.Write(Encoding.ASCII.GetString(buf.ToArray()));
        }
    }
}

The above program produces the following output:

$ ./ConsoleApplication2.exe
<?xml version="1.0" encoding="us-ascii"?>
<x>????</x>

I figured out how to tell XmlDocument.Save() to use encoding="us-ascii": hand it a TextWriter whose TextWriter.Encoding is Encoding.ASCII (a StreamWriter in the program above). The documentation says: "The encoding on the TextWriter determines the encoding that is written out." But how can I tell it that I want it to use numeric character entities instead of its default lossy behavior? I have tested that doc.Save(Console.OpenStandardOutput()) writes the expected data (without an XML declaration) as UTF-8 with all of the correct characters, so I know that doc contains the information I wish to serialize. It's just a matter of figuring out the right way to tell the XML serializer that I want encoding="us-ascii" with character entities.
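For context, the question marks appear to come from the encoder rather than from XmlDocument itself: Encoding.ASCII substitutes '?' for any character it cannot represent, and the StreamWriter applies that substitution after the XML has already been written as text. The following is a minimal sketch of that behavior in isolation; the class name FallbackDemo is made up for illustration, and the strict encoding is only there to show that the loss can be turned into an error.

```csharp
using System;
using System.Text;

class FallbackDemo
{
    static void Main()
    {
        // Same text as in the question, written with escapes so the
        // source file itself stays ASCII: "∞π" inside curly quotes.
        string xml = "<x>\u201C\u221E\u03C0\u201D</x>";

        // Encoding.ASCII replaces each unencodable character with '?'.
        byte[] lossy = Encoding.ASCII.GetBytes(xml);
        Console.WriteLine(Encoding.ASCII.GetString(lossy));

        // An ASCII encoding with an exception fallback refuses to lose data.
        Encoding strict = Encoding.GetEncoding(
            "us-ascii",
            EncoderFallback.ExceptionFallback,
            DecoderFallback.ExceptionFallback);
        try
        {
            strict.GetBytes(xml);
        }
        catch (EncoderFallbackException)
        {
            Console.WriteLine("strict ASCII encoder rejected the text");
        }
    }
}
```

Neither fallback produces character references; that step has to happen in the XML writer, which knows the escaping rules of the document.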

I understand that it may be non-trivial to write XML documents that are both encoding="us-ascii" and supportive of constructs like <π/> (I think this one might only be doable with external document type definitions. Yes, I have tried just for fun.). But I thought it was quite common to output entities for non-ASCII characters in an ASCII XML document, to preserve content and attribute value character data in Unicode-unfriendly environments. I thought that using numeric character references to represent Unicode characters was analogous to using base64 to protect a blob, while keeping the content more readable. How do I do this with .NET?


Answer 1:


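You can use an XmlWriter instead. When it is created with XmlWriterSettings.Encoding set to Encoding.ASCII, it writes characters that the encoding cannot represent as numeric character references: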

var doc = new XmlDocument();
doc.LoadXml("<x>“∞π”</x>");
using (var buf = new MemoryStream())
{
    // The writer is told the target encoding directly, not via a TextWriter.
    using (var writer = XmlWriter.Create(buf,
        new XmlWriterSettings { Encoding = Encoding.ASCII }))
    {
        doc.Save(writer);
    }
    Console.Write(Encoding.ASCII.GetString(buf.ToArray()));
}

Outputs:

<?xml version="1.0" encoding="us-ascii"?><x>&#x201C;&#x221E;&#x3C0;&#x201D;</x> 
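Since the stated goal was a byte[] that keeps the Unicode characters, the same approach can be wrapped in a small helper and checked with a round trip. This is only a sketch: the names AsciiXml and ToAsciiBytes are invented for the example, and the check compares just the text of the root element.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;

static class AsciiXml
{
    // Hypothetical helper: serialize a document to us-ascii bytes,
    // letting XmlWriter escape whatever ASCII cannot represent.
    public static byte[] ToAsciiBytes(XmlDocument doc)
    {
        var settings = new XmlWriterSettings { Encoding = Encoding.ASCII };
        using (var buf = new MemoryStream())
        {
            // Dispose the writer first so everything is flushed to buf.
            using (var writer = XmlWriter.Create(buf, settings))
            {
                doc.Save(writer);
            }
            return buf.ToArray();
        }
    }
}

class Program
{
    static void Main()
    {
        var doc = new XmlDocument();
        // "∞π" inside curly quotes, written with escapes.
        doc.LoadXml("<x>\u201C\u221E\u03C0\u201D</x>");

        byte[] bytes = AsciiXml.ToAsciiBytes(doc);

        // Every byte is in the 7-bit ASCII range.
        Console.WriteLine(Array.TrueForAll(bytes, b => b < 0x80));

        // Parsing the bytes again recovers the original characters.
        var copy = new XmlDocument();
        using (var input = new MemoryStream(bytes))
        {
            copy.Load(input);
        }
        Console.WriteLine(
            copy.DocumentElement.InnerText == doc.DocumentElement.InnerText);
    }
}
```

Both checks should print True. Other XmlWriterSettings properties such as Indent or OmitXmlDeclaration can be set on the same settings object if the output needs them.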


Source: https://stackoverflow.com/questions/22394441/how-do-i-xmldocument-save-to-encoding-us-ascii-with-numeric-character-entiti
