Serialize and deserialize char(s)

Submitted by 核能气质少年 on 2019-12-12 01:48:57

Question


I have a list of chars in my class. Serialization and deserialization work as expected, except when the list contains a char that, as far as I can tell, needs a byte order mark (BOM) to be described. An example char code is 56256. I created a simple test to demonstrate the problem:

[Test]
public void Utf8CharSerializeAndDeserializeShouldEqual()
{
    UInt16 charCode = 56256;
    char utfChar = (char)charCode;
    using (MemoryStream ms = new MemoryStream())
    {
        using (StreamWriter writer = new StreamWriter(ms, Encoding.UTF8, 1024, true))
        {
            var serializer = new JsonSerializer();
            serializer.Serialize(writer, utfChar);
        }

        ms.Position = 0;
        using (StreamReader reader = new StreamReader(ms, true))
        {
            using (JsonTextReader jsonReader = new JsonTextReader(reader))
            { 
                var serializer = new JsonSerializer();
                char deserializedChar = serializer.Deserialize<char>(jsonReader);

                Console.WriteLine($"{(int)utfChar}, {(int)deserializedChar}");
                Assert.AreEqual(utfChar, deserializedChar);
                Assert.AreEqual((int)utfChar, (int)deserializedChar);
            }
        }
    }
}

The test passes when the char code does not need a BOM. For example, 65 ('A') passes this test.


Answer 1:


Your problem is unrelated to Json.NET and to byte order marks. The char U+DBC0 (decimal 56256) is a high surrogate, which on its own is not a valid Unicode character, and, as explained in the documentation, the Encoding.UTF8 used by your StreamWriter will not encode such a character:

Encoding.UTF8 returns a UTF8Encoding object that uses replacement fallback to replace each string that it can't encode and each byte that it can't decode with a question mark ("?") character.

To confirm this, if you replace Encoding.UTF8 with new UTF8Encoding(true, true) in your test example, you will get the following exception:

EncoderFallbackException: Unable to translate Unicode character \uDBC0 at index 1 to specified code page. 
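
Both behaviors can be reproduced without Json.NET. The following is a minimal sketch, assuming a plain console program; the class and method names (LoneSurrogateDemo, IsAlteredByDefaultUtf8, IsRejectedByStrictUtf8) are made up for illustration and are not part of any library:

```csharp
using System;
using System.Text;

// Hypothetical demo class, not part of the original question or answer.
public static class LoneSurrogateDemo
{
    // True when encoding with Encoding.UTF8 and decoding again changes the char,
    // i.e. the replacement fallback silently substituted something else.
    public static bool IsAlteredByDefaultUtf8(char c)
    {
        string original = c.ToString();
        byte[] bytes = Encoding.UTF8.GetBytes(original);
        return Encoding.UTF8.GetString(bytes) != original;
    }

    // True when a UTF8Encoding created with throwOnInvalidBytes = true
    // refuses to encode the char instead of replacing it.
    public static bool IsRejectedByStrictUtf8(char c)
    {
        var strict = new UTF8Encoding(true, true);
        try
        {
            strict.GetBytes(c.ToString());
            return false;
        }
        catch (EncoderFallbackException)
        {
            return true;
        }
    }

    public static void Main()
    {
        char lone = (char)56256; // U+DBC0, a high surrogate with no low surrogate after it

        Console.WriteLine(char.IsHighSurrogate(lone));   // True
        Console.WriteLine(IsAlteredByDefaultUtf8(lone)); // True
        Console.WriteLine(IsRejectedByStrictUtf8(lone)); // True
        Console.WriteLine(IsAlteredByDefaultUtf8('A'));  // False
    }
}
```

This is why the test passes for 65 ('A') but fails for 56256: the value is lost when the StreamWriter encodes it, before Json.NET ever reads it back.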

If you need to serialize invalid Unicode char values, you will have to encode them manually, e.g. as a byte array, using the following helpers:

public static partial class TextExtensions
{
    static void ToBytesWithoutEncoding(char c, out byte lower, out byte upper)
    {
        var u = (uint)c;
        lower = unchecked((byte)u);
        upper = unchecked((byte)(u >> 8));
    }

    public static byte[] ToByteArrayWithoutEncoding(this char c)
    {
        byte lower, upper;
        ToBytesWithoutEncoding(c, out lower, out upper);
        return new byte[] { lower, upper };
    }

    public static byte[] ToByteArrayWithoutEncoding(this ICollection<char> list)
    {
        if (list == null)
            return null;
        var bytes = new byte[checked(list.Count * 2)];
        int to = 0;
        foreach (var c in list)
        {
            ToBytesWithoutEncoding(c, out bytes[to], out bytes[to + 1]);
            to += 2;
        }
        return bytes;
    }

    public static char ToCharWithoutEncoding(this byte[] bytes)
    {
        return bytes.ToCharWithoutEncoding(0);
    }

    public static char ToCharWithoutEncoding(this byte[] bytes, int position)
    {
        if (bytes == null)
            return default(char);
        char c = default(char);
        if (position < bytes.Length)
            c += (char)bytes[position];
        if (position + 1 < bytes.Length)
            c += (char)((uint)bytes[position + 1] << 8);
        return c;
    }

    public static List<char> ToCharListWithoutEncoding(this byte[] bytes)
    {
        if (bytes == null)
            return null;
        var chars = new List<char>(bytes.Length / 2 + bytes.Length % 2);
        for (int from = 0; from < bytes.Length; from += 2)
        {
            chars.Add(bytes.ToCharWithoutEncoding(from));
        }
        return chars;
    }
}

Then modify your test method as follows:

    [Test]
    public void Utf8JsonCharSerializeAndDeserializeShouldEqualFixed()
    {
        Utf8JsonCharSerializeAndDeserializeShouldEqualFixed((char)56256);
    }

    public void Utf8JsonCharSerializeAndDeserializeShouldEqualFixed(char utfChar)
    {
        byte[] data;

        using (MemoryStream ms = new MemoryStream())
        {
            using (StreamWriter writer = new StreamWriter(ms, new UTF8Encoding(true, true), 1024))
            {
                var serializer = new JsonSerializer();
                serializer.Serialize(writer, utfChar.ToByteArrayWithoutEncoding());
            }
            data = ms.ToArray();
        }

        using (MemoryStream ms = new MemoryStream(data))
        {
            using (StreamReader reader = new StreamReader(ms, true))
            {
                using (JsonTextReader jsonReader = new JsonTextReader(reader))
                {
                    var serializer = new JsonSerializer();
                    char deserializedChar = serializer.Deserialize<byte[]>(jsonReader).ToCharWithoutEncoding();

                    //Console.WriteLine(string.Format("{0}, {1}", utfChar, deserializedChar));
                    Assert.AreEqual(utfChar, deserializedChar);
                    Assert.AreEqual((int)utfChar, (int)deserializedChar);
                }
            }
        }
    }

Or, if you have a List<char> property in some container class, you can create the following converter:

public class CharListConverter : JsonConverter
{
    public override bool CanConvert(Type objectType)
    {
        return objectType == typeof(List<char>);
    }

    public override object ReadJson(JsonReader reader, Type objectType, object existingValue, JsonSerializer serializer)
    {
        if (reader.TokenType == JsonToken.Null)
            return null;
        var bytes = serializer.Deserialize<byte[]>(reader);
        return bytes.ToCharListWithoutEncoding();
    }

    public override void WriteJson(JsonWriter writer, object value, JsonSerializer serializer)
    {
        var list = (ICollection<char>)value;
        var bytes = list.ToByteArrayWithoutEncoding();
        serializer.Serialize(writer, bytes);
    }
}

And apply it as follows:

public class RootObject
{
    [JsonConverter(typeof(CharListConverter))]
    public List<char> Characters { get; set; }
}

In both cases Json.NET will encode the byte array as Base64.
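
To make the resulting payload concrete, here is a rough sketch of what the Base64 form of a single char looks like. It uses only Convert from the base class library rather than Json.NET, and the names Base64CharDemo, Encode and Decode are hypothetical; the byte order (low byte first) mirrors ToBytesWithoutEncoding above:

```csharp
using System;

// Hypothetical demo class; it mimics the byte layout of the helpers above.
public static class Base64CharDemo
{
    // Splits the UTF-16 code unit into low and high bytes, then Base64-encodes them.
    public static string Encode(char c)
    {
        var bytes = new byte[] { unchecked((byte)c), unchecked((byte)(c >> 8)) };
        return Convert.ToBase64String(bytes);
    }

    // Reverses Encode: decodes the Base64 text and reassembles the code unit.
    public static char Decode(string base64)
    {
        byte[] bytes = Convert.FromBase64String(base64);
        return (char)(bytes[0] | (bytes[1] << 8));
    }

    public static void Main()
    {
        char lone = (char)56256;      // U+DBC0 -> bytes 0xC0, 0xDB
        string encoded = Encode(lone);

        Console.WriteLine(encoded);              // wNs=
        Console.WriteLine((int)Decode(encoded)); // 56256
    }
}
```

Because the Base64 text consists only of ASCII characters, it passes through any UTF-8 writer unchanged, which is what lets the invalid char survive the round trip.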



Source: https://stackoverflow.com/questions/43436536/serialize-and-deserialize-chars
