In C# String/Character Encoding what is the difference between GetBytes(), GetString() and Convert()?

野趣味 2020-12-31 20:12

We are having trouble getting a Unicode string to convert to a UTF-8 string to send over the wire:

// Start with our unicode string.
string unicode = "Conve

1 Answer
  • 2020-12-31 20:57

    After a much troubled and confusing morning, we found the answer to this problem.

    The key point we were missing, which was making this very confusing, was that .NET string values are always stored as UTF-16: a sequence of 16-bit char code units (most characters take one; characters outside the Basic Multilingual Plane, such as many emoji, take two, a surrogate pair). This means that when we call GetString() on the UTF-8 bytes, they are decoded straight back into a UTF-16 string behind the scenes, and we are no better off than we were in the first place.
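
    To make this concrete, here is a small C# sketch (assuming .NET 5 or later for top-level statements; the sample text is illustrative and not from the question). It shows that Length counts 16-bit code units rather than characters, and that GetString() on the UTF-8 bytes simply hands back the original UTF-16 string.

```csharp
using System;
using System.Text;

// Illustrative sample text: U+00E9 (e-acute) is one UTF-16 code unit,
// U+1F600 (an emoji) needs two code units, i.e. a surrogate pair.
string unicode = "Caf\u00E9 \U0001F600";

// A .NET string is UTF-16: Length counts 16-bit chars, not characters.
Console.WriteLine(unicode.Length);        // 7

// Encoding to UTF-8 gives a different size (U+00E9 = 2 bytes, U+1F600 = 4 bytes).
byte[] utf8 = Encoding.UTF8.GetBytes(unicode);
Console.WriteLine(utf8.Length);           // 10

// Decoding turns the bytes straight back into a UTF-16 string,
// so there is no such thing as a "UTF-8 string" object.
string roundTrip = Encoding.UTF8.GetString(utf8);
Console.WriteLine(roundTrip == unicode);  // True
```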

    When we started to get character errors and double-byte data at the other end, we knew something was wrong, but a glance at our code didn't reveal the problem. After learning what we have explained above, we realised that we needed to send the byte array if we wanted to preserve the encoding. Luckily, MicrosoftFunc() had an overload that accepts a byte array instead of a string. This meant that we could convert the Unicode string to an encoding of our choice and then send it off exactly as expected. The code changed to:

    // Convert from a Unicode string to an array of bytes (encoded as UTF8).
    byte[] source = Encoding.UTF8.GetBytes(unicode); 
    
    // Send the encoded byte array directly! Do not send as a Unicode string.
    MicrosoftFunc(source);
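
    One plausible source of the "double-byte data" symptom is a string being serialized as UTF-16, which produces two bytes for every ASCII character. The sketch below reproduces that and then sends the UTF-8 bytes untouched. Assumptions: .NET 5 or later, and a MemoryStream standing in for "the wire", since MicrosoftFunc() is only a placeholder. The receiver must decode with the same encoding the sender used.

```csharp
using System;
using System.IO;
using System.Text;

string message = "Hi";

// Encoding.Unicode is UTF-16 little-endian: each ASCII char becomes two bytes.
byte[] utf16 = Encoding.Unicode.GetBytes(message);
Console.WriteLine(BitConverter.ToString(utf16));  // 48-00-69-00

// Encoding to UTF-8 ourselves gives one byte per ASCII char.
byte[] utf8 = Encoding.UTF8.GetBytes(message);
Console.WriteLine(BitConverter.ToString(utf8));   // 48-69

// A MemoryStream stands in for the wire: the encoded bytes are sent as-is...
using var wire = new MemoryStream();
wire.Write(utf8, 0, utf8.Length);

// ...and the receiver decodes them with the same encoding (UTF-8).
string received = Encoding.UTF8.GetString(wire.ToArray());
Console.WriteLine(received == message);           // True
```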
    

    Summary:

    In conclusion, from the above we can see that:

    • GetBytes() encodes: it converts a string (always UTF-16 internally) into an array of bytes in the encoding of the Encoding object it is called on; for example, Encoding.UTF8.GetBytes() returns UTF-8 bytes.
    • GetString() decodes: it converts an array of bytes in the encoding of the Encoding object it is called on back into a string, which is again UTF-16 internally.
    • Encoding.Convert() converts a byte array in one encoding directly into a byte array in another encoding. Both its input and its output are byte arrays, never strings (because strings are always UTF-16).
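
    The three methods side by side, as a sketch under the same assumptions (.NET 5 or later, illustrative sample text). Encoding.Unicode is .NET's name for UTF-16 little-endian. The last lines check that, for valid input, Encoding.Convert() yields the same bytes as decoding with the source encoding and re-encoding with the target.

```csharp
using System;
using System.Linq;
using System.Text;

string text = "Caf\u00E9";  // "Café"

// GetBytes: string (UTF-16 internally) -> bytes in the encoding it is called on.
byte[] utf8 = Encoding.UTF8.GetBytes(text);
Console.WriteLine(BitConverter.ToString(utf8));   // 43-61-66-C3-A9

// GetString: bytes in that encoding -> string (UTF-16 internally).
string decoded = Encoding.UTF8.GetString(utf8);
Console.WriteLine(decoded == text);               // True

// Convert: bytes in one encoding -> bytes in another; no string in the signature.
byte[] utf16 = Encoding.Convert(Encoding.UTF8, Encoding.Unicode, utf8);
Console.WriteLine(BitConverter.ToString(utf16));  // 43-00-61-00-66-00-E9-00

// Same result as decoding with the source encoding and re-encoding with the target.
byte[] manual = Encoding.Unicode.GetBytes(Encoding.UTF8.GetString(utf8));
Console.WriteLine(manual.SequenceEqual(utf16));   // True
```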