Question
I have written the following simple test:
    [Test]
    public void TestUTF8()
    {
        var c = "abc☰def";
        var b = Encoding.UTF8.GetBytes(c);
        Assert.That(b.Length, Is.EqualTo(9));
        // Assume we are reading a byte stream and have received only the first 5 bytes so far
        var p = Encoding.UTF8.GetChars(b, 0, 5);
        Trace.WriteLine(new string(p));
        Assert.That(p.Length, Is.EqualTo(3));
    }
The Trace outputs abc� and the last assert fails because p.Length is 4.
However, I want the Trace to output abc and the last assert to pass. In reality I know the stream contains valid characters, so when the last few bytes do not yet form a complete character, they should simply be left in place until more data arrives.
How can I achieve this in C#?
Answer 1:
Encoding.GetChars isn't designed for bytes arriving from a stream. It keeps no state between calls, so it cannot handle a single character that spans multiple buffer segments; the incomplete trailing bytes are decoded as the replacement character instead. For that you should use a Decoder obtained from Encoding.GetDecoder. Decoder.Convert is quite low-level: it gives you control over both the input and output buffers, which makes it somewhat difficult to use. Decoder.GetChars is easier to use and does the important work of storing state between calls. We can easily expand on Peter Duniho's answer to support an arbitrary buffer size:
    using System;
    using System.IO;
    using System.Text;

    public static class Program
    {
        public static void Main(string[] args)
        {
            var c = "abc☰def";
            var b = Encoding.UTF8.GetBytes(c);
            // A 3-byte buffer forces the 3-byte character ☰ to be split across two reads.
            var result = DecodeFromStream(new MemoryStream(b), Encoding.UTF8, 3);
            Console.WriteLine(result);
            Console.WriteLine(c == result);
        }

        private static string DecodeFromStream(Stream dataStream, Encoding encoding, int bufferSize)
        {
            // The decoder remembers incomplete byte sequences between GetChars calls.
            Decoder decoder = encoding.GetDecoder();
            StringBuilder sb = new StringBuilder();
            int inputByteCount;
            byte[] inputBuffer = new byte[bufferSize];
            char[] charBuffer = new char[encoding.GetMaxCharCount(inputBuffer.Length)];
            while ((inputByteCount = dataStream.Read(inputBuffer, 0, inputBuffer.Length)) > 0)
            {
                int readChars = decoder.GetChars(inputBuffer, 0, inputByteCount, charBuffer, 0);
                if (readChars > 0)
                    sb.Append(charBuffer, 0, readChars);
            }
            return sb.ToString();
        }
    }
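To tie this back to the test in the question, the following is a minimal sketch of decoding the same nine bytes in two chunks, split after byte 5, with a single stateful Decoder. The class name PartialUtf8Demo and the helper DecodeInTwoChunks are hypothetical names made up for this illustration; only Encoding.GetDecoder, Encoding.GetMaxCharCount, and Decoder.GetChars come from the framework.

```csharp
using System;
using System.Text;

public static class PartialUtf8Demo
{
    // Hypothetical helper: decodes bytes[0..split) and bytes[split..) with one
    // shared Decoder and returns the text produced by each call.
    public static string[] DecodeInTwoChunks(byte[] bytes, int split)
    {
        Decoder decoder = Encoding.UTF8.GetDecoder();
        char[] chars = new char[Encoding.UTF8.GetMaxCharCount(bytes.Length)];

        // First call: trailing bytes of an incomplete character are held back
        // inside the decoder instead of being emitted as U+FFFD.
        int n1 = decoder.GetChars(bytes, 0, split, chars, 0);
        string first = new string(chars, 0, n1);

        // Second call: the held-back bytes are combined with the new input.
        int n2 = decoder.GetChars(bytes, split, bytes.Length - split, chars, 0);
        string second = new string(chars, 0, n2);

        return new[] { first, second };
    }

    public static void Main()
    {
        byte[] b = Encoding.UTF8.GetBytes("abc☰def"); // 9 bytes, ☰ occupies 3
        string[] parts = DecodeInTwoChunks(b, 5);
        Console.WriteLine(parts[0].Length); // 3
        Console.WriteLine(parts[1].Length); // 4
    }
}
```

With the split at 5, the first call yields only "abc" (three chars), which is what the last assert in the question expects, and the two pending bytes of ☰ are completed by the second call.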
Source: https://stackoverflow.com/questions/26900642/c-sharp-partial-utf-8-byte-stream-conversion