Compressing and decompressing a string yields only the first letter of the original string?

Submitted by 情到浓时终转凉″ on 2019-12-12 02:16:24

Question


I'm compressing a string with Gzip using this code:

    public static String Compress(String decompressed)
    {
        byte[] data = Encoding.Unicode.GetBytes(decompressed);
        using (var input = new MemoryStream(data))
        using (var output = new MemoryStream())
        {
            using (var gzip = new GZipStream(output, CompressionMode.Compress, true))
            {
                input.CopyTo(gzip);
            }
            return Convert.ToBase64String(output.ToArray());
        }
    }

and decompressing it with this code:

    public static String Decompress(String compressed)
    {
        byte[] data = Convert.FromBase64String(compressed);
        using (MemoryStream input = new MemoryStream(data))
        using (GZipStream gzip = new GZipStream(input, CompressionMode.Decompress))
        using (MemoryStream output = new MemoryStream())
        {
            gzip.CopyTo(output);
            StringBuilder sb = new StringBuilder();
            foreach (byte b in output.ToArray())
                sb.Append((char)b);
            return sb.ToString();
        }
    }

When I use these functions in this sample code, the result is only the letter S:

String test = "SELECT * FROM foods f WHERE f.name = 'chicken';";
String com = Compress(test);
String decom = Decompress(com);
Console.WriteLine(decom);

If I debug the code, I see that the value of decom is

S\0E\0L\0E\0C\0T\0 \0*\0 \0F\0R\0O\0M\0 \0f\0o\0o\0d\0s\0 \0f\0 \0W\0H\0E\0R\0E\0 \0f\0.\0n\0a\0m\0e\0 \0=\0 \0'\0c\0h\0i\0c\0k\0e\0n\0'\0;\0

but the value displayed is only the letter S.


Answer 1:


These lines are the problem:

foreach (byte b in output.ToArray())
    sb.Append((char)b);

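You are treating each byte as a separate character, but that is not how the data was encoded. Encoding.Unicode is UTF-16, which stores each character of this string in two bytes, and for ASCII letters the second byte is zero. That is why a \0 follows every letter in the debugger, and the displayed output evidently stops at the first \0. Instead, decode the whole array with the same encoding: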

string decoded = Encoding.Unicode.GetString(output.ToArray());

which will convert the byte array to a string, based on the encoding.
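To make the difference concrete, here is a small sketch that compares the byte-by-byte cast from the question with Encoding.Unicode.GetString on a two-letter string. The class and method names (ByteCastDemo, CastEachByte, Run) are made up for illustration and are not part of the original code.

```csharp
using System;
using System.Text;

public static class ByteCastDemo
{
    // Mimics the loop from the question: every byte becomes its own char.
    public static string CastEachByte(byte[] bytes)
    {
        var sb = new StringBuilder();
        foreach (byte b in bytes)
            sb.Append((char)b);
        return sb.ToString();
    }

    public static void Run()
    {
        // UTF-16 little endian: 'S' is 0x53 0x00, 'E' is 0x45 0x00.
        byte[] bytes = Encoding.Unicode.GetBytes("SE");

        Console.WriteLine(BitConverter.ToString(bytes));           // 53-00-45-00
        Console.WriteLine(CastEachByte(bytes).Length);             // 4 (S, \0, E, \0)
        Console.WriteLine(Encoding.Unicode.GetString(bytes));      // SE
    }
}
```

The cast produces four characters from two, with a \0 after each letter, while GetString pairs the bytes up again and returns the original text.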

The underlying problem is that you convert the string to a byte array using an encoding, but then ignore that encoding when you turn the bytes back into a string. You may also want to use Encoding.UTF8 instead of Encoding.Unicode. Either one works as long as both methods use the same encoding, and UTF-8 usually produces fewer bytes for mostly-ASCII text.
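Putting this together, one possible corrected version of both methods might look like the sketch below. It assumes the same GZip-plus-Base64 approach as the question; the class name GzipText and the shared TextEncoding field are additions for illustration, chosen so that both directions cannot drift apart.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Text;

public static class GzipText
{
    // Single encoding used for both directions (swap in Encoding.UTF8 if preferred).
    private static readonly Encoding TextEncoding = Encoding.Unicode;

    public static string Compress(string text)
    {
        byte[] data = TextEncoding.GetBytes(text);
        using (var output = new MemoryStream())
        {
            // leaveOpen: true keeps 'output' readable after the GZipStream is disposed;
            // disposing the GZipStream writes out the remaining compressed data.
            using (var gzip = new GZipStream(output, CompressionMode.Compress, true))
            {
                gzip.Write(data, 0, data.Length);
            }
            return Convert.ToBase64String(output.ToArray());
        }
    }

    public static string Decompress(string compressed)
    {
        byte[] data = Convert.FromBase64String(compressed);
        using (var input = new MemoryStream(data))
        using (var gzip = new GZipStream(input, CompressionMode.Decompress))
        using (var output = new MemoryStream())
        {
            gzip.CopyTo(output);
            // Decode with the same encoding that Compress used.
            return TextEncoding.GetString(output.ToArray());
        }
    }
}
```

With this version, GzipText.Decompress(GzipText.Compress(test)) should return the original SQL string from the question rather than a string with a \0 after every letter.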




Answer 2:


In your Compress method, replace Encoding.Unicode with Encoding.UTF8:

byte[] data = Encoding.UTF8.GetBytes(decompressed);
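One caveat worth noting: if Decompress keeps the byte-by-byte cast, this change only helps for ASCII text, where UTF-8 uses exactly one byte per character. The sketch below (hypothetical names Utf8CastDemo and CastEachByte) shows what happens to a single non-ASCII character, so decoding with Encoding.UTF8.GetString is probably the safer companion to this change.

```csharp
using System;
using System.Text;

public static class Utf8CastDemo
{
    // Same byte-by-byte cast as the Decompress method in the question.
    public static string CastEachByte(byte[] bytes)
    {
        var sb = new StringBuilder();
        foreach (byte b in bytes)
            sb.Append((char)b);
        return sb.ToString();
    }

    public static void Run()
    {
        // U+00E9 (e with acute accent) is encoded in UTF-8 as the two bytes C3 A9.
        byte[] bytes = Encoding.UTF8.GetBytes("\u00e9");

        Console.WriteLine(BitConverter.ToString(bytes));        // C3-A9
        Console.WriteLine(CastEachByte(bytes).Length);          // 2: U+00C3 and U+00A9
        Console.WriteLine(Encoding.UTF8.GetString(bytes).Length); // 1: the original character
    }
}
```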


Source: https://stackoverflow.com/questions/11195577/compressing-and-decompressing-a-string-yields-only-the-first-letter-of-the-origi
