Really simple short string compression

感动是毒 2020-12-01 16:21

Is there a really simple compression technique for strings up to about 255 characters in length (yes, I'm compressing URLs)?

I am not concerned with the strength o

9 answers
  •  孤街浪徒
    2020-12-01 16:38

    As suggested in the accepted answer, data compression does not help shorten URL paths that are already fairly short.

    DotNetZip has a DeflateStream class that exposes a static (Shared in VB) CompressString method. It's a one-line way to compress a string using DEFLATE (RFC 1951). The DEFLATE implementation is fully compatible with System.IO.Compression.DeflateStream, but DotNetZip compresses better. Here's how you might use it:

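    // Requires the DotNetZip library; DeflateStream here is Ionic.Zlib.DeflateStream,
    // not System.IO.Compression.DeflateStream.
    string[] orig = {
        "folder1/folder2/page1.aspx",
        "folderBB/folderAA/page2.aspx",
    };

    public void Run()
    {
        foreach (string s in orig)
        {
            System.Console.WriteLine("original    : {0}", s);
            byte[] compressed = Ionic.Zlib.DeflateStream.CompressString(s);
            System.Console.WriteLine("compressed  : {0}", ByteArrayToHexString(compressed));
            string uncompressed = Ionic.Zlib.DeflateStream.UncompressString(compressed);
            System.Console.WriteLine("uncompressed: {0}\n", uncompressed);
        }
    }

    // Helper the original snippet assumed: lowercase hex, two characters per byte.
    static string ByteArrayToHexString(byte[] bytes)
    {
        var sb = new System.Text.StringBuilder(bytes.Length * 2);
        foreach (byte b in bytes)
        {
            sb.Append(b.ToString("x2"));
        }
        return sb.ToString();
    }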
    

    Using that code, here are my test results:

    original    : folder1/folder2/page1.aspx
    compressed  : 4bcbcf49492d32d44f03d346fa0589e9a9867a89c5051500
    uncompressed: folder1/folder2/page1.aspx
    
    original    : folderBB/folderAA/page2.aspx
    compressed  : 4bcbcf49492d7272d24f03331c1df50b12d3538df4128b0b2a00
    uncompressed: folderBB/folderAA/page2.aspx
    

    So the "compressed" byte array, when represented in hex, is longer than the original, nearly 2x as long: the first URL is 26 characters, but its 24 compressed bytes take 48 hex characters. The reason is that each byte is written as two hex characters.

    You could compensate somewhat for that by using base-62 instead of base-16 (hex) to represent the number. In that case a-z and A-Z are also digits, giving you 0-9 (10) + a-z (+26) + A-Z (+26) = 62 total digits. That would shorten the output significantly. I haven't tried that yet.
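    The base-62 idea can be sketched as below. This is a minimal illustration, not the encoder used for the tests in this answer: the class name Base62, the Encode method, and the digit order (0-9, then a-z, then A-Z) are all assumptions made for the example.

```csharp
using System;
using System.Numerics;
using System.Text;

public static class Base62
{
    // Assumed digit order for this sketch: 0-9, then a-z, then A-Z.
    private const string Alphabet =
        "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ";

    // Treats the bytes as one big-endian unsigned integer and writes it in base 62.
    // Leading zero bytes are not preserved, so a reversible encoder would also
    // have to record the original byte count.
    public static string Encode(byte[] data)
    {
        BigInteger value = BigInteger.Zero;
        foreach (byte b in data)
        {
            value = value * 256 + b;
        }

        if (value.IsZero)
        {
            return "0";
        }

        var sb = new StringBuilder();
        while (value > 0)
        {
            // Peel off the least significant base-62 digit and prepend it.
            value = BigInteger.DivRem(value, 62, out BigInteger remainder);
            sb.Insert(0, Alphabet[(int)remainder]);
        }
        return sb.ToString();
    }
}
```

    Each base-62 digit carries just under 6 bits, so a 24-byte input like the first compressed result above should come out at roughly 32 characters, compared with 48 in hex.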


    EDIT
    OK, I tested the base-62 encoder. It shortens the hex string by about half. I had figured it would cut it to 25% (62/16 ≈ 4), but the number of digits scales with the logarithm of the base, not the base itself: a hex digit carries 4 bits and a base-62 digit just under 6, so the gain can never be 4x. In my tests, the resulting base-62 encoded string is about the same length as the original URL. So, no, using compression and then base-62 encoding is still not a good approach. You really want a hash value.
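    One way the hash-value approach might look is sketched below. It rests on assumptions not stated in this answer: a hash cannot be reversed, so the sketch keeps a lookup table from key to URL, and the class name UrlShortener, the choice of SHA-256, the 6-byte truncation, and the in-memory dictionary are all hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Security.Cryptography;
using System.Text;

public sealed class UrlShortener
{
    // Hypothetical in-memory store; a real service would persist this mapping.
    private readonly Dictionary<string, string> _urlsByKey =
        new Dictionary<string, string>();

    // Returns a fixed-length 8-character key for the URL and remembers the mapping.
    public string Shorten(string url)
    {
        byte[] digest;
        using (SHA256 sha = SHA256.Create())
        {
            digest = sha.ComputeHash(Encoding.UTF8.GetBytes(url));
        }

        // Keep the first 6 bytes (48 bits): exactly 8 Base64 characters, no padding.
        // '+' and '/' are swapped for URL-safe characters.
        string key = Convert.ToBase64String(digest, 0, 6)
            .Replace('+', '-')
            .Replace('/', '_');

        // A truncated hash can collide; this sketch only detects that case.
        if (_urlsByKey.TryGetValue(key, out string existing) && existing != url)
        {
            throw new InvalidOperationException("Key collision for " + key);
        }

        _urlsByKey[key] = url;
        return key;
    }

    // Looks up the original URL; throws KeyNotFoundException for unknown keys.
    public string Expand(string key)
    {
        return _urlsByKey[key];
    }
}
```

    The key length stays at 8 characters no matter how long the URL is, which is what compression cannot offer for short inputs. The trade-off is that the mapping has to be stored somewhere.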
