Finding the minimum length RLE

感动是毒 2021-02-04 17:05

The classical RLE algorithm compresses data by writing each run of repeated characters as a number followed by the character, where the number says how many times that character repeats at that position in the text. For example:

4 answers
  •  半阙折子戏
    2021-02-04 17:45

    A very common way to encode RLE-compressed data is to designate a special byte as the "DLE" (the name comes from the ASCII control character Data Link Escape), which means "what follows is a count and then the byte to repeat".

    This way, only repeating sequences need to be encoded. Typically the DLE symbol is chosen to minimize the chance of it occurring naturally in the uncompressed data.

    For your original example, let's set the full stop (or dot) as the DLE. This would encode your example as follows:

    AAABBAAABBCECE => 3A2B3A2B1C1E1C1E <-- your encoding
    AAABBAAABBCECE => .3ABB.3ABBCECE   <-- my encoding
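To make the notation above concrete, here is a minimal Python sketch that reproduces it. The names `encode_text`, `MIN_RUN` and `MAX_RUN` are made up for this illustration, the count is capped at 9 so it stays a single decimal digit, and the input is assumed not to contain the dot itself.

```python
from itertools import groupby

DLE = "."      # escape marker chosen for this illustration
MIN_RUN = 3    # shortest run written as DLE + count + char, to match the example
MAX_RUN = 9    # assumption: keep the count to a single decimal digit


def encode_text(text: str) -> str:
    """Write long runs as DLE + digit + char and copy everything else through."""
    out = []
    for char, group in groupby(text):
        run = len(list(group))
        while run >= MIN_RUN:
            chunk = min(run, MAX_RUN)
            out.append(f"{DLE}{chunk}{char}")
            run -= chunk
        out.append(char * run)  # whatever is left is shorter than MIN_RUN
    return "".join(out)


print(encode_text("AAABBAAABBCECE"))  # prints .3ABB.3ABBCECE
```
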
    

    You would only encode a sequence if doing so actually saves space. If you limit the length of sequences to 255, so that the count fits in a byte, an encoded sequence takes 3 bytes: the DLE, the count, and the byte to repeat. You would probably not encode runs of exactly 3 bytes either: they also take 3 bytes when encoded, so nothing is saved, and decoding them carries slightly more overhead than copying a non-encoded sequence.
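The byte-level rule could be sketched roughly like this in Python. The escape value `0x10`, the threshold `MIN_RUN = 4` and the function name `rle_encode` are assumptions for the sake of the example, and the input is assumed not to contain the DLE byte itself.

```python
from itertools import groupby

DLE = 0x10      # assumed escape byte; any rarely occurring value would do
MIN_RUN = 4     # runs of 3 or fewer bytes are left unencoded
MAX_RUN = 255   # the count has to fit in a single byte


def rle_encode(data: bytes) -> bytes:
    """Encode runs of at least MIN_RUN bytes as DLE, count, value."""
    out = bytearray()
    for value, group in groupby(data):
        run = len(list(group))
        while run >= MIN_RUN:
            chunk = min(run, MAX_RUN)   # split runs longer than 255
            out += bytes([DLE, chunk, value])
            run -= chunk
        out += bytes([value]) * run     # short remainder is copied as-is
    return bytes(out)


print(len(rle_encode(b"\xff" * 300)))   # prints 6
```

With these settings a run of 300 identical bytes shrinks to 6 bytes, while `AAABBAAABBCECE` passes through unchanged because none of its runs reaches 4.
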

    In your trivial example, the saving is nonexistent, but if you try to compress a bitmap containing a screenshot of a mostly white program window, like Notepad or a browser, then you'll see real space savings.

    If the DLE byte occurs naturally in the data, emit the DLE followed by a count of 0. Since we know we would never encode a 0-length sequence, a DLE followed by a 0 byte is decoded as a single literal DLE byte.
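The decoding side of that escape rule might look like the following Python sketch. The escape value `0x10` and the name `rle_decode` are assumptions for illustration; the error handling for truncated input is an addition of this sketch, not something the scheme itself prescribes.

```python
DLE = 0x10  # assumed escape byte; must match whatever the encoder used


def rle_decode(encoded: bytes) -> bytes:
    """Expand DLE, count, value triples; DLE followed by 0 is a literal DLE."""
    out = bytearray()
    i = 0
    while i < len(encoded):
        byte = encoded[i]
        if byte != DLE:
            out.append(byte)            # ordinary byte, copied through
            i += 1
            continue
        if i + 1 >= len(encoded):
            raise ValueError("input ends right after a DLE")
        count = encoded[i + 1]
        if count == 0:
            out.append(DLE)             # DLE, 0 stands for one literal DLE
            i += 2
        else:
            if i + 2 >= len(encoded):
                raise ValueError("input ends inside an encoded run")
            out += bytes([encoded[i + 2]]) * count
            i += 3
    return bytes(out)


sample = bytes([0x41, DLE, 0, DLE, 5, 0x42])
print(rle_decode(sample))  # prints b'A\x10BBBBB'
```
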
