Data Compression Algorithms

There are a ton of compression algorithms out there. What you need here is a lossless compression algorithm: one that compresses data such that decompression reproduces exactly the input that was given before compression. The opposite is a lossy compression algorithm, which discards some information in exchange for a smaller output. PNG images use lossless compression, while JPEG images can and often do use lossy compression.

Some of the most widely known compression algorithms include:

ZIP archives use DEFLATE, a combination of LZ77 and Huffman coding, to give fast compression and decompression times and reasonably good compression ratios.
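
You don't have to implement DEFLATE to play with it; Python's standard zlib module implements the same LZ77-plus-Huffman scheme. A quick round trip (the data and sizes here are just illustrative):

    import zlib

    data = b"aaaaaaaabbbbbcccdd" * 100       # repetitive input compresses well
    compressed = zlib.compress(data, 9)      # 9 = best compression level
    assert zlib.decompress(compressed) == data   # lossless: exact round trip
    print(len(data), "->", len(compressed))      # 1800 -> a few dozen bytes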

LZ77 is essentially a generalization of RLE: instead of only encoding runs of a single repeated byte, it encodes back-references to any recently seen substring, which often yields much better results.
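
To make that concrete, here is a toy LZ77 sketch. It assumes a simple (offset, length, next-byte) token format of my own choosing; real implementations add much smarter match searching and entropy-code the tokens:

    def lz77_compress(data, window=255):
        # Emit (offset, length, next_byte) triples; offset/length 0 means
        # "no match, just a literal next_byte".
        i, out = 0, []
        while i < len(data):
            best_off, best_len = 0, 0
            for j in range(max(0, i - window), i):
                length = 0
                # Matches may run past i (overlap), which is exactly how a
                # run like "aaaa..." compresses the way RLE would.
                while (i + length < len(data) - 1
                       and data[j + length] == data[i + length]):
                    length += 1
                if length > best_len:
                    best_off, best_len = i - j, length
            out.append((best_off, best_len, data[i + best_len]))
            i += best_len + 1
        return out

    def lz77_decompress(triples):
        out = bytearray()
        for off, length, nxt in triples:
            for _ in range(length):
                out.append(out[-off])   # copy from the sliding window
            out.append(nxt)
        return bytes(out)

    data = b"aaaaaaaabbbbbcccdd"
    assert lz77_decompress(lz77_compress(data)) == data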

Huffman coding assigns the shortest bit sequences to the most frequent bytes. Imagine a text file that looked like this:

aaaaaaaabbbbbcccdd

A typical implementation of Huffman would result in the following map:

Bits Character
   0         a
  10         b
 110         c
 111         d

So the file would be compressed to this:

00000000 10101010 10110110 11011111 10000000
                                     ^^^^^^^
                              Padding bits required

18 bytes (144 bits) go down to 5 bytes: 33 data bits plus 7 padding bits. Of course, the code table must also be stored in the file, so this works better on larger inputs, where that overhead is amortized.
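
For the curious, here is a compact sketch of the table-building step in Python, using a heap of subtrees (the function name and structure are mine, not from any library; the exact codewords can differ depending on tie-breaking, but the codeword lengths will match the table above):

    import heapq
    from collections import Counter

    def huffman_codes(data):
        # Build the Huffman tree bottom-up: repeatedly merge the two least
        # frequent subtrees. Heap entries are (freq, tiebreak, tree), where
        # tree is either a symbol or a (left, right) pair.
        heap = [(f, i, sym) for i, (sym, f) in enumerate(Counter(data).items())]
        heapq.heapify(heap)
        nxt = len(heap)
        while len(heap) > 1:
            f1, _, a = heapq.heappop(heap)
            f2, _, b = heapq.heappop(heap)
            heapq.heappush(heap, (f1 + f2, nxt, (a, b)))
            nxt += 1
        codes = {}
        def walk(node, prefix):
            if isinstance(node, tuple):          # internal node
                walk(node[0], prefix + "0")
                walk(node[1], prefix + "1")
            else:                                # leaf: a symbol
                codes[node] = prefix or "0"      # lone-symbol edge case
        walk(heap[0][2], "")
        return codes

    text = "aaaaaaaabbbbbcccdd"
    codes = huffman_codes(text)
    bits = "".join(codes[c] for c in text)
    print(codes)       # codeword lengths: a:1, b:2, c:3, d:3
    print(len(bits))   # 33 bits -> 5 bytes once padded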

Alex Allain has a nice article on the Huffman Compression Algorithm in case the Wiki doesn't suffice.

Feel free to ask for more information. This topic is pretty darn broad.

Here are some lossless algorithms (you can perfectly recover the original data using these):

  • Huffman code
  • LZ78 (and its LZW variant)
  • LZ77
  • Arithmetic coding
  • Sequitur
  • Prediction by partial matching (PPM)

Many of the well-known formats like PNG or GIF use variants or combinations of these (PNG uses DEFLATE, i.e. LZ77 plus Huffman coding; GIF uses LZW).

On the other hand, there are lossy algorithms too; they compromise accuracy to compress the data further, but often work very well. State-of-the-art lossy techniques combine ideas from differential coding, quantization, and the DCT, among others.
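
Quantization is the easiest of those ideas to show in isolation. A minimal sketch (the step size and sample values are arbitrary, purely for illustration):

    def quantize(samples, step):
        # Lossy: snap each sample to the nearest multiple of `step`.
        # A larger step means fewer distinct levels: better compression,
        # worse fidelity.
        return [round(s / step) for s in samples]

    def dequantize(levels, step):
        return [q * step for q in levels]

    samples = [0.93, 1.02, 1.07, 2.51, 2.48]
    levels = quantize(samples, step=0.5)
    print(dequantize(levels, step=0.5))   # [1.0, 1.0, 1.0, 2.5, 2.5]

The original values are unrecoverable, but the quantized stream contains far fewer distinct symbols, which downstream lossless stages (RLE, Huffman, etc.) can then exploit.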

To learn more about data compression, I recommend Introduction to Data Compression by Khalid Sayood (https://www.elsevier.com/books/introduction-to-data-compression/sayood/978-0-12-809474-7). It is a very accessible introductory text, and the 3rd edition is available online as a PDF.

There are an awful lot of data compression algorithms around. If you're looking for something encyclopedic, I recommend the Handbook of Data Compression by Salomon et al, which is about as comprehensive as you're likely to get (and has good sections on the principles and practice of data compression, as well).

My best guess is that ASIC-based compression is usually implemented for a particular application, or as a specialized element of a SoC, rather than as a stand-alone compression chip. I also doubt that looking for a "latest and greatest" compression format is the way to go here -- I would expect standardization, maturity, and fitness for a particular purpose to be more important.

My paper A Survey Of Architectural Approaches for Data Compression in Cache and Main Memory Systems (permalink here) reviews many compression algorithms and also techniques for using them in modern processors. It reviews both research-grade and commercial-grade compression algorithms/techniques, so you may find one which has not yet been implemented in ASIC.

LZW (Lempel-Ziv-Welch) is a great lossless algorithm. Pseudocode here: http://oldwww.rasip.fer.hr/research/compress/algorithms/fund/lz/lzw.html
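
In case that link goes stale, here is a sketch of the classic algorithm (emitting plain integer codes; a real codec would also pack them into variable-width bit fields):

    def lzw_compress(data):
        # Dictionary starts with all single bytes and grows as we go.
        dictionary = {bytes([i]): i for i in range(256)}
        w, out = b"", []
        for byte in data:
            wc = w + bytes([byte])
            if wc in dictionary:
                w = wc                      # keep extending the match
            else:
                out.append(dictionary[w])   # emit code for longest match
                dictionary[wc] = len(dictionary)
                w = bytes([byte])
        if w:
            out.append(dictionary[w])
        return out

    def lzw_decompress(codes):
        dictionary = {i: bytes([i]) for i in range(256)}
        w = dictionary[codes[0]]
        out = bytearray(w)
        for code in codes[1:]:
            # A code can arrive one step before it's defined (the cScSc
            # case); reconstruct it as w + first byte of w.
            entry = dictionary.get(code, w + w[:1])
            out.extend(entry)
            dictionary[len(dictionary)] = w + entry[:1]
            w = entry
        return bytes(out)

    data = b"abababababababab"
    assert lzw_decompress(lzw_compress(data)) == data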
