Compression with best ratio in Python?

Posted by 冷暖自知 on 2019-12-10 13:05:21

Question


Which compression method in Python has the best compression ratio?

Is the commonly used zlib.compress() the best or are there some better options? I need to get the best compression ratio possible.

I am compressing strings and sending them over UDP. A typical string I compress is about 1,700,000 bytes long.


Answer 1:


I'm sure there are more obscure formats with better compression, but of those that are well supported, lzma is the best. There are some Python bindings here.

EDIT

Don't pick a format without testing; which algorithm does best depends on the data set.
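A minimal sketch of such a test, assuming Python 3.3 or later so that zlib, bz2 and lzma are all available in the standard library. The `compare_sizes` helper and the sample payload are hypothetical; substitute a representative sample of the real data.

```python
import bz2
import lzma
import zlib


def compare_sizes(data: bytes) -> dict:
    """Return the compressed size in bytes produced by each stdlib codec."""
    return {
        "zlib": len(zlib.compress(data, 9)),  # 9 = highest zlib level
        "bz2": len(bz2.compress(data, 9)),    # 9 = largest bz2 block size
        "lzma": len(lzma.compress(data)),     # default preset, .xz container
    }


if __name__ == "__main__":
    # Hypothetical sample: repetitive text. Use your own data here.
    sample = b"timestamp=1287000000 sensor=42 value=3.14159\n" * 20000
    results = sorted(compare_sizes(sample).items(), key=lambda kv: kv[1])
    for name, size in results:
        print(f"{name:5s} {size:8d} bytes  ({len(sample) / size:.1f}x smaller)")
```

The ranking can change from one kind of input to another, so it is worth running this on several real payloads rather than a single synthetic one.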




Answer 2:


If you are willing to trade performance for better compression, then the bz2 library usually gives better results than the gz (zlib) library.

There are other compression libraries like xz (LZMA2) that might give even better results. When this answer was written they were not in the core distribution of Python; since Python 3.3 the standard library includes an lzma module.

Python Doc for BZ2 class

EDIT: Depending on the type of image, you might not get much additional compression. Many image formats are already compressed; the exceptions are raw, BMP, and uncompressed TIFF. Testing several compression types is highly recommended.
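The effect is easy to see with a small experiment. The sketch below is only an illustration: it uses pseudo-random bytes as a stand-in for the payload of an already-compressed file, and the `ratio` helper is a hypothetical name.

```python
import random
import zlib


def ratio(data: bytes) -> float:
    """Compressed size divided by original size (lower is better)."""
    return len(zlib.compress(data, 9)) / len(data)


# Hypothetical stand-ins: repetitive text versus pseudo-random bytes, which
# behave much like the contents of an already-compressed image file.
text = b"the quick brown fox jumps over the lazy dog\n" * 2000
rng = random.Random(0)
noise = bytes(rng.getrandbits(8) for _ in range(len(text)))

print(f"text : {ratio(text):.3f}")
print(f"noise: {ratio(noise):.3f}")
```

The repetitive text shrinks to a small fraction of its size, while the noise-like input does not shrink at all and picks up a few bytes of container overhead.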

EDIT2: If you do decide to do image compression, ImageMagick has Python bindings and supports many image conversion types.

Image Magick

Image Formats Supported




Answer 3:


The best compression algorithm definitely depends on the kind of data you are dealing with. Unless you are working with a list of random numbers stored as a string (in which case no compression algorithm will work), knowing the kind of data usually lets you apply much better algorithms than general-purpose ones (see the other answers for good ready-to-use general compression algorithms).

If you are dealing with images, you should definitely choose a lossy compression format (i.e. a pixel-aware one) in preference to any lossless one. That will give you much better results. Recompressing an already lossy-compressed image with a lossless format is a waste of time.

I would search through PIL to see what I can use. Something like converting the image to JPEG, with a compression level compatible with the desired quality, before sending should be very efficient.
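A rough sketch of that idea, assuming the third-party Pillow package (the maintained fork of PIL) is installed. The `to_jpeg_bytes` helper and the generated test image are hypothetical; the right `quality` value depends on how much degradation is acceptable.

```python
import io

from PIL import Image  # third-party: pip install Pillow


def to_jpeg_bytes(img: Image.Image, quality: int = 75) -> bytes:
    """Encode an image as JPEG in memory and return the raw bytes."""
    buf = io.BytesIO()
    # JPEG has no alpha channel, so convert to RGB before saving.
    img.convert("RGB").save(buf, format="JPEG", quality=quality)
    return buf.getvalue()


if __name__ == "__main__":
    # Hypothetical test image; in practice load one with Image.open(path).
    img = Image.new("RGB", (640, 480), color=(30, 120, 200))
    for q in (90, 75, 50):
        print(q, len(to_jpeg_bytes(img, quality=q)), "bytes")
```

Lower `quality` values give smaller output at the cost of visible artifacts, so the setting is best chosen by inspecting real images.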

You should also be very cautious when using UDP: it can lose packets, and most compression formats are very sensitive to missing parts of a file. That said, this can be managed at the application level.
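One way to manage it at the application level is to number the chunks so the receiver can tell when something is missing before it tries to decompress. The sketch below is an assumption-laden illustration, not a complete protocol: the header layout, `CHUNK_SIZE`, and the `make_datagrams` and `reassemble` helpers are all hypothetical, and resending lost chunks is left out. Splitting is needed in any case, because a single UDP datagram over IPv4 carries at most 65,507 bytes of payload.

```python
import struct
import zlib

HEADER = struct.Struct("!II")  # (chunk index, total chunks), network byte order
CHUNK_SIZE = 1200              # assumed size that fits a typical network MTU


def make_datagrams(payload: bytes) -> list:
    """Compress the payload and split it into numbered datagram bodies."""
    blob = zlib.compress(payload, 9)
    parts = [blob[i:i + CHUNK_SIZE] for i in range(0, len(blob), CHUNK_SIZE)]
    return [HEADER.pack(i, len(parts)) + part for i, part in enumerate(parts)]


def reassemble(datagrams) -> bytes:
    """Rebuild the payload; raise ValueError if any chunk is missing."""
    chunks, total = {}, None
    for dgram in datagrams:
        index, total = HEADER.unpack_from(dgram)
        chunks[index] = dgram[HEADER.size:]  # duplicates simply overwrite
    if total is None or len(chunks) != total:
        raise ValueError("missing chunks; request a resend before decompressing")
    return zlib.decompress(b"".join(chunks[i] for i in range(total)))
```

Each element returned by `make_datagrams` can be passed to `socket.sendto()`. The receiver collects the datagrams in any order and calls `reassemble` once it believes the transfer is complete.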



Source: https://stackoverflow.com/questions/4015425/compression-with-best-ratio-in-python
