发表新帖

发表新帖

Compress numpy arrays efficiently

前端未结

关注

 6  1034

名媛妹妹 2020-12-03 13:30

I tried various methods to do data compression when saving to disk some numpy arrays.

These 1D arrays contain sampled data at a certain sampling rate (c

6条回答

天涯浪人 (楼主)

2020-12-03 14:14

First, for general data sets, the shuffle=True argument to create_dataset improves compression dramatically with roughly continuous datasets. It very cleverly rearranges the bits to be compressed so that (for continuous data) the bits change slowly, which means they can be compressed better. It slows the compression down a very little bit in my experience, but can substantially improve the compression ratios in my experience. It is not lossy, so you really do get the same data out as you put in.

If you don't care about the accuracy so much, you can also use the scaleoffset argument to limit the number of bits stored. Be careful, though, because this is not what it might sound like. In particular, it is an absolute precision, rather than a relative precision. For example, if you pass scaleoffset=8, but your data points are less then 1e-8 you'll just get zeros. Of course, if you've scaled the data to max out around 1, and don't think you can hear differences smaller than a part in a million, you can pass scaleoffset=6 and get great compression without much work.

But for audio specifically, I expect that you are right in wanting to use FLAC, because its developers have put in huge amounts of thought, balancing compression with preservation of distinguishable details. You can convert to WAV with scipy, and thence to FLAC.

0 讨论(0)

查看其它6个回答
发布评论:

提交评论
- 加载中...

热议问题