I tried various methods to do data compression when saving to disk some numpy arrays.
These 1D arrays contain sampled data at a certain sampling rate (c
First, for general data sets, the shuffle=True argument to create_dataset improves compression dramatically with roughly continuous datasets. It very cleverly rearranges the bits to be compressed so that (for continuous data) the bits change slowly, which means they can be compressed better. It slows the compression down a very little bit in my experience, but can substantially improve the compression ratios in my experience. It is not lossy, so you really do get the same data out as you put in.
If you don't care about the accuracy so much, you can also use the scaleoffset argument to limit the number of bits stored. Be careful, though, because this is not what it might sound like. In particular, it is an absolute precision, rather than a relative precision. For example, if you pass scaleoffset=8, but your data points are less then 1e-8 you'll just get zeros. Of course, if you've scaled the data to max out around 1, and don't think you can hear differences smaller than a part in a million, you can pass scaleoffset=6 and get great compression without much work.
But for audio specifically, I expect that you are right in wanting to use FLAC, because its developers have put in huge amounts of thought, balancing compression with preservation of distinguishable details. You can convert to WAV with scipy, and thence to FLAC.