Best Compression algorithm for a sequence of integers

前端 未结 15 1631
离开以前
离开以前 2020-11-29 16:41

I have a large array with a range of integers that are mostly continuous, eg 1-100, 110-160, etc. All integers are positive. What would be the best algorithm to compress thi

15条回答
  •  猫巷女王i
    2020-11-29 17:30

    I couldn't get my compression to be much better than about .11 for this. I generated my test data via python interpreter and it's a newline delimited list of integers from 1-100, and 110-160. I use the actual program as a compressed representation of the data. My compressed file is as follows:

    main=mapM_ print [x|x<-[1..160],x`notElem`[101..109]]
    

    It's just a Haskell script that generates the the file you can run via:

    $ runhaskell generator.hs >> data
    

    The size of the g.hs file is 54 bytes, and the python generated data is 496 bytes. This gives 0.10887096774193548 as the compression ratio. I think with more time one could shrink the program, or you could compress the compressed file (i.e. the haskell file).

    One other approach might be to save 4 bytes of data. The min and max of each sequence, then give those to a generating function. Albeit, the loading of files will add more characters to the decompresser adding more complexity and more bytes to the decompresser. Again, I represented this very specific sequence via a program and it doesn't generalize, it's a compression that's specific to the data. Furthermore, adding generality makes the decompresser larger.

    Another concern is that one must have the Haskell interpreter to run this. When I compiled the program it made it much larger. I don't really know why. There is the same problem with python, so maybe the best approach is to give the ranges, so that a some program could decompress the file.

提交回复
热议问题