This is similar to a previous question, but the answers there don't satisfy my needs and my question is slightly different:
I am the author of an open-source tool for compressing a particular type of biological data. This tool, called starch, splits the data by chromosome and uses those divisions as indices for fast access to compressed data units within the larger archive.
Per-chromosome data are transformed to remove redundancy in genomic coordinates, and the transformed data are compressed with either the bzip2 or gzip algorithm. The offsets, metadata, and compressed genomic data are concatenated into one file.
Source code is available from our GitHub site. We have compiled it under Linux and Mac OS X.
For your case, you could store the offsets (to 10 MB chunks, or whatever granularity you choose) in a header to a custom archive format. To read a given chunk, you parse the header, retrieve its offset, and fseek through the file to current_offset_sum + header_size.
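As a minimal sketch of that header-plus-offsets idea in Python (the archive layout, the JSON offset table, and the function names here are illustrative assumptions for this answer, not the actual starch format):

```python
import gzip
import json
import struct

def write_archive(path, chunks):
    """Write a toy archive: an 8-byte header length, a JSON offset
    table, then the gzip-compressed chunks concatenated back to back.
    `chunks` maps a key (e.g. a chromosome name) to raw bytes."""
    blobs = {}
    offsets = {}
    pos = 0
    for key, data in chunks.items():
        blob = gzip.compress(data)
        # Offsets are relative to the end of the header.
        offsets[key] = {"offset": pos, "length": len(blob)}
        blobs[key] = blob
        pos += len(blob)
    header = json.dumps(offsets).encode()
    with open(path, "wb") as f:
        f.write(struct.pack(">Q", len(header)))  # header size, big-endian
        f.write(header)
        for blob in blobs.values():
            f.write(blob)

def read_chunk(path, key):
    """Parse the header, then seek directly to one compressed chunk
    without touching the rest of the file."""
    with open(path, "rb") as f:
        (header_len,) = struct.unpack(">Q", f.read(8))
        offsets = json.loads(f.read(header_len))
        entry = offsets[key]
        # Total header size = 8-byte length prefix + JSON table.
        f.seek(8 + header_len + entry["offset"])
        return gzip.decompress(f.read(entry["length"]))
```

The point is that decompression cost is paid only for the chunk you ask for; everything before it is skipped with a single seek.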