I am ordering a huge pile of Landsat scenes from the USGS, which come as tar.gz archives. I am writing a simple Python script to unpack them. Each archive contains 15 TIFF images.
The problem is that a tar file has no central file list; it stores files sequentially, each preceded by a header. The tar file is then compressed with gzip to give you a tar.gz. With a plain tar file, if you don't want to extract a certain file, you simply skip ahead by the size given in its header and read the next header. If the archive is additionally compressed, you still have to skip that many bytes, only not within the archive file but within the decompressed data stream - which for some compression formats works, but for others requires you to decompress everything in between.
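To illustrate the layout: a minimal sketch that builds a tiny uncompressed tar in memory and then walks it by hand - each member is a 512-byte header followed by its payload padded to a 512-byte boundary, so skipping an unwanted file is a single seek (the file names here are placeholders):

```python
import io
import tarfile

# Build a small uncompressed tar in memory.
buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w") as tar:
    for name, payload in [("a.txt", b"hello"), ("b.txt", b"world!!")]:
        info = tarfile.TarInfo(name)
        info.size = len(payload)
        tar.addfile(info, io.BytesIO(payload))

# Walk the archive manually: read each 512-byte header,
# then seek past the payload without decoding it.
buf.seek(0)
names = []
while True:
    header = buf.read(512)
    if len(header) < 512 or header == b"\0" * 512:
        break  # end-of-archive marker (zero-filled block)
    names.append(header[:100].rstrip(b"\0").decode())
    size = int(header[124:136].rstrip(b"\0 "), 8)  # size field is octal
    # Skip the payload, padded up to the next 512-byte boundary.
    buf.seek((size + 511) // 512 * 512, io.SEEK_CUR)

print(names)
```

On an uncompressed tar (or inside a seekable decompressed stream), this is why selective extraction can be cheap: the seek replaces reading the file body entirely.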
gzip belongs to the latter class of compression schemes. So while you save some time by not writing the undesired files to disk, your code still decompresses them. You might be able to work around this by overriding the _Stream class for archives in seekable compression formats, but for your .gz files there is nothing you can do about it.
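What you can still do is read the gzip stream once, front to back, and only materialize the members you want. A sketch using tarfile's stream mode - the archive is faked in memory here and the band names are made up, so adapt the filter to your actual Landsat file naming:

```python
import io
import tarfile

# Fake a tiny "Landsat" tar.gz in memory (names are placeholders).
raw = io.BytesIO()
with tarfile.open(fileobj=raw, mode="w:gz") as tar:
    for name in ["scene_B4.TIF", "scene_B5.TIF", "scene_BQA.TIF"]:
        data = name.encode()
        info = tarfile.TarInfo(name)
        info.size = len(data)
        tar.addfile(info, io.BytesIO(data))
raw.seek(0)

# Stream mode "r|gz" decompresses sequentially, exactly once.
# Skipped members are still decompressed (unavoidable with gzip),
# but they are never written to disk.
wanted = {"scene_B4.TIF", "scene_B5.TIF"}
kept = {}
with tarfile.open(fileobj=raw, mode="r|gz") as tar:
    for member in tar:
        if member.name in wanted:
            kept[member.name] = tar.extractfile(member).read()

print(sorted(kept))
```

Note that in stream mode you must consume a member's data before advancing to the next one; random access (`getmember`, extracting out of order) is not available.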