Is it possible to extract a single file from a tar bundle in Python?

喜欢而已 submitted on 2020-02-25 04:31:05

Question


I need to fetch a couple of files from a huge svn repo. Fetching the whole repo takes almost an hour. The files I am looking for are part of a tar bundle.

Is it possible to fetch only those two files from the tar bundle, using Python code, without extracting the whole bundle?

If so, can anybody let me know how I should go about it?


Answer 1:


Here is one way to get a tar file from svn and extract a single file from it:

import tarfile
from subprocess import check_output
# Capture the tar file from subversion and save it locally
tmp = '/home/me/tempfile.tar'
with open(tmp, 'wb') as f:
    f.write(check_output(["svn", "cat", "svn://url/some.tar"]))
# Extract the file we want, saving it under dir2
with tarfile.open(tmp) as t:
    t.extract('dir1/fname.ext', path='dir2')

where 'dir1/fname.ext' is the full path to the file that you want within the tar archive. It will be saved in 'dir2/dir1/fname.ext'. If you omit the path argument, it will be saved in 'dir1/fname.ext' under the current directory.
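To make the path behaviour concrete, here is a minimal sketch. It builds a throwaway archive in a temporary directory so that it runs without svn. The helper name extract_one and the demo file names are assumptions made for illustration; they are not part of the original answer.

```python
import io
import os
import tarfile
import tempfile

def extract_one(archive_path, member_name, dest_dir="."):
    """Extract a single member and return the path it was written to."""
    with tarfile.open(archive_path) as tar:
        # getnames() lists every member path, which helps to find the exact name
        if member_name not in tar.getnames():
            raise KeyError(member_name)
        tar.extract(member_name, path=dest_dir)
    return os.path.join(dest_dir, member_name)

# Build a small demo archive (hypothetical contents).
workdir = tempfile.mkdtemp()
demo_tar = os.path.join(workdir, "demo.tar")
with tarfile.open(demo_tar, "w") as tar:
    payload = b"hello\n"
    info = tarfile.TarInfo("dir1/fname.ext")
    info.size = len(payload)
    tar.addfile(info, io.BytesIO(payload))

# The member lands in <workdir>/dir2/dir1/fname.ext
out_path = extract_one(demo_tar, "dir1/fname.ext", os.path.join(workdir, "dir2"))
print(out_path)
```

If the member name is not exactly as stored in the archive, the helper raises KeyError, so printing tar.getnames() first is a quick way to check the spelling.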

The above can be understood as follows. On a normal shell command line, svn cat url tells subversion to send the file defined by url to stdout (see svn help cat for more info). url can be any type of url that svn understands such as svn://..., svn+ssh://..., or file://.... We run this command under python control using the subprocess module. To do this the svn cat url command is broken up into a list: ["svn", "cat", "url"]. The output from this svn command is saved to a local file defined by the tmp variable. We then use the tarfile module to extract the file you want.

Alternatively, you could use the extractfile method to read the file's data into a Python variable:

with tarfile.open(tmp) as t:
    handle = t.extractfile('dir1/fname.ext')
    print(handle.readlines())  # show file contents as a list of lines

According to the documentation, tarfile should accept a subprocess's stdout as a file handle. This would simplify the code and eliminate the need to save the tar file locally. However, due to a bug, Issue 10436, that will not work.
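One approach that may avoid the temporary file is tarfile's stream mode ("r|"), which reads the archive strictly front to back and never seeks. The sketch below is written under that assumption and is tested here only against an in-memory stream, not against svn. The function name read_member_from_stream is hypothetical, and whether this sidesteps the issue mentioned above on your Python version is something to verify.

```python
import io
import tarfile

def read_member_from_stream(stream, wanted):
    """Return the bytes of member `wanted` from a tar stream, or None if absent."""
    # mode "r|" treats the input as a non-seekable stream of tar blocks
    with tarfile.open(fileobj=stream, mode="r|") as tar:
        for member in tar:
            # In stream mode a member must be read before moving to the next one
            if member.name == wanted and member.isfile():
                return tar.extractfile(member).read()
    return None

# Demo input: an in-memory archive standing in for the output of "svn cat".
buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w") as tar:
    for name, data in [("dir1/a.txt", b"alpha"), ("dir1/fname.ext", b"hello")]:
        info = tarfile.TarInfo(name)
        info.size = len(data)
        tar.addfile(info, io.BytesIO(data))
buf.seek(0)

print(read_member_from_stream(buf, "dir1/fname.ext"))
```

With svn, the stream would presumably be the stdout of subprocess.Popen(["svn", "cat", url], stdout=subprocess.PIPE). The trade-off is that the archive is still downloaded up to the wanted member, but nothing is written to disk.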




Answer 2:


Perhaps you want something like this?

#!/usr/local/cpython-3.3/bin/python

import tarfile as tarfile_mod

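def main():
    # tarfile.open() is the documented entry point and also handles compressed archives
    tarfile = tarfile_mod.open('tar-archive.tar', 'r')
    if False:
        # Option 1: read the member's contents into memory
        file_ = tarfile.extractfile('etc/protocols')
        print(file_.read())
    else:
        # Option 2: write the member to ./etc/protocols
        tarfile.extract('etc/protocols')
    tarfile.close()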

main()



Answer 3:


It sounds like you have two parts to your question:

  1. Fetching a single tar bundle from the SVN repo, without the rest of the repo's files.
  2. Using Python to extract two files from the retrieved bundle.

For the first part, I'll simply refer to this post on svn export and sparse checkouts.

For the second part, here is a solution for extracting the two files from the retrieved tarball:

import tarfile

files_i_want = ['path/to/file1', 'path/to/file2']

# The with block closes the archive once extraction is done
with tarfile.open("bundle.tar") as tar:
    tar.extractall(members=[x for x in tar.getmembers() if x.name in files_i_want])
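Note that filtering getmembers() this way silently skips any requested path that is not in the archive. A small sketch of how one might detect that before extracting follows. The helper select_members and the in-memory demo archive are assumptions for illustration only.

```python
import io
import tarfile

def select_members(tar, wanted):
    """Return (members_found, names_missing) for the requested member paths."""
    wanted = set(wanted)
    found = [m for m in tar.getmembers() if m.name in wanted]
    missing = sorted(wanted - {m.name for m in found})
    return found, missing

# In-memory archive for demonstration; it contains file1 but not file2.
buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w") as tar:
    for name in ("path/to/file1", "path/to/other"):
        data = name.encode()
        info = tarfile.TarInfo(name)
        info.size = len(data)
        tar.addfile(info, io.BytesIO(data))
buf.seek(0)

with tarfile.open(fileobj=buf) as tar:
    found, missing = select_members(tar, ["path/to/file1", "path/to/file2"])

print([m.name for m in found], missing)
```

The found list can be passed straight to extractall(members=found), while a non-empty missing list is a signal to check the paths before going further.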


Source: https://stackoverflow.com/questions/20434912/is-it-possible-to-extract-single-file-from-tar-bundle-in-python
