This is my first post on Stack Overflow. I have a question about extracting a single file from a gzip-compressed TAR archive. I'm not the best at Python, so I may be doing this incorrectly; any help would be much appreciated.
Scenario:
A corrupted *.tar.gz file comes in. The first file in the archive contains the information needed to obtain the SN of the system. This can be used to identify the machine so that we can notify its administrator that the file was corrupted.
The Problem:
Using the regular UNIX tar binary, I am able to extract just the README file from the archive, even though the archive is incomplete and returns an error when extracted fully. In Python, however, I am unable to extract just one file: it always raises an exception, even when I specify only that single file.
Current Workaround:
I'm using "os.popen" to call the UNIX tar binary and extract just the README file.
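For reference, the workaround looks roughly like the sketch below. It is not the exact command in use: it swaps in subprocess.run for os.popen, and the function name read_member_with_tar and the -O (write to stdout) flag are assumptions made for illustration.

```python
import subprocess


def read_member_with_tar(archive_path, member_name):
    """Return the bytes of one archive member, using the system tar binary."""
    # -x extract, -z gunzip, -O write the member to stdout, -f archive path.
    result = subprocess.run(
        ["tar", "-xzOf", archive_path, member_name],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
    )
    # tar exits non-zero on a truncated archive even when the requested
    # member was read, so the exit status is deliberately not checked here.
    return result.stdout
```

Calling read_member_with_tar("bundle.tar.gz", "README") should hand back the README contents as bytes, with tar's complaints about the truncated archive left in result.stderr.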
Desired Solution:
To use the Python tarfile package to extract just the single file.
Example Error:
UNIX (Works):
    [root@athena tmp]# tar -xvzf bundle.tar.gz README
    README

    gzip: stdin: unexpected end of file
    tar: Unexpected EOF in archive
    tar: Error is not recoverable: exiting now
    [root@athena tmp]#
    [root@athena tmp]# ls
    bundle.tar.gz  README
Python:
    >>> import tarfile
    >>> tar = tarfile.open("bundle.tar.gz")
    >>> data = tar.extractfile("README").read()
    Traceback (most recent call last):
      File "<stdin>", line 1, in ?
      File "/usr/lib64/python2.4/tarfile.py", line 1364, in extractfile
        tarinfo = self.getmember(member)
      File "/usr/lib64/python2.4/tarfile.py", line 1048, in getmember
        tarinfo = self._getmember(name)
      File "/usr/lib64/python2.4/tarfile.py", line 1762, in _getmember
        members = self.getmembers()
      File "/usr/lib64/python2.4/tarfile.py", line 1059, in getmembers
        self._load()        # all members, we first have to
      File "/usr/lib64/python2.4/tarfile.py", line 1778, in _load
        tarinfo = self.next()
      File "/usr/lib64/python2.4/tarfile.py", line 1588, in next
        self.fileobj.seek(self.offset)
      File "/usr/lib64/python2.4/gzip.py", line 377, in seek
        self.read(1024)
      File "/usr/lib64/python2.4/gzip.py", line 225, in read
        self._read(readsize)
      File "/usr/lib64/python2.4/gzip.py", line 273, in _read
        self._read_eof()
      File "/usr/lib64/python2.4/gzip.py", line 309, in _read_eof
        raise IOError, "CRC check failed"
    IOError: CRC check failed
    >>> print data
    Traceback (most recent call last):
      File "<stdin>", line 1, in ?
    NameError: name 'data' is not defined
Python (Handling Exception):
    >>> tar = tarfile.open("bundle.tar.gz")
    >>> try:
    ...     data = tar.extractfile("README").read()
    ... except:
    ...     pass
    ...
    >>> print(data)
    Traceback (most recent call last):
      File "<stdin>", line 1, in ?
    NameError: name 'data' is not defined
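The first traceback shows extractfile("README") going through getmember() and getmembers(), which loads every member and therefore runs into the damaged tail of the archive before README is ever returned. One direction that might avoid this is to open the archive in tarfile's forward-only stream mode ("r|gz") and stop at the first regular file, so nothing past it is read. The sketch below is written for Python 3, where TarFile works with the with statement; on the Python 2.4 shown above the syntax would need adapting. The function name read_first_member is hypothetical, and the approach assumes the first member lies entirely within the intact part of the file.

```python
import tarfile


def read_first_member(archive_path):
    """Return (name, data) for the first regular file in a .tar.gz archive.

    Returns (None, None) if no regular file is found.
    """
    # "r|gz" reads the archive as a forward-only stream, so tarfile does not
    # build a list of all members before handing one back.
    with tarfile.open(archive_path, mode="r|gz") as tar:
        for member in tar:
            if member.isfile():
                # Read the member right away: a stream cannot seek backwards.
                return member.name, tar.extractfile(member).read()
    return None, None
```

With the bundle from the example, name, data = read_first_member("bundle.tar.gz") would be expected to give back the README contents, since the function returns before tarfile reaches the truncated part of the archive.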