Extract zip to memory, parse contents

非 Y 不嫁゛ submitted on 2019-12-02 10:57:58

Question


I want to read the contents of a zip file into memory rather than extracting them to disc, find a particular file in the archive, open the file and extract a line from it.

Can a StringIO instance be opened and parsed? Suggestions? Thanks in advance.

zfile = ZipFile('name.zip', 'r')

for name in zfile.namelist():
    if fnmatch.fnmatch(name, '*_readme.xml'):
        name = StringIO.StringIO()
        print name # prints StringIO instances
        open(name, 'r')  # IO Error: No such file or directory...

I found a few similar posts, but none that seem to address this issue: Extracting a zipfile to memory?


Answer 1:


IMO just using read is enough:

import fnmatch
from zipfile import ZipFile

zfile = ZipFile('name.zip', 'r')
files = []
for name in zfile.namelist():
  if fnmatch.fnmatch(name, '*_readme.xml'):
    files.append(zfile.read(name))  # read() returns the member's full contents

This builds a list holding the contents of every file that matches the pattern.

You can then parse the contents afterwards by iterating through the list:

for file in files:
  print(file[0:min(35,len(file))].decode()) # "parsing"

Or, better, pass each file's contents to a function as you read them:

import fnmatch
import sys
import zipfile

zip_name = sys.argv[1]  # path to the archive, given on the command line

def parse(contents, member_name=""):
  if member_name:
    print("Parsed `{}`:".format(member_name))
  print(contents[:35].decode())  # "parsing": show the first 35 bytes

with zipfile.ZipFile(zip_name, 'r') as zfile:
  for name in zfile.namelist():
    if fnmatch.fnmatch(name, '*.cpp'):
      parse(zfile.read(name), name)

This way each file's contents can be discarded as soon as they have been parsed, so the memory footprint is smaller. That can matter if the files are large.
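The same one-member-at-a-time idea can also be written as a generator. This is a minimal sketch assuming Python 3; `iter_matching` and the in-memory demo archive are hypothetical names made up for illustration, not part of the answer above.

```python
import fnmatch
import io
import zipfile


def iter_matching(archive, pattern):
    """Yield (member_name, contents_as_bytes) for each member matching pattern."""
    with zipfile.ZipFile(archive, 'r') as zfile:
        for name in zfile.namelist():
            if fnmatch.fnmatch(name, pattern):
                # Only this one member's bytes are read at this point.
                yield name, zfile.read(name)


# Build a small demo archive in memory so the sketch runs without a file on disk.
demo = io.BytesIO()
with zipfile.ZipFile(demo, 'w') as zf:
    zf.writestr('src/main.cpp', 'int main() { return 0; }\n')
    zf.writestr('notes.txt', 'not a match\n')

for member_name, contents in iter_matching(demo, '*.cpp'):
    print("Parsed `{}`:".format(member_name))
    print(contents[:35].decode())
```

Because the generator yields one member at a time, the caller decides whether to keep the bytes or drop them after parsing.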




Answer 2:


The question you link shows you that you need to read the file. Depending on your use case that may already be enough. In your code you replace the loop variable holding a filename with an empty string buffer. Try something like this:

import fnmatch
from zipfile import ZipFile

zfile = ZipFile('name.zip', 'r')

for name in zfile.namelist():
    if fnmatch.fnmatch(name, '*_readme.xml'):
        ex_file = zfile.open(name)  # this is a file-like object
        content = ex_file.read()    # whole file as one string (bytes in Python 3)

If you really want a buffer that you can manipulate, then simply instantiate it with the contents:

buf = StringIO(zfile.open(name).read())  # Python 2; in Python 3 use io.BytesIO(...)

Note the differences between Python 2 and 3 here: in Python 3, read() returns bytes, so the buffer should be an io.BytesIO rather than a StringIO.
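As a sketch of the Python 3 side of that difference: `ZipFile.read` returns `bytes`, so the matching buffer type is `io.BytesIO`, and `io.TextIOWrapper` can decode it into text lines. The archive contents, the member name, and the UTF-8 encoding are assumptions made for this demo.

```python
import io
import zipfile

# Demo archive built in memory; the member name and contents are made up.
archive = io.BytesIO()
with zipfile.ZipFile(archive, 'w') as zf:
    zf.writestr('pkg_readme.xml', '<root>\n<foo>bar</foo>\n</root>\n')

with zipfile.ZipFile(archive, 'r') as zfile:
    raw = zfile.read('pkg_readme.xml')  # bytes in Python 3

buf = io.BytesIO(raw)                             # seekable in-memory binary buffer
text = io.TextIOWrapper(buf, encoding='utf-8')    # assumes the member is UTF-8 text
lines = [line.rstrip('\n') for line in text]
print(lines)
```

If the encoding of the member is not known in advance, working on the raw bytes may be the safer choice.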




Answer 3:


Don't overthink it. It Just Works:

import zipfile

# 1) I want to read the contents of a zip file ...
with zipfile.ZipFile('A-Zip-File.zip') as zipper:
  # 2) ... find a particular file in the archive, open the file ...
  with zipper.open('A-Particular-File.txt') as fp:
    # 3) ... and extract a line from it.
    first_line = fp.readline()

print(first_line)



Answer 4:


Thank you to everyone that contributed solutions. This is what ended up working for me:

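import fnmatch
import re
from zipfile import ZipFile

zfile = ZipFile('name.zip', 'r')

for name in zfile.namelist():
    if fnmatch.fnmatch(name, '*_readme.xml'):
        zopen = zfile.open(name)
        for line in zopen:  # lines are bytes in Python 3, hence the bytes pattern
            if re.match(b'(.*)<foo>(.*)</foo>(.*)', line):
                print(line)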
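Since the matched member is XML, one possible alternative to scanning lines with a regular expression is to hand the file-like object to an XML parser. This is only a sketch, assuming Python 3 and a well-formed document; the demo archive, its member name, and its contents are made up for illustration.

```python
import fnmatch
import io
import zipfile
import xml.etree.ElementTree as ET

# Demo archive built in memory; names and contents are hypothetical.
archive = io.BytesIO()
with zipfile.ZipFile(archive, 'w') as zf:
    zf.writestr('pkg_readme.xml', '<root><foo>bar</foo><baz>qux</baz></root>')

foo_values = []
with zipfile.ZipFile(archive, 'r') as zfile:
    for name in zfile.namelist():
        if fnmatch.fnmatch(name, '*_readme.xml'):
            with zfile.open(name) as member:
                tree = ET.parse(member)  # parses straight from the file-like object
            # Collect the text of every <foo> element in this member.
            foo_values.extend(el.text for el in tree.iter('foo'))

print(foo_values)
```

Unlike the line-based regular expression, this does not depend on `<foo>` and `</foo>` sitting on the same line.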


Source: https://stackoverflow.com/questions/23569659/extract-zip-to-memory-parse-contents
