Zipping files in python

夙愿已清 提交于 2019-12-04 02:43:34
Cédric Julien

When you do this :

for list in dirlist:
        get_file = os.path.join(absolute_path, list)
        zip_name = zipfile.ZipFile(filepath, 'w')
        zip_name.write(get_file, 'Project2b\\' + list)

you recreate a ZipFile for each file you want to zip, the "w" mode means you recreate it from scratch.

Try this (create the zip file before the loop) :

zip_name = zipfile.ZipFile(filepath, 'w')
for list in dirlist:
        get_file = os.path.join(absolute_path, list)
        zip_name.write(get_file, 'Project2b\\' + list)

Or this, it will open the zipfile in append mode:

for list in dirlist:
        get_file = os.path.join(absolute_path, list)
        zip_name = zipfile.ZipFile(filepath, 'a')
        zip_name.write(get_file, 'Project2b\\' + list)

Have a look at the shutil module. There is an example using shutil.make_archive():

http://docs.python.org/library/shutil.html

If you have a lot of files you can zip them in parallel:

import zipfile
from pathlib import Path, WindowsPath
from typing import List, Text
import logging
from time import time
from concurrent.futures import ThreadPoolExecutor

logging.basicConfig(
    format="%(asctime)s - %(message)s", datefmt="%H:%M:%S", level=logging.DEBUG
)

PATH = (r"\\some_directory\subdirectory\zipped")


def file_names() -> List[WindowsPath]:
    p = Path(PATH)
    file_names = list(p.glob("./*.csv"))
    logging.info("There are %d files", len(file_names))
    return file_names


def zip_file(file: WindowsPath) -> None:
    zip_file_name = Path(PATH, f"{file.stem}.zip")
    with zipfile.ZipFile(zip_file_name, "w") as zip:
        zip.write(file, arcname=file.name, compress_type=zipfile.ZIP_DEFLATED)


def main(files: List[Text]) -> None:
    t0 = time()
    number_of_files = len(files)
    with ThreadPoolExecutor() as executor:
        for counter, _ in enumerate(executor.map(zip_file, files), start=1):
            # update progress every 100 files
            if counter % 100 == 0:
                logging.info(
                    "Processed %d/%d. TT: %d:%d",
                    counter,
                    number_of_files,
                    *divmod(int(time() - t0), 60),
                )

    logging.info(
        "Finished zipping %d files. Total time: %d:%d",
        len(files),
        *divmod(int(time() - t0), 60),
    )


if __name__ == "__main__":
    files = file_names()
    main(files)

Best way to do this is by putting debug statements at your for loops, there are two possibilities;

one is that the first forloop only downloads one file from the ftp folder

two is that the first loop downloads all files but second loop zips only one of them

use print statements to see which files are downloaded/zipped at the loops, good luck

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!