Compress a file into different parts in Python

a 夏天 submitted on 2020-07-31 03:20:49

Question


Is there a way in Python (preferably 2.7) to compress a file into several equally sized .zip files?

The result would be something like this (let's assume a 200MB part size is selected and a 1100MB file is being compressed):

compressed_file.zip.001 (200MB)
compressed_file.zip.002 (200MB)
compressed_file.zip.003 (200MB)
compressed_file.zip.004 (200MB)
compressed_file.zip.005 (200MB)
compressed_file.zip.006 (100MB)
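That naming follows from simple arithmetic: 1100MB cut into 200MB parts needs ceil(1100 / 200) = 6 parts, and the last one holds 1100 - 5 × 200 = 100MB. A small sketch of the calculation, where plan_parts is a hypothetical helper (not part of any library) that only computes names and sizes and writes nothing:

```python
def plan_parts(total_size, part_size, base_name):
    """Hypothetical helper: return (part_name, size) pairs for a split archive."""
    parts = []
    offset = 0
    while offset < total_size:
        # Every part is full except possibly the last one
        size = min(part_size, total_size - offset)
        # Parts are numbered from 1 with a three-digit suffix: .001, .002, ...
        parts.append(("%s.%03d" % (base_name, len(parts) + 1), size))
        offset += size
    return parts


MB = 1024 * 1024
for name, size in plan_parts(1100 * MB, 200 * MB, "compressed_file.zip"):
    print("%s (%dMB)" % (name, size // MB))
```

Run as is, the loop prints the same six lines as the list above.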

Answer 1:


I think you can do it with a shell command, something like:

gzip -c /path/to/your/large/file | split -b 150000000 - compressed.gz

You can then run that command from Python.
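One way to run it from Python is sketched below. It chains two subprocess.Popen objects, the pattern the subprocess documentation describes for replacing a shell pipeline, so no shell is involved. gzip_and_split is a hypothetical name, and the sketch assumes the gzip and split command-line tools are installed and on PATH. Note that the result is a set of gzip pieces, not .zip files.

```python
import subprocess


def gzip_and_split(src_path, part_size, prefix):
    """Hypothetical helper equivalent to:
        gzip -c SRC_PATH | split -b PART_SIZE - PREFIX
    Assumes the gzip and split tools are available on PATH."""
    gzip_proc = subprocess.Popen(["gzip", "-c", src_path], stdout=subprocess.PIPE)
    # split reads the compressed stream from stdin ("-") and cuts it into pieces
    split_proc = subprocess.Popen(
        ["split", "-b", str(part_size), "-", prefix], stdin=gzip_proc.stdout
    )
    # Drop our copy of the pipe so gzip receives SIGPIPE if split exits early
    gzip_proc.stdout.close()
    split_status = split_proc.wait()
    gzip_status = gzip_proc.wait()
    if gzip_status != 0 or split_status != 0:
        raise RuntimeError(
            "gzip | split failed (gzip=%d, split=%d)" % (gzip_status, split_status)
        )
```

For example, gzip_and_split("/path/to/your/large/file", 150000000, "compressed.gz") would write compressed.gzaa, compressed.gzab, and so on (split's default suffixes). Concatenating the pieces in order and decompressing, e.g. cat compressed.gz* | gunzip > file, restores the original.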

Regards

Ganesh J




Answer 2:


NB: This assumes that the result is just a ZIP file chopped into pieces, without any extra headers or other metadata.

According to the docs, ZipFile can be passed a file-like object to use for its I/O. We should therefore be able to give it our own object that implements the necessary subset of the file protocol and splits the output across multiple files.

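As it turns out, when the archive is built with ZipFile.writestr(), we only need to implement three methods (in Python 2.7, ZipFile.write() additionally seeks back to rewrite each local file header, so it would need seek() as well):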

  • tell() -- return the number of bytes written so far
  • write(str) -- write to the current file until it is full, then open a new file, repeating until all the data is written
  • flush() -- flush the currently open file
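A quick way to check that these three methods are enough is to hand ZipFile an object that has nothing else. The sketch below is written for Python 3, whose zipfile falls back to writing data descriptors when the output has no seek(). TellWriteFlushOnly is a hypothetical name, and it buffers into memory only so the archive can be read back afterwards.

```python
import io
import zipfile


class TellWriteFlushOnly(object):
    """Hypothetical output object offering only tell(), write() and flush()."""

    def __init__(self):
        self._buffer = io.BytesIO()  # in-memory stand-in for the part files
        self.calls = set()           # names of the methods ZipFile used

    def tell(self):
        self.calls.add("tell")
        return self._buffer.tell()

    def write(self, data):
        self.calls.add("write")
        return self._buffer.write(data)

    def flush(self):
        self.calls.add("flush")

    def getvalue(self):
        # Not part of the file protocol; only used to inspect the result
        return self._buffer.getvalue()


payload = b"some data " * 1000
sink = TellWriteFlushOnly()
with zipfile.ZipFile(sink, mode="w", compression=zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("test0000.txt", payload)

# Open the captured bytes with a normal, seekable BytesIO to check validity
with zipfile.ZipFile(io.BytesIO(sink.getvalue())) as check:
    assert check.read("test0000.txt") == payload
print(sorted(sink.calls))
```

If the archive reads back correctly, ZipFile managed with only the methods listed above, since the object has no others.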

Prototype Script

# Written for Python 2.7; print_function and // also let it run on Python 3
# (the trace below comes from 2.7, a Python 3 trace will differ).
from __future__ import print_function

import random
import zipfile


def get_random_data(length):
    return "".join([chr(random.randrange(256)) for i in range(length)])


class MultiFile(object):
    """Write-only file-like object that spreads its output over numbered
    part files (name.001, name.002, ...) of at most max_file_size bytes."""

    def __init__(self, file_name, max_file_size):
        self.current_position = 0
        self.file_name = file_name
        self.max_file_size = max_file_size
        self.current_file = None
        self.open_next_file()

    @property
    def current_file_no(self):
        # Floor division keeps the part number an integer on Python 3 too
        return self.current_position // self.max_file_size

    @property
    def current_file_size(self):
        return self.current_position % self.max_file_size

    @property
    def current_file_capacity(self):
        return self.max_file_size - self.current_file_size

    def open_next_file(self):
        file_name = "%s.%03d" % (self.file_name, self.current_file_no + 1)
        print("* Opening file '%s'..." % file_name)
        if self.current_file is not None:
            self.current_file.close()
        self.current_file = open(file_name, 'wb')

    def tell(self):
        print("MultiFile::Tell -> %d" % self.current_position)
        return self.current_position

    def write(self, data):
        start, end = 0, len(data)
        print("MultiFile::Write (%d bytes)" % len(data))
        while start < end:
            # Write only as much as still fits into the current part
            current_block_size = min(end - start, self.current_file_capacity)
            self.current_file.write(data[start:start+current_block_size])
            print("* Wrote %d bytes." % current_block_size)
            start += current_block_size
            self.current_position += current_block_size
            # The current part is full: continue in the next one
            if self.current_file_capacity == self.max_file_size:
                self.open_next_file()
            print("* Capacity = %d" % self.current_file_capacity)

    def flush(self):
        print("MultiFile::Flush")
        self.current_file.flush()

    def close(self):
        # Close the last part explicitly instead of relying on garbage collection
        self.current_file.close()


mfo = MultiFile('splitzip.zip', 2**18)

zf = zipfile.ZipFile(mfo, mode='w', compression=zipfile.ZIP_DEFLATED)


for i in range(4):
    filename = 'test%04d.txt' % i
    print("Adding file '%s'..." % filename)
    zf.writestr(filename, get_random_data(2**17))

# Writes the central directory; ZipFile never closes a file object it was given
zf.close()
mfo.close()

Trace Output

* Opening file 'splitzip.zip.001'...
Adding file 'test0000.txt'...
MultiFile::Tell -> 0
MultiFile::Write (42 bytes)
* Wrote 42 bytes.
* Capacity = 262102
MultiFile::Write (131112 bytes)
* Wrote 131112 bytes.
* Capacity = 130990
MultiFile::Flush
Adding file 'test0001.txt'...
MultiFile::Tell -> 131154
MultiFile::Write (42 bytes)
* Wrote 42 bytes.
* Capacity = 130948
MultiFile::Write (131112 bytes)
* Wrote 130948 bytes.
* Opening file 'splitzip.zip.002'...
* Capacity = 262144
* Wrote 164 bytes.
* Capacity = 261980
MultiFile::Flush
Adding file 'test0002.txt'...
MultiFile::Tell -> 262308
MultiFile::Write (42 bytes)
* Wrote 42 bytes.
* Capacity = 261938
MultiFile::Write (131112 bytes)
* Wrote 131112 bytes.
* Capacity = 130826
MultiFile::Flush
Adding file 'test0003.txt'...
MultiFile::Tell -> 393462
MultiFile::Write (42 bytes)
* Wrote 42 bytes.
* Capacity = 130784
MultiFile::Write (131112 bytes)
* Wrote 130784 bytes.
* Opening file 'splitzip.zip.003'...
* Capacity = 262144
* Wrote 328 bytes.
* Capacity = 261816
MultiFile::Flush
MultiFile::Tell -> 524616
MultiFile::Write (46 bytes)
* Wrote 46 bytes.
* Capacity = 261770
MultiFile::Write (12 bytes)
* Wrote 12 bytes.
* Capacity = 261758
MultiFile::Write (0 bytes)
MultiFile::Write (0 bytes)
MultiFile::Write (46 bytes)
* Wrote 46 bytes.
* Capacity = 261712
MultiFile::Write (12 bytes)
* Wrote 12 bytes.
* Capacity = 261700
MultiFile::Write (0 bytes)
MultiFile::Write (0 bytes)
MultiFile::Write (46 bytes)
* Wrote 46 bytes.
* Capacity = 261654
MultiFile::Write (12 bytes)
* Wrote 12 bytes.
* Capacity = 261642
MultiFile::Write (0 bytes)
MultiFile::Write (0 bytes)
MultiFile::Write (46 bytes)
* Wrote 46 bytes.
* Capacity = 261596
MultiFile::Write (12 bytes)
* Wrote 12 bytes.
* Capacity = 261584
MultiFile::Write (0 bytes)
MultiFile::Write (0 bytes)
MultiFile::Tell -> 524848
MultiFile::Write (22 bytes)
* Wrote 22 bytes.
* Capacity = 261562
MultiFile::Write (0 bytes)
MultiFile::Flush

Directory Listing

-rw-r--r-- 1   2228 Feb 21 23:44 splitzip.py
-rw-r--r-- 1 262144 Feb 22 00:07 splitzip.zip.001
-rw-r--r-- 1 262144 Feb 22 00:07 splitzip.zip.002
-rw-r--r-- 1    582 Feb 22 00:07 splitzip.zip.003

Validation

>7z l splitzip.zip.001

7-Zip [64] 9.20  Copyright (c) 1999-2010 Igor Pavlov  2010-11-18

Listing archive: splitzip.zip.001

--
Path = splitzip.zip.001
Type = Split
Volumes = 3
----
Path = splitzip.zip
Size = 524870
--
Path = splitzip.zip
Type = zip
Physical Size = 524870

   Date      Time    Attr         Size   Compressed  Name
------------------- ----- ------------ ------------  ------------------------
2019-02-22 00:07:34 .....       131072       131112  test0000.txt
2019-02-22 00:07:34 .....       131072       131112  test0001.txt
2019-02-22 00:07:36 .....       131072       131112  test0002.txt
2019-02-22 00:07:36 .....       131072       131112  test0003.txt
------------------- ----- ------------ ------------  ------------------------
                                524288       524448  4 files, 0 folders
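The same validation can be done from Python by joining the parts in order and opening the result with zipfile. A minimal sketch, assuming three-digit part suffixes as above; open_split_zip is a hypothetical name, it needs Python 3.4+ for glob.escape, and it reads every part into memory, so it suits a quick check rather than a multi-gigabyte archive:

```python
import glob
import io
import zipfile


def open_split_zip(base_name):
    """Hypothetical helper: join base_name.001, base_name.002, ... in order
    and open the result as an in-memory ZIP archive."""
    # glob.escape keeps [ ] * ? in the path from acting as wildcards
    pattern = glob.escape(base_name) + ".[0-9][0-9][0-9]"
    part_names = sorted(glob.glob(pattern))  # zero-padded, so sorting is numeric
    if not part_names:
        raise OSError("no parts found for %r" % base_name)
    buf = io.BytesIO()
    for name in part_names:
        with open(name, "rb") as part:
            buf.write(part.read())
    buf.seek(0)
    return zipfile.ZipFile(buf)
```

For the files above, open_split_zip("splitzip.zip").namelist() should list test0000.txt to test0003.txt, and testzip() on the result should return None (it returns the name of the first member whose CRC does not match, or None if all are fine).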


Source: https://stackoverflow.com/questions/54809238/compress-a-file-into-different-parts-in-python
