Python - How to gzip a large text file without MemoryError?

Front-end · Unresolved · 3 answers · 646 views
温柔的废话 2020-12-16 01:21

I use the following simple Python script to compress a large text file (say, 10GB) on an EC2 m3.large instance. However, I always get a MemoryError.

3 Answers
  •  死守一世寂寞
    2020-12-16 02:00

    It is weird to get a memory error even when reading a file line by line. I suppose it is because you have very little available memory and very long lines. You should then use binary reads:

    import gzip
    
    # Adapt the chunk size: smaller values take more time,
    # larger values could cause memory errors.
    size = 8096
    
    with open('test_large.csv', 'rb') as f_in:
        with gzip.open('test_out.csv.gz', 'wb') as f_out:
            while True:
                data = f_in.read(size)
                if not data:  # read() returns b'' at EOF in binary mode
                    break
                f_out.write(data)
    
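As a side note, the standard library can handle the chunked copy loop for you. A minimal sketch using `shutil.copyfileobj`, which streams between file objects in fixed-size chunks (64 KiB by default) so memory use stays constant; the sample input written at the top is only for demonstration and stands in for the real 10GB file:

```python
import gzip
import shutil

# Create a small sample input for demonstration; in practice this would
# be the existing large file on disk.
with open('test_large.csv', 'wb') as f:
    f.write(b'some,sample,rows\n' * 1000)

# copyfileobj reads and writes in fixed-size chunks, so only one chunk
# is ever held in memory at a time.
with open('test_large.csv', 'rb') as f_in:
    with gzip.open('test_out.csv.gz', 'wb') as f_out:
        shutil.copyfileobj(f_in, f_out)
```

This is equivalent to the explicit `while` loop above, just shorter and less error-prone.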
