I use the following simple Python script to compress a large text file (say, 10GB) on an EC2 m3.large instance. However, I always get a MemoryError.
It is odd to get a memory error even when reading a file line by line. I suppose it happens because you have very little available memory and very long lines. In that case, you should use binary reads of a fixed size:
import gzip

# Adapt the chunk size: smaller values take more time, very large values could cause memory errors.
size = 8096

with open('test_large.csv', 'rb') as f_in:
    with gzip.open('test_out.csv.gz', 'wb') as f_out:
        while True:
            data = f_in.read(size)
            if not data:  # in binary mode, read() returns b'' at end of file
                break
            f_out.write(data)
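As an alternative sketch using the same file names as above, the standard library's shutil.copyfileobj does the same chunked copy for you (64 KB chunks by default), so you avoid writing the loop by hand:

import gzip
import shutil

# Copy the source file into the gzip stream chunk by chunk,
# keeping memory usage constant regardless of file size.
with open('test_large.csv', 'rb') as f_in:
    with gzip.open('test_out.csv.gz', 'wb') as f_out:
        shutil.copyfileobj(f_in, f_out)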