I have a text file of about 50 GB. For each line, I check the first few characters and write the line to a separate output file chosen by that beginning text.
For example.
The IO operations take too much time, and so does opening and closing the output files over and over.
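For reference, this is a guess at the slow pattern the question seems to describe, with hypothetical names ('input.txt', the 3-character prefix) standing in for the real ones: the output file is reopened and closed for every single line.

# assumed slow approach: reopen the output file for every line
with open('input.txt') as f:                       # placeholder input path
    for line in f:
        prefix = line[:3]                          # "first few characters" of the line
        with open(prefix + '.txt', 'a') as out:    # open/close once per line
            out.write(line)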
It's much faster if you open the files once (the input and the outputs), collect the processed text in a memory buffer of, say, 10MB, and only then write the buffer to the output file. For example:
files = {'dog': None}           # output file handles, opened lazily
filenames = {'dog': 'dog.txt'}  # output path for each prefix
buffers = {'dog': ''}           # in-memory write buffer per prefix

with open('input.txt') as f:
    for line in f:
        for prefix in files:
            if line.startswith(prefix):
                if files[prefix] is None:                    # open the file on first use only
                    files[prefix] = open(filenames[prefix], 'a')
                remline = line[len(prefix):].rstrip('\n')    # rest of the line (keep the whole line if you prefer)
                buffers[prefix] += remline + '\n'
                if len(buffers[prefix]) > 1024 * 1024 * 10:  # roughly 10MB of text
                    files[prefix].write(buffers[prefix])
                    buffers[prefix] = ''

for prefix, handle in files.items():
    if handle is not None:
        handle.write(buffers[prefix])                        # flush whatever is left
        handle.close()
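The speedup comes from two things: each output file is opened exactly once and kept open, and the writes are batched, so the operating system sees one large write every ~10MB instead of one tiny write per line. If you don't want to manage the buffers yourself, a simpler alternative worth trying is to let Python do the batching by passing a bigger buffer size when opening, e.g. open(filenames['dog'], 'a', buffering=10 * 1024 * 1024).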