I have a large text file that I need to parse into a pipe-delimited text file using Python. The file looks like this (basically):
product/productId: D7SDF9S9
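Don't read the whole file into memory at once. Instead, iterate over it line by line, group the lines into records, and let Python's csv module write the delimited output:

    import csv
    from itertools import groupby

    # newline='' lets csv.writer control the line endings itself (Python 3).
    with open('hugeinputfile.txt') as infile, open('outputfile.txt', 'w', newline='') as outfile:
        writer = csv.writer(outfile, delimiter='|')
        # Assumes blank lines separate records; groupby yields each run of non-blank lines.
        for has_text, lines in groupby(infile, key=lambda line: bool(line.strip())):
            if has_text:
                # Keep the text after the first ':' on each "key: value" line.
                writer.writerow([line.partition(':')[2].strip() for line in lines])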
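One reason to go through csv.writer rather than joining the values with '|' yourself: a value may contain the delimiter. The sketch below is only an illustration on in-memory data. The row contents are made up (only D7SDF9S9 comes from the question), and io.StringIO stands in for the real output file.

```python
import csv
import io

# Hypothetical record whose second value happens to contain a pipe.
row = ["D7SDF9S9", "great | would buy again"]

# csv.writer quotes any field that contains the delimiter.
buf = io.StringIO()
csv.writer(buf, delimiter='|', lineterminator='\n').writerow(row)
written = buf.getvalue()
print(written)

# Reading the line back with the same delimiter recovers both fields intact.
parsed = next(csv.reader(io.StringIO(written), delimiter='|'))

# A plain join has no quoting, so splitting it again yields three pieces.
naive_fields = '|'.join(row).split('|')
print(len(parsed), len(naive_fields))
```

If your data can never contain a pipe, the two approaches produce the same file, but the csv version costs nothing extra and keeps the output readable by any CSV-aware tool.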
A couple of things to note here:
Use with to open files. Why? Because using with ensures that the file is close()d, even if an exception interrupts the script. Thus:
    with open('myfile.txt') as f:
        do_stuff_to_file(f)
is equivalent to:
    f = open('myfile.txt')
    try:
        do_stuff_to_file(f)
    finally:
        f.close()
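You can check the "even if an exception interrupts the script" part yourself. This is a small sketch, not part of the conversion script: it writes to a throwaway temporary file, and the RuntimeError is raised on purpose to simulate a failure partway through.

```python
import os
import tempfile

# Create a throwaway file so the example does not touch real data.
fd, path = tempfile.mkstemp(suffix='.txt')
os.close(fd)

try:
    with open(path, 'w') as f:
        f.write('partial output')
        # Simulated failure in the middle of processing.
        raise RuntimeError('simulated failure')
except RuntimeError:
    pass

# The with block closed the file on the way out, despite the exception.
closed_after_error = f.closed
print(closed_after_error)

os.remove(path)
```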
To be continued... (I'm out of time ATM)