Parsing a large (9GB) file using Python

陌清茗 2021-01-03 05:57

I have a large text file that I need to parse into a pipe-delimited text file using Python. The file looks like this (basically):

product/productId: D7SDF9S9         


        
3 Answers
  •  太阳男子
    2021-01-03 06:57

    Don't read the whole file into memory at once; instead, iterate over it line by line, and use Python's csv module to write the pipe-delimited output:

    import csv
    
    with open('hugeinputfile.txt') as infile, open('outputfile.txt', 'w', newline='') as outfile:
        writer = csv.writer(outfile, delimiter='|')
        record = []
        for line in infile:  # stream line by line -- the 9 GB file is never loaded whole
            if line.strip():
                record.append(line.strip())
            elif record:  # a blank line marks the end of a record
                writer.writerow([item.split(':', 1)[-1].strip() for item in record])
                record = []
        if record:  # flush the last record if the file doesn't end with a blank line
            writer.writerow([item.split(':', 1)[-1].strip() for item in record])
    
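    To illustrate, here's the same record-splitting logic run on a tiny in-memory sample. The second field name (review/score) is hypothetical, made up for the demo; only product/productId appears in the question, and the real file's fields may differ:

    ```python
    import csv
    import io

    # Two hypothetical records in the same "key: value" / blank-line layout
    # as the question's file; review/score is an assumed field name.
    sample = """\
    product/productId: D7SDF9S9
    review/score: 5.0

    product/productId: B000UA0QIQ
    review/score: 2.0

    """

    out = io.StringIO()
    writer = csv.writer(out, delimiter='|', lineterminator='\n')
    record = []
    for line in sample.splitlines():
        if line.strip():
            record.append(line.strip())
        elif record:  # blank line ends a record
            writer.writerow([item.split(':', 1)[-1].strip() for item in record])
            record = []

    print(out.getvalue())
    # D7SDF9S9|5.0
    # B000UA0QIQ|2.0
    ```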

    A couple of things to note here:

    • Use with to open files. Why? Because using with ensures that the file is close()d, even if an exception interrupts the script.

    Thus:

    with open('myfile.txt') as f:
        do_stuff_to_file(f)
    

    is equivalent to:

    f = open('myfile.txt')
    try:
        do_stuff_to_file(f)
    finally:
        f.close()
    
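    To see that guarantee in action, here's a small sketch (using a throwaway temp file, not your real data) showing the file ends up closed even when the body raises:

    ```python
    import os
    import tempfile

    # Create a throwaway file to read from.
    path = os.path.join(tempfile.mkdtemp(), 'myfile.txt')
    with open(path, 'w') as f:
        f.write('hello')

    try:
        with open(path) as f:
            raise ValueError('something went wrong mid-read')
    except ValueError:
        pass

    print(f.closed)  # True: the with block closed the file despite the exception
    ```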

    To be continued... (I'm out of time ATM)
