Parse a plain text file into a CSV file using Python

星月不相逢 2020-12-09 23:08

I have a series of HTML files that are parsed into a single text file using Beautiful Soup. The HTML files are formatted such that their output is always three lines within

2 Answers
  •  甜味超标
    2020-12-09 23:49

    I'm not entirely sure what CSV library you're using, but it doesn't look like Python's built-in one. Anyway, here's how I'd do it:

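    import csv

    # Python 3: the built-in zip is lazy, so itertools.izip (Python 2 only) is not needed.
    with open('extracted.txt', 'r') as in_file:
        stripped = (line.strip() for line in in_file)
        lines = (line for line in stripped if line)
        # Passing the same generator three times makes zip yield consecutive triples.
        grouped = zip(*[lines] * 3)
        # newline='' lets the csv module control line endings itself.
        with open('extracted.csv', 'w', newline='') as out_file:
            writer = csv.writer(out_file)
            writer.writerow(('title', 'intro', 'tagline'))
            writer.writerows(grouped)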
    

    This forms a pipeline of sorts: it reads lines from the file, strips leading and trailing whitespace from each line, drops any empty lines, groups the remaining lines into tuples of three, and then (after writing the CSV header) writes those tuples to the CSV file.
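
    The grouping step is the least obvious part, so here is a minimal sketch of it on in-memory data. The function name group_in_threes and the sample lines are made up for illustration. Note that zip stops at the shortest input, so a trailing incomplete group is silently dropped.

```python
def group_in_threes(raw_lines):
    """Strip lines, drop blanks, and group the rest into tuples of three."""
    stripped = (line.strip() for line in raw_lines)
    non_empty = (line for line in stripped if line)
    # The same generator object appears three times in the argument list,
    # so each tuple produced by zip consumes three consecutive items.
    return list(zip(*[non_empty] * 3))


# Hypothetical sample input: blank lines and a dangling final line included.
sample = [
    "Title A\n", "\n", "  Intro A\n", "Tagline A\n",
    "Title B\n", "Intro B\n", "\n", "Tagline B\n",
    "Leftover\n",
]
records = group_in_threes(sample)
print(records)
```

    If incomplete trailing groups should be kept rather than dropped, itertools.zip_longest could be used in place of zip.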

    To combine the last two columns as you mentioned in the comments, you could change the writerow call to write a two-column header and change the writerows call to:

    writer.writerows((title, intro + tagline) for title, intro, tagline in grouped)
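
    As a rough illustration of the two-column variant, the sketch below writes to an in-memory buffer instead of a file. The header name 'description', the sample rows, and the space used to join intro and tagline are all assumptions, not something from the question; plain intro + tagline would join the two strings with no separator.

```python
import csv
import io

# Hypothetical, already-grouped records standing in for the real data.
grouped = [
    ("Title A", "Intro A", "Tagline A"),
    ("Title B", "Intro B", "Tagline B"),
]

out_file = io.StringIO()
writer = csv.writer(out_file)
# Two-column header to match the combined rows.
writer.writerow(("title", "description"))
# Join intro and tagline with a space (an assumed separator).
writer.writerows(
    (title, intro + " " + tagline) for title, intro, tagline in grouped
)
csv_text = out_file.getvalue()
print(csv_text)
```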
    
