Merging multiple CSV files without headers being repeated (using Python)

长情又很酷 2020-12-08 01:38

I am a beginner with Python. I have multiple CSV files (more than 10), and all of them have the same number of columns. I would like to merge all of them into a single CSV file without the header being repeated.

5 answers
  •  轻奢々 (original poster)
    2020-12-08 02:02

    Your attempt is almost working, but the issues are:

    • You're opening the file for reading but closing it before writing the rows.
    • You're never writing the header. You have to write it exactly once.
    • You also have to exclude output.csv from the glob results, otherwise the output file is itself read as an input.

    Here's the corrected code. It passes the csv reader object directly to the writer's writerows method, which is shorter and faster than looping over the rows, and it writes the header from the first file to the output file.

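    import csv
    import glob
    
    output_file = 'output.csv'
    header_written = False
    
    # newline="" lets the csv module control line endings (use "wb" in Python 2)
    with open(output_file, 'w', newline="") as fout:
        wout = csv.writer(fout, delimiter=',')
        # exclude the output file so it is not read back as an input
        interesting_files = [x for x in glob.glob("*.csv") if x != output_file]
        for filename in interesting_files:
            print('Processing {}'.format(filename))
            with open(filename, newline="") as fin:
                cr = csv.reader(fin, delimiter=",")
                header = next(cr, None)  # read (and skip) the header row
                if header is None:
                    continue  # empty file, nothing to copy
                if not header_written:
                    wout.writerow(header)  # write the header once, from the first file
                    header_written = True
                wout.writerows(cr)  # copy the remaining rows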
    

    Note that solutions using raw line-by-line processing miss an important point: if the header spans several lines (for example, a quoted field containing a newline), they fail badly, botching the header line or repeating part of it several times, and effectively corrupting the file.

    The csv module (and pandas, too) handles those cases gracefully.
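
To illustrate the multi-line header point, here is a minimal sketch that merges two in-memory "files" both ways. The sample data (FILE_A, FILE_B) and the helper names merge_raw_lines and merge_with_csv are made up for this illustration; they are not part of the code above.

```python
import csv
import io

# Hypothetical sample data: the second header field contains a quoted newline,
# so the header occupies two physical lines.
FILE_A = 'id,"full\nname"\n1,Alice\n'
FILE_B = 'id,"full\nname"\n2,Bob\n'


def merge_raw_lines(texts):
    """Naive merge: assume the header is the first physical line of each file."""
    out = []
    for i, text in enumerate(texts):
        lines = text.splitlines(keepends=True)
        # Keep everything from the first file, drop one line from the others.
        out.extend(lines if i == 0 else lines[1:])
    return "".join(out)


def merge_with_csv(texts):
    """Merge with the csv module, which reads the quoted newline as one record."""
    buf = io.StringIO(newline="")
    writer = csv.writer(buf, lineterminator="\n")
    header_written = False
    for text in texts:
        reader = csv.reader(io.StringIO(text, newline=""))
        header = next(reader, None)
        if header is None:
            continue  # empty input, nothing to copy
        if not header_written:
            writer.writerow(header)
            header_written = True
        writer.writerows(reader)
    return buf.getvalue()


if __name__ == "__main__":
    print(repr(merge_raw_lines([FILE_A, FILE_B])))  # leftover header fragment before Bob's row
    print(repr(merge_with_csv([FILE_A, FILE_B])))   # one header, then the two data rows
```

In the naive version the second half of the second file's header (name") ends up in the output as if it were a data row, while the csv-based version keeps a single intact header.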
