Concatenating multiple csv files into a single csv with the same header - Python

萌比男神i 2020-12-13 21:00

I am currently using the code below to import 6,000 CSV files (each with a header) and export them into a single CSV file with a single header row.

#import csv f         


        
4 Answers
  •  暗喜 (OP)
    2020-12-13 21:33

    If you don't need the CSV data in memory and are just copying from input to output, it's much cheaper to skip parsing entirely and copy the files without building anything up in memory:

    import shutil
    import glob
    
    
    #import csv files from folder
    path = r'data/US/market/merged_data'
    allFiles = glob.glob(path + "/*.csv")
    allFiles.sort()  # glob lacks reliable ordering, so impose your own if output order matters
    with open('someoutputfile.csv', 'wb') as outfile:
        for i, fname in enumerate(allFiles):
            with open(fname, 'rb') as infile:
                if i != 0:
                    infile.readline()  # Throw away header on all but first file
                # Block copy rest of file from input to output without parsing
                shutil.copyfileobj(infile, outfile)
                print(fname + " has been imported.")
    

    That's it: shutil.copyfileobj copies the data efficiently in chunks, so Python no longer has to parse and re-serialize every row.
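    To check the approach end to end, here is a minimal self-contained sketch that wraps the same loop in a function and runs it on two small temporary files. The function name `concat_csv_files` and the sample data are hypothetical; the technique (skip the first line of every file after the first, then `shutil.copyfileobj` the rest in binary mode) is the one used above.

```python
import glob
import os
import shutil
import tempfile


def concat_csv_files(paths, out_path):
    """Hypothetical helper: concatenate CSV files that share one header,
    keeping only the first file's header. No CSV parsing is done."""
    with open(out_path, "wb") as outfile:
        for i, fname in enumerate(paths):
            with open(fname, "rb") as infile:
                if i != 0:
                    infile.readline()  # skip the duplicate header line
                shutil.copyfileobj(infile, outfile)  # block copy, no parsing


# Demo on two small hypothetical files in a temporary folder.
tmpdir = tempfile.mkdtemp()
for name, rows in [("a.csv", "1,2\n3,4\n"), ("b.csv", "5,6\n")]:
    # newline="" stops Python from translating "\n" on Windows
    with open(os.path.join(tmpdir, name), "w", newline="") as f:
        f.write("x,y\n" + rows)

# sorted() imposes a deterministic order, since glob does not guarantee one
inputs = sorted(glob.glob(os.path.join(tmpdir, "*.csv")))
# Output name deliberately does not match *.csv, so a rerun won't glob it
out_path = os.path.join(tmpdir, "merged.out")
concat_csv_files(inputs, out_path)

with open(out_path, newline="") as f:
    merged = f.read()
print(merged)
```

    Writing the output outside the input folder (or under a name the glob pattern won't match) keeps it from being picked up as an input if the script is run twice.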

    This assumes all the CSV files share the same format, encoding, line endings, etc., and that the header doesn't contain embedded newlines (only one line is skipped per file). If that holds, it's a lot faster than the alternatives.
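    If you can't be sure those assumptions hold, a cheap safeguard is to compare each file's first line against the first file's header (as raw bytes, ignoring the trailing CR/LF) and to make sure the output ends with a newline before the next file's rows are appended. The sketch below is a hypothetical variant of the loop above, not part of the original answer; it still skips exactly one line per file, so headers with embedded newlines remain out of scope.

```python
import os
import shutil
import tempfile


def concat_csv_files_checked(paths, out_path):
    """Hypothetical variant of the block-copy loop with two sanity checks:
    every header must match the first file's header, and a newline is
    written whenever the output so far does not end with one."""
    expected_header = None
    last_byte = b"\n"  # last byte written to the output so far
    with open(out_path, "wb") as outfile:
        for fname in paths:
            with open(fname, "rb") as infile:
                header = infile.readline()
                key = header.rstrip(b"\r\n")  # compare without line ending
                if expected_header is None:
                    expected_header = key
                    outfile.write(header)
                    if header:
                        last_byte = header[-1:]
                elif key != expected_header:
                    raise ValueError(f"{fname}: header {key!r} != {expected_header!r}")

                # Find out whether this file has any rows after the header.
                body_start = infile.tell()
                end = infile.seek(0, os.SEEK_END)
                if end == body_start:
                    continue  # header-only file: nothing to append
                infile.seek(body_start)

                if last_byte != b"\n":
                    outfile.write(b"\n")  # previous content lacked a newline
                shutil.copyfileobj(infile, outfile)  # block copy, no parsing

                infile.seek(-1, os.SEEK_END)
                last_byte = infile.read(1)


# Demo with hypothetical files: a.csv has no trailing newline, c.csv is header-only.
tmpdir = tempfile.mkdtemp()
paths = []
for name, data in [("a.csv", b"x,y\n1,2"), ("b.csv", b"x,y\n3,4\n"), ("c.csv", b"x,y\n")]:
    path = os.path.join(tmpdir, name)
    with open(path, "wb") as f:
        f.write(data)
    paths.append(path)

out_path = os.path.join(tmpdir, "merged.out")
concat_csv_files_checked(paths, out_path)
with open(out_path, "rb") as f:
    merged = f.read()

# A file with a different header is rejected.
bad = os.path.join(tmpdir, "d.csv")
with open(bad, "wb") as f:
    f.write(b"a,b\n9,9\n")
try:
    concat_csv_files_checked(paths + [bad], os.path.join(tmpdir, "bad.out"))
    header_mismatch_caught = False
except ValueError:
    header_mismatch_caught = True
```

    The extra work per file is a couple of seeks and a one-byte read, so it keeps the speed of the block copy. Note that a header mismatch aborts the run and leaves a partially written output file behind.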
