Merging multiple CSV files without headers being repeated (using Python)

长情又很酷 2020-12-08 01:38

I am a beginner with Python. I have multiple CSV files (more than 10), and all of them have the same number of columns. I would like to merge all of them into a single CSV file, without the header row being repeated.

5 answers
  •  夕颜
    夕颜 (OP)
    2020-12-08 02:06

    If you don't mind the overhead, you could use pandas, which ships with common Python distributions. If you plan to do more with spreadsheet-style tables, I recommend using pandas rather than writing your own libraries.

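    import glob
    import pandas as pd

    # Skip output.csv so a re-run does not merge the previous result into itself.
    interesting_files = [f for f in glob.glob("*.csv") if f != "output.csv"]
    df_list = []
    for filename in sorted(interesting_files):
        # read_csv treats the first row of each file as the header by default.
        df_list.append(pd.read_csv(filename))
    # ignore_index=True renumbers the rows 0..n-1 across all files.
    full_df = pd.concat(df_list, ignore_index=True)

    # index=False keeps the row index from being written as an extra column.
    full_df.to_csv("output.csv", index=False)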
    

    A little more on pandas. Because it is built for spreadsheet-like data, read_csv treats the first line as the header by default. When reading a CSV, it separates the header from the data table and keeps it as the column labels of the DataFrame, the standard data type in pandas. If you concatenate several DataFrames whose headers are the same, only the data rows are stacked, so the header appears once in the output. If the headers are not the same, pd.concat does not raise an error: by default it aligns on column names, takes the union of all columns, and fills the missing cells with NaN. So if your directory might be polluted with CSV files from another source, compare the column names before concatenating.
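
    To catch a stray file with different columns explicitly, rather than relying on how pd.concat handles a mismatch, the headers can be compared up front. This is only a minimal sketch: common_header is a hypothetical helper name, not a pandas function, and it uses the standard-library csv module so that only the first row of each file is read.

```python
import csv


def common_header(filenames):
    """Return the header row shared by all files, or raise ValueError.

    Hypothetical helper for illustration; not part of pandas.
    """
    expected = None
    for filename in filenames:
        with open(filename, newline="") as f:
            # Read only the first row; None means the file is empty.
            header = next(csv.reader(f), None)
        if header is None:
            raise ValueError(f"{filename} is empty")
        if expected is None:
            expected = header  # the first file sets the reference header
        elif header != expected:
            raise ValueError(
                f"{filename}: header {header} differs from {expected}"
            )
    return expected
```

    Calling common_header(sorted(interesting_files)) before the read loop would stop the merge with a clear message as soon as one file does not match. The comparison is exact, so a file with the same columns in a different order is also rejected.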

    Another thing: I wrapped interesting_files in sorted(). I assume your files are named in order and that this order should be kept. glob.glob() returns its matches in arbitrary order, as do os functions such as os.listdir(), so the sort has to be explicit.
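
    One caveat about sorted(): it compares names as plain strings, so numbered files without zero padding end up out of numeric order. Assuming the files are named something like part1.csv, part2.csv, part10.csv (an assumption about your naming, as is the helper name natural_key), a numeric-aware sort key could be sketched like this:

```python
import re


def natural_key(filename):
    """Sort key that compares runs of digits as integers.

    Hypothetical helper for illustration. re.split with a capturing group
    alternates text and digit chunks, so odd positions are always digits.
    """
    chunks = re.split(r"(\d+)", filename)
    return [int(chunk) if i % 2 else chunk.lower()
            for i, chunk in enumerate(chunks)]


names = ["part10.csv", "part2.csv", "part1.csv"]
print(sorted(names))                   # ['part1.csv', 'part10.csv', 'part2.csv']
print(sorted(names, key=natural_key))  # ['part1.csv', 'part2.csv', 'part10.csv']
```

    If your file names are zero padded (part01.csv, part02.csv, ...) or carry ISO dates, plain sorted() already gives the intended order and no key is needed.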
