Import multiple nested csv files and concatenate into one DataFrame

Submitted by 天大地大妈咪最大 on 2020-02-27 13:10:43

Question


I'm trying to read multiple CSV files that share the same structure (column names) and are located in several folders. My main goal is to concatenate these files into one pandas DataFrame. The attached folder layout shows how the files are distributed; each folder contains 5 CSV files. Is there a predefined function or something similar that can help?


Answer 1:


Using os.walk() and pd.concat():

import os
import pandas as pd

outdir = 'YOUR_INITIAL_PATH'  # replace with the top-level folder to search
frames = []  # one DataFrame per CSV file found
for root, dirs, filenames in os.walk(outdir):
    for f in filenames:
        if f.endswith('.csv'):
            # os.path.join builds the path correctly on any operating system
            frames.append(pd.read_csv(os.path.join(root, f)))
# concatenate once at the end rather than inside the loop; renumber the rows
df_final = pd.concat(frames, ignore_index=True) if frames else pd.DataFrame()



Answer 2:


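You can use glob.glob() to find all the CSV files and then concatenate them. A bare '*.csv' pattern matches only files in the current directory, so for nested folders use a recursive pattern.

import glob
import pandas as pd

# '**' matches any number of subfolders when recursive=True
csv_paths = glob.glob('**/*.csv', recursive=True)
dfs = [pd.read_csv(path) for path in csv_paths]
df = pd.concat(dfs)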
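
Because the files in the question sit in subfolders, the glob pattern matters: a flat '*.csv' pattern sees only the directory it is given. The following is a minimal sketch that compares a flat pattern with a recursive one on a throwaway directory tree; the folder and file names are made up for illustration.

```python
import glob
import os
import tempfile

# Build a temporary tree: one CSV at the top level, one in a subfolder.
# All folder and file names here are hypothetical.
with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, "2016"))
    for rel in ("top.csv", os.path.join("2016", "file01.csv")):
        with open(os.path.join(tmp, rel), "w") as fh:
            fh.write("a,b\n1,2\n")

    # Flat pattern: only files directly inside tmp.
    flat = glob.glob(os.path.join(tmp, "*.csv"))
    # Recursive pattern: "**" matches any depth when recursive=True.
    nested = glob.glob(os.path.join(tmp, "**", "*.csv"), recursive=True)

    flat_count = len(flat)
    nested_count = len(nested)
```

Here the flat pattern misses the file inside the 2016 folder, while the recursive pattern picks up both files.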



Answer 3:


You can use os.walk() to iterate over the files in a directory tree. pd.read_csv() reads a single file into a DataFrame, and pd.concat(df_list) concatenates all the DataFrames in df_list.

I don't believe there is a single method that combines all the above for your convenience.
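
The three steps can still be wrapped in a small helper of your own. Below is one possible sketch, not a pandas built-in: the function name read_nested_csvs and the sample folder and column names are assumptions made for illustration.

```python
import os
import tempfile

import pandas as pd


def read_nested_csvs(root_dir):
    """Hypothetical helper: read every .csv under root_dir into one DataFrame."""
    df_list = []
    # Step 1: walk the directory tree.
    for folder, _subdirs, filenames in os.walk(root_dir):
        for name in sorted(filenames):
            if name.lower().endswith(".csv"):
                # Step 2: read each file into its own DataFrame.
                df_list.append(pd.read_csv(os.path.join(folder, name)))
    if not df_list:
        # pd.concat raises on an empty list, so return an empty frame instead.
        return pd.DataFrame()
    # Step 3: concatenate everything and renumber the rows.
    return pd.concat(df_list, ignore_index=True)


# Small self-check on a throwaway tree (made-up folder and file names).
with tempfile.TemporaryDirectory() as tmp:
    for year in ("2016", "2017"):
        os.makedirs(os.path.join(tmp, year))
        pd.DataFrame({"year": [int(year)] * 2, "value": [1, 2]}).to_csv(
            os.path.join(tmp, year, "file01.csv"), index=False
        )
    combined = read_nested_csvs(tmp)
```

Collecting the frames in a list and calling pd.concat() once avoids copying the growing result on every iteration.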




Answer 4:


Frenzy Kiwi gave you the right answer. An alternative is to use dask. Say your folder structure is:

data
├── 2016
│   ├── file01.csv
│   ├── file02.csv
│   └── file03.csv
├── 2017
│   ├── file01.csv
│   ├── file02.csv
│   └── file03.csv
└── 2018
    ├── file01.csv
    └── file02.csv

Then you can read all of them with:

import dask.dataframe as dd

# the glob pattern matches every CSV exactly one folder below data/
df = dd.read_csv("data/*/*.csv")
# convert the lazy dask DataFrame to pandas via
df = df.compute()



Answer 5:


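Another option is a helper function that walks the folder tree and reads every file with a matching extension. As written it reads Excel files with pd.read_excel(); for CSV files, use pd.read_csv() instead.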

import os
import pandas as pd


def nested_files_to_df(path, ext):
    """Read every file under `path` whose name ends with one of `ext`."""
    paths = []

    # --- Collect the full path of every matching file --- #
    for root, dirs, files in os.walk(path):
        for file in files:
            if file.endswith(tuple(ext)):
                paths.append(os.path.join(root, file))

    # --- Read each Excel file and merge them into one DataFrame --- #
    # DataFrame.append was removed in pandas 2.0, so collect and concat instead
    frames = [pd.read_excel(f) for f in paths]
    if not frames:
        return pd.DataFrame()
    return pd.concat(frames, ignore_index=True)

Calling the function:

df = nested_files_to_df('Your main folder root', [".xls", ".XLS", ".xlsx"])


Source: https://stackoverflow.com/questions/54811006/import-multiple-nested-csv-files-and-concatenate-into-one-dataframe
