Read in all csv files from a directory using Python

傲寒 2020-12-13 10:44

I hope this is not trivial but I am wondering the following:

If I have a specific folder with n csv files, how could I iteratively read all of

4 answers
  • 2020-12-13 11:16

    According to the documentation of numpy.genfromtxt(), the first argument can be a

    File, filename, or generator to read.

    That would mean that you could write a generator that yields the lines of all the files like this:

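    import glob
    import numpy

    def csv_merge_generator(pattern):
        for file_name in glob.glob(pattern):
            # glob returns file names, so open each file before reading its lines
            with open(file_name) as f:
                for line in f:
                    yield line

    # then using it like this

    numpy.genfromtxt(csv_merge_generator('*.csv'), delimiter=',')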
    

    should work. (I do not have numpy installed, so cannot test easily)
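
    One caveat with merging lines this way: if every csv file starts with a header row, the merged stream repeats that header once per file. A minimal sketch of a variant that keeps only the first header is shown here; the name csv_merge_skip_headers is hypothetical, and it assumes every file has exactly one header line.

```python
import glob


def csv_merge_skip_headers(pattern):
    """Yield the lines of all files matching pattern, keeping only the first header.

    Assumption: each file begins with exactly one header line.
    """
    header_seen = False
    for file_name in sorted(glob.glob(pattern)):
        with open(file_name) as f:
            header = next(f, None)  # first line of this file, None if the file is empty
            if header is None:
                continue
            if not header_seen:
                header_seen = True
                yield header
            # the remaining lines are data rows
            for line in f:
                yield line
```

    The resulting generator could then presumably be passed to numpy.genfromtxt with delimiter=',' and names=True so that the single header row supplies the column names.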

  • 2020-12-13 11:21

    Using pandas and glob as the base packages

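    import glob
    import os
    import pandas as pd

    # directoryPath is the folder that contains the csv files
    frames = []
    for file_name in glob.glob(os.path.join(directoryPath, '*.csv')):
        frames.append(pd.read_csv(file_name, low_memory=False))
    # concatenate once at the end instead of growing a DataFrame inside the loop
    glued_data = pd.concat(frames, axis=0) if frames else pd.DataFrame()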
    
  • 2020-12-13 11:24

    I think you are looking for something like this:

    import glob
    import numpy as np

    for file_name in glob.glob(directoryPath+'*.csv'):
        # read one file and keep only its third column
        x = np.genfromtxt(file_name,delimiter=',')[:,2]
        # do your calculations
    

    Edit

    If you want to get all csv files from a folder (including its subfolders), you could use subprocess instead of glob (note that this code only works on Linux systems, because it calls the find command):

    import subprocess
    # check_output returns bytes in Python 3, so decode before splitting into lines
    file_list = subprocess.check_output(['find',directoryPath,'-name','*.csv']).decode().splitlines()
    
    for i,file_name in enumerate(file_list):
        x = np.genfromtxt(file_name,delimiter=',')[:,2]
        # do your calculations
        # now you can use i as an index
    

    It first searches the folder and sub-folders for all file_names using the find command from the shell and applies your calculations afterwards.
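
    If portability matters, the same recursive search can probably be done without the shell. The sketch below uses pathlib.Path.rglob from the standard library; the helper name find_csv_files is hypothetical and only illustrates the idea.

```python
from pathlib import Path


def find_csv_files(directory):
    """Return all *.csv files below directory (sub-folders included), sorted by path."""
    # rglob searches recursively; is_file() filters out directories named like '*.csv'
    return sorted(p for p in Path(directory).rglob("*.csv") if p.is_file())
```

    Each returned Path can be passed to np.genfromtxt in place of file_name, and enumerate still provides an index if one is needed.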

  • 2020-12-13 11:31

    That's how I'd do it:

    import os

    directory = os.path.join("c:\\","path")
    for root,dirs,files in os.walk(directory):
        for file in files:
            if file.endswith(".csv"):
                # os.walk yields bare file names, so join them with root
                with open(os.path.join(root, file), 'r') as f:
                    pass  # perform calculation
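
    As one possible way to fill in the calculation step, the sketch below combines os.walk with the standard csv module to iterate over parsed rows. The function name iter_csv_rows is hypothetical, and it assumes the files are plain comma-separated text.

```python
import csv
import os


def iter_csv_rows(directory):
    """Yield (path, row) for every row of every .csv file below directory."""
    for root, dirs, files in os.walk(directory):
        for name in files:
            if name.endswith(".csv"):
                path = os.path.join(root, name)
                # newline="" is what the csv module expects for files it reads
                with open(path, newline="") as f:
                    for row in csv.reader(f):
                        yield path, row
```

    Each row arrives as a list of strings, so numeric columns need an explicit conversion such as float(row[2]) before any arithmetic.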
    