Read in all csv files from a directory using Python


傲寒

I hope this is not too trivial, but I am wondering about the following:

If I have a specific folder with n csv files, how could I iteratively read all of


4 Answers

鱼传尺愫

2020-12-13 11:16
According to the documentation of numpy.genfromtxt(), the first argument can be a

File, filename, or generator to read.

That would mean that you could write a generator that yields the lines of all the files like this:
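```
import glob
import numpy

def csv_merge_generator(pattern):
    for file_name in glob.glob(pattern):
        # glob returns file names, so each file must be opened to read its lines
        with open(file_name) as file:
            for line in file:
                yield line

# then using it like this

numpy.genfromtxt(csv_merge_generator('*.csv'), delimiter=',')
```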
This should work (I do not have numpy installed, so I cannot easily test it).
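If every file carries its own header row, the merged stream would repeat that header in the middle of the data. The following is a minimal sketch of one way around that, assuming each file starts with exactly one header line (the question does not say so). The name `iter_csv_lines` is hypothetical and only for illustration.

```python
import glob


def iter_csv_lines(pattern):
    """Yield the lines of every file matching pattern, keeping one header.

    Assumption: each file begins with a single header line.
    """
    # sorted() makes the file order deterministic; glob itself gives no order guarantee
    for index, path in enumerate(sorted(glob.glob(pattern))):
        with open(path) as handle:
            for line_number, line in enumerate(handle):
                # Drop the header of every file except the first one.
                if index > 0 and line_number == 0:
                    continue
                yield line
```

The resulting generator could then be passed as the first argument of `numpy.genfromtxt`, together with `delimiter=','`, in the same way as the generator above.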

南旧

2020-12-13 11:21

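Using pandas and glob as the base packages:

import glob
import os
import pandas as pd

# directoryPath is the folder that holds the csv files
pattern = os.path.join(directoryPath, '*.csv')
frames = [pd.read_csv(file_name, low_memory=False) for file_name in glob.glob(pattern)]
# concatenate once at the end instead of growing the DataFrame inside the loop
# (pd.concat raises ValueError if no file matched)
glued_data = pd.concat(frames, axis=0)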


夕颜

2020-12-13 11:24

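I think you are looking for something like this:

import glob
import os
import numpy as np

# directoryPath is the folder that holds the csv files
for file_name in glob.glob(os.path.join(directoryPath, '*.csv')):
    # read the file and keep only the third column
    x = np.genfromtxt(file_name,delimiter=',')[:,2]
    # do your calculations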

Edit

If you want to get all csv files from a folder (including its subfolders), you could use subprocess instead of glob. Note that this code only works on systems that provide the find command, such as Linux.

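import subprocess
import numpy as np

# find prints one path per line; text=True returns str instead of bytes (Python 3.7+)
file_list = subprocess.check_output(['find',directoryPath,'-name','*.csv'], text=True).splitlines()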

for i,file_name in enumerate(file_list):
    x = np.genfromtxt(file_name,delimiter=',')[:,2]
    # do your calculations
    # now you can use i as an index

This first collects the names of all csv files in the folder and its subfolders using the shell's find command, and then applies your calculations to each file.
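For a version that does not depend on the shell, the same recursive search can probably be done with glob alone. This is a minimal sketch, not part of the original answer; the helper name `find_csv_files` is hypothetical.

```python
import glob
import os


def find_csv_files(directory):
    """Return all .csv paths under directory, including subfolders."""
    # With recursive=True, '**' matches zero or more nested folders,
    # so files directly inside directory are included as well.
    pattern = os.path.join(directory, "**", "*.csv")
    return sorted(glob.glob(pattern, recursive=True))
```

The returned list can be used in place of `file_list` in the loop above. Note that glob skips files and folders whose names start with a dot by default.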


名媛妹妹

2020-12-13 11:31

That's how I'd do it:

import os

directory = os.path.join("c:\\","path")
for root,dirs,files in os.walk(directory):
    for file in files:
        if file.endswith(".csv"):
            # os.walk yields bare file names, so join each one with its folder
            with open(os.path.join(root, file), 'r') as f:
                pass  # perform calculation
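As an illustration of what the "perform calculation" step might start from, here is a small sketch that combines os.walk with the standard csv module to parse each file into rows. The function name `read_csv_tree` is hypothetical, and loading whole files into lists assumes they fit in memory.

```python
import csv
import os


def read_csv_tree(directory):
    """Yield (path, rows) for every .csv file below directory."""
    for root, _dirs, files in os.walk(directory):
        for name in files:
            if name.lower().endswith(".csv"):
                # Join with root so files in subfolders open correctly.
                path = os.path.join(root, name)
                # newline="" is what the csv module recommends for file objects.
                with open(path, newline="") as handle:
                    yield path, list(csv.reader(handle))
```

Each `rows` value is a list of lists of strings, so numeric columns still need converting before any arithmetic.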
