I hope this is not trivial, but I am wondering about the following:
If I have a specific folder with n csv files, how could I iteratively read all of them?
According to the documentation of numpy.genfromtxt(), the first argument can be a "File, filename, or generator to read."
That would mean that you could write a generator that yields the lines of all the files like this:
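    import glob
    import numpy

    def csv_merge_generator(pattern):
        for file_name in glob.glob(pattern):
            # glob returns file names, so each file has to be opened first
            with open(file_name) as f:
                for line in f:
                    yield line

    # then using it like this
    numpy.genfromtxt(csv_merge_generator('*.csv'), delimiter=',')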
This should work, although I do not have numpy installed and so cannot easily test it.
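One thing the generator above does not address is header rows: if every file starts with a header line, those lines would end up repeated in the middle of the merged data. The following is a minimal sketch of one way to handle that, assuming each file has exactly one header line. The function name csv_merge_single_header is hypothetical, and sorting the file list is only there to make the order predictable.

```python
import glob

def csv_merge_single_header(pattern):
    """Yield the lines of all files matching pattern, keeping one header.

    Assumption: every file starts with exactly one header line.
    """
    for index, file_name in enumerate(sorted(glob.glob(pattern))):
        with open(file_name) as f:
            header = next(f, None)  # first line of this file, None if it is empty
            if index == 0 and header is not None:
                yield header        # keep the header of the first file only
            for line in f:          # remaining lines are data
                yield line
```

The result could then be passed on in the same way, for example numpy.genfromtxt(csv_merge_single_header('*.csv'), delimiter=',', names=True), where names=True asks genfromtxt to take the column names from that first line.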
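Using pandas and glob as the base packages:

    import glob
    import os
    import pandas as pd

    # directoryPath is the folder that contains the csv files
    frames = []
    for file_name in glob.glob(os.path.join(directoryPath, '*.csv')):
        frames.append(pd.read_csv(file_name, low_memory=False))
    # concatenate once at the end instead of growing a DataFrame in the loop;
    # pd.concat raises a ValueError if no file matched
    glued_data = pd.concat(frames, axis=0)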
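I think you are looking for something like this:

    import glob
    import os
    import numpy as np

    for file_name in glob.glob(os.path.join(directoryPath, '*.csv')):
        # read one file and keep only its third column
        x = np.genfromtxt(file_name, delimiter=',')[:, 2]
        # do your calculations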
Edit
If you want to get all csv files from a folder, including its subfolders, you could use subprocess instead of glob (note that this code relies on the find command, so it only works on Linux and similar systems):
    import subprocess
    import numpy as np

    # check_output returns bytes in Python 3, so decode before splitting into lines
    output = subprocess.check_output(['find', directoryPath, '-name', '*.csv'])
    file_list = output.decode().splitlines()
    for i, file_name in enumerate(file_list):
        x = np.genfromtxt(file_name, delimiter=',')[:, 2]
        # do your calculations
        # now you can use i as an index
It first searches the folder and its subfolders for all file names using the find command, and applies your calculations afterwards.
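If the dependency on find is a problem, a similar file list can probably be built with glob alone, since glob.glob accepts recursive=True together with a `**` pattern (Python 3.5 and later). This is a minimal sketch rather than a drop-in replacement: the helper name find_csv_files is hypothetical, and unlike find, glob skips files and folders whose names start with a dot by default.

```python
import glob
import os

def find_csv_files(directory):
    """Return all .csv files in directory and its subfolders, sorted by path."""
    # '**' matches zero or more nested folders when recursive=True is set
    pattern = os.path.join(directory, '**', '*.csv')
    return sorted(glob.glob(pattern, recursive=True))
```

The returned list can be looped over with enumerate in the same way as file_list above.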
That's how I'd do it:
    import os

    directory = os.path.join("c:\\", "path")
    for root, dirs, files in os.walk(directory):
        for file in files:
            if file.endswith(".csv"):
                # file is only the base name, so join it with root before opening
                with open(os.path.join(root, file), 'r') as f:
                    pass  # perform calculation
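To make the "perform calculation" step concrete, here is a minimal sketch that walks the folder in the same way and computes the mean of one column per file using only the standard library. The function name column_means and the choice of a mean are purely illustrative, and the sketch assumes that the files have no header row and that the chosen column is numeric in every row.

```python
import csv
import os

def column_means(directory, column=2):
    """Map each .csv path under directory to the mean of one numeric column.

    Assumptions: no header row, and row[column] parses as a float everywhere.
    """
    means = {}
    for root, _dirs, files in os.walk(directory):
        for name in files:
            if name.endswith('.csv'):
                path = os.path.join(root, name)
                with open(path, newline='') as f:
                    # skip blank lines, convert the chosen field of each row
                    values = [float(row[column]) for row in csv.reader(f) if row]
                if values:  # leave out files without any data rows
                    means[path] = sum(values) / len(values)
    return means
```

Calling column_means(directory) with the directory defined above would return a dictionary keyed by file path, which can then be inspected or processed further.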