问题
I have a large CSV with multiple years of electricity load data, and I would like to split it into multiple files on a month and year basis - i.e to return individual CSVs for Jan, Feb, Mar etc for 2013, 2014, 2015 etc.
I have reviewed a lot of the solutions in the forums, and have not had any luck. My current file is structured as follows;
01-Jan-11,1,34606,34677,35648,35685,31058,484,1730
01-Jan-11,2,35092,35142,36089,36142,31460,520,1730
01-Jan-11,3,34725,34761,36256,36234,31109,520,1730
01-Jan-11,4,33649,33698,35628,35675,30174,512,1730
The name of the column header in the date column is "SETTLEMENT DATE". I am familiar with using pandas and df's but cannot seem to wrap my head around how to do this one!
Thanks!
回答1:
Here is how you can do it.
First you read your data from your csv_file and save your cols like this:
df = pd.read_csv('path/input.csv')
cols = df.columns
then you add two columns to your dataframe which are Month and Year:
df['Year'] = df['SETTLEMENT DATE'].apply(lambda x: x.split('-')[-1])
df['Month'] = df['SETTLEMENT DATE'].apply(lambda x: x.split('-')[1])
All you have to do left is to write your dataframe in separate csv files either by month or year with each file carrying the name of the month or year as follows:
for i in set(df.Year): # for classified by years files
filename = "path/"+i+".csv"
df.loc[df.Year == i].to_csv(filename,index=False,columns=cols)
for i in set(df.Month): # for classified by months files
filename = "path/"+i+".csv"
df.loc[df.Month == i].to_csv(filename,index=False,columns=cols)
Hope this was helpful.
来源:https://stackoverflow.com/questions/45001847/how-to-split-csv-file-into-respective-csv-files-based-on-date-in-first-column-p