Question
I have a data set with a date range from January 12th to August 3rd of 2018 with some values:
The dimensionality of the my_df DataFrame is:
my_df.shape
(9752, 2)
Each row is one reading, at roughly half-hour frequency.
The first row starts on 2018-01-12:
my_df.iloc[0]
Date: 2018-01-12 00:17:28
Value 1
Name: 0, dtype: object
The last row ends on 2018-08-03:
my_df.tail(1)
Date: Value
9751 2018-08-03 23:44:59 1
My goal is to select the rows belonging to each day and export each day to its own CSV file.
To get only the January 12th data and save it to a readable file, I do:
# Selecting the data values of one day
my_df_Jan12 = my_df[(my_df['Fecha:'] >= '2018-01-12 00:00:00') &
                    (my_df['Fecha:'] <= '2018-01-12 23:59:59')]
my_df_Jan12.to_csv('Data_Jan_12.csv', sep=',', header=True, index=False)
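A minimal sketch of the same one-day selection, assuming the 'Fecha:' column already holds datetime64 values (the small frame is a hypothetical stand-in for my_df). It uses a half-open interval [day, next day), so a timestamp such as 23:59:59.5 cannot slip between the two bounds.

```python
import pandas as pd

# Hypothetical stand-in for my_df: half-hourly readings around January 12, 2018
my_df = pd.DataFrame({
    'Fecha:': pd.date_range('2018-01-11 23:17:28', periods=52, freq='30min'),
    'Value': 1,
})

# Half-open interval [2018-01-12, 2018-01-13) covers the whole day
start = pd.Timestamp('2018-01-12')
end = start + pd.Timedelta(days=1)
my_df_Jan12 = my_df[(my_df['Fecha:'] >= start) & (my_df['Fecha:'] < end)]

my_df_Jan12.to_csv('Data_Jan_12.csv', sep=',', header=True, index=False)
print(len(my_df_Jan12))
```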
From January 12 to August 3 there are 204 days, counting both endpoints (29 weeks and 1 day).
I don't want to run this query manually for each day, so I reasoned as follows:
- I need to generate 204 files (one file per day)
- The first day is January 12
- January is the first month (01) and August is the eighth month (08)
Then:
- I need to iterate over all 204 days
- and, for each row, check the month and day of its date to detect when either of them changes
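The day count is easy to double-check with pandas: pd.date_range includes both endpoints, giving 204 calendar days, while the plain difference between the two dates is 203 days (exactly 29 weeks).

```python
import pandas as pd

# One entry per calendar day, both endpoints included
days = pd.date_range('2018-01-12', '2018-08-03', freq='D')
print(len(days))  # 204

# Difference between the two endpoints
delta_days = (pd.Timestamp('2018-08-03') - pd.Timestamp('2018-01-12')).days
print(delta_days)  # 203
```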
Based on this, I tried the following approach:
# Selecting data value of each day (203 days)
for i in range(203):
    for j in range(1, 9):  # month
        for k in range(12, 32):  # days of the month
            values = my_df[(my_df['Fecha:'] >= '2018-0{}-{} 00:00:00'.format(j, k)) &
                           (my_df['Fecha:'] <= '2018-0{}-{} 23:59:59'.format(j, k))]
            values.to_csv('Values_day_{}.csv'.format(i), sep=',', header=True, index=False)
But I think there is a problem with how I iterate over range(12, 32) for the days of the month: that range only makes sense for January, since every other month starts on day 1.
In the end I get 203 empty CSV files, so I am clearly doing something wrong.
How can I solve this properly? Any guidance is highly appreciated.
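One likely cause of the empty files: for a fixed i, every (j, k) combination is written to the same file Values_day_{i}, so each file ends up holding only the last selection, August 31, which has no data. A minimal sketch of the same loop that instead walks over the calendar days actually covered by the data, assuming 'Fecha:' can be parsed by pd.to_datetime; the three-day frame is a hypothetical stand-in for my_df.

```python
import pandas as pd

# Hypothetical stand-in for my_df: three days of half-hourly readings
my_df = pd.DataFrame({
    'Fecha:': pd.date_range('2018-01-12 00:17:28', periods=3 * 48, freq='30min'),
    'Value': 1,
})
# No-op here; needed if the column holds strings instead of datetime64
my_df['Fecha:'] = pd.to_datetime(my_df['Fecha:'])

# Midnight of the first and last day present in the data
first_day = my_df['Fecha:'].min().normalize()
last_day = my_df['Fecha:'].max().normalize()

written = []
for i, day in enumerate(pd.date_range(first_day, last_day, freq='D')):
    next_day = day + pd.Timedelta(days=1)
    values = my_df[(my_df['Fecha:'] >= day) & (my_df['Fecha:'] < next_day)]
    f_name = 'Values_day_{}.csv'.format(i)  # a distinct file for every day
    values.to_csv(f_name, sep=',', header=True, index=False)
    written.append((f_name, len(values)))

print(written)
```

A single loop over real dates removes the need to reason about month lengths or the starting day, which is what the nested j/k loops were trying to encode.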
Answer 1:
Something like this? I renamed your original Date: column to Timestamp. I am also assuming that the Date: Series you have is a pandas DateTime series.
my_df.columns = ['Timestamp', 'Value']
my_df['Date'] = my_df['Timestamp'].apply(lambda x: x.date())
dates = my_df['Date'].unique()
for date in dates:
    f_name = str(date) + '.csv'
    my_df[my_df['Date'] == date].to_csv(f_name)
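A self-contained sketch of the same idea on a small hypothetical frame. It swaps apply for the vectorized .dt.date accessor, calls pd.to_datetime in case the original column holds strings, and drops the index from the output; the output directory daily_csv is an assumption.

```python
from pathlib import Path

import pandas as pd

# Hypothetical stand-in for my_df, with timestamps stored as strings
my_df = pd.DataFrame({
    'Date:': ['2018-01-12 00:17:28', '2018-01-12 23:44:59', '2018-01-13 08:00:00'],
    'Value': [1, 1, 1],
})

my_df.columns = ['Timestamp', 'Value']
my_df['Timestamp'] = pd.to_datetime(my_df['Timestamp'])  # ensure datetime64
my_df['Date'] = my_df['Timestamp'].dt.date               # calendar day of each row

out_dir = Path('daily_csv')
out_dir.mkdir(exist_ok=True)
for date in my_df['Date'].unique():
    # One file per day, e.g. daily_csv/2018-01-12.csv
    my_df[my_df['Date'] == date].to_csv(out_dir / (str(date) + '.csv'), index=False)

print(sorted(p.name for p in out_dir.glob('*.csv')))
```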
Answer 2:
Use groupby with pd.Grouper:
for date, d in df.groupby(pd.Grouper(key='Date', freq='D')):
    d.to_csv(f"Data_{date:%b_%d}.csv", index=False)
Note that I used an f-string, which requires Python 3.6+.
Otherwise, use str.format:
for date, d in df.groupby(pd.Grouper(key='Date', freq='D')):
    d.to_csv("Data_{:%b_%d}.csv".format(date), index=False)
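Both spellings pass the same strftime-style spec (%b_%d) to the timestamp, so they produce identical file names. A quick check with a hypothetical timestamp (note that %b gives the abbreviated month name of the current locale):

```python
import pandas as pd

date = pd.Timestamp('2018-01-12')

# f-string (Python 3.6+) and str.format use the same format spec
name_fstring = f"Data_{date:%b_%d}.csv"
name_format = "Data_{:%b_%d}.csv".format(date)

print(name_fstring)  # Data_Jan_12.csv
print(name_fstring == name_format)  # True
```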
Consider the df
import pandas as pd
df = pd.DataFrame(dict(
    Date=pd.date_range('2010-01-01', periods=10, freq='12H'),
    Value=range(10)
))
Then
for date, d in df.groupby(pd.Grouper(key='Date', freq='D')):
    d.to_csv(f"Data_{date:%b_%d}.csv", index=False)
And verify
from pathlib import Path
print(*map(Path.read_text, Path('.').glob('Data*.csv')), sep='\n')
Date,Value
2010-01-05 00:00:00,8
2010-01-05 12:00:00,9
Date,Value
2010-01-04 00:00:00,6
2010-01-04 12:00:00,7
Date,Value
2010-01-02 00:00:00,2
2010-01-02 12:00:00,3
Date,Value
2010-01-01 00:00:00,0
2010-01-01 12:00:00,1
Date,Value
2010-01-03 00:00:00,4
2010-01-03 12:00:00,5
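Applied to data shaped like the question's (half-hourly rows in a 'Date:' column; the frame below is a hypothetical stand-in), a minimal variant might group by the calendar day via .dt.date, which only yields days that actually contain rows. The file-name pattern Values_{day}.csv is an assumption. Path.glob returns files in no guaranteed order, which is why the listing above is not chronological; sorting the paths restores date order.

```python
from pathlib import Path

import pandas as pd

# Hypothetical stand-in for my_df: two days of half-hourly readings
my_df = pd.DataFrame({
    'Date:': pd.date_range('2018-08-02 00:14:59', periods=96, freq='30min'),
    'Value': 1,
})

# Group rows by calendar day; days without rows never appear as groups
for day, rows in my_df.groupby(my_df['Date:'].dt.date):
    rows.to_csv(f"Values_{day}.csv", index=False)

# Sorting the glob result lists the files in chronological order
files = sorted(Path('.').glob('Values_2018-*.csv'))
print([p.name for p in files])
```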
Source: https://stackoverflow.com/questions/52265151/extracting-data-belonging-to-a-day-from-a-given-range-of-dates-on-a-dataset