Question
I have a data column like below, in which some dates are missing.
obstime
2012-01-01
2012-01-02
2012-01-03
2012-01-04
....
2016-12-28
2016-12-29
2016-12-30
2016-12-31
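I want to check, for every month of each available year, whether all dates are present, and show the result as a month-by-year table (the original post illustrated the desired layout with an image).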
Answer 1:
Use:
import numpy as np
import pandas as pd

#sample data
df = pd.DataFrame({'obstime':pd.date_range('2012-01-01', '2016-12-31')})
removed = ['2013-09-01', '2013-09-02', '2013-09-03','2014-10-09','2016-12-30']
removed1 = pd.date_range('2016-12-16', '2016-12-22')
removed2 = pd.date_range('2016-10-10', '2016-12-03')
df = df[~df['obstime'].isin(pd.to_datetime(removed).append(removed1).append(removed2))]
#print (df)
#add rows for the missing dates (obstime is NaT there)
df1 = df.set_index('obstime', drop=False).reindex(pd.date_range('2012-01-01', '2016-12-31'))
#masks for the first/last day of each run of missing dates and for the first/last day of each month
m = df1['obstime'].isnull()
start_NaT = m.ne(m.shift())
end_NaT = m.ne(m.shift(-1))
start_months = df1.index.day == 1
end_months = df1.index.isin(df1.index + pd.offsets.MonthEnd(0))
mask = (start_NaT | end_NaT | start_months | end_months) & m
#mask for isolated missing days (runs of length 1)
s = start_NaT.cumsum()
m1 = s.map(s.value_counts()) == 1
#join the start and end day of each longer run with '-'
df2 = df1[mask & ~m1].reset_index().rename(columns={'index':'date'})
df2['day'] = df2['date'].dt.day.astype(str)
df2 = df2.groupby(np.arange(len(df2.index)) // 2).agg({'date':'first', 'day':'-'.join})
#separate days
df3 = df1[mask & m1].copy()
df3['day'] = df3.index.day.astype(str)
#join together
df3 = pd.concat([df2.set_index('date'), df3])
#join days with ',' and add the missing months and years
df4 = (df3.groupby([df3.index.month, df3.index.year])['day']
          .agg(','.join)
          .unstack(fill_value='yes')
          .reindex(index=range(1, 13), columns=range(2008, 2017), fill_value='yes'))
print(df4)
    2008  2009  2010  2011  2012  2013  2014  2015          2016
1    yes   yes   yes   yes   yes   yes   yes   yes           yes
2    yes   yes   yes   yes   yes   yes   yes   yes           yes
3    yes   yes   yes   yes   yes   yes   yes   yes           yes
4    yes   yes   yes   yes   yes   yes   yes   yes           yes
5    yes   yes   yes   yes   yes   yes   yes   yes           yes
6    yes   yes   yes   yes   yes   yes   yes   yes           yes
7    yes   yes   yes   yes   yes   yes   yes   yes           yes
8    yes   yes   yes   yes   yes   yes   yes   yes           yes
9    yes   yes   yes   yes   yes   1-3   yes   yes           yes
10   yes   yes   yes   yes   yes   yes     9   yes         10-31
11   yes   yes   yes   yes   yes   yes   yes   yes          1-30
12   yes   yes   yes   yes   yes   yes   yes   yes  1-3,16-22,30
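The least obvious steps above are the run labelling with `start_NaT.cumsum()` and the `value_counts` lookup that flags isolated missing days. Here is a minimal sketch of just those two steps on a ten-day toy range. The dates and the names `idx`, `present`, `run_id`, `run_len`, `isolated` and `in_range` are made up for illustration and are not part of the answer.

```python
import pandas as pd

# Toy calendar: ten consecutive days (hypothetical data)
idx = pd.date_range('2020-01-01', '2020-01-10')
# Pretend the 3rd, 6th, 7th and 8th are absent from the observations
present = idx.difference(pd.to_datetime(['2020-01-03', '2020-01-06',
                                         '2020-01-07', '2020-01-08']))

# True where a day is missing
m = pd.Series(~idx.isin(present), index=idx)

# A new run starts wherever the missing flag differs from the previous day
run_id = m.ne(m.shift()).cumsum()

# Length of the run that each day belongs to
run_len = run_id.map(run_id.value_counts())

# Missing days that form a run of length 1 are isolated
isolated = m & (run_len == 1)
# Missing days that belong to a longer run get the 'start-end' treatment
in_range = m & (run_len > 1)

print(isolated[isolated].index.day.tolist())   # [3]
print(in_range[in_range].index.day.tolist())   # [6, 7, 8]
```

Isolated days are printed as a single number, while longer runs are reduced to their first and last day and joined with `-`.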
Answer 2:
My solution uses only pandas, with no database involved.
The idea is to reindex the source DataFrame with a "full" index that contains every date in the year range. For testing, I used dates from 2016 and 2017.
After reindexing, we keep only the newly added rows, i.e. the dates with no measurement.
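As a rough illustration of the reindex-and-filter idea, here is a small sketch on five days of made-up data. The names `obs`, `full`, `expanded` and `missing` are hypothetical, and the filter assumes the `result` column has no genuine NaN values of its own.

```python
import pandas as pd

# Hypothetical observations: 1-5 January 2020 with the 3rd missing
obs = pd.DataFrame({'result': [11, 12, 14, 15]},
                   index=pd.to_datetime(['2020-01-01', '2020-01-02',
                                         '2020-01-04', '2020-01-05']))

# Full calendar for the period of interest
full = pd.date_range('2020-01-01', '2020-01-05')

# Reindexing inserts a NaN row for every date absent from obs
expanded = obs.reindex(full)

# Keep only the rows that reindexing added, i.e. the missing dates
missing = expanded[expanded['result'].isna()].index
print(missing)

# The same dates, obtained as a set difference of the two indexes
print(full.difference(obs.index))
```

The set difference does not depend on the contents of `result`, so it may be the safer choice when the measurements themselves can be NaN.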
The remaining operations are:
- Group by months, applying a function generating day ranges.
- Convert to a DataFrame with "extracted" year and month.
- Pivot the DataFrame (month as index, year as columns).
- Add month names and set them as the index.
So the whole script can be as follows:
import calendar

import numpy as np
import pandas as pd

# Function applied to the group of missing dates for each month
def fun(x):
    dt = x.result
    day = pd.Timedelta('1d')
    # Dates that start a run of consecutive missing days
    startDates = dt[dt.diff() != day]
    if startDates.size > 0:
        # Dates that end a run of consecutive missing days
        endDates = dt[(dt - dt.shift(-1)).abs() != day]
        return '&'.join([(f'{s.day}-{e.day}') for s, e in zip(startDates, endDates)])
    else:
        return 'OK'
# Source dates
dates = pd.date_range('2016-01-01', '2016-01-13')\
    .append(pd.date_range('2016-01-20', '2016-01-29'))\
    .append(pd.date_range('2016-02-10', '2016-02-20'))\
    .append(pd.date_range('2016-03-11', '2017-11-20'))\
    .append(pd.date_range('2017-11-25', '2017-12-31'))
# Source DataFrame with random results for the given dates
df = pd.DataFrame(data={'result': np.random.randint(10, 30, len(dates))},
                  index=dates)
# Index for full range of dates
idxFull = pd.date_range('2016-01-01', '2017-12-31')
# "Expand" to all dates
df2 = df.reindex(idxFull)
# Leave only "empty" rows
df2.drop(df2[df2.result.notna()].index, inplace=True)
# Copy index to result
df2.result = df2.index
# Group by months (pandas 2.2+ deprecates the 'M' alias in favour of 'ME')
gr = df2.groupby(pd.Grouper(freq='M'))
# Result - Series
res = gr.apply(fun)
# Result - DataFrame with year/month "extracted" from date
res2 = pd.DataFrame(data={'res': res, 'year': res.index.year,
                          'month': res.index.month})
# Result - pivot'ed res2
res3 = res2.pivot(index='month', columns='year').fillna('OK')
# Add month names
res3['MonthName'] = list(calendar.month_name)[1:]
# Set month names as index
res3.set_index('MonthName', inplace=True)
When you run print(res3), the result is:
                   res
year              2016   2017
MonthName
January    14-19&30-31     OK
February     1-9&21-29     OK
March             1-10     OK
April               OK     OK
May                 OK     OK
June                OK     OK
July                OK     OK
August              OK     OK
September           OK     OK
October             OK     OK
November            OK  21-24
December            OK     OK
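Note that the `fun` function above writes a gap of a single day as a range such as `9-9`. If a bare `9` is preferred, a variant along these lines might work. `day_ranges` and its `sep` parameter are hypothetical names, and the sketch assumes the input dates are sorted, unique and all within one month.

```python
import pandas as pd

def day_ranges(dates, sep='&'):
    """Collapse sorted daily timestamps into 'start-end' day strings.

    Hypothetical helper: a single-day gap is written as '9', not '9-9'.
    """
    dt = pd.Series(pd.DatetimeIndex(dates))
    if dt.empty:
        return 'OK'
    one_day = pd.Timedelta('1D')
    # A run starts where the previous date is not exactly one day earlier
    starts = dt[dt.diff() != one_day]
    # A run ends where the next date is not exactly one day later
    ends = dt[dt.shift(-1) - dt != one_day]
    return sep.join(str(s.day) if s == e else f'{s.day}-{e.day}'
                    for s, e in zip(starts, ends))

example = day_ranges(pd.to_datetime(['2016-12-01', '2016-12-02', '2016-12-03',
                                     '2016-12-09',
                                     '2016-12-16', '2016-12-17']))
print(example)  # 1-3&9&16-17
```

Like the original function, this relies on the first `diff()` value and the last `shift(-1)` value being NaT, which compares as not equal to one day and so marks the first date as a run start and the last date as a run end.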
Source: https://stackoverflow.com/questions/54061029/check-whether-all-dates-are-present-in-a-year-in-pandas-python