How can I resample a pandas DataFrame by day using Period dates?

Posted by 爷,独闯天下 on 2021-01-07 04:00:13

Question


I have a DataFrame like this:

df.head()
Out[2]: 
         price   sale_date 
0  477,000,000  1396/10/30 
1  608,700,000  1396/10/30 
2  580,000,000  1396/10/03 
3  350,000,000  1396/10/03 
4  328,000,000  1396/03/18

The dates are out of bounds for pandas datetimes, so I converted them to periods as follows:

import pandas as pd

def conv(x):
    # x is an integer such as 13961030 (year 1396, month 10, day 30)
    return pd.Period(year=x // 10000,
                     month=x // 100 % 100,
                     day=x % 100, freq='D')

# Strip the slashes, convert to int, then build a daily Period for each row
df['sale_date'] = df['sale_date'].str.replace('/','').astype(int).apply(conv)

Now I want to resample them by day like this:

df.resample(freq='d', on='sale_date').sum()

but it gives me this error:

resample() got an unexpected keyword argument 'freq'
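
A note on the message itself: as far as I know, resample takes the frequency as its first argument, named rule, and has no freq keyword, which is what triggers this error. A minimal sketch with made-up, in-range dates (the demo frame and its values are hypothetical, not the data above):

```python
import pandas as pd

# Hypothetical in-bounds data, used only to show the call signature.
demo = pd.DataFrame({
    "sale_date": pd.to_datetime(["2021-01-01", "2021-01-01", "2021-01-03"]),
    "price": [100, 200, 50],
})

# Passing the frequency as `freq=` is rejected with a TypeError.
try:
    demo.resample(freq="D", on="sale_date")
except TypeError as exc:
    print("TypeError:", exc)

# The frequency goes in as the first positional argument (`rule`).
daily = demo.resample("D", on="sale_date")["price"].sum()
print(daily)
```

This only covers the keyword error. With the 1396 dates, the corrected call can still fail for a different reason, the out-of-bounds timestamps.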

Answer 1:


For me, resample and Grouper do not work with Periods here in pandas 1.1.3 (I guess this is a bug):

df['sale_date']=df['sale_date'].str.replace('/','').astype(int)
df['price'] = df['price'].str.replace(',','').astype(int)

def conv(x):
    return pd.Period(year=x // 10000,
                     month=x // 100 % 100,
                     day=x % 100, freq='D')
 
df['sale_date'] = df['sale_date'].apply(conv)

# df = df.set_index('sale_date').resample('D')['price'].sum()
#OutOfBoundsDatetime: Out of bounds nanosecond timestamp: 1396-03-18 00:00:00

# df = df.set_index('sale_date').groupby(pd.Grouper(freq='D'))['price'].sum()
#OutOfBoundsDatetime: Out of bounds nanosecond timestamp: 1396-03-18 00:00:00
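
The OutOfBoundsDatetime message suggests that both calls try to turn the periods into nanosecond timestamps internally (that is my reading of the error, not something I checked in the pandas source). A small sketch of the limits involved, showing that a daily Period can hold year 1396 while the default Timestamp range cannot:

```python
import pandas as pd

# Nanosecond-resolution timestamps only span roughly 1677 to 2262.
ts_min = pd.Timestamp.min
ts_max = pd.Timestamp.max
print(ts_min, ts_max)

# A daily Period is not limited to that range, so year 1396 is fine.
p = pd.Period(year=1396, month=3, day=18, freq="D")
print(p)

# The date from the error message lies well before the earliest Timestamp.
print(p.year < ts_min.year)
```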

A possible solution is to group by sale_date and aggregate with sum, so the prices of rows that share a sale_date are added together:

df = df.groupby('sale_date')['price'].sum().reset_index()
print (df)
    sale_date      price
0  1396-03-18  328000000
1  1396-10-03  580000000
2  1396-10-30  477000000
3  1396-11-25  608700000
4  1396-12-05  350000000
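
For a self-contained version of the same idea, here is a sketch built from the five rows shown in the question's df.head(). The helper name to_period is my own, and it splits on the slashes instead of going through an integer, which is just a variation on conv:

```python
import pandas as pd

# Rows copied from the df.head() output in the question.
df = pd.DataFrame({
    "price": ["477,000,000", "608,700,000", "580,000,000",
              "350,000,000", "328,000,000"],
    "sale_date": ["1396/10/30", "1396/10/30", "1396/10/03",
                  "1396/10/03", "1396/03/18"],
})

def to_period(text):
    # Hypothetical helper: "1396/10/30" -> daily Period 1396-10-30.
    year, month, day = (int(part) for part in text.split("/"))
    return pd.Period(year=year, month=month, day=day, freq="D")

df["price"] = df["price"].str.replace(",", "", regex=False).astype("int64")
df["sale_date"] = df["sale_date"].apply(to_period)

# Rows with the same sale_date collapse into one row with the summed price.
daily = df.groupby("sale_date")["price"].sum()
print(daily)
```

With these five rows, the two sales on 1396-10-03 and the two on 1396-10-30 are each merged into a single total, so the result has three rows.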

EDIT: Daily resampling is possible with Series.reindex and period_range, which fills the days that have no sales with 0:

s = df.groupby('sale_date')['price'].sum()
rng = pd.period_range(s.index.min(), s.index.max(), name='sale_date')
df = s.reindex(rng, fill_value=0).reset_index()
print (df)
      sale_date      price
0    1396-03-18  328000000
1    1396-03-19          0
2    1396-03-20          0
3    1396-03-21          0
4    1396-03-22          0
..          ...        ...
258  1396-12-01          0
259  1396-12-02          0
260  1396-12-03          0
261  1396-12-04          0
262  1396-12-05  350000000

[263 rows x 2 columns]
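
The reindex step can also be tried on a tiny series. This is only a sketch with two hypothetical daily totals four days apart, so the filled days are easy to count:

```python
import pandas as pd

# Two hypothetical daily totals with a three-day gap between them.
days = [pd.Period(year=1396, month=3, day=d, freq="D") for d in (18, 22)]
s = pd.Series([328000000, 350000000],
              index=pd.PeriodIndex(days, name="sale_date"),
              name="price")

# Every day from the first to the last period, inclusive.
full_range = pd.period_range(s.index.min(), s.index.max(), name="sale_date")

# Days missing from s are added with a price of 0.
filled = s.reindex(full_range, fill_value=0)
print(filled)
```

One caveat: Period counts days on the Gregorian calendar. If the dates come from another calendar (an assumption on my part, the question does not say), the filled range follows Gregorian month lengths.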


Source: https://stackoverflow.com/questions/64731501/how-can-i-resample-pandas-dataframe-by-day-on-period-time
