pandas fill missing dates in time series

[愿得一人] 2020-12-10 12:00

I have a dataframe that holds aggregated data for some days, and I want to add rows for the missing days.

I was following another post, Add missing dates to pandas dataframe, u

2 Answers
  • 2020-12-10 12:38

    You need to use period_range rather than date_range:

    In [11]: idx = pd.period_range(min(df.date), max(df.date))
        ...: results.reindex(idx, fill_value=0)
        ...:
    Out[11]:
                      f1        f2        f3        f4
    2000-01-01  2.049157  1.962635  2.756154  2.224751
    2000-01-02  2.675899  2.587217  1.540823  1.606150
    2000-01-03  0.000000  0.000000  0.000000  0.000000
    2000-01-04  0.000000  0.000000  0.000000  0.000000
    2000-01-05  0.000000  0.000000  0.000000  0.000000
    2000-01-06  0.000000  0.000000  0.000000  0.000000
    2000-01-07  0.000000  0.000000  0.000000  0.000000
    2000-01-08  0.000000  0.000000  0.000000  0.000000
    2000-01-09  0.000000  0.000000  0.000000  0.000000
    2000-01-10  0.000000  0.000000  0.000000  0.000000
    2000-01-11  0.000000  0.000000  0.000000  0.000000
    2000-01-12  0.000000  0.000000  0.000000  0.000000
    2000-01-13  0.000000  0.000000  0.000000  0.000000
    2000-01-14  0.000000  0.000000  0.000000  0.000000
    2000-01-15  0.000000  0.000000  0.000000  0.000000
    2000-01-16  0.000000  0.000000  0.000000  0.000000
    2000-01-17  0.000000  0.000000  0.000000  0.000000
    2000-01-18  0.000000  0.000000  0.000000  0.000000
    2000-01-19  0.000000  0.000000  0.000000  0.000000
    2000-01-20  0.000000  0.000000  0.000000  0.000000
    2000-01-21  0.000000  0.000000  0.000000  0.000000
    2000-01-22  0.000000  0.000000  0.000000  0.000000
    2000-01-23  0.000000  0.000000  0.000000  0.000000
    2000-01-24  0.000000  0.000000  0.000000  0.000000
    2000-01-25  0.000000  0.000000  0.000000  0.000000
    2000-01-26  0.000000  0.000000  0.000000  0.000000
    2000-01-27  0.000000  0.000000  0.000000  0.000000
    2000-01-28  0.000000  0.000000  0.000000  0.000000
    2000-01-29  0.000000  0.000000  0.000000  0.000000
    2000-01-30  0.000000  0.000000  0.000000  0.000000
    2000-01-31  0.000000  0.000000  0.000000  0.000000
    2000-02-01  0.000000  0.000000  0.000000  0.000000
    2000-02-02  0.000000  0.000000  0.000000  0.000000
    2000-02-03  0.000000  0.000000  0.000000  0.000000
    2000-02-04  1.856158  2.892620  2.986166  2.793448
    

    This is because your groupby uses PeriodIndex, rather than datetime:

    df.groupby(pd.PeriodIndex(data=df.date, freq='D'))
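
    As a minimal sketch of that point (the sample frame, the single column f1, and the names results, idx and filled are made up for illustration, not taken from the question), grouping by daily periods gives a period-based index, and reindexing against a period_range over the same span fills the gaps:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: five rows on each of three days.
dates = pd.to_datetime(["2000-01-01", "2000-01-02", "2000-02-04"]).repeat(5)
df = pd.DataFrame({"date": dates, "f1": np.ones(len(dates))})

# Grouping by daily periods produces a result indexed by periods, not timestamps.
results = df.groupby(df["date"].dt.to_period("D"))[["f1"]].sum()

# Build the full span with period_range so the new labels are periods as well.
idx = pd.period_range(results.index.min(), results.index.max(), freq="D")
filled = results.reindex(idx, fill_value=0)
```

    Here df["date"].dt.to_period("D") stands in for the pd.PeriodIndex(...) call above; both are assumed to produce daily periods for the same dates.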
    

    You could have instead used a pd.Grouper:

    df.groupby(pd.Grouper(key="date", freq='D'))
    

    which would have given a DatetimeIndex.
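
    A small sketch of the pd.Grouper route, using made-up sample data (the column f1 and the names daily, idx and daily_filled are illustrative only). The result is indexed by timestamps, so a date_range reindex lines up with it. In the pandas versions I am aware of, a frequency-based Grouper already emits the empty days itself, so the reindex is mostly a safeguard:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: five rows on each of three days.
dates = pd.to_datetime(["2000-01-01", "2000-01-02", "2000-02-04"]).repeat(5)
df = pd.DataFrame({"date": dates, "f1": np.ones(len(dates))})

# Grouping with a daily Grouper on the "date" column gives a DatetimeIndex.
daily = df.groupby(pd.Grouper(key="date", freq="D")).sum()

# Reindex against a date_range; the labels are timestamps on both sides.
idx = pd.date_range(daily.index.min(), daily.index.max(), freq="D")
daily_filled = daily.reindex(idx, fill_value=0)
```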

  • 2020-12-10 12:44

    From cᴏʟᴅsᴘᴇᴇᴅ's hints in the comments:


    resample fits well here.

    Resample: Convenience method for frequency conversion and resampling of time series. Object must have a datetime-like index (DatetimeIndex, PeriodIndex, or TimedeltaIndex), or pass datetime-like values to the on or level keyword.

    import datetime as dt
    import numpy as np
    import pandas as pd
    
    def generate_row(year, month, day):
        while True:
            date = dt.datetime(year=year, month=month, day=day)
            data = np.random.random(size=4)
            yield [date] + list(data)
    
    # days I have data for
    dates = [(2000, 1, 1), (2000, 1, 2), (2000, 2, 4)]
    generators = [generate_row(*date) for date in dates]
    
    # get 5 points for each
    data = [next(generator) for generator in generators for _ in range(5)]
    
    # make dataframe
    df = pd.DataFrame(data, columns=['date'] + ['f'+str(i) for i in range(1,5)])
    
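    # using the resample method; set_index('date') moves the column into the
    # index, so sum() only sees the numeric columns (recent pandas versions
    # raise on summing a datetime column instead of silently dropping it)
    df = df.set_index('date')
    df = df.resample('D').sum().fillna(0)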
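
    If the frame is already aggregated to one row per day, as the question describes, a lighter variant is asfreq, which only inserts the missing dates and does not aggregate again. This is a sketch on made-up values (the frame agg and its column f1 are hypothetical), assuming the dates are already the index:

```python
import pandas as pd

# Hypothetical pre-aggregated data: one row per day that has data.
agg = pd.DataFrame(
    {"f1": [2.0, 3.0, 1.5]},
    index=pd.to_datetime(["2000-01-01", "2000-01-02", "2000-02-04"]),
)

# Convert to a daily frequency; days without a row are filled with 0.
full = agg.asfreq("D", fill_value=0)
```

    Unlike resample('D').sum(), this would not combine duplicate dates, so it only fits data that has at most one row per day.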
    
