Add missing dates to pandas dataframe

春和景丽 2020-11-22 09:47

My data can have multiple events on a given date or NO events on a date. I take these events, get a count by date and plot them. However, when I plot them, my two series do

5 Answers
  •  滥情空心
    2020-11-22 10:22

    One issue is that reindex will fail if there are duplicate values. Say we're working with timestamped data, which we want to index by date:

    import pandas as pd

    df = pd.DataFrame({
        'timestamps': pd.to_datetime(
            ['2016-11-15 1:00', '2016-11-16 2:00', '2016-11-16 3:00', '2016-11-18 4:00']),
        'values': ['a', 'b', 'c', 'd']})
    # Index each row by its calendar day (time of day truncated to midnight)
    df.index = pd.DatetimeIndex(df['timestamps']).floor('D')
    df
    

    yields

                timestamps             values
    2016-11-15  "2016-11-15 01:00:00"  a
    2016-11-16  "2016-11-16 02:00:00"  b
    2016-11-16  "2016-11-16 03:00:00"  c
    2016-11-18  "2016-11-18 04:00:00"  d
    

    Due to the duplicate 2016-11-16 date, an attempt to reindex:

    all_days = pd.date_range(df.index.min(), df.index.max(), freq='D')
    df.reindex(all_days)
    

    fails with:

    ...
    ValueError: cannot reindex from a duplicate axis
    

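    (The message means that the index contains duplicate labels, not that the index itself is a duplicate of something else. Recent pandas versions word it as "cannot reindex on an axis with duplicate labels".)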
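If only the number of events per day is needed, as in the question, one way around the duplicate labels is to aggregate first and reindex afterwards. The following is a minimal sketch under that assumption; the name `daily_counts` is made up for illustration and is not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({
    'timestamps': pd.to_datetime(
        ['2016-11-15 1:00', '2016-11-16 2:00', '2016-11-16 3:00', '2016-11-18 4:00']),
    'values': ['a', 'b', 'c', 'd']})

# Count events per calendar day; grouping leaves exactly one entry per date,
# so the resulting index has no duplicates.
daily_counts = df.groupby(df['timestamps'].dt.floor('D')).size()

# With a unique index, reindex succeeds; fill_value=0 marks days without events.
all_days = pd.date_range(daily_counts.index.min(), daily_counts.index.max(), freq='D')
daily_counts = daily_counts.reindex(all_days, fill_value=0)
print(daily_counts)
```

Because days without events show up as explicit zeros, two series built this way over the same date range share the same index when plotted.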

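    To keep every row, duplicates included, pandas versions before 1.0 allowed using .loc to look up entries for all dates in range:

    df.loc[all_days]  # pandas >= 1.0 raises KeyError here, because 2016-11-17 is not in the index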
    

    yields

                timestamps             values
    2016-11-15  "2016-11-15 01:00:00"  a
    2016-11-16  "2016-11-16 02:00:00"  b
    2016-11-16  "2016-11-16 03:00:00"  c
    2016-11-17  NaN                    NaN
    2016-11-18  "2016-11-18 04:00:00"  d
    

    If needed, fillna can be called on the individual columns to fill the resulting NaN values.
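On pandas 1.0 and later, .loc raises a KeyError when any requested label is missing from the index. One possible way to get the same table there is a left join from an empty frame indexed by the full date range. This is a sketch, not the original answer's method; the name `filled` and the placeholder string 'missing' are assumptions chosen for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'timestamps': pd.to_datetime(
        ['2016-11-15 1:00', '2016-11-16 2:00', '2016-11-16 3:00', '2016-11-18 4:00']),
    'values': ['a', 'b', 'c', 'd']})
df.index = pd.DatetimeIndex(df['timestamps']).floor('D')

all_days = pd.date_range(df.index.min(), df.index.max(), freq='D')

# Left-join the data onto an empty frame that has one row per day.
# Duplicate dates in df are kept; days absent from df become NaN/NaT rows.
filled = pd.DataFrame(index=all_days).join(df)

# Fill blanks column by column; 'missing' is an arbitrary placeholder.
filled['values'] = filled['values'].fillna('missing')
print(filled)
```

The join keeps both 2016-11-16 rows and adds a single blank row for 2016-11-17, matching the table above apart from the filled 'values' column.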
