Add missing dates to pandas dataframe

春和景丽 2020-11-22 09:47

My data can have multiple events on a given date or NO events on a date. I take these events, get a count by date and plot them. However, when I plot them, my two series do

5 answers
  •  挽巷 (original poster)
    2020-11-22 10:34

    An alternative approach is resample, which can handle duplicate dates in addition to missing dates. For example:

    df.resample('D').mean()
    

    resample is a deferred operation, like groupby, so you need to follow it with an aggregation. Here mean works well, but you can also use many other pandas methods such as max, sum, etc.
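
To illustrate the deferred behaviour, here is a minimal sketch. The `events` frame, its `amount` column, and the dates are hypothetical and not part of the original question. The point is that `resample('D')` only sets up daily bins, and the aggregation that follows decides what an empty day looks like.

```python
import pandas as pd

# Hypothetical event log: one row per event, indexed by the event date.
events = pd.DataFrame(
    {"amount": [3, 4, 7]},
    index=pd.DatetimeIndex(["2021-01-01", "2021-01-01", "2021-01-04"], name="date"),
)

# resample() alone computes nothing; it returns a Resampler awaiting an aggregation.
daily = events.resample("D")

daily_sum = daily["amount"].sum()   # days with no events sum to 0
daily_max = daily["amount"].max()   # days with no events have no maximum, so NaN
daily_count = daily.size()          # number of events per day, 0 on empty days

print(daily_count)
```

For the "count by date" case described in the question, `size()` is probably the closest fit, since it already reports 0 for dates with no events.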

    Here is the original data, but with an extra entry for '2013-09-03':

                 val
    date           
    2013-09-02     2
    2013-09-03    10
    2013-09-03    20    <- duplicate date added to OP's data
    2013-09-06     5
    2013-09-07     1
    

    And here are the results:

                 val
    date            
    2013-09-02   2.0
    2013-09-03  15.0    <- mean of original values for 2013-09-03
    2013-09-04   NaN    <- NaN b/c date not present in orig
    2013-09-05   NaN    <- NaN b/c date not present in orig
    2013-09-06   5.0
    2013-09-07   1.0
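
The tables above can be reproduced with a short script. This sketch assumes `date` is a DatetimeIndex, which `resample` needs; the way the frame is constructed here is my own and may differ from how the OP built theirs.

```python
import pandas as pd

# Same values as the input table, including the duplicated 2013-09-03.
df = pd.DataFrame(
    {"val": [2, 10, 20, 5, 1]},
    index=pd.DatetimeIndex(
        ["2013-09-02", "2013-09-03", "2013-09-03", "2013-09-06", "2013-09-07"],
        name="date",
    ),
)

# One row per calendar day: duplicates are averaged, missing days become NaN.
result = df.resample("D").mean()
print(result)
```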
    

    I left the missing dates as NaNs to make it clear how this works. To replace them with zeroes, as the OP requested, append fillna(0). Alternatively, use something like interpolate() to fill them with non-zero values based on the neighboring rows.
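
As a rough sketch of the two fill options, the snippet below starts from the resampled result shown earlier, rebuilt by hand so that it runs on its own. The variable names are illustrative, and `interpolate()` is called with its default linear method.

```python
import numpy as np
import pandas as pd

# The resampled result from the answer, with NaN on the two missing days.
daily = pd.DataFrame(
    {"val": [2.0, 15.0, np.nan, np.nan, 5.0, 1.0]},
    index=pd.date_range("2013-09-02", "2013-09-07", freq="D", name="date"),
)

# Option 1: treat missing days as zero.
filled = daily.fillna(0)

# Option 2: draw a straight line between the neighboring known values.
interpolated = daily.interpolate()

print(filled)
print(interpolated)
```

Zero-filling suits counts or totals, where a missing day really means nothing happened. Interpolation is a better fit when the value is a measurement that presumably existed on the missing days but was not recorded.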
