Compute daily climatology using pandas python

孤街浪徒 2020-12-10 17:34

I am trying to use pandas to compute daily climatology. My code is:

import pandas as pd

dates = pd.date_range('1950-01-01', '1953-12-31', freq='D')


        
3 Answers
  • 2020-12-10 18:23

    In case it helps, here is my solution for getting a climatology series with the same index and length as the original time series.

    I use joris' solution to get a "model climatology" of 365/366 elements, then I build my desired series taking values from this model climatology and time index from my original time series. This way, things like leap years are automatically taken care of.

    # I start with my time series, named 'serData'.
    # I apply joris' solution to it, getting a 'model climatology' of length 365 or 366.
    serClimModel = serData.groupby([serData.index.month, serData.index.day]).mean()
    
    # Now I build the climatology series, taking values from serClimModel depending on the index of serData.
    # list(...) is needed on Python 3, where zip returns an iterator rather than a list.
    serClimatology = serClimModel.loc[list(zip(serData.index.month, serData.index.day))]
    
    # Now serClimatology has a (month, day) index like this: [1,1] ... [12,31].
    # So, as a final step, I replace it with the time index of serData.
    serClimatology.index = serData.index
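
    As an alternative to the two-step lookup above, the same result can probably be obtained with a groupby transform, which returns the group mean aligned to the original index in one call. This is a swapped-in technique, not the answer's own method. The sample data below is hypothetical and stands in for 'serData'; the name 'anomalies' is introduced only for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data standing in for 'serData' (values are random).
dates = pd.date_range("1950-01-01", "1953-12-31", freq="D")
rng = np.random.default_rng(0)
serData = pd.Series(rng.random(len(dates)), index=dates)

# Group by calendar date (month, day) and broadcast each group's mean
# back onto the original daily index.
keys = [serData.index.month, serData.index.day]
serClimatology = serData.groupby(keys).transform("mean")

# Example use: departures of each day from its climatological mean.
anomalies = serData - serClimatology
```

    Because the result already carries the index of serData, no re-indexing step is needed, and 29 February simply receives the mean of whatever leap-day values exist in the record.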
    
  • 2020-12-10 18:25

    @joris, thanks. Your answer was just what I needed to calculate daily climatologies with pandas, but it stops short of the final step: re-mapping the (month, day) index back to a day-of-year index that is valid for all years, including leap years, i.e. 1 through 366. So I thought I'd share my solution for other users. 1950 through 1953 is 4 years with one leap year, 1952. Note that because random values are used, each run will give different results.

    ...   
    from datetime import date
    doy = []
    doy_mean = []
    doy_size = []
    for name, group in cum_data.groupby([cum_data.index.month, cum_data.index.day]):
      (mo, dy) = name
      # Note: can use any leap year here.
      yrday = (date(1952, mo, dy)).timetuple().tm_yday
      doy.append(yrday)
      doy_mean.append(group.mean())
      doy_size.append(group.count())
      # Note: other climatology statistics are available via group.describe(), which returns a Series
      #desc = group.describe()
      # desc["mean"], desc["min"], desc["max"], desc["std"], quartiles, etc.
    
    # Build a Series of means indexed by day of year; the counts are not kept here.
    new_cum_data = pd.Series(doy_mean, index=doy)
    print(new_cum_data.loc[366])
    >> 634.5
    
    # Keep both the mean and the sample size in a DataFrame.
    pd_dict = {}
    pd_dict["mean"] = doy_mean
    pd_dict["size"] = doy_size
    cum_data_df = pd.DataFrame(data=pd_dict, index=doy)
    
    print(cum_data_df.loc[366])
    >> mean    634.5
    >> size      4.0
    >> Name: 366, dtype: float64
    # and just to check Feb 29
    print(cum_data_df.loc[60])
    >> mean    343
    >> size      1
    >> Name: 60, dtype: float64
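
    The loop above could likely be written without explicit iteration. The following is a minimal sketch, assuming the same 1950 to 1953 daily range and random integer values as hypothetical sample data; the names 'stats' and 'ref' are introduced only for illustration. As in the loop, 1952 is used as the reference leap year, and any leap year should work.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data standing in for 'cum_data' (values are random).
dates = pd.date_range("1950-01-01", "1953-12-31", freq="D")
rng = np.random.default_rng(0)
cum_data = pd.Series(rng.integers(0, 1000, size=len(dates)), index=dates)

# Mean and sample size for each calendar date (month, day).
stats = cum_data.groupby([cum_data.index.month, cum_data.index.day]).agg(["mean", "count"])

# Map each (month, day) pair to its day of year in a reference leap year.
ref = pd.to_datetime(
    pd.DataFrame(
        {
            "year": 1952,  # any leap year would do
            "month": stats.index.get_level_values(0),
            "day": stats.index.get_level_values(1),
        }
    )
)
stats.index = ref.dt.dayofyear.to_numpy()

print(stats.loc[60])   # 29 February: a single sample in this period
print(stats.loc[366])  # 31 December: one sample per year
```

    Using a leap year as the reference keeps 29 February at day 60 and shifts every later date by one, so all 366 calendar dates get a distinct position from 1 to 366.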
    
  • 2020-12-10 18:26

    You can group by the day of the year, and then calculate the mean for these groups:

    cum_data.groupby(cum_data.index.dayofyear).mean()
    

    However, you have to be aware of leap years, which cause problems with this approach: from 1 March onward, the same calendar date has a different day-of-year number in a leap year. As an alternative, you can group by the month and the day:

    In [13]: cum_data.groupby([cum_data.index.month, cum_data.index.day]).mean()
    Out[13]:
    1  1     462.25
       2     631.00
       3     615.50
       4     496.00
    ...
    12  28    378.25
        29    427.75
        30    528.50
        31    678.50
    Length: 366, dtype: float64
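
    To make the leap-year issue concrete, here is a small sketch comparing the two groupings by their group sizes. The sample data is hypothetical (random values over the question's date range), and the names 'by_doy' and 'by_month_day' are introduced only for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data standing in for 'cum_data' (values are random).
dates = pd.date_range("1950-01-01", "1953-12-31", freq="D")
rng = np.random.default_rng(0)
cum_data = pd.Series(rng.integers(0, 1000, size=len(dates)), index=dates)

# The same calendar date gets a different day-of-year number in a leap year.
print(pd.Timestamp("1951-03-01").dayofyear)  # 60
print(pd.Timestamp("1952-03-01").dayofyear)  # 61

# Grouping by day of year: day 60 mixes three 1 March values with one
# 29 February value, and day 366 holds only 31 December 1952.
by_doy = cum_data.groupby(cum_data.index.dayofyear).count()

# Grouping by (month, day): each group holds exactly one calendar date.
by_month_day = cum_data.groupby([cum_data.index.month, cum_data.index.day]).count()

print(by_doy.loc[[60, 366]])
print(by_month_day.loc[[(2, 29), (12, 31)]])
```

    Both groupings produce 366 groups for this period, but only the (month, day) grouping averages like with like.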
    