Expanding pandas data frame with date range in columns

Happy的楠姐 2020-12-01 16:44

I have a pandas dataframe with dates and strings similar to this:

Start        End           Note    Item
2016-10-22   2016-11-05    Z       A
2017-02-11   2         


        
3 answers
  •  孤街浪徒
    2020-12-01 17:15

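    If df['End'] - df['Start'] takes only a small number of distinct values but your dataset has many rows, the following function will be much faster than looping over the rows. It builds a small lookup table of offsets for every duration from zero up to the maximum, then attaches it to the data with a single merge: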

    import numpy as np
    import pandas as pd


    def date_expander(dataframe: pd.DataFrame,
                      start_dt_colname: str,
                      end_dt_colname: str,
                      time_unit: str,
                      new_colname: str,
                      end_inclusive: bool) -> pd.DataFrame:
        td = pd.Timedelta(1, time_unit)
    
        # add a timediff column:
        dataframe['_dt_diff'] = dataframe[end_dt_colname] - dataframe[start_dt_colname]
    
        # get the maximum timediff:
        max_diff = int((dataframe['_dt_diff'] / td).max())
    
        # for each possible timediff, get the intermediate time-differences:
        df_diffs = pd.concat([pd.DataFrame({'_to_add': np.arange(0, dt_diff + end_inclusive) * td}).assign(_dt_diff=dt_diff * td)
                              for dt_diff in range(max_diff + 1)])
    
        # join to the original dataframe
        data_expanded = dataframe.merge(df_diffs, on='_dt_diff')
    
        # the new dt column is just start plus the intermediate diffs:
        data_expanded[new_colname] = data_expanded[start_dt_colname] + data_expanded['_to_add']
    
        # remove start-end cols, as well as temp cols used for calculations:
        to_drop = [start_dt_colname, end_dt_colname, '_to_add', '_dt_diff']
        if new_colname in to_drop:
            to_drop.remove(new_colname)
        data_expanded = data_expanded.drop(columns=to_drop)
    
        # remove the temporary column again, so the caller's dataframe is left unchanged:
        del dataframe['_dt_diff']
    
        return data_expanded
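
For comparison, here is a minimal sketch of the row-by-row approach that the function above is meant to outperform. The name `expand_by_looping` and the second sample row are illustrative assumptions; only the column names and the first row come from the question. `pd.date_range` includes both endpoints by default, which should correspond to calling `date_expander` with `end_inclusive=True`.

```python
import pandas as pd

# Sample data: the first row is taken from the question, the second row is
# hypothetical and only there to give the loop more than one range to expand.
df = pd.DataFrame({
    "Start": pd.to_datetime(["2016-10-22", "2017-01-01"]),
    "End": pd.to_datetime(["2016-11-05", "2017-01-03"]),
    "Note": ["Z", "Y"],
    "Item": ["A", "B"],
})


def expand_by_looping(frame: pd.DataFrame) -> pd.DataFrame:
    """Baseline: one pd.date_range call per input row (hypothetical helper)."""
    rows = [
        {"Note": note, "Item": item, "Date": day}
        for start, end, note, item in zip(
            frame["Start"], frame["End"], frame["Note"], frame["Item"]
        )
        # date_range is inclusive of both start and end by default
        for day in pd.date_range(start, end, freq="D")
    ]
    return pd.DataFrame(rows)


expanded = expand_by_looping(df)
print(expanded.head())
```

This version calls `pd.date_range` once per row, so its cost grows with the number of rows, whereas the merge-based function does its per-duration work only once for each possible duration.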
    
