Efficient solution for forward filling missing values in a pandas dataframe column?

盖世英雄少女心 2020-12-18 13:00

I need to forward fill values in a column of a dataframe within groups. I should note that the first value in a group is never missing by construction. I have the following

3 Answers
  • 2020-12-18 13:07

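    Calling ffill() directly on the column is the fastest option. It ignores group boundaries, so it is only safe because the first value in each group is never missing and provided the rows of each group are contiguous. Here is the timing comparison: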

    %timeit df.b.ffill(inplace = True)
    best of 3: 311 µs per loop
    
    %timeit df['b'] = df.groupby('a')['b'].transform(lambda x: x.fillna(method='ffill'))
    best of 3: 2.34 ms per loop
    
    %timeit df['b'] = df.groupby('a')['b'].fillna(method='ffill')
    best of 3: 4.41 ms per loop
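
The timings above only matter if the approaches agree on the result. The following is a minimal sketch, on made-up data that is assumed to satisfy the question's guarantee (contiguous groups, first value of each group present), checking that a plain ffill and a grouped ffill return the same column. The frame and the variable names `plain` and `grouped` are illustrative, not taken from the original post.

```python
import numpy as np
import pandas as pd

# Hypothetical data: groups in 'a' are contiguous and every group
# starts with a non-missing value in 'b'.
df = pd.DataFrame({
    "a": [1, 1, 1, 2, 2, 3, 3, 3],
    "b": [1.0, np.nan, np.nan, 4.0, np.nan, 7.0, np.nan, 9.0],
})

# Plain forward fill: ignores group boundaries entirely.
plain = df["b"].ffill()

# Group-aware forward fill: never carries a value across groups.
grouped = df.groupby("a")["b"].ffill()

# Under the stated assumptions the two results coincide.
print(plain.equals(grouped))
```

If a group could start with a missing value, the two results would differ, and the cheaper plain ffill would no longer be a safe substitute.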
    
  • 2020-12-18 13:24

    What about this?

    df.groupby('a').b.transform('ffill')
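
As a rough illustration of how this one-liner behaves, the sketch below applies it to a small made-up frame and assigns the result to a new column (the expression by itself returns a Series and does not modify `df`). The data and the column name `b_filled` are assumptions for the example.

```python
import numpy as np
import pandas as pd

# Hypothetical frame; group 2 deliberately starts with a missing value.
df = pd.DataFrame({
    "a": [1, 1, 2, 2, 2],
    "b": [1.0, np.nan, np.nan, 2.0, np.nan],
})

# Forward fill within each group of 'a'; the result keeps the original index,
# so it can be assigned straight back as a column.
df["b_filled"] = df.groupby("a")["b"].transform("ffill")

print(df)
```

The leading NaN in group 2 stays NaN here, because there is no earlier value inside that group to carry forward.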
    
  • 2020-12-18 13:27

    You need to sort by both columns, df.sort_values(['a', 'b']).ffill(), to ensure robustness. If an np.nan is left in the first position within a group, ffill will fill it with a value from the prior group. Because sort_values places np.nan last by default, sorting by both a and b ensures that no group starts with np.nan as long as the group has at least one value. You can then use .loc or .reindex with the initial index to get back your original order.

    This will obviously be a tad slower than the other proposals... However, I contend it will be correct where the others are not.

    demo

    Consider the dataframe df

    import numpy as np
    import pandas as pd
    
    df = pd.DataFrame({'a': [1,1,2,2,2], 'b': [1, np.nan, np.nan, 2, np.nan]})
    
    print(df)
    
       a    b
    0  1  1.0
    1  1  NaN
    2  2  NaN
    3  2  2.0
    4  2  NaN
    

    Try

    df.sort_values('a').ffill()
    
       a    b
    0  1  1.0
    1  1  1.0
    2  2  1.0  # <--- this is incorrect
    3  2  2.0
    4  2  2.0
    

    Instead do

    df.sort_values(['a', 'b']).ffill().loc[df.index]
    
       a    b
    0  1  1.0
    1  1  1.0
    2  2  2.0
    3  2  2.0
    4  2  2.0
    

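    Special note
    This is still incorrect if an entire group has missing values: that group sorts directly after the previous one, so ffill fills it with the previous group's value.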
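
The all-missing case can be reproduced with a small sketch. The frame below is made up for illustration, and the comparison against a grouped ffill is an addition for contrast rather than part of the approach described above.

```python
import numpy as np
import pandas as pd

# Hypothetical frame in which group 2 has no values at all.
df = pd.DataFrame({
    "a": [1, 1, 2, 2],
    "b": [1.0, np.nan, np.nan, np.nan],
})

# Sort-then-fill: group 2 inherits the last value of group 1.
sorted_fill = df.sort_values(["a", "b"]).ffill().loc[df.index]

# Grouped fill, shown for contrast: group 2 stays missing.
grouped_fill = df.groupby("a")["b"].ffill()

print(sorted_fill)
print(grouped_fill)
```

Which behaviour is wanted depends on the data; if an empty group should stay empty, the grouped version is the one that preserves that.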
