Getting the last non-NaN value across rows in a pandas DataFrame

北海茫月 2020-12-11 05:22

I have a DataFrame of shape (40, 500). Each row has numerical values up to some row-dependent column k, and every entry after that is NaN.

3 answers
  •  暗喜 (original poster)
    2020-12-11 05:35

    Here's a NumPy-based solution:

    In [113]: a
    Out[113]: 
    array([[ 17.,  53.,  nan,  63.,  66.,  nan,  nan,  nan,  nan,  nan],
           [ 54.,  96.,  71.,  20.,  70.,  58.,  91.,  nan,  nan,  nan],
           [ 58.,  26.,  72.,  93.,  58.,  29.,  44.,  28.,  36.,  88.],
           [ nan,  nan,  nan,  nan,  nan,  nan,  nan,  nan,  nan,  nan],
           [ 94.,  23.,  nan,  nan,  92.,  81.,  40.,  30.,  84.,  nan]])
    
    In [114]: m = ~np.isnan(a)
    
    In [115]: a[np.arange(m.shape[0]), m.shape[1]-m[:,::-1].argmax(1)-1]
    Out[115]: array([ 66.,  91.,  88.,  nan,  84.])
    

    To port this to a DataFrame, first extract the values as an array with a = df.values, recompute the mask m as above, and then build the output DataFrame:

    a = df.values       # underlying NumPy array
    m = ~np.isnan(a)    # True where a value is present
    vals = a[np.arange(m.shape[0]), m.shape[1]-m[:,::-1].argmax(1)-1]
    df_out = pd.DataFrame(vals,index=df.index)
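
The steps above can be put together into one self-contained sketch. The function name `last_valid_per_row` and the small sample frame are made up for illustration and are not part of the original answer. The `ffill` line at the end is a different, pandas-only technique, included only as a cross-check.

```python
import numpy as np
import pandas as pd


def last_valid_per_row(df: pd.DataFrame) -> pd.Series:
    """Return the last non-NaN value of each row (NaN for all-NaN rows)."""
    a = df.to_numpy(dtype=float)
    m = ~np.isnan(a)  # True where a value is present
    # argmax on the column-reversed mask is the distance of the last valid
    # entry from the right edge; convert it to a column position.
    last_col = m.shape[1] - 1 - m[:, ::-1].argmax(axis=1)
    # For an all-NaN row argmax returns 0, so the final column is picked,
    # which is NaN anyway.
    vals = a[np.arange(a.shape[0]), last_col]
    return pd.Series(vals, index=df.index)


# Hypothetical sample data: a gap in the middle, an all-NaN row, a full row.
df = pd.DataFrame(
    [
        [17.0, 53.0, np.nan, 63.0, 66.0, np.nan],
        [np.nan] * 6,
        [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    ],
    index=list("abc"),
)

result = last_valid_per_row(df)

# Cross-check with a pandas-only alternative: forward-fill along each row,
# then take the last column.
alt = df.ffill(axis=1).iloc[:, -1]
```

The NumPy version avoids materialising a forward-filled copy of the whole frame, which may matter for wide frames. The `ffill` form is shorter and easier to read for a frame as small as (40, 500).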
    
