Get first non-null value per row

轻奢々 2020-12-20 21:04

I have a sample dataframe, shown below. For each row, I want to check c1 first; if it is null, then check c2, and so on. In this way, find the first non-null column and store

4 answers
  •  被撕碎了的回忆
    2020-12-20 21:34

    Setup

    df = df.set_index('ID') # if necessary
    df
         c1   c2  c3   c4
    ID                   
    1     a    b   a  NaN
    2   NaN   cc  dd   cc
    3   NaN   ee  ff   ee
    4   NaN  NaN  gg   gg
    

    Solution
    stack + groupby + first
    stack implicitly drops NaNs, so groupby.first is guaranteed to give you the first non-null value in each row, if one exists. Assigning the result back leaves NaN at any index that had no non-null value, which you can fill with a subsequent fillna call.

    df['result'] = df.stack().groupby(level=0).first()
    # df['result'] = df['result'].fillna('unknown') # if necessary 
    df
         c1   c2  c3   c4 result
    ID                          
    1     a    b   a  NaN      a
    2   NaN   cc  dd   cc     cc
    3   NaN   ee  ff   ee     ee
    4   NaN  NaN  gg   gg     gg
    
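For anyone who wants to run this end to end, here is a minimal self-contained sketch. It rebuilds the sample frame and adds a hypothetical row 5 (not in the original data) that is entirely NaN, to show where fillna comes in. The explicit dropna() is an assumption on my part: it keeps the behavior the same even on pandas versions where stack may not drop NaNs by default.

```python
import numpy as np
import pandas as pd

# Sample frame from the answer, plus a hypothetical all-NaN row (ID 5).
df = pd.DataFrame(
    {
        "c1": ["a", np.nan, np.nan, np.nan, np.nan],
        "c2": ["b", "cc", "ee", np.nan, np.nan],
        "c3": ["a", "dd", "ff", "gg", np.nan],
        "c4": [np.nan, "cc", "ee", "gg", np.nan],
    },
    index=pd.Index([1, 2, 3, 4, 5], name="ID"),
)

# Stack to one long Series indexed by (ID, column), dropping NaNs explicitly
# so the result does not depend on stack's default NaN handling.
stacked = df.stack().dropna()

# The first remaining value per ID is the first non-null value in that row.
first_valid = stacked.groupby(level=0).first()

# Assignment aligns on the index; ID 5 has no entry, so it becomes NaN.
df["result"] = first_valid
df["result"] = df["result"].fillna("unknown")

print(df)
```

Row 5 ends up as 'unknown' because it has no entry in the grouped result.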

    (Beware: this is slow for larger dataframes. For better performance, consider @jezrael's solution.)
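As a point of comparison, one commonly used vectorized alternative is to back-fill across the columns and take the first column. This is only a sketch of that general technique; the referenced answer is not shown here, so it may or may not be what @jezrael proposed. The column list `cols` is an illustrative name.

```python
import numpy as np
import pandas as pd

# Same sample frame as in the Setup section.
df = pd.DataFrame(
    {
        "c1": ["a", np.nan, np.nan, np.nan],
        "c2": ["b", "cc", "ee", np.nan],
        "c3": ["a", "dd", "ff", "gg"],
        "c4": [np.nan, "cc", "ee", "gg"],
    },
    index=pd.Index([1, 2, 3, 4], name="ID"),
)

cols = ["c1", "c2", "c3", "c4"]  # columns to scan, in priority order

# bfill(axis=1) pulls each row's next non-null value leftwards,
# so the first column then holds the first non-null value of the row.
df["result"] = df[cols].bfill(axis=1).iloc[:, 0]

print(df)
```

A row with no non-null value at all would stay NaN here as well, so the same fillna follow-up applies.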
