First column name with non null value by row pandas

青春惊慌失措 2021-01-03 08:59

I want to know the first year with incoming revenue for various projects.

Given the following dataframe:

ID  Y1      Y2      Y3
0   NaN     8       4
1         


        
3 answers
  •  情歌与酒
    2021-01-03 09:23

    Avoiding apply is preferable because it is not vectorized. The following is vectorized and was tested with pandas 1.1.

    Setup

    import numpy as np
    import pandas as pd
    
    df = pd.DataFrame({'Y1':[np.nan, np.nan, np.nan, 5],'Y2':[8, np.nan, np.nan, 3], 'Y3':[4, 1, np.nan, np.nan]})
    
    # df.dropna(how='all', inplace=True)  # Optional but cleaner
    
    # For ranking only:
    col_ranks = pd.DataFrame(index=df.columns, data=np.arange(1, 1 + len(df.columns)), columns=['first_notna_rank'], dtype='UInt8') # UInt8 supports max value of 255.
    

    To find the name of the first non-null column

    df['first_notna_name'] = df.dropna(how='all').notna().idxmax(axis=1).astype('string')
    

    If df has no rows with all nulls, dropna(how='all') above can be removed.
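To illustrate why the dropna(how='all') step matters, here is a minimal sketch using the same sample data as the setup. The names `naive` and `safe` are hypothetical and used only for this illustration. On a row where every flag is False, idxmax returns the first column label, which would wrongly suggest that Y1 holds a value.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Y1': [np.nan, np.nan, np.nan, 5],
                   'Y2': [8, np.nan, np.nan, 3],
                   'Y3': [4, 1, np.nan, np.nan]})

# Without dropna: row 2 is all nulls, so every notna() flag is False and
# idxmax falls back to the first column label.
naive = df.notna().idxmax(axis=1)

# With dropna: row 2 is excluded, so it gets no label at all.
safe = df.dropna(how='all').notna().idxmax(axis=1)

print(naive.tolist())
print(safe.tolist())
```

When `safe` is assigned back to a column of df, index alignment leaves the all-null row missing rather than mislabeled.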

    To then find the first non-null value

    If df has no rows with all nulls:

    df['first_notna_value'] = df.lookup(row_labels=df.index, col_labels=df['first_notna_name'])
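Note that DataFrame.lookup was deprecated in pandas 1.2 and removed in pandas 2.0, so the line above only runs on older versions. A possible substitute, sketched here under the assumption that no row is entirely null, is positional indexing on the underlying NumPy array. The names `row_pos`, `col_pos`, and `first_value` are hypothetical.

```python
import numpy as np
import pandas as pd

# Sample data with no all-null rows (an assumption of this sketch).
df = pd.DataFrame({'Y1': [np.nan, np.nan, 5],
                   'Y2': [8, np.nan, 3],
                   'Y3': [4, 1, np.nan]})

first_name = df.notna().idxmax(axis=1)

# Translate each row's column label into an integer column position.
row_pos = np.arange(len(df))
col_pos = df.columns.get_indexer(first_name)

# Pick one cell per row with paired row and column positions.
first_value = df.to_numpy()[row_pos, col_pos]
print(first_value)
```

This stays vectorized, and it must be computed before any helper columns are added to df so that the column positions refer only to the year columns.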
    

    If df may have rows with all nulls: (inefficient)

    df['first_notna_value'] = df.drop(columns='first_notna_name').bfill(axis=1).iloc[:, 0]
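As a rough illustration of the bfill approach, the sketch below applies it to the sample data before any helper columns are added, so no drop is needed. Back-filling along axis=1 copies each row's later values leftward into earlier gaps, so the first column ends up holding the first non-null value of the row. The name `first_value` is hypothetical.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Y1': [np.nan, np.nan, np.nan, 5],
                   'Y2': [8, np.nan, np.nan, 3],
                   'Y3': [4, 1, np.nan, np.nan]})

# Fill each gap from the next non-null value to its right, then read column 0.
first_value = df.bfill(axis=1).iloc[:, 0]
print(first_value.tolist())
```

A row with no values at all simply stays NaN, which is why this variant tolerates all-null rows.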
    

    To rank the name

    df = df.merge(col_ranks, how='left', left_on='first_notna_name', right_index=True)
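An alternative to the merge, offered only as a sketch, is to map each column name to its position with a plain dict. The names `rank_by_name`, `first_name`, and `first_rank` are hypothetical, and the nullable 'UInt8' dtype is reused from the setup above.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Y1': [np.nan, np.nan, np.nan, 5],
                   'Y2': [8, np.nan, np.nan, 3],
                   'Y3': [4, 1, np.nan, np.nan]})

# 1-based rank of each column by its position in the frame.
rank_by_name = {name: pos for pos, name in enumerate(df.columns, start=1)}

# First non-null column name per row; all-null rows become missing after reindex.
first_name = df.dropna(how='all').notna().idxmax(axis=1).reindex(df.index)

# Names absent from the dict (the missing ones) map to a missing rank.
first_rank = first_name.map(rank_by_name).astype('UInt8')
print(first_rank)
```

This avoids building a separate lookup frame, at the cost of a Python-level dict lookup inside map.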
    

    Is there a better way?

    Output

        Y1   Y2   Y3 first_notna_name  first_notna_value  first_notna_rank
    0  NaN  8.0  4.0               Y2                8.0                 2
    1  NaN  NaN  1.0               Y3                1.0                 3
    2  NaN  NaN  NaN             <NA>                NaN              <NA>
    3  5.0  3.0  NaN               Y1                5.0                 1
    

    Partial credit: answers by piRSquared and Andy
