I have a sample DataFrame, shown below. For each row, I want to check c1 first; if it is null, then check c2, and so on. In this way, I want to find the first non-null column and store its value in a new column.
Back-fill the NaNs along each row first (bfill with axis=1), then select the first column with iloc:
df['result'] = df[['c1','c2','c3','c4']].bfill(axis=1).iloc[:, 0].fillna('unknown')
Or:
df['result'] = df.iloc[:, 1:].bfill(axis=1).iloc[:, 0].fillna('unknown')
print(df)
   ID   c1   c2  c3   c4 result
0   1    a    b   a  NaN      a
1   2  NaN   cc  dd   cc     cc
2   3  NaN   ee  ff   ee     ee
3   4  NaN  NaN  gg   gg     gg
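For a self-contained version of the back-fill approach, here is a minimal sketch. It rebuilds the sample frame from the printed output; the fifth row (ID 5, all NaN) is a hypothetical addition that is not in the original data, included only to show when fillna('unknown') takes effect.

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the printed output; the last row (ID 5)
# is a hypothetical all-NaN row added for illustration.
df = pd.DataFrame({
    "ID": [1, 2, 3, 4, 5],
    "c1": ["a", np.nan, np.nan, np.nan, np.nan],
    "c2": ["b", "cc", "ee", np.nan, np.nan],
    "c3": ["a", "dd", "ff", "gg", np.nan],
    "c4": [np.nan, "cc", "ee", "gg", np.nan],
})

# bfill(axis=1) pulls the next non-null value leftwards within each row,
# so the first column ends up holding the first non-null value of that row.
filled = df[["c1", "c2", "c3", "c4"]].bfill(axis=1)

# A row with no non-null value at all stays NaN and falls back to 'unknown'.
df["result"] = filled.iloc[:, 0].fillna("unknown")

print(df["result"].tolist())
# ['a', 'cc', 'ee', 'gg', 'unknown']
```

Selecting the columns by name rather than by position keeps ID (and any column added later) out of the back-fill.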
Performance:
df = pd.concat([df] * 1000, ignore_index=True)
In [220]: %timeit df['result'] = df[['c1','c2','c3','c4']].bfill(axis=1).iloc[:, 0].fillna('unknown')
100 loops, best of 3: 2.78 ms per loop
In [221]: %timeit df['result'] = df.iloc[:, 1:].bfill(axis=1).iloc[:, 0].fillna('unknown')
100 loops, best of 3: 2.7 ms per loop
# jpp's solution
In [222]: %%timeit
...: cols = df.iloc[:, 1:].T.apply(pd.Series.first_valid_index)
...:
...: df['result'] = [df.loc[i, cols[i]] for i in range(len(df.index))]
...:
1 loop, best of 3: 180 ms per loop
# cᴏʟᴅsᴘᴇᴇᴅ's solution
In [223]: %timeit df['result'] = df.stack().groupby(level=0).first()
1 loop, best of 3: 606 ms per loop
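To rerun the comparison outside IPython, a rough sketch with the standard timeit module follows. The function names (with_bfill, with_first_valid_index, with_stack) and the value_cols list are my own labels, not part of the original answers, and each approach is restricted to c1..c4 so that all three return the same values. Absolute timings depend on the machine and the pandas version, so no expected numbers are given.

```python
import timeit

import numpy as np
import pandas as pd

# Sample frame from the answer, repeated 1000 times as in the timings above.
small = pd.DataFrame({
    "ID": [1, 2, 3, 4],
    "c1": ["a", np.nan, np.nan, np.nan],
    "c2": ["b", "cc", "ee", np.nan],
    "c3": ["a", "dd", "ff", "gg"],
    "c4": [np.nan, "cc", "ee", "gg"],
})
big = pd.concat([small] * 1000, ignore_index=True)
value_cols = ["c1", "c2", "c3", "c4"]  # assumed set of columns to search


def with_bfill(df):
    # Back-fill along rows, then take the first column.
    return df[value_cols].bfill(axis=1).iloc[:, 0]


def with_first_valid_index(df):
    # Find the first non-null column label per row, then look each value up.
    cols = df[value_cols].T.apply(pd.Series.first_valid_index)
    return pd.Series([df.loc[i, cols[i]] for i in df.index], index=df.index)


def with_stack(df):
    # Stack to long format; first() returns the first non-null value per row.
    return df[value_cols].stack().groupby(level=0).first()


# Check that the three approaches agree before timing them.
expected = with_bfill(big).tolist()
assert with_first_valid_index(big).tolist() == expected
assert with_stack(big).tolist() == expected

for fn in (with_bfill, with_first_valid_index, with_stack):
    seconds = min(timeit.repeat(lambda: fn(big), number=1, repeat=3))
    print(f"{fn.__name__}: {seconds * 1000:.1f} ms")
```

Note that the first_valid_index variant assumes every row has at least one non-null value; an all-NaN row yields None as the column label and the lookup fails.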