How to implement SQL COALESCE in pandas

Happy的楠姐 2020-12-10 11:54

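I have a DataFrame like this:

import pandas as pd
import numpy as np
df = pd.DataFrame({"A":[1,2,np.nan],"B":[np.nan,10,np.nan], "C":[5,10,7]})
     A     B   C
0  1.0   NaN   5
1  2.0  10.0  10
2  NaN   NaN   7

I want the pandas equivalent of SQL COALESCE(A, B, C), i.e. for each row the first non-null value among A, B and C.
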
5 Answers
  •  春和景丽
    2020-12-10 12:38

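    Another approach is to use the combine_first method of pd.Series, which keeps the non-null values of the calling Series and fills its null positions with the values of the other Series at the same index. Using your example df,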

    >>> import pandas as pd
    >>> import numpy as np
    >>> df = pd.DataFrame({"A":[1,2,np.nan],"B":[np.nan,10,np.nan], "C":[5,10,7]})
    >>> df
         A     B   C
    0  1.0   NaN   5
    1  2.0  10.0  10
    2  NaN   NaN   7
    

    we have

    >>> df.A.combine_first(df.B).combine_first(df.C)
    0    1.0
    1    2.0
    2    7.0
    Name: A, dtype: float64
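
The chained call above can also be run as a standalone script. This is a minimal sketch using the question's data: each combine_first call keeps the non-null values it already has and only fills the remaining NaN positions from its argument, so the order of the calls sets the priority.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [1, 2, np.nan], "B": [np.nan, 10, np.nan], "C": [5, 10, 7]})

# A takes priority; B fills A's gaps; C fills whatever is still missing.
result = df["A"].combine_first(df["B"]).combine_first(df["C"])
print(result.tolist())  # [1.0, 2.0, 7.0]
```

Swapping the order of the calls changes which column wins when more than one is non-null (row 1 has values in A, B and C).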
    

    We can use reduce to abstract this pattern to work with an arbitrary number of columns.

    >>> from functools import reduce
    >>> cols = [df[c] for c in df.columns]
    >>> reduce(lambda acc, col: acc.combine_first(col), cols)
    0    1.0
    1    2.0
    2    7.0
    Name: A, dtype: float64
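
Because reduce folds from left to right, the order of the list is the priority order, just like the argument order of SQL COALESCE. A sketch that puts B first (this priority list is an illustrative assumption, not something from the question):

```python
from functools import reduce

import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [1, 2, np.nan], "B": [np.nan, 10, np.nan], "C": [5, 10, 7]})

# Priority order, highest first: B, then A, then C (illustrative choice).
priority = ["B", "A", "C"]
cols = [df[c] for c in priority]

# Left fold: the accumulator starts as df["B"], and each later column
# only fills positions that are still NaN.
result = reduce(lambda acc, col: acc.combine_first(col), cols)
print(result.tolist())  # [1.0, 10.0, 7.0]
```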
    

    Let's put this all together in a function.

    >>> def coalesce(*args):
    ...     """Return the first non-null value per row across the given Series."""
    ...     return reduce(lambda acc, col: acc.combine_first(col), args)
    ...
    >>> coalesce(*cols)
    0    1.0
    1    2.0
    2    7.0
    Name: A, dtype: float64
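
A common follow-up is to store the result as a new column, as in SELECT COALESCE(A, B, C) AS D. In this sketch the column name D is hypothetical. As a cross-check it also computes the same values with a different technique, back-filling along each row with DataFrame.bfill(axis=1) and keeping the first column; that alternative is not part of the answer above.

```python
from functools import reduce

import numpy as np
import pandas as pd


def coalesce(*args):
    """Return the first non-null value per row across the given Series."""
    return reduce(lambda acc, col: acc.combine_first(col), args)


df = pd.DataFrame({"A": [1, 2, np.nan], "B": [np.nan, 10, np.nan], "C": [5, 10, 7]})

# Equivalent of SQL: SELECT COALESCE(A, B, C) AS D FROM df
df["D"] = coalesce(df["A"], df["B"], df["C"])

# Alternative technique: back-fill along each row, then keep column 0.
alt = df[["A", "B", "C"]].bfill(axis=1).iloc[:, 0]

print(df["D"].tolist())                 # [1.0, 2.0, 7.0]
print(alt.tolist() == df["D"].tolist())  # True
```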
    
