Pandas Use Value if Not Null, Else Use Value From Next Column

猫巷女王i 2020-12-25 12:23

Given the following dataframe:

import pandas as pd
import numpy as np
df = pd.DataFrame({'COL1': ['A', np.nan,'A'], 
                   'COL2' : [np.nan,'A','A']})

4 Answers
  • 2020-12-25 12:50
    In [8]: df
    Out[8]:
      COL1 COL2
    0    A  NaN
    1  NaN    B
    2    A    B
    
    In [9]: df["COL3"] = df["COL1"].fillna(df["COL2"])
    
    In [10]: df
    Out[10]:
      COL1 COL2 COL3
    0    A  NaN    A
    1  NaN    B    B
    2    A    B    A
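
    The same idea extends to more than two columns. A minimal sketch, assuming the columns are listed in priority order (the `cols` list is an illustrative name, not part of the original answer): back-fill along each row, then keep the first column of the result.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"COL1": ["A", np.nan, "A"],
                   "COL2": [np.nan, "B", "B"]})

# Columns in priority order; extend this list to fall through more columns.
cols = ["COL1", "COL2"]

# bfill(axis=1) pulls the next non-null value leftwards within each row,
# so the first column ends up holding the first non-null value of the row.
df["COL3"] = df[cols].bfill(axis=1).iloc[:, 0]
print(df)
```

    For exactly two columns, `fillna` as shown above is simpler and reads more directly.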
    
  • 2020-12-25 12:58

    You can use np.where to conditionally set column values.

    df = df.assign(COL3=np.where(df.COL1.isnull(), df.COL2, df.COL1))
    
    >>> df
      COL1 COL2 COL3
    0    A  NaN    A
    1  NaN    A    A
    2    A    A    A
    

    If you don't mind mutating the values in COL2, you can update them directly to get your desired result.

    df = pd.DataFrame({'COL1': ['A', np.nan,'A'], 
                       'COL2' : [np.nan,'B','B']})
    
    >>> df
      COL1 COL2
    0    A  NaN
    1  NaN    B
    2    A    B
    
    df.COL2.update(df.COL1)
    
    >>> df
      COL1 COL2
    0    A    A
    1  NaN    B
    2    A    A
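
    One caveat: `df.COL2.update(...)` modifies a Series obtained from the frame, and in pandas versions where copy-on-write is enabled that change may not propagate back to `df`. A hedged sketch of an explicit assignment that should produce the same COL2 without relying on in-place mutation:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"COL1": ["A", np.nan, "A"],
                   "COL2": [np.nan, "B", "B"]})

# Take COL1 where it is non-null, otherwise keep the existing COL2 value,
# and assign the result back to the column explicitly.
df["COL2"] = df["COL1"].fillna(df["COL2"])
print(df)
```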
    
  • 2020-12-25 13:04

    If we modify your df slightly, you will see that this works, and in fact it works for any number of columns as long as each row contains at least one valid value:

    In [5]:
    df = pd.DataFrame({'COL1': ['B', np.nan,'B'], 
                       'COL2' : [np.nan,'A','A']})
    df
    
    Out[5]:
      COL1 COL2
    0    B  NaN
    1  NaN    A
    2    B    A
    
    In [6]:    
    df.apply(lambda x: x[x.first_valid_index()], axis=1)
    
    Out[6]:
    0    B
    1    A
    2    B
    dtype: object
    

    first_valid_index will return the index value (in this case column) that contains the first non-NaN value:

    In [7]:
    df.apply(lambda x: x.first_valid_index(), axis=1)
    
    Out[7]:
    0    COL1
    1    COL2
    2    COL1
    dtype: object
    

    So we can use this label to index into each row, which apply passes to the lambda as a Series.
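
    If a row can be entirely NaN, `first_valid_index()` returns `None` and the lookup in the lambda is expected to fail. A small sketch of a guard for that case; the helper name `first_valid` and the all-NaN third row are illustrative assumptions, not part of the original answer:

```python
import numpy as np
import pandas as pd

def first_valid(row):
    """Return the first non-null value in the row, or NaN if there is none."""
    label = row.first_valid_index()
    return row[label] if label is not None else np.nan

df = pd.DataFrame({"COL1": ["B", np.nan, np.nan],
                   "COL2": [np.nan, "A", np.nan]})

# axis=1 passes each row to the helper as a Series indexed by column name.
result = df.apply(first_valid, axis=1)
print(result)
```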

  • 2020-12-25 13:16

    Using .combine_first, which gives precedence to non-null values in the Series or DataFrame calling it:

    import pandas as pd
    import numpy as np
    
    df = pd.DataFrame({'COL1': ['A', np.nan,'A'], 
                       'COL2' : [np.nan,'B','B']})
    
    df['COL3'] = df.COL1.combine_first(df.COL2)
    

    Output:

      COL1 COL2 COL3
    0    A  NaN    A
    1  NaN    B    B
    2    A    B    A
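
    When both columns come from the same frame, `combine_first` and `fillna` give the same result. They differ when the two Series have different indexes: `combine_first` returns the union of both indexes, while `fillna` keeps only the caller's index. A short sketch with two made-up Series (`s1` and `s2` are illustrative names):

```python
import numpy as np
import pandas as pd

s1 = pd.Series(["A", np.nan], index=[0, 1])
s2 = pd.Series(["B", "C"], index=[1, 2])

# Union of both indexes: label 2 exists only in s2 but appears in the result.
combined = s1.combine_first(s2)

# Caller's index only: label 2 from s2 is ignored.
filled = s1.fillna(s2)

print(combined)
print(filled)
```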
    