How to compare two columns in pandas to make a third column?

甜味超标 2020-12-17 17:22

I have two columns, age and sex, in a pandas DataFrame:

sex = ['m', 'f', 'm', 'f', 'f', 'f', 'f']
age = [16, 15, 14, 9, 8, 2, 56]

I want a third column that holds 'child' when age is 9 or less, and the value of sex otherwise.

3 answers
  • 2020-12-17 17:30

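    You could use the pandas where method (pandas.Series.where, the column-level counterpart of pandas.DataFrame.where). It keeps a value where the condition is True and takes it from the second argument elsewhere. For example, with child as a Series filled with 'child':

    child = pd.Series('child', index=df.index)
    df['child'] = child.where(df['age'] <= 9, df['sex'])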
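
The where approach can also be written directly on the sex column, which avoids building a helper Series. The following is a minimal sketch using the sample data from the question; the column name child is an illustrative choice, and Series.mask is shown only as the mirror-image spelling of the same rule.

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    "sex": ["m", "f", "m", "f", "f", "f", "f"],
    "age": [16, 15, 14, 9, 8, 2, 56],
})

# Series.where keeps values where the condition is True and
# substitutes the second argument everywhere else.
df["child"] = df["sex"].where(df["age"] > 9, "child")

# Series.mask is the mirror image: it replaces where the condition is True.
via_mask = df["sex"].mask(df["age"] <= 9, "child")

print(df)
```

Note that the condition is inverted between the two spellings: where states which rows to keep, mask states which rows to overwrite.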
    
  • 2020-12-17 17:30
    import pandas as pd

    df = pd.DataFrame({'sex': ['m', 'f', 'm', 'f', 'f', 'f', 'f'],
                       'age': [16, 15, 14, 9, 8, 2, 56]})
    # Evaluate the rule row by row (axis=1 passes each row to the lambda).
    df['yes'] = df.apply(lambda x: 'child' if x['age'] <= 9 else x['sex'], axis=1)
    

    Result:

       age sex    yes
    0   16   m      m
    1   15   f      f
    2   14   m      m
    3    9   f  child
    4    8   f  child
    5    2   f  child
    6   56   f      f
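
For comparison, the same row-by-row rule can be expressed as a plain list comprehension over the two columns. This is a swapped-in alternative to apply, not part of the answer above; it is sketched here on the same sample data, on the assumption that the rule only needs the age and sex values.

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    "sex": ["m", "f", "m", "f", "f", "f", "f"],
    "age": [16, 15, 14, 9, 8, 2, 56],
})

# Same rule as the lambda, but iterating over the two columns directly
# instead of passing each row to a function as a Series.
df["yes"] = [
    "child" if age <= 9 else sex
    for age, sex in zip(df["age"], df["sex"])
]

print(df)
```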
    
  • 2020-12-17 17:50

    Use numpy.where:

    import numpy as np
    df['col3'] = np.where(df['age'] <= 9, 'child', df['sex'])
    

    The resulting output:

       age sex   col3
    0   16   m      m
    1   15   f      f
    2   14   m      m
    3    9   f  child
    4    8   f  child
    5    2   f  child
    6   56   f      f
    

    Timings

    Using the following setup to get a larger sample DataFrame:

    np.random.seed([3,1415])
    n = 10**5
    df = pd.DataFrame({'sex': np.random.choice(['m', 'f'], size=n),
                       'age': np.random.randint(0, 100, size=n)})
    

    I get the following timings:

    %timeit np.where(df['age'] <= 9, 'child', df['sex'])
    1000 loops, best of 3: 1.26 ms per loop
    
    %timeit df['sex'].where(df['age'] > 9, 'child')
    100 loops, best of 3: 3.25 ms per loop
    
    %timeit df.apply(lambda x: 'child' if x['age'] <= 9 else x['sex'], axis=1)
    100 loops, best of 3: 3.92 ms per loop
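
Outside IPython, the comparison can be reproduced with the standard timeit module. The sketch below reuses the setup above but assumes a smaller n (10**4) so the apply variant finishes quickly, and it first checks that all three approaches give identical labels. The candidates dictionary and variable names are illustrative; absolute timings will differ by machine and library version.

```python
import timeit

import numpy as np
import pandas as pd

np.random.seed([3, 1415])
n = 10**4  # assumption: smaller than the 10**5 used above, to keep apply quick
df = pd.DataFrame({
    "sex": np.random.choice(["m", "f"], size=n),
    "age": np.random.randint(0, 100, size=n),
})

# The three approaches from this thread, wrapped as zero-argument callables.
candidates = {
    "np.where": lambda: np.where(df["age"] <= 9, "child", df["sex"]),
    "Series.where": lambda: df["sex"].where(df["age"] > 9, "child"),
    "apply": lambda: df.apply(
        lambda x: "child" if x["age"] <= 9 else x["sex"], axis=1
    ),
}

# Confirm all three produce the same labels before comparing speed.
results = {name: list(fn()) for name, fn in candidates.items()}
all_equal = len({tuple(r) for r in results.values()}) == 1
print("identical results:", all_equal)

# Average wall-clock time over a few calls per approach.
for name, fn in candidates.items():
    seconds = timeit.timeit(fn, number=3) / 3
    print(f"{name:>12}: {seconds * 1e3:.2f} ms per call")
```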
    