How to properly apply a lambda function to a pandas DataFrame column

孤街浪徒 2020-12-13 00:26

I have a pandas DataFrame, sample, with a column called PR, to which I am applying a lambda function as follows:

sample['P


        
2 Answers
  • 2020-12-13 01:08

    Use Series.mask:

    sample['PR'] = sample['PR'].mask(sample['PR'] < 90, np.nan)
    

    Another solution with loc and boolean indexing:

    sample.loc[sample['PR'] < 90, 'PR'] = np.nan
    

    Example:

    import pandas as pd
    import numpy as np
    
    sample = pd.DataFrame({'PR':[10,100,40] })
    print(sample)
        PR
    0   10
    1  100
    2   40
    
    sample['PR'] = sample['PR'].mask(sample['PR'] < 90, np.nan)
    print(sample)
          PR
    0    NaN
    1  100.0
    2    NaN
    
    sample.loc[sample['PR'] < 90, 'PR'] = np.nan
    print(sample)
          PR
    0    NaN
    1  100.0
    2    NaN
    

    EDIT:

    Solution with apply:

    sample['PR'] = sample['PR'].apply(lambda x: np.nan if x < 90 else x)
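    As a sanity check, the sketch below runs the three approaches from this answer (mask, loc with boolean indexing, and apply) on a small sample and asserts that they agree. It uses float values for PR so that the in-place .loc assignment does not have to change the column's dtype; that choice is an assumption of this sketch, not part of the original data.

```python
import numpy as np
import pandas as pd

# Float values (an assumption of this sketch) so .loc can assign NaN in place
base = pd.DataFrame({'PR': [10.0, 100.0, 40.0]})

# 1) Series.mask: replace values where the condition is True with NaN
via_mask = base['PR'].mask(base['PR'] < 90, np.nan)

# 2) .loc with a boolean row selector, on a copy so base stays untouched
frame = base.copy()
frame.loc[frame['PR'] < 90, 'PR'] = np.nan
via_loc = frame['PR']

# 3) Element-wise apply; the lambda needs both an if and an else branch
via_apply = base['PR'].apply(lambda x: np.nan if x < 90 else x)

# Raises AssertionError if any two results differ in values, dtype or name
pd.testing.assert_series_equal(via_mask, via_loc)
pd.testing.assert_series_equal(via_mask, via_apply)
print(via_mask.tolist())  # [nan, 100.0, nan]
```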
    

    Timings (len(df) = 300k):

    sample = pd.concat([sample]*100000).reset_index(drop=True)
    
    In [853]: %timeit sample['PR'].apply(lambda x: np.nan if x < 90 else x)
    10 loops, best of 3: 102 ms per loop
    
    In [854]: %timeit sample['PR'].mask(sample['PR'] < 90, np.nan)
    The slowest run took 4.28 times longer than the fastest. This could mean that an intermediate result is being cached.
    100 loops, best of 3: 3.71 ms per loop
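    The %timeit lines above are IPython magics. Outside IPython, a rough equivalent with the standard-library timeit module might look like this sketch. The function names with_apply and with_mask are illustrative, and the absolute numbers depend on the machine and the pandas version, so only the relative gap between the two is meaningful.

```python
import timeit

import numpy as np
import pandas as pd

# Same shape as the timing above: 3 rows repeated 100000 times -> 300k rows
sample = pd.DataFrame({'PR': [10, 100, 40]})
sample = pd.concat([sample] * 100000).reset_index(drop=True)


def with_apply():
    # Calls the Python lambda once per element
    return sample['PR'].apply(lambda x: np.nan if x < 90 else x)


def with_mask():
    # One vectorized comparison plus one vectorized replacement
    return sample['PR'].mask(sample['PR'] < 90, np.nan)


# Both approaches must agree before their speed is compared
pd.testing.assert_series_equal(with_apply(), with_mask())

for fn in (with_apply, with_mask):
    # Best of 3 repeats, 5 calls each; report milliseconds per call
    best = min(timeit.repeat(fn, number=5, repeat=3)) / 5
    print(f"{fn.__name__}: {best * 1000:.2f} ms per call")
```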
    
  • 2020-12-13 01:18

    You need to add an else branch to your lambda function. You are telling it what to do when the condition (here x < 90) is met, but not what to do when it is not met. In Python, a conditional expression (a if cond else b) must always include the else part.
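    Leaving out else is not only a logic gap: a Python conditional expression without an else branch does not parse at all. This small check uses ast.parse from the standard library, with float('nan') standing in for np.nan so nothing else needs importing; the helper name parses is just for illustration.

```python
import ast

# Two versions of a lambda like the one discussed here, as source text.
# float('nan') stands in for np.nan so only the standard library is needed.
missing_else = "lambda x: float('nan') if x < 90"
with_else = "lambda x: float('nan') if x < 90 else x"


def parses(src):
    """Return True if src is a syntactically valid Python expression."""
    try:
        ast.parse(src, mode="eval")
    except SyntaxError:
        return False
    return True


print(parses(missing_else))  # False
print(parses(with_else))     # True
```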

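    Use np.nan rather than the string 'NaN'; a string is stored as plain text, so pandas would not treat those entries as missing values:

    import numpy as np
    sample['PR'] = sample['PR'].apply(lambda x: np.nan if x < 90 else x)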
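    To see why the string matters, this sketch applies both lambdas to the three-row sample used in the first answer and compares the resulting dtype and the number of values pandas recognizes as missing.

```python
import numpy as np
import pandas as pd

sample = pd.DataFrame({'PR': [10, 100, 40]})

# String 'NaN': just text, so the result is an object column with no missing values
as_string = sample['PR'].apply(lambda x: 'NaN' if x < 90 else x)

# np.nan: a real floating-point missing value, so the result is float64
as_nan = sample['PR'].apply(lambda x: np.nan if x < 90 else x)

print(as_string.dtype, as_string.isna().sum())  # object 0
print(as_nan.dtype, as_nan.isna().sum())        # float64 2
```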
    