Comparing previous row values in Pandas DataFrame

野趣味 2020-11-27 13:48
import pandas as pd
data = {'col1': [1, 3, 3, 1, 2, 3, 2, 2]}
df = pd.DataFrame(data, columns=['col1'])
print(df)


       col1
    0     1
    1     3

4 Answers
  •  没有蜡笔的小新
    2020-11-27 14:36

    Here's a NumPy-based approach that uses slicing. Basic slices are views into the input array, so the comparison needs no copies of the data, which keeps it efficient -

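    import numpy as np

    def comp_prev(a):
        # Compare each element with its predecessor; the first element has none, so it gets False.
        return np.concatenate(([False], a[1:] == a[:-1]))

    df['match'] = comp_prev(df.col1.values)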
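
To see why the slicing works: `a[1:]` holds every element except the first and `a[:-1]` holds every element except the last, so an elementwise comparison of the two tests each value against the one before it. Below is a minimal sketch, assuming the sample `col1` values from the question. The names `curr` and `prev` are illustrative and do not appear in the original post.

```python
import numpy as np

a = np.array([1, 3, 3, 1, 2, 3, 2, 2])

curr = a[1:]   # elements 1..n-1
prev = a[:-1]  # elements 0..n-2, i.e. each element's predecessor

# Basic slices are views: they share memory with `a` instead of copying it.
assert np.shares_memory(curr, a) and np.shares_memory(prev, a)

# The first row has no predecessor, so the result is padded with a leading False.
match = np.concatenate(([False], curr == prev))
print(match.tolist())
# [False, False, True, False, False, False, False, True]
```

The comparison itself still allocates a new boolean array; only the two shifted inputs are copy-free.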
    

    Sample run -

    In [48]: df['match'] = comp_prev(df.col1.values)
    
    In [49]: df
    Out[49]: 
       col1  match
    0     1  False
    1     3  False
    2     3   True
    3     1  False
    4     2  False
    5     3  False
    6     2  False
    7     2   True
    

    Runtime test -

    In [56]: data={'col1':[1,3,3,1,2,3,2,2]}
        ...: df0=pd.DataFrame(data,columns=['col1'])
        ...: 
    
    #@jezrael's soln1
    In [57]: df = pd.concat([df0]*10000).reset_index(drop=True)
    
    In [58]: %timeit df['match'] = df.col1 == df.col1.shift() 
    1000 loops, best of 3: 1.53 ms per loop
    
    #@jezrael's soln2
    In [59]: df = pd.concat([df0]*10000).reset_index(drop=True)
    
    In [60]: %timeit df['match'] = df.col1.eq(df.col1.shift())
    1000 loops, best of 3: 1.49 ms per loop
    
    #@Nickil Maveli's soln1   
    In [61]: df = pd.concat([df0]*10000).reset_index(drop=True)
    
    In [64]: %timeit df['match'] = df['col1'].diff().eq(0) 
    1000 loops, best of 3: 1.02 ms per loop
    
    #@Nickil Maveli's soln2
    In [65]: df = pd.concat([df0]*10000).reset_index(drop=True)
    
    In [66]: %timeit df['match'] = np.ediff1d(df['col1'].values, to_begin=np.NaN) == 0
    1000 loops, best of 3: 1.52 ms per loop
    
    # Approach posted in this answer
    In [67]: df = pd.concat([df0]*10000).reset_index(drop=True)
    
    In [68]: %timeit df['match'] = comp_prev(df.col1.values)
    1000 loops, best of 3: 376 µs per loop
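
The absolute numbers above depend on the machine and on the pandas/NumPy versions, so they are best read as a relative comparison. Below is a rough sketch for re-running it as a plain script with the standard-library `timeit` module, assuming the same 80,000-row frame. The `candidates` dict, its labels, and the `timings` dict are illustrative names. Unlike the session above, it times only the computation, not the assignment to `df['match']`. The `np.ediff1d` variant is left out because it may raise on newer NumPy versions when `to_begin` is NaN and the input is an integer array.

```python
import timeit

import numpy as np
import pandas as pd


def comp_prev(a):
    return np.concatenate(([False], a[1:] == a[:-1]))


df0 = pd.DataFrame({'col1': [1, 3, 3, 1, 2, 3, 2, 2]})
df = pd.concat([df0] * 10000).reset_index(drop=True)

candidates = {
    'col1 == col1.shift()': lambda: df.col1 == df.col1.shift(),
    'col1.eq(col1.shift())': lambda: df.col1.eq(df.col1.shift()),
    'col1.diff().eq(0)': lambda: df['col1'].diff().eq(0),
    'comp_prev(col1.values)': lambda: comp_prev(df.col1.values),
}

reference = comp_prev(df.col1.values)
timings = {}
for label, func in candidates.items():
    # All candidates should agree before their timings are compared.
    assert np.array_equal(np.asarray(func()), reference), label
    # Best of 3 repeats, 100 calls each, reported per call.
    best = min(timeit.repeat(func, number=100, repeat=3)) / 100
    timings[label] = best
    print(f'{label:<24} {best * 1e6:8.1f} us per call')
```

The equality check up front guards against comparing timings of approaches that do not produce the same result on this data.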
    
