How do I drop duplicates and keep the first value in pandas?

野性不改 2021-01-27 11:24

I want to drop duplicates and keep the first value. The duplicates I want to drop are the rows where A = 'df'. Here's my data:

A   B   C   D   E
qw  1   3   1   1
er


      
      
        
3 Answers

無奈伤痛 (original poster) 2021-01-27 11:35

Using cumcount()

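import pandas as pd
import numpy as np

# Number each row within its group of equal A values (0, 1, 2, ...).
df['cum'] = df.groupby(['A']).cumcount()
# Change in that counter from the previous row; the first row gets 0.
df['cum2'] = np.append([0], np.diff(df.cum))
# Drop 'df' rows whose counter rose by exactly 1, then remove the helper columns.
df.query("~((A == 'df') & (cum2 == 1))").drop(['cum', 'cum2'], axis=1)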
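
The snippet above assumes df already exists. Here is a minimal sketch for reproducing it, assuming the sample data is the table printed further down in this answer (the listing in the question is cut off). The name result and the boolean mask used in place of query are illustrative choices, not part of the original answer:

```python
import numpy as np
import pandas as pd

# Sample data copied from the DataFrame printed in this answer (columns A-E).
df = pd.DataFrame({
    'A': ['qw', 'er', 'ew', 'df', 'df', 'df', 'df', 'we', 'we', 'df', 'df', 'we', 'qw'],
    'B': [1, 2, 4, 34, 2, 3, 4, 2, 4, 34, 3, 4, 2],
    'C': [3, 4, 8, 34, 5, 3, 4, 5, 4, 9, 3, 7, 2],
    'D': [1, 2, 44, 34, 2, 7, 7, 5, 4, 34, 9, 4, 7],
    'E': [1, 6, 4, 34, 2, 3, 4, 2, 4, 34, 3, 4, 2],
})

# Running count of each row within its group of equal A values.
df['cum'] = df.groupby('A').cumcount()
# Row-to-row change of that count; the first row has no predecessor, so it gets 0.
df['cum2'] = np.append([0], np.diff(df['cum']))

# Same condition as the query string, written as a boolean mask (illustrative choice).
to_drop = (df['A'] == 'df') & (df['cum2'] == 1)
result = df[~to_drop].drop(columns=['cum', 'cum2'])
print(result)
```

With this data, the printed frame should match the output listed at the end of the answer.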


After the cumcount() step, df looks like:

In [6]: df
Out[6]: 
     A   B   C   D   E  cum
0   qw   1   3   1   1    0
1   er   2   4   2   6    0
2   ew   4   8  44   4    0
3   df  34  34  34  34    0
4   df   2   5   2   2    1
5   df   3   3   7   3    2
6   df   4   4   7   4    3
7   we   2   5   5   2    0
8   we   4   4   4   4    1
9   df  34   9  34  34    4
10  df   3   3   9   3    5
11  we   4   7   4   4    2
12  qw   2   2   7   2    1


After the np.diff step:

In [7]: df['cum2'] = np.append([0],np.diff(df.cum))

In [8]: df
Out[8]: 
     A   B   C   D   E  cum  cum2
0   qw   1   3   1   1    0     0
1   er   2   4   2   6    0     0
2   ew   4   8  44   4    0     0
3   df  34  34  34  34    0     0
4   df   2   5   2   2    1     1
5   df   3   3   7   3    2     1
6   df   4   4   7   4    3     1
7   we   2   5   5   2    0    -3
8   we   4   4   4   4    1     1
9   df  34   9  34  34    4     3
10  df   3   3   9   3    5     1
11  we   4   7   4   4    2    -3
12  qw   2   2   7   2    1    -1


Output:

In [12]: df.query("~((A == 'df') & (cum2 == 1))").drop(['cum','cum2'],axis=1)
Out[12]: 
     A   B   C   D   E
0   qw   1   3   1   1
1   er   2   4   2   6
2   ew   4   8  44   4
3   df  34  34  34  34
7   we   2   5   5   2
8   we   4   4   4   4
9   df  34   9  34  34
11  we   4   7   4   4
12  qw   2   2   7   2
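
A possible alternative, assuming the goal is what the output above shows (keep the first row of each consecutive run of 'df' and drop the rest of the run): compare each row's A with the row directly above it using shift(). This is a different technique from the cumcount() approach, sketched here because the cum2 == 1 test can also match a 'df' row that follows a different value (for example, A = df, we, df gives cum2 = 0, 0, 1). The names is_repeat and result are hypothetical, and only columns A and B of the sample data are reproduced to keep the example short:

```python
import pandas as pd

# Columns A and B copied from the sample data printed in this answer.
df = pd.DataFrame({
    'A': ['qw', 'er', 'ew', 'df', 'df', 'df', 'df', 'we', 'we', 'df', 'df', 'we', 'qw'],
    'B': [1, 2, 4, 34, 2, 3, 4, 2, 4, 34, 3, 4, 2],
})

# True where a row is 'df' and the row directly above it is also 'df'.
# shift() yields NaN for the first row, which never compares equal to 'df'.
is_repeat = (df['A'] == 'df') & (df['A'].shift() == 'df')

# Keep everything except the repeated 'df' rows; no helper columns are needed.
result = df[~is_repeat]
print(result.index.tolist())  # [0, 1, 2, 3, 7, 8, 9, 11, 12]
```

The kept index labels are the same ones shown in the output above.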


reference: https://pandas.pydata.org/pandas-docs/stable/generated/pandas.core.groupby.GroupBy.cumcount.html
    
             
                                                        
            
            
              
                