Pandas iterate over DataFrame row pairs

How can I iterate over pairs of rows of a Pandas DataFrame?

For example:

content = [(1,2,[1,3]),(3,4,[2,4]),(5,6,[6,9]),(7,8,[9,10])]
df = pd.DataFra


                      
4 answers

半阙折子戏
2020-12-10 21:48

To get the output you've shown, use:

for i in range(len(df) - 1):  # positions, so this works with any index
    print('row 1:')
    print(df.iloc[i])
    print('row 2:')
    print(df.iloc[i + 1])
    print()

                                                                        
                                                        
            
            
              
                
走了就别回头了
2020-12-10 21:58

Shift the DataFrame and concatenate it back onto the original with axis=1, so that each interval and the next interval end up in the same row:

df_merged = pd.concat([df, df.shift(-1).add_prefix('next_')], axis=1)
df_merged
#Out:
   a  b interval     next_a     next_b    next_interval
0  1  2   [1, 3]        3.0        4.0           [2, 4]
1  3  4   [2, 4]        5.0        6.0           [6, 9]
2  5  6   [6, 9]        7.0        8.0          [9, 10]
3  7  8  [9, 10]        NaN        NaN              NaN


Then define an intersects function that works with your list representation, and apply it to the merged DataFrame, skipping the last row, where next_interval is null:

def intersects(left, right):
    return left[1] > right[0]

df_merged[:-1].apply(lambda x: intersects(x.interval, x.next_interval), axis=1)
#Out:
0     True
1    False
2    False
dtype: bool
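
The same comparison can also be done without apply, by pulling the interval endpoints into their own Series and comparing them after a shift. This is only a minimal sketch. It assumes the column names a, b and interval seen in the output above (the question's DataFrame constructor is cut off, so those names are an assumption) and that every interval is a two-element list. The names starts, ends and overlaps_next are illustrative.

```python
import pandas as pd

content = [(1, 2, [1, 3]), (3, 4, [2, 4]), (5, 6, [6, 9]), (7, 8, [9, 10])]
# Column names are assumed from the printed output above.
df = pd.DataFrame(content, columns=["a", "b", "interval"])

# Split each [start, end] list into two plain numeric Series.
starts = pd.Series([iv[0] for iv in df["interval"]], index=df.index)
ends = pd.Series([iv[1] for iv in df["interval"]], index=df.index)

# Row i overlaps row i+1 when its end lies past the next row's start.
# The last row has no successor (shift gives NaN there), so drop it.
overlaps_next = (ends > starts.shift(-1)).iloc[:-1]
print(overlaps_next)
```

This uses the same strict comparison as intersects(left, right), so intervals that only touch at an endpoint, such as [6, 9] and [9, 10], are not counted as overlapping.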

                                                                        
                                                        
            
            
              
                
长发绾君心
2020-12-10 22:01

You could try iloc indexing.

Example:

for i in range(df.shape[0] - 1):
    row1, row2 = df.iloc[i], df.iloc[i + 1]
    print(row1)
    print(row2)
    print()
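
Because iloc selects by position, a loop like this does not depend on what the index labels are. The sketch below illustrates that with a made-up string index. The labels w to z and the helper name row_pairs are hypothetical and do not come from the question.

```python
import pandas as pd

# Hypothetical frame with non-numeric index labels.
df = pd.DataFrame({"a": [1, 3, 5, 7], "b": [2, 4, 6, 8]},
                  index=["w", "x", "y", "z"])


def row_pairs(frame):
    """Yield (row, next_row) for each pair of consecutive rows, by position."""
    for i in range(len(frame) - 1):
        yield frame.iloc[i], frame.iloc[i + 1]


for row1, row2 in row_pairs(df):
    # Each row is a Series whose .name is its index label.
    print(row1.name, "->", row2.name)
```

A frame with fewer than two rows yields no pairs, so the loop body simply never runs.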

                                                                        
                                                        
            
            
              
                
死守一世寂寞
2020-12-10 22:06

If you want to keep the for loop, zip and iterrows are one way to do it:

for (idx1, row1), (idx2, row2) in zip(df[:-1].iterrows(), df[1:].iterrows()):
    print(f"row1:\n{row1}")
    print(f"row2:\n{row2}")
    print("\n")


To access the next row at the same time, start the second iterator one row later with df[1:].iterrows(). That gives the output you want:

row1:
a    1
b    2
Name: 0, dtype: int64
row2:
a    3
b    4
Name: 1, dtype: int64


row1:
a    3
b    4
Name: 1, dtype: int64
row2:
a    5
b    6
Name: 2, dtype: int64


row1:
a    5
b    6
Name: 2, dtype: int64
row2:
a    7
b    8
Name: 3, dtype: int64


But as @RafaelC said, a for loop might not be the best approach to your general problem.
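
If a loop is still needed, itertuples can be paired up in the same way as iterrows. The sketch below is one possible variant, not the answer's own code. It assumes a frame with just the a and b columns shown in the output above, and the name pairs is illustrative. itertuples yields namedtuples rather than Series, so it avoids building a Series for every row.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 3, 5, 7], "b": [2, 4, 6, 8]})

# Pair every row with its successor. zip stops at the shorter iterator,
# so the first one does not need an explicit df[:-1] slice.
pairs = list(zip(df.itertuples(), df.iloc[1:].itertuples()))

for cur, nxt in pairs:
    # Fields are accessed by column name; .Index holds the index label.
    print(cur.Index, (cur.a, cur.b), "->", nxt.Index, (nxt.a, nxt.b))
```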
                                                                        
                                                        
            
            
              
                