Iterate over pandas dataframe columns containing nested arrays

前端未结


I hope you can help me with this issue.

I have the data below (the column names don't matter):

data=([['file0090',
    ([[ 84,  55, 189],
   [248, 100,  18],


                      

        
                         				            
            
           
            
                              
                
              
              
                
                  灰色年华        
                
              
                            
                2021-01-21 02:07
              
            
            
                                                                       
You can try this:

import pandas as pd

data_f = [[i[0]]+j for i in data for j in i[1]]
df = pd.DataFrame(data_f, columns =['col0','col1','col2','col3'])


Output:

col0      col1  col2  col3
file0090    84    55   189
file0090   248   100    18
file0090    68   115    88
file6565    86    58   189
file6565    24    10   118
file6565    68    11     8
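
For a self-contained run, here is a minimal sketch of the same list-comprehension idea. The sample `data` is an assumption: the question's data is cut off, so the values are reconstructed from the output rows shown in the answers. The names `name`, `block` and `row` are only illustrative. Unpacking with `*row` is used instead of `[i[0]] + j` so the sketch also works if the inner rows turn out to be NumPy arrays rather than plain lists.

```python
import pandas as pd

# Assumed sample data, reconstructed from the output shown in the answers.
data = [
    ["file0090", [[84, 55, 189], [248, 100, 18], [68, 115, 88]]],
    ["file6565", [[86, 58, 189], [24, 10, 118], [68, 11, 8]]],
]

# Build one flat row per inner row, prefixed with the file name.
rows = [[name, *row] for name, block in data for row in block]

df = pd.DataFrame(rows, columns=["col0", "col1", "col2", "col3"])
print(df)
```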

                                                                        
                                                        
            
            
              
                
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  广开言路        
                
              
                            
                2021-01-21 02:12
              
            
            
                                                                       
We can explode the nested lists into rows, then expand each row's list into columns:

import pandas as pd

s = pd.DataFrame(data).set_index(0)[1].explode()
df = pd.DataFrame(s.tolist(), index = s.index.values)

df
Out[396]: 
            0    1    2
file0090   84   55  189
file0090  248  100   18
file0090   68  115   88
file6565   86   58  189
file6565   24   10  118
file6565   68   11    8
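
To make the two steps easier to follow, here is a sketch that keeps the intermediate Series visible. The sample `data` is an assumption reconstructed from the output above, and the column names `name` and `rows` are hypothetical labels added for readability.

```python
import pandas as pd

# Assumed sample data, reconstructed from the output shown in the answers.
data = [
    ["file0090", [[84, 55, 189], [248, 100, 18], [68, 115, 88]]],
    ["file6565", [[86, 58, 189], [24, 10, 118], [68, 11, 8]]],
]

# Step 1: index by file name, then explode the nested list so that each
# inner row becomes its own Series element (the file name is repeated).
s = pd.DataFrame(data, columns=["name", "rows"]).set_index("name")["rows"].explode()
print(s)

# Step 2: each element of s is still a 3-item list; expand it into columns.
df = pd.DataFrame(s.tolist(), index=s.index)
print(df)
```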

                                                                        
                                                        
            
            
              
                
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  野性不改        
                
              
                            
                2021-01-21 02:14
              
            
            
                                                                       
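You can write a custom generator function that reshapes the data into flat rows.

from itertools import chain

import pandas as pd

def transform(d):
    for l in d:
        # x collects the leading fields (the file name); y is the nested list of rows
        *x, y = l
        # prepend the file name to every inner row
        yield [x + s for s in y]

df = pd.DataFrame(chain(*transform(data)))
df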
          0    1    2    3
0  file0090   84   55  189
1  file0090  248  100   18
2  file0090   68  115   88
3  file6565   86   58  189
4  file6565   24   10  118
5  file6565   68   11    8


Timeit results of all the solutions:

# YOBEN_S's answer
In [275]: %%timeit
     ...: s = pd.DataFrame(data).set_index(0)[1].explode()
     ...: df = pd.DataFrame(s.tolist(), index = s.index.values)
     ...:
     ...:
1.52 ms ± 59.1 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

#Anky's answer
In [276]: %%timeit
     ...: df = pd.DataFrame(data).add_prefix('col')
     ...: out = df.explode('col1').reset_index(drop=True)
     ...: out = out.join(pd.DataFrame(out.pop('col1').tolist()).add_prefix('col_'))
     ...:
     ...:
3.71 ms ± 606 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

#Dhaval's answer
In [277]: %%timeit
     ...: data_f = []
     ...: for i in data:
     ...:     for j in i[1]:
     ...:         data_f.append([i[0]]+j)
     ...: df = pd.DataFrame(data_f, columns =['col0','col1','col2','col3'])
     ...:
     ...:
712 µs ± 24.7 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

#My answer
In [280]: %%timeit
     ...: pd.DataFrame(chain(*transform(data)))
     ...:
     ...:
489 µs ± 8.91 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

#Using List comp of Dhaval's answer

In [306]: %%timeit
     ...: data_f = [[i[0]]+j for i in data for j in i[1]]
     ...: df = pd.DataFrame(data_f, columns =['col0','col1','col2','col3'])
     ...:
     ...:
586 µs ± 25 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

#Anky's 2nd solution

In [308]: %%timeit
     ...: l = [*chain.from_iterable(data)]
     ...: pd.DataFrame(np.vstack(l[1::2]),index = np.repeat(l[::2],len(l[1])))
     ...:
     ...:
221 µs ± 18.1 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
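
The timings above come from the answerer's machine and a tiny input, so they may not carry over to other data sizes. As a rough sketch for repeating the comparison outside IPython, the standard-library `timeit` module can be used. The sample `data` is an assumption reconstructed from the outputs in this thread, and the function names `with_list_comp` and `with_generator` are hypothetical wrappers around two of the approaches.

```python
import timeit
from itertools import chain

import pandas as pd

# Assumed sample data, reconstructed from the output shown in the answers.
data = [
    ["file0090", [[84, 55, 189], [248, 100, 18], [68, 115, 88]]],
    ["file6565", [[86, 58, 189], [24, 10, 118], [68, 11, 8]]],
]


def with_list_comp(d):
    # List-comprehension approach: one flat row per inner row.
    return pd.DataFrame([[i[0]] + j for i in d for j in i[1]])


def with_generator(d):
    # Generator approach: yield one list of flat rows per file, then chain them.
    def transform(items):
        for *x, y in items:
            yield [x + s for s in y]

    return pd.DataFrame(list(chain.from_iterable(transform(d))))


runs = 200
for fn in (with_list_comp, with_generator):
    seconds = timeit.timeit(lambda: fn(data), number=runs)
    print(f"{fn.__name__}: {seconds / runs * 1e6:.0f} µs per call")
```

Before trusting any ranking, it is worth checking that the candidates produce the same frame, for example with `with_list_comp(data).equals(with_generator(data))`.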

                                                                        
                                                        
            
            
              
                
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  面向向阳花        
                
              
                            
                2021-01-21 02:20
              
            
            
                                                                       
You can use explode, then join the result with another DataFrame created from the series of lists:

import pandas as pd

df = pd.DataFrame(data).add_prefix('col')

out = df.explode('col1').reset_index(drop=True)
out = out.join(pd.DataFrame(out.pop('col1').tolist()).add_prefix('col_'))

This gives:

       col0  col_0  col_1  col_2
0  file0090     84     55    189
1  file0090    248    100     18
2  file0090     68    115     88
3  file6565     86     58    189
4  file6565     24     10    118
5  file6565     68     11      8

Here is another solution, which assumes every entry has the same structure (the same number of inner rows):

import itertools
import numpy as np
l = [*itertools.chain.from_iterable(data)]
pd.DataFrame(np.vstack(l[1::2]),index = np.repeat(l[::2],len(l[1])))
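
The `np.vstack` solution above repeats every file name `len(l[1])` times, so it relies on each file having the same number of inner rows. If that might not hold, one possible variant is to pass per-file counts to `np.repeat`. This is only a sketch: the sample `data` is hypothetical (the second file deliberately has one row fewer than in the thread's output), and `names` and `blocks` are illustrative names.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: the second file has fewer inner rows on purpose.
data = [
    ["file0090", [[84, 55, 189], [248, 100, 18], [68, 115, 88]]],
    ["file6565", [[86, 58, 189], [24, 10, 118]]],
]

names = [name for name, _ in data]
blocks = [np.asarray(block) for _, block in data]

# Repeat each file name once per inner row of its own block.
index = np.repeat(names, [len(b) for b in blocks])

# Stack all blocks vertically; every block must have the same number of columns.
df = pd.DataFrame(np.vstack(blocks), index=index)
print(df)
```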

                                                                        
                                                        
            
            
              
                
                        
                    
                
          
          	          
                             
        
        
          
            
            
              
              
            
    


                                 
              
            
                          
    

        
         