pyspark's flatMap in pandas

Asked by 南旧 on 2021-02-04 12:06 · 3 answers · 1297 views

Is there an operation in pandas that does the same as flatMap in pyspark?

flatMap example:

>>> rdd = sc.parallelize([2, 3, 4])
>>> sorted(rdd.flatMap(lambda x: range(1, x)).collect())
[1, 1, 1, 2, 2, 3]

3 Answers
  •  自闭症患者
    2021-02-04 12:59

    I suspect that the answer is "no, not efficiently."

    Pandas isn't built for nested data like this. I suspect that the case you're considering in Pandas looks a bit like the following:

    In [1]: import pandas as pd
    
    In [2]: df = pd.DataFrame({'x': [[1, 2], [3, 4, 5]]})
    
    In [3]: df
    Out[3]: 
               x
    0     [1, 2]
    1  [3, 4, 5]
    

    and that you want something like the following:

        x
    0   1
    0   2
    1   3
    1   4
    1   5
    

    It is far more typical to normalize your data in Python before you send it to Pandas. If Pandas did do this then it would probably only be able to operate at slow Python speeds rather than fast C speeds.
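    As a sketch of that pre-normalization step, the nested lists can be flattened in plain Python (here with `itertools.chain`) before the data ever reaches Pandas:

    ```python
    import itertools
    import pandas as pd

    # The same nested data as in the frame above
    nested = [[1, 2], [3, 4, 5]]

    # Flatten in plain Python first...
    flat = list(itertools.chain.from_iterable(nested))

    # ...then hand the already-flat list to Pandas
    df = pd.DataFrame({'x': flat})
    ```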

    Generally one does a bit of munging of data before one uses tabular computation.
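    Note that since this answer was written, Pandas (0.25 and later) gained `DataFrame.explode`, which unnests list-like column values onto their own rows and produces essentially the flattened frame shown above, repeated index included:

    ```python
    import pandas as pd

    # Same nested frame as in the example above
    df = pd.DataFrame({'x': [[1, 2], [3, 4, 5]]})

    # explode() puts each list element on its own row,
    # repeating the original index -- much like flatMap
    flat = df.explode('x')
    ```

    It still iterates at Python speed over the nested values, so the general advice above about normalizing before Pandas remains sound for large data.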
