I have this data frame:
df = sc.parallelize([(1, [1, 2, 3]), (1, [4, 5, 6]), (2, [2]), (2, [3])]).toDF(["store", "values"])
+-----+---------+
|store|   values|
+-----+---------+
|    1|[1, 2, 3]|
|    1|[4, 5, 6]|
|    2|      [2]|
|    2|      [3]|
+-----+---------+
How can I group by store and merge the values into a single flat list per store?
You need a flattening UDF; starting from your own df:
spark.version
# u'2.2.0'
from pyspark.sql import functions as F
import pyspark.sql.types as T
def fudf(val):
    # concatenate a list of lists into a single flat list
    return reduce(lambda x, y: x + y, val)
flattenUdf = F.udf(fudf, T.ArrayType(T.IntegerType()))
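As a quick sanity check outside Spark: reduce with + simply concatenates a list of lists, which is exactly the shape the UDF will receive after collect_list below. For example:

fudf([[1, 2, 3], [4, 5, 6]])
# returns [1, 2, 3, 4, 5, 6]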
df2 = df.groupBy("store").agg(F.collect_list("values"))
df2.show(truncate=False)
# +-----+----------------------------------------------+
# |store|collect_list(values)                          |
# +-----+----------------------------------------------+
# |1    |[WrappedArray(1, 2, 3), WrappedArray(4, 5, 6)]|
# |2    |[WrappedArray(2), WrappedArray(3)]            |
# +-----+----------------------------------------------+
df3 = df2.select("store", flattenUdf("collect_list(values)").alias("values"))
df3.show(truncate=False)
# +-----+------------------+
# |store|values            |
# +-----+------------------+
# |1    |[1, 2, 3, 4, 5, 6]|
# |2    |[2, 3]            |
# +-----+------------------+
UPDATE (after comment):
The above snippet works only with Python 2, where reduce is a built-in. In Python 3, reduce was moved to functools, so you should modify the UDF as follows:
import functools

def fudf(val):
    return functools.reduce(lambda x, y: x + y, val)
Tested with Spark 2.4.4.
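As a side note, if you are on Spark 2.4+ anyway, you can skip the UDF altogether: the built-in pyspark.sql.functions.flatten (added in 2.4) flattens an array of arrays natively and avoids the Python UDF overhead. A minimal sketch, assuming the same df as above:

from pyspark.sql import functions as F

# flatten the collected list of lists in one pass, no UDF needed
df3 = df.groupBy("store").agg(F.flatten(F.collect_list("values")).alias("values"))
df3.show(truncate=False)
# same result as above: [1, 2, 3, 4, 5, 6] for store 1, [2, 3] for store 2
# (keep in mind collect_list does not guarantee element ordering)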