Using reduceByKey in Apache Spark (Scala)

Asked by 囚心锁ツ on 2020-12-24 02:23

I have a list of tuples of type (user id, name, count).

For example,

val x = sc.parallelize(List(
    ("a", "b", 1),
    ("a", "b", 1),
    ("c", "b", 1),
    ("a", "d", 1)))

How can I sum the counts per (user id, name) key using reduceByKey?
3 Answers

  •  生来不讨喜 · 2020-12-24 03:13

    Following your code:

    val byKey = x.map { case (id, uri, count) => (id, uri) -> count }
    

    You could do:

    val reducedByKey = byKey.reduceByKey(_ + _)
    
    scala> reducedByKey.collect.foreach(println)
    ((a,d),1)
    ((a,b),2)
    ((c,b),1)
    

    PairRDDFunctions[K,V].reduceByKey takes an associative (and commutative) reduce function that is applied to the values of type V of an RDD[(K,V)]. In other words, you need a function f[V](e1: V, e2: V): V. In this particular case, with sum on Ints: (x: Int, y: Int) => x + y, or _ + _ in underscore notation.
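
    For illustration, the same reduction can be written with an explicit named function instead of the underscore shorthand (a minimal sketch; sumCounts is a hypothetical name, not part of the original code):

    // Must be associative (and commutative), since reduceByKey may apply it
    // in any grouping order across partitions
    def sumCounts(x: Int, y: Int): Int = x + y

    val reducedByKey = byKey.reduceByKey(sumCounts)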

    For the record: reduceByKey performs better than groupByKey because it attempts to apply the reduce function locally on each partition (a map-side combine) before the shuffle/reduce phase, so far less data crosses the network. groupByKey forces a shuffle of all elements before grouping.
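
    As a rough sketch of that difference (reusing the byKey RDD from above), both of the following produce the same result, but the groupByKey version ships every individual record across the network before summing:

    // reduceByKey: partial sums are computed per partition (map-side combine),
    // then merged after the shuffle
    val viaReduce = byKey.reduceByKey(_ + _)

    // groupByKey: all (key, value) pairs are shuffled first,
    // then summed on the reducer side
    val viaGroup = byKey.groupByKey().mapValues(_.sum)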
