According to Learning Spark:

"Keep in mind that repartitioning your data is a fairly expensive operation. Spark also has an optimized version of repartition() called coalesce() that allows avoiding data movement, but only if you are decreasing the number of RDD partitions."
The repartition algorithm does a full shuffle of the data and creates equal-sized partitions. coalesce combines existing partitions to avoid a full shuffle.
Coalesce works well for taking an RDD with a lot of partitions and combining partitions on a single worker node to produce a final RDD with fewer partitions.
Repartition will reshuffle the data in your RDD to produce the final number of partitions you request.
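The difference can be sketched in plain Python. This is a toy model of the two strategies, not Spark's actual implementation: an RDD is just a list of partitions, `repartition` touches every record to build equal-sized partitions, and `coalesce` only merges whole existing partitions together.

```python
# Toy model: an "RDD" is a list of partitions (each partition a list of records).
# These are simplified stand-ins for Spark's repartition/coalesce, for illustration only.

def repartition(partitions, n):
    """Full shuffle: pull out every record and deal them round-robin
    into n roughly equal-sized partitions."""
    records = [r for p in partitions for r in p]  # every record moves
    out = [[] for _ in range(n)]
    for i, r in enumerate(records):
        out[i % n].append(r)
    return out

def coalesce(partitions, n):
    """No full shuffle: merge existing partitions into n groups.
    Records stay inside their original partition, so sizes can stay skewed."""
    out = [[] for _ in range(n)]
    for i, p in enumerate(partitions):
        out[i % n].extend(p)
    return out

data = [[1, 2, 3, 4], [5], [6], [7]]           # 4 uneven partitions
print([len(p) for p in repartition(data, 2)])  # -> [4, 3]  (evened out)
print([len(p) for p in coalesce(data, 2)])     # -> [5, 2]  (skew preserved)
```

Note how the coalesced result keeps the original imbalance: it saved the shuffle, but the big partition stayed big.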
The partitioning of DataFrames seems like a low level implementation detail that should be managed by the framework, but it’s not. When filtering large DataFrames into smaller ones, you should almost always repartition the data.
You’ll probably be filtering large DataFrames into smaller ones frequently, so get used to repartitioning.
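Here's why, using the same toy model (again a pure-Python sketch, not Spark itself): a selective filter keeps the partition count unchanged, so you end up with many nearly empty partitions, and each downstream task does almost no work until you pack the survivors into fewer partitions.

```python
# Toy model of filtering: 8 partitions of 100 rows each.
partitions = [list(range(i * 100, (i + 1) * 100)) for i in range(8)]

# A selective filter keeps ~2% of rows but leaves all 8 partitions in place.
filtered = [[x for x in p if x % 50 == 0] for p in partitions]
print([len(p) for p in filtered])   # -> [2, 2, 2, 2, 2, 2, 2, 2]

def coalesce(parts, n):
    """Simplified stand-in for coalesce: merge existing partitions into n groups."""
    out = [[] for _ in range(n)]
    for i, p in enumerate(parts):
        out[i % n].extend(p)
    return out

# Packing the surviving rows into 2 partitions gives each task real work again.
small = coalesce(filtered, 2)
print([len(p) for p in small])      # -> [8, 8]
```

In real Spark the same idea applies after `df.filter(...)`: follow a big filter with a repartition (or coalesce) down to a partition count that matches the filtered data's size.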
Read this blog post if you'd like even more details.