I have the following Spark job, in which I try to keep everything in memory:
val myOutRDD = myInRDD.flatMap { fp =>
  val tuple2List: ListBuffer[(String, myClass)] = ListBuffer()
  tuple2List // ... populated from fp; the rest of the job (including the persist calls) omitted
}
The persist calls in your code are entirely wasted if you don't access the RDD multiple times. What's the point of storing something if you never access it? Caching has no bearing on shuffle behavior, other than that you can avoid re-doing shuffles by keeping their output cached.
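For contrast, here is a minimal sketch of the one situation where persist actually pays off: the same RDD feeding two separate actions, so the second action reuses the cached partitions instead of recomputing the flatMap. The names (pairCounts, lines) are made up for illustration:

import org.apache.spark.rdd.RDD
import org.apache.spark.storage.StorageLevel

def pairCounts(lines: RDD[String]): Unit = {
  val pairs = lines.flatMap(_.split(" ").map(w => (w, 1))).persist(StorageLevel.MEMORY_ONLY)
  val totalPairs = pairs.count()                 // first action: computes the RDD and caches it
  val distinctKeys = pairs.keys.distinct.count() // second action: served from cache, no recompute
  pairs.unpersist()                              // release the cached blocks when done
}

With only one of the two actions, the persist call would be pure overhead.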
Shuffle spill is controlled by the spark.shuffle.spill and spark.shuffle.memoryFraction configuration parameters. If spilling is enabled (it is by default), shuffle files spill to disk once they use more memory than memoryFraction allows (20% by default).
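If you want to experiment with those two knobs, here is a minimal sketch of setting them when constructing the SparkContext; the 0.4 value is just illustrative, not a recommendation:

import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf()
  .setAppName("shuffle-tuning-example")
  .set("spark.shuffle.spill", "true")         // the default; "false" forbids spilling to disk
  .set("spark.shuffle.memoryFraction", "0.4") // raise the shuffle memory cap from the 0.2 default
val sc = new SparkContext(conf)

Note that these parameters belong to Spark versions with the legacy memory manager; check your version's configuration docs before relying on them.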
The metrics are very confusing. My reading of the code is that "Shuffle spill (memory)" is the amount of memory that was freed up as things were spilled to disk. The code for "Shuffle spill (disk)" looks like it's the amount actually written to disk. Judging by the code for "Shuffle write", I think it's the amount written to disk directly, not as a spill from a sorter.
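If you want to see the raw numbers behind those UI columns yourself, here is a sketch of a listener that logs the per-task spill metrics; it assumes a SparkContext named sc and uses the TaskMetrics fields that back the UI:

import org.apache.spark.scheduler.{SparkListener, SparkListenerTaskEnd}

sc.addSparkListener(new SparkListener {
  override def onTaskEnd(taskEnd: SparkListenerTaskEnd): Unit = {
    val m = taskEnd.taskMetrics
    if (m != null) { // metrics can be absent for failed tasks
      println(s"task ${taskEnd.taskInfo.taskId}: " +
        s"spill(memory)=${m.memoryBytesSpilled} B, spill(disk)=${m.diskBytesSpilled} B")
    }
  }
})

Comparing memoryBytesSpilled against diskBytesSpilled per task makes the memory/disk distinction above concrete: the former is typically much larger because it counts the in-memory size of the spilled records, not the serialized bytes that land on disk.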