Understanding Spark's caching

天命终不由人 2020-12-02 08:41

I'm trying to understand how Spark's cache works.

Here is my naive understanding, please let me know if I'm missing something:

    val rdd1 = sc.textFile("some data")
    rdd1.cache() // mark rdd1 to be cached
    val rdd2 = rdd1.filter(...)
    val rdd3 = rdd1.map(...)
    rdd2.saveAsTextFile("...")
    rdd3.saveAsTextFile("...")

3 Answers
  •  夕颜 2020-12-02 09:13

    Option B is the optimal approach, with one small tweak: use a less expensive action to materialize the cache. In your code, saveAsTextFile is an expensive operation; replace it with count.

    The idea is to free the big rdd1 once it is no longer needed for further computation: after rdd2 and rdd3 have been materialized and cached, later actions read them straight from the cache. (Note that unpersist only drops rdd1's cached blocks; the lineage through rdd1 is kept, so lost partitions of rdd2 or rdd3 can still be recomputed.)

    Updated version of your code:

        val rdd1 = sc.textFile("some data").cache()
        val rdd2 = rdd1.filter(...).cache()
        val rdd3 = rdd1.map(...).cache()

        rdd2.count() // cheap action: computes rdd1 once, caches it, and caches rdd2
        rdd3.count() // reads rdd1 from the cache while materializing rdd3

        rdd1.unpersist() // rdd2 and rdd3 are cached now, so rdd1's blocks can be freed
    
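    For a runnable, self-contained sketch of the same pattern (the dataset, transformations, and local master here are made up for illustration; they are not from the original code):

        import org.apache.spark.{SparkConf, SparkContext}
        import org.apache.spark.storage.StorageLevel

        object CacheDemo {
          def main(args: Array[String]): Unit = {
            val sc = new SparkContext(
              new SparkConf().setAppName("cache-demo").setMaster("local[*]"))

            // Stand-in for sc.textFile("some data"): a small in-memory dataset.
            val rdd1 = sc.parallelize(1 to 1000000).cache()
            val rdd2 = rdd1.filter(_ % 2 == 0).cache()
            val rdd3 = rdd1.map(_ * 2L).cache()

            // Cheap actions: rdd1 is computed and cached during the first count,
            // then served from the cache during the second.
            println(rdd2.count()) // 500000
            println(rdd3.count()) // 1000000

            // rdd2 and rdd3 are cached now, so rdd1's blocks can be dropped.
            rdd1.unpersist()

            // unpersist resets the requested storage level back to NONE;
            // rdd2 and rdd3 remain cached in memory.
            println(rdd1.getStorageLevel == StorageLevel.NONE) // true
            println(rdd2.getStorageLevel.useMemory)            // true

            sc.stop()
          }
        }

    While the application is running, the Storage tab of the Spark UI shows the same information: which RDDs are cached and how much memory they occupy.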
