I have a large dataset called "edges"
org.apache.spark.rdd.RDD[org.apache.spark.graphx.Edge[(String, Int)]] = MappedRDD[27] at map at <console>:52
The "Consider using broadcast variables for large values" error message usually indicates that you've captured some large variables in function closures. For example, you might have written something like
val someBigObject = ...
rdd.mapPartitions { x => doSomething(someBigObject, x) }.count()
which causes someBigObject to be captured and serialized with your task. If you're doing something like that, you can use a broadcast variable instead, which causes only a reference to the object to be stored in the task itself, while the actual object data is sent separately.
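As a rough sketch (reusing sc, rdd, someBigObject, and doSomething from the example above), the broadcast version looks like this:

val bigObjectBroadcast = sc.broadcast(someBigObject) // shipped to each executor once
rdd.mapPartitions { x => doSomething(bigObjectBroadcast.value, x) }.count()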
In Spark 1.1.0+, it isn't strictly necessary to use broadcast variables for this, since tasks will automatically be broadcast (see SPARK-2521 for more details). There are still reasons to use broadcast variables (such as sharing a big object across multiple actions / jobs), but you won't need one just to avoid frame size errors.
Another option is to increase the Akka frame size. In any Spark version, you should be able to set the spark.akka.frameSize setting in SparkConf prior to creating your SparkContext.
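In a standalone application, that looks something like this (the app name here is hypothetical; the frame size value is in MB):

import org.apache.spark.{SparkConf, SparkContext}

// Configure the frame size before the SparkContext is created
val conf = new SparkConf()
  .setAppName("MyApp")               // hypothetical app name
  .set("spark.akka.frameSize", "16") // frame size in MB
val sc = new SparkContext(conf)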
As you may have noticed, though, this is a little harder in spark-shell, where the context is created for you. In newer versions of Spark (1.1.0 and higher), you can pass --conf spark.akka.frameSize=16 when launching spark-shell. In Spark 1.0.1 or 1.0.2, you should be able to pass --driver-java-options "-Dspark.akka.frameSize=16" instead.
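Concretely, that looks something like this (assuming you're launching from the Spark installation directory):

# Spark 1.1.0 and higher
./bin/spark-shell --conf spark.akka.frameSize=16

# Spark 1.0.1 / 1.0.2
./bin/spark-shell --driver-java-options "-Dspark.akka.frameSize=16"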