Spark using python: How to resolve Stage x contains a task of very large size (xxx KB). The maximum recommended task size is 100 KB

一生所求 · 2020-12-01 10:32

I've just created a Python list from range(1, 100000).

Using SparkContext, I did the following:

a = sc.parallelize([i for i in range(1, 100000)])
3 Answers
轻奢々 (OP) · 2020-12-01 10:57

Spark natively ships a copy of each closure variable along with every task. For large variables, you may want to use Broadcast Variables instead.

If you are still facing size problems, then perhaps this data should be an RDD in itself.

    edit: Updated the link
