Spark DataFrame: collect() vs select()

情话喂你 2020-12-13 06:33
Calling collect() on an RDD returns the entire dataset to the driver, which can cause an out-of-memory error, so we should avoid it.
Will collect()

      
      
        
感动是毒 (original poster) 2020-12-13 07:08

To answer the questions directly:


  Will collect() behave the same way if called on a dataframe?


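Yes, spark.DataFrame.collect is functionally the same as spark.RDD.collect. They serve the same purpose on these different objects: both copy every record of the distributed dataset back to the driver, so the same out-of-memory risk applies.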


  What about the select() method?


There is no such thing as spark.RDD.select, so it cannot be the same as spark.DataFrame.select.


  Does it also work the same way as collect() if called on a dataframe?


The only thing that is similar between select and collect is that they are both functions on a DataFrame. They have absolutely zero overlap in functionality.

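Here's my own description: collect is the opposite of sc.parallelize. parallelize distributes a local collection across the cluster, while collect pulls a distributed dataset back into a local collection on the driver. select is the same as the SELECT in any SQL statement: it picks columns or expressions and returns a new DataFrame, without bringing any data to the driver.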

If you are still unsure what collect actually does (for either an RDD or a DataFrame), read up on what Spark is doing behind the scenes, for example:


https://dzone.com/articles/how-spark-internally-executes-a-program 
https://data-flair.training/blogs/spark-rdd-operations-transformations-actions/

    
             
                                                        
            
            
              
                