Context
Spark 2.0.1, spark-submit in cluster mode. I am reading a Parquet file from HDFS:
val spark = SparkSession.builder
  .appName("...")  // app name truncated in the original post
  .getOrCreate()
val df = spark.read.parquet("hdfs://...")  // hypothetical path to the Parquet file
These questions popped into my mind as well when I saw so many files, so I searched and found this:
"Unlike bucketing in Apache Hive, Spark SQL creates the bucket files per the number of buckets and partitions. In other words, the number of bucketing files is the number of buckets multiplied by the number of task writers (one per partition)."
Source: https://jaceklaskowski.gitbooks.io/mastering-spark-sql/spark-sql-bucketing.html
I think this answers your question no. 1, about why you see this number of files: roughly, files = buckets × writer tasks. For example, 4 buckets × 200 writer tasks (the default spark.sql.shuffle.partitions after a shuffle) could mean up to 800 files.
As for your question no. 2: if we control the number of partitions with repartition before the write (resources permitting), we can limit the number of files created, as sketched below.
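Here is a minimal sketch of that idea. The input path, app name, table name, and bucket column are all hypothetical, only to illustrate the files = buckets × writer-tasks arithmetic:

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder
  .appName("bucketed-write-sketch")  // hypothetical app name
  .getOrCreate()

// Hypothetical input path and bucket column.
val df = spark.read.parquet("hdfs:///data/events")

// Pin the number of writer tasks before the bucketed write:
// 10 tasks × 4 buckets = at most 40 output files
// (a task writes fewer if it holds no rows for some bucket).
df.repartition(10)
  .write
  .bucketBy(4, "user_id")
  .sortBy("user_id")
  .saveAsTable("events_bucketed")  // bucketBy requires saveAsTable, not save()

Without the repartition, every input partition acts as a writer task, so a wide input fans out into buckets multiplied by partitions files, which is exactly the behavior the quote above describes.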