What is the meaning of partitionColumn, lowerBound, upperBound, numPartitions parameters?

名媛妹妹 2020-11-27 05:05
While fetching data from SQL Server via a JDBC connection in Spark, I found that I can set some parallelization parameters like partitionColumn, lowerBoun

      
      
        
清酒与你 (original poster) 2020-11-27 05:31
              

            
            
                        
Creating partitions does not filter out any data, so no rows are lost.
The upperBound and lowerBound, together with numPartitions, only define how the partitions are created. They do not define a range (filter) on the values of partitionColumn to be fetched.

For a given lowerBound (l), upperBound (u) and numPartitions (n), the partitions are created as follows:

stride, s = (u-l)/n

**SELECT * FROM table WHERE partitionColumn < l+s OR partitionColumn IS NULL**
SELECT * FROM table WHERE partitionColumn >= l+s AND partitionColumn < l+2s
SELECT * FROM table WHERE partitionColumn >= l+2s AND partitionColumn < l+3s
...
**SELECT * FROM table WHERE partitionColumn >= l+(n-1)s**


For instance, with upperBound = 500, lowerBound = 0 and numPartitions = 5, the stride is 100 and the partitions are read with the following queries:

SELECT * FROM table WHERE partitionColumn < 100 OR partitionColumn IS NULL
SELECT * FROM table WHERE partitionColumn >= 100 AND partitionColumn < 200
SELECT * FROM table WHERE partitionColumn >= 200 AND partitionColumn < 300
SELECT * FROM table WHERE partitionColumn >= 300 AND partitionColumn < 400
SELECT * FROM table WHERE partitionColumn >= 400
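
The rule above can be sketched in a few lines of Python. This is only an illustration of the scheme described in this answer, not Spark's own code: the function name `partition_predicates` is hypothetical, and it assumes an integer column with upperBound - lowerBound >= numPartitions so that the stride is at least 1.

```python
def partition_predicates(column, lower, upper, num_partitions):
    """Build one WHERE predicate per partition, following the scheme above."""
    if num_partitions < 1:
        raise ValueError("num_partitions must be at least 1")
    if num_partitions == 1:
        return ["1 = 1"]  # assumption: a single partition reads the whole table

    stride = (upper - lower) // num_partitions  # integer stride, s = (u-l)/n
    predicates = []
    for i in range(num_partitions):
        start = lower + i * stride
        end = start + stride
        if i == 0:
            # First partition is open at the bottom and also picks up NULLs.
            predicates.append(f"{column} < {end} OR {column} IS NULL")
        elif i == num_partitions - 1:
            # Last partition is open at the top.
            predicates.append(f"{column} >= {start}")
        else:
            predicates.append(f"{column} >= {start} AND {column} < {end}")
    return predicates


for predicate in partition_predicates("partitionColumn", 0, 500, 5):
    print(f"SELECT * FROM table WHERE {predicate}")
```

Because the first and last predicates are open-ended, rows below lowerBound or above upperBound are still read; they simply end up in the first or last partition.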


Depending on how the actual values of partitionColumn are distributed, the number of rows in each partition will vary.
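
To make the skew concrete, here is a small sketch that assigns sample values to partitions using the same rule as the queries above. The helper `partition_index` and the sample values are made up for illustration, and the code assumes an integer column with a stride of at least 1.

```python
from collections import Counter


def partition_index(value, lower, upper, num_partitions):
    """Return the index of the partition that would read `value` (None stands for SQL NULL)."""
    stride = (upper - lower) // num_partitions
    if value is None:
        return 0  # NULLs are read by the first partition
    index = (value - lower) // stride
    # Values below lowerBound go to the first partition,
    # values at or above the last boundary go to the last one.
    return max(0, min(num_partitions - 1, index))


# Hypothetical column values; several lie outside the 0..500 bounds.
values = [-50, 10, 20, 30, 150, 450, 900, 1200, None]
counts = Counter(partition_index(v, 0, 500, 5) for v in values)

# Every value is assigned to exactly one partition, so nothing is filtered out.
assert sum(counts.values()) == len(values)
print(sorted(counts.items()))
```

With these sample values all nine rows are assigned to a partition, including the ones outside the bounds, but five of them land in the first partition and two partitions stay empty. Choosing bounds that match the real spread of the column keeps the partitions more even.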
    
             
                                                        
            
            
              
                