What is the meaning of partitionColumn, lowerBound, upperBound, numPartitions parameters?

名媛妹妹 2020-11-27 05:05

While fetching data from SQL Server via a JDBC connection in Spark, I found that I can set some parallelization parameters like partitionColumn, lowerBoun

4 answers
  •  無奈伤痛
    2020-11-27 05:33

    Actually the list above misses a couple of things, specifically the first and the last query.

    Without them you would lose some data (the rows below the lowerBound and those above the upperBound). This is not obvious from the example because its lower bound is 0.

    The complete list should be:

    SELECT * FROM table WHERE partitionColumn < 100
    
    SELECT * FROM table WHERE partitionColumn BETWEEN 0 AND 100  
    SELECT * FROM table WHERE partitionColumn BETWEEN 100 AND 200  
    

    ...

    SELECT * FROM table WHERE partitionColumn > 9000
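
To make the point about the open-ended first and last queries concrete, here is a minimal sketch in plain Python (no Spark required) that builds one WHERE predicate per partition. It is a simplified model of how Spark's JDBC source is generally understood to split the range, not a copy of its implementation. The function name `partition_predicates` and the sample bounds (0, 1000, 10 partitions) are assumptions made for illustration. The sketch uses half-open ranges (`>=` and `<`) rather than `BETWEEN`, so a row sitting exactly on a boundary is read only once.

```python
def partition_predicates(column, lower_bound, upper_bound, num_partitions):
    """Build one WHERE predicate per partition (simplified, hypothetical model).

    The first predicate has no lower limit and the last has no upper limit,
    so rows outside [lower_bound, upper_bound] are still read.
    """
    if num_partitions < 2:
        raise ValueError("need at least two partitions to split a range")
    if upper_bound - lower_bound < num_partitions:
        raise ValueError("range is too narrow for this many partitions")

    # Assumption: a plain integer stride; real implementations may round differently.
    stride = (upper_bound - lower_bound) // num_partitions

    predicates = []
    current = lower_bound
    for i in range(num_partitions):
        # Every partition except the first has a lower limit.
        lower = f"{column} >= {current}" if i > 0 else None
        current += stride
        # Every partition except the last has an upper limit.
        upper = f"{column} < {current}" if i < num_partitions - 1 else None

        if lower is None:
            # First partition: also picks up NULLs so they are not dropped.
            predicates.append(f"{upper} OR {column} IS NULL")
        elif upper is None:
            # Last partition: open-ended above.
            predicates.append(lower)
        else:
            predicates.append(f"{lower} AND {upper}")
    return predicates


# Sample bounds are assumed for illustration only.
predicates = partition_predicates("partitionColumn", 0, 1000, 10)
for predicate in predicates:
    print(f"SELECT * FROM table WHERE {predicate}")
```

In this model lowerBound and upperBound only decide where the cut points fall; they do not filter rows, which is why the first and last queries carry everything outside the range.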
    
