While fetching data from SQL Server via a JDBC connection in Spark, I found that I can set some parallelization parameters like partitionColumn, lowerBoun
Actually the list above misses a couple of things, specifically the first and the last query.
Without them you would lose some data (the rows before the lowerBound and those after the upperBound). This is not obvious from the example, because the lower bound there is 0.
The complete list should be:
SELECT * FROM table WHERE partitionColumn < 0
SELECT * FROM table WHERE partitionColumn BETWEEN 0 AND 100
SELECT * FROM table WHERE partitionColumn BETWEEN 100 AND 200
...
SELECT * FROM table WHERE partitionColumn > 9000
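
To make the idea concrete, here is a minimal Python sketch of how a gap-free set of per-partition WHERE clauses could be generated. It is not Spark's code: the function name `partition_predicates` and the stride formula are my own simplifications. It also differs from the list above in two ways. It uses half-open ranges (`>=` and `<`) instead of BETWEEN, so that a boundary value such as 100 is not read by two queries. And, as far as I understand Spark's behaviour, rows outside the bounds (and NULLs) are folded into the first and last partitions rather than fetched by separate queries, so that is what the sketch does.

```python
def partition_predicates(column, lower_bound, upper_bound, num_partitions):
    """Build one WHERE clause per partition (hypothetical helper, not a Spark API).

    The bounds only decide where the cuts fall; they do not filter rows.
    Everything below the first cut (plus NULLs) lands in the first partition,
    everything at or above the last cut lands in the last one.
    """
    if num_partitions < 2:
        # A single partition reads the whole table.
        return ["1 = 1"]

    # Simplified stride: integer width of each interior range.
    stride = (upper_bound - lower_bound) // num_partitions
    if stride < 1:
        raise ValueError("bounds are too close for this many partitions")

    # Interior cut points: lower + stride, lower + 2 * stride, ...
    cuts = [lower_bound + stride * i for i in range(1, num_partitions)]

    # First partition is open at the bottom and also collects NULLs.
    predicates = [f"{column} < {cuts[0]} OR {column} IS NULL"]

    # Middle partitions are half-open, so no value is read twice.
    for low, high in zip(cuts, cuts[1:]):
        predicates.append(f"{column} >= {low} AND {column} < {high}")

    # Last partition is open at the top.
    predicates.append(f"{column} >= {cuts[-1]}")
    return predicates


if __name__ == "__main__":
    for clause in partition_predicates("partitionColumn", 0, 400, 4):
        print(f"SELECT * FROM table WHERE {clause}")
```

The practical consequence is that a badly chosen lowerBound or upperBound does not drop rows; it only makes the first or last partition much larger than the others.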