What is the meaning of partitionColumn, lowerBound, upperBound, numPartitions parameters?

名媛妹妹 2020-11-27 05:05

While fetching data from SQL Server via a JDBC connection in Spark, I found that I can set some parallelization parameters like partitionColumn, lowerBoun

4 answers
  •  失恋的感觉
    2020-11-27 05:12

    I would just like to add to the verified answer, since the statement "Without them you would lose some data" is misleading.

    From the documentation: "Notice that lowerBound and upperBound are just used to decide the partition stride, not for filtering the rows in table. So all rows in the table will be partitioned and returned. This option applies only to reading."

    This means that if, say, your table has 1100 rows and you specify

    lowerBound: 0
    upperBound: 1000
    numPartitions: 10

    you won't lose the rows between 1000 and 1100. The stride is (1000 - 0) / 10 = 100, and rows whose partitionColumn value falls above upperBound simply land in the last partition, so that partition ends up with more rows than intended.
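To make the stride behaviour concrete, here is a minimal sketch in plain Python (no Spark needed) of how the bounds can be turned into one predicate per partition. It is a simplified illustration of the idea described in the documentation, not Spark's actual implementation. The column name `id`, its values 0 to 1099, and the helper names `partition_predicates` and `partition_index` are assumptions made up for this example.

```python
from collections import Counter


def partition_predicates(column, lower_bound, upper_bound, num_partitions):
    """Build one WHERE predicate per partition (simplified illustration)."""
    if num_partitions < 2:
        raise ValueError("this sketch assumes at least two partitions")
    stride = (upper_bound - lower_bound) // num_partitions
    if stride < 1:
        raise ValueError("this sketch assumes a stride of at least 1")

    predicates = []
    current = lower_bound
    for i in range(num_partitions):
        # The first partition has no lower limit, the last has no upper limit,
        # so values outside [lower_bound, upper_bound) are still read.
        lower = f"{column} >= {current}" if i > 0 else None
        current += stride
        upper = f"{column} < {current}" if i < num_partitions - 1 else None
        if lower is None:
            predicates.append(f"{upper} OR {column} IS NULL")
        elif upper is None:
            predicates.append(lower)
        else:
            predicates.append(f"{lower} AND {upper}")
    return predicates


def partition_index(value, lower_bound, upper_bound, num_partitions):
    """Return the index of the partition that a non-null value falls into."""
    stride = (upper_bound - lower_bound) // num_partitions
    raw = (value - lower_bound) // stride
    # Clamp: values below lower_bound go to partition 0,
    # values at or above upper_bound go to the last partition.
    return max(0, min(num_partitions - 1, raw))


# Hypothetical table: 1100 rows whose "id" column runs from 0 to 1099.
predicates = partition_predicates("id", 0, 1000, 10)
counts = Counter(partition_index(v, 0, 1000, 10) for v in range(1100))
rows_per_partition = [counts[i] for i in range(10)]

print(predicates[0])       # id < 100 OR id IS NULL
print(predicates[-1])      # id >= 900
print(rows_per_partition)  # [100, 100, 100, 100, 100, 100, 100, 100, 100, 200]
```

Under these assumptions every one of the 1100 rows is assigned to a partition. The last partition simply carries 200 rows instead of 100, which is the skew described above.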
