Is there a way to limit the number of records fetched from the JDBC source using Spark SQL 2.2.0?
I am dealing with a task of moving (and transforming) a large number of records from a JDBC source.
I have not tested this, but you should try using `limit` instead of `take`. `take` calls `head` under the covers, which carries the following note:
this method should only be used if the resulting array is expected to be small, as all the data is loaded into the driver's memory.
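For illustration, here is roughly what the `take` path looks like against a JDBC source. This is only a sketch: the connection options, table name, and the `spark` SparkSession are assumed placeholders, not anything from the question.

```scala
import org.apache.spark.sql.Row

// Sketch only: all connection details below are placeholders.
val df = spark.read
  .format("jdbc")
  .option("url", "jdbc:postgresql://db-host:5432/mydb") // placeholder URL
  .option("dbtable", "schema.big_table")                // placeholder table
  .option("user", "db_user")                            // placeholder credentials
  .option("password", "db_password")
  .load()

// take(n) delegates to head(n), which is an action: it triggers query
// execution and materializes the returned rows as an Array[Row] on the driver.
val firstRows: Array[Row] = df.take(1000)
```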
`limit`, on the other hand, results in a LIMIT being pushed into the SQL query, because it is lazily evaluated:
The difference between this function and `head` is that `head` is an action and returns an array (by triggering query execution) while `limit` returns a new Dataset.
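Since `limit` is a transformation, nothing runs until an action is called on the result. A minimal sketch, continuing from the hypothetical `df` above (the output path is a placeholder):

```scala
// limit is lazy: this only returns a new Dataset[Row]; no rows are fetched yet.
val limited = df.limit(1000)

// The query runs only when an action is invoked, with the limit already part
// of the plan, so the full table never has to be materialized on the driver.
limited.write.parquet("/tmp/limited_rows") // placeholder output path
```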
If you still want the data as an array, without pulling everything in first, you could even do something like:
`...load.limit(limitNum).take(limitNum)`
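In terms of the hypothetical `df` above, that would look like the sketch below; `limitNum` is just an illustrative value:

```scala
val limitNum = 1000 // assumed value for illustration

// limit constrains the plan lazily; take then materializes at most
// limitNum rows as an Array[Row] on the driver.
val sample: Array[Row] = df.limit(limitNum).take(limitNum)
```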