Is there a way to limit the number of records fetched from the JDBC source using Spark SQL 2.2.0?
I am dealing with a task of moving (and transforming) a large number of records from a JDBC source.
I have not tested this, but you should try using `limit` instead of `take`. `take` calls `head` under the covers, which carries the following note:
this method should only be used if the resulting array is expected to be small, as all the data is loaded into the driver's memory.
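For illustration, here is roughly what the `take` path looks like against a JDBC source. This is only a sketch: the connection options, table name, and the `spark` SparkSession are assumed placeholders, not anything from the question.

```scala
import org.apache.spark.sql.Row

// Sketch only: all connection details below are placeholders.
val df = spark.read
  .format("jdbc")
  .option("url", "jdbc:postgresql://db-host:5432/mydb") // placeholder URL
  .option("dbtable", "schema.big_table")                // placeholder table
  .option("user", "db_user")                            // placeholder credentials
  .option("password", "db_password")
  .load()

// take(n) delegates to head(n), which is an action: it triggers query
// execution and materializes the returned rows as an Array[Row] on the driver.
val firstRows: Array[Row] = df.take(1000)
```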
`limit`, on the other hand, results in a LIMIT being pushed into the SQL query, because it is lazily evaluated:
The difference between this function and `head` is that `head` is an action and returns an array (by triggering query execution) while `limit` returns a new Dataset.
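Since `limit` is a transformation, nothing runs until an action is called on the result. A minimal sketch, continuing from the hypothetical `df` above (the output path is a placeholder):

```scala
// limit is lazy: this only returns a new Dataset[Row]; no rows are fetched yet.
val limited = df.limit(1000)

// The query runs only when an action is invoked, with the limit already part
// of the plan, so the full table never has to be materialized on the driver.
limited.write.parquet("/tmp/limited_rows") // placeholder output path
```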
If you still want the data as an array, without pulling everything in first, you could even do something like:
`...load.limit(limitNum).take(limitNum)`
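In terms of the hypothetical `df` above, that would look like the sketch below; `limitNum` is just an illustrative value:

```scala
val limitNum = 1000 // assumed value for illustration

// limit constrains the plan lazily; take then materializes at most
// limitNum rows as an Array[Row] on the driver.
val sample: Array[Row] = df.limit(limitNum).take(limitNum)
```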