How to force Spark to evaluate DataFrame operations inline

情深已故 2020-12-07 01:48

According to the Spark RDD docs:

All transformations in Spark are lazy, in that they do not compute their results right away...This design enables Spa

2 Answers

南方客 (OP)

2020-12-07 01:50

I agree that at some point you want the action to run when YOU need it. For example, if you are streaming data with Spark Streaming, you may want to evaluate the transformations on each RDD as it arrives, rather than accumulating transformations across RDDs and then suddenly running an action on that large set of data.

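Now, let's say you have a DataFrame and have applied all your transformations to it. You can then register it as a temporary table and call sqlContext.sql("CACHE TABLE <table-name>") (or spark.sql(...) on a SparkSession in Spark 2.x and later; SparkContext itself has no sql method).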

CACHE TABLE is eager by default (unless LAZY is specified), so it triggers an action on the DataFrame and evaluates all of its transformations.
