How to force Spark to evaluate DataFrame operations inline

情深已故 2020-12-07 01:48

According to the Spark RDD docs:

All transformations in Spark are lazy, in that they do not compute their results right away... This design enables Spark to run more efficiently.
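To make the quoted behavior concrete, here is a toy Python model of lazy evaluation. It is not Spark code and not how Spark implements laziness; the class LazyDataset and its methods are hypothetical. Transformations (map, filter) only record a step, and nothing runs until the action (collect) asks for a result.

```python
# Toy model of lazy evaluation (NOT Spark's implementation): transformations
# are recorded as functions and only applied when an action requests a result.
class LazyDataset:
    def __init__(self, source, steps=None):
        self._source = source        # base data, standing in for e.g. a file
        self._steps = steps or []    # recorded transformations, not yet run

    def map(self, fn):
        # Transformation: returns a new dataset with one more recorded step.
        return LazyDataset(self._source, self._steps + [lambda xs: [fn(x) for x in xs]])

    def filter(self, pred):
        # Transformation: also only recorded; nothing is computed here.
        return LazyDataset(self._source, self._steps + [lambda xs: [x for x in xs if pred(x)]])

    def collect(self):
        # Action: only now are the recorded steps executed, in order.
        data = list(self._source)
        for step in self._steps:
            data = step(data)
        return data


calls = []

def square(x):
    calls.append(x)  # record each time the function actually runs
    return x * x

ds = LazyDataset(range(5)).map(square).filter(lambda v: v % 2 == 0)
before = len(calls)    # 0: defining the pipeline computed nothing
result = ds.collect()  # [0, 4, 16]
after = len(calls)     # 5: the action ran square on every element
print(before, result, after)
```

Forcing evaluation "inline" amounts to triggering an action at the point where you want the work done, which is what the answer below does with an eager cache.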

2 answers
  •  南方客 (OP)
    2020-12-07 01:50

    I agree that at some point you want to run the action when you need it. For example, if you are streaming data with Spark Streaming, you may want to evaluate the transformations on each RDD as it arrives, rather than accumulating transformations over many RDDs and then suddenly running an action on one large set of data.

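    Now, let's say you have a DataFrame and have applied all your transformations to it. Register it as a temporary view (df.createOrReplaceTempView("<table-name>")), then run spark.sql("CACHE TABLE <table-name>") on your SparkSession (sqlContext.sql(...) on Spark 1.x; SparkContext itself has no sql method).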

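    This cache is eager: CACHE TABLE immediately triggers an action on the DataFrame, which evaluates all of its pending transformations and keeps the result in memory. By contrast, df.cache() and CACHE LAZY TABLE only mark the data for caching, and nothing is computed until the next action.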
