According to the Spark RDD programming guide:
All transformations in Spark are lazy, in that they do not compute their results right away... This design enables Spark to run more efficiently.
I agree with you that at some point you want to trigger the action when you need it. For example, if you are processing data with Spark Streaming, you want to evaluate the transformations on every RDD (micro-batch) as it arrives, rather than accumulating transformations across RDDs and then suddenly running one action on that large set of data.
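As a minimal Spark Streaming sketch of that idea (the socket source, host/port, and batch interval are illustrative assumptions), `foreachRDD` with an action forces each micro-batch to be evaluated as it arrives:

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

val conf = new SparkConf().setAppName("EagerPerBatch")
val ssc  = new StreamingContext(conf, Seconds(5))

// Hypothetical socket source; any DStream behaves the same way.
val lines = ssc.socketTextStream("localhost", 9999)
val counts = lines.flatMap(_.split(" ")).map((_, 1)).reduceByKey(_ + _)

// count() is an action, so each micro-batch RDD's transformations
// are evaluated as soon as that batch arrives, not deferred.
counts.foreachRDD { rdd =>
  println(s"Distinct words in this batch: ${rdd.count()}")
}

ssc.start()
ssc.awaitTermination()
```

Without the action inside `foreachRDD`, the transformations would simply be recorded and nothing would run.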
Now, let's say you have a DataFrame and have applied all of your transformations to it. After registering it as a table, you can run spark.sql("CACHE TABLE tableName").
This cache is eager: it triggers an action on the DataFrame and evaluates all of the transformations applied to it, materializing the result right away.
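A sketch of the eager-cache pattern, assuming a SparkSession named `spark` and an illustrative input file and table name:

```scala
// Hypothetical input; any DataFrame works the same way.
val df = spark.read.json("people.json")
val transformed = df.filter($"age" > 21).select("name", "age")

// Register the DataFrame so SQL can refer to it by name.
transformed.createOrReplaceTempView("myTable")

// CACHE TABLE is eager: it executes the plan immediately and
// materializes the result in memory.
spark.sql("CACHE TABLE myTable")

// By contrast, transformed.cache() (or CACHE LAZY TABLE myTable)
// only marks the data for caching; nothing is computed until the
// first action, e.g. transformed.count().
```

This is the difference in practice: `CACHE TABLE` pays the computation cost up front, while `.cache()` defers it to the first action.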