Question
Let's say I have the following:
import org.apache.spark.storage.StorageLevel

val dataset2 = dataset1.persist(StorageLevel.MEMORY_AND_DISK)
val dataset3 = dataset2.map(.....)
If I do a transformation on dataset2, do I then have to persist dataset3 and unpersist dataset2, or not?
I am trying to figure out when to persist and unpersist RDDs. Do I have to persist every new RDD that is created?
Thanks
Answer 1:
Spark automatically monitors cache usage on each node and drops out old data partitions in a least-recently-used (LRU) fashion. If you would like to manually remove an RDD instead of waiting for it to fall out of the cache, use the RDD.unpersist() method.
Reference: http://spark.apache.org/docs/latest/rdd-programming-guide.html#rdd-persistence
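To make the answer concrete, here is a minimal, self-contained sketch of the persist/unpersist lifecycle. The SparkContext setup, the dataset contents, and the names (PersistUnpersistExample, dataset1, dataset2) are illustrative assumptions, not part of the original question:

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.storage.StorageLevel

object PersistUnpersistExample {
  def main(args: Array[String]): Unit = {
    // Local SparkContext for illustration only.
    val sc = new SparkContext(new SparkConf().setAppName("persist-example").setMaster("local[*]"))

    val dataset1 = sc.parallelize(1 to 1000000)

    // Persist only because dataset2 is reused by more than one action below;
    // a chain of transformations consumed exactly once gains nothing from persisting.
    val dataset2 = dataset1.map(_ * 2).persist(StorageLevel.MEMORY_AND_DISK)

    // Both actions reuse the cached partitions instead of recomputing dataset1.map.
    val sum = dataset2.sum()
    val count = dataset2.count()
    println(s"sum=$sum count=$count")

    // Once nothing downstream needs dataset2, free its cached partitions explicitly;
    // otherwise Spark evicts them lazily in LRU order as memory fills up.
    dataset2.unpersist()

    sc.stop()
  }
}

The takeaway, per the documentation quoted above: you do not need to persist every new RDD, only those that are reused, and unpersist() is an explicit cleanup rather than a requirement, since the LRU eviction handles it eventually.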
Source: https://stackoverflow.com/questions/33859915/when-to-persist-and-when-to-unpersist-rdd-in-spark