When to persist and when to unpersist RDD in Spark

Posted by 拈花ヽ惹草 on 2019-12-23 18:32:45

Question


Let's say I have the following:

 val dataset2 = dataset1.persist(StorageLevel.MEMORY_AND_DISK) 
 val dataset3 = dataset2.map(.....)

If I do a transformation on dataset2, do I then have to persist the result (dataset3) and unpersist the previous one, or not?

I am trying to figure out when to persist and unpersist RDDs. Do I have to persist every new RDD that is created?

Thanks


Answer 1:


Spark automatically monitors cache usage on each node and drops out old data partitions in a least-recently-used (LRU) fashion. If you would like to manually remove an RDD instead of waiting for it to fall out of the cache, use the RDD.unpersist() method.

Reference: http://spark.apache.org/docs/latest/rdd-programming-guide.html#rdd-persistence
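In practice, you persist an RDD only when it is reused by more than one action, and unpersist it once nothing downstream needs it anymore. Below is a minimal sketch of that pattern (my own illustration, not from the original answer), assuming a spark-shell style SparkContext named sc and a hypothetical input file data.txt:

 import org.apache.spark.storage.StorageLevel

 val dataset1 = sc.textFile("data.txt")                      // hypothetical input path
 val dataset2 = dataset1.persist(StorageLevel.MEMORY_AND_DISK)

 // Transformations on dataset2 do NOT need their own persist unless the
 // resulting RDD is itself reused by several downstream actions.
 val dataset3 = dataset2.map(line => line.length)
 val dataset4 = dataset2.filter(_.nonEmpty)

 println(dataset3.count())   // both actions reuse the cached dataset2 partitions
 println(dataset4.count())

 dataset2.unpersist()        // release the cache once nothing else needs dataset2

If you never call unpersist, the cached partitions are still reclaimed eventually under the LRU policy described above, so forgetting it is not fatal; explicit unpersist just frees the memory sooner.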



Source: https://stackoverflow.com/questions/33859915/when-to-persist-and-when-to-unpersist-rdd-in-spark
