Would Spark unpersist the RDD itself when it realizes it won't be used anymore?

粉色の甜心 2020-12-05 07:02

We can persist an RDD in memory and/or on disk when we want to use it more than once. However, do we have to unpersist it ourselves later on, or does Spark do some kind of garbage collection and unpersist the RDD itself once it is no longer needed?
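
To make the scenario concrete, here is a minimal sketch (Scala; the data and names are illustrative, not part of the original question) of persisting an RDD that is reused by more than one action:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.storage.StorageLevel

object PersistExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("persist-example")
      .master("local[*]")
      .getOrCreate()
    val sc = spark.sparkContext

    // An RDD that two different actions will reuse, so we persist it.
    val squares = sc.parallelize(1 to 1000000).map(n => n.toLong * n)
    squares.persist(StorageLevel.MEMORY_AND_DISK) // keep in memory, spill to disk if needed

    println(squares.count()) // first action computes and caches the partitions
    println(squares.sum())   // second action reads the cached partitions

    // Question: must we call squares.unpersist() ourselves, or will Spark do it?
    spark.stop()
  }
}
```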

2 Answers
  •  醉酒成梦
    2020-12-05 07:25

    As pointed out by @Daniel, Spark will automatically remove cached partitions. This happens once there is no more memory available for new blocks, and eviction follows a least-recently-used (LRU) policy. As @eliasah notes, it is not a smart system: eviction is driven by memory pressure and recency of access, not by whether the RDD will actually be needed again.

    If you are not caching many objects, you don't have to worry about it. If you cache a lot of data, however, JVM garbage-collection times can become excessive, so in that case it is a good idea to unpersist RDDs explicitly once you are done with them, as in the sketch below.
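
    A minimal sketch (Scala; the names and data are illustrative) of releasing a cached RDD explicitly instead of waiting for LRU eviction under memory pressure:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.storage.StorageLevel

object UnpersistExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("unpersist-example")
      .master("local[*]")
      .getOrCreate()
    val sc = spark.sparkContext

    val cached = sc.parallelize(1 to 100000).map(_ * 2).persist(StorageLevel.MEMORY_ONLY)
    println(cached.sum()) // action materializes and caches the partitions

    println(sc.getPersistentRDDs.size) // 1: the RDD is still registered as persistent

    // Release the cached blocks explicitly once they are no longer needed,
    // instead of relying on LRU eviction when memory runs low.
    cached.unpersist(blocking = true) // blocking = true waits until the blocks are removed

    println(sc.getPersistentRDDs.size) // 0: nothing left in the cache registry

    spark.stop()
  }
}
```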
