If I have an RDD that I no longer need, how do I delete it from memory? Would the following be enough to get this done:
del thisRDD
Thanks!
Short answer: it depends.
According to the PySpark v1.3.0 source code, del thisRDD should be enough for a PipelinedRDD, which is the kind of RDD produced by Python mapper/reducer functions:
class PipelinedRDD(RDD):
    # ...
    def __del__(self):
        if self._broadcast:
            self._broadcast.unpersist()
            self._broadcast = None
The RDD class, on the other hand, doesn't have a __del__ method (though it probably should), so you should call the unpersist method yourself.
Edit: __del__ method was deleted in this commit.
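Here is a minimal sketch of what calling unpersist yourself could look like. The helper name release_rdd is hypothetical and not part of PySpark. It relies only on RDD.unpersist(), and the usage lines in the comments assume an existing SparkContext named sc.

```python
def release_rdd(rdd):
    """Unpersist an RDD and return None so the caller can drop its reference.

    `release_rdd` is a hypothetical helper name, not a PySpark API.
    """
    # RDD.unpersist() marks the RDD as non-persistent and removes its
    # cached blocks from memory and disk.
    rdd.unpersist()
    return None


# Usage sketch (assumes an existing SparkContext named `sc`):
#   thisRDD = sc.parallelize(range(1000)).cache()
#   thisRDD.count()                  # materializes the cache
#   thisRDD = release_rdd(thisRDD)   # unpersist, then drop the reference
```

Rebinding the name to None has the same effect on the Python side as del thisRDD; the explicit unpersist call is what releases the cached data.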