How to delete an RDD in PySpark for the purpose of releasing resources?

别跟我提以往 2020-12-31 00:36

If I have an RDD that I no longer need, how do I delete it from memory? Would the following be enough to get this done:

del thisRDD

Thanks!

4 answers
  •  醉话见心
    2020-12-31 01:13

    Short answer: it depends.

    According to the PySpark v1.3.0 source code, del thisRDD should be enough for a PipelinedRDD, which is the kind of RDD generated by a Python mapper/reducer:

    class PipelinedRDD(RDD):
        # ...
        def __del__(self):
            if self._broadcast:
                self._broadcast.unpersist()
                self._broadcast = None
    

    The RDD base class, on the other hand, doesn't have a __del__ method (although it probably should), so you should call unpersist() on it yourself.
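
    As a rough sketch of calling unpersist yourself before dropping the reference: the helper name release_rdd below is hypothetical (not part of PySpark), and it assumes the object exposes the is_cached attribute and the no-argument unpersist() method that PySpark's RDD class provides.

```python
def release_rdd(rdd):
    """Unpersist an RDD if it is marked as cached.

    Hypothetical helper, not a PySpark API. Returns True if
    unpersist() was called, False if there was nothing to release.
    """
    # RDD.is_cached is set to True by cache()/persist() and reset
    # to False by unpersist(); default to False if it is missing.
    if getattr(rdd, "is_cached", False):
        rdd.unpersist()
        return True
    return False


# Usage with a real RDD (assumes an existing SparkContext named sc):
#   thisRDD = sc.parallelize(range(100)).cache()
#   thisRDD.count()        # materializes the cached partitions
#   release_rdd(thisRDD)   # asks Spark to drop the cached blocks
#   del thisRDD            # drops the Python-side reference only
```

    The order matters here: unpersist tells Spark to release the cached blocks, while del only removes the Python name, so del alone would not free a cached plain RDD.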

    Edit: the __del__ method was deleted in this commit.
