What is the difference between Spark checkpoint and persist to disk?

北海茫月 2020-11-29 16:29

What is the difference between Spark checkpoint and persist to disk? Are both stored on the local disk?

4 answers

爱一瞬间的悲伤 (original poster)

2020-11-29 17:00
I think you can find a very detailed answer here

While it is hard to summarize everything on that page, I will say:

Persist
- Persisting or caching with StorageLevel.DISK_ONLY causes the RDD to be computed and stored on the executors' local disks, so that subsequent uses of that RDD do not recompute the lineage beyond that point.
- After persist is called, Spark still remembers the lineage of the RDD even though it normally doesn't use it. The lineage is needed again only if the persisted data is lost.
- After the application terminates, the cache is cleared and the persisted files are deleted.
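To make the lineage point concrete, here is a minimal toy model in plain Python. It is not Spark code: `ToyRDD`, `lose_cache`, and `compute_calls` are hypothetical names invented for this sketch, and the "cache" is just a list in memory. In Spark itself the corresponding call would be `rdd.persist(StorageLevel.DISK_ONLY)`. The sketch only shows the behaviour described above: once the data is persisted the lineage is not re-run, but it is still kept and is used again if the persisted copy disappears.

```python
class ToyRDD:
    """Hypothetical stand-in for an RDD: a parent link plus a transformation."""

    def __init__(self, parent=None, fn=None, data=None):
        self.parent = parent            # lineage: the dataset this one derives from
        self.fn = fn                    # transformation applied to the parent's rows
        self.data = data                # set only for source datasets
        self.cached = None              # filled in once persist() has taken effect
        self.persist_requested = False
        self.compute_calls = 0          # how many times the transformation was run

    def map(self, fn):
        return ToyRDD(parent=self, fn=fn)

    def persist(self):
        self.persist_requested = True   # lazy: nothing is stored until an action runs
        return self

    def collect(self):
        if self.cached is not None:
            return list(self.cached)    # served from the cache, lineage not re-run
        if self.parent is None:
            rows = list(self.data)
        else:
            self.compute_calls += 1
            rows = [self.fn(x) for x in self.parent.collect()]
        if self.persist_requested:
            self.cached = list(rows)
        return rows

    def lose_cache(self):
        self.cached = None              # e.g. the machine holding the data died


source = ToyRDD(data=[1, 2, 3])
doubled = source.map(lambda x: x * 2).persist()

first = doubled.collect()               # computes the rows and stores them
second = doubled.collect()              # served from the cache
calls_with_cache = doubled.compute_calls    # 1

doubled.lose_cache()
third = doubled.collect()               # recomputed from the remembered lineage
calls_after_loss = doubled.compute_calls    # 2
```

Note that `doubled.parent` still points at `source` throughout, which is the toy equivalent of Spark keeping the lineage after `persist`.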
Checkpointing
- Checkpointing writes the RDD physically to the checkpoint directory, typically on a reliable file system such as HDFS, and truncates the lineage that created it.
- The checkpoint files are not deleted even after the Spark application has terminated.
- Checkpoint files can be used in a subsequent job run or driver program.
- Checkpointing an RDD causes double computation unless the RDD is persisted first: the action's job computes the RDD once, and a separate job then computes it again to write it to the checkpoint directory. Calling persist before checkpoint avoids the second computation.
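The checkpointing points can be sketched with a small toy model in plain Python. This is not Spark code: `ToyRDD`, `compute_calls`, and the `part-00000.json` file name are hypothetical, and a temporary directory stands in for HDFS. In Spark itself the corresponding calls would be `sc.setCheckpointDir(...)` followed by `rdd.checkpoint()`. The sketch shows three things: the lineage is cut after the checkpoint is written, the file outlives the object that wrote it, and the data is computed twice unless it was persisted first.

```python
import json
import os
import tempfile


class ToyRDD:
    """Hypothetical stand-in for an RDD with persist and checkpoint."""

    def __init__(self, parent=None, fn=None, data=None):
        self.parent, self.fn, self.data = parent, fn, data
        self.cached = None
        self.persist_requested = False
        self.checkpoint_dir = None      # set by checkpoint(), used at the next action
        self.checkpoint_file = None     # set once the rows have been written out
        self.compute_calls = 0          # how many times the transformation was run

    def map(self, fn):
        return ToyRDD(parent=self, fn=fn)

    def persist(self):
        self.persist_requested = True
        return self

    def checkpoint(self, directory):
        self.checkpoint_dir = directory  # lazy: only marks the dataset
        return self

    def _rows(self):
        if self.checkpoint_file is not None:
            with open(self.checkpoint_file) as f:
                return json.load(f)      # read back from the checkpoint
        if self.cached is not None:
            return list(self.cached)
        if self.parent is None:
            return list(self.data)
        self.compute_calls += 1
        rows = [self.fn(x) for x in self.parent._rows()]
        if self.persist_requested:
            self.cached = list(rows)
        return rows

    def collect(self):
        rows = self._rows()              # the action's own job
        if self.checkpoint_dir and self.checkpoint_file is None:
            # Separate pass after the job: obtain the rows again and write them out.
            path = os.path.join(self.checkpoint_dir, "part-00000.json")
            with open(path, "w") as f:
                json.dump(self._rows(), f)
            self.checkpoint_file = path
            self.parent = None           # lineage is truncated
            self.fn = None
        return rows


plain = ToyRDD(data=[1, 2, 3]).map(lambda x: x + 1).checkpoint(tempfile.mkdtemp())
plain.collect()
plain_calls = plain.compute_calls        # 2: once for the job, once for the write

cached = ToyRDD(data=[1, 2, 3]).map(lambda x: x + 1).persist().checkpoint(tempfile.mkdtemp())
cached.collect()
cached_calls = cached.compute_calls      # 1: the write reuses the persisted rows

lineage_cut = plain.parent is None       # True
reread = plain.collect()                 # served from the checkpoint file
```

The `persist().checkpoint(...)` ordering in the second case is the toy version of the usual advice to persist an RDD before checkpointing it.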
You may want to read the article for more details on the internals of Spark's checkpointing and caching operations.