What is the difference between cache and persist?
In terms of RDD persistence, what are the differences between cache() and persist() in Spark?

ahars

With cache(), you use only the default storage level, MEMORY_ONLY. With persist(), you can specify which storage level you want (see rdd-persistence).

From the official docs:

"You can mark an RDD to be persisted using the persist() or cache() methods on it."

"[...] each persisted RDD can be stored using a different storage level [...]"

"The cache() method is a shorthand for using the default storage level, which is StorageLevel.MEMORY_ONLY (store deserialized objects in memory)."

Use persist() if you want to
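To make the relationship between the two methods concrete, here is a minimal pure-Python sketch. It is not Spark code: ToyRDD and this StorageLevel enum are hypothetical stand-ins that only mimic the names from the docs quoted above, so the snippet runs without a Spark installation. It models the one point made in the answer: cache() takes no arguments and behaves like persist() with the default MEMORY_ONLY level, while persist() accepts a level of your choice.

```python
from enum import Enum


class StorageLevel(Enum):
    """Toy stand-in for a few of Spark's storage level names (assumption)."""
    MEMORY_ONLY = "MEMORY_ONLY"
    MEMORY_AND_DISK = "MEMORY_AND_DISK"
    DISK_ONLY = "DISK_ONLY"


class ToyRDD:
    """Hypothetical model of an RDD's persistence API; not Spark's implementation."""

    def __init__(self, data):
        self.data = list(data)
        self.storage_level = None  # None means "not marked for persistence"

    def persist(self, level=StorageLevel.MEMORY_ONLY):
        # persist() accepts any storage level; MEMORY_ONLY is the default.
        self.storage_level = level
        return self

    def cache(self):
        # cache() takes no level argument and delegates to the default.
        return self.persist(StorageLevel.MEMORY_ONLY)


cached = ToyRDD(range(5)).cache()
persisted_default = ToyRDD(range(5)).persist()
persisted_disk = ToyRDD(range(5)).persist(StorageLevel.MEMORY_AND_DISK)

print(cached.storage_level.name)
print(persisted_default.storage_level.name)
print(persisted_disk.storage_level.name)
```

In real PySpark the corresponding calls would be rdd.cache() and rdd.persist(StorageLevel.MEMORY_AND_DISK), with StorageLevel imported from pyspark.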