I'm building a generic function that receives an RDD and runs some calculations on it. Since I run more than one calculation on the input RDD, I would like to cache it. For
Nothing. If you call cache on an already cached RDD, nothing happens: the RDD is still cached only once. Caching, like transformations, is lazy:
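- When you call cache, the RDD's storageLevel is set to MEMORY_ONLY.
- When you call cache again, it's set to the same value (no change).
- On evaluation, when the underlying RDD is materialized, Spark checks the RDD's storageLevel and, if it requires caching, caches it.

So you're safe.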
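To make the point above concrete, here is a minimal sketch in Scala, assuming a local Spark installation. The object name `CacheTwiceSketch` and the helper `countAndSum` are hypothetical stand-ins for the generic function in the question; only `cache`, `getStorageLevel`, and the `StorageLevel` constants are Spark's own API.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD
import org.apache.spark.storage.StorageLevel

object CacheTwiceSketch {
  // Hypothetical stand-in for the "generic function" from the question.
  def countAndSum(rdd: RDD[Int]): (Long, Long) = {
    rdd.cache()                                  // only marks the RDD; harmless if already marked MEMORY_ONLY
    val n = rdd.count()                          // first action: partitions are computed and stored
    val total = rdd.map(_.toLong).reduce(_ + _)  // second action: can reuse the stored partitions
    (n, total)
  }

  def main(args: Array[String]): Unit = {
    // Assumption: local mode, so the sketch runs without a cluster.
    val conf = new SparkConf().setMaster("local[*]").setAppName("cache-twice-sketch")
    val sc = new SparkContext(conf)
    try {
      val rdd = sc.parallelize(1 to 100)
      assert(rdd.getStorageLevel == StorageLevel.NONE)        // nothing requested yet

      rdd.cache()                                             // caller caches first
      assert(rdd.getStorageLevel == StorageLevel.MEMORY_ONLY)

      val (n, total) = countAndSum(rdd)                       // function calls cache() again
      assert(rdd.getStorageLevel == StorageLevel.MEMORY_ONLY) // level unchanged by the second call

      println(s"count=$n sum=$total")
    } finally {
      sc.stop()
    }
  }
}
```

One caveat, as far as I know: this only holds when the requested level is the same. Calling `persist` with a different storage level on an RDD that already has one assigned throws an `UnsupportedOperationException`, so a generic function that wants a level other than MEMORY_ONLY should check `getStorageLevel` first.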