How does the behavior of the MEMORY_ONLY and MEMORY_AND_DISK caching levels in Spark differ?
As explained in the documentation, the persistence levels compare as follows in terms of efficiency:
| Level               | Space used | CPU time | In memory | On disk | Serialized |
|---------------------|------------|----------|-----------|---------|------------|
| MEMORY_ONLY         | High       | Low      | Y         | N       | N          |
| MEMORY_ONLY_SER     | Low        | High     | Y         | N       | Y          |
| MEMORY_AND_DISK     | High       | Medium   | Some      | Some    | Some       |
| MEMORY_AND_DISK_SER | Low        | High     | Some      | Some    | Y          |
| DISK_ONLY           | Low        | High     | N         | Y       | Y          |
The key difference: MEMORY_AND_DISK and MEMORY_AND_DISK_SER spill partitions that don't fit in memory to disk, and read them back from disk when needed. With MEMORY_ONLY, partitions that don't fit are simply not cached; they are recomputed from the RDD's lineage each time they are accessed, which can be expensive for costly transformations.
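A minimal sketch of how you would choose between the two levels in the Scala API (the RDD contents and transformation here are illustrative; `StorageLevel` and `persist` are the actual Spark API):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.storage.StorageLevel

val spark = SparkSession.builder().appName("persist-demo").getOrCreate()
val sc = spark.sparkContext

// A hypothetical expensive-to-recompute RDD.
val expensive = sc.textFile("hdfs://...logs").map(line => line.toUpperCase)

// MEMORY_ONLY: partitions that don't fit in memory are dropped
// and recomputed from lineage on each access.
expensive.persist(StorageLevel.MEMORY_ONLY)

// MEMORY_AND_DISK: partitions that don't fit in memory are
// spilled to local disk and read back instead of recomputed.
expensive.persist(StorageLevel.MEMORY_AND_DISK)

// Note: rdd.cache() is shorthand for persist(StorageLevel.MEMORY_ONLY).
```

As a rule of thumb, MEMORY_AND_DISK is preferable when recomputing a partition from lineage costs more than reading it back from local disk, which is usually the case for long transformation chains or expensive per-record work.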