I have a Spark application with several points where I would like to persist the current state. This is usually after a large step, or when caching a state that I would like to reuse.
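To make the setup concrete, here is a minimal sketch of the kind of workflow I mean (the DataFrame and the transformation are just placeholders for illustration):

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# A large intermediate result I want to keep around for later steps
intermediate = spark.range(1000000).withColumnRenamed("id", "value")
intermediate.cache()
intermediate.count()  # materialize the cache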
Spark 2.x
You can use Catalog.clearCache:
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
...
# Removes all cached tables and DataFrames from the in-memory cache
spark.catalog.clearCache()
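If you only need to release one cached DataFrame rather than wiping the whole cache, DataFrame.unpersist() is enough. A minimal sketch (the DataFrame here is just a placeholder):

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

df = spark.range(1000).cache()  # persist an intermediate result
df.count()                      # materialize the cache

# Drop only this DataFrame from memory/disk, leaving other cached data alone
df.unpersist()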
Spark 1.x
You can use the SQLContext.clearCache method, which:

Removes all cached tables from the in-memory cache.
from pyspark.sql import SQLContext
from pyspark import SparkContext

sqlContext = SQLContext.getOrCreate(SparkContext.getOrCreate())
...
# Removes all cached tables from the in-memory cache
sqlContext.clearCache()
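On Spark 1.x you can also evict a single cached table instead of everything via SQLContext.uncacheTable. A sketch, assuming a placeholder temp table name:

from pyspark import SparkContext
from pyspark.sql import SQLContext

sqlContext = SQLContext.getOrCreate(SparkContext.getOrCreate())

df = sqlContext.range(1000)
df.registerTempTable("my_table")     # "my_table" is just a placeholder name
sqlContext.cacheTable("my_table")    # pin the table in memory
# ... run queries against the cached table ...
sqlContext.uncacheTable("my_table")  # remove only this table from the cache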