I\'m looking for a way to checkpoint DataFrames. Checkpoint is currently an operation on RDD but I can\'t find how to do it with DataFrames. persist and cache (which are synon
As of spark 2.1, dataframe has a checkpoint method (see http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.sql.Dataset) you can use directly, no need to go through RDD.