I have a recursive Spark algorithm that applies a sliding window of 10 days to a Dataset.
The original dataset is loaded from a Hive table partitioned by date.
Checkpointing and converting back to an RDD are indeed the best/only ways to truncate lineage.
Many (perhaps all) of the Spark ML algorithms that expose Dataset/DataFrame APIs are actually implemented with RDDs under the hood; only the API surface is DS/DF. This is because the Catalyst optimizer is not parallelized, and the query plan (lineage) grows unmanageably large under iterative/recursive implementations.
There is a cost to converting to and from an RDD, but it is smaller than that of the file-system checkpointing option.