Spark Dataframe.cache() behavior for changing source
Question

My use case:

1. Create a dataframe from a Cassandra table.
2. Create an output dataframe by filtering on a column and modifying that column's value.
3. Write the output dataframe to Cassandra with a TTL set, so all the modified records are deleted after a short period (2 s).
4. Return the output dataframe to a caller that writes it to the filesystem after some time.

I can only return a dataframe to the caller and have no further control after that. Also, I can't increase the TTL.

By the time step 4 is executed, the