Question
I am using sparklyr for a project and have understood that persisting is very useful. I am using sdf_persist for this, with the following syntax (correct me if I am wrong):
data_frame <- sdf_persist(data_frame)
Now I am reaching a point where I have too many RDDs stored in memory, so I need to unpersist some. However, I cannot seem to find the function to do this in sparklyr. Note that I have tried:
dplyr::db_drop_table(sc, "data_frame")
dplyr::db_drop_table(sc, data_frame)
unpersist(data_frame)
sdf_unpersist(data_frame)
But none of those work.
Also, I am trying to avoid using tbl_cache (in which case it seems that db_drop_table works), since sdf_persist seems to offer more flexibility over the storage level. It might be that I am missing the big picture of how to use persistence here, in which case I'll be happy to learn more.
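For reference, what I mean by flexibility is that sdf_persist accepts an explicit storage level, for example (the level shown here is only an illustration):
# storage.level can be any Spark StorageLevel name; "DISK_ONLY" is just an example
data_frame <- sdf_persist(data_frame, storage.level = "DISK_ONLY")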
Answer 1:
If you don't care about granularity, then the simplest solution is to invoke Catalog.clearCache:
spark_session(sc) %>% invoke("catalog") %>% invoke("clearCache")
Uncaching a specific object is much less straightforward due to sparklyr's indirection. If you check the object returned by sdf_persist, you'll see that the persisted table is not exposed directly:
df <- copy_to(sc, iris, memory = FALSE, overwrite = TRUE) %>% sdf_persist()

spark_dataframe(df) %>%
  invoke("storageLevel") %>%
  invoke("equals", invoke_static(sc, "org.apache.spark.storage.StorageLevel", "NONE"))
[1] TRUE
That's because you don't get the registered table directly, but rather the result of a subquery along the lines of SELECT * FROM ...
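You can check this yourself by rendering the query behind the tbl; dplyr's show_query works on any lazy table (the exact SQL depends on the name used in copy_to):
dplyr::show_query(df)
# prints something like: SELECT * FROM `iris`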
This means you cannot simply call unpersist:
spark_dataframe(df) %>% invoke("unpersist")
as you would in one of the official APIs.
Instead, you can try to retrieve the name of the source table, for example like this:
src_name <- as.character(df$ops$x)
and then invoke Catalog.uncacheTable:
spark_session(sc) %>% invoke("catalog") %>% invoke("uncacheTable", src_name)
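If you want to double-check the effect, the same Catalog API exposes isCached (a small sketch, assuming src_name resolved to the actual registered table name):
spark_session(sc) %>%
  invoke("catalog") %>%
  invoke("isCached", src_name)
# should return FALSE once the table has been uncached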
That is likely not the most robust solution, so please use it with caution.
Source: https://stackoverflow.com/questions/56342887/how-to-unpersist-in-sparklyr