I am using Spark 1.3.0 with the Python API. While transforming huge dataframes, I cache many DataFrames for faster execution:
df1.cache()
df2.cache()
To drop them from the cache once they are no longer needed, just do the following:
df1.unpersist()
df2.unpersist()
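As a side note, unpersist() in the Python API also takes a blocking flag if you want the call to wait until the cached blocks are actually removed (the default value of blocking has changed between Spark versions), a minimal sketch:

df1.unpersist(blocking=True)  # block until the cached data has actually been freed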
Spark automatically monitors cache usage on each node and drops out old data partitions in a least-recently-used (LRU) fashion. If you would like to manually remove an RDD instead of waiting for it to fall out of the cache, use the RDD.unpersist() method.
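Putting it together, a minimal sketch of the whole lifecycle, assuming df1 is any DataFrame you have already built (note that caching is lazy, and that the is_cached attribute is, to my knowledge, available on PySpark DataFrames):

df1.cache()           # lazy: only marks df1 for caching, nothing is stored yet
df1.count()           # the first action materializes the cached partitions
print(df1.is_cached)  # True: df1 is marked as cached
df1.unpersist()       # manually free the memory instead of waiting for LRU eviction
print(df1.is_cached)  # False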
If the dataframe is registered as a table for SQL operations, like
df.createGlobalTempView(tableName)  # or some other way, depending on the Spark version
then the cache can be dropped with the following commands (of course, Spark also does this automatically).
For Spark 2.x, where spark is a SparkSession object:
Drop a specific table/df from the cache:
spark.catalog.uncacheTable(tableName)
Drop all tables/dfs from the cache:
spark.catalog.clearCache()
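For example, a minimal sketch of this path, using a hypothetical table name my_table (the SQL query is only there to force materialization):

df.createOrReplaceTempView("my_table")             # register df as a temp view
spark.catalog.cacheTable("my_table")               # mark the table as cached (lazy)
spark.sql("SELECT COUNT(*) FROM my_table").show()  # an action materializes the cache
print(spark.catalog.isCached("my_table"))          # True while the table is cached
spark.catalog.uncacheTable("my_table")             # drop only this table from the cache
spark.catalog.clearCache()                         # or drop everything that is cached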
For Spark 1.x, where you work with an SQLContext instead:
Drop a specific table/df from the cache:
sqlContext.uncacheTable(tableName)
Drop all tables/dfs from the cache:
sqlContext.clearCache()
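And the equivalent sketch for the SQLContext path, assuming sqlContext already exists (as it does in the PySpark shell) and reusing the hypothetical table name:

df.registerTempTable("my_table")                        # 1.x way to register df for SQL
sqlContext.cacheTable("my_table")                       # mark the table as cached (lazy)
sqlContext.sql("SELECT COUNT(*) FROM my_table").show()  # an action materializes the cache
sqlContext.uncacheTable("my_table")                     # drop only this table from the cache
sqlContext.clearCache()                                 # or drop everything that is cached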