Drop spark dataframe from cache

南旧 2020-12-23 21:02

I am using Spark 1.3.0 with the Python API. While transforming huge dataframes, I cache many DFs for faster execution:

df1.cache()
df2.cache()

Once these dataframes are no longer needed, how can I drop them from the cache?

2 Answers
  • 2020-12-23 21:35

    Just do the following:

    df1.unpersist()
    df2.unpersist()
    

    Spark automatically monitors cache usage on each node and drops out old data partitions in a least-recently-used (LRU) fashion. If you would like to manually remove an RDD instead of waiting for it to fall out of the cache, use the RDD.unpersist() method.
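
    A minimal sketch of this cache/unpersist lifecycle, assuming a Spark 2.x+ SparkSession for brevity (on 1.3.x you would build the DataFrames from a SQLContext instead); the input path, app name, and column name are made up for illustration:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("cache-demo").getOrCreate()

    # Hypothetical input path and column name, for illustration only.
    df1 = spark.read.parquet("/data/events.parquet")
    df1.cache()                    # mark df1 for caching; materialized on the first action
    df1.count()                    # action that actually populates the cache

    df2 = df1.filter(df1["status"] == "ok")
    df2.cache()
    df2.count()

    # ... reuse df1 and df2 in further transformations ...

    df1.unpersist()                # release df1's cached partitions
    df2.unpersist(blocking=True)   # optionally block until the blocks are actually freed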

  • 2020-12-23 21:35

    If the DataFrame is registered as a table for SQL operations, like

    df.createGlobalTempView(tableName)  # or some other way, depending on the Spark version
    

    then the cache can be dropped with the following commands; of course, Spark also does this automatically. A combined sketch follows the version-specific lists below.

    Spark >= 2.x

    Here spark is a SparkSession object.

    • Drop a specific table/df from cache

       spark.catalog.uncacheTable(tableName)
      
    • Drop all tables/dfs from cache

       spark.catalog.clearCache()
      

    Spark <= 1.6.x

    • Drop a specific table/df from cache

       sqlContext.uncacheTable(tableName)
      
    • Drop all tables/dfs from cache

       sqlContext.clearCache()
      
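    A combined sketch of this catalog-based approach, assuming Spark 2.x+; the view name events, the app name, and the toy rows are made up for illustration, and it uses createOrReplaceTempView (session-scoped) rather than createGlobalTempView to keep the example short:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("catalog-cache-demo").getOrCreate()

    # Hypothetical toy data, for illustration only.
    df = spark.createDataFrame([(1, "ok"), (2, "failed")], ["id", "status"])

    # Register the DataFrame as a temp view and cache it through the catalog.
    df.createOrReplaceTempView("events")
    spark.catalog.cacheTable("events")
    print(spark.catalog.isCached("events"))   # True once the view is marked as cached

    # Drop just this table/view from the cache ...
    spark.catalog.uncacheTable("events")

    # ... or drop every cached table/DataFrame at once.
    spark.catalog.clearCache()

    # Spark <= 1.6.x equivalents, with sqlContext = SQLContext(sc):
    #   sqlContext.uncacheTable("events")
    #   sqlContext.clearCache()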