How to estimate the real size of a DataFrame in PySpark?

夕颜 2020-12-05 00:55

How can I determine the size of a DataFrame?

Right now I estimate the real size of a dataframe as follows:

# Assumed intent: total character length of the column names
headers_size = sum(len(key) for key in df.first().asDict())
ro         


        
2 Answers
  •  囚心锁ツ
    2020-12-05 01:43

    I currently use the approach below, though I am not sure it is the best way:

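    from pyspark import StorageLevel

    df.persist(StorageLevel.MEMORY_ONLY)  # mark the DataFrame for in-memory caching
    df.count()  # run an action so the cache is actually populated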
    

    In the Spark web UI, the Storage tab then shows the size of the cached DataFrame (displayed in MB). Once I have the number, I call unpersist to free the memory:

    df.unpersist()
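
    If opening the web UI is not convenient, a rough figure can also be extrapolated from a sample of rows. The sketch below is only one possible way to do that, not an established Spark method: `estimate_size_bytes` is a hypothetical helper, and it measures the pickled size of rows on the Python side, which is not the same as Spark's in-memory or on-disk footprint. Plain dicts stand in for Spark rows so the snippet runs without a cluster.

```python
import pickle
from typing import Any, Iterable, Mapping


def estimate_size_bytes(sample_rows: Iterable[Mapping[str, Any]], total_rows: int) -> int:
    """Extrapolate a total size from the pickled size of a sample of rows.

    Hypothetical helper: the result reflects Python-side serialized size,
    not Spark's internal representation, so treat it as a rough guide only.
    """
    # Serialized size of each sampled row, in bytes.
    sizes = [len(pickle.dumps(dict(row))) for row in sample_rows]
    if not sizes:
        return 0
    # Average row size scaled up to the full row count.
    avg = sum(sizes) / len(sizes)
    return int(avg * total_rows)


# Plain dicts stand in for Spark rows here. With a real DataFrame, the
# inputs could be obtained with something like:
#   sample_rows = [r.asDict() for r in df.limit(1000).collect()]
#   total_rows = df.count()
sample = [{"id": i, "name": "user%d" % i} for i in range(100)]
estimate = estimate_size_bytes(sample, total_rows=1_000_000)
print(estimate)
```

    The estimate depends on how representative the sample is. For skewed data, a random sample is likely a better input than the first rows returned by `limit`.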
    
