How to determine a dataframe size?
Right now I estimate the real size of a dataframe as follows:
headers_size = sum(len(key) for key in df.first().asDict())
ro...
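The estimate above can be sketched as plain Python over rows given as dictionaries, which is the shape `Row.asDict()` returns. This is a minimal sketch that assumes "size" means character count: column-name lengths plus the `str()` length of every value. That is a crude proxy, not bytes in JVM memory or on disk. The helper names `header_chars`, `row_chars` and `estimate_size` are hypothetical, and the row-level part is a swapped-in stand-in for the truncated line above, not a reconstruction of it.

```python
from typing import Any, Iterable, Mapping


def header_chars(row: Mapping[str, Any]) -> int:
    """Total length of the column names (the headers_size idea above)."""
    return sum(len(key) for key in row)


def row_chars(row: Mapping[str, Any]) -> int:
    """Total length of each value rendered with str().

    Crude proxy only: None counts as 'None' (4 characters), and numbers
    are measured by their decimal text, not their in-memory size.
    """
    return sum(len(str(value)) for value in row.values())


def estimate_size(rows: Iterable[Mapping[str, Any]]) -> int:
    """Header characters (taken from the first row) plus every row's characters."""
    materialized = list(rows)
    if not materialized:
        return 0
    return header_chars(materialized[0]) + sum(row_chars(r) for r in materialized)


sample = [{"name": "alice", "age": 1}, {"name": "bob", "age": 22}]
print(estimate_size(sample))  # 18
```

On a real DataFrame the same helpers could be applied as `header_chars(df.first().asDict())` and `df.rdd.map(lambda row: row_chars(row.asDict())).sum()`, so the per-row work runs on the executors and only one number comes back to the driver.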
I'm currently using the approach below, but I'm not sure it is the best way:
from pyspark import StorageLevel
df.persist(StorageLevel.MEMORY_ONLY)
df.count()  # an action is needed so the cache is actually materialized
In the Spark web UI, the Storage tab shows the size of the cached DataFrame in MB. After checking it, I unpersist the DataFrame to free the memory:
df.unpersist()
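Instead of reading the number off the Storage tab by hand, the same figure could be pulled from Spark's monitoring REST API, which exposes the cached RDDs at `/api/v1/applications/<app-id>/storage/rdd` with `memoryUsed` and `diskUsed` fields in bytes. This is a sketch under assumptions: the driver UI is enabled and reachable, the DataFrame is the only thing cached (the helper sums every entry), and megabytes are taken as 1024 × 1024 bytes. The function names `fetch_rdd_storage` and `cached_megabytes` are hypothetical.

```python
import json
from typing import Any, Dict, List
from urllib.request import urlopen


def fetch_rdd_storage(ui_url: str, app_id: str) -> List[Dict[str, Any]]:
    """Query the REST endpoint behind the Storage tab for all cached RDDs."""
    url = f"{ui_url.rstrip('/')}/api/v1/applications/{app_id}/storage/rdd"
    with urlopen(url) as resp:
        return json.load(resp)


def cached_megabytes(entries: List[Dict[str, Any]]) -> float:
    """Sum memoryUsed + diskUsed (bytes) over all entries, converted to MB (1024 * 1024)."""
    total_bytes = sum(e.get("memoryUsed", 0) + e.get("diskUsed", 0) for e in entries)
    return total_bytes / (1024 * 1024)
```

With a live session, this would be called after `df.count()` and before `df.unpersist()`, for example `cached_megabytes(fetch_rdd_storage(spark.sparkContext.uiWebUrl, spark.sparkContext.applicationId))`. Note that `uiWebUrl` is `None` when the UI is disabled.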