Apache Spark does not delete temporary directories

庸人自扰 2020-11-27 15:48

After a Spark program completes, three temporary directories remain in the temp directory. The directory names look like this: spark-2e389487-40cc-4a82-a5c7-353c0feefbb

6 Answers
  •  失恋的感觉
    2020-11-27 16:33

    I don't know how to make Spark clean up those temporary directories, but I was able to prevent the creation of the snappy-XXX files. This can be done in two ways:

    1. Disable compression. Properties: spark.broadcast.compress, spark.shuffle.compress, spark.shuffle.spill.compress. See http://spark.apache.org/docs/1.3.1/configuration.html#compression-and-serialization
    2. Use LZF as the compression codec. Spark uses native libraries for Snappy and LZ4, and because of the way JNI works, Spark has to unpack these libraries to a temporary location before loading them. LZF appears to be implemented in pure Java, so no native library is extracted.
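
    The two options above can be passed as `--conf` flags at submit time; this is a sketch (the jar name `your_app.jar` is a placeholder, and the property names match the Spark 1.3.1 docs linked above):

    ```shell
    # Option 1: disable the compression settings that pull in native codecs.
    spark-submit \
      --conf spark.broadcast.compress=false \
      --conf spark.shuffle.compress=false \
      --conf spark.shuffle.spill.compress=false \
      your_app.jar

    # Option 2: keep compression, but switch to the pure-Java LZF codec,
    # so no native library needs to be unpacked.
    spark-submit \
      --conf spark.io.compression.codec=lzf \
      your_app.jar
    ```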

    I'm doing this during development, but for production it is probably better to use compression and have a script to clean up the temp directories.
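
    As a sketch of such a cleanup script (my own suggestion, not part of the original answer): Spark writes its scratch directories under /tmp unless spark.local.dir or SPARK_LOCAL_DIRS points elsewhere, and the one-day age cutoff is an assumption meant to avoid deleting directories of still-running jobs:

    ```shell
    #!/bin/sh
    # Hypothetical cleanup helper: remove leftover spark-* scratch
    # directories older than one day from a given scratch location.
    cleanup_spark_tmp() {
      scratch_dir="${1:-/tmp}"
      # -mtime +1 keeps directories touched within the last 24 hours,
      # so a still-running job's scratch space is left alone.
      find "$scratch_dir" -maxdepth 1 -type d -name 'spark-*' -mtime +1 \
           -exec rm -rf {} +
    }
    ```

    Run it from cron (for example, nightly) against whatever directory your spark.local.dir setting uses.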
