Extremely slow S3 write times from EMR/Spark

梦如初夏 2020-12-23 12:20

I'm writing to see if anyone knows how to speed up S3 write times from Spark running in EMR.

My Spark Job takes over 4 hours to complete, however the cluster is onl

6 Answers
  •  感情败类
    2020-12-23 12:38

    We experienced the same thing on Azure using Spark on WASB. We finally decided not to use the distributed storage directly with Spark. Instead, we did spark.write to a real hdfs:// destination and developed a specific tool that runs: hadoop copyFromLocal hdfs:// wasb://. HDFS is then our temporary buffer before archiving to WASB (or S3).
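
    A rough sketch of that staging pattern in PySpark, in case it helps. This is not our actual tool: the function names, the Parquet format, and the example paths are assumptions for illustration, and it uses the standard `hadoop distcp` command for the HDFS-to-object-store copy instead of the copyFromLocal call mentioned above. `df` is assumed to be an existing Spark DataFrame.

```python
import subprocess


def build_copy_command(staging_path, final_path):
    """Return the argv for copying staged output with Hadoop DistCp."""
    return ["hadoop", "distcp", staging_path, final_path]


def write_then_copy(df, staging_path, final_path, run=subprocess.run):
    """Write a DataFrame to HDFS first, then copy the result to object storage.

    staging_path: an hdfs:// location used as a temporary buffer (assumed).
    final_path:   the s3:// or wasb:// archive location (assumed).
    run:          injectable command runner, defaults to subprocess.run.
    """
    # Step 1: Spark writes to HDFS, so the job's output commit happens on
    # HDFS and not on the object store.
    df.write.mode("overwrite").parquet(staging_path)

    # Step 2: one bulk copy from the HDFS buffer to the object store.
    cmd = build_copy_command(staging_path, final_path)
    run(cmd, check=True)  # raises CalledProcessError if the copy fails
    return cmd
```

    Usage would look something like `write_then_copy(df, "hdfs:///tmp/staging/out", "s3://example-bucket/out")`, where both paths are placeholders. On EMR, `s3-dist-cp` is another option for the copy step.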
