Extremely slow S3 write times from EMR/ Spark


梦如初夏 2020-12-23 12:20

I'm writing to see if anyone knows how to speed up S3 write times from Spark running on EMR.

My Spark job takes over 4 hours to complete; however, the cluster is onl

6 Answers

感情败类 (original poster)

2020-12-23 12:38

We experienced the same thing on Azure using Spark on WASB. We finally decided not to use the distributed storage directly with Spark. Instead, we use spark.write against a real hdfs:// destination and developed a dedicated tool that runs hadoop copyFromLocal hdfs:// wasb:// to move the data. HDFS then serves as our temporary buffer before archiving to WASB (or S3).
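The two-step pattern described above (write to HDFS first, then copy to the object store) could be sketched roughly as follows. This is not the tool from the answer, which isn't shown; it is a minimal illustration that assumes `hadoop distcp` is on the PATH and swaps it in for the copy step. The function names, the `runner` parameter, and the example URIs are all hypothetical.

```python
import subprocess
from typing import Callable, List


def build_copy_command(staging_uri: str, final_uri: str) -> List[str]:
    """Build a `hadoop distcp` command that copies staged output to its final home.

    Assumption: distcp stands in for the answer's own copy tool.
    """
    for uri in (staging_uri, final_uri):
        if "://" not in uri:
            raise ValueError(f"expected a full URI with a scheme, got {uri!r}")
    return ["hadoop", "distcp", staging_uri, final_uri]


def copy_staged_output(
    staging_uri: str,
    final_uri: str,
    runner: Callable[..., object] = subprocess.run,
) -> List[str]:
    """Run the copy step; `runner` is injectable so the step can be dry-run."""
    command = build_copy_command(staging_uri, final_uri)
    # check=True makes subprocess.run raise if the copy exits with a non-zero status.
    runner(command, check=True)
    return command


# Step 1 (inside the Spark job, shown only as a comment; `df` is hypothetical):
#     df.write.parquet("hdfs:///tmp/job-output")
# Step 2 (after the job finishes; bucket name is hypothetical):
#     copy_staged_output("hdfs:///tmp/job-output", "s3://example-bucket/job-output")
```

Keeping the command construction separate from its execution makes it possible to check the copy step without a cluster. On EMR specifically, `s3-dist-cp` might be a better fit for the copy, but that is a judgment call outside what the answer states.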
