I'm writing to see if anyone knows how to speed up S3 write times from Spark running in EMR.
My Spark Job takes over 4 hours to complete, however the cluster is onl
We experienced the same on Azure using Spark on WASB. We finally decided not to use the distributed storage directly with Spark. We did spark.write to a real hdfs:// destination and developed a specific tool that does: hadoop copyFromLocal hdfs:// wasb://. HDFS is then our temporary buffer before archiving to WASB (or S3).
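
For what it's worth, here is a rough sketch of that two-step flow in Python. It is not our actual tool: it assumes the job has already written its output to an HDFS staging directory, and it swaps in `hadoop distcp` for the copy step rather than the copyFromLocal call mentioned above. The paths, bucket name, and function names are all made up for illustration.

```python
import subprocess
from typing import List

# Hypothetical locations; replace with your own staging dir and bucket.
STAGING_DIR = "hdfs:///tmp/job-output"
ARCHIVE_DIR = "s3://example-bucket/job-output"


def build_copy_command(hdfs_src: str, remote_dst: str) -> List[str]:
    """Build a `hadoop distcp` invocation copying hdfs_src to remote_dst."""
    if not hdfs_src.startswith("hdfs://"):
        raise ValueError("staging path must be an hdfs:// URI")
    return ["hadoop", "distcp", hdfs_src, remote_dst]


def archive_staged_output(hdfs_src: str, remote_dst: str,
                          dry_run: bool = True) -> List[str]:
    """Copy staged output to object storage; with dry_run, only return the command."""
    cmd = build_copy_command(hdfs_src, remote_dst)
    if not dry_run:
        # Raises CalledProcessError if the copy fails.
        subprocess.run(cmd, check=True)
    return cmd


# Step 1 (inside the Spark job, shown as a comment because it needs a cluster):
#   df.write.parquet(STAGING_DIR)
# Step 2 (after the job finishes): build the copy command without running it.
print(" ".join(archive_staged_output(STAGING_DIR, ARCHIVE_DIR)))
```

Calling `archive_staged_output(STAGING_DIR, ARCHIVE_DIR, dry_run=False)` on a cluster node would actually run the copy. The idea is that the Spark job only ever writes to HDFS, and the slower upload to S3 or WASB happens as a separate bulk copy afterwards.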