Using Spark to write a Parquet file to S3 over s3a is very slow
I'm trying to write a Parquet file out to Amazon S3 using Spark 1.6.1. The Parquet output I'm generating is small, ~2 GB once written, so it's not that much data. I'm trying to prove out Spark as a platform I can use.

Basically, what I'm doing is setting up a star schema with DataFrames, then writing those tables out to Parquet. The data comes in from CSV files provided by a vendor, and I'm using Spark as an ETL platform. I currently have a 3-node cluster of r3.2xlarge instances in EC2.
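For reference, this is a minimal sketch of the kind of job I'm running. It assumes the spark-csv package (required to read CSV on Spark 1.6), and the bucket name and paths are placeholders for my actual data:

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

object CsvToParquetEtl {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("csv-to-parquet-etl")
    val sc = new SparkContext(conf)
    val sqlContext = new SQLContext(sc)

    // s3a credentials taken from the environment (adjust to your own setup)
    sc.hadoopConfiguration.set("fs.s3a.access.key", sys.env("AWS_ACCESS_KEY_ID"))
    sc.hadoopConfiguration.set("fs.s3a.secret.key", sys.env("AWS_SECRET_ACCESS_KEY"))

    // Read the vendor-provided CSV using the com.databricks:spark-csv package
    val facts = sqlContext.read
      .format("com.databricks.spark.csv")
      .option("header", "true")
      .option("inferSchema", "true")
      .load("s3a://my-bucket/input/facts.csv") // hypothetical input path

    // Write the star-schema table out as Parquet over s3a
    facts.write
      .mode("overwrite")
      .parquet("s3a://my-bucket/warehouse/facts.parquet") // hypothetical output path
  }
}
```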