Every 2 hours, a Spark job runs to convert some tgz files to Parquet. The job appends the new data to an existing Parquet dataset in S3:
df.write.mode(\"appe
I resolved this issue by writing the DataFrame to HDFS on the EMR cluster and then using s3-dist-cp to upload the Parquet files to S3.
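A minimal sketch of that workaround, assuming the driver runs on an EMR node where the s3-dist-cp command is available; the staging and destination paths are hypothetical:

import subprocess

# 1) Write the output to HDFS on the cluster first (hypothetical staging path)
df.write.mode("append").parquet("hdfs:///staging/parquet/")

# 2) Copy the finished Parquet files from HDFS to S3 with s3-dist-cp,
#    EMR's distributed copy tool, via its --src and --dest options
subprocess.run(
    ["s3-dist-cp",
     "--src", "hdfs:///staging/parquet/",
     "--dest", "s3://<bucket>/<path>/"],
    check=True,
)

s3-dist-cp runs as a distributed job on the cluster, so the copy to S3 is parallelized and only happens once the Parquet files are fully written to HDFS.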