Amazon S3's file size limit is supposed to be 5 TB according to this announcement, but I am getting the following error when uploading a 5 GB file:
\'/mahler%2Fparq
The 5 TB limit only applies to multipart uploads; a single PUT operation is capped at 5 GB, so the trick is usually figuring out how to tell S3 to do a multipart upload. For copying data from HDFS to S3, this can be done by using the s3n filesystem and explicitly enabling multipart uploads with fs.s3n.multipart.uploads.enabled=true.
This can be done on the command line like so:
hdfs dfs -Dfs.s3n.awsAccessKeyId=ACCESS_KEY -Dfs.s3n.awsSecretAccessKey=SUPER_SECRET_KEY -Dfs.s3n.multipart.uploads.enabled=true -cp hdfs:///path/to/source/data s3n://bucket/folder/
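If you would rather not pass the properties on every invocation, the same settings can also live in core-site.xml. This is just a sketch using the property names from the command above; substitute your own credentials and adjust for your cluster:

<!-- core-site.xml: enable multipart uploads for the s3n filesystem -->
<property>
  <name>fs.s3n.awsAccessKeyId</name>
  <value>ACCESS_KEY</value>
</property>
<property>
  <name>fs.s3n.awsSecretAccessKey</name>
  <value>SUPER_SECRET_KEY</value>
</property>
<property>
  <name>fs.s3n.multipart.uploads.enabled</name>
  <value>true</value>
</property>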
Further configuration options are documented here: https://hadoop.apache.org/docs/stable/hadoop-aws/tools/hadoop-aws/index.html