Amazon Elastic MapReduce - mass insert from S3 to DynamoDB is incredibly slow

旧时难觅i 2020-12-14 09:35

I need to perform an initial upload of roughly 130 million items (5+ GB total) into a single DynamoDB table. After I faced problems with uploading them using the API from my

1 Answer
  • 2020-12-14 10:12

    Here is the answer I finally got from AWS support recently. Hope that helps someone in a similar situation:

    EMR workers are currently implemented as single-threaded workers, where each worker writes items one by one (using Put, not BatchWrite). Therefore, each write consumes 1 write capacity unit (IOP).

    This means that you are establishing a lot of connections, which decreases performance to some degree. If BatchWrites were used, you could commit up to 25 rows in a single operation, which would be less costly performance-wise (but the same price, if I understand it right). This is something we are aware of and will probably implement in EMR in the future. We can't offer a timeline, though.
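    For anyone doing the load from their own client rather than through EMR, here is a rough sketch of what that batching looks like with boto3's batch_writer, which groups puts into BatchWriteItem calls of up to 25 items and resends unprocessed items for you. The table name and item shape are made up for illustration, not taken from the question:

        import boto3

        dynamodb = boto3.resource("dynamodb")
        table = dynamodb.Table("my-import-table")  # hypothetical table name

        def bulk_put(items):
            """Write an iterable of item dicts using batched PutRequests."""
            # batch_writer buffers items and flushes them as BatchWriteItem
            # requests (max 25 items each), retrying any unprocessed items.
            with table.batch_writer() as batch:
                for item in items:
                    batch.put_item(Item=item)

        # Dummy usage: 100 small items with a string hash key called "id".
        bulk_put({"id": str(i), "payload": "x" * 32} for i in range(100))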

    As stated before, the main problem here is that your table in DynamoDB is reaching the provisioned throughput, so try to increase it temporarily for the import, and then feel free to decrease it to whatever level you need.
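    A rough sketch of that temporary throughput bump with boto3, assuming a provisioned-capacity table (the table name and capacity numbers below are illustrative only; pick values that match your import rate and budget):

        import boto3

        client = boto3.client("dynamodb")
        TABLE = "my-import-table"  # hypothetical table name

        def set_write_capacity(wcu, rcu=5):
            # Raise or lower the table's provisioned throughput.
            client.update_table(
                TableName=TABLE,
                ProvisionedThroughput={"ReadCapacityUnits": rcu,
                                       "WriteCapacityUnits": wcu},
            )
            # Wait for the table to return to ACTIVE before continuing.
            client.get_waiter("table_exists").wait(TableName=TABLE)

        set_write_capacity(10000)   # raise write capacity for the bulk import
        # ... run the EMR import job here ...
        set_write_capacity(100)     # scale back down once the load finishes

    Keep in mind that DynamoDB limits how many times per day you can decrease a table's provisioned throughput, so plan the scale-down accordingly.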

    This may sound a bit too convenient, but there was a problem with the throttling alerts at the time you were doing this, which is why you never received an alert. That problem has since been fixed.
