Question
The package boto3 - Amazon's official AWS API wrapper for Python - has great support for uploading items to DynamoDB in bulk. It looks like this:
import boto3

db = boto3.resource("dynamodb", region_name="my_region").Table("my_table")

with db.batch_writer() as batch:
    for item in my_items:
        batch.put_item(Item=item)
Here my_items is a list of Python dictionaries, each of which must include the table's primary key(s). The situation isn't perfect - for instance, there is no safety mechanism to prevent you from exceeding your throughput limits - but it's still pretty good.
However, there does not appear to be any counterpart for reading from the database. The closest I can find is DynamoDB.Client.batch_get_item(), but there the API is extremely complicated. Here's what requesting two items looks like:
db_client = boto3.client("dynamodb", "my_region")

db_client.batch_get_item(
    RequestItems={
        "my_table": {
            "Keys": [
                {"my_primary_key": {"S": "my_key1"}},
                {"my_primary_key": {"S": "my_key2"}}
            ]
        }
    }
)
This might be tolerable, but the response has the same problem: all values are dictionaries whose keys are data types ("S" for string, "N" for number, "M" for mapping, etc.), and it is more than a little annoying to have to parse everything. So my questions are:
Is there any native boto3 support for batch reading from DynamoDB, similar to the batch_writer function above?
Failing that,
Does boto3 provide any built-in way to automatically deserialize the responses to the DynamoDB.Client.batch_get_item() function?
I'll also add that the function boto3.resource("dynamodb").Table().get_item() has what I would consider to be the "correct" API, in that no type-parsing is necessary for inputs or outputs. So it seems that this is some sort of oversight by the developers, and I suppose I'm looking for a workaround.
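For comparison, a minimal sketch of that "correct" API (reusing the same placeholder table and key names from above):

table = boto3.resource("dynamodb", region_name="my_region").Table("my_table")
response = table.get_item(Key={"my_primary_key": "my_key1"})
item = response.get("Item")  # a plain Python dict - no {"S": ...} wrappers to unpack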
Answer 1:
So thankfully there is something that you might find useful - much like the json module, which has json.dumps and json.loads, boto3 has a types module that includes a serializer and a deserializer. See TypeSerializer/TypeDeserializer. If you look at the source code, the serialization/deserialization is recursive and should be perfect for your use case.
Note: it's recommended that you use Binary/Decimal instead of just using a regular old Python float/int for round-trip conversions.
from boto3.dynamodb.types import TypeSerializer, TypeDeserializer

serializer = TypeSerializer()
serializer.serialize('awesome')  # returns {'S': 'awesome'}

deser = TypeDeserializer()
deser.deserialize({'S': 'awesome'})  # returns u'awesome'
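To apply this to a whole batch_get_item response, one approach (a sketch; the table name, key, and db_client are carried over from the question) is to deserialize each attribute of each returned item:

from boto3.dynamodb.types import TypeDeserializer

deserializer = TypeDeserializer()

def deserialize_item(item):
    # item is a typed DynamoDB dict, e.g. {"my_primary_key": {"S": "my_key1"}}
    return {name: deserializer.deserialize(value) for name, value in item.items()}

response = db_client.batch_get_item(
    RequestItems={"my_table": {"Keys": [{"my_primary_key": {"S": "my_key1"}}]}}
)
plain_items = [deserialize_item(item) for item in response["Responses"]["my_table"]]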
Hopefully this helps!
Answer 2:
There's the service resource level batch_get_item, which accepts and returns plain Python values. Maybe you could do something like this:
def batch_query_wrapper(table, key, values):
    # `dynamo` is assumed to be the service resource, e.g.
    # dynamo = boto3.resource("dynamodb", region_name="my_region")
    results = []

    response = dynamo.batch_get_item(
        RequestItems={table: {'Keys': [{key: val} for val in values]}}
    )
    results.extend(response['Responses'][table])

    while response['UnprocessedKeys']:
        # Implement some kind of exponential back-off here (see the sketch below);
        # note that only the unprocessed keys are retried, not the whole request
        response = dynamo.batch_get_item(RequestItems=response['UnprocessedKeys'])
        results.extend(response['Responses'][table])

    return results
It will return your results as plain Python objects.
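The back-off comment above could be filled in along these lines - a minimal sketch using time.sleep, where the helper name, retry cap, and base delay are my own arbitrary choices, not part of boto3:

import time

def batch_get_with_backoff(dynamo, table, key, values, max_retries=5):
    # dynamo is a boto3 service resource: boto3.resource("dynamodb", ...)
    request = {table: {'Keys': [{key: val} for val in values]}}
    results = []
    retries = 0
    while request:
        response = dynamo.batch_get_item(RequestItems=request)
        results.extend(response['Responses'][table])
        request = response['UnprocessedKeys']  # empty dict once everything succeeded
        if request:
            if retries >= max_retries:
                raise RuntimeError("still have unprocessed keys after retries")
            time.sleep(2 ** retries)  # exponential back-off: 1s, 2s, 4s, ...
            retries += 1
    return results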
Answer 3:
I find this to be an effective way to convert a Boto 3 DynamoDB item to a Python dict.
https://github.com/Alonreznik/dynamodb-json
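Based on that project's README, usage looks roughly like this (a sketch - check the repo for the current API):

from dynamodb_json import json_util as json

dynamodb_item = {"my_primary_key": {"S": "my_key1"}, "count": {"N": "3"}}
plain = json.loads(dynamodb_item)  # typed DynamoDB JSON -> plain Python dict
typed = json.dumps(plain)          # plain dict -> DynamoDB-typed JSON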
Source: https://stackoverflow.com/questions/37872542/is-there-a-python-api-for-submitting-batch-get-requests-to-aws-dynamodb