Question
The package boto3 - Amazon's official AWS API wrapper for Python - has great support for uploading items to DynamoDB in bulk. It looks like this:
import boto3

db = boto3.resource("dynamodb", region_name="my_region").Table("my_table")

with db.batch_writer() as batch:
    for item in my_items:
        batch.put_item(Item=item)
Here my_items is a list of Python dictionaries, each of which must include the table's primary key(s). The situation isn't perfect - for instance, there is no safety mechanism to prevent you from exceeding your throughput limits - but it's still pretty good.
However, there does not appear to be any counterpart for reading from the database. The closest I can find is DynamoDB.Client.batch_get_item(), but there the API is extremely complicated. Here's what requesting two items looks like:
db_client = boto3.client("dynamodb", "my_region")

db_client.batch_get_item(
    RequestItems={
        "my_table": {
            "Keys": [
                {"my_primary_key": {"S": "my_key1"}},
                {"my_primary_key": {"S": "my_key2"}}
            ]
        }
    }
)
This might be tolerable, but the response has the same problem: all values are dictionaries whose keys are data types ("S" for string, "N" for number, "M" for mapping, etc.), and it is more than a little annoying to have to parse everything. So my questions are:
Is there any native boto3 support for batch reading from DynamoDB, similar to the batch_writer function above?
Failing that,
Does boto3 provide any built-in way to automatically deserialize the responses to the DynamoDB.Client.batch_get_item() function?
I'll also add that the function boto3.resource("dynamodb").Table().get_item() has what I would consider to be the "correct" API, in that no type-parsing is necessary for inputs or outputs. So it seems that this is some sort of oversight by the developers, and I suppose I'm looking for a workaround.
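For comparison, a minimal sketch of that "correct" API (reusing the same placeholder table and key names from above):

table = boto3.resource("dynamodb", region_name="my_region").Table("my_table")
response = table.get_item(Key={"my_primary_key": "my_key1"})
item = response.get("Item")  # a plain Python dict - no {"S": ...} wrappers to unpack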
Answer 1:
So thankfully there is something that you might find useful - much like the json module, which has json.dumps and json.loads, boto3 has a types module that includes a serializer and a deserializer. See TypeSerializer/TypeDeserializer. If you look at the source code, the serialization/deserialization is recursive and should be perfect for your use case.
Note: it's recommended that you use Binary/Decimal instead of just using a regular old Python float/int for round-trip conversions.
from boto3.dynamodb.types import TypeSerializer, TypeDeserializer

serializer = TypeSerializer()
serializer.serialize('awesome')  # returns {'S': 'awesome'}

deser = TypeDeserializer()
deser.deserialize({'S': 'awesome'})  # returns u'awesome'
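To apply this to a whole batch_get_item response, one approach (a sketch; the table name, key, and db_client are carried over from the question) is to deserialize each attribute of each returned item:

from boto3.dynamodb.types import TypeDeserializer

deserializer = TypeDeserializer()

def deserialize_item(item):
    # item is a typed DynamoDB dict, e.g. {"my_primary_key": {"S": "my_key1"}}
    return {name: deserializer.deserialize(value) for name, value in item.items()}

response = db_client.batch_get_item(
    RequestItems={"my_table": {"Keys": [{"my_primary_key": {"S": "my_key1"}}]}}
)
plain_items = [deserialize_item(item) for item in response["Responses"]["my_table"]]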
Hopefully this helps!
Answer 2:
There's the service resource level batch_get_item, which accepts and returns plain Python values. Maybe you could do something like this:
def batch_query_wrapper(table, key, values):
    # `dynamo` is assumed to be the service resource, e.g.
    # dynamo = boto3.resource("dynamodb", region_name="my_region")
    results = []

    response = dynamo.batch_get_item(
        RequestItems={table: {'Keys': [{key: val} for val in values]}}
    )
    results.extend(response['Responses'][table])

    while response['UnprocessedKeys']:
        # Implement some kind of exponential back-off here (see the sketch below);
        # note that only the unprocessed keys are retried, not the whole request
        response = dynamo.batch_get_item(RequestItems=response['UnprocessedKeys'])
        results.extend(response['Responses'][table])

    return results
It will return your results as plain Python objects.
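The back-off comment above could be filled in along these lines - a minimal sketch using time.sleep, where the helper name, retry cap, and base delay are my own arbitrary choices, not part of boto3:

import time

def batch_get_with_backoff(dynamo, table, key, values, max_retries=5):
    # dynamo is a boto3 service resource: boto3.resource("dynamodb", ...)
    request = {table: {'Keys': [{key: val} for val in values]}}
    results = []
    retries = 0
    while request:
        response = dynamo.batch_get_item(RequestItems=request)
        results.extend(response['Responses'][table])
        request = response['UnprocessedKeys']  # empty dict once everything succeeded
        if request:
            if retries >= max_retries:
                raise RuntimeError("still have unprocessed keys after retries")
            time.sleep(2 ** retries)  # exponential back-off: 1s, 2s, 4s, ...
            retries += 1
    return results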
Answer 3:
I find this to be an effective way to convert a Boto 3 DynamoDB item to a Python dict.
https://github.com/Alonreznik/dynamodb-json
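Based on that project's README, usage looks roughly like this (a sketch - check the repo for the current API):

from dynamodb_json import json_util as json

dynamodb_item = {"my_primary_key": {"S": "my_key1"}, "count": {"N": "3"}}
plain = json.loads(dynamodb_item)  # typed DynamoDB JSON -> plain Python dict
typed = json.dumps(plain)          # plain dict -> DynamoDB-typed JSON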
Source: https://stackoverflow.com/questions/37872542/is-there-a-python-api-for-submitting-batch-get-requests-to-aws-dynamodb