Complete scan of DynamoDB with boto3

有刺的猬 2020-11-29 04:08

My table is around 220 MB with 250k records in it. I'm trying to pull all of this data into Python. I realize this needs to be a chunked batch process and looped throug

8 answers
  •  臣服心动
    2020-11-29 05:04

    Both approaches suggested above have drawbacks: either you write lengthy, repetitive code that handles paging explicitly in a loop, or you use Boto paginators on a low-level client and give up the advantages of the higher-level Boto objects.
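For reference, the explicit paging loop mentioned above might look like the following minimal sketch. `scan_all` and `FakeTable` are hypothetical names introduced here for illustration; `FakeTable` only imitates the `Items` / `LastEvaluatedKey` / `ExclusiveStartKey` shape of a DynamoDB scan so the loop can run without AWS access.

```python
def scan_all(table, **scan_kwargs):
    """Collect every item from a DynamoDB-style scan using an explicit loop."""
    items = []
    response = table.scan(**scan_kwargs)
    items.extend(response["Items"])
    # Keep requesting pages for as long as the service reports a continuation key.
    while "LastEvaluatedKey" in response:
        response = table.scan(
            ExclusiveStartKey=response["LastEvaluatedKey"], **scan_kwargs
        )
        items.extend(response["Items"])
    return items


class FakeTable:
    """Hypothetical stand-in for a Boto Table object; serves rows in small pages."""

    def __init__(self, rows, page_size=2):
        self.rows = rows
        self.page_size = page_size

    def scan(self, ExclusiveStartKey=None, **_ignored):
        start = 0 if ExclusiveStartKey is None else ExclusiveStartKey["index"]
        end = start + self.page_size
        response = {"Items": self.rows[start:end]}
        if end < len(self.rows):
            # More rows remain, so signal that another page is available.
            response["LastEvaluatedKey"] = {"index": end}
        return response


rows = [{"pk": n} for n in range(5)]
loop_items = scan_all(FakeTable(rows))
```

The loop works, but the same boilerplate has to be repeated (or wrapped) for every paged call, which is the repetition this answer sets out to avoid.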

    A pair of small generator functions gives a higher-level abstraction: the higher-level Boto methods can still be used, while the complexity of AWS paging stays hidden:

    import itertools
    import typing
    
    def iterate_result_pages(function_returning_response: typing.Callable, *args, **kwargs) -> typing.Generator:
        """A wrapper for functions using AWS paging, that returns a generator which yields a sequence of items for
        every response
    
        Args:
            function_returning_response: A function (or callable), that returns an AWS response with 'Items' and optionally 'LastEvaluatedKey'
            This could be a bound method of an object.
    
        Returns:
            A generator which yields the 'Items' field of the result for every response
        """
        response = function_returning_response(*args, **kwargs)
        yield response["Items"]
        while "LastEvaluatedKey" in response:
            kwargs["ExclusiveStartKey"] = response["LastEvaluatedKey"]
            response = function_returning_response(*args, **kwargs)
            yield response["Items"]
    
    def iterate_paged_results(function_returning_response: typing.Callable, *args, **kwargs) -> typing.Iterator:
        """A wrapper for functions using AWS paging, that returns an iterator of all the items in the responses.
        Items are yielded to the caller as soon as they are received.
    
        Args:
            function_returning_response: A function (or callable), that returns an AWS response with 'Items' and optionally 'LastEvaluatedKey'
            This could be a bound method of an object.
    
        Returns:
            An iterator which yields one response item at a time
        """
        return itertools.chain.from_iterable(iterate_result_pages(function_returning_response, *args, **kwargs))
    
    # Example, assuming 'table' is a Boto DynamoDB Table object.
    # The paged method (here table.scan) is passed as the first argument.
    all_items = list(iterate_paged_results(table.scan, ProjectionExpression='my_field'))
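To try the paging behaviour without AWS access, the sketch below restates the two helpers in condensed form and drives them with `fake_scan`, a hypothetical stand-in for `table.scan`. `fake_scan`, `FAKE_ROWS`, and the `fetch_page` parameter name are assumptions made for illustration; only the `Items`, `LastEvaluatedKey`, and `ExclusiveStartKey` keys mirror the real response shape.

```python
import itertools
import typing


def iterate_result_pages(fetch_page: typing.Callable[..., dict], *args, **kwargs):
    """Yield the 'Items' list of each page, following 'LastEvaluatedKey'."""
    response = fetch_page(*args, **kwargs)
    yield response["Items"]
    while "LastEvaluatedKey" in response:
        kwargs["ExclusiveStartKey"] = response["LastEvaluatedKey"]
        response = fetch_page(*args, **kwargs)
        yield response["Items"]


def iterate_paged_results(fetch_page: typing.Callable[..., dict], *args, **kwargs):
    """Flatten the pages into a single lazy iterator of items."""
    return itertools.chain.from_iterable(
        iterate_result_pages(fetch_page, *args, **kwargs)
    )


# Hypothetical data and scan function standing in for a real table.
FAKE_ROWS = [{"id": i} for i in range(5)]


def fake_scan(ExclusiveStartKey=None, Limit=2):
    """Return one page of FAKE_ROWS, shaped like a DynamoDB scan response."""
    start = 0 if ExclusiveStartKey is None else ExclusiveStartKey["id"] + 1
    page = FAKE_ROWS[start:start + Limit]
    response = {"Items": page}
    if start + Limit < len(FAKE_ROWS):
        # Rows remain after this page, so include a continuation key.
        response["LastEvaluatedKey"] = {"id": page[-1]["id"]}
    return response


pages = list(iterate_result_pages(fake_scan, Limit=2))
all_items = list(iterate_paged_results(fake_scan, Limit=2))
```

Because both helpers are lazy, a caller that iterates over `iterate_paged_results(...)` directly, rather than wrapping it in `list(...)`, only holds one page in memory at a time, which may matter for a table of the size described in the question.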
    
