Export data from DynamoDB

离开以前 2020-12-13 04:02

Is it possible to export data from a DynamoDB table in some format?

The concrete use case is that I want to export data from my production DynamoDB database and import it into a local DynamoDB instance for development.

17 Answers
  • 2020-12-13 04:34

    Expanding on @Ivailo Bardarov's answer, I wrote the following script to duplicate tables from a remote DynamoDB to a local one:

    #!/bin/bash
    declare -a arr=("table1" "table2" "table3" "table4")
    maxItems=25

    for TABLE in "${arr[@]}"
    do
        echo "Getting table description of $TABLE from remote database..."
        aws dynamodb describe-table --table-name "$TABLE" > table-description.json

        echo
        echo "Creating table $TABLE in the local database..."
        ATTRIBUTE_DEFINITIONS=$(jq .Table.AttributeDefinitions table-description.json)
        KEY_SCHEMA=$(jq .Table.KeySchema table-description.json)
        BILLING_MODE=$(jq .Table.BillingModeSummary.BillingMode table-description.json)
        READ_CAPACITY_UNITS=$(jq .Table.ProvisionedThroughput.ReadCapacityUnits table-description.json)
        WRITE_CAPACITY_UNITS=$(jq .Table.ProvisionedThroughput.WriteCapacityUnits table-description.json)
        TABLE_DEFINITION=""

        # Use provisioned throughput when the source table defines it; otherwise
        # carry over the source table's billing mode (e.g. "PAY_PER_REQUEST").
        # Note: -gt compares numerically; > inside [[ ]] compares as strings.
        if [[ "$READ_CAPACITY_UNITS" -gt 0 && "$WRITE_CAPACITY_UNITS" -gt 0 ]]
        then
            TABLE_DEFINITION="{\"AttributeDefinitions\":$ATTRIBUTE_DEFINITIONS,\"TableName\":\"$TABLE\",\"KeySchema\":$KEY_SCHEMA,\"ProvisionedThroughput\":{\"ReadCapacityUnits\":$READ_CAPACITY_UNITS,\"WriteCapacityUnits\":$WRITE_CAPACITY_UNITS}}"
        else
            TABLE_DEFINITION="{\"AttributeDefinitions\":$ATTRIBUTE_DEFINITIONS,\"TableName\":\"$TABLE\",\"KeySchema\":$KEY_SCHEMA,\"BillingMode\":$BILLING_MODE}"
        fi

        echo "$TABLE_DEFINITION" > create-table.json
        aws dynamodb create-table --cli-input-json file://create-table.json --endpoint-url http://localhost:8000

        echo "Querying table $TABLE from remote..."
        DATA=$(aws dynamodb scan --table-name "$TABLE" --max-items $maxItems)
        echo "Saving remote table [$TABLE] contents to inserts.json file..."
        echo "$DATA" | jq ".Items | {\"$TABLE\": [{\"PutRequest\": { \"Item\": .[]}}]}" > inserts.json
        echo "Inserting rows to $TABLE in local database..."
        aws dynamodb batch-write-item --request-items file://inserts.json --endpoint-url http://localhost:8000

        # jq -r strips the surrounding quotes so the token can be passed back to the CLI
        nextToken=$(echo "$DATA" | jq -r '.NextToken')
        while [[ "$nextToken" != "" && "$nextToken" != "null" ]]
        do
            echo "Querying table $TABLE from remote..."
            DATA=$(aws dynamodb scan --table-name "$TABLE" --max-items $maxItems --starting-token "$nextToken")
            echo "Saving remote table [$TABLE] contents to inserts.json file..."
            echo "$DATA" | jq ".Items | {\"$TABLE\": [{\"PutRequest\": { \"Item\": .[]}}]}" > inserts.json
            echo "Inserting rows to $TABLE in local database..."
            aws dynamodb batch-write-item --request-items file://inserts.json --endpoint-url http://localhost:8000
            nextToken=$(echo "$DATA" | jq -r '.NextToken')
        done
    done

    echo "Deleting temporary files..."
    rm -f table-description.json create-table.json inserts.json

    echo "Database sync complete!"
    

    This script loops over the array of table names and, for each table, first fetches the table description, builds a create-table JSON file with the minimum required parameters, and creates the table. It then uses the rest of @Ivailo Bardarov's logic to generate the inserts and push them into the newly created table. Finally, it cleans up the generated JSON files.

    Keep in mind, my purpose was just to create a rough duplicate of these tables (hence the minimum required parameters) for development purposes.
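    The script assumes DynamoDB Local is listening on http://localhost:8000. If you don't have it running yet, one common way to start it is the official Docker image (a sketch; adjust the port mapping to taste):

    docker run -d -p 8000:8000 amazon/dynamodb-local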

  • 2020-12-13 04:36

    Export it from the DynamoDB console to S3.

    Then convert it to JSON using sed:

    sed -e 's/$/}/' -e $'s/\x02/,"/g' -e $'s/\x03/":/g' -e 's/^/{"/' <exported_table> > <exported_table>.json
    

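    To make the transformation concrete, here is a synthetic one-line example (the \x02 and \x03 separator bytes are an assumption about the legacy export format the sed above targets):

    printf 'id\x03"1"\x02name\x03"Alice"\n' \
    | sed -e 's/$/}/' -e $'s/\x02/,"/g' -e $'s/\x03/":/g' -e 's/^/{"/'
    # prints: {"id":"1","name":"Alice"}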

  • 2020-12-13 04:36

    Here is a way to export some data from a table using the aws cli and jq (oftentimes we just want to get a sample of our prod data locally). Let's assume we have a prod table called, unsurprisingly, my-prod-table and a local table called my-local-table.

    To export the data, run the following:

    aws dynamodb scan --table-name my-prod-table \
    | jq '{"my-local-table": [.Items[] | {PutRequest: {Item: .}}]}' > data.json
    

    Basically what happens is that we scan our prod table, transform the output of the scan into the shape batch-write-item expects, and dump the result into a file.

    To import the data into your local table, run:

    aws dynamodb batch-write-item \
    --request-items file://data.json \
    --endpoint-url http://localhost:8000
    

    Note: there are some restrictions on the batch-write-item request: a single BatchWriteItem operation can contain up to 25 PutItem or DeleteItem requests and can write up to 16 MB of data (the maximum size of an individual item is 400 KB). If your scan returns more than 25 items, you will need to split them into batches, as sketched below.
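    A minimal sketch of that splitting with jq (scan.json and batch.json are placeholder file names, and the chunk helper is defined inline rather than coming from jq's standard library):

    aws dynamodb scan --table-name my-prod-table > scan.json
    jq -c 'def chunk(n): range(0; length; n) as $i | .[$i:$i+n];
           .Items | chunk(25) | {"my-local-table": [.[] | {PutRequest: {Item: .}}]}' scan.json \
    | while read -r batch; do
        # write each 25-item batch to a file and push it to the local table
        echo "$batch" > batch.json
        aws dynamodb batch-write-item --request-items file://batch.json --endpoint-url http://localhost:8000
      done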

  • 2020-12-13 04:36

    DynamoDB now has a native Export to S3 feature (in DynamoDB JSON and Amazon Ion formats): https://aws.amazon.com/blogs/aws/new-export-amazon-dynamodb-table-data-to-data-lake-amazon-s3/
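    For reference, a minimal CLI sketch of that feature (the export requires point-in-time recovery to be enabled on the table; the table ARN and bucket name below are placeholders):

    # enable PITR, which the export feature depends on
    aws dynamodb update-continuous-backups \
        --table-name my-prod-table \
        --point-in-time-recovery-specification PointInTimeRecoveryEnabled=true

    # start an asynchronous export to S3 in DynamoDB JSON format
    aws dynamodb export-table-to-point-in-time \
        --table-arn arn:aws:dynamodb:us-east-1:123456789012:table/my-prod-table \
        --s3-bucket my-export-bucket \
        --export-format DYNAMODB_JSON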

  • 2020-12-13 04:38

    Try my simple Node.js script dynamo-archive. It exports and imports in JSON format.

  • 2020-12-13 04:38

    I found that the best current tool for simple import/export (including round-tripping through DynamoDB Local) is this Python script:

    https://github.com/bchew/dynamodump

    This script supports schema export/import as well as data import/export. It also uses the batch APIs for efficient operations.

    I have used it successfully to move data from a DynamoDB table into DynamoDB Local for development purposes, and it worked well for my needs.
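    As a rough usage sketch (the flags below follow the project's README as I recall it; check the repository for the current interface):

    # back up schema and data for one table from us-west-1
    python dynamodump.py -m backup -r us-west-1 -s testTable

    # restore it into DynamoDB Local
    python dynamodump.py -m restore -r local -s testTable --host localhost --port 8000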
