Is it possible to export data from a DynamoDB table in some format?
The concrete use case is that I want to export data from my production DynamoDB database and import it into a local DynamoDB instance.
I extended Valy dia's solution so that the whole export/import process needs only the aws CLI and jq.
Export a few items in batch-write format (drop --max-items to take more, but note that batch-write-item accepts at most 25 put requests per call; the paginated scripts further down handle larger tables):
aws dynamodb scan --max-items 3 --table-name <TABLE_NAME> \
| jq '{"<TABLE_NAME>": [.Items[] | {PutRequest: {Item: .}}]}' > data.json
Export the table definition, keeping only the fields that create-table needs:
aws dynamodb describe-table --table-name <TABLE_NAME> \
| jq '.Table | {"TableName": .TableName, "KeySchema": .KeySchema, "AttributeDefinitions": .AttributeDefinitions, "ProvisionedThroughput": {
"ReadCapacityUnits": 5,
"WriteCapacityUnits": 5
}}' > table-definition.json
Recreate the table on the local DynamoDB instance:
aws dynamodb create-table --cli-input-json file://table-definition.json --endpoint-url http://localhost:8000 --region us-east-1
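If you point this at a real AWS endpoint instead of DynamoDB Local, the table may take a moment to become active; the CLI can wait for it before you start writing:
aws dynamodb wait table-exists --table-name <TABLE_NAME> --endpoint-url http://localhost:8000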
Import the data:
aws dynamodb batch-write-item --request-items file://data.json --endpoint-url http://localhost:8000
Verify the import:
aws dynamodb scan --table-name <TABLE_NAME> --endpoint-url http://localhost:8000
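For a quicker sanity check on larger tables, you can compare item counts on both sides instead of eyeballing the full scan:
aws dynamodb scan --table-name <TABLE_NAME> --select COUNT
aws dynamodb scan --table-name <TABLE_NAME> --select COUNT --endpoint-url http://localhost:8000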
If you need it, you can also convert DynamoDB-formatted data into plain JSON with https://2json.net/dynamo
In a similar use case, I used DynamoDB Streams to trigger an AWS Lambda function that wrote to my DW instance. You could write your Lambda to replay each table change into a table in your non-production account. That way your dev table would stay quite close to prod as well.
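The wiring looks roughly like this with the aws CLI (replicate-to-dev is a placeholder for your Lambda function; its body, which rewrites the stream records into the target account, is up to you):
# Enable a stream of item-level changes on the source table
aws dynamodb update-table --table-name <TABLE_NAME> \
--stream-specification StreamEnabled=true,StreamViewType=NEW_AND_OLD_IMAGES
# Subscribe the Lambda to that stream (the stream ARN is in the update-table output)
aws lambda create-event-source-mapping \
--function-name replicate-to-dev \
--event-source-arn <STREAM_ARN> \
--starting-position LATEST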
There is a tool named DynamoDBtoCSV that can be used to export all the data to a CSV file. For the other way around, however, you will have to build your own tool. My suggestion is that you add this functionality to the tool and contribute it to the Git repository.
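If your attributes are simple and you just want a quick CSV without an extra tool, a scan piped through jq can also produce one; a minimal sketch, assuming two string attributes named id and name:
aws dynamodb scan --table-name <TABLE_NAME> \
| jq -r '.Items[] | [.id.S, .name.S] | @csv' > export.csv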
Another way is to use AWS Data Pipeline for this task (you will save the cost of reading the data from outside the AWS infrastructure). The approach is similar: define a pipeline that scans the table and writes the result to S3, all inside AWS.
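A rough sketch of driving it from the CLI; the definition.json would be built from the console's "Export DynamoDB table to S3" template and is not shown here:
aws datapipeline create-pipeline --name export-dynamodb --unique-id export-dynamodb-1
aws datapipeline put-pipeline-definition --pipeline-id <PIPELINE_ID> \
--pipeline-definition file://definition.json
aws datapipeline activate-pipeline --pipeline-id <PIPELINE_ID>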
This will export all items as JSON documents:
aws dynamodb scan --table-name TABLE_NAME > export.json
This script reads the full table from the remote DynamoDB and imports it into your local instance:
TABLE=YOURTABLE
maxItems=25   # batch-write-item accepts at most 25 put requests per call

# First page of the scan
DATA=$(aws dynamodb scan --table-name "$TABLE" --max-items $maxItems)
echo "$DATA" | jq ".Items | {\"$TABLE\": [{\"PutRequest\": { \"Item\": .[]}}]}" > inserts.jsons
aws dynamodb batch-write-item --request-items file://inserts.jsons --endpoint-url http://localhost:8000
nextToken=$(echo "$DATA" | jq -r '.NextToken')

# Follow the pagination token until the scan is exhausted (jq prints "null" then)
while [[ "${nextToken}" != "null" ]]
do
  DATA=$(aws dynamodb scan --table-name "$TABLE" --max-items $maxItems --starting-token "$nextToken")
  echo "$DATA" | jq ".Items | {\"$TABLE\": [{\"PutRequest\": { \"Item\": .[]}}]}" > inserts.jsons
  aws dynamodb batch-write-item --request-items file://inserts.jsons --endpoint-url http://localhost:8000
  nextToken=$(echo "$DATA" | jq -r '.NextToken')
done
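One caveat: batch-write-item does not fail when it cannot write everything; it returns the leftovers in UnprocessedItems. While testing you can make that visible:
aws dynamodb batch-write-item --request-items file://inserts.jsons \
--endpoint-url http://localhost:8000 | jq '.UnprocessedItems | length'
# anything other than 0 means some put requests need to be retried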
Here is a version of the script that keeps the exported data on disk as files:
TABLE=YOURTABLE
maxItems=25
index=0

# First page of the scan
DATA=$(aws dynamodb scan --table-name "$TABLE" --max-items $maxItems)
((index+=1))
echo "$DATA" > "$TABLE-$index.json"
nextToken=$(echo "$DATA" | jq -r '.NextToken')

# Keep scanning until the pagination token runs out (jq prints "null" then)
while [[ "${nextToken}" != "null" ]]
do
  DATA=$(aws dynamodb scan --table-name "$TABLE" --max-items $maxItems --starting-token "$nextToken")
  ((index+=1))
  echo "$DATA" > "$TABLE-$index.json"
  nextToken=$(echo "$DATA" | jq -r '.NextToken')
done

# Convert each saved page to batch-write format and load it into the local instance
for x in "$TABLE"-*.json; do
  jq ".Items | {\"$TABLE\": [{\"PutRequest\": { \"Item\": .[]}}]}" "$x" > inserts.jsons
  aws dynamodb batch-write-item --request-items file://inserts.jsons --endpoint-url http://localhost:8000
done
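Because the pages stay on disk, you can also replay them later without rescanning, even into a table with a different name; a small sketch, where YOURTABLE-copy is a hypothetical target table that must already exist locally:
SRC=YOURTABLE
DEST=YOURTABLE-copy   # hypothetical destination table
for x in "$SRC"-*.json; do
  # rewrap each saved page under the destination table's name
  jq ".Items | {\"$DEST\": [{\"PutRequest\": { \"Item\": .[]}}]}" "$x" > inserts.jsons
  aws dynamodb batch-write-item --request-items file://inserts.jsons --endpoint-url http://localhost:8000
done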