Question
I'm using Python boto3 to upload data to AWS.
I have a dedicated connection to AWS of 350 Mbps.
I have a large JSON file and I would like to know whether it is better to upload this data directly to DynamoDB, or to upload it to S3 first and then use Data Pipeline to load it into DynamoDB.
My data is already clean and doesn't need to be processed. I just need to get this information into DynamoDB in the most efficient and reliable way.
My script will run on a server with the following specifications: 512 GB RAM, 48 CPU cores.
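For the direct path, what I have in mind is roughly the sketch below, using the Table resource's batch_writer (the table name is a placeholder, and I'm assuming the records have already been parsed into plain Python dicts):

import boto3

dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table("my_table")  # placeholder table name

def load_items(items):
    # batch_writer groups put_item calls into 25-item BatchWriteItem requests
    # and retries unprocessed items automatically
    with table.batch_writer() as batch:
        for item in items:
            batch.put_item(Item=item)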
Here is some sample data:
Sample 1:
{
  "updated": {
    "n": "20181226"
  },
  "periodo": {
    "n": "20180823"
  },
  "tipos": {
    "m": {
      "Disponible": {
        "m": {
          "total": {
            "n": "200"
          },
          "Saldos de Cuentas de Ahorro": {
            "n": "300"
          }
        }
      }
    }
  },
  "mediana_disponible": {
    "n": "588"
  },
  "mediana_ingreso": {
    "n": "658"
  },
  "mediana_egreso": {
    "n": "200"
  },
  "documento": {
    "s": "2-2"
  }
}
This sample shows a single record; there are about 68 million records in total, and the file size is 70 GB.
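The records are already in this DynamoDB-style attribute/value layout, so I'm assuming the lowercase "n"/"s"/"m" keys can be treated as Number/String/Map type descriptors and flattened to plain Python values before writing, along these lines:

from decimal import Decimal

def from_attr(value):
    # value looks like {"n": "200"}, {"s": "2-2"} or {"m": {...}}
    (type_key, inner), = value.items()
    t = type_key.lower()
    if t == "n":
        return Decimal(inner)  # DynamoDB numbers map to Decimal in boto3
    if t == "s":
        return inner
    if t == "m":
        return {k: from_attr(v) for k, v in inner.items()}
    raise ValueError("unhandled type descriptor: " + type_key)

def to_plain_item(record):
    return {k: from_attr(v) for k, v in record.items()}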
Sample 2:
{
  "updated": {
    "n": "20190121"
  },
  "zonas": {
    "s": "123"
  },
  "tipo_doc": {
    "n": "3123"
  },
  "cods_sai": {
    "s": "3,234234,234234"
  },
  "cods_cb": {
    "s": "234234,5435,45"
  },
  "cods_atm": {
    "s": "54,45,345;345,5345,435"
  },
  "num_doc": {
    "n": "345"
  },
  "cods_mf": {
    "s": "NNN"
  },
  "cods_pac": {
    "s": "NNN"
  }
}
This sample shows a single record; there are about 7 million records in total, and the file size is 10 GB.
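For the S3-first alternative, uploading the raw file would presumably look something like the following (bucket name and key are placeholders); the open question is whether the extra Data Pipeline step is worth it compared to writing to DynamoDB directly:

import boto3
from boto3.s3.transfer import TransferConfig

s3 = boto3.client("s3")
config = TransferConfig(
    multipart_chunksize=64 * 1024 * 1024,  # 64 MB parts
    max_concurrency=16,                    # parallel part uploads over the 350 Mbps link
)
s3.upload_file("data.json", "my-bucket", "data.json", Config=config)  # placeholder bucket/key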
Thanks in advance
Source: https://stackoverflow.com/questions/54410828/best-way-to-upload-data-with-boto3-to-dynamodb