How can I get the total number of items in a DynamoDB table?

南旧 2020-12-05 23:33

I want to know how many items are in my DynamoDB table. From the API guide, one way to do it is using a scan as follows:
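
For illustration only, and not the asker's original snippet: a minimal Python sketch of a scan-based count, assuming boto3's low-level DynamoDB client. `count_items` is a hypothetical helper name. `Select="COUNT"` asks DynamoDB to return counts rather than items, and the loop follows `LastEvaluatedKey` because a single Scan call stops after reading 1 MB of data.

```python
def count_items(client, table_name):
    """Count items with a paginated Scan that returns counts only."""
    total = 0
    kwargs = {"TableName": table_name, "Select": "COUNT"}
    while True:
        page = client.scan(**kwargs)
        total += page["Count"]
        # LastEvaluatedKey is present only when more pages remain.
        last_key = page.get("LastEvaluatedKey")
        if last_key is None:
            return total
        kwargs["ExclusiveStartKey"] = last_key


# Usage (needs boto3 and AWS credentials; "MyTable" is a placeholder):
#   import boto3
#   print(count_items(boto3.client("dynamodb"), "MyTable"))
```

A full scan still consumes read capacity for every item it reads, even though no item data is returned.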



        
8 Answers
  •  庸人自扰
    2020-12-06 00:04

    Here's how I get the exact item count on my billion-record DynamoDB table:

    hive>

    set dynamodb.throughput.write.percent = 1;
    set dynamodb.throughput.read.percent = 1;
    set hive.execution.engine = mr;
    set mapreduce.reduce.speculative=false;
    set mapreduce.map.speculative=false;
    
    CREATE EXTERNAL TABLE dynamodb_table (
        `ID` STRING,
        `DateTime` STRING,
        `ReportedbyName` STRING,
        `ReportedbySurName` STRING,
        `Company` STRING,
        `Position` STRING,
        `Country` STRING,
        `MailDomain` STRING
    )
    STORED BY 'org.apache.hadoop.hive.dynamodb.DynamoDBStorageHandler'
    TBLPROPERTIES (
        "dynamodb.table.name" = "BillionData",
        "dynamodb.column.mapping" = "ID:ID,DateTime:DateTime,ReportedbyName:ReportedbyName,ReportedbySurName:ReportedbySurName,Company:Company,Position:Position,Country:Country,MailDomain:MailDomain"
    );
    
    SELECT count(*) FROM dynamodb_table;
    

    * You need an EMR cluster, which comes with Hive and the DynamoDB storage handler installed.
    * With this command, the DynamoDB handler in Hive issues parallel scans, with multiple MapReduce mappers (workers) working on different partitions to get the count. This is much faster and more efficient than a normal scan.
    * You must be willing to raise the read capacity very high for a certain period of time.
    * On a decent-sized (20-node) cluster with 10,000 RCU, it took approximately 15 minutes to count a billion records.
    * New writes to this DDB table during this period will make the count inconsistent.
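
The parallel scan idea described above is not specific to Hive. As a rough sketch, assuming boto3's low-level DynamoDB client instead of EMR, the `Segment` and `TotalSegments` parameters of the Scan API can be driven from a thread pool. `count_segment`, `parallel_count` and the default of 8 segments are hypothetical choices for illustration, not what the DynamoDB storage handler does internally.

```python
from concurrent.futures import ThreadPoolExecutor


def count_segment(client, table_name, segment, total_segments):
    """Count the items in one segment of a parallel Scan."""
    total = 0
    kwargs = {
        "TableName": table_name,
        "Select": "COUNT",  # return counts only, no item data
        "Segment": segment,
        "TotalSegments": total_segments,
    }
    while True:
        page = client.scan(**kwargs)
        total += page["Count"]
        # LastEvaluatedKey is present only when this segment has more pages.
        last_key = page.get("LastEvaluatedKey")
        if last_key is None:
            return total
        kwargs["ExclusiveStartKey"] = last_key


def parallel_count(client, table_name, total_segments=8):
    """Scan every segment in its own thread and add up the counts."""
    with ThreadPoolExecutor(max_workers=total_segments) as pool:
        futures = [
            pool.submit(count_segment, client, table_name, s, total_segments)
            for s in range(total_segments)
        ]
        return sum(f.result() for f in futures)


# Usage (needs boto3 and AWS credentials):
#   import boto3
#   print(parallel_count(boto3.client("dynamodb"), "BillionData"))
```

Each worker pages through its own segment independently, so the only shared state is the final sum. The caveats above still apply: the scan consumes read capacity, and writes that land while it runs can make the count inexact.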
