How can I get the total number of items in a DynamoDB table?

南旧 2020-12-05 23:33
I want to know how many items are in my DynamoDB table. From the API guide, one way to do it is using a scan as follows:
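
The snippet that originally followed is missing here. As a stand-in (not the asker's code), a minimal Python sketch of a count-only scan might look like the following. It assumes a boto3 low-level DynamoDB client, and `count_items` is a hypothetical helper name. A single `Scan` call reads at most 1 MB of data, so the loop follows `LastEvaluatedKey` until the table is exhausted.

```python
def count_items(client, table_name):
    """Count items by paging through Scan with Select='COUNT'.

    `client` is assumed to be a boto3 DynamoDB client, for example
    boto3.client("dynamodb"); any object with a compatible scan() works.
    """
    total = 0
    # Select="COUNT" returns only the number of matching items, not the items.
    kwargs = {"TableName": table_name, "Select": "COUNT"}
    while True:
        page = client.scan(**kwargs)
        total += page["Count"]
        last_key = page.get("LastEvaluatedKey")
        if not last_key:
            # No LastEvaluatedKey means the scan reached the end of the table.
            return total
        # Resume the scan where the previous page stopped.
        kwargs["ExclusiveStartKey"] = last_key
```

With boto3 installed and credentials configured, this could be called as `count_items(boto3.client("dynamodb"), "BillionData")`. A count-only scan still reads every item, so it consumes read capacity in proportion to the table size.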

      
      
        
8 answers

庸人自扰 (original poster) 2020-12-06 00:04

Here's how I get the exact item count on my billion-record DynamoDB table:

hive>

-- Let the job use 100% of the table's provisioned throughput.
set dynamodb.throughput.write.percent=1;
set dynamodb.throughput.read.percent=1;
-- Run on MapReduce and disable speculative execution, so that duplicate
-- task attempts do not consume extra read capacity.
set hive.execution.engine=mr;
set mapreduce.reduce.speculative=false;
set mapreduce.map.speculative=false;

-- Map the DynamoDB table "BillionData" onto an external Hive table.
CREATE EXTERNAL TABLE dynamodb_table (
  `ID` STRING, `DateTime` STRING, `ReportedbyName` STRING, `ReportedbySurName` STRING,
  `Company` STRING, `Position` STRING, `Country` STRING, `MailDomain` STRING
)
STORED BY 'org.apache.hadoop.hive.dynamodb.DynamoDBStorageHandler'
TBLPROPERTIES (
  "dynamodb.table.name" = "BillionData",
  "dynamodb.column.mapping" = "ID:ID,DateTime:DateTime,ReportedbyName:ReportedbyName,ReportedbySurName:ReportedbySurName,Company:Company,Position:Position,Country:Country,MailDomain:MailDomain"
);

-- Count every item; the storage handler scans DynamoDB to feed the mappers.
SELECT count(*) FROM dynamodb_table;


* You need an EMR cluster, which comes with Hive and the DynamoDB storage handler installed.
* With this command, the DynamoDB handler in Hive issues parallel scans, with multiple MapReduce mappers (workers) each scanning a different part of the table to get the count. This is much faster and more efficient than a normal sequential scan.
* You must be willing to raise the table's read capacity very high for a period of time.
* On a decent-sized (20-node) cluster with 10,000 RCU, it took approximately 15 minutes to count a billion records.
* New writes to the DynamoDB table during this period will make the count inconsistent.
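
The parallel-scan idea in the notes above can also be tried without EMR. The sketch below is not what the Hive handler does internally; it only illustrates the `Segment` and `TotalSegments` parameters of the DynamoDB `Scan` API, with one thread per segment. It assumes a boto3 low-level DynamoDB client shared across threads, and `count_segment` and `parallel_count` are hypothetical helper names.

```python
from concurrent.futures import ThreadPoolExecutor


def count_segment(client, table_name, segment, total_segments):
    """Count the items in one segment of a parallel scan."""
    total = 0
    kwargs = {
        "TableName": table_name,
        "Select": "COUNT",            # return counts only, not the items
        "Segment": segment,           # which slice this worker scans
        "TotalSegments": total_segments,
    }
    while True:
        page = client.scan(**kwargs)
        total += page["Count"]
        last_key = page.get("LastEvaluatedKey")
        if not last_key:
            return total
        # Continue this segment from where the previous page stopped.
        kwargs["ExclusiveStartKey"] = last_key


def parallel_count(client, table_name, total_segments=8):
    """Sum the per-segment counts, scanning all segments concurrently."""
    with ThreadPoolExecutor(max_workers=total_segments) as pool:
        futures = [
            pool.submit(count_segment, client, table_name, s, total_segments)
            for s in range(total_segments)
        ]
        return sum(f.result() for f in futures)
```

The number of segments is a tuning choice: more segments finish sooner only if the table has enough read capacity to serve them, and, as with the Hive approach, items written while the scan runs may or may not be counted.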
    
             
                                                        
            
            
              
                