Retrieving many rows using a TableBatchOperation is not supported?

被撕碎了的回忆 2021-01-11 11:55

Here is a piece of code that initializes a TableBatchOperation designed to retrieve two rows in a single batch:

    TableBatchOperation batch = new TableBatchOperation();
    batch.Add(TableOperation.Retrieve("somePartition", "rowKey1"));  // placeholder partition/row keys
    batch.Add(TableOperation.Retrieve("somePartition", "rowKey2"));

7 Answers
  •  情深已故 2021-01-11 12:04

    I know that this is an old question, but as Azure STILL does not support secondary indexes, it seems it will be relevant for some time.

    I was hitting the same type of problem. In my scenario, I needed to look up hundreds of items within the same partition, where there are millions of rows (imagine a GUID as the row key). I tested a couple of options for looking up 10,000 rows:

    1. (PK && RK)
    2. (PK && RK1) || (PK & RK2) || ...
    3. PK && (RK1 || RK2 || ... )

    I was using the Async API, with a maximum of 10 degrees of parallelism (at most 10 outstanding requests). I also tested a couple of different batch sizes (10 rows, 50, 100).
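
    As a rough sketch, option 3 can be expressed with the classic Microsoft.WindowsAzure.Storage.Table SDK (the same library TableBatchOperation comes from) along the following lines. The table reference, key names and batch size are placeholders, not the exact code from my tests, and the 10-way throttling is left out:

        using System.Collections.Generic;
        using System.Linq;
        using System.Threading.Tasks;
        using Microsoft.WindowsAzure.Storage.Table;

        static class BatchedLookup
        {
            // Queries PK && (RK1 || RK2 || ...) in chunks of `batchSize` row keys.
            public static async Task<List<DynamicTableEntity>> QueryRowKeysAsync(
                CloudTable table, string partitionKey, IEnumerable<string> rowKeys, int batchSize = 100)
            {
                var results = new List<DynamicTableEntity>();
                string pkFilter = TableQuery.GenerateFilterCondition(
                    "PartitionKey", QueryComparisons.Equal, partitionKey);

                // One query per chunk of row keys; parallelism can be layered on top.
                foreach (var chunk in rowKeys.Select((rk, i) => new { rk, i })
                                             .GroupBy(x => x.i / batchSize, x => x.rk))
                {
                    // Build RK1 || RK2 || ... for this chunk.
                    string rkFilter = chunk
                        .Select(rk => TableQuery.GenerateFilterCondition(
                            "RowKey", QueryComparisons.Equal, rk))
                        .Aggregate((a, b) => TableQuery.CombineFilters(a, TableOperators.Or, b));

                    var query = new TableQuery<DynamicTableEntity>()
                        .Where(TableQuery.CombineFilters(pkFilter, TableOperators.And, rkFilter));

                    // Drain all result pages for this chunk.
                    TableContinuationToken token = null;
                    do
                    {
                        var segment = await table.ExecuteQuerySegmentedAsync(query, token);
                        results.AddRange(segment.Results);
                        token = segment.ContinuationToken;
                    } while (token != null);
                }
                return results;
            }
        }

    To get the "max 10 outstanding requests" behaviour, the per-chunk queries can be started as tasks and throttled with something like a SemaphoreSlim; that part is omitted above.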

    Test                        Batch Size  API calls   Elapsed (sec)
    (PK && RK)                  1           10000       95.76
    (PK && RK1) || (PK && RK2)  10          1000        25.94
    (PK && RK1) || (PK && RK2)  50          200         18.35
    (PK && RK1) || (PK && RK2)  100         100         17.38
    PK && (RK1 || RK2 || … )    10          1000        24.55
    PK && (RK1 || RK2 || … )    50          200         14.90
    PK && (RK1 || RK2 || … )    100         100         13.43
    

    NB: These are all within the same partition - just multiple rowkeys.

    I would have been happy to just reduce the number of API calls. But as an added benefit, the elapsed time is also significantly less, saving on compute costs (at least on my end!).

    Not too surprisingly, the batches of 100 rows delivered the best elapsed performance. There are obviously other performance considerations, especially network usage (#1 hardly uses the network at all, for example, whereas the others push it much harder).

    EDIT: Be careful when querying for many row keys. There is (of course) a URL length limitation on the query. If you exceed the length, the query will still succeed, because the service cannot tell that the URL was truncated. In our case, we limited the combined query length to about 2500 characters (URL encoded!).
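
    One rough way to stay under that limit is to chunk the row keys so that each chunk's encoded filter stays below a budget. The 2500-character budget below is just the figure from our case, and estimating the encoded length with Uri.EscapeDataString is an assumption rather than documented service behaviour:

        using System;
        using System.Collections.Generic;
        using Microsoft.WindowsAzure.Storage.Table;

        static class FilterChunker
        {
            // Splits row keys into chunks whose URL-encoded "RowKey eq '...' or" filter
            // text stays under maxEncodedChars, leaving headroom for the rest of the URL.
            public static IEnumerable<List<string>> ChunkByEncodedLength(
                IEnumerable<string> rowKeys, int maxEncodedChars = 2500)
            {
                var current = new List<string>();
                int length = 0;
                foreach (var rk in rowKeys)
                {
                    // Approximate encoded cost of this key's clause in the query string.
                    int cost = Uri.EscapeDataString(
                        TableQuery.GenerateFilterCondition("RowKey", QueryComparisons.Equal, rk) + " or ").Length;
                    if (current.Count > 0 && length + cost > maxEncodedChars)
                    {
                        yield return current;
                        current = new List<string>();
                        length = 0;
                    }
                    current.Add(rk);
                    length += cost;
                }
                if (current.Count > 0)
                    yield return current;
            }
        }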
