Reject data load attempt to BigQuery for existing data


I'm loading data from pandas DataFrames into BigQuery using the pandas-gbq package:

df.to_gbq('dataset.table', project_id


                      
1 Answer

甜味超标
2021-01-17 02:53

  Is there a way to reject the loading attempt if the key already appears in the BigQuery table?


No. BigQuery does not enforce key constraints the way other databases do.
There are two typical ways to work around this:

Option 1:

Upload the data with a timestamp and use a MERGE statement to remove the duplicates.

See this link on how to do this. Here is an example:

MERGE `DATA` AS target
USING `DATA` AS source
ON target.key = source.key
WHEN MATCHED AND target.ts < source.ts THEN 
DELETE


Note: In this case you pay for the data the MERGE scans, but each key stays unique in the table.
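To make the intent of the MERGE concrete, here is a minimal plain-Python sketch of the same rule: a row is dropped when another row with the same key has a later ts. It only models the intended outcome on an in-memory list, not how BigQuery executes or bills the statement. The function name `delete_superseded` and the sample rows are assumptions made for illustration.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]


def delete_superseded(rows: List[Row]) -> List[Row]:
    """Drop every row for which a row with the same key and a later ts exists."""
    newest: Dict[Any, str] = {}
    # First pass: find the latest timestamp seen for each key.
    for row in rows:
        key, ts = row["key"], row["ts"]
        if key not in newest or ts > newest[key]:
            newest[key] = ts
    # Second pass: keep only rows that carry that latest timestamp.
    # Rows that tie on the latest ts are all kept, as in target.ts < source.ts.
    return [row for row in rows if row["ts"] == newest[row["key"]]]


# Hypothetical sample rows; timestamps in this format sort correctly as strings.
rows = [
    {"key": "sd3e", "value": 0.3, "ts": "2019-04-14 00:00:00"},
    {"key": "sd3e", "value": 0.2, "ts": "2019-04-14 01:00:00"},
    {"key": "sd4r", "value": 0.1, "ts": "2019-04-14 00:00:00"},
    {"key": "sd4r", "value": 0.5, "ts": "2019-04-14 01:00:00"},
]
kept = delete_superseded(rows)
```

Because the comparison is strict, two rows that share a key and the exact same timestamp both survive, so this approach relies on timestamps being distinct per key.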

Option 2: 

Upload the data with a timestamp and use the ROW_NUMBER window function to fetch the latest record for each key. Here is an example with your data:

WITH DATA AS (
    SELECT 'sd3e' AS key, 0.3 as value,  1 as r_order, '2019-04-14 00:00:00' as ts  UNION ALL
    SELECT 'sd3e' AS key, 0.2 as value,  2 as r_order, '2019-04-14 01:00:00' as ts  UNION ALL
    SELECT 'sd4r' AS key, 0.1 as value,  1 as r_order, '2019-04-14 00:00:00' as ts  UNION ALL
    SELECT 'sd4r' AS key, 0.5 as value,  2 as r_order, '2019-04-14 01:00:00' as ts  
)

SELECT * 
FROM (
    SELECT * ,ROW_NUMBER() OVER(PARTITION BY key order by ts DESC) rn 
    FROM `DATA` 
)
WHERE rn = 1


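This returns only the most recent row for each key: for the sample data, the 01:00:00 row of sd3e (value 0.2) and the 01:00:00 row of sd4r (value 0.5).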


Note: This approach does not incur the extra cost of a MERGE; however, every query that reads from the table must apply the window function.
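Since every read has to go through the window function, one way to avoid forgetting it is to build the query in a single helper. The sketch below is only an illustration: the helper name `latest_rows_query` and its default column names are assumptions, and it only builds the SQL string without contacting BigQuery.

```python
def latest_rows_query(table: str, key_col: str = "key", ts_col: str = "ts") -> str:
    """Build a query that returns only the newest row for each key.

    The identifiers are interpolated directly into the SQL text, so they
    must come from trusted code, never from user input.
    """
    return (
        "SELECT *\n"
        "FROM (\n"
        f"    SELECT *, ROW_NUMBER() OVER (PARTITION BY {key_col} ORDER BY {ts_col} DESC) AS rn\n"
        f"    FROM `{table}`\n"
        ")\n"
        "WHERE rn = 1"
    )


# 'dataset.table' is the placeholder table name used in the question.
query = latest_rows_query("dataset.table")
print(query)
```

The resulting string could then be passed to pandas_gbq.read_gbq together with your project_id, so that application code never reads the raw table directly.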
                                                                        
                                                        
            
            
              
                