Cannot query over table without a filter that can be used for partition elimination

前端 未结 1 931
悲哀的现实
悲哀的现实 2020-12-21 03:26

I have a partitioned table and would love to use a MERGE statement, but for some reason doesn\'t work out.

MERGE `wr_live.p_email_event`  t
using `wr_live.em         


        
相关标签:
1条回答
  • 2020-12-21 04:16

    What's the proper syntax?

    As you can see from error message - your partitioned wr_live.p_email_event table was created with require partition filter set to true. This mean that any query over this table must have some filter on respective partitioned field

    Assuming that timestamp IS that partitioned field - you can do something like below

    MERGE `wr_live.p_email_event`  t
    USING `wr_live.email_event` s
    ON t.user_id=s.user_id AND t.event=s.event AND t.timestamp=s.timestamp
    AND DATE(t.timestamp) > CURRENT_DATE()  -- this is the filter you should tune 
    WHEN NOT MATCHED THEN
    INSERT (user_id,event,engagement_score,dest_email_domain,timestamp,tags,meta)
    VALUES (user_id,event,engagement_score,dest_email_domain,timestamp,tags,meta)   
    

    So you need to make below line such that it in reality does not filter out whatever you need to be involved

    AND DATE(t.timestamp) <> CURRENT_DATE()  -- this is the filter you should tune 
    

    For example, I found, setting it to timestamp in future - in many cases addresses the issue, like

    AND DATE(t.timestamp) > DATE_ADD(CURRENT_DATE(), INTERVAL 1 DAY)   
    

    Of course, if your wr_live.email_event table also partitioned with require partition filter set to true - you need to add same filter for s.timestamp

    Also is there a way I can express shorter the insert stuff? without naming all columns?

    BigQuery DML's INSERT requires column names to be specified - there is no way (at least that I am aware of) to avoid it using INSERT statement
    Meantime, you can avoid this by using DDL's CREATE TABLE from the result of the query. This will not require listing the columns

    For example, something like below

    CREATE OR REPLACE TABLE `wr_live.p_email_event`
    PARTITION BY DATE(timestamp) AS
    SELECT * FROM `wr_live.p_email_event` 
    WHERE DATE(timestamp) <> DATE_ADD(CURRENT_DATE(), INTERVAL 1 DAY)
    UNION ALL 
    SELECT * FROM `wr_live.email_event` s
    WHERE NOT EXISTS (
      SELECT 1 FROM `wr_live.p_email_event` t
      WHERE t.user_id=s.user_id AND t.event=s.event AND t.timestamp=s.timestamp
      AND DATE(t.timestamp) > DATE_ADD(CURRENT_DATE(), INTERVAL 1 DAY)
    )
    

    You might also want to include table options list via OPTIONS() - but looks like filter attribute is not supported yet - so if you do have/need it - above will "erase" this attribute :o(

    0 讨论(0)
提交回复
热议问题