How bigquery's time_partitioning_expiration parameter work?

亡梦爱人 提交于 2019-12-11 02:08:51

问题


I've created a table with partition type day, and I have set time_partitioning_expiration to 1209600 seconds (14 days) from bq command line tool. I have verified if the settings are correct by running bq show on table and I can see

"timePartitioning": {
    "expirationMs": "1209600000", 
    "type": "DAY"
  }, 
  "type": "TABLE"

However there seems to be data in the partitions that I expected to have been deleted.

SELECT 
  count(*) as c,
  _partitiontime as pDate
FROM [poc.reporting]
group by pDate
; 

1   373800  2016-07-17 00:00:00 UTC  
2   640800  2016-07-18 00:00:00 UTC  
3   373800  2016-07-16 00:00:00 UTC  

Is my understanding that setting time_partition_expiration will remove/delete partitions internally correct? If so why hasn't partitions been deleted in my case?


回答1:


Expiration of partitions is not based on the partition's date but it is based on the time at which storage was exactly added to the table. So, for example, for a table with partition expiration of 14 days, if a partition 20160717's data was restated on 20160730, the data for that partition would expire 14days from 20160730. You can verify the write time of the oldest data in the partition using the following query:

SELECT MSEC_TO_TIMESTAMP(creation_time) from [project:dataset.table$__PARTITIONS_SUMMARY__]



回答2:


It looks like the behavior described in the top answer from Pavan changed on Dec 13, 2016, and expiration time is now based on the partition date, and NOT the last modified date.

From the most recent docs:

For example, if the partition expiration is set to 60 days and data is written to partition 20170101 on January 30, 2017, then the data expires on January 1, 2017 + 60 days.

Reference: BQ Partitioned Tables



来源:https://stackoverflow.com/questions/38717242/how-bigquerys-time-partitioning-expiration-parameter-work

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!