Delete data from BigQuery while streaming from Dataflow

Submitted by China☆狼群 on 2019-12-11 07:44:14

Question


Is it possible to delete data from a BigQuery table while loading data into it from an Apache Beam pipeline?

Our use case requires deleting data older than 3 days from the table, based on a timestamp field (the time when Dataflow pulls the message from the Pub/Sub topic).

Is it recommended to do something like this? If so, is there any way to achieve it?

Thank you.


Answer 1:


I think the best way of doing this is to set up your table as a partitioned table (based on ingestion time): https://cloud.google.com/bigquery/docs/partitioned-tables. You can then drop old partitions manually:

bq rm 'mydataset.mytable$20160301'
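If you prefer to do this from code rather than the bq CLI, here is a minimal sketch using the google-cloud-bigquery Python client that drops the ingestion-time partition from three days ago. The table name mydataset.mytable and the 3-day window are assumptions based on the question, not something fixed by the answer.

from datetime import datetime, timedelta, timezone

from google.cloud import bigquery

client = bigquery.Client()

# Partition suffix (YYYYMMDD) for the day three days before today, in UTC.
cutoff = (datetime.now(timezone.utc) - timedelta(days=3)).strftime("%Y%m%d")

# Deleting "table$YYYYMMDD" removes only that ingestion-time partition;
# streaming inserts into newer partitions are unaffected.
client.delete_table(f"mydataset.mytable${cutoff}", not_found_ok=True)

A small scheduled job (Cloud Scheduler, cron, etc.) running this once a day would keep the table trimmed to the last three days.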

You can also set a partition expiration time:

bq update --time_partitioning_expiration [INTEGER] [PROJECT_ID]:[DATASET].[TABLE]
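The same expiration can be set programmatically; a sketch with the Python client, assuming a 3-day expiration on an ingestion-time partitioned table called mydataset.mytable:

from google.cloud import bigquery

client = bigquery.Client()

table = client.get_table("mydataset.mytable")

# Expire each partition 3 days (expressed in milliseconds) after its partition date.
table.time_partitioning = bigquery.TimePartitioning(
    type_=bigquery.TimePartitioningType.DAY,
    expiration_ms=3 * 24 * 60 * 60 * 1000,
)
client.update_table(table, ["time_partitioning"])

With expiration in place, BigQuery deletes old partitions on its own and no manual cleanup job is needed.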

If ingestion-time partitioning does not work for you, you can look into partitioning on a timestamp column: https://cloud.google.com/bigquery/docs/creating-column-partitions. It is still in beta, but it works reliably; it is your call. See the sketch below for how such a table could be created.
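If you go the column-partitioning route, the table can be created partitioned on the timestamp field the pipeline writes, with the same 3-day expiration. A sketch, assuming a hypothetical field named event_timestamp and a hypothetical table myproject.mydataset.mytable_by_ts:

from google.cloud import bigquery

client = bigquery.Client()

schema = [
    bigquery.SchemaField("event_timestamp", "TIMESTAMP"),
    bigquery.SchemaField("payload", "STRING"),
]

table = bigquery.Table("myproject.mydataset.mytable_by_ts", schema=schema)
# Partition on the timestamp column instead of ingestion time.
table.time_partitioning = bigquery.TimePartitioning(
    type_=bigquery.TimePartitioningType.DAY,
    field="event_timestamp",
    expiration_ms=3 * 24 * 60 * 60 * 1000,
)
client.create_table(table)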



Source: https://stackoverflow.com/questions/50217935/delete-data-from-bigquery-while-streaming-from-dataflow
