We have a use case with hundreds of millions of entries in a table and a problem splitting it up further. 99% of operations are append-only. However, we have oc
There is a relatively simple option that we have found efficient in similar scenarios with BigQuery.
It lets you query any time-based snapshot, as well as the current snapshot.
In short, the idea is to have one master table plus daily history tables.
During the day, the current daily table receives all changes (new, update, delete). A daily process then merges the last completed daily table into the master table, writing the result back to the same master table. Of course, a backup is taken first by copying the latest master table (a free operation).
This daily update keeps the master table clean and fresh as of the last day.
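The daily consolidation step could be sketched in BigQuery Standard SQL roughly as below. All table and column names (`my_dataset.master`, day-sharded `daily_YYYYMMDD` tables, an `id` key, an `op` column marking the change type, and a `ts` timestamp) are illustrative assumptions, not part of the original setup:

```sql
-- Merge yesterday's completed daily table into the master table.
-- Assumes rows keyed on `id`, with `op` in ('new', 'update', 'delete').
MERGE my_dataset.master AS m
USING my_dataset.daily_20240101 AS d
ON m.id = d.id
WHEN MATCHED AND d.op = 'delete' THEN
  DELETE
WHEN MATCHED THEN
  UPDATE SET value = d.value, updated_at = d.ts
WHEN NOT MATCHED AND d.op != 'delete' THEN
  INSERT (id, value, updated_at) VALUES (d.id, d.value, d.ts)
```

The preceding `CREATE TABLE ... COPY` (or a `bq cp`) of the master table gives you the free backup before the merge runs.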
Now at any given moment you can get the most recent data by querying only the (junk-free) master table plus today's table.
At the same time, since you keep all the daily tables, you can query any historical data.
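The current-snapshot query could then be a union of the master table and today's deltas, resolved with a window function. Again, all names here are hypothetical placeholders:

```sql
-- Most recent state: master (as of last night) plus today's changes,
-- keeping only the latest version of each id and dropping deletions.
SELECT * EXCEPT(rn)
FROM (
  SELECT *, ROW_NUMBER() OVER (PARTITION BY id ORDER BY ts DESC) AS rn
  FROM (
    SELECT id, value, updated_at AS ts, 'new' AS op FROM my_dataset.master
    UNION ALL
    SELECT id, value, ts, op FROM my_dataset.daily_20240102
  )
)
WHERE rn = 1 AND op != 'delete'
```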
Of course, the classic option of appending all changes (new, update, delete) into the master table with respective qualifiers still looks good both price- and performance-wise, because 99% of your data consists of new entries!
In your case, I would personally vote for the classic approach, with periodic cleaning of historical entries.
Finally, to my mind, this is less about joining and more about a union, using a table wildcard and window functions.
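For the classic append-only variant, the "latest version wins" resolution can be done at query time over all day shards with a wildcard table. This is a sketch under the same assumed schema (`id`, `value`, `ts`, `op`), using the Standard SQL `_TABLE_SUFFIX` pseudo-column:

```sql
-- Latest version of each row across all daily shards,
-- as of an arbitrary historical date chosen via the suffix filter.
SELECT * EXCEPT(rn)
FROM (
  SELECT *, ROW_NUMBER() OVER (PARTITION BY id ORDER BY ts DESC) AS rn
  FROM `my_dataset.daily_*`
  WHERE _TABLE_SUFFIX <= '20240102'  -- any time-based snapshot
)
WHERE rn = 1 AND op != 'delete'
```

Dropping the `_TABLE_SUFFIX` filter gives the current snapshot; tightening it gives any point-in-time view, which is what makes this layout handle both query styles from the same tables.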