Delete/update table entries by joining 2 tables on Google BigQuery without import/export

盖世英雄少女心 2021-01-14 14:19

We have a use case where a table holds hundreds of millions of entries and we have a problem splitting it up further. 99% of operations are append-only. However, we have oc

2 Answers
  •  没有蜡笔的小新
    2021-01-14 14:38

    There is a relatively simple option that we found efficient in similar scenarios with BigQuery.
    It lets you query any time-based snapshot as well as the current snapshot.

    In short, the idea is to have one master table and daily history tables.
    During the day, the current daily table receives all changes (new, update, delete). A daily process then merges the last completed daily table with the master table and writes the result back to the same master table. Of course, a backup is taken first by copying the latest master table (a free operation).
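The daily merge described above can be sketched in plain Python to make the logic concrete. This is only an illustration under assumptions, not BigQuery code: the row shape (`id`, `ts`, `op`, `value`) and the function name `merge_daily_into_master` are hypothetical, and the `deepcopy` merely stands in for the table copy taken as a backup.

```python
from copy import deepcopy

# Hypothetical row shape (an assumption, not from the original answer):
#   {"id": ..., "ts": ..., "op": "new" | "update" | "delete", "value": ...}


def merge_daily_into_master(master, daily):
    """Apply one completed daily table to the master table.

    master: dict mapping id -> row (clean: one row per id, no deletes)
    daily:  list of change rows recorded during one day
    Returns (backup, new_master).
    """
    backup = deepcopy(master)  # stands in for the backup copy taken first
    new_master = dict(master)
    # Apply changes in timestamp order so the latest change per id wins.
    for row in sorted(daily, key=lambda r: r["ts"]):
        if row["op"] == "delete":
            new_master.pop(row["id"], None)
        else:  # "new" or "update"
            new_master[row["id"]] = {
                "id": row["id"],
                "ts": row["ts"],
                "value": row["value"],
            }
    return backup, new_master


master = {
    1: {"id": 1, "ts": 10, "value": "a"},
    2: {"id": 2, "ts": 11, "value": "b"},
}
daily = [
    {"id": 3, "ts": 20, "op": "new", "value": "c"},
    {"id": 1, "ts": 21, "op": "update", "value": "a2"},
    {"id": 2, "ts": 22, "op": "delete", "value": None},
]
backup, master_after = merge_daily_into_master(master, daily)
print(sorted(master_after))  # ids remaining after the merge
```

After the merge the master holds one clean row per surviving id, while the backup still reflects the state before the merge.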

    The daily master table update keeps the master table clean and fresh as of the last completed day.
    At any given moment you can get the most recent data by querying only the (junk-free) master table plus today's table.
    At the same time, because you keep all the daily tables, you can query any historical data.

    Of course, the classic option of adding all data (new, update, delete) into the master table with the respective qualifiers still looks good, both price-wise and performance-wise, because your main (99%) data are new entries!

    In your case, I would personally vote for the classic approach with periodic cleaning of historical entries.

    Finally, in my mind, this is less about joining and more about a union, using a table wildcard and window functions.
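To show what "union plus window function" amounts to, here is a minimal Python sketch of the read side. It emulates, in memory, what one would typically express in SQL as a union over the tables followed by `ROW_NUMBER() OVER (PARTITION BY id ORDER BY ts DESC)` filtered to the first row. The function name `current_rows`, the `as_of` parameter, and the row fields are assumptions made for illustration.

```python
def current_rows(*tables, as_of=None):
    """Latest non-deleted row per id across the union of the given tables.

    tables: lists of rows shaped like
            {"id": ..., "ts": ..., "op": "new" | "update" | "delete", "value": ...}
    as_of:  optional timestamp; changes after it are ignored (snapshot query)
    """
    latest = {}
    for table in tables:  # union of all tables
        for row in table:
            if as_of is not None and row["ts"] > as_of:
                continue  # ignore changes made after the snapshot time
            seen = latest.get(row["id"])
            if seen is None or row["ts"] > seen["ts"]:
                latest[row["id"]] = row  # keep the newest row per id
    # Drop ids whose newest row is a delete marker.
    return sorted(
        (r for r in latest.values() if r["op"] != "delete"),
        key=lambda r: r["id"],
    )


master_table = [
    {"id": 1, "ts": 10, "op": "new", "value": "a"},
    {"id": 2, "ts": 11, "op": "new", "value": "b"},
]
today_table = [
    {"id": 1, "ts": 21, "op": "update", "value": "a2"},
    {"id": 2, "ts": 22, "op": "delete", "value": None},
    {"id": 3, "ts": 23, "op": "new", "value": "c"},
]

now = current_rows(master_table, today_table)
earlier = current_rows(master_table, today_table, as_of=21)
print([r["id"] for r in now], [r["id"] for r in earlier])
```

The same function covers both cases from the answer: passing only the master and today's table gives the current state, while passing more tables with an `as_of` cutoff gives a historical snapshot.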
