How to integrate Google Cloud SQL with Google Big Query

余生分开走 2020-12-10 15:17

I am designing a solution in which Google Cloud SQL will be used to store all data from the regular functioning of the app (OLTP-style data). The data is expected to grow o

3 answers
  • 2020-12-10 15:43

    Another approach is to split the write path: send each write to Cloud SQL and also publish it to Cloud Pub/Sub, then have a Dataflow job read from Pub/Sub and stream into BigQuery. This works well when the target schema of your BigQuery tables differs materially from the source schema, which is common when denormalizing relational data.

    The upside is that you can reduce end-to-end latency to a few seconds. The main downside is that if your transactional data is updated frequently, you will have to create a versioning scheme to track changes.
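
A minimal sketch of one possible versioning scheme, assuming each change is published as a JSON message carrying a per-row version number. The event fields (`table`, `pk`, `version`, `op`, `row`) and both function names are hypothetical and not part of any Google Cloud API; the actual publish to Pub/Sub and the Dataflow job are left out.

```python
import json
from datetime import datetime, timezone


def make_change_event(table, primary_key, row, version, op="UPSERT"):
    """Serialize one row change as a versioned event (hypothetical schema)."""
    event = {
        "table": table,
        "pk": primary_key,
        "version": version,  # assumed to increase with every change to the row
        "op": op,            # "UPSERT" or "DELETE"
        "emitted_at": datetime.now(timezone.utc).isoformat(),
        "row": row,
    }
    # The resulting bytes are what would be handed to the message publisher.
    return json.dumps(event, sort_keys=True).encode("utf-8")


def latest_versions(payloads):
    """Collapse events to the newest version of each row, in any arrival order."""
    latest = {}
    for payload in payloads:
        event = json.loads(payload)
        key = (event["table"], event["pk"])
        if key not in latest or event["version"] > latest[key]["version"]:
            latest[key] = event
    # Rows whose newest event is a DELETE are dropped from the result.
    return {k: e["row"] for k, e in latest.items() if e["op"] != "DELETE"}
```

Because the version travels with each message, the consumer can tolerate out-of-order or duplicate delivery: it keeps the highest version per row and ignores the rest.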

  • 2020-12-10 15:48

    Take a look at how WePay does this:

    • https://wecode.wepay.com/posts/bigquery-wepay

    The MySQL to GCS operator executes a SELECT query against a MySQL table. The SELECT pulls all data greater than (or equal to) the last high watermark. The high watermark is either the primary key of the table (if the table is append-only), or a modification timestamp column (if the table receives updates). Again, the SELECT statement also goes back a bit in time (or rows) to catch potentially dropped rows from the last query (due to the issues mentioned above).

    With Airflow they manage to keep BigQuery synchronized to their MySQL database every 15 minutes.
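
The high-watermark pull described in the quote can be sketched as follows. This is not WePay's operator: it uses an in-memory SQLite database as a stand-in for MySQL so the example runs on its own, and `pull_increment`, the `payments` table, and the `lookback` parameter are all hypothetical names.

```python
import sqlite3


def pull_increment(conn, table, watermark_col, last_watermark, lookback=0):
    """Return rows at or above (last_watermark - lookback) and the new watermark."""
    lower_bound = last_watermark - lookback
    # Table and column names are interpolated, so they must come from trusted config.
    sql = f"SELECT * FROM {table} WHERE {watermark_col} >= ? ORDER BY {watermark_col}"
    rows = conn.execute(sql, (lower_bound,)).fetchall()
    new_watermark = max([last_watermark] + [r[watermark_col] for r in rows])
    return rows, new_watermark


# Stand-in for the source database; an append-only table keyed by id.
conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row  # lets rows be indexed by column name
conn.execute("CREATE TABLE payments (id INTEGER PRIMARY KEY, amount REAL)")
conn.executemany(
    "INSERT INTO payments (id, amount) VALUES (?, ?)",
    [(1, 5.0), (2, 7.5), (3, 1.25)],
)

# The previous run stopped at id 2; look back one row to catch anything missed.
rows, watermark = pull_increment(conn, "payments", "id", last_watermark=2, lookback=1)
```

Since the lookback deliberately re-reads rows that were already exported, the load step on the BigQuery side has to deduplicate on the primary key.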

  • 2020-12-10 15:59

    BigQuery supports Cloud SQL federated queries, which let you query a Cloud SQL database directly from BigQuery. To keep a BigQuery table in sync with a Cloud SQL table, you can write a simple script that runs the following query every hour.

    INSERT
       demo.customers (column1)
    SELECT
       *
    FROM
       EXTERNAL_QUERY(
          "project.us.connection",
          "SELECT column1 FROM mysql_table WHERE timestamp > ${timestamp};");
    

    Just remember to replace ${timestamp} with the current timestamp minus 1 hour.
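
One way the substitution could be scripted, as a sketch only: `build_sync_query` is a hypothetical helper that renders the statement as a string, and it assumes the MySQL `timestamp` column holds UTC values in `YYYY-MM-DD HH:MM:SS` form. Submitting the string to BigQuery is left to whatever client or scheduler you use.

```python
from datetime import datetime, timedelta, timezone


def build_sync_query(now, connection_id="project.us.connection",
                     window=timedelta(hours=1)):
    """Render the federated INSERT with the cutoff set to `now - window`."""
    cutoff = (now - window).strftime("%Y-%m-%d %H:%M:%S")
    # Inner query runs on Cloud SQL; the cutoff is a single-quoted MySQL literal.
    inner = f"SELECT column1 FROM mysql_table WHERE timestamp > '{cutoff}';"
    return (
        "INSERT demo.customers (column1)\n"
        "SELECT column1\n"
        f'FROM EXTERNAL_QUERY("{connection_id}", "{inner}");'
    )


query = build_sync_query(datetime(2020, 12, 10, 16, 0, 0, tzinfo=timezone.utc))
```

A fixed one-hour window skips rows if a run is delayed or fails, so storing the cutoff of the last successful run and passing it in place of `now - window` is likely to be more robust.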
