Question
I have a legacy unpartitioned BigQuery table that streams logs from various sources (let's say table BigOldA). The aim is to transfer it to a new day-partitioned table (let's say PartByDay), which was done with the help of the following link:
https://cloud.google.com/bigquery/docs/creating-column-partitions#creating_a_partitioned_table_from_a_query_result
bq query \
  --allow_large_results \
  --replace=true \
  --destination_table <project>:<data-set>.<PartByDay> \
  --time_partitioning_field REQUEST_DATETIME \
  --use_legacy_sql=false 'SELECT * FROM `<project>.<data-set>.<BigOldA>`'
I have migrated the historical data to the new table, but I cannot delete it from table BigOldA, because I run into the known limitation that DML statements over tables with a streaming buffer are not supported yet:
Error: UPDATE or DELETE DML statements are not supported over
table <project>:<data-set>.BigOldA with streaming buffer
I was planning to run batch jobs every day that transfer T-1 data from table BigOldA to table PartByDay and delete it periodically, so that I can still keep the streaming buffer data in BigOldA and start using the PartByDay table for analytics. Now I am not sure whether this is achievable.
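For illustration, the transfer half of that plan might look like the sketch below (a hedged example reusing the names above; --append_table instead of --replace, GNU date assumed). The matching DELETE on BigOldA would be exactly the DML that the streaming buffer blocks.
# Hypothetical daily T-1 transfer; the companion DELETE is the part that fails.
yesterday=$(date -d "yesterday" +%Y-%m-%d)   # GNU date syntax
bq query --use_legacy_sql=false --append_table \
  --destination_table '<project>:<data-set>.PartByDay' \
  "SELECT * FROM \`<project>.<data-set>.BigOldA\` WHERE DATE(REQUEST_DATETIME) = '$yesterday'"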
I am looking for an alternative solution or best practice for periodically transferring data from a table with a streaming buffer into a partitioned table. Also, since the data streams in from independent production sources, it is not possible to point all of them at PartByDay, and the streamingBuffer property from tables.get is never null.
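For reference, the streamingBuffer state comes from tables.get; a quick way to inspect it from the CLI (the grep is just a convenience and assumes the prettyjson layout) is:
bq show --format=prettyjson '<project>:<data-set>.BigOldA' | grep -A 3 streamingBuffer
An active buffer reports estimatedRows and estimatedBytes; the field disappears once the buffer has drained.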
Answer 1:
You could just delete the original table and then rename the migrated table to the original name after you've run your history job. This assumes your streaming component to BigQuery is fault tolerant: if it's designed well, you shouldn't lose any data, because whatever is streaming to BigQuery should be able to hold events until the table comes back online. Once the table is partitioned, nothing changes for the components that stream to it.
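A minimal sketch of that swap with the bq CLI (a rename here is effectively copy-then-delete; the gap between the two commands is the window your streaming sources must buffer through):
# Hypothetical swap; assumes the history job already populated PartByDay.
bq rm -f -t '<project>:<data-set>.BigOldA'
bq cp '<project>:<data-set>.PartByDay' '<project>:<data-set>.BigOldA'  # the copy keeps the partitioning spec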
Answer 2:
If anyone is interested in the script, here you go.
#!/bin/sh
# This script:
# 1. copies the data into a new partitioned table
# 2. deletes the unpartitioned source table
# 3. copies the partitioned table back to the original table name
# TODO 4. deletes the intermediate partitioned copy
set -e
source_project="<source-project>"
source_dataset="<source-dataset>"
source_table="<source-table-to-partition>"
destination_project="<destination-project>"
destination_dataset="<destination-dataset>"
partition_field="<timestamp-partition-field>"
destination_table="<table-copy-partition>"
source_path="$source_project.$source_dataset.$source_table"
source_l_path="$source_project:$source_dataset.$source_table"
destination_path="$destination_project:$destination_dataset.$destination_table"
echo "copying table from $source_path to $destination_path"
query=$(cat <<-END
SELECT * FROM \`$source_path\`
END
)
echo "deleting old table"
bq rm -f -t $destination_path
echo "running the query: $query"
bq query --quiet=true --use_legacy_sql=false --apilog=stderr --allow_large_results --replace=true --destination_table $destination_path --time_partitioning_field $partition_field "$query"
echo "removing the original table: $source_path"
bq rm -f -t $source_l_path
echo "table deleted"
echo "copying the partition table to the original source path"
bq cp -f -n $destination_path $source_l_path
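The TODO in step 4 could presumably be finished with one more bq rm, once the copy back to the source path has succeeded and the intermediate table is no longer needed:
echo "removing the intermediate table: $destination_path"
bq rm -f -t "$destination_path"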
Source: https://stackoverflow.com/questions/49754400/migrating-non-partitioned-streaming-table-to-partitioned-table-bigquery