Question
In Google BigQuery, is there a way to clone a table, copying only its structure and none of its data?
bq cp doesn't seem to have an option to copy the structure without the data. A CREATE TABLE AS SELECT (CTAS) with a filter such as "1=2" does create the table without data, but it doesn't copy the partitioning/clustering properties.
Answer 1:
If you want to clone the structure of a table along with its partitioning/clustering properties, without needing to know exactly what those properties are, follow the steps below:
Step 1: Copy your_table to a new table, say your_table_copy. This copies the whole table, including all properties (such as descriptions and partition expiration, which are easy to miss if you try to set them manually) and the data. Note: a copy is a cost-free operation.
Step 2: To get rid of the data in the newly created table, run the query below:
SELECT * FROM `project.dataset.your_table_copy` LIMIT 0
When running it, make sure you set project.dataset.your_table_copy as the destination table, with 'Overwrite Table' as the 'Write Preference'. Note: this step is also cost free (because of LIMIT 0).
You can do both steps from the Web UI, the command line, the API, or any client of your choice, whichever you are most comfortable with.
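For the command-line route, the two steps could be scripted roughly as follows. This is a minimal sketch, not a tested recipe: the helper name build_clone_commands is hypothetical, the table names are the placeholders used in this answer, and the function only builds the bq argument lists without running them.

```python
def build_clone_commands(project, dataset, source_table, target_table):
    """Return the two bq invocations (as argument lists) for copy-then-empty."""
    source = f"{project}:{dataset}.{source_table}"
    target = f"{project}:{dataset}.{target_table}"

    # Step 1: copy the table, carrying over its properties and its data.
    copy_cmd = ["bq", "cp", source, target]

    # Step 2: overwrite the copy with an empty result set (LIMIT 0).
    query = f"SELECT * FROM `{project}.{dataset}.{target_table}` LIMIT 0"
    empty_cmd = [
        "bq", "query",
        "--use_legacy_sql=false",
        "--replace",  # corresponds to the 'Overwrite Table' write preference
        f"--destination_table={target}",
        query,
    ]
    return copy_cmd, empty_cmd
```

Each list can be passed to subprocess.run(cmd, check=True) once the bq tool is installed and authenticated.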
Answer 2:
You can use DDL with LIMIT 0, but you need to express the partitioning and clustering in the query as well:
#standardSQL
CREATE TABLE mydataset.myclusteredtable
PARTITION BY DATE(timestamp)
CLUSTER BY
customer_id
AS SELECT * FROM mydataset.myothertable LIMIT 0
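If several tables need this treatment, the statement can be generated from a few parameters. The sketch below assumes you already know the partition expression and clustering columns; the function name build_empty_clone_ddl and its parameters are hypothetical, and it only produces the SQL text.

```python
def build_empty_clone_ddl(source, target, partition_by=None, cluster_by=()):
    """Build a CREATE TABLE ... AS SELECT ... LIMIT 0 statement as a string."""
    parts = [f"CREATE TABLE {target}"]
    if partition_by:
        # Partition expression, e.g. "DATE(timestamp)".
        parts.append(f"PARTITION BY {partition_by}")
    if cluster_by:
        # Clustering columns, in order.
        parts.append("CLUSTER BY " + ", ".join(cluster_by))
    # LIMIT 0 keeps the schema but copies no rows.
    parts.append(f"AS SELECT * FROM {source} LIMIT 0")
    return "\n".join(parts)
```

Calling it with "mydataset.myothertable", "mydataset.myclusteredtable", "DATE(timestamp)" and ["customer_id"] reproduces the statement above.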
Answer 3:
This is possible with the bq CLI.
First download the schema of the existing table:
bq show --format=prettyjson project:dataset.table | jq '.schema.fields' > table.json
Then, create a new table with the provided schema and required partitioning:
bq mk \
--time_partitioning_type=DAY \
--time_partitioning_field date_field \
--require_partition_filter \
--table dataset.tablename \
table.json
See the bq mk options for more info: https://cloud.google.com/bigquery/docs/tables
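The first command requires jq; install it with your system package manager (for example, apt-get install jq or brew install jq). Note that npm install node-jq installs a Node.js wrapper for jq.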
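If jq is not available, the same extraction of .schema.fields can be done with Python's standard json module. A minimal sketch, assuming the output of bq show --format=prettyjson has been captured as a string; the function name schema_json_from_show_output is hypothetical.

```python
import json


def schema_json_from_show_output(show_output):
    """Return the .schema.fields part of `bq show` JSON output as JSON text."""
    table = json.loads(show_output)
    # Equivalent of the jq filter '.schema.fields'.
    fields = table["schema"]["fields"]
    return json.dumps(fields, indent=2)
```

Writing the returned text to table.json gives the schema file that bq mk expects as its last argument.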
Answer 4:
You can use the BigQuery API to run a SELECT, as you suggested, that returns an empty result, and set the partitioning and clustering fields on the query job.
Here is an example (only partitioning is shown, but clustering works the same way):
curl --request POST \
'https://www.googleapis.com/bigquery/v2/projects/myProject/jobs' \
--header 'Authorization: Bearer [YOUR_BEARER_TOKEN]' \
--header 'Accept: application/json' \
--header 'Content-Type: application/json' \
--data '{"configuration":{"query":{"query":"SELECT * FROM `Project.dataset.audit` WHERE 1 = 2","timePartitioning":{"type":"DAY"},"destinationTable":{"datasetId":"datasetId","projectId":"projectId","tableId":"test"},"useLegacySql":false}}}' \
--compressed
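The request body can also be assembled programmatically, which makes it easier to add a clustering section. This is a sketch under assumptions: the function name build_query_job_body and its parameters are hypothetical, and it only builds the JSON payload for the jobs endpoint shown above; sending it (with a bearer token) is left to your HTTP client.

```python
import json


def build_query_job_body(query, project_id, dataset_id, table_id,
                         partition_type="DAY", partition_field=None,
                         clustering_fields=None):
    """Build the JSON body for a query job that writes to a partitioned table."""
    query_config = {
        "query": query,
        "useLegacySql": False,
        "destinationTable": {
            "projectId": project_id,
            "datasetId": dataset_id,
            "tableId": table_id,
        },
        "timePartitioning": {"type": partition_type},
    }
    if partition_field:
        # Partition on a column instead of ingestion time.
        query_config["timePartitioning"]["field"] = partition_field
    if clustering_fields:
        query_config["clustering"] = {"fields": list(clustering_fields)}
    return json.dumps({"configuration": {"query": query_config}})
```

The returned string can be passed to curl via --data in place of the inline JSON.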
Answer 5:
In the end, I went with the Python script below, which detects the schema/partitioning/clustering properties and re-creates (clones) the clustered table without data. I hope BigQuery gets an out-of-the-box feature to clone a table structure without needing a script like this.
import json
import subprocess  # replaces the Python 2 "commands" module, which was removed in Python 3

BQ_EXPORT_SCHEMA = "bq show --schema --format=prettyjson %project%:%dataset%.%table% > %path_to_schema%"
BQ_SHOW_TABLE_DEF = "bq show --format=prettyjson %project%:%dataset%.%table%"
BQ_MK_TABLE = "bq mk --table --time_partitioning_type=%partition_type% %optional_time_partition_field% --clustering_fields %clustering_fields% %project%:%dataset%.%table% ./%cluster_json_file%"


def create_table_with_cluster(bq_project, bq_dataset, source_table, target_table):
    # Export the source table's schema to a local file named after the source table.
    cmd = BQ_EXPORT_SCHEMA.replace('%project%', bq_project)\
        .replace('%dataset%', bq_dataset)\
        .replace('%table%', source_table)\
        .replace('%path_to_schema%', source_table)
    subprocess.getstatusoutput(cmd)

    # Fetch the full table definition to read its partitioning and clustering.
    cmd = BQ_SHOW_TABLE_DEF.replace('%project%', bq_project)\
        .replace('%dataset%', bq_dataset)\
        .replace('%table%', source_table)
    (return_value, output) = subprocess.getstatusoutput(cmd)
    bq_result = json.loads(output)

    # Note: this assumes the source table is both clustered and time-partitioned.
    clustering_fields = bq_result["clustering"]["fields"]
    time_partitioning = bq_result["timePartitioning"]
    time_partitioning_type = time_partitioning["type"]
    time_partitioning_field = ""
    if "field" in time_partitioning:
        time_partitioning_field = "--time_partitioning_field " + time_partitioning["field"]
    clustering_fields_list = ",".join(str(x) for x in clustering_fields)

    # Create the empty target table from the exported schema and detected properties.
    cmd = BQ_MK_TABLE.replace('%project%', bq_project)\
        .replace('%dataset%', bq_dataset)\
        .replace('%table%', target_table)\
        .replace('%cluster_json_file%', source_table)\
        .replace('%clustering_fields%', clustering_fields_list)\
        .replace('%partition_type%', time_partitioning_type)\
        .replace('%optional_time_partition_field%', time_partitioning_field)
    subprocess.getstatusoutput(cmd)


create_table_with_cluster('test_project', 'test_dataset', 'source_table', 'target_table')
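The script above reads the "clustering" and "timePartitioning" keys directly, so it would raise a KeyError for a source table that lacks either property. One way to make the bq mk step tolerant of that is sketched below; the helper name build_mk_args is hypothetical, and it takes the already-parsed output of bq show as a dict rather than calling bq itself.

```python
def build_mk_args(table_def, target, schema_path):
    """Build bq mk arguments from a parsed `bq show --format=prettyjson` result."""
    args = ["bq", "mk", "--table"]

    # Only add partitioning flags when the source table is time-partitioned.
    partitioning = table_def.get("timePartitioning")
    if partitioning:
        args.append(f"--time_partitioning_type={partitioning['type']}")
        if "field" in partitioning:
            args.append(f"--time_partitioning_field={partitioning['field']}")

    # Only add the clustering flag when the source table is clustered.
    clustering = table_def.get("clustering", {}).get("fields", [])
    if clustering:
        args.append("--clustering_fields=" + ",".join(clustering))

    args += [target, schema_path]
    return args
```

The resulting list can be handed to subprocess.run(args, check=True) instead of building a shell string with placeholder replacement.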
Source: https://stackoverflow.com/questions/54053998/copy-table-structure-alone-in-bigquery