Copy table structure alone in BigQuery

Question

In Google BigQuery, is there a way to clone a table (copy only its structure) without its data?

bq cp doesn't seem to have an option to copy the structure without the data. CREATE TABLE AS SELECT (CTAS) with a filter such as "1=2" does create the table without data, but it doesn't copy the partitioning/clustering properties.

Answer 1:

If you want to clone the structure of a table along with its partitioning/clustering properties, without needing to know exactly what those properties are, follow the steps below:

Step 1: Copy your_table to a new table, say your_table_copy. This copies the whole table, including all properties (such as descriptions and partition expiration, which are easy to miss if you try to set them manually) and the data. Note: copying is a cost-free operation.

Step 2: To get rid of the data in the newly created table, run the query below:

SELECT * FROM `project.dataset.your_table_copy` LIMIT 0

When running it, make sure you set project.dataset.your_table_copy as the destination table with 'Overwrite Table' as the 'Write Preference'. Note: this step is also cost-free (because of LIMIT 0).

You can do both steps from the Web UI, the command line, the API, or any client of your choice, whichever you are most comfortable with.
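
From the command line, the two steps above might look roughly like the sketch below. It only builds the bq command lines and does not run them. The helper name clone_structure_commands is hypothetical, and the sketch assumes the bq query flags --destination_table, --replace, and --use_legacy_sql behave as documented. Note that bq table references use project:dataset.table, while the SQL text uses project.dataset.table.

```python
import shlex


def clone_structure_commands(project, dataset, source, target):
    """Return the two bq invocations (as argv lists) for copy-then-empty."""
    src_ref = f"{project}:{dataset}.{source}"
    dst_ref = f"{project}:{dataset}.{target}"

    # Step 1: copy the table, including its properties and its data.
    copy_cmd = ["bq", "cp", src_ref, dst_ref]

    # Step 2: overwrite the copy with the (empty) result of a LIMIT 0 query.
    query = f"SELECT * FROM `{project}.{dataset}.{target}` LIMIT 0"
    empty_cmd = [
        "bq", "query",
        "--use_legacy_sql=false",
        f"--destination_table={dst_ref}",
        "--replace",
        query,
    ]
    return copy_cmd, empty_cmd


if __name__ == "__main__":
    for cmd in clone_structure_commands(
        "project", "dataset", "your_table", "your_table_copy"
    ):
        # shlex.join quotes the query so the line can be pasted into a shell.
        print(shlex.join(cmd))
```

Returning argv lists rather than running the commands makes it easy to review them before passing each one to subprocess.run.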

Answer 2:

You can use DDL with LIMIT 0, but you need to express the partitioning and clustering in the query as well:

#standardSQL
 CREATE TABLE mydataset.myclusteredtable
 PARTITION BY DATE(timestamp)
 CLUSTER BY
   customer_id
 AS SELECT * FROM mydataset.myothertable LIMIT 0
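
If several tables need the same treatment, the statement above could be generated rather than typed by hand. The following is a minimal sketch: the helper name empty_clone_ddl is hypothetical, the partitioning expression and clustering columns still have to be supplied by the caller, and identifiers are inserted as-is without any escaping.

```python
def empty_clone_ddl(target, source, partition_expr=None, cluster_cols=()):
    """Build a CREATE TABLE ... AS SELECT ... LIMIT 0 statement."""
    lines = [f"CREATE TABLE {target}"]
    if partition_expr:
        lines.append(f"PARTITION BY {partition_expr}")
    if cluster_cols:
        lines.append("CLUSTER BY " + ", ".join(cluster_cols))
    # LIMIT 0 keeps the column definitions but copies no rows.
    lines.append(f"AS SELECT * FROM {source} LIMIT 0")
    return "\n".join(lines)


if __name__ == "__main__":
    print(empty_clone_ddl(
        "mydataset.myclusteredtable",
        "mydataset.myothertable",
        partition_expr="DATE(timestamp)",
        cluster_cols=["customer_id"],
    ))
```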

Answer 3:

This is possible with the bq CLI.

First download the schema of the existing table:

bq show --format=prettyjson project:dataset.table | jq '.schema.fields' > table.json

Then create a new table with that schema and the required partitioning:

bq mk \
  --time_partitioning_type=DAY \
  --time_partitioning_field date_field \
  --require_partition_filter \
  --table dataset.tablename \
  table.json

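For more information on bq mk options, see: https://cloud.google.com/bigquery/docs/tables

jq is a standalone command-line tool, usually installed with the system package manager (for example, apt-get install jq or brew install jq). The npm wrapper can be installed with: npm install node-jq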
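
If jq is not available, the same extraction can be sketched with Python's standard json module. The function name extract_schema_fields and the sample input are made up for illustration. Real input would be the output of the bq show command above.

```python
import json


def extract_schema_fields(show_output):
    """Return the list stored under .schema.fields in bq show JSON output."""
    table_def = json.loads(show_output)
    return table_def["schema"]["fields"]


if __name__ == "__main__":
    # Made-up sample standing in for `bq show --format=prettyjson` output.
    sample = (
        '{"schema": {"fields": ['
        '{"name": "date_field", "type": "DATE", "mode": "NULLABLE"}]}}'
    )
    # Print the schema in the form expected in table.json.
    print(json.dumps(extract_schema_fields(sample), indent=2))
```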

Answer 4:

You can use the BigQuery API to run a SELECT, as you suggested, that returns an empty result and sets the partition and cluster fields.

This is an example (partitioning only, but clustering works the same way):

curl --request POST \
  'https://www.googleapis.com/bigquery/v2/projects/myProject/jobs' \
  --header 'Authorization: Bearer [YOUR_BEARER_TOKEN]' \
  --header 'Accept: application/json' \
  --header 'Content-Type: application/json' \
  --data '{"configuration":{"query":{"query":"SELECT * FROM `Project.dataset.audit` WHERE 1 = 2","timePartitioning":{"type":"DAY"},"destinationTable":{"datasetId":"datasetId","projectId":"projectId","tableId":"test"},"useLegacySql":false}}}' \
  --compressed
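
The JSON body in the request above is easier to read and modify when built programmatically. The sketch below only constructs the payload and does not send it. The helper name empty_clone_job_body is hypothetical, and the clustering entry is added on the assumption that the query job configuration accepts a "clustering" object with a "fields" list alongside "timePartitioning".

```python
import json


def empty_clone_job_body(query, project_id, dataset_id, table_id,
                         partition_type="DAY", partition_field=None,
                         clustering_fields=None):
    """Build the jobs.insert request body for an empty, partitioned table."""
    query_config = {
        "query": query,
        "destinationTable": {
            "projectId": project_id,
            "datasetId": dataset_id,
            "tableId": table_id,
        },
        "timePartitioning": {"type": partition_type},
        "useLegacySql": False,
    }
    if partition_field:
        # Partition on a column instead of ingestion time.
        query_config["timePartitioning"]["field"] = partition_field
    if clustering_fields:
        query_config["clustering"] = {"fields": list(clustering_fields)}
    return {"configuration": {"query": query_config}}


if __name__ == "__main__":
    body = empty_clone_job_body(
        "SELECT * FROM `Project.dataset.audit` WHERE 1 = 2",
        "projectId", "datasetId", "test",
    )
    print(json.dumps(body))
```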

Answer 5:

In the end, I went with the Python script below, which detects the schema/partitioning/clustering properties and re-creates (clones) the clustered table without data. I hope BigQuery eventually offers an out-of-the-box way to clone a table structure without needing a script like this.

import json
import subprocess  # Python 3 replacement for the Python 2-only "commands" module

# Command templates; the %...% placeholders are filled in below.
BQ_EXPORT_SCHEMA = "bq show --schema --format=prettyjson %project%:%dataset%.%table% > %path_to_schema%"
BQ_SHOW_TABLE_DEF = "bq show --format=prettyjson %project%:%dataset%.%table%"
BQ_MK_TABLE = "bq mk --table --time_partitioning_type=%partition_type% %optional_time_partition_field% --clustering_fields %clustering_fields% %project%:%dataset%.%table% ./%cluster_json_file%"


def create_table_with_cluster(bq_project, bq_dataset, source_table, target_table):
    # Export the source schema to a local file named after the source table.
    cmd = BQ_EXPORT_SCHEMA.replace('%project%', bq_project)\
        .replace('%dataset%', bq_dataset)\
        .replace('%table%', source_table)\
        .replace('%path_to_schema%', source_table)
    subprocess.getstatusoutput(cmd)

    # Fetch the full table definition (partitioning, clustering, ...) as JSON.
    cmd = BQ_SHOW_TABLE_DEF.replace('%project%', bq_project)\
        .replace('%dataset%', bq_dataset)\
        .replace('%table%', source_table)
    (return_value, output) = subprocess.getstatusoutput(cmd)
    if return_value != 0:
        raise RuntimeError(output)

    bq_result = json.loads(output)

    # Assumes the source table is both clustered and time-partitioned.
    clustering_fields = bq_result["clustering"]["fields"]
    time_partitioning = bq_result["timePartitioning"]
    time_partitioning_type = time_partitioning["type"]
    # The field is absent for ingestion-time partitioned tables.
    time_partitioning_field = ""
    if "field" in time_partitioning:
        time_partitioning_field = "--time_partitioning_field " + time_partitioning["field"]

    clustering_fields_list = ",".join(str(x) for x in clustering_fields)

    # Create the empty target table from the exported schema file.
    cmd = BQ_MK_TABLE.replace('%project%', bq_project)\
        .replace('%dataset%', bq_dataset)\
        .replace('%table%', target_table)\
        .replace('%cluster_json_file%', source_table)\
        .replace('%clustering_fields%', clustering_fields_list)\
        .replace('%partition_type%', time_partitioning_type)\
        .replace('%optional_time_partition_field%', time_partitioning_field)
    subprocess.getstatusoutput(cmd)


create_table_with_cluster('test_project', 'test_dataset', 'source_table', 'target_table')
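
The script above expects the source table to have both clustering and time partitioning. A possible variation, sketched here under that caveat, derives the bq mk arguments from the parsed table definition and skips whichever property is absent. The helper name build_mk_args is hypothetical, and the sample definition is made up for illustration.

```python
def build_mk_args(table_def, target_ref, schema_path):
    """Build a bq mk argv list from a parsed `bq show` table definition."""
    args = ["bq", "mk", "--table"]

    partitioning = table_def.get("timePartitioning")
    if partitioning:
        args.append(f"--time_partitioning_type={partitioning['type']}")
        # The field is absent for ingestion-time partitioned tables.
        if "field" in partitioning:
            args.append(f"--time_partitioning_field={partitioning['field']}")

    clustering = table_def.get("clustering")
    if clustering:
        args.append("--clustering_fields=" + ",".join(clustering["fields"]))

    # Positional arguments: target table reference, then the schema file.
    args += [target_ref, schema_path]
    return args


if __name__ == "__main__":
    # Made-up table definition standing in for parsed `bq show` output.
    sample_def = {
        "timePartitioning": {"type": "DAY", "field": "date_field"},
        "clustering": {"fields": ["customer_id"]},
    }
    print(build_mk_args(
        sample_def, "test_project:test_dataset.target_table", "./source_table"
    ))
```

The resulting list can be passed to subprocess.run without going through a shell, which avoids quoting problems in table or field names.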

Source: https://stackoverflow.com/questions/54053998/copy-table-structure-alone-in-bigquery

Tags

google-bigquery