Creating/Writing to Partitioned BigQuery table via Google Cloud Dataflow

死守一世寂寞 2020-11-29 12:25

I wanted to take advantage of the new BigQuery functionality of time-partitioned tables, but am unsure whether this is currently possible in version 1.6 of the Dataflow SDK.

6 Answers
  •  北海茫月
    2020-11-29 12:47

    I have written data into BigQuery partitioned tables through Dataflow. These writes are dynamic: if data already exists in a given partition, I can either append to it or overwrite it.

    I have written the code in Python. It is a batch-mode load operation into BigQuery.

    from google.cloud import bigquery

    client = bigquery.Client(project=projectName)
    dataset_ref = client.dataset(datasetName)
    table_ref = dataset_ref.table(bqTableName)
    job_config = bigquery.LoadJobConfig()
    job_config.skip_leading_rows = skipLeadingRows
    job_config.source_format = bigquery.SourceFormat.CSV
    if tableExists(client, table_ref):  # user-defined helper, sketched below
        job_config.autodetect = autoDetect
        previous_rows = client.get_table(table_ref).num_rows
        #assert previous_rows > 0
        if allowJaggedRows is True:
            job_config.allow_jagged_rows = True  # public attribute is snake_case
        if allowFieldAddition is True:
            # use the public API rather than poking _properties
            job_config.schema_update_options = [
                bigquery.SchemaUpdateOption.ALLOW_FIELD_ADDITION
            ]
        if isPartitioned is True:
            job_config.time_partitioning = bigquery.TimePartitioning(
                type_=bigquery.TimePartitioningType.DAY
            )
        if schemaList is not None:
            job_config.schema = schemaList
        # replace the existing contents rather than appending
        job_config.write_disposition = bigquery.WriteDisposition.WRITE_TRUNCATE
    else:
        job_config.autodetect = autoDetect
        # the load job creates the table itself, so no separate create_table call is needed
        job_config.create_disposition = bigquery.CreateDisposition.CREATE_IF_NEEDED
        job_config.schema = schemaList
        if isPartitioned is True:
            job_config.time_partitioning = bigquery.TimePartitioning(
                type_=bigquery.TimePartitioningType.DAY
            )
    load_job = client.load_table_from_uri(gcsFileName, table_ref, job_config=job_config)
    assert load_job.job_type == 'load'
    load_job.result()  # blocks until the load completes
    assert load_job.state == 'DONE'
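
    The snippet assumes a tableExists helper; a minimal sketch of it (my addition, not part of the original answer) could call get_table and catch NotFound:

    from google.cloud import bigquery
    from google.cloud.exceptions import NotFound

    def tableExists(client, table_ref):
        # Returns True if the table already exists, False otherwise.
        # Hypothetical helper assumed by the snippet above.
        try:
            client.get_table(table_ref)
            return True
        except NotFound:
            return False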
    

    It works fine.
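
    If you only need to replace or append to a single day's partition rather than the whole table, BigQuery load jobs accept a $YYYYMMDD partition decorator in the destination table ID. A hedged sketch, reusing the variables above (the date is made up):

    # Load only into the 2020-11-29 partition; with WRITE_TRUNCATE this
    # replaces just that partition, with WRITE_APPEND it appends to it.
    table_ref = dataset_ref.table(bqTableName + '$20201129')
    load_job = client.load_table_from_uri(gcsFileName, table_ref, job_config=job_config)
    load_job.result()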
