How to create a partitioned BigQuery table in Java

守給你的承諾、 submitted on 2021-01-27 07:16:45

Question


https://cloud.google.com/bigquery/docs/creating-partitioned-tables shows how to create a partitioned table in Python. I have done that and it works.

Now the question is how to do the same thing with the Java API. What Java code corresponds to the Python example, whose table definition is shown below?

{
  "tableReference": {
    "projectId": "myProject",
    "tableId": "table1",
    "datasetId": "mydataset"
  },
  "timePartitioning": {
    "type": "DAY"
  }
}

Java with missing partitioning:

Job createTableJob = new Job();
JobConfiguration jobConfiguration = new JobConfiguration();
JobConfigurationLoad loadConfiguration = new JobConfigurationLoad();

createTableJob.setConfiguration(jobConfiguration);
jobConfiguration.setLoad(loadConfiguration);

TableReference tableReference = new TableReference()
    .setProjectId("myProject")
    .setDatasetId("mydataset")
    .setTableId("table1");

loadConfiguration.setDestinationTable(tableReference);
// what should be placed here to set DAY timePartitioning?

I'm using the newest API version from the Maven Central Repository: com.google.apis:google-api-services-bigquery:v2-rev326-1.22.0.


Answer 1:


https://cloud.google.com/bigquery/docs/reference/v2/tables/insert
https://cloud.google.com/bigquery/docs/reference/v2/tables#resource

Example Java code:

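// Assumes `bigquery` is an already-authorized com.google.api.services.bigquery.Bigquery client.
String projectId = "";
String datasetId = "";
String tableId = "";

Table content = new Table();
// tables.insert needs a table reference to know which table to create.
content.setTableReference(new TableReference()
    .setProjectId(projectId)
    .setDatasetId(datasetId)
    .setTableId(tableId));

TimePartitioning timePartitioning = new TimePartitioning();
timePartitioning.setType("DAY");
// Optional: how long, in milliseconds, to keep each partition. 1L is only a placeholder.
timePartitioning.setExpirationMs(1L);
content.setTimePartitioning(timePartitioning);

Bigquery.Tables.Insert request = bigquery.tables().insert(projectId, datasetId, content);
Table response = request.execute();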
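
The setExpirationMs(1L) call above takes a value in milliseconds, so 1L would expire partitions almost immediately. As a rough sketch (the class and method names here are hypothetical, not part of the BigQuery client), a retention period in days could be converted with the JDK alone:

```java
import java.util.concurrent.TimeUnit;

// Hypothetical helper, not part of any BigQuery library.
final class PartitionExpiration {
    private PartitionExpiration() {}

    /** Converts a retention period in whole days to milliseconds. */
    static long daysToExpirationMs(long days) {
        if (days <= 0) {
            throw new IllegalArgumentException("days must be positive: " + days);
        }
        return TimeUnit.DAYS.toMillis(days);
    }

    public static void main(String[] args) {
        // Value that could be passed to timePartitioning.setExpirationMs(...)
        System.out.println(daysToExpirationMs(90));
    }
}
```

The result could then be passed as timePartitioning.setExpirationMs(PartitionExpiration.daysToExpirationMs(90)), or the call can be left out entirely if partitions should never expire.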



Answer 2:


Here is a more up-to-date way to create partitioned tables (works with Java API 0.32):

// newFields holds the Field definitions of the table's columns.
Schema schema = Schema.of(newFields);
TimePartitioning timePartitioning = TimePartitioning.of(TimePartitioning.Type.DAY);
TableDefinition tableDefinition = StandardTableDefinition.newBuilder()
        .setSchema(schema)
        .setTimePartitioning(timePartitioning)
        .build();

TableId tableId = TableId.of(projectName, datasetName, tableName);
TableInfo tableInfo = TableInfo.newBuilder(tableId, tableDefinition).build();
bigQuery.create(tableInfo);

Update on 19/03/2018:

To load data into a specific partition (or to write the result of a SELECT into a specific partition), append the day of that partition to the table name, using the suffix $YYYYMMDD, when you construct the TableId object. Here is an example:

private void runJob(JobConfiguration jobConf) {
    BIG_QUERY.create(JobInfo.of(jobConf));
}

private TableId getTableToOverwrite(String tableToOverwrite, String partition) {
    return TableId.of(PROJECT, DATASET, tableToOverwrite  + "$" + partition);
}

void loadInDayPartition(String dayUrl, String dayPartition) {

    LoadJobConfiguration loadConf = LoadJobConfiguration.newBuilder(getTableToOverwrite(TABLE_LEGACY, dayPartition),
            dayUrl, FormatOptions.avro())
            .build();

    runJob(loadConf);
}

I don't have an example of streaming data into a partitioned table, but I guess it is similar.
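
The partition string passed to getTableToOverwrite above has to be the day in year-month-day form with no separators. One way to build it, sketched with the JDK only (PartitionDecorator and forDay are hypothetical names, not part of the BigQuery client), is to format a LocalDate with DateTimeFormatter.BASIC_ISO_DATE. Note that the equivalent Java pattern is yyyyMMdd, since lowercase mm means minutes:

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

// Hypothetical helper, not part of any BigQuery library.
final class PartitionDecorator {
    private PartitionDecorator() {}

    /** Returns tableName followed by "$" and the day formatted as yyyyMMdd. */
    static String forDay(String tableName, LocalDate day) {
        // BASIC_ISO_DATE renders a LocalDate without separators, e.g. 20180319.
        return tableName + "$" + day.format(DateTimeFormatter.BASIC_ISO_DATE);
    }

    public static void main(String[] args) {
        System.out.println(forDay("table1", LocalDate.of(2018, 3, 19)));
    }
}
```

The returned string could be used as the table name argument of TableId.of(PROJECT, DATASET, ...) in the loading example.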




Answer 3:


If you want to partition by a field, the code looks like this:

Schema schema = Schema.of(fields);
// Name the column to partition on, then build the partitioning spec.
TimePartitioning timePartitioning = TimePartitioning.newBuilder(TimePartitioning.Type.DAY)
        .setField("partition_column")
        .build();
TableDefinition tableDefinition = StandardTableDefinition.newBuilder()
        .setSchema(schema)
        .setTimePartitioning(timePartitioning)
        .build();

TableId tableId = TableId.of(projectName, datasetName, tableName);
TableInfo tableInfo = TableInfo.newBuilder(tableId, tableDefinition).build();
bigQuery.create(tableInfo);


Source: https://stackoverflow.com/questions/40212167/how-to-create-partitioned-bigquery-table-in-java
