Creating/Writing to Partitioned BigQuery table via Google Cloud Dataflow

死守一世寂寞 2020-11-29 12:25

I wanted to take advantage of the new BigQuery functionality of time partitioned tables, but am unsure this is currently possible in the 1.6 version of the Dataflow SDK.

6 Answers
  •  小蘑菇
     2020-11-29 12:57

    As Pavan says, it is definitely possible to write to partitioned tables with Dataflow. Are you using the DataflowPipelineRunner in streaming mode or batch mode?

    The solution you proposed should work. Specifically, if you pre-create a table with date partitioning set up, then you can use a BigQueryIO.Write.toTableReference lambda to write to a date partition. For example:

    import org.joda.time.DateTimeZone;
    import org.joda.time.Instant;
    import org.joda.time.format.DateTimeFormat;
    import org.joda.time.format.DateTimeFormatter;

    /**
     * A Joda-Time formatter that prints a date in a format like {@code "20160101"}.
     * Thread-safe.
     */
    private static final DateTimeFormatter FORMATTER =
        DateTimeFormat.forPattern("yyyyMMdd").withZone(DateTimeZone.UTC);

    // Append the "$YYYYMMDD" partition decorator to the table spec so the write
    // lands in that day's partition:
    Instant instant = Instant.now(); // any Joda instant in a reasonable time range
    String baseTableName = "project:dataset.table"; // a valid BigQuery table name
    String partitionName =
        String.format("%s$%s", baseTableName, FORMATTER.print(instant));
    
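    To show how that partition name could be wired into a pipeline, here is a minimal batch-mode sketch. It is untested; the project, dataset, and table names are hypothetical, and it assumes the target table was pre-created with day partitioning and that BigQueryIO accepts the $ decorator in a table spec, as discussed above:

    import com.google.cloud.dataflow.sdk.Pipeline;
    import com.google.cloud.dataflow.sdk.io.BigQueryIO;
    import com.google.cloud.dataflow.sdk.options.PipelineOptions;
    import com.google.cloud.dataflow.sdk.options.PipelineOptionsFactory;
    import org.joda.time.DateTimeZone;
    import org.joda.time.Instant;
    import org.joda.time.format.DateTimeFormat;
    import org.joda.time.format.DateTimeFormatter;

    public class WriteToPartitionExample {

      private static final DateTimeFormatter FORMATTER =
          DateTimeFormat.forPattern("yyyyMMdd").withZone(DateTimeZone.UTC);

      public static void main(String[] args) {
        PipelineOptions options =
            PipelineOptionsFactory.fromArgs(args).withValidation().create();
        Pipeline p = Pipeline.create(options);

        // Hypothetical source table and pre-created, date-partitioned target table.
        String sourceTable = "my-project:my_dataset.source_table";
        String targetTable = "my-project:my_dataset.partitioned_table";

        // Address a single day's partition with the "$YYYYMMDD" decorator.
        String partitionName =
            String.format("%s$%s", targetTable, FORMATTER.print(Instant.now()));

        p.apply(BigQueryIO.Read.from(sourceTable))
         .apply(BigQueryIO.Write
             .to(partitionName)
             // CREATE_NEVER: the partitioned table exists already, so the job
             // only loads rows into the addressed partition.
             .withCreateDisposition(BigQueryIO.Write.CreateDisposition.CREATE_NEVER)
             // WRITE_TRUNCATE on a partition decorator replaces just that
             // partition, which keeps re-runs for the same day idempotent.
             .withWriteDisposition(BigQueryIO.Write.WriteDisposition.WRITE_TRUNCATE));

        p.run();
      }
    }

    CREATE_NEVER matters here because the 1.x BigQueryIO has no option to configure partitioning itself, so the table's partitioning has to be set up ahead of time (for example with bq mk --time_partitioning_type=DAY).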
