google-cloud-dataflow

Google Cloud Dataflow to BigQuery - UDF - convert unixTimestamp to local time

Submitted by 自闭症网瘾萝莉.ら on 2021-02-20 03:48:05
Question: What is the best way to convert a unixTimestamp to local time in the following scenario? I am using the Pub/Sub Subscription to BigQuery template. Dataflow fetches data in JSON format from Pub/Sub, applies the transformation, and inserts the result into BigQuery. Preferably, I want to use a UDF for the data transformation. (For simplicity,) the input data includes only unixTimestamp. Example: {"unixTimestamp": "1612325106000"}. The BigQuery table has 3 columns: unix_ts:INTEGER, iso_dt:DATETIME, local_dt:DATETIME, where unix_ts…
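In this template the transform itself runs as a JavaScript UDF, but the conversion logic is the same in any language. A minimal Java sketch of that logic, assuming Asia/Tokyo as the target local zone (the zone is not given in the question):

    import java.time.Instant;
    import java.time.LocalDateTime;
    import java.time.ZoneId;
    import java.time.ZoneOffset;

    public class UnixTimestampConversion {
        public static void main(String[] args) {
            long unixTs = 1612325106000L;                  // epoch milliseconds from the example message
            Instant instant = Instant.ofEpochMilli(unixTs);

            // iso_dt: the instant rendered as a UTC DATETIME value.
            LocalDateTime isoDt = LocalDateTime.ofInstant(instant, ZoneOffset.UTC);

            // local_dt: the same instant in a local zone (Asia/Tokyo is only a placeholder).
            LocalDateTime localDt = LocalDateTime.ofInstant(instant, ZoneId.of("Asia/Tokyo"));

            System.out.println(unixTs + " -> " + isoDt + " / " + localDt);
        }
    }

The two formatted strings map directly onto the iso_dt and local_dt DATETIME columns.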

Running Apache Beam pipeline in Spring Boot project on Google Data Flow

Submitted by 依然范特西╮ on 2021-02-19 08:27:34
Question: I'm trying to run an Apache Beam pipeline in a Spring Boot project on Google Dataflow, but I keep getting this error: Failed to construct instance from factory method DataflowRunner#fromOptions(interface org.apache.beam.sdk.options.PipelineOptions). The example I'm trying to run is the basic word count provided by the official documentation, https://beam.apache.org/get-started/wordcount-example/. The problem is that the documentation uses different classes for each example, and each example has…
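The message is cut off before the underlying cause, but DataflowRunner#fromOptions typically fails this way when required options (project, region, a gs:// temp/staging location) are missing or invalid. A minimal sketch of setting them explicitly, with placeholder project and bucket names:

    import org.apache.beam.runners.dataflow.DataflowRunner;
    import org.apache.beam.runners.dataflow.options.DataflowPipelineOptions;
    import org.apache.beam.sdk.Pipeline;
    import org.apache.beam.sdk.options.PipelineOptionsFactory;

    public class WordCountLauncher {
        public static void main(String[] args) {
            DataflowPipelineOptions options = PipelineOptionsFactory.fromArgs(args)
                    .withValidation()
                    .as(DataflowPipelineOptions.class);
            options.setRunner(DataflowRunner.class);
            options.setProject("my-project");               // placeholder project ID
            options.setRegion("us-central1");               // placeholder region
            options.setTempLocation("gs://my-bucket/temp"); // placeholder bucket

            Pipeline pipeline = Pipeline.create(options);
            // ... add the word count transforms here, then:
            pipeline.run();
        }
    }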

“java.lang.IllegalArgumentException: No filesystem found for scheme gs” when running dataflow in google cloud platform

Submitted by 淺唱寂寞╮ on 2021-02-19 05:01:47
Question: I am running my Google Dataflow job on Google Cloud Platform (GCP). When I run this job locally it works well, but when running it on GCP I get this error: "java.lang.IllegalArgumentException: No filesystem found for scheme gs". I have access to that Google Cloud URI, I can upload my jar file to that URI, and I can see some temporary files from my local job. My job IDs in GCP: 2019-08-08_21_47_27-162804342585245230 (Beam version 2.12.0), 2019-08-09_16_41_15-11728697820819900062 (Beam version 2.14…
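A frequent cause of this error (not confirmed by the truncated excerpt) is that building an uber jar overwrites the META-INF/services entries that register Beam's GCS filesystem, so the gs:// scheme never gets registered on the workers. One commonly suggested workaround is to re-register the filesystems explicitly before any code touches them; a rough sketch:

    import org.apache.beam.sdk.io.FileSystems;
    import org.apache.beam.sdk.options.PipelineOptions;
    import org.apache.beam.sdk.options.PipelineOptionsFactory;

    public class FileSystemsInit {
        // Call this early (for example from a DoFn @Setup method) so that all
        // FileSystem registrars on the classpath, including the GCS one, are registered.
        public static void registerFileSystems() {
            PipelineOptions options = PipelineOptionsFactory.create();
            FileSystems.setDefaultPipelineOptions(options);
        }
    }

The other commonly mentioned fix is to merge rather than overwrite the META-INF/services files when shading, e.g. via the Maven Shade plugin's ServicesResourceTransformer.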

Google Cloud dataflow : How to initialize Hikari connection pool only once per worker (singleton)?

Submitted by 有些话、适合烂在心里 on 2021-02-11 17:40:14
Question: Hibernate Utils creates the session factory along with the Hikari configuration. Currently we do this inside the @Setup method of a ParDo, but it opens far too many connections. Is there a good example of initializing the connection pool once per worker? Answer 1: If you are using the @Setup method inside a DoFn to create a database connection, keep in mind that Apache Beam will create a connection pool per worker instance thread. This might result in a lot of database connections depending on the number of…
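The answer is cut off, but the usual pattern is to hold the pool in a static field behind a synchronized initializer, so every DoFn instance (one per worker thread) shares a single pool per worker JVM. A minimal sketch with illustrative class names and placeholder JDBC settings:

    import com.zaxxer.hikari.HikariConfig;
    import com.zaxxer.hikari.HikariDataSource;
    import java.sql.Connection;
    import org.apache.beam.sdk.transforms.DoFn;

    public class WriteToDbFn extends DoFn<String, Void> {
        // Static: shared by all DoFn instances (threads) inside the same worker JVM.
        private static HikariDataSource dataSource;

        private static synchronized HikariDataSource getDataSource() {
            if (dataSource == null) {
                HikariConfig config = new HikariConfig();
                config.setJdbcUrl("jdbc:postgresql://db-host:5432/mydb"); // placeholder URL
                config.setUsername("user");                               // placeholder credentials
                config.setPassword("password");
                config.setMaximumPoolSize(10);
                dataSource = new HikariDataSource(config);
            }
            return dataSource;
        }

        @ProcessElement
        public void processElement(@Element String element) throws Exception {
            try (Connection conn = getDataSource().getConnection()) {
                // write the element using conn ...
            }
        }
    }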

Avro Schema for GenericRecord: Be able to leave blank fields

Submitted by 回眸只為那壹抹淺笑 on 2021-02-11 17:13:34
Question: I'm using Java to convert JSON to Avro and store the records in GCS using Google Dataflow. The Avro schema is created at runtime using SchemaBuilder. One of the fields I define in the schema is an optional LONG field; it is defined like this: SchemaBuilder.FieldAssembler<Schema> fields = SchemaBuilder.record(mainName).fields(); Schema concreteType = SchemaBuilder.nullable().longType(); fields.name("key1").type(concreteType).noDefault(); Now when I create a GenericRecord using the schema above, and…
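The excerpt stops mid-sentence, but the nullable-union-plus-noDefault() combination shown here is usually what prevents leaving the field blank, since a field without a default cannot simply be omitted. A hedged sketch of declaring the field as optional with a null default instead (record and field names are only illustrative):

    import org.apache.avro.Schema;
    import org.apache.avro.SchemaBuilder;
    import org.apache.avro.generic.GenericRecord;
    import org.apache.avro.generic.GenericRecordBuilder;

    public class OptionalLongField {
        public static void main(String[] args) {
            // optional() builds the union ["null", "long"] with null as the default,
            // so the field can simply be left out when building a record.
            Schema schema = SchemaBuilder.record("MainRecord").fields()
                    .name("key1").type().optional().longType()
                    .endRecord();

            GenericRecord withValue = new GenericRecordBuilder(schema).set("key1", 42L).build();
            GenericRecord blank = new GenericRecordBuilder(schema).build(); // key1 defaults to null

            System.out.println(withValue + " / " + blank);
        }
    }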

Better approach to call external API in apache beam

Submitted by 这一生的挚爱 on 2021-02-11 14:39:47
Question: I have two approaches to initializing the HttpClient in order to make an API call from a ParDo in Apache Beam. Approach 1: Initialize the HttpClient object in @StartBundle and close it in @FinishBundle. The code is as follows: public class ProcessNewIncomingRequest extends DoFn<String, KV<String, String>> { @StartBundle public void startBundle() { HttpClient client = HttpClient.newHttpClient(); HttpRequest request = HttpRequest.newBuilder().uri(URI.create(<Custom_URL>)).build();…
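The excerpt ends before Approach 2, but since java.net.http.HttpClient is safe to share across threads, a common alternative is to create it once per DoFn instance in @Setup and keep it in a field rather than rebuilding it every bundle. A rough sketch under that assumption (the URL is only a placeholder for <Custom_URL>):

    import java.net.URI;
    import java.net.http.HttpClient;
    import java.net.http.HttpRequest;
    import java.net.http.HttpResponse;
    import org.apache.beam.sdk.transforms.DoFn;
    import org.apache.beam.sdk.values.KV;

    public class ProcessNewIncomingRequest extends DoFn<String, KV<String, String>> {
        // Created once per DoFn instance in @Setup and reused across bundles.
        private transient HttpClient client;

        @Setup
        public void setup() {
            client = HttpClient.newHttpClient();
        }

        @ProcessElement
        public void processElement(@Element String element, OutputReceiver<KV<String, String>> out)
                throws Exception {
            HttpRequest request = HttpRequest.newBuilder()
                    .uri(URI.create("https://example.com/api")) // placeholder URL
                    .build();
            HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
            out.output(KV.of(element, response.body()));
        }
    }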