google-cloud-dataflow

Google Cloud Dataflow to BigQuery - UDF - convert unixTimestamp to local time

Submitted by 自闭症网瘾萝莉.ら on 2021-02-20 03:48:05
Question: What is the best way to convert a unixTimestamp to local time in the following scenario? I am using the Pub/Sub Subscription to BigQuery template. Dataflow fetches data in JSON format from Pub/Sub, applies the transformation, and inserts the result into BigQuery. Preferably, I want to use a UDF for the data transformation. (For simplicity,) the input data includes only unixTimestamp. Example: {"unixTimestamp": "1612325106000"}. The BigQuery table has 3 columns: unix_ts:INTEGER, iso_dt:DATETIME, local_dt:DATETIME, where unix_ts…
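In this template the transform itself runs as a JavaScript UDF, but the conversion logic is the same in any language. A minimal Java sketch of that logic, assuming Asia/Tokyo as the target local zone (the zone is not given in the question):

    import java.time.Instant;
    import java.time.LocalDateTime;
    import java.time.ZoneId;
    import java.time.ZoneOffset;

    public class UnixTimestampConversion {
        public static void main(String[] args) {
            long unixTs = 1612325106000L;                  // epoch milliseconds from the example message
            Instant instant = Instant.ofEpochMilli(unixTs);

            // iso_dt: the instant rendered as a UTC DATETIME value.
            LocalDateTime isoDt = LocalDateTime.ofInstant(instant, ZoneOffset.UTC);

            // local_dt: the same instant in a local zone (Asia/Tokyo is only a placeholder).
            LocalDateTime localDt = LocalDateTime.ofInstant(instant, ZoneId.of("Asia/Tokyo"));

            System.out.println(unixTs + " -> " + isoDt + " / " + localDt);
        }
    }

The two formatted strings map directly onto the iso_dt and local_dt DATETIME columns.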

Running Apache Beam pipeline in Spring Boot project on Google Data Flow

Submitted by 依然范特西╮ on 2021-02-19 08:27:34
Question: I'm trying to run an Apache Beam pipeline in a Spring Boot project on Google Dataflow, but I keep getting this error: Failed to construct instance from factory method DataflowRunner#fromOptions(interface org.apache.beam.sdk.options.PipelineOptions). The example I'm trying to run is the basic word count provided by the official documentation, https://beam.apache.org/get-started/wordcount-example/. The problem is that the documentation uses different classes for each example, and each example has…
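The message is cut off before the underlying cause, but DataflowRunner#fromOptions typically fails this way when required options (project, region, a gs:// temp/staging location) are missing or invalid. A minimal sketch of setting them explicitly, with placeholder project and bucket names:

    import org.apache.beam.runners.dataflow.DataflowRunner;
    import org.apache.beam.runners.dataflow.options.DataflowPipelineOptions;
    import org.apache.beam.sdk.Pipeline;
    import org.apache.beam.sdk.options.PipelineOptionsFactory;

    public class WordCountLauncher {
        public static void main(String[] args) {
            DataflowPipelineOptions options = PipelineOptionsFactory.fromArgs(args)
                    .withValidation()
                    .as(DataflowPipelineOptions.class);
            options.setRunner(DataflowRunner.class);
            options.setProject("my-project");               // placeholder project ID
            options.setRegion("us-central1");               // placeholder region
            options.setTempLocation("gs://my-bucket/temp"); // placeholder bucket

            Pipeline pipeline = Pipeline.create(options);
            // ... add the word count transforms here, then:
            pipeline.run();
        }
    }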

“java.lang.IllegalArgumentException: No filesystem found for scheme gs” when running dataflow in google cloud platform

Submitted by 淺唱寂寞╮ on 2021-02-19 05:01:47
Question: I am running my Google Dataflow job on Google Cloud Platform (GCP). When I run this job locally it works well, but when running it on GCP I get this error: "java.lang.IllegalArgumentException: No filesystem found for scheme gs". I have access to that Google Cloud URI, I can upload my jar file to that URI, and I can see some temporary files from my local job. My job IDs in GCP: 2019-08-08_21_47_27-162804342585245230 (Beam version 2.12.0), 2019-08-09_16_41_15-11728697820819900062 (Beam version 2.14…
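A frequent cause of this error (not confirmed by the truncated excerpt) is that building an uber jar overwrites the META-INF/services entries that register Beam's GCS filesystem, so the gs:// scheme never gets registered on the workers. One commonly suggested workaround is to re-register the filesystems explicitly before any code touches them; a rough sketch:

    import org.apache.beam.sdk.io.FileSystems;
    import org.apache.beam.sdk.options.PipelineOptions;
    import org.apache.beam.sdk.options.PipelineOptionsFactory;

    public class FileSystemsInit {
        // Call this early (for example from a DoFn @Setup method) so that all
        // FileSystem registrars on the classpath, including the GCS one, are registered.
        public static void registerFileSystems() {
            PipelineOptions options = PipelineOptionsFactory.create();
            FileSystems.setDefaultPipelineOptions(options);
        }
    }

The other commonly mentioned fix is to merge rather than overwrite the META-INF/services files when shading, e.g. via the Maven Shade plugin's ServicesResourceTransformer.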

Google Cloud dataflow : How to initialize Hikari connection pool only once per worker (singleton)?

Submitted by 有些话、适合烂在心里 on 2021-02-11 17:40:14
Question: Hibernate Utils creates the session factory along with the Hikari configuration. Currently we do this inside the @Setup method of a ParDo, but it opens far too many connections. Is there a good example of initializing the connection pool once per worker? Answer 1: If you are using the @Setup method inside a DoFn to create a database connection, keep in mind that Apache Beam will create a connection pool per worker instance thread. This might result in a lot of database connections depending on the number of…
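The answer is cut off, but the usual pattern is to hold the pool in a static field behind a synchronized initializer, so every DoFn instance (one per worker thread) shares a single pool per worker JVM. A minimal sketch with illustrative class names and placeholder JDBC settings:

    import com.zaxxer.hikari.HikariConfig;
    import com.zaxxer.hikari.HikariDataSource;
    import java.sql.Connection;
    import org.apache.beam.sdk.transforms.DoFn;

    public class WriteToDbFn extends DoFn<String, Void> {
        // Static: shared by all DoFn instances (threads) inside the same worker JVM.
        private static HikariDataSource dataSource;

        private static synchronized HikariDataSource getDataSource() {
            if (dataSource == null) {
                HikariConfig config = new HikariConfig();
                config.setJdbcUrl("jdbc:postgresql://db-host:5432/mydb"); // placeholder URL
                config.setUsername("user");                               // placeholder credentials
                config.setPassword("password");
                config.setMaximumPoolSize(10);
                dataSource = new HikariDataSource(config);
            }
            return dataSource;
        }

        @ProcessElement
        public void processElement(@Element String element) throws Exception {
            try (Connection conn = getDataSource().getConnection()) {
                // write the element using conn ...
            }
        }
    }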

Avro Schema for GenericRecord: Be able to leave blank fields

Submitted by 回眸只為那壹抹淺笑 on 2021-02-11 17:13:34
Question: I'm using Java to convert JSON to Avro and store the records in GCS using Google Dataflow. The Avro schema is created at runtime using SchemaBuilder. One of the fields I define in the schema is an optional LONG field; it is defined like this: SchemaBuilder.FieldAssembler<Schema> fields = SchemaBuilder.record(mainName).fields(); Schema concreteType = SchemaBuilder.nullable().longType(); fields.name("key1").type(concreteType).noDefault(); Now when I create a GenericRecord using the schema above, and…
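The excerpt stops mid-sentence, but the nullable-union-plus-noDefault() combination shown here is usually what prevents leaving the field blank, since a field without a default cannot simply be omitted. A hedged sketch of declaring the field as optional with a null default instead (record and field names are only illustrative):

    import org.apache.avro.Schema;
    import org.apache.avro.SchemaBuilder;
    import org.apache.avro.generic.GenericRecord;
    import org.apache.avro.generic.GenericRecordBuilder;

    public class OptionalLongField {
        public static void main(String[] args) {
            // optional() builds the union ["null", "long"] with null as the default,
            // so the field can simply be left out when building a record.
            Schema schema = SchemaBuilder.record("MainRecord").fields()
                    .name("key1").type().optional().longType()
                    .endRecord();

            GenericRecord withValue = new GenericRecordBuilder(schema).set("key1", 42L).build();
            GenericRecord blank = new GenericRecordBuilder(schema).build(); // key1 defaults to null

            System.out.println(withValue + " / " + blank);
        }
    }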

Better approach to call external API in apache beam

Submitted by 这一生的挚爱 on 2021-02-11 14:39:47
Question: I have two approaches to initializing the HttpClient in order to make an API call from a ParDo in Apache Beam. Approach 1: Initialize the HttpClient object in @StartBundle and close it in @FinishBundle. The code is as follows: public class ProcessNewIncomingRequest extends DoFn<String, KV<String, String>> { @StartBundle public void startBundle() { HttpClient client = HttpClient.newHttpClient(); HttpRequest request = HttpRequest.newBuilder().uri(URI.create(<Custom_URL>)).build();…
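The excerpt ends before Approach 2, but since java.net.http.HttpClient is safe to share across threads, a common alternative is to create it once per DoFn instance in @Setup and keep it in a field rather than rebuilding it every bundle. A rough sketch under that assumption (the URL is only a placeholder for <Custom_URL>):

    import java.net.URI;
    import java.net.http.HttpClient;
    import java.net.http.HttpRequest;
    import java.net.http.HttpResponse;
    import org.apache.beam.sdk.transforms.DoFn;
    import org.apache.beam.sdk.values.KV;

    public class ProcessNewIncomingRequest extends DoFn<String, KV<String, String>> {
        // Created once per DoFn instance in @Setup and reused across bundles.
        private transient HttpClient client;

        @Setup
        public void setup() {
            client = HttpClient.newHttpClient();
        }

        @ProcessElement
        public void processElement(@Element String element, OutputReceiver<KV<String, String>> out)
                throws Exception {
            HttpRequest request = HttpRequest.newBuilder()
                    .uri(URI.create("https://example.com/api")) // placeholder URL
                    .build();
            HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
            out.output(KV.of(element, response.body()));
        }
    }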