apache-beam

Connecting to Cloud SQL from Dataflow Job

Submitted by …衆ロ難τιáo~ on 2020-08-06 12:46:53
Question: I'm struggling to use JdbcIO with Apache Beam 2.0 (Java) to connect to a Cloud SQL instance from Dataflow within the same project. I'm getting the following error: java.sql.SQLException: Cannot create PoolableConnectionFactory (Communications link failure The last packet sent successfully to the server was 0 milliseconds ago. The driver has not received any packets from the server.) According to the documentation, the Dataflow service account *@dataflow-service-producer-prod.iam
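A common cause of this "Communications link failure" is connecting over plain TCP/IP to an instance the Dataflow workers cannot reach. A minimal sketch of the commonly suggested fix is below, assuming the MySQL flavour of Cloud SQL and the Cloud SQL JDBC socket factory (the mysql-socket-factory artifact) on the classpath; the project, instance, database, and credential values are placeholders, not taken from the question.

import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.coders.StringUtf8Coder;
import org.apache.beam.sdk.io.jdbc.JdbcIO;
import org.apache.beam.sdk.options.PipelineOptionsFactory;

public class CloudSqlReadSketch {
  public static void main(String[] args) {
    Pipeline p = Pipeline.create(PipelineOptionsFactory.fromArgs(args).create());

    // The socketFactory parameter routes the connection through the Cloud SQL
    // socket factory instead of a plain TCP connection, so the workers do not
    // need an authorized network or public IP to reach the instance.
    String url = "jdbc:mysql://google/my_database"
        + "?cloudSqlInstance=my-project:us-central1:my-instance"
        + "&socketFactory=com.google.cloud.sql.mysql.SocketFactory";

    p.apply("ReadFromCloudSql", JdbcIO.<String>read()
        .withDataSourceConfiguration(
            JdbcIO.DataSourceConfiguration.create("com.mysql.jdbc.Driver", url)
                .withUsername("my_user")
                .withPassword("my_password"))
        .withQuery("SELECT name FROM users")
        .withRowMapper(rs -> rs.getString(1))
        .withCoder(StringUtf8Coder.of()));

    p.run();
  }
}

The service account mentioned in the question still needs the Cloud SQL Client role for the socket factory to authorize.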

Local Pubsub Emulator won't work with Dataflow

Submitted by 你离开我真会死。 on 2020-07-23 08:08:08
Question: I am developing a Dataflow pipeline in Java whose input comes from Pub/Sub. Later, I saw a guide here on how to use the local Pub/Sub emulator so I would not need to deploy to GCP in order to test. Here is my simple code: private interface Options extends PipelineOptions, PubsubOptions, StreamingOptions { @Description("Pub/Sub topic to read messages from") String getTopic(); void setTopic(String topic); @Description("Pub/Sub subscription to read messages from") String getSubscription(); void setSubscription
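For reference, a minimal sketch of how the emulator is usually wired up in Beam Java: point PubsubOptions at the emulator via setPubsubRootUrl and run with the DirectRunner, since a Dataflow job executes on GCP workers and cannot reach an emulator listening on your workstation's localhost. The emulator address and subscription option mirror the question's setup but are placeholders.

import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.gcp.pubsub.PubsubIO;
import org.apache.beam.sdk.io.gcp.pubsub.PubsubOptions;
import org.apache.beam.sdk.options.Description;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.options.StreamingOptions;

public class EmulatorReadSketch {

  // Trimmed-down version of the options interface from the question.
  public interface Options extends PubsubOptions, StreamingOptions {
    @Description("Pub/Sub subscription to read messages from")
    String getSubscription();
    void setSubscription(String subscription);
  }

  public static void main(String[] args) {
    Options options = PipelineOptionsFactory.fromArgs(args).withValidation().as(Options.class);
    options.setStreaming(true);
    // Send Pub/Sub traffic to the local emulator instead of the real service.
    // Only meaningful with the DirectRunner; Dataflow workers cannot see localhost.
    options.setPubsubRootUrl("http://localhost:8085");

    Pipeline p = Pipeline.create(options);
    p.apply("ReadFromEmulator",
        PubsubIO.readStrings().fromSubscription(options.getSubscription()));
    p.run().waitUntilFinish();
  }
}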

How to infer an Avro schema from a Kafka topic in Apache Beam KafkaIO

Submitted by 柔情痞子 on 2020-07-03 12:59:10
Question: I'm using Apache Beam's KafkaIO to read from a topic that has an Avro schema in the Confluent schema registry. I'm able to deserialize the messages and write to files, but ultimately I want to write to BigQuery. My pipeline isn't able to infer the schema. How do I extract/infer the schema and attach it to the data in the pipeline so that my downstream processes (write to BigQuery) can infer the schema? Here is the code where I use the schema registry URL to set the deserializer and where I read
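One way to approach this, sketched below under the assumption of a Beam version recent enough to ship ConfluentSchemaRegistryDeserializerProvider (2.20+): fetch the latest Avro schema from the registry once at pipeline-construction time, derive a BigQuery TableSchema from it, and convert each GenericRecord through a Beam Row on the way into BigQueryIO. The registry URL, brokers, topic, and table names are placeholders.

import com.google.api.services.bigquery.model.TableSchema;
import io.confluent.kafka.schemaregistry.client.CachedSchemaRegistryClient;
import io.confluent.kafka.schemaregistry.client.SchemaMetadata;
import org.apache.avro.generic.GenericRecord;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO;
import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO.Write.CreateDisposition;
import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO.Write.WriteDisposition;
import org.apache.beam.sdk.io.gcp.bigquery.BigQueryUtils;
import org.apache.beam.sdk.io.kafka.ConfluentSchemaRegistryDeserializerProvider;
import org.apache.beam.sdk.io.kafka.KafkaIO;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.schemas.Schema;
import org.apache.beam.sdk.schemas.utils.AvroUtils;
import org.apache.beam.sdk.values.KV;
import org.apache.kafka.common.serialization.StringDeserializer;

public class KafkaAvroToBigQuerySketch {
  public static void main(String[] args) throws Exception {
    String registryUrl = "http://schema-registry:8081"; // placeholder
    String topic = "my-topic";                          // placeholder
    String subject = topic + "-value";

    // Ask the registry for the latest schema once, at pipeline-construction time,
    // so the BigQuery sink can be configured with a concrete TableSchema.
    SchemaMetadata latest =
        new CachedSchemaRegistryClient(registryUrl, 100).getLatestSchemaMetadata(subject);
    org.apache.avro.Schema avroSchema =
        new org.apache.avro.Schema.Parser().parse(latest.getSchema());
    Schema beamSchema = AvroUtils.toBeamSchema(avroSchema);
    TableSchema tableSchema = BigQueryUtils.toTableSchema(beamSchema);

    Pipeline p = Pipeline.create(PipelineOptionsFactory.fromArgs(args).create());

    p.apply("ReadKafka",
            KafkaIO.<String, GenericRecord>read()
                .withBootstrapServers("kafka:9092")     // placeholder
                .withTopic(topic)
                .withKeyDeserializer(StringDeserializer.class)
                .withValueDeserializer(
                    ConfluentSchemaRegistryDeserializerProvider.of(registryUrl, subject))
                .withoutMetadata())
        .apply("WriteBigQuery",
            BigQueryIO.<KV<String, GenericRecord>>write()
                .to("my-project:my_dataset.my_table")   // placeholder
                .withSchema(tableSchema)
                // GenericRecord -> Beam Row -> TableRow, reusing the schema fetched above.
                .withFormatFunction(kv ->
                    BigQueryUtils.toTableRow(AvroUtils.toBeamRowStrict(kv.getValue(), beamSchema)))
                .withCreateDisposition(CreateDisposition.CREATE_IF_NEEDED)
                .withWriteDisposition(WriteDisposition.WRITE_APPEND));

    p.run();
  }
}

Fetching the schema at construction time keeps the table schema stable for the life of the job; if the subject evolves, the job needs to be redeployed to pick up new fields.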

Apache Beam: Refreshing a side input that I am reading from MongoDB using MongoDbIO.read()

Submitted by 白昼怎懂夜的黑 on 2020-06-29 04:20:09
Question: I am reading a PCollection mongodata from MongoDB and using this PCollection as a side input to my ParDo(DoFN).withSideInputs(PCollection). On the backend, my MongoDB collection is updated on a daily, monthly, or perhaps yearly basis, and I need those newly added values in my pipeline. You can think of this as refreshing the Mongo collection's value in a running pipeline. For example, the Mongo collection has 20K documents in total, and after one day three more records are added into the Mongo collection
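MongoDbIO.read() is a bounded read that runs once, so it cannot refresh on its own. The pattern usually suggested is Beam's "slowly updating side input": a GenerateSequence ticker drives a DoFn that re-queries MongoDB with the plain Mongo Java driver inside a re-triggered global window. A sketch under those assumptions follows; the connection string, database, and collection names are placeholders.

import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoClients;
import java.util.ArrayList;
import java.util.List;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.coders.ListCoder;
import org.apache.beam.sdk.coders.StringUtf8Coder;
import org.apache.beam.sdk.io.GenerateSequence;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.transforms.ParDo;
import org.apache.beam.sdk.transforms.View;
import org.apache.beam.sdk.transforms.windowing.AfterProcessingTime;
import org.apache.beam.sdk.transforms.windowing.GlobalWindows;
import org.apache.beam.sdk.transforms.windowing.Repeatedly;
import org.apache.beam.sdk.transforms.windowing.Window;
import org.apache.beam.sdk.values.PCollectionView;
import org.bson.Document;
import org.joda.time.Duration;

public class RefreshingSideInputSketch {
  public static void main(String[] args) {
    Pipeline p = Pipeline.create(PipelineOptionsFactory.fromArgs(args).create());

    // Emit one tick per day; each tick re-reads the Mongo collection, so the
    // side input picks up documents added since the last refresh.
    PCollectionView<List<String>> mongoView = p
        .apply("RefreshTicks", GenerateSequence.from(0).withRate(1, Duration.standardDays(1)))
        .apply("RefreshWindow", Window.<Long>into(new GlobalWindows())
            .triggering(Repeatedly.forever(AfterProcessingTime.pastFirstElementInPane()))
            .discardingFiredPanes())
        .apply("ReadMongo", ParDo.of(new DoFn<Long, List<String>>() {
          @ProcessElement
          public void processElement(ProcessContext c) {
            List<String> docs = new ArrayList<>();
            // Plain MongoDB driver here: MongoDbIO.read() is a bounded source
            // and cannot be re-executed inside a running pipeline.
            try (MongoClient client = MongoClients.create("mongodb://localhost:27017")) {
              for (Document d : client.getDatabase("mydb").getCollection("mycoll").find()) {
                docs.add(d.toJson());
              }
            }
            c.output(docs);
          }
        }))
        .setCoder(ListCoder.of(StringUtf8Coder.of()))
        .apply("AsSideInput", View.asSingleton());

    // Main input would then use: ParDo.of(new MyDoFn(mongoView)).withSideInputs(mongoView)
    p.run();
  }
}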

Dataflow / Apache Beam: Trigger window on number of bytes in window

Submitted by 老子叫甜甜 on 2020-06-27 15:14:52
Question: I have a simple job that moves data from Pub/Sub to GCS. The Pub/Sub topic is a shared topic with many different message types of varying size. I want the result in GCS to be vertically partitioned accordingly: Schema/version/year/month/day/. Under that parent key there should be a group of files for that day, and the files should be a reasonable size, i.e. 10-200 MB. I'm using Scio and I am able to do a groupBy operation to make a P/SCollection of [String, Iterable[Event]] where the key is based on the
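Beam has no built-in trigger that fires on the byte size of a pane. The question's code is Scio, but the usual workaround looks the same in either API: window the stream, write with FileIO.writeDynamic keyed by the partition path, and tune the window length and shard count so each shard lands in the target 10-200 MB range. A rough Java sketch under those assumptions; the subscription, bucket, and the extractPartitionPath helper are hypothetical placeholders.

import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.coders.StringUtf8Coder;
import org.apache.beam.sdk.io.FileIO;
import org.apache.beam.sdk.io.TextIO;
import org.apache.beam.sdk.io.gcp.pubsub.PubsubIO;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.transforms.windowing.FixedWindows;
import org.apache.beam.sdk.transforms.windowing.Window;
import org.joda.time.Duration;

public class PartitionedGcsWriteSketch {
  public static void main(String[] args) {
    Pipeline p = Pipeline.create(PipelineOptionsFactory.fromArgs(args).create());

    p.apply("ReadPubsub", PubsubIO.readStrings()
            .fromSubscription("projects/my-project/subscriptions/my-sub")) // placeholder
        .apply("Window", Window.<String>into(FixedWindows.of(Duration.standardMinutes(10))))
        .apply("WritePartitioned", FileIO.<String, String>writeDynamic()
            // Destination key per element, e.g. "schema/version/2020/06/27".
            .by(msg -> extractPartitionPath(msg))
            .withDestinationCoder(StringUtf8Coder.of())
            .via(TextIO.sink())
            .to("gs://my-bucket/events")                                   // placeholder
            .withNaming(partition -> FileIO.Write.defaultNaming(partition + "/part", ".json"))
            // No byte-size trigger exists; pick the window length and shard count so
            // that (throughput per window per partition) / numShards is 10-200 MB.
            .withNumShards(4));

    p.run();
  }

  // Hypothetical helper: parse the message and build "schema/version/yyyy/MM/dd".
  static String extractPartitionPath(String message) {
    return "defaultSchema/v1/2020/06/27";
  }
}

Recent Beam releases also offer GroupIntoBatches.ofByteSize for batching by payload size, but the sizing-by-shard-count approach above works on the older versions in use at the time of the question.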

AttributeError: 'module' object has no attribute 'ensure_str'

Submitted by 最后都变了- on 2020-06-27 11:05:28
Question: I am trying to transfer data from one BigQuery table to another through Beam, but the following error comes up: WARNING:root:Retry with exponential backoff: waiting for 4.12307941111 seconds before retrying get_query_location because we caught exception: AttributeError: 'module' object has no attribute 'ensure_str' Traceback for above exception (most recent call last): File "/usr/local/lib/python2.7/site-packages/apache_beam/utils/retry.py", line 197, in wrapper return fun(*args, **kwargs) File "