apache-beam

Connecting to Cloud SQL from Dataflow Job

Submitted by …衆ロ難τιáo~ on 2020-08-06 12:46:53
Question: I'm struggling to use JdbcIO with Apache Beam 2.0 (Java) to connect to a Cloud SQL instance from Dataflow within the same project. I'm getting the following error: java.sql.SQLException: Cannot create PoolableConnectionFactory (Communications link failure The last packet sent successfully to the server was 0 milliseconds ago. The driver has not received any packets from the server.) According to the documentation, the Dataflow service account *@dataflow-service-producer-prod.iam
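A common cause of this "Communications link failure" is connecting over plain TCP/IP to an instance the Dataflow workers cannot reach. A minimal sketch of the commonly suggested fix is below, assuming the MySQL flavour of Cloud SQL and the Cloud SQL JDBC socket factory (the mysql-socket-factory artifact) on the classpath; the project, instance, database, and credential values are placeholders, not taken from the question.

import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.coders.StringUtf8Coder;
import org.apache.beam.sdk.io.jdbc.JdbcIO;
import org.apache.beam.sdk.options.PipelineOptionsFactory;

public class CloudSqlReadSketch {
  public static void main(String[] args) {
    Pipeline p = Pipeline.create(PipelineOptionsFactory.fromArgs(args).create());

    // The socketFactory parameter routes the connection through the Cloud SQL
    // socket factory instead of a plain TCP connection, so the workers do not
    // need an authorized network or public IP to reach the instance.
    String url = "jdbc:mysql://google/my_database"
        + "?cloudSqlInstance=my-project:us-central1:my-instance"
        + "&socketFactory=com.google.cloud.sql.mysql.SocketFactory";

    p.apply("ReadFromCloudSql", JdbcIO.<String>read()
        .withDataSourceConfiguration(
            JdbcIO.DataSourceConfiguration.create("com.mysql.jdbc.Driver", url)
                .withUsername("my_user")
                .withPassword("my_password"))
        .withQuery("SELECT name FROM users")
        .withRowMapper(rs -> rs.getString(1))
        .withCoder(StringUtf8Coder.of()));

    p.run();
  }
}

The service account mentioned in the question still needs the Cloud SQL Client role for the socket factory to authorize.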

Local Pubsub Emulator won't work with Dataflow

Submitted by 你离开我真会死。 on 2020-07-23 08:08:08
Question: I am developing a Dataflow pipeline in Java whose input comes from Pub/Sub. Later, I saw a guide here on how to use the local Pub/Sub emulator so I would not need to deploy to GCP in order to test. Here is my simple code: private interface Options extends PipelineOptions, PubsubOptions, StreamingOptions { @Description("Pub/Sub topic to read messages from") String getTopic(); void setTopic(String topic); @Description("Pub/Sub subscription to read messages from") String getSubscription(); void setSubscription
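For reference, a minimal sketch of how the emulator is usually wired up in Beam Java: point PubsubOptions at the emulator via setPubsubRootUrl and run with the DirectRunner, since a Dataflow job executes on GCP workers and cannot reach an emulator listening on your workstation's localhost. The emulator address and subscription option mirror the question's setup but are placeholders.

import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.gcp.pubsub.PubsubIO;
import org.apache.beam.sdk.io.gcp.pubsub.PubsubOptions;
import org.apache.beam.sdk.options.Description;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.options.StreamingOptions;

public class EmulatorReadSketch {

  // Trimmed-down version of the options interface from the question.
  public interface Options extends PubsubOptions, StreamingOptions {
    @Description("Pub/Sub subscription to read messages from")
    String getSubscription();
    void setSubscription(String subscription);
  }

  public static void main(String[] args) {
    Options options = PipelineOptionsFactory.fromArgs(args).withValidation().as(Options.class);
    options.setStreaming(true);
    // Send Pub/Sub traffic to the local emulator instead of the real service.
    // Only meaningful with the DirectRunner; Dataflow workers cannot see localhost.
    options.setPubsubRootUrl("http://localhost:8085");

    Pipeline p = Pipeline.create(options);
    p.apply("ReadFromEmulator",
        PubsubIO.readStrings().fromSubscription(options.getSubscription()));
    p.run().waitUntilFinish();
  }
}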

How to infer an Avro schema from a Kafka topic in Apache Beam KafkaIO

Submitted by 柔情痞子 on 2020-07-03 12:59:10
Question: I'm using Apache Beam's KafkaIO to read from a topic that has an Avro schema in the Confluent schema registry. I'm able to deserialize the messages and write to files, but ultimately I want to write to BigQuery. My pipeline isn't able to infer the schema. How do I extract/infer the schema and attach it to the data in the pipeline so that my downstream processes (write to BigQuery) can infer the schema? Here is the code where I use the schema registry URL to set the deserializer and where I read
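One way to approach this, sketched below under the assumption of a Beam version recent enough to ship ConfluentSchemaRegistryDeserializerProvider (2.20+): fetch the latest Avro schema from the registry once at pipeline-construction time, derive a BigQuery TableSchema from it, and convert each GenericRecord through a Beam Row on the way into BigQueryIO. The registry URL, brokers, topic, and table names are placeholders.

import com.google.api.services.bigquery.model.TableSchema;
import io.confluent.kafka.schemaregistry.client.CachedSchemaRegistryClient;
import io.confluent.kafka.schemaregistry.client.SchemaMetadata;
import org.apache.avro.generic.GenericRecord;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO;
import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO.Write.CreateDisposition;
import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO.Write.WriteDisposition;
import org.apache.beam.sdk.io.gcp.bigquery.BigQueryUtils;
import org.apache.beam.sdk.io.kafka.ConfluentSchemaRegistryDeserializerProvider;
import org.apache.beam.sdk.io.kafka.KafkaIO;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.schemas.Schema;
import org.apache.beam.sdk.schemas.utils.AvroUtils;
import org.apache.beam.sdk.values.KV;
import org.apache.kafka.common.serialization.StringDeserializer;

public class KafkaAvroToBigQuerySketch {
  public static void main(String[] args) throws Exception {
    String registryUrl = "http://schema-registry:8081"; // placeholder
    String topic = "my-topic";                          // placeholder
    String subject = topic + "-value";

    // Ask the registry for the latest schema once, at pipeline-construction time,
    // so the BigQuery sink can be configured with a concrete TableSchema.
    SchemaMetadata latest =
        new CachedSchemaRegistryClient(registryUrl, 100).getLatestSchemaMetadata(subject);
    org.apache.avro.Schema avroSchema =
        new org.apache.avro.Schema.Parser().parse(latest.getSchema());
    Schema beamSchema = AvroUtils.toBeamSchema(avroSchema);
    TableSchema tableSchema = BigQueryUtils.toTableSchema(beamSchema);

    Pipeline p = Pipeline.create(PipelineOptionsFactory.fromArgs(args).create());

    p.apply("ReadKafka",
            KafkaIO.<String, GenericRecord>read()
                .withBootstrapServers("kafka:9092")     // placeholder
                .withTopic(topic)
                .withKeyDeserializer(StringDeserializer.class)
                .withValueDeserializer(
                    ConfluentSchemaRegistryDeserializerProvider.of(registryUrl, subject))
                .withoutMetadata())
        .apply("WriteBigQuery",
            BigQueryIO.<KV<String, GenericRecord>>write()
                .to("my-project:my_dataset.my_table")   // placeholder
                .withSchema(tableSchema)
                // GenericRecord -> Beam Row -> TableRow, reusing the schema fetched above.
                .withFormatFunction(kv ->
                    BigQueryUtils.toTableRow(AvroUtils.toBeamRowStrict(kv.getValue(), beamSchema)))
                .withCreateDisposition(CreateDisposition.CREATE_IF_NEEDED)
                .withWriteDisposition(WriteDisposition.WRITE_APPEND));

    p.run();
  }
}

Fetching the schema at construction time keeps the table schema stable for the life of the job; if the subject evolves, the job needs to be redeployed to pick up new fields.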

Apache Beam: Refreshing a side input that I am reading from MongoDB using MongoDbIO.read()

Submitted by 白昼怎懂夜的黑 on 2020-06-29 04:20:09
Question: I am reading a PCollection mongodata from MongoDB and using this PCollection as a side input to my ParDo(DoFN).withSideInputs(PCollection). On the backend, my MongoDB collection is updated on a daily, monthly, or perhaps yearly basis, and I need those newly added values in my pipeline. You can think of this as refreshing the Mongo collection's value in a running pipeline. For example, the Mongo collection has 20K documents in total, and after one day three more records are added into the Mongo collection
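MongoDbIO.read() is a bounded read that runs once, so it cannot refresh on its own. The pattern usually suggested is Beam's "slowly updating side input": a GenerateSequence ticker drives a DoFn that re-queries MongoDB with the plain Mongo Java driver inside a re-triggered global window. A sketch under those assumptions follows; the connection string, database, and collection names are placeholders.

import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoClients;
import java.util.ArrayList;
import java.util.List;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.coders.ListCoder;
import org.apache.beam.sdk.coders.StringUtf8Coder;
import org.apache.beam.sdk.io.GenerateSequence;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.transforms.ParDo;
import org.apache.beam.sdk.transforms.View;
import org.apache.beam.sdk.transforms.windowing.AfterProcessingTime;
import org.apache.beam.sdk.transforms.windowing.GlobalWindows;
import org.apache.beam.sdk.transforms.windowing.Repeatedly;
import org.apache.beam.sdk.transforms.windowing.Window;
import org.apache.beam.sdk.values.PCollectionView;
import org.bson.Document;
import org.joda.time.Duration;

public class RefreshingSideInputSketch {
  public static void main(String[] args) {
    Pipeline p = Pipeline.create(PipelineOptionsFactory.fromArgs(args).create());

    // Emit one tick per day; each tick re-reads the Mongo collection, so the
    // side input picks up documents added since the last refresh.
    PCollectionView<List<String>> mongoView = p
        .apply("RefreshTicks", GenerateSequence.from(0).withRate(1, Duration.standardDays(1)))
        .apply("RefreshWindow", Window.<Long>into(new GlobalWindows())
            .triggering(Repeatedly.forever(AfterProcessingTime.pastFirstElementInPane()))
            .discardingFiredPanes())
        .apply("ReadMongo", ParDo.of(new DoFn<Long, List<String>>() {
          @ProcessElement
          public void processElement(ProcessContext c) {
            List<String> docs = new ArrayList<>();
            // Plain MongoDB driver here: MongoDbIO.read() is a bounded source
            // and cannot be re-executed inside a running pipeline.
            try (MongoClient client = MongoClients.create("mongodb://localhost:27017")) {
              for (Document d : client.getDatabase("mydb").getCollection("mycoll").find()) {
                docs.add(d.toJson());
              }
            }
            c.output(docs);
          }
        }))
        .setCoder(ListCoder.of(StringUtf8Coder.of()))
        .apply("AsSideInput", View.asSingleton());

    // Main input would then use: ParDo.of(new MyDoFn(mongoView)).withSideInputs(mongoView)
    p.run();
  }
}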

Dataflow / Apache Beam: Trigger window on number of bytes in window

Submitted by 老子叫甜甜 on 2020-06-27 15:14:52
Question: I have a simple job that moves data from Pub/Sub to GCS. The Pub/Sub topic is a shared topic with many different message types of varying size. I want the result in GCS to be vertically partitioned accordingly: Schema/version/year/month/day/. Under that parent key there should be a group of files for that day, and the files should be a reasonable size, i.e. 10-200 MB. I'm using Scio and I am able to do a groupBy operation to make a P/SCollection of [String, Iterable[Event]] where the key is based on the
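Beam has no built-in trigger that fires on the byte size of a pane. The question's code is Scio, but the usual workaround looks the same in either API: window the stream, write with FileIO.writeDynamic keyed by the partition path, and tune the window length and shard count so each shard lands in the target 10-200 MB range. A rough Java sketch under those assumptions; the subscription, bucket, and the extractPartitionPath helper are hypothetical placeholders.

import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.coders.StringUtf8Coder;
import org.apache.beam.sdk.io.FileIO;
import org.apache.beam.sdk.io.TextIO;
import org.apache.beam.sdk.io.gcp.pubsub.PubsubIO;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.transforms.windowing.FixedWindows;
import org.apache.beam.sdk.transforms.windowing.Window;
import org.joda.time.Duration;

public class PartitionedGcsWriteSketch {
  public static void main(String[] args) {
    Pipeline p = Pipeline.create(PipelineOptionsFactory.fromArgs(args).create());

    p.apply("ReadPubsub", PubsubIO.readStrings()
            .fromSubscription("projects/my-project/subscriptions/my-sub")) // placeholder
        .apply("Window", Window.<String>into(FixedWindows.of(Duration.standardMinutes(10))))
        .apply("WritePartitioned", FileIO.<String, String>writeDynamic()
            // Destination key per element, e.g. "schema/version/2020/06/27".
            .by(msg -> extractPartitionPath(msg))
            .withDestinationCoder(StringUtf8Coder.of())
            .via(TextIO.sink())
            .to("gs://my-bucket/events")                                   // placeholder
            .withNaming(partition -> FileIO.Write.defaultNaming(partition + "/part", ".json"))
            // No byte-size trigger exists; pick the window length and shard count so
            // that (throughput per window per partition) / numShards is 10-200 MB.
            .withNumShards(4));

    p.run();
  }

  // Hypothetical helper: parse the message and build "schema/version/yyyy/MM/dd".
  static String extractPartitionPath(String message) {
    return "defaultSchema/v1/2020/06/27";
  }
}

Recent Beam releases also offer GroupIntoBatches.ofByteSize for batching by payload size, but the sizing-by-shard-count approach above works on the older versions in use at the time of the question.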

AttributeError: 'module' object has no attribute 'ensure_str'

Submitted by 最后都变了- on 2020-06-27 11:05:28
Question: I am trying to transfer data from one BigQuery table to another through Beam, but the following error comes up: WARNING:root:Retry with exponential backoff: waiting for 4.12307941111 seconds before retrying get_query_location because we caught exception: AttributeError: 'module' object has no attribute 'ensure_str' Traceback for above exception (most recent call last): File "/usr/local/lib/python2.7/site-packages/apache_beam/utils/retry.py", line 197, in wrapper return fun(*args, **kwargs) File "