google-cloud-pubsub

Reading from Pubsub using Dataflow Java SDK 2

Submitted by 放肆的年华 on 2019-12-14 02:17:57
Question: A lot of the documentation for the Google Cloud Platform Java SDK 2.x tells you to reference the Beam documentation. When reading from Pub/Sub using Dataflow, should I still be doing PubsubIO.Read.named("name").topic(""); or should I be doing something else? Building on that, is there a way to just print the Pub/Sub data received by the Dataflow job to standard output or to a file?

Answer 1: For Apache Beam 2.2.0, you can define the following transform to pull messages from a Pub/Sub subscription:
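The answer's actual Java snippet is cut off in this excerpt. In the Java SDK 2.x the read transform is built with PubsubIO.readStrings() or PubsubIO.readMessages() plus fromTopic()/fromSubscription(), rather than the SDK 1.x PubsubIO.Read.named(...) form. As a rough, non-authoritative illustration of the same read-and-print idea, here is a sketch using the Beam Python SDK; the project and subscription names are placeholders, and on Dataflow anything written with print ends up in the worker logs rather than a local console.

# Sketch only: pull from a subscription and dump each message for debugging.
import apache_beam as beam
from apache_beam.io.gcp.pubsub import ReadFromPubSub
from apache_beam.options.pipeline_options import PipelineOptions, StandardOptions

options = PipelineOptions()
options.view_as(StandardOptions).streaming = True  # Pub/Sub is an unbounded source

with beam.Pipeline(options=options) as p:
    (p
     | 'ReadFromSub' >> ReadFromPubSub(
           subscription='projects/my-project/subscriptions/my-subscription')  # placeholder
     | 'Decode' >> beam.Map(lambda payload: payload.decode('utf-8'))
     | 'Print' >> beam.Map(print))  # debugging only; use a file/GCS sink for real output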

Is it possible to perform real-time communication with a Google Compute Engine instance?

Submitted by 半世苍凉 on 2019-12-13 05:34:59
Question: I would like to run a program on my laptop (the Gazebo simulator) and send a stream of image data to a GCE instance, where it will be run through an object-detection network and sent back to my laptop in near real time. Is such a set-up possible? My best idea right now is, for each image: save the image as a JPEG on my personal machine, stream the JPEG to a Cloud Storage bucket, access the storage bucket from my GCE instance and transfer the file to the instance, and, in my Python script, convert the
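As a hedged illustration of the "stream the JPEG to a Cloud Storage bucket" step of that plan, a minimal upload with the google-cloud-storage client might look like the sketch below; the bucket, object, and file names are placeholders, not taken from the question.

from google.cloud import storage

def upload_frame(local_path, bucket_name, object_name):
    # Upload one JPEG from the laptop to a Cloud Storage bucket.
    client = storage.Client()
    blob = client.bucket(bucket_name).blob(object_name)
    blob.upload_from_filename(local_path)
    return f'gs://{bucket_name}/{object_name}'

# e.g. upload_frame('frame_0001.jpg', 'my-sim-frames', 'frames/frame_0001.jpg')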

Apache Beam: parsing Dataflow Pub/Sub messages into a dictionary

Submitted by 别来无恙 on 2019-12-13 04:26:08
Question: I am running a streaming pipeline using Beam / Dataflow. I am reading my input from Pub/Sub and converting it into a dict as below:

raw_loads_dict = (p
    | 'ReadPubsubLoads' >> ReadFromPubSub(topic=PUBSUB_TOPIC_NAME).with_output_types(bytes)
    | 'JSONParse' >> beam.Map(lambda x: json.loads(x))
)

Since this is done on each element of a high-throughput pipeline, I am worried that this is not the most efficient way to do it. What is the best practice in this case, considering I am then manipulating the
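As a hedged variant of the same read-and-parse step, passing json.loads to beam.Map directly removes the extra lambda call per element, though the per-element JSON parse itself is unavoidable either way; the topic argument below is a placeholder.

import json
import apache_beam as beam
from apache_beam.io.gcp.pubsub import ReadFromPubSub

def read_loads(p, topic):
    # Same read-and-parse step; json.loads accepts bytes directly on Python 3.6+.
    return (p
        | 'ReadPubsubLoads' >> ReadFromPubSub(topic=topic).with_output_types(bytes)
        | 'JSONParse' >> beam.Map(json.loads))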

Dataflow failing to push messages to BigQuery from PubSub

Submitted by 戏子无情 on 2019-12-13 04:14:51
Question: I am trying to get a data pipeline working. I am using the Python client library to insert records into Pub/Sub. From there, Dataflow is supposed to pick them up and push them into BigQuery. Dataflow is failing. My guess is that I don't have the right encoding for the data. My code looks like this:

data = base64.b64encode(message)
publisher.publish(topic_path, data=data)

where message is a string. This is the JSON object which I am trying to push: { "current_speed" : "19.77", "_east" : "-87.654561",
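The excerpt ends before any answer, but one common cause of this kind of failure is worth sketching: the Pub/Sub Python client takes raw bytes and handles any wire-level encoding itself, so base64-encoding the payload first typically means the Dataflow side receives base64 text instead of the JSON it expects. A hedged sketch of publishing the JSON directly follows; the project and topic names, and any fields beyond the two shown in the question, are placeholders.

import json
from google.cloud import pubsub_v1

publisher = pubsub_v1.PublisherClient()
topic_path = publisher.topic_path('my-project', 'my-topic')  # placeholder names

record = {
    "current_speed": "19.77",
    "_east": "-87.654561",
    # ... remaining fields from the original JSON object
}

# Publish the UTF-8 encoded JSON as-is; no base64 step.
future = publisher.publish(topic_path, data=json.dumps(record).encode('utf-8'))
print(future.result())  # message ID once the publish succeeds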

Writing to GCS with Dataflow using element count

Submitted by ℡╲_俬逩灬. on 2019-12-13 04:00:18
Question: This is in reference to Apache Beam SDK version 2.2.0. I'm attempting to use AfterPane.elementCountAtLeast(...) but not having any success so far. What I want looks a lot like "Writing to Google Cloud Storage from PubSub using Cloud Dataflow using DoFn", but it needs to be adapted to 2.2.0. Ultimately I just need a simple OR where a file is written after X elements OR Y time has passed. I intend to set the time very high so that the write happens on the number of elements in the majority of cases,
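The question is about the Java SDK, where the analogous combination is roughly Repeatedly.forever(AfterFirst.of(AfterPane.elementCountAtLeast(X), AfterProcessingTime.pastFirstElementInPane().plusDelayOf(...))). For illustration only, the same "fire after X elements OR after Y time, repeatedly" trigger shape can be sketched with the Beam Python SDK; the count, delay, and upstream PCollection below are placeholders.

import apache_beam as beam
from apache_beam import window
from apache_beam.transforms import trigger

def window_for_batched_writes(messages):
    # `messages` is assumed to be a streaming PCollection read from Pub/Sub upstream.
    return (messages
        | 'BatchTrigger' >> beam.WindowInto(
              window.GlobalWindows(),
              trigger=trigger.Repeatedly(
                  trigger.AfterAny(
                      trigger.AfterCount(1000),                # fire after X elements...
                      trigger.AfterProcessingTime(60 * 60))),  # ...or after Y seconds
              accumulation_mode=trigger.AccumulationMode.DISCARDING))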

How To Filter None Values Out Of PCollection

Submitted by 点点圈 on 2019-12-13 03:58:41
Question: My Pub/Sub pull subscription is sending over the message and a None value for each message. I need to find a way to filter out the None values as part of my pipeline processing. Of course, some help preventing the None values from arriving from the pull subscription would be nice, but I feel like I'm missing something about the general workflow of defining and applying functions via ParDo. I've set up a function to filter out None values, which seems to work based on a print-to-console check,
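For reference, dropping the None values is usually a one-liner with beam.Filter; the explicit DoFn version below is closer to the ParDo workflow the question describes. Both are sketches, not the asker's code.

import apache_beam as beam

def drop_nones(pcoll):
    # Keep only elements that are not None.
    return pcoll | 'FilterNones' >> beam.Filter(lambda element: element is not None)

class FilterNoneFn(beam.DoFn):
    # Equivalent written as an explicit DoFn, applied with beam.ParDo(FilterNoneFn()).
    def process(self, element):
        if element is not None:
            yield element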

Catch error code from GCP pub/sub

Submitted by ▼魔方 西西 on 2019-12-13 03:27:32
Question: I am using the Go package for Pub/Sub. On my API dashboard I see this error (google.pubsub.v1.Subscriber.StreamingPull - error code 503). Per the docs (https://cloud.google.com/pubsub/docs/reference/error-codes) it seems to be a transient condition, but it is better to implement a backoff strategy (https://cloud.google.com/storage/docs/exponential-backoff). The question is that I am not able to wrap my head around where this error code surfaces in the Receive method. Here is the func:

err = sub.Receive(ctx, func(ctx context.Context,
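The excerpt ends before any answer. Two hedged notes: the 503s shown on the API dashboard for StreamingPull are typically retried inside the client library, which is likely why they never reach the Receive callback; and the exponential-backoff pattern the question links to is generic enough to sketch on its own. The helper below is only an illustration in Python (the question itself concerns the Go client), with made-up parameter values.

import random
import time

def with_backoff(operation, max_retries=5, base_delay=1.0, max_delay=64.0):
    # Retry `operation` on failure, doubling the wait (plus jitter) after each attempt.
    for attempt in range(max_retries):
        try:
            return operation()
        except Exception:
            if attempt == max_retries - 1:
                raise
            delay = min(max_delay, base_delay * (2 ** attempt)) + random.uniform(0, 1)
            time.sleep(delay)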

Unknown configuration 'errors.deadletterqueue.topic.name'

Submitted by 允我心安 on 2019-12-13 00:23:37
Question: I am trying to configure a Kafka Connect sink for the Google Cloud Pub/Sub service, using the following command to configure Kafka Connect:

curl -X POST -H 'Content-Type: application/json' -H 'Accept: application/json' -d '{
  "name": "pubsub_test",
  "config": {
    "connector.class": "com.google.pubsub.kafka.sink.CloudPubSubSinkConnector",
    "tasks.max": "1",
    "topics": "kafka_test_topic",
    "cps.topic": "cps_test_topic",
    "cps.project": "cps_test_project"
  }
}' http://localhost:8083/connectors

In the status, I have a

Is google.cloud.pubsub_v1.PublisherClient Thread safe?

Submitted by 亡梦爱人 on 2019-12-12 19:09:26
Question: I am using Google Cloud Pub/Sub and was wondering whether google.cloud.pubsub_v1.PublisherClient is thread safe. Do I need to pass a new instance of this object to each threading.Thread, or is it safe to share the same instance across threads?

Answer 1: It depends on the client library you are using. This Python client library is not thread safe, because it is built on top of the httplib2 library, which is not thread-safe. But, as the first link says, this is an old library. The newer Python library
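The answer above is cut off, but the usage it refers to can be sketched. Assuming the newer google-cloud-pubsub client (whose publish() batches messages internally and returns a future), a single shared PublisherClient used from a pool of threads looks roughly like this; the project and topic names are placeholders.

from concurrent.futures import ThreadPoolExecutor
from google.cloud import pubsub_v1

publisher = pubsub_v1.PublisherClient()                      # one shared instance
topic_path = publisher.topic_path('my-project', 'my-topic')  # placeholder names

def publish_from_worker(i):
    future = publisher.publish(topic_path, data=f'message {i}'.encode('utf-8'))
    return future.result()  # blocks until the server returns a message ID

with ThreadPoolExecutor(max_workers=8) as pool:
    print(list(pool.map(publish_from_worker, range(100))))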

Node.js on Google Cloud Platform Pub/Sub tutorial worker is failing with “TypeError: Cannot call method 'on' of null”

Submitted by 吃可爱长大的小学妹 on 2019-12-12 13:28:59
Question: I'm getting an error while working through https://cloud.google.com/nodejs/getting-started/using-pub-sub. (I've successfully completed the previous tutorials in the series.) With the command SCRIPT=worker.js PORT=8081 npm start, I get this error related to background.js:

TypeError: Cannot call method 'on' of null
    at /Users/xke/Documents/node.js/6-pubsub/lib/background.js:57:20
    at /Users/xke/Documents/node.js/6-pubsub/node_modules/gcloud/lib/pubsub/index.js:256:7
    at /Users/xke/Documents/node.js