How to log incoming messages in an Apache Beam pipeline

Submitted by 为君一笑 on 2020-01-16 19:06:49

Question


I am writing a simple Apache Beam streaming pipeline that takes input from a Pub/Sub topic and stores it in BigQuery. For hours I thought I was not even able to read a message, because I was simply trying to log the input to the console:

events = p | 'Read PubSub' >> ReadFromPubSub(subscription=SUBSCRIPTION)
logging.info(events)

When I write this to a text file it works fine! However, my call to the logger never happens.

How do people develop / debug these streaming pipelines?

I have tried adding the following line: events | 'Log' >> logging.info(events)

Using print() also yields no results in the console.


Answer 1:


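This is because events is a PCollection, so you need to apply a PTransform to it. Calling logging.info(events) runs once, while the pipeline is being constructed, and receives the PCollection object itself rather than the messages that flow through it at run time.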

The simplest way would be to apply a ParDo to events:

events | 'Log results' >> beam.ParDo(LogResults())

which is defined as:

import logging

import apache_beam as beam

class LogResults(beam.DoFn):
  """Just log the results"""
  def process(self, element):
    # Called once per element while the pipeline is running.
    logging.info("Pub/Sub event: %s", element)
    yield element

Notice that I also yield the element in case you want to apply further steps downstream, such as writing to a sink after logging the elements. See the issue here, for example.



Source: https://stackoverflow.com/questions/56912517/how-to-log-incoming-messages-in-apache-beam-pipeline
