Exception Handling in Apache Beam pipelines using Python

半阙折子戏 2021-01-19 10:01

I'm building a simple pipeline with Apache Beam in Python (on GCP Dataflow) that reads from Pub/Sub and writes to BigQuery, but I can't handle exceptions in the pipeline to create alte

2 answers
  •  庸人自扰
    2021-01-19 10:47

    You can also use the generator flavor of FlatMap.

    This is similar to the other answer, in that you can use a DoFn in place of something else, e.g. a CombineFn, so that no output is produced when there is an exception or some other failed precondition:

    import logging
    from typing import Generator, List

    import apache_beam as beam


    def sum_values(values: List[int]) -> Generator[int, None, None]:
        # Returning before any yield makes FlatMap emit nothing for this element.
        if not values or len(values) < 10:
            logging.error(f'received invalid inputs: {...}')
            return
        yield sum(values)


    # Use this instead of CombinePerKey:
    (inputs
      | 'WithKey' >> beam.Map(lambda x: (x.key, x))
      | 'GroupByKey' >> beam.GroupByKey()
      | 'Values' >> beam.Values()
      | 'MaybeSum' >> beam.FlatMap(sum_values))
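
Why this works can be checked without a Beam runner. The sketch below is plain Python, not Beam code: `flat_map` is a hypothetical stand-in that only imitates how FlatMap flattens whatever the function returns, and the logged message content and the sample `groups` are assumptions made for illustration. A generator that returns before its first `yield` is simply an empty iterable, so it contributes nothing to the output.

```python
import logging
from itertools import chain
from typing import Callable, Iterable, Iterator, List


def sum_values(values: List[int]) -> Iterator[int]:
    """Yield the sum, or yield nothing when the precondition fails."""
    if len(values) < 10:
        # Assumption: log the offending input; the original elides this detail.
        logging.error('received invalid inputs: %r', values)
        return  # ends the generator with zero outputs
    yield sum(values)


def flat_map(fn: Callable[[List[int]], Iterable[int]],
             elements: Iterable[List[int]]) -> List[int]:
    """Hypothetical stand-in for FlatMap: apply fn per element, flatten the results."""
    return list(chain.from_iterable(fn(element) for element in elements))


# Sample inputs (made up): one valid group of ten values and two invalid groups.
groups = [list(range(10)), [1, 2, 3], []]
results = flat_map(sum_values, groups)
print(results)  # [45]
```

Only the ten-element group produces an output; the two invalid groups are logged and dropped, which is the behavior the FlatMap step relies on.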
    
