Apache Beam : FlatMap vs Map?

后端 未结 3 549
渐次进展
渐次进展 2020-12-14 07:35

I want to understand in which scenario that I should use FlatMap or Map. The documentation did not seem clear to me.

I still do not understand in which scenario I sh

相关标签:
3条回答
  • 2020-12-14 08:05

    Let me show you one example

    import apache_beam as beam
    
    def categorize_explode(text):
      result = text.split(':')
      category = result[0]
      elements = result[1].split(',')
      return list(map(lambda x: (category, x), elements))
    
    with beam.Pipeline() as pipeline:
      things = (
          pipeline
          | 'Categories and Elements' >> beam.Create(["Vehicles:Car,Jeep,Truck,BUS,AIRPLANE","FOOD:Burger,SANDWICH,ICECREAM,APPLE"])
          | 'Explode' >> beam.FlatMap(categorize_explode)
          | beam.Map(print)
      )
    

    As you can see categorize_explode function splits the strings into categories and corresponding elements and returns iterator like [('Vehicles','Car'),('Vehicles','Jeep'),...]

    FlatMap takes each element in this iterator and treats each element as a separate element in PCollection.

    So the result would be:

    ('Vehicles', 'Car')
    ('Vehicles', 'Jeep')
    ('Vehicles', 'Truck')
    ('Vehicles', 'BUS')
    ('Vehicles', 'AIRPLANE')
    ('FOOD', 'Burger')
    ('FOOD', 'SANDWICH')
    ('FOOD', 'ICECREAM')
    ('FOOD', 'APPLE')
    

    While Map performs one to one mapping. i.e. this iterator [('Vehicles','Car'),('Vehicles','Jeep'),...] would be returned as it is.

    So the result would be for Map:

    [('Vehicles', 'Car'), ('Vehicles', 'Jeep'), ('Vehicles', 'Truck'), ('Vehicles', 'BUS'), ('Vehicles', 'AIRPLANE')]
    [('FOOD', 'Burger'), ('FOOD', 'SANDWICH'), ('FOOD', 'ICECREAM'), ('FOOD', 'APPLE')]
    

    The approach I have used is somewhat similar to spark explode transform.

    Hope this helps!!!

    0 讨论(0)
  • 2020-12-14 08:09

    These transforms in Beam are exactly same as Spark (Scala too).

    A Map transform, maps from a PCollection of N elements into another PCollection of N elements.

    A FlatMap transform maps a PCollections of N elements into N collections of zero or more elements, which are then flattened into a single PCollection.

    As a simple example, the following happens:

    beam.Create([1, 2, 3]) | beam.Map(lambda x: [x, 'any'])
    # The result is a collection of THREE lists: [[1, 'any'], [2, 'any'], [3, 'any']]
    

    Whereas:

    beam.Create([1, 2, 3]) | beam.FlatMap(lambda x: [x, 'any'])
    # The lists that are output by the lambda, are then flattened into a
    # collection of SIX single elements: [1, 'any', 2, 'any', 3, 'any']
    
    0 讨论(0)
  • 2020-12-14 08:23

    In simplest word,

    Map transformation is "one to one" mapping on each element of list/collection. Ex -

    {"Amar", "Akabar", "Anthony"} -> {"Mr.Amar", "Mr.Akabar", "Mr.Anthony"}
    

    FlatMap transformation is usually on collection like "list of list", and this collection gets flattened to single list and transformation/mapping is applied on each element of "list of list"/collection.

    FlatMap transformation Ex -

    { {"Amar", "Akabar"},  "Anthony"} -> {"Mr.Amar", "Mr.Akabar", "Mr.Anthony"}
    

    This concept remains same across programming language and across platform.

    Hope it helps.

    0 讨论(0)
提交回复
热议问题