Scala return value calculated in foreach

慢半拍i 2020-12-22 14:12

I am new to Scala and Spark and am trying to understand a few basic things here.

I am using Spark 1.5.

Why does the value of sum not ge…

2 Answers
  •  余生分开走
    2020-12-22 14:58

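    The way you reason about the program is wrong. foreach is executed independently on each executor, and each task works on its own deserialized copy of the closure, including its own copy of sum. The sum variable on the driver is never updated. There is no global shared state here. Just count the values directly: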

    df.select("column1").distinct.count
    

    If you really want to do this manually, you need some kind of reduce:

    df.select("column1").distinct.rdd.map(_ => 1L).reduce(_ + _)
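
To see why foreach cannot update sum while a reduce can, here is a minimal plain-Scala sketch that runs without Spark. It assumes a simplified model: shipping a task to an executor is imitated by a Java-serialization round trip (Spark does serialize task closures, but the real mechanism involves much more). The names Counter, roundTrip, partitions, foreachSum and reduceSum are hypothetical and exist only for this illustration.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, ObjectInputStream, ObjectOutputStream}

// Hypothetical stand-in for the mutable `sum` captured by the foreach closure.
final class Counter(var value: Long) extends Serializable

object ClosureCopyDemo {
  // Serialize and deserialize an object. This loosely models (assumption)
  // how a task and everything its closure captures reach an executor.
  def roundTrip[T <: java.io.Serializable](obj: T): T = {
    val bytes = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(bytes)
    out.writeObject(obj)
    out.close()
    val in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray))
    try in.readObject().asInstanceOf[T] finally in.close()
  }

  // Hypothetical "partitions" of the distinct column1 values.
  val partitions: Seq[Seq[String]] = Seq(Seq("a", "b"), Seq("c", "d", "e"))

  // Broken pattern: each "executor" increments its own deserialized copy.
  def foreachSum(): Long = {
    val sum = new Counter(0L)
    partitions.foreach { part =>
      val executorCopy = roundTrip(sum)
      part.foreach(_ => executorCopy.value += 1)
    }
    sum.value // the driver's Counter was never touched
  }

  // Working pattern, mirroring map(_ => 1L).reduce(_ + _): each partition
  // returns a partial count and the partial counts are combined.
  def reduceSum(): Long =
    partitions.map(part => part.map(_ => 1L).sum).reduce(_ + _)

  def main(args: Array[String]): Unit = {
    println(s"foreach: ${foreachSum()}")
    println(s"reduce:  ${reduceSum()}")
  }
}
```

Calling foreachSum() returns 0 because every increment lands on a deserialized copy, while reduceSum() returns 5, the number of elements across both hypothetical partitions. The reduce version works for the same reason the reduce above does: results travel back as return values instead of through shared mutable state.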
    
