Using Java 8 parallelStream inside Spark mapPartitions
Question

I am trying to understand how a Java 8 parallel stream behaves inside Spark's parallelism. When I run the code below, I expect the size of listOfThings to match the input size. That is not the case: items are sometimes missing from the output, and the behavior is not consistent. If I simply iterate through the iterator instead of using parallelStream, everything is fine and the count matches every time.

// listRDD.count = 10
JavaRDD test = listRDD.mapPartitions(iterator -> { List
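
The code above is cut off, so the cause cannot be confirmed from what is shown. One common reason for intermittently missing items is adding to a non-thread-safe collection such as ArrayList from inside a parallel stream's forEach. Assuming that is what happens here, the following is a minimal sketch, in plain Java without Spark, of letting collect() do the accumulation instead. The class name PartitionStreamSketch, the helper transform, and the element types are hypothetical and not taken from the original code.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.stream.Collectors;

public class PartitionStreamSketch {

    // Hypothetical stand-in for the per-element work done for each item of a partition.
    public static String transform(Integer value) {
        return "thing-" + value;
    }

    // Processes one partition's iterator with a parallel stream.
    // Assumption: the partition fits in memory, so it can be buffered into a list first.
    public static List<String> processPartition(Iterator<Integer> partition) {
        // Drain the iterator sequentially on the calling thread.
        List<Integer> buffered = new ArrayList<>();
        partition.forEachRemaining(buffered::add);

        // collect() gives each worker thread its own container and merges them,
        // so no shared mutable list is written to concurrently.
        return buffered.parallelStream()
                .map(PartitionStreamSketch::transform)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Integer> input = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
            input.add(i);
        }
        List<String> listOfThings = processPartition(input.iterator());
        // The output size should equal the input size on every run.
        System.out.println(listOfThings.size());
    }
}
```

Inside Spark, a method like processPartition would be called from the mapPartitions lambda and its result returned from there. If the original code instead needs forEach with side effects, a thread-safe target such as a synchronized list is another option, at the cost of contention between threads.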