Is groupByKey ever preferred over reduceByKey?
Question: I always use reduceByKey when I need to group data in RDDs, because it performs a map-side reduce before shuffling data, which often means that less data gets shuffled and I thus get better performance. Even when the map-side reduce function collects all values and does not actually reduce the amount of data, I still use reduceByKey, because I'm assuming that the performance of reduceByKey will never be worse than that of groupByKey. However, I'm wondering if this assumption is correct or if