Comparing two RDDs

喜夏-厌秋 submitted on 2019-12-02 08:44:43

new RDD containing just the entries of rdd2 not in rdd1

A left join would retain all keys in rdd1 and append the columns of rdd2 for matching keys. So a left join/outer join is clearly not the solution.

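rdd2Grouped.subtractByKey(rdd1Grouped) would be apt in your case. subtractByKey keeps the pairs of the RDD it is called on whose keys are absent from the argument, so rdd2Grouped has to be the receiver to get the entries of rdd2 not in rdd1.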
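As a minimal sketch of how subtractByKey behaves for the goal quoted above (entries of rdd2 not in rdd1), the snippet below runs it on a tiny local dataset. The sample data, the `Grouped` type alias, the `onlyInRdd2` helper, and the local master setting are all assumptions made for illustration; the question's real RDDs may carry different key and value types.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object SubtractByKeySketch {
  // Assumed shape of the grouped RDDs: a String key with its grouped values.
  type Grouped = RDD[(String, Iterable[String])]

  // Pairs of rdd2Grouped whose keys never occur in rdd1Grouped.
  // subtractByKey compares keys only; the values of rdd1Grouped are ignored.
  def onlyInRdd2(rdd1Grouped: Grouped, rdd2Grouped: Grouped): Grouped =
    rdd2Grouped.subtractByKey(rdd1Grouped)

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("subtract-by-key-sketch").setMaster("local[*]")
    val sc = new SparkContext(conf)
    try {
      // Hypothetical sample data, grouped by key.
      val rdd1Grouped = sc.parallelize(Seq("a" -> "1", "b" -> "2")).groupByKey()
      val rdd2Grouped = sc.parallelize(Seq("b" -> "20", "c" -> "30", "c" -> "31")).groupByKey()

      // Key "b" exists in both RDDs and is dropped; only key "c" survives.
      onlyInRdd2(rdd1Grouped, rdd2Grouped)
        .collect()
        .foreach { case (key, values) => println(s"$key -> ${values.mkString(",")}") }
    } finally {
      sc.stop()
    }
  }
}
```

Note that the receiver matters: calling subtractByKey on rdd1Grouped instead would return the keys that exist only in rdd1.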

P.S.: Also note that if rdd1 is small, it is better to broadcast it. That way, only the second RDD is streamed at the time of the subtract.
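An RDD cannot be broadcast directly, so one way to read the suggestion above is to collect the keys of the small RDD on the driver, broadcast that set, and filter the larger RDD against it. The sketch below assumes exactly that, and assumes the key set of rdd1Grouped fits in memory on the driver and on every executor. The `Grouped` alias, the `onlyInRdd2` helper, and the sample data are hypothetical.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object BroadcastKeysSketch {
  // Assumed shape of the grouped RDDs: a String key with its grouped values.
  type Grouped = RDD[(String, Iterable[String])]

  // Keeps the pairs of rdd2Grouped whose keys are not among the keys of rdd1Grouped.
  def onlyInRdd2(rdd1Grouped: Grouped, rdd2Grouped: Grouped): Grouped = {
    // Assumption: rdd1Grouped is small enough to collect its keys on the driver.
    val rdd1Keys = rdd2Grouped.sparkContext.broadcast(rdd1Grouped.keys.collect().toSet)
    // Each task looks keys up in the broadcast set, so rdd2Grouped is not shuffled.
    rdd2Grouped.filter { case (key, _) => !rdd1Keys.value.contains(key) }
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("broadcast-keys-sketch").setMaster("local[*]")
    val sc = new SparkContext(conf)
    try {
      // Hypothetical sample data, grouped by key.
      val rdd1Grouped = sc.parallelize(Seq("a" -> "1", "b" -> "2")).groupByKey()
      val rdd2Grouped = sc.parallelize(Seq("b" -> "20", "c" -> "30")).groupByKey()

      // Key "b" is in the broadcast set and is filtered out; key "c" remains.
      onlyInRdd2(rdd1Grouped, rdd2Grouped)
        .collect()
        .foreach { case (key, values) => println(s"$key -> ${values.mkString(",")}") }
    } finally {
      sc.stop()
    }
  }
}
```

The trade-off is memory for network traffic: the filter avoids a shuffle of the large RDD, but it is only reasonable when the collected key set is small.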

Switch rdd1Grouped and rdd2Grouped, and then use filter:

// Keep only the keys of rdd2Grouped that found no match in rdd1Grouped
val output = rdd2Grouped
  .leftOuterJoin(rdd1Grouped)
  .filter { case (_, (_, rdd1Values)) => rdd1Values.isEmpty }
  .collect()