PySpark: join RDDs by a specific key
Question

I have two RDDs that I need to join together. They look like the following:

RDD1
[(u'2', u'100', 2), (u'1', u'300', 1), (u'1', u'200', 1)]

RDD2
[(u'1', u'2'), (u'1', u'3')]

My desired output is:

[(u'1', u'2', u'100', 2)]

That is, I would like to select the tuples from RDD2 whose second value matches the first value of a tuple in RDD1, and combine the matching tuples. I have tried join and also cartesian, and neither works or gets even close to what I am looking for. I am new to Spark and would appreciate any help. Thanks.

Answer 1:
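A minimal sketch of one way to do this (the names rdd1, rdd2, keyed1, and keyed2 are illustrative): RDD.join matches on the first element of each pair, so re-key both RDDs on the shared value first. Map RDD2 to pairs keyed by its second element, map RDD1 to pairs keyed by its first element, join, and flatten the nested result back into a 4-tuple.

```python
from pyspark import SparkContext

sc = SparkContext("local", "join-example")

rdd1 = sc.parallelize([(u'2', u'100', 2), (u'1', u'300', 1), (u'1', u'200', 1)])
rdd2 = sc.parallelize([(u'1', u'2'), (u'1', u'3')])

# Key RDD1 by its first element: (key, (second, third))
keyed1 = rdd1.map(lambda t: (t[0], (t[1], t[2])))

# Key RDD2 by its *second* element, carrying the first along: (key, first)
keyed2 = rdd2.map(lambda t: (t[1], t[0]))

# Inner join on the shared key, then flatten the nested tuples
# (key, (rdd2_first, (rdd1_second, rdd1_third))) -> 4-tuple
joined = keyed2.join(keyed1).map(
    lambda kv: (kv[1][0], kv[0], kv[1][1][0], kv[1][1][1]))

print(joined.collect())
# [(u'1', u'2', u'100', 2)]
```

The inner join drops unmatched entries automatically, which is why (u'1', u'3') from RDD2 disappears: no tuple in RDD1 starts with u'3'. A plain join (rather than cartesian) is the right tool here because the match condition is equality on a single field.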