PySpark: suggestion on how to organize an RDD
Question: I'm a Spark newbie, and I'm trying to test whether Spark offers any performance gains for the size of data I'm working with. Each object in my RDD contains a time, an id, and a position. I want to compare the positions of groups that share the same time and the same id. So I would first run the following to group by id:

grouped_rdd = rdd.map(lambda x: (x.id, [x])).groupByKey()

I would then like to further break each group down by the time of each object. Any suggestions? Thanks!

Answer 1: First
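One common way to achieve this (a sketch, not taken from the original answer) is to key each record on a composite (id, time) tuple, so that a single groupByKey collects the records sharing both fields: rdd.map(lambda x: ((x.id, x.time), x)).groupByKey(). The same grouping logic can be illustrated in plain Python, using a hypothetical Record type that mirrors the objects described in the question:

```python
from collections import defaultdict, namedtuple

# Hypothetical record type mirroring the objects in the question.
Record = namedtuple("Record", ["time", "id", "position"])

def group_by_id_and_time(records):
    """Group records by a composite (id, time) key, mimicking
    rdd.map(lambda x: ((x.id, x.time), x)).groupByKey() in PySpark."""
    groups = defaultdict(list)
    for r in records:
        groups[(r.id, r.time)].append(r)
    return dict(groups)

records = [
    Record(time=1, id="a", position=(0, 0)),
    Record(time=1, id="a", position=(1, 1)),
    Record(time=2, id="a", position=(2, 2)),
    Record(time=1, id="b", position=(5, 5)),
]

grouped = group_by_id_and_time(records)
# Records with the same id *and* time land in the same group,
# so their positions can be compared directly within each group.
```

Keying on the tuple up front avoids the two-stage grouping (first by id, then by time) and the list-wrapping in the question's map step, which groupByKey would otherwise turn into an iterable of single-element lists.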