Apache Spark update a row in an RDD or Dataset based on another row
问题 I'm trying to figure how I can update some rows based on another another row. For example, I have some data like Id | useraname | ratings | city -------------------------------- 1, philip, 2.0, montreal, ... 2, john, 4.0, montreal, ... 3, charles, 2.0, texas, ... I want to update the users in the same city to the same groupId (either 1 or 2) Id | useraname | ratings | city -------------------------------- 1, philip, 2.0, montreal, ... 1, john, 4.0, montreal, ... 3, charles, 2.0, texas, ...