How does Distinct() function work in Spark?

后悔当初 2020-12-02 15:43

I'm a newbie to Apache Spark and am learning the basic functionality. I have a small question. Suppose I have an RDD of tuples (key, value) and want to obtain some unique ones ou

5 answers
  •  北海茫月
    2020-12-02 16:43

    distinct uses the hashCode and equals methods of the objects to decide whether two elements are the same. Tuples come with built-in equality, which delegates to the equality of the object at each position, so distinct works against the entire Tuple2 object: two pairs are duplicates only if both the key and the value match. As Paul pointed out, you can call keys or values and then distinct. Or you can compute the distinct values per key via aggregateByKey, which keeps the key pairing. Or, if you want the distinct keys, you could use a regular aggregate
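
    The three options above can be modelled with a minimal sketch in plain Python. This is not Spark code: the sample data, the two "partitions", and the function names are all made up for illustration, and Python's tuple hashing and equality only stand in for hashCode and equals on Tuple2. The last function mimics what aggregateByKey does when the zero value is an empty set: values are added within each partition, then the per-partition sets are merged.

```python
from collections import defaultdict

# Hypothetical (key, value) pairs split across two "partitions",
# standing in for an RDD. Nothing here uses the Spark API.
partitions = [
    [(1, 20), (1, 21), (1, 20)],
    [(2, 20), (2, 22), (1, 21), (2, 20)],
]


def distinct_pairs(parts):
    """Model of distinct on the pair RDD: whole tuples are compared,
    so (1, 20) and (1, 21) are different elements and both survive."""
    return {pair for part in parts for pair in part}


def distinct_keys(parts):
    """Model of taking the keys first and then calling distinct."""
    return {key for part in parts for key, _ in part}


def distinct_values_by_key(parts):
    """Model of aggregateByKey with an empty set as the zero value."""
    per_partition = []
    for part in parts:
        acc = defaultdict(set)
        for key, value in part:
            acc[key].add(value)      # within-partition step: add a value to the set
        per_partition.append(acc)

    merged = defaultdict(set)
    for acc in per_partition:
        for key, values in acc.items():
            merged[key] |= values    # cross-partition step: union the sets
    return dict(merged)


pairs = distinct_pairs(partitions)
keys = distinct_keys(partitions)
values_by_key = distinct_values_by_key(partitions)
```

    With this sample data, pairs holds four tuples, keys holds the two keys 1 and 2, and values_by_key maps each key to the set of values seen with it.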
