How does Distinct() function work in Spark?



 5  957

后悔当初 2020-12-02 15:43

I'm a newbie to Apache Spark and am learning the basic functionality, and I have a small question. Suppose I have an RDD of (key, value) tuples and want to obtain the unique ones ou…

5 answers

北海茫月 (OP)

2020-12-02 16:43

distinct uses the hashCode and equals methods of the objects to decide which ones are duplicates. Tuples come with built-in equality that delegates to the equality of each element in each position. So distinct operates on the entire Tuple2 object: two pairs count as duplicates only if both the key and the value are equal. As Paul pointed out, you can call keys or values and then distinct. Alternatively, you can compute the distinct values per key yourself via aggregateByKey, which keeps the key pairing. Or, if you want the distinct keys, you could use a regular aggregate.
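To make the hashing/equality point concrete, here is a minimal local sketch in plain Python (no cluster needed), so it is a model of the behaviour, not Spark's own code. PySpark's RDD.distinct() is written roughly as map(lambda x: (x, None)).reduceByKey(lambda x, _: x).map(lambda x: x[0]), meaning whole records are hashed to a partition and then compared for equality there. The helper names local_distinct and local_distinct_values_per_key are hypothetical, and Python's built-in hash stands in for Spark's partitioner. On a real RDD the corresponding calls would be rdd.distinct(), rdd.keys().distinct(), rdd.values().distinct(), or rdd.aggregateByKey(set(), lambda s, v: s | {v}, lambda a, b: a | b) for distinct values per key.

```python
from collections import defaultdict


def local_distinct(records, num_partitions=4):
    """Hypothetical local model of RDD.distinct().

    Each record is routed to a partition by its hash; equal records have
    equal hashes, so duplicates always meet in the same partition, where a
    dict (which relies on __hash__ and __eq__) keeps a single copy.
    """
    partitions = [dict() for _ in range(num_partitions)]
    for rec in records:
        part = partitions[hash(rec) % num_partitions]
        part.setdefault(rec, None)  # the whole tuple is the dedup key
    return [rec for part in partitions for rec in part]


def local_distinct_values_per_key(pairs, num_partitions=2):
    """Hypothetical local model of aggregateByKey(set(), seqOp, combOp).

    Keeps the key pairing while collecting the distinct values of each key.
    """
    # Slice the input to stand in for the RDD's input partitions.
    chunks = [pairs[i::num_partitions] for i in range(num_partitions)]
    per_chunk = []
    for chunk in chunks:
        acc = defaultdict(set)          # zero value: an empty set per key
        for k, v in chunk:
            acc[k] = acc[k] | {v}       # seqOp: add a value within a partition
        per_chunk.append(acc)
    merged = defaultdict(set)
    for acc in per_chunk:
        for k, s in acc.items():
            merged[k] = merged[k] | s   # combOp: union sets across partitions
    return dict(merged)


if __name__ == "__main__":
    pairs = [("a", 1), ("a", 1), ("a", 2), ("b", 1)]
    # Only the exact duplicate ("a", 1) is dropped; ("a", 2) survives.
    print(sorted(local_distinct(pairs)))
    # Distinct keys only, like keys().distinct().
    print(sorted(local_distinct([k for k, _ in pairs])))
    print(local_distinct_values_per_key(pairs))
```

The output of local_distinct is sorted before printing because partition order depends on hash values, just as the element order of a real distinct() result is not guaranteed.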
