ReduceByKey with a byte array as the key
Question: I would like to work with RDD pairs of Tuple2<byte[], obj>, but byte[]s with the same contents are treated as different keys because their reference values differ. I didn't see any way to pass in a custom comparator. I could convert the byte[] into a String with an explicit charset, but I'm wondering if there's a more efficient way.

Answer 1: A custom comparator is insufficient because Spark uses the hashCode of the key objects to organize keys into partitions. (At least the HashPartitioner will do so.)
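A common workaround, consistent with the point about hashCode above, is to wrap the byte[] in a small key class whose equals() and hashCode() are computed from the array contents, so both reduceByKey and the HashPartitioner see equal contents as the same key. The sketch below assumes the Java Spark API (JavaPairRDD); the ByteArrayKey name and the input RDD are illustrative, not part of the original post.

    import java.io.Serializable;
    import java.util.Arrays;

    // Hypothetical wrapper key: gives the byte[] content-based equals() and
    // hashCode(), which is what reduceByKey and the HashPartitioner rely on
    // to group identical keys together. Must be Serializable for Spark.
    public class ByteArrayKey implements Serializable {
        private final byte[] bytes;

        public ByteArrayKey(byte[] bytes) {
            this.bytes = bytes;
        }

        public byte[] getBytes() {
            return bytes;
        }

        @Override
        public boolean equals(Object other) {
            return other instanceof ByteArrayKey
                    && Arrays.equals(bytes, ((ByteArrayKey) other).bytes);
        }

        @Override
        public int hashCode() {
            return Arrays.hashCode(bytes);
        }
    }

    // Usage sketch, assuming an existing JavaPairRDD<byte[], Integer> named input:
    JavaPairRDD<ByteArrayKey, Integer> counts = input
            .mapToPair(t -> new Tuple2<>(new ByteArrayKey(t._1), t._2))
            .reduceByKey(Integer::sum);

This keeps the raw bytes (no charset round-trip) and only adds a thin wrapper object per key; converting to String with an explicit charset also works but copies and re-encodes every key.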