Spark HashPartitioner Unexpected Partitioning

Submitted by 旧巷老猫 on 2019-12-02 02:30:38

There is nothing strange going on here. HashPartitioner places each non-null key in partition Utils.nonNegativeMod(key.hashCode, numPartitions), and Utils.nonNegativeMod is implemented as follows:

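// Like x % mod, but shifts a negative remainder into the range [0, mod)
def nonNegativeMod(x: Int, mod: Int): Int = {
  val rawMod = x % mod // % keeps the sign of x, so this can be negative
  rawMod + (if (rawMod < 0) mod else 0)
}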
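To see why the correction term is needed: in Scala (as in Java), % on Int takes the sign of the dividend, and String.hashCode can overflow to a negative value. The following is a minimal plain-Scala sketch with no Spark dependency; the object name NonNegativeModDemo is hypothetical, and nonNegativeMod is a local copy of the function above rather than a call into Spark's internal Utils object.

```scala
// Hypothetical demo object; nonNegativeMod is a local copy of the function
// above, not an import of Spark's internal Utils.
object NonNegativeModDemo {
  def nonNegativeMod(x: Int, mod: Int): Int = {
    val rawMod = x % mod // sign follows x, so this can be negative
    rawMod + (if (rawMod < 0) mod else 0)
  }

  def main(args: Array[String]): Unit = {
    // A plain remainder of a negative number is negative...
    println(s"-7 % 3 = ${-7 % 3}")
    // ...and nonNegativeMod shifts it back into [0, 3)
    println(s"nonNegativeMod(-7, 3) = ${nonNegativeMod(-7, 3)}")
    // "Toyota".hashCode overflows Int and comes out negative
    val h = "Toyota".hashCode
    println(s"\"Toyota\".hashCode = $h, nonNegativeMod(h, 3) = ${nonNegativeMod(h, 3)}")
  }
}
```

Without the correction, a negative hash code could yield a negative partition index, which is not a valid partition.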

With 3 partitions, the keys are distributed as follows:

for { car <- Seq("Honda", "Toyota", "Kia") }
  yield (car -> nonNegativeMod(car.hashCode, 3))
// Seq[(String, Int)] = List((Honda,1), (Toyota,0), (Kia,1))
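The same distribution can be reproduced outside Spark. This sketch mimics HashPartitioner's rule (null keys go to partition 0, every other key to nonNegativeMod(key.hashCode, numPartitions)); CarPartitions, partitionOf and layout are hypothetical names used only for illustration.

```scala
// Hypothetical helper object mimicking HashPartitioner's assignment rule
object CarPartitions {
  def nonNegativeMod(x: Int, mod: Int): Int = {
    val rawMod = x % mod
    rawMod + (if (rawMod < 0) mod else 0)
  }

  // Partition index a key would get: null keys go to 0, others by hash
  def partitionOf(key: Any, numPartitions: Int): Int =
    if (key == null) 0 else nonNegativeMod(key.hashCode, numPartitions)

  // Keys grouped by partition index, including partitions that stay empty
  def layout(keys: Seq[String], numPartitions: Int): Map[Int, Seq[String]] = {
    val grouped = keys.groupBy(k => partitionOf(k, numPartitions))
    (0 until numPartitions).map(p => p -> grouped.getOrElse(p, Seq.empty)).toMap
  }

  def main(args: Array[String]): Unit = {
    val cars = Seq("Honda", "Toyota", "Kia")
    for (c <- cars)
      println(s"$c: hashCode=${c.hashCode}, partition=${partitionOf(c, 3)}")
    for ((p, ks) <- layout(cars, 3).toSeq.sortBy(_._1))
      println(s"partition $p: ${ks.mkString(", ")}")
  }
}
```

Running main prints each key's hash code and partition index, then the contents of partitions 0 to 2; for these keys partition 1 holds Honda and Kia, and partition 2 comes out empty.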

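which is exactly what you get in your case: Honda and Kia both map to partition 1, Toyota maps to partition 0, and partition 2 receives none of these keys. In other words, the absence of a direct hash collision doesn't guarantee the absence of a collision modulo an arbitrary number.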
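The collision claim is easy to check directly: the three hash codes are all distinct, yet reducing them modulo small partition counts makes some of them coincide. This is a minimal sketch; ModuloCollisions and occupied are hypothetical names, and nonNegativeMod is again a local copy of the function above.

```scala
// Hypothetical helper: counts how many distinct partitions a set of keys
// occupies under HashPartitioner-style assignment.
object ModuloCollisions {
  def nonNegativeMod(x: Int, mod: Int): Int = {
    val rawMod = x % mod
    rawMod + (if (rawMod < 0) mod else 0)
  }

  // Number of distinct partition indices the keys land in
  def occupied(keys: Seq[String], numPartitions: Int): Int =
    keys.map(k => nonNegativeMod(k.hashCode, numPartitions)).distinct.size

  def main(args: Array[String]): Unit = {
    val cars = Seq("Honda", "Toyota", "Kia")
    // The raw hash codes are all different...
    println(s"distinct hash codes: ${cars.map(_.hashCode).distinct.size}")
    // ...but after reducing modulo n, some of them can coincide
    for (n <- 2 to 6)
      println(s"numPartitions=$n: keys occupy ${occupied(cars, n)} distinct partition(s)")
  }
}
```

For these three keys, 2 or 3 partitions leave them in only 2 distinct partitions, while 4, 5 and 6 happen to separate them; that is a property of these particular hash codes, not a general fix. Assuming hash codes behave like uniform random values modulo n, the chance that 3 keys land in 3 distinct partitions out of 3 is 3!/3^3 = 6/27, roughly 22%, so an outcome like the one in the question is the common case rather than an anomaly.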
