Which one will perform better, broadcast variable or broadcast join?

星月不相逢 2020-12-12 07:27

I am using Spark 2.4.1 with Java 8 in my project.

I have a scenario where I need to look up another table/dataset which has two fields, i.e. country-name and country-

1 Answer
  • 2020-12-12 08:03

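    Quite honestly, they should perform similarly, since they are effectively doing the same thing: a full copy of the small lookup table is shipped to every executor, and each partition of the large table is matched against it locally, so the large table never has to be shuffled.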
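To make the "same thing" concrete, here is a plain-Java model (no Spark dependency) of the mechanism both approaches share. It is an illustration under stated assumptions, not Spark's implementation: partitions are plain lists, each fact row is reduced to its lookup key, and the class name BroadcastLookupModel, the keys k1/k2/k9 and the country names are all hypothetical. A broadcast hash join ships a hash table built from the small side to every executor; a Map held in a broadcast variable gives each task the same kind of local, read-only lookup structure.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model, not Spark code: shows why a broadcast join and a
// broadcast-variable lookup do essentially the same work.
public class BroadcastLookupModel {

    // Enriches every partition of the large table using a read-only copy of
    // the small lookup table. Each fact row is reduced to its lookup key here.
    static List<List<String>> enrich(List<List<String>> partitions, Map<String, String> lookup) {
        List<List<String>> out = new ArrayList<>();
        for (List<String> partition : partitions) {
            // "Broadcast" step: each partition gets its own full copy of the small table.
            Map<String, String> localCopy = Collections.unmodifiableMap(new HashMap<>(lookup));
            List<String> enriched = new ArrayList<>();
            for (String key : partition) {
                // Local hash lookup: no fact row leaves its partition (no shuffle).
                // A missing key yields null, like a left outer join.
                enriched.add(key + "=" + localCopy.get(key));
            }
            out.add(enriched);
        }
        return out;
    }

    public static void main(String[] args) {
        // Hypothetical small dimension table (key -> country name).
        Map<String, String> countries = new HashMap<>();
        countries.put("k1", "France");
        countries.put("k2", "Japan");

        // Hypothetical large fact table split into two partitions.
        List<List<String>> partitions = Arrays.asList(
            Arrays.asList("k1", "k2"),
            Arrays.asList("k2", "k9"));

        System.out.println(enrich(partitions, countries));
    }
}
```

Running main prints [[k1=France, k2=Japan], [k2=Japan, k9=null]]. The null for k9 is what a Map.get lookup on a broadcast variable gives you; an inner broadcast join (Spark's default join type) would drop that row instead.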

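    There may be a very slight advantage to letting Spark do the broadcast join natively, since the join stays visible to the Catalyst optimizer, whereas a broadcast-variable lookup usually runs inside a UDF or map function that Spark treats as a black box. Still, the difference likely depends on the size of your fact table and on the overall overhead of the broadcast variable.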

    One thing to note: the default broadcast threshold (spark.sql.autoBroadcastJoinThreshold) is only 10 MiB, so if your dimension table is larger than that, you'll want to use the broadcast() hint explicitly.
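The threshold rule above can be sketched as a simplified, stdlib-only model of how Spark 2.4 decides whether to broadcast one side of an equi-join. It is an assumption-laden approximation, not Spark's planner: it ignores join-type restrictions, how size statistics are estimated, and every other join strategy, and the class name BroadcastDecisionModel and method willBroadcast are hypothetical. In real Spark code the hint is written as facts.join(broadcast(countries), ...), with broadcast imported from org.apache.spark.sql.functions, and the threshold is changed through the spark.sql.autoBroadcastJoinThreshold setting (-1 disables automatic broadcasting).

```java
// Hypothetical, simplified model of the broadcast decision; not Spark's code.
public class BroadcastDecisionModel {

    // Default value of spark.sql.autoBroadcastJoinThreshold: 10 MiB.
    static final long DEFAULT_THRESHOLD_BYTES = 10L * 1024 * 1024;

    // Simplified rule: a side is broadcast if it carries the broadcast() hint,
    // or if auto-broadcast is enabled (threshold >= 0) and the side's
    // estimated size does not exceed the threshold.
    static boolean willBroadcast(long estimatedSizeBytes, long thresholdBytes, boolean hinted) {
        if (hinted) {
            return true;
        }
        return thresholdBytes >= 0 && estimatedSizeBytes <= thresholdBytes;
    }

    public static void main(String[] args) {
        long mib = 1024L * 1024;
        // 5 MiB dimension table: under the default threshold, broadcast automatically.
        System.out.println(willBroadcast(5 * mib, DEFAULT_THRESHOLD_BYTES, false));
        // 50 MiB dimension table without a hint: too big for the default threshold.
        System.out.println(willBroadcast(50 * mib, DEFAULT_THRESHOLD_BYTES, false));
        // Same table with broadcast(): the hint bypasses the size check.
        System.out.println(willBroadcast(50 * mib, DEFAULT_THRESHOLD_BYTES, true));
        // Threshold of -1 turns off automatic broadcasting.
        System.out.println(willBroadcast(1 * mib, -1, false));
    }
}
```

Running main prints true, false, true, false, one per line: only the 50 MiB case without the hint and the case with automatic broadcasting disabled fall back to a non-broadcast join.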
