Which one will perform better, broadcast variable or broadcast join?

星月不相逢

I am using Spark 2.4.1 with Java 8 in my project.

I have a scenario where I need to look up another table/dataset that has two fields, i.e. country-name and country-


1 Answer

再見小時候

2020-12-12 08:03

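Quite honestly, they should perform similarly, since both do effectively the same thing: a full copy of the small lookup table is shipped to every executor, and each row of the large table is matched against it locally, so the large table is never shuffled.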
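
To make "the same thing" concrete, here is a minimal plain-Java sketch of what both approaches boil down to. It has no Spark dependency, so it only illustrates the idea and is not Spark's implementation: the small country table becomes a hash map that every task holds a read-only copy of (with a broadcast variable you would create that copy yourself via JavaSparkContext.broadcast(...); with a broadcast join Spark builds it for you), and each partition of the large table is joined by a local map lookup. The class name, the Order row type and its countryKey field are hypothetical names chosen for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Plain-Java illustration (not Spark code) of a map-side lookup against a
 * broadcast copy of a small dimension table.
 */
class BroadcastJoinSketch {

    /** Hypothetical fact row: an order tagged with a country key. */
    static final class Order {
        final long id;
        final String countryKey;

        Order(long id, String countryKey) {
            this.id = id;
            this.countryKey = countryKey;
        }
    }

    /**
     * Inner-joins one partition of orders against the task-local copy of the
     * country table. Rows whose key is missing from the map are dropped.
     */
    static List<String> joinPartition(List<Order> partition, Map<String, String> broadcastCountries) {
        List<String> out = new ArrayList<>();
        for (Order o : partition) {
            // Local hash probe: no data from the large table moves between tasks.
            String name = broadcastCountries.get(o.countryKey);
            if (name != null) {
                out.add(o.id + ":" + name);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Small dimension table, built once and shared read-only by every task.
        Map<String, String> countries = new HashMap<>();
        countries.put("IN", "India");
        countries.put("FR", "France");

        // Two "partitions" of the large fact table, each processed independently.
        List<List<Order>> partitions = Arrays.asList(
                Arrays.asList(new Order(1, "IN"), new Order(2, "FR")),
                Arrays.asList(new Order(3, "XX"), new Order(4, "IN")));

        for (List<Order> p : partitions) {
            System.out.println(joinPartition(p, countries));
        }
    }
}
```

Running main prints [1:India, 2:France] and then [4:India]; order 3 is dropped because its key has no match, as an inner join would do.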

There may be a slight advantage to letting Spark perform the broadcast join natively, but that likely depends on the size of your fact table and on the overall overhead of managing a broadcast variable yourself.

One thing to note: the default broadcast threshold (spark.sql.autoBroadcastJoinThreshold) is only 10 MiB, so if your dimension table is larger than that, you will want to use the broadcast() hint explicitly.
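
The threshold rule can be summarized with a simplified, hypothetical model of Spark 2.4's choice for an equi-join. It only covers the two triggers mentioned above, namely Spark's size estimate compared with spark.sql.autoBroadcastJoinThreshold (default 10485760 bytes) and an explicit hint, and deliberately ignores other planner rules such as join-type restrictions. In real code the hint would be applied with something like facts.join(functions.broadcast(countries), joinColumn), where facts, countries and joinColumn are hypothetical names and functions is org.apache.spark.sql.functions.

```java
/**
 * Simplified, hypothetical model of when Spark 2.4 picks a broadcast hash
 * join for an equi-join. This is not Spark's source code: join-type
 * restrictions, runtime size limits and other planner rules are ignored.
 */
class BroadcastDecisionSketch {

    // Default value of spark.sql.autoBroadcastJoinThreshold: 10 MiB in bytes (10485760).
    static final long DEFAULT_THRESHOLD_BYTES = 10L * 1024 * 1024;

    /**
     * @param estimatedSizeBytes Spark's size estimate for the smaller side
     * @param thresholdBytes     value of spark.sql.autoBroadcastJoinThreshold
     * @param hasBroadcastHint   true if that side was wrapped in broadcast(...)
     */
    static boolean willBroadcast(long estimatedSizeBytes, long thresholdBytes, boolean hasBroadcastHint) {
        if (hasBroadcastHint) {
            // An explicit broadcast() hint is honored regardless of the size threshold.
            return true;
        }
        // Size-based auto-broadcast. A threshold of -1 disables it, because no
        // non-negative size estimate can be <= -1.
        return estimatedSizeBytes >= 0 && estimatedSizeBytes <= thresholdBytes;
    }

    public static void main(String[] args) {
        long mib = 1024L * 1024;
        System.out.println(willBroadcast(2 * mib, DEFAULT_THRESHOLD_BYTES, false));  // small table: auto-broadcast
        System.out.println(willBroadcast(50 * mib, DEFAULT_THRESHOLD_BYTES, false)); // too big, no hint
        System.out.println(willBroadcast(50 * mib, DEFAULT_THRESHOLD_BYTES, true));  // hint forces broadcast
        System.out.println(willBroadcast(2 * mib, -1, false));                       // auto-broadcast disabled
    }
}
```

Running main prints true, false, true, false: in this model a 50 MiB dimension table is only broadcast when the hint is present.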
