Spark functions vs UDF performance?


别跟我提以往 2020-11-22 05:29

Spark now offers predefined functions that can be used in dataframes, and it seems they are highly optimized. My original question was going to be on which is faster, but I

3 answers

不知归路 (original poster)

2020-11-22 05:45

Whenever possible, use the higher-level standard Column-based functions with Dataset operators before falling back to your own custom UDFs. A UDF is a black box to Spark, so Spark does not even try to optimize it.

What actually happens behind the scenes is that Catalyst cannot inspect or optimize a UDF at all. It treats the UDF as a black box, so you lose many optimizations, such as predicate pushdown and constant folding.
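To see the black-box effect for yourself, here is a minimal PySpark sketch that builds the same projection twice, once with the built-in `upper` function and once with an equivalent Python UDF, and captures what `explain()` prints for each. The names `to_upper`, `plan_of` and `compare_plans`, the two-row sample data, and the local `SparkSession` settings are all made up for illustration and do not come from the question. It assumes a working local PySpark and Java installation.

```python
import contextlib
import io


def to_upper(s):
    """Plain Python logic that gets wrapped as a UDF (hypothetical example)."""
    return s.upper() if s is not None else None


def plan_of(df):
    """Return the text that DataFrame.explain() prints for the physical plan."""
    buf = io.StringIO()
    with contextlib.redirect_stdout(buf):
        df.explain()
    return buf.getvalue()


def compare_plans():
    """Build the same projection with a built-in function and with a UDF.

    Returns (builtin_plan, udf_plan, same_rows).
    """
    # Imported here so the helpers above stay usable without Spark installed.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F
    from pyspark.sql.types import StringType

    # Assumption: a throwaway local session is good enough for this comparison.
    spark = (
        SparkSession.builder.master("local[1]")
        .appName("udf-vs-builtin")
        .getOrCreate()
    )
    try:
        df = spark.createDataFrame([("alice",), ("bob",)], ["name"])
        to_upper_udf = F.udf(to_upper, StringType())

        # Built-in: Catalyst sees the expression and can reason about it.
        builtin_df = df.select(F.upper(F.col("name")).alias("name_upper"))
        # UDF: Catalyst only sees an opaque call into Python.
        udf_df = df.select(to_upper_udf(F.col("name")).alias("name_upper"))

        builtin_rows = sorted(row[0] for row in builtin_df.collect())
        udf_rows = sorted(row[0] for row in udf_df.collect())
        return plan_of(builtin_df), plan_of(udf_df), builtin_rows == udf_rows
    finally:
        spark.stop()
```

Calling `compare_plans()` and printing the two plan strings should show the difference: both versions return the same rows, but the UDF plan contains an extra Python evaluation step (named `BatchEvalPython` on the Spark versions I have seen) where rows are shipped to a Python worker, while the built-in version stays a plain `Project` over `upper(name)`. The exact plan text varies between Spark versions, so treat the node names as indicative rather than guaranteed.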
