Spark functions vs UDF performance?

别跟我提以往 2020-11-22 05:29

Spark now offers predefined functions that can be used in dataframes, and it seems they are highly optimized. My original question was going to be on which is faster, but I

3 answers
  •  不知归路
    2020-11-22 05:45

    Prefer the higher-level standard Column-based functions with Dataset operators whenever possible, and fall back to your own custom UDFs only when nothing built in fits. UDFs are a black box to Spark, so it does not even try to optimize them.

    What actually happens behind the scenes is that Catalyst can't process or optimize UDFs at all. It treats them as a black box, which means losing many optimizations such as predicate pushdown, constant folding, and many others.
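To make the "black box" point concrete, here is a toy sketch in plain Python. It is not Catalyst and does not use any Spark API: the classes `Lit`, `Col`, `Add`, `Opaque` and the function `fold_constants` are all hypothetical names invented for illustration. The idea it tries to show is that an optimizer can rewrite expressions whose structure it understands, while a user-supplied function is just an opaque callable it has to leave alone.

```python
from dataclasses import dataclass
from typing import Callable, Union


# Hypothetical mini expression tree; illustrative only, not part of Spark.
@dataclass(frozen=True)
class Lit:
    value: int          # a literal constant


@dataclass(frozen=True)
class Col:
    name: str           # a reference to a column, unknown until runtime


@dataclass(frozen=True)
class Add:
    left: "Expr"        # a built-in operation the optimizer understands
    right: "Expr"


@dataclass(frozen=True)
class Opaque:
    fn: Callable[[int], int]   # stands in for a UDF: arbitrary user code
    arg: "Expr"


Expr = Union[Lit, Col, Add, Opaque]


def fold_constants(e: Expr) -> Expr:
    """Toy constant folding: collapse Add(Lit, Lit) into a single Lit."""
    if isinstance(e, Add):
        left, right = fold_constants(e.left), fold_constants(e.right)
        if isinstance(left, Lit) and isinstance(right, Lit):
            return Lit(left.value + right.value)
        return Add(left, right)
    if isinstance(e, Opaque):
        # The argument can still be simplified, but fn itself is opaque:
        # this toy optimizer has no way to inspect or rewrite it.
        return Opaque(e.fn, fold_constants(e.arg))
    return e


# Built-in node: the constant part is folded ahead of time.
builtin = fold_constants(Add(Col("x"), Add(Lit(1), Lit(2))))

# UDF-like node: the same arithmetic hidden in a lambda survives untouched.
hidden = fold_constants(Opaque(lambda v: v + 1 + 2, Col("x")))
```

In this sketch `builtin` ends up as `Add(Col("x"), Lit(3))`, while `hidden` is still an `Opaque` node wrapping the original lambda. Under the assumption that Catalyst behaves analogously for UDFs, as the answer describes, this is why the same logic expressed with built-in column functions can be optimized and a UDF cannot.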
