Spark now offers predefined functions that can be used with DataFrames, and they appear to be highly optimized. My original question was going to be about which is faster, but I
Use the higher-level standard Column-based functions with Dataset operators whenever possible before falling back to your own custom UDFs, since UDFs are a black box for Spark and it does not even try to optimize them.
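As an illustration of that advice, here is a minimal sketch (the column name and sample data are hypothetical) that expresses the same transformation once with the built-in `upper` Column function and once with an equivalent UDF:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, upper, udf}

object BuiltInVsUdf {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("builtin-vs-udf")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    val df = Seq("alice", "bob", "carol").toDF("name")

    // Built-in Column function: Catalyst understands it and can optimize around it.
    val withBuiltIn = df.select(upper(col("name")).as("name_upper"))

    // Equivalent UDF: same result, but opaque to the optimizer.
    val upperUdf = udf((s: String) => s.toUpperCase)
    val withUdf = df.select(upperUdf(col("name")).as("name_upper"))

    withBuiltIn.show()
    withUdf.show()

    spark.stop()
  }
}
```

Both versions produce the same output; the difference only shows up in the query plan and, at scale, in runtime.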
What actually happens behind the scenes is that Catalyst can’t process and optimize UDFs at all; it treats them as a black box, which means losing many optimizations such as predicate pushdown, constant folding, and others.
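One way to see the predicate pushdown effect described above is to compare the physical plans of a filter written with a built-in comparison against the same filter wrapped in a UDF (a sketch assuming an active `SparkSession` named `spark` and a hypothetical Parquet file with an `age` column):

```scala
import org.apache.spark.sql.functions.{col, udf}

// Hypothetical input path; any Parquet source with an integer `age` column works.
val people = spark.read.parquet("/tmp/people.parquet")

// Built-in comparison: Catalyst can push the predicate down into the Parquet scan.
people.filter(col("age") > 21).explain()
// The scan node typically lists the predicate under PushedFilters,
// e.g. something like [IsNotNull(age), GreaterThan(age,21)].

// The same predicate as a UDF: Catalyst treats it as a black box,
// so rows are read first and only filtered afterwards.
val isAdult = udf((age: Int) => age > 21)
people.filter(isAdult(col("age"))).explain()
// No pushed filter for the UDF-based predicate appears in the scan node.
```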