I\'m using Spark 1.6.1 and encountering a strange behaviour: I\'m running an UDF with some heavy computations (a physics simulations) on a dataframe containing some input da
In newer spark verion (2.3+) we can mark UDFs as non-deterministic: https://spark.apache.org/docs/latest/api/scala/org/apache/spark/sql/expressions/UserDefinedFunction.html#asNondeterministic():org.apache.spark.sql.expressions.UserDefinedFunction
i.e. use
val myUdf = udf(...).asNondeterministic()
This makes sure the UDF is only called once