I have a DataFrame loaded from a Parquet file and I have to add a new column with random data, but I need that random data to be different in each row. This is my current code:
Spark >= 2.3
It is possible to disable some optimizations using the asNondeterministic method:
import org.apache.spark.sql.expressions.UserDefinedFunction
val f: UserDefinedFunction = ???
val fNonDeterministic: UserDefinedFunction = f.asNondeterministic
Please make sure you understand the guarantees before using this option.
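To see why this flag matters: when the optimizer is allowed to treat a nullary UDF as deterministic, it may evaluate it once and reuse the result for every row, so each row ends up with the same "random" value. A plain-Scala sketch of the difference (illustrative only, no Spark required; the object and method names are mine):

```scala
import scala.util.Random

object FoldingDemo {
  def run(): (Int, Int) = {
    val rng = new Random(42)

    // Constant-folded behaviour: the value is computed once
    // and reused for every "row".
    val folded    = rng.nextInt()
    val foldedCol = Seq.fill(5)(folded)

    // Nondeterministic behaviour: the function is evaluated
    // once per "row".
    val perRowCol = Seq.fill(5)(rng.nextInt())

    (foldedCol.distinct.size, perRowCol.distinct.size)
  }

  def main(args: Array[String]): Unit = {
    val (foldedDistinct, perRowDistinct) = run()
    println(s"folded distinct: $foldedDistinct, per-row distinct: $perRowDistinct")
  }
}
```

The folded column always collapses to a single distinct value, while the per-row version almost surely produces distinct values in every row.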
Spark < 2.3
The function passed to udf should be deterministic (with the possible exception of SPARK-20586), and nullary function calls can be replaced by constants. If you want to generate random numbers, use one of the built-in functions (rand for a uniform distribution, randn for a Gaussian one) and transform the output to obtain the required distribution, for example:
(rand * Integer.MAX_VALUE).cast("bigint").cast("string")
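rand returns a uniform double in [0, 1), so multiplying by Integer.MAX_VALUE and casting yields a non-negative integer spread over the full int range. The same transformation in plain Scala (a sketch of the arithmetic only, not Spark's implementation; names are mine):

```scala
import scala.util.Random

object ScaledRandom {
  // Mirrors (rand * Integer.MAX_VALUE).cast("bigint"): scale a uniform
  // [0, 1) double into [0, Int.MaxValue) and truncate toward zero.
  def scaled(rng: Random): Long =
    (rng.nextDouble() * Integer.MAX_VALUE).toLong

  def main(args: Array[String]): Unit = {
    val rng    = new Random()
    val sample = Seq.fill(1000)(scaled(rng))
    // Every value lands in [0, Int.MaxValue); with 1000 samples over a
    // ~2.1 billion range, collisions are extremely unlikely.
    println(sample.forall(v => v >= 0L && v < Integer.MAX_VALUE.toLong)) // prints true
  }
}
```

The final .cast("string") in the expression above only changes the column type, not the distribution.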