How to add a new column with random values to an existing DataFrame in Scala

再見小時候 2020-11-27 08:51

I have a DataFrame loaded from a Parquet file and I have to add a new column with some random data, but I need the random values to be different from each other. This is my actual code and the

2 answers

星月不相逢 (original poster)

2020-11-27 09:14
Spark >= 2.3

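It is possible to disable some optimizations using the asNondeterministic method:
```
import org.apache.spark.sql.expressions.UserDefinedFunction
import org.apache.spark.sql.functions.udf
import scala.util.Random

// Example nullary UDF: draws a new random Int on each call
val f: UserDefinedFunction = udf(() => Random.nextInt())
// Mark it as non-deterministic so the optimizer does not treat it as a constant
val fNonDeterministic: UserDefinedFunction = f.asNondeterministic()
```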
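Please make sure you understand the guarantees you give up before using this option: a non-deterministic UDF produces different values each time the DataFrame is recomputed.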
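A minimal end-to-end sketch for Spark 2.3 or later, assuming a local SparkSession. The input path `data.parquet`, the column name `random` and the names `RandomColumnUdf`, `randomInt` and `withRandom` are hypothetical placeholders, not taken from the question's (truncated) code. It wraps `scala.util.Random` in a nullary UDF and marks it non-deterministic so each row gets its own draw.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.expressions.UserDefinedFunction
import org.apache.spark.sql.functions.udf
import scala.util.Random

object RandomColumnUdf {
  // Hypothetical helper: nullary UDF marked non-deterministic,
  // so Spark does not replace the call with a single constant value
  val randomInt: UserDefinedFunction =
    udf(() => Random.nextInt()).asNondeterministic()

  // Adds a column named "random" holding one fresh draw per row
  def withRandom(df: DataFrame): DataFrame =
    df.withColumn("random", randomInt())

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("random-column")
      .getOrCreate()
    // Hypothetical input path; replace it with the real Parquet file
    val df = spark.read.parquet("data.parquet")
    withRandom(df).show()
    spark.stop()
  }
}
```

Calling `RandomColumnUdf.withRandom(df)` on any DataFrame returns a copy with the extra column; the values are not reproducible between runs or recomputations.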

Spark < 2.3

A function passed to udf should be deterministic (with the possible exception of SPARK-20586), and calls to nullary functions can be replaced by constants. If you want to generate random numbers, use one of the built-in functions:
- rand - Generate a random column with independent and identically distributed (i.i.d.) samples uniformly distributed in [0.0, 1.0).
- randn - Generate a column with independent and identically distributed (i.i.d.) samples from the standard normal distribution.
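and transform the output to obtain the required distribution, for example:
```
import org.apache.spark.sql.functions.rand

// Scale U[0.0, 1.0) samples to the Int range, truncate to an integer, then render as a string
(rand() * Integer.MAX_VALUE).cast("bigint").cast("string")
```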
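A sketch of the built-in approach, which also works on Spark 2.3 and later. The object and column names, the default seed `42`, the integer range [10, 20) and the normal parameters (mean 5.0, standard deviation 2.0) are illustrative assumptions. Passing a seed to `rand`/`randn` makes the draws repeatable as long as the data is partitioned the same way; omit it to get different values on each run.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{floor, rand, randn}

object RandomColumnBuiltin {
  // Hypothetical helper adding several random columns built from rand/randn
  def withRandomColumns(df: DataFrame, seed: Long = 42L): DataFrame =
    df
      // Uniform doubles in [0.0, 1.0)
      .withColumn("uniform", rand(seed))
      // Uniform integers in [10, 20): scale to the range width, floor, then shift
      .withColumn("int_10_20", (floor(rand(seed + 1) * 10) + 10).cast("int"))
      // Normal samples with mean 5.0 and std dev 2.0, derived from N(0, 1)
      .withColumn("normal", randn(seed + 2) * 2.0 + 5.0)
      // The conversion from the answer: non-negative integers rendered as strings
      .withColumn("random_string",
        (rand(seed + 3) * Integer.MAX_VALUE).cast("bigint").cast("string"))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("random-builtin")
      .getOrCreate()
    withRandomColumns(spark.range(5).toDF("id")).show()
    spark.stop()
  }
}
```

Because `rand` and `randn` are native expressions that Spark already knows to be non-deterministic, no asNondeterministic call is needed, and they avoid the overhead of a Scala UDF.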