I have written a method that must consider a random number to simulate a Bernoulli distribution. I am using random.nextDouble
to generate a number between 0 and
Just use the SQL function rand
:
import org.apache.spark.sql.functions._
//df: org.apache.spark.sql.DataFrame = [key: int]
df.select($"key", rand() as "rand").show
+---+-------------------+
|key| rand|
+---+-------------------+
| 1| 0.8635073400704648|
| 2| 0.6870153659986652|
| 3|0.18998048357873532|
+---+-------------------+
df.select($"key", rand() as "rand").show
+---+------------------+
|key| rand|
+---+------------------+
| 1|0.3422484248879837|
| 2|0.2301384925817671|
| 3|0.6959421970071372|
+---+------------------+