Spark - Random Number Generation

旧时难觅i 2021-01-02 00:01

I have written a method that needs a random number to simulate a Bernoulli distribution. I am using random.nextDouble to generate a number between 0 and

4 answers
  •  旧时难觅i
    2021-01-02 00:40

    Just use the SQL function rand:

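    import org.apache.spark.sql.functions._
    import spark.implicits._  // enables the $"key" syntax (already in scope in spark-shell)
    
    //df: org.apache.spark.sql.DataFrame = [key: int]
    
    // rand() generates an i.i.d. value per row, uniform on [0.0, 1.0)
    df.select($"key", rand() as "rand").show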
    +---+-------------------+
    |key|               rand|
    +---+-------------------+
    |  1| 0.8635073400704648|
    |  2| 0.6870153659986652|
    |  3|0.18998048357873532|
    +---+-------------------+
    
    
    // No seed is given, so running the same query again yields different values
    df.select($"key", rand() as "rand").show
    +---+------------------+
    |key|              rand|
    +---+------------------+
    |  1|0.3422484248879837|
    |  2|0.2301384925817671|
    |  3|0.6959421970071372|
    +---+------------------+
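
Since the question is about simulating a Bernoulli distribution, the rand column above can be turned into a Bernoulli draw by comparing it with a success probability. The following is only a minimal sketch under stated assumptions: the probability p, the seed, the column name "success", and the three-row DataFrame are hypothetical values chosen for illustration, not taken from the original question.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.rand

// Local session for illustration only; in spark-shell `spark` already exists.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("bernoulli-sketch")
  .getOrCreate()
import spark.implicits._

val p = 0.3     // hypothetical success probability
val seed = 42L  // hypothetical seed; call rand() without it for fresh draws

// Small DataFrame mirroring the [key: int] schema used in the answer
val df = Seq(1, 2, 3).toDF("key")

// rand(seed) is uniform on [0.0, 1.0), so (rand(seed) < p) is true with probability p
val trials = df.withColumn("success", rand(seed) < p)

trials.show()
```

Each row gets its own independent draw, so on a large DataFrame the fraction of rows with success = true should be close to p. Passing a seed makes the draws repeatable only as long as the data partitioning stays the same.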
    
