Spark - Random Number Generation

旧时难觅i 2021-01-02 00:01

I have written a method that takes a random number in order to simulate a Bernoulli distribution. I am using random.nextDouble to generate a number between 0 and

4 answers
  •  暗喜 (original poster)
    2021-01-02 00:35

    The same sequence repeats because the random generator is created and seeded before the data is partitioned, so each partition starts from the same seed. It may not be the most efficient approach, but the following should work:

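    import org.apache.spark.sql.Row
    import scala.util.Random

    val myClass = new MyClass()
    val M = 3

    for (m <- 1 to M) {
      // Build the generator inside mapPartitions so that each partition
      // creates its own instance on the executor instead of reusing one
      // that was seeded before the data was partitioned.
      val newRows = myDF.rdd.mapPartitions { rows =>
        val rand = new Random()
        rows.map { row =>
          Row(row.getString(0),
            myClass.myMethod(row.getString(2), rand.nextDouble()))
        }
      }
      // As in the original, this assumes myDF.schema describes the new rows.
      val newDF = sqlContext.createDataFrame(newRows, myDF.schema)
    }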
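
    The seeding effect can be reproduced without a cluster. The following is a minimal sketch in plain Scala, not Spark code: the "partitions" are simulated with a simple loop, and the names SeedPerPartitionDemo, bernoulli, drawsWithSharedSeed and drawsWithPerPartitionSeed are hypothetical helpers introduced only for illustration. It assumes a Bernoulli trial is decided by comparing a uniform draw with a probability p.

```scala
import scala.util.Random

object SeedPerPartitionDemo {
  // Hypothetical Bernoulli trial: success when the uniform draw u falls below p.
  def bernoulli(p: Double, u: Double): Boolean = u < p

  // Every simulated partition gets a generator initialized with the same seed,
  // which mimics a generator that was seeded before the data was partitioned.
  def drawsWithSharedSeed(seed: Long, partitions: Int, n: Int): Seq[Seq[Double]] =
    (0 until partitions).map { _ =>
      val rand = new Random(seed)
      Seq.fill(n)(rand.nextDouble())
    }

  // Every simulated partition derives its own seed from a base seed and its index.
  def drawsWithPerPartitionSeed(baseSeed: Long, partitions: Int, n: Int): Seq[Seq[Double]] =
    (0 until partitions).map { idx =>
      val rand = new Random(baseSeed + idx)
      Seq.fill(n)(rand.nextDouble())
    }

  def main(args: Array[String]): Unit = {
    val shared = drawsWithSharedSeed(42L, partitions = 3, n = 4)
    val perPartition = drawsWithPerPartitionSeed(42L, partitions = 3, n = 4)

    // Number of distinct sequences across partitions: 1 for the shared seed.
    println(s"distinct sequences, shared seed: ${shared.distinct.size}")
    println(s"distinct sequences, per-partition seed: ${perPartition.distinct.size}")

    // Turn the draws of the first partition into Bernoulli outcomes with p = 0.5.
    println(perPartition.head.map(u => bernoulli(0.5, u)))
  }
}
```

    In a real Spark job the partition index would probably come from mapPartitionsWithIndex rather than a loop. Deriving the seed from a base seed plus the index keeps runs reproducible while still giving each partition a different sequence.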
    
