Why does rdd.sample() on a Spark RDD return a different number of elements from run to run, even though the fraction parameter stays the same? For example, if my code is
rdd.sample()
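Another way is to call takeSample first and then turn the result back into an RDD. Unlike sample, which treats fraction as a per-element probability, takeSample returns exactly the number of elements you ask for. It collects them into an array on the driver, though, so this can be slow with large datasets and the sample has to fit in driver memory.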
sc.makeRDD(a.takeSample(false, 1000, 1234))
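To make the difference concrete, here is a minimal sketch that compares the two calls on a toy RDD. The local master, the app name, the object name SampleSizeSketch, the RDD of 100000 integers, and the fraction of 0.01 are all assumptions made up for illustration; only the seed 1234 and the sample size 1000 come from the snippet above.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object SampleSizeSketch {
  def main(args: Array[String]): Unit = {
    // Hypothetical local setup; replace with your own SparkContext.
    val conf = new SparkConf().setMaster("local[*]").setAppName("sample-size-sketch")
    val sc = new SparkContext(conf)
    try {
      // Toy data set, assumed for illustration only.
      val rdd = sc.parallelize(1 to 100000)

      // sample: without replacement, each element is kept independently
      // with probability `fraction`, so the count is only close to
      // fraction * size and changes with the seed.
      val counts = (1L to 3L).map { seed =>
        rdd.sample(withReplacement = false, fraction = 0.01, seed = seed).count()
      }
      println(s"sample() counts: ${counts.mkString(", ")}")

      // takeSample: returns exactly `num` elements (when the RDD has at
      // least that many) as a local Array on the driver.
      val exact = rdd.takeSample(withReplacement = false, num = 1000, seed = 1234L)

      // Turn the local array back into an RDD, as in the one-liner above.
      val exactRdd = sc.parallelize(exact.toSeq)
      println(s"takeSample() count: ${exactRdd.count()}")
    } finally {
      sc.stop()
    }
  }
}
```

If you pass the same seed to sample on the same RDD you should get the same elements back, so the varying counts usually come from leaving the seed at its random default. If you need an exact size but the sample is too big for the driver, this approach is probably not a good fit.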