How to get a sample with an exact sample size in Spark RDD?
Question

Why does the rdd.sample() function on a Spark RDD return a different number of elements on each call, even though the fraction parameter is the same? For example, suppose my code looks like this:

    val a = sc.parallelize(1 to 10000, 3)
    a.sample(false, 0.1).count

Every time I run the second line, it returns a different number, and that number is not equal to 1000. I expect to see 1000 every time, although the 1000 elements themselves might differ between runs. Can anyone tell me how I can get a sample with the sample size exactly equal