How do I create a list of random numbers without duplicates?

灰色年华 2020-11-22 13:30

I tried using random.randint(0, 100), but some of the numbers were the same. Is there a method or module that creates a list of unique random numbers?

Note: The fol

17 answers

谎友^ (original poster)

2020-11-22 13:54

To generate a list of random values without duplicates in a way that is deterministic, efficient, and built from basic programming constructs, consider the function extractSamples defined below:

    def extractSamples(populationSize, sampleSize, intervalLst):
        import random
        if sampleSize > populationSize:
            raise ValueError("sampleSize = " + str(sampleSize) +
                             " > populationSize (= " + str(populationSize) + ")")
        samples = []
        while len(samples) < sampleSize:
            i = random.randint(0, len(intervalLst) - 1)
            (a, b) = intervalLst[i]
            sample = random.randint(a, b)
            if a == b:
                intervalLst.pop(i)
            elif a == sample:
                # shorten beginning of interval
                intervalLst[i] = (sample + 1, b)
            elif sample == b:
                # shorten interval end
                intervalLst[i] = (a, sample - 1)
            else:
                intervalLst[i] = (a, sample - 1)
                intervalLst.append((sample + 1, b))
            samples.append(sample)
        return samples

The basic idea is to keep track of the intervals (intervalLst) of possible values from which the required elements are selected. The approach is deterministic in the sense that a sample is guaranteed to be produced within a fixed number of steps (dependent solely on populationSize and sampleSize).
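The interval bookkeeping can be looked at in isolation. The sketch below is only an illustration: `remove_value` is a hypothetical helper (it is not part of the answer above) that applies the same four cases as extractSamples to one chosen value, so the effect on the interval list is visible step by step.

```python
def remove_value(intervals, i, value):
    """Remove `value` from intervals[i], shrinking or splitting it.

    Hypothetical helper mirroring the four cases in extractSamples.
    The list is modified in place and also returned for convenience.
    """
    a, b = intervals[i]
    if not a <= value <= b:
        raise ValueError("value lies outside the chosen interval")
    if a == b:
        # Single-value interval: nothing is left, so drop it.
        intervals.pop(i)
    elif value == a:
        # Value at the start: shorten the beginning of the interval.
        intervals[i] = (value + 1, b)
    elif value == b:
        # Value at the end: shorten the end of the interval.
        intervals[i] = (a, value - 1)
    else:
        # Value in the middle: split into a left and a right interval.
        intervals[i] = (a, value - 1)
        intervals.append((value + 1, b))
    return intervals


intervals = [(0, 9)]
remove_value(intervals, 0, 4)  # -> [(0, 3), (5, 9)]
remove_value(intervals, 0, 0)  # -> [(1, 3), (5, 9)]
remove_value(intervals, 1, 9)  # -> [(1, 3), (5, 8)]
```

Two things are worth keeping in mind. The list passed in is modified in place, so a fresh interval list is needed for every call. Also, each step first picks an interval uniformly and then a value inside it, so once the intervals differ in length, values in short intervals are more likely to be drawn than values in long ones. The result is duplicate-free, but it is not a uniform sample of the population.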

To generate the required list with the function above:

    In [3]: populationSize, sampleSize = 10**17, 10**5

    In [4]: %time lst1 = extractSamples(populationSize, sampleSize, [(0, populationSize-1)])
    CPU times: user 289 ms, sys: 9.96 ms, total: 299 ms
    Wall time: 293 ms

We can also compare with an earlier solution (for a lower value of populationSize):

    In [5]: populationSize, sampleSize = 10**8, 10**5

    In [6]: %time lst = random.sample(range(populationSize), sampleSize)
    CPU times: user 1.89 s, sys: 299 ms, total: 2.19 s
    Wall time: 2.18 s

    In [7]: %time lst1 = extractSamples(populationSize, sampleSize, [(0, populationSize-1)])
    CPU times: user 449 ms, sys: 8.92 ms, total: 458 ms
    Wall time: 442 ms

Note that I reduced the populationSize value because the random.sample solution produces a MemoryError for higher values (also mentioned in previous answers here and here). For the values above, extractSamples also outperforms the random.sample approach.
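A memory error like the one mentioned above is what one would expect when range builds the full list, as it does in Python 2. As a rough sketch, assuming Python 3 on a 64-bit build (where range is a lazy sequence), the same random.sample call can be tried on the larger population without materializing it. The variable names and the use of time.perf_counter in place of %time are choices made for this example, and timings will differ from machine to machine.

```python
import random
import time

population_size, sample_size = 10**17, 10**5

start = time.perf_counter()
# In Python 3, range() is lazy, so the population is never built in memory.
sample = random.sample(range(population_size), sample_size)
elapsed = time.perf_counter() - start

print(f"drew {len(sample)} values in {elapsed:.3f} s")
print("all unique:", len(set(sample)) == sample_size)
```

Running this alongside extractSamples on the same interpreter gives a like-for-like comparison for a particular setup.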

P.S.: Though the core approach is similar to my earlier answer, there are substantial modifications in both implementation and approach, along with improvements in clarity.
