I tried using random.randint(0, 100)
, but some numbers were the same. Is there a method/module to create a list unique random numbers?
Note: The fol
In order to obtain a program that generates a list of random values without duplicates that is deterministic, efficient and built with basic programming constructs consider the function extractSamples
defined below,
def extractSamples(populationSize, sampleSize, intervalLst) :
import random
if (sampleSize > populationSize) :
raise ValueError("sampleSize = "+str(sampleSize) +" > populationSize (= " + str(populationSize) + ")")
samples = []
while (len(samples) < sampleSize) :
i = random.randint(0, (len(intervalLst)-1))
(a,b) = intervalLst[i]
sample = random.randint(a,b)
if (a==b) :
intervalLst.pop(i)
elif (a == sample) : # shorten beginning of interval
intervalLst[i] = (sample+1, b)
elif ( sample == b) : # shorten interval end
intervalLst[i] = (a, sample - 1)
else :
intervalLst[i] = (a, sample - 1)
intervalLst.append((sample+1, b))
samples.append(sample)
return samples
The basic idea is to keep track of intervals intervalLst
for possible values from which to select our required elements from. This is deterministic in the sense that we are guaranteed to generate a sample within a fixed number of steps (solely dependent on populationSize
and sampleSize
).
To use the above function to generate our required list,
In [3]: populationSize, sampleSize = 10**17, 10**5
In [4]: %time lst1 = extractSamples(populationSize, sampleSize, [(0, populationSize-1)])
CPU times: user 289 ms, sys: 9.96 ms, total: 299 ms
Wall time: 293 ms
We may also compare with an earlier solution (for a lower value of populationSize)
In [5]: populationSize, sampleSize = 10**8, 10**5
In [6]: %time lst = random.sample(range(populationSize), sampleSize)
CPU times: user 1.89 s, sys: 299 ms, total: 2.19 s
Wall time: 2.18 s
In [7]: %time lst1 = extractSamples(populationSize, sampleSize, [(0, populationSize-1)])
CPU times: user 449 ms, sys: 8.92 ms, total: 458 ms
Wall time: 442 ms
Note that I reduced populationSize
value as it produces Memory Error for higher values when using the random.sample
solution (also mentioned in previous answers here and here). For above values, we can also observe that extractSamples
outperforms the random.sample
approach.
P.S. : Though the core approach is similar to my earlier answer, there are substantial modifications in implementation as well as approach alongwith improvement in clarity.