How do I create a list of random numbers without duplicates?

灰色年华 2020-11-22 13:30

I tried using random.randint(0, 100), but some numbers were the same. Is there a method/module to create a list of unique random numbers?

Note: The fol

17 answers
  •  谎友^ (original poster)
    2020-11-22 13:54

    To generate a list of random values without duplicates in a way that is deterministic, efficient, and built from basic programming constructs, consider the function extractSamples defined below:

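    import random

    def extractSamples(populationSize, sampleSize, intervalLst):
        """Return sampleSize distinct integers drawn from the inclusive (a, b) intervals in intervalLst.

        Note: intervalLst is modified in place.
        """
        if sampleSize > populationSize:
            raise ValueError("sampleSize = " + str(sampleSize) + " > populationSize (= " + str(populationSize) + ")")
        samples = []
        while len(samples) < sampleSize:
            # Pick an interval, then a value inside it. Intervals are chosen with equal
            # probability regardless of length, so values are not uniformly distributed.
            i = random.randint(0, len(intervalLst) - 1)
            a, b = intervalLst[i]
            sample = random.randint(a, b)
            if a == b:  # interval exhausted
                intervalLst.pop(i)
            elif a == sample:  # shorten beginning of interval
                intervalLst[i] = (sample + 1, b)
            elif sample == b:  # shorten interval end
                intervalLst[i] = (a, sample - 1)
            else:  # split the interval around the sampled value
                intervalLst[i] = (a, sample - 1)
                intervalLst.append((sample + 1, b))
            samples.append(sample)
        return samples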
    

    The basic idea is to keep track of the intervals (intervalLst) of values that are still available to select from. This is deterministic in the sense that the sample is guaranteed to be generated within a fixed number of steps (dependent solely on populationSize and sampleSize): every iteration of the loop yields one new value, so there are no retries on collisions.
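    One caveat with the interval idea: the function above picks an interval with equal probability regardless of how many values it holds, so the remaining values are not all equally likely. The following is a minimal sketch of a variant that weights each interval by its length. It is not the answer's function; the name sample_unique_weighted and its signature are hypothetical, and recomputing the weights on every draw makes it quadratic in the sample size, so it trades speed for a more even distribution.

```python
import random

def sample_unique_weighted(population_size, sample_size):
    """Hypothetical variant: draw sample_size distinct ints from range(population_size)."""
    if sample_size > population_size:
        raise ValueError("sample_size exceeds population_size")
    intervals = [(0, population_size - 1)]  # inclusive intervals of unused values
    samples = []
    while len(samples) < sample_size:
        # Weight each interval by the number of values it still holds.
        weights = [b - a + 1 for a, b in intervals]
        i = random.choices(range(len(intervals)), weights=weights)[0]
        a, b = intervals.pop(i)
        sample = random.randint(a, b)
        # Put back whatever remains on either side of the drawn value.
        if a < sample:
            intervals.append((a, sample - 1))
        if sample < b:
            intervals.append((sample + 1, b))
        samples.append(sample)
    return samples
```

    For example, sample_unique_weighted(1000, 200) returns 200 distinct integers between 0 and 999.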

    To use the above function to generate the required list:

    In [3]: populationSize, sampleSize = 10**17, 10**5
    
    In [4]: %time lst1 = extractSamples(populationSize, sampleSize, [(0, populationSize-1)])
    CPU times: user 289 ms, sys: 9.96 ms, total: 299 ms
    Wall time: 293 ms
    
    

    We may also compare with an earlier solution (for a lower value of populationSize):

    In [5]: populationSize, sampleSize = 10**8, 10**5
    
    In [6]: %time lst = random.sample(range(populationSize), sampleSize)
    CPU times: user 1.89 s, sys: 299 ms, total: 2.19 s
    Wall time: 2.18 s
    
    In [7]: %time lst1 = extractSamples(populationSize, sampleSize, [(0, populationSize-1)])
    CPU times: user 449 ms, sys: 8.92 ms, total: 458 ms
    Wall time: 442 ms
    

    Note that I reduced the populationSize value because the random.sample solution produces a MemoryError for higher values (also mentioned in previous answers here and here). For the above values, we can also observe that extractSamples outperforms the random.sample approach.
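    The MemoryError noted above is what one would expect when range builds a full list, as it does in Python 2. As a hedged aside, assuming Python 3, where range is a lazy sequence, random.sample can draw from the larger population directly. The variable names below mirror the session above; no timings are claimed here.

```python
import random

population_size, sample_size = 10**17, 10**5

# Assumes Python 3: range() is lazy, so no 10**17-element list is ever built.
lst = random.sample(range(population_size), sample_size)

# random.sample draws without replacement, so every value is distinct.
print(len(lst), len(set(lst)))
```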

    P.S.: Though the core approach is similar to my earlier answer, there are substantial modifications in the implementation as well as the approach, along with improvements in clarity.
