Generating non-repeating random numbers in Python

粉色の甜心 2020-11-30 19:56

Ok, this is one of those trickier-than-it-sounds questions, so I'm turning to Stack Overflow because I can't think of a good answer. Here is what I want: I need Python to ge

17 answers
  • 2020-11-30 20:24

    The seed (internal state) sequence of a standard linear congruential random number generator CANNOT repeat until the full cycle of values reachable from the starting seed has been generated. Then it MUST repeat precisely.

    The internal seed is often large (48 or 64 bits). The generated numbers are smaller (usually 32 bits) because not all of the seed's bits are sufficiently random. If you follow the seed values themselves, they form a distinct, non-repeating sequence.

    The question is essentially one of locating a good seed that generates "enough" numbers. You can pick a seed, and generate numbers until you get back to the starting seed. That's the length of the sequence. It may be millions or billions of numbers.

    There are some guidelines in Knuth for picking suitable seeds that will generate very long sequences of unique numbers.
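
    As a rough sketch of the idea (not a recommendation of specific constants), the snippet below walks the state sequence of a small LCG until it returns to the starting seed. The parameters a=5, c=3, m=2**16 are illustrative assumptions chosen to satisfy the Hull-Dobell full-period conditions (c odd, a-1 divisible by 4, m a power of two); a real application would want a larger modulus and better-vetted constants.

```python
def lcg_states(seed, a=5, c=3, m=2**16):
    """Yield successive LCG states after `seed`, stopping once the seed recurs.

    a, c and m are illustrative assumptions, not constants from the answer.
    """
    state = seed
    while True:
        state = (a * state + c) % m
        yield state
        if state == seed:
            return  # the cycle has closed; further output would repeat


values = list(lcg_states(seed=12345))
# Cycle length and number of distinct values seen along the way
print(len(values), len(set(values)))  # 65536 65536
```

    With these parameters every one of the 65536 states is visited exactly once before the seed comes back, so the state sequence itself is a non-repeating stream.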

  • 2020-11-30 20:26

    I think you are overestimating the problems with approach 1). Unless you have hard real-time requirements, simply redrawing until you hit an unused number terminates rather fast. The probability of needing more than a given number of iterations decays exponentially. With 100M numbers already output (10% fill factor), you have a one-in-a-billion chance of requiring more than 9 iterations. Even with 50% of the numbers taken, you need 2 iterations on average and have a one-in-a-billion chance of requiring more than 30 checks. Even the extreme case where 99% of the numbers are already taken might still be reasonable: you'll average 100 iterations and have a one-in-a-billion chance of requiring more than 2062 iterations.
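
    A minimal sketch of this "draw and retry" approach, assuming the issued numbers fit in an in-memory set (the function name and parameters are made up for illustration):

```python
import itertools
import random


def unique_randoms(low, high, rng=random):
    """Yield distinct integers from [low, high], redrawing on every collision."""
    seen = set()
    size = high - low + 1
    while len(seen) < size:
        candidate = rng.randint(low, high)
        if candidate in seen:
            continue  # already issued: just draw again
        seen.add(candidate)
        yield candidate


# Take the first ten unique values from a range of one million
first_ten = list(itertools.islice(unique_randoms(1, 1_000_000), 10))
print(first_ten)
```

    The retry loop only becomes expensive when the range is nearly exhausted, which matches the probabilities given above.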

  • 2020-11-30 20:30

    I started trying to write an explanation of the approach used below, but just implementing it was easier and more accurate. This approach has the odd behavior that it gets faster the more numbers you've generated. But it works, and it doesn't require you to generate all the numbers in advance.

    As a simple optimization, you could easily make this class use a probabilistic algorithm (generate a random number, and if it's not in the set of used numbers add it to the set and return it) at first, keep track of the collision rate, and switch over to the deterministic approach used here once the collision rate gets bad.

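    import random

    class NonRepeatingRandom(object):
        """Iterator over the numbers 1..maxvalue in random order, without repeats."""

        def __init__(self, maxvalue):
            self.maxvalue = maxvalue
            self.used = set()

        def __next__(self):
            if len(self.used) >= self.maxvalue:
                raise StopIteration
            # Pick the index of one of the numbers that are still unused
            r = random.randrange(self.maxvalue - len(self.used))
            # Start at the smallest unused number (values run from 1, as in the sample output)
            result = 1
            while result in self.used:
                result += 1
            # Step forward r more unused numbers, skipping the used ones
            for _ in range(r):
                result += 1
                while result in self.used:
                    result += 1
            self.used.add(result)
            return result

        def __iter__(self):
            return self

        def __getitem__(self, index):
            # Random access is deliberately unsupported
            raise NotImplementedError

        def get_all(self):
            return list(self)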
    
    >>> n = NonRepeatingRandom(20)
    >>> n.get_all()
    [12, 14, 13, 2, 20, 4, 15, 16, 19, 1, 8, 6, 7, 9, 5, 11, 10, 3, 18, 17]
    
  • 2020-11-30 20:31

    If you don't need something cryptographically secure, but just "sufficiently obfuscated"...

    Galois Fields

    You could try operations in Galois Fields, e.g. GF(2^32), to map a simple incrementing counter x to a seemingly random serial number y:

    x = counter_value
    y = some_galois_function(x)
    
    • Multiply by a constant
      • Inverse is to multiply by the reciprocal of the constant
    • Raise to a power: x^n
    • Reciprocal: x^-1
      • Special case of raising to power n
      • It is its own inverse
    • Exponentiation of a primitive element: a^x
      • Note that this doesn't have an easily-calculated inverse (discrete logarithm)
      • Ensure a is a primitive element, aka generator

    Many of these operations have an inverse, which means, given your serial number, you can calculate the original counter value from which it was derived.
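
    To make the "multiply by a constant" option concrete, here is a small sketch in GF(2^8) rather than GF(2^32), so that the whole mapping can be checked exhaustively. It assumes the AES reduction polynomial x^8 + x^4 + x^3 + x + 1 (0x11B); the constant KEY and the function names are hypothetical. A 32-bit version would need an irreducible polynomial of degree 32 instead.

```python
AES_POLY = 0x11B  # x^8 + x^4 + x^3 + x + 1, irreducible over GF(2)


def gf_mul(a, b, poly=AES_POLY, bits=8):
    """Multiply two field elements (carry-less multiply, reduced modulo poly)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a >> bits:  # degree overflowed: reduce by the field polynomial
            a ^= poly
    return result


def gf_pow(a, n):
    """Raise a to the n-th power by square-and-multiply."""
    result = 1
    while n:
        if n & 1:
            result = gf_mul(result, a)
        a = gf_mul(a, a)
        n >>= 1
    return result


def gf_inv(a):
    """Reciprocal of a nonzero element: a^(2^8 - 2)."""
    return gf_pow(a, 254)


KEY = 0x57              # hypothetical nonzero constant
KEY_INV = gf_inv(KEY)   # its reciprocal, used to undo the mapping


def scramble(counter):
    return gf_mul(counter, KEY)


def unscramble(serial):
    return gf_mul(serial, KEY_INV)


print([scramble(x) for x in range(1, 6)])
```

    Because multiplication by a nonzero constant is a bijection on the field, every counter value maps to a distinct serial number, and `unscramble` recovers the counter. Note that 0 always maps to 0.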

    As for finding a library for Galois Field for Python... good question. If you don't need speed (which you wouldn't for this) then you could make your own. I haven't tried these:

    • NZMATH
    • Finite field Python package
    • Sage, although it's a whole environment for mathematical computing, much more than just a Python library

    Matrix multiplication in GF(2)

    Pick a suitable 32×32 invertible matrix in GF(2), and multiply a 32-bit input counter by it. This is conceptually related to LFSR, as described in S.Lott's answer.
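
    One possible way to sketch this, assuming the matrix is built as the product of a random unit lower-triangular and a random unit upper-triangular bit matrix (both have determinant 1, so the product is guaranteed invertible). Rows are stored as Python integers; all names and the seed are illustrative.

```python
import random


def random_unit_triangular(n, lower, rng):
    """Random n x n triangular bit matrix with ones on the diagonal (rows as ints)."""
    rows = []
    for i in range(n):
        row = 1 << i  # diagonal entry
        cols = range(i) if lower else range(i + 1, n)
        for j in cols:
            if rng.getrandbits(1):
                row |= 1 << j
        rows.append(row)
    return rows


def mat_vec(rows, x):
    """Multiply the bit matrix by the bit vector x over GF(2)."""
    y = 0
    for i, row in enumerate(rows):
        parity = bin(row & x).count("1") & 1  # dot product modulo 2
        y |= parity << i
    return y


def make_scrambler(n, seed):
    """Return a function applying a fixed invertible n x n GF(2) matrix."""
    rng = random.Random(seed)
    lower = random_unit_triangular(n, True, rng)
    upper = random_unit_triangular(n, False, rng)
    return lambda x: mat_vec(lower, mat_vec(upper, x))


scramble = make_scrambler(32, seed=2020)
print([scramble(x) for x in range(1, 6)])
```

    The map is linear, so 0 always maps to 0 and the output is only obfuscated, not secure.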

    CRC

    A related possibility is to use a CRC calculation, which is based on the remainder of polynomial long division by an irreducible polynomial over GF(2). Python code is readily available for CRCs (crcmod, pycrc), although you might want to pick a different irreducible polynomial than is normally used, for your purposes. I'm a little fuzzy on the theory, but I think a 32-bit CRC should generate a unique value for every possible combination of 4-byte inputs. Check this. It's quite easy to check experimentally, by feeding the output back into the input and verifying that it produces a complete cycle of length 2^32 - 1 (zero just maps to zero). You may need to get rid of any initial/final XORs in the CRC algorithm for this check to work.
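
    A quick, partial check of the uniqueness claim, using the standard library's `zlib.crc32` instead of crcmod or pycrc. This is a swapped-in convenience: it uses the standard CRC-32 polynomial with its usual initial/final XORs, not a custom polynomial, and it only tests a sample of counters rather than all 2^32 inputs.

```python
import struct
import zlib


def crc_serial(counter):
    """Map a 32-bit counter to the CRC-32 of its 4-byte little-endian encoding."""
    return zlib.crc32(struct.pack("<I", counter))


serials = [crc_serial(x) for x in range(100_000)]
# True when no two counters in the sample produced the same serial number
print(len(set(serials)) == len(serials))
```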

  • 2020-11-30 20:33

    To generate a list of random numbers without repeats within defined bounds, you can do the following:

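    import random as rnd

    pList = []
    length_of_list = 100
    upbound = 1000
    lowbound = 0
    # Keep drawing until the list holds enough distinct values
    while len(pList) < length_of_list:
        pList.append(rnd.randint(lowbound, upbound))
        pList = list(set(pList))  # drop duplicates (this also discards insertion order)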
    
  • 2020-11-30 20:34

    You can run 1) without running into the problem of too many wrong random numbers if you just decrease the random interval by one each time.

    For this method to work, you will need to save the numbers already given (which you want to do anyway) and also save the quantity of numbers taken.

    It is pretty obvious that, after having collected 10 numbers, your pool of possible random numbers will have shrunk by 10. Therefore, you must not choose a number between 1 and 1,000,000 but between 1 and 999,990. Of course this number is not the real number but only an index (unless the 10 numbers collected have been 999,991, 999,992, …); you’d have to count from 1, omitting all the numbers already collected, until you reach that index.

    Of course, your algorithm should be smarter than just counting from 1 to 1,000,000, but I hope you understand the method.
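
    One way the index-to-number step could be made smarter than counting from 1, sketched under the assumption that the numbers already taken are kept in a sorted list (function and variable names are hypothetical):

```python
import bisect
import random


def draw_unique(taken_sorted, low, high, rng=random):
    """Draw a number from [low, high] that is not in taken_sorted, and record it.

    taken_sorted must be an ascending list of the numbers already issued.
    """
    free = (high - low + 1) - len(taken_sorted)
    if free <= 0:
        raise ValueError("no numbers left")
    # Index into the shrunken pool of free numbers
    candidate = low + rng.randrange(free)
    # Shift the index past every taken number at or below it
    for t in taken_sorted:
        if t <= candidate:
            candidate += 1
        else:
            break
    bisect.insort(taken_sorted, candidate)
    return candidate


taken = []
draws = [draw_unique(taken, 1, 10) for _ in range(10)]
print(draws)
```

    Each draw costs time proportional to the number of values already taken, not to the size of the range, and never needs a retry.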

    I don’t like drawing random numbers until I get one which fits either. It just feels wrong.
