Generating non-repeating random numbers in Python

粉色の甜心 2020-11-30 19:56

OK, this is one of those trickier-than-it-sounds questions, so I'm turning to Stack Overflow because I can't think of a good answer. Here is what I want: I need Python to ge

17 answers
  • 2020-11-30 20:34

    I'd rethink the problem itself... You don't seem to be doing anything sequential with the numbers... and you've got an index on the column which has them. Do they actually need to be numbers?

    Consider a SHA hash... you don't actually need the entire thing. Do what git and URL-shortening services do, and take the first 3/4/5 characters of the hash. If each character has 36 possible values instead of 10 (which assumes a base-36 encoding of the digest; a raw hex digest only gives 16 per character), six characters give you 2,176,782,336 combinations instead of the 1,000,000 available with six decimal digits. Combine that with a quick check on whether the combination already exists (a pure index query) and a seed like a timestamp + random number, and it should do for almost any situation.
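
A minimal sketch of that idea, assuming Python 3 and SHA-256. The names `to_base36` and `new_short_id` are made up for illustration, and the `existing_ids` set is only a stand-in for the indexed database column:

```python
import hashlib
import os
import time

ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyz"  # 36 symbols


def to_base36(number, length):
    """Encode the `length` lowest base-36 digits of a non-negative integer."""
    digits = []
    for _ in range(length):
        number, remainder = divmod(number, 36)
        digits.append(ALPHABET[remainder])
    return "".join(reversed(digits))


def new_short_id(existing_ids, length=6):
    """Return a base-36 ID of `length` characters that is not in existing_ids.

    existing_ids is a hypothetical stand-in for the indexed column; here it
    is a plain set that this function also updates.
    """
    while True:
        # Seed: timestamp plus random bytes, as the answer suggests.
        seed = "{}-{}".format(time.time_ns(), os.urandom(8).hex()).encode()
        digest = hashlib.sha256(seed).digest()
        candidate = to_base36(int.from_bytes(digest, "big"), length)
        if candidate not in existing_ids:  # the "pure index query"
            existing_ids.add(candidate)
            return candidate


ids = set()
print(new_short_id(ids), new_short_id(ids))
```

With a real database the membership test and the insert would need to happen atomically (for example via a unique constraint), otherwise two writers could pick the same ID at the same moment.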

  • 2020-11-30 20:37

    With some modular arithmetic and prime numbers, you can create all numbers between 0 and a big prime, out of order. If you choose your numbers carefully, the next number is hard to guess.

    modulo = 87178291199  # prime
    incrementor = 17180131327  # relatively prime to modulo
    
    current = 433494437  # some start value
    for i in range(1, 100):
        print(current)
        current = (current + incrementor) % modulo
    
  • 2020-11-30 20:38

    If they don't have to be random, but just not obviously linear (1, 2, 3, 4, ...), then here's a simple algorithm:

    Pick two prime numbers. One of them will be the largest number you can generate, so it should be around one billion. The other should be fairly large.

    max_value = 795028841
    step = 360287471
    previous_serial = 0
    for i in range(max_value):
        previous_serial += step
        previous_serial %= max_value
        print("Serial: %09i" % previous_serial)
    

    Just store the previous serial each time so you know where you left off. I can't prove mathematically that this works (it's been too long since those particular classes), but it's demonstrably correct with smaller primes:

    s = set()
    previous_serial = 0
    for i in range(2711):
        previous_serial += 1811
        previous_serial %= 2711
        assert previous_serial not in s  # fails if any serial repeats
        s.add(previous_serial)
    

    You could also prove it empirically with 9-digit primes, it'd just take a bit more work (or a lot more memory).
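
The brute-force check does not actually have to be repeated at full size. Adding `step` modulo `max_value` over and over visits every value exactly once per cycle whenever the two numbers share no common factor, and that can be tested directly with `math.gcd`. A rough sketch, where `visits_every_value` and `cycle_length` are hypothetical helper names:

```python
from math import gcd


def visits_every_value(step, max_value):
    """True if repeatedly adding step modulo max_value hits every residue."""
    return gcd(step, max_value) == 1


def cycle_length(step, max_value):
    """Brute-force count of distinct serials; only practical for small inputs."""
    seen = set()
    serial = 0
    while True:
        serial = (serial + step) % max_value
        if serial in seen:
            return len(seen)
        seen.add(serial)


# Small pair from the test above: both checks should agree.
print(visits_every_value(1811, 2711), cycle_length(1811, 2711) == 2711)

# Nine-digit pair: the gcd test needs no extra memory.
print(visits_every_value(360287471, 795028841))
```

Note that only the coprimality matters here, so `max_value` does not strictly have to be prime; choosing a prime just guarantees that any smaller positive step works.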

    This does mean that given a few serial numbers, it'd be possible to figure out what your values are--but with only nine digits, it's not likely that you're going for unguessable numbers anyway.
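
To make that guessability concrete, here is a small sketch of how an observer could recover both values from a handful of consecutive serials. It assumes the serials come from exactly the add-and-modulo loop above; `recover_parameters` and `next_serial` are illustrative names, not part of any library:

```python
def recover_parameters(serials):
    """Guess (step, max_value) from consecutive serials, or None if unclear.

    Each difference between neighbours is either step (no wrap-around) or
    step - max_value (wrap-around), so seeing both kinds reveals both values.
    """
    diffs = {b - a for a, b in zip(serials, serials[1:])}
    if len(diffs) != 2:
        return None  # need at least one wrapped and one unwrapped step
    low, high = sorted(diffs)
    return high, high - low


def next_serial(previous_serial, step, max_value):
    """Predict the serial that follows previous_serial."""
    return (previous_serial + step) % max_value


# First three serials produced by the loop above (starting from 0).
observed = [360287471, 720574942, 285833572]
step, max_value = recover_parameters(observed)
print(step, max_value)
print(next_serial(observed[-1], step, max_value))
```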

  • 2020-11-30 20:39

    I bumped into the same problem and opened a question with a different title before getting to this one. My solution is a random sample generator of indexes (i.e. non-repeating numbers) in the interval [0,maximal), called itersample. Here are some usage examples:

    import random
    generator=itersample(maximal)
    another_number=next(generator) # pick the next non-repeating random number
    

    or

    import random
    generator=itersample(maximal)
    for random_number in generator:
        # do something with random_number
        if some_condition: # exit loop when needed
            break
    

    itersample generates non-repeating random integers, storage need is limited to picked numbers, and the time needed to pick n numbers should be (as some tests confirm) O(n log(n)), regardless of maximal.

    Here is the code of itersample:

    import random
    def itersample(c): # c = upper bound of generated integers
        sampled=[]
        def fsb(a,b): # free spaces before middle of interval a,b
            fsb.idx=a+(b+1-a)//2 # integer division, so the result is a valid list index
            fsb.last=sampled[fsb.idx]-fsb.idx if len(sampled)>0 else 0
            return fsb.last
        while len(sampled)<c:
            sample_index=random.randrange(c-len(sampled))
            a,b=0,len(sampled)-1
            if fsb(a,a)>sample_index:
                yielding=sample_index
                sampled.insert(0,yielding)
                yield yielding
            elif fsb(b,b)<sample_index+1:
                yielding=len(sampled)+sample_index
                sampled.insert(len(sampled),yielding)
                yield yielding
            else: # sample_index falls inside sampled list
                while a+1<b:
                    if fsb(a,b)<sample_index+1:
                        a=fsb.idx
                    else:
                        b=fsb.idx
                yielding=a+1+sample_index
                sampled.insert(a+1,yielding)
                yield yielding
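
For comparison, when the number of values needed is known up front, the standard library's `random.sample` can draw distinct integers from a `range` object without materialising the whole range. This is a different technique from itersample (it returns a fixed-size list rather than an open-ended generator), and the values of `maximal` and `count` below are chosen only for illustration:

```python
import random

maximal = 10**9  # exclusive upper bound, illustrative value
count = 5        # how many distinct numbers to draw

# range objects support len() and indexing, so sample() does not
# need to build a billion-element list first.
picked = random.sample(range(maximal), count)
print(picked)
```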
    
  • 2020-11-30 20:44

    My solution: https://github.com/glushchenko/python-unique-id. I think you should extend the matrix to cover 1,000,000,000 variations and have fun.

  • 2020-11-30 20:44

    Do you need this to be cryptographically secure or just hard to guess? How bad are collisions? Because if it needs to be cryptographically strong and have zero collisions, it is, sadly, impossible.
