Generating non-repeating random numbers in Python

OK, this is one of those trickier-than-it-sounds questions, so I'm turning to Stack Overflow because I can't think of a good answer. Here is what I want: I need Python to ge

17 answers

栀梦

2020-11-30 20:34

I'd rethink the problem itself... You don't seem to be doing anything sequential with the numbers... and you've got an index on the column which has them. Do they actually need to be numbers?

Consider a SHA hash... you don't actually need the entire thing. Do what git or URL-shortening services do, and take the first 3/4/5 characters of the hash. If each character has 36 possible values instead of 10 (which means encoding the hash in base 36, since a plain hex digest has only 16 values per character), six characters give 2,176,782,336 combinations instead of the 1,000,000 you get from six digits. Combine that with a quick check on whether the combination already exists (a pure index query) and a seed like a timestamp + random number, and it should do for almost any situation.
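
A minimal sketch of that idea, under a few assumptions that are not in the answer: SHA-256 as the hash, a hand-rolled `to_base36` helper, a Python `set` standing in for the database index lookup, and the digest reduced modulo 36**6 instead of slicing leading characters (so every code is equally likely).

```python
import hashlib
import secrets
import time

ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyz"  # 36 symbols


def to_base36(number):
    """Encode a non-negative integer with the 36-symbol alphabet."""
    if number == 0:
        return ALPHABET[0]
    digits = []
    while number:
        number, remainder = divmod(number, 36)
        digits.append(ALPHABET[remainder])
    return "".join(reversed(digits))


def new_code(existing, length=6):
    """Return a code of `length` characters that is not yet in `existing`.

    `existing` is a set here; it stands in for the index query
    mentioned in the answer.
    """
    while True:
        # Seed: timestamp plus a random number, as the answer suggests.
        seed = f"{time.time_ns()}-{secrets.randbits(64)}".encode()
        digest = int.from_bytes(hashlib.sha256(seed).digest(), "big")
        code = to_base36(digest % 36 ** length).rjust(length, "0")
        if code not in existing:  # retry on the rare collision
            existing.add(code)
            return code


used = set()
print(new_code(used), new_code(used))
```

In a real application the `used` set would be replaced by the existence check against the indexed column.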

逝去的感伤

2020-11-30 20:37
With some modular arithmetic and prime numbers, you can generate all numbers between 0 and a big prime, out of order. ~~If you choose your numbers carefully, the next number is hard to guess.~~
```
modulo = 87178291199       # prime
incrementor = 17180131327  # relatively prime to modulo

current = 433494437        # some start value
for i in range(1, 100):
    print(current)
    # step forward and wrap around at modulo
    current = (current + incrementor) % modulo
```

一向

2020-11-30 20:38
If they don't have to be random, but just not obviously linear (1, 2, 3, 4, ...), then here's a simple algorithm:

Pick two prime numbers. One of them is the modulus: every generated number is smaller than it, so it should be around one billion. The other is the step, and it should be fairly large.
```
max_value = 795028841  # modulus: serials fall in 0 .. max_value - 1
step = 360287471       # added on every iteration
previous_serial = 0
for i in range(max_value):
    previous_serial += step
    previous_serial %= max_value
    print("Serial: %09i" % previous_serial)
```
Just store the previous serial each time so you know where you left off. I can't prove mathematically that this works (been too long since those particular classes), but it's demonstrably correct with smaller primes:
```
s = set()  # serials seen so far
previous_serial = 0
for i in range(2711):
    previous_serial += 1811
    previous_serial %= 2711
    # fails if any serial repeats within one full cycle
    assert previous_serial not in s
    s.add(previous_serial)
```
You could also verify it empirically with 9-digit primes; it'd just take a bit more work (or a lot more memory).
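
A short argument for why this works, offered as a sketch and not a formal proof: suppose two different iterations i < j (both within one cycle) produced the same serial. Then (j - i) * step would be a multiple of max_value. Since max_value is prime and neither j - i nor step is a multiple of it, that cannot happen, so every serial in a cycle is distinct. More generally, it is enough that step and max_value share no common factor. The helper below (`visits_every_value` is a name made up for this example) checks both cases on small numbers:

```python
from math import gcd


def visits_every_value(max_value, step):
    """Return True if adding `step` modulo `max_value`, max_value times,
    produces max_value distinct serials."""
    seen = set()
    serial = 0
    for _ in range(max_value):
        serial = (serial + step) % max_value
        seen.add(serial)
    return len(seen) == max_value


# The small values from the test above.
print(visits_every_value(2711, 1811), gcd(2711, 1811))
# A non-prime modulus still works when the step is coprime to it...
print(visits_every_value(12, 5), gcd(12, 5))
# ...but not when they share a factor (here 4): serials start repeating early.
print(visits_every_value(12, 8), gcd(12, 8))
```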

This does mean that given a few serial numbers, it'd be possible to figure out what your values are--but with only nine digits, it's not likely that you're going for unguessable numbers anyway.
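
To illustrate that last point, here is a sketch of how an observer could recover both values from a handful of consecutive serials. It relies only on the fact that the difference between neighbouring serials is either `step` (no wrap-around) or `step - max_value` (wrap-around). The function name `recover_parameters` is hypothetical.

```python
def recover_parameters(serials):
    """Guess (max_value, step) from consecutive serials of the additive scheme.

    Returns None when the sample shows only one kind of difference,
    because then max_value cannot be determined yet.
    """
    diffs = {b - a for a, b in zip(serials, serials[1:])}
    positive = [d for d in diffs if d > 0]  # steps without wrap-around
    negative = [d for d in diffs if d < 0]  # steps that wrapped around
    if len(positive) != 1 or len(negative) != 1:
        return None
    step = positive[0]
    max_value = step - negative[0]
    return max_value, step


# Generate five serials with the small values from the test above.
serials, current = [], 0
for _ in range(5):
    current = (current + 1811) % 2711
    serials.append(current)

print(serials)
print(recover_parameters(serials))  # → (2711, 1811)
```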

梦谈多话

2020-11-30 20:39

I bumped into the same problem and opened a question with a different title before getting to this one. My solution is a random sample generator of indexes (i.e. non-repeating numbers) in the interval [0,maximal), called itersample. Here are some usage examples:

```
import random
generator = itersample(maximal)
another_number = next(generator)  # pick the next non-repeating random number
```

```
import random
generator = itersample(maximal)
for random_number in generator:
    # do something with random_number
    if some_condition:  # exit loop when needed
        break
```

itersample generates non-repeating random integers. Its storage is limited to the numbers already picked, and the time needed to pick n numbers should be (as some tests confirm) O(n log(n)), regardless of maximal.

Here is the code of itersample:

```
import random

def itersample(c):  # c = upper bound (exclusive) of generated integers
    sampled = []  # numbers picked so far, kept in ascending order

    def fsb(a, b):  # free spaces before middle of interval a,b
        fsb.idx = a + (b + 1 - a) // 2  # integer division: idx is a list index
        fsb.last = sampled[fsb.idx] - fsb.idx if len(sampled) > 0 else 0
        return fsb.last

    while len(sampled) < c:
        # position of the pick among the numbers that are still free
        sample_index = random.randrange(c - len(sampled))
        a, b = 0, len(sampled) - 1
        if fsb(a, a) > sample_index:  # falls before the first picked number
            yielding = sample_index
            sampled.insert(0, yielding)
            yield yielding
        elif fsb(b, b) < sample_index + 1:  # falls after the last picked number
            yielding = len(sampled) + sample_index
            sampled.insert(len(sampled), yielding)
            yield yielding
        else:  # sample_index falls inside sampled list
            while a + 1 < b:  # binary search for the surrounding pair
                if fsb(a, b) < sample_index + 1:
                    a = fsb.idx
                else:
                    b = fsb.idx
            yielding = a + 1 + sample_index
            sampled.insert(a + 1, yielding)
            yield yielding
```
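
For comparison, when the number of values needed is known up front, the standard library offers a different route (this is not the itersample algorithm above): `random.sample` accepts a `range` object and returns distinct values without building the full list of candidates. A minimal sketch, with `maximal` and `count` set to arbitrary illustrative values:

```python
import random

maximal = 1_000_000_000  # exclusive upper bound (illustrative value)
count = 10               # how many distinct numbers to draw (illustrative)

# range() is a lazy sequence, so no billion-element list is created.
picked = random.sample(range(maximal), count)

print(picked)
print(len(set(picked)) == count)  # True: sample() never repeats a value
```

Unlike itersample, this cannot hand out one more number later without risking a repeat, so it only fits cases where all the numbers are drawn in one go.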

后悔当初

2020-11-30 20:44

My solution: https://github.com/glushchenko/python-unique-id. I think you should extend the matrix to cover 1,000,000,000 variations and have fun.

走了就别回头了

2020-11-30 20:44

Do you need this to be cryptographically secure or just hard to guess? How bad are collisions? Because if it needs to be cryptographically strong and have zero collisions, it is, sadly, impossible.
