Fastest way to generate a random-like unique string with random length in Python 3

我寻月下人不归 2020-12-13 14:45

I know how to create a random string, like this:

''.join(secrets.choice(string.ascii_uppercase + string.digits) for _ in range(N))

However, ther
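For comparison, here is a minimal pure-Python baseline that extends the one-liner above to many unique strings of random length. It is a sketch, not the asker's code: `unique_random_strings` is a hypothetical name, and the 12 to 20 length range is an assumption borrowed from the answer below. Because it draws from `secrets`, it is suitable for security-sensitive tokens, but every character costs a separate Python-level call.

```python
import secrets
import string

# Alphabet taken from the one-liner above
ALPHABET = string.ascii_uppercase + string.digits


def unique_random_strings(count, min_len=12, max_len=20):
    """Return `count` distinct strings whose lengths lie in [min_len, max_len]."""
    seen = set()
    while len(seen) < count:
        # randbelow(n) returns 0..n-1, so lengths cover min_len..max_len inclusive
        length = min_len + secrets.randbelow(max_len - min_len + 1)
        # the set silently drops a duplicate, so the loop just draws again
        seen.add(''.join(secrets.choice(ALPHABET) for _ in range(length)))
    return list(seen)
```

For example, `unique_random_strings(40000)` returns 40,000 distinct strings.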

5 answers

臣服心动 (original poster)

2020-12-13 15:07
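Caveat: this is not cryptographically secure, since NumPy's random generators are not intended for security purposes the way the secrets module is. I want to give an alternative NumPy approach to the one in Martijn's great answer.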

NumPy functions aren't optimised to be called repeatedly in a loop for small tasks; it's better to perform each operation in bulk. This approach generates more keys than you need (far more in this case, because I deliberately overestimated to make the point), so it is less memory-efficient, but it is still very fast.
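A rough way to see the per-call overhead described above is to draw the same number of characters once per call and once in bulk. This is a sketch: `looped` and `bulk` are hypothetical names, the sizes are arbitrary, and absolute timings depend on the machine, so no numbers are quoted here.

```python
import string
import timeit

import numpy as np

POOL = np.array(list(string.ascii_letters + string.digits))


def looped(n):
    # n separate NumPy calls: the fixed per-call overhead is paid n times
    return [np.random.choice(POOL) for _ in range(n)]


def bulk(n):
    # a single NumPy call that returns all n characters as one array
    return np.random.choice(POOL, size=n)


if __name__ == "__main__":
    for fn in (looped, bulk):
        seconds = timeit.timeit(lambda: fn(10_000), number=3)
        print(f"{fn.__name__}: {seconds:.3f} s for 3 calls of n=10,000")
```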
1. We know that all your string lengths are between 12 and 20, so just generate all the string lengths in one go. Putting the strings into a set may trim the final list when duplicates appear, so we should anticipate that and make more "string lengths" than we need. 20,000 extra (60,000 instead of 40,000) is excessive, but it's to make a point. Note that np.random.randint excludes its upper bound, so the high argument has to be 21 for a length of 20 to be possible:
  
  string_lengths = np.random.randint(12, 21, 60000)
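2. Rather than create each sequence in a for loop, create one 1D array of characters that is long enough to be cut into all of the strings. In the absolute worst case, every random length from (1) is the maximum of 20, so 60,000 lengths could need up to 1,200,000 characters; a fixed 800,000 only covers 40,000 maximum-length strings, and any slice past the end of the array comes back short or empty. The simplest fix is to draw exactly as many characters as the lengths add up to:
  
  pool = list(string.ascii_letters + string.digits)
  
  random_letters = np.random.choice(pool, size=string_lengths.sum())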
3. Now we just need to chop that array of random characters up. np.cumsum() gives the end index of each substring, and np.roll() shifts that array right by one; its last element (the grand total) wraps around to the front, so setting that first entry to 0 gives the matching array of start indices.
  
  ends = string_lengths.cumsum()
  
  starts = np.roll(ends, 1)
  
  starts[0] = 0
4. Chop up the list of random characters by the indices.
  
  final = [''.join(random_letters[s:e]) for s, e in zip(starts, ends)]
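The cut points are easy to get off by one, so it helps to check them on a tiny, hand-picked input (the lengths and the six letters below are made up for illustration). With lengths [3, 1, 2], the cumulative sum [3, 4, 6] marks where each piece ends, and shifting it right by one with a leading 0 gives where each piece starts:

```python
import numpy as np

# Hand-picked toy input (an assumption for illustration): three strings
# of lengths 3, 1 and 2 cut from a 6-character pool.
string_lengths = np.array([3, 1, 2])
random_letters = np.array(list("abcdef"))

ends = string_lengths.cumsum()   # [3, 4, 6]: one past the last char of each piece
starts = np.roll(ends, 1)        # [6, 3, 4]: shifted right, total wrapped to the front
starts[0] = 0                    # replace the wrapped total with 0 -> [0, 3, 4]

pieces = [''.join(random_letters[s:e]) for s, e in zip(starts, ends)]
print(pieces)  # ['abc', 'd', 'ef']
```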
Putting it all together:
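```
import string
import numpy as np

def numpy_approach():
    pool = list(string.ascii_letters + string.digits)
    # randint's upper bound is exclusive, so 21 allows lengths 12..20
    string_lengths = np.random.randint(12, 21, 60000)
    ends = string_lengths.cumsum()      # end index of each string
    starts = np.roll(ends, 1)           # shift right by one...
    starts[0] = 0                       # ...so the first string starts at 0
    # draw exactly as many characters as the lengths add up to
    random_letters = np.random.choice(pool, size=string_lengths.sum())
    final = [''.join(random_letters[s:e]) for s, e in zip(starts, ends)]
    return final
```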
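The answer mentions that deduplicating through a set can shrink the list, but numpy_approach returns every candidate without removing duplicates. The following is a sketch that adds that step, assuming the goal is exactly `count` unique strings: it keeps drawing batches until the set is large enough, then trims. It swaps in NumPy's newer Generator API (`np.random.default_rng`) for the legacy `np.random` functions, and `unique_numpy_strings` and its `overshoot` parameter are hypothetical names. Like the original, it is not cryptographically secure.

```python
import string

import numpy as np


def unique_numpy_strings(count, min_len=12, max_len=20, overshoot=1.5, rng=None):
    """Return `count` unique strings with lengths in [min_len, max_len].

    `overshoot` (hypothetical) controls how many extra candidates each
    round draws so that duplicates can be discarded.
    """
    rng = np.random.default_rng() if rng is None else rng
    pool = np.array(list(string.ascii_letters + string.digits))
    result = set()
    while len(result) < count:
        n = int((count - len(result)) * overshoot) + 1
        # integers() excludes `high` by default, hence max_len + 1
        lengths = rng.integers(min_len, max_len + 1, size=n)
        ends = lengths.cumsum()
        starts = ends - lengths  # same start indices as the shifted cumsum
        letters = rng.choice(pool, size=int(ends[-1]))
        result.update(''.join(letters[s:e]) for s, e in zip(starts, ends))
    # the last round usually overshoots; keep exactly `count` strings
    return list(result)[:count]
```

For example, unique_numpy_strings(40000) returns 40,000 distinct strings. Computing the starts as ends - lengths is equivalent to the roll-and-zero step and avoids the in-place fix-up. This variant has not been timed, so the timing below applies only to numpy_approach.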
And timeit results:
```
322 ms ± 7.97 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
```
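The output above is in the format printed by IPython's %timeit magic. Outside IPython, the standard-library timeit module can produce a similar summary, for example report(numpy_approach) once numpy_approach is defined. This is a sketch: report is a hypothetical helper, and its figures will not necessarily match IPython's, which also picks the loop count automatically.

```python
import statistics
import timeit


def report(fn, runs=7, loops=1):
    """Summarise timings of fn() as 'mean ± std. dev.' per loop, in milliseconds."""
    # repeat() returns one total (in seconds) per run, each covering `loops` calls
    totals = timeit.repeat(fn, repeat=runs, number=loops)
    per_loop_ms = [t / loops * 1000 for t in totals]
    mean = statistics.mean(per_loop_ms)
    stdev = statistics.pstdev(per_loop_ms)  # population std. dev. over the runs
    plural = "" if loops == 1 else "s"
    return (f"{mean:.0f} ms ± {stdev:.2f} ms per loop "
            f"(mean ± std. dev. of {runs} runs, {loops} loop{plural} each)")
```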