I know how to create a random string, like:
''.join(secrets.choice(string.ascii_uppercase + string.digits) for _ in range(N))
However, ther
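Caveat: this is not cryptographically secure, because np.random is not a cryptographic random number generator (unlike the secrets module used in the question). I want to give an alternative NumPy approach to the one in Martijn's great answer.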
NumPy functions aren't optimised to be called repeatedly in a loop for small tasks; it's better to perform each operation in bulk. This approach generates more strings than you need (far more in this case, because I deliberately overestimated), so it is less memory efficient, but it is still very fast.
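To illustrate the loop-versus-bulk point, here is a minimal sketch comparing one np.random.choice call per string with a single call for all characters. The function names and the fixed length of 16 are assumptions made only for this comparison; both produce the same kind of output, but the bulk version makes one NumPy call instead of n.

```python
import string
import numpy as np

POOL = list(string.ascii_letters + string.digits)

def per_string_calls(n, length=16):
    # Hypothetical helper: one small NumPy call per string, dominated by call overhead
    return [''.join(np.random.choice(POOL, size=length)) for _ in range(n)]

def bulk_call(n, length=16):
    # Hypothetical helper: one large NumPy call, shaped as n rows of `length` characters
    chars = np.random.choice(POOL, size=(n, length))
    return [''.join(row) for row in chars]
```

Timing both with timeit for a few thousand strings shows the per-call overhead of the first version.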
We know that all your string lengths are between 12 and 20, so generate all the string lengths in one go. Converting the final list to a set may trim out duplicates, so we should anticipate that and generate more string lengths than we need. 20,000 extra is excessive, but it makes the point:
string_lengths = np.random.randint(12, 21, 60000)  # upper bound is exclusive, so this gives lengths 12..20
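A quick sanity check of the lengths step, assuming a fixed seed purely so the check is reproducible. np.random.randint excludes its upper bound, so the high value must be one more than the longest length you want; `make_lengths` is a hypothetical helper name.

```python
import numpy as np

def make_lengths(count, low=12, high=20):
    # randint excludes `high`, so pass high + 1 to allow lengths low..high inclusive
    return np.random.randint(low, high + 1, count)

np.random.seed(0)  # fixed seed only to make this check reproducible
lengths = make_lengths(60000)
print(lengths.min(), lengths.max())

# With the bound written as 20, a length of 20 can never occur
print(np.random.randint(12, 20, 60000).max())
```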
Rather than create each string in a for loop, create one 1D array of characters long enough to be cut into all of the strings. In the absolute worst case, every length from the first step is the maximum of 20, which for 60,000 lengths means 1,200,000 characters. Sizing the array to the sum of the lengths actually drawn uses exactly as many characters as needed:
pool = list(string.ascii_letters + string.digits)
random_letters = np.random.choice(pool, size=string_lengths.sum())
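Here is a sketch of why a fixed size is risky: lengths drawn uniformly from 12 to 20 average 16, so 60,000 of them need roughly 960,000 characters, more than a fixed 800,000. The helper name `random_char_array` is hypothetical; the worst-case figure is computed only for comparison.

```python
import string
import numpy as np

POOL = list(string.ascii_letters + string.digits)

def random_char_array(string_lengths):
    # Hypothetical helper: exactly as many characters as the strings will consume
    return np.random.choice(POOL, size=int(string_lengths.sum()))

string_lengths = np.random.randint(12, 21, 60000)
chars = random_char_array(string_lengths)
worst_case = len(string_lengths) * 20  # every string at the maximum length
print(len(chars), worst_case)
```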
Now we just need to chop that array of random characters up. np.cumsum() gives the end index of each substring, and np.roll() shifts that array right by 1 to give the corresponding start indices. The first start must be reset to 0, because the roll wraps the last end index around to the front.
ends = string_lengths.cumsum()
starts = np.roll(ends, 1)
starts[0] = 0
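On a tiny made-up example (lengths 3, 2 and 4, chosen only for readability) the index arithmetic looks like this; subtracting the lengths from the ends is an equivalent way to get the starts without np.roll.

```python
import numpy as np

string_lengths = np.array([3, 2, 4])
ends = string_lengths.cumsum()       # [3, 5, 9]
starts = np.roll(ends, 1)            # [9, 3, 5]: the last end wraps to the front
starts[0] = 0                        # [0, 3, 5]

# Equivalent form without np.roll
starts_alt = ends - string_lengths   # [0, 3, 5]
print(starts.tolist(), ends.tolist())
```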
Chop up the list of random characters by the indices.
final = [''.join(random_letters[s:e]) for s, e in zip(starts, ends)]
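To see the chopping on something small, here is a sketch with a fixed character array in place of random characters (an assumption for readability). It also shows np.split, which cuts before each interior boundary `ends[:-1]` and yields the same pieces.

```python
import numpy as np

random_letters = np.array(list("abcdefghi"))
string_lengths = np.array([3, 2, 4])
ends = string_lengths.cumsum()
starts = ends - string_lengths

final = [''.join(random_letters[s:e]) for s, e in zip(starts, ends)]
print(final)  # ['abc', 'de', 'fghi']

# np.split cuts before indices 3 and 5, giving the same substrings
via_split = [''.join(part) for part in np.split(random_letters, ends[:-1])]
print(via_split)  # ['abc', 'de', 'fghi']
```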
Putting it all together:
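import string
import numpy as np

def numpy_approach():
    pool = list(string.ascii_letters + string.digits)
    # Lengths 12..20 inclusive (randint's upper bound is exclusive)
    string_lengths = np.random.randint(12, 21, 60000)
    # End index of each string; each start is the previous end (0 for the first)
    ends = string_lengths.cumsum()
    starts = np.roll(ends, 1)
    starts[0] = 0
    # Exactly as many random characters as the strings consume
    random_letters = np.random.choice(pool, size=ends[-1])
    final = [''.join(random_letters[s:e]) for s, e in zip(starts, ends)]
    return final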
And timeit results:
322 ms ± 7.97 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
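The function above returns all 60,000 strings and leaves the set step to you. A minimal sketch of that step, using hypothetical helpers `random_strings` and `unique_keys` (not from the question or Martijn's answer): oversample with the same bulk technique, collapse duplicates with a set, and top up in the unlikely case the set comes out short.

```python
import string
import numpy as np

POOL = list(string.ascii_letters + string.digits)

def random_strings(count, low=12, high=20):
    # Same bulk idea: one randint call, one choice call, then slicing
    lengths = np.random.randint(low, high + 1, count)
    ends = lengths.cumsum()
    starts = ends - lengths
    chars = np.random.choice(POOL, size=int(ends[-1]))
    return [''.join(chars[s:e]) for s, e in zip(starts, ends)]

def unique_keys(n, overshoot=1.5):
    # Hypothetical helper: oversample, deduplicate with a set, top up if short
    keys = set()
    while len(keys) < n:
        keys.update(random_strings(int((n - len(keys)) * overshoot) + 1))
    return list(keys)[:n]
```

The overshoot factor of 1.5 mirrors the 60,000-for-40,000 ratio used above; with 62 symbols and lengths of at least 12, collisions are so rare that a much smaller margin would usually do.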