Algorithm to print out a shuffled list, in-place and with O(1) memory

Jason Orendorff

Here is a very simple proof that no PRNG scheme can work:

The PRNG idea has two phases: first, select a PRNG and its initial state; second, use the PRNG to shuffle the output. Well, there are N! possible permutations, so you need at least N! different possible start states going into phase 2. This means that at the start of phase 2 you must hold at least log2(N!) bits of state, which isn't allowed.
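
For scale, here's a quick back-of-the-envelope check (a sketch assuming the 10,000,000-item list from the question):

    import math

    N = 10_000_000
    bits = math.lgamma(N + 1) / math.log(2)   # lgamma(N+1) = ln(N!), so this is log2(N!)
    print(bits)                 # ~2.2e8 bits of state
    print(bits / 8 / 2**20)     # ~26 megabytes -- nowhere near O(1)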

However this does not rule out schemes where the algorithm receives new random bits from the environment as it goes. There might be, say, a PRNG that reads its initial state lazily and yet is guaranteed not to repeat. Can we prove there isn't?

Suppose we do have a perfect shuffling algorithm. Imagine we start running it, and when it's halfway done, we put the computer to sleep. Now the full state of the program has been saved somewhere. Let S be the set of all possible states the program could be in at this halfway mark.

Since the algorithm is correct and guaranteed to terminate, there is a function f which, given the saved program state plus any long enough string of bits, produces a valid sequence of disk reads and writes completing the shuffle. The computer itself implements this function. But consider it as a mathematical function:

f : (S × bits) → sequence of reads and writes

Then, trivially, there exists a function g which, given only the saved program state, produces the set of disk locations yet to be read and written. (Simply pass some arbitrary string of bits to f, then look at the results.)

g : S → set of locations to read and write

The remaining bit of the proof is to show that the domain of g contains at least C(N, N/2) different sets, regardless of the choice of algorithm. If that's true, there must be at least that many elements of S, and so the state of the program must contain at least log2 C(N, N/2) bits at the halfway mark, in violation of the requirements.

I'm not sure how to prove that last bit, though, since either the set-of-locations-to-read or the set-of-locations-to-write can be low-entropy, depending on the algorithm. I suspect there's some obvious principle of information theory that can cut the knot. Marking this community wiki in the hopes someone will supply it.

Well, it depends a bit on what kind of randomness you expect for the shuffling, i.e. should all shufflings be equally probable, or can the distribution be skewed?

There are mathematical ways to produce "random-looking" permutations of N integers, so if P is such a permutation from 0..N-1 to 0..N-1, you can just iterate x from 0 to N-1 and output list item L(P(x)) instead of L(x), and you have obtained a shuffling. Such permutations can be obtained e.g. using modular arithmetic. For example, if N is prime, P(x) = (x * k) mod N is a permutation for any 0 < k < N (but it maps 0 to 0). Similarly, for a prime N, P(x) = (x^3) mod N is a permutation whenever gcd(3, N-1) = 1 (but it maps 0 to 0 and 1 to 1). This solution can easily be extended to non-prime N by selecting the least prime above N (call it M), permuting up to M, and discarding the permuted indices >= N (the same discarding trick appears again below).
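
A minimal sketch of the multiplicative version (the function name is just for illustration; N = len(L) is assumed prime and 0 < k < N):

    # Multiplication by k is invertible mod a prime, so every index is
    # visited exactly once; note that index 0 always maps to 0.
    def print_shuffled_mod(L, k):
        N = len(L)
        for x in range(N):
            print(L[(x * k) % N])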

It should be noted that modular exponentiation is the basis for many cryptographic algorithms (e.g. RSA, Diffie-Hellman) and is considered a strongly pseudorandom operation by the experts in the field.

Another easy way (not requiring prime numbers) is first to expand the domain so that instead of N you consider M where M is the least power of two above N. So e.g. if N=12 you set M=16. Then you use bijective bit operations, e.g.

P(x) = ((x ^ 0xf) ^ ((x << 2) + 3)) & 0xf

Then when you output your list, you iterate x from 0 to M-1 and output L(P(x)) only if P(x) is actually < N.
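
A sketch of that power-of-two variant for N = 12, M = 16 (again, names are illustrative):

    def P(x):
        return ((x ^ 0xF) ^ ((x << 2) + 3)) & 0xF   # bijective on 0..15

    def print_shuffled_bits(L):
        N, M = len(L), 16        # M = least power of two >= N
        for x in range(M):
            if P(x) < N:         # discard permuted indices beyond the list
                print(L[P(x)])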

A "true, unbiased random" solution can be constructed by fixing a cryptographically strong block cipher (e.g. AES) and a random key (k) and then iterating the sequence

AES(k, 0), AES(k, 1), ...

and outputting the corresponding item from the sequence iff AES(k, i) < N. This can be done in constant space (the internal memory required by the cipher) and is indistinguishable from a random permutation (due to the cryptographic properties of the cipher), but it is obviously very slow: in the case of AES, you would in the worst case need to iterate i all the way up to 2^128.
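
A sketch of that cipher-driven permutation, assuming the third-party pycryptodome package (pip install pycryptodome) for the AES primitive:

    # AES with a fixed key is a bijection on 128-bit blocks, so every index
    # below N shows up exactly once. Constant space, but the loop may run up
    # to 2**128 times.
    import os
    from Crypto.Cipher import AES

    def print_shuffled_aes(L):
        N = len(L)
        cipher = AES.new(os.urandom(16), AES.MODE_ECB)   # random key k
        remaining, i = N, 0
        while remaining:
            v = int.from_bytes(cipher.encrypt(i.to_bytes(16, "big")), "big")
            if v < N:
                print(L[v])
                remaining -= 1
            i += 1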

You're not allowed to make a copy, modify it, or keep track of which elements you've visited? I'm gonna say it's not possible. Unless I'm misunderstanding your third criterion.

I take it to mean you're not allowed to, say, make an array of 10,000,000 corresponding booleans, set to true when you've printed the corresponding element. And you're not allowed to make a list of the 10,000,000 indices, shuffle the list, and print out the elements in that order.

Those 10,000,000 items are only references (or pointers) to the actual items, so such a list would not be that large: only ~40MB on a 32-bit architecture for all the references, plus the size of the list's internal variables. If your items are smaller than a reference, just copy the whole list instead.

It's not possible to do this with a truly random number generator since you either have to:

  • remember which numbers have already been chosen and skip them (which requires an O(n) list of booleans and progressively worsening run-times as you skip more and more numbers); or
  • reduce the pool after each selection (which requires either modifications to the original list or a separate O(n) list to modify).

Neither of those is permitted by your question, so I'm going to have to say "no, you can't do it".

What I would tend to go for in this case is a bit mask of used values but not with skipping since, as mentioned, the run-times get worse as the used values accumulate.

A bit mask will be substantially better than the original 39GB list (10 million bits is only about 1.2MB), many orders of magnitude less, as you requested, even though it's still O(n).

In order to get around the run-time problem, only generate one random number each time and, if the relevant "used" bit is already set, scan forward through the bit mask until you find one that's not set.

That means you won't be hanging around, desperate for the random number generator to give you a number that hasn't been used yet. The run times will only ever get as bad as the time taken to scan 1.2MB of data.

Of course this means that the specific number chosen at any time is skewed based on the numbers that have already been chosen but, since those numbers were random anyway, the skewing is random (and if the numbers weren't truly random to begin with, then the skewing won't matter).

And you could even alternate the search direction (scanning up or down) if you want a bit more variety.
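
A sketch of that bit-mask-plus-forward-scan idea (the bytearray stands in for the bit mask; names are illustrative):

    # ~1.2MB of mask for 10,000,000 items. On a collision we scan forward
    # (wrapping around) to the next unused slot rather than re-rolling.
    import random

    def print_shuffled_bitmask(L):
        N = len(L)
        used = bytearray((N + 7) // 8)            # one bit per list item
        for _ in range(N):
            i = random.randrange(N)
            while used[i >> 3] & (1 << (i & 7)):  # already printed: scan on
                i = (i + 1) % N
            used[i >> 3] |= 1 << (i & 7)
            print(L[i])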

Bottom line: I don't believe what you're asking for is doable, but keep in mind I've been wrong before, as my wife will attest, quickly and frequently :-) But, as with all things, there are usually ways to get around such issues.

It sounds impossible.

But 10,000,000 64-bit pointers is only about 76MB.

A linear-feedback shift register can do pretty much what you want -- generate a list of numbers up to some limit, but in a (reasonably) random order. The patterns it produces are statistically similar to what you'd expect from true randomness, but it's not even close to cryptographically secure: the Berlekamp-Massey algorithm lets you reverse-engineer an equivalent LFSR from an output sequence.

Given your requirement for a list of ~10,000,000 items, you'd want a 24-bit maximal-length LFSR, and simply discard outputs larger than the size of your list.
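
A sketch of that approach; the tap set (24, 23, 22, 17) is a commonly published maximal-length choice for 24 bits, but treat it as an assumption worth checking against a tap table before relying on it:

    # A maximal-length 24-bit Fibonacci LFSR visits every nonzero 24-bit
    # state exactly once per period; subtract 1 and discard indices >= N.
    def print_shuffled_lfsr(L, seed=0x00ACE1):
        N = len(L)                        # requires N <= 2**24 - 1
        state = seed & 0xFFFFFF or 1      # any nonzero 24-bit start state
        for _ in range(2**24 - 1):        # one full period
            bit = (state ^ (state >> 1) ^ (state >> 2) ^ (state >> 7)) & 1
            state = (state >> 1) | (bit << 23)
            idx = state - 1               # states 1..2**24-1 -> 0..2**24-2
            if idx < N:
                print(L[idx])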

For what it's worth, an LFSR is generally quite fast compared to a typical linear congruential PRNG of the same period. In hardware, an LFSR is extremely simple, consisting of an N-bit register, and M 2-input XOR's (where M is the number of taps -- sometimes only a couple, and rarely more than a half dozen or so).

If there's enough space, you could store the nodes' pointers in an array, create a bitmap, and draw random ints that point to the next chosen item. If it's already chosen (you track that in your bitmap), take the closest unchosen one (left or right; you can randomize that), until no items are left.

If there's not enough space, then you could do the same without storing the nodes' pointers, but time will suffer (that's the time-space tradeoff ☺).

You can create a pseudorandom, 'secure' permutation using a block cipher - see here. The key insight is that, given a block cipher n bits in length, you can use 'folding' to shorten it to m < n bits, and then the trick antti.huima already mentioned to generate a smaller permutation from it without spending huge amounts of time discarding out-of-range values.
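
For flavor, here is a sketch of the related cycle-walking trick (a standard construction from the same family, not the folding scheme in the linked article itself): given any bijection E on [0, 2^m) with 2^m >= N, re-encrypting until the value lands below N yields a permutation of 0..N-1, and when 2^m is the least power of two above N each call walks only a couple of steps on average.

    # Narrow a permutation E on [0, 2**m) down to [0, N) by cycle-walking.
    # The loop terminates because following E's cycle from a point below N
    # must eventually return below N.
    def cycle_walk(E, N):
        def P(x):            # x is assumed to be in [0, N)
            y = E(x)
            while y >= N:
                y = E(y)
            return y
        return P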

Essentially what you need is a random number generator that produces the numbers 0..n-1 exactly once each.

Here's a half-baked idea: You could do pretty well by picking a prime p slightly larger than n, then picking a random x between 1 and p-1 whose order in the multiplicative group mod p is p-1 (pick random values of x and test that x^i != 1 for all 1 <= i < p-1; you'll only need to try a few before you find one). Since x then generates the group, just compute x^i mod p for 1 <= i <= p-2, and that will give you p-2 distinct random(ish) numbers between 2 and p-1. Subtract 2, throw out the ones >= n, and that gives you a sequence of indexes to print.
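
A sketch of that idea, using the slow, direct generator test from the parenthetical (p is assumed prime and at least n + 2; names are illustrative):

    # Print L in a scrambled order using a generator of the multiplicative
    # group mod p. The generator test below is O(p) per candidate -- fine
    # for a half-baked sketch, not for production.
    import random

    def print_shuffled_group(L, p):
        n = len(L)                      # requires prime p >= n + 2

        def is_generator(x):            # order p-1 iff x^i != 1 for 1 <= i < p-1
            acc = 1
            for _ in range(p - 2):
                acc = acc * x % p
                if acc == 1:
                    return False
            return True

        x = random.randrange(2, p)
        while not is_generator(x):
            x = random.randrange(2, p)

        acc = 1
        for _ in range(p - 2):          # x^1 .. x^(p-2) covers {2, ..., p-1}
            acc = acc * x % p
            if acc - 2 < n:             # subtract 2, keep indexes below n
                print(L[acc - 2])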

This isn't terribly random, but you can use the same technique multiple times, taking the indexes above (+1) and using them as the exponents of another generator x2 modulo another prime p2 (you'll need n < p2 < p), and so on. A dozen repetitions should make things pretty random.
