Algorithm to print out a shuffled list, in-place and with O(1) memory

野趣味 2020-12-09 11:51

After reading this question I started to wonder: is it possible to have a shuffling algorithm which does not modify or copy the original list?

To make it clear:

10 Answers
  • 2020-12-09 12:42

    Essentially what you need is a random number generator that produces the numbers 0..n-1 exactly once each.

    Here's a half-baked idea: pick a prime p slightly larger than n (p >= n + 2, so that the range below covers every index), then pick a random x between 1 and p-1 whose order in the multiplicative group mod p is p-1. To find one, pick random xs and test them: x has order p-1 exactly when x^i != 1 (mod p) for all 0 < i < p-1, and it is enough to check the exponents i = (p-1)/q for each prime q dividing p-1. You will only need to test a few candidates before you find one. Since x then generates the group, just compute x^i mod p for 1 <= i <= p-2, and that will give you p-2 distinct random(ish) numbers between 2 and p-1. Subtract 2 and throw out the ones >= n, and that gives you a sequence of indexes to print.

    This isn't terribly random, but you can use the same technique multiple times, taking the indexes above (+1) and using them as the exponents of another generator x2 modulo another prime p2 (you'll need n < p2 < p), and so on. A dozen repetitions should make things pretty random.
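
    A minimal sketch of the single-generator version of this idea, in Python. The function names (is_prime, prime_factors, shuffled_indexes) are made up for illustration, trial division is used only to keep the example short, and the repeated-generator refinement is not shown. Apart from the small set of prime factors of p-1, the running state is just a few integers.

```python
import random


def is_prime(m):
    """Trial-division primality test; adequate for a sketch."""
    if m < 2:
        return False
    i = 2
    while i * i <= m:
        if m % i == 0:
            return False
        i += 1
    return True


def prime_factors(m):
    """Return the set of distinct prime factors of m."""
    factors = set()
    d = 2
    while d * d <= m:
        while m % d == 0:
            factors.add(d)
            m //= d
        d += 1
    if m > 1:
        factors.add(m)
    return factors


def shuffled_indexes(n, rng=random):
    """Yield every index in 0..n-1 exactly once, in an order set by a random generator x."""
    if n <= 0:
        return
    # Smallest prime p >= n + 2, so that the values 2..p-1 cover 0..n-1 after subtracting 2.
    p = n + 2
    while not is_prime(p):
        p += 1
    factors = prime_factors(p - 1)
    # x generates the group iff x^((p-1)/q) != 1 (mod p) for every prime q dividing p-1.
    # x = 1 can never qualify, so candidates start at 2.
    while True:
        x = rng.randrange(2, p)
        if all(pow(x, (p - 1) // q, p) != 1 for q in factors):
            break
    value = 1
    for _ in range(p - 2):
        value = value * x % p      # x^i for i = 1..p-2, each in 2..p-1
        index = value - 2
        if index < n:              # throw out the out-of-range values
            yield index
```

    Usage would look like `for i in shuffled_indexes(len(items)): print(items[i])`, which reads the list without modifying or copying it.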

  • 2020-12-09 12:47

    Here is a very simple proof that no PRNG scheme can work:

    The PRNG idea has two phases: first, select a PRNG and its initial state; second, use the PRNG to shuffle the output. Well, there are N! possible permutations, so you need at least N! different possible start states, entering phase 2. This means that at the start of phase 2 you must have at least log2 N! bits of state, which isn't allowed.

    However this does not rule out schemes where the algorithm receives new random bits from the environment as it goes. There might be, say, a PRNG that reads its initial state lazily and yet is guaranteed not to repeat. Can we prove there isn't?

    Suppose we do have a perfect shuffling algorithm. Imagine we start running it, and when it's halfway done, we put the computer to sleep. Now the full state of the program has been saved somewhere. Let S be the set of all possible states the program could be in at this halfway mark.

    Since the algorithm is correct and guaranteed to terminate, there is a function f which, given the saved program state plus any long enough string of bits, produces a valid sequence of disk reads and writes completing the shuffle. The computer itself implements this function. But consider it as a mathematical function:

    f : (S × bits) → sequence of reads and writes

    Then, trivially, there exists a function g which, given only the saved program state, produces the set of disk locations yet to be read and written. (Simply pass some arbitrary string of bits to f, then look at the results.)

    g : S → set of locations to read and write

    The remaining bit of the proof is to show that the range of g contains at least C(N, N/2) different sets (N choose N/2) regardless of the choice of algorithm. If that's true, there must be at least that many elements of S, and so the state of the program must contain at least log2 C(N, N/2) bits at the halfway mark, in violation of the requirements.

    I'm not sure how to prove that last bit, though, since either the set-of-locations-to-read or the set-of-locations-to-write can be low-entropy, depending on the algorithm. I suspect there's some obvious principle of information theory that can cut the knot. Marking this community wiki in the hopes someone will supply it.
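
    To put rough numbers on the two bounds in this answer, here is a small Python sketch. The function names are made up for illustration; it only evaluates log2 N! and log2 C(N, N/2), and does not supply the missing step of the proof.

```python
import math


def log2_factorial(n):
    """log2(n!), computed via the log-gamma function to avoid huge integers."""
    return math.lgamma(n + 1) / math.log(2)


def log2_central_binomial(n):
    """log2 of C(n, n // 2), the number of ways to choose half of n locations."""
    half = n // 2
    return log2_factorial(n) - log2_factorial(half) - log2_factorial(n - half)


def largest_n_for_state_bits(bits):
    """Largest n such that n! <= 2**bits, i.e. a state of that size could label every permutation."""
    n = 1
    while math.factorial(n + 1) <= 2 ** bits:
        n += 1
    return n
```

    For example, `largest_n_for_state_bits(64)` is 20: a generator with 64 bits of initial state cannot reach every permutation of a list longer than 20 items. The halfway-mark bound grows more slowly than log2 N!, but log2 C(N, N/2) is still close to N bits, which is not O(1).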

  • 2020-12-09 12:47

    You can create a pseudorandom, 'secure' permutation using a block cipher - see here. The key insight is that, given a block cipher with an n-bit block, you can use 'folding' to shorten it to m < n bits, then use the trick antti.huima already mentioned to generate a smaller permutation from it without spending huge amounts of time discarding out-of-range values.
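
    A hedged sketch of the general shape of this approach, in Python. It does not reproduce the 'folding' construction from the link. Instead it swaps in a toy balanced Feistel network (with SHA-256 as the round function) sized to just cover n, and assumes the trick referred to is re-encrypting out-of-range outputs until one lands below n (often called cycle-walking). The name feistel_permutation and its parameters are hypothetical, and this toy cipher should not be treated as secure. It supports n up to 2**128.

```python
import hashlib


def feistel_permutation(n, key, rounds=4):
    """Return a function that maps 0..n-1 onto 0..n-1 bijectively, determined by key (bytes)."""
    if n < 1:
        raise ValueError("n must be positive")
    # Block of 2 * half_bits bits, the smallest even width that covers 0..n-1.
    half_bits = max(1, ((n - 1).bit_length() + 1) // 2)
    mask = (1 << half_bits) - 1

    def round_value(r, half):
        # Keyed round function; any deterministic function works for bijectivity.
        data = key + bytes([r]) + half.to_bytes(8, "big")
        digest = hashlib.sha256(data).digest()
        return int.from_bytes(digest[:8], "big") & mask

    def encrypt(v):
        # Balanced Feistel network: a bijection on 0..2**(2*half_bits)-1.
        left, right = v >> half_bits, v & mask
        for r in range(rounds):
            left, right = right, left ^ round_value(r, right)
        return (left << half_bits) | right

    def permute(i):
        if not 0 <= i < n:
            raise IndexError("index out of range")
        v = encrypt(i)
        while v >= n:          # walk the cycle until we land back inside 0..n-1
            v = encrypt(v)
        return v

    return permute
```

    Usage would look like `perm = feistel_permutation(len(items), b"some key")` followed by `for i in range(len(items)): print(items[perm(i)])`. The block is less than four times n whenever n > 1, so on average few values are discarded per index, and the only state kept is the key and the loop counter.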

  • 2020-12-09 12:48

    Those 10,000,000 items are only references (or pointers) to the actual items, so your list will not be that large: only ~40MB on a 32-bit architecture for all the references (10,000,000 × 4 bytes), plus the size of the list's internal variables. If your items are smaller than a reference, you can just copy the whole list.
