Ensuring that items that have been viewed are not seen again

ぃ、小莉子 提交于 2019-12-01 14:34:37

Use a floating point number between 0 and 1 instead of an integer. It has a nice domain, which doesn't change with the number of entities you have, and doubles have a 52 bit mantissa, which gives us approximately 2^26 objects before we can expect a collision; substantially more than what you're dealing with.

I don't think you need to "choose 30 at random" out of your 100 question pool. The 100 were selected randomly to begin with - if you take the first 30, those will already be randomized. Your code will be simpler, with no less randomness.

Put the items in some order and give each one a number. Maintain a counter for the user which starts at zero. Generate a random(ish) key for each user. Choose a function which maps the combination of an integer in the range 0...N and a key to another integer in the range 0...N, such that for any value off the key, the mapping between integers is bijective. When you need an item, take the value of the counter, put it through the function along with the key, and use that number to index into the list of items. Increment the counter.

All you need to store for each user is the key and the counter, and you have a guarantee that no item will ever be repeated. It's basically a way of constructing a huge permutation on the fly.

The problem is of course choosing the function!

A simple one would be f(counter, key) = (counter + key) mod N, and whilst this would work, it doesn't randomise the items at all, so everyone will get the items in the same order, just starting at different places. If N + 1 is prime, you can use F(counter, key) = (((counter + 1) * (key + 1)) mod (N + 1)) - 1, which would work pretty well. If the range was 0...2^64, you could use any block cipher which has a 64-bit block, which would give excellent randomisation. But it's not clear that you could adapt this to smaller sizes.

I'll have a bit of a dig to see if i can come up with a generically applicable function. This is a problem i've come across myself, and it would be great to finally have a good answer. I'll edit this answer if i find anything.

EDIT: Okay, here we go! I got the key ideas from a thread i started on sci.crypt, and in particular from one Paul Rubin, who is an all-round usenet hero.

Slight change of strategy. Put your items in a list, so they can be accessed by an index. Pick a number B, such that 2^B >= N - any value will do, but you really want the smallest one (ie the ceiling of the base-2 logarithm of N-1). We then refer to 2^B as M. Set up a counter which will run from 0 to M-1, and a key for each user, which can be an arbitrary byte string - a random integer is probably simplest. Set up the Magic Function, which is a bijection, aka a permutation, on the set of integers 0 ... M-1 (see below!). When you want an item, take the value of the counter, increment the counter, then put the original value through the Magic Function to get an index; if the index is greater than N-1, then throw it away, and repeat the process. Repeat until you get an index that is less than N. Sometimes, you'll need to go through one or more useless indices before you get a good one, but on average, it will take M/N tries, which is not too bad (that's less than 2 if you chose the smallest possible B).

Incidentally, calculating the ceiling of the base 2 logarithm of a number is easy:

int lb(int x) {
    int lb = 0;
    while (x > 0) {
        ++lb;
        x >>= 1;
    }
    return lb;
}

Okay, so the Magic Function, which maps a number from 0 ... M-1 to another such number. I mentioned block ciphers above, and that's what we're going to use. Except that our block is B bits long, which is variable and smaller than the normal 64 or 128 bits. So we need a cipher that works on small, variable-sized blocks. So we're going to write our own - a variable-sized mildly-unbalanced Feistel cipher. Easy!

To do a Feistel cipher, you need:

  • Numbers B_L and B_R such that B_L + B_R = B. In a balanced Feistel cipher, B_L = B_R, so B has to be even; we'll use B_L = B/2, and B_R = B - B_L, ie similar to a balanced network, but with a slightly bigger B_R when B is odd.
  • A number of rounds, called C; we'll count rounds with i, which will go from 0 to C-1. I've been told that 4 and 10 are absolute minimums, so let's go with 12 to be on the safe side.
  • A key schedule, which is just a key for each round, each called K_i, all derived from the main key i mentioned above. You could just use the same key for each round; proper ciphers have some way of generating subkeys from the master key. I'd suggest just concatenating the round number to the master key.
  • A round function, called F, which takes a B_R-bit sub-block and a round key, and produces a B_L-bit sub-block; within those constraints, it can be absolutely anything. This is the heart of a Feistel cipher. For us, the best option is just to use an already-existing cryptographic hash function, as this is easy and reliable. SHA-1 is the current favourite. We'll feed it the round key, followed by the input sub-block, compute the hash, then take B_L bits from one end (doesn't matter which) as our output.

A Feistel cipher works like this:

  1. Take a B-bit input word
  2. For i from 0 to C-1:
    1. Split the word into left and right sub-blocks L and R, with B_L and B_R bits respectively
    2. Put R and K_i through the round function F to produce a ciphering value X
    3. Add X to L, discarding any carry bit which overflows from the top bit
    4. Reassemble L and R the wrong way round, with R on the left and L on the right, to form a new value for B

The final value for B is the encrypted value, and is the output from the cipher. Decrypting it is pretty simple - do the above in reverse - but since we don't need to decrypt, don't worry about that.

So there you go. Maintain a counter (and a key, and the value of M), encrypt its value with a tiny cipher, and use the result as an index. Given that the cipher is a permutation, it's easy to show that this will never repeat, which should keep your clients happy. Even better, given the cryptographic properties of the cipher, you can also claim that the students will be unable to predict which question will come next (which is probably not important, but sounds cool).

All this is a bit more complicated than just incrementing a counter and giving them the items in order, but it's not that hard. You could do it in a hundred lines of java. Well, i can do it in a hundred lines of java - i don't know about you! :)

Incidentally, this will work with a growing pool of items, provided that you always add items at the end, although it will never pick items numbered higher than M. In many cases, though, that will give you some room for growth.

Here is an approach to consider. Do not have tons of time to write it up and edit, so apologies in advance for any issues to follow. Create a model with your 100K questions such that you can access them using a key name (e.g. question_000001, question_0000002, etc.) When a student enrolls create his/her record with three TextProperties. When the record is created, generate a randomized set of numbers from 1-100K and serialize it as a delimited text string. The second string is to record answered questions yet to be taskqueue processed. The third string is for answered questions after TQ processing.

When the user logs in, send the first N fields of the to-be-aksed string (N=more than enough questions to service any type of session), plus the whole answered questions string. On the client side, split them into an array of questions to ask, and a hash of questions answered. If the question is in the hash, skip it. As the user works through the questions, each is serviced by a simple get_by_id call to your on-line handler. As each question is answered, client sends it to GAE where you append the question number to the text field that has recently answered questions, and set a boolean to flag for later TQ processing. Once a day or thereabouts, run a taskqueue process to go through and update the record by splitting the questions to be asked text field, and removing all of the questions answered since the last update. Move these to the third text filed as completed questions.

Note that you could skip a lot of this using just two text fields, and updating as the answers get posted. However, my inclination is to always do as little as possible with the on-line handler, and push things to TQs.

No counters (always available by getting the length of you split GAE text fields), no queries (except for the TQ boolean flag query). No complexity. There are numerous ways to make this more efficient re: amount of data transferred. etc.

Wish I had a bit more time to more ensure this is 100% aligned with your needs, but have to go. So I leave you with a "HTH" -stevep

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!