Controlling distance of shuffling

后端未结

关注

 7  1929

醉梦人生 2020-12-05 05:26

I have tried to ask this question before, but have never been able to word it correctly. I hope I have it right this time:

I have a list of unique elements. I want t

7条回答

小蘑菇 (楼主)

2020-12-05 05:50

In short, the list that should be shuffled gets ordered by the sum of index and a random number.

import random
xs = range(20) # list that should be shuffled
d = 5          # distance
[x for i,x in sorted(enumerate(xs), key= lambda (i,x): i+(d+1)*random.random())]

Out:

[1, 4, 3, 0, 2, 6, 7, 5, 8, 9, 10, 11, 12, 14, 13, 15, 19, 16, 18, 17]

Thats basically it. But this looks a little bit overwhelming, therefore...

The algorithm in more detail

To understand this better, consider this alternative implementation of an ordinary, random shuffle:

import random
sorted(range(10), key = lambda x: random.random())

Out:

[2, 6, 5, 0, 9, 1, 3, 8, 7, 4]

In order to constrain the distance, we have to implement a alternative sort key function that depends on the index of an element. The function sort_criterion is responsible for that.

import random

def exclusive_uniform(a, b):
    "returns a random value in the interval  [a, b)"
    return a+(b-a)*random.random()

def distance_constrained_shuffle(sequence, distance,
                                 randmoveforward = exclusive_uniform):
    def sort_criterion(enumerate_tuple):
        """
        returns the index plus a random offset,
        such that the result can overtake at most 'distance' elements
        """
        indx, value = enumerate_tuple
        return indx + randmoveforward(0, distance+1)

    # get enumerated, shuffled list
    enumerated_result = sorted(enumerate(sequence), key = sort_criterion)
    # remove enumeration
    result = [x for i, x in enumerated_result]
    return result

With the argument randmoveforward you can pass a random number generator with a different probability density function (pdf) to modify the distance distribution.

The remainder is testing and evaluation of the distance distribution.

Test function

Here is an implementation of the test function. The validatefunction is actually taken from the OP, but I removed the creation of one of the dictionaries for performance reasons.

def test(num_cases = 10, distance = 3, sequence = range(1000)):
    def validate(d, lst, answer):
        #old = {e:i for i,e in enumerate(lst)}
        new = {e:i for i,e in enumerate(answer)}
        return all(abs(i-new[e])<=d for i,e in enumerate(lst))
        #return all(abs(i-new[e])<=d for e,i in old.iteritems())


    for _ in range(num_cases):
        result = distance_constrained_shuffle(sequence, distance)
        if not validate(distance, sequence, result):
            print "Constraint violated. ", result
            break
    else:
        print "No constraint violations"


test()

Out:

No constraint violations

Distance distribution

I am not sure whether there is a way to make the distance uniform distributed, but here is a function to validate the distribution.

def distance_distribution(maxdistance = 3, sequence = range(3000)):
    from collections import Counter

    def count_distances(lst, answer):
        new = {e:i for i,e in enumerate(answer)}
        return Counter(i-new[e] for i,e in enumerate(lst))    

    answer = distance_constrained_shuffle(sequence, maxdistance)
    counter = count_distances(sequence, answer)

    sequence_length = float(len(sequence))

    distances = range(-maxdistance, maxdistance+1)
    return distances, [counter[d]/sequence_length for d in distances]

distance_distribution()

Out:

([-3, -2, -1, 0, 1, 2, 3],
 [0.01,
  0.076,
  0.22166666666666668,
  0.379,
  0.22933333333333333,
  0.07766666666666666,
  0.006333333333333333])

Distance distribution/pdf for d=3

Or for a case with greater maximum distance:

distance_distribution(maxdistance=9, sequence=range(100*1000))

Distance distribution for d=9

0 讨论(0)

查看其它7个回答