Controlling distance of shuffling

醉梦人生 2020-12-05 05:26

I have tried to ask this question before, but have never been able to word it correctly. I hope I have it right this time:

I have a list of unique elements. I want t

7 answers
  •  小蘑菇 (original poster)
    2020-12-05 05:50

    In short: sort the list by the sum of each element's index and a random number in [0, d+1), where d is the maximum distance an element may move.

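    import random
    xs = range(20)  # sequence that should be shuffled
    d = 5           # maximum distance
    # sort key: index plus a random offset in [0, d+1)
    # (tuple parameters in lambdas were removed in Python 3, so index the pair instead)
    [x for i, x in sorted(enumerate(xs), key=lambda ix: ix[0] + (d + 1) * random.random())]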
    

    Out:

    [1, 4, 3, 0, 2, 6, 7, 5, 8, 9, 10, 11, 12, 14, 13, 15, 19, 16, 18, 17]
    

    That's basically it. But the one-liner looks a little overwhelming, so the rest of this answer breaks it down.

    The algorithm in more detail

    To understand this better, consider this alternative implementation of an ordinary, random shuffle:

    import random
    sorted(range(10), key = lambda x: random.random())
    

    Out:

    [2, 6, 5, 0, 9, 1, 3, 8, 7, 4]
    

    To constrain the distance, we need an alternative sort key function that depends on the index of each element. The function sort_criterion is responsible for that.

    import random
    
    def exclusive_uniform(a, b):
        "returns a random value in the interval  [a, b)"
        return a+(b-a)*random.random()
    
    def distance_constrained_shuffle(sequence, distance,
                                     randmoveforward = exclusive_uniform):
        def sort_criterion(enumerate_tuple):
            """
            returns the index plus a random offset,
            such that the result can overtake at most 'distance' elements
            """
            indx, value = enumerate_tuple
            return indx + randmoveforward(0, distance+1)
    
        # get enumerated, shuffled list
        enumerated_result = sorted(enumerate(sequence), key = sort_criterion)
        # remove enumeration
        result = [x for i, x in enumerated_result]
        return result
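
    Why the bound holds: the element at index i gets a key in [i, i + distance + 1). Any element at index i - distance - 1 or lower has a key below i, so it always sorts earlier; any element at index i + distance + 1 or higher has a key of at least i + distance + 1, so it always sorts later. Only the nearest `distance` neighbours on each side can change order relative to it, so it moves at most `distance` places. A minimal sketch with hand-picked offsets instead of random ones shows the worst case (the helper name shuffle_with_offsets is hypothetical, introduced only for this illustration):

```python
def shuffle_with_offsets(xs, offsets):
    """Sort xs by index + offset; the offsets stand in for the random draws."""
    keys = [i + off for i, off in enumerate(offsets)]
    # sort positions by key; Python's sort is stable, so ties keep input order
    order = sorted(range(len(xs)), key=lambda i: keys[i])
    return [xs[i] for i in order]

d = 3
xs = list("abcdefgh")
# 'a' gets almost the largest allowed offset (just under d + 1), the rest get 0
offsets = [d + 1 - 1e-9] + [0.0] * (len(xs) - 1)
worst_case = shuffle_with_offsets(xs, offsets)
print(worst_case)  # ['b', 'c', 'd', 'a', 'e', 'f', 'g', 'h']
```

    Here 'a' ends up exactly d places from where it started, and no choice of offsets inside [0, d+1) can push it further.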
    

    With the argument randmoveforward you can pass a random number generator with a different probability density function (pdf) to modify the distance distribution.
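
    As a sketch of that hook, the standard library's random.triangular can be passed in place of the uniform default. It draws from [a, b] with a peak at the midpoint, which changes how the displacements are distributed while the offsets stay inside the allowed range. The wrapper name triangular_offset and the helper max_displacement are assumptions made for this example, and the shuffle function is repeated in compact form so the block runs on its own.

```python
import random

def distance_constrained_shuffle(sequence, distance, randmoveforward):
    # same idea as above: sort by index plus a bounded random offset
    def sort_criterion(enumerate_tuple):
        indx, value = enumerate_tuple
        return indx + randmoveforward(0, distance + 1)
    return [x for i, x in sorted(enumerate(sequence), key=sort_criterion)]

def triangular_offset(a, b):
    """Hypothetical alternative: offset drawn from a triangular pdf on [a, b]."""
    return random.triangular(a, b)

def max_displacement(original, shuffled):
    """Largest distance any element moved between the two orderings."""
    new_index = {e: i for i, e in enumerate(shuffled)}
    return max(abs(i - new_index[e]) for i, e in enumerate(original))

random.seed(0)  # fixed seed only to make the demo repeatable
original = list(range(1000))
shuffled = distance_constrained_shuffle(original, 3, triangular_offset)
print(max_displacement(original, shuffled) <= 3)  # True
```

    Any generator works as long as its values stay within [0, distance + 1]; offsets outside that range would break the distance guarantee.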

    The remainder is testing and evaluation of the distance distribution.


    Test function

    Here is an implementation of the test function. The validate function is taken from the OP, but I removed the creation of one of the dictionaries for performance reasons.

    def test(num_cases = 10, distance = 3, sequence = range(1000)):
        def validate(d, lst, answer):
            #old = {e:i for i,e in enumerate(lst)}
            new = {e:i for i,e in enumerate(answer)}
            return all(abs(i-new[e])<=d for i,e in enumerate(lst))
            #return all(abs(i-new[e])<=d for e,i in old.iteritems())
    
    
        for _ in range(num_cases):
            result = distance_constrained_shuffle(sequence, distance)
            if not validate(distance, sequence, result):
                print("Constraint violated. ", result)
                break
        else:
            # the for loop finished without hitting break
            print("No constraint violations")
    
    
    test()
    

    Out:

    No constraint violations
    

    Distance distribution

    I am not sure whether there is a way to make the distance uniformly distributed, but here is a function to inspect the distribution.

    def distance_distribution(maxdistance = 3, sequence = range(3000)):
        from collections import Counter
    
        def count_distances(lst, answer):
            new = {e:i for i,e in enumerate(answer)}
            return Counter(i-new[e] for i,e in enumerate(lst))    
    
        answer = distance_constrained_shuffle(sequence, maxdistance)
        counter = count_distances(sequence, answer)
    
        sequence_length = float(len(sequence))
    
        distances = list(range(-maxdistance, maxdistance+1))  # list, so it prints as shown below
        return distances, [counter[d]/sequence_length for d in distances]
    
    distance_distribution()
    

    Out:

    ([-3, -2, -1, 0, 1, 2, 3],
     [0.01,
      0.076,
      0.22166666666666668,
      0.379,
      0.22933333333333333,
      0.07766666666666666,
      0.006333333333333333])
    

    [Figure: distance distribution (pdf) for d=3]

    Or for a case with greater maximum distance:

    distance_distribution(maxdistance=9, sequence=range(100*1000))
    

    [Figure: distance distribution for d=9]
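
    The distribution can also be viewed without a plotting library, as a rough text histogram. This is only a sketch: the function names displacement_frequencies and print_histogram are made up for this example, and the shuffle is repeated in compact form so the block is self-contained.

```python
import random
from collections import Counter

def distance_constrained_shuffle(sequence, distance):
    # compact form of the shuffle above:
    # sort by index + uniform offset in [0, distance + 1)
    keyed = sorted(enumerate(sequence),
                   key=lambda ix: ix[0] + (distance + 1) * random.random())
    return [x for i, x in keyed]

def displacement_frequencies(maxdistance, sequence):
    """Relative frequency of (old index - new index) for one shuffle."""
    answer = distance_constrained_shuffle(sequence, maxdistance)
    new = {e: i for i, e in enumerate(answer)}
    counter = Counter(i - new[e] for i, e in enumerate(sequence))
    n = float(len(sequence))
    return {d: counter[d] / n for d in range(-maxdistance, maxdistance + 1)}

def print_histogram(freqs, width=60):
    """Print one bar per displacement, scaled so the tallest bar is `width` wide."""
    peak = max(freqs.values())
    for d in sorted(freqs):
        bar = "#" * int(round(width * freqs[d] / peak))
        print("%+3d %6.3f %s" % (d, freqs[d], bar))

freqs = displacement_frequencies(9, list(range(20000)))
print_histogram(freqs)
```

    The exact numbers vary from run to run because no seed is set.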
