I have tried to ask this question before, but have never been able to word it correctly. I hope I have it right this time:
I have a list of unique elements. I want t
In short, the list to be shuffled is sorted by the sum of each element's index and a random number.
import random
xs = range(20) # list that should be shuffled
d = 5 # distance
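# key = index + random offset in [0, d+1); written without tuple-unpacking so it runs on Python 2 and 3
[x for i, x in sorted(enumerate(xs), key=lambda ix: ix[0] + (d+1)*random.random())]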
Out:
[1, 4, 3, 0, 2, 6, 7, 5, 8, 9, 10, 11, 12, 14, 13, 15, 19, 16, 18, 17]
That's basically it, but it looks a little overwhelming, so here is a step-by-step explanation.
To understand this better, consider this alternative implementation of an ordinary, random shuffle:
import random
sorted(range(10), key = lambda x: random.random())
Out:
[2, 6, 5, 0, 9, 1, 3, 8, 7, 4]
To constrain the distance, we have to implement an alternative sort key function that depends on the index of an element. The function sort_criterion is responsible for that.
import random

def exclusive_uniform(a, b):
    "returns a random value in the interval [a, b)"
    return a+(b-a)*random.random()

def distance_constrained_shuffle(sequence, distance,
                                 randmoveforward = exclusive_uniform):
    def sort_criterion(enumerate_tuple):
        """
        returns the index plus a random offset,
        such that the result can overtake at most 'distance' elements
        """
        indx, value = enumerate_tuple
        return indx + randmoveforward(0, distance+1)

    # get enumerated, shuffled list
    enumerated_result = sorted(enumerate(sequence), key = sort_criterion)
    # remove enumeration
    result = [x for i, x in enumerated_result]
    return result
With the argument randmoveforward you can pass a random number generator with a different probability density function (pdf) to modify the distance distribution.
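As a rough illustration of swapping in a different pdf, here is a minimal sketch that uses a triangular distribution instead of the uniform one. The name triangular_offset is hypothetical and not part of the answer above; the shuffle function is repeated in compact form only so that the snippet runs on its own. The distance bound is kept as long as the generator only returns offsets between 0 and distance+1.

```python
import random


def distance_constrained_shuffle(sequence, distance, randmoveforward):
    """Compact copy of the function above: sort by index plus random offset."""
    def sort_criterion(enumerate_tuple):
        indx, _value = enumerate_tuple
        return indx + randmoveforward(0, distance + 1)

    return [x for _i, x in sorted(enumerate(sequence), key=sort_criterion)]


def triangular_offset(a, b):
    """Hypothetical alternative pdf: triangular between a and b, peaked at the midpoint."""
    return random.triangular(a, b)


xs = list(range(20))
d = 5
shuffled = distance_constrained_shuffle(xs, d, triangular_offset)
print(shuffled)
```

Because most offsets now fall near the middle of the range, neighbouring keys tend to be closer together than with the uniform generator, which should change how often large displacements occur; the distribution check further down can be used to see the effect.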
The remainder is testing and evaluation of the distance distribution.
Here is an implementation of the test function. The validate function is actually taken from the OP, but I removed the creation of one of the dictionaries for performance reasons.
def test(num_cases = 10, distance = 3, sequence = range(1000)):
    def validate(d, lst, answer):
        #old = {e:i for i,e in enumerate(lst)}
        new = {e:i for i,e in enumerate(answer)}
        return all(abs(i-new[e])<=d for i,e in enumerate(lst))
        #return all(abs(i-new[e])<=d for e,i in old.iteritems())

    for _ in range(num_cases):
        result = distance_constrained_shuffle(sequence, distance)
        if not validate(distance, sequence, result):
            print("Constraint violated: %r" % (result,))
            break
    else:
        # for-else: runs only if no case hit the break above
        print("No constraint violations")

test()
Out:
No constraint violations
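The test passes for a reason that can be checked directly: the element at index i gets a key in [i, i+distance+1), so only the distance elements right after it can end up in front of it, and only the distance elements right before it can end up behind it. The following sketch makes the keys explicit and asserts both bounds. The names keys, order, new_pos, overtaken_by and overtakes are illustrative and not part of the code above.

```python
import random

distance = 3
n = 200

# One key per index, drawn the same way as in sort_criterion.
keys = [i + (distance + 1) * random.random() for i in range(n)]

# order[p] is the original index of the element that ends up at position p.
order = sorted(range(n), key=lambda i: keys[i])
new_pos = {i: p for p, i in enumerate(order)}

for i in range(n):
    # Later elements that ended up in front of element i.
    overtaken_by = sum(1 for j in range(i + 1, n) if new_pos[j] < new_pos[i])
    # Earlier elements that element i ended up in front of.
    overtakes = sum(1 for j in range(i) if new_pos[j] > new_pos[i])
    assert overtaken_by <= distance and overtakes <= distance
    # The net displacement is the difference of the two counts.
    assert new_pos[i] - i == overtaken_by - overtakes
```

Since both counts are at most distance, their difference is too, which is exactly the constraint that validate checks.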
I am not sure whether there is a way to make the distance uniformly distributed, but here is a function to inspect the distribution.
def distance_distribution(maxdistance = 3, sequence = range(3000)):
    from collections import Counter

    def count_distances(lst, answer):
        new = {e:i for i,e in enumerate(answer)}
        return Counter(i-new[e] for i,e in enumerate(lst))

    answer = distance_constrained_shuffle(sequence, maxdistance)
    counter = count_distances(sequence, answer)

    sequence_length = float(len(sequence))

    distances = range(-maxdistance, maxdistance+1)
    return distances, [counter[d]/sequence_length for d in distances]

distance_distribution()
Out:
([-3, -2, -1, 0, 1, 2, 3],
[0.01,
0.076,
0.22166666666666668,
0.379,
0.22933333333333333,
0.07766666666666666,
0.006333333333333333])
Or for a case with greater maximum distance:
distance_distribution(maxdistance=9, sequence=range(100*1000))