问题
Given a tuple list a
:
a =[(23, 11), (10, 16), (13, 11), (12, 3), (4, 15), (10, 16), (10, 16)]
We can count how many appearances of each tuple we have using Counter
:
>>> from collections import Counter
>>> b = Counter(a)
>>> b
Counter({(4, 15): 1, (10, 16): 3, (12, 3): 1, (13, 11): 1, (23, 11): 1}
Now, the idea is to select 3 random tuples from the list, without repetition, such that the count determines the probability that a particular tuple is chosen.
For instance, (10, 16)
is more likely to be chosen than the others - its weight is 3/7 while the other four tuples have weight 1/7.
I have tried to use np.random.choice
:
a[np.random.choice(len(a), 3, p=b/len(a))]
But I'm not able to generate the tuples.
Im trying:
a =[(23, 11), (10, 16), (13, 11), (10, 16), (10, 16), (10, 16), (10, 16)]
b = Counter(a)
c = []
print "counter list"
print b
for item in b:
print "item from current list"
print item
print "prob of the item"
print (float(b[item])/float(len(a)))
c.append(float(b[item])/float(len(a)))
print "prob list"
print c
print (np.random.choice(np.arange(len(b)), 3, p=c, replace=False))
In this case im getting the random indexes of the array.
Is there any more optimized way not to have to calculate the probabilities array?
Also there is an issue that is that the prob array does not correspond to the Counter array.
回答1:
If you aren't interested in the intermediate step of calculating frequencies, you could use random.shuffle (either on the list or a copy) and then slice off as many items as you need.
e.g.
import random
a =[(23, 11), (10, 16), (13, 11), (12, 3), (4, 15), (10, 16), (10, 16)]
random.shuffle(a)
random_sample = a[0:3]
print(random_sample)
As shuffle reorders in place it will avoid the repetition issue, and statistically should give the same result (excluding differences in random number generation between np and random).
回答2:
This will do the trick
from collections import Counter
import matplotlib.pyplot as plt
import numpy as np
import random
listOfNumbers =[(23, 11), (10, 16), (13, 11), (10, 16), (10, 16), (10, 16), (10, 16)]
b = Counter(listOfNumbers)
c = []
pres=[]
for k,v in b.most_common():
c.append(float(v)/float(len(listOfNumbers)))
pres.append(k)
resultIndex = np.random.choice(np.arange(len(b)), 3, p=c, replace=False)
ass=[]
for res in resultIndex:
ass.append(pres[res])
print ass
Now is just to see if is there any way to optimize it.
回答3:
You can repeat the following steps 3 times:
- Randomly chose a number
i
in range[0..n-1]
wheren
is a current number of elements ina
. - Find a
tuple
oni
-th position in the initiala
list. Addtuple
to resulting triplet. - Remove all occurrences of
tuple
froma
.
Pay attention to a corner case when a
can be empty.
The overall time complexity will be O(n) for a list.
On the first step number i
should be generated according to uniform distribution which regular random
provides. The more occurrences of a particular tuple are in a
the more likely it will be chosen.
来源:https://stackoverflow.com/questions/34934540/generate-a-list-of-random-weighted-tuples-from-a-list