Generate a list of random weighted tuples from a list

问题

Given a tuple list a:

a =[(23, 11), (10, 16), (13, 11),  (12, 3), (4, 15), (10, 16), (10, 16)]

We can count how many appearances of each tuple we have using Counter:

>>> from collections import Counter
>>> b = Counter(a)
>>> b
Counter({(4, 15): 1, (10, 16): 3, (12, 3): 1, (13, 11): 1, (23, 11): 1}

Now, the idea is to select 3 random tuples from the list, without repetition, such that the count determines the probability that a particular tuple is chosen.

For instance, (10, 16) is more likely to be chosen than the others - its weight is 3/7 while the other four tuples have weight 1/7.

I have tried to use np.random.choice:

a[np.random.choice(len(a), 3, p=b/len(a))]

But I'm not able to generate the tuples.

Im trying:

a =[(23, 11), (10, 16), (13, 11),  (10, 16), (10, 16), (10, 16), (10, 16)]
b = Counter(a)
c = []
print "counter list"
print b
for item in b:
    print "item from current list"
    print item
    print "prob of the item"
    print (float(b[item])/float(len(a)))

    c.append(float(b[item])/float(len(a)))

print "prob list"
print c

print (np.random.choice(np.arange(len(b)), 3, p=c, replace=False))

In this case im getting the random indexes of the array.

Is there any more optimized way not to have to calculate the probabilities array?
Also there is an issue that is that the prob array does not correspond to the Counter array.

回答1:

If you aren't interested in the intermediate step of calculating frequencies, you could use random.shuffle (either on the list or a copy) and then slice off as many items as you need.

e.g.

import random
a =[(23, 11), (10, 16), (13, 11),  (12, 3), (4, 15), (10, 16), (10, 16)]
random.shuffle(a)
random_sample = a[0:3]
print(random_sample)

As shuffle reorders in place it will avoid the repetition issue, and statistically should give the same result (excluding differences in random number generation between np and random).

回答2:

This will do the trick

from collections import Counter
import matplotlib.pyplot as plt
import numpy as np
import random

listOfNumbers =[(23, 11), (10, 16), (13, 11),  (10, 16), (10, 16), (10, 16), (10, 16)]
b = Counter(listOfNumbers)
c = []
pres=[]
for k,v in b.most_common():
    c.append(float(v)/float(len(listOfNumbers)))
    pres.append(k)

resultIndex = np.random.choice(np.arange(len(b)), 3, p=c, replace=False)

ass=[]
for res in resultIndex:
    ass.append(pres[res])

print ass

Now is just to see if is there any way to optimize it.

回答3:

You can repeat the following steps 3 times:

Randomly chose a number i in range [0..n-1] where n is a current number of elements in a.
Find a tuple on i-th position in the initial a list. Add tuple to resulting triplet.
Remove all occurrences of tuple from a.

Pay attention to a corner case when a can be empty.

The overall time complexity will be O(n) for a list.

On the first step number i should be generated according to uniform distribution which regular random provides. The more occurrences of a particular tuple are in a the more likely it will be chosen.

来源：https://stackoverflow.com/questions/34934540/generate-a-list-of-random-weighted-tuples-from-a-list

标签

python

numpy

tuples

probability