Using NumPy to Find Median of Second Element of List of Tuples

Submitted by 让人想犯罪 __ on 2019-12-10 14:57:37

Question


Let's say I have a list of tuples, as follows:

list = [('a',1), ('b',3), ('c',5)]

My goal is to find the tuple whose second element is the median of all the second elements, and to return that tuple's first element. In the above case I would want an output of 'b', since the median is 3. I tried using NumPy with the following code, to no avail:

import numpy as np

list = [('a',1), ('b',3), ('c',5)]
np.median(list, key=lambda x:x[1])

Answer 1:


You could calculate the median like this:

np.median(list(dict(list_of_tuples).values()))
# Python 3.x, with the input named list_of_tuples rather than list;
# in Python 2.7, `np.median(dict(list_of_tuples).values())` is enough

That converts your list to a dictionary first and then calculates the median of its values.

When you want to get the actual key, you can do it like this:

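dl = dict(list_of_tuples)  # {'a': 1, 'b': 3, 'c': 5}

# Python 3: dict views are not indexable, so convert them to lists first
list(dl.keys())[list(dl.values()).index(np.median(list(dl.values())))]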

which evaluates to 'b'. This assumes that the median is one of the values in the list; if it is not, a ValueError is raised. You could therefore use a try/except like this, using the example from @Anand S Kumar's answer:

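import numpy as np

l = [('a',1), ('b',3), ('c',5), ('d',22), ('e',11), ('f',3)]

# l = [('a',1), ('b',3), ('c',5)]

dl = dict(l)
values = list(dl.values())  # dict views are not indexable in Python 3
med = np.median(values)
try:
    # look up the key whose value equals the median exactly
    print(list(dl.keys())[values.index(med)])
except ValueError:
    print('The median is not in this list. Its value is', med)
    # pick the key whose value is closest to the median (first one wins on ties)
    print('The closest key is', min(dl, key=lambda k: abs(dl[k] - med)))

For the first list you will then obtain:

The median is not in this list. Its value is 4.0
The closest key is b

Here 'b', 'c' and 'f' are all equally close to the median; with Python 3.7+ dict ordering, min returns the first of them.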

For your example it just prints:

b
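
The try/except above can also be folded into a single lookup that covers both cases and skips the dict conversion (which would drop tuples that share a first element). A minimal sketch, assuming a hypothetical helper name `key_nearest_median` that is not part of the answer, and assuming ties should go to the earliest tuple:

```python
import numpy as np

def key_nearest_median(pairs):
    """Return the first element of the tuple whose second element is
    closest to the median (hypothetical helper, for illustration only)."""
    values = np.array([v for _, v in pairs], dtype=float)
    med = np.median(values)
    # argmin returns the first index on ties, so the earliest tuple wins
    idx = int(np.argmin(np.abs(values - med)))
    return pairs[idx][0]

odd_result = key_nearest_median([('a', 1), ('b', 3), ('c', 5)])
even_result = key_nearest_median(
    [('a', 1), ('b', 3), ('c', 5), ('d', 22), ('e', 11), ('f', 3)])
print(odd_result, even_result)
```

When the median is present in the list its distance is zero, so the exact match is returned without any special casing.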




Answer 2:


np.median does not accept an argument called key. Instead, you can use a list comprehension to take just the second element of each tuple. Example:

In [3]: l = [('a',1), ('b',3), ('c',5)]

In [4]: np.median([x[1] for x in l])
Out[4]: 3.0

In [5]: l = [('a',1), ('b',3), ('c',5), ('d',22),('e',11),('f',3)]

In [6]: np.median([x[1] for x in l])
Out[6]: 4.0

Also, unless it is only for the sake of the example, do not use list as a variable name; it shadows the built-in list.
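
To get from the median value back to the first element that the question asks for, one option is to filter the tuples by that value. This is only a sketch and assumes an exact match exists, which is not the case when the list has an even length and the two middle values differ; the name `matches` is introduced here for illustration:

```python
import numpy as np

l = [('a', 1), ('b', 3), ('c', 5)]
med = np.median([x[1] for x in l])

# keep the first element of every tuple whose second element equals the median
matches = [x[0] for x in l if x[1] == med]
print(matches)  # ['b']
```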




Answer 3:


np.median does not accept any kind of 'key' argument, and it does not return the index of the value it finds. Also, when there is an even number of items (along the axis), it returns the mean of the two center items.
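
The even-length behavior is easy to check by hand. A small sketch using the six second elements that appear elsewhere in this thread (the names `values` and `middle` are made up here):

```python
import numpy as np

values = [1, 3, 5, 22, 11, 3]
n = len(values)

# the two center items of the sorted values
middle = sorted(values)[n // 2 - 1 : n // 2 + 1]
print(middle)             # [3, 5]

# np.median averages them, so the result need not be a member of the data
print(np.median(values))  # 4.0
```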

But np.partition, which median uses to find the center items, does take structured array field name(s). So if we turn the list of tuples into a structured array, we can easily select the middle item(s).

The list:

In [1001]: ll
Out[1001]: [('a', 1), ('b', 3), ('c', 5)]

as structured array:

In [1002]: la1 = np.array(ll,dtype='S1,i')
In [1003]: la1
Out[1003]: 
array([(b'a', 1), (b'b', 3), (b'c', 5)], 
     dtype=[('f0', 'S1'), ('f1', '<i4')])

We can get the middle item (index 1 for size 3) with:

In [1115]: np.partition(la1, (1), order='f1')[[1]]
Out[1115]: 
array([(b'b', 3)], 
      dtype=[('f0', 'S1'), ('f1', '<i4')])
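
For readers unfamiliar with np.partition: it guarantees only that the element at the requested position is the one a full sort would put there, with values no larger before it and values no smaller after it. A small sketch on a plain integer array (the array `a` is made up for illustration):

```python
import numpy as np

a = np.array([22, 1, 11, 5, 3])
k = len(a) // 2            # middle position for an odd-length array
p = np.partition(a, k)

print(p[k])                # 5, the same value as np.sort(a)[k]

# the two sides are bounded by p[k] but otherwise in no particular order
left_ok = bool((p[:k] <= p[k]).all())
right_ok = bool((p[k + 1:] >= p[k]).all())
print(left_ok, right_ok)   # True True
```

Passing order='f1' on a structured array, as in the answer, applies the same idea to whole records compared by that field.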

And allowing for an even number of items (with code cribbed from np.median):

def mymedian1(arr, field):
    # return the middle items of arr, selected by field
    sz = arr.shape[0]  # 1d for now
    if sz % 2 == 0:
        ind = ((sz // 2)-1, sz // 2)
    else:
        ind = ((sz - 1) // 2,)
    return np.partition(arr, ind, order=field)[list(ind)]

for the 3 item array:

In [1123]: mymedian1(la1,'f1')
Out[1123]: 
array([(b'b', 3)], 
      dtype=[('f0', 'S1'), ('f1', '<i4')])

for a 6 item array:

In [1124]: la2
Out[1124]: 
array([(b'a', 1), (b'b', 3), (b'c', 5), (b'd', 22), (b'e', 11), (b'f', 3)], 
      dtype=[('f0', 'S1'), ('f1', '<i4')])

In [1125]: mymedian1(la2,'f1')
Out[1125]: 
array([(b'f', 3), (b'c', 5)], 
      dtype=[('f0', 'S1'), ('f1', '<i4')])

See my edit history for an earlier version using np.argpartition.


It even works for the 1st field (the characters):

In [1132]: mymedian1(la2,'f0')
Out[1132]: 
array([(b'c', 5), (b'd', 22)], 
      dtype=[('f0', 'S1'), ('f1', '<i4')])
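
The answer mentions an earlier version based on np.argpartition. That code is not shown here, so the following is only an assumption about what an index-based variant might look like, working on the plain list of tuples instead of a structured array; `middle_items` is a hypothetical name:

```python
import numpy as np

def middle_items(pairs):
    """Return the tuple(s) at the middle position(s) when ordered by
    second element (illustrative sketch, not the answer's code)."""
    values = np.array([v for _, v in pairs])
    n = len(values)
    # one middle position for odd n, two for even n
    ind = [(n - 1) // 2] if n % 2 else [n // 2 - 1, n // 2]
    # argpartition returns indices into values, which also index pairs
    order = np.argpartition(values, ind)
    return [pairs[i] for i in order[ind]]

three = middle_items([('a', 1), ('b', 3), ('c', 5)])
six = middle_items(
    [('a', 1), ('b', 3), ('c', 5), ('d', 22), ('e', 11), ('f', 3)])
print(three)  # [('b', 3)]
print(six)
```

For the six-item list the second elements of the result are 3 and 5, but which of the two tuples with value 3 is returned is not guaranteed, since the partition is not stable.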


Source: https://stackoverflow.com/questions/31836655/using-numpy-to-find-median-of-second-element-of-list-of-tuples
