Top-k on a list of dicts in Python

Submitted by 南笙酒味 on 2020-12-11 04:42:27

Question


Is there an easy way to get the k largest values from the key:value pairs in this example?

s1 = {'val': 0}
s2 = {'val': 10}
s3 = {'val': 5}
s4 = {'val': 4}
s5 = {'val': 6}
s6 = {'val': 7}
s7 = {'val': 3}
shapelets = [s1, s2, s3, s4, s5, s6, s7]

I want to get the 5 largest values in the shapelets list, knowing that each dict contains a key named "val" with a value assigned to it. The solution involves iterating through the list of dicts and picking out the n largest values (in this case, the 5 largest).

What would be a simple solution? Does Python's operator module support such an operation?


Answer 1:


Here's a working example:

s1 = {'val': 0}
s2 = {'val': 10}
s3 = {'val': 5}
s4 = {'val': 4}
s5 = {'val': 6}
s6 = {'val': 7}
s7 = {'val': 3}
shapelets = [s1, s2, s3, s4, s5, s6, s7]

# Sort by 'val' ascending and keep the last 5 (the largest), still in ascending order
print(sorted(shapelets, key=lambda x: x['val'])[-5:])
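On the `operator` part of the question: `operator.itemgetter('val')` can stand in for the lambda as the sort key. A minimal sketch, assuming you want the dicts back largest first, sorts with `reverse=True` and takes the first five:

```python
from operator import itemgetter

shapelets = [{'val': 0}, {'val': 10}, {'val': 5}, {'val': 4},
             {'val': 6}, {'val': 7}, {'val': 3}]

# itemgetter('val') returns a callable equivalent to lambda d: d['val']
top5 = sorted(shapelets, key=itemgetter('val'), reverse=True)[:5]
print(top5)
# [{'val': 10}, {'val': 7}, {'val': 6}, {'val': 5}, {'val': 4}]
```

This still sorts the whole list (O(n log n)), so it is best suited to short lists or cases where you need the full ordering anyway.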



Answer 2:


You can use heapq:

import heapq

s1 = {'val': 0}
s2 = {'val': 10}
s3 = {'val': 5}
s4 = {'val': 4}
s5 = {'val': 6}
s6 = {'val': 7}
s7 = {'val': 3}
shapelets = [s1, s2, s3, s4, s5, s6, s7]

heapq.nlargest(5,[dct['val'] for dct in shapelets])
# [10, 7, 6, 5, 4]

heapq.nlargest is likely to be faster than sorted for large lists when you only want a few of the largest values; the heapq documentation notes that for larger n, using sorted() is more efficient. Some discussions of heapq vs. sorted are here.
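`heapq.nlargest` also accepts a `key` argument, so if you want the original dicts rather than just the numbers, you can pass the list directly instead of building a list comprehension. A small sketch using the same data:

```python
import heapq
from operator import itemgetter

shapelets = [{'val': 0}, {'val': 10}, {'val': 5}, {'val': 4},
             {'val': 6}, {'val': 7}, {'val': 3}]

# nlargest(n, iterable, key=...) returns the n items with the largest key, largest first
top5 = heapq.nlargest(5, shapelets, key=itemgetter('val'))
print(top5)
# [{'val': 10}, {'val': 7}, {'val': 6}, {'val': 5}, {'val': 4}]
```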




Answer 3:


You could do it in linear time using numpy.argpartition:

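from operator import itemgetter
import numpy as np

# Pull the 'val' field of every dict into a NumPy array
arr = np.array(list(map(itemgetter("val"), shapelets)))

# argpartition moves the 5 largest values into the last 5 positions (in no particular order)
print(arr[np.argpartition(arr, -5)[-5:]])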

The 5 largest values will not necessarily be in order; if you need them sorted, you have to sort the k returned elements yourself.
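If you need both the ordering and the original dicts, one approach (a sketch, assuming k = 5 and the same data) is to sort only the k indices returned by `argpartition` by their values, then use those indices to pick out the dicts. Sorting k indices costs O(k log k) on top of the partition:

```python
import numpy as np

shapelets = [{'val': 0}, {'val': 10}, {'val': 5}, {'val': 4},
             {'val': 6}, {'val': 7}, {'val': 3}]
k = 5

arr = np.array([d['val'] for d in shapelets])
# Indices of the k largest values, in arbitrary order
idx = np.argpartition(arr, -k)[-k:]
# Reorder just those k indices by value, descending
idx = idx[np.argsort(arr[idx])[::-1]]
top_k = [shapelets[i] for i in idx]
print(top_k)
# [{'val': 10}, {'val': 7}, {'val': 6}, {'val': 5}, {'val': 4}]
```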



Source: https://stackoverflow.com/questions/39771064/top-k-on-a-list-of-dict-in-python
