sort paired tuple and take top n results

自古美人都是妖i submitted on 2020-12-15 08:33:43

Question


So I have a function that returns the following list of tuples.

keywords:  [('notre dame coach brian kelly', 21.0), ('put student-athlete health', 14.5), ('fourth acc game impacted', 12.5), ('student-athlete health', 10.5), ('football-related activities', 9.0), ('resuming team activities', 9.0), ('ongoing testing procedures', 8.0), ('october 3rd weekend', 8.0), ('players tested positive', 7.75), ('irish announced 13 players', 7.25), ('notre dame', 6.0), ('positive results', 4.5), ('players handled', 4.25), ('testing results', 4.0), ('primary focus', 4.0), ('prevention protocols', 4.0), ('positivity rates', 4.0), ('knew covid', 4.0), ('present challenges', 4.0), ('discussing options', 4.0), ('future opponents', 4.0), ('pause practices', 4.0), ('decision making', 3.5), ('playing field', 3.5), ('open date', 3.5), ('schools share', 3.5), ('coronavirus issues', 3.5), ('statement tuesday', 3.25), ('acc', 2.5), ('13 players', 2.25), ('game', 2.0), ('weekend', 2.0), ('irish', 2.0), ('testing', 2.0), ('coronavirus', 1.5), ('schools', 1.5), ('date', 1.5), ('decision', 1.5), ('playing', 1.5), ('statement', 1.25), ('saturday', 1.0), ('postponed', 1.0), ('isolation', 1.0), ('94 tests', 1.0), ('monday', 1.0), ('combined', 1.0), ('week', 1.0), ('quarantine', 1.0), ('result', 1.0), ('paused', 1.0), ('working', 1.0), ('reschedule', 1.0), ('safety', 1.0), ('continue', 1.0), ('follow', 1.0), ('managed', 1.0), ('increase', 1.0), ('august', 1.0), ('wonderfully', 1.0), ('season', 1.0), ('ll', 1.0), ('forefront', 1.0), ('forward', 1.0), ('back', 1.0), ('rescheduling', 1.0), ('oct', 1.0), ('involved', 1.0), ('saddened', 1.0), ('unable', 1.0), ('play', 1.0), ('based', 1.0), ('circumstances', 1.0), ('currie', 1.0), ('including', 1.0), ('possibility', 1.0), ('home', 1.0), ('opened', 1.0), ('win', 1.0), ('duke', 1.0), ('time', 1.0), ('days', 1.0), ('month', 1.0), ('rounds', 1.0), ('10', 0), ('3', 0)]

I'm trying to write a function that will return the top n values. Since it's a list of tuples, I thought I could use:

sorted(keywords, reverse = True) 

where keywords is the name bound to the list of tuples, but that doesn't work. I tried using Counter, but it doesn't seem to work with a list or tuple.

from collections import Counter


def top_keywords(rake_keywords, n=3):
    """Given a RAKE keywords list of tuples in the form of:

        (keyword, score)

    return the top n keywords.

    rake_keywords is assumed to be in descending order by score, since that is
    how we get it from RAKE. Thus, simply return the first n terms extracted
    from their tuples.

    Returns: a list of strings. Returns an empty string if rake_keywords is empty.
    """
    counts = Counter(rake_keywords)
    most_common = counts.most_common(n)
    return most_common

I'm still new to Python, with less than two months of experience, so thank you in advance.


Answer 1:


sorted(keywords, reverse=True, key=lambda t: t[1])

You need to specify a key to sort by; here it is the second element of each tuple, the score.
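As a minimal sketch of how this sort could be combined with slicing to get the top n keywords: the function name top_keywords is taken from the question, while the shortened sample list and the choice to return only the keyword strings are assumptions made for illustration.

```python
def top_keywords(rake_keywords, n=3):
    """Return the n highest-scoring keywords as a list of strings."""
    # Sort by the score (second element of each tuple), highest first.
    ranked = sorted(rake_keywords, key=lambda t: t[1], reverse=True)
    # Keep the first n tuples and extract just the keyword text.
    return [keyword for keyword, _score in ranked[:n]]


# Shortened, deliberately unsorted sample based on the question's data.
sample = [('notre dame', 6.0), ('notre dame coach brian kelly', 21.0),
          ('put student-athlete health', 14.5), ('acc', 2.5)]

print(top_keywords(sample, n=2))
# ['notre dame coach brian kelly', 'put student-athlete health']
```

An empty input simply yields an empty list, since slicing an empty list is safe.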




Answer 2:


The reason that sorted(keywords, reverse=True) alone doesn't sort your keywords by score is that tuples are compared element by element, starting with the first elements, which here are the keyword strings. Strings can be compared to each other, so you might have noticed that your keywords were sorted in reverse alphabetical order.
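The element-by-element comparison described above can be checked directly. This is a small sketch using three pairs picked from the question's data; the variable names pairs and by_default are made up for the example.

```python
pairs = [('acc', 2.5), ('notre dame', 6.0), ('game', 2.0)]

# Tuples compare their first elements first, so the keyword string decides.
print(('notre dame', 6.0) > ('acc', 2.5))  # True, because 'notre dame' > 'acc'

# The second element only matters when the first elements are equal.
print(('acc', 2.5) > ('acc', 2.0))  # True, tie on 'acc' so 2.5 > 2.0 decides

# Without a key, the result is reverse alphabetical, not ordered by score.
by_default = sorted(pairs, reverse=True)
print(by_default)
# [('notre dame', 6.0), ('game', 2.0), ('acc', 2.5)]
```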

To sort the tuples by the score at index 1, pass a custom key function to sorted that returns the score from each tuple; sorted uses that return value to compare the tuples:

def key_func(your_tuple):
    return your_tuple[1]

>>> print(sorted(keywords, key=key_func, reverse=True))
[('notre dame coach brian kelly', 21.0),
 ('put student-athlete health', 14.5),
 ('fourth acc game impacted', 12.5),
... 

Since the key function only needs to return the second element of each tuple, it is more concise to use a lambda expression:

sorted(keywords, key=lambda x: x[1], reverse=True)
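Two standard-library alternatives, not part of the original answer, may also be worth knowing: operator.itemgetter(1) can stand in for the lambda, and heapq.nlargest picks the top n items without sorting the whole list. The sketch below uses a shortened sample of the question's data.

```python
import heapq
from operator import itemgetter

keywords = [('acc', 2.5), ('notre dame', 6.0), ('game', 2.0),
            ('student-athlete health', 10.5)]

# itemgetter(1) builds a function equivalent to lambda x: x[1].
by_score = sorted(keywords, key=itemgetter(1), reverse=True)

# nlargest returns the n items with the largest key, highest first.
top_two = heapq.nlargest(2, keywords, key=itemgetter(1))

print(top_two)
# [('student-athlete health', 10.5), ('notre dame', 6.0)]
```

For a list of this size the difference is negligible; nlargest mainly helps when n is small relative to a very long list.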

Note: You could also reorganize your tuples so that the scores are the first elements, which removes the need for a key function, provided your other code doesn't depend on the current (keyword, score) order.
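A rough sketch of that reorganization, assuming it is acceptable to build a new list rather than change how the tuples are produced; the names score_first and ranked are illustrative only.

```python
keywords = [('acc', 2.5), ('notre dame', 6.0), ('game', 2.0)]

# Swap each pair so the score comes first.
score_first = [(score, keyword) for keyword, score in keywords]

# Default tuple comparison now orders by score, so no key function is needed.
ranked = sorted(score_first, reverse=True)
print(ranked)
# [(6.0, 'notre dame'), (2.5, 'acc'), (2.0, 'game')]
```

One side effect to keep in mind: when two scores are equal, the keyword strings break the tie, so with reverse=True tied keywords come out in reverse alphabetical order.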



Source: https://stackoverflow.com/questions/64018485/sort-paired-tuple-and-take-top-n-results
