Sort a list with longest items first

匆匆过客 posted on 2020-07-05 04:13:43

Question


I am using a lambda to modify the behaviour of sort.

sorted(list, key=lambda item:(item.lower(),len(item)))

Sorting a list containing the elements A1,A2,A3,A,B1,B2,B3,B, the result is A,A1,A2,A3,B,B1,B2,B3.

My expected sorted list would be A1,A2,A3,A,B1,B2,B3,B.

I've already tried including len(item) in the sort key, which didn't work. How can I modify the lambda so that the sort produces the expected order instead?


Answer 1:


Here is one way to do it:

>>> import functools
>>> def cmp(s, t):
    'Alter lexicographic sort order to make longer keys go *before* any of their prefixes'
    ls, lt = len(s), len(t)
    if ls < lt:   s += t[ls:] + 'x'
    elif lt < ls: t += s[lt:] + 'x'
    if s < t: return -1
    if s > t: return 1
    return 0

>>> l = ['A', 'B', 'A1', 'A2', 'A3', 'B1', 'B2', 'B3']
>>> sorted(l, key=functools.cmp_to_key(cmp))
['A1', 'A2', 'A3', 'A', 'B1', 'B2', 'B3', 'B']

Traditionally, lexicographic sort order puts longer strings after their otherwise identical prefixes (i.e. 'abc' goes before 'abcd').

To meet your sort expectation, we first "fix-up" the shorter string by adding the remaining part of the longer string plus another character to make it the longer of the two:

compare abc to defg     -->  compare abcgx to defg
compare a   to a2       -->  compare a2x to a2

The functools.cmp_to_key() tool then converts the comparison function to a key function.

This may seem like a lot of work, but the sort expectations are very much at odds with the built-in lexicographic sorting rules.

FWIW, here's another way of writing it, that might or might not be considered clearer:

def cmp(s, t):
    'Alter lexicographic sort order to make longer keys go *before* any of their prefixes'
    for p, q in zip(s, t):
        if p < q: return -1
        if q < p: return 1
    if len(s) > len(t): return -1
    elif len(t) > len(s): return 1
    return 0

The logic is:

  • Compare character by character until a different pair is found
  • That differing pair determines the sort order in the traditional way
  • If there is no differing pair, then the longer input goes first.
  • If there is no differing pair and the lengths are equal, the strings are equal.
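The rules above can be tried out with a short sketch. The function name prefix_last_cmp and the sample list are only illustrative, and lowercasing both sides is an assumption carried over from the question's item.lower() key, not part of the answer's code:

```python
import functools

def prefix_last_cmp(s, t):
    """Longer strings sort before their own prefixes; otherwise lexicographic."""
    for p, q in zip(s, t):
        if p != q:
            # The first differing pair decides, in the traditional way.
            return -1 if p < q else 1
    # No differing pair: one string is a prefix of the other, or they are equal.
    # A negative result puts s first, so the longer string wins.
    return len(t) - len(s)

def case_insensitive_cmp(s, t):
    # Assumption: compare lowercased copies to keep the question's ordering.
    return prefix_last_cmp(s.lower(), t.lower())

words = ['A', 'B', 'A1', 'A2', 'A3', 'B1', 'B2', 'B3']
ordered = sorted(words, key=functools.cmp_to_key(case_insensitive_cmp))
print(ordered)  # ['A1', 'A2', 'A3', 'A', 'B1', 'B2', 'B3', 'B']
```

cmp_to_key only looks at the sign of the returned number, so returning the length difference directly is enough.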



Answer 2:


My first answer was: just negate the len criterion to reverse only on that criterion.

sorted(list, key=lambda item:(item.lower(),-len(item)))   # doesn't work!

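But that doesn't work, because the alphabetical sort and the length criterion conflict. Alphabetical order already puts a shorter string before any longer string that starts with it, and since no two items here have the same item.lower(), the length is never consulted as a tie-breaker.

You need to merge both criteria into a single key; neither has clear priority over the other.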
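To see the conflict concretely, here is a small sketch using the same sample data (the variable names are just for illustration). Tuples compare element by element, so the second element only matters when the first elements are equal:

```python
items = ["A", "B", "A1", "A2", "A3", "B1", "B2", "B3"]

# Every lowercased string is distinct, so the first tuple element alone
# decides each comparison and -len(item) is never looked at.
result = sorted(items, key=lambda item: (item.lower(), -len(item)))
print(result)  # ['A', 'A1', 'A2', 'A3', 'B', 'B1', 'B2', 'B3']

# Identical to a plain case-insensitive sort.
print(result == sorted(items, key=str.lower))  # True
```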

I found a way: first compute the maximum length of your strings, then use as the key each string padded on the right to that length with chr(127) (the largest character, provided you're using only ASCII). Shorter strings end up padded with large characters, so they always come after the longer strings that start with them.

l = ["A","B","A1","A2","A3","B1","B2","B3"]

maxlen = max(len(x) for x in l)
print(sorted(l, key=lambda item:item.lower()+chr(127)*(maxlen-len(item))))

result:

['A1', 'A2', 'A3', 'A', 'B1', 'B2', 'B3', 'B']
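If the strings may contain non-ASCII characters, one possible variation (not part of the approach above, just a sketch) is to build the key from code points and append a sentinel that is larger than any real character. The names END and prefix_last_key are made up for this example:

```python
import sys

# Hypothetical sentinel: one past the largest Unicode code point, so it
# compares greater than the code point of any real character.
END = sys.maxunicode + 1

def prefix_last_key(item):
    """Key under which a string sorts after every longer string it is a prefix of."""
    return [ord(ch) for ch in item.lower()] + [END]

words = ["A", "B", "A1", "A2", "A3", "B1", "B2", "B3", "é", "é1"]
ordered = sorted(words, key=prefix_last_key)
print(ordered)
# ['A1', 'A2', 'A3', 'A', 'B1', 'B2', 'B3', 'B', 'é1', 'é']
```

Because lists compare element by element, the sentinel at the end of the shorter key always loses to the next real character of the longer one, and there is no need to compute the maximum length first.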

BTW, don't call your list list: that name shadows the built-in list type.




Answer 3:


One could construct the key by taking:

  1. the first letter of every item
  2. the length
  3. the item itself

For example:

>>> L = ['A1', 'B2', 'A', 'A2', 'B1', 'A3', 'B3', 'B']
>>> print(sorted(L, key = lambda item: (item[0], -len(item), item)))
['A1', 'A2', 'A3', 'A', 'B1', 'B2', 'B3', 'B']
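One thing to be aware of: this key groups by the first character only. A quick sketch, where the second list is hypothetical data that is not in the question, shows what happens when items share more than one leading character:

```python
def first_letter_key(item):
    # Same key as above: first character, then longest first, then the item itself.
    return (item[0], -len(item), item)

single_prefix = ['A1', 'B2', 'A', 'A2', 'B1', 'A3', 'B3', 'B']
first = sorted(single_prefix, key=first_letter_key)
print(first)   # ['A1', 'A2', 'A3', 'A', 'B1', 'B2', 'B3', 'B']

# Hypothetical data with two-character prefixes: within the 'A' group every
# three-character item now sorts before every two-character one.
mixed = ['AB', 'AC1', 'AB1', 'AC']
second = sorted(mixed, key=first_letter_key)
print(second)  # ['AB1', 'AC1', 'AB', 'AC']
```

Whether that second ordering is acceptable depends on your data. For items shaped like the ones in the question it makes no difference.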



Answer 4:


I love Tries, so just for fun, I wrote a Trie-based solution:

class Trie():

    def __init__(self):
        self.data = {}

    def add(self, word):
        ref = self.data
        for char in word:
            # Reuse the child node for this character, creating it if missing.
            ref = ref.setdefault(char, {})
        ref[''] = 1


def sorted_print(dct, prefix=''):
    sorted_keys = sorted(filter(bool, dct.keys()), key=str.lower)
    for key in sorted_keys:
        v = dct[key]
        if isinstance(v, dict):
            sorted_print(v, prefix + key)
    if '' in dct:
        print(prefix)

my_list = ["B1", "B3", "B2", "A1", "A2", "A3", "A", "B"]
t = Trie()
for w in my_list:
    t.add(w)


sorted_print(t.data)
# A1
# A2
# A3
# A
# B1
# B2
# B3
# B

This should work for any string of any length.

Note that the result is just printed to screen, not written back in a new list. You didn't write much code, so I'll leave it as an exercise ;)
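For anyone who wants the list rather than the printout, here is one possible way to do the exercise, as a sketch. build_trie and iter_sorted are names invented for this example, and it uses plain nested dicts instead of the Trie class:

```python
def build_trie(words):
    """Nested-dict trie; the empty-string key marks the end of a word."""
    root = {}
    for word in words:
        node = root
        for char in word:
            node = node.setdefault(char, {})
        node[''] = 1
    return root

def iter_sorted(node, prefix=''):
    """Yield stored words, visiting children before the word ending at this node."""
    for key in sorted((k for k in node if k), key=str.lower):
        yield from iter_sorted(node[key], prefix + key)
    if '' in node:
        yield prefix

my_list = ["B1", "B3", "B2", "A1", "A2", "A3", "A", "B"]
result = list(iter_sorted(build_trie(my_list)))
print(result)  # ['A1', 'A2', 'A3', 'A', 'B1', 'B2', 'B3', 'B']
```

Note that a trie stores each word once, so duplicates in the input show up only once in the result.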



Source: https://stackoverflow.com/questions/42899405/sort-a-list-with-longest-items-first
