Nested lambda statements when sorting lists

再見小時候 2020-12-03 21:20

I wish to sort the below list first by the number, then by the text.

lst = ['b-3', 'a-2', 'c-4', 'd-2']

# result:
# ['a-2', 'd-2', 'b-3', 'c-


        
8 answers
  • 2020-12-03 21:47

    In almost all cases I would simply go with your second attempt. It's readable and concise (I would prefer three simple lines over one complicated line every time!), even though the function name could be more descriptive. But if you use it as a local function, that's not going to matter much.

    You also have to remember that Python uses a key function, not a cmp (compare) function. So to sort an iterable of length n the key function is called exactly n times, but sorting generally does O(n * log(n)) comparisons. So whenever your key-function has an algorithmic complexity of O(1) the key-function call overhead isn't going to matter (much). That's because:

    O(n*log(n)) + O(n)   ==  O(n*log(n))
    

    There's one exception, and that's the best case for Python's sort: the sort only does O(n) comparisons when the iterable is already sorted (or almost sorted). If Python used a compare function (and in Python 2 there really was one), then the constant factors of the function would be much more significant, because it would be called O(n * log(n)) times (once for each comparison).
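The difference between key calls and comparisons can be observed directly. The following is a minimal sketch: CountingKey and count_sort_work are hypothetical helper names introduced only for this illustration, and the exact comparison counts depend on the CPython version.

```python
import random


class CountingKey:
    """Wraps a sort key and counts how often two keys are compared."""
    comparisons = 0

    def __init__(self, value):
        self.value = value

    def __lt__(self, other):
        # sorted() only ever uses "<" to compare keys.
        CountingKey.comparisons += 1
        return self.value < other.value


def count_sort_work(items):
    """Return (key_calls, comparisons) for one sorted() call."""
    key_calls = 0
    CountingKey.comparisons = 0

    def key(item):
        nonlocal key_calls
        key_calls += 1
        text, num = item.split('-')
        return CountingKey((int(num), text))

    sorted(items, key=key)
    return key_calls, CountingKey.comparisons


random.seed(0)
shuffled = [f"{random.choice('abcdefgh')}-{random.randint(0, 500)}"
            for _ in range(1000)]
already_sorted = sorted(
    shuffled, key=lambda s: (int(s.split('-')[1]), s.split('-')[0]))

# The key is called once per element in both cases; only the number of
# comparisons differs between shuffled and pre-sorted input.
print(count_sort_work(shuffled))
print(count_sort_work(already_sorted))
```

The first number of each pair is always the list length, while the second is much smaller for the pre-sorted input.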

    So don't bother about being more concise or making it much faster (except when you can reduce the big-O without introducing constant factors that are too big; then you should go for it!). The first concern should be readability. So you should really not use nested lambdas or any other fancy constructs (except maybe as an exercise).

    Long story short, simply use your #2:

    def sorter_func(x):
        text, num = x.split('-')
        return int(num), text
    
    res = sorted(lst, key=sorter_func)
    

    By the way, it's also the fastest of all proposed approaches (although the difference isn't large).

    Summary: It's readable and fast!

    Code to reproduce the benchmark. It requires simple_benchmark to be installed (disclaimer: it's my own library). There are probably equivalent frameworks for this kind of task, but this is the one I'm familiar with:

    # My specs: Windows 10, Python 3.6.6 (conda)
    
    import toolz
    import iteration_utilities as it
    
    def approach_jpp_1(lst):
        return sorted(lst, key=lambda x: (int(x.split('-')[1]), x.split('-')[0]))
    
    def approach_jpp_2(lst):
        def sorter_func(x):
            text, num = x.split('-')
            return int(num), text
        return sorted(lst, key=sorter_func)
    
    def jpp_nested_lambda(lst):
        return sorted(lst, key=lambda x: (lambda y: (int(y[1]), y[0]))(x.split('-')))
    
    def toolz_compose(lst):
        return sorted(lst, key=toolz.compose(lambda x: (int(x[1]), x[0]), lambda x: x.split('-')))
    
    def AshwiniChaudhary_list_comprehension(lst):
        return sorted(lst, key=lambda x: [(int(num), text) for text, num in [x.split('-')]])
    
    def AshwiniChaudhary_next(lst):
        return sorted(lst, key=lambda x: next((int(num), text) for text, num in [x.split('-')]))
    
    def PaulCornelius(lst):
        return sorted(lst, key=lambda x: tuple(f(a) for f, a in zip((int, str), reversed(x.split('-')))))
    
    def JeanFrançoisFabre(lst):
        return sorted(lst, key=lambda s : [x if i else int(x) for i,x in enumerate(reversed(s.split("-")))])
    
    def iteration_utilities_chained(lst):
        return sorted(lst, key=it.chained(lambda x: x.split('-'), lambda x: (int(x[1]), x[0])))
    
    from simple_benchmark import benchmark
    import random
    import string
    
    funcs = [
        approach_jpp_1, approach_jpp_2, jpp_nested_lambda, toolz_compose, AshwiniChaudhary_list_comprehension,
        AshwiniChaudhary_next, PaulCornelius, JeanFrançoisFabre, iteration_utilities_chained
    ]
    
    arguments = {2**i: ['-'.join([random.choice(string.ascii_lowercase),
                                  str(random.randint(0, 2**(i-1)))]) 
                        for _ in range(2**i)] 
                 for i in range(3, 15)}
    
    b = benchmark(funcs, arguments, 'list size')
    
    %matplotlib notebook
    b.plot_difference_percentage(relative_to=approach_jpp_2)
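If simple_benchmark is not available, the standard-library timeit module gives a rough comparison. This is only a sketch under assumptions: it times two of the approaches on a single list size, the helper names key_two_splits and key_one_split are illustrative, and absolute numbers will vary by machine.

```python
import random
import string
import timeit


def key_two_splits(x):
    # Splits the string twice, like approach_jpp_1.
    return int(x.split('-')[1]), x.split('-')[0]


def key_one_split(x):
    # Splits once and unpacks, like approach_jpp_2.
    text, num = x.split('-')
    return int(num), text


random.seed(0)
data = ['-'.join([random.choice(string.ascii_lowercase),
                  str(random.randint(0, 512))])
        for _ in range(1024)]

for key in (key_two_splits, key_one_split):
    # Time 200 full sorts of the same list with the given key function.
    seconds = timeit.timeit(lambda: sorted(data, key=key), number=200)
    print(f'{key.__name__}: {seconds:.4f} s for 200 sorts')
```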
    

    I took the liberty of including a function composition approach from one of my own libraries, iteration_utilities.chained:

    from iteration_utilities import chained
    sorted(lst, key=chained(lambda x: x.split('-'), lambda x: (int(x[1]), x[0])))
    

    It's quite fast (2nd or 3rd place) but still slower than using your own function.


    Note that the key overhead would be more significant if you used a function that had O(n) (or better) algorithmic complexity, for example min or max. Then the constant factors of the key-function would be more significant!
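For illustration, the same key function also works with min and max. They call it once per element but make only a single pass over the data, so the key cost is a larger share of the total work. A minimal sketch reusing sorter_func from above:

```python
def sorter_func(x):
    text, num = x.split('-')
    return int(num), text


lst = ['b-3', 'a-2', 'c-4', 'd-2']

# Both calls evaluate the key exactly once per element.
smallest = min(lst, key=sorter_func)
largest = max(lst, key=sorter_func)
print(smallest, largest)  # a-2 c-4
```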

  • 2020-12-03 21:50

    In general, with FOP (functional-oriented programming) you can put everything in a one-liner and nest lambdas inside it, but that is generally bad etiquette, since after two levels of nesting it all becomes quite unreadable.

    The best way to approach this kind of issue is to split it up in several stages:

    1: splitting string into tuple:

    lst = ['b-3', 'a-2', 'c-4', 'd-2']
    res = map( lambda str_x: tuple( str_x.split('-') ) , lst)   
    

    2: sorting elements like you wished :

    lst = ['b-3', 'a-2', 'c-4', 'd-2']
    res = map( lambda str_x: tuple( str_x.split('-') ) , lst)  
    res = sorted( res, key=lambda x: ( int(x[1]), x[0] ) ) 
    

    Since each string was split into a tuple, res is now a list of tuples (map itself returns a lazy map object in Python 3, but sorted turns it into a list). So the 3rd step is optional:

    3: representing data as you inquired:

    lst = ['b-3', 'a-2', 'c-4', 'd-2']
    res = map( lambda str_x: tuple( str_x.split('-') ) , lst)  
    res = sorted( res, key=lambda x: ( int(x[1]), x[0] ) ) 
    res = map( '-'.join, res )  
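Note that in Python 3 the final map call returns a lazy iterator rather than a list. A minimal sketch of the three steps above with the last result materialized (the variable names pairs and ordered are chosen here for illustration):

```python
lst = ['b-3', 'a-2', 'c-4', 'd-2']

# Step 1: split each string into a (text, number-as-string) tuple.
pairs = map(lambda s: tuple(s.split('-')), lst)

# Step 2: sort by the number first, then by the text.
ordered = sorted(pairs, key=lambda p: (int(p[1]), p[0]))

# Step 3: rejoin, wrapping map in list() to get a concrete list.
res = list(map('-'.join, ordered))

print(ordered)  # [('a', '2'), ('d', '2'), ('b', '3'), ('c', '4')]
print(res)      # ['a-2', 'd-2', 'b-3', 'c-4']
```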
    

    Now keep in mind that nesting lambdas can produce an even more compact one-liner, and that you can embed one lambda directly inside another, as follows:

    a = ['b-3', 'a-2', 'c-4', 'd-2']
    resa = map( lambda x: x.split('-'), a)
    resa = map( lambda x: ( int(x[1]),x[0]) , resa)  # operate on the split parts, not on a
    # the split and the conversion can also be nested in one lambda (this yields (text, number)),
    # but you must be sure about the type you are passing to the lambda
    resa = map( lambda x: tuple( map( lambda y: int(y) if y.isdigit() else y , x.split('-') ) ) , a)
    

    But as you can see, if the contents of list a are anything other than two strings separated by '-', the lambda function will raise an error and you will have a bad time figuring out what the hell is happening.
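One way to make such failures easier to diagnose is a small named key function that validates its input. This is only a sketch: parse_item is a hypothetical helper, and it assumes non-negative integers after the first dash.

```python
def parse_item(item):
    """Split 'text-number' into (number, text), failing with a clear message."""
    text, sep, num = item.partition('-')
    # isdecimal() only accepts characters that int() can convert.
    if not sep or not num.isdecimal():
        raise ValueError(f"expected 'text-number', got {item!r}")
    return int(num), text


lst = ['b-3', 'a-2', 'c-4', 'd-2']
print(sorted(lst, key=parse_item))  # ['a-2', 'd-2', 'b-3', 'c-4']

try:
    parse_item('a-b-3')
except ValueError as exc:
    print(exc)  # the message names the offending item
```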


    So in the end, I would like to show you several ways the 3rd-step program could be written:

    1:

    lst = ['b-3', 'a-2', 'c-4', 'd-2']
    # no backslashes are needed: lines continue implicitly inside parentheses
    res = map( '-'.join,
               sorted(
                   map( lambda str_x: tuple( str_x.split('-') ) , lst),
                   key=lambda x: ( int(x[1]), x[0] )
               )
             )
    

    2:

    lst = ['b-3', 'a-2', 'c-4', 'd-2']
    res = map( '-'.join,
            sorted( map( lambda str_x: tuple( str_x.split('-') ) , lst),
                    key=lambda x: tuple( reversed( tuple(
                                map( lambda y: int(y) if y.isdigit() else y ,x  )
                            )))
                )
        )  # map isn't reversible
    

    3:

    res = sorted( lst,
                 key=lambda x:
                    tuple(reversed(
                        tuple(
                            map( lambda y: int(y) if y.isdigit() else y , x.split('-') )
                        )
                    ))
                )
    

    So you can see how this can all get very complicated and incomprehensible. When reading my own or someone else's code, I much prefer to see this version:

    res = map( lambda str_x: tuple( str_x.split('-') ) , lst) # splitting string 
    res = sorted( res, key=lambda x: ( int(x[1]), x[0] ) ) # sorting for each element of splitted string
    res = map( '-'.join, res ) # rejoining string  
    

    That is all from me. Have fun. I've tested all code in py 3.6.


    PS. In general, you have 2 ways to approach lambda functions:

    mult = lambda x: x*2  
    mu_add= lambda x: mult(x)+x #calling lambda from lambda
    

    This way is useful for typical FOP, where you have constant data and need to manipulate each element of that data. But if you need to handle a list, tuple, string or dict in a lambda, this kind of operation isn't very useful, because once any of those container types is present, the data type of the elements inside them becomes uncertain. So we need to go up a level of abstraction and decide how to manipulate the data based on its type.

    mult_i = lambda x: x*2 if isinstance(x,int) else 2 # some ternary operator to make our life easier by putting if statement in lambda 
    

    Now you can use another type of lambda function:

    int_str = lambda x: ( lambda y: str(y) )(x)*x # a bit complex, right?
    # let me break it down.
    # all this could be written as:
    str_i = lambda x: str(x)
    int_str = lambda x: str_i(x)*x
    ## wrapping the inner lambda in () makes the interpreter evaluate it first
    ## as its own function, before the multiplication is applied:
    # ( lambda y: str(y) )     defines a new anonymous function
    # ( lambda y: str(y) )(x)  calls it immediately with x as the argument
    

    Some people call this type of syntax nested lambdas; I call it indiscreet, since you can see everything.

    And you can use recursive lambda assignment:

    def rec_lambda( data, *arg_lambda ):  
        # keep only the callable arguments; callable() also accepts built-in
        # methods such as str.upper and '-'.join, whose type is not 'function'
        arg_lambda = [ x for x in arg_lambda if callable(x) ]  
    
        # implementing first function in line
        data = arg_lambda[0](data)  
    
        if arg_lambda[1:]: # if there are still elements in arg_lambda 
            return rec_lambda( data, *arg_lambda[1:] ) #call rec_lambda
        else: # if no functions are left
            return data # returns data  
    
    #where you can use it like this  
    a = rec_lambda( 'a', lambda x: x*2, str.upper, lambda x: (x,x), '-'.join) 
    # a == 'AA-AA'
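The same left-to-right application can also be written without recursion. A minimal sketch using functools.reduce from the standard library (pipe is a hypothetical name, not part of the answer above):

```python
from functools import reduce


def pipe(data, *funcs):
    """Apply each callable in funcs to data, left to right."""
    return reduce(lambda acc, func: func(acc), funcs, data)


result = pipe('a', lambda x: x * 2, str.upper, lambda x: (x, x), '-'.join)
print(result)  # AA-AA
```

With no functions passed, pipe simply returns data unchanged.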
    
  • 2020-12-03 21:54

    I think that if you are certain the format is always a single letter at index [0], a dash at index [1], and a number from index [2] onward, then you can replace split with a slice, or you can use str.index('-'):

    sorted(lst, key=lambda x:(int(x[2:]),x[0]))
    
    # str.index('-') 
    sorted(lst, key=lambda x:(int(x[x.index('-')+1 :]),x[0])) 
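The fixed-slice version depends on that single-letter assumption. The sketch below shows what happens when it does not hold, using an extra item 'ab-10' that is not in the original list, and compares it with str.partition (the function names are illustrative):

```python
lst = ['b-3', 'a-2', 'c-4', 'd-2', 'ab-10']


def key_slice(x):
    # Assumes exactly one character before the dash.
    return int(x[2:]), x[0]


def key_partition(x):
    # Splits at the first dash, whatever the prefix length.
    text, _, num = x.partition('-')
    return int(num), text


print(key_slice('ab-10'))      # (-10, 'a'): the dash is read as a minus sign
print(key_partition('ab-10'))  # (10, 'ab')
print(sorted(lst, key=key_partition))
```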
    
  • 2020-12-03 22:01

    There are 2 points to note:

    • One-line answers are not necessarily better. Using a named function is likely to make your code easier to read.
    • You are likely not looking for a nested lambda statement, as function composition is not part of the standard library (see Note #1). What you can do easily is have one lambda function return the result of another lambda function.

    Therefore, the correct answer can be found in Lambda inside lambda.

    For your specific problem, you can use:

    res = sorted(lst, key=lambda x: (lambda y: (int(y[1]), y[0]))(x.split('-')))
    

    Remember that lambda is just a function. You can call it immediately after defining it, even on the same line.
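As a small illustration of calling a lambda immediately after defining it (the names below are chosen only for this sketch):

```python
# Define a lambda and call it in the same expression.
answer = (lambda y: y + 1)(41)

# The same pattern used as a sort key: the inner lambda receives the
# already-split parts, so split() runs only once per item.
key = lambda x: (lambda parts: (int(parts[1]), parts[0]))(x.split('-'))

print(answer)      # 42
print(key('b-3'))  # (3, 'b')
```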

    Note #1: The 3rd party toolz library does allow composition:

    from toolz import compose
    
    res = sorted(lst, key=compose(lambda x: (int(x[1]), x[0]), lambda x: x.split('-')))
    

    Note #2: As @chepner points out, the deficiency of this solution (repeated function calls) is one of the reasons why PEP 572 (assignment expressions) was implemented in Python 3.8.
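With assignment expressions, a single lambda can split once and reuse the result. A minimal sketch, assuming Python 3.8 or later (the name parts is illustrative):

```python
lst = ['b-3', 'a-2', 'c-4', 'd-2']

# The walrus operator binds the split result to "parts" inside the lambda,
# so the second tuple element can reuse it without splitting again.
res = sorted(lst, key=lambda x: (int((parts := x.split('-'))[1]), parts[0]))
print(res)  # ['a-2', 'd-2', 'b-3', 'c-4']
```

The assignment expression must be parenthesized here, and whether this reads better than a named function is a matter of taste.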

  • 2020-12-03 22:01

    You could convert to an integer only when the index of the item is 0 (after reversing the split list). The only object created (besides the result of split) is the 2-element list used for comparison. The rest are just iterators.

    sorted(lst,key = lambda s : [x if i else int(x) for i,x in enumerate(reversed(s.split("-")))])
    

    As an aside, the - token isn't particularly great when numbers are involved, because it complicates the use of negative numbers (but this can be solved with s.split("-",1)).
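To illustrate the maxsplit variant, here is a sketch with a hypothetical item 'a--2' (text 'a', number -2) that is not in the original list:

```python
def key_maxsplit(s):
    # Split only at the first dash, so a minus sign stays with the number.
    text, num = s.split('-', 1)
    return int(num), text


print('a--2'.split('-'))     # ['a', '', '2']
print('a--2'.split('-', 1))  # ['a', '-2']
print(sorted(['b-3', 'a--2', 'c-4', 'd-2'], key=key_maxsplit))
# ['a--2', 'd-2', 'b-3', 'c-4']
```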

  • 2020-12-03 22:02
    lst = ['b-3', 'a-2', 'c-4', 'd-2']
    def xform(l):
        # swap 'text-number' <-> 'number-text'; use the parameter l, not the global lst
        return list(map(lambda x: x[1] + '-' + x[0], list(map(lambda x: x.split('-'), l))))
    lst = sorted(xform(lst))
    print(xform(lst))
    

    See it here. I think @jpp has a better solution, but this was a fun little brainteaser :-)
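One caveat with sorting the swapped strings directly: they are compared as text, so multi-digit numbers sort lexicographically. A short sketch with made-up values:

```python
swapped = ['2-a', '10-b', '3-c']

# Plain string comparison puts '10-b' first because '1' < '2'.
print(sorted(swapped))  # ['10-b', '2-a', '3-c']

# Converting the numeric part restores numeric order.
print(sorted(swapped, key=lambda s: int(s.split('-')[0])))  # ['2-a', '3-c', '10-b']
```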
