Finding top N columns for each row in data frame

盖世英雄少女心 2021-01-05 01:24

Given a data frame with one descriptive column and X numeric columns, for each row I'd like to identify the top N columns with the highest values and save it as rows on a ne

5 answers
  •  夕颜 (original poster)
    2021-01-05 02:02

    If you just want pairings:

    from operator import itemgetter as it
    from itertools import repeat
    n = 3
    
    # sort_values was called order in pandas < 0.17;
    # Series.iteritems() was removed in pandas 2.0, so use items()
    new_d = (zip(repeat(row["index"]), map(it(0), row[1:].sort_values(ascending=False)[:n].items()))
                     for _, row in df.iterrows())
    for row in new_d:
        print(list(row))
    

    Output:

    [('A', 'option3'), ('A', 'option2'), ('A', 'option4')]
    [('B', 'option3'), ('B', 'option4'), ('B', 'option1')]
    [('C', 'option2'), ('C', 'option5'), ('C', 'option1')]
    [('D', 'option5'), ('D', 'option1'), ('D', 'option2')]
    [('E', 'option1'), ('E', 'option2'), ('E', 'option3')]
    [('F', 'option3'), ('F', 'option1'), ('F', 'option2')]
    

    This also maintains the order.
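The snippet above assumes a `df` whose first column is a descriptive column named "index", followed by numeric columns. A minimal self-contained sketch of that setup follows. The values are hypothetical (the question's data is not shown here), and it uses `row.iloc[1:]` plus the sorted Series' `.index` rather than `itemgetter`, which should give the same pairs:

```python
import pandas as pd

# Hypothetical data: these numbers are made up for illustration only.
df = pd.DataFrame(
    {
        "index": ["A", "B"],          # descriptive column (assumed name)
        "option1": [1, 5],
        "option2": [8, 2],
        "option3": [9, 7],
        "option4": [3, 6],
    }
)

n = 3
pairs = [
    # Skip the descriptive column, sort the rest descending, keep the top n labels.
    [(row["index"], col) for col in row.iloc[1:].sort_values(ascending=False)[:n].index]
    for _, row in df.iterrows()
]
for row_pairs in pairs:
    print(row_pairs)
```

Ties between equal values are not handled in any particular way here; if that matters, passing `kind="stable"` to `sort_values` is one option.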

    If you want a list of lists:

    from operator import itemgetter as it
    from itertools import repeat
    n = 3
    
    new_d = [list(zip(repeat(row["index"]), map(it(0), row[1:].sort_values(ascending=False)[:n].items())))
                     for _, row in df.iterrows()]
    

    Output:

    [[('A', 'option3'), ('A', 'option2'), ('A', 'option4')],
    [('B', 'option3'), ('B', 'option4'), ('B', 'option1')], 
    [('C', 'option2'), ('C', 'option5'), ('C', 'option1')], 
    [('D', 'option5'), ('D', 'option1'), ('D', 'option2')], 
    [('E', 'option1'), ('E', 'option2'), ('E', 'option3')],
    [('F', 'option3'), ('F', 'option1'), ('F', 'option2')]]
    

    Or using Python's sorted:

    new_d = [list(zip(repeat(row["index"]), map(it(0), sorted(row[1:].items(), key=it(1), reverse=True)[:n])))
                         for _, row in df.iterrows()]
    

    This is actually the fastest of the three. If you really want strings, it is trivial to format the output however you want.
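As a rough sketch of the string formatting mentioned above, the `sorted`-based approach can be wrapped in a small helper and its result joined into one line per row. The helper name `top_n_labels`, the `key_col` parameter, and the sample values are all hypothetical, chosen only for illustration:

```python
from operator import itemgetter

import pandas as pd

# Hypothetical data, made up for illustration only.
df = pd.DataFrame(
    {
        "index": ["A", "B"],
        "option1": [1, 5],
        "option2": [8, 2],
        "option3": [9, 7],
        "option4": [3, 6],
    }
)


def top_n_labels(frame, n, key_col="index"):
    """Return a list of (key, [names of the n largest columns]) per row."""
    result = []
    for _, row in frame.iterrows():
        # Rank the numeric columns by value, largest first.
        ranked = sorted(row.drop(key_col).items(), key=itemgetter(1), reverse=True)
        result.append((row[key_col], [name for name, _ in ranked[:n]]))
    return result


# One formatted string per row, e.g. "A: option3, option2, option4".
lines = [f"{key}: {', '.join(cols)}" for key, cols in top_n_labels(df, 3)]
print("\n".join(lines))
```

Returning a list of tuples rather than a dict keeps rows with repeated keys from overwriting each other.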
