Fastest way to sort each row in a pandas dataframe

温柔的废话 2020-12-01 21:06

I need to find the quickest way to sort each row in a dataframe with millions of rows and around a hundred columns.

So something like this:

A   B   C         


        
5 Answers
  • 2020-12-01 21:17

    To add to the answer given by @Andy-Hayden: you can also do this in place on the whole frame. I am not really sure why this works, but it does. There seems to be no control over the sort order, though (it is always ascending, so the columns are reversed afterwards).

        In [97]: A = pd.DataFrame(np.random.randint(0,100,(4,5)), columns=['one','two','three','four','five'])
    
        In [98]: A
        Out[98]: 
           one  two  three  four  five
        0   22   63     72    46    49
        1   43   30     69    33    25
        2   93   24     21    56    39
        3    3   57     52    11    74
    
        In [99]: A.values.sort
        Out[99]: <function ndarray.sort>
    
        In [100]: A
        Out[100]: 
           one  two  three  four  five
        0   22   63     72    46    49
        1   43   30     69    33    25
        2   93   24     21    56    39
        3    3   57     52    11    74
    
        In [101]: A.values.sort()
    
        In [102]: A
        Out[102]: 
           one  two  three  four  five
        0   22   46     49    63    72
        1   25   30     33    43    69
        2   21   24     39    56    93
        3    3   11     52    57    74
    
        In [103]: A = A.iloc[:,::-1]
    
        In [104]: A
        Out[104]: 
           five  four  three  two  one
        0    72    63     49   46   22
        1    69    43     33   30   25
        2    93    56     39   24   21
        3    74    57     52   11    3
    

    I hope someone can explain why this works; I am just happy that it does 8)
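A likely explanation, offered as a sketch rather than a guarantee: when every column shares one dtype, `A.values` can hand back a view of the frame's underlying data, and `ndarray.sort()` sorts in place along the last axis (that is, each row), so the frame changes with it. Whether `.values` is a view depends on the pandas version and its Copy-on-Write settings, so relying on this is fragile. The version below sorts an explicit copy instead; `sort_rows` is a hypothetical helper name, not part of pandas.

```python
import numpy as np
import pandas as pd

def sort_rows(df, descending=False):
    """Return a new DataFrame whose rows are sorted; df itself is not modified."""
    arr = df.to_numpy(copy=True)   # explicit copy, never a view of df
    arr.sort(axis=1)               # in-place ascending sort of each row of the copy
    if descending:
        arr = arr[:, ::-1]         # reverse each row for descending order
    return pd.DataFrame(arr, index=df.index, columns=df.columns)

# Same values as in the session above
A = pd.DataFrame(
    [[22, 63, 72, 46, 49],
     [43, 30, 69, 33, 25],
     [93, 24, 21, 56, 39],
     [3, 57, 52, 11, 74]],
    columns=['one', 'two', 'three', 'four', 'five'],
)
B = sort_rows(A)
print(B)
```

Because the sort happens on a copy, `A` keeps its original values and the result does not depend on view semantics.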

  • 2020-12-01 21:36

    To leave the original DataFrame untouched, one could try this approach, which returns a new frame:

    import pandas as pd
    import numpy as np
    
    A = pd.DataFrame(np.random.randint(0,100,(4,5)), columns=['one','two','three','four','five'])
    print(A)
    print(type(A))
    
       one  two  three  four  five
    0   85   27     64    50    55
    1    3   90     65    22     8
    2    0    7     64    66    82
    3   58   21     42    27    30
    <class 'pandas.core.frame.DataFrame'>
    
    B = A.apply(np.sort, axis=1, raw=True)  # raw=True passes each row as a NumPy array
    print(B)
    print(type(B))
    
       one  two  three  four  five
    0   27   50     55    64    85
    1    3    8     22    65    90
    2    0    7     64    66    82
    3   21   27     30    42    58
    <class 'pandas.core.frame.DataFrame'>
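Since the question is about speed on millions of rows, it may help to compare this against sorting the whole array in a single NumPy call. `apply` still invokes the function once per row, while `np.sort(..., axis=1)` handles the entire 2-D array in compiled code, so the latter is usually the faster of the two. Timings depend on the data and on library versions, so measure on your own frame. A minimal sketch, where the frame shape and the `time_it` helper are illustrative assumptions:

```python
import timeit
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.DataFrame(rng.integers(0, 100, size=(10_000, 100)))

def via_apply(frame):
    # One np.sort call per row, driven by DataFrame.apply
    return frame.apply(np.sort, axis=1, raw=True)

def via_numpy(frame):
    # One np.sort call for the whole 2-D array
    return pd.DataFrame(np.sort(frame.to_numpy(), axis=1),
                        index=frame.index, columns=frame.columns)

def time_it(func, frame, repeat=3):
    # Best wall-clock time of `repeat` single runs, in seconds
    return min(timeit.repeat(lambda: func(frame), number=1, repeat=repeat))

# Both routes should produce identical values
same = np.array_equal(via_apply(df).to_numpy(), via_numpy(df).to_numpy())
print("same result:", same)
print("apply :", time_it(via_apply, df))
print("numpy :", time_it(via_numpy, df))
```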
    
  • 2020-12-01 21:37

    You could use DataFrame.apply.

    For example:
    
    A = pd.DataFrame(np.random.randint(0,100,(4,5)), columns=['one','two','three','four','five'])
    print(A)
    
       one  two  three  four  five
    0    2   75     44    53    46
    1   18   51     73    80    66
    2   35   91     86    44    25
    3   60   97     57    33    79
    
    A = A.apply(np.sort, axis=1, raw=True)  # raw=True: rows are passed as arrays and a DataFrame comes back
    print(A)
    
       one  two  three  four  five
    0    2   44     46    53    75
    1   18   51     66    73    80
    2   25   35     44    86    91
    3   33   57     60    79    97
    

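    Since you want descending order, you can simply multiply the DataFrame by -1, sort it, and multiply by -1 again.

    A = pd.DataFrame(np.random.randint(0,100,(4,5)), columns=['one','two','three','four','five'])
    A = A * -1                                # negate so an ascending sort yields descending order
    A = A.apply(np.sort, axis=1, raw=True)    # sort each row
    A = A * -1                                # restore the original signs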
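The negate, sort, negate trick only works for values that can be meaningfully negated, such as signed numbers. A sketch of an alternative that sorts ascending and then reverses each row, which gives the same result on the sample values above without touching the signs:

```python
import numpy as np
import pandas as pd

# Same values as the example frame printed above
A = pd.DataFrame(
    [[2, 75, 44, 53, 46],
     [18, 51, 73, 80, 66],
     [35, 91, 86, 44, 25],
     [60, 97, 57, 33, 79]],
    columns=['one', 'two', 'three', 'four', 'five'],
)

values = A.to_numpy()

# Sort each row ascending, then reverse the order within each row
desc_by_reverse = np.sort(values, axis=1)[:, ::-1]

# The negation trick from the answer, expressed directly on the array
desc_by_negation = -np.sort(-values, axis=1)

result = pd.DataFrame(desc_by_reverse, index=A.index, columns=A.columns)
print(result)
```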
    
  • 2020-12-01 21:41

    I think I would do this in numpy:

    In [11]: a = df.values
    
    In [12]: a.sort(axis=1)  # no ascending argument
    
    In [13]: a = a[:, ::-1]  # so reverse
    
    In [14]: a
    Out[14]:
    array([[8, 4, 3, 1],
           [9, 7, 2, 2]])
    
    In [15]: pd.DataFrame(a, df.index, df.columns)
    Out[15]:
       A  B  C  D
    0  8  4  3  1
    1  9  7  2  2
    

    I had thought this might work, but it sorts the column labels rather than the values within each row:

    In [21]: df.sort(axis=1, ascending=False)
    Out[21]:
       D  C  B  A
    0  1  8  4  3
    1  2  7  2  9
    

    And when the columns are passed explicitly, pandas raises:

    In [22]: df.sort(df.columns, axis=1, ascending=False)
    

    ValueError: When sorting by column, axis must be 0 (rows)
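Note that `DataFrame.sort` was deprecated and later removed from pandas, so the two `df.sort` calls above fail on current versions. As a hedged sketch of the modern equivalents: `sort_index(axis=1, ascending=False)` reproduces the column-label sort shown in `Out[21]`, and the NumPy route still handles the per-row value sort. The frame below is assumed to be the two-row example used in this thread.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [3, 9], 'B': [4, 2], 'C': [8, 7], 'D': [1, 2]})

# Sorts the column labels (D, C, B, A); each value stays in its own column
by_label = df.sort_index(axis=1, ascending=False)

# Sorts the values within each row, largest first
by_value = pd.DataFrame(np.sort(df.to_numpy(), axis=1)[:, ::-1],
                        index=df.index, columns=df.columns)

print(by_label)
print(by_value)
```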

  • 2020-12-01 21:43

    Instead of using the pd.DataFrame constructor, an easier way to assign the sorted values back is to index with a list of columns (double brackets):

    Original DataFrame:

    A   B   C   D
    3   4   8   1
    9   2   7   2
    
    df[['A', 'B', 'C', 'D']] = np.sort(df)[:, ::-1]
    
       A  B  C  D
    0  8  4  3  1
    1  9  7  2  2
    

    This way you can also sort only a subset of the columns, starting again from the original DataFrame:

    df[['B', 'C']] = np.sort(df[['B', 'C']])[:, ::-1]
    
       A  B  C  D
    0  3  8  4  1
    1  9  7  2  2
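The same idea can be wrapped in a small helper so the column list is written only once. This is a sketch; `sort_within` is a hypothetical name, and it modifies the frame it is given.

```python
import numpy as np
import pandas as pd

def sort_within(df, cols, descending=True):
    """Sort the values of `cols` within each row, writing the result back into df."""
    block = np.sort(df[cols].to_numpy(), axis=1)   # ascending, sorted copy of the sub-array
    if descending:
        block = block[:, ::-1]                     # reverse each row
    df[cols] = block                               # assign back to the selected columns
    return df

df = pd.DataFrame({'A': [3, 9], 'B': [4, 2], 'C': [8, 7], 'D': [1, 2]})
sort_within(df, ['B', 'C'])
print(df)
```

Passing all four column names instead of `['B', 'C']` sorts every row of the frame.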
    