How to sort a data frame by column values?

Question

I am relatively new to Python and pandas data frames, so I may have missed something simple. I had a data frame with many rows and columns, and I eventually managed to reduce it to a single row holding the maximum value of each column. I used this code to do that:

import pandas as pd

d = {'A' : [1.2, 2, 4, 6],
     'B' : [2, 8, 10, 12],
     'C' : [5, 3, 4, 5],
     'D' : [3.5, 9, 1, 11],
     'E' : [5, 8, 7.5, 3],
     'F' : [8.8, 4, 3, 2]}


df = pd.DataFrame(d, index=['a', 'b', 'c', 'd'])
print(df)

Out:
     A   B  C     D    E    F
a  1.2   2  5   3.5  5.0  8.8
b  2.0   8  3   9.0  8.0  4.0
c  4.0  10  4   1.0  7.5  3.0
d  6.0  12  5  11.0  3.0  2.0

Then, to pick the maximum value from each column, I used this function:

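# Return the `num` largest values of Series `s`, re-indexed from 0.
def top_values(s, num):  # renamed so it does not shadow the built-in sorted()
    tmp = s.sort_values(ascending=False)[:num]  # Series.order in older pandas
    tmp.index = range(num)
    return tmp

NewDF = df.apply(lambda x: top_values(x, 1))
print(NewDF)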

Out:
     A   B  C     D    E    F
0  6.0  12  5  11.0  8.0  8.8

Yes, I lost the row labels (the index), but the column labels are the ones I need to keep. Now I just need to sort the columns: I want the top 5 columns ranked by the values inside them. I need this output:

Out:
      B   D    F    E    A
0  12.0  11  8.8  8.0  6.0

I looked for a solution but had no luck. The closest thing I found for sorting by columns is NewDF.sort(axis=1), but it leaves the column order unchanged.

Edit: OK, I found one way, but it requires transposing the data frame:

transposed = NewDF.T
print(transposed.sort([0], ascending=False))

Is this the only way to do it?

Answer 1:

You can use max with nlargest, because nlargest returns its output sorted in descending order:

print(df.max().nlargest(5))
B    12.0
D    11.0
F     8.8
E     8.0
A     6.0
dtype: float64

Then convert the result to a DataFrame:

print(pd.DataFrame(df.max().nlargest(5)).T)
      B     D    F    E    A
0  12.0  11.0  8.8  8.0  6.0
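
For readers on Python 3 with a current pandas release, the same idea can be sketched as a self-contained script. This is only one possible arrangement: the names top5 and top5_row are introduced here for illustration, and Series.to_frame() is used in place of the pd.DataFrame(...) call above.

```python
import pandas as pd

# Same sample data as in the question.
df = pd.DataFrame(
    {'A': [1.2, 2, 4, 6],
     'B': [2, 8, 10, 12],
     'C': [5, 3, 4, 5],
     'D': [3.5, 9, 1, 11],
     'E': [5, 8, 7.5, 3],
     'F': [8.8, 4, 3, 2]},
    index=['a', 'b', 'c', 'd'])

# Column-wise maxima, then the five largest of them in descending order.
top5 = df.max().nlargest(5)

# to_frame() turns the Series into a one-column DataFrame;
# transposing it gives a single row with the column labels kept.
top5_row = top5.to_frame().T
print(top5_row)
```

Because nlargest already returns the values in descending order, no separate sorting step is needed in this variant.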

EDIT:

If you need to sort a one-row DataFrame:

print(NewDF.T.sort_values(0, ascending=False))
      0
B  12.0
D  11.0
F   8.8
E   8.0
A   6.0
C   5.0

Another solution is to apply sort_values to the row:

print(NewDF.apply(lambda x: x.sort_values(ascending=False), axis=1))
      B     D    F    E    A    C
0  12.0  11.0  8.8  8.0  6.0  5.0
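
As a further option that avoids both the transpose and apply, the columns of a one-row DataFrame can be reordered by sorting the row itself and reusing the resulting labels. The sketch below is not from the original answer; new_df is a hypothetical stand-in for NewDF, and order, sorted_df and top5_df are names chosen for illustration.

```python
import pandas as pd

# One-row frame standing in for NewDF from the question.
new_df = pd.DataFrame({'A': [6.0], 'B': [12], 'C': [5],
                       'D': [11.0], 'E': [8.0], 'F': [8.8]})

# Sort the values of the single row; the index of the sorted
# Series is the list of column labels in descending value order.
order = new_df.iloc[0].sort_values(ascending=False).index

# Select the columns in that order, then keep the first five.
sorted_df = new_df[order]
top5_df = sorted_df.iloc[:, :5]
print(top5_df)
```

Selecting columns by label keeps the frame in its original row orientation, so the column labels remain the columns throughout.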

Source: https://stackoverflow.com/questions/37140223/how-to-sort-data-frame-by-column-values

Tags

python-2.7

pandas

dataframe