Sorting a pandas series

时光怂恿深爱的人放手 posted on 2019-11-26 12:42:38

Question


I am trying to figure out how to sort the Series generated as a result of a groupby aggregation in a smart way.

I generate an aggregation of my DataFrame like this:

means = df.testColumn.groupby(df.testCategory).mean()

This results in a Series. I now try to sort this by value, but get an error:

means.sort()
...
-> Exception: This Series is a view of some other array, to sort in-place you must create a copy

I then try creating a copy:

meansCopy = Series(means)
meansCopy.sort()
-> Exception: This Series is a view of some other array, to sort in-place you must create a copy

How can I get this sort working?


Answer 1:


Use sort_values, i.e. means = means.sort_values(). [Pandas v0.17+]
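As a minimal sketch of this answer applied to the question's setup: the column names testColumn and testCategory come from the question, while the sample values below are made up purely for illustration. Note that sort_values returns a new Series rather than sorting in place, which is why the result is assigned to a name.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
df = pd.DataFrame({
    "testCategory": ["x", "x", "y", "y", "z"],
    "testColumn": [4.0, 6.0, 1.0, 3.0, 9.0],
})

means = df.testColumn.groupby(df.testCategory).mean()

# sort_values returns a new, sorted Series; `means` itself is left untouched.
sorted_means = means.sort_values()
print(sorted_means)
```

Because the original Series is not modified, the "view of some other array" error from the question does not come up with this approach.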


(Very old answer, pre-v0.17 / 2015)

pandas used to provide an order() method: means = means.order().




Answer 2:


1) Use Series.sort_values()

# Setup.
import numpy as np
import pandas as pd

np.random.seed(0)
df = pd.DataFrame({'A': list('aaabbbbccddd'), 'B': np.random.choice(5, 12)})
ser = df.groupby('A')['B'].mean()
ser

A
a    2.333333
b    2.500000
c    3.000000
d    1.333333
Name: B, dtype: float64

ser.sort_values()

A
d    1.333333
a    2.333333
b    2.500000
c    3.000000
Name: B, dtype: float64

1b) To sort in descending order: sort_values(ascending=False)
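A short sketch of the descending case. To keep it self-contained, the Series is rebuilt by hand from the group means printed above instead of being recomputed from the random data, so the literal values are an assumption of this example.

```python
import pandas as pd

# Group means copied from the output above (a, b, c, d).
ser = pd.Series([7 / 3, 2.5, 3.0, 4 / 3], index=list("abcd"), name="B")

# ascending=False puts the largest value first; the labels travel with the values.
descending = ser.sort_values(ascending=False)
print(descending)
```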


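2) You can also call Series.argsort() and index with the resulting positions, either through __getitem__ or through Series.iloc. Series.iloc is the safer choice, since recent pandas versions deprecate positional lookup through plain [] on a label index: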

ser[ser.argsort()]

A
d    1.333333
a    2.333333
b    2.500000
c    3.000000
Name: B, dtype: float64

ser.iloc[ser.argsort()]

A
d    1.333333
a    2.333333
b    2.500000
c    3.000000
Name: B, dtype: float64

3) Similarly, numpy.argsort() (should be marginally faster):

ser[np.argsort(ser)]
# ser[np.argsort(ser.values)]

A
d    1.333333
a    2.333333
b    2.500000
c    3.000000
Name: B, dtype: float64

3b) To sort in descending order, negate the argument:

ser[(-ser).argsort()]

A
c    3.000000
b    2.500000
a    2.333333
d    1.333333
Name: B, dtype: float64

The same negation works with np.argsort() and with Series.iloc indexing.
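The negation trick can be sketched with np.argsort and Series.iloc as follows. The Series is again rebuilt by hand from the means shown earlier, which is an assumption made so that the example runs on its own.

```python
import numpy as np
import pandas as pd

# Group means copied from the earlier output (a, b, c, d).
ser = pd.Series([7 / 3, 2.5, 3.0, 4 / 3], index=list("abcd"), name="B")

# Positions that sort the negated values in ascending order,
# which is the same as sorting the original values in descending order.
order = np.argsort(-ser.to_numpy())

# iloc selects by position, so the labels stay attached to their values.
descending = ser.iloc[order]
print(descending)
```

Negation only makes sense for numeric data. For other dtypes, reversing an ascending order (for example with [::-1]) is one possible alternative.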


4) If you only care about the values (and not the index), use np.sort:

np.sort(ser)
# array([1.33333333, 2.33333333, 2.5       , 3.        ])

5) As a side note, in-place sorting (calling .sort() on ser.values) is possible but not recommended:

ser.values.sort() sorts the underlying values in place but leaves the index untouched, so each label ends up paired with a value that may belong to a different group. The resulting Series is therefore incorrect.
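The sketch below illustrates why this is a problem. It deliberately sorts a copy of the values and pairs it with the original index, which reproduces the mismatch without depending on whether ser.values is writable in the installed pandas version. The hand-built Series and the name misaligned are assumptions of this example.

```python
import numpy as np
import pandas as pd

# Group means copied from the earlier output (a, b, c, d).
ser = pd.Series([7 / 3, 2.5, 3.0, 4 / 3], index=list("abcd"), name="B")

# Sort a copy of the raw values; the index is not involved at all.
values = ser.to_numpy(copy=True)
values.sort()

# Pairing the sorted values with the untouched index shows the mismatch:
# label 'a' now carries the smallest value, which actually belongs to 'd'.
misaligned = pd.Series(values, index=ser.index, name=ser.name)
print(misaligned)
```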


[Old pre-v0.17 / 2015 methods: order, sort, sortUp, and sortDown were deprecated and have since been removed from pandas.]



Source: https://stackoverflow.com/questions/12133075/sorting-a-pandas-series
