Pandas sort_values does not sort numbers correctly

天大地大妈咪最大 提交于 2020-12-25 00:48:29

问题


I'm new to pandas and working with tabular data in a programming environment. I have sorted a dataframe by a specific column but the answer that panda spits out is not exactly correct.

Here is the code I have used:

league_dataframe.sort_values('overall_league_position')

The result that the sort method yields values in column 'overall league position' are not sorted in ascending or order which is the default for the method.

What am I doing wrong? Thanks for your patience!


回答1:


For whatever reason, you seem to be working with a column of strings, and sort_values is returning you a lexsorted result.

Here's an example.

df = pd.DataFrame({"Col": ['1', '2', '3', '10', '20', '19']})
df

  Col
0   1
1   2
2   3
3  10
4  20
5  19

df.sort_values('Col')

  Col
0   1
3  10
5  19
1   2
4  20
2   3

The remedy is to convert it to numeric, either using .astype or pd.to_numeric.

df.Col = df.Col.astype(float)

Or,

df.Col = pd.to_numeric(df.Col, errors='coerce')
df.sort_values('Col')

   Col
0    1
1    2
2    3
3   10
5   19
4   20

The only difference b/w astype and pd.to_numeric is that the latter is more robust at handling non-numeric strings (they're coerced to NaN), and will attempt to preserve integers if a coercion to float is not necessary (as is seen in this case).



来源:https://stackoverflow.com/questions/47914274/pandas-sort-values-does-not-sort-numbers-correctly

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!