Pandas DataFrame sort by categorical column but by specific class ordering

Anonymous (unverified), submitted 2019-12-03 09:02:45

Question:

I would like to select the top entries in a Pandas DataFrame based on the values of a specific column, using df_selected = df_targets.head(N).

Each entry has a target value (by order of importance):

Likely Supporter, GOTV, Persuasion, Persuasion+GOTV   

Unfortunately if I do

df_targets = df_targets.sort("target") 

the ordering will be alphabetical (GOTV, Likely Supporter, ...).

I was hoping for a keyword like list_ordering as in:

my_list = ["Likely Supporter", "GOTV", "Persuasion", "Persuasion+GOTV"]
df_targets = df_targets.sort("target", list_ordering=my_list)

To work around this, I created a dictionary that prefixes each value with its rank:

from collections import OrderedDict

dict_targets = OrderedDict()
dict_targets["Likely Supporter"] = "0 Likely Supporter"
dict_targets["GOTV"] = "1 GOTV"
dict_targets["Persuasion"] = "2 Persuasion"
dict_targets["Persuasion+GOTV"] = "3 Persuasion+GOTV"

but it seems like a non-Pythonic approach.

Suggestions would be much appreciated!

Answer 1:

I think you need a Categorical with the parameter ordered=True; sorting with sort_values then works well:

import pandas as pd

df = pd.DataFrame({'a': ['GOTV', 'Persuasion', 'Likely Supporter',
                         'GOTV', 'Persuasion', 'Persuasion+GOTV']})

df.a = pd.Categorical(df.a,
                      categories=["Likely Supporter", "GOTV", "Persuasion", "Persuasion+GOTV"],
                      ordered=True)

print(df)
                  a
0              GOTV
1        Persuasion
2  Likely Supporter
3              GOTV
4        Persuasion
5   Persuasion+GOTV

print(df.a)
0                GOTV
1          Persuasion
2    Likely Supporter
3                GOTV
4          Persuasion
5     Persuasion+GOTV
Name: a, dtype: category
Categories (4, object): [Likely Supporter < GOTV < Persuasion < Persuasion+GOTV]

df.sort_values('a', inplace=True)
print(df)
                  a
2  Likely Supporter
0              GOTV
3              GOTV
1        Persuasion
4        Persuasion
5   Persuasion+GOTV
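
To tie this back to the original goal of taking the top N rows, here is a minimal sketch. The frame name df_targets and the column name target follow the question; the sample rows and the cutoff N = 3 are assumptions made purely for illustration.

```python
import pandas as pd

# Hypothetical sample data; only the column name "target" comes from the question.
df_targets = pd.DataFrame({
    "target": ["GOTV", "Persuasion", "Likely Supporter",
               "GOTV", "Persuasion", "Persuasion+GOTV"],
})

# Priority order from the question, most important first.
list_ordering = ["Likely Supporter", "GOTV", "Persuasion", "Persuasion+GOTV"]

# Convert the column to an ordered categorical with the desired priority.
df_targets["target"] = pd.Categorical(
    df_targets["target"], categories=list_ordering, ordered=True
)

# mergesort is stable, so rows within the same category keep their original order.
N = 3  # assumed cutoff for illustration
df_selected = df_targets.sort_values("target", kind="mergesort").head(N)
print(df_selected)
```

Using a stable sort is a design choice rather than a requirement: it only matters if the original row order within a category should be preserved when the cutoff falls inside that category.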


Answer 2:

Thanks to jezrael's input and references,

I like this sliced solution:

list_ordering = ["Likely Supporter", "GOTV", "Persuasion", "Persuasion+GOTV"]
df["target"] = df["target"].astype("category", categories=list_ordering, ordered=True)


Answer 3:

The method shown in my previous answer is now deprecated.

Instead, it is best to use pandas.Categorical.

So:

list_ordering = ["Likely Supporter", "GOTV", "Persuasion", "Persuasion+GOTV"]
df["target"] = pd.Categorical(df["target"], categories=list_ordering)
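
For readers who prefer the astype style, current pandas versions accept a CategoricalDtype object in place of the deprecated categories= and ordered= keyword arguments. The sketch below assumes a small made-up frame; the names target_dtype, df_sorted, and mask are hypothetical and chosen only for this example.

```python
import pandas as pd
from pandas.api.types import CategoricalDtype

list_ordering = ["Likely Supporter", "GOTV", "Persuasion", "Persuasion+GOTV"]

# Hypothetical sample data for illustration.
df = pd.DataFrame(
    {"target": ["GOTV", "Persuasion+GOTV", "Likely Supporter", "Persuasion"]}
)

# A reusable dtype that carries both the categories and their order.
target_dtype = CategoricalDtype(categories=list_ordering, ordered=True)
df["target"] = df["target"].astype(target_dtype)

# Sorting follows the category order rather than alphabetical order.
df_sorted = df.sort_values("target")
print(df_sorted)

# With ordered=True, comparisons against a category value are also allowed.
mask = df["target"] <= "GOTV"
print(df[mask])
```

As far as I can tell, sort_values follows the category order whether or not ordered=True is set; the flag mainly matters for comparisons such as the mask above and for min/max.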

