Question:
I would like to select the top entries in a Pandas dataframe based on the values of a specific column, using df_selected = df_targets.head(N).
Each entry has a target value (in order of importance): Likely Supporter, GOTV, Persuasion, Persuasion+GOTV.
Unfortunately, if I do

    df_targets = df_targets.sort("target")

the ordering will be alphabetical (GOTV, Likely Supporter, ...).
I was hoping for a keyword like list_ordering, as in:

    my_list = ["Likely Supporter", "GOTV", "Persuasion", "Persuasion+GOTV"]
    df_targets = df_targets.sort("target", list_ordering=my_list)
To deal with this issue I create a dictionary that prefixes each value with its rank:

    from collections import OrderedDict

    dict_targets = OrderedDict()
    dict_targets["Likely Supporter"] = "0 Likely Supporter"
    dict_targets["GOTV"] = "1 GOTV"
    dict_targets["Persuasion"] = "2 Persuasion"
    dict_targets["Persuasion+GOTV"] = "3 Persuasion+GOTV"

This works, but it seems like a non-Pythonic approach.
Suggestions would be much appreciated!
Answer 1:
I think you need Categorical with the parameter ordered=True; sorting with sort_values then works very nicely:
    import pandas as pd

    df = pd.DataFrame({'a': ['GOTV', 'Persuasion', 'Likely Supporter', 'GOTV', 'Persuasion', 'Persuasion+GOTV']})
    df.a = pd.Categorical(df.a, categories=["Likely Supporter","GOTV","Persuasion","Persuasion+GOTV"], ordered=True)

    print (df)
                      a
    0              GOTV
    1        Persuasion
    2  Likely Supporter
    3              GOTV
    4        Persuasion
    5   Persuasion+GOTV

    print (df.a)
    0                GOTV
    1          Persuasion
    2    Likely Supporter
    3                GOTV
    4          Persuasion
    5     Persuasion+GOTV
    Name: a, dtype: category
    Categories (4, object): [Likely Supporter < GOTV < Persuasion < Persuasion+GOTV]

    df.sort_values('a', inplace=True)
    print (df)
                      a
    2  Likely Supporter
    0              GOTV
    3              GOTV
    1        Persuasion
    4        Persuasion
    5   Persuasion+GOTV
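
To connect this back to the head(N) selection in the question, here is a minimal sketch. The sample rows and the "name" column are made up for illustration; only the "target" column and its ordering come from the question. A stable sort kind is used so that rows with the same target keep their original relative order.

```python
import pandas as pd

# Hypothetical sample data; only the "target" values come from the question.
df_targets = pd.DataFrame({
    "name": ["A", "B", "C", "D", "E", "F"],
    "target": ["GOTV", "Persuasion", "Likely Supporter",
               "GOTV", "Persuasion", "Persuasion+GOTV"],
})

# Order of importance, most important first.
list_ordering = ["Likely Supporter", "GOTV", "Persuasion", "Persuasion+GOTV"]

# Turn the column into an ordered categorical so sorting follows list_ordering.
df_targets["target"] = pd.Categorical(
    df_targets["target"], categories=list_ordering, ordered=True
)

N = 3
# mergesort is stable: ties keep their original row order.
df_selected = df_targets.sort_values("target", kind="mergesort").head(N)
print(df_selected)
```

With this data, the selection should contain the single Likely Supporter row followed by the two GOTV rows.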
Answer 2:
Thanks to jezrael's input and references, I like this sliced solution:

    list_ordering = ["Likely Supporter","GOTV","Persuasion","Persuasion+GOTV"]
    df["target"] = df["target"].astype("category", categories=list_ordering, ordered=True)
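
Recent pandas versions no longer accept the categories and ordered keywords in astype. If you still prefer the astype style, one possible equivalent is to pass a CategoricalDtype instead. This is a sketch with made-up sample data, not part of the original answer:

```python
import pandas as pd
from pandas.api.types import CategoricalDtype

# Hypothetical sample data for illustration.
df = pd.DataFrame(
    {"target": ["GOTV", "Persuasion", "Likely Supporter", "Persuasion+GOTV"]}
)

list_ordering = ["Likely Supporter", "GOTV", "Persuasion", "Persuasion+GOTV"]

# The dtype object carries both the category list and the ordered flag.
target_dtype = CategoricalDtype(categories=list_ordering, ordered=True)
df["target"] = df["target"].astype(target_dtype)

# Sorting now follows list_ordering rather than alphabetical order.
df = df.sort_values("target")
print(df["target"].tolist())
```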
Answer 3:
The method shown in my previous answer is now deprecated.
Instead, it is best to use pandas.Categorical:

    list_ordering = ["Likely Supporter","GOTV","Persuasion","Persuasion+GOTV"]
    df["target"] = pd.Categorical(df["target"], categories=list_ordering)
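
If you would rather not change the column's dtype, another option that comes close to the list_ordering keyword the question asked for is the key argument of sort_values. The sketch below assumes pandas 1.1 or later, where key was introduced; the sample data and the rank helper dictionary are made up for illustration.

```python
import pandas as pd

# Hypothetical sample data for illustration.
df = pd.DataFrame(
    {"target": ["GOTV", "Persuasion", "Likely Supporter", "GOTV", "Persuasion+GOTV"]}
)

list_ordering = ["Likely Supporter", "GOTV", "Persuasion", "Persuasion+GOTV"]

# Map each label to its position in the desired ordering.
rank = {label: position for position, label in enumerate(list_ordering)}

# key receives the column as a Series and must return a Series of sort keys.
df_sorted = df.sort_values("target", key=lambda col: col.map(rank))
print(df_sorted["target"].tolist())
```

The column keeps its original string dtype here. Labels missing from list_ordering map to NaN and are placed last by default.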