Question
Given a DataFrame:
    name             email
0   Carl    carl@yahoo.com
1    Bob     bob@gmail.com
2  Alice   alice@yahoo.com
3  David  dave@hotmail.com
4    Eve     eve@gmail.com
How can it be sorted according to the email's domain name (alphabetically, ascending), and then, within each domain group, according to the string before the "@"?
The result of sorting the above should then be:
    name             email
0    Bob     bob@gmail.com
1    Eve     eve@gmail.com
2  David  dave@hotmail.com
3  Alice   alice@yahoo.com
4   Carl    carl@yahoo.com
Answer 1:
Option 1: sorted + reindex
df = df.set_index('email')
df.reindex(sorted(df.index, key=lambda x: x.split('@')[::-1])).reset_index()
              email   name
0     bob@gmail.com    Bob
1     eve@gmail.com    Eve
2  dave@hotmail.com  David
3   alice@yahoo.com  Alice
4    carl@yahoo.com   Carl
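The key function works because splitting on "@" and reversing the pieces puts the domain first, so Python's list comparison orders by domain and breaks ties on the part before the "@". A minimal sketch on plain strings, without pandas; the helper name domain_then_local is only illustrative and does not appear in the original answer:

```python
emails = [
    "carl@yahoo.com",
    "bob@gmail.com",
    "alice@yahoo.com",
    "dave@hotmail.com",
    "eve@gmail.com",
]


def domain_then_local(address):
    # "bob@gmail.com" -> ["bob", "gmail.com"] -> ["gmail.com", "bob"]
    return address.split("@")[::-1]


# Lists compare element by element: domain first, then the local part.
ordered = sorted(emails, key=domain_then_local)
print(ordered)
```

Passing this ordered list to reindex is what rearranges the rows in Option 1, which is why email has to be the index first.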
Option 2: sorted + pd.DataFrame
As an alternative, you can drop the reindex call from Option 1 by building a new DataFrame from the sorted rows.
pd.DataFrame(
sorted(df.values, key=lambda x: x[1].split('@')[::-1]),
columns=df.columns
)
    name             email
0    Bob     bob@gmail.com
1    Eve     eve@gmail.com
2  David  dave@hotmail.com
3  Alice   alice@yahoo.com
4   Carl    carl@yahoo.com
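Option 2 depends on email being the second column (x[1]). As a hedged variant, the position can be looked up by name so the key keeps working if the columns are reordered. The sample frame and the names email_pos, rows and out are assumptions made for illustration:

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["Carl", "Bob", "Alice", "David", "Eve"],
    "email": [
        "carl@yahoo.com",
        "bob@gmail.com",
        "alice@yahoo.com",
        "dave@hotmail.com",
        "eve@gmail.com",
    ],
})

# Position of the email column, looked up by name instead of hard-coding 1.
email_pos = df.columns.get_loc("email")

# df.values yields one array per row; sorted() only compares the key lists.
rows = sorted(df.values, key=lambda row: row[email_pos].split("@")[::-1])

# Rebuild a frame from the sorted rows with the original column labels.
out = pd.DataFrame(rows, columns=df.columns)
print(out)
```

Because the frame is rebuilt from raw values, the original index is discarded and the result gets a fresh default index.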
Answer 2:
Use:
df = df.reset_index(drop=True)
idx = df['email'].str.split('@', expand=True).sort_values([1,0]).index
df = df.reindex(idx).reset_index(drop=True)
print(df)
    name             email
0    Bob     bob@gmail.com
1    Eve     eve@gmail.com
2  David  dave@hotmail.com
3  Alice   alice@yahoo.com
4   Carl    carl@yahoo.com
Explanation:
- First, reset_index with drop=True for unique default indices
- Then split the values into a new DataFrame and sort_values
- Last, reindex to the new order
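The three steps can be traced on a frame whose index is deliberately non-unique, which is the case the initial reset_index guards against, since reindex raises on duplicate labels. A sketch under that assumption; the duplicated index values and the name parts are made up for illustration:

```python
import pandas as pd

df = pd.DataFrame(
    {
        "name": ["Carl", "Bob", "Alice", "David", "Eve"],
        "email": [
            "carl@yahoo.com",
            "bob@gmail.com",
            "alice@yahoo.com",
            "dave@hotmail.com",
            "eve@gmail.com",
        ],
    },
    index=[0, 0, 1, 1, 2],  # hypothetical duplicate labels
)

# Step 1: replace the index with a unique 0..n-1 range.
df = df.reset_index(drop=True)

# Step 2: column 0 holds the part before "@", column 1 the domain.
parts = df["email"].str.split("@", expand=True)
idx = parts.sort_values([1, 0]).index

# Step 3: reorder the original rows, then renumber them.
df = df.reindex(idx).reset_index(drop=True)
print(df)
```

Sorting the helper frame rather than df itself keeps the split columns out of the final result.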
Source: https://stackoverflow.com/questions/49727872/sort-pandas-dataframe-by-substrings-of-a-column