Sort Pandas Dataframe by substrings of a column

Posted by 和自甴很熟 on 2020-01-01 09:57:11

Question


Given a DataFrame:

    name             email
0   Carl    carl@yahoo.com
1    Bob     bob@gmail.com
2  Alice   alice@yahoo.com
3  David  dave@hotmail.com
4    Eve     eve@gmail.com

How can it be sorted according to the email's domain name (alphabetically, ascending), and then, within each domain group, according to the string before the "@"?

The result of sorting the above should then be:

    name             email
0    Bob     bob@gmail.com
1    Eve     eve@gmail.com
2  David  dave@hotmail.com
3  Alice   alice@yahoo.com
4   Carl    carl@yahoo.com

Answer 1:


Option 1
sorted + reindex

# Sort the email index by [domain, user], then turn email back into a column
df = df.set_index('email')
df.reindex(sorted(df.index, key=lambda x: x.split('@')[::-1])).reset_index()

              email   name
0     bob@gmail.com    Bob
1     eve@gmail.com    Eve
2  dave@hotmail.com  David
3   alice@yahoo.com  Alice
4    carl@yahoo.com   Carl
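
To see why the reversed split works as a sort key, here is a minimal sketch in plain Python, without pandas. The helper name domain_then_user is made up for illustration; it does the same thing as the lambda above. Python compares lists element by element, so the domain is compared first and the part before the "@" only breaks ties.

```python
emails = ["carl@yahoo.com", "bob@gmail.com", "alice@yahoo.com",
          "dave@hotmail.com", "eve@gmail.com"]

def domain_then_user(email):
    # Hypothetical helper, equivalent to the lambda in Option 1:
    # "carl@yahoo.com" -> ["carl", "yahoo.com"] -> ["yahoo.com", "carl"]
    return email.split("@")[::-1]

# Lists are compared lexicographically: domain first, then user.
ordered = sorted(emails, key=domain_then_user)
print(ordered)
```

This assumes every address contains exactly one "@"; malformed values would need separate handling.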

Option 2
sorted + pd.DataFrame
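Alternatively, you can drop the reindex call from Option 1 and build a new DataFrame from the sorted rows. This keeps the original column order, but note that x[1] assumes email is the second column.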

pd.DataFrame(
    sorted(df.values, key=lambda x: x[1].split('@')[::-1]), 
    columns=df.columns
)

    name             email
0    Bob     bob@gmail.com
1    Eve     eve@gmail.com
2  David  dave@hotmail.com
3  Alice   alice@yahoo.com
4   Carl    carl@yahoo.com



Answer 2:


Use:

df = df.reset_index(drop=True)
idx = df['email'].str.split('@', expand=True).sort_values([1,0]).index
df = df.reindex(idx).reset_index(drop=True)
print(df)
    name             email
0    Bob     bob@gmail.com
1    Eve     eve@gmail.com
2  David  dave@hotmail.com
3  Alice   alice@yahoo.com
4   Carl    carl@yahoo.com

Explanation:

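  1. First, call reset_index with drop=True so the index holds unique default values (0 to n-1)
  2. Then split the email values into a new DataFrame (column 0 is the part before the "@", column 1 is the domain) and sort it with sort_values by domain, then by user
  3. Finally, reindex the original DataFrame to the new order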
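
The three steps can be sketched as a self-contained script. The sample data is rebuilt from the question; the intermediate names parts and result are introduced here only for illustration.

```python
import pandas as pd

# Sample data taken from the question.
df = pd.DataFrame({
    "name": ["Carl", "Bob", "Alice", "David", "Eve"],
    "email": ["carl@yahoo.com", "bob@gmail.com", "alice@yahoo.com",
              "dave@hotmail.com", "eve@gmail.com"],
})

# Step 1: make sure the index is the unique default 0..n-1.
df = df.reset_index(drop=True)

# Step 2: split into two columns (0 = user, 1 = domain),
# then sort by domain first and user second.
parts = df["email"].str.split("@", expand=True)
idx = parts.sort_values([1, 0]).index

# Step 3: reorder the original rows and renumber them.
result = df.reindex(idx).reset_index(drop=True)
print(result)
```

Step 1 matters because reindex looks rows up by label: with duplicate index labels the lookup would be ambiguous.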


Source: https://stackoverflow.com/questions/49727872/sort-pandas-dataframe-by-substrings-of-a-column
