How to delete text before a specific character - Python (Pandas)

Submitted by 人盡茶涼 on 2021-02-05 07:43:14

Question


I have a column in a larger dataset that looks like:

Name
----
Mr. John Doe
Jack Daw
Prof. Charles Winchester
Jane Shaw
... etc.

(Names anonymized)

Basically, it's a list of names with prefixes mixed in. All prefixes end with a dot. So far, the prefixes have been limited to Mr., Mrs., Ms., Dr., and Prof.

The output I would like is:

Name
----
John Doe
Jack Daw
Charles Winchester
Jane Shaw
... etc.

Ideally, I would like a solution that relies on the position of the dot instead of having to create multiple if conditions (or something equivalent). Below is what I have attempted and where it went wrong:

def mid(s, offset, amount):
    return s[offset:offset+amount]
print(mid(Sample_Raw_Emp_Data['Name'],Sample_Raw_Emp_Data['Name'].str.find('.'),len(Sample_Raw_Emp_Data['Name'])))

Sample_Raw_Emp_Data['Name']=mid(Sample_Raw_Emp_Data['Name'],Sample_Raw_Emp_Data['Name'].str.find('.'),len(Sample_Raw_Emp_Data['Name']))

The above returned the error "TypeError: cannot do slice indexing on with these indexers"

I also tried:

print(Sample_Raw_Emp_Data['Name'][(Sample_Raw_Emp_Data['Name'].str.find('.')):])

This raised the same error as above.

A different approach:

Sample_Raw_Emp_Data['Name']=Sample_Raw_Emp_Data['Name'].str.rsplit('.', expand=True,n=1)[1]

The result looked like:

Name
----
John Doe
None
Charles Winchester
None
... etc.

Instances that used to have a prefix remained, while the rest became None. I am not sure how to retain both.

What is going wrong?


Answer 1:


Try this:

df['Name'].str.split(r'\.').str[-1].str.strip()

Output:

0              John Doe
1              Jack Daw
2    Charles Winchester
3             Jane Shaw
Name: Name, dtype: object
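
For anyone who wants to reproduce this, here is a minimal sketch built from the sample names in the question. It shows why the rsplit attempt produced None (with expand=True, a row without a dot has only one piece, so the second column is empty for it) and why taking the last piece per row with .str[-1] avoids that. The final part is not from the answer: it is an alternative, offered as an assumption about what you may want, that strips only a leading "word plus dot" prefix, so a dot later in the name (the hypothetical "Mr. John Q. Public") is left alone.

```python
import pandas as pd

# Sample data copied from the question; the name `df` follows the answer.
df = pd.DataFrame(
    {"Name": ["Mr. John Doe", "Jack Daw", "Prof. Charles Winchester", "Jane Shaw"]}
)

# The rsplit attempt: rows without a dot produce a single piece,
# so column 1 of the expanded frame is missing for them.
parts = df["Name"].str.rsplit(".", n=1, expand=True)

# The answer's approach: split on the dot and keep the LAST piece of each row.
# A row without a dot has exactly one piece, which is the full name.
last_piece = df["Name"].str.split(".").str[-1].str.strip()

# Alternative (an assumption, not part of the answer): remove only a leading
# prefix of the form "<word>." followed by whitespace.
prefix_pattern = r"^\s*\w+\.\s+"
prefix_stripped = df["Name"].str.replace(prefix_pattern, "", regex=True)

# Hypothetical name with a middle initial, to compare the two approaches.
tricky = pd.Series(["Mr. John Q. Public"])
tricky_last_piece = tricky.str.split(".").str[-1].str.strip()
tricky_stripped = tricky.str.replace(prefix_pattern, "", regex=True)

print(last_piece)
print(prefix_stripped)
```

On the four sample names both approaches give the same result. They differ only when a name contains a dot after the prefix: keeping the last piece would cut such a name down to whatever follows its final dot, while the anchored pattern removes just the prefix.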


Source: https://stackoverflow.com/questions/52045995/how-to-delete-text-before-a-specific-character-python-pandas
