Extract sub-string between 2 special characters from one column of Pandas DataFrame


I have a Python Pandas DataFrame like this:

Name  
Jim, Mr. Jones
Sara, Miss. Baker
Leila, Mrs. Jacob
Ramu, Master. Kuttan

I would like to ext


2 Answers

长发绾君心

2021-01-23 15:31

In [157]: df['Title'] = df.Name.str.extract(r',\s*([^\.]*)\s*\.', expand=False)

In [158]: df
Out[158]:
                   Name   Title
0        Jim, Mr. Jones      Mr
1     Sara, Miss. Baker    Miss
2     Leila, Mrs. Jacob     Mrs
3  Ramu, Master. Kuttan  Master

In [163]: df['Title'] = df.Name.str.split(r'\s*,\s*|\s*\.\s*').str[1]

In [164]: df
Out[164]:
                   Name   Title
0        Jim, Mr. Jones      Mr
1     Sara, Miss. Baker    Miss
2     Leila, Mrs. Jacob     Mrs
3  Ramu, Master. Kuttan  Master
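
To reproduce the two sessions above outside IPython, here is a minimal self-contained sketch. The DataFrame construction is an assumption based on the sample rows in the question, and the column names Title_extract and Title_split are made up for the comparison; the two expressions themselves are the ones shown above.

```python
import pandas as pd

# Sample data assumed from the question.
df = pd.DataFrame({
    "Name": [
        "Jim, Mr. Jones",
        "Sara, Miss. Baker",
        "Leila, Mrs. Jacob",
        "Ramu, Master. Kuttan",
    ]
})

# Approach 1: capture everything between the comma and the first dot.
df["Title_extract"] = df["Name"].str.extract(r",\s*([^\.]*)\s*\.", expand=False)

# Approach 2: split on comma or dot (with surrounding whitespace)
# and take the second piece.
df["Title_split"] = df["Name"].str.split(r"\s*,\s*|\s*\.\s*").str[1]

print(df)
```

Both columns should hold the same titles for this sample. The split variant depends on the title always being the second piece, so it is probably the more fragile of the two if a name contains extra commas or dots.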


無奈伤痛

2021-01-23 15:49

Have a look at str.extract.

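The regexp you are looking for is (?<=, )\w+(?=\.). In words: take the substring that is preceded by a comma and a space (not included in the match), consists of at least one word character, and is followed by a . (also not included). The dot must be escaped, since an unescaped . matches any character. Also note that str.extract requires a capture group, so wrap the \w+ in parentheses when passing it there: (?<=, )(\w+)(?=\.). In future, use an online regexp tester such as regex101; regexps become rather trivial that way.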

This is assuming each entry in the Name column is formatted the same way.
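
As a rough sketch of the lookaround approach with str.extract: the sample Series below is an assumption (two rows from the question plus one deliberately malformed row, "Leila Jacob", which is not in the original data), and the pattern is wrapped in a capture group because str.extract needs one.

```python
import pandas as pd

# Two rows from the question plus one hypothetical malformed row.
names = pd.Series(
    ["Jim, Mr. Jones", "Sara, Miss. Baker", "Leila Jacob"],
    name="Name",
)

# Lookbehind for ", ", capture the word characters, lookahead for a literal dot.
pattern = r"(?<=, )(\w+)(?=\.)"

titles = names.str.extract(pattern, expand=False)
print(titles)
```

Rows that do not follow the "First, Title. Last" layout come back as NaN rather than raising an error, so it may be worth checking titles.isna() before relying on the result.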
