Extract sub-string between 2 special characters from one column of Pandas DataFrame

轮回少年 2021-01-23 15:07

I have a Python Pandas DataFrame like this:

Name  
Jim, Mr. Jones
Sara, Miss. Baker
Leila, Mrs. Jacob
Ramu, Master. Kuttan 

I would like to ext

2 Answers
  • 2021-01-23 15:31
    In [157]: df['Title'] = df.Name.str.extract(r',\s*([^\.]*)\s*\.', expand=False)
    
    In [158]: df
    Out[158]:
                       Name   Title
    0        Jim, Mr. Jones      Mr
    1     Sara, Miss. Baker    Miss
    2     Leila, Mrs. Jacob     Mrs
    3  Ramu, Master. Kuttan  Master
    

    Alternatively, split on the comma and the period and take the second piece:

    In [163]: df['Title'] = df.Name.str.split(r'\s*,\s*|\s*\.\s*').str[1]
    
    In [164]: df
    Out[164]:
                       Name   Title
    0        Jim, Mr. Jones      Mr
    1     Sara, Miss. Baker    Miss
    2     Leila, Mrs. Jacob     Mrs
    3  Ramu, Master. Kuttan  Master
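
    For anyone who wants to run both approaches outside an IPython session, here is a minimal self-contained sketch. The DataFrame is rebuilt by hand from the four rows in the question, and the variable names title_extract and title_split are only illustrative.

```python
import pandas as pd

# Sample data rebuilt from the question (assumed to be a single "Name" column).
df = pd.DataFrame({
    "Name": [
        "Jim, Mr. Jones",
        "Sara, Miss. Baker",
        "Leila, Mrs. Jacob",
        "Ramu, Master. Kuttan",
    ]
})

# Approach 1: capture the text between the comma and the first period.
title_extract = df["Name"].str.extract(r",\s*([^.]*)\s*\.", expand=False)

# Approach 2: split on a comma or a period (plus surrounding whitespace)
# and keep the second piece.
title_split = df["Name"].str.split(r"\s*,\s*|\s*\.\s*").str[1]

df["Title"] = title_extract
print(df)
```

    Both Series should hold the same titles for this data; the extract version is probably the safer choice when a name might contain extra commas or periods.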
    
  • 2021-01-23 15:49

    Have a look at str.extract.

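    The regexp you are looking for is (?<=, )(\w+)(?=\.). In words: match one or more word characters that are preceded by a comma and a space and followed by a literal period, without including either delimiter in the match. The period has to be escaped, because an unescaped . matches any character, and str.extract needs the parentheses around \w+ because it returns only capture groups. In future, try an online regexp tester such as regex101; it makes patterns like this much easier to work out.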

    This is assuming each entry in the Name column is formatted the same way.
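
    A minimal sketch of the lookaround version, and of what happens when the formatting assumption does not hold. The third row ("No title here") is a made-up value, added only to show a non-matching entry.

```python
import pandas as pd

# Two rows from the question plus one hypothetical row with no ", Title." part.
names = pd.Series(
    ["Jim, Mr. Jones", "Sara, Miss. Baker", "No title here"],
    name="Name",
)

# Lookbehind for ", " and lookahead for a literal period; the capture group
# around \w+ is what str.extract returns.
pattern = r"(?<=, )(\w+)(?=\.)"
titles = names.str.extract(pattern, expand=False)

print(titles)
```

    Rows that do not match the pattern come back as NaN rather than raising an error, so a quick titles.isna().sum() is one way to check how many entries break the assumed format.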
