Using str.split for pandas dataframe values based on parentheses location

Posted by 此生再无相见时 on 2021-02-05 06:47:05

Question


Let's say I have the following df['Name'] column in a DataFrame:

         Name
       'Jerry'
  'Adam (and family)'
'Paul and Hellen (and family):\n'
'John and Peter (and family):/n'

How would I remove everything in Name from the first opening parenthesis onward?

df['Name']= df['Name'].str.split("'(").str[0] 

doesn't seem to work, and I don't understand why.

The output I want is

         Name
       'Jerry'
        'Adam'
    'Paul and Hellen'
    'John and Peter'

so everything from the opening parenthesis onward is deleted.


Answer 1:


Solution with split: the ( must be escaped with \ because a multi-character pattern is treated as a regular expression:

df['Name'] = df['Name'].str.split(r"\s+\(").str[0]
print(df)
               Name
0           'Jerry'
1             'Adam
2  'Paul and Hellen
3   'John and Peter
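
For reference, here is a self-contained sketch of the split approach. The sample data is an assumption modeled on the question, with the literal quote characters omitted, so the output does not show the stray leading quote seen above. Passing n=1 is an optional addition that stops after the first split.

```python
import pandas as pd

# Hypothetical sample data modeled on the question (quote characters omitted).
df = pd.DataFrame({"Name": ["Jerry",
                            "Adam (and family)",
                            "Paul and Hellen (and family):\n",
                            "John and Peter (and family):/n"]})

# The pattern is longer than one character, so pandas treats it as a regex;
# the raw string keeps the backslashes intact, and n=1 splits only once.
df["Name"] = df["Name"].str.split(r"\s+\(", n=1).str[0]

print(df["Name"].tolist())
# ['Jerry', 'Adam', 'Paul and Hellen', 'John and Peter']
```

Rows without a parenthesis, such as Jerry, are left unchanged because the split returns a one-element list.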

Solution with regex and replace:

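# regex=True is required in pandas 2.0+, where str.replace matches literally by default
df['Name'] = df['Name'].str.replace(r"\s+\(.*$", "", regex=True)
print(df)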
               Name
0           'Jerry'
1             'Adam
2  'Paul and Hellen
3   'John and Peter

\s+\(.*$ matches one or more whitespace characters, then the first (, then everything up to the end of the string ($); the match is replaced with "" (the empty string).
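
The behavior of this pattern can be checked with the standard re module alone. The sketch below is only illustrative: the input 'Adam(and family)' (no space before the parenthesis) and the looser \s* variant are assumptions added here, not part of the original answer.

```python
import re

# The pattern from the answer: at least one whitespace character is required.
strict = re.compile(r"\s+\(.*$")

# "Adam(and family)" is a hypothetical input with no space before "(".
samples = ["Jerry", "Adam (and family)", "Adam(and family)"]
cleaned = [strict.sub("", s) for s in samples]
print(cleaned)  # ['Jerry', 'Adam', 'Adam(and family)']

# Assumed variant: \s* also matches when there is no whitespace before "(".
loose = re.compile(r"\s*\(.*$")
loosened = loose.sub("", "Adam(and family)")
print(loosened)  # Adam
```

If names may be written without a space before the parenthesis, the \s* form is probably the safer choice.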




Answer 2:


Use a regular expression:

>>> import re
>>> name = 'Adam (and family)'  # avoid shadowing the built-in str
>>> result = re.sub(r"( \().*$", '', name)
>>> print(result)
Adam


Source: https://stackoverflow.com/questions/42205616/using-str-split-for-pandas-dataframe-values-based-on-parentheses-location
