Extract substring from text in a pandas DataFrame as new column

不想你离开。 提交于 2019-12-04 11:05:58

Use str.extract:

df['EXTRACT'] = df.TEXT.str.extract('({})'.format('|'.join(word_list)), 
                        flags=re.IGNORECASE, expand=False).str.lower().fillna('')
df['EXTRACT']

0      one
1      one
2    three
3    three
4      one
5      one
6      one
7         
8         
Name: EXTRACT, dtype: object

Each word in word_list is joined by the regex separator | and then passed to str.extract for regex pattern matching.

The re.IGNORECASE switch is turned on for case-insensitive comparisons, and the resultant matches are lowercased to match with your expected output.

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!