How to check if Pandas rows contain any full string or substring of a list?

Submitted by 无人久伴 on 2021-01-28 04:51:30

Question


I have a list of strings

list_ = ['abc', 'def', 'xyz']

I also have a df with a column CheckCol. I want to check whether each value in CheckCol contains any element of the list, either as the whole value or as a substring.

If it does, I want to extract the original value into a new column NewCol.

CheckCol
'a'
'ab'
'abc'
'abc-de'

Into

# What I want
CheckCol        NewCol
'a'
'ab'
'abc'           'abc'
'abc-de'       'abc-de'

My code below, however, only matches values that are exactly equal to a list element, not values that contain one as a substring.

df['NewCol'] = np.where(df['CheckCol'].isin(list_), df['CheckCol'], '')

And gives

# What I get
CheckCol        NewCol
'a'
'ab'
'abc'           'abc'
'abc-de'       

Edits: Changed list to list_


Answer 1:


I think the easiest solution to implement is a regular expression. In a regex, the pipe | means "or", so '|'.join(list_) builds a single pattern that matches whenever any of the substrings occurs in a value.

import pandas as pd
import numpy as np

list_ = ['abc', 'def', 'xyz']

df = pd.DataFrame({
    'CheckCol': ['a','ab','abc','abd-def']
})

df['NewCol'] = np.where(df['CheckCol'].str.contains('|'.join(list_)), df['CheckCol'], '')

print(df)

#  CheckCol   NewCol
#0        a         
#1       ab         
#2      abc      abc
#3  abd-def  abd-def

NOTE: Your variable name list was changed to list_. list is a Python built-in, so using it as a variable name shadows the built-in type; try to avoid that.
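
One caveat with joining the list directly: each element is interpreted as a regex, and missing values in CheckCol make str.contains return NaN. The following is a minimal sketch of a more defensive variant, not part of the original answer. The list element 'x.z' and the extra rows ('xyz', 'x.z', None) are hypothetical, added only to show the effect of escaping and of na=False.

```python
import re

import numpy as np
import pandas as pd

# Hypothetical list: 'x.z' contains a regex metacharacter (the dot).
list_ = ['abc', 'def', 'x.z']

# Hypothetical data, including a missing value.
df = pd.DataFrame({
    'CheckCol': ['a', 'ab', 'abc', 'abc-de', 'xyz', 'x.z', None]
})

# Escape every element so it is matched literally, then join with "|" (regex OR).
pattern = '|'.join(re.escape(s) for s in list_)

# na=False treats missing values as "no match" instead of propagating NaN.
mask = df['CheckCol'].str.contains(pattern, regex=True, na=False)

# Copy the original value where the mask is True, otherwise use an empty string.
df['NewCol'] = np.where(mask, df['CheckCol'], '')

print(df)
```

Without re.escape, the pattern 'x.z' would also match 'xyz', because an unescaped dot matches any character. If the list elements are known to be plain alphanumeric strings, the shorter '|'.join(list_) from the answer behaves the same.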



Source: https://stackoverflow.com/questions/53327023/how-to-check-if-pandas-rows-contain-any-full-string-or-substring-of-a-list
