Question
I have a list of strings
list_ = ['abc', 'def', 'xyz']
and a DataFrame df with a column CheckCol. I want to check whether each value in CheckCol contains any element of the list, either as the whole string or as a substring. If it does, I want to copy the original value into a new column NewCol.
CheckCol
'a'
'ab'
'abc'
'abc-de'
The result should be:
# What I want
CheckCol NewCol
'a'
'ab'
'abc' 'abc'
'abc-de' 'abc-de'
My code below, however, only matches values that equal a list element exactly; it does not match values that merely contain one as a substring.
df['NewCol'] = np.where(df['CheckCol'].isin(list_), df['CheckCol'], '')
This gives:
# What I get
CheckCol NewCol
'a'
'ab'
'abc' 'abc'
'abc-de'
Edit: renamed list to list_.
Answer 1:
I think the easiest solution to implement is a regular expression. In a regex the pipe | means "or", so '|'.join(list_) produces the pattern 'abc|def|xyz', which matches any value containing at least one of the list elements.
import pandas as pd
import numpy as np
list_ = ['abc', 'def', 'xyz']
df = pd.DataFrame({
    'CheckCol': ['a', 'ab', 'abc', 'abd-def']
})
# Keep the original value where it contains any list element, else an empty string
df['NewCol'] = np.where(df['CheckCol'].str.contains('|'.join(list_)), df['CheckCol'], '')
print(df)
#   CheckCol   NewCol
# 0        a
# 1       ab
# 2      abc      abc
# 3  abd-def  abd-def
NOTE: Your variable name list was changed to list_. list is not a reserved word, but it is the name of a Python built-in, and assigning to it shadows the built-in type, so it is best avoided.
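One possible hardening of the approach above, offered as a sketch rather than as part of the original answer: if the list elements could contain regex metacharacters (for example '.' or '+'), each one can be passed through re.escape before joining, and na=False makes missing values count as "no match". The None entry in the sample data and the names pattern and mask are assumptions added for illustration.

```python
import re

import numpy as np
import pandas as pd

list_ = ['abc', 'def', 'xyz']
# The trailing None is an assumed missing value, added to show na=False.
df = pd.DataFrame({'CheckCol': ['a', 'ab', 'abc', 'abd-def', None]})

# Escape each element so regex metacharacters are matched literally.
pattern = '|'.join(re.escape(s) for s in list_)

# na=False treats missing values as non-matching instead of returning NaN.
mask = df['CheckCol'].str.contains(pattern, regex=True, na=False)

# Keep the original value where the mask is True, else an empty string.
df['NewCol'] = np.where(mask, df['CheckCol'], '')
print(df)
```

For the plain alphabetic strings in the question the escaped pattern is identical to '|'.join(list_), so the escaping only matters once the list holds special characters.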
Source: https://stackoverflow.com/questions/53327023/how-to-check-if-pandas-rows-contain-any-full-string-or-substring-of-a-list