Question
This does not necessarily need to be done in pandas, but it would be nice if it could be.
Say I have a list or Series of strings:
['XXY8779','0060-19','McChicken','456728']
And I have another list or Series which contains sub-strings of the original like so:
['60-19','Chicken','8779','1124231','92871','johnson']
And for each string in the first list, I want to know whether it matches any string in the second list, returning something like:
[True, True, True, False]
I'm looking for a match that is something like:
^[a-zA-Z0-9.,$;]+ < matching string in other list >
In other words, a string that starts with one or more arbitrary characters and whose remainder exactly matches one of the strings in my other list.
Does anyone have any ideas on the best way to accomplish this?
Thanks!
Answer 1:
Use str.contains. '|'.join(s2) produces a single pattern string; str.contains treats it as a regular expression by default, so each | acts as a logical "or" between the substrings.
import pandas as pd

s1 = pd.Series(['XXY8779', '0060-19', 'McChicken', '456728'])
s2 = ['60-19', 'Chicken', '8779', '1124231', '92871', 'johnson']
# Join the substrings into one regex alternation and test every element of s1
s1.str.contains('|'.join(s2))
0     True
1     True
2     True
3    False
dtype: bool
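One thing to keep in mind with the approach above: str.contains matches the pattern anywhere in the string, and any regex metacharacters in s2 are interpreted as regex syntax. The following is a minimal sketch of a stricter variant, assuming you want the substring to sit at the end with at least one character before it, as the question describes. The names alternatives, pattern, mask, and result are illustrative only.

```python
import re

import pandas as pd

s1 = pd.Series(['XXY8779', '0060-19', 'McChicken', '456728'])
s2 = ['60-19', 'Chicken', '8779', '1124231', '92871', 'johnson']

# Escape each substring so characters such as '.' or '$' are matched literally.
alternatives = '|'.join(re.escape(s) for s in s2)

# '.+' requires at least one leading character; '$' anchors the match at the end.
pattern = f'.+(?:{alternatives})$'

mask = s1.str.contains(pattern, regex=True)
result = mask.tolist()
print(result)
```

With this pattern, a value such as '8779XX' would not match, because the substring is not at the end of the string.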
Answer 2:
Since the match is always at the end, you can use str.endswith together with any, which short-circuits as soon as one suffix matches. s1 and s2 are just your lists above (but it also works if they are pd.Series).
[any(i.endswith(j) for j in s2) for i in s1]
#[True, True, True, False]
You can then convert it to a Series with pd.Series, or just use that list as a mask as-is.
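As a small variation on the list comprehension above, Python's str.endswith also accepts a tuple of suffixes, which removes the inner loop. The sketch below shows that form, plus a stricter helper for the case where at least one character must precede the suffix. The helper name has_proper_suffix is hypothetical and not part of the original answer.

```python
s1 = ['XXY8779', '0060-19', 'McChicken', '456728']
s2 = ['60-19', 'Chicken', '8779', '1124231', '92871', 'johnson']

# str.endswith accepts a tuple and returns True if any suffix matches.
suffixes = tuple(s2)
mask = [s.endswith(suffixes) for s in s1]


def has_proper_suffix(text, candidates):
    """Return True if text ends with a candidate and is longer than it."""
    return any(len(text) > len(c) and text.endswith(c) for c in candidates)


# Stricter variant: an exact match such as 'Chicken' alone does not count.
strict_mask = [has_proper_suffix(s, s2) for s in s1]
print(mask, strict_mask)
```

For the sample data both masks agree; they differ only when an element of s1 is exactly equal to one of the strings in s2.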
Source: https://stackoverflow.com/questions/51085069/pandas-find-super-string-in-one-series-from-another-series