If I have a frame like this
frame = pd.DataFrame({'a' : ['the cat is blue', 'the sky is green', 'the dog is black']})
and I want to
Based on the comments under the accepted answer about extracting the matched string, here is another approach you can try.
import pandas as pd
frame = pd.DataFrame({'a' : ['the cat is blue', 'the sky is green', 'the dog is black']})
frame
                  a
0   the cat is blue
1  the sky is green
2  the dog is black
First, create the list of strings to match and extract, then join them into a single regex alternation pattern.
mylist = ['dog', 'cat', 'fish']
pattern = '|'.join(mylist)
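One caveat with the join above: it treats every list entry as a regular expression. If an entry may contain regex metacharacters, escaping each one first is a reasonable precaution. Here is a minimal sketch; the 'c++' entry and the name safe_pattern are made up for illustration and are not part of the original question.

```python
import re

mylist = ['dog', 'cat', 'fish']

# Hypothetical extra entry containing regex metacharacters ('+')
tricky_list = mylist + ['c++']

# re.escape makes each entry match literally before joining with '|'
safe_pattern = '|'.join(re.escape(s) for s in tricky_list)

match = re.search(safe_pattern, 'I write c++ code')
print(match.group(0))  # c++
```

Without the escaping step, re.search would raise an error on 'c++', because the second '+' is a quantifier applied to another quantifier.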
Now let's create a function that finds and extracts the matching substring.
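import re

def pattern_searcher(search_str: str, search_list: str) -> str:
    # search_list is the regex alternation built above, e.g. 'dog|cat|fish'
    search_obj = re.search(search_list, search_str)
    if search_obj:
        # Slice out the first matched substring
        return_str = search_str[search_obj.start():search_obj.end()]
    else:
        # No entry from the list occurs in the string
        return_str = 'NA'
    return return_str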
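Note that re.search returns only the first match, scanning the string from left to right, so the order of entries in the list does not decide which one wins. A small sketch to illustrate; the function is rewritten here in compact form only so the snippet is self-contained, and the test sentence is made up.

```python
import re

pattern = '|'.join(['dog', 'cat', 'fish'])

def pattern_searcher(search_str: str, search_list: str) -> str:
    # Compact equivalent of the function above, using match.group(0)
    search_obj = re.search(search_list, search_str)
    return search_obj.group(0) if search_obj else 'NA'

sentence = 'the cat chased the dog'

# 'cat' appears first in the string, even though 'dog' is first in the list
print(pattern_searcher(sentence, pattern))  # cat

# re.findall returns every match if you need all of them
print(re.findall(pattern, sentence))  # ['cat', 'dog']
```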
We will apply this function to the column with pandas.Series.apply (frame['a'] is a Series):
frame['matched_str'] = frame['a'].apply(lambda x: pattern_searcher(search_str=x, search_list=pattern))
Result:
                  a matched_str
0   the cat is blue         cat
1  the sky is green          NA
2  the dog is black         dog
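For comparison, the same result can probably be had without a custom function by using pandas' built-in Series.str.extract. This is a sketch of an alternative, not the method above; it assumes the pattern is wrapped in a single capture group and that missing matches should be filled with the string 'NA'.

```python
import pandas as pd

frame = pd.DataFrame({'a': ['the cat is blue', 'the sky is green', 'the dog is black']})
mylist = ['dog', 'cat', 'fish']
pattern = '|'.join(mylist)

# str.extract needs a capture group; expand=False returns a Series
# Rows with no match come back as NaN, so fill them with 'NA'
frame['matched_str'] = frame['a'].str.extract(f'({pattern})', expand=False).fillna('NA')

print(frame)
```

The apply version is easier to extend with custom logic, while str.extract keeps everything in one expression.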