Question
I have a CSV file that maps "KKR" terms to "MBI" terms. I want to look up user-supplied text and extract the longest matching phrase from the KKR column, ignoring a shorter phrase when its words are part of a longer matched phrase.
import pandas as pd

# os.chdir("kkr_lookup")
dfData = pd.read_csv("KKR_MBI_MAP.csv")  # read_csv already returns a DataFrame

dataVerbatim = {'verbatim': ['She experienced skin allergy and hair loss after using it for 2-3 weeks']}
dfVerbatim = pd.DataFrame(dataVerbatim, columns=['verbatim'])

# Print every KKR term that occurs anywhere in the verbatim text
for index, frame in dfData.iterrows():
    # "and" short-circuits, so str.contains is never called with a missing KKR value;
    # .any() replaces Series.bool(), which is deprecated in recent pandas
    if pd.notnull(frame['KKR']) and dfVerbatim['verbatim'].str.contains(frame['KKR'], case=False).any():
        k = frame['MBI'].lower()
        l = frame['KKR'].lower()
        print(l)
        # print("MBI:", k)
The code produces this output:
allergy
hair loss
skin allergy
But I need this output:
skin allergy
hair loss
My code extracts the terms from the user's input, but it returns both "allergy" and "skin allergy", whereas I only want "skin allergy". How can I fix this?
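As a sketch of the rule described above (drop a matched phrase when all of its words appear in a longer matched phrase), a hypothetical post-filter `drop_contained` could be applied to the list of terms the loop finds. This is an assumed way to express the rule, not code from the original question. One limitation: it would also drop a standalone "allergy" elsewhere in the text whenever "skin allergy" was matched, and the output keeps the order of the input list rather than the order in the sentence.

```python
def drop_contained(terms):
    """Keep only terms whose words are not all contained in a longer matched term."""
    word_sets = {t: set(t.lower().split()) for t in terms}
    kept = []
    for t in terms:
        words = word_sets[t]
        # Drop t if some other term has more words and includes every word of t
        contained = any(
            other != t and len(word_sets[other]) > len(words) and words <= word_sets[other]
            for other in terms
        )
        if not contained:
            kept.append(t)
    return kept


# Terms in the order the question's loop printed them
matches = ["allergy", "hair loss", "skin allergy"]
print(drop_contained(matches))  # ['hair loss', 'skin allergy']
```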
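Answer 1:
Combine the phrases into one regular expression, listing longer phrases before shorter ones. re.findall returns non-overlapping matches scanned left to right, and at each position the first alternative that matches wins, so "skin allergy" is matched and the "allergy" inside it is not reported again.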
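import re

list_of_strings = ["skin allergy", "hair loss", "allergy", "hair", "skin"]
sentence = "She experienced skin allergy and hair loss after using it for 2-3 weeks"

# Try longer phrases first, escape regex metacharacters, and put \b around the whole group
alternatives = sorted(list_of_strings, key=len, reverse=True)
pattern = re.compile(r"\b(" + "|".join(map(re.escape, alternatives)) + r")\b", re.IGNORECASE)
m = pattern.findall(sentence)
print(m)  # ['skin allergy', 'hair loss']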
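The answer uses a hard-coded list; a minimal sketch of applying the same longest-first regex to the question's mapping table follows. It assumes the CSV has the `KKR` and `MBI` columns used in the question; the small inline DataFrame stands in for `pd.read_csv("KKR_MBI_MAP.csv")`, and its MBI values and the helper `longest_kkr_matches` are hypothetical.

```python
import re

import pandas as pd

# Hypothetical stand-in for pd.read_csv("KKR_MBI_MAP.csv"); the MBI values are placeholders
dfData = pd.DataFrame({
    "KKR": ["allergy", "hair loss", "skin allergy", None],
    "MBI": ["MBI_A", "MBI_B", "MBI_C", "MBI_D"],
})


def longest_kkr_matches(text, mapping):
    """Return (KKR, MBI) pairs for the longest non-overlapping KKR phrases found in text."""
    rows = mapping.dropna(subset=["KKR"])
    # Lower-cased KKR phrase -> MBI value, used to map each match back to its row
    kkr_to_mbi = dict(zip(rows["KKR"].str.lower(), rows["MBI"]))
    if not kkr_to_mbi:
        return []  # an empty alternation would match the empty string everywhere
    # Longest phrases first, so the alternation prefers "skin allergy" over "allergy"
    phrases = sorted(kkr_to_mbi, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, phrases)) + r")\b", re.IGNORECASE)
    return [(m.lower(), kkr_to_mbi[m.lower()]) for m in pattern.findall(text)]


verbatim = "She experienced skin allergy and hair loss after using it for 2-3 weeks"
for kkr, mbi in longest_kkr_matches(verbatim, dfData):
    print(kkr, "->", mbi)
```

Because the phrases are sorted by length before being joined, the result does not depend on the row order in the CSV, and rows with an empty KKR cell are skipped.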
Source: https://stackoverflow.com/questions/58836195/comparing-data-using-lookup-and-output-only-longest-phrase-in-the-data-using-pyt