Comparing data using a lookup and outputting only the longest matching phrase in Python

Question

I have a CSV file that maps "KKR" phrases to "MBI" terms. I want to look up user-supplied text against the KKR column and extract only the longest matched phrases, ignoring a shorter phrase when its words are already part of a longer matched phrase.

import pandas as pd

#os.chdir("kkr_lookup")
# Load the KKR -> MBI mapping; read_csv already returns a DataFrame.
dfData = pd.read_csv("KKR_MBI_MAP.csv")

# User-supplied text to search.
dataVerbatim = {'verbatim': ['She experienced skin allergy and hair loss after using it for 2-3 weeks']}
dfVerbatim = pd.DataFrame(dataVerbatim, columns=['verbatim'])

# Print every KKR phrase that occurs anywhere in the verbatim text.
# `and` short-circuits, so rows with a missing KKR value never reach str.contains.
for index, frame in dfData.iterrows():
    if pd.notnull(frame['KKR']) and dfVerbatim['verbatim'].str.contains(frame['KKR'], case=False).any():
        k = frame['MBI'].lower()
        l = frame['KKR'].lower()
        print(l)
        #print("MBI:", k)

The code gives output as:

allergy
hair loss
skin allergy

But I need the output to be:

skin allergy
hair loss

The code above extracts the terms from the user input, but it returns both "allergy" and "skin allergy", whereas I need only "skin allergy" here. Please help.

Answer 1:

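import re

list_of_strings = ["skin allergy", "hair loss", "allergy", "hair", "skin"]
sentence = "She experienced skin allergy and hair loss after using it for 2-3 weeks"

# Put longer phrases first so the alternation prefers "skin allergy" over "allergy",
# and escape each phrase so it is matched literally.
alternatives = sorted(list_of_strings, key=len, reverse=True)
# \b on both sides of the group applies the word boundary to every alternative.
pattern = re.compile(r"\b(" + "|".join(map(re.escape, alternatives)) + r")\b")

m = pattern.findall(sentence)
print(m)  # ['skin allergy', 'hair loss']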
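
To connect the regex idea back to the question's DataFrame, here is a minimal sketch. It assumes the mapping has `KKR` and `MBI` columns as in the question. The sample rows, the MBI values, and the helper name `longest_matches` are hypothetical and only stand in for the real `KKR_MBI_MAP.csv`. It relies on Python's `re` alternation trying alternatives left to right, so sorting phrases longest-first makes the longer phrase win at any given position.

```python
import re
import pandas as pd

# Hypothetical stand-in for KKR_MBI_MAP.csv; the MBI values are invented for illustration.
df_map = pd.DataFrame({
    "KKR": ["allergy", "hair loss", "skin allergy", None],
    "MBI": ["Hypersensitivity", "Alopecia", "Dermatitis allergic", "Unmapped"],
})

def longest_matches(text, mapping):
    """Return (KKR, MBI) pairs for the longest non-overlapping KKR phrases in text."""
    # Lower-cased KKR phrase -> MBI term, skipping rows with no KKR value.
    lookup = {
        str(kkr).lower(): mbi
        for kkr, mbi in zip(mapping["KKR"], mapping["MBI"])
        if pd.notnull(kkr)
    }
    if not lookup:
        return []
    # Longest phrases first, so "skin allergy" is tried before "allergy".
    phrases = sorted(lookup, key=len, reverse=True)
    pattern = re.compile(
        r"\b(?:" + "|".join(map(re.escape, phrases)) + r")\b",
        re.IGNORECASE,
    )
    # finditer yields non-overlapping matches in order of appearance.
    return [
        (m.group(0).lower(), lookup[m.group(0).lower()])
        for m in pattern.finditer(text)
    ]

verbatim = "She experienced skin allergy and hair loss after using it for 2-3 weeks"
result = longest_matches(verbatim, df_map)
print(result)
```

Because the regex consumes "skin allergy" as one match, the shorter "allergy" is never reported for the same span. Note that `\b` only behaves as expected when phrases begin and end with word characters; phrases with leading or trailing punctuation would need a different boundary check.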

Source: https://stackoverflow.com/questions/58836195/comparing-data-using-lookup-and-output-only-longest-phrase-in-the-data-using-pyt

Tags

python

pattern-matching

lookup

string-matching
