Question
I have a table column that contains a description of the treatment done to resolve an issue; this text contains keywords.
In another list, I have the categories, each with the different keywords that help identify it.
For example:
Category | keywords
AAAA | keyword1
AAAA | keyword2 and keyword3
AAAA | keyword3 and not keyword4
BBBB | keyword4
BBBB | keyword5 and keyword6
BBBB | keyword7
How can I fill the category column in my first table (the one containing the descriptions) using the keywords found in each description?
For example:
Description | category
this free text keyword1 is done | AAAA
free sample2 keyword4 keyword3 | BBBB
The language I'm using is Python.
I found a similar case, but using Excel: https://exceljet.net/formula/categorize-text-with-keywords
Kind regards
Answer 1:
I would start by creating a list of tuples, where the first element of each tuple is the category and the second is a dictionary holding the keywords that must appear in the description ('in') and the keywords that must not appear ('out'). For example:
keyword_tuple = [('AAAA', {'in': ['kwrd1'], 'out': []}),
                 ('AAAA', {'in': ['kwrd2', 'kwrd3'], 'out': []}),
                 ('AAAA', {'in': ['kwrd3'], 'out': ['kwrd4']}),
                 ('BBBB', {'in': ['kwrd4'], 'out': []})]
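If your category table stores rules as text, as in the question ("keyword3 and not keyword4"), you could build keyword_tuple from it instead of typing it by hand. The sketch below uses a hypothetical helper, parse_rule, and assumes every rule is a chain of keywords joined only by " and " or " and not " (no "or", no parentheses, and no keyword that itself contains " and ").

```python
def parse_rule(rule):
    """Hypothetical helper: turn 'keyword3 and not keyword4' into
    {'in': ['keyword3'], 'out': ['keyword4']}.
    Assumes terms are joined only by ' and ', optionally prefixed with 'not '."""
    parsed = {'in': [], 'out': []}
    for term in rule.split(' and '):
        term = term.strip()
        if term.startswith('not '):
            # 'not X' means X must NOT appear in the description
            parsed['out'].append(term[len('not '):].strip())
        else:
            # a plain term must appear in the description
            parsed['in'].append(term)
    return parsed


# Rows of the category table from the question: (category, rule text)
category_table = [
    ('AAAA', 'keyword1'),
    ('AAAA', 'keyword2 and keyword3'),
    ('AAAA', 'keyword3 and not keyword4'),
    ('BBBB', 'keyword4'),
    ('BBBB', 'keyword5 and keyword6'),
    ('BBBB', 'keyword7'),
]

# Same shape as the answer's keyword_tuple: (category, {'in': [...], 'out': [...]})
keyword_tuple = [(category, parse_rule(rule)) for category, rule in category_table]
```

Keeping the rules as text in one place means new categories can be added to the table without touching the matching code.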
Once keyword_tuple is set up correctly, you can loop through your list of descriptions to determine which category each one belongs to. Let's store the results in a list of tuples called result_tuple,
where the first element is the description and the second is the corresponding category.
result_tuple = []
# description_list holds the descriptions from your table
for description in description_list:
    # Keep the categories whose rule is fully satisfied: every 'in' keyword
    # appears in the description and no 'out' keyword does. Both conditions
    # are checked on the same rule, so rules of one category are never mixed.
    categories = [cat for cat, kws in keyword_tuple
                  if all(kw in description for kw in kws['in'])
                  and not any(kw in description for kw in kws['out'])]
    # If several categories match, you need to come up with a decision rule;
    # here the first match (in keyword_tuple order) wins.
    if categories:
        category = categories[0]
    else:
        category = 'NO CATEGORY'
    result_tuple.append((description, category))
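One caveat with `kw in description` is that it matches substrings, so 'keyword1' would also match 'keyword10'. A possible alternative, sketched below, compares whole lower-cased words instead. This is a swapped-in matching choice, not part of the original answer; the function name categorize is hypothetical, and the sketch assumes every keyword is a single word (multi-word keywords would never match).

```python
import re

# Rules from the question in the answer's (category, {'in', 'out'}) shape
keyword_tuple = [
    ('AAAA', {'in': ['keyword1'], 'out': []}),
    ('AAAA', {'in': ['keyword2', 'keyword3'], 'out': []}),
    ('AAAA', {'in': ['keyword3'], 'out': ['keyword4']}),
    ('BBBB', {'in': ['keyword4'], 'out': []}),
    ('BBBB', {'in': ['keyword5', 'keyword6'], 'out': []}),
    ('BBBB', {'in': ['keyword7'], 'out': []}),
]


def categorize(description, rules, default='NO CATEGORY'):
    """Hypothetical helper: return the category of the first rule
    the description satisfies, or `default` if none does."""
    # Split into whole lower-cased words so 'keyword1' does not match 'keyword10'
    words = set(re.findall(r'\w+', description.lower()))
    for category, kws in rules:
        has_in = all(kw.lower() in words for kw in kws['in'])
        has_out = any(kw.lower() in words for kw in kws['out'])
        if has_in and not has_out:
            return category
    return default


description_list = [
    'this free text keyword1 is done',
    'free sample2 keyword4 keyword3',
]
result_tuple = [(d, categorize(d, keyword_tuple)) for d in description_list]
print(result_tuple)
# [('this free text keyword1 is done', 'AAAA'), ('free sample2 keyword4 keyword3', 'BBBB')]
```

The second description fails the 'keyword3 and not keyword4' rule because keyword4 is present, so it falls through to the BBBB rule, matching the expected output in the question.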
Source: https://stackoverflow.com/questions/50388822/categorise-text-in-column-using-keywords