How to find the similarity between two questions even though the words differ

青春壹個敷衍的年華 submitted on 2019-12-10 12:35:00

Question


Is there any way to tell whether two strings have a similar meaning, even when the words in them are different?

So far I have tried fuzzy-wuzzy, Levenshtein distance, and cosine similarity to match the strings, but all of them match the words, not the meaning of the words.

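from fuzzywuzzy import fuzz

# levenshtein_ratio_and_distance is the asker's own helper and is not shown in the question.
# Assumed minimal version: insert/delete cost 1, substitution cost 2 when ratio_calc=True.
def levenshtein_ratio_and_distance(s, t, ratio_calc=False):
    rows, cols = len(s) + 1, len(t) + 1
    dist = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        dist[i][0] = i
    for j in range(cols):
        dist[0][j] = j
    sub_cost = 2 if ratio_calc else 1
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if s[i - 1] == t[j - 1] else sub_cost
            dist[i][j] = min(dist[i - 1][j] + 1, dist[i][j - 1] + 1, dist[i - 1][j - 1] + cost)
    if ratio_calc:
        return (len(s) + len(t) - dist[-1][-1]) / (len(s) + len(t))
    return dist[-1][-1]

Str1 = "what are types of negotiation"
Str2 = "what are advantages of negotiation"
Str3 = "what are categories of negotiation"
Ratio = fuzz.ratio(Str1.lower(),Str2.lower())
Partial_Ratio = fuzz.partial_ratio(Str1.lower(),Str2.lower())
Token_Sort_Ratio = fuzz.token_sort_ratio(Str1,Str2)
Ratio1 = fuzz.ratio(Str1.lower(),Str3.lower())
Partial_Ratio1 = fuzz.partial_ratio(Str1.lower(),Str3.lower())
Token_Sort_Ratio1 = fuzz.token_sort_ratio(Str1,Str3)
print("fuzzywuzzy")
print(Str1," ",Str2," ",Ratio)
print(Str1," ",Str2," ",Partial_Ratio)
print(Str1," ",Str2," ",Token_Sort_Ratio)
print(Str1," ",Str3," ",Ratio1)
print(Str1," ",Str3," ",Partial_Ratio1)
print(Str1," ",Str3," ",Token_Sort_Ratio1)
print("levenshtein ratio")
Ratio = levenshtein_ratio_and_distance(Str1,Str2,ratio_calc = True)
Ratio1 = levenshtein_ratio_and_distance(Str1,Str3,ratio_calc = True)
print(Str1," ",Str2," ",Ratio)
print(Str1," ",Str3," ",Ratio1)  # Ratio1 holds the Str1/Str3 score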

output:
fuzzywuzzy
what are types of negotiation   what are advantages of negotiation   86
what are types of negotiation   what are advantages of negotiation   76
what are types of negotiation   what are advantages of negotiation   73
what are types of negotiation   what are categories of negotiation   86
what are types of negotiation   what are categories of negotiation   76
what are types of negotiation   what are categories of negotiation   73
levenshtein ratio
what are types of negotiation   what are advantages of negotiation   0.8571428571428571
what are types of negotiation   what are categories of negotiation   0.8571428571428571



expected output:
"what are the types of negotiation skill?"
"what are the categories in negotiation skill?"
output: similar
"what are the types of negotiation skill?"
"what are the advantages of negotiation skill?"
output: not similar

Answer 1:


You want to score the semantic similarity of two strings.

FuzzyWuzzy and Levenshtein distance only score character-level distance.
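A quick way to see this is the pure-Python sketch below, which is not FuzzyWuzzy's own code. The helper levenshtein_ratio (a hypothetical name) is an edit-distance ratio in which insertions and deletions cost 1 and substitutions cost 2, and bow_cosine (also hypothetical) is cosine similarity over bag-of-words counts. Both give the "advantages" question and the "categories" question exactly the same score against the "types" question, because at the character and word level the two pairs differ in the same way: "advantages" and "categories" are both 10 letters long and each shares the same 3 letters, in order, with "types".

```python
import math
from collections import Counter


def levenshtein_ratio(a: str, b: str) -> float:
    """Similarity in [0, 1] from an edit distance where insertions and
    deletions cost 1 and substitutions cost 2."""
    rows, cols = len(a) + 1, len(b) + 1
    # dist[i][j] = cost of turning a[:i] into b[:j]
    dist = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        dist[i][0] = i
    for j in range(cols):
        dist[0][j] = j
    for i in range(1, rows):
        for j in range(1, cols):
            sub_cost = 0 if a[i - 1] == b[j - 1] else 2
            dist[i][j] = min(dist[i - 1][j] + 1,             # deletion
                             dist[i][j - 1] + 1,             # insertion
                             dist[i - 1][j - 1] + sub_cost)  # substitution
    total = len(a) + len(b)
    return (total - dist[-1][-1]) / total if total else 1.0


def bow_cosine(a: str, b: str) -> float:
    """Cosine similarity between bag-of-words count vectors."""
    ca, cb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(ca[w] * cb[w] for w in ca)
    norm = (math.sqrt(sum(v * v for v in ca.values()))
            * math.sqrt(sum(v * v for v in cb.values())))
    return dot / norm if norm else 0.0


str1 = "what are types of negotiation"
str2 = "what are advantages of negotiation"
str3 = "what are categories of negotiation"

for other in (str2, str3):
    print(f"{other!r}: levenshtein={levenshtein_ratio(str1, other):.4f} "
          f"bow_cosine={bow_cosine(str1, other):.4f}")
# 'what are advantages of negotiation': levenshtein=0.8571 bow_cosine=0.8000
# 'what are categories of negotiation': levenshtein=0.8571 bow_cosine=0.8000
```

Neither score has any notion that "types" and "categories" are near-synonyms; that knowledge has to come from a model of word meaning.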

You need to account for semantic information, so you need a semantic representation of each string.

One simple but effective method is:

  1. Compute a vector representing each of your two strings, using pretrained word embeddings for your language (e.g. fastText's get_sentence_vector: https://fasttext.cc/docs/en/python-module.html#model-object).
  2. Compute the cosine similarity between the two vectors (close to 1: very similar strings; close to 0: unrelated strings).
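The two steps above can be sketched in plain Python. Loading a real fastText model requires downloading a large pretrained file, so this sketch swaps in a small hypothetical embedding table (TOY_VECTORS, hand-picked so that "types" and "categories" point in nearly the same direction and "advantages" does not). The helper names sentence_vector, cosine and is_similar, and the 0.8 threshold, are assumptions made for illustration. sentence_vector averages word vectors, which is the basic idea behind a sentence vector but not fastText's exact implementation; cosine computes u·v / (‖u‖ ‖v‖).

```python
import math
import re

# Hypothetical toy embeddings, hand-picked for illustration only.
# Real pretrained vectors (e.g. fastText) have hundreds of dimensions.
TOY_VECTORS = {
    "what": [0.0, 0.0, 0.2],
    "are": [0.0, 0.0, 0.2],
    "of": [0.0, 0.0, 0.2],
    "negotiation": [0.0, 0.0, 0.4],
    "types": [0.9, 0.1, 0.0],
    "categories": [0.85, 0.15, 0.0],
    "advantages": [0.0, 1.0, 0.0],
}


def sentence_vector(text, vectors=TOY_VECTORS):
    """Average the vectors of the known words in text; unknown words are skipped."""
    words = [w for w in re.findall(r"[a-z]+", text.lower()) if w in vectors]
    dim = len(next(iter(vectors.values())))
    if not words:
        return [0.0] * dim
    return [sum(vectors[w][i] for w in words) / len(words) for i in range(dim)]


def cosine(u, v):
    """Cosine similarity: dot product divided by the product of the norms."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def is_similar(q1, q2, threshold=0.8):
    # The 0.8 threshold is an arbitrary assumption; tune it on labelled question pairs.
    return cosine(sentence_vector(q1), sentence_vector(q2)) >= threshold


str1 = "what are types of negotiation"
str2 = "what are advantages of negotiation"
str3 = "what are categories of negotiation"
print(round(cosine(sentence_vector(str1), sentence_vector(str2)), 3))  # 0.577
print(round(cosine(sentence_vector(str1), sentence_vector(str3)), 3))  # 0.999
print(is_similar(str1, str3), is_similar(str1, str2))                  # True False
```

With a real model, sentence_vector would be replaced by the loaded model's get_sentence_vector. In real embeddings the four shared words ("what", "are", "of", "negotiation") are likely to dominate the average much more than in this toy table, so both scores will probably be high and the threshold has to be chosen from labelled examples.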

There are certainly better and more complex methods. For a deeper understanding of this subject, I suggest this post (https://medium.com/@adriensieg/text-similarities-da019229c894), which is rich in explanations and code implementations.



Source: https://stackoverflow.com/questions/58076861/how-to-find-similarity-between-two-question-even-though-the-words-are-differenti
