Calculate cosine similarity given 2 sentence strings

春和景丽 2020-11-22 14:00

From "Python: tf-idf-cosine: to find document similarity", it is possible to calculate document similarity using tf-idf cosine. Without importing external libraries, are that

6 answers
  •  长情又很酷
    2020-11-22 14:41

    Thanks @vpekar for your implementation; it helped a lot. I just found that it leaves out the tf-idf weights when calculating the cosine similarity. Counter(word) returns a dictionary-like object that maps each word to its number of occurrences, so the vectors hold raw term counts rather than tf-idf weights.
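    To make the point about Counter concrete, here is a minimal sketch of turning a sentence into a term-count mapping. The helper name text_to_counts and the \w+ tokenizer are assumptions made for illustration, not necessarily what the referenced implementation does.

```python
import re
from collections import Counter

# Hypothetical tokenizer: runs of word characters, lowercased.
WORD = re.compile(r"\w+")

def text_to_counts(text):
    """Map each word in `text` to its number of occurrences."""
    return Counter(WORD.findall(text.lower()))

counts = text_to_counts("This is a foo bar sentence about a foo")
# Counter behaves like a dict, but missing keys read as 0 instead of raising.
```

    These are plain term frequencies: a common word such as "a" counts just as much as a rare one, which is what the tf-idf weighting is meant to correct.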

    $$\cos(q, d) = \mathrm{sim}(q, d) = \frac{q \cdot d}{|q|\,|d|} = \frac{\sum_{i=1}^{V} q_i d_i}{\sqrt{\sum_{i=1}^{V} q_i^2}\,\sqrt{\sum_{i=1}^{V} d_i^2}}$$

    where i runs from 1 to V.

    • q_i is the tf-idf weight of term i in the query.
    • d_i is the tf-idf weight of term i in the document.
    • |q| and |d| are the lengths of q and d.
    • This is the cosine similarity of q and d or, equivalently, the cosine of the angle between q and d.
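    The formula above can be sketched with the standard library only. This is an illustration under stated assumptions, not the code linked in this answer: the function names, the \w+ tokenizer, and the smoothed idf variant log((1 + N) / (1 + df)) + 1 are all choices made here, since the answer does not say which idf formula it uses.

```python
import math
import re
from collections import Counter

WORD = re.compile(r"\w+")  # assumed tokenizer: runs of word characters

def tokenize(text):
    return WORD.findall(text.lower())

def idf_weights(corpus):
    """Smoothed idf per term (assumed variant): log((1 + N) / (1 + df)) + 1."""
    n_docs = len(corpus)
    doc_freq = Counter()
    for text in corpus:
        doc_freq.update(set(tokenize(text)))  # count each term once per document
    return {t: math.log((1 + n_docs) / (1 + df)) + 1.0
            for t, df in doc_freq.items()}

def tfidf_vector(text, idf):
    """Sparse vector {term: tf * idf}; terms unseen in the corpus get weight 0."""
    tf = Counter(tokenize(text))
    return {t: count * idf.get(t, 0.0) for t, count in tf.items()}

def cosine(q, d):
    """cos(q, d) = (q . d) / (|q| |d|) for sparse dict vectors."""
    dot = sum(w * d[t] for t, w in q.items() if t in d)
    norm_q = math.sqrt(sum(w * w for w in q.values()))
    norm_d = math.sqrt(sum(w * w for w in d.values()))
    if norm_q == 0.0 or norm_d == 0.0:
        return 0.0  # an all-zero vector has no direction
    return dot / (norm_q * norm_d)

def tfidf_cosine(query, document, corpus):
    idf = idf_weights(corpus)
    return cosine(tfidf_vector(query, idf), tfidf_vector(document, idf))

corpus = [
    "this is a foo bar sentence",
    "this sentence is similar to a foo bar sentence",
    "completely different words here",
]
similarity = tfidf_cosine(corpus[0], corpus[1], corpus)
```

    The idf weights need a corpus to be estimated from; with only the two sentences being compared, every shared term appears in both documents and the weighting has little to work with.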

    Please feel free to view my code here. First, though, you will need to install the Anaconda distribution, which automatically sets your Python path on Windows. Then add this Python interpreter in Eclipse.
