Simple implementation of N-Gram, tf-idf and Cosine similarity in Python

逝去的感伤 2020-11-28 17:58

I need to compare documents stored in a DB and come up with a similarity score between 0 and 1.

The method I need to use has to be very simple. Implementing a vanil

5 answers
  •  野趣味 (original poster)
    2020-11-28 18:23

    For our Information Retrieval course, we use some Java code written by our professor. Sorry, there is no Python port. "It is being released for educational and research purposes only under the GNU General Public License."

    You can check out the documentation: http://userweb.cs.utexas.edu/~mooney/ir-course/doc/

    But more specifically, check out: http://userweb.cs.utexas.edu/users/mooney/ir-course/doc/ir/vsr/HashMapVector.html

    You can download it here: http://userweb.cs.utexas.edu/users/mooney/ir-course/
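
Since the code linked above is Java only, here is a rough Python sketch of the general idea named in the question title: word n-grams, tf-idf weights stored in a sparse dict, and cosine similarity. It is not a port of the course code. The function names (`ngrams`, `tfidf_vectors`, `cosine`), the regex tokenizer, and the smoothed idf formula are all assumptions chosen for illustration, and it uses only the standard library.

```python
import math
import re
from collections import Counter


def ngrams(text, n=1):
    """Lowercased word n-grams of `text`, as a list of tuples."""
    tokens = re.findall(r"\w+", text.lower())
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def tfidf_vectors(docs, n=1):
    """Return one sparse {term: weight} dict per document."""
    counts = [Counter(ngrams(d, n)) for d in docs]
    # Document frequency: in how many documents each term appears.
    df = Counter(term for c in counts for term in c)
    total = len(docs)
    # Smoothed idf (an assumption): log((1 + N) / (1 + df)) + 1, always > 0.
    return [
        {t: tf * (math.log((1 + total) / (1 + df[t])) + 1) for t, tf in c.items()}
        for c in counts
    ]


def cosine(a, b):
    """Cosine similarity of two sparse vectors; 0.0 if either is empty."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


if __name__ == "__main__":
    docs = ["the cat sat on the mat", "the cat lay on the rug", "dogs bark loudly"]
    vecs = tfidf_vectors(docs, n=1)
    print(cosine(vecs[0], vecs[1]))  # some overlap, so between 0 and 1
    print(cosine(vecs[0], vecs[2]))  # no shared terms
```

Because every weight is non-negative, the score stays between 0 and 1, which matches what the question asks for. Passing `n=2` switches to bigrams. For documents stored in a DB, you would load the text column into `docs` first.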
