Algorithm to find articles with similar text


梦谈多话 2020-11-28 18:10

I have many articles in a database (each with a title and text). I'm looking for an algorithm to find the X most similar articles, something like Stack Overflow's "Related Questions".

15 answers

生来不讨喜 (original poster)

2020-11-28 18:45
You can use any of the following:
1. MinHash/LSH: https://en.wikipedia.org/wiki/MinHash (see also the MinHash chapter of http://infolab.stanford.edu/~ullman/mmds/book.pdf, and http://ann-benchmarks.com/ for the state of the art in approximate nearest-neighbor search)
2. Collaborative filtering, if you have data on user interactions with articles (clicks/likes/views): https://en.wikipedia.org/wiki/Collaborative_filtering
3. word2vec or similar embeddings, to compare articles in a "semantic" vector space: https://en.wikipedia.org/wiki/Word2vec
4. Latent semantic analysis: https://en.wikipedia.org/wiki/Latent_semantic_analysis
5. Bag-of-words with a distance measure such as the Jaccard coefficient to compute set similarity: https://en.wikipedia.org/wiki/Jaccard_index, https://en.wikipedia.org/wiki/Bag-of-words_model
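To make the Jaccard and MinHash options concrete, here is a minimal sketch in plain Python (standard library only). It is an illustration, not a tuned implementation. The function names (`tokens`, `jaccard`, `minhash_signature`, `estimate_jaccard`, `most_similar`), the sample articles, and the choice of salted `hashlib.blake2b` as the hash family are all my own assumptions. It treats each article as a set of lowercase words, computes the exact Jaccard index, and also shows how a MinHash signature approximates that same value.

```python
import hashlib
import re


def tokens(text):
    """Set of lowercase words in a text (a set-based bag of words)."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a, b):
    """Exact Jaccard index |A & B| / |A | B| of two sets."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def _hash(token, seed):
    """64-bit hash of a token; the seed selects one of many hash functions."""
    salt = seed.to_bytes(8, "big")
    digest = hashlib.blake2b(token.encode("utf-8"), digest_size=8, salt=salt)
    return int.from_bytes(digest.digest(), "big")


def minhash_signature(token_set, num_hashes=128):
    """For each seeded hash function, keep the minimum hash over the set."""
    # default=2**64 is a sentinel above any 64-bit hash, used for empty sets.
    return [
        min((_hash(tok, seed) for tok in token_set), default=2**64)
        for seed in range(num_hashes)
    ]


def estimate_jaccard(sig_a, sig_b):
    """Fraction of matching signature positions; estimates the Jaccard index."""
    matches = sum(x == y for x, y in zip(sig_a, sig_b))
    return matches / len(sig_a)


def most_similar(query_id, articles, x=3):
    """Ids of the x articles most similar to query_id, by exact Jaccard."""
    query = tokens(articles[query_id])
    scored = [
        (jaccard(query, tokens(text)), other_id)
        for other_id, text in articles.items()
        if other_id != query_id
    ]
    scored.sort(reverse=True)
    return [other_id for _, other_id in scored[:x]]


# Hypothetical sample data, for illustration only.
articles = {
    "a": "How to find similar articles in a database",
    "b": "Find similar articles in a large database quickly",
    "c": "Baking sourdough bread at home",
}

print(most_similar("a", articles, x=1))
sig_a = minhash_signature(tokens(articles["a"]))
sig_b = minhash_signature(tokens(articles["b"]))
print(jaccard(tokens(articles["a"]), tokens(articles["b"])),
      estimate_jaccard(sig_a, sig_b))
```

The exact version compares the query against every article, which is fine for a few thousand rows. The point of MinHash is that signatures are small and can be precomputed and stored per article. With LSH banding on top (not shown here), you only compare candidates that share a bucket instead of scanning everything.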