Algorithm to find articles with similar text

梦谈多话 2020-11-28 18:10

I have many articles in a database (each with a title and text). I'm looking for an algorithm to find the X most similar articles, something like Stack Overflow's "Related Questions".

15 answers
  •  生来不讨喜
    2020-11-28 18:45

    You can use any of the following:

    1. MinHash / LSH (locality-sensitive hashing): https://en.wikipedia.org/wiki/MinHash
       See also the MinHash chapter of http://infolab.stanford.edu/~ullman/mmds/book.pdf, and http://ann-benchmarks.com/ for state-of-the-art approximate nearest-neighbor search.

    2. Collaborative filtering, if you have data on user interactions with the articles (clicks, likes, views): https://en.wikipedia.org/wiki/Collaborative_filtering

    3. word2vec or similar embeddings, to compare articles in a "semantic" vector space: https://en.wikipedia.org/wiki/Word2vec

    4. Latent semantic analysis: https://en.wikipedia.org/wiki/Latent_semantic_analysis

    5. A bag-of-words representation combined with a distance measure, such as the Jaccard coefficient for set similarity: https://en.wikipedia.org/wiki/Bag-of-words_model, https://en.wikipedia.org/wiki/Jaccard_index
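
    As a rough illustration of the bag-of-words plus Jaccard option, here is a minimal sketch. The function names (`tokenize`, `jaccard`, `most_similar`), the regex tokenizer, and the sample `articles` dict are all assumptions made for the example; a real setup would read the articles from the database and would probably want stop-word removal or stemming.

```python
import re


def tokenize(text):
    """Lowercase the text and return its set of distinct word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a, b):
    """Jaccard index |A & B| / |A | B|; treated as 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def most_similar(query_id, articles, x=3):
    """Return up to x (article_id, score) pairs, most similar to query_id first."""
    query_tokens = tokenize(articles[query_id])
    scores = [
        (other_id, jaccard(query_tokens, tokenize(text)))
        for other_id, text in articles.items()
        if other_id != query_id
    ]
    # Highest Jaccard score first.
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:x]


# Hypothetical sample data standing in for rows from the articles table.
articles = {
    "a": "How to train a neural network in Python",
    "b": "Train a neural network with Python and NumPy",
    "c": "Best pasta recipes for a quick dinner",
}

print(most_similar("a", articles, x=2))
```

    This compares the query against every other article, so it only suits small collections. MinHash with LSH (option 1) approximates the same Jaccard similarity without doing all pairwise comparisons.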
