What tried and true algorithms for suggesting related articles are out there?

遇见更好的自我 2020-12-07 10:28

Pretty common situation, I'd wager. You have a blog or news site and you have plenty of articles or blags or whatever you call them, and you want to, at the bottom of each,

5 answers
  •  半阙折子戏
    2020-12-07 11:26

    You should read the book "Programming Collective Intelligence: Building Smart Web 2.0 Applications" (ISBN 0596529325)!

    As for method and code: first ask yourself whether you want to find direct similarities based on word matches, or whether you want to show similar articles that may not relate directly to the current one but belong to the same cluster of articles.

    For the cluster-based approach, see Cluster analysis / Partitional clustering.

    A very simple (but theoretical and slow) method for finding direct similarities would be:

    Preprocess:

    1. Store a flat word list per article (do not remove duplicate words).
    2. "Cross join" the articles: count the number of words in article A that match the same words in article B. You now have a matrix int word_matches[narticles][narticles] (you should not store it like that: the similarity of A->B is the same as B->A, so storing only one triangle of the matrix saves almost half the space).
    3. Normalize the word_matches counts to the range 0..1 (find the max count, then divide every count by it). You should store floats there, not ints ;)

    Find similar articles:

    1. Select the X articles with the highest match scores from word_matches.
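
The steps above could be sketched in Python roughly as follows. This is only an illustration under a few assumptions that the answer does not spell out: the tokenizer is a simple regular expression, a word shared k times by both articles counts k times (a multiset intersection), and only one triangle of the matrix is kept in a dictionary keyed by (i, j) with i < j. The names tokenize, build_similarity and most_similar are made up for this example.

```python
from collections import Counter
from itertools import combinations
import re


def tokenize(text):
    """Lowercase the text and split it into a flat word list (duplicates kept)."""
    return re.findall(r"[a-z0-9']+", text.lower())


def build_similarity(articles):
    """Return {(i, j): score} for i < j, with scores normalized to 0..1."""
    counts = [Counter(tokenize(text)) for text in articles]
    matches = {}
    for i, j in combinations(range(len(articles)), 2):
        # Assumed matching rule: multiset intersection, so a word that
        # appears k times in both articles contributes k matches.
        matches[(i, j)] = sum((counts[i] & counts[j]).values())
    max_count = max(matches.values(), default=0)
    if max_count == 0:
        # No shared words anywhere: avoid dividing by zero.
        return {pair: 0.0 for pair in matches}
    return {pair: n / max_count for pair, n in matches.items()}


def most_similar(similarity, article, x):
    """Return the x article indices with the highest score against `article`."""
    scored = []
    for (i, j), score in similarity.items():
        if i == article:
            scored.append((score, j))
        elif j == article:
            scored.append((score, i))
    # Highest score first; ties are broken by the lower article index.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [other for _, other in scored[:x]]


articles = [
    "the cat sat on the mat",
    "the cat chased the dog",
    "stock markets fell sharply today",
]
similarity = build_similarity(articles)
print(most_similar(similarity, 0, 2))
```

As the answer says, this is slow: every pair of articles is compared, so the work grows quadratically with the number of articles. It also treats very common words like "the" the same as rare ones, so in practice you would probably want to filter or down-weight them.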
