News clustering

深忆病人 2021-01-30 18:34

How do Google News and Techmeme cluster similar news items? Are there any well-known algorithms used to achieve this?

Appreciate your help.

Thanks

3 answers
  •  别跟我提以往
    2021-01-30 19:18

    One fairly common way to cluster text by content is to apply Principal Component Analysis (PCA) to the word vectors, followed by a simple clustering algorithm such as K-Means. Each word vector has n dimensions, one per possible word, and its magnitude along each dimension is the number of occurrences of that word in that particular article.
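
    A minimal sketch of that pipeline, assuming NumPy is available. It is only an illustration of the word-count, PCA, then K-Means idea, not a description of what Google News or Techmeme actually run. The function names (bag_of_words, pca, kmeans, cluster_articles), the regex tokenizer, and the farthest-point seeding of K-Means are all assumptions made for this example.

```python
import re
from collections import Counter

import numpy as np


def bag_of_words(docs):
    """Build the word-count matrix: one row per article, one column per word."""
    tokenized = [re.findall(r"[a-z']+", d.lower()) for d in docs]
    vocab = sorted({w for toks in tokenized for w in toks})
    index = {w: i for i, w in enumerate(vocab)}
    X = np.zeros((len(docs), len(vocab)))
    for row, toks in enumerate(tokenized):
        for word, count in Counter(toks).items():
            X[row, index[word]] = count
    return X


def pca(X, n_components):
    """Project the rows of X onto the top principal components (via SVD)."""
    centered = X - X.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T


def kmeans(points, k, n_iter=50):
    """Lloyd's algorithm. Seeding is deterministic (farthest-point), an
    assumption made here so the example gives the same result every run."""
    centroids = [points[0]]
    while len(centroids) < k:
        dists = np.min(
            [np.linalg.norm(points - c, axis=1) for c in centroids], axis=0
        )
        centroids.append(points[np.argmax(dists)])
    centroids = np.array(centroids)
    for _ in range(n_iter):
        # Assign each point to its nearest centroid.
        d = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # Recompute centroids; keep the old one if a cluster is empty.
        updated = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(updated, centroids):
            break
        centroids = updated
    return labels


def cluster_articles(docs, k, n_components=2):
    """Word counts -> PCA -> K-Means; returns one cluster label per article."""
    reduced = pca(bag_of_words(docs), n_components)
    return kmeans(reduced, k).tolist()


if __name__ == "__main__":
    articles = [
        "Apple unveils new iPhone with faster chip",
        "New iPhone from Apple has a faster chip and camera",
        "Central bank raises interest rates again",
        "Interest rates rise as central bank fights inflation",
    ]
    print(cluster_articles(articles, k=2))
```

    Articles that receive the same label are treated as covering the same story. The number of clusters k has to be chosen in advance with K-Means, which is a limitation for a live news feed where the number of stories is not known.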
