Algorithm to find articles with similar text

I have many articles in a database (each with a title and text). I'm looking for an algorithm to find the X most similar articles, something like Stack Overflow's "Related Questions".

15 answers

闹比i

2020-11-28 18:47

If you are looking for words that sound alike, you could convert each word to its Soundex code and then match on the Soundex codes. That worked for me.
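
As a rough illustration of the Soundex idea, here is a minimal sketch of the classic American Soundex rules in Python. The function names `soundex` and `sounds_alike` are made up for this example, and the sketch assumes plain English words; it is not taken from any particular library.

```python
def soundex(word: str) -> str:
    """Return the 4-character American Soundex code for a word."""
    # Map each consonant to its Soundex digit; vowels, H, W and Y have no digit.
    codes = {}
    for group, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                         ("L", "4"), ("MN", "5"), ("R", "6")):
        for ch in group:
            codes[ch] = digit

    letters = [c for c in word.upper() if c.isalpha()]
    if not letters:
        return ""

    result = letters[0]                 # the first letter is kept as-is
    prev = codes.get(letters[0], "")    # digit of the previous coded letter
    for ch in letters[1:]:
        if ch in "HW":
            continue                    # H and W do not separate equal digits
        digit = codes.get(ch, "")
        if digit and digit != prev:
            result += digit             # skip repeats of the same digit
        prev = digit                    # a vowel resets prev to ""
    return (result + "000")[:4]         # pad or truncate to 4 characters


def sounds_alike(a: str, b: str) -> bool:
    """Hypothetical helper: two words match if their Soundex codes are equal."""
    return soundex(a) == soundex(b)
```

For example, `sounds_alike("Robert", "Rupert")` is true because both names map to the same code. Note that this matches individual words by sound, so on its own it is a building block rather than a full article-similarity measure.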

梦如初夏

2020-11-28 18:48

I tried several methods, but none worked well. You may get a relatively satisfactory result like this. First, compute a Google SimHash code for every paragraph of every text and store it in the database. Second, build an index on the SimHash codes. Third, process the text to be compared in the same way to get its SimHash code, then use the index to find all texts whose codes are within a Hamming distance of about 5-10. Finally, compare the similarity of those candidates using term vectors. This may work for big data.
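
The steps above can be sketched roughly as follows. This is only an illustration under several assumptions: the fingerprint is a simple 64-bit SimHash built from MD5 token hashes, the candidate search is a linear scan standing in for a real SimHash index, and the helper names (`tokens`, `simhash`, `hamming`, `cosine`, `most_similar`) are hypothetical.

```python
import hashlib
import math
import re
from collections import Counter

BITS = 64  # fingerprint size used in this sketch


def tokens(text):
    """Lowercase word tokens; a real system would also drop stop words."""
    return re.findall(r"\w+", text.lower())


def simhash(text):
    """64-bit SimHash: each token votes on every bit, weighted by its count."""
    weights = [0] * BITS
    for token, count in Counter(tokens(text)).items():
        digest = hashlib.md5(token.encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")  # first 64 bits of the hash
        for i in range(BITS):
            weights[i] += count if (h >> i) & 1 else -count
    return sum(1 << i for i in range(BITS) if weights[i] > 0)


def hamming(a, b):
    """Number of bit positions in which two fingerprints differ."""
    return bin(a ^ b).count("1")


def cosine(a, b):
    """Cosine similarity between the term-frequency vectors of two texts."""
    va, vb = Counter(tokens(a)), Counter(tokens(b))
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm = (math.sqrt(sum(c * c for c in va.values()))
            * math.sqrt(sum(c * c for c in vb.values())))
    return dot / norm if norm else 0.0


def most_similar(query, articles, max_distance=10, top_x=5):
    """Filter by Hamming distance, then rank the candidates by cosine."""
    q = simhash(query)
    # Linear scan for clarity; in a database the fingerprints would be
    # precomputed and looked up through an index instead.
    candidates = [a for a in articles if hamming(q, simhash(a)) <= max_distance]
    return sorted(candidates, key=lambda a: cosine(query, a), reverse=True)[:top_x]
```

The two-stage design is the point: the cheap fingerprint comparison narrows the candidate set, and the more expensive term-vector comparison only runs on what survives. The `max_distance` threshold follows the 5-10 range suggested above and would need tuning on real data.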

佛祖请我去吃肉

2020-11-28 18:52

Seconding the Lucene suggestion for full-text search, but note that Java is not a requirement; a .NET port is available. Also see the main Lucene page for links to other projects, including Lucy, a C port.
