How to find that one text is similar to part of another?

独自空忆成欢 submitted on 2019-12-13 03:45:17

Question


We know how to assess the similarity of two whole texts, for example with Word Mover's Distance. How can we find a piece inside one text that is similar to another text?


Answer 1:


You could break the text into chunks – ideally by natural groupings, like sentences or paragraphs – then do pairwise comparisons of every chunk against every other, using some text-distance measure.
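A minimal sketch of the chunk-and-compare idea, under some loud assumptions: sentences are split with a naive regex, and the "text-distance measure" is plain bag-of-words cosine similarity, used here only as a cheap stand-in for something like Word Mover's Distance. The function names (`split_sentences`, `most_similar_chunk`) and the sample texts are made up for illustration. It handles the simplest case, where chunks of the longer text are scored against the whole of the shorter one.

```python
import re
from collections import Counter
from math import sqrt


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def bag_of_words(text):
    """Lowercased word counts; a deliberately simple text representation."""
    return Counter(re.findall(r"[a-z0-9']+", text.lower()))


def cosine_similarity(a, b):
    """Cosine similarity between two word-count Counters."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar_chunk(long_text, query):
    """Score every sentence of long_text against query; return (score, sentence)."""
    query_bow = bag_of_words(query)
    scored = [
        (cosine_similarity(bag_of_words(chunk), query_bow), chunk)
        for chunk in split_sentences(long_text)
    ]
    return max(scored, key=lambda pair: pair[0])


long_text = (
    "The weather was cold this morning. "
    "Word vectors map words to points in space. "
    "We had soup for lunch."
)
query = "word vectors place words in a vector space"

score, chunk = most_similar_chunk(long_text, query)
print(f"{score:.3f}  {chunk}")
```

Swapping `cosine_similarity` for a stronger measure leaves the rest of the loop unchanged; the cost grows with the number of chunk pairs, which is why chunk size matters.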

Word Mover's Distance can give impressive results, but it is quite slow and expensive to calculate, especially for large texts and large numbers of pairwise comparisons. Simpler summary vectors for a text – such as a plain average of all the text's word-vectors, or a text-vector learned from the text, like 'Paragraph Vector' (aka Doc2Vec) – will be much faster and might be good enough, or at least serve as a quick first pass that limits the number of candidate pairs on which you run something more expensive.
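The quick first pass could look roughly like the sketch below. Everything concrete in it is an assumption: `TOY_VECTORS` is a hand-made 3-dimensional table standing in for real pretrained word-vectors, and `average_vector` and `shortlist` are hypothetical helper names. The idea is only to show averaging word-vectors per chunk, ranking chunks by cosine similarity to the query, and keeping the top few for a more expensive measure such as Word Mover's Distance.

```python
from math import sqrt

# Toy word-vectors, invented for illustration only. In practice these
# would come from a pretrained word-vector model.
TOY_VECTORS = {
    "cat": [0.9, 0.1, 0.0],
    "dog": [0.8, 0.2, 0.0],
    "pet": [0.85, 0.15, 0.05],
    "stock": [0.0, 0.9, 0.2],
    "market": [0.1, 0.8, 0.3],
    "rain": [0.0, 0.1, 0.9],
    "cloud": [0.1, 0.0, 0.8],
}


def average_vector(text, vectors=TOY_VECTORS):
    """Average the vectors of all known words; None if no word is known."""
    known = [vectors[w] for w in text.lower().split() if w in vectors]
    if not known:
        return None
    dim = len(known[0])
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def shortlist(chunks, query, top_k=2):
    """Cheap first pass: keep the top_k chunks closest to the query."""
    query_vec = average_vector(query)
    if query_vec is None:
        return []
    scored = []
    for chunk in chunks:
        chunk_vec = average_vector(chunk)
        if chunk_vec is not None:
            scored.append((cosine(chunk_vec, query_vec), chunk))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [chunk for _, chunk in scored[:top_k]]


chunks = ["the stock market fell", "a cat and a dog", "rain from a cloud"]
candidates = shortlist(chunks, "my pet dog", top_k=1)
print(candidates)
```

Only the chunks in `candidates` would then be passed to the slower comparison, so the expensive step runs on `top_k` pairs instead of all of them.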



Source: https://stackoverflow.com/questions/55609922/how-to-find-that-one-text-is-similar-to-the-part-of-another
