Shorten a text and only keep important sentences

Posted by 好久不见 on 2019-12-03 16:46:45

Usually you want to keep the sentences that contain words which are distinctive to that particular article.

That is, the more "generic" a sentence is, the less it tells you about this particular article.
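One common way to make "generic" measurable is inverse document frequency (IDF). The answer doesn't name it; it is my choice here. A word that shows up in most articles of a collection gets a weight near zero, while a word confined to a few articles gets a high weight. This minimal sketch assumes you have a collection of other articles to compare against. `idf_weights`, `specificity` and the `unseen_weight` parameter are hypothetical names used only for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase word tokens; deliberately crude (letters and apostrophes only)."""
    return re.findall(r"[a-z']+", text.lower())


def idf_weights(documents):
    """Map each word to its inverse document frequency across `documents`.

    A word found in every document gets weight log(1) == 0; rarer words score higher.
    """
    n_docs = len(documents)
    doc_freq = Counter()
    for doc in documents:
        doc_freq.update(set(tokenize(doc)))  # count each word once per document
    return {w: math.log((n_docs + 1) / (df + 1)) for w, df in doc_freq.items()}


def specificity(sentence, idf, unseen_weight):
    """Mean IDF of the sentence's words: low means 'generic', high means specific.

    `unseen_weight` (hypothetical parameter) is used for words absent from the collection.
    """
    words = tokenize(sentence)
    if not words:
        return 0.0
    return sum(idf.get(w, unseen_weight) for w in words) / len(words)
```

For example, take the three documents "The cat sat on the mat.", "The dog chased the ball." and "The photosynthesis rate of kelp depends on light.". In that collection, `the` occurs in every document and gets weight 0, so `specificity("on the mat", ...)` comes out lower than `specificity("kelp photosynthesis light", ...)`.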

A common way to do this is a Bayesian analysis, much like a spam filter. First determine which words appear in the article more often than you'd expect, judging by how often they occur in a larger background collection of text. Then pick the sentences that feature those words.
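The two steps above can be sketched as follows. The answer doesn't spell out a scoring formula, so this sketch swaps in a simple smoothed frequency ratio: a word's rate in the article divided by its Laplace-smoothed rate in a background corpus. That is in the spirit of spam-filter token scoring, not a full Bayesian model. `background_counts` (word counts from some large, general text collection), the `min_ratio` threshold, `max_sentences` and the naive sentence splitter are all assumptions for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase word tokens; deliberately crude (letters and apostrophes only)."""
    return re.findall(r"[a-z']+", text.lower())


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def keyword_scores(article, background_counts, min_ratio=2.0):
    """Step 1: words that occur more often in the article than the background predicts.

    Returns {word: log(observed_rate / expected_rate)} for words whose ratio
    is at least `min_ratio` (a hypothetical threshold; tune it for your data).
    """
    counts = Counter(tokenize(article))
    total = sum(counts.values())
    if total == 0:
        return {}
    bg_total = sum(background_counts.values())
    vocab = len(set(counts) | set(background_counts))
    scores = {}
    for word, c in counts.items():
        observed = c / total
        # Add-one (Laplace) smoothing: words missing from the background
        # still get a small, nonzero expected rate.
        expected = (background_counts.get(word, 0) + 1) / (bg_total + vocab)
        ratio = observed / expected
        if ratio >= min_ratio:
            scores[word] = math.log(ratio)
    return scores


def summarize(article, background_counts, max_sentences=2):
    """Step 2: keep the sentences that feature the most keyword weight."""
    keywords = keyword_scores(article, background_counts)
    sentences = split_sentences(article)

    def sentence_score(i):
        # Sum of keyword weights; note this favours longer sentences.
        return sum(keywords.get(w, 0.0) for w in tokenize(sentences[i]))

    ranked = sorted(range(len(sentences)), key=sentence_score, reverse=True)
    keep = sorted(ranked[:max_sentences])  # restore the original order
    return " ".join(sentences[i] for i in keep)
```

Summing keyword weights rewards long sentences. Dividing each score by the sentence's word count is an easy variation if that bias is unwanted. The background counts matter a lot. If they come from too small a sample, ordinary words look "unusual" and get picked up as keywords.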

This is an active research topic in Computational Linguistics (CL). The shallow approach, using Bayesian filtering, is unlikely to yield perfect results, but you probably don't need perfect results anyway.

In CL, the 80-20 rule quickly becomes the 95-5 rule: the last few percent of quality cost far more effort than everything before them. So if you are content with what shallow methods can achieve, you can skip the rest of this answer.

If you want to see whether you can improve on those results, look for better resources. The task you're describing is called 'text summarization' in the research community, and it has its own web page, which is hopelessly outdated. Mani and Maybury (1999) is probably a good overview (I haven't read it myself), but it is also quite antiquated. More recent, and quite exhaustive, is Martin Hassel's dissertation on the topic, which also covers language-independent (read: statistical, i.e. shallow) methods.

As always, Google can help you too: just search for 'text summarization'.
