Dataset link for Text Summarization?

Submitted by 非 Y 不嫁゛ on 2019-12-22 08:08:15

Question


Does anyone have a download link for a text summarization dataset such as DUC 2007 or TREC? Please help.


Answer 1:


You can use http://archive.ics.uci.edu/ml/datasets/Legal+Case+Reports for an extraction-based text summarization approach. It contains catchPhrase entries, which can act as the selected sentences for training. However, the catchphrases may not be entirely appropriate for this purpose.




Answer 2:


You can access the DUC dataset after completing the required organization and individual agreements. See http://www-nlpir.nist.gov/projects/duc/data.html for more information.




Answer 3:


You can write a sitemap crawler in Scrapy for:

  • buzzfeed
  • huffingtonpost
  • deadspin
  • gizmodo

That may give you around 1.45 million abstracts and articles.
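As a rough illustration of the sitemap idea, the core step is reading a sitemap file and collecting the page URLs listed in its `<loc>` elements. The sketch below uses only the Python standard library rather than Scrapy (whose `SitemapSpider` class handles this step for you); the function name `extract_sitemap_urls` and the `example.com` sample are hypothetical, and real sitemaps for the sites above may be split into nested index files or be gzip-compressed.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def extract_sitemap_urls(xml_text):
    """Parse sitemap XML and return (kind, urls).

    kind is "index" when the file lists other sitemaps (which should be
    fetched and parsed in turn) and "urlset" when it lists article pages.
    """
    root = ET.fromstring(xml_text)
    kind = "index" if root.tag == SITEMAP_NS + "sitemapindex" else "urlset"
    urls = [loc.text.strip() for loc in root.iter(SITEMAP_NS + "loc") if loc.text]
    return kind, urls


# Hypothetical sample; a real crawler would download this from the site.
SAMPLE = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/articles/first-story</loc></url>
  <url><loc>https://example.com/articles/second-story</loc></url>
</urlset>"""

if __name__ == "__main__":
    kind, urls = extract_sitemap_urls(SAMPLE)
    print(kind, len(urls))
```

Each collected URL would then be fetched and parsed for its headline or abstract and its body text. Check each site's robots.txt and terms of use before crawling at scale.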

You can also check the harvardnlp sent-summary dataset and the CNN/Daily Mail dataset, which provide additional article stories.

Warning: Because these all come from different sources, their writing styles may differ.




Answer 4:


You could try the "BBC News Summary" dataset from Kaggle.

Inside you will find two folders: one with the original articles and one with their summaries. There are five news categories: business, entertainment, politics, sport, and tech. Each category has around 500 article-summary pairs.
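A minimal sketch of pairing the two folders might look like the following. It assumes that each folder contains one subfolder per category and that an article and its summary share the same relative file name; the function name `load_pairs` and the directory names in the usage note are hypothetical, so adjust them to match the downloaded archive.

```python
from pathlib import Path


def load_pairs(articles_dir, summaries_dir, encoding="utf-8"):
    """Return a list of dicts with keys "category", "article" and "summary".

    Assumes the layout <dir>/<category>/<name>.txt in both folders, with
    matching file names for an article and its summary.
    """
    articles_dir = Path(articles_dir)
    summaries_dir = Path(summaries_dir)
    pairs = []
    for article_path in sorted(articles_dir.glob("*/*.txt")):
        rel = article_path.relative_to(articles_dir)
        summary_path = summaries_dir / rel
        if not summary_path.is_file():
            continue  # skip articles that have no matching summary
        pairs.append({
            "category": rel.parts[0],
            # errors="replace" keeps loading even if a file has odd bytes
            "article": article_path.read_text(encoding=encoding, errors="replace"),
            "summary": summary_path.read_text(encoding=encoding, errors="replace"),
        })
    return pairs
```

For example, `load_pairs("articles", "summaries")` would return one entry per matched file, which can then be split into training and test sets per category.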



Source: https://stackoverflow.com/questions/14959104/dataset-link-for-text-summarization
