How to scrape newspaper articles from a website using Selenium and BeautifulSoup in Python?

南笙 2021-01-16 23:44

I am trying to collect the date, title, and content of articles from a newspaper (The New York Times).

I got the date and title, but I couldn't get the full article. Below i

3 answers
  •  孤独总比滥情好
    2021-01-17 00:26

    To scrape newspaper articles, you can use the goose library, which is simple and elegant. It gives you the cleaned article text and the title. For the date, you can try BeautifulSoup.

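    # Assumes Python 3 with the goose3 package (the maintained fork of goose) and requests installed.
    from goose3 import Goose
    from requests import get

    # Download the article page
    response = get('http://www.nytimes.com/2015/05/19/health/study-finds-dense-breast-tissue-isnt-always-a-high-cancer-risk.html?src=me&ref=general')
    # Extract the cleaned body text and the title from the raw HTML
    extractor = Goose()
    article = extractor.extract(raw_html=response.content)
    text = article.cleaned_text
    title = article.title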
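
For the date, a minimal BeautifulSoup sketch could look like the following. It assumes the page exposes the publication date either in an `article:published_time` meta tag or in a `<time datetime="...">` element; both are common conventions, but the actual markup of the target site should be checked. The function name `extract_published_date` and the sample HTML (including its date value) are made up for illustration.

```python
from bs4 import BeautifulSoup


def extract_published_date(html):
    """Return the publication date string found in the page, or None."""
    soup = BeautifulSoup(html, "html.parser")

    # Assumption: the date is exposed in an Open Graph style meta tag.
    meta = soup.find("meta", attrs={"property": "article:published_time"})
    if meta and meta.get("content"):
        return meta["content"]

    # Fallback: the first <time> element that carries a datetime attribute.
    time_tag = soup.find("time", attrs={"datetime": True})
    if time_tag:
        return time_tag["datetime"]

    return None


# Hypothetical sample markup, used here instead of a live request.
sample_html = """
<html><head>
<meta property="article:published_time" content="2015-05-18T21:00:00Z">
</head><body><time datetime="2015-05-18">May 18, 2015</time></body></html>
"""

print(extract_published_date(sample_html))
```

With a real page, you would pass `response.text` from the request above instead of `sample_html`.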
    
