Python: Newspaper Module - Any way to pool getting articles straight from URLs?

广开言路 2021-01-01 04:06

I'm using the Newspaper module for Python found here.

In the tutorials, it describes how you can pool the building of different newspapers so that it generates them at

4 Answers
  •  一个人的身影
    2021-01-01 04:28

    I was able to do this by creating a Source for each article URL. (Disclaimer: not a Python developer.)

    import newspaper

    urls = [
        'http://www.baltimorenews.net/index.php/sid/234363921',
        'http://www.baltimorenews.net/index.php/sid/234323971',
        'http://www.atlantanews.net/index.php/sid/234323891',
        'http://www.wpbf.com/news/funeral-held-for-gabby-desouza/33874572',
    ]

    class SingleSource(newspaper.Source):
        """A Source that wraps exactly one article URL."""
        def __init__(self, articleURL):
            # The base URL is only a placeholder; the pool works on self.articles.
            super(SingleSource, self).__init__("http://localhost")
            self.articles = [newspaper.Article(url=articleURL)]

    sources = [SingleSource(articleURL=u) for u in urls]

    # news_pool downloads the articles of every source on worker threads.
    newspaper.news_pool.set(sources)
    newspaper.news_pool.join()

    for s in sources:
        print(s.articles[0].html)
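
For comparison, the same "one download per URL, run in parallel" idea can be sketched with the standard library's thread pool instead of news_pool. This is only a rough alternative, not something taken from the Newspaper tutorials: fetch_all and download_html are hypothetical helper names, and download_html assumes the newspaper package is installed and that Article.download() fills in Article.html.

```python
from concurrent.futures import ThreadPoolExecutor


def fetch_all(urls, fetch_one, max_workers=4):
    """Run fetch_one(url) for every URL on a thread pool.

    Returns a dict mapping each URL to a (result, error) pair;
    exactly one element of the pair is None.
    """
    def safe_fetch(url):
        try:
            return fetch_one(url), None
        except Exception as exc:  # one bad URL should not stop the rest
            return None, exc

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in the same order as the input URLs
        outcomes = list(pool.map(safe_fetch, urls))
    return dict(zip(urls, outcomes))


def download_html(url):
    # Hypothetical helper: assumes the newspaper package is installed.
    # Imported lazily so fetch_all itself needs only the standard library.
    import newspaper
    article = newspaper.Article(url=url)
    article.download()
    return article.html
```

Calling fetch_all(urls, download_html) would then give one (html, error) pair per URL. Keeping the fetch function as a parameter means the pooling logic can be tried out with a stub before touching the network.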
    
