Scraping text without JavaScript code using Scrapy

野性不改 2020-12-18 07:03

I'm currently setting up a bunch of spiders using Scrapy. These spiders are supposed to extract only text (articles, forum posts, paragraphs, etc.) from the

3 answers
  •  悲&欢浪女
    2020-12-18 07:38

    Try the utility functions from w3lib.html:

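    from w3lib.html import remove_tags, remove_tags_with_content

    # extract() returns a list of strings; join them so w3lib gets a single string
    html = ''.join(hxs.select('//div[@id="content"]').extract())
    # Drop <script> elements along with their contents, then strip the remaining tags
    output = remove_tags(remove_tags_with_content(html, ('script', )))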
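
If it helps to see the idea without any third-party dependency, here is a rough standard-library sketch of the same technique: collect text nodes and ignore everything inside script elements. It is not how w3lib is implemented; the names `VisibleTextParser`, `visible_text` and the sample markup are made up for illustration, and skipping `<style>` as well is my own assumption.

```python
from html.parser import HTMLParser


class VisibleTextParser(HTMLParser):
    """Collect text nodes, skipping anything inside <script> or <style>.

    Hypothetical helper for illustration only; not part of Scrapy or w3lib.
    """

    SKIPPED = {"script", "style"}  # assumption: style blocks are unwanted too

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._skip_depth = 0  # greater than 0 while inside a skipped element
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside skipped elements.
        if self._skip_depth == 0 and data.strip():
            self._chunks.append(data.strip())

    def text(self):
        return " ".join(self._chunks)


def visible_text(html):
    """Return the text of an HTML fragment without script/style contents."""
    parser = VisibleTextParser()
    parser.feed(html)
    parser.close()
    return parser.text()


sample = (
    '<div id="content"><p>Hello</p>'
    "<script>var tracker = 1;</script>"
    "<p>World</p></div>"
)
print(visible_text(sample))  # prints "Hello World"
```

Inside a spider callback you would pass the extracted HTML string for the content node to `visible_text` instead of the sample. Note that text chunks are joined with a single space here, which is a choice of this sketch.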
    
