Scraping text without JavaScript code using Scrapy

Backend, unresolved

野性不改 2020-12-18 07:03

I'm currently setting up a bunch of spiders using Scrapy. These spiders are supposed to extract only text (articles, forum posts, paragraphs, etc.) from the …

3 answers

悲&欢浪女 (original poster)

2020-12-18 07:38

Try the utility functions from w3lib.html:

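from w3lib.html import remove_tags, remove_tags_with_content

# In a Scrapy callback; hxs.select() is the old HtmlXPathSelector API.
# .get() returns the first match as one string (.extract() returned a list).
html = response.xpath('//div[@id="content"]').get(default='')
# Remove <script> elements along with their JS, then strip the remaining tags.
text = remove_tags(remove_tags_with_content(html, ('script',)))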
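To see the idea without installing anything, here is a minimal sketch of the same job using Python's standard-library html.parser instead of w3lib. This is a swapped-in technique, not the method from the answer. It assumes the content div has already been pulled out as an HTML string, and the names TextExtractor and html_to_text are hypothetical. Text inside <script> and <style> is skipped, whitespace-only fragments are dropped, and the remaining fragments are joined with single spaces.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Hypothetical helper: collect text, skipping <script>/<style> contents."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0  # > 0 while inside a skipped element
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-blank text that is not inside a skipped element.
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())


def html_to_text(html):
    """Hypothetical wrapper: return the visible text of an HTML fragment."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()  # flush any buffered trailing text
    return " ".join(parser.chunks)


if __name__ == "__main__":
    sample = ('<div id="content"><p>Hello <b>world</b></p>'
              '<script>alert("hi");</script><p>Bye</p></div>')
    print(html_to_text(sample))
```

Running it prints "Hello world Bye": the JavaScript inside the <script> element never reaches the output.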