How can I get all the plain text from a website with Scrapy?

夕颜 2020-11-30 03:24

I would like to get all the text that is visible on a website after the HTML is rendered. I'm working in Python with the Scrapy framework. With xpath('//body//text()')

3 answers
  •  刺人心 (OP)
    2020-11-30 04:03

    The xpath('//body//text()') expression doesn't always dive deeper into the nodes under the last tag you used (in your case, body). If you run xpath('//body/node()/text()').extract(), you will see the nodes that are in your HTML body. You can also try xpath('//body/descendant::text()').
