Question
I have a raw HTML string that I want to convert to a Scrapy HTML response object so that I can use the css and xpath selectors, similar to Scrapy's response. How can I do it?
Answer 1:
First of all, if it is for debugging or testing purposes, you can use the Scrapy shell:
$ cat index.html
<div id="test">
Test text
</div>
$ scrapy shell ./index.html
>>> response.xpath('//div[@id="test"]/text()').get().strip()
'Test text'
Several objects are available in the shell during the session, such as response and request.
Alternatively, you can instantiate the HtmlResponse class and pass the HTML string as body:
>>> from scrapy.http import HtmlResponse
>>> response = HtmlResponse(url="http://www.example.com", body='<div id="test">Test text</div>', encoding='utf-8')
>>> response.xpath('//div[@id="test"]/text()').get().strip()
'Test text'
Answer 2:
alecxe's answer is right, but this is the correct way to instantiate a Selector from text in Scrapy:
>>> from scrapy.selector import Selector
>>> body = '<html><body><span>good</span></body></html>'
>>> Selector(text=body).xpath('//span/text()').get()
'good'
Source: https://stackoverflow.com/questions/27323740/scrapy-convert-html-string-to-htmlresponse-object