I am new to Scrapy, so I am sorry if this question is trivial. I have read the Scrapy documentation on the official website, and while looking through it I came across this example:
import scrapy
from myproject.items import MyItem

class MySpider(scrapy.Spider):
    name = 'example.com'
    allowed_domains = ['example.com']
    start_urls = [
        'http://www.example.com/1.html',
        'http://www.example.com/2.html',
        'http://www.example.com/3.html',
    ]

    def parse(self, response):
        for h3 in response.xpath('//h3').extract():
            yield MyItem(title=h3)

        for url in response.xpath('//a/@href').extract():
            yield scrapy.Request(url, callback=self.parse)
I know the parse method must return and/or yield items and requests, but where are these return values returned to?
One is an item and the other is a request, and I assume the two types are handled differently. Also, in the case of CrawlSpider, there are Rules with callbacks. Where do those callbacks' return values go? The same place as parse()'s?
I am very confused about the Scrapy flow, even after reading the documentation.
According to the documentation:
The parse() method is in charge of processing the response and returning scraped data (as Item objects) and more URLs to follow (as Request objects).
In other words, returned/yielded items and requests are handled differently: items are handed to the item pipelines and item exporters, while requests are put into the Scheduler, which pipes them to the Downloader to make the request and return a response. The engine then receives the response and gives it back to the spider for processing (to the callback method).
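To make that routing concrete, here is an illustrative sketch (not Scrapy's actual engine code) of how yielded values from a callback might be dispatched: requests go back to the scheduler queue, everything else is treated as an item. The Request class, the string "response", and the page names here are all stand-ins invented for this toy example.

```python
from collections import deque

class Request:
    """Toy stand-in for scrapy.Request: a URL plus a callback."""
    def __init__(self, url, callback):
        self.url = url
        self.callback = callback

def parse(response):
    # A callback yields a mix of items (plain dicts here) and follow-up requests.
    yield {"title": "h3 text from " + response}
    if response == "page1":
        yield Request("page2", callback=parse)

def run(start_url):
    scheduler = deque([Request(start_url, parse)])  # the Scheduler's queue
    items = []                                      # what the pipelines would receive
    while scheduler:
        req = scheduler.popleft()
        response = req.url  # pretend the Downloader fetched the page
        for result in req.callback(response):
            if isinstance(result, Request):
                scheduler.append(result)   # requests go back to the Scheduler
            else:
                items.append(result)       # items go on to the item pipelines
    return items

print(run("page1"))
```

This also answers the CrawlSpider question: a Rule's callback is just another callback, so its yielded items and requests flow through the same dispatch loop as parse()'s.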
The whole data-flow process is described in the Architecture Overview page in a very detailed manner.
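As for where the items end up: each one is passed through the enabled item pipelines in order. A minimal pipeline sketch, using Scrapy's process_item hook signature but written without importing scrapy so it stands alone (the pipeline name is hypothetical):

```python
class TitleCleanupPipeline:
    """Hypothetical pipeline: normalizes whitespace in scraped titles."""
    def process_item(self, item, spider):
        # Whatever process_item returns is handed to the next pipeline
        # (or the exporters, if this is the last one).
        item["title"] = item["title"].strip()
        return item

# Scrapy would call this once per yielded item; shown here directly:
pipeline = TitleCleanupPipeline()
print(pipeline.process_item({"title": "  Example Heading  "}, spider=None))
```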
Hope that helps.
Source: https://stackoverflow.com/questions/26195982/python-scrapy-parse-function-where-is-the-return-value-returned-to