Scrapy: getting values from multiple sites

Submitted by 蓝咒 on 2019-12-13 02:59:00

Question


I'm trying to pass a value from one callback function to another.

I looked up the docs and just didn't understand them. For reference, this is the example from the docs:

def parse_page1(self, response):
    item = MyItem()
    item['main_url'] = response.url
    request = scrapy.Request("http://www.example.com/some_page.html",
                             callback=self.parse_page2)
    request.meta['item'] = item
    yield request

def parse_page2(self, response):
    item = response.meta['item']
    item['other_url'] = response.url
    yield item

Here is pseudocode of what I want to achieve:

import scrapy

class GotoSpider(scrapy.Spider):
    name = 'goto'
    allowed_domains = ['first.com', 'second.com']
    start_urls = ['http://first.com/']

    def parse(self, response):
        name = response.xpath(...)
        price = scrapy.Request(second.com, callback = self.parse_check)
        yield(name, price)


    def parse_check(self, response):
        price = response.xpath(...)
        return price

Answer 1:


This is how you can pass any value (a name, a link, etc.) from one callback method to another:

import scrapy

class GotoSpider(scrapy.Spider):
    name = 'goto'
    allowed_domains = ['first.com', 'second.com']
    start_urls = ['http://first.com/']

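    def parse(self, response):
        name = response.xpath(...).get()
        link = response.xpath(...).get()  # link for second.com where you may find the price
        # Request needs a URL string, so extract it with .get(); urljoin also handles relative links
        request = scrapy.Request(url=response.urljoin(link), callback=self.parse_check)
        request.meta['name'] = name  # the value travels with the request
        yield request

    def parse_check(self, response):
        name = response.meta['name']  # the value stored in parse()
        price = response.xpath(...).get()
        # A plain dict works as is; if you use an Item class, declare the fields name and price in your "items.py"
        yield {"name": name, "price": price}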


Source: https://stackoverflow.com/questions/46258343/scrapy-getting-values-from-multiple-sites
