scrapy: request url must be str or unicode got list

Submitted by 可紊 on 2020-01-05 12:16:20

Question


I can't quite figure out what's wrong with this code. I would like to scrape the first page and then, for each link on that page, go to the second page to extract the item description. When I run the code below, I get: exception.TypeError: url must be str or unicode, got list. Here is my code:

from scrapy.spider import Spider
from scrapy.selector import Selector
from scrapy.http import Request
from scrapy.item import Item, Field
from scrapy.contrib.loader import ItemLoader
from scrapy.contrib.loader.processor import MapCompose, Join
from scrapy.contrib.loader import XPathItemLoader
from my.items import myItem

class mySpider(Spider):
    name = "my"
    allowed_domains = ["my.com"]
    start_urls = ['http://sjg.my.com/cf_jy.cfm']

    def parse(self, response):
        s = Selector(response)
        rows = s.xpath('//table[@class="table-order"]//tr')
        for row in rows:
            l = XPathItemLoader(item=myItem(), selector=row)
            l.default_input_processor = MapCompose(unicode.strip)
            l.default_output_processor = Join()
            l.add_xpath('title', './/a[contains(@href,"cf_jy.cfm?hu_pg")]/text()')
            l.add_xpath('url1', './/a/@href')
            l.add_xpath('dates', './/td[4]/text()')
            l.add_xpath('rev', './/td[@align="right"]/text()')
            l.add_xpath('typ', './/td[3]/text()')
            l.add_value('name', u'gsf')
            request = Request(l.get_xpath('.//a/@href'), callback=self.parse_link, meta={'l': l})
            yield request

    def parse_link(self, response):
        l = response.meta["l"]
        s = Selector(response)
        q = s.xpath("//div[@class='content-main']/td[@class='text']/p/text()").extract()
        l.add_value('description',q)
        yield l.load_item()

Thanks in advance.


Answer 1:


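According to the Scrapy documentation, the first argument of Request (the url) must be a string. In your code, however, l.get_xpath('.//a/@href') returns a list: get_xpath extracts every matching value and only reduces them to a single value if you pass it a processor such as TakeFirst(). So pass Request a single URL string instead of the list.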
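The fix comes down to reducing the extracted list to one string and, because Request also expects an absolute URL (a relative href without a scheme is rejected), resolving it against the page URL. Below is a minimal stdlib-only sketch of that step, assuming the first link in each row is the one to follow. `take_first` and `to_request_url` are hypothetical helpers (`take_first` mirrors the documented behaviour of Scrapy's `TakeFirst` processor but is not Scrapy's code), and the sample hrefs are made up.

```python
from urllib.parse import urljoin


def take_first(values):
    """Return the first value that is neither None nor an empty string.

    Hypothetical helper mimicking what Scrapy's TakeFirst processor is
    documented to do; it is not Scrapy's own implementation.
    """
    for value in values:
        if value is not None and value != "":
            return value
    return None


def to_request_url(hrefs, base_url):
    """Turn the list of hrefs extracted from one row into one absolute URL string."""
    href = take_first(hrefs)
    if href is None:
        return None  # no link in this row: skip it instead of building a Request
    # Resolve relative hrefs (e.g. "cf_jy.cfm?hu_pg=2") against the page URL.
    return urljoin(base_url, href.strip())


# A list like the one l.get_xpath('.//a/@href') returns, which Request rejects.
hrefs = ["cf_jy.cfm?hu_pg=2", "cf_jy.cfm?hu_pg=3"]
print(to_request_url(hrefs, "http://sjg.my.com/cf_jy.cfm"))
# prints http://sjg.my.com/cf_jy.cfm?hu_pg=2
```

Inside the spider itself, the same effect can be had with l.get_xpath('.//a/@href', TakeFirst()) followed by joining the result with response.url (response.urljoin(...) in recent Scrapy versions).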

For example:

Request("Some_link_goes_here", callback=self.parse_link, meta={'l': l})


Source: https://stackoverflow.com/questions/24906897/scrapy-request-url-must-be-str-or-unicode-got-list
