Following hyperlink and “Filtered offsite request”

悲哀的现实 2020-12-16 13:26

I know that there are several related threads out there, and they have helped me a lot, but I still can't get all the way. I am at the point where running the code doesn't

2 answers
  • 2020-12-16 13:44

    Try setting dont_filter=True on the Request, so that it is not dropped by the offsite filter:

    yield Request(url=url2, meta={'address': hxs.select("id('searchresult')/tr/td[1]/a[@href]/text()").extract()}, callback=self.parse2, dont_filter=True)
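
To illustrate why dont_filter=True makes the "Filtered offsite request" message go away, here is a minimal sketch of the decision the offsite filter makes. It is a simplified stdlib-only model, not Scrapy's actual code; the function name is_dropped and the path /bbr are made up for the example.

```python
from urllib.parse import urlparse


def is_dropped(url, allowed_domains, dont_filter=False):
    """Simplified model of the offsite check (hypothetical, not Scrapy's code)."""
    # A request marked dont_filter, or a spider with no allowed_domains,
    # is never treated as offsite.
    if dont_filter or not allowed_domains:
        return False
    host = urlparse(url).hostname or ""
    # A host passes if it equals an allowed domain or is a subdomain of one.
    return not any(host == d or host.endswith("." + d) for d in allowed_domains)


# An allowed_domains entry with an http prefix never matches a hostname.
bad = ["http://www.boliga.dk"]
print(is_dropped("http://www.boliga.dk/bbr", bad))                    # True
print(is_dropped("http://www.boliga.dk/bbr", bad, dont_filter=True))  # False
```

Note that dont_filter=True also bypasses the duplicate-request filter, so it works around the symptom rather than fixing a wrong allowed_domains value.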

  • 2020-12-16 13:56

    You need to modify your yielded Request in parse to use parse2 as its callback.

    EDIT: allowed_domains shouldn't include the http prefix, e.g.:

    allowed_domains = ["boliga.dk"]
    

    Try that, rather than leaving allowed_domains blank, and see if your spider still runs correctly.
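
If the domain is being copied from a start URL, a small helper can strip the scheme and path before the value goes into allowed_domains. This is only a sketch using the standard library; to_allowed_domain is a hypothetical name and not part of Scrapy.

```python
from urllib.parse import urlparse


def to_allowed_domain(value):
    """Return a bare hostname suitable for allowed_domains (hypothetical helper)."""
    value = value.strip()
    # urlparse only fills in hostname when a scheme or a leading "//" is present,
    # so add "//" to values that are already bare domains.
    parsed = urlparse(value if "//" in value else "//" + value)
    return parsed.hostname or ""


print(to_allowed_domain("http://www.boliga.dk"))  # www.boliga.dk
print(to_allowed_domain("boliga.dk"))             # boliga.dk
```

The helper keeps the www. part as given. Listing the shorter "boliga.dk", as suggested above, is the broader choice because it also covers subdomains such as www.boliga.dk.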
