Scrapy: scraping a list of links

梦想与她 提交于 2019-12-01 11:34:46

I'm assuming that the urls you want to follow lead to pages with the same or similar structure. If that's the case, you should do something like this:

from scrapy.contrib.spiders import CrawlSpider
from scrapy.selector import Selector
from scrapy.http import Request

class YourCrawler(CrawlSpider):

   name = 'yourCrawler'
   allowed_domains = 'domain.com'
   start_urls = ["htttp://www.domain.com/example/url"]


   def parse(self, response):
      #parse any elements you need from the start_urls and, optionally, store them as Items.
      # See http://doc.scrapy.org/en/latest/topics/items.html

      s = Selector(response)
      urls = s.xpath('//div[@id="example"]//a/@href').extract()
      for url in urls:
         yield Request(url, callback=self.parse_following_urls, dont_filter=True)


   def parse_following_urls(self, response):
       #Parsing rules go here

Otherwise, if urls you want to follow lead to pages with different structure, then you can define specific methods for them (something like parse1, parse2, parse3...).

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!