This question is somewhat of a follow-up to this question that I asked previously.
I am trying to scrape a website which contains some links on its first page.
I'm assuming that the URLs you want to follow lead to pages with the same or a similar structure. If that's the case, you should do something like this:
    from scrapy.contrib.spiders import CrawlSpider
    from scrapy.selector import Selector
    from scrapy.http import Request

    class YourCrawler(CrawlSpider):
        name = 'yourCrawler'
        allowed_domains = ['domain.com']
        start_urls = ["http://www.domain.com/example/url"]

        def parse(self, response):
            # Parse any elements you need from the start_urls and,
            # optionally, store them as Items.
            # See http://doc.scrapy.org/en/latest/topics/items.html
            s = Selector(response)
            urls = s.xpath('//div[@id="example"]//a/@href').extract()
            for url in urls:
                yield Request(url, callback=self.parse_following_urls, dont_filter=True)

        def parse_following_urls(self, response):
            # Parsing rules go here
            pass
Otherwise, if the URLs you want to follow lead to pages with different structures, you can define a specific callback method for each of them (something like parse1, parse2, parse3...).
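As a minimal sketch of that idea, you could pick the callback based on the URL itself before yielding the Request. The URL patterns and method names below are made up for illustration; substitute whatever distinguishes your pages:

```python
import re

def pick_callback(url):
    """Return the name of the parse method that should handle `url`.

    Hypothetical patterns: adjust the regexes to match the actual
    URL structure of the site you are crawling.
    """
    if re.search(r'/product/', url):
        return 'parse_product'
    if re.search(r'/review/', url):
        return 'parse_review'
    return 'parse_generic'
```

Inside the spider's parse method you would then yield something like `Request(url, callback=getattr(self, pick_callback(url)))`, and implement `parse_product`, `parse_review`, etc. with the extraction rules for each page type.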